Dataset info
| Number of variables | 9 |
|---|---|
| Number of observations | 5000 |
| Missing cells | 66 (0.1%) |
| Duplicate rows | 7 (0.1%) |
| Total size in memory | 1.7 MiB |
| Average record size in memory | 359.7 B |
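The percentages in this table can be checked against the raw counts. The following is a minimal sketch, assuming the missing-cell share is taken over all cells (variables × observations) and the duplicate share over all rows. The variable names are illustrative and do not come from the profiler.

```python
# Values copied from the "Dataset info" table above.
n_variables = 9
n_observations = 5000
missing_cells = 66
duplicate_rows = 7

# Assumption: the missing share is computed over every cell in the dataset.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Assumption: the duplicate share is computed over all rows.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
```

Both values round to the 0.1% shown in the report.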
Variables types
| CAT | 5 |
|---|---|
| NUM | 4 |
Reproduction info
| Date of analysis | 2020-09-19 00:30:19.284260 |
|---|---|
| Version | pandas-profiling v2.4.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download Configuration | config.yaml |
Warnings
| Dataset has 7 (0.1%) duplicate rows | Warning |
|---|---|
| order_date has a high cardinality: 157 distinct values | Warning |
| product_id has a high cardinality: 1322 distinct values | Warning |
order_id
Real number (ℝ≥0)
| Distinct count | 3234 |
|---|---|
| Unique (%) | 64.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1707213.704 |
|---|---|
| Minimum | 1666774 |
| Maximum | 1742998 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 39.2 KiB |
Quantile statistics
| Minimum | 1666774 |
|---|---|
| 5-th percentile | 1671745 |
| Q1 | 1688852.5 |
| median | 1708448.5 |
| Q3 | 1725622.75 |
| 95-th percentile | 1739225.2 |
| Maximum | 1742998 |
| Range | 76224 |
| Interquartile range (IQR) | 36770.25 |
Descriptive statistics
| Standard deviation | 21525.81671 |
|---|---|
| Coefficient of variation (CV) | 0.01260874176 |
| Kurtosis | -1.140615918 |
| Mean | 1707213.704 |
| Median Absolute Deviation (MAD) | 18514.66223 |
| Skewness | -0.1439536427 |
| Sum | 8536068518 |
| Variance | 463360785.2 |
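Several of the order_id statistics above are derived from one another. As a rough consistency check (a sketch using the standard definitions, not the profiler's own code), the range, IQR, coefficient of variation, and variance can be recomputed from the reported minimum, maximum, quartiles, mean, and standard deviation:

```python
# Values copied from the order_id statistics above.
minimum, maximum = 1666774, 1742998
q1, q3 = 1688852.5, 1725622.75
mean = 1707213.704
std = 21525.81671

value_range = maximum - minimum  # reported Range: 76224
iqr = q3 - q1                    # reported IQR: 36770.25
cv = std / mean                  # reported CV: 0.01260874176
variance = std ** 2              # reported Variance: 463360785.2

print(value_range, iqr, cv, variance)
```

The recomputed values agree with the reported ones up to rounding of the inputs.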
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1695451 | 13 | 0.3% | |
| 1720729 | 12 | 0.2% | |
| 1714324 | 11 | 0.2% | |
| 1737589 | 10 | 0.2% | |
| 1711465 | 10 | 0.2% | |
| 1726822 | 10 | 0.2% | |
| 1742182 | 9 | 0.2% | |
| 1729018 | 9 | 0.2% | |
| 1701550 | 9 | 0.2% | |
| 1707739 | 9 | 0.2% | |
| Other values (3224) | 4898 | 98.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1666774 | 1 | < 0.1% | |
| 1666783 | 2 | < 0.1% | |
| 1666813 | 1 | < 0.1% | |
| 1666882 | 1 | < 0.1% | |
| 1666894 | 2 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1742998 | 1 | < 0.1% | |
| 1742992 | 2 | < 0.1% | |
| 1742989 | 1 | < 0.1% | |
| 1742953 | 1 | < 0.1% | |
| 1742947 | 2 | < 0.1% |
order_date
Categorical
| Distinct count | 157 |
|---|---|
| Unique (%) | 3.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 39.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 30/12/2019 | 70 | 1.4% | |
| 18/12/2019 | 68 | 1.4% | |
| 05/12/2019 | 67 | 1.3% | |
| 01/12/2019 | 67 | 1.3% | |
| 15/12/2019 | 66 | 1.3% | |
| 04/11/2019 | 66 | 1.3% | |
| 24/12/2019 | 63 | 1.3% | |
| 24/11/2019 | 62 | 1.2% | |
| 22/12/2019 | 60 | 1.2% | |
| 11/12/2019 | 60 | 1.2% | |
| Other values (147) | 4351 | 87.0% |
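The order_date values appear to be stored as day/month/year strings (entries such as 30/12/2019 rule out month-first order), which would explain why the column is reported with a distinct count rather than a date range. A minimal sketch of parsing such strings with the standard library follows; the helper name `parse_order_date` is hypothetical, and the format string is an assumption based on the sample values.

```python
from datetime import date, datetime

# Sample values copied from the order_date table above.
raw_dates = ["30/12/2019", "18/12/2019", "05/12/2019", "04/11/2019"]


def parse_order_date(text: str) -> date:
    """Parse a day/month/year string (assumed format) into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()


parsed = [parse_order_date(s) for s in raw_dates]

# Once parsed, the values can be compared and sorted chronologically.
print(min(parsed), max(parsed))
```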
customer_id
Real number (ℝ≥0)
| Distinct count | 1773 |
|---|---|
| Unique (%) | 35.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 15474.8326 |
|---|---|
| Minimum | 12391 |
| Maximum | 18287 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 39.2 KiB |
Quantile statistics
| Minimum | 12391 |
|---|---|
| 5-th percentile | 12890 |
| Q1 | 14096 |
| median | 15492.5 |
| Q3 | 16916 |
| 95-th percentile | 17954.15 |
| Maximum | 18287 |
| Range | 5896 |
| Interquartile range (IQR) | 2820 |
Descriptive statistics
| Standard deviation | 1650.211651 |
|---|---|
| Coefficient of variation (CV) | 0.106638417 |
| Kurtosis | -1.217662056 |
| Mean | 15474.8326 |
| Median Absolute Deviation (MAD) | 1438.657936 |
| Skewness | -0.00997199229 |
| Sum | 77374163 |
| Variance | 2723198.495 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 17841 | 117 | 2.3% | |
| 14096 | 106 | 2.1% | |
| 12748 | 68 | 1.4% | |
| 14298 | 26 | 0.5% | |
| 14456 | 24 | 0.5% | |
| 14606 | 24 | 0.5% | |
| 13263 | 23 | 0.5% | |
| 16549 | 23 | 0.5% | |
| 13089 | 20 | 0.4% | |
| 18283 | 18 | 0.4% | |
| Other values (1763) | 4551 | 91.0% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 12391 | 1 | < 0.1% | |
| 12420 | 1 | < 0.1% | |
| 12471 | 5 | 0.1% | |
| 12472 | 6 | 0.1% | |
| 12474 | 5 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 18287 | 2 | < 0.1% | |
| 18283 | 18 | 0.4% | |
| 18282 | 1 | < 0.1% | |
| 18272 | 3 | 0.1% | |
| 18265 | 2 | < 0.1% |
city
Categorical
| Distinct count | 19 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 16 |
| Missing (%) | 0.3% |
| Memory size | 39.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| Jakarta Selatan | 791 | 15.8% | |
| Jakarta Pusat | 675 | 13.5% | |
| Jakarta Utara | 433 | 8.7% | |
| Yogyakarta | 392 | 7.8% | |
| Jakarta Barat | 349 | 7.0% | |
| Malang | 296 | 5.9% | |
| Jakarta Timur | 295 | 5.9% | |
| Surakarta | 288 | 5.8% | |
| Bogor | 255 | 5.1% | |
| Tangerang | 252 | 5.0% | |
| Other values (8) | 958 | 19.2% |
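The frequency column in the city table is consistent with each count being divided by all 5000 observations, with the 16 missing values making up the remainder. A short sketch of that arithmetic, assuming this is how the percentages were derived:

```python
# Counts copied from the city table above; "Other values (8)" is kept as one bucket.
city_counts = {
    "Jakarta Selatan": 791,
    "Jakarta Pusat": 675,
    "Jakarta Utara": 433,
    "Yogyakarta": 392,
    "Jakarta Barat": 349,
    "Malang": 296,
    "Jakarta Timur": 295,
    "Surakarta": 288,
    "Bogor": 255,
    "Tangerang": 252,
    "Other values (8)": 958,
}
missing = 16           # from the city summary table
n_observations = 5000  # from the dataset info

# Listed counts plus missing values should account for every row.
total = sum(city_counts.values()) + missing

# Assumption: each percentage is taken over all rows, missing ones included.
frequency = {
    name: round(100 * count / n_observations, 1)
    for name, count in city_counts.items()
}

print(total)
print(frequency["Jakarta Selatan"], frequency["Other values (8)"])
```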
province
Categorical
| Distinct count | 8 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 12 |
| Missing (%) | 0.2% |
| Memory size | 39.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| DKI Jakarta | 2547 | 50.9% | |
| Jawa Barat | 822 | 16.4% | |
| Jawa Tengah | 456 | 9.1% | |
| Jawa Timur | 397 | 7.9% | |
| Yogyakarta | 391 | 7.8% | |
| Banten | 251 | 5.0% | |
| Bali | 124 | 2.5% | |
| (Missing) | 12 | 0.2% |
product_id
Categorical
| Distinct count | 1322 |
|---|---|
| Unique (%) | 26.4% |
| Missing | 11 |
| Missing (%) | 0.2% |
| Memory size | 39.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| P4009 | 35 | 0.7% | |
| P1902 | 27 | 0.5% | |
| P0255 | 26 | 0.5% | |
| P2521 | 25 | 0.5% | |
| P2094 | 25 | 0.5% | |
| P2101 | 24 | 0.5% | |
| P2489 | 23 | 0.5% | |
| P2853 | 23 | 0.5% | |
| P2445 | 22 | 0.4% | |
| P2089 | 22 | 0.4% | |
| Other values (1311) | 4737 | 94.7% |
brand
Categorical
| Distinct count | 10 |
|---|---|
| Unique (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 39.2 KiB |
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| BRAND_S | 989 | 19.8% | |
| BRAND_P | 665 | 13.3% | |
| BRAND_R | 627 | 12.5% | |
| BRAND_C | 536 | 10.7% | |
| BRAND_A | 427 | 8.5% | |
| BRAND_W | 385 | 7.7% | |
| BRAND_L | 374 | 7.5% | |
| BRAND_B | 344 | 6.9% | |
| BRAND_J | 334 | 6.7% | |
| BRAND_H | 319 | 6.4% |
quantity
Real number (ℝ≥0)
| Distinct count | 65 |
|---|---|
| Unique (%) | 1.3% |
| Missing | 14 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.42398716 |
|---|---|
| Minimum | 1 |
| Maximum | 720 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 39.2 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 5 |
| Q3 | 12 |
| 95-th percentile | 36 |
| Maximum | 720 |
| Range | 719 |
| Interquartile range (IQR) | 10 |
Descriptive statistics
| Standard deviation | 29.44202501 |
|---|---|
| Coefficient of variation (CV) | 2.577210967 |
| Kurtosis | 208.764203 |
| Mean | 11.42398716 |
| Median Absolute Deviation (MAD) | 10.797158 |
| Skewness | 12.2428929 |
| Sum | 56960 |
| Variance | 866.8328367 |
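The reported mean for quantity is consistent with the sum being divided by the number of non-missing observations rather than by all 5000 rows. A minimal check, assuming missing values are simply excluded from the statistics:

```python
# Values copied from the dataset info and the quantity statistics above.
n_observations = 5000
missing = 14
total_quantity = 56960
reported_mean = 11.42398716

# Assumption: missing values are skipped when the mean is computed.
non_missing = n_observations - missing
mean = total_quantity / non_missing

print(non_missing, mean)
```

Dividing by all 5000 rows instead would give 11.392, which does not match the reported mean.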
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 938 | 18.8% | |
| 2 | 732 | 14.6% | |
| 12 | 677 | 13.5% | |
| 6 | 457 | 9.1% | |
| 4 | 383 | 7.7% | |
| 3 | 367 | 7.3% | |
| 10 | 351 | 7.0% | |
| 24 | 277 | 5.5% | |
| 8 | 130 | 2.6% | |
| 5 | 121 | 2.4% | |
| Other values (54) | 553 | 11.1% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 1 | 938 | 18.8% | |
| 2 | 732 | 14.6% | |
| 3 | 367 | 7.3% | |
| 4 | 383 | 7.7% | |
| 5 | 121 | 2.4% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 720 | 1 | < 0.1% | |
| 600 | 1 | < 0.1% | |
| 576 | 2 | < 0.1% | |
| 480 | 1 | < 0.1% | |
| 432 | 2 | < 0.1% |
item_price
Real number (ℝ≥0)
| Distinct count | 127 |
|---|---|
| Unique (%) | 2.5% |
| Missing | 13 |
| Missing (%) | 0.3% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 933742.7311 |
|---|---|
| Minimum | 26000 |
| Maximum | 29762000 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 39.2 KiB |
Quantile statistics
| Minimum | 26000 |
|---|---|
| 5-th percentile | 149000 |
| Q1 | 450000 |
| median | 604000 |
| Q3 | 1045000 |
| 95-th percentile | 2795000 |
| Maximum | 29762000 |
| Range | 29736000 |
| Interquartile range (IQR) | 595000 |
Descriptive statistics
| Standard deviation | 1030829.81 |
|---|---|
| Coefficient of variation (CV) | 1.103976263 |
| Kurtosis | 137.870979 |
| Mean | 933742.7311 |
| Median Absolute Deviation (MAD) | 623479.205 |
| Skewness | 7.099641665 |
| Sum | 4656575000 |
| Variance | 1.062610098e+12 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 590000 | 540 | 10.8% | |
| 450000 | 527 | 10.5% | |
| 740000 | 323 | 6.5% | |
| 1045000 | 287 | 5.7% | |
| 310000 | 264 | 5.3% | |
| 1745000 | 258 | 5.2% | |
| 159000 | 245 | 4.9% | |
| 747000 | 232 | 4.6% | |
| 1325000 | 196 | 3.9% | |
| 520000 | 177 | 3.5% | |
| Other values (116) | 1938 | 38.8% |
Minimum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 26000 | 4 | 0.1% | |
| 33000 | 2 | < 0.1% | |
| 37000 | 1 | < 0.1% | |
| 40000 | 1 | < 0.1% | |
| 54000 | 3 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) | |
|---|---|---|---|
| 29762000 | 1 | < 0.1% | |
| 13995000 | 1 | < 0.1% | |
| 11545000 | 1 | < 0.1% | |
| 10495000 | 2 | < 0.1% | |
| 8748000 | 1 | < 0.1% |
First rows
| | order_id | order_date | customer_id | city | province | product_id | brand | quantity | item_price |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 1703458 | 17/10/2019 | 14004 | Jakarta Selatan | DKI Jakarta | P1910 | BRAND_J | 10.0 | 740000.0 |
| 1 | 1706815 | 24/10/2019 | 17220 | Jakarta Selatan | DKI Jakarta | P2934 | BRAND_R | 2.0 | 604000.0 |
| 2 | 1710718 | 03/11/2019 | 16518 | Jakarta Utara | DKI Jakarta | P0908 | BRAND_C | 8.0 | 1045000.0 |
| 3 | 1683592 | 19/08/2019 | 16364 | Jakarta Barat | DKI Jakarta | P0128 | BRAND_A | 4.0 | 205000.0 |
| 4 | 1702573 | 16/10/2019 | 15696 | Jakarta Timur | DKI Jakarta | P2968 | BRAND_R | 2.0 | NaN |
| 5 | 1672906 | 16/07/2019 | 12748 | Jakarta Utara | DKI Jakarta | P0710 | BRAND_C | 4.0 | 520000.0 |
| 6 | 1711399 | 04/11/2019 | 16791 | Jakarta Barat | DKI Jakarta | P0860 | BRAND_C | 1.0 | 1465000.0 |
| 7 | 1695367 | 26/09/2019 | 13069 | Surakarta | Jawa Tengah | P3342 | BRAND_S | 2.0 | 205000.0 |
| 8 | 1741846 | 30/12/2019 | 16873 | Jakarta Barat | DKI Jakarta | P3203 | BRAND_S | 32.0 | 450000.0 |
| 9 | 1720189 | 24/11/2019 | 14723 | Tangerang | Banten | P1701 | BRAND_H | 2.0 | 149000.0 |
Last rows
| | order_id | order_date | customer_id | city | province | product_id | brand | quantity | item_price |
|---|---|---|---|---|---|---|---|---|---|
| 4990 | 1678408 | 02/08/2019 | 15182 | Jakarta Pusat | DKI Jakarta | P1903 | BRAND_J | 10.0 | 740000.0 |
| 4991 | 1736503 | 20/12/2019 | 18109 | Tangerang | Banten | P2866 | BRAND_R | 3.0 | 1045000.0 |
| 4992 | 1734787 | 18/12/2019 | 18283 | Jakarta Selatan | DKI Jakarta | P0734 | BRAND_C | 2.0 | 310000.0 |
| 4993 | 1678615 | 04/08/2019 | 16880 | Bekasi | Jawa Barat | P2426 | BRAND_P | 3.0 | 310000.0 |
| 4994 | 1707424 | 25/10/2019 | 13021 | Yogyakarta | Yogyakarta | P1913 | BRAND_J | 10.0 | 740000.0 |
| 4995 | 1724011 | 01/12/2019 | 12838 | Tangerang | Banten | P3047 | BRAND_R | 2.0 | 450000.0 |
| 4996 | 1676302 | 28/07/2019 | 13833 | Bogor | Jawa Barat | P0760 | BRAND_C | 3.0 | 1465000.0 |
| 4997 | 1706071 | 23/10/2019 | 16332 | Jakarta Timur | DKI Jakarta | P1681 | BRAND_H | 4.0 | 747000.0 |
| 4998 | 1703620 | 17/10/2019 | 13055 | Jakarta Barat | DKI Jakarta | P0757 | BRAND_C | 8.0 | 695000.0 |
| 4999 | 1720036 | 24/11/2019 | 17609 | Jakarta Pusat | DKI Jakarta | P3334 | BRAND_S | 1.0 | 1045000.0 |