Dataset statistics
| Number of variables | 11 |
|---|---|
| Number of observations | 1025010 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 2239 |
| Duplicate rows (%) | 0.2% |
| Total size in memory | 86.0 MiB |
| Average record size in memory | 88.0 B |
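The duplicate-row and memory figures above can be cross-checked with a few lines of standard-library Python. This is a minimal sketch under two assumptions. First, a row counts as a duplicate when an identical row appeared earlier, so a row seen k times contributes k - 1. This mirrors the default `keep="first"` behaviour of pandas' `DataFrame.duplicated()`, but whether the report counts exactly this way is an assumption. Second, every cell occupies 8 bytes (a float64 value or an object pointer). The sample rows are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def duplicate_stats(rows):
    """Number and percentage of rows identical to an earlier row."""
    counts = Counter(rows)
    # A row seen k times contributes k - 1 duplicates (first occurrence kept).
    n_duplicates = sum(k - 1 for k in counts.values())
    pct = 100.0 * n_duplicates / len(rows) if rows else 0.0
    return n_duplicates, pct


def most_frequent_duplicates(rows, head=10):
    """Repeated rows with their counts, most frequent first."""
    repeated = [(row, k) for row, k in Counter(rows).items() if k > 1]
    repeated.sort(key=lambda item: item[1], reverse=True)
    return repeated[:head]


# Hypothetical rows (tuples of cell values), not taken from the dataset.
rows = [
    (1, 10, 2, 3, 0),
    (1, 10, 2, 3, 0),
    (1, 10, 2, 3, 0),
    (2, 9, 1, 1, 0),
    (2, 9, 1, 1, 0),
    (4, 8, 3, 8, 1),
]
print(duplicate_stats(rows))  # (3, 50.0)
print(most_frequent_duplicates(rows))

# Cross-check of the report-level figures (8 bytes per cell is an assumption).
n_rows, n_cols = 1025010, 11
print(f"{100 * 2239 / n_rows:.1f}%")             # 0.2%
print(f"{n_cols * 8 * n_rows / 2**20:.1f} MiB")  # 86.0 MiB
print(f"{n_cols * 8:.1f} B per record")          # 88.0 B per record
```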
Variable types
| NUM | 6 |
|---|---|
| CAT | 5 |
Reproduction
| Analysis started | 2020-08-24 23:49:49.446172 |
|---|---|
| Analysis finished | 2020-08-24 23:50:43.382737 |
| Duration | 53.94 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Dataset has 2239 (0.2%) duplicate rows | Duplicates |
|---|---|
| target has 513702 (50.1%) zeros | Zeros |
att_1
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 257150 | 25.1% |
| 1 | 256087 | 25.0% |
| 4 | 256077 | 25.0% |
| 2 | 255696 | 24.9% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 33.3% |
| 0 | 1025010 | 33.3% |
| 3 | 257150 | 8.4% |
| 1 | 256087 | 8.3% |
| 4 | 256077 | 8.3% |
| 2 | 255696 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2050020 | 66.7% |
| Other Punctuation | 1025010 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1025010 | 50.0% |
| 3 | 257150 | 12.5% |
| 1 | 256087 | 12.5% |
| 4 | 256077 | 12.5% |
| 2 | 255696 | 12.5% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 100.0% |
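Every value of these categorical columns is stored as a float-like string such as "3.0". That is why the character tables look the way they do: each of the 1025010 values contributes one `.`, one `0` and one digit, giving 3075030 characters in total, and every value has length 3. The sketch below is a minimal, standard-library illustration of how such counts could be derived. It relies on `unicodedata.category`, whose codes `Nd` and `Po` are the Unicode general categories behind the "Decimal Number" and "Other Punctuation" labels. The sample values are hypothetical.

```python
import unicodedata
from collections import Counter

# Long names for the Unicode general-category codes that occur in "3.0"-style strings.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}


def character_profile(values):
    """Character counts, category totals and the set of string lengths."""
    chars = Counter()
    for value in values:
        chars.update(value)  # iterating a str yields its characters
    categories = Counter()
    for ch, count in chars.items():
        code = unicodedata.category(ch)
        categories[CATEGORY_NAMES.get(code, code)] += count
    lengths = {len(value) for value in values}
    return chars, categories, lengths


# Hypothetical sample: suit codes read as floats and rendered as strings.
values = [str(float(v)) for v in [3, 1, 4, 2, 3]]  # ['3.0', '1.0', '4.0', '2.0', '3.0']
chars, categories, lengths = character_profile(values)
print(chars.most_common(3))  # [('.', 5), ('0', 5), ('3', 2)]
print(dict(categories))      # {'Decimal Number': 10, 'Other Punctuation': 5}
print(lengths)               # {3}
```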
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3075030 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3075030 | 100.0% |
att_2
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.997861484278202 |
|---|---|
| Minimum | 1.0 |
| Maximum | 13.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 6 |
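att_2 takes only 13 distinct values, so every entry in the quantile table above can be recovered from the value counts listed later in this section. The sketch below is minimal and rests on two assumptions. Quantiles use linear interpolation at position q·(n - 1) over the sorted data, which is the default of `pandas.Series.quantile`. MAD is taken as the median of absolute deviations from the median, which is consistent with the integer MAD shown here.

```python
import math

# att_2 value counts as listed in this section's value tables.
ATT_2_COUNTS = {
    1: 79234, 2: 78818, 3: 78690, 4: 79017, 5: 78769, 6: 79142, 7: 78542,
    8: 78786, 9: 78402, 10: 78761, 11: 79158, 12: 78858, 13: 78833,
}


def value_at(counts, index):
    """Value at a 0-based position of the sorted data described by counts."""
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if index < seen:
            return value
    raise IndexError(index)


def quantile_from_counts(counts, q):
    """Linear interpolation between order statistics at position q * (n - 1)."""
    n = sum(counts.values())
    pos = q * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    v_lo, v_hi = value_at(counts, lo), value_at(counts, hi)
    return v_lo + (v_hi - v_lo) * (pos - lo)


quantiles = {q: quantile_from_counts(ATT_2_COUNTS, q) for q in (0.05, 0.25, 0.5, 0.75, 0.95)}
median = quantiles[0.5]
iqr = quantiles[0.75] - quantiles[0.25]
value_range = max(ATT_2_COUNTS) - min(ATT_2_COUNTS)

# MAD: the median of |x - median|, again computed from counts.
abs_dev_counts = {}
for value, count in ATT_2_COUNTS.items():
    dev = abs(value - median)
    abs_dev_counts[dev] = abs_dev_counts.get(dev, 0) + count
mad = quantile_from_counts(abs_dev_counts, 0.5)

print(quantiles)               # {0.05: 1.0, 0.25: 4.0, 0.5: 7.0, 0.75: 10.0, 0.95: 13.0}
print(iqr, value_range, mad)   # 6.0 12 3.0
```

Walking cumulative counts avoids expanding the column back into a million values. The same function works for any discrete column whose full value counts are known.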
Descriptive statistics
| Standard deviation | 3.743529466 |
|---|---|
| Coefficient of variation (CV) | 0.5349533532 |
| Kurtosis | -1.21531586 |
| Mean | 6.997861484 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.0005632497179 |
| Sum | 7172878 |
| Variance | 14.01401286 |
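The descriptive statistics follow from the same counts. The sketch below assumes the conventions of pandas' `Series.var`, `Series.skew` and `Series.kurt`. Under those conventions, the variance uses an n - 1 denominator, skewness is the adjusted Fisher-Pearson coefficient G1, and kurtosis is the adjusted excess kurtosis G2. The coefficient of variation is the standard deviation divided by the mean.

```python
import math

# att_2 value counts as listed in this section's value tables.
ATT_2_COUNTS = {
    1: 79234, 2: 78818, 3: 78690, 4: 79017, 5: 78769, 6: 79142, 7: 78542,
    8: 78786, 9: 78402, 10: 78761, 11: 79158, 12: 78858, 13: 78833,
}


def describe_from_counts(counts):
    """Moment statistics for data given as {value: count}."""
    n = sum(counts.values())
    total = sum(v * c for v, c in counts.items())
    mean = total / n
    # Central moments with a 1/n denominator.
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    m4 = sum(c * (v - mean) ** 4 for v, c in counts.items()) / n
    variance = m2 * n / (n - 1)   # sample variance (n - 1 denominator)
    std = math.sqrt(variance)
    g1 = m3 / m2 ** 1.5           # population skewness
    g2 = m4 / m2 ** 2 - 3         # population excess kurtosis
    skewness = math.sqrt(n * (n - 1)) / (n - 2) * g1               # adjusted G1
    kurtosis = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # adjusted G2
    return {
        "n": n, "sum": total, "mean": mean, "variance": variance, "std": std,
        "cv": std / mean, "skewness": skewness, "kurtosis": kurtosis,
    }


stats = describe_from_counts(ATT_2_COUNTS)
for name, value in stats.items():
    print(f"{name:>9}: {value:.10g}")
```

A negative excess kurtosis near -1.2, as in all five rank columns, is what a roughly uniform distribution over a small integer range produces. A discrete uniform distribution on 1..13 has an excess kurtosis of about -1.214.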
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 79234 | 7.7% |
| 11 | 79158 | 7.7% |
| 6 | 79142 | 7.7% |
| 4 | 79017 | 7.7% |
| 12 | 78858 | 7.7% |
| 13 | 78833 | 7.7% |
| 2 | 78818 | 7.7% |
| 8 | 78786 | 7.7% |
| 5 | 78769 | 7.7% |
| 10 | 78761 | 7.7% |
| 3 | 78690 | 7.7% |
| 7 | 78542 | 7.7% |
| 9 | 78402 | 7.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 79234 | 7.7% |
| 2 | 78818 | 7.7% |
| 3 | 78690 | 7.7% |
| 4 | 79017 | 7.7% |
| 5 | 78769 | 7.7% |
| 6 | 79142 | 7.7% |
| 7 | 78542 | 7.7% |
| 8 | 78786 | 7.7% |
| 9 | 78402 | 7.6% |
| 10 | 78761 | 7.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 78833 | 7.7% |
| 12 | 78858 | 7.7% |
| 11 | 79158 | 7.7% |
| 10 | 78761 | 7.7% |
| 9 | 78402 | 7.6% |
| 8 | 78786 | 7.7% |
| 7 | 78542 | 7.7% |
| 6 | 79142 | 7.7% |
| 5 | 78769 | 7.7% |
| 4 | 79017 | 7.7% |
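The three value tables above are three orderings of the same counts: descending count, ascending value (first 10) and descending value (first 10). The sketch below shows that reshaping, using att_2's counts from those tables. Percentages are rounded to one decimal as in the report. Treating the report's tables as exactly this transformation is an assumption.

```python
from collections import Counter

# att_2 value counts as listed in the tables above.
ATT_2_COUNTS = Counter({
    1: 79234, 2: 78818, 3: 78690, 4: 79017, 5: 78769, 6: 79142, 7: 78542,
    8: 78786, 9: 78402, 10: 78761, 11: 79158, 12: 78858, 13: 78833,
})


def value_tables(counts, n_extreme=10):
    """Common values, minimum and maximum tables as (value, count, pct) rows."""
    n = sum(counts.values())

    def row(value):
        return value, counts[value], round(100 * counts[value] / n, 1)

    common = [row(v) for v, _ in counts.most_common()]  # by descending count
    ordered = sorted(counts)                              # by ascending value
    minimum = [row(v) for v in ordered[:n_extreme]]
    maximum = [row(v) for v in reversed(ordered[-n_extreme:])]
    return common, minimum, maximum


common, minimum, maximum = value_tables(ATT_2_COUNTS)
print(common[:3])   # [(1, 79234, 7.7), (11, 79158, 7.7), (6, 79142, 7.7)]
print(minimum[-1])  # (10, 78761, 7.7)
print(maximum[-1])  # (4, 79017, 7.7)
```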
att_3
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 256671 | 25.0% |
| 4 | 256535 | 25.0% |
| 3 | 255943 | 25.0% |
| 2 | 255861 | 25.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 33.3% |
| 0 | 1025010 | 33.3% |
| 1 | 256671 | 8.3% |
| 4 | 256535 | 8.3% |
| 3 | 255943 | 8.3% |
| 2 | 255861 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2050020 | 66.7% |
| Other Punctuation | 1025010 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1025010 | 50.0% |
| 1 | 256671 | 12.5% |
| 4 | 256535 | 12.5% |
| 3 | 255943 | 12.5% |
| 2 | 255861 | 12.5% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3075030 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3075030 | 100.0% |
att_4
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.006294572735876 |
|---|---|
| Minimum | 1.0 |
| Maximum | 13.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.744054308 |
|---|---|
| Coefficient of variation (CV) | 0.5343843695 |
| Kurtosis | -1.215713979 |
| Mean | 7.006294573 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.001749992216 |
| Sum | 7181522 |
| Variance | 14.01794266 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 11 | 79304 | 7.7% |
| 12 | 79237 | 7.7% |
| 13 | 79206 | 7.7% |
| 7 | 79038 | 7.7% |
| 3 | 79020 | 7.7% |
| 6 | 78968 | 7.7% |
| 8 | 78914 | 7.7% |
| 2 | 78793 | 7.7% |
| 1 | 78738 | 7.7% |
| 10 | 78596 | 7.7% |
| 4 | 78522 | 7.7% |
| 9 | 78367 | 7.6% |
| 5 | 78307 | 7.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 78738 | 7.7% |
| 2 | 78793 | 7.7% |
| 3 | 79020 | 7.7% |
| 4 | 78522 | 7.7% |
| 5 | 78307 | 7.6% |
| 6 | 78968 | 7.7% |
| 7 | 79038 | 7.7% |
| 8 | 78914 | 7.7% |
| 9 | 78367 | 7.6% |
| 10 | 78596 | 7.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 79206 | 7.7% |
| 12 | 79237 | 7.7% |
| 11 | 79304 | 7.7% |
| 10 | 78596 | 7.7% |
| 9 | 78367 | 7.6% |
| 8 | 78914 | 7.7% |
| 7 | 79038 | 7.7% |
| 6 | 78968 | 7.7% |
| 5 | 78307 | 7.6% |
| 4 | 78522 | 7.7% |
att_5
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 256901 | 25.1% |
| 4 | 256531 | 25.0% |
| 1 | 256331 | 25.0% |
| 2 | 255247 | 24.9% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 33.3% |
| 0 | 1025010 | 33.3% |
| 3 | 256901 | 8.4% |
| 4 | 256531 | 8.3% |
| 1 | 256331 | 8.3% |
| 2 | 255247 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2050020 | 66.7% |
| Other Punctuation | 1025010 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1025010 | 50.0% |
| 3 | 256901 | 12.5% |
| 4 | 256531 | 12.5% |
| 1 | 256331 | 12.5% |
| 2 | 255247 | 12.5% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3075030 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3075030 | 100.0% |
att_6
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.99924586101599 |
|---|---|
| Minimum | 1.0 |
| Maximum | 13.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.74196432 |
|---|---|
| Coefficient of variation (CV) | 0.5346239287 |
| Kurtosis | -1.213944154 |
| Mean | 6.999245861 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.0003795025901 |
| Sum | 7174297 |
| Variance | 14.00229697 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 79444 | 7.8% |
| 10 | 79073 | 7.7% |
| 1 | 79069 | 7.7% |
| 2 | 78969 | 7.7% |
| 8 | 78905 | 7.7% |
| 5 | 78871 | 7.7% |
| 11 | 78865 | 7.7% |
| 12 | 78855 | 7.7% |
| 13 | 78765 | 7.7% |
| 4 | 78748 | 7.7% |
| 6 | 78587 | 7.7% |
| 3 | 78518 | 7.7% |
| 9 | 78341 | 7.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 79069 | 7.7% |
| 2 | 78969 | 7.7% |
| 3 | 78518 | 7.7% |
| 4 | 78748 | 7.7% |
| 5 | 78871 | 7.7% |
| 6 | 78587 | 7.7% |
| 7 | 79444 | 7.8% |
| 8 | 78905 | 7.7% |
| 9 | 78341 | 7.6% |
| 10 | 79073 | 7.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 78765 | 7.7% |
| 12 | 78855 | 7.7% |
| 11 | 78865 | 7.7% |
| 10 | 79073 | 7.7% |
| 9 | 78341 | 7.6% |
| 8 | 78905 | 7.7% |
| 7 | 79444 | 7.8% |
| 6 | 78587 | 7.7% |
| 5 | 78871 | 7.7% |
| 4 | 78748 | 7.7% |
att_7
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 256914 | 25.1% |
| 2 | 256530 | 25.0% |
| 4 | 255816 | 25.0% |
| 1 | 255750 | 25.0% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 33.3% |
| 0 | 1025010 | 33.3% |
| 3 | 256914 | 8.4% |
| 2 | 256530 | 8.3% |
| 4 | 255816 | 8.3% |
| 1 | 255750 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2050020 | 66.7% |
| Other Punctuation | 1025010 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1025010 | 50.0% |
| 3 | 256914 | 12.5% |
| 2 | 256530 | 12.5% |
| 4 | 255816 | 12.5% |
| 1 | 255750 | 12.5% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3075030 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3075030 | 100.0% |
att_8
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.000838040604482 |
|---|---|
| Minimum | 1.0 |
| Maximum | 13.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.74142301 |
|---|---|
| Coefficient of variation (CV) | 0.5344250201 |
| Kurtosis | -1.214147951 |
| Mean | 7.000838041 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | -0.0005056426607 |
| Sum | 7175929 |
| Variance | 13.99824614 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 7 | 79524 | 7.8% |
| 2 | 79075 | 7.7% |
| 9 | 79036 | 7.7% |
| 11 | 78895 | 7.7% |
| 12 | 78853 | 7.7% |
| 13 | 78834 | 7.7% |
| 4 | 78790 | 7.7% |
| 10 | 78763 | 7.7% |
| 3 | 78749 | 7.7% |
| 1 | 78717 | 7.7% |
| 5 | 78698 | 7.7% |
| 8 | 78582 | 7.7% |
| 6 | 78494 | 7.7% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 78717 | 7.7% |
| 2 | 79075 | 7.7% |
| 3 | 78749 | 7.7% |
| 4 | 78790 | 7.7% |
| 5 | 78698 | 7.7% |
| 6 | 78494 | 7.7% |
| 7 | 79524 | 7.8% |
| 8 | 78582 | 7.7% |
| 9 | 79036 | 7.7% |
| 10 | 78763 | 7.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 78834 | 7.7% |
| 12 | 78853 | 7.7% |
| 11 | 78895 | 7.7% |
| 10 | 78763 | 7.7% |
| 9 | 79036 | 7.7% |
| 8 | 78582 | 7.7% |
| 7 | 79524 | 7.8% |
| 6 | 78494 | 7.7% |
| 5 | 78698 | 7.7% |
| 4 | 78790 | 7.7% |
att_9
Categorical
| Distinct count | 4 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 7.8 MiB |
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 257063 | 25.1% |
| 4 | 256483 | 25.0% |
| 3 | 255986 | 25.0% |
| 2 | 255478 | 24.9% |
Length
| Max length | 3 |
|---|---|
| Median length | 3 |
| Mean length | 3 |
| Min length | 3 |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 33.3% |
| 0 | 1025010 | 33.3% |
| 1 | 257063 | 8.4% |
| 4 | 256483 | 8.3% |
| 3 | 255986 | 8.3% |
| 2 | 255478 | 8.3% |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| Decimal Number | 2050020 | 66.7% |
| Other Punctuation | 1025010 | 33.3% |
Most frequent Decimal Number characters
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 1025010 | 50.0% |
| 1 | 257063 | 12.5% |
| 4 | 256483 | 12.5% |
| 3 | 255986 | 12.5% |
| 2 | 255478 | 12.5% |
Most frequent Other Punctuation characters
| Value | Count | Frequency (%) |
|---|---|---|
| . | 1025010 | 100.0% |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| Common | 3075030 | 100.0% |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| ASCII | 3075030 | 100.0% |
att_10
Real number (ℝ≥0)
| Distinct count | 13 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.98882840167413 |
|---|---|
| Minimum | 1.0 |
| Maximum | 13.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 7 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 13 |
| Range | 12 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 3.739935723 |
|---|---|
| Coefficient of variation (CV) | 0.5351305695 |
| Kurtosis | -1.214092216 |
| Mean | 6.988828402 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.003976262213 |
| Sum | 7163619 |
| Variance | 13.98711921 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 79480 | 7.8% |
| 2 | 79379 | 7.7% |
| 4 | 79200 | 7.7% |
| 9 | 79200 | 7.7% |
| 8 | 79040 | 7.7% |
| 10 | 78924 | 7.7% |
| 5 | 78787 | 7.7% |
| 6 | 78761 | 7.7% |
| 1 | 78707 | 7.7% |
| 7 | 78515 | 7.7% |
| 13 | 78479 | 7.7% |
| 12 | 78303 | 7.6% |
| 11 | 78235 | 7.6% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 78707 | 7.7% |
| 2 | 79379 | 7.7% |
| 3 | 79480 | 7.8% |
| 4 | 79200 | 7.7% |
| 5 | 78787 | 7.7% |
| 6 | 78761 | 7.7% |
| 7 | 78515 | 7.7% |
| 8 | 79040 | 7.7% |
| 9 | 79200 | 7.7% |
| 10 | 78924 | 7.7% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 78479 | 7.7% |
| 12 | 78303 | 7.6% |
| 11 | 78235 | 7.6% |
| 10 | 78924 | 7.7% |
| 9 | 79200 | 7.7% |
| 8 | 79040 | 7.7% |
| 7 | 78515 | 7.7% |
| 6 | 78761 | 7.7% |
| 5 | 78787 | 7.7% |
| 4 | 79200 | 7.7% |
target
Real number (ℝ≥0)
| Distinct count | 10 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.6170056877493878 |
|---|---|
| Minimum | 0.0 |
| Maximum | 9.0 |
| Zeros | 513702 |
| Zeros (%) | 50.1% |
| Memory size | 7.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 2 |
| Maximum | 9 |
| Range | 9 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 0.7737462783 |
|---|---|
| Coefficient of variation (CV) | 1.254034272 |
| Kurtosis | 7.738836538 |
| Mean | 0.6170056877 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 2.006835438 |
| Sum | 632437 |
| Variance | 0.5986833031 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 513702 | 50.1% |
| 1 | 433097 | 42.3% |
| 2 | 48828 | 4.8% |
| 3 | 21634 | 2.1% |
| 4 | 3978 | 0.4% |
| 5 | 2050 | 0.2% |
| 6 | 1460 | 0.1% |
| 7 | 236 | < 0.1% |
| 8 | 17 | < 0.1% |
| 9 | 8 | < 0.1% |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 513702 | 50.1% |
| 1 | 433097 | 42.3% |
| 2 | 48828 | 4.8% |
| 3 | 21634 | 2.1% |
| 4 | 3978 | 0.4% |
| 5 | 2050 | 0.2% |
| 6 | 1460 | 0.1% |
| 7 | 236 | < 0.1% |
| 8 | 17 | < 0.1% |
| 9 | 8 | < 0.1% |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 8 | < 0.1% |
| 8 | 17 | < 0.1% |
| 7 | 236 | < 0.1% |
| 6 | 1460 | 0.1% |
| 5 | 2050 | 0.2% |
| 4 | 3978 | 0.4% |
| 3 | 21634 | 2.1% |
| 2 | 48828 | 4.8% |
| 1 | 433097 | 42.3% |
| 0 | 513702 | 50.1% |
Correlations
Pearson's r
The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, where -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For an exact linear relationship Y = a + bX, the steepness of the line therefore does not affect r: it is +1 whenever b > 0 and -1 whenever b < 0. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
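The definition translates directly into a few lines of Python. This is a minimal sketch in plain Python and is not the report's own implementation. The 1/(n - 1) factors in the covariance and in the two standard deviations cancel, so raw sums of products are enough.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their std devs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations; the 1/(n - 1) factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))     # exact linear relation
print(pearson_r(xs, [-0.5 * x + 3 for x in xs]))  # negative slope
print(pearson_r(xs, [x ** 2 for x in xs]))        # monotonic but curved
```

Both linear examples return ±1 regardless of slope. The curved but monotonic relation Y = X² gives a value slightly below 1, which is the gap that rank-based coefficients close.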
Spearman's ρ
The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, where -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks.
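Following that definition, ρ is Pearson's r applied to ranks. The sketch below is minimal and assumes the common convention that tied values receive the average of the ranks they occupy. Ties matter here, because every column in this dataset has only 4, 10 or 13 distinct values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (1/(n-1) cancels)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("undefined for a constant variable")
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the (average) ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson_r(average_ranks(xs), average_ranks(ys))


print(average_ranks([10, 20, 20, 30]))                     # [1.0, 2.5, 2.5, 4.0]
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))  # monotonic: rho = 1
print(spearman_rho([1, 2, 3], [9, 4, 1]))                  # decreasing: rho = -1
```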
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, where -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. A pair is concordant if both variables order its two observations the same way, and discordant if they order them oppositely. τ is then the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
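The pair-counting definition above translates directly into a quadratic-time loop, shown here as a minimal sketch. Pairs tied in X or Y count as neither concordant nor discordant but still count in the n(n - 1)/2 denominator; this is the tau-a variant. Library implementations such as `scipy.stats.kendalltau` default to tau-b, which adjusts the denominator for ties. Results can therefore differ on heavily tied data like this. The loop is for illustration only and is far too slow for a million rows.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / (n(n - 1)/2), counting every pair once."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Same sign of the differences: concordant; opposite signs: discordant;
        # a tie in either variable gives 0 and counts as neither.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))        # -1.0
```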
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation of φk is available.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimator of Cramér's V is known to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
First rows
| | att_1 | att_2 | att_3 | att_4 | att_5 | att_6 | att_7 | att_8 | att_9 | att_10 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 10.0 | 1.0 | 11.0 | 1.0 | 13.0 | 1.0 | 12.0 | 1.0 | 1.0 | 9.0 |
| 1 | 2.0 | 11.0 | 2.0 | 13.0 | 2.0 | 10.0 | 2.0 | 12.0 | 2.0 | 1.0 | 9.0 |
| 2 | 3.0 | 12.0 | 3.0 | 11.0 | 3.0 | 13.0 | 3.0 | 10.0 | 3.0 | 1.0 | 9.0 |
| 3 | 4.0 | 10.0 | 4.0 | 11.0 | 4.0 | 1.0 | 4.0 | 13.0 | 4.0 | 12.0 | 9.0 |
| 4 | 4.0 | 1.0 | 4.0 | 13.0 | 4.0 | 12.0 | 4.0 | 11.0 | 4.0 | 10.0 | 9.0 |
| 5 | 1.0 | 2.0 | 1.0 | 4.0 | 1.0 | 5.0 | 1.0 | 3.0 | 1.0 | 6.0 | 8.0 |
| 6 | 1.0 | 9.0 | 1.0 | 12.0 | 1.0 | 10.0 | 1.0 | 11.0 | 1.0 | 13.0 | 8.0 |
| 7 | 2.0 | 1.0 | 2.0 | 2.0 | 2.0 | 3.0 | 2.0 | 4.0 | 2.0 | 5.0 | 8.0 |
| 8 | 3.0 | 5.0 | 3.0 | 6.0 | 3.0 | 9.0 | 3.0 | 7.0 | 3.0 | 8.0 | 8.0 |
| 9 | 4.0 | 1.0 | 4.0 | 4.0 | 4.0 | 2.0 | 4.0 | 3.0 | 4.0 | 5.0 | 8.0 |
Last rows
| | att_1 | att_2 | att_3 | att_4 | att_5 | att_6 | att_7 | att_8 | att_9 | att_10 | target |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1025000 | 2.0 | 12.0 | 4.0 | 3.0 | 1.0 | 3.0 | 3.0 | 5.0 | 3.0 | 2.0 | 1.0 |
| 1025001 | 1.0 | 4.0 | 4.0 | 8.0 | 4.0 | 5.0 | 3.0 | 9.0 | 2.0 | 1.0 | 0.0 |
| 1025002 | 1.0 | 9.0 | 3.0 | 6.0 | 2.0 | 8.0 | 3.0 | 5.0 | 2.0 | 9.0 | 1.0 |
| 1025003 | 1.0 | 12.0 | 3.0 | 9.0 | 3.0 | 6.0 | 1.0 | 3.0 | 1.0 | 9.0 | 1.0 |
| 1025004 | 3.0 | 7.0 | 1.0 | 6.0 | 4.0 | 12.0 | 2.0 | 1.0 | 1.0 | 4.0 | 0.0 |
| 1025005 | 3.0 | 1.0 | 1.0 | 12.0 | 2.0 | 9.0 | 4.0 | 9.0 | 2.0 | 6.0 | 1.0 |
| 1025006 | 3.0 | 3.0 | 4.0 | 5.0 | 2.0 | 7.0 | 1.0 | 4.0 | 4.0 | 3.0 | 1.0 |
| 1025007 | 1.0 | 11.0 | 4.0 | 7.0 | 3.0 | 9.0 | 1.0 | 13.0 | 2.0 | 7.0 | 1.0 |
| 1025008 | 3.0 | 11.0 | 1.0 | 8.0 | 1.0 | 1.0 | 3.0 | 13.0 | 2.0 | 8.0 | 1.0 |
| 1025009 | 2.0 | 5.0 | 2.0 | 9.0 | 4.0 | 9.0 | 2.0 | 3.0 | 3.0 | 3.0 | 2.0 |
Duplicate rows
Most frequent
| | att_1 | att_2 | att_3 | att_4 | att_5 | att_6 | att_7 | att_8 | att_9 | att_10 | target | count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 410 | 1.0 | 10.0 | 2.0 | 3.0 | 3.0 | 6.0 | 3.0 | 12.0 | 2.0 | 4.0 | 0.0 | 3 |
| 911 | 2.0 | 9.0 | 1.0 | 1.0 | 2.0 | 10.0 | 3.0 | 4.0 | 4.0 | 13.0 | 0.0 | 3 |
| 2027 | 4.0 | 8.0 | 3.0 | 8.0 | 2.0 | 2.0 | 4.0 | 1.0 | 2.0 | 7.0 | 1.0 | 3 |
| 0 | 1.0 | 1.0 | 1.0 | 3.0 | 1.0 | 6.0 | 2.0 | 7.0 | 4.0 | 12.0 | 0.0 | 2 |
| 1 | 1.0 | 1.0 | 1.0 | 3.0 | 3.0 | 4.0 | 4.0 | 2.0 | 3.0 | 10.0 | 0.0 | 2 |
| 2 | 1.0 | 1.0 | 1.0 | 10.0 | 2.0 | 11.0 | 2.0 | 10.0 | 3.0 | 2.0 | 1.0 | 2 |
| 3 | 1.0 | 1.0 | 1.0 | 12.0 | 3.0 | 12.0 | 4.0 | 13.0 | 1.0 | 11.0 | 1.0 | 2 |
| 4 | 1.0 | 1.0 | 2.0 | 1.0 | 1.0 | 8.0 | 4.0 | 2.0 | 2.0 | 8.0 | 2.0 | 2 |
| 5 | 1.0 | 1.0 | 2.0 | 1.0 | 3.0 | 3.0 | 3.0 | 5.0 | 4.0 | 9.0 | 1.0 | 2 |
| 6 | 1.0 | 1.0 | 2.0 | 2.0 | 4.0 | 7.0 | 3.0 | 11.0 | 2.0 | 8.0 | 0.0 | 2 |