Dataset statistics
| Statistic | Value |
|---|---|
| Number of variables | 5 |
| Number of observations | 1000 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 732 |
| Duplicate rows (%) | 73.2% |
| Total size in memory | 39.2 KiB |
| Average record size in memory | 40.1 B |
Variable types
| Type | Count |
|---|---|
| NUM | 5 |
Reproduction
| Item | Value |
|---|---|
| Analysis started | 2020-08-24 23:43:58.061762 |
| Analysis finished | 2020-08-24 23:44:03.007575 |
| Duration | 4.95 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
| Download configuration | config.yaml |
in1
Real number (ℝ≥0)
| Statistic | Value |
|---|---|
| Distinct count | 13 |
| Unique (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 7.782 |
| Minimum | 0.0 |
| Maximum | 14.0 |
| Zeros | 49 |
| Zeros (%) | 4.9% |
| Memory size | 7.9 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 9 |
| Q3 | 12 |
| 95-th percentile | 14 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 9 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 4.595388296 |
| Coefficient of variation (CV) | 0.5905150728 |
| Kurtosis | -1.322574979 |
| Mean | 7.782 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | -0.3369267852 |
| Sum | 7782 |
| Variance | 21.11759359 |
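Several rows in the quantile and descriptive tables above are derived from other rows. As a quick consistency check, the sketch below recomputes them from the values reported for this variable. The variable names are illustrative only and do not come from the report.

```python
# Values copied from the tables above for this variable.
mean = 7.782
std = 4.595388296
minimum, q1, q3, maximum = 0, 3, 12, 14
n_observations = 1000

value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # IQR = Q3 - Q1
cv = std / mean                  # CV = standard deviation / mean
variance = std ** 2              # Variance = standard deviation squared
total = mean * n_observations    # Sum = mean * number of observations

print("range:", value_range)
print("IQR:", iqr)
print("CV:", round(cv, 6))
print("variance:", round(variance, 5))
print("sum:", round(total))
```

The recomputed figures should agree with the reported Range, IQR, CV, Variance and Sum rows up to rounding.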
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 12 | 140 | 14.0% | |
| 10 | 119 | 11.9% | |
| 13 | 101 | 10.1% | |
| 2 | 93 | 9.3% | |
| 1 | 91 | 9.1% | |
| 9 | 74 | 7.4% | |
| 14 | 74 | 7.4% | |
| 8 | 69 | 6.9% | |
| 3 | 63 | 6.3% | |
| 7 | 62 | 6.2% | |
| 0 | 49 | 4.9% | |
| 5 | 45 | 4.5% | |
| 11 | 20 | 2.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 49 | 4.9% | |
| 1 | 91 | 9.1% | |
| 2 | 93 | 9.3% | |
| 3 | 63 | 6.3% | |
| 5 | 45 | 4.5% | |
| 7 | 62 | 6.2% | |
| 8 | 69 | 6.9% | |
| 9 | 74 | 7.4% | |
| 10 | 119 | 11.9% | |
| 11 | 20 | 2.0% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 74 | 7.4% | |
| 13 | 101 | 10.1% | |
| 12 | 140 | 14.0% | |
| 11 | 20 | 2.0% | |
| 10 | 119 | 11.9% | |
| 9 | 74 | 7.4% | |
| 8 | 69 | 6.9% | |
| 7 | 62 | 6.2% | |
| 5 | 45 | 4.5% | |
| 3 | 63 | 6.3% |
in2
Real number (ℝ≥0)
| Statistic | Value |
|---|---|
| Distinct count | 15 |
| Unique (%) | 1.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.516 |
| Minimum | 0.0 |
| Maximum | 14.0 |
| Zeros | 19 |
| Zeros (%) | 1.9% |
| Memory size | 7.9 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 2 |
| Q1 | 3 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 13 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 3.797389125 |
| Coefficient of variation (CV) | 0.5827791781 |
| Kurtosis | -1.053124228 |
| Mean | 6.516 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 0.276116374 |
| Sum | 6516 |
| Variance | 14.42016416 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 138 | 13.8% | |
| 3 | 133 | 13.3% | |
| 6 | 122 | 12.2% | |
| 10 | 103 | 10.3% | |
| 9 | 87 | 8.7% | |
| 13 | 67 | 6.7% | |
| 7 | 65 | 6.5% | |
| 4 | 63 | 6.3% | |
| 8 | 48 | 4.8% | |
| 5 | 42 | 4.2% | |
| 12 | 40 | 4.0% | |
| 14 | 27 | 2.7% | |
| 1 | 24 | 2.4% | |
| 11 | 22 | 2.2% | |
| 0 | 19 | 1.9% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 19 | 1.9% | |
| 1 | 24 | 2.4% | |
| 2 | 138 | 13.8% | |
| 3 | 133 | 13.3% | |
| 4 | 63 | 6.3% | |
| 5 | 42 | 4.2% | |
| 6 | 122 | 12.2% | |
| 7 | 65 | 6.5% | |
| 8 | 48 | 4.8% | |
| 9 | 87 | 8.7% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 27 | 2.7% | |
| 13 | 67 | 6.7% | |
| 12 | 40 | 4.0% | |
| 11 | 22 | 2.2% | |
| 10 | 103 | 10.3% | |
| 9 | 87 | 8.7% | |
| 8 | 48 | 4.8% | |
| 7 | 65 | 6.5% | |
| 6 | 122 | 12.2% | |
| 5 | 42 | 4.2% |
in3
Real number (ℝ≥0)
| Statistic | Value |
|---|---|
| Distinct count | 13 |
| Unique (%) | 1.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.762 |
| Minimum | 0.0 |
| Maximum | 13.0 |
| Zeros | 47 |
| Zeros (%) | 4.7% |
| Memory size | 7.9 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 1 |
| Q1 | 4 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 12 |
| Maximum | 13 |
| Range | 13 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 3.588625675 |
| Coefficient of variation (CV) | 0.530704773 |
| Kurtosis | -0.8647297059 |
| Mean | 6.762 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.05189933699 |
| Sum | 6762 |
| Variance | 12.87823423 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 137 | 13.7% | |
| 5 | 135 | 13.5% | |
| 12 | 132 | 13.2% | |
| 7 | 113 | 11.3% | |
| 10 | 109 | 10.9% | |
| 4 | 83 | 8.3% | |
| 2 | 79 | 7.9% | |
| 0 | 47 | 4.7% | |
| 13 | 43 | 4.3% | |
| 9 | 43 | 4.3% | |
| 8 | 35 | 3.5% | |
| 1 | 24 | 2.4% | |
| 3 | 20 | 2.0% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 47 | 4.7% | |
| 1 | 24 | 2.4% | |
| 2 | 79 | 7.9% | |
| 3 | 20 | 2.0% | |
| 4 | 83 | 8.3% | |
| 5 | 135 | 13.5% | |
| 6 | 137 | 13.7% | |
| 7 | 113 | 11.3% | |
| 8 | 35 | 3.5% | |
| 9 | 43 | 4.3% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 13 | 43 | 4.3% | |
| 12 | 132 | 13.2% | |
| 10 | 109 | 10.9% | |
| 9 | 43 | 4.3% | |
| 8 | 35 | 3.5% | |
| 7 | 113 | 11.3% | |
| 6 | 137 | 13.7% | |
| 5 | 135 | 13.5% | |
| 4 | 83 | 8.3% | |
| 3 | 20 | 2.0% |
in4
Real number (ℝ≥0)
| Statistic | Value |
|---|---|
| Distinct count | 14 |
| Unique (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.373 |
| Minimum | 0.0 |
| Maximum | 14.0 |
| Zeros | 87 |
| Zeros (%) | 8.7% |
| Memory size | 7.9 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 0 |
| 5-th percentile | 0 |
| Q1 | 2 |
| median | 6 |
| Q3 | 10 |
| 95-th percentile | 14 |
| Maximum | 14 |
| Range | 14 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 4.560775337 |
|---|---|
| Coefficient of variation (CV) | 0.7156402537 |
| Kurtosis | -1.151830226 |
| Mean | 6.373 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 0.2734792701 |
| Sum | 6373 |
| Variance | 20.80067167 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 6 | 115 | 11.5% | |
| 2 | 114 | 11.4% | |
| 1 | 106 | 10.6% | |
| 14 | 105 | 10.5% | |
| 0 | 87 | 8.7% | |
| 7 | 76 | 7.6% | |
| 5 | 72 | 7.2% | |
| 4 | 70 | 7.0% | |
| 9 | 66 | 6.6% | |
| 13 | 50 | 5.0% | |
| 12 | 48 | 4.8% | |
| 10 | 42 | 4.2% | |
| 11 | 25 | 2.5% | |
| 8 | 24 | 2.4% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 87 | 8.7% | |
| 1 | 106 | 10.6% | |
| 2 | 114 | 11.4% | |
| 4 | 70 | 7.0% | |
| 5 | 72 | 7.2% | |
| 6 | 115 | 11.5% | |
| 7 | 76 | 7.6% | |
| 8 | 24 | 2.4% | |
| 9 | 66 | 6.6% | |
| 10 | 42 | 4.2% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 14 | 105 | 10.5% | |
| 13 | 50 | 5.0% | |
| 12 | 48 | 4.8% | |
| 11 | 25 | 2.5% | |
| 10 | 42 | 4.2% | |
| 9 | 66 | 6.6% | |
| 8 | 24 | 2.4% | |
| 7 | 76 | 7.6% | |
| 6 | 115 | 11.5% | |
| 5 | 72 | 7.2% |
target
Real number (ℝ≥0)
| Statistic | Value |
|---|---|
| Distinct count | 9 |
| Unique (%) | 0.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4.131 |
| Minimum | 1.0 |
| Maximum | 9.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 7.9 KiB |
Quantile statistics
| Statistic | Value |
|---|---|
| Minimum | 1 |
| 5-th percentile | 1 |
| Q1 | 3 |
| median | 4 |
| Q3 | 6 |
| 95-th percentile | 7 |
| Maximum | 9 |
| Range | 8 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Statistic | Value |
|---|---|
| Standard deviation | 1.982869328 |
| Coefficient of variation (CV) | 0.4799974165 |
| Kurtosis | -0.6234381692 |
| Mean | 4.131 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.3042798877 |
| Sum | 4131 |
| Variance | 3.931770771 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 3 | 181 | 18.1% | |
| 4 | 172 | 17.2% | |
| 5 | 158 | 15.8% | |
| 2 | 142 | 14.2% | |
| 6 | 118 | 11.8% | |
| 1 | 92 | 9.2% | |
| 7 | 88 | 8.8% | |
| 8 | 31 | 3.1% | |
| 9 | 18 | 1.8% |
Minimum values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 92 | 9.2% | |
| 2 | 142 | 14.2% | |
| 3 | 181 | 18.1% | |
| 4 | 172 | 17.2% | |
| 5 | 158 | 15.8% | |
| 6 | 118 | 11.8% | |
| 7 | 88 | 8.8% | |
| 8 | 31 | 3.1% | |
| 9 | 18 | 1.8% |
Maximum values
| Value | Count | Frequency (%) |
|---|---|---|
| 9 | 18 | 1.8% | |
| 8 | 31 | 3.1% | |
| 7 | 88 | 8.8% | |
| 6 | 118 | 11.8% | |
| 5 | 158 | 15.8% | |
| 4 | 172 | 17.2% | |
| 3 | 181 | 18.1% | |
| 2 | 142 | 14.2% | |
| 1 | 92 | 9.2% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
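The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report itself uses; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of (cross-)deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy data (illustrative, not taken from the report).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_original = pearson_r(x, y)
# Shifting and rescaling x by a positive factor leaves r unchanged.
r_rescaled = pearson_r([10 * a + 3 for a in x], y)
print(r_original, r_rescaled)
```

The two printed values should agree up to floating-point error, which illustrates the invariance under changes in location and scale.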
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
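As a rough sketch of that definition, the example below ranks each variable (tied values share the average of their positions, a common convention that is assumed here) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative, not part of any library.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy data ρ reaches its maximum while r does not, which is the sense in which ρ is better at catching nonlinear monotonic relationships.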
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
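The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that definition; the function name `kendall_tau_a` is an assumption for the example. Statistical libraries commonly report the tie-adjusted τ-b variant instead, so on data with many tied values, such as the columns in this report, their output may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # direction == 0 means a tie in x or y; it counts as neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (illustrative): two concordant pairs and one discordant pair.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
print(tau)
```

The double loop over all pairs is quadratic in the number of observations, which is fine for a sketch but slow for large datasets.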
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
First rows
| | in1 | in2 | in3 | in4 | target |
|---|---|---|---|---|---|
| 0 | 13.0 | 6.0 | 12.0 | 2.0 | 4.0 |
| 1 | 2.0 | 7.0 | 9.0 | 2.0 | 2.0 |
| 2 | 12.0 | 8.0 | 7.0 | 6.0 | 7.0 |
| 3 | 7.0 | 10.0 | 8.0 | 13.0 | 6.0 |
| 4 | 13.0 | 6.0 | 5.0 | 0.0 | 4.0 |
| 5 | 10.0 | 13.0 | 5.0 | 0.0 | 7.0 |
| 6 | 12.0 | 9.0 | 10.0 | 6.0 | 5.0 |
| 7 | 1.0 | 3.0 | 10.0 | 7.0 | 3.0 |
| 8 | 8.0 | 5.0 | 10.0 | 2.0 | 3.0 |
| 9 | 7.0 | 3.0 | 7.0 | 0.0 | 2.0 |
Last rows
| | in1 | in2 | in3 | in4 | target |
|---|---|---|---|---|---|
| 990 | 3.0 | 6.0 | 2.0 | 12.0 | 1.0 |
| 991 | 8.0 | 5.0 | 10.0 | 2.0 | 2.0 |
| 992 | 9.0 | 8.0 | 5.0 | 2.0 | 8.0 |
| 993 | 9.0 | 3.0 | 10.0 | 5.0 | 2.0 |
| 994 | 10.0 | 7.0 | 1.0 | 6.0 | 3.0 |
| 995 | 2.0 | 14.0 | 2.0 | 9.0 | 1.0 |
| 996 | 5.0 | 7.0 | 3.0 | 12.0 | 3.0 |
| 997 | 1.0 | 2.0 | 12.0 | 4.0 | 3.0 |
| 998 | 1.0 | 2.0 | 12.0 | 6.0 | 2.0 |
| 999 | 10.0 | 3.0 | 6.0 | 14.0 | 2.0 |
Most frequent
| | in1 | in2 | in3 | in4 | target | count |
|---|---|---|---|---|---|---|
| 207 | 14.0 | 11.0 | 13.0 | 14.0 | 9.0 | 14 |
| 28 | 2.0 | 1.0 | 5.0 | 1.0 | 1.0 | 12 |
| 49 | 3.0 | 2.0 | 0.0 | 14.0 | 1.0 | 12 |
| 0 | 0.0 | 5.0 | 7.0 | 2.0 | 1.0 | 11 |
| 11 | 1.0 | 2.0 | 12.0 | 4.0 | 2.0 | 11 |
| 12 | 1.0 | 2.0 | 12.0 | 4.0 | 3.0 | 11 |
| 109 | 9.0 | 8.0 | 5.0 | 2.0 | 4.0 | 11 |
| 58 | 3.0 | 6.0 | 2.0 | 12.0 | 2.0 | 10 |
| 106 | 9.0 | 3.0 | 10.0 | 5.0 | 4.0 | 10 |
| 124 | 10.0 | 7.0 | 1.0 | 6.0 | 3.0 | 10 |