Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 757
Duplicate rows (%): 75.7%
Total size in memory: 39.2 KiB
Average record size in memory: 40.1 B

Variable types

NUM: 5

Reproduction

Analysis started: 2020-08-24 23:43:49.027167
Analysis finished: 2020-08-24 23:43:54.270665
Duration: 5.24 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 757 (75.7%) duplicate rows [Duplicates]
In1 has 260 (26.0%) zeros [Zeros]
In2 has 184 (18.4%) zeros [Zeros]
In3 has 193 (19.3%) zeros [Zeros]
In4 has 201 (20.1%) zeros [Zeros]
target has 93 (9.3%) zeros [Zeros]

Variables

In1
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.722
Minimum: 0.0
Maximum: 4.0
Zeros: 260
Zeros (%): 26.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
median: 2
Q3: 3
95th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 3
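The quantile statistics can be recomputed from In1's value counts (the Value/Count table for this variable). This is a minimal sketch, assuming the report uses linear interpolation between order statistics (the default of pandas' `Series.quantile`), so the p-quantile sits at position p·(n - 1) of the sorted data; `expand` and `quantile` are hypothetical helper names.

```python
import math

# Value counts for In1, copied from this report's frequency table.
IN1_COUNTS = {0: 260, 1: 177, 2: 285, 3: 137, 4: 141}


def expand(counts):
    """Turn a {value: count} mapping into a sorted list of observations."""
    return [value for value in sorted(counts) for _ in range(counts[value])]


def quantile(sorted_values, p):
    """Assumed rule: linear interpolation at position p * (n - 1)."""
    pos = p * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


values = expand(IN1_COUNTS)
for label, p in [("5th percentile", 0.05), ("Q1", 0.25), ("median", 0.5),
                 ("Q3", 0.75), ("95th percentile", 0.95)]:
    print(label, quantile(values, p))
print("IQR", quantile(values, 0.75) - quantile(values, 0.25))
```

Because 26.0% of In1 is zero, the 25% position still falls inside the run of zeros, which is why Q1 is 0 here while it is 1 for In2, In3 and In4.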

Descriptive statistics

Standard deviation: 1.357408766
Coefficient of variation (CV): 0.7882745448
Kurtosis: -1.08148663
Mean: 1.722
Median Absolute Deviation (MAD): 1
Skewness: 0.226435896
Sum: 1722
Variance: 1.842558559
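The descriptive statistics can likewise be rebuilt from In1's value counts. A minimal sketch, assuming the report follows pandas' conventions: sample variance with an n - 1 denominator, the adjusted Fisher-Pearson skewness of `Series.skew`, the bias-corrected excess kurtosis of `Series.kurt`, and MAD taken as the median of absolute deviations from the median. `describe_counts` is a hypothetical helper, not part of pandas-profiling.

```python
import math
import statistics

# Value counts for In1, copied from this report's frequency table.
IN1_COUNTS = {0: 260, 1: 177, 2: 285, 3: 137, 4: 141}


def describe_counts(counts):
    """Descriptive statistics for a discrete variable given {value: count}."""
    n = sum(counts.values())
    total = sum(v * c for v, c in counts.items())
    mean = total / n
    # Count-weighted sums of powered deviations from the mean.
    s2 = sum(c * (v - mean) ** 2 for v, c in counts.items())
    s3 = sum(c * (v - mean) ** 3 for v, c in counts.items())
    s4 = sum(c * (v - mean) ** 4 for v, c in counts.items())
    variance = s2 / (n - 1)                  # sample variance (assumed convention)
    std = math.sqrt(variance)
    m2, m3, m4 = s2 / n, s3 / n, s4 / n      # population central moments
    g1 = m3 / m2 ** 1.5
    skewness = g1 * math.sqrt(n * (n - 1)) / (n - 2)                  # adjusted Fisher-Pearson
    g2 = m4 / m2 ** 2 - 3
    kurtosis = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))     # bias-corrected excess
    values = [v for v in sorted(counts) for _ in range(counts[v])]
    median = statistics.median(values)
    mad = statistics.median(abs(v - median) for v in values)
    return {
        "Sum": total,
        "Mean": mean,
        "Variance": variance,
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,
        "Skewness": skewness,
        "Kurtosis": kurtosis,
        "Median Absolute Deviation (MAD)": mad,
    }


for name, value in describe_counts(IN1_COUNTS).items():
    print(f"{name}: {value}")
```

The negative kurtosis reported for every variable fits this picture: with only five equally spaced values and no long tails, the distributions are flatter than a normal one.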
[Figure: histogram with fixed size bins (bins=10)]
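With only five distinct values, a 10-bin histogram leaves half of its bins empty. A minimal sketch of fixed-size binning, assuming equal-width bins spanning [minimum, maximum] with the last bin closed on the right, as `numpy.histogram` does; `fixed_bin_counts` is a hypothetical helper.

```python
# Value counts for In1, copied from this report's frequency table.
IN1_COUNTS = {0: 260, 1: 177, 2: 285, 3: 137, 4: 141}


def fixed_bin_counts(counts, bins=10):
    """Count observations per equal-width bin over [min, max].

    Assumes at least two distinct values (otherwise the width is zero).
    """
    lo, hi = min(counts), max(counts)
    result = [0] * bins
    for value, count in counts.items():
        # Scale into [0, bins]; the maximum lands on index `bins`,
        # so clamp it into the last, right-closed bin.
        index = min(int((value - lo) * bins / (hi - lo)), bins - 1)
        result[index] += count
    return result


print(fixed_bin_counts(IN1_COUNTS))  # [260, 0, 177, 0, 0, 285, 0, 137, 0, 141]
```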
Common values
Value  Count  Frequency (%)
2      285    28.5%
0      260    26.0%
1      177    17.7%
4      141    14.1%
3      137    13.7%

Minimum 5 values
Value  Count  Frequency (%)
0      260    26.0%
1      177    17.7%
2      285    28.5%
3      137    13.7%
4      141    14.1%

Maximum 5 values
Value  Count  Frequency (%)
4      141    14.1%
3      137    13.7%
2      285    28.5%
1      177    17.7%
0      260    26.0%

In2
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.985
Minimum: 0.0
Maximum: 4.0
Zeros: 184
Zeros (%): 18.4%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 2
Q3: 3
95th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.358904567
Coefficient of variation (CV): 0.6845866835
Kurtosis: -1.112135168
Mean: 1.985
Median Absolute Deviation (MAD): 1
Skewness: 0.05836389067
Sum: 1985
Variance: 1.846621622
[Figure: histogram with fixed size bins (bins=10)]
Common values
Value  Count  Frequency (%)
2      298    29.8%
4      197    19.7%
0      184    18.4%
1      181    18.1%
3      140    14.0%

Minimum 5 values
Value  Count  Frequency (%)
0      184    18.4%
1      181    18.1%
2      298    29.8%
3      140    14.0%
4      197    19.7%

Maximum 5 values
Value  Count  Frequency (%)
4      197    19.7%
3      140    14.0%
2      298    29.8%
1      181    18.1%
0      184    18.4%

In3
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.127
Minimum: 0.0
Maximum: 4.0
Zeros: 193
Zeros (%): 19.3%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 2
Q3: 3
95th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.41805567
Coefficient of variation (CV): 0.6666928395
Kurtosis: -1.236137813
Mean: 2.127
Median Absolute Deviation (MAD): 1
Skewness: -0.14717875
Sum: 2127
Variance: 2.010881882
[Figure: histogram with fixed size bins (bins=10)]
Common values
Value  Count  Frequency (%)
2      244    24.4%
4      230    23.0%
0      193    19.3%
3      193    19.3%
1      140    14.0%

Minimum 5 values
Value  Count  Frequency (%)
0      193    19.3%
1      140    14.0%
2      244    24.4%
3      193    19.3%
4      230    23.0%

Maximum 5 values
Value  Count  Frequency (%)
4      230    23.0%
3      193    19.3%
2      244    24.4%
1      140    14.0%
0      193    19.3%

In4
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.985
Minimum: 0.0
Maximum: 4.0
Zeros: 201
Zeros (%): 20.1%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 2
Q3: 3
95th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.377923682
Coefficient of variation (CV): 0.6941681018
Kurtosis: -1.210495104
Mean: 1.985
Median Absolute Deviation (MAD): 1
Skewness: -0.02593243586
Sum: 1985
Variance: 1.898673674
[Figure: histogram with fixed size bins (bins=10)]
Common values
Value  Count  Frequency (%)
2      240    24.0%
3      206    20.6%
0      201    20.1%
4      178    17.8%
1      175    17.5%

Minimum 5 values
Value  Count  Frequency (%)
0      201    20.1%
1      175    17.5%
2      240    24.0%
3      206    20.6%
4      178    17.8%

Maximum 5 values
Value  Count  Frequency (%)
4      178    17.8%
3      206    20.6%
2      240    24.0%
1      175    17.5%
0      201    20.1%

target
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 0.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.785
Minimum: 0.0
Maximum: 4.0
Zeros: 93
Zeros (%): 9.3%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
median: 2
Q3: 2
95th percentile: 3
Maximum: 4
Range: 4
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.9548228562
Coefficient of variation (CV): 0.5349147654
Kurtosis: -0.4276964129
Mean: 1.785
Median Absolute Deviation (MAD): 1
Skewness: -0.01568373571
Sum: 1785
Variance: 0.9116866867
[Figure: histogram with fixed size bins (bins=10)]
Common values
Value  Count  Frequency (%)
2      403    40.3%
1      280    28.0%
3      197    19.7%
0      93     9.3%
4      27     2.7%

Minimum 5 values
Value  Count  Frequency (%)
0      93     9.3%
1      280    28.0%
2      403    40.3%
3      197    19.7%
4      27     2.7%

Maximum 5 values
Value  Count  Frequency (%)
4      27     2.7%
3      197    19.7%
2      403    40.3%
1      280    28.0%
0      93     9.3%

Interactions

[Figures: 25 pairwise interaction plots covering In1, In2, In3, In4 and target]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor only flips its sign), so for an exact linear relationship the slope does not affect r: any nonzero slope gives r = +1 or -1.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
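That calculation translates directly into code. A minimal standard-library sketch; `pearson_r` is a hypothetical helper name, and the input is just the first ten rows of In1 and target from the Sample section, so its result says nothing about the full-dataset correlation.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # The 1/(n - 1) factors in the covariance and both variances cancel,
    # so plain sums of products are enough.
    return cov / math.sqrt(var_x * var_y)


# First ten rows of In1 and target from the "First rows" sample.
in1 = [4, 3, 2, 2, 2, 0, 1, 4, 3, 0]
target = [3, 3, 2, 2, 2, 0, 2, 3, 2, 2]
print(pearson_r(in1, target))
```

Rescaling or shifting either input, for example `pearson_r([10 * x + 5 for x in in1], target)`, leaves the result unchanged, which is the location and scale invariance described above.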

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations. Tied values receive the average of the ranks they would otherwise occupy.
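A minimal standard-library sketch of this rank-then-correlate procedure, assuming tied values share the average of the ranks they span (this matters here, since every variable has only five distinct values). `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical helper names.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))  # 1.0: monotonic but nonlinear
print(spearman_rho(xs, [5, 6, 7, 8, 7]))        # the two 7s share rank 3.5
```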

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

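To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: a pair is concordant if X and Y order its two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This form (τ_a) ignores ties; for heavily tied data such as this dataset, a tie-corrected variant (τ_b) is normally used instead.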
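The pair-counting definition fits in a short O(n²) sketch. Because every variable here has only five distinct values, the sketch also returns the tie-corrected τ_b, which divides by sqrt((n0 - n1)(n0 - n2)), where n0 is the number of pairs and n1, n2 count the pairs tied in X and in Y. `kendall_tau` is a hypothetical helper; the quadratic loop is for clarity, not speed.

```python
import math


def kendall_tau(xs, ys):
    """Return (tau_a, tau_b) by checking every pair of observations.

    tau_b is undefined (division by zero) if either variable is constant.
    """
    n = len(xs)
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = xs[j] - xs[i]
            dy = ys[j] - ys[i]
            if dx == 0:
                ties_x += 1          # pair tied in X (counts toward n1)
            if dy == 0:
                ties_y += 1          # pair tied in Y (counts toward n2)
            if dx * dy > 0:
                concordant += 1      # both variables order the pair the same way
            elif dx * dy < 0:
                discordant += 1      # the variables order the pair oppositely
    n0 = n * (n - 1) // 2            # total number of pairs
    tau_a = (concordant - discordant) / n0
    tau_b = (concordant - discordant) / math.sqrt((n0 - ties_x) * (n0 - ties_y))
    return tau_a, tau_b


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # no ties: tau_a equals tau_b
print(kendall_tau([1, 1, 2], [1, 2, 3]))        # one tie in X: tau_b > tau_a
```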

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. The phik package has extensive documentation.

Missing values

[Figures: two missing-value plots; the dataset has no missing cells]

Sample

First rows

     In1  In2  In3  In4  target
0    4.0  2.0  3.0  0.0  3.0
1    3.0  3.0  0.0  3.0  3.0
2    2.0  4.0  1.0  0.0  2.0
3    2.0  1.0  2.0  3.0  2.0
4    2.0  3.0  4.0  2.0  2.0
5    0.0  1.0  1.0  0.0  0.0
6    1.0  2.0  2.0  1.0  2.0
7    4.0  2.0  4.0  3.0  3.0
8    3.0  3.0  3.0  1.0  2.0
9    0.0  4.0  4.0  1.0  2.0

Last rows

     In1  In2  In3  In4  target
990  1.0  3.0  0.0  1.0  2.0
991  2.0  1.0  1.0  2.0  1.0
992  2.0  2.0  0.0  4.0  2.0
993  1.0  2.0  4.0  2.0  2.0
994  2.0  0.0  4.0  3.0  0.0
995  2.0  2.0  1.0  4.0  2.0
996  1.0  2.0  2.0  3.0  2.0
997  0.0  0.0  1.0  4.0  0.0
998  0.0  2.0  1.0  3.0  1.0
999  2.0  0.0  3.0  4.0  1.0

Duplicate rows

Most frequent

     In1  In2  In3  In4  target  count
96   2.0  2.0  3.0  3.0  2.0     21
158  4.0  1.0  3.0  1.0  2.0     19
92   2.0  2.0  0.0  4.0  2.0     16
3    0.0  0.0  3.0  0.0  0.0     15
15   0.0  2.0  0.0  0.0  1.0     14
109  2.0  4.0  0.0  3.0  3.0     14
17   0.0  2.0  1.0  2.0  1.0     13
19   0.0  2.0  1.0  3.0  1.0     12
40   0.0  4.0  4.0  2.0  3.0     11
94   2.0  2.0  2.0  3.0  2.0     11
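The duplicate-row figures can be sketched with `collections.Counter`. This assumes the total is counted the way pandas' `DataFrame.duplicated()` counts by default (each repeat of an earlier row counts once, the first occurrence does not) and that the table lists the most frequent repeated rows with their occurrence counts. `duplicate_summary` is a hypothetical helper, and the toy input reuses rows from the Last rows sample, with two of them repeated by hand.

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """Return (number of duplicate rows, most frequent repeated rows with counts)."""
    counts = Counter(tuple(row) for row in rows)
    # Every occurrence after the first of a given row counts as a duplicate.
    n_duplicates = sum(c - 1 for c in counts.values())
    repeated = [(row, c) for row, c in counts.most_common() if c > 1]
    return n_duplicates, repeated[:top]


rows = [
    (2.0, 2.0, 0.0, 4.0, 2.0),  # row 992 in the sample
    (1.0, 2.0, 2.0, 3.0, 2.0),  # row 996
    (2.0, 2.0, 0.0, 4.0, 2.0),  # toy repeat of row 992
    (0.0, 0.0, 1.0, 4.0, 0.0),  # row 997
    (2.0, 2.0, 0.0, 4.0, 2.0),  # toy repeat of row 992
    (1.0, 2.0, 2.0, 3.0, 2.0),  # toy repeat of row 996
]
n_dup, top_rows = duplicate_summary(rows)
print(n_dup, f"({n_dup / len(rows):.1%})")
for row, count in top_rows:
    print(row, count)
```

If that counting assumption holds, the 757 duplicates reported for 1000 rows correspond to 243 distinct rows.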