Overview

Dataset statistics

Number of variables: 5
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 732
Duplicate rows (%): 73.2%
Total size in memory: 39.2 KiB
Average record size in memory: 40.1 B

Variable types

NUM: 5

Reproduction

Analysis started: 2020-08-24 23:43:58.061762
Analysis finished: 2020-08-24 23:44:03.007575
Duration: 4.95 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 732 (73.2%) duplicate rows [Duplicates]
in1 has 49 (4.9%) zeros [Zeros]
in2 has 19 (1.9%) zeros [Zeros]
in3 has 47 (4.7%) zeros [Zeros]
in4 has 87 (8.7%) zeros [Zeros]

Variables

in1
Real number (ℝ≥0)

ZEROS

Distinct count: 13
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.782
Minimum: 0.0
Maximum: 14.0
Zeros: 49
Zeros (%): 4.9%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 9
Q3: 12
95-th percentile: 14
Maximum: 14
Range: 14
Interquartile range (IQR): 9

Descriptive statistics

Standard deviation: 4.595388296
Coefficient of variation (CV): 0.5905150728
Kurtosis: -1.322574979
Mean: 7.782
Median Absolute Deviation (MAD): 4
Skewness: -0.3369267852
Sum: 7782
Variance: 21.11759359
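
Several of the in1 figures above are related by simple arithmetic: the range is maximum minus minimum, the IQR is Q3 minus Q1, the variance is the squared standard deviation, and the CV is the standard deviation divided by the mean. The sketch below recomputes them from the reported values only; the variable names are illustrative and are not part of the report.

```python
import math

# Values copied from the in1 statistics reported above.
minimum, maximum = 0, 14
q1, q3 = 3, 12
mean = 7.782
std = 4.595388296

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
variance = std ** 2               # Variance is the squared standard deviation
cv = std / mean                   # Coefficient of variation (CV)

# Compare against the reported figures, allowing for rounding in the report.
print(value_range == 14, iqr == 9)
print(math.isclose(variance, 21.11759359, rel_tol=1e-6))
print(math.isclose(cv, 0.5905150728, rel_tol=1e-6))
```

The same checks apply to the other variables by substituting their reported values.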
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
12     140    14.0%
10     119    11.9%
13     101    10.1%
2      93     9.3%
1      91     9.1%
9      74     7.4%
14     74     7.4%
8      69     6.9%
3      63     6.3%
7      62     6.2%
0      49     4.9%
5      45     4.5%
11     20     2.0%

Value  Count  Frequency (%)
0      49     4.9%
1      91     9.1%
2      93     9.3%
3      63     6.3%
5      45     4.5%
7      62     6.2%
8      69     6.9%
9      74     7.4%
10     119    11.9%
11     20     2.0%

Value  Count  Frequency (%)
14     74     7.4%
13     101    10.1%
12     140    14.0%
11     20     2.0%
10     119    11.9%
9      74     7.4%
8      69     6.9%
7      62     6.2%
5      45     4.5%
3      63     6.3%

in2
Real number (ℝ≥0)

ZEROS

Distinct count: 15
Unique (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.516
Minimum: 0.0
Maximum: 14.0
Zeros: 19
Zeros (%): 1.9%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 3
median: 6
Q3: 10
95-th percentile: 13
Maximum: 14
Range: 14
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 3.797389125
Coefficient of variation (CV): 0.5827791781
Kurtosis: -1.053124228
Mean: 6.516
Median Absolute Deviation (MAD): 3
Skewness: 0.276116374
Sum: 6516
Variance: 14.42016416
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
2      138    13.8%
3      133    13.3%
6      122    12.2%
10     103    10.3%
9      87     8.7%
13     67     6.7%
7      65     6.5%
4      63     6.3%
8      48     4.8%
5      42     4.2%
12     40     4.0%
14     27     2.7%
1      24     2.4%
11     22     2.2%
0      19     1.9%

Value  Count  Frequency (%)
0      19     1.9%
1      24     2.4%
2      138    13.8%
3      133    13.3%
4      63     6.3%
5      42     4.2%
6      122    12.2%
7      65     6.5%
8      48     4.8%
9      87     8.7%

Value  Count  Frequency (%)
14     27     2.7%
13     67     6.7%
12     40     4.0%
11     22     2.2%
10     103    10.3%
9      87     8.7%
8      48     4.8%
7      65     6.5%
6      122    12.2%
5      42     4.2%

in3
Real number (ℝ≥0)

ZEROS

Distinct count: 13
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.762
Minimum: 0.0
Maximum: 13.0
Zeros: 47
Zeros (%): 4.7%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 4
median: 6
Q3: 10
95-th percentile: 12
Maximum: 13
Range: 13
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.588625675
Coefficient of variation (CV): 0.530704773
Kurtosis: -0.8647297059
Mean: 6.762
Median Absolute Deviation (MAD): 2
Skewness: 0.05189933699
Sum: 6762
Variance: 12.87823423
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
6      137    13.7%
5      135    13.5%
12     132    13.2%
7      113    11.3%
10     109    10.9%
4      83     8.3%
2      79     7.9%
0      47     4.7%
13     43     4.3%
9      43     4.3%
8      35     3.5%
1      24     2.4%
3      20     2.0%

Value  Count  Frequency (%)
0      47     4.7%
1      24     2.4%
2      79     7.9%
3      20     2.0%
4      83     8.3%
5      135    13.5%
6      137    13.7%
7      113    11.3%
8      35     3.5%
9      43     4.3%

Value  Count  Frequency (%)
13     43     4.3%
12     132    13.2%
10     109    10.9%
9      43     4.3%
8      35     3.5%
7      113    11.3%
6      137    13.7%
5      135    13.5%
4      83     8.3%
3      20     2.0%

in4
Real number (ℝ≥0)

ZEROS

Distinct count: 14
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.373
Minimum: 0.0
Maximum: 14.0
Zeros: 87
Zeros (%): 8.7%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
median: 6
Q3: 10
95-th percentile: 14
Maximum: 14
Range: 14
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 4.560775337
Coefficient of variation (CV): 0.7156402537
Kurtosis: -1.151830226
Mean: 6.373
Median Absolute Deviation (MAD): 4
Skewness: 0.2734792701
Sum: 6373
Variance: 20.80067167
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
6      115    11.5%
2      114    11.4%
1      106    10.6%
14     105    10.5%
0      87     8.7%
7      76     7.6%
5      72     7.2%
4      70     7.0%
9      66     6.6%
13     50     5.0%
12     48     4.8%
10     42     4.2%
11     25     2.5%
8      24     2.4%

Value  Count  Frequency (%)
0      87     8.7%
1      106    10.6%
2      114    11.4%
4      70     7.0%
5      72     7.2%
6      115    11.5%
7      76     7.6%
8      24     2.4%
9      66     6.6%
10     42     4.2%

Value  Count  Frequency (%)
14     105    10.5%
13     50     5.0%
12     48     4.8%
11     25     2.5%
10     42     4.2%
9      66     6.6%
8      24     2.4%
7      76     7.6%
6      115    11.5%
5      72     7.2%

target
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.131
Minimum: 1.0
Maximum: 9.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 3
median: 4
Q3: 6
95-th percentile: 7
Maximum: 9
Range: 8
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.982869328
Coefficient of variation (CV): 0.4799974165
Kurtosis: -0.6234381692
Mean: 4.131
Median Absolute Deviation (MAD): 1
Skewness: 0.3042798877
Sum: 4131
Variance: 3.931770771
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
3      181    18.1%
4      172    17.2%
5      158    15.8%
2      142    14.2%
6      118    11.8%
1      92     9.2%
7      88     8.8%
8      31     3.1%
9      18     1.8%

Value  Count  Frequency (%)
1      92     9.2%
2      142    14.2%
3      181    18.1%
4      172    17.2%
5      158    15.8%
6      118    11.8%
7      88     8.8%
8      31     3.1%
9      18     1.8%

Value  Count  Frequency (%)
9      18     1.8%
8      31     3.1%
7      88     8.8%
6      118    11.8%
5      158    15.8%
4      172    17.2%
3      181    18.1%
2      142    14.2%
1      92     9.2%

Interactions

[Figures: 25 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
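
The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report itself runs; the function name `pearson_r` and the sample inputs are assumptions made for the example.

```python
import math

def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (n - 1 denominator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Undefined (division by zero) if either input is constant.
    return cov / (std_x * std_y)

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(x, y))                         # perfectly linear, increasing
print(pearson_r(x, [10 - 3 * v for v in x]))   # perfectly linear, decreasing
print(pearson_r(x, [5 + 100 * v for v in y]))  # y shifted and rescaled
```

The last call illustrates the invariance mentioned above: shifting and rescaling y by a positive factor leaves r unchanged up to floating-point rounding.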

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
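
As a rough sketch of that recipe, the code below ranks each variable and then applies the Pearson formula to the ranks. Giving tied values their average rank is an assumption of this example (the text does not say how ties are handled), and the helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance over the product of standard deviations (common factors cancel)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(xs, ys):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]     # monotonic but not linear
print(spearman_rho(x, y))   # ranks agree exactly
print(pearson_r(x, y))      # lower, because the relation is not linear
```

The cubic example shows the difference described above: the ranks of x and y are identical, so ρ reaches its maximum while r does not.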

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
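
A minimal sketch of this pair-counting definition follows. It implements the formula exactly as stated (often called τ-a) and treats pairs tied in either variable as neither concordant nor discordant, which is an assumption of this example; tie-adjusted variants such as τ-b exist and can give different values on data with many repeated values. The function name `kendall_tau` is illustrative.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # The sign of the product says whether x and y move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0: tied pair, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau([1, 2, 3], [1, 2, 3]))  # every pair concordant
print(kendall_tau([1, 2, 3], [3, 2, 1]))  # every pair discordant
print(kendall_tau([1, 2, 3], [1, 3, 2]))  # two concordant pairs, one discordant
```

The pairwise loop is quadratic in the number of observations, which is fine for illustration but slow for large datasets.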

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figures: 2 missing-values plots]

Sample

First rows

      in1   in2   in3   in4  target
0    13.0   6.0  12.0   2.0     4.0
1     2.0   7.0   9.0   2.0     2.0
2    12.0   8.0   7.0   6.0     7.0
3     7.0  10.0   8.0  13.0     6.0
4    13.0   6.0   5.0   0.0     4.0
5    10.0  13.0   5.0   0.0     7.0
6    12.0   9.0  10.0   6.0     5.0
7     1.0   3.0  10.0   7.0     3.0
8     8.0   5.0  10.0   2.0     3.0
9     7.0   3.0   7.0   0.0     2.0

Last rows

      in1   in2   in3   in4  target
990   3.0   6.0   2.0  12.0     1.0
991   8.0   5.0  10.0   2.0     2.0
992   9.0   8.0   5.0   2.0     8.0
993   9.0   3.0  10.0   5.0     2.0
994  10.0   7.0   1.0   6.0     3.0
995   2.0  14.0   2.0   9.0     1.0
996   5.0   7.0   3.0  12.0     3.0
997   1.0   2.0  12.0   4.0     3.0
998   1.0   2.0  12.0   6.0     2.0
999  10.0   3.0   6.0  14.0     2.0

Duplicate rows

Most frequent

      in1   in2   in3   in4  target  count
207  14.0  11.0  13.0  14.0     9.0     14
28    2.0   1.0   5.0   1.0     1.0     12
49    3.0   2.0   0.0  14.0     1.0     12
0     0.0   5.0   7.0   2.0     1.0     11
11    1.0   2.0  12.0   4.0     2.0     11
12    1.0   2.0  12.0   4.0     3.0     11
109   9.0   8.0   5.0   2.0     4.0     11
58    3.0   6.0   2.0  12.0     2.0     10
106   9.0   3.0  10.0   5.0     4.0     10
124  10.0   7.0   1.0   6.0     3.0     10