Overview

Dataset statistics

Number of variables: 20
Number of observations: 3196
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2206
Duplicate rows (%): 69.0%
Total size in memory: 499.5 KiB
Average record size in memory: 160.0 B
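
The derived figures in these statistics can be checked with a little arithmetic. The sketch below only reuses the numbers reported above; the variable names are illustrative, and it assumes 1 KiB = 1024 bytes.

```python
# Figures copied from the dataset statistics in this report.
n_rows = 3196
n_duplicates = 2206
total_bytes = 499.5 * 1024  # 499.5 KiB, assuming 1 KiB = 1024 bytes

# Share of rows flagged as duplicates, and mean memory per row.
duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_bytes / n_rows

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")       # 69.0%
print(f"Average record size: {avg_record_bytes:.1f} B")  # 160.0 B
```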

Variable types

BOOL: 20

Reproduction

Analysis started: 2020-08-25 01:29:07.204185
Analysis finished: 2020-08-25 01:29:08.937291
Duration: 1.73 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 2206 (69.0%) duplicate rows (Duplicates)
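
A duplicate count like this one can be reproduced by grouping identical rows. The sketch below is an assumption about the counting rule, not the report's own code: it treats every repeat of an earlier row as a duplicate and does not count the first occurrence. The function names and the toy rows are hypothetical.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row (first occurrences excluded)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in counts.values())


def most_frequent_rows(rows, top=10):
    """Return (row, count) pairs for rows occurring more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(row, n) for row, n in counts.most_common() if n > 1]
    return repeated[:top]


# Hypothetical toy data: four Boolean rows, one pattern occurring three times.
rows = [(0, 1), (0, 1), (1, 1), (0, 1)]
n_dup = count_duplicate_rows(rows)
print(n_dup, f"({100 * n_dup / len(rows):.1f}%)")
print(most_frequent_rows(rows))
```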

Variables

c36
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2407   75.3%
1      789    24.7%
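
The value table for a Boolean variable is a plain frequency count. As a minimal sketch, the snippet below rebuilds the c36 table from the counts reported above; the helper name `value_frequencies` is hypothetical, and the list is synthesized from the reported counts rather than read from the original file.

```python
from collections import Counter


def value_frequencies(values):
    """Return (value, count, percentage) tuples, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, n, 100 * n / total) for value, n in counts.most_common()]


# Column rebuilt from the counts shown for c36 (2407 zeros, 789 ones).
c36 = [0] * 2407 + [1] * 789

for value, count, pct in value_frequencies(c36):
    print(f"{value}  {count}  {pct:.1f}%")
```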
 

c14
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3181   99.5%
1      15     0.5%

c27
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3021   94.5%
1      175    5.5%

c31
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2631   82.3%
1      565    17.7%

c17
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3099   97.0%
1      97     3.0%

c32
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3021   94.5%
1      175    5.5%

c22
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2556   80.0%
1      640    20.0%

c13
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2205   69.0%
0      991    31.0%

c9
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      1980   62.0%
1      1216   38.0%

c18
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2196   68.7%
0      1000   31.3%

c10
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2225   69.6%
1      971    30.4%

c35
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2345   73.4%
0      851    26.6%

c1
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2839   88.8%
1      357    11.2%

c5
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2129   66.6%
1      1067   33.4%

c30
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3060   95.7%
1      136    4.3%

c16
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3040   95.1%
1      156    4.9%

c20
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2714   84.9%
1      482    15.1%

c6
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      1722   53.9%
1      1474   46.1%

c12
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2860   89.5%
1      336    10.5%

target
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      1669   52.2%
0      1527   47.8%

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
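
The calculation just described can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# Hypothetical Boolean columns encoded as 0/1.
x = [0, 0, 1, 1, 0, 1]
y = [0, 1, 1, 1, 0, 0]
print(pearson_r(x, y))
# Shifting and rescaling y (positive scale) leaves r unchanged up to rounding.
print(pearson_r(x, [3 * v + 2 for v in y]))
```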

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
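
As a rough sketch of that recipe, the snippet below ranks both variables and then applies the Pearson formula to the ranks. It assumes tied values share the average of their ranks, which is a common convention but is not stated in this report; all function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is (numerically) 1, r is not.
x = [1, 2, 3, 4]
y = [1, 8, 27, 64]
print(spearman_rho(x, y), pearson_r(x, y))
```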

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
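
The pair-counting definition above can be written out directly. The sketch below implements exactly that formula (often called tau-a); it is an illustration with a hypothetical function name and sample values. Tied pairs count as neither concordant nor discordant here, so with heavily tied data such as Boolean columns the result cannot reach -1 or +1, and library implementations commonly apply a tie correction (tau-b) instead.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # direction == 0 means a tie in x or y: counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```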

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Two missing-values figures (Matplotlib v3.3.1) omitted]

Sample

First rows

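row   c36  c14  c27  c31  c17  c32  c22  c13  c9  c18  c10  c35  c1  c5  c30  c16  c20  c6  c12  target
0       0    0    0    0    0    0    0    1   0    1    0    1   0   0    0    0    0   0    0       1
1       0    0    0    0    0    0    0    1   0    1    0    1   0   1    0    0    0   0    0       1
2       0    0    0    0    0    0    0    1   0    1    0    1   0   1    0    0    0   0    0       1
3       0    0    0    0    0    0    1    1   1    1    0    1   0   0    0    0    0   0    0       1
4       0    0    0    0    0    0    0    1   0    1    0    1   0   0    0    0    0   0    0       1
5       0    0    0    0    0    0    0    1   0    1    0    1   0   0    1    0    0   0    0       1
6       0    0    0    0    0    0    1    1   1    1    0    1   0   0    0    0    0   0    0       1
7       0    0    0    0    0    0    0    1   0    1    0    1   0   1    0    0    0   0    0       1
8       0    0    0    1    0    0    0    1   0    1    0    1   0   0    0    0    0   0    0       1
9       0    0    0    0    0    0    0    1   0    1    0    1   0   0    0    0    0   0    0       1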

Last rows

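row   c36  c14  c27  c31  c17  c32  c22  c13  c9  c18  c10  c35  c1  c5  c30  c16  c20  c6  c12  target
3186    0    0    0    0    0    1    1    1   1    0    0    0   1   1    0    0    0   0    1       0
3187    0    0    0    0    0    1    0    1   1    1    0    0   0   0    0    0    0   0    1       0
3188    0    0    0    0    0    1    1    1   1    1    0    0   0   0    0    0    0   0    1       0
3189    0    0    0    0    0    1    0    1   1    0    0    0   0   0    0    0    0   0    1       0
3190    0    0    0    0    0    1    0    1   0    1    0    0   0   0    0    0    1   0    1       0
3191    0    0    0    0    0    1    0    1   0    1    0    0   1   0    0    0    0   0    1       0
3192    0    0    0    0    0    1    0    1   0    1    0    0   1   0    0    0    0   0    1       0
3193    0    0    0    0    0    1    0    1   0    1    0    0   1   0    0    0    0   0    1       0
3194    0    1    0    0    0    1    0    1   0    1    0    0   1   0    0    1    0   0    1       0
3195    0    1    0    0    0    1    0    1   0    1    0    0   1   0    0    1    0   0    1       0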

Duplicate rows

Most frequent

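row   c36  c14  c27  c31  c17  c32  c22  c13  c9  c18  c10  c35  c1  c5  c30  c16  c20  c6  c12  target  count
17      0    0    0    0    0    0    0    0   0    1    0    1   0   0    0    0    0   1    0       1     39
484     1    0    0    0    0    0    0    1   0    0    0    1   0   0    0    0    0   1    0       1     35
38      0    0    0    0    0    0    0    0   0    1    1    1   0   0    0    0    0   1    0       0     30
119     0    0    0    0    0    0    0    1   0    1    0    1   0   0    0    0    0   1    0       1     28
87      0    0    0    0    0    0    0    1   0    0    0    1   0   0    0    0    0   1    0       1     27
94      0    0    0    0    0    0    0    1   0    0    0    1   0   1    0    0    0   1    0       1     27
84      0    0    0    0    0    0    0    1   0    0    0    1   0   0    0    0    0   0    0       1     24
92      0    0    0    0    0    0    0    1   0    0    0    1   0   1    0    0    0   0    0       1     24
141     0    0    0    0    0    0    0    1   0    1    1    1   0   0    0    0    0   0    0       0     24
145     0    0    0    0    0    0    0    1   0    1    1    1   0   0    0    0    0   1    0       0     24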