Overview

Dataset statistics

Number of variables: 20
Number of observations: 3196
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 2206
Duplicate rows (%): 69.0%
Total size in memory: 499.5 KiB
Average record size in memory: 160.0 B
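
The derived figures above can be checked from the raw ones. The following is a minimal sketch, assuming the percentage is duplicate rows over observations and the average record size is total memory over observations; the variable names are illustrative and not part of the report.

```python
# Figures copied from the "Dataset statistics" block of the report.
n_observations = 3196
duplicate_rows = 2206
total_size_kib = 499.5

# Share of rows flagged as duplicates, as a percentage.
duplicate_pct = 100 * duplicate_rows / n_observations

# Average bytes per row: total size (KiB converted to bytes) over row count.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Under these assumptions both values agree with the report after rounding to one decimal place.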

Variable types

BOOL: 20

Reproduction

Analysis started: 2020-08-25 01:13:24.126088
Analysis finished: 2020-08-25 01:13:25.844905
Duration: 1.72 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: dataset has 2206 (69.0%) duplicate rows

Variables

A35
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2407   75.3%
1      789    24.7%

A13
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3181   99.5%
1      15     0.5%

A26
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3021   94.5%
1      175    5.5%

A30
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2631   82.3%
1      565    17.7%

A16
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3099   97.0%
1      97     3.0%

A31
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3021   94.5%
1      175    5.5%

A21
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2556   80.0%
1      640    20.0%

A12
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2205   69.0%
0      991    31.0%

A08
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      1980   62.0%
1      1216   38.0%

A17
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2196   68.7%
0      1000   31.3%

A09
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2225   69.6%
1      971    30.4%

A34
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      2345   73.4%
0      851    26.6%

A00
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2839   88.8%
1      357    11.2%

A04
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2129   66.6%
1      1067   33.4%

A29
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3060   95.7%
1      136    4.3%

A15
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      3040   95.1%
1      156    4.9%

A19
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2714   84.9%
1      482    15.1%

A05
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      1722   53.9%
1      1474   46.1%

A11
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
0      2860   89.5%
1      336    10.5%

target
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 25.1 KiB

Value  Count  Frequency (%)
1      1669   52.2%
0      1527   47.8%
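
The per-variable counts and frequencies can be reproduced with a few lines of standard-library Python. This is a minimal sketch, not the profiler's own code: `value_frequencies` is a hypothetical helper, the column is rebuilt synthetically from the counts reported for `target`, and "Unique (%)" is assumed to be the distinct count divided by the number of observations.

```python
from collections import Counter

def value_frequencies(values):
    """Return (value, count, percent) tuples, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [(v, c, 100 * c / total) for v, c in counts.most_common()]

# Synthetic column rebuilt from the counts reported for `target`.
target = [1] * 1669 + [0] * 1527

for value, count, pct in value_frequencies(target):
    print(f"{value}  {count}  {pct:.1f}%")

# Assumption: "Unique (%)" is distinct values over observations.
distinct = len(set(target))
unique_pct = 100 * distinct / len(target)
print(f"Distinct count: {distinct}, Unique (%): {unique_pct:.1f}%")
```

With 2 distinct values in 3196 observations the ratio is about 0.06%, which would round to the 0.1% shown for every variable.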
 

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the steepness of the line does not change r; only the direction of the slope determines its sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
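
The calculation just described can be sketched in plain Python. This is an illustrative implementation under the definition given above, not the code the profiler runs; `pearson_r` and the sample columns are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance over the product of standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    # The 1/n factors cancel, so the raw sums can be used directly.
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 0/1 columns, not taken from the report.
x = [0, 0, 1, 1, 0, 1, 0, 1]
y = [0, 1, 1, 1, 0, 1, 0, 0]
print(round(pearson_r(x, y), 3))

# Shifting and positively rescaling x leaves r unchanged.
print(round(pearson_r([10 * v + 3 for v in x], y), 3))
```

Both calls print the same value, which illustrates the invariance under location and scale changes.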

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
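
A minimal sketch of this rank-based calculation follows. It assumes tied values receive the average of their ranks, a common convention that the text above does not specify; `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) for constant input

# Hypothetical data: monotonic but nonlinear, so rho should be 1
# up to floating-point rounding.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y))
```

For 0/1 columns like those in this dataset, every value is tied with many others, so the tie-handling convention matters more than it does for continuous data.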

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
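
The pair-counting definition can be sketched directly. The version below is the plain form described in the text (often called tau-a), in which tied pairs count toward the total but are neither concordant nor discordant; libraries commonly report a tie-adjusted variant instead, so values may differ from those behind the report. `kendall_tau_a` and the sample data are hypothetical.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(xs)), 2))
    for i, j in pairs:
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1   # both variables move the same way
        elif direction < 0:
            discordant += 1   # the variables move in opposite ways
        # direction == 0 is a tie: it counts in the total only
    return (concordant - discordant) / len(pairs)

# Hypothetical data with one swapped pair: 5 concordant, 1 discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))

# Binary data produces many ties, which pull this form toward 0.
print(kendall_tau_a([0, 0, 1, 1], [0, 1, 0, 1]))
```

The quadratic loop over all pairs is fine for illustration; for 3196 observations it would visit about 5.1 million pairs.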

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

[Two missing-values figures (Matplotlib v3.3.1), not reproduced in this text export]

Sample

First rows

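index  A35  A13  A26  A30  A16  A31  A21  A12  A08  A17  A09  A34  A00  A04  A29  A15  A19  A05  A11  target
0      1    0    0    0    0    0    0    0    1    1    0    1    0    1    0    0    0    1    0    0
1      1    0    1    0    0    0    0    1    0    1    0    1    0    0    0    0    0    0    0    0
2      0    0    0    0    0    0    0    1    1    0    1    1    1    1    0    0    0    0    0    0
3      0    0    0    0    0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    1
4      1    0    0    0    0    0    0    1    0    1    0    1    0    0    0    0    0    0    0    0
5      0    0    0    0    0    0    0    0    0    1    0    1    0    1    0    0    0    1    0    1
6      0    0    0    1    0    0    0    1    0    0    1    1    0    0    0    0    1    0    0    0
7      0    0    0    0    0    0    0    1    1    1    0    1    0    1    0    0    0    1    0    0
8      1    0    0    0    0    0    0    1    0    0    0    1    0    0    0    0    0    1    0    1
9      0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    1    0    0    0    0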

Last rows

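index  A35  A13  A26  A30  A16  A31  A21  A12  A08  A17  A09  A34  A00  A04  A29  A15  A19  A05  A11  target
3186   0    0    0    0    0    0    0    1    0    0    0    1    0    1    0    0    0    1    0    1
3187   0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    0    1    1    0    1
3188   0    0    0    0    1    0    0    0    0    1    0    1    0    0    0    0    0    0    0    1
3189   0    0    0    0    0    1    0    0    1    1    0    0    0    0    0    0    0    0    1    0
3190   0    0    0    0    0    1    1    0    1    1    0    0    0    0    0    0    0    0    0    0
3191   0    0    0    0    0    0    0    1    0    0    0    1    0    1    0    0    0    1    0    1
3192   0    0    0    1    0    0    0    1    0    1    1    1    0    0    0    0    1    0    0    1
3193   0    0    0    0    0    0    0    1    0    0    1    1    0    1    0    0    0    0    0    1
3194   0    0    0    0    0    0    0    1    1    0    0    0    0    0    0    0    0    1    0    1
3195   0    0    0    0    0    0    0    1    1    1    1    1    0    0    0    0    0    0    0    0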

Duplicate rows

Most frequent

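index  A35  A13  A26  A30  A16  A31  A21  A12  A08  A17  A09  A34  A00  A04  A29  A15  A19  A05  A11  target  count
17     0    0    0    0    0    0    0    0    0    1    0    1    0    0    0    0    0    1    0    1       39
484    1    0    0    0    0    0    0    1    0    0    0    1    0    0    0    0    0    1    0    1       35
38     0    0    0    0    0    0    0    0    0    1    1    1    0    0    0    0    0    1    0    0       30
119    0    0    0    0    0    0    0    1    0    1    0    1    0    0    0    0    0    1    0    1       28
87     0    0    0    0    0    0    0    1    0    0    0    1    0    0    0    0    0    1    0    1       27
94     0    0    0    0    0    0    0    1    0    0    0    1    0    1    0    0    0    1    0    1       27
84     0    0    0    0    0    0    0    1    0    0    0    1    0    0    0    0    0    0    0    1       24
92     0    0    0    0    0    0    0    1    0    0    0    1    0    1    0    0    0    0    0    1       24
141    0    0    0    0    0    0    0    1    0    1    1    1    0    0    0    0    0    0    0    0       24
145    0    0    0    0    0    0    0    1    0    1    1    1    0    0    0    0    0    1    0    0       24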
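
A table of this kind can be built by grouping identical rows and counting them. The sketch below uses only the standard library; `duplicate_summary` is a hypothetical helper and the sample rows are made up. Whether the report's total of 2206 counts every member of a duplicated group or only the repeats after the first occurrence is not stated here, so the sketch returns the per-group counts and the number of repeats separately.

```python
from collections import Counter

def duplicate_summary(rows):
    """Group identical rows and count the repeats.

    Returns (groups, n_repeats): `groups` maps each row that occurs more
    than once to its count; `n_repeats` counts rows that repeat an
    earlier row.
    """
    counts = Counter(tuple(row) for row in rows)
    groups = {row: c for row, c in counts.items() if c > 1}
    n_repeats = sum(c - 1 for c in groups.values())
    return groups, n_repeats

# Hypothetical 3-column rows, not taken from the report.
rows = [
    (0, 1, 1),
    (1, 0, 1),
    (0, 1, 1),
    (0, 1, 1),
    (1, 1, 0),
    (1, 0, 1),
]
groups, n_repeats = duplicate_summary(rows)

# Most frequent duplicated rows first, as in the table above.
for row, count in sorted(groups.items(), key=lambda kv: -kv[1]):
    print(row, count)
print("rows repeating an earlier row:", n_repeats)
```

Rows are converted to tuples so they can serve as dictionary keys; the same idea applies to the 20-column rows of this dataset.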