Overview

Dataset statistics

Number of variables: 11
Number of observations: 1124
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 100
Duplicate rows (%): 8.9%
Total size in memory: 96.7 KiB
Average record size in memory: 88.1 B
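The derived figures in this overview can be checked against the raw counts. The snippet below is a small sketch of that arithmetic, assuming that KiB means 1024 bytes and that percentages are rounded to one decimal place; the variable names are illustrative and are not taken from pandas-profiling.

```python
# Raw counts as reported in the overview.
n_rows = 1124
n_duplicate_rows = 100
total_size_kib = 96.7
distinct_per_column = 2  # each Boolean column has two distinct values

# Derived figures (assumption: 1 KiB = 1024 B, rounding to one decimal).
duplicate_pct = round(100 * n_duplicate_rows / n_rows, 1)
avg_record_bytes = round(total_size_kib * 1024 / n_rows, 1)
unique_pct = round(100 * distinct_per_column / n_rows, 1)

print(duplicate_pct)     # 8.9
print(avg_record_bytes)  # 88.1
print(unique_pct)        # 0.2
```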

Variable types

BOOL: 11

Reproduction

Analysis started: 2020-08-25 01:43:33.563220
Analysis finished: 2020-08-25 01:43:34.442324
Duration: 0.88 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 100 (8.9%) duplicate rows (Duplicates)

Variables

Bit 1
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      574    51.1%
1      550    48.9%

Bit 2
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      572    50.9%
1      552    49.1%

Bit 3
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
1      564    50.2%
0      560    49.8%

Bit 4
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
1      563    50.1%
0      561    49.9%

Bit 5
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      568    50.5%
1      556    49.5%

Bit 6
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      566    50.4%
1      558    49.6%

Bit 7
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
1      562    50.0%
0      562    50.0%

Bit 8
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      566    50.4%
1      558    49.6%

Bit 9
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
1      566    50.4%
0      558    49.6%

Bit 10
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
0      577    51.3%
1      547    48.7%

target
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.9 KiB

Value  Count  Frequency (%)
1      567    50.4%
0      557    49.6%

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
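The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code pandas-profiling runs; the function name `pearson_r` and the sample data are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)


# Illustrative 0/1 data (hypothetical, not taken from the profiled dataset).
x = [0, 0, 1, 1, 0, 1]
y = [0, 1, 1, 1, 0, 0]
print(pearson_r(x, y))

# Shifting and rescaling y by a positive factor leaves r unchanged.
print(pearson_r(x, [3 * v + 7 for v in y]))
```

For 0/1 columns such as the ones in this report, the same formula applies unchanged because the values are treated as ordinary numbers.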

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
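As a rough sketch of that recipe, the code below ranks each variable and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their positions is an assumption of this example (it is a common convention), and the helper names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson's formula applied to the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = sqrt(sum((b - my) ** 2 for b in ry) / n)
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship: rho is 1 up to rounding.
x = [1, 2, 3, 4, 5]
y = [v ** 2 for v in x]
print(spearman_rho(x, y))
```

With only two distinct values per column, as in this dataset, the ranks are a linear transformation of the values, so ρ computed this way coincides with Pearson's r.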

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
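The pair-counting definition above can be written out directly. The sketch below implements it exactly as worded, which corresponds to the variant usually called tau-a; statistical libraries often apply a tie correction (tau-b) instead, so values computed on heavily tied data such as 0/1 columns may differ from this sketch. The function name is an assumption of the example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Three observations: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```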

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation on φk is available.

Missing values


Sample

First rows

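      Bit 1  Bit 2  Bit 3  Bit 4  Bit 5  Bit 6  Bit 7  Bit 8  Bit 9  Bit 10  target
0     1      1      0      0      1      0      0      0      0      1       1
1     1      0      0      0      0      1      1      1      1      0       0
2     0      0      1      0      1      1      1      1      1      1       1
3     0      1      0      1      0      0      0      1      0      0       1
4     0      0      0      0      0      0      0      1      1      0       1
5     0      0      1      1      0      0      1      1      1      0       1
6     0      0      0      0      0      1      1      0      0      0       1
7     1      1      1      1      1      0      1      0      1      0       1
8     0      0      0      1      1      1      1      1      1      1       1
9     0      0      1      0      1      0      1      1      1      0       0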

Last rows

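      Bit 1  Bit 2  Bit 3  Bit 4  Bit 5  Bit 6  Bit 7  Bit 8  Bit 9  Bit 10  target
1114  0      1      1      1      0      1      1      0      0      0       0
1115  1      1      1      1      1      0      0      1      0      1       0
1116  0      1      1      1      1      1      0      0      0      1       0
1117  1      1      1      0      1      0      0      0      0      0       0
1118  0      0      0      0      1      0      1      0      1      0       0
1119  0      0      1      0      1      1      0      1      1      0       1
1120  0      1      0      0      1      0      0      0      1      1       1
1121  0      1      0      0      1      1      0      0      1      0       0
1122  1      1      0      0      1      1      1      0      1      0       0
1123  0      1      0      0      1      0      1      1      1      0       0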

Duplicate rows

Most frequent

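      Bit 1  Bit 2  Bit 3  Bit 4  Bit 5  Bit 6  Bit 7  Bit 8  Bit 9  Bit 10  target  count
0     0      0      0      0      0      0      0      0      1      0       0       2
1     0      0      0      0      0      0      0      1      1      0       1       2
2     0      0      0      0      0      1      0      0      1      0       1       2
3     0      0      0      0      0      1      1      0      0      0       1       2
4     0      0      0      0      0      1      1      0      1      1       1       2
5     0      0      0      0      1      0      0      0      0      0       0       2
6     0      0      0      0      1      0      1      0      0      0       0       2
7     0      0      0      0      1      1      1      0      1      1       1       2
8     0      0      0      1      0      0      1      0      1      0       1       2
9     0      0      0      1      0      1      0      0      0      0       0       2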
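
A table like the one above can be produced by counting how often each complete row occurs and keeping the rows that occur more than once. The sketch below shows one plausible way to do this with the standard library; the function name, the tie-breaking order, and the toy rows are assumptions of the example, and the exact counting convention used by pandas-profiling may differ.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicated = [(row, count) for row, count in counts.items() if count > 1]
    # Sort by descending count, then by row values for a stable order.
    duplicated.sort(key=lambda item: (-item[1], item[0]))
    return duplicated[:top]


# Toy rows (hypothetical, three columns instead of eleven).
rows = [
    (0, 0, 1),
    (1, 0, 1),
    (0, 0, 1),
    (1, 1, 0),
]
print(most_frequent_duplicates(rows))  # [((0, 0, 1), 2)]
```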