Overview

Dataset statistics

Number of variables: 7
Number of observations: 160
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 96
Duplicate rows (%): 60.0%
Total size in memory: 8.9 KiB
Average record size in memory: 56.8 B

Variable types

BOOL: 7

Reproduction

Analysis started: 2020-08-25 01:20:30.346916
Analysis finished: 2020-08-25 01:20:30.968090
Duration: 0.62 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: the dataset has 96 (60.0%) duplicate rows.
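A minimal stdlib sketch of how a duplicate-row count and percentage like these can be derived. It assumes the same convention as pandas `DataFrame.duplicated()` with its default `keep="first"`: the first occurrence of a row is not counted, each later repeat is. The helper names are hypothetical. Under that convention, 96 duplicates among 160 rows would correspond to 64 distinct rows.

```python
def count_duplicate_rows(rows):
    """Hypothetical helper: count rows that repeat an earlier row.

    Assumes keep="first" semantics: first occurrences are not counted.
    """
    seen = set()
    duplicates = 0
    for row in rows:
        if row in seen:
            duplicates += 1
        else:
            seen.add(row)
    return duplicates


def duplicate_percentage(rows):
    """Hypothetical helper: duplicate rows as a percentage of all rows."""
    return 100 * count_duplicate_rows(rows) / len(rows)


# Toy data (not from the report): each row is a tuple of 0/1 values.
rows = [(0, 0, 1), (0, 1, 1), (0, 0, 1), (0, 0, 1)]
print(count_duplicate_rows(rows))   # 2
print(duplicate_percentage(rows))   # 50.0
```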

Variables

A0
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
1      80     50.0%
0      80     50.0%

A1
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
1      80     50.0%
0      80     50.0%

B0
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
1      80     50.0%
0      80     50.0%

B1
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
1      80     50.0%
0      80     50.0%

Irrelevant
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
1      80     50.0%
0      80     50.0%

Correlated
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
0      86     53.8%
1      74     46.2%

target
Boolean

Distinct count: 2
Unique (%): 1.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.4 KiB

Value  Count  Frequency (%)
0      90     56.2%
1      70     43.8%
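The per-variable figures follow from simple counting. The stdlib sketch below uses a hypothetical helper, `describe_boolean`, to reproduce the `target` block from its reported counts of 90 zeros and 70 ones; row order does not affect any of the counts, so the column is simply rebuilt as a sorted list. Python's one-decimal formatting rounds an exact tie such as 1.25 to even, which yields the 1.2% shown for Unique (%). This is an illustration of the arithmetic, not the report generator's own code.

```python
from collections import Counter


def describe_boolean(values):
    """Hypothetical helper: distinct count, Unique (%) and a value-frequency table."""
    n = len(values)
    counts = Counter(values)
    distinct = len(counts)
    return {
        "distinct": distinct,
        # Multiply before dividing so 200 / 160 is exactly 1.25.
        "unique_pct": f"{100 * distinct / n:.1f}%",
        # most_common() orders values by descending count.
        "frequencies": {
            value: (count, f"{100 * count / n:.1f}%")
            for value, count in counts.most_common()
        },
    }


# Rebuild the `target` column from its reported counts: 90 zeros, 70 ones.
target = [0] * 90 + [1] * 70
summary = describe_boolean(target)
print(summary["unique_pct"])   # 1.2%
print(summary["frequencies"])  # {0: (90, '56.2%'), 1: (70, '43.8%')}
```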

Correlations


Pearson's r

The Pearson correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables (a negative scale factor flips its sign). Consequently, for an exact linear relationship Y = aX + b with a ≠ 0, r equals +1 or -1 according to the sign of a, whatever the angle of the line to the x-axis.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
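The definition can be written out directly. This minimal sketch (the function name `pearson_r` is hypothetical) uses population moments; the 1/n factors cancel in the ratio, so the sample 1/(n - 1) versions give the same r. It raises ZeroDivisionError if either variable is constant. The example checks the linear-function property: a steep positive line and a shallow negative line give +1 and -1.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: covariance of x and y over the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


x = [1, 2, 3, 4, 5]
# Rounding hides floating-point noise in the last bits.
print(round(pearson_r(x, [3 * a + 7 for a in x]), 10))    # 1.0
print(round(pearson_r(x, [-0.5 * a + 2 for a in x]), 10))  # -1.0
```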

Spearman's ρ

The Spearman rank correlation coefficient (ρ) measures monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations. Tied values receive the average of the ranks they span.
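A stdlib sketch of the rank-based calculation, assuming the usual convention that tied values share the average of the ranks they span (this matters for binary columns, where almost every value is tied). The function names are hypothetical, and `pearson_r` is repeated so the block runs on its own. The example uses a monotonic but nonlinear relationship, which Spearman's ρ scores as 1 while Pearson's r stays below 1.

```python
import math


def pearson_r(x, y):
    """Hypothetical helper: covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def average_ranks(values):
    """Hypothetical helper: 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values equal to the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Pearson's r computed on the (average) ranks of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # monotonic but not linear
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
print(round(spearman_rho(x, y), 10))    # 1.0
print(pearson_r(x, y) < 1)              # True
```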

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

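To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ equals the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2. This is the τa form; the τb variant adjusts the denominator for ties, which are pervasive in binary data such as this dataset.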
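The pair-counting description can be sketched with the standard library. The formula with denominator n(n - 1)/2 is τa; the sketch also computes τb, which divides by √((n0 - n1)(n0 - n2)), where n0 = n(n - 1)/2 and n1, n2 count the pairs tied in X and in Y respectively. For comparison, `scipy.stats.kendalltau` computes τb by default. The function name is hypothetical and the O(n²) loop is for illustration only. The example uses two perfectly associated binary variables, where ties pull τa below 1 but τb reaches 1.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Hypothetical helper: return (tau_a, tau_b) from concordant/discordant pair counts."""
    n = len(x)
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx = x[i] - x[j]
        dy = y[i] - y[j]
        if dx == 0:
            tied_x += 1  # pair tied in x (possibly also in y)
        if dy == 0:
            tied_y += 1  # pair tied in y (possibly also in x)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n0 = n * (n - 1) // 2  # total number of pairs
    tau_a = (concordant - discordant) / n0
    tau_b = (concordant - discordant) / math.sqrt((n0 - tied_x) * (n0 - tied_y))
    return tau_a, tau_b


x = [0, 0, 1, 1]
y = [0, 0, 1, 1]
tau_a, tau_b = kendall_tau(x, y)
print(round(tau_a, 4), tau_b)  # 0.6667 1.0
```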

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figures: missing values (2 plots)]

Sample

First rows

    A0  A1  B0  B1  Irrelevant  Correlated  target
0   0   0   0   0   0           0           0
1   0   0   0   0   1           0           0
2   0   0   0   1   0           0           0
3   0   0   0   1   1           0           0
4   0   0   1   0   0           1           0
5   0   0   1   0   1           0           0
6   0   0   1   1   0           0           1
7   0   0   1   1   1           1           1
8   0   1   0   0   0           1           0
9   0   1   0   0   1           0           0

Last rows

    A0  A1  B0  B1  Irrelevant  Correlated  target
150 1   1   0   1   1           0           1
151 1   1   0   1   1           1           1
152 1   1   1   0   0           0           1
153 1   1   1   0   0           1           1
154 1   1   1   0   1           0           1
155 1   1   1   0   1           1           1
156 1   1   1   1   0           0           1
157 1   1   1   1   0           1           1
158 1   1   1   1   1           0           1
159 1   1   1   1   1           1           1

Duplicate rows

Most frequent

    A0  A1  B0  B1  Irrelevant  Correlated  target  count
0   0   0   0   0   0           0           0       4
1   0   0   0   0   1           0           0       4
2   0   0   0   1   0           0           0       4
3   0   0   0   1   1           0           0       4
6   0   0   1   0   1           0           0       4
9   0   0   1   1   1           1           1       4
12  0   1   0   0   1           0           0       4
15  0   1   0   1   1           0           0       4
16  0   1   1   0   0           0           0       4
17  0   1   1   0   1           0           0       4
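The "Most frequent" table groups identical rows and counts how often each occurs. Below is a stdlib approximation using `collections.Counter`; the helper name is hypothetical, and its default of ten rows only mirrors the ten rows listed above rather than any setting read from the report's configuration. `Counter.most_common` orders equal counts by first appearance, so ties keep their input order.

```python
from collections import Counter


def most_frequent_duplicates(rows, head=10):
    """Hypothetical helper: (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(rows)
    # Keep only rows that actually repeat, then truncate to `head` entries.
    repeated = [(row, count) for row, count in counts.most_common() if count > 1]
    return repeated[:head]


# Toy data (not from the report).
rows = [(0, 0), (1, 1), (0, 0), (1, 1), (1, 1), (0, 1)]
print(most_frequent_duplicates(rows))  # [((1, 1), 3), ((0, 0), 2)]
```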