corral

Dataset statistics

Number of variables	7
Number of observations	160
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	96
Duplicate rows (%)	60.0%
Total size in memory	8.9 KiB
Average record size in memory	56.8 B

Variable types

BOOL	7

Reproduction

Analysis started	2020-08-25 01:20:30.346916
Analysis finished	2020-08-25 01:20:30.968090
Duration	0.62 seconds
Version	pandas-profiling v2.8.0
Command line	`pandas_profiling --config_file config.yaml [YOUR_FILE.csv]`
Download configuration	config.yaml

Warnings

Dataset has 96 (60.0%) duplicate rows

Duplicates

A0
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	80
0	80

Frequency Table

Value	Count	Frequency (%)
1	80	50.0%
0	80	50.0%

A1
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	80
0	80

Frequency Table

Value	Count	Frequency (%)
1	80	50.0%
0	80	50.0%

B0
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	80
0	80

Frequency Table

Value	Count	Frequency (%)
1	80	50.0%
0	80	50.0%

B1
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	80
0	80

Frequency Table

Value	Count	Frequency (%)
1	80	50.0%
0	80	50.0%

Irrelevant
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

1	80
0	80

Frequency Table

Value	Count	Frequency (%)
1	80	50.0%
0	80	50.0%

Correlated
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

0	86
1	74

Frequency Table

Value	Count	Frequency (%)
0	86	53.8%
1	74	46.2%

target
Boolean

Distinct count	2
Unique (%)	1.2%
Missing	0
Missing (%)	0.0%
Memory size	1.4 KiB

0	90
1	70

Frequency Table

Value	Count	Frequency (%)
0	90	56.2%
1	70	43.8%

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the slope of the line (its angle to the x-axis) does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
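The calculation described above can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the code the report uses; the function name `pearson_r` and the sample lists are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population covariance and standard deviations; the 1/n factors cancel in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)


# Illustrative 0/1 samples (hypothetical, not taken from the report).
x = [0, 0, 1, 1, 0, 1]
y = [0, 1, 1, 1, 0, 0]
print(round(pearson_r(x, y), 3))
```

Rescaling or shifting either input, for example replacing each `a` in `x` by `2 * a + 3`, leaves the result unchanged, which is the invariance property mentioned above.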

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
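As a rough sketch of that recipe, the snippet below ranks each variable (tied values share the mean of their ranks) and then applies the Pearson formula to the ranks. The helper names `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical, and the tie-handling convention is an assumption; library implementations may differ in detail.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship (hypothetical data): y = x squared.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this sample ρ is 1 up to floating-point rounding, while r stays below 1, which illustrates why ρ is preferred for nonlinear monotonic relationships.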

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
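The pair-counting definition above corresponds to the variant usually called tau-a, and a minimal sketch of it follows. The function name `kendall_tau_a` is an assumption for this example. Library implementations often use a tie-adjusted variant (tau-b) instead, so on heavily tied 0/1 columns such as those in this dataset the values may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 is a tie in x or y; it counts as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical sample: two concordant pairs and one discordant pair.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

The quadratic loop over all pairs keeps the sketch close to the definition; it is fine for small inputs such as the 160 observations here.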

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

First rows

	A1	B0	B1	Irrelevant	Correlated	target
0	0	0	0	0	0	0
1	0	0	0	1	0	0
2	0	0	1	0	0	0
3	0	0	1	1	0	0
4	0	1	0	0	1	0
5	0	1	0	1	0	0
6	0	1	1	0	0	1
7	0	1	1	1	1	1
8	1	0	0	0	1	0
9	1	0	0	1	0	0

Last rows

	A0	A1	B0	B1	Irrelevant	Correlated	target
150	1	1	0	1	1	0	1
151	1	1	0	1	1	1	1
152	1	1	1	0	0	0	1
153	1	1	1	0	0	1	1
154	1	1	1	0	1	0	1
155	1	1	1	0	1	1	1
156	1	1	1	1	0	0	1
157	1	1	1	1	0	1	1
158	1	1	1	1	1	0	1
159	1	1	1	1	1	1	1

Most frequent

	A1	B0	B1	Irrelevant	Correlated	target	count
0	0	0	0	0	0	0	4
1	0	0	0	1	0	0	4
2	0	0	1	0	0	0	4
3	0	0	1	1	0	0	4
6	0	1	0	1	0	0	4
9	0	1	1	1	1	1	4
12	1	0	0	1	0	0	4
15	1	0	1	1	0	0	4
16	1	1	0	0	0	0	4
17	1	1	0	1	0	0	4
