parity5+5

Dataset statistics

Number of variables	11
Number of observations	1124
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	100
Duplicate rows (%)	8.9%
Total size in memory	96.7 KiB
Average record size in memory	88.1 B
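The duplicate-row and record-size figures above follow from simple arithmetic. The sketch below is a standard-library approximation, not pandas-profiling's own code. It assumes each row is a tuple of 0/1 integers, and that a row counts as a duplicate when an identical row occurred earlier (the `keep="first"` convention). The helper names `dataset_statistics` and `average_record_size` are hypothetical. Because the reported 96.7 KiB is itself rounded, recomputing the average from it is approximate, but it agrees to one decimal place.

```python
# Sketch: recompute two "Dataset statistics" figures with the standard library.
# Assumption: each row is a tuple of 0/1 ints, and a row counts as a duplicate
# when an identical row occurred earlier (the keep="first" convention).


def dataset_statistics(rows):
    """Return observation count, duplicate-row count and duplicate percentage."""
    n = len(rows)
    duplicates = n - len(set(rows))  # every repeat after the first occurrence
    pct = round(100 * duplicates / n, 1) if n else 0.0
    return {"observations": n, "duplicate_rows": duplicates, "duplicate_pct": pct}


def average_record_size(total_kib, n_obs):
    """Average bytes per record, given a total size reported in KiB."""
    return round(total_kib * 1024 / n_obs, 1)


# Arithmetic check against the reported totals.
print(round(100 * 100 / 1124, 1))       # 8.9  (duplicate rows, %)
print(average_record_size(96.7, 1124))  # 88.1 (bytes per record)

# Toy input: three rows, one of which repeats an earlier row.
print(dataset_statistics([(1, 0, 1), (0, 0, 1), (1, 0, 1)]))
```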

Variable types

BOOL	11

Reproduction

Analysis started	2020-08-25 01:43:33.563220
Analysis finished	2020-08-25 01:43:34.442324
Duration	0.88 seconds
Version	pandas-profiling v2.8.0
Command line	`pandas_profiling --config_file config.yaml [YOUR_FILE.csv]`

Warnings

Dataset has 100 (8.9%) duplicate rows

Bit 1
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	574	51.1%
1	550	48.9%
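Each frequency table lists the value counts in descending order, with every count also expressed as a percentage of the 1124 observations. The sketch below rebuilds the Bit 1 table from its reported counts. It also computes `Unique (%)` as distinct values divided by observations, which is consistent with the reported 0.2% (2 / 1124). This is an assumption about how the figure is derived, not pandas-profiling's implementation, and the function names are hypothetical.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    n = len(values)
    counts = Counter(values)
    # most_common() orders by descending count; equal counts keep first-seen order.
    return [(v, c, round(100 * c / n, 1)) for v, c in counts.most_common()]


def unique_pct(values):
    """Distinct values as a percentage of all observations."""
    return round(100 * len(set(values)) / len(values), 1)


bit1 = [0] * 574 + [1] * 550  # rebuilt from the Bit 1 counts above
for value, count, pct in frequency_table(bit1):
    print(f"{value}\t{count}\t{pct}%")
print(unique_pct(bit1))
```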

Bit 2
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	572	50.9%
1	552	49.1%

Bit 3
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
1	564	50.2%
0	560	49.8%

Bit 4
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
1	563	50.1%
0	561	49.9%

Bit 5
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	568	50.5%
1	556	49.5%

Bit 6
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	566	50.4%
1	558	49.6%

Bit 7
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
1	562	50.0%
0	562	50.0%

Bit 8
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	566	50.4%
1	558	49.6%

Bit 9
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
1	566	50.4%
0	558	49.6%

Bit 10
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
0	577	51.3%
1	547	48.7%

target
Boolean

Distinct count	2
Unique (%)	0.2%
Missing	0
Missing (%)	0.0%
Memory size	8.9 KiB

Frequency Table

Value	Count	Frequency (%)
1	567	50.4%
0	557	49.6%

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the slope, i.e. the angle to the x-axis, does not affect r; only the sign of the slope matters.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
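The definition above translates directly into code. This is a minimal sketch using population (1/n) normalisation for both the covariance and the standard deviations, so the factor cancels in the ratio. It is not the library's implementation, and `pearson_r` is a hypothetical name. The usage lines illustrate the invariance property: any exact linear transform with a positive slope gives r = 1, and a negative slope gives r = -1.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations.

    Raises ZeroDivisionError if either input is constant (zero standard deviation).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Population (1/n) normalisation throughout; the factor cancels in the ratio.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [3 * a + 5 for a in x]))   # exact linear, positive slope
print(pearson_r(x, [-2 * a + 1 for a in x]))  # exact linear, negative slope
print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))  # partially linear
```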

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
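A minimal sketch of that recipe: replace each variable by its ranks, then apply Pearson's formula to the ranks. Tied values receive the average of the ranks they occupy, which matters for binary columns like the ones in this dataset, where every value is tied with roughly half the others. The function names are hypothetical. The usage lines show that a monotonic but nonlinear relationship (y = x³) gives ρ = 1 while Pearson's r stays below 1.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of standard deviations (1/n normalisation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
cubes = [a ** 3 for a in x]
print(average_ranks([10, 20, 20, 30]))
print(spearman_rho(x, cubes), pearson_r(x, cubes))
```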

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

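To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: a pair of observations is concordant if X and Y order the two observations the same way, and discordant if they order them oppositely. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.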
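The counting rule above is the τ-a variant, sketched below by checking every pair of observations. Pairs tied in X or Y are counted as neither concordant nor discordant, so with heavily tied data such as 0/1 columns τ-a is pulled toward 0. For comparison, pandas' `DataFrame.corr(method="kendall")` delegates to SciPy's `kendalltau`, which by default computes τ-b, a variant that adjusts the denominator for ties. The function name `kendall_tau_a` is hypothetical, and the O(n²) loop is chosen for clarity rather than speed.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / (n * (n - 1) / 2), counted over all pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1  # X and Y order the pair the same way
        elif s < 0:
            discordant += 1  # X and Y order the pair oppositely
        # s == 0: tied in X or Y, counted as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant, 6 pairs
print(kendall_tau_a([1, 2, 3], [3, 2, 1]))        # all pairs discordant
```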

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures nonlinear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

First rows

	Bit 1	Bit 2	Bit 3	Bit 4	Bit 5	Bit 6	Bit 7	Bit 8	Bit 9	Bit 10	target
0	1	1	0	0	1	0	0	0	0	1	1
1	1	0	0	0	0	1	1	1	1	0	0
2	0	0	1	0	1	1	1	1	1	1	1
3	0	1	0	1	0	0	0	1	0	0	1
4	0	0	0	0	0	0	0	1	1	0	1
5	0	0	1	1	0	0	1	1	1	0	1
6	0	0	0	0	0	1	1	0	0	0	1
7	1	1	1	1	1	0	1	0	1	0	1
8	0	0	0	1	1	1	1	1	1	1	1
9	0	0	1	0	1	0	1	1	1	0	0

Last rows

	Bit 1	Bit 2	Bit 3	Bit 4	Bit 5	Bit 6	Bit 7	Bit 8	Bit 9	Bit 10	target
1114	0	1	1	1	0	1	1	0	0	0	0
1115	1	1	1	1	1	0	0	1	0	1	0
1116	0	1	1	1	1	1	0	0	0	1	0
1117	1	1	1	0	1	0	0	0	0	0	0
1118	0	0	0	0	1	0	1	0	1	0	0
1119	0	0	1	0	1	1	0	1	1	0	1
1120	0	1	0	0	1	0	0	0	1	1	1
1121	0	1	0	0	1	1	0	0	1	0	0
1122	1	1	0	0	1	1	1	0	1	0	0
1123	0	1	0	0	1	0	1	1	1	0	0

Most frequent

	Bit 4	Bit 5	Bit 6	Bit 7	Bit 8	Bit 9	Bit 10	target	count
0	0	0	0	0	0	1	0	0	2
1	0	0	0	0	1	1	0	1	2
2	0	0	1	0	0	1	0	1	2
3	0	0	1	1	0	0	0	1	2
4	0	0	1	1	0	1	1	1	2
5	0	1	0	0	0	0	0	0	2
6	0	1	0	1	0	0	0	0	2
7	0	1	1	1	0	1	1	1	2
8	1	0	0	1	0	1	0	1	2
9	1	0	1	0	0	0	0	0	2
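A table like the one above can be derived by counting identical rows and keeping those that occur more than once. The sketch below assumes rows are tuples of 0/1 integers and orders results by descending count, breaking ties by row value. That tie-break is an assumption chosen because it appears consistent with the ascending row order shown above, not pandas-profiling's documented behaviour. The name `most_frequent_duplicates` is hypothetical.

```python
from collections import Counter


def most_frequent_duplicates(rows, top=10):
    """Rows occurring more than once with their counts, most frequent first."""
    counts = Counter(rows)
    dups = [(row, c) for row, c in counts.items() if c > 1]
    # Descending count; equal counts ordered by row value (assumed tie-break).
    dups.sort(key=lambda rc: (-rc[1], rc[0]))
    return dups[:top]


rows = [(0, 1), (1, 1), (0, 1), (1, 0), (1, 1), (0, 1)]
for row, count in most_frequent_duplicates(rows):
    print(row, count)
```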
