monk3

Dataset statistics

Number of variables	7
Number of observations	554
Missing cells	0
Missing cells (%)	0.0%
Duplicate rows	116
Duplicate rows (%)	20.9%
Total size in memory	30.4 KiB
Average record size in memory	56.2 B

Variable types

CAT	4
BOOL	3

Reproduction

Analysis started	2020-08-25 01:40:00.869449
Analysis finished	2020-08-25 01:40:01.566296
Duration	0.7 seconds
Version	pandas-profiling v2.8.0
Command line	`pandas_profiling --config_file config.yaml [YOUR_FILE.csv]`

Warnings

Dataset has 116 (20.9%) duplicate rows

Duplicates
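The duplicate-row warning above can be reproduced by hand. The sketch below assumes that a "duplicate" is any row that exactly repeats an earlier row, so the first occurrence is not counted; that definition and the toy rows are assumptions made for illustration, not something this report states.

```python
from collections import Counter


def duplicate_summary(rows):
    """Return (n_duplicates, percent) for a list of rows.

    Assumption: a row is a duplicate if an identical row appeared earlier,
    so each group of k identical rows contributes k - 1 duplicates.
    """
    counts = Counter(tuple(row) for row in rows)
    n_duplicates = sum(count - 1 for count in counts.values())
    percent = 100.0 * n_duplicates / len(rows) if rows else 0.0
    return n_duplicates, percent


# Hypothetical toy rows, not taken from the monk3 data.
rows = [(1, 1, 1), (1, 1, 1), (0, 2, 1), (1, 1, 1), (0, 2, 0)]
n_dup, pct = duplicate_summary(rows)
print(f"Dataset has {n_dup} ({pct:.1f}%) duplicate rows")
```

The percentage in the warning is consistent with this arithmetic: 116 of 554 rows is 20.9% after rounding to one decimal.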

Head shape
Categorical

Distinct count	3
Unique (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Value	Count	Frequency (%)
1	192	34.7%
2	184	33.2%
0	178	32.1%
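As a check on the Frequency (%) column, the sketch below recomputes it from the counts shown for Head shape. The function name `frequency_table` is hypothetical, and rounding to one decimal is an assumption based on how the values are displayed.

```python
def frequency_table(counts):
    """Return (value, count, percent) tuples sorted by descending count.

    The percentage is the count divided by the total number of
    observations, rounded to one decimal as displayed in the report.
    """
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(value, count, round(100.0 * count / total, 1))
            for value, count in ordered]


# Counts copied from the Head shape table above.
head_shape = {"1": 192, "2": 184, "0": 178}
for value, count, pct in frequency_table(head_shape):
    print(f"{value}\t{count}\t{pct}%")
```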

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Overview of Unicode Properties

Unique unicode characters	3
Unique unicode categories	1
Unique unicode scripts	1
Unique unicode blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
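A minimal sketch of the category part of this analysis, using only Python's standard `unicodedata` module. The module reports general categories as two-letter codes ("Nd" is Decimal Number); it does not expose scripts or blocks, so those are left out here. The helper name `category_counts` and the sample values are hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count the Unicode general category of every character in the values."""
    return Counter(unicodedata.category(ch) for value in values for ch in value)


# Hypothetical sample of single-character category labels.
values = ["1", "2", "0", "1"]
print(category_counts(values))
```

For single-digit labels like the ones in this dataset, every character falls in the same category, which matches the single "Decimal Number" row in the tables that follow.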

Most occurring characters

Value	Count	Frequency (%)
1	192	34.7%
2	184	33.2%
0	178	32.1%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	554	100.0%

Most frequent Decimal Number characters

Value	Count	Frequency (%)
1	192	34.7%
2	184	33.2%
0	178	32.1%

Most occurring scripts

Value	Count	Frequency (%)
Common	554	100.0%

Most frequent Common characters

Value	Count	Frequency (%)
1	192	34.7%
2	184	33.2%
0	178	32.1%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	554	100.0%

Most frequent ASCII characters

Value	Count	Frequency (%)
1	192	34.7%
2	184	33.2%
0	178	32.1%

Body shape
Categorical

Distinct count	3
Unique (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Value	Count	Frequency (%)
2	186	33.6%
0	185	33.4%
1	183	33.0%

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Overview of Unicode Properties

Unique unicode characters	3
Unique unicode categories	1
Unique unicode scripts	1
Unique unicode blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value	Count	Frequency (%)
2	186	33.6%
0	185	33.4%
1	183	33.0%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	554	100.0%

Most frequent Decimal Number characters

Value	Count	Frequency (%)
2	186	33.6%
0	185	33.4%
1	183	33.0%

Most occurring scripts

Value	Count	Frequency (%)
Common	554	100.0%

Most frequent Common characters

Value	Count	Frequency (%)
2	186	33.6%
0	185	33.4%
1	183	33.0%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	554	100.0%

Most frequent ASCII characters

Value	Count	Frequency (%)
2	186	33.6%
0	185	33.4%
1	183	33.0%

Is smiling
Boolean

Distinct count	2
Unique (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Frequency Table

Value	Count	Frequency (%)
1	281	50.7%
0	273	49.3%

Holding
Categorical

Distinct count	3
Unique (%)	0.5%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Value	Count	Frequency (%)
1	188	33.9%
2	184	33.2%
0	182	32.9%

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Overview of Unicode Properties

Unique unicode characters	3
Unique unicode categories	1
Unique unicode scripts	1
Unique unicode blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value	Count	Frequency (%)
1	188	33.9%
2	184	33.2%
0	182	32.9%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	554	100.0%

Most frequent Decimal Number characters

Value	Count	Frequency (%)
1	188	33.9%
2	184	33.2%
0	182	32.9%

Most occurring scripts

Value	Count	Frequency (%)
Common	554	100.0%

Most frequent Common characters

Value	Count	Frequency (%)
1	188	33.9%
2	184	33.2%
0	182	32.9%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	554	100.0%

Most frequent ASCII characters

Value	Count	Frequency (%)
1	188	33.9%
2	184	33.2%
0	182	32.9%

Jacket color
Categorical

Distinct count	4
Unique (%)	0.7%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Value	Count	Frequency (%)
2	140	25.3%
3	139	25.1%
0	139	25.1%
1	136	24.5%

Length

Max length	1
Median length	1
Mean length	1
Min length	1

Overview of Unicode Properties

Unique unicode characters	4
Unique unicode categories	1
Unique unicode scripts	1
Unique unicode blocks	1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value	Count	Frequency (%)
2	140	25.3%
3	139	25.1%
0	139	25.1%
1	136	24.5%

Most occurring categories

Value	Count	Frequency (%)
Decimal Number	554	100.0%

Most frequent Decimal Number characters

Value	Count	Frequency (%)
2	140	25.3%
3	139	25.1%
0	139	25.1%
1	136	24.5%

Most occurring scripts

Value	Count	Frequency (%)
Common	554	100.0%

Most frequent Common characters

Value	Count	Frequency (%)
2	140	25.3%
3	139	25.1%
0	139	25.1%
1	136	24.5%

Most occurring blocks

Value	Count	Frequency (%)
ASCII	554	100.0%

Most frequent ASCII characters

Value	Count	Frequency (%)
2	140	25.3%
3	139	25.1%
0	139	25.1%
1	136	24.5%

Has tie
Boolean

Distinct count	2
Unique (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Frequency Table

Value	Count	Frequency (%)
0	279	50.4%
1	275	49.6%

target
Boolean

Distinct count	2
Unique (%)	0.4%
Missing	0
Missing (%)	0.0%
Memory size	4.5 KiB

Frequency Table

Value	Count	Frequency (%)
1	288	52.0%
0	266	48.0%

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
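The calculation described above can be sketched in plain Python. The sample data are hypothetical; the 1/n factors of the covariance and the standard deviations cancel, so the code works with plain sums.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their std devs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products and squares of deviations; the common 1/n factor cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (dev_x * dev_y)


# Hypothetical data: y is an exact increasing linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0 * v + 5.0 for v in x]
print(pearson_r(x, y))
```

Shifting or rescaling either variable by a positive factor leaves the result unchanged, which is the invariance mentioned above.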

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
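A minimal sketch of that recipe: rank both variables, then apply the Pearson formula to the ranks. Giving tied values the average of their ranks is an assumption (a common convention, not stated in this report), and the helper names and sample data are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as Pearson's r computed on the rank variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    dx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    dy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if dx == 0 or dy == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / (dx * dy)


# Hypothetical data: y = x**2 is monotonic but not linear on these points.
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]
print(spearman_rho(x, y))
```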

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
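The pair-counting definition above corresponds to the variant usually called tau-a, sketched below with hypothetical data. Tied pairs count as neither concordant nor discordant here. Statistical libraries often default to a tie-adjusted variant (tau-b), so on data with many ties, such as this dataset, library results may differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```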

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available for it.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
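A sketch of a bias-corrected Cramér's V computed from a contingency table. It assumes the commonly cited form of Bergsma's correction, in which φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and the table dimensions r and k are shrunk accordingly. The exact code used to produce this report is not shown here, and the tables below are hypothetical.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k table of counts.

    Assumes every row and every column has a non-zero total.
    """
    r = len(table)
    k = len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Hypothetical tables: perfect association and exact independence.
perfect = [[50, 0], [0, 50]]
independent = [[10, 10], [10, 10]]
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```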

Missing values (Count and Matrix plots)

Sample

First rows

	Head shape	Body shape	Is smiling	Holding	Jacket color	Has tie	target
0	1	1	1	2	2	0	1
1	1	1	1	2	3	1	1
2	1	1	1	2	3	0	1
3	1	1	1	2	1	1	0
4	1	1	1	2	0	1	0
5	1	1	1	0	2	1	1
6	1	1	1	0	3	0	1
7	1	1	1	0	0	0	0
8	1	1	0	2	3	0	1
9	1	1	0	2	0	0	0

Last rows

	Holding	Jacket color	Has tie
544	0	0	1
545	0	0	0
546	1	2	1
547	1	2	0
548	1	3	1
549	1	3	0
550	1	1	1
551	1	1	0
552	1	0	1
553	1	0	0

Duplicate rows

Most frequent

	Is smiling	Holding	Jacket color	Has tie	count
0	0	0	1	0	2
1	0	0	3	0	2
2	0	1	0	0	2
3	0	1	1	0	2
4	0	1	2	1	2
5	0	2	2	0	2
6	0	2	2	1	2
7	1	0	0	0	2
8	1	1	0	1	2
9	1	1	2	1	2
