Dataset statistics
Number of variables | 7 |
---|---|
Number of observations | 554 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 116 |
Duplicate rows (%) | 20.9% |
Total size in memory | 30.4 KiB |
Average record size in memory | 56.2 B |
Variable types
CAT | 4 |
---|---|
BOOL | 3 |
Reproduction
Analysis started | 2020-08-25 01:40:00.869449 |
---|---|
Analysis finished | 2020-08-25 01:40:01.566296 |
Duration | 0.7 seconds |
Version | pandas-profiling v2.8.0 |
Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Dataset has 116 (20.9%) duplicate rows | Duplicates |
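As a rough illustration of how a duplicate-row figure like the one above can be obtained, the sketch below counts every repeat of a row after its first occurrence. This counting convention is an assumption (it mirrors what pandas' `DataFrame.duplicated()` does with its default `keep="first"`), and the helper name `count_duplicate_rows` is hypothetical; the report itself does not state how the number was derived.

```python
from collections import Counter


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier, identical row.

    Assumption: a row seen k times contributes k - 1 duplicates.
    """
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())


# Toy rows (not taken from the dataset): the first row appears three times.
toy_rows = [(0, 0, 1), (0, 0, 1), (1, 0, 1), (0, 0, 1)]
n_duplicates = count_duplicate_rows(toy_rows)

# The percentage in the report follows from the two counts it lists.
share = round(100 * 116 / 554, 1)  # 20.9
```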
Head shape
Categorical
Distinct count | 3 |
---|---|
Unique (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
1 | 192 | 34.7% | |
2 | 184 | 33.2% | |
0 | 178 | 32.1% |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Most occurring characters
Value | Count | Frequency (%) | |
1 | 192 | 34.7% | |
2 | 184 | 33.2% | |
0 | 178 | 32.1% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 554 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
1 | 192 | 34.7% | |
2 | 184 | 33.2% | |
0 | 178 | 32.1% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 554 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
1 | 192 | 34.7% | |
2 | 184 | 33.2% | |
0 | 178 | 32.1% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 554 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
1 | 192 | 34.7% | |
2 | 184 | 33.2% | |
0 | 178 | 32.1% |
Body shape
Categorical
Distinct count | 3 |
---|---|
Unique (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
2 | 186 | 33.6% | |
0 | 185 | 33.4% | |
1 | 183 | 33.0% |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Most occurring characters
Value | Count | Frequency (%) | |
2 | 186 | 33.6% | |
0 | 185 | 33.4% | |
1 | 183 | 33.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 554 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
2 | 186 | 33.6% | |
0 | 185 | 33.4% | |
1 | 183 | 33.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 554 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
2 | 186 | 33.6% | |
0 | 185 | 33.4% | |
1 | 183 | 33.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 554 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
2 | 186 | 33.6% | |
0 | 185 | 33.4% | |
1 | 183 | 33.0% |
Is smiling
Boolean
Distinct count | 2 |
---|---|
Unique (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
1 | 281 | 50.7% | |
0 | 273 | 49.3% |
Holding
Categorical
Distinct count | 3 |
---|---|
Unique (%) | 0.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
1 | 188 | 33.9% | |
2 | 184 | 33.2% | |
0 | 182 | 32.9% |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Most occurring characters
Value | Count | Frequency (%) | |
1 | 188 | 33.9% | |
2 | 184 | 33.2% | |
0 | 182 | 32.9% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 554 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
1 | 188 | 33.9% | |
2 | 184 | 33.2% | |
0 | 182 | 32.9% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 554 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
1 | 188 | 33.9% | |
2 | 184 | 33.2% | |
0 | 182 | 32.9% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 554 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
1 | 188 | 33.9% | |
2 | 184 | 33.2% | |
0 | 182 | 32.9% |
Jacket color
Categorical
Distinct count | 4 |
---|---|
Unique (%) | 0.7% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
2 | 140 | 25.3% | |
3 | 139 | 25.1% | |
0 | 139 | 25.1% | |
1 | 136 | 24.5% |
Length
Max length | 1 |
---|---|
Median length | 1 |
Mean length | 1 |
Min length | 1 |
Most occurring characters
Value | Count | Frequency (%) | |
2 | 140 | 25.3% | |
3 | 139 | 25.1% | |
0 | 139 | 25.1% | |
1 | 136 | 24.5% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 554 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
2 | 140 | 25.3% | |
3 | 139 | 25.1% | |
0 | 139 | 25.1% | |
1 | 136 | 24.5% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 554 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
2 | 140 | 25.3% | |
3 | 139 | 25.1% | |
0 | 139 | 25.1% | |
1 | 136 | 24.5% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 554 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
2 | 140 | 25.3% | |
3 | 139 | 25.1% | |
0 | 139 | 25.1% | |
1 | 136 | 24.5% |
Has tie
Boolean
Distinct count | 2 |
---|---|
Unique (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
0 | 279 | 50.4% | |
1 | 275 | 49.6% |
target
Boolean
Distinct count | 2 |
---|---|
Unique (%) | 0.4% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 4.5 KiB |
Value | Count | Frequency (%) | |
1 | 288 | 52.0% | |
0 | 266 | 48.0% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
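The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code the report uses; the function name `pearson_r` and the toy values are assumptions made for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / sqrt(ss_x * ss_y)


# Toy values (not from the dataset): y is an exact linear function of x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
r = pearson_r(x, y)
```

Shifting or rescaling either input by a positive factor, for example `pearson_r([2 * a + 5 for a in x], y)`, leaves the result unchanged, which is the invariance the paragraph refers to.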
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
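A minimal sketch of this rank-based calculation follows, assuming tied values receive the average of the ranks they span (a common convention, not something the report states). The helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy values: y = x**2 is nonlinear but strictly increasing on these x.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```

Because only the ordering matters, the quadratic relationship in the toy data is treated as a perfect monotonic association.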
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
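The pair-counting formula above can be written directly as a short sketch. It follows the text literally (the variant often called tau-a) and applies no adjustment for ties, so it may differ from tie-corrected variants that statistical libraries commonly use by default; the function name `kendall_tau_a` is an assumption for this example.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both variables move in the same direction
        elif sign < 0:
            discordant += 1  # the variables move in opposite directions
        # sign == 0 means a tie in x or y; such pairs count toward neither.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy values: two concordant pairs and one discordant pair out of three.
tau = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

The double loop is quadratic in the number of observations, which is fine for a dataset of a few hundred rows but would be slow for very large ones.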
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure that has been proposed by Bergsma in 2013 that can be found here.
First rows
Index | Head shape | Body shape | Is smiling | Holding | Jacket color | Has tie | target |
---|---|---|---|---|---|---|---|
0 | 1 | 1 | 1 | 2 | 2 | 0 | 1 |
1 | 1 | 1 | 1 | 2 | 3 | 1 | 1 |
2 | 1 | 1 | 1 | 2 | 3 | 0 | 1 |
3 | 1 | 1 | 1 | 2 | 1 | 1 | 0 |
4 | 1 | 1 | 1 | 2 | 0 | 1 | 0 |
5 | 1 | 1 | 1 | 0 | 2 | 1 | 1 |
6 | 1 | 1 | 1 | 0 | 3 | 0 | 1 |
7 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
8 | 1 | 1 | 0 | 2 | 3 | 0 | 1 |
9 | 1 | 1 | 0 | 2 | 0 | 0 | 0 |
Last rows
Index | Head shape | Body shape | Is smiling | Holding | Jacket color | Has tie | target |
---|---|---|---|---|---|---|---|
544 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
545 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
546 | 0 | 0 | 0 | 1 | 2 | 1 | 0 |
547 | 0 | 0 | 0 | 1 | 2 | 0 | 0 |
548 | 0 | 0 | 0 | 1 | 3 | 1 | 0 |
549 | 0 | 0 | 0 | 1 | 3 | 0 | 0 |
550 | 0 | 0 | 0 | 1 | 1 | 1 | 0 |
551 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
552 | 0 | 0 | 0 | 1 | 0 | 1 | 0 |
553 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
Most frequent
Index | Head shape | Body shape | Is smiling | Holding | Jacket color | Has tie | target | count |
---|---|---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 2 |
1 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 2 |
2 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 2 |
3 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 2 |
4 | 0 | 0 | 0 | 1 | 2 | 1 | 0 | 2 |
5 | 0 | 0 | 0 | 2 | 2 | 0 | 0 | 2 |
6 | 0 | 0 | 0 | 2 | 2 | 1 | 0 | 2 |
7 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 |
8 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 2 |
9 | 0 | 0 | 1 | 1 | 2 | 1 | 0 | 2 |