Overview

Dataset statistics

Number of variables: 20
Number of observations: 1728
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 270.1 KiB
Average record size in memory: 160.1 B
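
The two memory figures are consistent with each other: dividing the total size by the number of observations gives the average record size. A minimal check in Python, assuming the report uses binary units (1 KiB = 1024 bytes); the variable names are illustrative only:

```python
# Figures copied from the dataset statistics in this report.
total_size_kib = 270.1
n_observations = 1728

# Assumption: binary units, so 1 KiB = 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations

print(f"{avg_record_bytes:.1f} B")
```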

Variable types

BOOL: 19
CAT: 1

Reproduction

Analysis started: 2020-08-25 01:12:52.045766
Analysis finished: 2020-08-25 01:12:53.716880
Duration: 1.67 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Variables

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

doors_2
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

safety_low
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

persons_4
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

doors_3
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

persons_2
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

doors_4
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1296   75.0%
1      432    25.0%

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1152   66.7%
1      576    33.3%

target
Categorical

Distinct count: 4
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 13.6 KiB

Value  Count  Frequency (%)
0      1210   70.0%
1      384    22.2%
3      69     4.0%
2      65     3.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
0      1210   70.0%
1      384    22.2%
3      69     4.0%
2      65     3.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1728   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      1210   70.0%
1      384    22.2%
3      69     4.0%
2      65     3.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  1728   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
0      1210   70.0%
1      384    22.2%
3      69     4.0%
2      65     3.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1728   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
0      1210   70.0%
1      384    22.2%
3      69     4.0%
2      65     3.8%

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
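
The calculation described above can be sketched in plain Python. This is an illustrative sketch, not the code pandas-profiling itself runs; the function name `pearson_r` is hypothetical. The 1/n factors in the covariance and in the standard deviations cancel, so sums are used directly.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-deviations and squared deviations; the common 1/n cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))   # perfectly increasing linear relationship
print(pearson_r(x, [8, 6, 4, 2]))   # perfectly decreasing linear relationship
```

Shifting or rescaling either input by a positive factor leaves the result unchanged, which is the location and scale invariance mentioned above.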

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
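
As a rough sketch of that recipe in Python: rank each variable, then apply the Pearson formula to the ranks. The helper names `average_ranks` and `spearman_rho` are hypothetical, and giving tied values the average of their ranks is an assumption made here for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed tie rule)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# A nonlinear but strictly increasing relationship (y = x cubed).
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```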

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
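
The pair-counting definition above can be sketched directly in Python. This corresponds to the variant usually called tau-a; the function name `kendall_tau_a` is hypothetical. Tied pairs are counted as neither concordant nor discordant here, which is an assumption of this sketch: statistical libraries often default to a tie-adjusted variant (tau-b), so values for heavily tied data such as the binary columns in this report may differ.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0: a tie in x or y, counted as neither (assumption)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Six pairs in total; only the pair (2, 3) vs (3, 2) is discordant.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```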

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values


Sample

First rows

index,buying_price_vhigh,luggage_boot_size_big,luggage_boot_size_small,buying_price_high,doors_2,maintenance_price_high,doors_5more,buying_price_low,safety_low,luggage_boot_size_med,persons_4,buying_price_med,doors_3,safety_high,maintenance_price_vhigh,persons_2,maintenance_price_low,doors_4,persons_more,target
0,1,0,1,0,1,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0
1,1,0,1,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0
2,1,0,1,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0
3,1,0,0,0,1,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0
4,1,0,0,0,1,0,0,0,0,1,0,0,0,0,1,1,0,0,0,0
5,1,0,0,0,1,0,0,0,0,1,0,0,0,1,1,1,0,0,0,0
6,1,1,0,0,1,0,0,0,1,0,0,0,0,0,1,1,0,0,0,0
7,1,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0
8,1,1,0,0,1,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0
9,1,0,1,0,1,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0

Last rows

index,buying_price_vhigh,luggage_boot_size_big,luggage_boot_size_small,buying_price_high,doors_2,maintenance_price_high,doors_5more,buying_price_low,safety_low,luggage_boot_size_med,persons_4,buying_price_med,doors_3,safety_high,maintenance_price_vhigh,persons_2,maintenance_price_low,doors_4,persons_more,target
1718,0,1,0,0,0,0,1,1,0,0,1,0,0,1,0,0,1,0,0,2
1719,0,0,1,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,1,0
1720,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,1,1
1721,0,0,1,0,0,0,1,1,0,0,0,0,0,1,0,0,1,0,1,3
1722,0,0,0,0,0,0,1,1,1,1,0,0,0,0,0,0,1,0,1,0
1723,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,1,0,1,3
1724,0,0,0,0,0,0,1,1,0,1,0,0,0,1,0,0,1,0,1,2
1725,0,1,0,0,0,0,1,1,1,0,0,0,0,0,0,0,1,0,1,0
1726,0,1,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,1,3
1727,0,1,0,0,0,0,1,1,0,0,0,0,0,1,0,0,1,0,1,2
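
The column names suggest that each original attribute was expanded into 0/1 indicator columns named attribute_level. Under that assumption, a sample row can be folded back into one level per attribute. The sketch below is illustrative only: the `ATTRIBUTES` grouping is inferred from the column-name prefixes, `decode_row` is a hypothetical helper, and an attribute with no active indicator is reported as `None` because its level has no column in this dataset.

```python
# Column order as shown in the sample tables of this report.
COLUMNS = [
    "buying_price_vhigh", "luggage_boot_size_big", "luggage_boot_size_small",
    "buying_price_high", "doors_2", "maintenance_price_high", "doors_5more",
    "buying_price_low", "safety_low", "luggage_boot_size_med", "persons_4",
    "buying_price_med", "doors_3", "safety_high", "maintenance_price_vhigh",
    "persons_2", "maintenance_price_low", "doors_4", "persons_more", "target",
]

# Assumption: attribute names inferred from the column-name prefixes.
ATTRIBUTES = [
    "buying_price", "maintenance_price", "doors",
    "persons", "luggage_boot_size", "safety",
]


def decode_row(values):
    """Map 20 indicator values back to one level per attribute plus the target."""
    decoded = {attr: None for attr in ATTRIBUTES}
    for name, value in zip(COLUMNS, values):
        if name == "target":
            decoded["target"] = value
            continue
        for attr in ATTRIBUTES:
            prefix = attr + "_"
            if name.startswith(prefix) and value == 1:
                decoded[attr] = name[len(prefix):]
    return decoded


# Row with index 0 from "First rows".
row0 = [1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
print(decode_row(row0))
```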