Overview

Dataset statistics

Number of variables: 5
Number of observations: 50
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 2.1 KiB
Average record size in memory: 42.6 B

Variable types

NUM: 4
BOOL: 1

Reproduction

Analysis started: 2020-08-25 01:04:41.342190
Analysis finished: 2020-08-25 01:04:43.850281
Duration: 2.51 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

AIDS has unique values (Unique)
Total has unique values (Unique)
Age has 10 (20.0%) zeros (Zeros)
Race has 10 (20.0%) zeros (Zeros)
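
The two kinds of warning listed above can be illustrated with a short Python sketch. `column_warnings` is a hypothetical helper, not part of pandas-profiling, and it assumes that any zero at all triggers a Zeros warning; the rules the tool actually applies are not shown in this report.

```python
def column_warnings(name, values):
    """Return report-style warning strings for one column (hypothetical helper)."""
    warnings = []
    n = len(values)
    # Unique: every observation holds a distinct value.
    if n > 0 and len(set(values)) == n:
        warnings.append(f"{name} has unique values")
    # Zeros: count exact zeros and report their share of the column.
    # Assumption: a single zero is enough to raise the warning.
    zeros = sum(1 for v in values if v == 0)
    if zeros > 0:
        warnings.append(f"{name} has {zeros} ({zeros / n:.1%}) zeros")
    return warnings


# Age as described by its frequency table: the values 0..4, ten times each.
age = [value for value in range(5) for _ in range(10)]
print(column_warnings("Age", age))
```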

Variables

Age
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0
Minimum: 0
Maximum: 4
Zeros: 10
Zeros (%): 20.0%
Memory size: 528.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 3
95-th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.428571429
Coefficient of variation (CV): 0.7142857143
Kurtosis: -1.309707447
Mean: 2
Median Absolute Deviation (MAD): 1
Skewness: 0
Sum: 100
Variance: 2.040816327
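
Most of the Age figures can be recomputed from its frequency table (the values 0 to 4, ten observations each). The plain-Python sketch below does that. It assumes a sample variance with an n - 1 denominator and linearly interpolated quantiles; both are inferences from the reported numbers, not something the report states. The `quantile` helper is illustrative only.

```python
import math
import statistics

# Age rebuilt from its frequency table: 0..4, ten observations each (already sorted).
ages = [value for value in range(5) for _ in range(10)]
n = len(ages)

mean = sum(ages) / n
# Assumption: sample variance (n - 1 denominator), which matches the reported value.
variance = sum((x - mean) ** 2 for x in ages) / (n - 1)
std_dev = math.sqrt(variance)
cv = std_dev / mean  # coefficient of variation


def quantile(sorted_values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


median = statistics.median(ages)
# Median absolute deviation: median of the distances to the median.
mad = statistics.median(abs(x - median) for x in ages)

print("sum", sum(ages), "variance", variance, "std", std_dev, "cv", cv)
print("Q1", quantile(ages, 0.25), "Q3", quantile(ages, 0.75), "MAD", mad)
```

Under these assumptions the variance works out to 100/49 and the standard deviation to 10/7, which agree with the values listed above to the printed precision.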
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
4      10     20.0%
3      10     20.0%
2      10     20.0%
1      10     20.0%
0      10     20.0%

Value  Count  Frequency (%)
0      10     20.0%
1      10     20.0%
2      10     20.0%
3      10     20.0%
4      10     20.0%

Value  Count  Frequency (%)
4      10     20.0%
3      10     20.0%
2      10     20.0%
1      10     20.0%
0      10     20.0%

Race
Real number (ℝ≥0)

ZEROS

Distinct count: 5
Unique (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.0
Minimum: 0
Maximum: 4
Zeros: 10
Zeros (%): 20.0%
Memory size: 528.0 B

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 3
95-th percentile: 4
Maximum: 4
Range: 4
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.428571429
Coefficient of variation (CV): 0.7142857143
Kurtosis: -1.309707447
Mean: 2
Median Absolute Deviation (MAD): 1
Skewness: 0
Sum: 100
Variance: 2.040816327
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
4      10     20.0%
3      10     20.0%
2      10     20.0%
1      10     20.0%
0      10     20.0%

Value  Count  Frequency (%)
0      10     20.0%
1      10     20.0%
2      10     20.0%
3      10     20.0%
4      10     20.0%

Value  Count  Frequency (%)
4      10     20.0%
3      10     20.0%
2      10     20.0%
1      10     20.0%
0      10     20.0%

AIDS
Real number (ℝ≥0)

UNIQUE

Distinct count: 50
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9037.02
Minimum: 3.0
Maximum: 82334.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 528.0 B

Quantile statistics

Minimum: 3
5-th percentile: 19.4
Q1: 114.5
Median: 1640
Q3: 9073.75
95-th percentile: 45579.5
Maximum: 82334
Range: 82331
Interquartile range (IQR): 8959.25

Descriptive statistics

Standard deviation: 16823.62682
Coefficient of variation (CV): 1.861634346
Kurtosis: 7.814248603
Mean: 9037.02
Median Absolute Deviation (MAD): 1597
Skewness: 2.696343291
Sum: 451851
Variance: 283034419.3
Histogram with fixed size bins (bins=10)
Value              Count  Frequency (%)
34204              1      2.0%
48                 1      2.0%
106                1      2.0%
83                 1      2.0%
6                  1      2.0%
910                1      2.0%
1944               1      2.0%
493                1      2.0%
1490               1      2.0%
1790               1      2.0%
2152               1      2.0%
490                1      2.0%
140                1      2.0%
82334              1      2.0%
417                1      2.0%
390                1      2.0%
26                 1      2.0%
258                1      2.0%
560                1      2.0%
1162               1      2.0%
731                1      2.0%
38                 1      2.0%
27200              1      2.0%
51776              1      2.0%
69                 1      2.0%
Other values (25)  25     50.0%

Value  Count  Frequency (%)
3      1      2.0%
6      1      2.0%
14     1      2.0%
26     1      2.0%
31     1      2.0%
38     1      2.0%
48     1      2.0%
55     1      2.0%
69     1      2.0%
77     1      2.0%

Value  Count  Frequency (%)
82334  1      2.0%
55300  1      2.0%
51776  1      2.0%
38006  1      2.0%
34204  1      2.0%
27200  1      2.0%
23896  1      2.0%
20712  1      2.0%
16068  1      2.0%
15713  1      2.0%

Total
Real number (ℝ≥0)

UNIQUE

Distinct count: 50
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4423370.62
Minimum: 162616.0
Maximum: 22686934.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 528.0 B

Quantile statistics

Minimum: 162616
5-th percentile: 170489.75
Q1: 702786.5
Median: 1616417.5
Q3: 2806359.25
95-th percentile: 16994352.1
Maximum: 22686934
Range: 22524318
Interquartile range (IQR): 2103572.75

Descriptive statistics

Standard deviation: 6371785.346
Coefficient of variation (CV): 1.440481907
Kurtosis: 1.344818033
Mean: 4423370.62
Median Absolute Deviation (MAD): 930730
Skewness: 1.663655902
Sum: 221168531
Variance: 4.05996485e+13
Histogram with fixed size bins (bins=10)
Value              Count  Frequency (%)
162616             1      2.0%
2276276            1      2.0%
3718366            1      2.0%
14443382           1      2.0%
17282659           1      2.0%
175626             1      2.0%
2422980            1      2.0%
168876             1      2.0%
2727604            1      2.0%
172170             1      2.0%
837159             1      2.0%
2502800            1      2.0%
726561             1      2.0%
16641977           1      2.0%
15270378           1      2.0%
3580523            1      2.0%
1845800            1      2.0%
13888285           1      2.0%
255790             1      2.0%
1540381            1      2.0%
698637             1      2.0%
2410019            1      2.0%
1483278            1      2.0%
22686934           1      2.0%
178798             1      2.0%
Other values (25)  25     50.0%

Value   Count  Frequency (%)
162616  1      2.0%
168876  1      2.0%
169115  1      2.0%
172170  1      2.0%
175626  1      2.0%
178798  1      2.0%
194858  1      2.0%
202360  1      2.0%
255790  1      2.0%
265637  1      2.0%

Value     Count  Frequency (%)
22686934  1      2.0%
21845911  1      2.0%
17282659  1      2.0%
16641977  1      2.0%
15270378  1      2.0%
14999423  1      2.0%
14704293  1      2.0%
14443382  1      2.0%
14422956  1      2.0%
13888285  1      2.0%

target
Boolean

Distinct count: 2
Unique (%): 4.0%
Missing: 0
Missing (%): 0.0%
Memory size: 528.0 B
Value  Count  Frequency (%)
1      25     50.0%
0      25     50.0%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear relationship the steepness of the line (its angle to the x-axis) does not affect the magnitude of r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
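
As an illustration of that formula, here is a minimal plain-Python sketch. The function name `pearson_r` is hypothetical, and the input is only the AIDS and Total values from the ten "First rows" of the sample, not the full 50-row dataset, so the result will not match the report's correlation matrix.

```python
import math


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors of covariance and variances cancel, so sums suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column gives a zero denominator; r is undefined there.
    return cov / math.sqrt(ss_x * ss_y)


# AIDS and Total from the "First rows" sample (rows 0-9 only).
aids = [2555, 55300, 82334, 38006, 16068, 2489, 34204, 51776, 23896, 10169]
total = [14443382, 14704293, 16641977, 13888285, 21845911,
         2367256, 2410019, 2727604, 2276276, 3580523]
print(pearson_r(aids, total))
```

Rescaling or shifting either input, for example passing `[3 * t + 7 for t in total]`, leaves the result unchanged, which is the location and scale invariance described above.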

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
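
A minimal plain-Python sketch of that calculation follows. The helper names (`average_ranks`, `spearman_rho`) are hypothetical, and giving tied values the average of their ranks is an assumption made here; the report does not say how ties are handled.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ss_x = sum((a - mx) ** 2 for a in rx)
    ss_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(ss_x * ss_y)


# A monotonic but nonlinear relationship still yields a perfect rank correlation.
xs = [1, 2, 3, 4, 5]
print(spearman_rho(xs, [x ** 3 for x in xs]))
```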

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
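
The pair-counting definition can be sketched directly in plain Python. The function name `kendall_tau_a` is hypothetical, and this is the tie-unadjusted form exactly as worded above; libraries commonly report a tie-adjusted variant, so their values can differ when the data contain ties.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # A pair is concordant when x and y move in the same direction.
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # Pairs tied in x or y count toward neither total.
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

This brute-force version compares every pair, so its cost grows quadratically with the number of observations; that is fine for a 50-row dataset like this one.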

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for Phik.

Missing values


Sample

First rows

    Age  Race  AIDS     Total       target
0   0    4     2555.0   14443382.0  1
1   1    4     55300.0  14704293.0  1
2   2    4     82334.0  16641977.0  1
3   3    4     38006.0  13888285.0  1
4   4    4     16068.0  21845911.0  1
5   0    2     2489.0   2367256.0   1
6   1    2     34204.0  2410019.0   1
7   2    2     51776.0  2727604.0   1
8   3    2     23896.0  2276276.0   1
9   4    2     10169.0  3580523.0   1

Last rows

    Age  Race  AIDS   Total      target
40  0    1     6.0    726561.0   0
41  1    1     83.0   739686.0   0
42  2    1     106.0  837159.0   0
43  3    1     69.0   698637.0   0
44  4    1     55.0   1098938.0  0
45  0    0     3.0    175626.0   0
46  1    0     78.0   178798.0   0
47  2    0     77.0   202360.0   0
48  3    0     31.0   168876.0   0
49  4    0     14.0   265637.0   0