Overview

Dataset statistics

Number of variables: 9
Number of observations: 768
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 54.1 KiB
Average record size in memory: 72.2 B

Variable types

NUM: 8
BOOL: 1

Reproduction

Analysis started: 2020-08-25 01:44:47.472306
Analysis finished: 2020-08-25 01:44:58.847836
Duration: 11.38 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Pregnant has 111 (14.5%) zeros
Diastolic blood pressure has 35 (4.6%) zeros
Triceps skin fold thickness has 227 (29.6%) zeros
2-Hour serum insulin has 374 (48.7%) zeros
Body mass index has 11 (1.4%) zeros
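The zero-count warnings above amount to counting exact zeros per column and reporting the share. The sketch below is a minimal illustration, not pandas-profiling's own code; the function name `zero_warnings` and its `min_fraction` parameter are hypothetical. In this report Body mass index was flagged at 1.4% zeros while plasma glucose (0.7% zeros) was not, so the cutoff used presumably lies between those values; the 1% default below is an assumption.

```python
from typing import Dict, List, Sequence


def zero_warnings(columns: Dict[str, Sequence[float]], min_fraction: float = 0.01) -> List[str]:
    """Return a 'has N (P%) zeros' message for each column whose zero share exceeds min_fraction.

    min_fraction is a hypothetical threshold, not a documented pandas-profiling setting.
    """
    messages = []
    for name, values in columns.items():
        if not values:
            continue  # nothing to count in an empty column
        zeros = sum(1 for v in values if v == 0)
        fraction = zeros / len(values)
        if fraction > min_fraction:
            # The .1% format multiplies by 100 and keeps one decimal, as in the report.
            messages.append(f"{name} has {zeros} ({fraction:.1%}) zeros")
    return messages


if __name__ == "__main__":
    toy = {
        "Pregnant": [0, 1, 2, 0, 5],
        "Age": [21, 35, 50, 29, 41],
    }
    for line in zero_warnings(toy):
        print(line)
```

Zeros in columns such as blood pressure or BMI are often physiologically implausible, which is why a report may surface them separately from missing cells.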

Variables

Pregnant
Real number (ℝ≥0)

ZEROS

Distinct count: 17
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.8450520833333335
Minimum: 0.0
Maximum: 17.0
Zeros: 111
Zeros (%): 14.5%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 3
Q3: 6
95th percentile: 10
Maximum: 17
Range: 17
Interquartile range (IQR): 5
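The quantile statistics are consistent with linear interpolation between neighbouring sorted values, which is the default of pandas' `Series.quantile`; for instance, plasma glucose's Q3 of 140.25 corresponds to position 0.75 × (768 - 1) = 575.25 in the sorted data. A minimal sketch under that assumption; `quantile` and `quantile_summary` are illustrative names, not pandas-profiling functions.

```python
import math
from typing import Dict, Sequence


def quantile(values: Sequence[float], p: float) -> float:
    """Quantile by linear interpolation between the two nearest sorted values."""
    if not values or not 0.0 <= p <= 1.0:
        raise ValueError("need a non-empty sequence and 0 <= p <= 1")
    xs = sorted(values)
    pos = p * (len(xs) - 1)  # fractional index into the sorted data
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_summary(values: Sequence[float]) -> Dict[str, float]:
    """The rows of a 'Quantile statistics' block, computed from raw values."""
    q1, q3 = quantile(values, 0.25), quantile(values, 0.75)
    return {
        "Minimum": min(values),
        "5th percentile": quantile(values, 0.05),
        "Q1": q1,
        "Median": quantile(values, 0.5),
        "Q3": q3,
        "95th percentile": quantile(values, 0.95),
        "Maximum": max(values),
        "Range": max(values) - min(values),
        "Interquartile range (IQR)": q3 - q1,
    }


if __name__ == "__main__":
    for label, value in quantile_summary([0, 1, 1, 2, 3, 6, 10]).items():
        print(f"{label}: {value}")
```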

Descriptive statistics

Standard deviation: 3.369578063
Coefficient of variation (CV): 0.8763413316
Kurtosis: 0.1592197775
Mean: 3.845052083
Median Absolute Deviation (MAD): 2
Skewness: 0.9016739792
Sum: 2953
Variance: 11.35405632
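Several of these descriptive statistics follow from one another: the variance is the square of the standard deviation, and the coefficient of variation is the standard deviation divided by the mean (3.369578063 / 3.845052083 ≈ 0.8763 for Pregnant). The sketch below computes them from raw values, assuming the sample (n - 1) variance and a MAD taken as the median of absolute deviations from the median, which fits the integer MAD reported for this integer-valued column. `descriptive_stats` is an illustrative name; skewness and kurtosis are left out.

```python
import math
import statistics
from typing import Dict, Sequence


def descriptive_stats(values: Sequence[float]) -> Dict[str, float]:
    """Mean, sample variance/std, CV, MAD and sum for a numeric column (needs >= 2 values)."""
    data = list(values)
    mean = sum(data) / len(data)
    variance = statistics.variance(data)  # sample variance: divides by n - 1
    std = math.sqrt(variance)
    median = statistics.median(data)
    # Median absolute deviation: the median of |x - median|.
    mad = statistics.median([abs(x - median) for x in data])
    return {
        "Standard deviation": std,
        "Coefficient of variation (CV)": std / mean,  # undefined when the mean is 0
        "Mean": mean,
        "Median Absolute Deviation (MAD)": mad,
        "Sum": sum(data),
        "Variance": variance,
    }


if __name__ == "__main__":
    for label, value in descriptive_stats([0, 1, 1, 2, 3, 6, 10]).items():
        print(f"{label}: {value}")
```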
[Figure: Histogram with fixed size bins (bins=10)]
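A fixed-size-bin histogram splits the interval from the minimum to the maximum into equal-width bins. A minimal sketch of that binning, assuming the NumPy convention (`numpy.histogram` with an integer `bins`) in which every bin is half-open except the last, which also contains the maximum; `fixed_width_histogram` is an illustrative name, and edge handling in floating point may differ slightly from NumPy's.

```python
from typing import List, Sequence, Tuple


def fixed_width_histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Return (edges, counts) for `bins` equal-width bins between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Bin i covers [edge_i, edge_{i+1}); the maximum is placed in the last bin.
        # Degenerate case (all values equal): everything goes into the first bin.
        idx = int((v - lo) / width) if width > 0 else 0
        counts[min(idx, bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


if __name__ == "__main__":
    edges, counts = fixed_width_histogram([0, 1, 1, 2, 3, 6, 10, 17], bins=10)
    print(counts)
```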
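Common values
Value | Count | Frequency (%)
1 | 135 | 17.6%
0 | 111 | 14.5%
2 | 103 | 13.4%
3 | 75 | 9.8%
4 | 68 | 8.9%
5 | 57 | 7.4%
6 | 50 | 6.5%
7 | 45 | 5.9%
8 | 38 | 4.9%
9 | 28 | 3.6%
10 | 24 | 3.1%
11 | 11 | 1.4%
13 | 10 | 1.3%
12 | 9 | 1.2%
14 | 2 | 0.3%
17 | 1 | 0.1%
15 | 1 | 0.1%

Minimum 10 values
Value | Count | Frequency (%)
0 | 111 | 14.5%
1 | 135 | 17.6%
2 | 103 | 13.4%
3 | 75 | 9.8%
4 | 68 | 8.9%
5 | 57 | 7.4%
6 | 50 | 6.5%
7 | 45 | 5.9%
8 | 38 | 4.9%
9 | 28 | 3.6%

Maximum 10 values
Value | Count | Frequency (%)
17 | 1 | 0.1%
15 | 1 | 0.1%
14 | 2 | 0.3%
13 | 10 | 1.3%
12 | 9 | 1.2%
11 | 11 | 1.4%
10 | 24 | 3.1%
9 | 28 | 3.6%
8 | 38 | 4.9%
7 | 45 | 5.9%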

plasma glucose
Real number (ℝ≥0)

Distinct count: 136
Unique (%): 17.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 120.89453125
Minimum: 0.0
Maximum: 199.0
Zeros: 5
Zeros (%): 0.7%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 79
Q1: 99
Median: 117
Q3: 140.25
95th percentile: 181
Maximum: 199
Range: 199
Interquartile range (IQR): 41.25

Descriptive statistics

Standard deviation: 31.9726182
Coefficient of variation (CV): 0.2644670347
Kurtosis: 0.6407798204
Mean: 120.8945312
Median Absolute Deviation (MAD): 20
Skewness: 0.1737535018
Sum: 92847
Variance: 1022.248314
[Figure: Histogram with fixed size bins (bins=10)]
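Common values
Value | Count | Frequency (%)
99 | 17 | 2.2%
100 | 17 | 2.2%
106 | 14 | 1.8%
111 | 14 | 1.8%
125 | 14 | 1.8%
129 | 14 | 1.8%
108 | 13 | 1.7%
112 | 13 | 1.7%
105 | 13 | 1.7%
102 | 13 | 1.7%
95 | 13 | 1.7%
122 | 12 | 1.6%
109 | 12 | 1.6%
107 | 11 | 1.4%
119 | 11 | 1.4%
120 | 11 | 1.4%
128 | 11 | 1.4%
114 | 11 | 1.4%
90 | 11 | 1.4%
124 | 11 | 1.4%
117 | 11 | 1.4%
84 | 10 | 1.3%
115 | 10 | 1.3%
126 | 9 | 1.2%
103 | 9 | 1.2%
Other values (111) | 463 | 60.3%

Minimum 10 values
Value | Count | Frequency (%)
0 | 5 | 0.7%
44 | 1 | 0.1%
56 | 1 | 0.1%
57 | 2 | 0.3%
61 | 1 | 0.1%
62 | 1 | 0.1%
65 | 1 | 0.1%
67 | 1 | 0.1%
68 | 3 | 0.4%
71 | 4 | 0.5%

Maximum 10 values
Value | Count | Frequency (%)
199 | 1 | 0.1%
198 | 1 | 0.1%
197 | 4 | 0.5%
196 | 3 | 0.4%
195 | 2 | 0.3%
194 | 3 | 0.4%
193 | 2 | 0.3%
191 | 1 | 0.1%
190 | 1 | 0.1%
189 | 4 | 0.5%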

Diastolic blood pressure
Real number (ℝ≥0)

ZEROS

Distinct count: 47
Unique (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.10546875
Minimum: 0.0
Maximum: 122.0
Zeros: 35
Zeros (%): 4.6%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 38.7
Q1: 62
Median: 72
Q3: 80
95th percentile: 90
Maximum: 122
Range: 122
Interquartile range (IQR): 18

Descriptive statistics

Standard deviation: 19.35580717
Coefficient of variation (CV): 0.2800908166
Kurtosis: 5.18015656
Mean: 69.10546875
Median Absolute Deviation (MAD): 8
Skewness: -1.843607983
Sum: 53073
Variance: 374.6472712
[Figure: Histogram with fixed size bins (bins=10)]
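Common values
Value | Count | Frequency (%)
70 | 57 | 7.4%
74 | 52 | 6.8%
68 | 45 | 5.9%
78 | 45 | 5.9%
72 | 44 | 5.7%
64 | 43 | 5.6%
80 | 40 | 5.2%
76 | 39 | 5.1%
60 | 37 | 4.8%
0 | 35 | 4.6%
62 | 34 | 4.4%
82 | 30 | 3.9%
66 | 30 | 3.9%
88 | 25 | 3.3%
84 | 23 | 3.0%
90 | 22 | 2.9%
58 | 21 | 2.7%
86 | 21 | 2.7%
50 | 13 | 1.7%
56 | 12 | 1.6%
54 | 11 | 1.4%
52 | 11 | 1.4%
92 | 8 | 1.0%
75 | 8 | 1.0%
65 | 7 | 0.9%
Other values (22) | 55 | 7.2%

Minimum 10 values
Value | Count | Frequency (%)
0 | 35 | 4.6%
24 | 1 | 0.1%
30 | 2 | 0.3%
38 | 1 | 0.1%
40 | 1 | 0.1%
44 | 4 | 0.5%
46 | 2 | 0.3%
48 | 5 | 0.7%
50 | 13 | 1.7%
52 | 11 | 1.4%

Maximum 10 values
Value | Count | Frequency (%)
122 | 1 | 0.1%
114 | 1 | 0.1%
110 | 3 | 0.4%
108 | 2 | 0.3%
106 | 3 | 0.4%
104 | 2 | 0.3%
102 | 1 | 0.1%
100 | 3 | 0.4%
98 | 3 | 0.4%
96 | 4 | 0.5%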

Triceps skin fold thickness
Real number (ℝ≥0)

ZEROS

Distinct count: 51
Unique (%): 6.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 20.536458333333332
Minimum: 0.0
Maximum: 99.0
Zeros: 227
Zeros (%): 29.6%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 23
Q3: 32
95th percentile: 44
Maximum: 99
Range: 99
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 15.95221757
Coefficient of variation (CV): 0.776775494
Kurtosis: -0.5200718662
Mean: 20.53645833
Median Absolute Deviation (MAD): 12
Skewness: 0.1093724965
Sum: 15772
Variance: 254.4732453
[Figure: Histogram with fixed size bins (bins=10)]
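Common values
Value | Count | Frequency (%)
0 | 227 | 29.6%
32 | 31 | 4.0%
30 | 27 | 3.5%
27 | 23 | 3.0%
23 | 22 | 2.9%
33 | 20 | 2.6%
18 | 20 | 2.6%
28 | 20 | 2.6%
31 | 19 | 2.5%
19 | 18 | 2.3%
39 | 18 | 2.3%
29 | 17 | 2.2%
37 | 16 | 2.1%
22 | 16 | 2.1%
26 | 16 | 2.1%
25 | 16 | 2.1%
40 | 16 | 2.1%
35 | 15 | 2.0%
41 | 15 | 2.0%
15 | 14 | 1.8%
36 | 14 | 1.8%
17 | 14 | 1.8%
20 | 13 | 1.7%
24 | 12 | 1.6%
42 | 11 | 1.4%
Other values (26) | 118 | 15.4%

Minimum 10 values
Value | Count | Frequency (%)
0 | 227 | 29.6%
7 | 2 | 0.3%
8 | 2 | 0.3%
10 | 5 | 0.7%
11 | 6 | 0.8%
12 | 7 | 0.9%
13 | 11 | 1.4%
14 | 6 | 0.8%
15 | 14 | 1.8%
16 | 6 | 0.8%

Maximum 10 values
Value | Count | Frequency (%)
99 | 1 | 0.1%
63 | 1 | 0.1%
60 | 1 | 0.1%
56 | 1 | 0.1%
54 | 2 | 0.3%
52 | 2 | 0.3%
51 | 1 | 0.1%
50 | 3 | 0.4%
49 | 3 | 0.4%
48 | 4 | 0.5%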

2-Hour serum insulin
Real number (ℝ≥0)

ZEROS

Distinct count: 186
Unique (%): 24.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.79947916666667
Minimum: 0.0
Maximum: 846.0
Zeros: 374
Zeros (%): 48.7%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 30.5
Q3: 127.25
95th percentile: 293
Maximum: 846
Range: 846
Interquartile range (IQR): 127.25

Descriptive statistics

Standard deviation: 115.2440024
Coefficient of variation (CV): 1.444169856
Kurtosis: 7.214259554
Mean: 79.79947917
Median Absolute Deviation (MAD): 30.5
Skewness: 2.272250858
Sum: 61286
Variance: 13281.18008
[Figure: Histogram with fixed size bins (bins=10)]
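Common values
Value | Count | Frequency (%)
0 | 374 | 48.7%
105 | 11 | 1.4%
130 | 9 | 1.2%
140 | 9 | 1.2%
120 | 8 | 1.0%
180 | 7 | 0.9%
94 | 7 | 0.9%
100 | 7 | 0.9%
110 | 6 | 0.8%
135 | 6 | 0.8%
115 | 6 | 0.8%
56 | 5 | 0.7%
76 | 5 | 0.7%
66 | 5 | 0.7%
210 | 5 | 0.7%
49 | 5 | 0.7%
88 | 4 | 0.5%
190 | 4 | 0.5%
155 | 4 | 0.5%
165 | 4 | 0.5%
54 | 4 | 0.5%
125 | 4 | 0.5%
90 | 4 | 0.5%
64 | 4 | 0.5%
200 | 4 | 0.5%
Other values (161) | 257 | 33.5%

Minimum 10 values
Value | Count | Frequency (%)
0 | 374 | 48.7%
14 | 1 | 0.1%
15 | 1 | 0.1%
16 | 1 | 0.1%
18 | 2 | 0.3%
22 | 1 | 0.1%
23 | 2 | 0.3%
25 | 1 | 0.1%
29 | 1 | 0.1%
32 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
846 | 1 | 0.1%
744 | 1 | 0.1%
680 | 1 | 0.1%
600 | 1 | 0.1%
579 | 1 | 0.1%
545 | 1 | 0.1%
543 | 1 | 0.1%
540 | 1 | 0.1%
510 | 1 | 0.1%
495 | 2 | 0.3%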

Body mass index
Real number (ℝ≥0)

ZEROS

Distinct count: 248
Unique (%): 32.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 31.992578125000005
Minimum: 0.0
Maximum: 67.1
Zeros: 11
Zeros (%): 1.4%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0
5th percentile: 21.8
Q1: 27.3
Median: 32
Q3: 36.6
95th percentile: 44.395
Maximum: 67.1
Range: 67.1
Interquartile range (IQR): 9.3

Descriptive statistics

Standard deviation: 7.88416032
Coefficient of variation (CV): 0.2464371671
Kurtosis: 3.290442901
Mean: 31.99257813
Median Absolute Deviation (MAD): 4.6
Skewness: -0.4289815885
Sum: 24570.3
Variance: 62.15998396
[Figure: Histogram with fixed size bins (bins=10)]
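Common values
Value | Count | Frequency (%)
32 | 13 | 1.7%
31.6 | 12 | 1.6%
31.2 | 12 | 1.6%
0 | 11 | 1.4%
32.4 | 10 | 1.3%
33.3 | 10 | 1.3%
32.9 | 9 | 1.2%
30.1 | 9 | 1.2%
32.8 | 9 | 1.2%
30.8 | 9 | 1.2%
34.2 | 8 | 1.0%
29.7 | 8 | 1.0%
33.6 | 8 | 1.0%
30.5 | 7 | 0.9%
35.5 | 7 | 0.9%
33.2 | 7 | 0.9%
25.9 | 7 | 0.9%
39.4 | 7 | 0.9%
30.4 | 7 | 0.9%
27.8 | 7 | 0.9%
30 | 7 | 0.9%
28.7 | 7 | 0.9%
27.6 | 7 | 0.9%
28.9 | 6 | 0.8%
28.4 | 6 | 0.8%
Other values (223) | 558 | 72.7%

Minimum 10 values
Value | Count | Frequency (%)
0 | 11 | 1.4%
18.2 | 3 | 0.4%
18.4 | 1 | 0.1%
19.1 | 1 | 0.1%
19.3 | 1 | 0.1%
19.4 | 1 | 0.1%
19.5 | 2 | 0.3%
19.6 | 3 | 0.4%
19.9 | 1 | 0.1%
20 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
67.1 | 1 | 0.1%
59.4 | 1 | 0.1%
57.3 | 1 | 0.1%
55 | 1 | 0.1%
53.2 | 1 | 0.1%
52.9 | 1 | 0.1%
52.3 | 2 | 0.3%
50 | 1 | 0.1%
49.7 | 1 | 0.1%
49.6 | 1 | 0.1%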

Diabetes pedigree function
Real number (ℝ≥0)

Distinct count: 517
Unique (%): 67.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.47187630208333337
Minimum: 0.078
Maximum: 2.42
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 0.078
5th percentile: 0.14035
Q1: 0.24375
Median: 0.3725
Q3: 0.62625
95th percentile: 1.13285
Maximum: 2.42
Range: 2.342
Interquartile range (IQR): 0.3825

Descriptive statistics

Standard deviation: 0.331328595
Coefficient of variation (CV): 0.7021513764
Kurtosis: 5.594953528
Mean: 0.4718763021
Median Absolute Deviation (MAD): 0.1675
Skewness: 1.919911066
Sum: 362.401
Variance: 0.1097786379
[Figure: Histogram with fixed size bins (bins=10)]
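Common values
Value | Count | Frequency (%)
0.254 | 6 | 0.8%
0.258 | 6 | 0.8%
0.238 | 5 | 0.7%
0.268 | 5 | 0.7%
0.259 | 5 | 0.7%
0.207 | 5 | 0.7%
0.261 | 5 | 0.7%
0.299 | 4 | 0.5%
0.167 | 4 | 0.5%
0.245 | 4 | 0.5%
0.304 | 4 | 0.5%
0.284 | 4 | 0.5%
0.26 | 4 | 0.5%
0.687 | 4 | 0.5%
0.263 | 4 | 0.5%
0.551 | 4 | 0.5%
0.197 | 4 | 0.5%
0.237 | 4 | 0.5%
0.692 | 4 | 0.5%
0.27 | 4 | 0.5%
0.19 | 4 | 0.5%
0.142 | 3 | 0.4%
0.165 | 3 | 0.4%
0.205 | 3 | 0.4%
0.583 | 3 | 0.4%
Other values (492) | 663 | 86.3%

Minimum 10 values
Value | Count | Frequency (%)
0.078 | 1 | 0.1%
0.084 | 1 | 0.1%
0.085 | 2 | 0.3%
0.088 | 2 | 0.3%
0.089 | 1 | 0.1%
0.092 | 1 | 0.1%
0.096 | 1 | 0.1%
0.1 | 1 | 0.1%
0.101 | 1 | 0.1%
0.102 | 1 | 0.1%

Maximum 10 values
Value | Count | Frequency (%)
2.42 | 1 | 0.1%
2.329 | 1 | 0.1%
2.288 | 1 | 0.1%
2.137 | 1 | 0.1%
1.893 | 1 | 0.1%
1.781 | 1 | 0.1%
1.731 | 1 | 0.1%
1.699 | 1 | 0.1%
1.698 | 1 | 0.1%
1.6 | 1 | 0.1%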

Age
Real number (ℝ≥0)

Distinct count: 52
Unique (%): 6.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.240885416666664
Minimum: 21.0
Maximum: 81.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.1 KiB

Quantile statistics

Minimum: 21
5th percentile: 21
Q1: 24
Median: 29
Q3: 41
95th percentile: 58
Maximum: 81
Range: 60
Interquartile range (IQR): 17

Descriptive statistics

Standard deviation: 11.76023154
Coefficient of variation (CV): 0.3537881556
Kurtosis: 0.6431588885
Mean: 33.24088542
Median Absolute Deviation (MAD): 7
Skewness: 1.129596701
Sum: 25529
Variance: 138.3030459
[Figure: Histogram with fixed size bins (bins=10)]
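Common values
Value | Count | Frequency (%)
22 | 72 | 9.4%
21 | 63 | 8.2%
25 | 48 | 6.2%
24 | 46 | 6.0%
23 | 38 | 4.9%
28 | 35 | 4.6%
26 | 33 | 4.3%
27 | 32 | 4.2%
29 | 29 | 3.8%
31 | 24 | 3.1%
41 | 22 | 2.9%
30 | 21 | 2.7%
37 | 19 | 2.5%
42 | 18 | 2.3%
33 | 17 | 2.2%
36 | 16 | 2.1%
38 | 16 | 2.1%
32 | 16 | 2.1%
45 | 15 | 2.0%
34 | 14 | 1.8%
43 | 13 | 1.7%
46 | 13 | 1.7%
40 | 13 | 1.7%
39 | 12 | 1.6%
35 | 10 | 1.3%
Other values (27) | 113 | 14.7%

Minimum 10 values
Value | Count | Frequency (%)
21 | 63 | 8.2%
22 | 72 | 9.4%
23 | 38 | 4.9%
24 | 46 | 6.0%
25 | 48 | 6.2%
26 | 33 | 4.3%
27 | 32 | 4.2%
28 | 35 | 4.6%
29 | 29 | 3.8%
30 | 21 | 2.7%

Maximum 10 values
Value | Count | Frequency (%)
81 | 1 | 0.1%
72 | 1 | 0.1%
70 | 1 | 0.1%
69 | 2 | 0.3%
68 | 1 | 0.1%
67 | 3 | 0.4%
66 | 4 | 0.5%
65 | 3 | 0.4%
64 | 1 | 0.1%
63 | 4 | 0.5%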

target
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.1 KiB
Value | Count | Frequency (%)
0 | 500 | 65.1%
1 | 268 | 34.9%

Interactions

[Figures: pairwise interaction plots for the 8 numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables; in particular, any exact linear relationship with a nonzero slope gives r = +1 or -1, however steep the line is.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
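The definition above translates directly into code. This is a minimal pure-Python sketch, not the implementation pandas-profiling uses, and `pearson_r` is an illustrative name. The sample covariance and standard deviations share the same 1/(n - 1) factor, so the population versions would give the same r.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and standard deviations; the shared 1/(n - 1) factor cancels in r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / (std_x * std_y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print(pearson_r(x, [2 * v + 1 for v in x]))   # positive slope: r is +1 up to rounding
    print(pearson_r(x, [-5 * v + 7 for v in x]))  # negative slope: r is -1 up to rounding
```

The two calls illustrate the invariance described above: changing the slope or intercept of an exact linear relationship leaves |r| at 1.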

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
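In other words, ρ is Pearson's r computed on ranks. In the sketch below, tied values receive the average of the ranks they span, a common convention that matters here because many columns repeat values (for example the zeros); the function names are illustrative. On a monotonic but nonlinear relationship such as y = x², ρ is 1 while r stays below 1.

```python
import math
from typing import List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The common 1/(n - 1) factors cancel, so they are omitted.
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [v ** 2 for v in x]  # monotonic but not linear
    print(spearman_rho(x, y), pearson_r(x, y))
```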

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
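A direct O(n²) sketch of this pair-counting definition (the τ_a form). The name `kendall_tau_a` is illustrative. When ties occur, as they do heavily in this dataset, libraries such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ_b, so their values can differ from this one.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:      # both variables order the pair the same way
            concordant += 1
        elif s < 0:    # the variables order the pair in opposite ways
            discordant += 1
        # s == 0: a tie in x or y, counted as neither
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


if __name__ == "__main__":
    # 5 concordant pairs, 1 discordant pair (the swapped 2 and 3), 6 pairs in total.
    print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```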

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figures: missing-value diagrams]

Sample

First rows

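Index | Pregnant | plasma glucose | Diastolic blood pressure | Triceps skin fold thickness | 2-Hour serum insulin | Body mass index | Diabetes pedigree function | Age | target
0 | 4.0 | 117.0 | 62.0 | 12.0 | 0.0 | 29.7 | 0.380 | 30.0 | 1
1 | 4.0 | 158.0 | 78.0 | 0.0 | 0.0 | 32.9 | 0.803 | 31.0 | 1
2 | 2.0 | 118.0 | 80.0 | 0.0 | 0.0 | 42.9 | 0.693 | 21.0 | 1
3 | 13.0 | 129.0 | 0.0 | 30.0 | 0.0 | 39.9 | 0.569 | 44.0 | 1
4 | 5.0 | 162.0 | 104.0 | 0.0 | 0.0 | 37.7 | 0.151 | 52.0 | 1
5 | 7.0 | 114.0 | 64.0 | 0.0 | 0.0 | 27.4 | 0.732 | 34.0 | 1
6 | 6.0 | 102.0 | 82.0 | 0.0 | 0.0 | 30.8 | 0.180 | 36.0 | 1
7 | 1.0 | 196.0 | 76.0 | 36.0 | 249.0 | 36.5 | 0.875 | 29.0 | 1
8 | 9.0 | 102.0 | 76.0 | 37.0 | 0.0 | 32.9 | 0.665 | 46.0 | 1
9 | 7.0 | 161.0 | 86.0 | 0.0 | 0.0 | 30.4 | 0.165 | 47.0 | 1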

Last rows

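Index | Pregnant | plasma glucose | Diastolic blood pressure | Triceps skin fold thickness | 2-Hour serum insulin | Body mass index | Diabetes pedigree function | Age | target
758 | 5.0 | 132.0 | 80.0 | 0.0 | 0.0 | 26.8 | 0.186 | 69.0 | 0
759 | 9.0 | 91.0 | 68.0 | 0.0 | 0.0 | 24.2 | 0.200 | 58.0 | 0
760 | 3.0 | 128.0 | 78.0 | 0.0 | 0.0 | 21.1 | 0.268 | 55.0 | 0
761 | 0.0 | 108.0 | 68.0 | 20.0 | 0.0 | 27.3 | 0.787 | 32.0 | 0
762 | 2.0 | 112.0 | 68.0 | 22.0 | 94.0 | 34.1 | 0.315 | 26.0 | 0
763 | 1.0 | 81.0 | 74.0 | 41.0 | 57.0 | 46.3 | 1.096 | 32.0 | 0
764 | 4.0 | 94.0 | 65.0 | 22.0 | 0.0 | 24.7 | 0.148 | 21.0 | 0
765 | 3.0 | 158.0 | 64.0 | 13.0 | 387.0 | 31.2 | 0.295 | 24.0 | 0
766 | 0.0 | 57.0 | 60.0 | 0.0 | 0.0 | 21.7 | 0.735 | 67.0 | 0
767 | 4.0 | 95.0 | 60.0 | 32.0 | 0.0 | 35.4 | 0.284 | 28.0 | 0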