Overview

Dataset statistics

Number of variables: 9
Number of observations: 22784
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 1
Duplicate rows (%): < 0.1%
Total size in memory: 1.6 MiB
Average record size in memory: 72.0 B
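
The memory figures can be roughly cross-checked from the row and column counts. The sketch below assumes every value is stored as an 8-byte float; the report does not state the storage type, so `BYTES_PER_VALUE` is an assumption.

```python
# Rough cross-check of the memory figures in the overview.
# Assumption (not stated in the report): each value occupies 8 bytes.
N_ROWS = 22784        # "Number of observations"
N_COLS = 9            # "Number of variables"
BYTES_PER_VALUE = 8   # assumed storage size per value

record_bytes = N_COLS * BYTES_PER_VALUE            # bytes per row
total_mib = N_ROWS * record_bytes / 1024**2        # whole table, in MiB
column_kib = N_ROWS * BYTES_PER_VALUE / 1024       # one column, in KiB

print(record_bytes)          # compare with "Average record size in memory"
print(round(total_mib, 1))   # compare with "Total size in memory"
print(round(column_kib, 1))  # compare with the per-variable "Memory size"
```

Under that assumption the per-row and total figures agree with the report after rounding. The per-column figure comes out slightly below the reported 178.1 KiB, which would be consistent with a small fixed overhead per column, although the report does not say so.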

Variable types

NUM: 9

Reproduction

Analysis started: 2020-08-24 23:55:04.603621
Analysis finished: 2020-08-24 23:55:18.230372
Duration: 13.63 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 1 (< 0.1%) duplicate rows Duplicates
P3 is highly skewed (γ1 = 74.16685526) Skewed
P6p4 has 7495 (32.9%) zeros Zeros
P19p2 has 9203 (40.4%) zeros Zeros
H5p2 has 2553 (11.2%) zeros Zeros
H40p4 has 4159 (18.3%) zeros Zeros
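
The percentages in the zero warnings follow directly from the zero counts and the number of observations. A minimal sketch of that arithmetic, using only figures from this report (the helper name `share` is illustrative):

```python
def share(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total

N_OBSERVATIONS = 22784
zero_counts = {"P6p4": 7495, "P19p2": 9203, "H5p2": 2553, "H40p4": 4159}

for name, count in zero_counts.items():
    # One decimal place, matching the precision used in the warnings.
    print(f"{name} has {count} ({share(count, N_OBSERVATIONS):.1f}%) zeros")
```

The same calculation for the single duplicate row gives a value far below 0.1%, which is why the report shows it as "< 0.1%".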

Variables

P3
Real number (ℝ≥0)

SKEWED

Distinct count: 5818
Unique (%): 25.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2935.8657391151687
Minimum: 1.0
Maximum: 2819401.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 1
5-th percentile: 41
Q1: 163
median: 506
Q3: 1683
95-th percentile: 10129.7
Maximum: 2819401
Range: 2819400
Interquartile range (IQR): 1520
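
The range and IQR are derived from the other quantile rows: range is maximum minus minimum, and IQR is Q3 minus Q1. The sketch below checks that arithmetic for P3 and shows one way a percentile could be computed. It assumes linear interpolation between sorted values, which is consistent with the non-integer 95-th percentile above but is not stated in the report; the `percentile` helper is illustrative.

```python
def percentile(values, q):
    """q-quantile (0 <= q <= 1) using linear interpolation between sorted values."""
    data = sorted(values)
    pos = q * (len(data) - 1)          # fractional position in the sorted data
    lo = int(pos)                      # index of the value at or below pos
    hi = min(lo + 1, len(data) - 1)    # index of the next value
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

# Derived rows for P3, recomputed from the reported quantiles.
p3_range = 2819401 - 1   # Maximum - Minimum
p3_iqr = 1683 - 163      # Q3 - Q1
print(p3_range, p3_iqr)

# Small illustration of the interpolation on made-up data.
print(percentile([1, 2, 3, 4], 0.25))
```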

Descriptive statistics

Standard deviation: 24949.88017
Coefficient of variation (CV): 8.498304209
Kurtosis: 7546.166944
Mean: 2935.865739
Median Absolute Deviation (MAD): 421
Skewness: 74.16685526
Sum: 66890765
Variance: 622496520.4
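
Several of these rows are linked: the CV is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations. The sketch below defines CV and MAD on raw values and then checks those relations against the reported P3 figures. It assumes the sample standard deviation (n - 1 denominator) and a MAD defined as the median of absolute deviations from the median; the report states neither.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition)."""
    return statistics.stdev(values) / statistics.mean(values)

def median_absolute_deviation(values):
    """Median of absolute deviations from the median (assumed definition)."""
    med = statistics.median(values)
    return statistics.median(abs(v - med) for v in values)

# Consistency checks on the reported P3 figures.
std, mean, total, n = 24949.88017, 2935.865739, 66890765, 22784
print(std / mean)    # compare with the reported CV
print(std ** 2)      # compare with the reported variance
print(total / n)     # compare with the reported mean

cv_matches = math.isclose(std / mean, 8.498304209, rel_tol=1e-6)
variance_matches = math.isclose(std ** 2, 622496520.4, rel_tol=1e-6)
mean_matches = math.isclose(total / n, mean, rel_tol=1e-6)
```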
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
71                        56  0.2%
39                        55  0.2%
88                        55  0.2%
72                        55  0.2%
63                        55  0.2%
66                        54  0.2%
60                        53  0.2%
44                        52  0.2%
67                        51  0.2%
101                       49  0.2%
56                        49  0.2%
43                        49  0.2%
65                        48  0.2%
68                        48  0.2%
76                        48  0.2%
49                        48  0.2%
55                        48  0.2%
48                        48  0.2%
58                        47  0.2%
103                       47  0.2%
42                        46  0.2%
89                        46  0.2%
83                        46  0.2%
90                        46  0.2%
81                        46  0.2%
Other values (5793)    21539  94.5%

Value                  Count  Frequency (%)
1                          3  < 0.1%
2                          7  < 0.1%
3                         10  < 0.1%
4                         12  0.1%
5                         10  < 0.1%
6                          7  < 0.1%
7                         16  0.1%
8                         13  0.1%
9                         20  0.1%
10                        17  0.1%

Value                  Count  Frequency (%)
2819401                    1  < 0.1%
1217405                    1  < 0.1%
1025174                    1  < 0.1%
616877                     1  < 0.1%
603075                     1  < 0.1%
406096                     1  < 0.1%
402060                     1  < 0.1%
374057                     1  < 0.1%
369921                     1  < 0.1%
326761                     1  < 0.1%

P6p4
Real number (ℝ≥0)

ZEROS

Distinct count: 12051
Unique (%): 52.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.010329851520747923
Minimum: 0.0
Maximum: 0.8944444060325623
Zeros: 7495
Zeros (%): 32.9%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.002361449995
Q3: 0.007428525132
95-th percentile: 0.0380582789
Maximum: 0.894444406
Range: 0.894444406
Interquartile range (IQR): 0.007428525132

Descriptive statistics

Standard deviation: 0.04210519478
Coefficient of variation (CV): 4.076069699
Kurtosis: 215.0800043
Mean: 0.01032985152
Median Absolute Deviation (MAD): 0.002361449995
Skewness: 13.28553655
Sum: 235.355337
Variance: 0.001772847428
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0                       7495  32.9%
0.002793299966            12  0.1%
0.004587200005             9  < 0.1%
0.006134999916             9  < 0.1%
0.00884960033              9  < 0.1%
0.003378400113             9  < 0.1%
0.003787900088             9  < 0.1%
0.003322300036             9  < 0.1%
0.005464499816             9  < 0.1%
0.005050499924             9  < 0.1%
0.009174300358             9  < 0.1%
0.004098400008             8  < 0.1%
0.00150829996              8  < 0.1%
0.004739299882             8  < 0.1%
0.006756800227             8  < 0.1%
0.004854400177             8  < 0.1%
0.005319099873             8  < 0.1%
0.002923999913             8  < 0.1%
0.002898599952             8  < 0.1%
0.002832900034             8  < 0.1%
0.0008741000202            8  < 0.1%
0.002451000037             8  < 0.1%
0.00657889992              8  < 0.1%
0.007142900024             8  < 0.1%
0.003891099943             8  < 0.1%
Other values (12026)   15084  66.2%

Value                  Count  Frequency (%)
0                       7495  32.9%
0.0001784000051            1  < 0.1%
0.0001907999977            1  < 0.1%
0.0002007000003            1  < 0.1%
0.0002021999971            1  < 0.1%
0.0002062999993            1  < 0.1%
0.0002103000006            1  < 0.1%
0.0002137999982            1  < 0.1%
0.0002154999966            1  < 0.1%
0.0002205000055            1  < 0.1%

Value                  Count  Frequency (%)
0.894444406                1  < 0.1%
0.8916562796               1  < 0.1%
0.8623024821               1  < 0.1%
0.8582863808               1  < 0.1%
0.8419777155               1  < 0.1%
0.8415951133               1  < 0.1%
0.8379194736               1  < 0.1%
0.8333333135               1  < 0.1%
0.8176327944               1  < 0.1%
0.8164874911               1  < 0.1%

P11p3
Real number (ℝ≥0)

Distinct count: 18765
Unique (%): 82.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4840210583842532
Minimum: 0.07987560331821443
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0.07987560332
5-th percentile: 0.3934582919
Q1: 0.4487662464
median: 0.4833839387
Q3: 0.5217391253
95-th percentile: 0.575201562
Maximum: 1
Range: 0.9201243967
Interquartile range (IQR): 0.07297287881

Descriptive statistics

Standard deviation: 0.06033422973
Coefficient of variation (CV): 0.1246520759
Kurtosis: 3.677803767
Mean: 0.4840210584
Median Absolute Deviation (MAD): 0.03641425073
Skewness: -0.2709088479
Sum: 11027.93579
Variance: 0.003640219277
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0.5                      178  0.8%
0.4444443882              35  0.2%
0.428571403               33  0.1%
0.4615384936              33  0.1%
0.4782609046              25  0.1%
0.4545454979              22  0.1%
0.400000006               22  0.1%
0.4705882072              21  0.1%
0.4761905074              21  0.1%
0.4736841917              20  0.1%
0.4666666985              19  0.1%
0.5454545021              17  0.1%
0.4642857015              17  0.1%
0.4499999881              17  0.1%
0.4583333135              16  0.1%
0.5263158083              16  0.1%
0.6000000238              16  0.1%
0.5333333015              15  0.1%
0.4375                    15  0.1%
0.46875                   15  0.1%
0.4482758939              14  0.1%
0.4893617034              14  0.1%
0.472222209               14  0.1%
0.5135135055              13  0.1%
0.4827586114              13  0.1%
Other values (18740)   22143  97.2%

Value                  Count  Frequency (%)
0.07987560332              1  < 0.1%
0.08804479986              1  < 0.1%
0.09683430195              1  < 0.1%
0.1053214967               1  < 0.1%
0.1067615971               1  < 0.1%
0.1166414991               1  < 0.1%
0.1202017963               1  < 0.1%
0.1205001995               1  < 0.1%
0.1245386973               1  < 0.1%
0.1284939945               1  < 0.1%

Value                  Count  Frequency (%)
1                          1  < 0.1%
0.9615384936               1  < 0.1%
0.8974359035               1  < 0.1%
0.8888888955               1  < 0.1%
0.8399999738               1  < 0.1%
0.8303570747               1  < 0.1%
0.8166667223               1  < 0.1%
0.8157895207               1  < 0.1%
0.8125                     1  < 0.1%
0.8115941882               1  < 0.1%

P16p2
Real number (ℝ≥0)

Distinct count: 15570
Unique (%): 68.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.716130718410936
Minimum: 0.2337023019790649
Maximum: 1.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0.233702302
5-th percentile: 0.5800441563
Q1: 0.6622828692
median: 0.7142856717
Q3: 0.7710386366
95-th percentile: 0.8604651093
Maximum: 1
Range: 0.766297698
Interquartile range (IQR): 0.1087557673

Descriptive statistics

Standard deviation: 0.08726447653
Coefficient of variation (CV): 0.1218555137
Kurtosis: 1.085519526
Mean: 0.7161307184
Median Absolute Deviation (MAD): 0.05420303345
Skewness: -0.1347583095
Sum: 16316.32229
Variance: 0.007615088865
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0.75                     159  0.7%
0.6666666865             147  0.6%
0.8000000119              97  0.4%
0.7142856717              81  0.4%
0.6999999881              65  0.3%
0.777777791               60  0.3%
0.7272726893              56  0.2%
0.6000000238              47  0.2%
0.8333333135              46  0.2%
1                         40  0.2%
0.7333332896              40  0.2%
0.6923077106              38  0.2%
0.769230783               37  0.2%
0.5                       36  0.2%
0.6875                    34  0.1%
0.7619047761              34  0.1%
0.625                     34  0.1%
0.6363636255              33  0.1%
0.7857143283              33  0.1%
0.647058785               32  0.1%
0.7058823705              31  0.1%
0.722222209               31  0.1%
0.8181818128              29  0.1%
0.7083333135              27  0.1%
0.6842104793              27  0.1%
Other values (15545)   21490  94.3%

Value                  Count  Frequency (%)
0.233702302                1  < 0.1%
0.2369077057               1  < 0.1%
0.2455487996               1  < 0.1%
0.25                       1  < 0.1%
0.2577987015               1  < 0.1%
0.259837687                1  < 0.1%
0.2805084884               1  < 0.1%
0.2903226018               1  < 0.1%
0.3000000119               1  < 0.1%
0.3084416091               1  < 0.1%

Value                  Count  Frequency (%)
1                         40  0.2%
0.9959893227               1  < 0.1%
0.9959099889               1  < 0.1%
0.9958158731               1  < 0.1%
0.9949749112               1  < 0.1%
0.993055582                1  < 0.1%
0.9927746058               1  < 0.1%
0.9925168157               1  < 0.1%
0.9925094247               1  < 0.1%
0.9923912883               1  < 0.1%

P19p2
Real number (ℝ≥0)

ZEROS

Distinct count: 10941
Unique (%): 48.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.05743685887154143
Minimum: 0.0
Maximum: 1.0
Zeros: 9203
Zeros (%): 40.4%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0.002538099885
Q3: 0.02992774919
95-th percentile: 0.3594396457
Maximum: 1
Range: 1
Interquartile range (IQR): 0.02992774919

Descriptive statistics

Standard deviation: 0.1398113667
Coefficient of variation (CV): 2.434175013
Kurtosis: 14.83941661
Mean: 0.05743685887
Median Absolute Deviation (MAD): 0.002538099885
Skewness: 3.605640596
Sum: 1308.641393
Variance: 0.01954721825
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0                       9203  40.4%
0.5                       12  0.1%
0.01030930039             11  < 0.1%
0.00523559982             11  < 0.1%
0.25                      10  < 0.1%
0.0142857004              10  < 0.1%
1                         10  < 0.1%
0.01999999955             10  < 0.1%
0.005882400088            10  < 0.1%
0.00408159988             10  < 0.1%
0.02857140079             10  < 0.1%
0.04545449838             10  < 0.1%
0.02439020015              9  < 0.1%
0.01515149977              9  < 0.1%
0.01408450026              9  < 0.1%
0.002036700025             9  < 0.1%
0.03225810081              9  < 0.1%
0.003322300036             9  < 0.1%
0.0256409999               9  < 0.1%
0.03030299954              9  < 0.1%
0.003333299886             8  < 0.1%
0.008928599767             8  < 0.1%
0.01219510008              8  < 0.1%
0.01063830033              8  < 0.1%
0.003717500018             8  < 0.1%
Other values (10916)   13355  58.6%

Value                  Count  Frequency (%)
0                       9203  40.4%
0.0001090000005            1  < 0.1%
0.0002055999939            1  < 0.1%
0.0002080999984            1  < 0.1%
0.0002093999938            1  < 0.1%
0.0002271999983            1  < 0.1%
0.0002290999983            1  < 0.1%
0.0002306999959            1  < 0.1%
0.0002326999966            1  < 0.1%
0.0002385999978            1  < 0.1%

Value                  Count  Frequency (%)
1                         10  < 0.1%
0.9992861748               1  < 0.1%
0.9972066879               1  < 0.1%
0.997118175                1  < 0.1%
0.9969879985               1  < 0.1%
0.9966102242               1  < 0.1%
0.9963369966               1  < 0.1%
0.9950371981               1  < 0.1%
0.9942921996               1  < 0.1%
0.9940298796               1  < 0.1%

H5p2
Real number (ℝ≥0)

ZEROS

Distinct count: 6002
Unique (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.17151803101478946
Minimum: 0.0
Maximum: 1.0
Zeros: 2553
Zeros (%): 11.2%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.0793591477
median: 0.1478261054
Q3: 0.2307692021
95-th percentile: 0.4375
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1514100544

Descriptive statistics

Standard deviation: 0.1389870691
Coefficient of variation (CV): 0.8103350316
Kurtosis: 4.25561241
Mean: 0.171518031
Median Absolute Deviation (MAD): 0.07439608872
Skewness: 1.550698063
Sum: 3907.866819
Variance: 0.01931740537
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0                       2553  11.2%
0.200000003              457  2.0%
0.25                     431  1.9%
0.1666667014             410  1.8%
0.3333333135             350  1.5%
0.1428571045             316  1.4%
0.125                    311  1.4%
0.111111097              275  1.2%
0.5                      249  1.1%
0.1000000015             243  1.1%
0.09090910107            203  0.9%
0.2857142985             189  0.8%
0.2222221941             179  0.8%
0.08333329856            174  0.8%
0.1818182021             164  0.7%
0.07692310214            154  0.7%
0.1538462043             145  0.6%
0.400000006              133  0.6%
0.1333332956             126  0.6%
0.1176470965             125  0.5%
0.07142859697            120  0.5%
0.2727273107             115  0.5%
0.06666669995            108  0.5%
0.05555560067            101  0.4%
0.1764705926              96  0.4%
Other values (5977)    15057  66.1%

Value                  Count  Frequency (%)
0                       2553  11.2%
0.000843900023             1  < 0.1%
0.001319300034             1  < 0.1%
0.001453500008             1  < 0.1%
0.001548000029             1  < 0.1%
0.001733100042             1  < 0.1%
0.001818799996             1  < 0.1%
0.001951199956             1  < 0.1%
0.002078999998             1  < 0.1%
0.002481400035             1  < 0.1%

Value                  Count  Frequency (%)
1                         47  0.2%
0.9756097794               1  < 0.1%
0.9452055097               1  < 0.1%
0.9230769277               1  < 0.1%
0.913043499                1  < 0.1%
0.9100719094               1  < 0.1%
0.9090908766               1  < 0.1%
0.8928570747               1  < 0.1%
0.8888888955               1  < 0.1%
0.8796991706               1  < 0.1%

H15p1
Real number (ℝ≥0)

Distinct count: 18583
Unique (%): 81.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.978000387167453
Minimum: 0.0
Maximum: 10.0
Zeros: 21
Zeros (%): 0.1%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 4.939043546
Q1: 5.548364162
median: 5.958333492
Q3: 6.363078475
95-th percentile: 7.203769851
Maximum: 10
Range: 10
Interquartile range (IQR): 0.8147143126

Descriptive statistics

Standard deviation: 0.7586486479
Coefficient of variation (CV): 0.1269067579
Kurtosis: 5.913719514
Mean: 5.978000387
Median Absolute Deviation (MAD): 0.4069666862
Skewness: -0.2684810011
Sum: 136202.7608
Variance: 0.5755477709
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
6                        106  0.5%
5                         55  0.2%
5.5                       52  0.2%
5.666666508               35  0.2%
5.571428776               28  0.1%
6.333333492               27  0.1%
5.714285851               24  0.1%
7                         24  0.1%
5.800000191               22  0.1%
5.75                      21  0.1%
0                         21  0.1%
6.5                       21  0.1%
5.25                      20  0.1%
6.25                      20  0.1%
5.333333492               20  0.1%
5.428571224               19  0.1%
5.599999905               18  0.1%
5.833333492               17  0.1%
6.142857075               16  0.1%
5.625                     16  0.1%
5.727272511               15  0.1%
5.875                     15  0.1%
6.400000095               15  0.1%
6.666666508               15  0.1%
6.166666508               15  0.1%
Other values (18558)   22127  97.1%

Value                  Count  Frequency (%)
0                         21  0.1%
1                          1  < 0.1%
1.818181753                1  < 0.1%
1.821428657                1  < 0.1%
1.84375                    1  < 0.1%
1.850000024                1  < 0.1%
1.857142925                1  < 0.1%
1.931034446                1  < 0.1%
2.072727203                1  < 0.1%
2.125                      1  < 0.1%

Value                  Count  Frequency (%)
10                         1  < 0.1%
9.740740776                1  < 0.1%
9.70370388                 1  < 0.1%
9.51612854                 1  < 0.1%
9.444444656                1  < 0.1%
9.428571701                1  < 0.1%
9.322221756                1  < 0.1%
9.264196396                1  < 0.1%
9.254901886                1  < 0.1%
9.253968239                1  < 0.1%

H40p4
Real number (ℝ≥0)

ZEROS

Distinct count: 2421
Unique (%): 10.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4916256467710881
Minimum: 0.0
Maximum: 1.0
Zeros: 4159
Zeros (%): 18.3%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.2432432026
median: 0.5
Q3: 0.75
95-th percentile: 1
Maximum: 1
Range: 1
Interquartile range (IQR): 0.5067567974

Descriptive statistics

Standard deviation: 0.3316551073
Coefficient of variation (CV): 0.6746090435
Kurtosis: -1.095900308
Mean: 0.4916256468
Median Absolute Deviation (MAD): 0.25
Skewness: -0.02660254313
Sum: 11201.19874
Variance: 0.1099951102
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
0                       4159  18.3%
1                       3332  14.6%
0.5                     1553  6.8%
0.6666666865             990  4.3%
0.3333333135             658  2.9%
0.75                     571  2.5%
0.6000000238             425  1.9%
0.8000000119             359  1.6%
0.25                     350  1.5%
0.400000006              268  1.2%
0.8333333135             245  1.1%
0.7142856717             235  1.0%
0.200000003              230  1.0%
0.571428597              197  0.9%
0.428571403              186  0.8%
0.625                    175  0.8%
0.8571429253             160  0.7%
0.375                    144  0.6%
0.2857142985             143  0.6%
0.4444443882             137  0.6%
0.555555582              130  0.6%
0.777777791              119  0.5%
0.1666667014             109  0.5%
0.4545454979              92  0.4%
0.5454545021              90  0.4%
Other values (2396)     7727  33.9%

Value                  Count  Frequency (%)
0                       4159  18.3%
0.01923079975              1  < 0.1%
0.02325580083              2  < 0.1%
0.02380950004              1  < 0.1%
0.02500000037              1  < 0.1%
0.02702699974              1  < 0.1%
0.02857140079              1  < 0.1%
0.03030299954              1  < 0.1%
0.03149610013              1  < 0.1%
0.03225810081              2  < 0.1%

Value                  Count  Frequency (%)
1                       3332  14.6%
0.9857550263               1  < 0.1%
0.9811320901               1  < 0.1%
0.9777777791               1  < 0.1%
0.9772726893               1  < 0.1%
0.9750000238               1  < 0.1%
0.9705882072               1  < 0.1%
0.9666666985               1  < 0.1%
0.9655172229               1  < 0.1%
0.9615384936               1  < 0.1%

target
Real number (ℝ≥0)

Distinct count: 2045
Unique (%): 9.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 50074.43978230337
Minimum: 0.0
Maximum: 500001.0
Zeros: 52
Zeros (%): 0.2%
Memory size: 178.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 14999
Q1: 21000
median: 33200
Q3: 56100
95-th percentile: 150000
Maximum: 500001
Range: 500001
Interquartile range (IQR): 35100

Descriptive statistics

Standard deviation: 52843.47555
Coefficient of variation (CV): 1.055298387
Kurtosis: 20.326923
Mean: 50074.43978
Median Absolute Deviation (MAD): 15300
Skewness: 3.755120054
Sum: 1140896036
Variance: 2792432908
Histogram with fixed size bins (bins=10)
Value                  Count  Frequency (%)
14999                   3111  13.7%
21300                     98  0.4%
17500                     94  0.4%
31300                     89  0.4%
16300                     89  0.4%
26300                     87  0.4%
23800                     86  0.4%
18800                     80  0.4%
22500                     78  0.3%
20000                     70  0.3%
36300                     69  0.3%
20300                     67  0.3%
32500                     66  0.3%
15000                     65  0.3%
27500                     65  0.3%
24100                     63  0.3%
27300                     63  0.3%
33800                     62  0.3%
25600                     62  0.3%
28800                     61  0.3%
23300                     61  0.3%
28300                     61  0.3%
38800                     61  0.3%
20600                     60  0.3%
30300                     60  0.3%
Other values (2020)    17956  78.8%

Value                  Count  Frequency (%)
0                         52  0.2%
14999                   3111  13.7%
15000                     65  0.3%
15100                     23  0.1%
15200                     38  0.2%
15300                     31  0.1%
15400                     33  0.1%
15500                     29  0.1%
15600                     50  0.2%
15700                     36  0.2%

Value                  Count  Frequency (%)
500001                    47  0.2%
494200                     1  < 0.1%
492600                     1  < 0.1%
481100                     1  < 0.1%
478600                     1  < 0.1%
471200                     1  < 0.1%
469300                     1  < 0.1%
468400                     1  < 0.1%
466900                     1  < 0.1%
462500                     1  < 0.1%

Interactions

[81 pairwise interaction plots, one per ordered pair of the 9 variables; images not reproduced]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, which implies that for a linear relationship the angle of the line to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
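
The calculation just described can be sketched in plain Python as follows. The function name `pearson_r` is illustrative and is not taken from the report or from pandas-profiling.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

x = [1, 2, 4, 7]
y = [2, 1, 5, 3]
r_original = pearson_r(x, y)
# Shifting and positively rescaling either variable leaves r unchanged.
r_transformed = pearson_r([10 * a + 3 for a in x], [0.5 * b - 1 for b in y])
print(r_original, r_transformed)
```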

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
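
A minimal sketch of that calculation: convert each variable to ranks, then apply the Pearson formula to the ranks. Giving tied values the average of their positions is a common convention and is assumed here; the report does not describe its tie handling. All function names are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions (assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A nonlinear but strictly increasing relationship.
x = [1, 2, 3, 4]
y = [1, 8, 27, 64]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this example the rank-based coefficient reaches its maximum while Pearson's r stays below it, which illustrates the difference between monotonic and linear association.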

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
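
The pair-counting definition above can be sketched directly. This is the simplest variant (often called tau-a), in which tied pairs count as neither concordant nor discordant. Statistical libraries frequently use a tie-corrected variant instead, so values may differ from the report's when the data contain ties. The function name is illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

# Two concordant pairs and one discordant pair out of three.
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```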

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[2 missing-value plots; images not reproduced]

Sample

First rows

             P3      P6p4     P11p3     P16p2     P19p2      H5p2     H15p1     H40p4    target
    0    7074.0  0.004964  0.507478  0.579729  0.036613  0.020244  6.618784  0.774059  130600.0
    1     597.0  0.003871  0.480000  0.695142  0.003350  0.170732  7.163934  0.142857   40500.0
    2    1931.0  0.002320  0.477747  0.683584  0.000000  0.117647  6.185848  0.687500   28700.0
    3     164.0  0.000000  0.492505  0.780488  0.000000  0.100000  6.619835  1.000000   28500.0
    4     119.0  0.000000  0.480645  0.756302  0.672269  0.000000  6.161616  0.000000   24100.0
    5     164.0  0.000000  0.431670  0.792683  0.054878  0.250000  5.053097  0.500000   14999.0
    6     261.0  0.000000  0.481328  0.724138  0.000000  0.000000  6.592179  0.000000   33200.0
    7   32112.0  0.023782  0.538467  0.614848  0.263982  0.069680  6.711423  0.248781   66300.0
    8    1209.0  0.005426  0.523906  0.674938  0.004136  0.047138  6.731306  0.571429  138000.0
    9    1263.0  0.021381  0.514254  0.885986  0.009501  0.114286  6.551098  0.000000  140600.0

Last rows

             P3      P6p4     P11p3     P16p2     P19p2      H5p2     H15p1     H40p4    target
22774     795.0  0.005739  0.494500  0.755975  0.001258  0.250000  6.662125  0.571429   61200.0
22775    1143.0  0.002326  0.478232  0.704287  0.000000  0.109091  6.607143  0.333333   29000.0
22776    3260.0  0.000000  0.191471  0.706442  0.000000  0.102123  4.312871  0.475248   54500.0
22777     105.0  0.000000  0.509434  0.714286  0.000000  0.000000  5.920455  0.000000   20000.0
22778     172.0  0.000000  0.414847  0.738372  0.424419  0.055556  6.333333  0.000000   32500.0
22779    3664.0  0.003967  0.461217  0.674945  0.000546  0.121739  6.584818  0.214286   38900.0
22780   27037.0  0.006755  0.488844  0.663165  0.102415  0.181772  5.847176  0.598338   27900.0
22781     376.0  0.014330  0.561924  0.688830  0.109043  0.166667  6.885715  0.333333   51100.0
22782     113.0  0.009804  0.516340  0.778761  0.000000  0.000000  5.578432  0.000000   17200.0
22783    2319.0  0.009842  0.533237  0.704183  0.015955  0.362903  6.366131  0.600000  117700.0

Duplicate rows

Most frequent

             P3      P6p4     P11p3     P16p2     P19p2      H5p2     H15p1     H40p4    target  count
    0      14.0       0.0  0.681818       0.5       0.0  0.003636  4.444444       1.0   65600.0      2
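
Duplicate rows like the one listed above can be found by counting identical records. The sketch below uses plain Python tuples and made-up rows except for the duplicated record, which is copied from the table. It is an illustration of the idea, not the report generator's own implementation, and the function name is hypothetical.

```python
from collections import Counter

def duplicated_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# The duplicated record from the table, plus one made-up row for contrast.
duplicate = (14.0, 0.0, 0.681818, 0.5, 0.0, 0.003636, 4.444444, 1.0, 65600.0)
other = (164.0, 0.0, 0.492505, 0.780488, 0.0, 0.1, 6.619835, 1.0, 28500.0)
rows = [duplicate, other, duplicate]

found = duplicated_rows(rows)
print(found)
```

Exact comparison of floating-point values only matches rows whose stored values are bit-for-bit equal, so rows that merely display the same rounded values are not necessarily counted as duplicates.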