Overview

Dataset statistics

Number of variables: 5
Number of observations: 3848
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 150.4 KiB
Average record size in memory: 40.0 B

Variable types

NUM: 5

Reproduction

Analysis started: 2020-08-25 00:02:34.049641
Analysis finished: 2020-08-25 00:02:38.267732
Duration: 4.22 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Variables

RIDGE
Real number (ℝ)

Distinct count: 3809
Unique (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.003636643747891553
Minimum: -23.283899307250977
Maximum: 21.40660095214844
Zeros: 0
Zeros (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: -23.28389931
5-th percentile: -11.17349524
Q1: -3.983725011
median: -0.163850002
Q3: 4.64715004
95-th percentile: 10.14166489
Maximum: 21.40660095
Range: 44.69050026
Interquartile range (IQR): 8.630875051

Descriptive statistics

Standard deviation: 6.398236563
Coefficient of variation (CV): -1759.379529
Kurtosis: -0.05367512239
Mean: -0.003636643748
Median Absolute Deviation (MAD): 4.33465004
Skewness: -0.1305804539
Sum: -13.99380514
Variance: 40.93743111
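
Several of the figures above can be derived from the others. The following is a minimal sketch that recomputes them from the RIDGE values copied out of this report. It assumes that the coefficient of variation is the standard deviation divided by the mean and that the sum is the mean times the number of observations; the reported numbers are consistent with both assumptions. All variable names are illustrative.

```python
# Values copied from the RIDGE statistics in this report (already rounded).
n_observations = 3848
minimum, maximum = -23.28389931, 21.40660095
q1, q3 = -3.983725011, 4.64715004
mean = -0.003636643748
std = 6.398236563

value_range = maximum - minimum  # compare with the reported Range
iqr = q3 - q1                    # compare with the reported IQR
variance = std ** 2              # compare with the reported Variance
cv = std / mean                  # assumption: CV = standard deviation / mean
total = mean * n_observations    # compare with the reported Sum

for name, value in [
    ("Range", value_range),
    ("IQR", iqr),
    ("Variance", variance),
    ("CV", cv),
    ("Sum", total),
]:
    print(f"{name}: {value:.6f}")
```

Because the mean is very close to zero, the coefficient of variation becomes very large in magnitude and takes the sign of the mean, so it says little about the spread of this variable.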
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
-5.058400154          2      0.1%
-0.7645000219         2      0.1%
-6.882999897          2      0.1%
-1.662400007          2      0.1%
-0.7056999803         2      0.1%
3.00819993            2      0.1%
-3.240999937          2      0.1%
-0.8740000129         2      0.1%
-1.11559999           2      0.1%
6.177400112           2      0.1%
-0.5633000135         2      0.1%
5.752299786           2      0.1%
4.326399803           2      0.1%
-1.052600026          2      0.1%
0.04720000178         2      0.1%
-6.354599953          2      0.1%
4.098199844           2      0.1%
0.7972999811          2      0.1%
7.572000027           2      0.1%
0.6024000049          2      0.1%
4.872900009           2      0.1%
-1.329399943          2      0.1%
4.663099766           2      0.1%
4.238100052           2      0.1%
6.319200039           2      0.1%
Other values (3784)   3798   98.7%
 
Minimum 10 values

Value                 Count  Frequency (%)
-23.28389931          1      < 0.1%
-22.80050087          1      < 0.1%
-21.75200081          1      < 0.1%
-21.74710083          1      < 0.1%
-20.86540031          1      < 0.1%
-19.73810005          1      < 0.1%
-19.46870041          1      < 0.1%
-18.93370056          1      < 0.1%
-18.7663002           1      < 0.1%
-18.62100029          1      < 0.1%
 
Maximum 10 values

Value                 Count  Frequency (%)
21.40660095           1      < 0.1%
20.63330078           1      < 0.1%
18.53009987           1      < 0.1%
18.25749969           1      < 0.1%
18.05599976           1      < 0.1%
17.49290085           1      < 0.1%
16.95219994           1      < 0.1%
16.44709969           1      < 0.1%
16.31100082           1      < 0.1%
16.22249985           1      < 0.1%
 

NUB
Real number (ℝ)

Distinct count: 3811
Unique (%): 99.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0001596685136153349
Minimum: -16.393499374389652
Maximum: 17.25830078125
Zeros: 0
Zeros (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: -16.39349937
5-th percentile: -8.222910118
Q1: -3.757625043
median: -0.2316999957
Q3: 3.750525057
95-th percentile: 8.419920206
Maximum: 17.25830078
Range: 33.65180016
Interquartile range (IQR): 7.508150101

Descriptive statistics

Standard deviation: 5.186310551
Coefficient of variation (CV): 32481.73627
Kurtosis: -0.3087882455
Mean: 0.0001596685136
Median Absolute Deviation (MAD): 3.712400079
Skewness: 0.07219002509
Sum: 0.6144044404
Variance: 26.89781713
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
3.125                 2      0.1%
-5.540299892          2      0.1%
-5.589600086          2      0.1%
-3.763200045          2      0.1%
1.90989995            2      0.1%
-1.357200027          2      0.1%
7.782400131           2      0.1%
0.3251000047          2      0.1%
-1.265799999          2      0.1%
-5.517700195          2      0.1%
-2.288599968          2      0.1%
2.644000053           2      0.1%
1.354500055           2      0.1%
9.808300018           2      0.1%
7.772600174           2      0.1%
0.8011999726          2      0.1%
-3.010299921          2      0.1%
1.626899958           2      0.1%
0.01750000007         2      0.1%
-3.985699892          2      0.1%
0.04949999973         2      0.1%
-1.088600039          2      0.1%
-2.857800007          2      0.1%
4.943099976           2      0.1%
2.057399988           2      0.1%
Other values (3786)   3798   98.7%
 
Minimum 10 values

Value                 Count  Frequency (%)
-16.39349937          1      < 0.1%
-16.31049919          1      < 0.1%
-15.87110043          1      < 0.1%
-15.18190002          1      < 0.1%
-14.59840012          1      < 0.1%
-14.30830002          1      < 0.1%
-14.14210033          1      < 0.1%
-13.85719967          1      < 0.1%
-13.6111002           1      < 0.1%
-13.53219986          1      < 0.1%
 
Maximum 10 values

Value                 Count  Frequency (%)
17.25830078           1      < 0.1%
17.18639946           1      < 0.1%
16.43759918           1      < 0.1%
15.35270023           1      < 0.1%
15.04249954           1      < 0.1%
14.39369965           1      < 0.1%
14.27009964           1      < 0.1%
14.21500015           1      < 0.1%
14.16590023           1      < 0.1%
14.1079998            1      < 0.1%
 

CRACK
Real number (ℝ)

Distinct count: 3816
Unique (%): 99.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.0031031201140502016
Minimum: -31.413000106811523
Maximum: 30.317800521850586
Zeros: 0
Zeros (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: -31.41300011
5-th percentile: -12.8170599
Q1: -5.453274846
median: -0.05614999868
Q3: 5.661125183
95-th percentile: 12.49110994
Maximum: 30.31780052
Range: 61.73080063
Interquartile range (IQR): 11.11440003

Descriptive statistics

Standard deviation: 7.875198832
Coefficient of variation (CV): 2537.832421
Kurtosis: -0.1554786953
Mean: 0.003103120114
Median Absolute Deviation (MAD): 5.568200072
Skewness: -0.05705019029
Sum: 11.9408062
Variance: 62.01875664
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
4.329299927           2      0.1%
-3.96510005           2      0.1%
3.8756001             2      0.1%
6.566500187           2      0.1%
-0.4408000112         2      0.1%
1.237800002           2      0.1%
2.533900023           2      0.1%
5.887300014           2      0.1%
2.395900011           2      0.1%
7.825399876           2      0.1%
-4.321899891          2      0.1%
-2.240099907          2      0.1%
-1.434499979          2      0.1%
1.192700028           2      0.1%
-0.01049999986        2      0.1%
-1.16989994           2      0.1%
-2.367899895          2      0.1%
8.70759964            2      0.1%
3.678200006           2      0.1%
2.906300068           2      0.1%
10.18789959           2      0.1%
0.02080000006         2      0.1%
-7.734399796          2      0.1%
-10.39799976          2      0.1%
0.6998000145          2      0.1%
Other values (3791)   3798   98.7%
 
Minimum 10 values

Value                 Count  Frequency (%)
-31.41300011          1      < 0.1%
-26.46520042          1      < 0.1%
-25.54310036          1      < 0.1%
-23.54290009          1      < 0.1%
-22.73690033          1      < 0.1%
-22.70610046          1      < 0.1%
-22.46310043          1      < 0.1%
-22.42289925          1      < 0.1%
-21.98570061          1      < 0.1%
-21.97030067          1      < 0.1%
 
Maximum 10 values

Value                 Count  Frequency (%)
30.31780052           1      < 0.1%
25.64340019           1      < 0.1%
25.16469955           1      < 0.1%
23.82040024           1      < 0.1%
22.52599907           1      < 0.1%
22.26129913           1      < 0.1%
21.66869926           1      < 0.1%
21.62319946           1      < 0.1%
21.4633007            1      < 0.1%
21.27829933           1      < 0.1%
 

WEIGHT
Real number (ℝ)

Distinct count: 3826
Unique (%): 99.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.004237028622148818
Minimum: -34.03519821166992
Maximum: 35.80279922485352
Zeros: 0
Zeros (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: -34.03519821
5-th percentile: -16.10761023
Q1: -7.018650055
median: -0.1493500024
Q3: 6.7997998
95-th percentile: 17.09936495
Maximum: 35.80279922
Range: 69.83799744
Interquartile range (IQR): 13.81844985

Descriptive statistics

Standard deviation: 10.04309165
Coefficient of variation (CV): 2370.314801
Kurtosis: -0.1602870554
Mean: 0.004237028622
Median Absolute Deviation (MAD): 6.893500015
Skewness: 0.1087342671
Sum: 16.30408614
Variance: 100.86369
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
1.578199983           2      0.1%
-7.30700016           2      0.1%
-1.281900048          2      0.1%
-11.93109989          2      0.1%
5.632800102           2      0.1%
-0.5529999733         2      0.1%
-8.25949955           2      0.1%
11.43150043           2      0.1%
1.087000012           2      0.1%
-5.807700157          2      0.1%
4.518599987           2      0.1%
-13.88860035          2      0.1%
2.58949995            2      0.1%
-1.894299984          2      0.1%
0.8550999761          2      0.1%
-8.920399666          2      0.1%
13.87629986           2      0.1%
-11.55000019          2      0.1%
2.619600058           2      0.1%
-1.225800037          2      0.1%
14.86410046           2      0.1%
-8.173800468          2      0.1%
12.88829994           1      < 0.1%
-0.2344000041         1      < 0.1%
-3.730799913          1      < 0.1%
Other values (3801)   3801   98.8%
 
Minimum 10 values

Value                 Count  Frequency (%)
-34.03519821          1      < 0.1%
-32.2765007           1      < 0.1%
-30.96940041          1      < 0.1%
-30.90139961          1      < 0.1%
-30.5734005           1      < 0.1%
-28.51819992          1      < 0.1%
-28.20120049          1      < 0.1%
-27.52059937          1      < 0.1%
-27.19820023          1      < 0.1%
-26.46730042          1      < 0.1%
 
Maximum 10 values

Value                 Count  Frequency (%)
35.80279922           1      < 0.1%
33.45940018           1      < 0.1%
30.13699913           1      < 0.1%
28.79669952           1      < 0.1%
28.30419922           1      < 0.1%
27.59659958           1      < 0.1%
27.56870079           1      < 0.1%
27.46229935           1      < 0.1%
27.36070061           1      < 0.1%
27.15279961           1      < 0.1%
 

target
Real number (ℝ)

Distinct count: 3784
Unique (%): 98.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.00016629382793204362
Minimum: -12.03909969329834
Maximum: 10.867300033569336
Zeros: 0
Zeros (%): 0.0%
Memory size: 30.2 KiB

Quantile statistics

Minimum: -12.03909969
5-th percentile: -5.079669857
Q1: -2.132449985
median: -0.0304499995
Q3: 2.028625011
95-th percentile: 5.374345088
Maximum: 10.86730003
Range: 22.90639973
Interquartile range (IQR): 4.161074996

Descriptive statistics

Standard deviation: 3.144394589
Coefficient of variation (CV): 18908.66684
Kurtosis: 0.1951873519
Mean: 0.0001662938279
Median Absolute Deviation (MAD): 2.080800056
Skewness: 0.1097937519
Sum: 0.6398986499
Variance: 9.887217333
Histogram with fixed size bins (bins=10)

Common values

Value                 Count  Frequency (%)
-2.382400036          2      0.1%
2.054800034           2      0.1%
0.1541000009          2      0.1%
-0.8809999824         2      0.1%
-2.684700012          2      0.1%
-3.549499989          2      0.1%
1.015100002           2      0.1%
-1.573300004          2      0.1%
-2.105799913          2      0.1%
-3.509500027          2      0.1%
-1.770799994          2      0.1%
-0.4722000062         2      0.1%
3.422199965           2      0.1%
-0.5922999978         2      0.1%
2.336400032           2      0.1%
-0.3285000026         2      0.1%
-2.823800087          2      0.1%
-1.725800037          2      0.1%
1.246600032           2      0.1%
-3.605200052          2      0.1%
-0.3892999887         2      0.1%
0.8540999889          2      0.1%
-3.504499912          2      0.1%
0.3398000002          2      0.1%
0.4550999999          2      0.1%
Other values (3759)   3798   98.7%
 
Minimum 10 values

Value                 Count  Frequency (%)
-12.03909969          1      < 0.1%
-11.87720013          1      < 0.1%
-10.32610035          1      < 0.1%
-9.708399773          1      < 0.1%
-9.394900322          1      < 0.1%
-9.276200294          1      < 0.1%
-9.003499985          1      < 0.1%
-8.875                1      < 0.1%
-8.84679985           1      < 0.1%
-8.762399673          1      < 0.1%
 
Maximum 10 values

Value                 Count  Frequency (%)
10.86730003           1      < 0.1%
10.49339962           1      < 0.1%
10.44169998           1      < 0.1%
10.26500034           1      < 0.1%
10.15480042           1      < 0.1%
10.13580036           1      < 0.1%
9.756699562           1      < 0.1%
9.680999756           1      < 0.1%
9.436200142           1      < 0.1%
9.419799805           1      < 0.1%
 

Interactions

[Figure: 25 interaction plots, one for each pairing of the five variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. For a perfectly linear relationship, this means that the steepness of the line, that is, its angle to the x-axis, does not affect r; only the direction of the slope determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
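
The calculation described above can be sketched in a few lines of plain Python. This is an illustrative implementation, not the code the report generator uses; the function name `pearson_r` and the toy data are assumptions made for the example.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/(n - 1) factors of the covariance and of the two standard
    # deviations cancel, so plain sums of deviations are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Toy data (hypothetical, not taken from the profiled dataset).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)
# Shifting and positively rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10.0 * a + 3.0 for a in x], y)
print(r, r_rescaled)
```

Negating one of the variables flips the sign of r while keeping its magnitude, which is the only way a change of scale can alter the coefficient.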

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
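
In other words, ρ is Pearson's r applied to ranks. A minimal sketch in plain Python follows; the helper names (`average_ranks`, `pearson_r`, `spearman_rho`) and the toy data are illustrative assumptions, and ties are handled by assigning average ranks, which is a common convention.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Toy data (hypothetical): y grows monotonically but not linearly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [v ** 3 for v in x]

rho = spearman_rho(x, y)  # the ranks agree perfectly
r = pearson_r(x, y)       # the relationship is not a straight line
print(rho, r)
```

On this toy input the ranks of x and y are identical, so ρ reaches its maximum while Pearson's r stays below it, which illustrates the difference between monotonic and linear correlation.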

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
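
The pair-counting definition above can be sketched directly in plain Python. This version is the simple form often called tau-a; the function name `kendall_tau_a` and the toy data are assumptions for illustration. Statistical libraries commonly report a tie-adjusted variant (tau-b), which gives the same result when there are no ties.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:
            concordant += 1  # both variables move in the same direction
        elif direction < 0:
            discordant += 1  # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# Toy data (hypothetical): two adjacent swaps relative to perfect agreement.
x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

The loop visits every pair, so the cost grows quadratically with the number of observations; that is acceptable for a sketch but slow for large datasets.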

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient when the input follows a bivariate normal distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: two missing-value diagrams]

Sample

First rows

      RIDGE       NUB         CRACK       WEIGHT      target
0     -2.3482     3.6314      5.0289      10.8721     -1.3852
1     -1.1520     1.4805      3.2375      -0.5939     2.1235
2     -2.5245     -6.8633     -2.8037     8.4631      -3.4126
3     5.7523      -6.5091     -5.1510     4.3480      -10.3261
4     8.7494      -3.8978     -1.3834     -14.8776    -2.4153
5     10.4303     -3.1628     12.7885     -14.8519    -6.4942
6     -3.6049     4.6081      6.5540      5.9773      4.0404
7     -5.6383     -0.8158     -3.8120     1.1674      7.0468
8     9.5434      4.0865      2.7542      -18.9002    -0.0672
9     -9.0292     2.9723      3.6759      13.8820     4.2106

Last rows

      RIDGE       NUB         CRACK       WEIGHT      target
3838  7.1823      -2.6548     0.244700    -10.8065    -4.7775
3839  3.4640      -8.2061     -1.421900   -3.3024     -3.8587
3840  -12.0811    -1.3975     4.744400    23.9576     2.3006
3841  1.7635      5.4823      -7.332600   -0.2084     -2.4527
3842  10.0841     4.1937      -4.093000   -12.4840    -2.9099
3843  -11.1764    -3.1833     -0.194100   6.8507      8.5044
3844  4.8725      -1.5653     -1.354000   -13.8886    2.1865
3845  6.3814      4.3648      -22.422899  -19.1334    1.8819
3846  2.7014      -3.8759     -7.262700   -6.2986     -0.4284
3847  6.6282      -0.7684     -10.631300  -5.9356     -3.4739