Overview

Dataset statistics

Number of variables: 11
Number of observations: 699
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 1.1%
Total size in memory: 60.2 KiB
Average record size in memory: 88.2 B

Variable types

NUM: 10
BOOL: 1

Reproduction

Analysis started: 2020-08-25 01:09:22.734408
Analysis finished: 2020-08-25 01:09:37.616083
Duration: 14.88 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 8 (1.1%) duplicate rows (Duplicates)
Uniformity of Cell Shape is highly correlated with Uniformity of Cell Size (High correlation)
Uniformity of Cell Size is highly correlated with Uniformity of Cell Shape (High correlation)
Bare Nuclei has 402 (57.5%) zeros (Zeros)

Variables

Sample code number
Real number (ℝ≥0)

Distinct count: 645
Unique (%): 92.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1071704.0987124464
Minimum: 61634.0
Maximum: 13454352.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 61634
5-th percentile: 411453
Q1: 870688.5
Median: 1171710
Q3: 1238298
95-th percentile: 1333890.8
Maximum: 13454352
Range: 13392718
Interquartile range (IQR): 367609.5

Descriptive statistics

Standard deviation: 617095.7298
Coefficient of variation (CV): 0.5758079404
Kurtosis: 257.7171591
Mean: 1071704.099
Median Absolute Deviation (MAD): 104381
Skewness: 13.67532594
Sum: 749121165
Variance: 3.808071398e+11
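
Several of the figures above follow from one another, which makes them easy to sanity-check. The sketch below, in plain Python, recomputes the range, IQR, coefficient of variation and variance from the other values reported for Sample code number. The helper name `derived_stats` is an assumption made for illustration and is not part of pandas-profiling.

```python
import math

def derived_stats(minimum, maximum, q1, q3, mean, std):
    """Recompute statistics that follow directly from other reported values."""
    return {
        "range": maximum - minimum,  # Maximum minus Minimum
        "iqr": q3 - q1,              # Q3 minus Q1
        "cv": std / mean,            # standard deviation divided by the mean
        "variance": std ** 2,        # square of the standard deviation
    }

# Inputs copied from the report for "Sample code number".
stats = derived_stats(
    minimum=61634, maximum=13454352,
    q1=870688.5, q3=1238298,
    mean=1071704.099, std=617095.7298,
)

for name, value in stats.items():
    print(f"{name}: {value}")
```

Because the inputs are rounded as printed in the report, the recomputed CV and variance agree with the reported ones only up to rounding.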
Histogram with fixed size bins (bins=10)
Most frequent values

Value, Count, Frequency (%)
1182404, 6, 0.9%
1276091, 5, 0.7%
1198641, 3, 0.4%
1171710, 2, 0.3%
560680, 2, 0.3%
1354840, 2, 0.3%
1238777, 2, 0.3%
1017023, 2, 0.3%
411453, 2, 0.3%
1115293, 2, 0.3%
1168736, 2, 0.3%
466906, 2, 0.3%
1114570, 2, 0.3%
1212422, 2, 0.3%
1339781, 2, 0.3%
1277792, 2, 0.3%
1143978, 2, 0.3%
734111, 2, 0.3%
1320077, 2, 0.3%
1240603, 2, 0.3%
1105524, 2, 0.3%
1299596, 2, 0.3%
654546, 2, 0.3%
493452, 2, 0.3%
320675, 2, 0.3%
Other values (620), 641, 91.7%

Smallest values

Value, Count, Frequency (%)
61634, 1, 0.1%
63375, 1, 0.1%
76389, 1, 0.1%
95719, 1, 0.1%
128059, 1, 0.1%
142932, 1, 0.1%
144888, 1, 0.1%
145447, 1, 0.1%
160296, 1, 0.1%
167528, 1, 0.1%

Largest values

Value, Count, Frequency (%)
13454352, 1, 0.1%
8233704, 1, 0.1%
1371920, 1, 0.1%
1371026, 1, 0.1%
1369821, 1, 0.1%
1368882, 1, 0.1%
1368273, 1, 0.1%
1368267, 1, 0.1%
1365328, 1, 0.1%
1365075, 1, 0.1%

Clump Thickness
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.417739628040057
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.815740659
Coefficient of variation (CV): 0.6373713473
Kurtosis: -0.6237154123
Mean: 4.417739628
Median Absolute Deviation (MAD): 2
Skewness: 0.5928585327
Sum: 3088
Variance: 7.928395456
Histogram with fixed size bins (bins=10)
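Most frequent values

Value, Count, Frequency (%)
1, 145, 20.7%
5, 130, 18.6%
3, 108, 15.5%
4, 80, 11.4%
10, 69, 9.9%
2, 50, 7.2%
8, 46, 6.6%
6, 34, 4.9%
7, 23, 3.3%
9, 14, 2.0%

Smallest values

Value, Count, Frequency (%)
1, 145, 20.7%
2, 50, 7.2%
3, 108, 15.5%
4, 80, 11.4%
5, 130, 18.6%
6, 34, 4.9%
7, 23, 3.3%
8, 46, 6.6%
9, 14, 2.0%
10, 69, 9.9%

Largest values

Value, Count, Frequency (%)
10, 69, 9.9%
9, 14, 2.0%
8, 46, 6.6%
7, 23, 3.3%
6, 34, 4.9%
5, 130, 18.6%
4, 80, 11.4%
3, 108, 15.5%
2, 50, 7.2%
1, 145, 20.7%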

Uniformity of Cell Size
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.13447782546495
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.05145911
Coefficient of variation (CV): 0.9735143395
Kurtosis: 0.09880288537
Mean: 3.134477825
Median Absolute Deviation (MAD): 0
Skewness: 1.233136558
Sum: 2191
Variance: 9.3114027
Histogram with fixed size bins (bins=10)
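Most frequent values

Value, Count, Frequency (%)
1, 384, 54.9%
10, 67, 9.6%
3, 52, 7.4%
2, 45, 6.4%
4, 40, 5.7%
5, 30, 4.3%
8, 29, 4.1%
6, 27, 3.9%
7, 19, 2.7%
9, 6, 0.9%

Smallest values

Value, Count, Frequency (%)
1, 384, 54.9%
2, 45, 6.4%
3, 52, 7.4%
4, 40, 5.7%
5, 30, 4.3%
6, 27, 3.9%
7, 19, 2.7%
8, 29, 4.1%
9, 6, 0.9%
10, 67, 9.6%

Largest values

Value, Count, Frequency (%)
10, 67, 9.6%
9, 6, 0.9%
8, 29, 4.1%
7, 19, 2.7%
6, 27, 3.9%
5, 30, 4.3%
4, 40, 5.7%
3, 52, 7.4%
2, 45, 6.4%
1, 384, 54.9%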

Uniformity of Cell Shape
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.207439198855508
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.971912767
Coefficient of variation (CV): 0.9265686995
Kurtosis: 0.007010980047
Mean: 3.207439199
Median Absolute Deviation (MAD): 0
Skewness: 1.161859179
Sum: 2242
Variance: 8.832265496
Histogram with fixed size bins (bins=10)
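Most frequent values

Value, Count, Frequency (%)
1, 353, 50.5%
2, 59, 8.4%
10, 58, 8.3%
3, 56, 8.0%
4, 44, 6.3%
5, 34, 4.9%
6, 30, 4.3%
7, 30, 4.3%
8, 28, 4.0%
9, 7, 1.0%

Smallest values

Value, Count, Frequency (%)
1, 353, 50.5%
2, 59, 8.4%
3, 56, 8.0%
4, 44, 6.3%
5, 34, 4.9%
6, 30, 4.3%
7, 30, 4.3%
8, 28, 4.0%
9, 7, 1.0%
10, 58, 8.3%

Largest values

Value, Count, Frequency (%)
10, 58, 8.3%
9, 7, 1.0%
8, 28, 4.0%
7, 30, 4.3%
6, 30, 4.3%
5, 34, 4.9%
4, 44, 6.3%
3, 56, 8.0%
2, 59, 8.4%
1, 353, 50.5%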

Marginal Adhesion
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8068669527896994
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.855379239
Coefficient of variation (CV): 1.017283429
Kurtosis: 0.9879470695
Mean: 2.806866953
Median Absolute Deviation (MAD): 0
Skewness: 1.524468091
Sum: 1962
Variance: 8.1531906
Histogram with fixed size bins (bins=10)
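Most frequent values

Value, Count, Frequency (%)
1, 407, 58.2%
3, 58, 8.3%
2, 58, 8.3%
10, 55, 7.9%
4, 33, 4.7%
8, 25, 3.6%
5, 23, 3.3%
6, 22, 3.1%
7, 13, 1.9%
9, 5, 0.7%

Smallest values

Value, Count, Frequency (%)
1, 407, 58.2%
2, 58, 8.3%
3, 58, 8.3%
4, 33, 4.7%
5, 23, 3.3%
6, 22, 3.1%
7, 13, 1.9%
8, 25, 3.6%
9, 5, 0.7%
10, 55, 7.9%

Largest values

Value, Count, Frequency (%)
10, 55, 7.9%
9, 5, 0.7%
8, 25, 3.6%
7, 13, 1.9%
6, 22, 3.1%
5, 23, 3.3%
4, 33, 4.7%
3, 58, 8.3%
2, 58, 8.3%
1, 407, 58.2%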

Single Epithelial Cell Size
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.216022889842632
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.214299887
Coefficient of variation (CV): 0.6885211836
Kurtosis: 2.169066423
Mean: 3.21602289
Median Absolute Deviation (MAD): 0
Skewness: 1.712171802
Sum: 2248
Variance: 4.903123988
Histogram with fixed size bins (bins=10)
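Most frequent values

Value, Count, Frequency (%)
2, 386, 55.2%
3, 72, 10.3%
4, 48, 6.9%
1, 47, 6.7%
6, 41, 5.9%
5, 39, 5.6%
10, 31, 4.4%
8, 21, 3.0%
7, 12, 1.7%
9, 2, 0.3%

Smallest values

Value, Count, Frequency (%)
1, 47, 6.7%
2, 386, 55.2%
3, 72, 10.3%
4, 48, 6.9%
5, 39, 5.6%
6, 41, 5.9%
7, 12, 1.7%
8, 21, 3.0%
9, 2, 0.3%
10, 31, 4.4%

Largest values

Value, Count, Frequency (%)
10, 31, 4.4%
9, 2, 0.3%
8, 21, 3.0%
7, 12, 1.7%
6, 41, 5.9%
5, 39, 5.6%
4, 48, 6.9%
3, 72, 10.3%
2, 386, 55.2%
1, 47, 6.7%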

Bare Nuclei
Real number (ℝ≥0)

ZEROS

Distinct count: 11
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4177396280400572
Minimum: 0
Maximum: 10
Zeros: 402
Zeros (%): 57.5%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 1
95-th percentile: 8
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.499862469
Coefficient of variation (CV): 1.763273325
Kurtosis: 3.391647791
Mean: 1.417739628
Median Absolute Deviation (MAD): 0
Skewness: 2.066583426
Sum: 991
Variance: 6.249312362
Histogram with fixed size bins (bins=10)
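Most frequent values

Value, Count, Frequency (%)
0, 402, 57.5%
1, 132, 18.9%
5, 30, 4.3%
2, 30, 4.3%
3, 28, 4.0%
8, 21, 3.0%
4, 19, 2.7%
10, 16, 2.3%
9, 9, 1.3%
7, 8, 1.1%
6, 4, 0.6%

Smallest values

Value, Count, Frequency (%)
0, 402, 57.5%
1, 132, 18.9%
2, 30, 4.3%
3, 28, 4.0%
4, 19, 2.7%
5, 30, 4.3%
6, 4, 0.6%
7, 8, 1.1%
8, 21, 3.0%
9, 9, 1.3%

Largest values

Value, Count, Frequency (%)
10, 16, 2.3%
9, 9, 1.3%
8, 21, 3.0%
7, 8, 1.1%
6, 4, 0.6%
5, 30, 4.3%
4, 19, 2.7%
3, 28, 4.0%
2, 30, 4.3%
1, 132, 18.9%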

Bland Chromatin
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4377682403433476
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.438364252
Coefficient of variation (CV): 0.7092869798
Kurtosis: 0.1846213115
Mean: 3.43776824
Median Absolute Deviation (MAD): 1
Skewness: 1.099969082
Sum: 2403
Variance: 5.945620227
Histogram with fixed size bins (bins=10)
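Most frequent values

Value, Count, Frequency (%)
2, 166, 23.7%
3, 165, 23.6%
1, 152, 21.7%
7, 73, 10.4%
4, 40, 5.7%
5, 34, 4.9%
8, 28, 4.0%
10, 20, 2.9%
9, 11, 1.6%
6, 10, 1.4%

Smallest values

Value, Count, Frequency (%)
1, 152, 21.7%
2, 166, 23.7%
3, 165, 23.6%
4, 40, 5.7%
5, 34, 4.9%
6, 10, 1.4%
7, 73, 10.4%
8, 28, 4.0%
9, 11, 1.6%
10, 20, 2.9%

Largest values

Value, Count, Frequency (%)
10, 20, 2.9%
9, 11, 1.6%
8, 28, 4.0%
7, 73, 10.4%
6, 10, 1.4%
5, 34, 4.9%
4, 40, 5.7%
3, 165, 23.6%
2, 166, 23.7%
1, 152, 21.7%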

Normal Nucleoli
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.866952789699571
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.053633894
Coefficient of variation (CV): 1.065114816
Kurtosis: 0.4742686755
Mean: 2.86695279
Median Absolute Deviation (MAD): 0
Skewness: 1.422261257
Sum: 2004
Variance: 9.324679956
Histogram with fixed size bins (bins=10)
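Most frequent values

Value, Count, Frequency (%)
1, 443, 63.4%
10, 61, 8.7%
3, 44, 6.3%
2, 36, 5.2%
8, 24, 3.4%
6, 22, 3.1%
5, 19, 2.7%
4, 18, 2.6%
9, 16, 2.3%
7, 16, 2.3%

Smallest values

Value, Count, Frequency (%)
1, 443, 63.4%
2, 36, 5.2%
3, 44, 6.3%
4, 18, 2.6%
5, 19, 2.7%
6, 22, 3.1%
7, 16, 2.3%
8, 24, 3.4%
9, 16, 2.3%
10, 61, 8.7%

Largest values

Value, Count, Frequency (%)
10, 61, 8.7%
9, 16, 2.3%
8, 24, 3.4%
7, 16, 2.3%
6, 22, 3.1%
5, 19, 2.7%
4, 18, 2.6%
3, 44, 6.3%
2, 36, 5.2%
1, 443, 63.4%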

Mitoses
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5894134477825466
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.715077943
Coefficient of variation (CV): 1.07906344
Kurtosis: 12.65787807
Mean: 1.589413448
Median Absolute Deviation (MAD): 0
Skewness: 3.560657844
Sum: 1111
Variance: 2.941492349
Histogram with fixed size bins (bins=10)
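Most frequent values

Value, Count, Frequency (%)
1, 579, 82.8%
2, 35, 5.0%
3, 33, 4.7%
10, 14, 2.0%
4, 12, 1.7%
7, 9, 1.3%
8, 8, 1.1%
5, 6, 0.9%
6, 3, 0.4%

Smallest values

Value, Count, Frequency (%)
1, 579, 82.8%
2, 35, 5.0%
3, 33, 4.7%
4, 12, 1.7%
5, 6, 0.9%
6, 3, 0.4%
7, 9, 1.3%
8, 8, 1.1%
10, 14, 2.0%

Largest values

Value, Count, Frequency (%)
10, 14, 2.0%
8, 8, 1.1%
7, 9, 1.3%
6, 3, 0.4%
5, 6, 0.9%
4, 12, 1.7%
3, 33, 4.7%
2, 35, 5.0%
1, 579, 82.8%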

target
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value, Count, Frequency (%)
0, 458, 65.5%
1, 241, 34.5%

Interactions

[100 interaction plots omitted: only the image metadata survived in this text version.]

Correlations


Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation, and +1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
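
As a minimal sketch of that calculation, the plain-Python function below divides the covariance by the product of the standard deviations. The name `pearson_r` and the sample data are assumptions made for illustration; this is not the code the report itself runs.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariance of X and Y over the product of their standard deviations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (std_x * std_y)

# Changing the slope or offset of a perfectly linear relationship leaves r
# at (approximately) +1; flipping the sign of the slope gives -1.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))
print(pearson_r(x, [100 * v - 7 for v in x]))
print(pearson_r(x, [-3 * v for v in x]))
```

The same 1/n normalisation is used for the covariance and both standard deviations, so the factor cancels and the result does not depend on whether population or sample formulas are chosen.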

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation, and +1 total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
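
The rank-based calculation can be sketched in plain Python as follows. Tied values receive the average of their ranks, which is a common convention but an assumption here; the helper names `average_ranks`, `pearson_r` and `spearman_rho` are illustrative only.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# A cubic relationship is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y))  # approximately 1
print(pearson_r(x, y))     # noticeably below 1
```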

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation, and +1 total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
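
A direct, quadratic-time sketch of this pair-counting definition is shown below. It implements the plain form described in the text (often called tau-a), in which tied pairs count as neither concordant nor discordant. Library implementations may differ: SciPy's `kendalltau`, for example, defaults to the tie-adjusted tau-b variant. The function name `kendall_tau_a` is an assumption made for illustration.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
        # product == 0 means a tie in x or y: counted in neither group
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3, 4], [1, 2, 3, 4]))  # all pairs concordant
print(kendall_tau_a([1, 2, 3, 4], [4, 3, 2, 1]))  # all pairs discordant
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))        # two concordant, one discordant
```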

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available from the phik project.

Missing values

[2 missing-value plots omitted: only the image metadata survived in this text version.]

Sample

First rows

index, Sample code number, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses, target
0, 1365328.0, 1.0, 1.0, 2.0, 1.0, 2.0, 0, 2.0, 1.0, 1.0, 0
1, 242970.0, 5.0, 7.0, 7.0, 1.0, 5.0, 8, 3.0, 4.0, 1.0, 0
2, 1133041.0, 5.0, 3.0, 1.0, 2.0, 2.0, 0, 2.0, 1.0, 1.0, 0
3, 183936.0, 3.0, 1.0, 1.0, 1.0, 2.0, 0, 2.0, 1.0, 1.0, 0
4, 1168278.0, 3.0, 1.0, 1.0, 1.0, 2.0, 0, 2.0, 1.0, 1.0, 0
5, 1059552.0, 1.0, 1.0, 1.0, 1.0, 2.0, 0, 3.0, 1.0, 1.0, 0
6, 1185610.0, 1.0, 1.0, 1.0, 1.0, 3.0, 2, 2.0, 1.0, 1.0, 0
7, 1158247.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0, 1.0, 1.0, 1.0, 0
8, 1238186.0, 4.0, 1.0, 1.0, 1.0, 2.0, 0, 2.0, 1.0, 1.0, 0
9, 1270479.0, 5.0, 1.0, 3.0, 3.0, 2.0, 2, 2.0, 3.0, 1.0, 0

Last rows

index, Sample code number, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses, target
689, 320675.0, 3.0, 3.0, 5.0, 2.0, 3.0, 1, 7.0, 1.0, 1.0, 1
690, 1238948.0, 8.0, 5.0, 6.0, 2.0, 3.0, 1, 6.0, 6.0, 1.0, 1
691, 529329.0, 10.0, 10.0, 10.0, 10.0, 10.0, 1, 4.0, 10.0, 10.0, 1
692, 1352663.0, 5.0, 4.0, 6.0, 8.0, 4.0, 0, 8.0, 10.0, 1.0, 1
693, 846423.0, 10.0, 6.0, 3.0, 6.0, 4.0, 1, 7.0, 8.0, 4.0, 1
694, 695091.0, 5.0, 10.0, 10.0, 5.0, 4.0, 5, 4.0, 4.0, 1.0, 1
695, 837480.0, 7.0, 4.0, 4.0, 3.0, 4.0, 1, 6.0, 9.0, 1.0, 1
696, 1057013.0, 8.0, 4.0, 5.0, 1.0, 2.0, 10, 7.0, 3.0, 1.0, 1
697, 390840.0, 8.0, 4.0, 7.0, 1.0, 3.0, 1, 3.0, 9.0, 2.0, 1
698, 760001.0, 8.0, 10.0, 3.0, 2.0, 6.0, 4, 3.0, 10.0, 1.0, 1

Duplicate rows

Most frequent

index, Sample code number, Clump Thickness, Uniformity of Cell Size, Uniformity of Cell Shape, Marginal Adhesion, Single Epithelial Cell Size, Bare Nuclei, Bland Chromatin, Normal Nucleoli, Mitoses, target, count
0, 320675.0, 3.0, 3.0, 5.0, 2.0, 3.0, 1, 7.0, 1.0, 1.0, 1, 2
1, 466906.0, 1.0, 1.0, 1.0, 1.0, 2.0, 0, 1.0, 1.0, 1.0, 0, 2
2, 704097.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0, 2.0, 1.0, 1.0, 0, 2
3, 1100524.0, 6.0, 10.0, 10.0, 2.0, 8.0, 1, 7.0, 3.0, 3.0, 1, 2
4, 1116116.0, 9.0, 10.0, 10.0, 1.0, 10.0, 8, 3.0, 3.0, 1.0, 1, 2
5, 1198641.0, 3.0, 1.0, 1.0, 1.0, 2.0, 0, 3.0, 1.0, 1.0, 0, 2
6, 1218860.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0, 3.0, 1.0, 1.0, 0, 2
7, 1321942.0, 5.0, 1.0, 1.0, 1.0, 2.0, 0, 3.0, 1.0, 1.0, 0, 2
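
A table like this can be rebuilt by counting identical rows. The sketch below uses only the standard library and is an assumption about one reasonable way to do it, not the report's own implementation. The helper names `repeated_rows` and `percent` are hypothetical, and the small `sample` list reuses two rows from this report with one of them repeated on purpose.

```python
from collections import Counter

def repeated_rows(rows):
    """Map each row that occurs more than once to its number of occurrences."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

def percent(part, whole):
    """Share of `part` in `whole` as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Two rows taken from the report; row_a is listed twice to mimic a duplicate.
row_a = (320675.0, 3.0, 3.0, 5.0, 2.0, 3.0, 1, 7.0, 1.0, 1.0, 1)
row_b = (1365328.0, 1.0, 1.0, 2.0, 1.0, 2.0, 0, 2.0, 1.0, 1.0, 0)
sample = [row_a, row_b, row_a]

print(repeated_rows(sample))  # only row_a is reported, with a count of 2
print(percent(8, 699))        # 1.1, the duplicate share stated in the overview
```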