Overview

Dataset statistics

Number of variables: 10
Number of observations: 699
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 236
Duplicate rows (%): 33.8%
Total size in memory: 54.7 KiB
Average record size in memory: 80.2 B

Variable types

NUM: 9
BOOL: 1

Reproduction

Analysis started: 2020-08-25 01:11:06.004273
Analysis finished: 2020-08-25 01:11:18.711125
Duration: 12.71 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration: config.yaml

Warnings

Dataset has 236 (33.8%) duplicate rows [Duplicates]
Cell_Shape_Uniformity is highly correlated with Cell_Size_Uniformity [High correlation]
Cell_Size_Uniformity is highly correlated with Cell_Shape_Uniformity [High correlation]
Bare_Nuclei has 402 (57.5%) zeros [Zeros]

Variables

Clump_Thickness
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.417739628040057
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4
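As a cross-check, the quantile statistics above can be rebuilt from the Clump_Thickness value counts listed in this report. This is a minimal sketch that assumes the report interpolates linearly between order statistics (the default of pandas' `Series.quantile`); the standard library's `statistics.quantiles(..., method="inclusive")` applies the same rule. The helper names `expand` and `quantile_summary` are hypothetical.

```python
import statistics

# Clump_Thickness value counts as listed in this report (value -> count).
COUNTS = {1: 145, 2: 50, 3: 108, 4: 80, 5: 130, 6: 34, 7: 23, 8: 46, 9: 14, 10: 69}


def expand(counts):
    """Rebuild the sorted list of observations from a value -> count mapping."""
    return sorted(v for v, c in counts.items() for _ in range(c))


def quantile_summary(data):
    """Quantile statistics using linear interpolation between order statistics."""
    # "inclusive" treats the minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    cuts = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    return {
        "min": min(data),
        "p5": cuts[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": cuts[-1],
        "max": max(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
    }


data = expand(COUNTS)
print(len(data), quantile_summary(data))
```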

Descriptive statistics

Standard deviation: 2.815740659
Coefficient of variation (CV): 0.6373713473
Kurtosis: -0.6237154123
Mean: 4.417739628
Median Absolute Deviation (MAD): 2
Skewness: 0.5928585327
Sum: 3088
Variance: 7.928395456
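The descriptive statistics follow standard definitions. The sketch below again rebuilds Clump_Thickness from its listed value counts. It assumes the report uses the sample (n - 1) variance and the bias-adjusted skewness that pandas' `Series.var()` and `Series.skew()` compute, and that MAD is the median of absolute deviations from the median, as the label says. Under those assumptions it should reproduce the figures above.

```python
import math
import statistics

# Clump_Thickness value counts as listed in this report (value -> count).
COUNTS = {1: 145, 2: 50, 3: 108, 4: 80, 5: 130, 6: 34, 7: 23, 8: 46, 9: 14, 10: 69}
data = [v for v, c in COUNTS.items() for _ in range(c)]
n = len(data)

mean = statistics.fmean(data)
variance = statistics.variance(data)  # sample variance: divides by n - 1
std = math.sqrt(variance)
cv = std / mean  # coefficient of variation
median = statistics.median(data)
mad = statistics.median([abs(x - median) for x in data])  # median absolute deviation

# Bias-adjusted sample skewness (adjusted Fisher-Pearson coefficient G1),
# built from the biased central moments m2 and m3.
m2 = sum((x - mean) ** 2 for x in data) / n
m3 = sum((x - mean) ** 3 for x in data) / n
skewness = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

print(f"sum={sum(data)} mean={mean:.9f} std={std:.9f} cv={cv:.10f}")
print(f"variance={variance:.9f} mad={mad} skewness={skewness:.10f}")
```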
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 145 | 20.7%
5 | 130 | 18.6%
3 | 108 | 15.5%
4 | 80 | 11.4%
10 | 69 | 9.9%
2 | 50 | 7.2%
8 | 46 | 6.6%
6 | 34 | 4.9%
7 | 23 | 3.3%
9 | 14 | 2.0%

Cell_Size_Uniformity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.13447782546495
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 5
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.05145911
Coefficient of variation (CV): 0.9735143395
Kurtosis: 0.09880288537
Mean: 3.134477825
Median Absolute Deviation (MAD): 0
Skewness: 1.233136558
Sum: 2191
Variance: 9.3114027
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 384 | 54.9%
10 | 67 | 9.6%
3 | 52 | 7.4%
2 | 45 | 6.4%
4 | 40 | 5.7%
5 | 30 | 4.3%
8 | 29 | 4.1%
6 | 27 | 3.9%
7 | 19 | 2.7%
9 | 6 | 0.9%

Cell_Shape_Uniformity
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.207439198855508
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 5
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.971912767
Coefficient of variation (CV): 0.9265686995
Kurtosis: 0.007010980047
Mean: 3.207439199
Median Absolute Deviation (MAD): 0
Skewness: 1.161859179
Sum: 2242
Variance: 8.832265496
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 353 | 50.5%
2 | 59 | 8.4%
10 | 58 | 8.3%
3 | 56 | 8.0%
4 | 44 | 6.3%
5 | 34 | 4.9%
7 | 30 | 4.3%
6 | 30 | 4.3%
8 | 28 | 4.0%
9 | 7 | 1.0%

Marginal_Adhesion
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.8068669527896994
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.855379239
Coefficient of variation (CV): 1.017283429
Kurtosis: 0.9879470695
Mean: 2.806866953
Median Absolute Deviation (MAD): 0
Skewness: 1.524468091
Sum: 1962
Variance: 8.1531906
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 407 | 58.2%
2 | 58 | 8.3%
3 | 58 | 8.3%
10 | 55 | 7.9%
4 | 33 | 4.7%
8 | 25 | 3.6%
5 | 23 | 3.3%
6 | 22 | 3.1%
7 | 13 | 1.9%
9 | 5 | 0.7%

Single_Epi_Cell_Size
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.216022889842632
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 4
95th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.214299887
Coefficient of variation (CV): 0.6885211836
Kurtosis: 2.169066423
Mean: 3.21602289
Median Absolute Deviation (MAD): 0
Skewness: 1.712171802
Sum: 2248
Variance: 4.903123988
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 386 | 55.2%
3 | 72 | 10.3%
4 | 48 | 6.9%
1 | 47 | 6.7%
6 | 41 | 5.9%
5 | 39 | 5.6%
10 | 31 | 4.4%
8 | 21 | 3.0%
7 | 12 | 1.7%
9 | 2 | 0.3%

Bare_Nuclei
Real number (ℝ≥0)

ZEROS

Distinct count: 11
Unique (%): 1.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.4177396280400572
Minimum: 0
Maximum: 10
Zeros: 402
Zeros (%): 57.5%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 1
95th percentile: 8
Maximum: 10
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 2.499862469
Coefficient of variation (CV): 1.763273325
Kurtosis: 3.391647791
Mean: 1.417739628
Median Absolute Deviation (MAD): 0
Skewness: 2.066583426
Sum: 991
Variance: 6.249312362
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
0 | 402 | 57.5%
1 | 132 | 18.9%
5 | 30 | 4.3%
2 | 30 | 4.3%
3 | 28 | 4.0%
8 | 21 | 3.0%
4 | 19 | 2.7%
10 | 16 | 2.3%
9 | 9 | 1.3%
7 | 8 | 1.1%
6 | 4 | 0.6%

Bland_Chromatin
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.4377682403433476
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.438364252
Coefficient of variation (CV): 0.7092869798
Kurtosis: 0.1846213115
Mean: 3.43776824
Median Absolute Deviation (MAD): 1
Skewness: 1.099969082
Sum: 2403
Variance: 5.945620227
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
2 | 166 | 23.7%
3 | 165 | 23.6%
1 | 152 | 21.7%
7 | 73 | 10.4%
4 | 40 | 5.7%
5 | 34 | 4.9%
8 | 28 | 4.0%
10 | 20 | 2.9%
9 | 11 | 1.6%
6 | 10 | 1.4%

Normal_Nucleoli
Real number (ℝ≥0)

Distinct count: 10
Unique (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.866952789699571
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.053633894
Coefficient of variation (CV): 1.065114816
Kurtosis: 0.4742686755
Mean: 2.86695279
Median Absolute Deviation (MAD): 0
Skewness: 1.422261257
Sum: 2004
Variance: 9.324679956
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 443 | 63.4%
10 | 61 | 8.7%
3 | 44 | 6.3%
2 | 36 | 5.2%
8 | 24 | 3.4%
6 | 22 | 3.1%
5 | 19 | 2.7%
4 | 18 | 2.6%
9 | 16 | 2.3%
7 | 16 | 2.3%

Mitoses
Real number (ℝ≥0)

Distinct count: 9
Unique (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.5894134477825466
Minimum: 1.0
Maximum: 10.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.6 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.715077943
Coefficient of variation (CV): 1.07906344
Kurtosis: 12.65787807
Mean: 1.589413448
Median Absolute Deviation (MAD): 0
Skewness: 3.560657844
Sum: 1111
Variance: 2.941492349
Histogram with fixed size bins (bins=10)

Common values

Value | Count | Frequency (%)
1 | 579 | 82.8%
2 | 35 | 5.0%
3 | 33 | 4.7%
10 | 14 | 2.0%
4 | 12 | 1.7%
7 | 9 | 1.3%
8 | 8 | 1.1%
5 | 6 | 0.9%
6 | 3 | 0.4%

target
Boolean

Distinct count: 2
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.6 KiB

Value | Count | Frequency (%)
0 | 458 | 65.5%
1 | 241 | 34.5%

Interactions

[Figure: pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables. This implies that, for a perfectly linear relationship, the slope of the line (its angle to the x-axis) does not affect r; only the direction of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
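The definition above translates directly into code. This is a minimal sketch in plain Python (the function name `pearson_r` is hypothetical, not the report's implementation); it also checks the location-and-scale invariance claim on a small example.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # The 1/(n - 1) factors of the covariance and of both standard deviations
    # cancel, so plain sums of centred products are enough.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)
# Separate changes of location and positive scale leave r unchanged.
r_moved = pearson_r([10 + 3 * v for v in x], [-7 + 0.5 * v for v in y])
print(r, r_moved)
```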

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the standard deviations of those rank variables.
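In other words, ρ is Pearson's r applied to ranks. The sketch below assumes that tied values receive the average of the ranks they span (the default of pandas' `Series.rank`); the function names are hypothetical. On a monotonic but nonlinear relation, ρ is exactly 1 while r falls short of 1 (about 0.943 for this example).

```python
import math


def average_ranks(values):
    """1-based ranks in which tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values equal to values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Pearson's r computed on the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman_rho(x, y), pearson_r(x, y))
```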

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

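To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2. This is the tau-a form; when there are ties, as in this dataset, the tau-b variant adjusts the denominator for tied pairs.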
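The pair-counting definition can be sketched by brute force over all pairs. The function below (a hypothetical helper, O(n²)) returns both tau-a, exactly as described above, and the tie-adjusted tau-b, which is what SciPy's `scipy.stats.kendalltau` returns by default. Which variant the report itself used is not stated here.

```python
import math
from itertools import combinations


def kendall_tau(xs, ys):
    """Return (tau_a, tau_b) by counting concordant and discordant pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(n), 2):
        dx = xs[i] - xs[j]
        dy = ys[i] - ys[j]
        if dx == 0:
            tied_x += 1
        if dy == 0:
            tied_y += 1
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
        # Pairs tied in X or Y are neither concordant nor discordant.
    n0 = n * (n - 1) // 2  # total number of pairs
    if tied_x == n0 or tied_y == n0:
        raise ValueError("tau is undefined when a variable is constant")
    tau_a = (concordant - discordant) / n0
    # tau-b removes pairs tied in X (and, separately, in Y) from the normalisation.
    tau_b = (concordant - discordant) / math.sqrt((n0 - tied_x) * (n0 - tied_y))
    return tau_a, tau_b


print(kendall_tau([1, 2, 3, 4], [1, 3, 2, 4]))  # no ties: tau_a and tau_b coincide
print(kendall_tau([1, 1, 2, 3], [1, 2, 2, 3]))  # ties: tau_b exceeds tau_a
```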

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Missing values

[Figure: missing-value diagrams; the dataset has no missing cells]

Sample

First rows

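Index | Clump_Thickness | Cell_Size_Uniformity | Cell_Shape_Uniformity | Marginal_Adhesion | Single_Epi_Cell_Size | Bare_Nuclei | Bland_Chromatin | Normal_Nucleoli | Mitoses | target
0 | 5.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 3.0 | 1.0 | 1.0 | 0
1 | 5.0 | 4.0 | 4.0 | 5.0 | 7.0 | 1 | 3.0 | 2.0 | 1.0 | 0
2 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 2 | 3.0 | 1.0 | 1.0 | 0
3 | 6.0 | 8.0 | 8.0 | 1.0 | 3.0 | 4 | 3.0 | 7.0 | 1.0 | 0
4 | 4.0 | 1.0 | 1.0 | 3.0 | 2.0 | 0 | 3.0 | 1.0 | 1.0 | 0
5 | 8.0 | 10.0 | 10.0 | 8.0 | 7.0 | 1 | 9.0 | 7.0 | 1.0 | 1
6 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 1 | 3.0 | 1.0 | 1.0 | 0
7 | 2.0 | 1.0 | 2.0 | 1.0 | 2.0 | 0 | 3.0 | 1.0 | 1.0 | 0
8 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 5.0 | 0
9 | 4.0 | 2.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 1.0 | 0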

Last rows

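Index | Clump_Thickness | Cell_Size_Uniformity | Cell_Shape_Uniformity | Marginal_Adhesion | Single_Epi_Cell_Size | Bare_Nuclei | Bland_Chromatin | Normal_Nucleoli | Mitoses | target
689 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 8.0 | 0
690 | 1.0 | 1.0 | 1.0 | 3.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0
691 | 5.0 | 10.0 | 10.0 | 5.0 | 4.0 | 5 | 4.0 | 4.0 | 1.0 | 1
692 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0
693 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 2.0 | 0
694 | 3.0 | 1.0 | 1.0 | 1.0 | 3.0 | 2 | 1.0 | 1.0 | 1.0 | 0
695 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0
696 | 5.0 | 10.0 | 10.0 | 3.0 | 7.0 | 3 | 8.0 | 10.0 | 2.0 | 1
697 | 4.0 | 8.0 | 6.0 | 4.0 | 3.0 | 4 | 10.0 | 6.0 | 1.0 | 1
698 | 4.0 | 8.0 | 8.0 | 5.0 | 4.0 | 5 | 10.0 | 4.0 | 1.0 | 1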

Duplicate rows

Most frequent

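Index | Clump_Thickness | Cell_Size_Uniformity | Cell_Shape_Uniformity | Marginal_Adhesion | Single_Epi_Cell_Size | Bare_Nuclei | Bland_Chromatin | Normal_Nucleoli | Mitoses | target | count
4 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0 | 27
6 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 3.0 | 1.0 | 1.0 | 0 | 23
5 | 1.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 1.0 | 0 | 21
20 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 1.0 | 0 | 20
19 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0 | 12
13 | 2.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0 | 10
21 | 3.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 3.0 | 1.0 | 1.0 | 0 | 10
28 | 4.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 1.0 | 1.0 | 1.0 | 0 | 10
29 | 4.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 1.0 | 0 | 10
37 | 5.0 | 1.0 | 1.0 | 1.0 | 2.0 | 0 | 2.0 | 1.0 | 1.0 | 0 | 10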
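Both the duplicate-row count in the overview and the "Most frequent" table can be derived by counting identical rows. This is a minimal sketch assuming, as pandas' `DataFrame.duplicated()` does by default, that every occurrence after the first counts as a duplicate, so a row seen k times contributes k - 1 duplicates. The toy rows and the helper name are illustrative, not taken from the dataset.

```python
from collections import Counter


def duplicate_summary(rows, top=10):
    """Return (number of duplicate rows, most frequent repeated rows with counts)."""
    counts = Counter(tuple(row) for row in rows)
    # Every occurrence after the first is a duplicate (k - 1 per distinct row).
    n_duplicates = sum(c - 1 for c in counts.values())
    repeated = [(row, c) for row, c in counts.items() if c > 1]
    # Sort by count, largest first; the sort is stable, so ties keep first-seen order.
    repeated.sort(key=lambda rc: rc[1], reverse=True)
    return n_duplicates, repeated[:top]


rows = [(1, 1, 0), (1, 1, 0), (2, 1, 0), (1, 1, 0), (2, 1, 0), (3, 2, 1)]
n_dup, top = duplicate_summary(rows)
print(n_dup, top)
print(f"{236 / 699:.1%}")  # share of duplicate rows reported for this dataset
```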