Overview

Dataset statistics

Number of variables: 11
Number of observations: 1066
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 751
Duplicate rows (%): 70.5%
Total size in memory: 91.7 KiB
Average record size in memory: 88.1 B
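The duplicate-row figures can be checked with a short standard-library sketch. It assumes that a row counts as a duplicate whenever an identical row appeared earlier, as with pandas' `DataFrame.duplicated()` and its default `keep="first"`. It also assumes that the `count` column of the "Most frequent" duplicate-rows table in the Sample section is the total number of occurrences of each repeated row. The helper names and toy rows are hypothetical, because the underlying data file is not part of this report.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def count_duplicate_rows(rows: Sequence[tuple]) -> tuple[int, float]:
    """Return (duplicate rows, duplicate rows as % of all rows).

    Every occurrence of a row after its first one counts as a duplicate,
    so a row seen k times contributes k - 1 duplicates.
    """
    occurrences = Counter(rows)
    duplicates = sum(k - 1 for k in occurrences.values())
    return duplicates, 100.0 * duplicates / len(rows)


def most_frequent_duplicates(rows: Sequence[tuple], top: int = 10) -> list[tuple[tuple, int]]:
    """(row, total occurrences) for rows seen more than once, most frequent first."""
    repeated = [(row, k) for row, k in Counter(rows).items() if k > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical toy rows: (class code, largest spot code, spot dist code)
rows = [(5, 4, 3), (1, 3, 1), (5, 4, 3), (5, 4, 3), (1, 3, 1), (2, 2, 2)]
print(count_duplicate_rows(rows))      # (3, 50.0)
print(most_frequent_duplicates(rows))  # [((5, 4, 3), 3), ((1, 3, 1), 2)]
print(round(100 * 751 / 1066, 1))      # 70.5
```

The last line checks the reported percentage: 751 duplicates out of 1066 observations is 70.5%.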

Variable types

CAT: 7
BOOL: 2
NUM: 2

Reproduction

Analysis started: 2020-08-25 01:23:17.368982
Analysis finished: 2020-08-25 01:23:18.996282
Duration: 1.63 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]

Warnings

Area of the largest spot has constant value "1" (Constant)
Dataset has 751 (70.5%) duplicate rows (Duplicates)
class code has 147 (13.8%) zeros (Zeros)
largest spot code has 216 (20.3%) zeros (Zeros)
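The Constant and Zeros warnings rest on two simple per-column counts. This is a minimal sketch under two assumptions: "Constant" means the column has exactly one distinct value, and "Zeros" reports the number and share of values equal to 0. The threshold at which the report actually raises a Zeros warning is a configuration setting not shown here, so these hypothetical helpers only compute the numbers. The example columns are rebuilt from the value counts reported in the Variables section.

```python
def constant_columns(columns: dict) -> list:
    """Names of columns whose values are all identical (one distinct value)."""
    return [name for name, values in columns.items() if len(set(values)) == 1]


def zeros_summary(values: list) -> tuple:
    """(number of values equal to 0, their share in percent, rounded to 0.1)."""
    zeros = sum(1 for v in values if v == 0)
    return zeros, round(100 * zeros / len(values), 1)


# Columns rebuilt from the value counts in the Variables section (1066 rows each).
columns = {
    "class code": [0] * 147 + [1] * 211 + [2] * 239 + [3] * 95 + [4] * 43 + [5] * 331,
    "largest spot code": [0] * 216 + [1] * 27 + [2] * 46 + [3] * 218 + [4] * 414 + [5] * 145,
    "Area of the largest spot": [1] * 1066,
}
print(constant_columns(columns))                    # ['Area of the largest spot']
print(zeros_summary(columns["class code"]))         # (147, 13.8)
print(zeros_summary(columns["largest spot code"]))  # (216, 20.3)
```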

Variables

class code
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.627579737335835
Minimum: 0
Maximum: 5
Zeros: 147
Zeros (%): 13.8%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 2
Q3: 5
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4
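The quantiles can be reproduced from the value counts listed further down for this variable. This is a minimal sketch that assumes the report uses linear interpolation between order statistics, which is the default in pandas' `Series.quantile`. `expand` and `quantile` are hypothetical helper names, not pandas-profiling functions.

```python
import math


def expand(value_counts: dict) -> list:
    """Rebuild a sorted column from a {value: count} mapping."""
    return sorted(v for v, k in value_counts.items() for _ in range(k))


def quantile(sorted_values: list, q: float) -> float:
    """Quantile by linear interpolation between order statistics.

    The target position q * (n - 1) is split into an integer index and a
    fractional part, as in the default 'linear' method of pandas/NumPy.
    """
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


# class code, rebuilt from its value counts (1066 observations).
class_code = expand({0: 147, 1: 211, 2: 239, 3: 95, 4: 43, 5: 331})
levels = {"5th": 0.05, "Q1": 0.25, "median": 0.5, "Q3": 0.75, "95th": 0.95}
summary = {name: quantile(class_code, q) for name, q in levels.items()}
print(summary)  # {'5th': 0.0, 'Q1': 1.0, 'median': 2.0, 'Q3': 5.0, '95th': 5.0}
print("IQR =", summary["Q3"] - summary["Q1"])  # IQR = 4.0
```

With 331 of the 1066 values equal to 5, the Q3 position (798.75) already falls among the 5s. That is why Q3 coincides with the maximum here.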

Descriptive statistics

Standard deviation: 1.844965019
Coefficient of variation (CV): 0.7021537701
Kurtosis: -1.44657903
Mean: 2.627579737
Median Absolute Deviation (MAD): 1
Skewness: 0.134861704
Sum: 2801
Variance: 3.403895921
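These descriptive statistics follow from the same value counts. This is a minimal standard-library sketch with several assumptions. The report is assumed to use the sample (n - 1) variance and the median absolute deviation around the median. It is also assumed to use the bias-adjusted skewness and excess-kurtosis estimators behind pandas' `Series.skew()` and `Series.kurt()`. `describe` is a hypothetical helper name.

```python
import math
import statistics


def describe(value_counts: dict) -> dict:
    """Descriptive statistics for a column given as {value: count}.

    Variance uses the n - 1 denominator; skewness and excess kurtosis use the
    bias-adjusted estimators (the formulas behind pandas' Series.skew/kurt).
    """
    xs = [v for v, k in value_counts.items() for _ in range(k)]
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)  # sums of powered deviations
    s3 = sum((x - mean) ** 3 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    variance = s2 / (n - 1)
    std = math.sqrt(variance)
    median = statistics.median(xs)
    return {
        "sum": sum(xs),
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,  # coefficient of variation
        "mad": statistics.median(abs(x - median) for x in xs),
        "skewness": n * math.sqrt(n - 1) / (n - 2) * s3 / s2 ** 1.5,
        "kurtosis": n * (n + 1) * (n - 1) * s4 / ((n - 2) * (n - 3) * s2 ** 2)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)),
    }


stats = describe({0: 147, 1: 211, 2: 239, 3: 95, 4: 43, 5: 331})
print({name: round(value, 9) for name, value in stats.items()})
```

As a worked check of the CV line: 1.844965019 / 2.627579737 ≈ 0.70215. For the MAD, the median is 2, and the absolute deviations |x - 2| have a median of 1.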
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
5      331    31.1%
2      239    22.4%
1      211    19.8%
0      147    13.8%
3      95     8.9%
4      43     4.0%

largest spot code
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.958724202626642
Minimum: 0
Maximum: 5
Zeros: 216
Zeros (%): 20.3%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 2
Median: 4
Q3: 4
95th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.703402798
Coefficient of variation (CV): 0.5757220619
Kurtosis: -0.7660704256
Mean: 2.958724203
Median Absolute Deviation (MAD): 1
Skewness: -0.7874597319
Sum: 3154
Variance: 2.901581094
Histogram with fixed size bins (bins=10)
Value  Count  Frequency (%)
4      414    38.8%
3      218    20.5%
0      216    20.3%
5      145    13.6%
2      46     4.3%
1      27     2.5%

spot dist code
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%
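Each Frequency (%) value is the count divided by the 1066 observations. This is a minimal sketch using `collections.Counter`. `frequency_table` is a hypothetical helper, and the column is rebuilt from the counts in the table above because the raw data is not included here.

```python
from collections import Counter


def frequency_table(values: list) -> list:
    """(value, count, 'percent%') rows, most frequent first.

    Percentages are relative to all values and rounded to one decimal.
    """
    counts = Counter(values)
    total = len(values)
    return [(value, k, f"{100 * k / total:.1f}%") for value, k in counts.most_common()]


# spot dist code, rebuilt from the reported counts (1066 values).
spot_dist_code = ["2"] * 477 + ["3"] * 331 + ["1"] * 223 + ["0"] * 35
for value, k, pct in frequency_table(spot_dist_code):
    print(value, k, pct)
```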

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 4
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Activity
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 2
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Evolution
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique Unicode characters: 5
Unique Unicode categories: 2
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
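Python's standard `unicodedata` module exposes the general category of each character, which is enough to reproduce the character and category tables for this variable. This minimal sketch assumes the stored values are the strings "3.0", "2.0" and "1.0", since every value has length 3 and contains one "." and one "0". `character_profile` and `CATEGORY_NAMES` are hypothetical names. `unicodedata` does not expose Unicode scripts or blocks, so those tables are not covered.

```python
import unicodedata
from collections import Counter

# Long names used in the report for the two general categories that occur here.
CATEGORY_NAMES = {"Nd": "Decimal Number", "Po": "Other Punctuation"}


def character_profile(values: list) -> tuple:
    """Count characters and their Unicode general categories across all values."""
    chars = Counter(ch for value in values for ch in value)
    categories = Counter()
    for ch, k in chars.items():
        code = unicodedata.category(ch)  # e.g. 'Nd' for '3', 'Po' for '.'
        categories[CATEGORY_NAMES.get(code, code)] += k
    return chars, categories


# Evolution, assuming the stored strings are "3.0", "2.0" and "1.0".
evolution = ["3.0"] * 505 + ["2.0"] * 484 + ["1.0"] * 77
chars, categories = character_profile(evolution)
print(chars.most_common())       # [('.', 1066), ('0', 1066), ('3', 505), ('2', 484), ('1', 77)]
print(categories.most_common())  # [('Decimal Number', 2132), ('Other Punctuation', 1066)]
```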

Most occurring characters

Value  Count  Frequency (%)
.      1066   33.3%
0      1066   33.3%
3      505    15.8%
2      484    15.1%
1      77     2.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     2132   66.7%
Other Punctuation  1066   33.3%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      1066   50.0%
3      505    23.7%
2      484    22.7%
1      77     3.6%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
.      1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3198   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3198   100.0%

Previous 24 hour code
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique Unicode characters: 5
Unique Unicode categories: 2
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
.      1066   33.3%
0      1066   33.3%
1      1028   32.1%
3      25     0.8%
2      13     0.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     2132   66.7%
Other Punctuation  1066   33.3%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      1066   50.0%
1      1028   48.2%
3      25     1.2%
2      13     0.6%

Most frequent Other Punctuation characters

Value  Count  Frequency (%)
.      1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  3198   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  3198   100.0%

Historically-complex
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 2
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

become complex
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 2
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Area
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique Unicode characters: 2
Unique Unicode categories: 1
Unique Unicode scripts: 1
Unique Unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Area of the largest spot
Boolean

CONSTANT
REJECTED

Distinct count: 1
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
1      1066   100.0%

target
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB
Value  Count  Frequency (%)
0      884    82.9%
1      182    17.1%

Interactions


Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables. For an exact linear relationship, the steepness of the line (its angle to the x-axis) therefore does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
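The definition translates directly into code. This is a minimal standard-library sketch of the textbook formula, not the report's own implementation. `pearson_r` is a hypothetical name.

```python
import math


def pearson_r(xs: list, ys: list) -> float:
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # The 1/n (or 1/(n - 1)) factors cancel between numerator and denominator,
    # so plain sums of deviations are enough.
    return cov / (sd_x * sd_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(pearson_r(xs, ys))                        # approximately 0.8
print(pearson_r([10 * x + 3 for x in xs], ys))  # same value: shifting/scaling X does not change r
print(pearson_r(xs, [3 * x + 7 for x in xs]))   # approximately 1.0 for an exact increasing line
```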

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
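Spearman's ρ is Pearson's r applied to ranks. This is a minimal standard-library sketch that assumes tied values receive the average of the ranks they span, which is the default convention in pandas and SciPy. `average_ranks`, `pearson_r` and `spearman_rho` are hypothetical names. Ties matter here, since most variables in this dataset take only a handful of distinct values.

```python
import math


def average_ranks(values: list) -> list:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(xs: list, ys: list) -> float:
    """Covariance divided by the product of standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs: list, ys: list) -> float:
    """Pearson correlation of the (average) ranks of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
cubes = [x ** 3 for x in xs]
print(spearman_rho(xs, cubes))  # approximately 1.0: the relationship is monotonic
print(pearson_r(xs, cubes))     # below 1: the relationship is not linear
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```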

Kendall's τ

Like Spearman's rank correlation coefficient, Kendall's rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
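The pair-counting rule translates directly into code. This minimal sketch computes the quantity described above, usually called τ_a, in which pairs tied in X or Y count as neither concordant nor discordant. `kendall_tau_a` is a hypothetical name. With many ties, as in the small-integer codes of this dataset, τ_a cannot reach ±1. Libraries commonly report the tie-adjusted τ_b instead (for example, SciPy's `scipy.stats.kendalltau` defaults to τ_b), so values from this sketch may differ from the report.

```python
from itertools import combinations


def kendall_tau_a(xs: list, ys: list) -> float:
    """(concordant - discordant) / (n(n-1)/2); tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1  # both variables order the pair the same way
        elif s < 0:
            discordant += 1  # the variables order the pair oppositely
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# 6 pairs: only (2, 3) is discordant, so tau_a = (5 - 1) / 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
# A tie in X: 2 concordant pairs, 1 tied pair, so tau_a = 2 / 3 rather than 1.
print(kendall_tau_a([1, 1, 2], [1, 2, 3]))
```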

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma (2013).
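A minimal standard-library sketch of the bias-corrected estimator follows, using the correction as it is usually stated for Bergsma (2013). First, φ² = χ²/n is reduced by (k - 1)(r - 1)/(n - 1). Next, the row and column counts r and k are replaced by r - (r - 1)²/(n - 1) and k - (k - 1)²/(n - 1). The sketch assumes the plain Pearson χ² without a continuity correction, which may differ from the report's implementation. `cramers_v_corrected` is a hypothetical name.

```python
import math
from collections import Counter


def cramers_v_corrected(xs: list, ys: list) -> float:
    """Bias-corrected Cramér's V (Bergsma 2013) for two nominal sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_totals, y_totals = Counter(xs), Counter(ys)
    # Pearson chi-squared statistic, without a continuity correction.
    chi2 = 0.0
    for a, n_a in x_totals.items():
        for b, n_b in y_totals.items():
            expected = n_a * n_b / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    r, k = len(x_totals), len(y_totals)
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the numbers of rows and columns.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    # Undefined when either variable has a single category.
    return math.sqrt(phi2_corr / denom) if denom > 0 else math.nan


# Perfect association: every "a" pairs with "x" and every "b" with "y".
print(cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x"] * 50 + ["y"] * 50))  # about 1.0
# Independence: each combination occurs equally often.
print(cramers_v_corrected(["a", "a", "b", "b"] * 25, ["x", "y", "x", "y"] * 25))  # 0.0
```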

Missing values


Sample

First rows

   class code  largest spot code  spot dist code  Activity  Evolution  Previous 24 hour code  Historically-complex  become complex  Area  Area of the largest spot  target
0  5           4                  3               1         3.0        1.0                    1                     2               1     1                         0
1  1           3                  1               1         2.0        1.0                    1                     2               1     1                         0
2  2           3                  2               1         3.0        1.0                    1                     2               1     1                         1
3  5           4                  3               2         2.0        1.0                    2                     2               1     1                         0
4  1           4                  2               1         3.0        1.0                    1                     2               1     1                         0
5  1           4                  2               1         2.0        1.0                    2                     2               1     1                         0
6  2           2                  1               1         3.0        1.0                    2                     2               1     1                         0
7  2           3                  2               1         3.0        1.0                    1                     2               1     1                         0
8  0           5                  2               1         2.0        1.0                    1                     2               1     1                         0
9  2           1                  2               1         2.0        1.0                    2                     2               1     1                         0

Last rows

      class code  largest spot code  spot dist code  Activity  Evolution  Previous 24 hour code  Historically-complex  become complex  Area  Area of the largest spot  target
1056  0           5                  2               1         3.0        1.0                    1                     2               1     1                         0
1057  2           4                  2               1         2.0        1.0                    2                     2               1     1                         0
1058  3           0                  1               2         3.0        1.0                    2                     2               1     1                         0
1059  0           5                  2               1         3.0        1.0                    1                     2               1     1                         0
1060  5           3                  3               1         2.0        1.0                    1                     2               1     1                         0
1061  5           4                  3               1         1.0        1.0                    1                     2               1     1                         0
1062  2           0                  2               1         3.0        1.0                    1                     2               1     1                         1
1063  5           4                  3               1         3.0        1.0                    1                     2               1     1                         0
1064  5           4                  3               1         3.0        1.0                    1                     2               1     1                         0
1065  5           0                  3               1         3.0        1.0                    1                     1               1     1                         0

Duplicate rows

Most frequent

     class code  largest spot code  spot dist code  Activity  Evolution  Previous 24 hour code  Historically-complex  become complex  Area  Area of the largest spot  target  count
6    0           5                  2               1         3.0        1.0                    1                     2               1     1                         0       67
138  5           4                  3               1         2.0        1.0                    1                     2               1     1                         0       44
20   1           3                  2               1         3.0        1.0                    1                     2               1     1                         0       40
136  5           4                  3               1         2.0        1.0                    1                     1               1     1                         0       35
4    0           5                  2               1         2.0        1.0                    1                     2               1     1                         0       26
142  5           4                  3               1         3.0        1.0                    1                     1               1     1                         0       25
18   1           3                  2               1         2.0        1.0                    1                     2               1     1                         0       23
140  5           4                  3               1         2.0        1.0                    2                     1               1     1                         0       23
28   1           4                  2               1         3.0        1.0                    1                     2               1     1                         0       22
67   2           3                  2               1         3.0        1.0                    1                     2               1     1                         0       21