Overview

Dataset statistics

Number of variables: 13
Number of observations: 1066
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 701
Duplicate rows (%): 65.8%
Total size in memory: 108.4 KiB
Average record size in memory: 104.1 B

Variable types

CAT: 8
NUM: 4
BOOL: 1

Reproduction

Analysis started: 2020-08-25 01:52:54.654972
Analysis finished: 2020-08-25 01:52:58.542140
Duration: 3.89 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Area_of_the_largest_spot has constant value "1" Constant
Dataset has 701 (65.8%) duplicate rows Duplicates
largest_spot_size has 216 (20.3%) zeros Zeros
C-class_flares_production_by_this_region has 884 (82.9%) zeros Zeros
M-class_flares_production_by_this_region has 1030 (96.6%) zeros Zeros
target has 147 (13.8%) zeros Zeros
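
The Zeros warnings above each report a count and a share of zero values for one column. A minimal sketch of that arithmetic in plain Python follows; the helper name `zero_share` and the stand-in column are illustrative assumptions, not part of pandas-profiling.

```python
def zero_share(values):
    """Return (number of zeros, percentage of zeros) for a sequence of numbers."""
    n_zeros = sum(1 for v in values if v == 0)
    return n_zeros, 100.0 * n_zeros / len(values)


# Hypothetical stand-in for the `target` column: 147 zeros among 1066 rows,
# as stated in the warning. The non-zero values are placeholders.
target = [0] * 147 + [1] * (1066 - 147)

n_zeros, pct = zero_share(target)
print(f"target has {n_zeros} ({pct:.1f}%) zeros")
```

The same helper applied to a column with a single distinct value would be one way to spot the Constant case, since every value then equals the first one.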

Variables

largest_spot_size
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.958724202626642
Minimum: 0
Maximum: 5
Zeros: 216
Zeros (%): 20.3%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 2
Median: 4
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.703402798
Coefficient of variation (CV): 0.5757220619
Kurtosis: -0.7660704256
Mean: 2.958724203
Median Absolute Deviation (MAD): 1
Skewness: -0.7874597319
Sum: 3154
Variance: 2.901581094

Histogram with fixed size bins (bins=10)

Values ordered by count (descending)

Value  Count  Frequency (%)
4      414    38.8%
3      218    20.5%
0      216    20.3%
5      145    13.6%
2      46     4.3%
1      27     2.5%

Values in ascending order

Value  Count  Frequency (%)
0      216    20.3%
1      27     2.5%
2      46     4.3%
3      218    20.5%
4      414    38.8%
5      145    13.6%

Values in descending order

Value  Count  Frequency (%)
5      145    13.6%
4      414    38.8%
3      218    20.5%
2      46     4.3%
1      27     2.5%
0      216    20.3%
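
The quantile and descriptive statistics reported for largest_spot_size can be re-derived from its value counts. The sketch below uses only Python's standard library. It assumes the report's variance is the sample variance (n - 1 denominator) and that quartiles use linear interpolation between data points; the variable names are illustrative.

```python
import statistics

# Value counts for largest_spot_size, copied from the frequency table.
value_counts = {0: 216, 1: 27, 2: 46, 3: 218, 4: 414, 5: 145}

# Expand the counts back into one observation per row.
values = [v for v, count in value_counts.items() for _ in range(count)]

mean = statistics.mean(values)
variance = statistics.variance(values)  # sample variance, n - 1 denominator
std = statistics.stdev(values)
cv = std / mean                         # coefficient of variation

# Quartiles with linear interpolation between the closest data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print("n =", len(values), "sum =", sum(values))
print("mean =", mean, "variance =", variance, "std =", std, "cv =", cv)
print("Q1 =", q1, "median =", median, "Q3 =", q3, "IQR =", iqr)
```

Kurtosis and skewness are left out because the report does not state which estimator it uses for them.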
 
spot_distribution
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
2      477    44.7%
3      331    31.1%
1      223    20.9%
0      35     3.3%
 

Activity
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      902    84.6%
2      164    15.4%
 

Evolution
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
3      505    47.4%
2      484    45.4%
1      77     7.2%
 
Previous_24_hour_flare_activity_code
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      1028   96.4%
3      25     2.3%
2      13     1.2%
 
Historically-complex
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      635    59.6%
2      431    40.4%
 
Did_region_become_historically_complex
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
2      933    87.5%
1      133    12.5%
 

Area
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 2
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      1039   97.5%
2      27     2.5%
 

Area_of_the_largest_spot
Boolean

CONSTANT
REJECTED

Distinct count: 1
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
1      1066   100.0%
 
C-class_flares_production_by_this_region
Real number (ℝ≥0)

ZEROS

Distinct count: 8
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.300187617260788
Minimum: 0
Maximum: 8
Zeros: 884
Zeros (%): 82.9%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.8357841683
Coefficient of variation (CV): 2.784206011
Kurtosis: 19.6165169
Mean: 0.3001876173
Median Absolute Deviation (MAD): 0
Skewness: 3.952904981
Sum: 320
Variance: 0.698535176

Histogram with fixed size bins (bins=10)

Values ordered by count (descending)

Value  Count  Frequency (%)
0      884    82.9%
1      112    10.5%
2      33     3.1%
3      20     1.9%
4      9      0.8%
5      4      0.4%
6      3      0.3%
8      1      0.1%

Values in ascending order

Value  Count  Frequency (%)
0      884    82.9%
1      112    10.5%
2      33     3.1%
3      20     1.9%
4      9      0.8%
5      4      0.4%
6      3      0.3%
8      1      0.1%

Values in descending order

Value  Count  Frequency (%)
8      1      0.1%
6      3      0.3%
5      4      0.4%
4      9      0.8%
3      20     1.9%
2      33     3.1%
1      112    10.5%
0      884    82.9%
 
M-class_flares_production_by_this_region
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.04690431519699812
Minimum: 0
Maximum: 5
Zeros: 1030
Zeros (%): 96.6%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 0
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.3028112169
Coefficient of variation (CV): 6.455935145
Kurtosis: 116.685621
Mean: 0.0469043152
Median Absolute Deviation (MAD): 0
Skewness: 9.559345482
Sum: 50
Variance: 0.09169463309

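Histogram with fixed size bins (bins=10)

Values ordered by count (descending)

Value  Count  Frequency (%)
0      1030   96.6%
1      29     2.7%
2      3      0.3%
3      2      0.2%
5      1      0.1%
4      1      0.1%

Values in ascending order

Value  Count  Frequency (%)
0      1030   96.6%
1      29     2.7%
2      3      0.3%
3      2      0.2%
4      1      0.1%
5      1      0.1%

Values in descending order

Value  Count  Frequency (%)
5      1      0.1%
4      1      0.1%
3      2      0.2%
2      3      0.3%
1      29     2.7%
0      1030   96.6%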
 
X-class_flares_production_by_this_region
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 8.5 KiB

Value  Count  Frequency (%)
0      1061   99.5%
1      4      0.4%
2      1      0.1%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
0      1061   99.5%
1      4      0.4%
2      1      0.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1066   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
0      1061   99.5%
1      4      0.4%
2      1      0.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  1066   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
0      1061   99.5%
1      4      0.4%
2      1      0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1066   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
0      1061   99.5%
1      4      0.4%
2      1      0.1%
 

target
Real number (ℝ≥0)

ZEROS

Distinct count: 6
Unique (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.627579737335835
Minimum: 0
Maximum: 5
Zeros: 147
Zeros (%): 13.8%
Memory size: 8.5 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 2
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.844965019
Coefficient of variation (CV): 0.7021537701
Kurtosis: -1.44657903
Mean: 2.627579737
Median Absolute Deviation (MAD): 1
Skewness: 0.134861704
Sum: 2801
Variance: 3.403895921

Histogram with fixed size bins (bins=10)

Values ordered by count (descending)

Value  Count  Frequency (%)
5      331    31.1%
2      239    22.4%
1      211    19.8%
0      147    13.8%
3      95     8.9%
4      43     4.0%

Values in ascending order

Value  Count  Frequency (%)
0      147    13.8%
1      211    19.8%
2      239    22.4%
3      95     8.9%
4      43     4.0%
5      331    31.1%

Values in descending order

Value  Count  Frequency (%)
5      331    31.1%
4      43     4.0%
3      95     8.9%
2      239    22.4%
1      211    19.8%
0      147    13.8%
 

Interactions

[Figures: 16 interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
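
The definition above (covariance divided by the product of the standard deviations) can be sketched in a few lines of plain Python. The function name `pearson_r` and the sample data are illustrative assumptions; the report itself does not show how it computes r.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)


# Made-up data, not taken from the profiled dataset.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(x, y)

# Shifting and rescaling x leaves r unchanged (location and scale invariance).
r_rescaled = pearson_r([10.0 * a + 3.0 for a in x], y)

print(r, r_rescaled)
```

The function assumes both inputs have the same length and non-zero variance; a constant column such as Area_of_the_largest_spot would cause a division by zero.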

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
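
As a sketch of that recipe, the code below ranks each variable (ties share the average of their positions, which is an assumption about tie handling) and then applies the covariance-over-standard-deviations formula to the ranks. The helper names and data are illustrative.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Made-up data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]

rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

On this example ρ reaches its maximum because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.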

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
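
The pair-counting description above corresponds to the simplest variant of the coefficient (often called tau-a). A minimal sketch follows; it is an illustration of the formula as stated, not necessarily what the report computes, since library implementations commonly apply a tie correction (tau-b). The function name and data are assumptions.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables move in the same direction
        elif direction < 0:
            discordant += 1   # the variables move in opposite directions
        # Pairs tied in x or y count as neither concordant nor discordant.
    total_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / total_pairs


# Made-up data: 7 concordant and 3 discordant pairs out of 10.
x = [1, 2, 3, 4, 5]
y = [3, 1, 2, 5, 4]

tau = kendall_tau_a(x, y)
print(tau)
```

With heavily tied columns like the ones in this dataset, tau-a cannot reach ±1, which is one reason tie-corrected variants are usually preferred in practice.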

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
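
A sketch of a bias-corrected Cramér's V in the form commonly attributed to Bergsma (2013) is given below, computed from a contingency table in plain Python. The function name and the example tables are assumptions, and the report's own implementation may differ in detail.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows.

    Assumes at least two rows and two columns, each with at least one observation.
    """
    n = sum(sum(row) for row in table)
    r = len(table)        # number of rows
    k = len(table[0])     # number of columns
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))


# Made-up tables: one with independent margins, one with perfect association.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
print(v_independent, v_perfect)
```

The correction clamps the adjusted φ² at zero, so weakly associated variables report exactly 0 instead of a small positive value produced by sampling noise.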

Missing values

[Figures: 2 missing-values plots]

Sample

First rows

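(index), largest_spot_size, spot_distribution, Activity, Evolution, Previous_24_hour_flare_activity_code, Historically-complex, Did_region_become_historically_complex, Area, Area_of_the_largest_spot, C-class_flares_production_by_this_region, M-class_flares_production_by_this_region, X-class_flares_production_by_this_region, target
0, 0, 3, 1, 3, 1, 1, 1, 1, 1, 0, 0, 0, 5
1, 3, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 2
2, 4, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1
3, 3, 3, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0, 5
4, 4, 3, 1, 1, 1, 1, 2, 1, 1, 0, 0, 0, 5
5, 0, 2, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 1
6, 5, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 0
7, 0, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1
8, 0, 2, 1, 2, 1, 1, 2, 1, 1, 1, 0, 0, 1
9, 5, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 0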

Last rows

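(index), largest_spot_size, spot_distribution, Activity, Evolution, Previous_24_hour_flare_activity_code, Historically-complex, Did_region_become_historically_complex, Area, Area_of_the_largest_spot, C-class_flares_production_by_this_region, M-class_flares_production_by_this_region, X-class_flares_production_by_this_region, target
1056, 4, 3, 1, 2, 1, 1, 2, 1, 1, 1, 0, 0, 5
1057, 4, 2, 1, 2, 1, 2, 2, 1, 1, 0, 0, 0, 2
1058, 4, 3, 1, 2, 2, 2, 2, 1, 1, 0, 0, 0, 5
1059, 4, 3, 2, 2, 1, 1, 2, 1, 1, 0, 0, 0, 5
1060, 3, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 2
1061, 4, 3, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0, 5
1062, 4, 3, 2, 2, 1, 1, 2, 1, 1, 0, 0, 0, 5
1063, 4, 2, 1, 2, 1, 2, 2, 1, 1, 0, 0, 0, 1
1064, 3, 3, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 5
1065, 5, 2, 1, 1, 1, 1, 2, 1, 1, 0, 0, 0, 0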

Duplicate rows

Most frequent

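(index), largest_spot_size, spot_distribution, Activity, Evolution, Previous_24_hour_flare_activity_code, Historically-complex, Did_region_become_historically_complex, Area, Area_of_the_largest_spot, C-class_flares_production_by_this_region, M-class_flares_production_by_this_region, X-class_flares_production_by_this_region, target, count
130, 5, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 0, 67
113, 4, 3, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 5, 44
63, 3, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1, 39
111, 4, 3, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0, 5, 34
128, 5, 2, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 0, 26
117, 4, 3, 1, 3, 1, 1, 1, 1, 1, 0, 0, 0, 5, 25
58, 3, 2, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 1, 23
115, 4, 3, 1, 2, 1, 2, 1, 1, 1, 0, 0, 0, 5, 23
97, 4, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1, 21
64, 3, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 2, 20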
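
A table like the one above can be approximated by counting identical rows. The sketch below does this with the standard library on the ten rows listed under "First rows". Treating every occurrence after the first as a duplicate is an assumption; the report does not state which definition it uses for its duplicate count.

```python
from collections import Counter

# The ten "First rows" of the sample, one tuple of 13 column values per row.
rows = [
    (0, 3, 1, 3, 1, 1, 1, 1, 1, 0, 0, 0, 5),
    (3, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 2),
    (4, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1),
    (3, 3, 1, 2, 1, 1, 1, 1, 1, 0, 0, 0, 5),
    (4, 3, 1, 1, 1, 1, 2, 1, 1, 0, 0, 0, 5),
    (0, 2, 1, 2, 1, 1, 2, 1, 1, 0, 0, 0, 1),
    (5, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 0),
    (0, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 1),
    (0, 2, 1, 2, 1, 1, 2, 1, 1, 1, 0, 0, 1),
    (5, 2, 1, 3, 1, 1, 2, 1, 1, 0, 0, 0, 0),
]

counts = Counter(rows)

# Rows that occur more than once, most frequent first.
repeated = [(row, n) for row, n in counts.most_common() if n > 1]

# Assumed definition of "duplicate rows": every occurrence after the first.
n_duplicates = sum(n - 1 for n in counts.values())
pct_duplicates = 100.0 * n_duplicates / len(rows)

for row, n in repeated:
    print(row, "count =", n)
print(f"{n_duplicates} ({pct_duplicates:.1f}%) duplicate rows")
```

In this small excerpt only rows 6 and 9 coincide; on the full dataset the same counting would be applied to all 1066 rows.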