Overview

Dataset statistics

Number of variables: 11
Number of observations: 1000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 720
Duplicate rows (%): 72.0%
Total size in memory: 86.1 KiB
Average record size in memory: 88.1 B

Variable types

Categorical: 11

Reproduction

Analysis started: 2020-08-24 23:43:44.417970
Analysis finished: 2020-08-24 23:43:46.033439
Duration: 1.62 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Duplicates: dataset has 720 (72.0%) duplicate rows.

Variables

In1
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        438      43.8%
1        284      28.4%
3        278      27.8%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        438      14.6%
1        284      9.5%
3        278      9.3%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
2        438      21.9%
1        284      14.2%
3        278      13.9%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        438      14.6%
1        284      9.5%
3        278      9.3%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        438      14.6%
1        284      9.5%
3        278      9.3%

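The character and category counts in a Unicode overview like the one above can be approximated with Python's standard library. The sketch below assumes the column values are stored as three-character strings such as "2.0" (consistent with the reported length of 3 and the "." and "0" counts); the function name `unicode_profile` is hypothetical and is not part of pandas-profiling. Scripts and blocks are not covered, because the standard `unicodedata` module does not expose them.

```python
import unicodedata
from collections import Counter


def unicode_profile(values):
    """Count characters and Unicode general categories over a list of strings."""
    # Character frequencies across every string in the column.
    chars = Counter(ch for value in values for ch in value)
    # Aggregate the character counts by general category (e.g. 'Nd', 'Po').
    categories = Counter()
    for ch, count in chars.items():
        categories[unicodedata.category(ch)] += count
    return chars, categories


# Assumed string representation of In1, using the value counts reported above.
in1 = ["2.0"] * 438 + ["1.0"] * 284 + ["3.0"] * 278
chars, categories = unicode_profile(in1)
print(chars.most_common())
print(categories.most_common())
```

Here `Nd` is the general category "Decimal Number" and `Po` is "Other Punctuation", which are the two category names shown in the report.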
In2
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
3        416      41.6%
2        357      35.7%
1        227      22.7%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        416      13.9%
2        357      11.9%
1        227      7.6%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
3        416      20.8%
2        357      17.8%
1        227      11.3%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        416      13.9%
2        357      11.9%
1        227      7.6%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        416      13.9%
2        357      11.9%
1        227      7.6%

In3
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        306      30.6%
1        306      30.6%
3        242      24.2%
4        146      14.6%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 6
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        306      10.2%
2        306      10.2%
3        242      8.1%
4        146      4.9%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
1        306      15.3%
2        306      15.3%
3        242      12.1%
4        146      7.3%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        306      10.2%
2        306      10.2%
3        242      8.1%
4        146      4.9%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        306      10.2%
2        306      10.2%
3        242      8.1%
4        146      4.9%

In4
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
1        349      34.9%
3        327      32.7%
2        324      32.4%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        349      11.6%
3        327      10.9%
2        324      10.8%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
1        349      17.4%
3        327      16.4%
2        324      16.2%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        349      11.6%
3        327      10.9%
2        324      10.8%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        349      11.6%
3        327      10.9%
2        324      10.8%

In5
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        342      34.2%
3        246      24.6%
1        241      24.1%
4        171      17.1%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 6
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        342      11.4%
3        246      8.2%
1        241      8.0%
4        171      5.7%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
2        342      17.1%
3        246      12.3%
1        241      12.0%
4        171      8.6%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        342      11.4%
3        246      8.2%
1        241      8.0%
4        171      5.7%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        342      11.4%
3        246      8.2%
1        241      8.0%
4        171      5.7%

In6
Categorical

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
1        545      54.5%
2        455      45.5%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        545      18.2%
2        455      15.2%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
1        545      27.3%
2        455      22.8%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        545      18.2%
2        455      15.2%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
1        545      18.2%
2        455      15.2%

In7
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        448      44.8%
1        304      30.4%
3        248      24.8%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        448      14.9%
1        304      10.1%
3        248      8.3%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
2        448      22.4%
1        304      15.2%
3        248      12.4%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        448      14.9%
1        304      10.1%
3        248      8.3%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        448      14.9%
1        304      10.1%
3        248      8.3%

In8
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        454      45.4%
3        290      29.0%
1        256      25.6%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        454      15.1%
3        290      9.7%
1        256      8.5%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
2        454      22.7%
3        290      14.5%
1        256      12.8%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        454      15.1%
3        290      9.7%
1        256      8.5%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        454      15.1%
3        290      9.7%
1        256      8.5%

In9
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
2        365      36.5%
3        349      34.9%
1        286      28.6%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        365      12.2%
3        349      11.6%
1        286      9.5%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
2        365      18.2%
3        349      17.4%
1        286      14.3%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        365      12.2%
3        349      11.6%
1        286      9.5%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
2        365      12.2%
3        349      11.6%
1        286      9.5%

In10
Categorical

Distinct count: 3
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
3        396      39.6%
2        336      33.6%
1        268      26.8%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 5
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        396      13.2%
2        336      11.2%
1        268      8.9%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
3        396      19.8%
2        336      16.8%
1        268      13.4%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        396      13.2%
2        336      11.2%
1        268      8.9%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
3        396      13.2%
2        336      11.2%
1        268      8.9%

target
Categorical

Distinct count: 4
Unique (%): 0.4%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Value    Count    Frequency (%)
4        399      39.9%
3        352      35.2%
5        217      21.7%
2        32       3.2%

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Overview of Unicode Properties

Unique unicode characters: 6
Unique unicode categories: 2
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
4        399      13.3%
3        352      11.7%
5        217      7.2%
2        32       1.1%

Most occurring categories

Value                Count    Frequency (%)
Decimal Number       2000     66.7%
Other Punctuation    1000     33.3%

Most frequent Decimal Number characters

Value    Count    Frequency (%)
0        1000     50.0%
4        399      20.0%
3        352      17.6%
5        217      10.8%
2        32       1.6%

Most frequent Other Punctuation characters

Value    Count    Frequency (%)
.        1000     100.0%

Most occurring scripts

Value     Count    Frequency (%)
Common    3000     100.0%

Most frequent Common characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
4        399      13.3%
3        352      11.7%
5        217      7.2%
2        32       1.1%

Most occurring blocks

Value    Count    Frequency (%)
ASCII    3000     100.0%

Most frequent ASCII characters

Value    Count    Frequency (%)
.        1000     33.3%
0        1000     33.3%
4        399      13.3%
3        352      11.7%
5        217      7.2%
2        32       1.1%

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, which implies that for a linear function the angle to the x-axis (the slope) affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
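The calculation described above can be sketched in plain Python. This is a minimal illustration, not the code the report itself runs; the function name `pearson_r` and the sample data are hypothetical. The 1/n factors of the covariance and the standard deviations cancel, so they are omitted.

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of products of deviations; the common 1/n factor cancels in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [2 * v + 1 for v in x]))   # increasing line
print(pearson_r(x, [10 * v - 3 for v in x]))  # steeper increasing line
print(pearson_r(x, [-v for v in x]))          # decreasing line
```

The first two calls differ only in location and scale and give the same r, while the third flips the sign.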

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
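As a rough sketch of that recipe, the code below replaces each value by its rank and then applies the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their ranks is an assumption made here (a common convention), and the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the rank variables of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # nonlinear but monotonic
```

Because the cubic relationship is strictly increasing, the ranks of both variables coincide and ρ reaches its maximum even though the relationship is not linear.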

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
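The pair-counting definition can be written directly as a short sketch. It implements the simple form described above (often called tau-a), in which tied pairs count as neither concordant nor discordant. Libraries often use tie-adjusted variants such as tau-b, so values for heavily tied data like this dataset may differ; the function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        direction = (x1 - x2) * (y1 - y2)
        if direction > 0:      # both variables move in the same direction
            concordant += 1
        elif direction < 0:    # the variables move in opposite directions
            discordant += 1
        # direction == 0 means a tie in x or y; it is counted in neither group
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example, five of the six pairs are concordant and one is discordant, so τ is (5 - 1) / 6.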

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
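A bias-corrected Cramér's V can be sketched in plain Python as follows. The formulas are the commonly cited form of Bergsma's correction and are an assumption here; the report does not show its own implementation. The function name `cramers_v_corrected` and the toy data are hypothetical.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    joint = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    r, k = len(rows), len(cols)
    # Pearson chi-squared statistic of the r x k contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Corrected phi-squared and corrected table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return math.sqrt(phi2_corr / denom)


labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))                # identical labels
print(cramers_v_corrected(["a", "a", "b", "b"] * 25,
                          ["c", "d", "c", "d"] * 25))     # independent labels
```

The first call uses identical label sequences, the second uses a table whose observed counts equal the expected counts under independence.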

Missing values

[Figures: two Matplotlib plots of missing values; images not reproduced]

Sample

First rows

     In1  In2  In3  In4  In5  In6  In7  In8  In9  In10  target
0    2.0  1.0  1.0  2.0  1.0  1.0  2.0  2.0  1.0  1.0   2.0
1    1.0  2.0  3.0  3.0  2.0  1.0  3.0  1.0  3.0  3.0   5.0
2    3.0  3.0  2.0  1.0  2.0  2.0  3.0  3.0  3.0  3.0   5.0
3    2.0  3.0  4.0  2.0  4.0  2.0  2.0  2.0  1.0  2.0   5.0
4    1.0  2.0  1.0  2.0  2.0  2.0  2.0  1.0  1.0  2.0   3.0
5    2.0  2.0  3.0  2.0  4.0  2.0  2.0  2.0  2.0  1.0   4.0
6    2.0  2.0  2.0  2.0  2.0  1.0  2.0  2.0  2.0  1.0   4.0
7    1.0  3.0  3.0  1.0  4.0  2.0  1.0  1.0  2.0  3.0   4.0
8    2.0  2.0  1.0  1.0  3.0  1.0  1.0  3.0  1.0  1.0   3.0
9    3.0  3.0  3.0  2.0  1.0  1.0  2.0  2.0  3.0  2.0   4.0

Last rows

     In1  In2  In3  In4  In5  In6  In7  In8  In9  In10  target
990  2.0  2.0  4.0  2.0  2.0  2.0  2.0  2.0  3.0  3.0   5.0
991  3.0  3.0  3.0  2.0  1.0  1.0  3.0  3.0  1.0  1.0   4.0
992  2.0  1.0  3.0  2.0  3.0  1.0  2.0  1.0  3.0  2.0   4.0
993  1.0  3.0  3.0  2.0  2.0  2.0  2.0  1.0  3.0  1.0   4.0
994  2.0  3.0  2.0  3.0  3.0  2.0  1.0  1.0  2.0  1.0   3.0
995  3.0  2.0  1.0  3.0  3.0  1.0  1.0  3.0  3.0  2.0   3.0
996  2.0  3.0  2.0  2.0  2.0  1.0  3.0  3.0  1.0  2.0   4.0
997  1.0  2.0  1.0  3.0  1.0  1.0  1.0  1.0  3.0  2.0   4.0
998  2.0  3.0  3.0  2.0  3.0  1.0  3.0  2.0  2.0  3.0   5.0
999  2.0  3.0  3.0  2.0  2.0  2.0  3.0  2.0  3.0  3.0   5.0

Duplicate rows

Most frequent

     In1  In2  In3  In4  In5  In6  In7  In8  In9  In10  target  count
6    1.0  1.0  2.0  2.0  1.0  2.0  1.0  2.0  2.0  2.0   3.0     11
87   2.0  2.0  1.0  1.0  3.0  1.0  1.0  3.0  1.0  1.0   3.0     11
1    1.0  1.0  1.0  2.0  1.0  1.0  3.0  3.0  1.0  1.0   3.0     10
129  2.0  3.0  2.0  2.0  2.0  1.0  3.0  3.0  1.0  2.0   4.0     10
132  2.0  3.0  3.0  1.0  1.0  2.0  1.0  3.0  2.0  2.0   4.0     10
203  3.0  3.0  3.0  2.0  1.0  1.0  2.0  2.0  3.0  2.0   4.0     10
17   1.0  2.0  1.0  3.0  1.0  1.0  1.0  1.0  3.0  2.0   3.0     9
19   1.0  2.0  2.0  1.0  1.0  2.0  1.0  2.0  1.0  2.0   3.0     9
34   1.0  3.0  1.0  3.0  1.0  1.0  1.0  3.0  2.0  2.0   3.0     9
116  2.0  2.0  4.0  3.0  3.0  2.0  3.0  2.0  3.0  3.0   5.0     9
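A "most frequent duplicate rows" listing like the one above can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below is an illustration only: how the report defines its duplicate-row total is not stated in this text, and the function name `duplicate_row_counts` and the toy rows are hypothetical.

```python
from collections import Counter


def duplicate_row_counts(rows):
    """Return (row, count) pairs for rows seen more than once, most frequent first."""
    counts = Counter(tuple(row) for row in rows)
    duplicates = [(row, count) for row, count in counts.items() if count > 1]
    duplicates.sort(key=lambda item: item[1], reverse=True)
    return duplicates


# Toy rows for illustration, not taken from the dataset.
rows = [
    ["1.0", "2.0", "3.0"],
    ["2.0", "2.0", "4.0"],
    ["1.0", "2.0", "3.0"],
    ["1.0", "2.0", "3.0"],
    ["2.0", "2.0", "4.0"],
    ["3.0", "1.0", "5.0"],
]
for row, count in duplicate_row_counts(rows):
    print(row, count)
```

Rows are converted to tuples so they can be used as dictionary keys, and rows that occur only once are left out of the result.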