Dataset statistics
Statistic | Value |
---|---|
Number of variables | 11 |
Number of observations | 40768 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 3.4 MiB |
Average record size in memory | 88.0 B |
Variable types
Type | Count |
---|---|
CAT | 10 |
NUM | 1 |
Reproduction
Field | Value |
---|---|
Analysis started | 2020-08-24 23:54:55.574734 |
Analysis finished | 2020-08-24 23:54:59.317724 |
Duration | 3.74 seconds |
Version | pandas-profiling v2.8.0 |
Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Download configuration | config.yaml |
x1
Categorical
Statistic | Value |
---|---|
Distinct count | 2 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 20549 | 50.4% | |
-1 | 20219 | 49.6% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.495952708 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
1 | 40768 | 28.6% | |
. | 40768 | 28.6% | |
0 | 40768 | 28.6% | |
- | 20219 | 14.2% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 57.2% | |
Other Punctuation | 40768 | 28.6% | |
Dash Punctuation | 20219 | 14.2% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
1 | 40768 | 50.0% | |
0 | 40768 | 50.0% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 20219 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 142523 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
1 | 40768 | 28.6% | |
. | 40768 | 28.6% | |
0 | 40768 | 28.6% | |
- | 20219 | 14.2% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 142523 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
1 | 40768 | 28.6% | |
. | 40768 | 28.6% | |
0 | 40768 | 28.6% | |
- | 20219 | 14.2% |
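The length and character tables for x1 follow directly from the value counts once each value is treated as a string. A minimal sketch, assuming the values were rendered as "1.0" and "-1.0" (an assumption suggested by the sample rows and the character counts, not something the report states):

```python
from collections import Counter

# Value counts for x1 as reported above. The string forms "1.0" and "-1.0"
# are an assumption about how the values were rendered before profiling.
value_counts = {"1.0": 20549, "-1.0": 20219}

n_rows = sum(value_counts.values())
total_chars = sum(len(value) * count for value, count in value_counts.items())
mean_length = total_chars / n_rows

# Count every character, weighted by how often its value occurs.
char_counts = Counter()
for value, count in value_counts.items():
    for ch in value:
        char_counts[ch] += count

print(n_rows, total_chars, round(mean_length, 9))
for ch, count in char_counts.most_common():
    print(ch, count, f"{100 * count / total_chars:.1f}%")
```

Under that assumption the totals agree with the report: 142523 characters overall and a mean length just under 3.5, because roughly half the values carry a leading minus sign.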
x2
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
0 | 13719 | 33.7% | |
-1 | 13639 | 33.5% | |
1 | 13410 | 32.9% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.334551609 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54487 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27049 | 19.9% | |
- | 13639 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13639 | 10.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54487 | 66.8% | |
1 | 27049 | 33.2% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13639 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135943 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54487 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27049 | 19.9% | |
- | 13639 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135943 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54487 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27049 | 19.9% | |
- | 13639 | 10.0% |
x3
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13667 | 33.5% | |
-1 | 13636 | 33.4% | |
0 | 13465 | 33.0% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.334478022 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54233 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27303 | 20.1% | |
- | 13636 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13636 | 10.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54233 | 66.5% | |
1 | 27303 | 33.5% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13636 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135940 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54233 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27303 | 20.1% | |
- | 13636 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135940 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54233 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27303 | 20.1% | |
- | 13636 | 10.0% |
x4
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13670 | 33.5% | |
0 | 13644 | 33.5% | |
-1 | 13454 | 33.0% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.330013736 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54412 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27124 | 20.0% | |
- | 13454 | 9.9% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.1% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13454 | 9.9% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54412 | 66.7% | |
1 | 27124 | 33.3% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13454 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135758 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54412 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27124 | 20.0% | |
- | 13454 | 9.9% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135758 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54412 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27124 | 20.0% | |
- | 13454 | 9.9% |
x5
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13686 | 33.6% | |
-1 | 13566 | 33.3% | |
0 | 13516 | 33.2% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.332760989 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54284 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27252 | 20.1% | |
- | 13566 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13566 | 10.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13566 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54284 | 66.6% | |
1 | 27252 | 33.4% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135870 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54284 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27252 | 20.1% | |
- | 13566 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135870 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54284 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27252 | 20.1% | |
- | 13566 | 10.0% |
x6
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13627 | 33.4% | |
-1 | 13618 | 33.4% | |
0 | 13523 | 33.2% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.334036499 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54291 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27245 | 20.0% | |
- | 13618 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13618 | 10.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54291 | 66.6% | |
1 | 27245 | 33.4% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13618 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135922 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54291 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27245 | 20.0% | |
- | 13618 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135922 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54291 | 39.9% | |
. | 40768 | 30.0% | |
1 | 27245 | 20.0% | |
- | 13618 | 10.0% |
x7
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
0 | 13664 | 33.5% | |
1 | 13586 | 33.3% | |
-1 | 13518 | 33.2% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.331583595 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54432 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27104 | 20.0% | |
- | 13518 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13518 | 10.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54432 | 66.8% | |
1 | 27104 | 33.2% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13518 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135822 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54432 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27104 | 20.0% | |
- | 13518 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135822 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54432 | 40.1% | |
. | 40768 | 30.0% | |
1 | 27104 | 20.0% | |
- | 13518 | 10.0% |
x8
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
-1 | 13627 | 33.4% | |
0 | 13574 | 33.3% | |
1 | 13567 | 33.3% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.334257261 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54342 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27194 | 20.0% | |
- | 13627 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13627 | 10.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54342 | 66.6% | |
1 | 27194 | 33.4% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13627 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135931 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54342 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27194 | 20.0% | |
- | 13627 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135931 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54342 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27194 | 20.0% | |
- | 13627 | 10.0% |
x9
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13647 | 33.5% | |
0 | 13573 | 33.3% | |
-1 | 13548 | 33.2% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.332319466 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54341 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27195 | 20.0% | |
- | 13548 | 10.0% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13548 | 10.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13548 | 100.0% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54341 | 66.6% | |
1 | 27195 | 33.4% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135852 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54341 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27195 | 20.0% | |
- | 13548 | 10.0% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135852 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54341 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27195 | 20.0% | |
- | 13548 | 10.0% |
x10
Categorical
Statistic | Value |
---|---|
Distinct count | 3 |
Unique (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 318.6 KiB |
Value | Count | Frequency (%) | |
1 | 13720 | 33.7% | |
0 | 13562 | 33.3% | |
-1 | 13486 | 33.1% |
Length
Statistic | Value |
---|---|
Max length | 4 |
Median length | 3 |
Mean length | 3.330798666 |
Min length | 3 |
Most occurring characters
Value | Count | Frequency (%) | |
0 | 54330 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27206 | 20.0% | |
- | 13486 | 9.9% |
Most occurring categories
Value | Count | Frequency (%) | |
Decimal Number | 81536 | 60.0% | |
Other Punctuation | 40768 | 30.0% | |
Dash Punctuation | 13486 | 9.9% |
Most frequent Decimal Number characters
Value | Count | Frequency (%) | |
0 | 54330 | 66.6% | |
1 | 27206 | 33.4% |
Most frequent Other Punctuation characters
Value | Count | Frequency (%) | |
. | 40768 | 100.0% |
Most frequent Dash Punctuation characters
Value | Count | Frequency (%) | |
- | 13486 | 100.0% |
Most occurring scripts
Value | Count | Frequency (%) | |
Common | 135790 | 100.0% |
Most frequent Common characters
Value | Count | Frequency (%) | |
0 | 54330 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27206 | 20.0% | |
- | 13486 | 9.9% |
Most occurring blocks
Value | Count | Frequency (%) | |
ASCII | 135790 | 100.0% |
Most frequent ASCII characters
Value | Count | Frequency (%) | |
0 | 54330 | 40.0% | |
. | 40768 | 30.0% | |
1 | 27206 | 20.0% | |
- | 13486 | 9.9% |
target
Real number (ℝ)
Statistic | Value |
---|---|
Distinct count | 40368 |
Unique (%) | 99.0% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | 0.009558489707427686 |
Minimum | -12.694299697875975 |
Maximum | 12.20259952545166 |
Zeros | 0 |
Zeros (%) | 0.0% |
Memory size | 318.6 KiB |
Quantile statistics
Statistic | Value |
---|---|
Minimum | -12.6942997 |
5-th percentile | -7.283281398 |
Q1 | -3.17632246 |
median | 0.02114815079 |
Q3 | 3.213844955 |
95-th percentile | 7.318846703 |
Maximum | 12.20259953 |
Range | 24.89689922 |
Interquartile range (IQR) | 6.390167415 |
Descriptive statistics
Statistic | Value |
---|---|
Standard deviation | 4.39261068 |
Coefficient of variation (CV) | 459.5507046 |
Kurtosis | -0.606878549 |
Mean | 0.009558489707 |
Median Absolute Deviation (MAD) | 3.195055008 |
Skewness | -0.007671203875 |
Sum | 389.6805084 |
Variance | 19.29502859 |
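Several of the derived figures for target can be recomputed from the other reported values, which is a quick way to check how they relate. A minimal sketch using only numbers copied from the tables above; the variable names are illustrative:

```python
import math

# Figures copied from the target variable's tables above.
n = 40768
mean = 0.009558489707427686
std = 4.39261068
minimum, maximum = -12.694299697875975, 12.20259952545166
q1, q3 = -3.17632246, 3.213844955

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance
total = mean * n                 # Sum

print(f"range={value_range:.8f} iqr={iqr:.9f} cv={cv:.4f}")
print(f"variance={variance:.6f} sum={total:.4f}")
```

The very large CV reflects a mean close to zero rather than unusual dispersion, so it is not an informative statistic for this variable.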
Histogram with fixed size bins (bins=10)
Common values
Value | Count | Frequency (%) | |
1.034989953 | 3 | < 0.1% | |
2.045269966 | 3 | < 0.1% | |
1.817289948 | 3 | < 0.1% | |
-1.49969995 | 2 | < 0.1% | |
-1.306579947 | 2 | < 0.1% | |
-0.103726998 | 2 | < 0.1% | |
4.469629765 | 2 | < 0.1% | |
-1.940089941 | 2 | < 0.1% | |
4.725840092 | 2 | < 0.1% | |
-2.173520088 | 2 | < 0.1% | |
-1.726850033 | 2 | < 0.1% | |
4.817200184 | 2 | < 0.1% | |
-3.051769972 | 2 | < 0.1% | |
2.875610113 | 2 | < 0.1% | |
1.565369964 | 2 | < 0.1% | |
10.27729988 | 2 | < 0.1% | |
-3.499890089 | 2 | < 0.1% | |
-3.633100033 | 2 | < 0.1% | |
-2.329129934 | 2 | < 0.1% | |
-5.498439789 | 2 | < 0.1% | |
-2.765619993 | 2 | < 0.1% | |
-4.102519989 | 2 | < 0.1% | |
-5.358990192 | 2 | < 0.1% | |
2.92276001 | 2 | < 0.1% | |
4.221879959 | 2 | < 0.1% | |
Other values (40343) | 40715 | 99.9% |
Minimum 10 values
Value | Count | Frequency (%) | |
-12.6942997 | 1 | < 0.1% | |
-12.04590034 | 1 | < 0.1% | |
-11.57479954 | 1 | < 0.1% | |
-11.55790043 | 1 | < 0.1% | |
-11.46090031 | 1 | < 0.1% | |
-11.31770039 | 1 | < 0.1% | |
-11.26780033 | 1 | < 0.1% | |
-11.24209976 | 1 | < 0.1% | |
-11.22679996 | 1 | < 0.1% | |
-11.22399998 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) | |
12.20259953 | 1 | < 0.1% | |
11.90380001 | 1 | < 0.1% | |
11.88949966 | 1 | < 0.1% | |
11.41959953 | 1 | < 0.1% | |
11.40680027 | 1 | < 0.1% | |
11.27130032 | 1 | < 0.1% | |
11.23709965 | 1 | < 0.1% | |
11.23579979 | 1 | < 0.1% | |
11.13560009 | 1 | < 0.1% | |
11.08220005 | 1 | < 0.1% |
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
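The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the report's own implementation; the function name `pearson_r` and the sample data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator.
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
# Shifting and rescaling either variable (positive scale) leaves r unchanged.
r_shifted = pearson_r([10 * a - 3 for a in x], [0.5 * b + 7 for b in y])
print(r, r_shifted)
```

The sketch does not guard against a constant input, for which the denominator is zero and r is undefined.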
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at catching nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
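In other words, ρ is Pearson's r applied to ranks. A minimal sketch in plain Python, assuming tied values receive the average of their rank positions; the function names and sample data are hypothetical and this is not the report's own code.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [a ** 3 for a in x]  # monotonic but nonlinear
print(spearman_rho(x, y), pearson_r(x, y))
```

For the cubic example the relationship is perfectly monotonic, so ρ reaches its maximum while Pearson's r stays below 1.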
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
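The pair-counting definition above corresponds to the variant usually called tau-a. A minimal sketch in plain Python; the function name and sample data are hypothetical, and library implementations commonly apply a tie correction (tau-b) that this sketch omits.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))
```

In the printed example two of the three pairs are concordant and one is discordant, giving (2 - 1) / 3.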
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
Cramér's V (φc)
Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. The report uses the bias-corrected measure proposed by Bergsma in 2013.
First rows
Index | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | target |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 0.0 | 1.0 | 0.0 | -1.0 | 1.0 | 7.73906 |
1 | -1.0 | 0.0 | -1.0 | -1.0 | 1.0 | 1.0 | 1.0 | 0.0 | -1.0 | 0.0 | 3.95676 |
2 | 1.0 | 0.0 | 1.0 | 1.0 | -1.0 | -1.0 | 0.0 | 1.0 | 1.0 | 1.0 | 4.71592 |
3 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | -1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 5.02863 |
4 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 1.0 | 1.0 | -1.0 | -11.57480 |
5 | 1.0 | 1.0 | 1.0 | -1.0 | -1.0 | -1.0 | -1.0 | 1.0 | -1.0 | -1.0 | 6.87817 |
6 | 1.0 | -1.0 | 0.0 | 0.0 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | -1.94303 |
7 | 1.0 | -1.0 | 0.0 | -1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | -1.42592 |
8 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | 1.0 | 0.0 | 1.0 | -1.0 | 1.0 | -3.65451 |
9 | -1.0 | 1.0 | -1.0 | 0.0 | 1.0 | 0.0 | -1.0 | 0.0 | -1.0 | 1.0 | -1.49176 |
Last rows
Index | x1 | x2 | x3 | x4 | x5 | x6 | x7 | x8 | x9 | x10 | target |
---|---|---|---|---|---|---|---|---|---|---|---|
40758 | -1.0 | 1.0 | 1.0 | 1.0 | -1.0 | -1.0 | -1.0 | 0.0 | 0.0 | 0.0 | -7.460250 |
40759 | -1.0 | 0.0 | 0.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 1.0 | 0.0 | 3.244770 |
40760 | -1.0 | 1.0 | 1.0 | 1.0 | 1.0 | -1.0 | 0.0 | 1.0 | 0.0 | 0.0 | -1.015610 |
40761 | -1.0 | 0.0 | -1.0 | -1.0 | -1.0 | 0.0 | 1.0 | -1.0 | 0.0 | -1.0 | -4.294800 |
40762 | -1.0 | 0.0 | -1.0 | 1.0 | -1.0 | 0.0 | 0.0 | -1.0 | -1.0 | 1.0 | -6.565690 |
40763 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | -1.0 | 0.0 | 6.764770 |
40764 | 1.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 | -1.0 | 1.0 | 0.0 | 1.0 | 5.538390 |
40765 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | -1.0 | -1.0 | 0.0 | 3.978300 |
40766 | -1.0 | 0.0 | 0.0 | -1.0 | 1.0 | 0.0 | 1.0 | 0.0 | -1.0 | 0.0 | -0.609818 |
40767 | 1.0 | -1.0 | 0.0 | 0.0 | -1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | -0.813671 |
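The categorical columns x1 to x10 shown in the sample rows are the kind of input to which Cramér's V applies. A minimal sketch of the bias-corrected estimator in the form proposed by Bergsma (2013), written from the published formula rather than taken from the report's implementation; the function name and the example contingency tables are hypothetical.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic from observed and expected counts.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

perfect = [[50, 0], [0, 50]]       # each row category maps to one column
independent = [[25, 25], [25, 25]] # no association at all
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two example tables mark the ends of the scale: perfect association gives a value of 1 (up to floating-point rounding) and an exactly independent table gives 0.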