Overview

Dataset statistics

Number of variables: 10
Number of observations: 1473
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 48
Duplicate rows (%): 3.3%
Total size in memory: 115.2 KiB
Average record size in memory: 80.1 B

Variable types

CAT: 5
BOOL: 3
NUM: 2

Reproduction

Analysis started: 2020-08-25 01:17:21.327115
Analysis finished: 2020-08-25 01:17:23.001420
Duration: 1.67 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml
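
The reported duration can be checked against the two timestamps in this block. The following is a minimal sketch using only the Python standard library; the variable names are illustrative and do not come from the report.

```python
from datetime import datetime

# Timestamps copied from the Reproduction block.
FMT = "%Y-%m-%d %H:%M:%S.%f"
started = datetime.strptime("2020-08-25 01:17:21.327115", FMT)
finished = datetime.strptime("2020-08-25 01:17:23.001420", FMT)

# Elapsed wall-clock time in seconds.
duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```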

Warnings

Duplicates: Dataset has 48 (3.3%) duplicate rows
Zeros: Number_of_children_ever_born has 97 (6.6%) zeros
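
Both warning percentages are plain ratios against the 1473 observations. A small sketch of that arithmetic follows, assuming simple rounding to one decimal place; the helper name `share` is hypothetical.

```python
def share(count: int, total: int) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


n_observations = 1473  # "Number of observations" in the dataset statistics

duplicate_pct = share(48, n_observations)  # duplicate rows
zeros_pct = share(97, n_observations)      # zeros in Number_of_children_ever_born

print(duplicate_pct, zeros_pct)
```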

Variables

Wifes_age
Real number (ℝ≥0)

Distinct count: 34
Unique (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.53835709436524
Minimum: 16.0
Maximum: 49.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.6 KiB

Quantile statistics

Minimum: 16
5-th percentile: 21
Q1: 26
Median: 32
Q3: 39
95-th percentile: 47
Maximum: 49
Range: 33
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 8.227244755
Coefficient of variation (CV): 0.2528475771
Kurtosis: -0.9438944909
Mean: 32.53835709
Median Absolute Deviation (MAD): 6
Skewness: 0.2564492055
Sum: 47929
Variance: 67.68755627
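
Several of the Wifes_age figures are derived from one another, which allows a quick consistency check. The sketch below recomputes them from the reported values; the tolerance is an arbitrary choice for illustration, and the dictionary names are not part of the report.

```python
import math

# Values copied from the Wifes_age statistics.
n_observations = 1473
total = 47929
mean = 32.53835709
std = 8.227244755
minimum, q1, q3, maximum = 16, 26, 39, 49

derived = {
    "mean": total / n_observations,  # sum / number of observations
    "variance": std ** 2,            # square of the standard deviation
    "cv": std / mean,                # standard deviation / mean
    "range": maximum - minimum,
    "iqr": q3 - q1,
}

reported = {
    "mean": 32.53835709,
    "variance": 67.68755627,
    "cv": 0.2528475771,
    "range": 33,
    "iqr": 13,
}

for name, value in derived.items():
    ok = math.isclose(value, reported[name], rel_tol=1e-6)
    print(f"{name}: derived={value:.6f} reported={reported[name]} match={ok}")
```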

Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
25                80     5.4%
26                69     4.7%
32                64     4.3%
30                64     4.3%
28                63     4.3%
35                62     4.2%
24                61     4.1%
27                59     4.0%
29                59     4.0%
22                59     4.0%
36                57     3.9%
33                55     3.7%
37                51     3.5%
34                50     3.4%
21                48     3.3%
31                46     3.1%
23                44     3.0%
38                44     3.0%
47                43     2.9%
45                41     2.8%
42                40     2.7%
44                39     2.6%
43                34     2.3%
39                34     2.3%
41                34     2.3%
Other values (9)  173    11.7%

Minimum 10 values

Value  Count  Frequency (%)
16     3      0.2%
17     8      0.5%
18     7      0.5%
19     18     1.2%
20     28     1.9%
21     48     3.3%
22     59     4.0%
23     44     3.0%
24     61     4.1%
25     80     5.4%

Maximum 10 values

Value  Count  Frequency (%)
49     23     1.6%
48     30     2.0%
47     43     2.9%
46     22     1.5%
45     41     2.8%
44     39     2.6%
43     34     2.3%
42     40     2.7%
41     34     2.3%
40     34     2.3%

Wifes_education
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4      577    39.2%
3      410    27.8%
2      334    22.7%
1      152    10.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4      577    39.2%
3      410    27.8%
2      334    22.7%
1      152    10.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1473   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4      577    39.2%
3      410    27.8%
2      334    22.7%
1      152    10.3%

Most occurring scripts

Value   Count  Frequency (%)
Common  1473   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4      577    39.2%
3      410    27.8%
2      334    22.7%
1      152    10.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1473   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4      577    39.2%
3      410    27.8%
2      334    22.7%
1      152    10.3%

Husbands_education
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4      899    61.0%
3      352    23.9%
2      178    12.1%
1      44     3.0%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4      899    61.0%
3      352    23.9%
2      178    12.1%
1      44     3.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1473   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4      899    61.0%
3      352    23.9%
2      178    12.1%
1      44     3.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  1473   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4      899    61.0%
3      352    23.9%
2      178    12.1%
1      44     3.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1473   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4      899    61.0%
3      352    23.9%
2      178    12.1%
1      44     3.0%

Number_of_children_ever_born
Real number (ℝ≥0)

ZEROS

Distinct count: 15
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2613713509843856
Minimum: 0.0
Maximum: 16.0
Zeros: 97
Zeros (%): 6.6%
Memory size: 11.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 4
95-th percentile: 8
Maximum: 16
Range: 16
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.358548863
Coefficient of variation (CV): 0.7231770347
Kurtosis: 1.529606551
Mean: 3.261371351
Median Absolute Deviation (MAD): 2
Skewness: 1.099013946
Sum: 4804
Variance: 5.562752738

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
2      276    18.7%
1      276    18.7%
3      259    17.6%
4      197    13.4%
5      135    9.2%
0      97     6.6%
6      92     6.2%
7      49     3.3%
8      47     3.2%
9      16     1.1%
11     11     0.7%
10     11     0.7%
12     4      0.3%
13     2      0.1%
16     1      0.1%

Minimum 10 values

Value  Count  Frequency (%)
0      97     6.6%
1      276    18.7%
2      276    18.7%
3      259    17.6%
4      197    13.4%
5      135    9.2%
6      92     6.2%
7      49     3.3%
8      47     3.2%
9      16     1.1%

Maximum 10 values

Value  Count  Frequency (%)
16     1      0.1%
13     2      0.1%
12     4      0.3%
11     11     0.7%
10     11     0.7%
9      16     1.1%
8      47     3.2%
7      49     3.3%
6      92     6.2%
5      135    9.2%

Wifes_religion
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1      1253   85.1%
0      220    14.9%

Wifes_now_working?
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1      1104   74.9%
0      369    25.1%

Husbands_occupation
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
3      585    39.7%
1      436    29.6%
2      425    28.9%
4      27     1.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
3      585    39.7%
1      436    29.6%
2      425    28.9%
4      27     1.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1473   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
3      585    39.7%
1      436    29.6%
2      425    28.9%
4      27     1.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  1473   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
3      585    39.7%
1      436    29.6%
2      425    28.9%
4      27     1.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1473   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
3      585    39.7%
1      436    29.6%
2      425    28.9%
4      27     1.8%

Standard-of-living_index
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4      684    46.4%
3      431    29.3%
2      229    15.5%
1      129    8.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4      684    46.4%
3      431    29.3%
2      229    15.5%
1      129    8.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1473   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4      684    46.4%
3      431    29.3%
2      229    15.5%
1      129    8.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  1473   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4      684    46.4%
3      431    29.3%
2      229    15.5%
1      129    8.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1473   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4      684    46.4%
3      431    29.3%
2      229    15.5%
1      129    8.8%

Media_exposure
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
0      1364   92.6%
1      109    7.4%

target
Categorical

Distinct count: 3
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1      629    42.7%
3      511    34.7%
2      333    22.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1      629    42.7%
3      511    34.7%
2      333    22.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  1473   100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1      629    42.7%
3      511    34.7%
2      333    22.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  1473   100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1      629    42.7%
3      511    34.7%
2      333    22.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  1473   100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1      629    42.7%
3      511    34.7%
2      333    22.6%

Interactions

[Figure: interaction plots (4 panels)]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
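
The calculation described here can be sketched in plain Python. This is an illustrative implementation of the textbook formula, not the code pandas-profiling runs; the function name `pearson_r` and the example data are assumptions.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the standard deviations cancel,
    # so sums of products and sums of squares are enough.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy data: y is an exact linear function of x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0 * v + 1.0 for v in x]
print(pearson_r(x, y))
```

Shifting or positively rescaling either input leaves the result unchanged, which is the invariance property mentioned in the description.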

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
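
A sketch of that procedure in plain Python: rank both variables, then apply the covariance-over-standard-deviations formula to the ranks. Giving tied values the average of their rank positions is an assumption here (it is a common convention), and the function names are illustrative.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation computed on the rank variables of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    ss_x = sum((a - mean_x) ** 2 for a in rx)
    ss_y = sum((b - mean_y) ** 2 for b in ry)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical toy data: a monotonic but non-linear relationship.
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))
```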

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
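
Taken literally, this description corresponds to the variant usually called tau-a, sketched below in plain Python. The function name is illustrative. Tied pairs count as neither concordant nor discordant here; libraries often use a tie-adjusted variant (for example, SciPy's `kendalltau` defaults to tau-b), so results can differ when the data contain ties.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant pairs - discordant pairs) / total number of pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1  # both variables move in the same direction
        elif product < 0:
            discordant += 1  # the variables move in opposite directions
        # product == 0 means a tie in x or y; it counts toward neither total
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical toy data: one swapped pair out of six.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```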

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
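
A sketch of a bias-corrected Cramér's V in plain Python, following the commonly cited form of the Bergsma (2013) correction. Whether pandas-profiling implements it in exactly this way is an assumption; the function name and the example tables are hypothetical.

```python
import math
from typing import Sequence


def cramers_v_corrected(table: Sequence[Sequence[int]]) -> float:
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    if n < 2:
        raise ValueError("need at least two observations")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Correction terms: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table (a single row or column)
    return math.sqrt(phi2_corr / denom)


# Hypothetical 2x2 tables: perfect association and exact independence.
print(cramers_v_corrected([[10, 0], [0, 10]]))
print(cramers_v_corrected([[5, 5], [5, 5]]))
```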

Missing values

[Figure: missing-values plots (2 panels)]

Sample

First rows

Row   Wifes_age  Wifes_education  Husbands_education  Number_of_children_ever_born  Wifes_religion  Wifes_now_working?  Husbands_occupation  Standard-of-living_index  Media_exposure  target
0     24.0       2                3                   3.0                           1               1                   2                    3                         0               1
1     45.0       1                3                   10.0                          1               1                   3                    4                         0               1
2     43.0       2                3                   7.0                           1               1                   3                    4                         0               1
3     42.0       3                2                   9.0                           1               1                   3                    3                         0               1
4     36.0       3                3                   8.0                           1               1                   3                    2                         0               1
5     19.0       4                4                   0.0                           1               1                   3                    3                         0               1
6     38.0       2                3                   6.0                           1               1                   3                    2                         0               1
7     21.0       3                3                   1.0                           1               0                   3                    2                         0               1
8     27.0       2                3                   3.0                           1               1                   3                    4                         0               1
9     45.0       1                1                   8.0                           1               1                   2                    2                         1               1

Last rows

Row   Wifes_age  Wifes_education  Husbands_education  Number_of_children_ever_born  Wifes_religion  Wifes_now_working?  Husbands_occupation  Standard-of-living_index  Media_exposure  target
1463  30.0       1                3                   2.0                           1               1                   3                    4                         0               3
1464  23.0       2                2                   1.0                           1               1                   2                    4                         0               3
1465  25.0       2                4                   3.0                           1               1                   1                    3                         0               3
1466  42.0       2                4                   6.0                           1               1                   2                    4                         0               3
1467  29.0       4                4                   3.0                           1               1                   1                    4                         0               3
1468  33.0       4                4                   2.0                           1               0                   2                    4                         0               3
1469  33.0       4                4                   3.0                           1               1                   1                    4                         0               3
1470  39.0       3                3                   8.0                           1               0                   1                    4                         0               3
1471  33.0       3                3                   4.0                           1               0                   2                    2                         0               3
1472  17.0       3                3                   1.0                           1               1                   2                    4                         0               3

Duplicate rows

Most frequent

Row   Wifes_age  Wifes_education  Husbands_education  Number_of_children_ever_born  Wifes_religion  Wifes_now_working?  Husbands_occupation  Standard-of-living_index  Media_exposure  target  count
12    26.0       4                4                   1.0                           1               1                   1                    4                         0               1       3
21    32.0       4                4                   3.0                           1               0                   1                    4                         0               3       3
25    36.0       4                4                   3.0                           1               1                   1                    4                         0               2       3
31    41.0       4                4                   4.0                           0               0                   2                    4                         0               2       3
0     20.0       2                3                   3.0                           1               1                   3                    4                         0               1       2
1     20.0       3                4                   0.0                           1               1                   3                    3                         0               1       2
2     21.0       2                3                   0.0                           1               1                   3                    4                         0               1       2
3     21.0       3                3                   1.0                           1               1                   3                    3                         0               3       2
4     21.0       4                4                   1.0                           1               1                   1                    4                         0               2       2
5     22.0       4                4                   1.0                           1               1                   1                    3                         0               2       2