Overview

Dataset statistics

Number of variables: 10
Number of observations: 1473
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 48
Duplicate rows (%): 3.3%
Total size in memory: 115.2 KiB
Average record size in memory: 80.1 B
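
The percentages and the average record size follow directly from the raw counts. A minimal sketch that recomputes them, assuming one-decimal rounding and 1 KiB = 1024 B (the variable names are illustrative and not part of the report):

```python
# Recompute the derived dataset statistics from the raw counts in the report.
n_rows = 1473         # number of observations
n_vars = 10           # number of variables
n_missing = 0         # missing cells
n_duplicates = 48     # duplicate rows
total_kib = 115.2     # total size in memory, in KiB

n_cells = n_rows * n_vars
missing_pct = 100 * n_missing / n_cells
duplicate_pct = 100 * n_duplicates / n_rows
avg_record_bytes = total_kib * 1024 / n_rows  # assumes 1 KiB = 1024 B

print(f"Missing cells (%): {missing_pct:.1f}%")
print(f"Duplicate rows (%): {duplicate_pct:.1f}%")
print(f"Average record size in memory: {avg_record_bytes:.1f} B")
```

Rounded to one decimal place, these agree with the 3.3% and 80.1 B figures listed.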

Variable types

CAT: 5
BOOL: 3
NUM: 2

Reproduction

Analysis started: 2020-08-25 01:20:25.695688
Analysis finished: 2020-08-25 01:20:27.567080
Duration: 1.87 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Dataset has 48 (3.3%) duplicate rows [Duplicates]
Children has 97 (6.6%) zeros [Zeros]

Variables

Wife_age
Real number (ℝ≥0)

Distinct count: 34
Unique (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 32.53835709436524
Minimum: 16
Maximum: 49
Zeros: 0
Zeros (%): 0.0%
Memory size: 11.6 KiB

Quantile statistics

Minimum: 16
5-th percentile: 21
Q1: 26
Median: 32
Q3: 39
95-th percentile: 47
Maximum: 49
Range: 33
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 8.227244755
Coefficient of variation (CV): 0.2528475771
Kurtosis: -0.9438944909
Mean: 32.53835709
Median Absolute Deviation (MAD): 6
Skewness: 0.2564492055
Sum: 47929
Variance: 67.68755627
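
Several of the Wife_age figures can be derived from the others. The sketch below recomputes the mean, range, IQR, coefficient of variation and variance from values copied out of the report; it assumes CV is the standard deviation divided by the mean and that the variance is the square of the reported standard deviation.

```python
import math

# Values copied from the Wife_age statistics in the report.
n = 1473                  # number of observations
total = 47929             # Sum
minimum, maximum = 16, 49
q1, q3 = 26, 39
std = 8.227244755         # Standard deviation (as printed, already rounded)

mean = total / n                  # Mean = Sum / number of observations
value_range = maximum - minimum   # Range = Maximum - Minimum
iqr = q3 - q1                     # IQR = Q3 - Q1
cv = std / mean                   # assumed definition: std / mean
variance = std ** 2               # assumed definition: std squared

print(f"mean={mean:.8f} range={value_range} iqr={iqr} cv={cv:.6f} variance={variance:.5f}")
```

Because the standard deviation is taken as printed, the derived values match the reported ones only up to rounding.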
Histogram with fixed size bins (bins=10)

Common values

Value             Count  Frequency (%)
25                   80  5.4%
26                   69  4.7%
32                   64  4.3%
30                   64  4.3%
28                   63  4.3%
35                   62  4.2%
24                   61  4.1%
22                   59  4.0%
27                   59  4.0%
29                   59  4.0%
36                   57  3.9%
33                   55  3.7%
37                   51  3.5%
34                   50  3.4%
21                   48  3.3%
31                   46  3.1%
23                   44  3.0%
38                   44  3.0%
47                   43  2.9%
45                   41  2.8%
42                   40  2.7%
44                   39  2.6%
43                   34  2.3%
39                   34  2.3%
40                   34  2.3%
Other values (9)    173  11.7%

Minimum 10 values

Value  Count  Frequency (%)
16         3  0.2%
17         8  0.5%
18         7  0.5%
19        18  1.2%
20        28  1.9%
21        48  3.3%
22        59  4.0%
23        44  3.0%
24        61  4.1%
25        80  5.4%

Maximum 10 values

Value  Count  Frequency (%)
49        23  1.6%
48        30  2.0%
47        43  2.9%
46        22  1.5%
45        41  2.8%
44        39  2.6%
43        34  2.3%
42        40  2.7%
41        34  2.3%
40        34  2.3%

Wife_education
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4        577  39.2%
3        410  27.8%
2        334  22.7%
1        152  10.3%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4        577  39.2%
3        410  27.8%
2        334  22.7%
1        152  10.3%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number   1473  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4        577  39.2%
3        410  27.8%
2        334  22.7%
1        152  10.3%

Most occurring scripts

Value   Count  Frequency (%)
Common   1473  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4        577  39.2%
3        410  27.8%
2        334  22.7%
1        152  10.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   1473  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4        577  39.2%
3        410  27.8%
2        334  22.7%
1        152  10.3%

Husband_education
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4        899  61.0%
3        352  23.9%
2        178  12.1%
1         44  3.0%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4        899  61.0%
3        352  23.9%
2        178  12.1%
1         44  3.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number   1473  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4        899  61.0%
3        352  23.9%
2        178  12.1%
1         44  3.0%

Most occurring scripts

Value   Count  Frequency (%)
Common   1473  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4        899  61.0%
3        352  23.9%
2        178  12.1%
1         44  3.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   1473  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4        899  61.0%
3        352  23.9%
2        178  12.1%
1         44  3.0%

Children
Real number (ℝ≥0)

ZEROS

Distinct count: 15
Unique (%): 1.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.2613713509843856
Minimum: 0
Maximum: 16
Zeros: 97
Zeros (%): 6.6%
Memory size: 11.6 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
Median: 3
Q3: 4
95-th percentile: 8
Maximum: 16
Range: 16
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.358548863
Coefficient of variation (CV): 0.7231770347
Kurtosis: 1.529606551
Mean: 3.261371351
Median Absolute Deviation (MAD): 2
Skewness: 1.099013946
Sum: 4804
Variance: 5.562752738
Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
2        276  18.7%
1        276  18.7%
3        259  17.6%
4        197  13.4%
5        135  9.2%
0         97  6.6%
6         92  6.2%
7         49  3.3%
8         47  3.2%
9         16  1.1%
11        11  0.7%
10        11  0.7%
12         4  0.3%
13         2  0.1%
16         1  0.1%

Minimum 10 values

Value  Count  Frequency (%)
0         97  6.6%
1        276  18.7%
2        276  18.7%
3        259  17.6%
4        197  13.4%
5        135  9.2%
6         92  6.2%
7         49  3.3%
8         47  3.2%
9         16  1.1%

Maximum 10 values

Value  Count  Frequency (%)
16         1  0.1%
13         2  0.1%
12         4  0.3%
11        11  0.7%
10        11  0.7%
9         16  1.1%
8         47  3.2%
7         49  3.3%
6         92  6.2%
5        135  9.2%

Wife_religion
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1       1253  85.1%
0        220  14.9%

Wife_working
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1       1104  74.9%
0        369  25.1%

Husband_occupation
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
3        585  39.7%
1        436  29.6%
2        425  28.9%
4         27  1.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
3        585  39.7%
1        436  29.6%
2        425  28.9%
4         27  1.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number   1473  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
3        585  39.7%
1        436  29.6%
2        425  28.9%
4         27  1.8%

Most occurring scripts

Value   Count  Frequency (%)
Common   1473  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
3        585  39.7%
1        436  29.6%
2        425  28.9%
4         27  1.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   1473  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
3        585  39.7%
1        436  29.6%
2        425  28.9%
4         27  1.8%

Standard-of-living
Categorical

Distinct count: 4
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
4        684  46.4%
3        431  29.3%
2        229  15.5%
1        129  8.8%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 4
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
4        684  46.4%
3        431  29.3%
2        229  15.5%
1        129  8.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number   1473  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
4        684  46.4%
3        431  29.3%
2        229  15.5%
1        129  8.8%

Most occurring scripts

Value   Count  Frequency (%)
Common   1473  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
4        684  46.4%
3        431  29.3%
2        229  15.5%
1        129  8.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   1473  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
4        684  46.4%
3        431  29.3%
2        229  15.5%
1        129  8.8%

Media_exposure
Boolean

Distinct count: 2
Unique (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
0       1364  92.6%
1        109  7.4%

target
Categorical

Distinct count: 3
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 11.6 KiB

Value  Count  Frequency (%)
1        629  42.7%
3        511  34.7%
2        333  22.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Overview of Unicode Properties

Unique unicode characters: 3
Unique unicode categories: 1
Unique unicode scripts: 1
Unique unicode blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Most occurring characters

Value  Count  Frequency (%)
1        629  42.7%
3        511  34.7%
2        333  22.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number   1473  100.0%

Most frequent Decimal Number characters

Value  Count  Frequency (%)
1        629  42.7%
3        511  34.7%
2        333  22.6%

Most occurring scripts

Value   Count  Frequency (%)
Common   1473  100.0%

Most frequent Common characters

Value  Count  Frequency (%)
1        629  42.7%
3        511  34.7%
2        333  22.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII   1473  100.0%

Most frequent ASCII characters

Value  Count  Frequency (%)
1        629  42.7%
3        511  34.7%
2        333  22.6%

Interactions

[Figures: four interaction plots]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
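
As a minimal sketch of that calculation (the function name `pearson_r` and the sample data are illustrative, not taken from the report), r can be computed with the standard library alone. The second call checks the invariance under changes in location and scale.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors of the covariance and of the two standard
    # deviations cancel, so they are omitted. Constant inputs are not
    # handled here (the denominator would be zero).
    return cov / math.sqrt(var_x * var_y)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = pearson_r(x, y)

# Shifting and rescaling either variable by a positive factor leaves r unchanged.
r_shifted = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
print(r, r_shifted)
```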

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
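
The following sketch illustrates this recipe with the standard library: convert both variables to ranks, then apply the Pearson formula to the ranks. The helper names and the choice to give tied values their average rank are assumptions made for this example.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson's r applied to the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # y = x**3: monotonic but not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

For this monotonic but nonlinear relationship, ρ reaches 1 while Pearson's r stays below 1.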

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
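
A direct, unoptimised sketch of this definition follows. It implements the variant usually called tau-a, where tied pairs count as neither concordant nor discordant; library implementations often default to the tie-adjusted tau-b, so results can differ when ties are present. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts as neither
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

Of the six pairs, five are concordant and one is discordant, giving τ = 4/6.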

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
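
One way to write the bias-corrected estimator is sketched below, using only the standard library. It follows the commonly cited form of Bergsma's correction, which shrinks φ² and the table dimensions by terms of order 1/(n-1). This is an illustration under that assumption, not necessarily the code pandas-profiling runs, and the function name is hypothetical.

```python
import math
from collections import Counter

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the effective table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0

# Toy data: a perfectly associated pair and an independent pair.
v_perfect = cramers_v_corrected(["a", "b"] * 50, ["u", "v"] * 50)
v_independent = cramers_v_corrected(["a", "a", "b", "b"] * 25, ["u", "v", "u", "v"] * 25)
print(v_perfect, v_independent)
```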

Missing values

[Figures: two missing-values plots]

Sample

First rows

      Wife_age  Wife_education  Husband_education  Children  Wife_religion  Wife_working  Husband_occupation  Standard-of-living  Media_exposure  target
   0        24               2                  3         3              1             1                   2                   3               0       1
   1        45               1                  3        10              1             1                   3                   4               0       1
   2        43               2                  3         7              1             1                   3                   4               0       1
   3        42               3                  2         9              1             1                   3                   3               0       1
   4        36               3                  3         8              1             1                   3                   2               0       1
   5        19               4                  4         0              1             1                   3                   3               0       1
   6        38               2                  3         6              1             1                   3                   2               0       1
   7        21               3                  3         1              1             0                   3                   2               0       1
   8        27               2                  3         3              1             1                   3                   4               0       1
   9        45               1                  1         8              1             1                   2                   2               1       1

Last rows

      Wife_age  Wife_education  Husband_education  Children  Wife_religion  Wife_working  Husband_occupation  Standard-of-living  Media_exposure  target
1463        30               1                  3         2              1             1                   3                   4               0       3
1464        23               2                  2         1              1             1                   2                   4               0       3
1465        25               2                  4         3              1             1                   1                   3               0       3
1466        42               2                  4         6              1             1                   2                   4               0       3
1467        29               4                  4         3              1             1                   1                   4               0       3
1468        33               4                  4         2              1             0                   2                   4               0       3
1469        33               4                  4         3              1             1                   1                   4               0       3
1470        39               3                  3         8              1             0                   1                   4               0       3
1471        33               3                  3         4              1             0                   2                   2               0       3
1472        17               3                  3         1              1             1                   2                   4               0       3

Duplicate rows

Most frequent

      Wife_age  Wife_education  Husband_education  Children  Wife_religion  Wife_working  Husband_occupation  Standard-of-living  Media_exposure  target  count
  12        26               4                  4         1              1             1                   1                   4               0       1      3
  21        32               4                  4         3              1             0                   1                   4               0       3      3
  25        36               4                  4         3              1             1                   1                   4               0       2      3
  31        41               4                  4         4              0             0                   2                   4               0       2      3
   0        20               2                  3         3              1             1                   3                   4               0       1      2
   1        20               3                  4         0              1             1                   3                   3               0       1      2
   2        21               2                  3         0              1             1                   3                   4               0       1      2
   3        21               3                  3         1              1             1                   3                   3               0       3      2
   4        21               4                  4         1              1             1                   1                   4               0       2      2
   5        22               4                  4         1              1             1                   1                   3               0       2      2
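
A table like this one can be built by counting identical rows. The sketch below uses a small toy sample, in which only the first two distinct rows mirror entries from the table above. It also shows one plausible way to count duplicate rows, counting each repeat after the first occurrence; whether the report uses exactly this definition is an assumption.

```python
from collections import Counter

# Toy sample (hypothetical); each tuple holds the ten columns in report order.
rows = [
    (26, 4, 4, 1, 1, 1, 1, 4, 0, 1),
    (26, 4, 4, 1, 1, 1, 1, 4, 0, 1),
    (26, 4, 4, 1, 1, 1, 1, 4, 0, 1),
    (20, 2, 3, 3, 1, 1, 3, 4, 0, 1),
    (20, 2, 3, 3, 1, 1, 3, 4, 0, 1),
    (24, 2, 3, 3, 1, 1, 2, 3, 0, 1),
]

counts = Counter(rows)

# Rows that occur more than once, most frequent first.
most_frequent = [(row, c) for row, c in counts.most_common() if c > 1]

# Assumed definition: every occurrence after the first counts as a duplicate.
n_repeats = sum(c - 1 for c in counts.values())

for row, c in most_frequent:
    print(row, c)
print("duplicate rows:", n_repeats)
```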