Questions Answered by Exploratory Data Analysis (EDA)

What are the key properties of a dataset (center, spread, skew, probability distribution, correlation, outliers)?

1. What is the center of the data? (Mean, median, mode)
2. How much spread is there in the data? (Variance, standard deviation, quartiles, interquartile range (IQR); for example, IQR = Q3 - Q1)
3. Is the data skewed? As a rule of thumb: mean > median suggests positive skew, mean = median suggests a symmetrical distribution, and mean < median suggests negative skew
4. What distribution does the data follow? Is the data normally distributed?
5. Are the elements in the dataset uncorrelated? That is, do two variables move together (positively or negatively) or not, and is any relationship between them linear or non-linear?
6. Does the center of the data change over time? Example: for time series data, does the mean change over time?
7. Does the spread of the data change over time? Example: for time series data, does the variance change over time?
8. Are there outliers in the data?
9. Does the data conform to your assumptions? Examples of assumptions to check: the data is normally distributed or close to normally distributed, the parameters are constant, there are no outliers or there are several outliers, the members are independent or nearly independent, the variance increases over time
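
Several of the questions above can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library, not a method from the reference. The function names (summarize, pearson, half_stats), the sample numbers, and the 1.5 * IQR rule for flagging outliers are assumptions made for illustration.

```python
import math
import statistics as st


def summarize(values):
    """Center, spread, skew direction, and IQR-rule outliers for a list of numbers."""
    mean = st.mean(values)
    median = st.median(values)
    q1, _, q3 = st.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    # Assumption: the common 1.5 * IQR fences define an "outlier" here.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    if mean > median:
        skew = "positive"
    elif mean < median:
        skew = "negative"
    else:
        skew = "symmetrical"
    return {
        "mean": mean,
        "median": median,
        "mode": st.mode(values),
        "variance": st.variance(values),  # sample variance
        "stdev": st.stdev(values),        # sample standard deviation
        "iqr": iqr,
        "skew": skew,
        "outliers": [v for v in values if v < low or v > high],
    }


def pearson(xs, ys):
    """Pearson correlation coefficient; it measures linear association only."""
    mx, my = st.mean(xs), st.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def half_stats(series):
    """(mean, variance) of the first and second half of a time-ordered series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (st.mean(first), st.variance(first)), (st.mean(second), st.variance(second))


data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]  # made-up sample with one large value
summary = summarize(data)
print(summary["skew"], summary["outliers"])  # positive [30]

r = pearson([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])  # perfectly linear pair
print(round(r, 6))

early, late = half_stats([1, 2, 3, 4, 11, 12, 13, 14])
print(early, late)  # the mean shifts between halves, the variance does not
```

The mean versus median comparison is only a rough indicator of skew, and a Pearson coefficient near zero does not rule out a non-linear relationship. Checking question 4 (normality) usually needs a plot or a formal test, which this sketch does not cover.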

Reference: Anderson, A. and Semmelroth, D., Statistics for Big Data

--
Sayed Ahmed

LinkedIn: https://ca.linkedin.com/in/sayedjustetc

Blog: http://sitestree.com, http://bangla.salearningschool.com