{"id":16200,"date":"2019-09-15T20:27:58","date_gmt":"2019-09-16T00:27:58","guid":{"rendered":"http:\/\/bangla.salearningschool.com\/recent-posts\/important-basic-concepts-statistics-for-big-data\/"},"modified":"2019-09-17T17:00:50","modified_gmt":"2019-09-17T21:00:50","slug":"important-basic-concepts-statistics-for-big-data","status":"publish","type":"post","link":"http:\/\/bangla.sitestree.com\/?p=16200","title":{"rendered":"Important Basic Concepts: Statistics for Big Data"},"content":{"rendered":"<p><strong>Important Basic Concepts: Statistics for Big Data<\/strong><\/p>\n<p><strong>Graphical Exploratory Data Analysis (EDA) Methods<\/strong><br \/>\nEDA is about exploring the data and judging whether it is suitable for the experiment or study. Patterns and fitness for purpose are hard to see in raw data, whereas graphs and plots reveal them quickly.<\/p>\n<p><strong>Graphical methods include:<\/strong><br \/>\n1. Scatter Plots<br \/>\n2. Histograms<br \/>\n3. Box Plots<br \/>\n4. Normal Probability Plots<br \/>\n<strong><br \/>\nQuantitative Exploratory Data Analysis Techniques:<\/strong><\/p>\n<p>1. Interval Estimation (Ranges)<br \/>\n2. Hypothesis Testing (Null Hypothesis, Alternative Hypothesis)<\/p>\n<p><strong>1. Interval Estimation (Ranges): <\/strong>Construct a range of values within which a quantity is likely to fall. A confidence interval for the mean, a range expected to contain the population mean at a stated confidence level, is one example of interval estimation.<\/p>\n<p><strong>2. Hypothesis Testing:<\/strong> Test a proposition about a dataset.<\/p>\n<p>Example: Test whether the mean age of the Canadian population is 53.<\/p>\n<p>It&#8217;s a multi-step process. The steps are as follows:<\/p>\n<p><strong>1. Null Hypothesis:<\/strong> The proposition being tested; it is assumed to be true unless the data provide enough evidence against it<br \/>\n<strong>2. Alternative Hypothesis:<\/strong> The hypothesis that will be accepted if the null hypothesis is rejected<br \/>\n<strong>3. 
Significance Level: <\/strong>The probability of rejecting the null hypothesis when it is actually true, chosen before the test is run. A 5% significance level corresponds to 95% confidence (e.g. when testing a claim that the average return of index investing over a 10-year period is 6%)<br \/>\n<strong>4. Test Statistic: <\/strong>A numerical measure of how consistent the sample data are with the null hypothesis<br \/>\n<strong>5. Critical Value:<\/strong> If the test statistic is more extreme than the critical value, the null hypothesis is rejected<br \/>\n<strong>6. Decision:<\/strong> The decision is made by comparing the test statistic with the critical value<\/p>\n<p><strong>Some Basic Probability Distributions:<\/strong><\/p>\n<p><strong>Binomial Distribution:<\/strong> Describes the number of successes in a fixed number of independent trials, where each trial can have only one of two outcomes<\/p>\n<p><strong>Poisson Distribution: <\/strong> Describes the likelihood of a given number of events occurring during a time interval (for example, customers arriving at your shop in an hour)<\/p>\n<p><strong>Normal Distribution:<\/strong> Symmetric about the mean. The probability that a variable falls a given distance below the mean equals the probability that it falls the same distance above the mean.<\/p>\n<p><strong>t distribution:<\/strong> Similar in shape to the normal distribution but with heavier tails, so extremely large or extremely small values are more likely and the variance is larger. Useful when the sample size is small and the population variance (standard deviation) is unknown.<\/p>\n<p><strong>Chi Square Test: <\/strong>Tests whether a population follows a particular distribution, such as the normal distribution.<\/p>\n<p><strong>The F distribution:<\/strong> Used to test whether two datasets could come from the same population by comparing their variances.<\/p>\n<p><strong>Related Concepts:<\/strong><\/p>\n<p><strong>What is a Z Score? 
<\/strong><br \/>\nA standardized score: the number of standard deviations a value lies above or below the mean. It can be used to find the probability of a particular score occurring in a normal distribution.<br \/>\nIt also helps to compare two values that come from two different normal distributions.<\/p>\n<p><strong>Another definition:<\/strong> it is a measure of how a value relates to the mean.<\/p>\n<p><strong>Chi Square test for Normal Distribution:<\/strong><br \/>\nNull Hypothesis: In a goodness-of-fit test, the data follow the stated distribution (here, the normal distribution). In the related chi-square test of independence, the null hypothesis is that no relationship exists between the categorical variables, i.e. they are independent.<\/p>\n<p><strong>What is the p value in a Chi Square test:<\/strong><br \/>\nThe p value measures the significance of the result: it is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. A small p value is strong evidence against the null hypothesis.<\/p>\n<p>Reference: Anderson A., Semmelroth D., Statistics for Big Data<\/p>\n<p>Sayed Ahmed<\/p>\n<p>Linkedin: <a href=\"https:\/\/ca.linkedin.com\/in\/sayedjustetc\">https:\/\/ca.linkedin.com\/in\/sayedjustetc<\/a><\/p>\n<p>Blog: <a href=\"http:\/\/sitestree.com\">http:\/\/sitestree.com<\/a>, <a href=\"http:\/\/bangla.salearningschool.com\">http:\/\/bangla.salearningschool.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Important Basic Concepts: Statistics for Big Data Graphical : Exploratory Data Analysis (EDA) methods? First of all, EDA is about exploring the data and understanding if the data will be good for the experiment and study. Graphs and plots can easily show the data patterns. 
The raw data can be difficult to understand for patterns &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/bangla.sitestree.com\/?p=16200\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1904,182],"tags":[],"class_list":["post-16200","post","type-post","status-publish","format-standard","hentry","category-statistics-for-big-data","category---blog","item-wrap"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":26205,"url":"http:\/\/bangla.sitestree.com\/?p=26205","url_meta":{"origin":16200,"position":0},"title":"Important Basic Concepts: Statistics for Big Data #Root","author":"Author-Check- Article-or-Video","date":"April 19, 2021","format":false,"excerpt":"Important Basic Concepts: Statistics for Big Data Graphical : Exploratory Data Analysis (EDA) methods? First of all, EDA is about exploring the data and understanding if the data will be good for the experiment and study. Graphs and plots can easily show the data patterns. 
The raw data can be\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16619,"url":"http:\/\/bangla.sitestree.com\/?p=16619","url_meta":{"origin":16200,"position":1},"title":"Math\/Stat\/CS\/DS Topics that you need to know (with Cognitive, Psychomotor, Affective domain skills) to become a true and great Data Scientist","author":"Sayed","date":"January 7, 2020","format":false,"excerpt":"\"The core topics are cross-validation, shrinkage methods (ridge regression, the LASSO, etc.), neural networks, gradient boosting, separating hyperplanes, support vector machines, basis expansion and regularization (e.g., smoothing splines, wavelet smoothing, kernel smoothing), generalized additive models, bump hunting, multivariate adaptive regression splines (MARS), self-organizing maps, mixture model-based clustering, ensemble learning, and\u2026","rel":"","context":"In &quot;Math and Statistics for Data Science, and Engineering&quot;","block_context":{"text":"Math and Statistics for Data Science, and Engineering","link":"http:\/\/bangla.sitestree.com\/?cat=1908"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":78242,"url":"http:\/\/bangla.sitestree.com\/?p=78242","url_meta":{"origin":16200,"position":2},"title":"Statistics for Data Analytics and Machine Learning Projects","author":"Sayed","date":"May 22, 2025","format":false,"excerpt":"\u2022Null Hypothesis \u2022[2] \u2022Paired t-test \u2022Unpaired t-test \u2022Pearson Correlation \u2022One Way: Analysis of variance \u2022Spearman Correlation \u2022Spearman \u2022Kendal Tau Coef \u2022Wilcoxon Sum test \u2022Basic EDA \u2022Mcnaimer\u2019s test \u2022Friedman test \u2022Kruskal-Wallis Test \u2022Two Way Analysis of variance \u2022K-Fold Cross Validation paired t-test \u2022Wilcoxon Signed Rank Test Data Analytics, Machine 
Learning, Data\u2026","rel":"","context":"In &quot;Analytics and Machine Learning Project Development&quot;","block_context":{"text":"Analytics and Machine Learning Project Development","link":"http:\/\/bangla.sitestree.com\/?cat=1974"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2025\/05\/image-37.png?resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2025\/05\/image-37.png?resize=350%2C200 1x, https:\/\/i0.wp.com\/bangla.sitestree.com\/wp-content\/uploads\/2025\/05\/image-37.png?resize=525%2C300 1.5x"},"classes":[]},{"id":76547,"url":"http:\/\/bangla.sitestree.com\/?p=76547","url_meta":{"origin":16200,"position":3},"title":"Advanced Data Visualization Resources (in Python)","author":"Sayed","date":"January 1, 2025","format":false,"excerpt":"Advanced Data Visualization Advanced Data Visualization Data Visualization Misc Data Visualization Overview: Data Visualization: About Data Visualization http:\/\/guides.library.duke.edu\/c.php?g=289678&p=1930713 I Can See Clearly Now: A Survey of Data Visualization Techniques & Practice https:\/\/www.slideshare.net\/myles_harrison\/i-can-see-clearly-now-a-survey-of-data-visualization-techniques-practice-31179055 Visualization in Data Science: What is it for? 
http:\/\/guides.library.duke.edu\/c.php?g=289678&p=1930713 Anscombe's quartet https:\/\/en.wikipedia.org\/wiki\/Anscombe's_quartet Storytelling and data visualization\u2026 So what?\u2026","rel":"","context":"In &quot;Data Visualization&quot;","block_context":{"text":"Data Visualization","link":"http:\/\/bangla.sitestree.com\/?cat=1903"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":16199,"url":"http:\/\/bangla.sitestree.com\/?p=16199","url_meta":{"origin":16200,"position":4},"title":"Questions Answered by Exploratory Data Analysis (EDA)","author":"Sayed","date":"September 15, 2019","format":false,"excerpt":"Questions Answered by Exploratory Data Analysis (EDA) What are the key properties of a Dataset (Center, Spread, Skew, probability distribution, correlation, outliers) 1. What is the center of the data (mean, median, mode) 2. How much spread is there in the data? (Variance, Standard deviation, Quartiles, Interquartile Range (IQR), Example:\u2026","rel":"","context":"In &quot;Statistics for Big Data&quot;","block_context":{"text":"Statistics for Big Data","link":"http:\/\/bangla.sitestree.com\/?cat=1904"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":26203,"url":"http:\/\/bangla.sitestree.com\/?p=26203","url_meta":{"origin":16200,"position":5},"title":"Questions Answered by Exploratory Data Analysis (EDA) #Root","author":"Author-Check- Article-or-Video","date":"April 19, 2021","format":false,"excerpt":"Questions Answered by Exploratory Data Analysis (EDA) What are the key properties of a Dataset (Center, Spread, Skew, probability distribution, correlation, outliers) 1. What is the center of the data (mean, median, mode) 2. How much spread is there in the data? 
(Variance, Standard deviation, Quartiles, Interquartile Range (IQR), Example:\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]}],"_links":{"self":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16200","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16200"}],"version-history":[{"count":1,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16200\/revisions"}],"predecessor-version":[{"id":16201,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16200\/revisions\/16201"}],"wp:attachment":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16200"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16200"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16200"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}