{"id":16198,"date":"2019-09-15T15:31:35","date_gmt":"2019-09-15T19:31:35","guid":{"rendered":"https:\/\/bangla.salearningschool.com\/recent-posts\/best-practices-in-data-preparation\/"},"modified":"2019-09-17T17:00:56","modified_gmt":"2019-09-17T21:00:56","slug":"best-practices-in-data-preparation","status":"publish","type":"post","link":"http:\/\/bangla.sitestree.com\/?p=16198","title":{"rendered":"Best Practices in Data Preparation"},"content":{"rendered":"<p>Best Practices in Data Preparation<\/p>\n<p>1. Check data formats (image, CSV, PC, Mac, mainframe, text, structured, unstructured)<br \/>\n2. Verify data types (numbers, text, floats, currencies, nominal, ordinal, interval, ratio)<br \/>\n3. Graph your data (scatter plot, histogram, bar chart, line chart)<br \/>\n4. Verify the data (check that it is accurate and makes sense)<br \/>\n5. Identify outliers (for example, values much larger or much smaller than the rest)<br \/>\n6. Deal with missing values<br \/>\n7. Check your assumptions about the data distribution (for example, normal or Poisson)<br \/>\n8. Back up and document everything that you do<\/p>\n<p>Reference: Anderson A., Semmelroth D.<br \/>\nSayed Ahmed<\/p>\n<p>LinkedIn: <a href=\"https:\/\/ca.linkedin.com\/in\/sayedjustetc\">https:\/\/ca.linkedin.com\/in\/sayedjustetc<\/a><\/p>\n<p>Blog: <a href=\"http:\/\/sitestree.com\">http:\/\/sitestree.com<\/a>, <a href=\"http:\/\/bangla.salearningschool.com\">http:\/\/bangla.salearningschool.com<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Best Practices in Data Preparation 1. Check data formats (image, CSV, PC, Mac, mainframe, text, structured, unstructured) 2. Verify data types (numbers, text, floats, currencies, nominal, ordinal, interval, ratio) 3. Graph your data (scatter plot, histogram, bar chart, line chart) 4. Verify the data (check that it is accurate and makes sense) 5. 
Identify outliers ( Examples: very large or very &hellip; <\/p>\n<p><a class=\"more-link btn\" href=\"http:\/\/bangla.sitestree.com\/?p=16198\">Continue reading<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1904,182],"tags":[],"class_list":["post-16198","post","type-post","status-publish","format-standard","hentry","category-statistics-for-big-data","category---blog","item-wrap"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack-related-posts":[{"id":26201,"url":"http:\/\/bangla.sitestree.com\/?p=26201","url_meta":{"origin":16198,"position":0},"title":"Best Practices in Data Preparation #Root","author":"Author-Check- Article-or-Video","date":"April 19, 2021","format":false,"excerpt":"Best Practices in Data Preparation 1. Check data formats (Image, CSV, PC, Mac, mainframe, text, structured, unstructured) 2. Verify data types (numbers, text, floats, currencies, nominal, ordinal, interval, range) 3. Graph your Data (Scatter, Histogram, bar, line) 4. Verify the data (data accuracy, data makes sense) 5. Identify outliers (\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24955,"url":"http:\/\/bangla.sitestree.com\/?p=24955","url_meta":{"origin":16198,"position":1},"title":"Text Classification such as article category classification with Deep Learning\/Neural Network Approach #Root","author":"Author-Check- Article-or-Video","date":"April 14, 2021","format":false,"excerpt":"Text Classification such as article category classification with Deep Learning\/Neural Network Approach What deep learning method to use to classify text files? 
https:\/\/www.quora.com\/What-deep-learning-method-to-use-to-classify-text-files Classification Examples: https:\/\/faroit.com\/keras-docs\/0.3.3\/examples\/ Best Practices for Document Classification with Deep Learning https:\/\/machinelearningmastery.com\/best-practices-document-classification-deep-learning\/ LSTM with sentence representations for document-level sentiment classification https:\/\/www.sciencedirect.com\/science\/article\/pii\/S092523121830479X A C-LSTM Neural Network for Text\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":22577,"url":"http:\/\/bangla.sitestree.com\/?p=22577","url_meta":{"origin":16198,"position":2},"title":"Some Data Analysis Tools #Root #By Sayed Ahmed #Data Warehouse Misc #Big Data","author":"Author-Check- Article-or-Video","date":"March 16, 2021","format":false,"excerpt":"Just some Links: Data Collection and Analysis Tools http:\/\/asq.org\/learn-about-quality\/data-collection-analysis-tools\/overview\/overview.html \u00a0 Big Data Analytics: Time For New Tools http:\/\/www.informationweek.com\/big-data\/big-data-analytics\/big-data-analytics-time-for-new-tools\/a\/d-id\/1318106 \u00a0 Data analysis tools target non-experts Tools simplify the application of advanced analytics and the interpretation of results http:\/\/radar.oreilly.com\/2013\/08\/data-analysis-tools-target-non-experts.html \u00a0 Guide to big data analytics tools, trends and best practices http:\/\/searchbusinessanalytics.techtarget.com\/essentialguide\/Guide-to-big-data-analytics-tools-trends-and-best-practices \u00a0\u2026","rel":"","context":"In 
&quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":17003,"url":"http:\/\/bangla.sitestree.com\/?p=17003","url_meta":{"origin":16198,"position":3},"title":"Python: Ecommerce: Part \u2014 2: Drop Duplicates, Sort, and Take Only Unique Products After Merging All Supplier D ata Files into One File","author":"Sayed","date":"April 19, 2020","format":false,"excerpt":"All code in One Block # # Section: Verify, and Process Supplier Data Before Sending products to # # your retail (Magento 2) or marketplace (Amazon, Walmart)# In[7]:# combined_csv.sort_values(\u201cModel Code\u201d, inplace = True) # dropping ALL duplicte values based on Product SKU = Model Codeno_duplicates_combined_csv = combined_csv.drop_duplicates(subset = \u201cModel Code\u201d,\u2026","rel":"","context":"In &quot;Build Ecommerce Software&quot;","block_context":{"text":"Build Ecommerce Software","link":"http:\/\/bangla.sitestree.com\/?cat=1912"},"img":{"alt_text":"8112223 Canada Inc. 
(Justetc)","src":"https:\/\/miro.medium.com\/fit\/c\/80\/80\/0*P_esmjKoJnHlNjFX","width":350,"height":200},"classes":[]},{"id":22237,"url":"http:\/\/bangla.sitestree.com\/?p=22237","url_meta":{"origin":16198,"position":4},"title":"On Data Management from Enterprise Data Analytics to Data-Based Decision Making #Root #Data Warehouse Misc","author":"Author-Check- Article-or-Video","date":"March 12, 2021","format":false,"excerpt":"On Data Management from Enterprise Data Analytics to Data-Based Decision Making Establish data quality standards and train others in this regard Five steps to an improved data quality assurance plan Data Quality Standards Handbook on Data Quality Assessment Methods and Tools Five Fundamental Data Quality Practices data management\/analysis software tools\u2026","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"","src":"","width":0,"height":0},"classes":[]},{"id":24598,"url":"http:\/\/bangla.sitestree.com\/?p=24598","url_meta":{"origin":16198,"position":5},"title":"Tech Talk Follow Up: Best Practices for Security in Amazon S3 #Root","author":"Author-Check- Article-or-Video","date":"April 12, 2021","format":false,"excerpt":"The on-demand content is now available. 
Webinar: Best Practices for Security in Amazon S3 Recording Presentation slides From: http:\/\/sitestree.com\/tech-talk-follow-up-best-practices-for-security-in-amazon-s3\/ Categories:RootTags: Post Data:2018-07-30 20:13:59 Shop Online: https:\/\/www.ShopForSoul.com\/ (Big Data, Cloud, Security, Machine Learning): Courses: http:\/\/Training.SitesTree.com In Bengali: http:\/\/Bangla.SaLearningSchool.com http:\/\/SitesTree.com 8112223 Canada Inc.\/JustEtc: http:\/\/JustEtc.net (Software\/Web\/Mobile\/Big-Data\/Machine Learning) Shop Online: https:\/\/www.ShopForSoul.com\/ Medium: https:\/\/medium.com\/@SayedAhmedCanada","rel":"","context":"In &quot;FromSitesTree.com&quot;","block_context":{"text":"FromSitesTree.com","link":"http:\/\/bangla.sitestree.com\/?cat=1917"},"img":{"alt_text":"Amazon Web Services","src":"https:\/\/i0.wp.com\/pages.awscloud.com\/rs\/112-TZM-766\/images\/AWS_logo_RGB_2017_100x60.png?resize=350%2C200&ssl=1","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=16198"}],"version-history":[{"count":1,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16198\/revisions"}],"predecessor-version":[{"id":16203,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=\/wp\/v2\/posts\/16198\/revisions\/16203"}],"wp:attachment":[{"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=16198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/bangla.
sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=16198"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/bangla.sitestree.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=16198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}