Python: Ecommerce: Part 2: Drop Duplicates, Sort, and Take Only Unique Products After Merging All Supplier Data Files into One File

All code in One Block

# # Section: Verify and Process Supplier Data Before Sending Products to
# # Your Retail Store (Magento 2) or Marketplace (Amazon, Walmart)

# In[7]:

# combined_csv.sort_values("Model Code", inplace=True)
# Drop ALL duplicate values based on Product SKU = Model Code.
# keep=False removes every row whose Model Code appears more than once.
no_duplicates_combined_csv = combined_csv.drop_duplicates(subset="Model Code", keep=False, inplace=False)
no_duplicates_combined_csv.shape

# In[8]:

# 55690 vs 55527

# In[9]:

# Plain assignment does not copy: both names refer to the same DataFrame
no_duplicates_combined_csv_verify = combined_csv
type(no_duplicates_combined_csv_verify)

# In[10]:

# Verify the shape after dropping duplicates
# (inplace=True on the alias also modifies combined_csv)
no_duplicates_combined_csv_verify.drop_duplicates(subset="Model Code", keep=False, inplace=True)
len(no_duplicates_combined_csv_verify)

# In[11]:

# 55690 vs 55527

# In[12]:

# Show combined data: show first 3 rows
no_duplicates_combined_csv[:3]

# In[16]:

# Stop

# # Find Only the Unique Products, Sorted and with Duplicates Removed

# In[14]:

# Sort by SKU = Model Code
sorted_merged_data = no_duplicates_combined_csv.sort_values("Model Code", inplace=False)
sorted_merged_data.head()

# Dropping ALL duplicate values: not needed here, since duplicates were already removed. Old code, kept anyway.
unique_sorted_data = sorted_merged_data.drop_duplicates(subset="Model Code", keep=False, inplace=False)
unique_sorted_data.head(3)

# In[15]:

# Total data count at this point
unique_sorted_data.shape

From Jupyter Notebook: Cell by Cell with output

Section: Verify and Process Supplier Data Before Sending Products to Your Retail Store (Magento 2) or Marketplace (Amazon, Walmart)

In [7]:

# combined_csv.sort_values("Model Code", inplace = True)
# dropping ALL duplicate values based on Product SKU = Model Code
# (keep = False removes every row whose Model Code appears more than once)
no_duplicates_combined_csv = combined_csv.drop_duplicates(subset = "Model Code", keep = False, inplace = False)
no_duplicates_combined_csv.shape

Out[7]:

(55527, 40)

In [8]:

#55690 vs 55527
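The comment above presumably compares the row count before (55690) and after (55527) the drop, a difference of 163 rows. The following is a minimal sketch of what `keep = False` does compared with `keep = "first"`. The tiny DataFrame and the names `only_singletons`, `one_per_sku`, and `rows_removed` are made up for illustration and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical miniature of the merged supplier data (not the real file).
combined_csv = pd.DataFrame(
    {
        "Model Code": ["A1", "B2", "B2", "C3", "D4", "D4", "D4"],
        "Retail Price": [10.0, 20.0, 21.0, 30.0, 40.0, 41.0, 42.0],
    }
)

# keep=False: drop every row whose SKU occurs more than once.
only_singletons = combined_csv.drop_duplicates(subset="Model Code", keep=False)

# keep="first": keep exactly one row per SKU (the first occurrence).
one_per_sku = combined_csv.drop_duplicates(subset="Model Code", keep="first")

# Number of rows removed by the keep=False variant.
rows_removed = len(combined_csv) - len(only_singletons)

print(only_singletons["Model Code"].tolist())  # ['A1', 'C3']
print(one_per_sku["Model Code"].tolist())      # ['A1', 'B2', 'C3', 'D4']
print(rows_removed)                            # 5
```

With `keep = False`, a product listed by two suppliers disappears from the result entirely. If one row per SKU is wanted instead, `keep = "first"` or `keep = "last"` may be the better fit.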

In [9]:

# plain assignment does not copy: both names refer to the same DataFrame
no_duplicates_combined_csv_verify = combined_csv
type(no_duplicates_combined_csv_verify)

Out[9]:

pandas.core.frame.DataFrame
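One thing to watch in cell In [9]: assigning a DataFrame to a new name does not copy it. The sketch below, using a made-up three-row frame and the hypothetical names `alias` and `independent`, shows how an in-place drop through the alias also changes the original, while a `.copy()` stays untouched.

```python
import pandas as pd

# Hypothetical stand-in for combined_csv.
combined_csv = pd.DataFrame({"Model Code": ["A1", "B2", "B2"]})

alias = combined_csv               # same object, no data copied
independent = combined_csv.copy()  # separate DataFrame with its own data

# In-place drop through the alias.
alias.drop_duplicates(subset="Model Code", keep=False, inplace=True)

print(alias is combined_csv)  # True
print(len(combined_csv))      # 1  (the original was modified too)
print(len(independent))       # 3  (the copy was not)
```

If `combined_csv` is needed again later with its duplicates intact, taking a `.copy()` before an in-place operation is one way to keep it.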

In [10]:

# verify the shape after dropping duplicates
no_duplicates_combined_csv_verify.drop_duplicates(subset = "Model Code", keep = False, inplace = True)
len(no_duplicates_combined_csv_verify)

Out[10]:

55527

In [11]:

#55690 vs 55527
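Before discarding the duplicated rows, it can help to look at them. This is a small sketch, not part of the original notebook, that uses `DataFrame.duplicated` with `keep=False` to list every row whose SKU is repeated. The `Supplier` column and the sample values are hypothetical. On the real data this would have to run before the in-place drop in In [10], since that cell also changes `combined_csv`.

```python
import pandas as pd

# Hypothetical sample; "Supplier" is an invented column for illustration.
combined_csv = pd.DataFrame(
    {
        "Model Code": ["A1", "B2", "B2", "C3"],
        "Supplier": ["s1", "s1", "s2", "s2"],
    }
)

# Boolean mask: True for every row whose SKU appears more than once.
is_dup = combined_csv.duplicated(subset="Model Code", keep=False)

# The rows that drop_duplicates(keep=False) would remove, grouped by SKU.
duplicated_rows = combined_csv[is_dup].sort_values("Model Code")

# How many times each repeated SKU occurs.
dup_counts = duplicated_rows["Model Code"].value_counts()

print(duplicated_rows)
print(int(is_dup.sum()))  # 2
```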

In [12]:

# show combined data : show first 3 rows
no_duplicates_combined_csv[:3]

Out[12]:

Columns: Product ID | Model Code | Full Product Name | Short Product Name | Product URL | Category Name | Category URL | Subcategory Name | Subcategory URL | Date Product Was Launched | … | Related Products | Related Accessories | Weight Kg | Height mm | Width mm | Depth mm | Video link | Retail Price | Stock status | Date Back

First row (truncated): 0107890POU_0850GV7Y | Pull Rope Fitness Exercises Resistance Bands L… | Pull Rope Fitness Exercises Resistance Bands L…

3 rows × 40 columns

In [16]:

# Stop

Find only the unique products, sorted and with duplicates removed

In [14]:

# sorting by SKU = Model Code
sorted_merged_data = no_duplicates_combined_csv.sort_values("Model Code", inplace = False)
sorted_merged_data.head()

# dropping ALL duplicate values : No need here. Though old code : keeping it anyway
unique_sorted_data = sorted_merged_data.drop_duplicates(subset = "Model Code", keep = False, inplace = False)
unique_sorted_data.head(3)

Out[14]:

Columns: Product ID | Model Code | Full Product Name | Short Product Name | Product URL | Category Name | Category URL | Subcategory Name | Subcategory URL | Date Product Was Launched | … | Related Products | Related Accessories | Weight Kg | Height mm | Width mm | Depth mm | Video link | Retail Price | Stock status | Date Back

First row (truncated): 899230399A01AL3301111 | Black 3x3x3 MoYu AoLong V2 Puzzle | Black 3x3x3 MoYu AoLong V2 Puzzle

3 rows × 40 columns

In [15]:

# total data count at this point
unique_sorted_data.shape

Out[15]:

(55527, 40)
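The shape is unchanged from Out[7], which fits the comment that the second `drop_duplicates` was not needed. A short sketch of how the final frame could be checked for both properties, sorted and unique, is shown below. The three-row frame is invented sample data. `is_unique` and `is_monotonic_increasing` are standard pandas Series attributes.

```python
import pandas as pd

# Hypothetical stand-in for the de-duplicated data from the earlier cells.
no_duplicates_combined_csv = pd.DataFrame(
    {"Model Code": ["C3", "A1", "B2"], "Retail Price": [30.0, 10.0, 20.0]}
)

# Same two steps as in In [14].
sorted_merged_data = no_duplicates_combined_csv.sort_values("Model Code")
unique_sorted_data = sorted_merged_data.drop_duplicates(subset="Model Code", keep=False)

# Every SKU appears exactly once.
assert unique_sorted_data["Model Code"].is_unique
# SKUs are in ascending order.
assert unique_sorted_data["Model Code"].is_monotonic_increasing
# The second drop removed nothing, so the shapes match.
assert unique_sorted_data.shape == sorted_merged_data.shape

print(unique_sorted_data["Model Code"].tolist())  # ['A1', 'B2', 'C3']
```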

***. ***. ***

Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)

MSc. in Comp. Sc. (U of Manitoba, Canada)

MSc. in Data Science and Analytics (Ryerson University, Canada)

LinkedIn: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com

Training Courses: http://Training.SitesTree.com

8112223 Canada Inc/Justetc: http://JustEtc.net

Facebook Groups/Forums to discuss (Q & A):

https://www.facebook.com/banglasalearningschool

https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/
