Python: Ecommerce: Part — 1: Merge Multiple Supplier Data Files into One File


Section: Merge multiple Supplier Data Files

All code in one block

#!/usr/bin/env python
# coding: utf-8
# # Section: Merge multiple Supplier Data Files
#
# In[1]:
# if there is a need to merge multiple files, use this block
import os
import glob
import pandas as pd
# supplier data files/feeds are kept here
data_folder = 'data-supplier-2019-04-14/supplier-raw-data/'
os.chdir(data_folder)
# In[6]:
# show all data feed file names
# file extension for supplier data files
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
all_filenames
# In[7]:
# total number of rows across all data files/feeds
row_total_count = 0
for f in all_filenames:
    df_s = pd.read_csv(f)
    print(df_s.shape, f)
    row_total_count += df_s.shape[0]
row_total_count # print(row_total_count)
# In[8]:
# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.shape
# In[10]:
# export combined data to a csv file
combined_csv.to_csv("../all_supplier_products_2019_04_14.csv", index=False, encoding='utf-8-sig')
# In[13]:
# read csv data file and show data on the screen
df = pd.read_csv('../all_supplier_products_2019_04_14.csv')
df.head()

The following shows the same code cell by cell, as displayed in Jupyter Notebook, together with the output of each cell.

In [1]:

# if there is a need to merge multiple files -- use this block
import os
import glob
import pandas as pd

# supplier data files/feeds are kept here
data_folder = 'data-supplier-2019-04-14/supplier-raw-data/'
os.chdir(data_folder)
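
Calling os.chdir changes the working directory for every later cell, which is why the export step further down has to write to "../". As a minimal sketch of an alternative, assuming the same folder layout, the paths can be built explicitly with pathlib so the working directory never changes. The temporary directory and the file name sample.csv below are hypothetical stand-ins so that the snippet runs on its own; they are not part of the original notebook.

```python
import tempfile
from pathlib import Path

# Stand-in for the folder that holds the notebook (assumption for this sketch).
project_root = Path(tempfile.mkdtemp())

# Same relative layout as in the article.
data_folder = project_root / "data-supplier-2019-04-14" / "supplier-raw-data"
data_folder.mkdir(parents=True)

# Hypothetical sample feed so that the listing below is not empty.
(data_folder / "sample.csv").write_text(
    "Product ID,Retail Price\n1,9.99\n", encoding="utf-8"
)

# List the CSV feeds without changing the working directory.
# sorted() gives a reproducible order; glob itself does not promise one.
all_files = sorted(data_folder.glob("*.csv"))

# The combined file goes one level above the raw-data folder,
# which is what "../" means after os.chdir in the article.
output_file = data_folder.parent / "all_supplier_products_2019_04_14.csv"

print([p.name for p in all_files])
print(output_file.name)
```

With this approach each later pd.read_csv or to_csv call receives a full path, so the cells can be re-run in any order without the relative paths breaking.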

In [6]:

# show all data feed file names
# file extension for supplier data files
extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
all_filenames

Out[6]:

['data_feeds_5e95c25a1f7f6.csv',
'data_feeds_5e95c2962d471.csv',
'data_feeds_5e95c2d255409.csv',
'data_feeds_5e95c30e63423.csv',
'data_feeds_5e95c38646478.csv',
'data_feeds_5e95c5dd76370.csv']

In [7]:

# total number of rows across all data files/feeds
row_total_count = 0
for f in all_filenames:
    df_s = pd.read_csv(f)
    print(df_s.shape, f)
    row_total_count += df_s.shape[0]
row_total_count # print(row_total_count)
(8058, 40) data_feeds_5e95c25a1f7f6.csv
(7, 40) data_feeds_5e95c2962d471.csv
(1, 40) data_feeds_5e95c2d255409.csv
... ....
(1072, 40) data_feeds_5e95c565d6e30.csv
(4833, 40) data_feeds_5e95c5dd76370.csv

Out[7]:

55690
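
Every feed above reports 40 columns, which is what makes a plain concatenation safe. If one supplier file had a different header, pd.concat would silently take the union of the columns and fill the gaps with NaN. The following is a rough sketch of a header check that could run before merging; the folder, the file names feed_a.csv, feed_b.csv, feed_c.csv and their contents are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample feeds written to a temporary folder.
folder = Path(tempfile.mkdtemp())
(folder / "feed_a.csv").write_text(
    "Product ID,Retail Price\n1,9.99\n2,4.50\n", encoding="utf-8"
)
(folder / "feed_b.csv").write_text(
    "Product ID,Retail Price\n3,7.25\n", encoding="utf-8"
)
# This one deliberately uses a different column name.
(folder / "feed_c.csv").write_text(
    "Product ID,Price\n4,1.00\n", encoding="utf-8"
)

files = sorted(folder.glob("*.csv"))

# Use the first file's header as the reference.
reference = list(pd.read_csv(files[0], nrows=0).columns)

# nrows=0 parses only the header row, so the check stays cheap for large feeds.
mismatched = [
    f.name
    for f in files
    if list(pd.read_csv(f, nrows=0).columns) != reference
]

print(mismatched)
```

An empty list means all headers agree with the first file; any names that are printed point to feeds that need a closer look before merging.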

In [8]:

# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
combined_csv.shape

Out[8]:

(55690, 40)
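
By default pd.concat keeps each file's own row index, so the combined frame contains repeated index labels (0, 1, 2, ... restarting for every file). The export step avoids writing them by passing index=False. Below is a small sketch of a variant that resets the index while merging and also records which file each row came from. The source_file column and the sample feeds are assumptions added for illustration and are not in the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample feeds written to a temporary folder.
folder = Path(tempfile.mkdtemp())
(folder / "feed_a.csv").write_text(
    "Product ID,Retail Price\n1,9.99\n2,4.50\n", encoding="utf-8"
)
(folder / "feed_b.csv").write_text(
    "Product ID,Retail Price\n3,7.25\n", encoding="utf-8"
)

files = sorted(folder.glob("*.csv"))

frames = []
for f in files:
    frame = pd.read_csv(f)
    # Hypothetical extra column: remember the feed each row came from.
    frame["source_file"] = f.name
    frames.append(frame)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating
# each file's own index.
combined = pd.concat(frames, ignore_index=True)

# Sanity check in the spirit of the row count above:
# no rows are lost or duplicated by the merge.
assert len(combined) == sum(len(frame) for frame in frames)

print(combined.shape)        # (3, 3)
print(list(combined.index))  # [0, 1, 2]
```

Reading each file once and reusing the frames also avoids loading every feed twice, once for counting and once for merging.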

In [10]:

# export combined data to a csv file
combined_csv.to_csv("../all_supplier_products_2019_04_14.csv", index=False, encoding='utf-8-sig')
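
The encoding 'utf-8-sig' writes a UTF-8 byte order mark (BOM) at the start of the file, which helps spreadsheet programs such as Excel recognise the file as UTF-8. The following sketch, using a made-up one-row frame and hypothetical file names, compares the first bytes of a file written with 'utf-8' against one written with 'utf-8-sig'.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Made-up data containing a non-ASCII character.
df = pd.DataFrame({"Product ID": [1], "Full Product Name": ["Café table"]})

out_dir = Path(tempfile.mkdtemp())
plain = out_dir / "plain.csv"        # hypothetical file names
with_bom = out_dir / "with_bom.csv"

df.to_csv(plain, index=False, encoding="utf-8")
df.to_csv(with_bom, index=False, encoding="utf-8-sig")

bom = b"\xef\xbb\xbf"  # the UTF-8 byte order mark
print(plain.read_bytes().startswith(bom))     # False
print(with_bom.read_bytes().startswith(bom))  # True

# Reading with the same encoding strips the BOM again,
# so the first column name comes back clean.
round_trip = pd.read_csv(with_bom, encoding="utf-8-sig")
print(list(round_trip.columns))
```

If the merged file is only consumed by other Python code, plain 'utf-8' would work as well; the BOM mainly matters for tools that guess the encoding.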

In [13]:

df = pd.read_csv('../all_supplier_products_2019_04_14.csv')
df.head()

Out[13]:

  | Product ID | Model Code | Full Product Name | Short Product Name | Product URL | Category Name | Category URL | Subcategory Name | Subcategory URL | Date Product Was Launched | … | Related Products | Related Accessories | Weight Kg | Height mm | Width mm | Depth mm | Video link | Retail Price | Stock status | Date Back
0 | 107890 | POU_0850GV7Y | Pull Rope Fitness Exercises Resistance Bands L… | Pull Rope Fitness

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com
8112223 Canada Inc/Justetc: http://JustEtc.net
