Python: Merge Multiple CSV Files into One to Facilitate Reporting on Transaction Data over Time
By Sayed Ahmed
Merge multiple transaction CSV files into one file. This post extends the article:
The Code for Merging
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import os
import glob
import pandas as pd

# Folder that holds the CSV files to merge ('./' is the current working directory)
data_folder = './'
os.chdir(data_folder)
# In[2]:
extension = 'csv'
# All files in data_folder whose names end in .csv
all_filenames = glob.glob('*.{}'.format(extension))
# In[ ]:
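# glob returns files in arbitrary order; sort the list so the files are stacked in name order
all_filenames = sorted(all_filenames)
all_filenames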
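`glob` makes no promise about the order in which it returns files, so sorting the list is what keeps the stacked rows in a predictable order. Below is a minimal sketch of the listing step using `pathlib` instead of `os.chdir` and `glob`. `list_csv_files` is a hypothetical helper name, and name order equals chronological order only if the exported files have date-based names (an assumption about how the files are named).

```python
import tempfile
from pathlib import Path


def list_csv_files(folder, extension="csv"):
    """Return the names of files in `folder` ending in .<extension>, sorted by name."""
    return sorted(p.name for p in Path(folder).glob(f"*.{extension}"))


# Demo in a throwaway folder; the file names are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["2020-03.csv", "2020-01.csv", "notes.txt", "2020-02.csv"]:
        (Path(tmp) / name).write_text("Date,Amount\n")
    files = list_csv_files(tmp)

print(files)  # ['2020-01.csv', '2020-02.csv', '2020-03.csv']
```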
# In[3]:
# Test: check the data in each CSV file to be combined.
# Do the files align well with each other (same columns)?
row_total_count = 0
for f in all_filenames:
    print(f)
    # Default header=0: the first line of each file is used as column names
    df_s = pd.read_csv(f)
    print(df_s.shape, f)
    row_total_count += df_s.shape[0]
    print(df_s.head())

# row_total_count
# df_s.head()
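Printing `head()` for every file works for a handful of files; with many files it can be easier to compare only the header rows. A minimal sketch, assuming every file is meant to share the first file's header. `find_mismatched_columns` is a hypothetical helper, and `nrows=0` makes `pd.read_csv` parse just the header line.

```python
import os
import tempfile

import pandas as pd


def find_mismatched_columns(filenames):
    """Return {filename: columns} for every file whose header differs from the first file's."""
    # nrows=0 parses only the header line, so this stays cheap for large files
    headers = {f: list(pd.read_csv(f, nrows=0).columns) for f in filenames}
    reference = headers[filenames[0]]
    return {f: cols for f, cols in headers.items() if cols != reference}


# Demo with three small files; c.csv uses a different column name on purpose.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.csv", "Date,Amount\n2020-01-02,5\n"),
                       ("b.csv", "Date,Amount\n2020-02-03,7\n"),
                       ("c.csv", "Date,Amt\n2020-03-04,9\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        paths.append(path)
    mismatches = find_mismatched_columns(paths)

print({os.path.basename(f): cols for f, cols in mismatches.items()})
# {'c.csv': ['Date', 'Amt']}
```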
# In[10]:
# Keep track of the total rows in all files so that you can compare it
# with the shape of the final combined data file.
# header=None reads every line, including any header line, as a data row,
# matching how the files are read when they are combined below.
row_total_count = 0
for f in all_filenames:
    print(f)
    df_s = pd.read_csv(f, header=None)
    print(df_s.shape, f)
    row_total_count += df_s.shape[0]
    #print(df_s.head())

row_total_count
# df_s.head()
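As a cross-check that does not depend on pandas, the same kind of total can be computed with the standard `csv` module. A minimal sketch; `count_csv_records` is a hypothetical helper. It counts header lines as records and skips blank lines, which is intended to match `pd.read_csv(f, header=None)` with its default `skip_blank_lines=True` for ordinary files.

```python
import csv
import os
import tempfile


def count_csv_records(filenames):
    """Count non-blank CSV records across files, header lines included."""
    total = 0
    for f in filenames:
        # newline="" lets the csv module handle quoted fields that span lines
        with open(f, newline="", encoding="utf-8") as fh:
            total += sum(1 for row in csv.reader(fh) if row)  # blank lines give []
    return total


# Demo: a.csv has a header, two rows and a blank line; b.csv has a header and one row.
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for name, text in [("a.csv", "Date,Amount\n2020-01-02,5\n\n2020-01-03,6\n"),
                       ("b.csv", "Date,Amount\n2020-02-01,7\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8", newline="") as fh:
            fh.write(text)
        paths.append(path)
    total = count_csv_records(paths)

print(total)  # 5
```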
# In[15]:
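# Combine all files in the list; axis=0 stacks them one after another (row-wise).
# header=None reads every line, including any header line, as data.
combined_csv = pd.concat([pd.read_csv(f, header=None) for f in all_filenames], axis=0)
# combined_csv.sort_values("Model Code", inplace=True)
# Dropping ALL duplicate values (keep=False removes every copy of a duplicated row)
# combined_csv.drop_duplicates(subset="Model Code", keep=False, inplace=True)

# Export to CSV. The output name has no .csv extension, so the glob above
# will not pick up the combined file if the notebook is run again.
combined_csv.to_csv("rbc_mastercard_data_combined", index=False, encoding='utf-8-sig')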
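With `header=None`, each file's header line (if it has one) is read as an ordinary data row, so the combined file holds one header copy per input file, and the integer labels 0, 1, 2, ... become its first line. If all files share the same header, a sketch that reads the header normally could look like the following. The `combine_csv_files` helper, the `source_file` column, and de-duplicating with `keep="first"` are assumptions, not part of the original code. `keep="first"` keeps one copy of a transaction that appears in two overlapping downloads, whereas `keep=False` drops every copy; either way, identical rows inside a single file (for example, two identical purchases on the same day) would also be removed.

```python
import os
import tempfile

import pandas as pd


def combine_csv_files(filenames, output_path):
    """Stack CSV files that share one header row and write a single combined CSV."""
    frames = []
    for f in filenames:
        df = pd.read_csv(f)  # header=0 (default): the first line becomes column names
        df["source_file"] = os.path.basename(f)  # hypothetical bookkeeping column
        frames.append(df)
    combined = pd.concat(frames, axis=0, ignore_index=True)
    # Compare rows on the data columns only, ignoring which file they came from
    data_columns = [c for c in combined.columns if c != "source_file"]
    combined = combined.drop_duplicates(subset=data_columns, keep="first")
    combined.to_csv(output_path, index=False, encoding="utf-8-sig")
    return combined


# Demo: the 2020-01-03 row appears in both files, as with overlapping downloads.
with tempfile.TemporaryDirectory() as tmp:
    inputs = []
    for name, text in [("a.csv", "Date,Amount\n2020-01-02,5\n2020-01-03,6\n"),
                       ("b.csv", "Date,Amount\n2020-01-03,6\n2020-02-01,7\n")]:
        path = os.path.join(tmp, name)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(text)
        inputs.append(path)
    out = os.path.join(tmp, "transactions_combined")  # hypothetical output name
    combined = combine_csv_files(inputs, out)
    reread = pd.read_csv(out, encoding="utf-8-sig")

print(combined.shape, reread.shape)  # (3, 3) (3, 3)
```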
# In[16]:
combined_csv.shape
# In[17]:
row_total_count == combined_csv.shape[0]
# In[19]:
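# Read the combined file back; its first line holds the integer column
# labels 0, 1, 2, ... that to_csv wrote for the header=None columns.
df = pd.read_csv('rbc_mastercard_data_combined')
df.head(100)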
# In[ ]:
df.shape
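Once the transactions sit in one DataFrame, reporting over time is mostly a matter of grouping by date. A minimal sketch of monthly totals; `monthly_totals`, the `Transaction Date` and `Amount` column names, and the sample rows are all hypothetical, so adjust the names to the headers in your own export. It assumes the data has named columns (for example, files read with their header row) rather than the integer labels produced by `header=None`.

```python
import pandas as pd


def monthly_totals(df, date_col="Transaction Date", amount_col="Amount"):
    """Sum `amount_col` per calendar month of `date_col` (column names are hypothetical)."""
    months = pd.to_datetime(df[date_col]).dt.to_period("M")
    return df.groupby(months)[amount_col].sum()


# Made-up sample data; in this sample, negative amounts are purchases and positive are payments.
transactions = pd.DataFrame({
    "Transaction Date": ["2020-01-05", "2020-01-20", "2020-02-03", "2020-03-15", "2020-03-16"],
    "Amount": [-12.50, -40.00, -7.25, 100.00, -20.00],
})
report = monthly_totals(transactions)
print(report)
```

Grouping on a monthly `Period` keeps each calendar month as one row, which is usually the granularity needed for a report over time.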
Final Output
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada
*** . *** *** . *** . *** . ***
Sayed Ahmed
BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial
Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development, and miscellaneous related topics. You can also create your own course to sell to others: http://sitestree.com/training/