Python: Read Amex (American Express) Canada: PDF Statement Transaction Data into CSV file
By Sayed Ahmed
In general, you will get CSV files from your Amex card; However, sometimes you need to download in every 3 or 6 months. Otherwise, you might only get/request pdf files for transactions (depends on the card).
Sure, you can use tools such as Adobe Acrobat (Paid) to convert PDF statements to csv/xls files. This is a simple approach that you might be able to use when you do not have access to any such tools.
Steps:
Download the statements.
Copy the text (only the transactions) into a text file
The use the code provided here or similar to read the text file and create a CSV file
Note:
You can use PDF export software to open pdf file and copy/paste. Acrobat did not give me a good format.
Copy and paste using PDF Export (to open pdf), resulted 5 lines for each transaction. if any transaction (credit adjustment) comes to be 4 lines or 6 lines (foreign transaction), make them 5 lines.
Related:
# code will generate both individual csv output files; also a merged file that combines all data into one csv file
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import os;
import glob;
import pandas as pd;
os.chdir(‘./’);
# In[2]:
extension = ‘csv’;
all_filenames = [i for i in glob.glob(‘*.{}’.format(extension))];
# In[3]:
sorted(list(all_filenames))
# In[4]:
os.getcwd();
# In[5]:
# test : Check the data for each csv files to be combined.
# do the data align well with each other
row_total_count = 0
for i in range(0, len(all_filenames)):
j = i + 1
for f in all_filenames[i:j]:
file = f
print(file)
df_s = pd.read_csv(f)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
print(df_s.head(1))
#row_total_count
#df_s.head()
# In[6]:
# keep track of total rows in all files so that you can compare the shape with the final combined data file
row_total_count = 0
for f in all_filenames:
file = f
print(file)
df_s = pd.read_csv(f, header=None)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
#print(df_s.head())
row_total_count
#df_s.head()
# In[7]:
#combine all files in the list, axis =0 i.e. one after another
combined_csv = pd.concat([pd.read_csv(f, header=None) for f in all_filenames], axis=0)
# combined_csv.sort_values(“Model Code”, inplace = True)
# dropping ALL duplicte values
#combined_csv.drop_duplicates(subset =”Model Code”, keep = False, inplace = True)
#export to csv
combined_csv.to_csv( “amex_gold_card_data_combined”, index=False, encoding=’utf-8-sig’)
# In[8]:
combined_csv.shape
# In[9]:
row_total_count == combined_csv.shape[0]
# In[11]:
df = pd.read_csv(‘amex_gold_card_data_combined’)
df.head(3)
# In[12]:
df.shape
# In[ ]:
# In[ ]:
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada
*** . *** *** . *** . *** . ***
Sayed Ahmed
BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc
Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com
Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/