Python: Read Amex (American Express) Canada: Mastercard PDF Statement Transaction Data into CSV file

Python: Read Amex (American Express) Canada: PDF Statement Transaction Data into CSV file

By Sayed Ahmed

In general, you will get CSV files from your Amex card; However, sometimes you need to download in every 3 or 6 months. Otherwise, you might only get/request pdf files for transactions (depends on the card).

Sure, you can use tools such as Adobe Acrobat (Paid) to convert PDF statements to csv/xls files. This is a simple approach that you might be able to use when you do not have access to any such tools.

Steps:

Download the statements.

Copy the text (only the transactions) into a text file

The use the code provided here or similar to read the text file and create a CSV file

Note:

You can use PDF export software to open pdf file and copy/paste. Acrobat did not give me a good format.

Copy and paste using PDF Export (to open pdf), resulted 5 lines for each transaction. if any transaction (credit adjustment) comes to be 4 lines or 6 lines (foreign transaction), make them 5 lines.

Related:

 
# code will generate both individual csv output files; also a merged file that combines all data into one csv file
#!/usr/bin/env python
# coding: utf-8
# In[1]:
import os;
import glob;
import pandas as pd;
os.chdir(‘./’);
# In[2]:
extension = ‘csv’;
all_filenames = [i for i in glob.glob(‘*.{}’.format(extension))];
# In[3]:
sorted(list(all_filenames))
# In[4]:
os.getcwd();
# In[5]:
# test : Check the data for each csv files to be combined.
# do the data align well with each other
row_total_count = 0
for i in range(0, len(all_filenames)):
j = i + 1
for f in all_filenames[i:j]:
file = f
print(file)
df_s = pd.read_csv(f)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
print(df_s.head(1))
#row_total_count
#df_s.head()
# In[6]:
# keep track of total rows in all files so that you can compare the shape with the final combined data file
row_total_count = 0
for f in all_filenames:
file = f
print(file)
df_s = pd.read_csv(f, header=None)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
#print(df_s.head())
row_total_count
#df_s.head()
# In[7]:
#combine all files in the list, axis =0 i.e. one after another
combined_csv = pd.concat([pd.read_csv(f, header=None) for f in all_filenames], axis=0)
# combined_csv.sort_values(“Model Code”, inplace = True)
# dropping ALL duplicte values
#combined_csv.drop_duplicates(subset =”Model Code”, keep = False, inplace = True)
#export to csv
combined_csv.to_csv( “amex_gold_card_data_combined”, index=False, encoding=’utf-8-sig’)
# In[8]:
combined_csv.shape
# In[9]:
row_total_count == combined_csv.shape[0]
# In[11]:
df = pd.read_csv(‘amex_gold_card_data_combined’)
df.head(3)
# In[12]:
df.shape
# In[ ]:
# In[ ]:

Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)

MSc. in Comp. Sc. (U of Manitoba, Canada)

MSc. in Data Science and Analytics (Ryerson University, Canada)

Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com

Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):

https://www.facebook.com/banglasalearningschool

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Matlab: Optimization Project Examples

You might want to start with Example Projects or Check examples after some theories. Just digging theories more and more might confuse you (or make things look harder) esp. if your goal is to be eventually able to write code for an optimization project. It might help you to understand stuff faster. Sure, dig theories to get clear understanding. For researchers who want to improve (invent) the background technology – clear understanding of theories are a must.

Step 1: Mathematical Modeling with Optimization, Part 1

https://www.mathworks.com/videos/mathematical-modeling-with-optimization-part-1-101559.html

Step 2: Equation to MATLAB. Chart to find what can be a possible solution

https://www.mathworks.com/videos/mathematical-modeling-with-optimization-part-2-68974.html

Mathematical Modeling with Optimization, Part 3: Problem-Based Mixed-Integer Linear Programming

https://www.mathworks.com/videos/mathematical-modeling-with-optimization-part-3-problem-based-mixed-integer-linear-programming-1504298693853.html

Problem-Based Nonlinear Programming

https://www.mathworks.com/videos/part-4-problem-based-nonlinear-programming-1549458887351.html

Unconstrained Optimization Example

https://www.mathworks.com/help/optim/examples/tutorial-for-the-optimization-toolbox.html

Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Python: Generate Expense Report from RBC MasterCard Transaction Data

Python: Generate Expense Report from RBC MasterCard Transaction Data

For the context read the following two first

Python: Read RBC Canada: Mastercard PDF Statement Transaction Data into CSV file

By Sayed Ahmed


medium.com


Python: Merge Multiple csv files into one to facilitate reporting on transaction data over time

By Sayed Ahmed


medium.com

After, you have created the merged i.e. combined csv file, you might want to create reports for your purpose. For example, you can calculate how much was spent on Amazon.ca. The code example will show similar how much was spent on Amazon.ca.

Just by changing the variab;e

spend_calculation_for = ‘AMAZON.CA’;

you will be able to calculate how much was spent on other places such as BestBuy, Ebay or similar. The code is case sensitive; Hence, you need to provide the value exactly for the above mentioned variable.

Statements such as dollar, no_dol_sign_dollar, you might want to use print for them. This code was written in Jupyter Notebook and then was exported to .py file.

#!/usr/bin/env python
# coding: utf-8# In[1]:import os;
import glob;
import pandas as pd;
data_folder = ‘./’;
os.chdir(data_folder);# In[2]:Spend_calculation_for = ‘AMAZON.CA’;# In[3]:df = pd.read_csv(‘to_process_rbc_mastercard_data_combined.csv’)
df.head(3)# In[4]:transactions = df[ df[‘2’].str.contains(Spend_calculation_for) ]# In[5]:dollar = transactions[0:50][‘3’]
dollar = sorted(dollar)
dollar# In[6]:no_dol_sign_dollar = [x.strip(“$”) for x in dollar]
no_dol_sign_dollar# In[7]:no_dol_sign_dollar = [float(x.replace(“-$”, ‘-’)) for x in no_dol_sign_dollar]
no_dol_sign_dollar# In[8]:import math
total_spent = sum(no_dol_sign_dollar)
#sum = sum + [ float(x[1:]) for x in aliexpress_dollar ]
total_spent# In[9]:# just spent without refund
pos_no_dol_sign_dollar = [ x if x > 0 else 0 for x in no_dol_sign_dollar]
pos_no_dol_sign_dollar[0:10]# In[10]:spent = sum(pos_no_dol_sign_dollar)
spent# In[11]:# just spent without refund
neg_no_dol_sign_dollar = [ x if x < 0 else 0 for x in no_dol_sign_dollar]
#if (len(neg_no_dol_sign_aliexpress_dollar)) > 0:
# neg_no_dol_sign_aliexpress_dollar[-1], neg_no_dol_sign_aliexpress_dollar[-2], neg_no_dol_sign_aliexpress_dollar[-3]
neg_no_dol_sign_dollar# In[12]:got_refund = sum(neg_no_dol_sign_dollar);
got_refund# In[14]:total_spent = spent — abs(got_refund);
total_spent# In[ ]:


***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Python: Merge Multiple csv files into one to facilitate reporting on transaction data over time

Python: Merge Multiple csv files into one to facilitate reporting on transaction data over time

Python: Merge Multiple csv files into one to facilitate reporting on transaction data over time

Python: Merge Multiple csv files into one to facilitate reporting on transaction data over time

By Sayed Ahmed

Merge multiple transaction files into one. This is an extension to the article:

Python: Read RBC Canada: Mastercard PDF Statement Transaction Data into CSV file
By Sayed Ahmedmedium.com

The Code for Merging

#!/usr/bin/env python
# coding: utf-8
# In[1]:
import os
import glob
import pandas as pd
data_folder = ‘./’
os.chdir(data_folder)
# In[2]:
extension = ‘csv’
all_filenames = [i for i in glob.glob(‘*.{}’.format(extension))]
# In[ ]:
sorted(list(all_filenames))
# In[3]:
# test : Check the data for each csv files to be combined.
# do the data align well with each other
row_total_count = 0
for i in range(0, len(all_filenames)):
j = i + 1
for f in all_filenames[i:j]:
file = f
print(file)
df_s = pd.read_csv(f)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
print(df_s.head())
#row_total_count
#df_s.head()
# In[10]:
# keep track of total rows in all files so that you can compare the shape with the final combined data file
row_total_count = 0
for f in all_filenames:
file = f
print(file)
df_s = pd.read_csv(f, header=None)
print(df_s.shape, f)
#print(f)
row_total_count += df_s.shape[0]
#print(df_s.head())
row_total_count
#df_s.head()
# In[15]:
#combine all files in the list, axis =0 i.e. one after another
combined_csv = pd.concat([pd.read_csv(f, header=None) for f in all_filenames], axis=0)
# combined_csv.sort_values(“Model Code”, inplace = True)
# dropping ALL duplicate values
#combined_csv.drop_duplicates(subset =”Model Code”, keep = False, inplace = True)
#export to csv
combined_csv.to_csv( “rbc_mastercard_data_combined”, index=False, encoding=’utf-8-sig’)
# In[16]:
combined_csv.shape
# In[17]:
row_total_count == combined_csv.shape[0]
# In[19]:
df = pd.read_csv(‘rbc_mastercard_data_combined’)
df.head(100)
# In[ ]:
df.shape

Final Output

1*j5ET61-3P-l-GTj2pMhePg.png

Posted On:

https://medium.com/@SayedAhmedCanada/python-merge-multiple-csv-files-into-one-to-facilitate-reporting-on-transaction-data-over-time-2e4524355d43

Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)

MSc. in Comp. Sc. (U of Manitoba, Canada)

MSc. in Data Science and Analytics (Ryerson University, Canada)

Linkedin:

Blog: ,

Training Courses:

Facebook Groups/Forums to discuss (Q & A):

https://www.facebook.com/banglasalearningschool

https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

GUROBI Optimization: Project Examples

GUROBI Optimization: Functional Code Examples: https://www.gurobi.com/resource/functional-code-examples/

Gurobi Interactive Examples: http://examples.gurobi.com/

*** ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Python: Read RBC Canada: Mastercard Statement Data into CSV file

Sure, you can use tools such as Adobe Acrobat to convert PDF statements to csv/xls files. This is a simple approach that you might be able to use when you do not have access to any such tools.

How can this help? If you were trying to calculate based on the transactions. One application can be calculating your business or personal expenses. Also, if you are writing a tax or financial software, similar can be useful. For sure, tools and programming languages are available to read PDF files directly than the approach I mentioned here.

Steps:

Download the statement.
Copy the text (only the transactions) into a text file
The use the code provided here or similar to read the text file and create a CSV file
You can write another Python code to merge multiple csv files
Using the combined CSV file, you can calculate your yearly categorized expenses or similar : for example

Sample Transaction Data
‘DEC 21 DEC 24 PAYMENT – THANK YOU / PAIEMENT – MERCI\n75105398358619984366202\n-$60.00\nJAN 10 JAN 10 PAYMENT – THANK YOU / PAIEMENT – MERCI\n75105399010619980541202\n-$21.82\nJAN 10 JAN 10 PURCHASE INTEREST 19.99% \n12312312312\n-$0.82\n\n\nDEC 10 DEC 12 PRESTO VIC PARK RELOAD EAST YORK ON\n55134428345800179063337\n$10.25\nDEC 13 DEC 14 TIM HORTONS 0670 QTH TORONTO ON\n55419218347200311042837\n$1.80\n

Note:
Assume that each transaction spans into three lines when you copy from a pdf file to a text file. The lines with interest charges might span into only one line. You can convert that into 3 lines before executing the code on it.

DEC 21 DEC 24 PAYMENT – THANK YOU / PAIEMENT – MERCI
75105398358619984366202
-$60.00
JAN 10 JAN 10 PAYMENT – THANK YOU / PAIEMENT – MERCI
75105399010619980541202
-$21.82
JAN 10 JAN 10 PURCHASE INTEREST 19.99%
12312312312
-$0.82

The Code:
#!/usr/bin/env python
# coding: utf-8

# In[5]:

file = open(‘dec2018-jan2019.txt’,’r’);

import os.path
import re

# open the file for reading
path = ‘dec2018-jan2019.txt’
if os.path.exists(path):
# print(‘debug: exists’)
file = open(‘dec2018-jan2019.txt’,’r’)
else:
print(‘Fatal error: file does not exist’)
exit(1)

# In[6]:

fileContent = file.read()
fileContent

# In[ ]:

#f = open("dec2018-jan2019.out.txt", "a");
#f.write("Now the file has more content!")
#f.close()

# In[3]:

one_line = ”;
count = 0;
f = open("dec2018-jan2019.out.txt.csv", "a");
file = open(‘dec2018-jan2019.txt’,’r’)
for line in file:
#print(line);
line = line.rstrip("\n");
if line !=”:
count = count + 1;

if (count % 3 == 1):
line = line[0:6] + "," + line[6:13] + "," + line[13:];

if (count % 3 == 2):
line = line + "," ;

if (count % 3 == 0):
#print(line[0]);
if line[0] == ‘-‘:
line = line[1:];
if (line[0] != ‘$’ ):
print(‘Warning possible mistake’)

#print(count);
one_line = one_line + line;

if (count % 3 == 0):
one_line = one_line + "\r\n";
print(one_line);
f.write(one_line);
one_line = ”;

#print(‘–‘);

f.close();

# In[ ]:

*** ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Misc Stock Market: Crash: Bottom : Bear: Recession

Shiller PE Ratio by Month

https://www.multpl.com/shiller-pe/table/by-month

PE Ratio (TTM)

https://www.profitspi.com/stock/view.aspx?v=stock-chart&uv=101055

Moving Averages: S&P 500 Index

https://www.barchart.com/stocks/quotes/$SPX/technical-analysis

Hang/San Index

https://www.hsi.com.hk/eng

"we are likely to see a global recession in the coming quarters."

https://www.greaterfool.ca/2020/03/14/anatomy-of-bear-markets/comment-page-1/?unapproved=694043&moderation-hash=2a5261659bd9bc483fae73ca46c67007#comment-694043

"Therefore, we don’t believe any real bottom will setup before May, June or July of 2020 – after Q2 earnings are announced and contingent on the virus event subsiding and earnings starting to recover. Otherwise, we could be “searching for a bottom” for quite a while yet."

https://www.thetechnicaltraders.com/wheres-the-bottom-the-question-everyone-is-asking/

***. ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Coronavirus and the Stock Market

"Finally, the news media will never keep things in perspective. Ignore them." https://www.greaterfool.ca/wp-content/uploads/2020/03/DOUG-2.jpg?x70502 https://www.greaterfool.ca/wp-content/uploads/2020/03/DOUG-3.jpg?x70502 https://www.greaterfool.ca/wp-content/uploads/2020/03/DOUG-4.jpg?x70502 https://www.greaterfool.ca/wp-content/uploads/2020/03/DOUG-5.png?x70502

Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***
Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Training Courses: http://Training.SitesTree.com

Facebook Groups/Forums to discuss (Q & A):
https://www.facebook.com/banglasalearningschool
https://www.facebook.com/justetcsocial

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/

Project: Optimization

Portfolio Optimization with Linear and Fixed Costs

Report: https://github.com/chsasank/CVX/blob/master/Report/report.pdf

Project: https://github.com/chsasank/CVX

Trajectory Optimization

https://github.com/zi-w/6.832-project

CourseSchedulingOptimization

https://github.com/BenKang34/CourseSchedulingOptimization

Local search and optimization algorithms in AI

https://github.com/aliarjomandbigdeli/local-search-and-optimization

CS524-Group-Assignment-Optimization

https://github.com/xiaohk/CS524-Group-Assignment-Optimization

TOP – Timetable Optimisation Project

https://people.orie.cornell.edu/mru8/orie6326/project.html

Website Performance Optimization portfolio project

https://github.com/baker-natalie/optimization-project

Target Tracking Projects on Github

https://github.com/topics/target-tracking

Tracking Multiple People in a Multi-Camera Environment

https://www.epfl.ch/labs/cvlab/research/research-surv/research-body-surv-index-php

Appendix:

Optimization Course: https://github.com/MichielStock/SelectedTopicsOptimization

ORIE 7191: Topics in Optimization for Machine Learning

https://github.com/udellgroup/orie7191

Course

https://github.com/ssinhaleite/eth-mathematical-optimization-summary


*** ***. ***
Note: Older short-notes from this site are posted on Medium: https://medium.com/@SayedAhmedCanada

*** . *** *** . *** . *** . ***

Sayed Ahmed

BSc. Eng. in Comp. Sc. & Eng. (BUET)
MSc. in Comp. Sc. (U of Manitoba, Canada)
MSc. in Data Science and Analytics (Ryerson University, Canada)
Linkedin: https://ca.linkedin.com/in/sayedjustetc

Blog: http://Bangla.SaLearningSchool.com, http://SitesTree.com
Online and Offline Training: http://Training.SitesTree.com (Also, can be free and low cost sometimes)

Facebook Group/Form to discuss (Q & A): https://www.facebook.com/banglasalearningschool

Get access to courses on Big Data, Data Science, AI, Cloud, Linux, System Admin, Web Development and Misc. related. Also, create your own course to sell to others. http://sitestree.com/training/