Justetc Social Services (non-profit) · Jan 31 · 29 min read

Indicator visualizations are in a separate ipynb file (visualize-indicators-final.ipynb).

Visualizations can be created through the UI: select the options, then execute the code block that follows to get the visualizations.

A separate section at the end of this file holds the research questions and the plots as they appear in my documents and presentations (the code there generates the plots for that section).

For the research question section, the corresponding health status needs to be selected and the data reloaded before re-executing (UI selections will not matter).

#!conda install basemap
from __future__ import print_function

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# it is important
%matplotlib inline

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets

# from geopy.geocoders import Nominatim

# for geo maps
# !conda install basemap
import conda
import os
if 'PROJ_DIR' in os.environ:
    pyproj_datadir = os.environ['PROJ_DIR']
else:
    conda_file_dir = conda.CONDA_PACKAGE_ROOT
    conda_dir = conda_file_dir.split('lib')[0]
    pyproj_datadir = os.path.join(os.path.join(conda_dir, 'Library'),'share')
    os.environ['PROJ_LIB'] = pyproj_datadir
 
#important
#from mpl_toolkits.basemap import Basemap

Location of the Data Files Folder.

The data folder holds the data files, one per aspect of healthcare performance. The file names (aspects) are loaded into a drop-down so that one can be selected and acted upon. Hence, new data files with a similar structure can likely be integrated with the visualizations easily.

import os

data_folder = './data/'
measure_files = os.listdir(data_folder)
measure_files

['.~lock.health-status.xls#',
 'access-to-care.xls',
 'health-status.xls',
 'non-med-determinants.xls',
 'patient-safety.xls',
 'prescribing-primary.xls',
 'quality-of-care.xls']

Select a measure/aspect to visualize

If you see '.~lock.health-status.xls#' or a similar entry in the list, ignore it; it is a temporary lock file, not a data file.
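
One way to skip such lock files is to filter the listing before it reaches the drop-down. The sketch below is only an illustration and not part of the original notebook: the helper name `list_measure_files` is hypothetical, and it assumes that every real data file ends in `.xls` and that lock files start with `.~lock.`.

```python
import os

def list_measure_files(folder, extension='.xls'):
    """Return sorted data-file names in `folder`, skipping editor lock files.

    Hypothetical helper: assumes data files end with `extension` and
    lock files start with '.~lock.'.
    """
    names = []
    for name in os.listdir(folder):
        if name.startswith('.~lock.'):
            continue  # temporary lock file, not a data file
        if not name.endswith(extension):
            continue  # anything else that is not a data file
        names.append(name)
    return sorted(names)
```

With this in place, `measure_files = list_measure_files(data_folder)` would give the drop-down only the real data files.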

Note: when the health aspect (measure) is changed in the drop-down below, some code needs to be re-executed to load the related data.

I am marking that code with START-RELOAD-DATA and END-RELOAD-DATA.

print('Select a health measure/aspect to visualize\n')
# create the interactive interface
def f(measure):
    return measure

print('Select a measure:')
measure_file = interactive(f, measure = measure_files);
display(measure_file)

Select a health measure/aspect to visualize

Select a measure:

interactive(children=(Dropdown(description='measure', options=('.~lock.health-status.xls#', 'access-to-care.xl…

Test what aspect/measure we have selected

START-RELOAD-DATA

'Selected: ' + measure_file.result

'Selected: access-to-care.xls'

if (measure_file.result == ''):
    measure_file.result = 'health-status.xls'

Load Data and Display

data_file = data_folder + measure_file.result
measure_data = pd.read_excel(data_file)
measure_data.head()

Find all the performance indicators under this aspect/measure

irrespective of whether we have data for Canada or not

# find details on Indicators
# find all indicators
indicators = pd.Index(measure_data['Indicator']).unique()
indicators

Index(['Inability to Pay for Medical Bills', 'Poor Weekend/Evening Care',
       'Regular Doctor', 'Same or Next Day Appt',
       'Wait Time: Cataract Surgery', 'Wait Time: Hip Replacement',
       'Wait Time: Knee Replacement', 'Wait Time: Specialist'],
      dtype='object', name='Indicator')

Find the performance indicators under this aspect/measure for which Canada has data

# indicators that must have Canadian data

# first, the rows where Canada exists
data_with_canada = measure_data[measure_data['Region'] == 'Canada']

# indicators for which Canada also has data
indicators_with_canada = data_with_canada['Indicator'].unique()

print('All indicators\n', indicators)
print('Indicators where canada also has Data\n', indicators_with_canada)
'Indicator Counts', len(indicators_with_canada), len(indicators)

All indicators
 Index(['Inability to Pay for Medical Bills', 'Poor Weekend/Evening Care',
       'Regular Doctor', 'Same or Next Day Appt',
       'Wait Time: Cataract Surgery', 'Wait Time: Hip Replacement',
       'Wait Time: Knee Replacement', 'Wait Time: Specialist'],
      dtype='object', name='Indicator')
Indicators where canada also has Data
 ['Inability to Pay for Medical Bills' 'Poor Weekend/Evening Care'
 'Regular Doctor' 'Same or Next Day Appt' 'Wait Time: Cataract Surgery'
 'Wait Time: Hip Replacement' 'Wait Time: Knee Replacement'
 'Wait Time: Specialist']

('Indicator Counts', 8, 8)

Find the years that appear in the data

All years, and separately the years for which we have data for Canada

# find all years
years = measure_data['Data year'].dropna().unique()
years_with_canada = data_with_canada['Data year'].dropna().unique()


print('Years we have data\n', years)
print('Years where canada also has data\n', years_with_canada)

Years we have data
 ['2016' 2016 'Not applicable' '2015' '2013']
Years where canada also has data
 [2016]

Sort the years across all data

# sort years for all data
years = [ int(aYear) for aYear in years if (aYear != 'Not applicable') and len( str(aYear).split(' ')) <= 1 ]
years = sorted(years)
years

all_years = [0] + years
all_years

[0, 2013, 2015, 2016, 2016]

Sort the years for which Canada has data, so that the drop-down can show them in ascending order

0 means all years are selected

# sort years
years_with_canada = [ int(aYear) for aYear in years_with_canada if (aYear != 'Not applicable') and len( str(aYear).split(' ')) <= 1 ]
years_with_canada = sorted(years_with_canada)
years_with_canada

all_years_canada = [0] + years_with_canada
all_years_canada

[0, 2016]
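
The printed years mix strings and integers ('2016' and 2016), which is why 2016 appears twice in `all_years`. A hedged sketch of one way to normalize such a column before building the drop-down lists follows. `normalize_years` is a hypothetical helper, and the sample list simply mirrors the values printed above rather than reading the real file.

```python
import pandas as pd

def normalize_years(values):
    """Return sorted, de-duplicated integer years from a mixed-type column.

    Entries that are not a plain year (e.g. 'Not applicable' or multi-word
    text) become NaN via errors='coerce' and are dropped.
    """
    numeric = pd.to_numeric(pd.Series(values), errors='coerce').dropna()
    return sorted(set(int(y) for y in numeric))

# sample mirroring the mixed types printed above
sample = ['2016', 2016, 'Not applicable', '2015', '2013']
all_years_clean = [0] + normalize_years(sample)  # 0 still means "all years"
print(all_years_clean)
```

Applying the same conversion to the `Data year` column itself would also let a comparison such as `measure_data['Data year'] == 2016` match rows stored as the string '2016', which an integer comparison otherwise misses.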

All countries

countries = measure_data[ (measure_data['Type of region'] == 'Country') | (measure_data['Type of region'] == 'Canada') ]['Region']
countries = sorted (countries.unique())
#print(countries)
#countries = ['Canada'] + countries

# bring Canada to the top (without listing it twice)
if 'Canada' in countries:
    countries = ['Canada'] + [aCountry for aCountry in countries if aCountry != 'Canada']

countries[:1]

['Canada']

unique color code for each country

color will be used as a third dimension in some plots

# countries unique color code
all_regions = measure_data['Region'].unique()
region_colors = []
region_colors_dict = {}
import random
random.seed(0)  # note: this seeds Python's `random`, not the `np.random` generator used below
for aRegion in all_regions:
    region_colors_dict[aRegion] = np.random.randint(0, 255)
    
list(region_colors_dict.keys())[:5], list(region_colors_dict.values())[:5]

(['United Kingdom',
  'Saskatchewan',
  'Germany',
  'British Columbia',
  'Australia'],
 [227, 16, 56, 193, 232])

# Province  unique color code
all_provinces = measure_data[ measure_data['Type of region'] == 'Province'  ]['Region'].unique()
province_colors = []
province_colors_dict = {}
import random
random.seed(0)  # note: this seeds Python's `random`, not the `np.random` generator used below
for aProvince in all_provinces:
    province_colors_dict[aProvince] = np.random.randint(0, 255)
    
list(province_colors_dict.keys()), list(province_colors_dict.values())

(['Saskatchewan',
  'British Columbia',
  'Newfoundland and Labrador',
  'Manitoba',
  'Alberta',
  'Quebec',
  'New Brunswick',
  'Ontario',
  'Nova Scotia',
  'Prince Edward Island'],
 [172, 161, 91, 156, 3, 216, 153, 234, 191, 96])
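
The color cells above call `random.seed(0)` but draw from `np.random`, which keeps its own state, so the seed does not fix the colors, and `randint` can also hand two regions the same value. A minimal sketch of one alternative follows, assuming that an evenly spaced value per region is acceptable as a color code; `make_color_codes` is a hypothetical name, not something from the original notebook.

```python
import numpy as np

def make_color_codes(regions, low=0, high=255):
    """Map each region name to a distinct, reproducible number in [low, high].

    Values are evenly spaced over the sorted names, so the mapping does not
    depend on random state and no two regions share a code.
    """
    names = sorted(set(regions))
    codes = np.linspace(low, high, num=len(names))
    return {name: float(code) for name, code in zip(names, codes)}

# illustrative call with three region names
colors = make_color_codes(['Canada', 'Germany', 'Australia'])
print(colors)
```

The resulting dictionary can be used the same way as `region_colors_dict`, for example to build the `c=` list passed to a scatter plot.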

latitude longitude for all regions (countries, provinces)

will be used in map/geo plots

# latitude and longitude information
from geopy.geocoders import Nominatim  # required for Nominatim below

geolocator = Nominatim(user_agent="chrome")
lats, lons = [], []
lats_dict = {}
lons_dict = {}
#for aCountry in countries:
for aRegion in all_regions:
    try:
        location = geolocator.geocode(aRegion)
        lats.append(location.latitude)
        lons.append(location.longitude)
        lats_dict[aRegion] = location.latitude
        lons_dict[aRegion] = location.longitude
    except Exception:
        # skip regions where geocode returned None or the request failed
        continue

lats_dict['Canada']
#list(zip(lats, lons))
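
A hedged sketch of the same lookup with the geocoder passed in as a function, so that missing results are handled explicitly instead of through a bare `except`. `collect_coordinates`, `fake_geocode` and the `Location` tuple are hypothetical stand-ins, and the coordinates are made up for illustration. With geopy installed, the `geocode` method of a `Nominatim` instance could be passed in place of the stand-in.

```python
from collections import namedtuple

# stand-in for a geocoder result: only the two attributes used here
Location = namedtuple('Location', ['latitude', 'longitude'])

def collect_coordinates(regions, geocode):
    """Return ({region: lat}, {region: lon}) for the regions the geocoder resolves."""
    lats, lons = {}, {}
    for region in regions:
        location = geocode(region)
        if location is None:
            continue  # region could not be resolved; leave it out
        lats[region] = location.latitude
        lons[region] = location.longitude
    return lats, lons

def fake_geocode(name):
    # made-up coordinates, for illustration only
    known = {'Canada': Location(60.0, -110.0)}
    return known.get(name)

demo_lats, demo_lons = collect_coordinates(['Canada', 'OECD average'], fake_geocode)
```

Keeping the geocoder as a parameter also makes the loop easy to run offline while testing the plotting code.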

Test to check the OECD Data: Benchmark data

for anIndicator in indicators:
    oecd_data = measure_data.loc[  (measure_data['Indicator'] == anIndicator)   & (measure_data['Region'] == 'OECD average') ]
    #oecd_data['Value'].tolist()[0]
    print(oecd_data[ ['Indicator', 'Value', 'Data year' ]])

#print(all_years)
for aYear in all_years:
    indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicators[0])  & ( (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) ) )   | \
                                         ( (measure_data['Indicator'] == indicators[0]) & (measure_data['Region'] == 'OECD average')) ]

    #print(indicator_data[ ['Indicator', 'Value', 'Data year' ]])
    #print(indicator_data['Value'])

                             Indicator  Value       Data year
17  Inability to Pay for Medical Bills    8.7  Not applicable
                    Indicator  Value       Data year
27  Poor Weekend/Evening Care   54.1  Not applicable
         Indicator  Value       Data year
57  Regular Doctor   94.6  Not applicable
                Indicator  Value       Data year
84  Same or Next Day Appt   59.2  Not applicable
                       Indicator  Value       Data year
110  Wait Time: Cataract Surgery   98.0  Not applicable
                      Indicator  Value       Data year
128  Wait Time: Hip Replacement  115.2  Not applicable
                       Indicator  Value       Data year
161  Wait Time: Knee Replacement  204.6  Not applicable
                 Indicator  Value       Data year
174  Wait Time: Specialist   41.9  Not applicable

END-RELOAD-DATA
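
The plotting method below reads the benchmark with `.tolist()[0]`, which raises an `IndexError` when an indicator has no OECD average row. A minimal sketch of a more defensive lookup follows, assuming the same column names as the data above; `benchmark_for` is a hypothetical helper, and the small frame is illustration data only.

```python
import pandas as pd

def benchmark_for(measure_data, indicator, region='OECD average'):
    """Return the benchmark value for `indicator`, or None if it is missing."""
    rows = measure_data.loc[
        (measure_data['Indicator'] == indicator)
        & (measure_data['Region'] == region),
        'Value',
    ]
    return None if rows.empty else float(rows.iloc[0])

demo = pd.DataFrame({
    'Indicator': ['Regular Doctor', 'Regular Doctor'],
    'Region': ['OECD average', 'Canada'],
    'Value': [94.6, 90.0],  # 94.6 is the OECD value printed above; 90.0 is made up
})
print(benchmark_for(demo, 'Regular Doctor'))
print(benchmark_for(demo, 'No Such Indicator'))
```

A caller can then decide what to do when the result is `None`, for example plotting raw values instead of multiples of the benchmark.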

Method to plot over years or only for a year.

Note: not all combinations of UI selections may work, as that would require extensive testing and adjustment (against the data and real-life meaning).

Country might be represented by colors when applicable.

Supports plots such as: Bubble, Line, Bar, Hor Bar, Pie.

Other plot types can also be added. Once implemented, they will work for all measures and indicators (with data provided in the proper format).

Note: I know that the code can be improved here. Similar code appears in multiple places and could be reduced.
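
As one possible way to reduce that repetition, the row filter that recurs in the method could live in a small helper. This is only a sketch under assumptions: `select_indicator_rows` is a hypothetical name, it reproduces the country/province filter only (not the variant that also adds the OECD average row), and the frame below holds made-up values.

```python
import pandas as pd

def select_indicator_rows(measure_data, indicator, year, provincial_only=False):
    """Rows for one indicator and year, either provinces or countries."""
    region_types = ['Province'] if provincial_only else ['Country', 'Canada']
    mask = (
        (measure_data['Indicator'] == indicator)
        & (measure_data['Data year'] == year)
        & (measure_data['Type of region'].isin(region_types))
    )
    return measure_data.loc[mask]

demo = pd.DataFrame({
    'Indicator': ['Regular Doctor'] * 3,
    'Region': ['Canada', 'Ontario', 'Germany'],
    'Type of region': ['Canada', 'Province', 'Country'],
    'Data year': [2016, 2016, 2016],
    'Value': [1.0, 2.0, 3.0],  # made-up values
})
demo_countries = select_indicator_rows(demo, 'Regular Doctor', 2016)
demo_provinces = select_indicator_rows(demo, 'Regular Doctor', 2016, provincial_only=True)
```

Each chart branch could then start from the same selected rows instead of repeating the mask.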

#plt.rcParams['figure.figsize'] = [10, 15]
def plot_measure_by_years(year, indicator, bubble_scale, chart_type = '', ratios=[10, 1], animate=False, provincial_only=False, all_years=all_years, fig_size=[10, 10], sec_fig=True):
    plt.rcParams['figure.figsize'] = fig_size
    
    # redundant code to address a last-minute bug
    # countries unique color code
    all_regions = measure_data['Region'].unique()
    region_colors = []
    region_colors_dict = {}
    import random
    random.seed(0)
    for aRegion in all_regions:
        region_colors_dict[aRegion] = np.random.randint(0, 255)
            
            
    # print('year, indicator', year, indicator)   
    
    # keep these, I might need this
    
    """
    plt.ion()    
    f = plt.figure()
    ax = f.gca()
    
    #top box
    f.show()
    """
    
    # benchmark data
    oecd_data = measure_data.loc[  (measure_data['Indicator'] == indicator)   & (measure_data['Region'] == 'OECD average') ]
    #benchmark_value = oecd_data['Value']
    #print(type(benchmark_value), benchmark_value)
    benchmark_value = oecd_data['Value'].tolist()[0]
        
    
    # one improvement that can be made: usually I kept, two plots side by side. where the right one shows country and color
    # as in other cases we do not need the right one, code to hide the right one or just to create and use one subplot is more
    # appropriate
    # if (sec_fig == True):
    fig, axs = plt.subplots(1, 2, figsize=fig_size, sharey=False, gridspec_kw = {'width_ratios':ratios})
    #else:
    #axs = plt.subplots(1, 1, figsize=fig_size )
        
        
    #plt.xticks(rotation=90)
        
    # for one year    
    if ( year > 1 ):
        
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & ( (measure_data['Data year'] == year) & (measure_data['Type of region'].isin(['Country', 'Canada']) ) )   | \
                                         ( (measure_data['Indicator'] == indicator) & (measure_data['Region'] == 'OECD average')) ]
        if (provincial_only == True):
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                              (measure_data['Data year'] == year) & (measure_data['Type of region'].isin(['Province']))]
        
        #print(indicator_data)
        
        # for country color codes
        if (provincial_only == False):
            c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
            m =  [ x  for x in indicator_data['Region'] ] 
            c_code =  [ x[0:3]  for x in indicator_data['Region'] ]  
        else:
            #province_colors_dict
            c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
            m =  [ x  for x in indicator_data['Region'] ] 
            c_code =  [ x[0:3]  for x in indicator_data['Region'] ]
            region_colors_dict = province_colors_dict
            
        
        
        # this block might not apply to anything for one single year plot
        
        color_as_a_dimension = False
        if color_as_a_dimension == True:
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='^')                                
            count = list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1.0001, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            
        
        
                
        if chart_type == 'Line': 
            # best to use only for one year
            
            
            
            axs[0].plot(indicator_data['Value'], indicator_data['Region']) #, c=c                 
            axs[0].set_xticklabels(indicator_data['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' For a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' +indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            fig, axs = plt.subplots(1, 2, figsize=fig_size, sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].plot(indicator_data['Region'], indicator_data['Value']) #, c=c                
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Regions/Countries')
            axs[0].set_ylabel('Values')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            
        elif chart_type == 'Bar':
            #fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].bar(indicator_data['Region'], indicator_data['Value']) #, c=c                
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Regions/Countries')
            axs[0].set_ylabel('Values')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        elif chart_type == 'Hor Bar':
            #fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
            axs[0].barh(indicator_data['Region'], indicator_data['Value']) #, c=c                
            #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 
            
            plt.suptitle(indicator + ' Over a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Regions/Countries')
        
            #fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')
            fig.savefig('./saved_images_from_visualizations/' + 'hor_bar' + indicator.replace(' ', '_')[0:12] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        elif chart_type == 'Pie':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])                
            #ax.pie(indicator_data['Value'], labels=indicator_data['Data year'], autopct="%1.1f%%")
            
            
            axs[0].pie(indicator_data['Value'], labels=indicator_data['Region']) #, c=c                 
            axs[0].set_xticklabels(indicator_data['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' For a Year')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            
            
            fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
        else:
            
            #print(indicator_data)
            # for country color codes
            if (provincial_only == False):
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ]  
                
            else:
                #province_colors_dict
                c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
                #c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
                region_colors_dict = province_colors_dict
            

            #ax.scatter(indicator_data['Data year'], indicator_data['Region'], s=indicator_data['Value'] * bubble_scale, c=c )
            axs[0].scatter(indicator_data['Region'], indicator_data['Value'], s=indicator_data['Value'] * bubble_scale, c=c )
            #axs[0].set_xticks([year])
            #axs[0].set_yticks(indicator_data['Value'])

            plt.suptitle(indicator + ' Over a Year ' + str(year) )
            axs[0].set_xlabel('Regions')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off                        
            
            
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='^')                                
            count = list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1.0001, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
                    
        #plt.show()        
    else:
        # for multiple years
        # though this method is primarily to use for one year, unless in some specific cases
        """
        plt.ion()    
        f = plt.figure()
        #ax = f.gca()

        #top box
        f.show()
        """
        
        
        
        # redundant code to address a last-minute bug
        # countries unique color code
        all_regions = measure_data['Region'].unique()
        region_colors = []
        region_colors_dict = {}
        import random
        random.seed(0)
        for aRegion in all_regions:
            region_colors_dict[aRegion] = np.random.randint(0, 255)

        #list(region_colors_dict.keys())[:5], list(region_colors_dict.values())[:5]


            
            
        #print(all_years)
        for aYear in all_years:
            if aYear == 0:
                continue
                
            if aYear == 'Not applicable':
                continue
                
            # older : original # 
            #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == aYear) ]
            
            """
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & ( (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) ) )   | \
                                         ( (measure_data['Indicator'] == indicator) & (measure_data['Region'] == 'OECD average')) ]
            
            """
            
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
            if (provincial_only == True):
                indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                              (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Province']))]
                
            #print(indicator_data)
        
        
            #print(indicator_data)
            # for country color codes
            #print(region_colors_dict)
            if (provincial_only == False):
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ]  
                
            else:
                #province_colors_dict
                c =  [ province_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
                #c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
                region_colors_dict = province_colors_dict
                
                
        
            #c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
            #m =  [ x  for x in indicator_data['Region'] ] 
            
            #print( list(zip(m,c)))
            
            if chart_type == 'Line':                
                # plt.plot(indicator_data['Data year'], indicator_data['Value'], c=c)
                # plt.xticks(indicator_data['Value'])
                
                
                # was here axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c           
                # was here axs[0].set_xticks(indicator_data['Value'])
                
                # brought from one year
                # best to use only for one year
                #plt.plot(indicator_data['Data year'], indicator_data['Value'], c=c)            
                #plt.xticks(indicator_data['Value'])

                #axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c                       
                axs[0].plot(indicator_data['Data year'], indicator_data['Value']) #, c=c     
                #axs[0].plot(165, 'OECD', color='Red')
                #axs[0].set_xticklabels(indicator_data['Data year'], rotation=90) # can be turned off                        
                # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
                #plt.suptitle(indicator + ' For a Year')
                #axs[0].set_xlabel('Years')
                #axs[0].set_ylabel('Values')

                fig.savefig('./saved_images_from_visualizations/cancer_mortality_years_country.png')

                #plt.show()

                """
                fig, axs = plt.subplots(1, 2, figsize=(10, 8), sharey=False, gridspec_kw = {'width_ratios':ratios})
                axs[0].plot(indicator_data['Region'], indicator_data['Value']) #, c=c                
                axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off 

                plt.suptitle(indicator + ' Over a Year')
                axs[0].set_xlabel('Regions/Countries')
                axs[0].set_ylabel('Values')

                fig.savefig('./saved_images_from_visualizations/cancer_mortality_2017_country_region_x.png')

                plt.show()  
                """
                
                # end of brought from one year
                
                #f.canvas.draw()
                

                
            
                
            elif chart_type == 'Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                plt.bar(indicator_data['Data year'], indicator_data['Value'])
            elif chart_type == 'Hor Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                plt.barh(indicator_data['Region'], indicator_data['Value'])
            elif chart_type == 'Pie':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])                
                axs[0].pie(indicator_data['Value'], labels=indicator_data['Data year'], autopct="%1.1f%%")
            else:
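                # build a new frame so the filtered slice of measure_data is not modified in place
                indicator_data = indicator_data.assign(Value=indicator_data['Value'].div(benchmark_value))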
                
                # for country color codes
                c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
                m =  [ x  for x in indicator_data['Region'] ] 
        
        
                axs[0].scatter(indicator_data['Data year'], indicator_data['Value'], s = indicator_data['Value'] * bubble_scale, c=c)
                
                plt.suptitle(indicator + ' Over Years \n Values are in multiples of Benchmark (' + str(benchmark_value) + ')' )
                axs[0].set_xlabel('Years')
                axs[0].set_ylabel('Values/Benchmark Value(' + str(benchmark_value) +')' )
        
                fig.savefig('./saved_images_from_visualizations/transport_mortality_over_years.png')
            
                #plt.show()
            
            
                # show country colors
                axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')
                
                count = list(m) #list(region_colors_dict.keys())
                #print(count)
                for j in range(len(c)):
                    axs[1].annotate(count[j],  (1, j*5))
                    
                axs[1].set_xlabel('Country and Colors')
                axs[1].set_ylabel('')

                axs[1].set_xticks([])
                axs[1].set_yticks([])
                
            if (animate):
                plt.pause(0.01)
            #f.canvas.draw()
    
    #plt.show()
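
The `savefig` calls above assume that the `./saved_images_from_visualizations/` folder already exists, and indicator names such as 'Wait Time: Specialist' put a colon into the file name, which Windows does not allow. A hedged sketch of a path helper follows; `image_path` is a hypothetical name, and the choice of which characters to strip is an assumption.

```python
import os

def image_path(chart_type, indicator, suffix,
               folder='./saved_images_from_visualizations/'):
    """Build an output path like the ones used above; create the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    # replace spaces and slashes, drop colons, and keep the first 12 characters
    slug = indicator.replace(' ', '_').replace('/', '_').replace(':', '')[0:12]
    return os.path.join(folder, '{}_{}_{}.png'.format(chart_type, slug, suffix))
```

A call such as `fig.savefig(image_path('bar', indicator, year))` would then replace the repeated string concatenation, with the year (or a random number, as above) as the suffix.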

Plot over a region for different countries

Some of the plots that this method generates can also be generated using the method above. This method will mostly be used for averaging over years. However, the dataset might not have data for all years; hence, the years selected need to make sense with respect to the real world for the visualization to be meaningful. Ideally, I would give options to select multiple years individually and show users which data exist and which selections are appropriate (I am considering that out of scope for now).
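
A minimal sketch of what averaging over years could look like, under the assumption that a simple mean per region over the selected years is what is wanted. `mean_value_by_region` is a hypothetical helper, and the frame holds made-up values.

```python
import pandas as pd

def mean_value_by_region(measure_data, indicator, years):
    """Average `Value` per region for one indicator over the given years."""
    rows = measure_data.loc[
        (measure_data['Indicator'] == indicator)
        & (measure_data['Data year'].isin(years))
    ]
    # regions with no data in the selected years simply do not appear
    return rows.groupby('Region')['Value'].mean().sort_values()

demo = pd.DataFrame({
    'Indicator': ['Regular Doctor'] * 3,
    'Region': ['Canada', 'Canada', 'Germany'],
    'Data year': [2015, 2016, 2016],
    'Value': [10.0, 20.0, 5.0],  # made-up values
})
averages = mean_value_by_region(demo, 'Regular Doctor', [2015, 2016])
print(averages)
```

Because regions are averaged over whatever years they have, a region with a single year of data is compared against regions with several, which is the real-world caveat mentioned above.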

plt.rcParams['figure.figsize'] = [10, 10]
def plot_measure_by_regions(year, indicator, bubble_scale, chart_type = '', ratios=[3,1], provincial_only=False, all_years=all_years):    
    #print('year, indicator', year, indicator)    
    plt.ion()    
    # now f = plt.figure()
    # now ax = f.gca()
    #plt.xticks([])
    # now f.show()
    
    # indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicators[0] ) ]
    # (20, 10)
    fig, axs = plt.subplots(1, 2, figsize=(20, 10), sharey=False, gridspec_kw = {'width_ratios':ratios})
    #plt.xticks(rotation=90)
    
    if ( year > 1 ):
        #indicator_data = indicator_data [ indicator_data['Data year'] == year]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) ]
        
        # for country color codes
        c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
        m =  [ x  for x in indicator_data['Region'] ] 
        c_code =  [ x[0:3]  for x in indicator_data['Region'] ]  
    
    
        #print(indicator_data)
        if chart_type == 'Line':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.plot(indicator_data['Region'], indicator_data['Value'])
            
            
            axs[0].plot(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.bar(indicator_data['Region'], indicator_data['Value'])
            
            indicator_data = indicator_data.sort_values(by=['Value'])
            axs[0].bar(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Hor Bar':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])
                
                indicator_data = indicator_data.sort_values(by=['Value'])
                axs[0].barh(indicator_data['Region'], indicator_data['Value']) #,  s=indicator_data['Value'] * bubble_scale, c=c )
                plt.suptitle(indicator + ' Over Regions')
                axs[0].set_xlabel('Regions and Countries')
                axs[0].set_ylabel('Values')
                #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            
                fig.savefig('./saved_images_from_visualizations/' + 'hor_bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
        elif chart_type == 'Pie':
                #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
                #plt.barh(indicator_data['Region'], indicator_data['Value'])                
                #ax.pie(indicator_data['Value'], labels=indicator_data['Region'], autopct="%1.1f%%")
                
                
                axs[0].pie(indicator_data['Value'], labels = indicator_data['Region'], ) #,  s=indicator_data['Value'] * bubble_scale, c=c )
                plt.suptitle(indicator + ' Over Regions')
                axs[0].set_xlabel('Regions and Countries')
                axs[0].set_ylabel('Values')
                axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    

                fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
        else:
            axs[0].scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale, c=c )
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
                                    
            
            
            # plt.show()
            # show country colors
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')

            count = list(m) #list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])
            
            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
            
        # now plt.show()
    
    # multiple year selected
    else:        
        #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            # 'aYear' is not defined inside this function; this branch averages over all years, so no year filter is applied
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Type of region'].isin(['Province']))]


        # for country color codes
        c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
        m =  [ x  for x in indicator_data['Region'] ] 
        c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
        
        indicator_data = indicator_data.set_index(['Region'])
       
        
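        # average only the numeric 'Value' column; text columns such as 'Indicator' cannot be averaged
        x = indicator_data.groupby(['Region'])[['Value']].mean()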
            
       
        
        #print(x.index, x['Value'])
            
            
        ### for aYear in all_years:
        # now indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == aYear) ]

        """
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Province']))]


        indicator_data = indicator_data.set_index(['Region'])
        indicator_data['mean'] = indicator_data.groupby(['Region']).mean()
        print(indicator_data)
        """

        

        if chart_type == 'Line':
            
            axs[0].plot(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')            
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off   
            
            fig.savefig('./saved_images_from_visualizations/' + 'line_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            


        elif chart_type == 'Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            # now plt.bar(indicator_data['Region'], indicator_data['Value'])
            
            
            x = x.sort_values(by=['Value'])            
            axs[0].bar(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')            
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off   
            
            fig.savefig('./saved_images_from_visualizations/' + 'bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
                                    
        elif chart_type == 'Hor Bar':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])
            
            x = x.sort_values(by=['Value'])            
            axs[0].barh(x.index, x['Value']) #,  s = x['Value'] * bubble_scale, c=c )
            
            # though I am repeating this block of code - this can be just placed at the end of the else block
            # just trying to save on debug time            
            plt.suptitle(indicator + ' Over Regions')
            #axs[0].set_xlabel('Regions and Countries')
            axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Regions and Countries')            
            #axs[0].set_xticklabels(x.index, rotation=90) # can be turned off 
            fig.savefig('./saved_images_from_visualizations/' + 'hor_bar_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            
        elif chart_type == 'Pie':
            #ax.scatter(indicator_data['Region'], indicator_data['Value'],  s=indicator_data['Value'] * bubble_scale )
            #plt.barh(indicator_data['Region'], indicator_data['Value'])                
            # now ax.pie(indicator_data['Value'], labels=indicator_data['Region'], autopct="%1.1f%%")
            
            
            
            
            axs[0].pie(x['Value'], labels=x.index) #, c=c                 
            axs[0].set_xticklabels(x['Value'], rotation=90) # can be turned off                        
            # https://stackoverflow.com/questions/10998621/rotate-axis-text-in-python-matplotlib
            plt.suptitle(indicator + ' Over Regions')
            #axs[0].set_xlabel('Values')
            axs[0].set_ylabel('Countries/Regions')
            
            fig.savefig('./saved_images_from_visualizations/' + 'pie_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
            
            plt.show()
            
            

        else:
            #axs[0].scatter(indicator_data['Region'], indicator_data['Value'],  s = indicator_data['Value'] * bubble_scale, c=c )
            axs[0].scatter(x.index, x['Value'],  s = x['Value'] * bubble_scale, c=c )
            
            plt.suptitle(indicator + ' Over Regions')
            axs[0].set_xlabel('Regions and Countries')
            axs[0].set_ylabel('Values')
            #axs[0].set_xticklabels(indicator_data['Region'], rotation=90) # can be turned off    
            axs[0].set_xticklabels(x.index, rotation=90) # can be turned off    

            #plt.show()
            # show country colors
            axs[1].scatter([1]*len(c), [i*5 for i in range(len(c))], s=300, c=c, marker='o')

            count = list(m) #list(region_colors_dict.keys())
            #print(count)
            for j in range(len(c)):
                axs[1].annotate(count[j],  (1, j*5))

            axs[1].set_xlabel('Country and Colors')
            axs[1].set_ylabel('')

            axs[1].set_xticks([])
            axs[1].set_yticks([])

            fig.savefig('./saved_images_from_visualizations/' + 'bubble_' + indicator.replace(' ', '_')[0:10] + '_' + str(np.random.randint(0, 99999)) + '.png')
                
            #now f.canvas.draw()
            
    #plt.suptitle(indicator + 'Over Regions and Years')
    #plt.xlabel('Regions and Countries')
    #plt.ylabel('Values')
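
Every branch above builds its output file name the same way: a chart prefix, the first ten characters of the indicator, and a random suffix. A minimal sketch of a helper that builds that path in one place follows. `build_image_path` is a hypothetical name that is not part of the notebook, and the sketch uses the standard `random` module rather than `np.random`.

```python
import random

def build_image_path(prefix, indicator,
                     folder='./saved_images_from_visualizations/', max_chars=10):
    """Hypothetical helper: build the path passed to fig.savefig in the code above."""
    short_name = indicator.replace(' ', '_')[0:max_chars]
    # np.random.randint(0, 99999) excludes the upper bound, so the largest suffix is 99998
    suffix = random.randint(0, 99998)
    return folder + prefix + short_name + '_' + str(suffix) + '.png'

path = build_image_path('bar_', 'Cancer Mortality (F)')
print(path)
```

With such a helper, each branch could call `fig.savefig(build_image_path('bar_', indicator))` instead of repeating the string concatenation.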

Geo Plot, Map Plot on a World Map

"""
magnitudes = measure_data[['Region', 'Value']]
magnitudes['Region'][1]

year = 2015
indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicators[0])  & (measure_data['Data year'] == 2015) ]          
#print(indicator_data)
magnitudes = indicator_data[['Region', 'Value', 'Data year']]

for r, v in zip(magnitudes['Region'], magnitudes['Value']):
    print(r, v)
"""

def plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years):    
    #print('year, indicator', year, indicator)    
               
    if ( year > 1 ):
        
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) ]          
        #print(indicator_data)
        magnitudes = indicator_data[['Region', 'Value']]
        
        
        # Make this plot larger.
        plt.figure(figsize=(16,12))


        eq_map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
                      lat_0=0, lon_0=-130)
        eq_map.drawcoastlines()
        eq_map.drawcountries()
        eq_map.fillcontinents(color = 'gray')
        eq_map.drawmapboundary()
        eq_map.drawmeridians(np.arange(0, 360, 30))
        eq_map.drawparallels(np.arange(-90, 90, 30))

        min_marker_size = 2.5 #* bubble_scale
        #for lon, lat, mag in zip(lons, lats, magnitudes):
        #for i in range(indicator_data.shape[0]):
        for reg, val in zip(magnitudes['Region'], magnitudes['Value']):
            #try:
            #reg = magnitudes[i:i+1]['Region']    
            #reg = magnitudes['Region'][i]
            #print(reg)
            lat = lats_dict[reg]
            lon = lons_dict[reg]   

            #print(lat, lon)
            x, y = eq_map(lon, lat)
            mag = val #magnitudes['Value'][i] #magnitudes[i:i+1]['Value']    
            #print(mag)
            msize = mag * min_marker_size/bubble_scale
            #print(msize, msize)
            #marker_string = get_marker_color(mag)
            #eq_map.plot(x, y, marker_string, markersize=msize)
            eq_map.plot(x, y,  marker='o', markersize=msize)
            
            #x, y = eq_map(0, 0)
            #eq_map.plot(x, y, marker='D',color='m')

            #plt.show()

            #except:
                #print('hello')
                #continue

        title_string = indicator
        title_string += ' for year ' + str(year)
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'geo_plot_for_a_year_' +indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        plt.show()
        
        
        #plt.suptitle(indicator + 'Over Regions and Years')
        #plt.xlabel('Regions and Countries')
        #plt.ylabel('Values')
        #plt.show()
        
    else:
        # this is to plot over multiple years
        # average taken over multiple years will be plotted
        # I could make this more user friendly and data friendly 
        # by giving users the option to select years, countries; also by informing to what extent data are available (skipping as that will extend the work much)
        
        #indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  &  (measure_data['Data year'] == aYear) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if (provincial_only == True):
            # 'aYear' is not defined inside this function; this branch averages over all years, so no year filter is applied
            indicator_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & \
                                          (measure_data['Type of region'].isin(['Province']))]

        #print(indicator_data)
        # for country color codes
        """
        c =  [ region_colors_dict[x]  for x in indicator_data['Region'] ]    
        m =  [ x  for x in indicator_data['Region'] ] 
        c_code =  [ x[0:3]  for x in indicator_data['Region'] ] 
        """
        
        indicator_data = indicator_data.set_index(['Region'])
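        # average only the numeric 'Value' column; text columns such as 'Indicator' cannot be averaged
        x = indicator_data.groupby(['Region'])[['Value']].mean()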
        
        #print(x)
        #print(x['Value'][0])
        
        
        
        #print(indicator_data)
        # not used
        magnitudes = pd.DataFrame()
        magnitudes['Region'] = x.index
        magnitudes['Value'] = x['Value']
        
        #print(magnitudes)
        
        # Make this plot larger.
        plt.figure(figsize=(16,12))


        eq_map = Basemap(projection='robin', resolution = 'l', area_thresh = 1000.0,
                      lat_0=0, lon_0=-130)
        eq_map.drawcoastlines()
        eq_map.drawcountries()
        eq_map.fillcontinents(color = 'gray')
        eq_map.drawmapboundary()
        eq_map.drawmeridians(np.arange(0, 360, 30))
        eq_map.drawparallels(np.arange(-90, 90, 30))

        min_marker_size = 2.5 #* bubble_scale
        #for lon, lat, mag in zip(lons, lats, magnitudes):
        #for i in range(indicator_data.shape[0]):
        #for reg, val in zip(magnitudes['Region'], magnitudes['Value']):
        for reg, val in zip(x.index, x['Value']):
            try:
                #reg = magnitudes[i:i+1]['Region']    
                #reg = magnitudes['Region'][i]
                #print(reg, val)
                lat = lats_dict[reg]
                lon = lons_dict[reg]   

                #print(lat, lon)
                # px, py: projected map coordinates (renamed so the DataFrame x is not overwritten)
                px, py = eq_map(lon, lat)
                mag = val #magnitudes['Value'][i] #magnitudes[i:i+1]['Value']    
                #print(mag)
                msize = mag * min_marker_size/bubble_scale
                #print(msize, msize)
                #marker_string = get_marker_color(mag)
                #eq_map.plot(x, y, marker_string, markersize=msize)
                eq_map.plot(px, py,  marker='o', markersize=msize)

                #x, y = eq_map(0, 0)
                #eq_map.plot(x, y, marker='D',color='m')

                #plt.show()

            except KeyError:
                # skip regions that have no entry in lats_dict / lons_dict
                continue

        title_string = indicator
        title_string += ' Average for selected years '
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'geo_plot_average_over_multiple_years_' +indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        plt.show()
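
Both branches of the geo plot size each marker as `mag * min_marker_size/bubble_scale`. The bubble-size slider defined later in the notebook ranges from 0 to 100, so a scale of 0 is possible. A minimal sketch of the same formula with a guard for that case follows. `marker_size` is a hypothetical helper, and falling back to a scale of 1 is an assumption, not the notebook's behaviour.

```python
def marker_size(value, bubble_scale, min_marker_size=2.5):
    """Hypothetical helper mirroring msize = mag * min_marker_size / bubble_scale."""
    if bubble_scale <= 0:
        # assumption: treat a zero (or negative) scale as "no scaling"
        bubble_scale = 1
    return value * min_marker_size / bubble_scale

print(marker_size(40, 10))  # 40 * 2.5 / 10
print(marker_size(40, 0))   # falls back to a scale of 1
```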

Heatmap to compare across indicators and countries

Options implemented:

Heatmap for one year, one indicator for a health aspect, across countries

Heatmap for one year, all indicators for a health aspect, across countries

Heatmap for one year, one indicator for a health aspect, across Canadian provinces

Heatmap for one year, all indicators for a health aspect, across Canadian provinces

Heatmap for all years with mean values, one indicator for a health aspect, across countries

Heatmap for all years with mean values, all indicators for a health aspect, across countries

Heatmap for all years with mean values, one indicator for a health aspect, across Canadian provinces

Heatmap for all years with mean values, all indicators for a health aspect, across Canadian provinces

Is taking the mean over multiple years a pragmatic approach? It can be an effective measure provided data exist for all of those years; otherwise, the years with data will dominate. In my case, data are not available for all indicators for all years. We can then plot a single year, or I could provide an interface to select years, indicators, and countries for custom comparison, but that would be a long task. Hence, I am providing a tool that can be extended in different ways.
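
To see how uneven year coverage affects a multi-year mean, here is a minimal sketch that reports, for each region, the mean and the number of years behind it. The DataFrame is toy data with made-up values; only the column names (`Region`, `Indicator`, `Data year`, `Value`) follow the notebook.

```python
import pandas as pd

# Toy data (made-up values) using the notebook's column names
toy = pd.DataFrame({
    'Region': ['Canada', 'Canada', 'Canada', 'France', 'France', 'Japan'],
    'Indicator': ['Cancer Mortality (F)'] * 6,
    'Data year': [2015, 2016, 2017, 2015, 2017, 2017],
    'Value': [10.0, 12.0, 14.0, 20.0, 22.0, 30.0],
})

# Mean per region alongside the number of distinct years behind each mean
coverage = toy.groupby('Region').agg(
    mean_value=('Value', 'mean'),
    years_with_data=('Data year', 'nunique'),
)
print(coverage)

# Keep only the regions that have data for every year present in the table
n_years = toy['Data year'].nunique()
complete = coverage[coverage['years_with_data'] == n_years]
print(complete)
```

Showing `years_with_data` next to the mean makes it visible when a region's average rests on a single year.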

# ref : https://cmdlinetips.com/2019/01/how-to-make-heatmap-with-seaborn-in-python/


import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def plot_heatmap_across_indicators(year, indicator = '', ratios = [3,1], provincial_only = False, all_years = all_years, fig_size = [10, 10]):
    
    if ( year > 1 ):
        heatmap_data = measure_data.loc[  (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if ( provincial_only == True ):
            heatmap_data = measure_data.loc[  (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Province']) )  ]
            
        if ( indicator != ''):
            heatmap_data = measure_data.loc[  (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year) & ( measure_data['Type of region'].isin(['Country', 'Canada']) ) ]
            if ( provincial_only == True ):
                heatmap_data = measure_data.loc[ (measure_data['Indicator'] == indicator)  & (measure_data['Data year'] == year)  & (measure_data['Type of region'].isin(['Province']) )  ]

        #print(heatmap_data)
        indicator_data_heatmap = heatmap_data[ ['Region', 'Value', 'Indicator', 'Data year']  ]
        
        #print(indicator_data_heatmap)
        
        
        heatmap1_data = pd.pivot_table(indicator_data_heatmap, values='Value',  index=['Region'], columns='Indicator')
        plt.figure(figsize=fig_size)
        ax = sns.heatmap(heatmap1_data, cmap="YlGnBu")
        
        # https://stackoverflow.com/questions/48470251/move-tick-marks-at-the-top-of-the-seaborn-plot?noredirect=1&lq=1
        ax.xaxis.set_ticks_position('top')
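        # rotate the labels seaborn already placed; relabeling from the raw 'Indicator' column (one entry per row) can mismatch the pivot's columns
        plt.setp(ax.get_xticklabels(), rotation=90) # can be turned off                        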
        
        #plt.show()

                            
        title_string = measure_file.result[0:len(measure_file.result)-4]  + ':' + indicator
        title_string += ' for year ' + str(year)
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'heatmap_' + measure_file.result[0:len(measure_file.result)-4] + indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        #plt.show()
        
        
        
    else:
        # this is to plot over multiple years
        # average taken over multiple years will be plotted
        # I could make this more user friendly and data friendly 
        # by giving users the option to select years, countries; also by informing to what extent data are available (skipping as that will extend the work much)
        
        
        heatmap_data = measure_data.loc[   (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
        if ( provincial_only == True ):
            heatmap_data = measure_data.loc[ (measure_data['Type of region'].isin(['Province']) )  ]
            
        if ( indicator != '' ):
            heatmap_data = measure_data.loc[  (measure_data['Indicator'] == indicator) & (measure_data['Type of region'].isin(['Country', 'Canada']) )  ]
            if ( provincial_only == True ):
                heatmap_data = measure_data.loc[ (measure_data['Indicator'] == indicator)   & (measure_data['Type of region'].isin(['Province']) )  ]

        #print(heatmap_data)
        indicator_data_heatmap = heatmap_data[ ['Region', 'Value', 'Indicator', 'Data year']  ]
        
                
        indicator_data = indicator_data_heatmap.set_index(['Region'])
        
        # x is not used, mean is calculated by seaborn
        #x = indicator_data.groupby(['Region', 'Indicator']).mean()        
        #print(x.index)
        #print(x)
        
        
        #heatmap1_data = pd.pivot_table(indicator_data_heatmap, values='Value',  index=['Region'],  columns='Indicator') 
        #heatmap1_data = pd.pivot_table(x, values='Value',  index=x.index,  columns='Indicator') 
        heatmap1_data = pd.pivot_table(indicator_data, index = indicator_data.index, columns='Indicator', values='Value', aggfunc = 'mean') 
        plt.figure(figsize=fig_size)
        #ax = 
        sns.heatmap(heatmap1_data, cmap="YlGnBu")
        
        #ax.xaxis.set_ticks_position('top')
        #ax.set_xticklabels(indicator_data_heatmap['Indicator'], rotation=90) # can be turned off                        
        
        
        #plt.show()

                            
        title_string = measure_file.result[0:len(measure_file.result)-4] + ':' + indicator
        
        all_years_str = ''
        for aYear in all_years:
            if (aYear > 0):
                all_years_str += str(aYear) + ', '
            
            
        year_str = ' for year '  + str(year) if year > 0 else ' Mean over years \n' + all_years_str
        title_string += year_str
        plt.title(title_string)
        #plt.show()
        
        plt.savefig('./saved_images_from_visualizations/' + 'heatmap_over_years_' + measure_file.result[0:len(measure_file.result)-4] + indicator.replace(' ', '_')[0:5] + '_' + str(np.random.randint(0, 99999)) + '.png')
        #plt.show()

Create the components for the UI interface

Users will interact with the system to generate custom visualizations

START-RELOAD-DATA

Creating these UI components might not be a must

# create the interactive interface
def f(indicator):
    return indicator

#print ('Measure' + measure_file.result)
#print('Select parameters\n')
#print('Select Indicator with or without Canadian data')
indicator_country = interactive(f, indicator=indicators);
#display(indicator_country)
indicator_country.result


def f(canada_indicator):
    return canada_indicator

#print('Select Indicator with Canadian data')
indicator_canada = interactive(f, canada_indicator=indicators_with_canada);
#display(indicator_canada)
indicator_canada.result


def f(year):
    return year

#print('Select Year: 0 = all years')
year_country = interactive(f, year=all_years);
#display(year_country)
year_country.result

#print('Select Year: 0 = all years')
year_canada = interactive(f, year=all_years_canada);
#display(year_canada)
year_canada.result


def f(what_to_plot):
    return what_to_plot

#print('Select what to plt:')
what_to_plot = {}
# this can come from an excel file as well
what_to_plot['health-status.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
what_to_plot['access-to-care.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
#what_to_plot['indicator-methodology.xls'] = ['Indicator Values over years', 'Indicator Values over countries']
what_to_plot['non-med-determinants.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
what_to_plot['patient-safety.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
what_to_plot['prescribing-primary.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']
what_to_plot['quality-of-care.xls'] = ['Indicator Values over years', 'Indicator Values over countries', 'Geo plot', 'Heatmap']


plots = interactive(f, what_to_plot=what_to_plot[measure_file.result]);
#display(plots)
plots.result


def f(chart_type):
    return chart_type

#print('Select chart type:')
chart_types = ['Bubble', 'Bar', 'Hor Bar', 'Pie', 'Line']
chart = interactive(f, chart_type=chart_types);
#display(chart)
chart.result


def f(scale):
    return scale

#print('Select bubble size')
bubble_scale_country = interactive(f, scale=(0, 100, 1));
#display(bubble_scale_country)
#bubble_scale_country.result = 10

def f(x):
    return x

#print('Max country count to compare with')
country_count = interactive(f, x=range(1, 20));
#display(country_count)
country_count.result

#print('Select countries')
#print('Compare with Canada')
all_countries = interactive(f, x=True);
#display(all_countries)
all_countries.result

country_0 = interactive(f, x = countries);
country_1 = interactive(f, x = countries);
#country_str = "var%d = interactive(f, x = countries)"
#display(country_0)
#display(country_1)
country_0.result

def f(com_with_cdn):
    return com_with_cdn

#print('Compare with Canada')
compare_canada = interactive(f, com_with_cdn=False);
#display(compare_canada)
compare_canada.result

def f(use_data_with_canada):
    return use_data_with_canada

use_data_with_canada = interactive(f, use_data_with_canada=False);
#display(use_data_with_canada)
use_data_with_canada.result


def f(provincial_only):
    return provincial_only

provincial_only_plot = interactive(f, provincial_only=False);
#display(use_data_with_canada)
provincial_only_plot.result


# animate over time
def f(animate):
    return animate

#print('Compare with Canada')
animate_ = interactive(f, animate=False);
#display(compare_canada)
animate_.result


# for heatmap, do we consider the indicator or not
def f(heatmap_consider_indicator):
    return heatmap_consider_indicator

heatmap_consider_indicator_ = interactive(f, heatmap_consider_indicator=False);
#display(heatmap_consider_indicator_)
heatmap_consider_indicator_.result

Render the UI controls

Note: use_data_with_canada indicates that only data for which Canadian data are available should be used. In that case, the Canada_Indicator and Year controls on the right will be used.

The com_with_cdn and animate controls are not used so far.

This cell must be executed as part of reloading data.

# Reference: https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Styling.html
print('Indicator and year list on the right, represent where Canadian data exist')
from ipywidgets import Button, GridBox, Layout, ButtonStyle
GridBox(children=[
                    use_data_with_canada, compare_canada,
                    indicator_country, indicator_canada, 
                    year_country, year_canada,
                    plots, chart,
                    bubble_scale_country, provincial_only_plot,
                    heatmap_consider_indicator_, animate_
                ],
        
        layout=Layout(
            width='100%',
            grid_template_rows='auto auto',
            grid_template_columns='50%50%',
            )
       )

Indicator and year list on the right, represent where Canadian data exist



---------------------------------------------------------------------------

ImportError                               Traceback (most recent call last)

<ipython-input-28-5b8e7f56abb0> in <module>()
      1 # Reference: https://ipywidgets.readthedocs.io/en/stable/examples/Widget%20Styling.html
      2 print('Indicator and year list on the right, represent where Canadian data exist')
----> 3 from ipywidgets import Button, GridBox, Layout, ButtonStyle
      4 GridBox(children=[
      5                     use_data_with_canada, compare_canada,


ImportError: cannot import name 'GridBox'
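
The ImportError above indicates that the installed ipywidgets does not provide `GridBox`. A minimal sketch of a fallback that arranges the controls in two columns with `VBox` and `HBox` follows. `pair_rows` and `two_column_layout` are hypothetical helpers, not part of the notebook, and the layout values are only an approximation of the grid used above.

```python
def pair_rows(items, per_row=2):
    """Split a flat list into rows of per_row items (pure Python, no widgets needed)."""
    return [items[i:i + per_row] for i in range(0, len(items), per_row)]

def two_column_layout(children):
    """Hypothetical fallback: use GridBox when available, else nested VBox/HBox."""
    import ipywidgets as widgets  # imported here so pair_rows works without ipywidgets
    if hasattr(widgets, 'GridBox'):
        return widgets.GridBox(
            children=children,
            layout=widgets.Layout(width='100%', grid_template_columns='50% 50%'),
        )
    # one HBox per pair of controls, stacked vertically
    return widgets.VBox([widgets.HBox(row) for row in pair_rows(children)])

print(pair_rows(['a', 'b', 'c', 'd', 'e']))
```

In the notebook, the call would look like `two_column_layout([use_data_with_canada, compare_canada, indicator_country, indicator_canada])`, extended with the remaining controls.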

END-RELOAD-DATA

# assign a default value
bubble_scale_country.result = 10

Create the Plot Based on User Selections

plt.rcParams['figure.figsize'] = [10, 8]
#print(chart.result)
#provincial_only = False
heatmap_consider_indicator = heatmap_consider_indicator_.result
# Data irrespective Canada has data or not
if use_data_with_canada.result == False:
    indicator = indicator_country.result
    year = year_country.result
    bubble_scale = bubble_scale_country.result

# when we are saying data for canada must exist there    
else:
    indicator = indicator_canada.result
    year = year_canada.result
    bubble_scale = bubble_scale_country.result
    #provincial_only = provincial_only_plot.result
    
#print(indicator, year, bubble_scale)
if plots.result == 'Indicator Values over years':
    plot_measure_by_years(year, indicator, bubble_scale, chart_type=chart.result, ratios=[3,1], animate = animate_.result, 
        provincial_only=provincial_only_plot.result, fig_size=[20,10], sec_fig=False)
elif plots.result == 'Geo plot':
    plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type=chart.result, \
                                provincial_only=provincial_only_plot.result)
    
elif plots.result == 'Heatmap':
    if ( heatmap_consider_indicator == False ):
        indicator = ''
    plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot.result, fig_size=[10, 10])   
else: # Indicator Values over countries
    plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart.result, \
                            ratios=[3,1], provincial_only=provincial_only_plot.result)
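
The if/elif chain above maps the selected plot name to one plotting function, with "Indicator Values over countries" as the default. The same selection can be sketched as a dictionary lookup. The lambdas below are stand-ins for the notebook's plotting functions so that the sketch runs on its own, and `choose_plotter` is a hypothetical name.

```python
def choose_plotter(selection, plotters, default):
    """Return the plotting callable for a selection, or the default one."""
    return plotters.get(selection, default)

# Stand-in callables so the sketch runs without the notebook's data
plotters = {
    'Indicator Values over years': lambda: 'years',
    'Geo plot': lambda: 'geo',
    'Heatmap': lambda: 'heatmap',
}
by_regions = lambda: 'regions'  # stands in for plot_measure_by_regions

print(choose_plotter('Heatmap', plotters, by_regions)())
print(choose_plotter('Indicator Values over countries', plotters, by_regions)())
```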

Section: Research Questions and Answers

Please select the related measure and reload all data (as marked with Start-Reload, End-Reload). Otherwise, the following visualizations might not work unless they are for the currently selected measure.

A better solution could be to place the measure selection here and re-execute all the data reload code.

The visualizations below are plotted independently; they are the ones used in the detailed presentation document.

All of these can be generated using the UI; I am just showing the specific cases that I plotted using the UI and included in my report.

How does Canada compare for a health status indicator such as Cancer Mortality (F) for 2017 (per 100k)? Example: Cancer Mortality, 2017:

year = 2017
indicator = 'Cancer Mortality (F)'

# does not matter
bubble_scale = 11

chart_type = 'Line'

# subplot ratios
ratios = [1000, 1]

# not implemented
animate = False

# if for Provinces - Canada
provincial_only = False

# figure size
fig_size = [5, 5]

# not important
sec_fig = True

print('The right line on the plot does not count; comes from the right subplot; that is not relevant for this case')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

The right line on the plot does not count; comes from the right subplot; that is not relevant for this case

Research Question: How has an indicator such as Transport Mortality changed over time for different countries?

year = 0
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 100

chart_type = 'Bubble'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [3, 1]

# not implemented
animate = False

# if for Provinces - Canada
provincial_only = False

# figure size
fig_size = [20, 10]

# not important
sec_fig = True

print('Note: Y axis values need to be multiplied by Benchmark value to get actual values')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Note: Y axis values need to be multiplied by Benchmark value to get actual values


C:\ProgramData\Anaconda3\lib\site-packages\ipykernel_launcher.py:346: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
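The warning above comes from assigning into a DataFrame that pandas considers a possible copy of a slice. The notebook line that triggers it (line 346 in the traceback) is not shown here, so the following is only a generic sketch of the usual remedy, with made-up column names and values: take an explicit `.copy()` of the filtered rows and assign through `.loc`.

```python
import pandas as pd

# Made-up values, only to reproduce the pattern
df = pd.DataFrame({'Region': ['A', 'B', 'C'], 'Value': [1.0, 2.0, 3.0]})

# Take an explicit copy of the filtered rows, so later assignments
# clearly target the new frame and not a slice of `df`
subset = df[df['Value'] > 1.0].copy()

# Assign through .loc with a row indexer and a column label
subset.loc[:, 'Scaled'] = subset['Value'] * 10

scaled = subset['Scaled'].tolist()
```

The original `df` is left untouched; only `subset` gains the new column.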

Research Question: How do Canadian provinces perform for Transport Mortality (M) for 2017?

year = 2017
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 100

# will plot for other chart types as well
chart_type = 'Line'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [10, 1]

# not implemented
animate = False

# Matters as we are plotting for Canadian provinces
provincial_only = True

# figure size
fig_size = [10, 8]

# not important
sec_fig = True

print('Note: Y axis values need to be multiplied by Benchmark value to get actual values')
plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Note: Y axis values need to be multiplied by Benchmark value to get actual values
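The note printed by the cells above says that the y-axis shows values relative to the benchmark. As a small worked example of reading such a plot, assuming the plotted value is simply the actual value divided by the benchmark (the helper name `to_actual` and the numbers are made up for illustration):

```python
def to_actual(plotted_value, benchmark):
    """Convert a benchmark-relative y-axis value back to the indicator's own units."""
    return plotted_value * benchmark


# Made-up numbers: a plotted value of 1.5 against a benchmark of 8.0
actual = to_actual(1.5, 8.0)
```

Under that assumption, a plotted value above 1 means the region is above the benchmark, and a value below 1 means it is below.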

Research, Analysis, and Visualization Concern:

How does transport mortality compare across countries in 2017, based on the data we have? Visualize it in different formats.

chart_types = ['Bubble', 'Bar', 'Hor Bar', 'Pie', 'Line']

year = 2017
indicator = 'Transport Accident Mortality (M)'

# does not matter
bubble_scale = 32

# will plot for other chart types as well
chart_type = 'Line'

# subplot ratios
# as we will show countries by colors, ratios are useful
ratios = [10, 1]

# not implemented
animate = False

# False here: we are comparing countries, not Canadian provinces
provincial_only = False

# figure size
fig_size = [5, 5]

# not important
sec_fig = True


for chart_type in chart_types:
    plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

How does transport mortality compare across Canadian provinces in 2017, based on the data we have? Visualize it in different formats.

Note: the code in the previous cell needs to be executed first, because this cell reuses some of its variables (year, indicator, bubble_scale, animate, chart_types).

# Matters as we are plotting for Canadian provinces
provincial_only = True

# figure size
fig_size = [10, 8]

# not important
sec_fig = True

ratios = [3, 1]

for chart_type in chart_types:
    plot_measure_by_years(year, indicator, bubble_scale, chart_type, ratios, animate, provincial_only, fig_size=fig_size)

Research Question: What is the average alcohol consumption across countries over the last few years?

# you can add or remove years, to get average measures over those years
year = 0 # 0 indicates all years for the list all_years
all_years = [2013, 2014, 2015, 2016, 2017] 
indicator = 'Alcohol Consumption: Adults'
bubble_scale = 91 # NA
chart_type='Bar' # change to Line, Bubble, Pie, 'Hor Bar'
provincial_only = False # if you set true only provincial data will be plotted

# plot_measure_by_regions(year, indicator, bubble_scale, chart_type = '', ratios=[3,1], provincial_only=False, all_years=all_years):    
plot_measure_by_regions(year, indicator, bubble_scale, chart_type, ratios=[3,1], provincial_only=provincial_only, all_years=all_years)

The same plot for the year 2015 only:

#year = 0 # 0 indicates all years for the list all_years
#all_years = [2013, 2014, 2015, 2016, 2017] 
indicator = 'Alcohol Consumption: Adults'
#bubble_scale = 91 # NA
#chart_type='Bar' # change to Line, Bubble, Pie, 'Hor Bar'
provincial_only = False # if you set true only provincial data will be plotted


print('if you want for only one year change as follows')
print('For Research Question: What are average alcohol consumption across countries for 2015')
# if you want for only one year change as follows
year = 2015
# other chart types such as Bar will work, though they will have issues
chart_type='Bar' 
#plot_measure_by_regions(year, indicator, bubble_scale, chart_type,  ratios=[3,1], provincial_only)      
plot_measure_by_regions(year, indicator, bubble_scale, chart_type, ratios=[3,1], provincial_only=provincial_only, all_years=all_years)

if you want for only one year change as follows
For Research Question: What are average alcohol consumption across countries for 2015
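The two cells above rely on a convention: `year = 0` means "use every year in `all_years`", while a non-zero `year` selects a single year. The sketch below illustrates that convention for one region. It assumes the years are combined with a plain mean and that years without data are skipped; the body of `plot_measure_by_regions` is not shown in this section, so both points are assumptions, and the helper name and values are made up.

```python
from statistics import mean


def average_over_years(values_by_year, year, all_years):
    """Return one value for a region.

    year == 0 -> mean over the years in `all_years` that have data;
    otherwise -> the value for that single year (None if missing).
    """
    if year != 0:
        return values_by_year.get(year)
    available = [values_by_year[y] for y in all_years if y in values_by_year]
    return mean(available) if available else None


# Made-up values for one region; 2016 and 2017 have no data
values = {2013: 8.0, 2014: 9.0, 2015: 10.0}
all_years = [2013, 2014, 2015, 2016, 2017]

avg_all = average_over_years(values, 0, all_years)
only_2015 = average_over_years(values, 2015, all_years)
```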

Research Question: For 2015, which country smoked the most? A geo plot is used here; plots like those in the sections above could also be used.

chart_type

'Bar'

year = 2015
indicator = 'Smoking: Adults (M)'

# the plot on the presentation used 2
# note : this is inverse size of the bubble
bubble_scale = 3
chart_type = ''
# are not relevant
# chart_type = '',  all_years=all_years, also provincial_only=False

plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years)

Research Question: Where in the world is obesity more common?

year = 0
indicator = 'Obesity Reported: Adults'
# the plot on the presentation used 2
# note : this is inverse size of the bubble
bubble_scale = 2
chart_type = ''
# are not relevant
# chart_type = '',  all_years=all_years, also provincial_only=False

plot_map_measure_by_regions(year, indicator, bubble_scale, chart_type = '', provincial_only=False, all_years=all_years)
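Both map cells above note that `bubble_scale` acts inversely on bubble size. A minimal sketch of what that could mean, assuming the marker size is the indicator value divided by `bubble_scale` (this sizing rule and the name `bubble_size` are hypothetical; the actual rule inside `plot_map_measure_by_regions` is not shown in this section):

```python
def bubble_size(value, bubble_scale):
    """Hypothetical sizing rule: marker size shrinks as bubble_scale grows."""
    if bubble_scale <= 0:
        raise ValueError('bubble_scale must be positive')
    return value / bubble_scale


# Made-up indicator value of 30.0 drawn with two different scales
size_at_2 = bubble_size(30.0, 2)
size_at_3 = bubble_size(30.0, 3)
```

With such a rule, raising `bubble_scale` from 2 to 3 makes every bubble smaller, which helps when bubbles overlap on the map.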

Research Question: Using a heatmap, how do different countries compare on the Non Medical Determinants aspect for 2014?

Note: the current selection needs to be Non Medical Determinants.

After changing the health measure, all data have to be reloaded by executing the cells marked START-RELOAD-DATA to END-RELOAD-DATA.

year = 2014
indicator = ''  # empty: no single indicator is selected for the heatmap
provincial_only_plot = False
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=[10, 10])

Research Question: How do different countries compare on health status indicators over the years?

Note: the current selection needs to be Health Status, and the code from START-RELOAD-DATA to END-RELOAD-DATA needs to be executed.

year = 0  # 0 means all years
indicator = ''  # empty: no single indicator is selected for the heatmap
provincial_only_plot = False
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=[10, 10])

Research Question: How do Canadian provinces compare on health status indicators over the years?

Note: current selection needs to be: Health status (otherwise, plots will be using current health aspect/measure that I may or may not have tested)

year = 0  # 0 means all years
indicator = ''  # empty: no single indicator is selected for the heatmap
provincial_only_plot = True
fig_size_=[7, 5]
plot_heatmap_across_indicators(year, indicator, provincial_only=provincial_only_plot, fig_size=fig_size_)

year, indicator

(2017, 'Transport Accident Mortality (F)')

Research Question: How does transport mortality (Female) compare across countries for 2017?

This is just an example to show the heatmap plot when the ‘heatmap_consider_indicator’ option is selected and a specific indicator is chosen.

year = 2017
indicator = 'Transport Accident Mortality (F)'
fig_size_=[3, 5]
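# provincial_only_plot was reassigned to a plain bool in the cells above,
# so it no longer has a .result attribute; pass the value directly
plot_heatmap_across_indicators(year, indicator, provincial_only=False, fig_size=fig_size_)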

Access to Care

Please select Access to Care Measure, and reload all data

Wait time for specialists in Days

year = 0
indicator = 'Wait Time: Specialist'
bubble_scale = 10
chart_type = 'Bubble'
provincial_only=False
ratios=[3,1]

plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart_type, ratios=ratios, provincial_only=False)

Access to Care: same or next day appointment

year, indicator, bubble_scale

(0, 'Same or Next Day Appt', 10)

year = 0
indicator = 'Same or Next Day Appt'
bubble_scale = 10
chart_type = 'Bubble'
provincial_only=True
ratios=[3,1]

plot_measure_by_regions(year, indicator, bubble_scale, chart_type=chart_type, ratios=ratios, provincial_only=True)

Heatmap: Access to Care Indicators

year, indicator

(0, '')

year = 0
indicator = ''
plot_heatmap_across_indicators(year, indicator, provincial_only=False, fig_size=[10, 10])

The following code is supposed to be removed in the final step.

Code reused from lab 06 (the geo plot)

# references
# https://pypi.org/project/geopy/

