This project aims to analyze the referral system among health units in the Philippines in order to characterize the system and, consequently, assess its robustness. The main data used in this analysis are the geospatial coordinates of the different health units in the network — barangay health stations (BHS), rural health units (RHU), and hospitals. The network is supplemented with bed capacity information for the different health units. These data were obtained from the DOH Data Collect app v2.1 and the National Health Facility Registry.
import pandas as pd
import numpy as np
from glob import glob
import sys
import locale
from geopy.distance import geodesic as vincenty  # vincenty() was removed from geopy 2.0; geodesic() is the drop-in replacement
import warnings
warnings.filterwarnings("ignore")
np.set_printoptions(threshold=sys.maxsize, precision=2)
pd.set_option('float_format', '{:,.2f}'.format)
pd.set_option('display.max_rows', 1000)
Before performing the analysis, the data must be cleaned to standardize text and minimize the effect of errors (e.g., typographical errors, encoding issues, outliers). The goal of this processing is to prepare the data for network analysis.
files = glob('Geographic Coordinates/*.xlsx')
cols = ['Health Facility Code Short', 'Facility Name', 'Health Facility Type',
        'Region Name ', 'Province Name ', 'City/Municipality Name',
        'Barangay Name', 'Latitude', 'Longitude', 'Service Capability',
        'Licensing Status', 'Bed Capacity']
# DataFrame.append was removed in pandas 2.0; concatenate the per-file frames instead
HF_list = pd.concat((pd.read_excel(f, usecols=cols) for f in files),
                    ignore_index=True)
HF_list.isnull().sum() # Verify mismatched fields across different excel files
HF_list.columns = ['SHORT_HFCODE', 'HF_NAME', 'HF_TYPE', 'REGION', 'PROVINCE',
'MUNI_CITY', 'BRGY', 'LAT', 'LONG', 'SERVICE_CAP',
'LICENSING', 'BED_CAP']
str_cols = ['HF_NAME', 'HF_TYPE', 'REGION', 'PROVINCE', 'MUNI_CITY', 'BRGY',
'SERVICE_CAP', 'LICENSING', 'BED_CAP']
HF_list[str_cols] = HF_list[str_cols].fillna('UNKNOWN').apply(lambda x: x.str.upper().str.strip())
HF_list['SHORT_HFCODE'] = HF_list['SHORT_HFCODE'].astype(int)
HF_list.to_excel('cleaned/HFList_cleaned.xlsx') #Store the combined dataframe
rhu = pd.read_excel('rhu2018.xlsx', sheet_name='MAIN', na_values='None')
str_cols = ['HF_NAME', 'REGION', 'PROVINCE', 'MUNI_CITY', 'BRGY',
'STREET_NAME', 'BUILDING', 'FACILITY_HEAD', 'DETACHED', 'BRGYS',
'SDN', 'SDN_NAME', 'REF1_NAME', 'REF1_SAMEPROV',
'REF1_REF1A', 'REF1A_SAMEPROV', 'REF2_NAME', 'REF3_NAME',
'AMB_ACCESS', 'AMB_OWN', 'PHIC_ACC', 'PHIC_PACKAGES', 'PHIC_MCP',
'PHIC_PCB1', 'PHIC_MALARIA', 'PHIC_TBDOTS', 'PHIC_ABTC',
'PHIC_NCP', 'PHIC_OTH']
code_cols = ['id', 'REF1_CODE', 'REF2_CODE', 'REF3_CODE']
float_cols = ['REF1_DIST', 'REF1_TRAVEL', 'REF2_DIST', 'REF2_TRAVEL',
'REF3_DIST', 'REF3_TRAVEL']
# int_cols = ['id', 'BHS_COUNT','CATCHMENT', 'REF1_CODE', 'REF2_CODE',
# 'REF3_CODE', 'MD_NO', 'MD_AUG', 'MD_TOTAL','MD_FT', 'MD_PT',
# 'MD_VISIT', 'RN_NO', 'RN_AUG', 'RN_TOTAL', 'RN_FT', 'RN_PT',
# 'RN_VISIT', 'MW_NO', 'MW_AUG', 'MW_TOTAL', 'MW_FT', 'MW_PT',
# 'MW_VISIT']
rhu[str_cols] = rhu[str_cols].apply(lambda x: x.str.upper().str.strip())
rhu[code_cols] = rhu[code_cols].fillna(0).astype(int)
rhu[float_cols] = rhu[float_cols].astype(float)
rhu[str_cols] = rhu[str_cols].fillna('UNKNOWN')
# Extract short code for merging with the Geographic Coordinates files
rhu['SHORT_HFCODE'] = rhu['HF_CODE'].apply(lambda x: int(x[-6:]))
rhu.to_excel('cleaned/rhu_cleaned.xlsx')
Because the data is assembled from several tables, one table can be used to fill in missing information in another. Aside from imputing missing information, coordinates that fall outside the Philippines are identified.
# Bounding box for the Philippines (manually extracted from Google).
long_min, lat_min, long_max, lat_max = (117.17427453, 5.58100332277, 126.537423944, 18.5052273625)
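A quick way to check this is to validate each coordinate pair against the bounding box before any distance computation. A minimal sketch (`in_ph_bbox` is a helper introduced here for illustration, not part of the original pipeline):

```python
# Bounding box for the Philippines (same values as above)
LONG_MIN, LAT_MIN, LONG_MAX, LAT_MAX = (117.17427453, 5.58100332277,
                                        126.537423944, 18.5052273625)

def in_ph_bbox(lat, long):
    """Return True when the point falls inside the Philippine bounding box."""
    return (LAT_MIN <= lat <= LAT_MAX) and (LONG_MIN <= long <= LONG_MAX)

# A Manila coordinate passes; a lat/long-swapped record does not
print(in_ph_bbox(14.60, 120.98))   # True
print(in_ph_bbox(120.98, 14.60))   # False
```

Swapped latitude/longitude pairs, a common encoding error, are caught by this check because Philippine longitudes lie far outside the valid latitude range.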
HF_list = pd.read_excel('cleaned/HFList_cleaned.xlsx')
# Groupby the data to account for duplicate names for different codes
HF_dict = HF_list[['HF_NAME', 'SHORT_HFCODE']].groupby('HF_NAME')['SHORT_HFCODE'].apply(set).to_dict()
latlong_dict = HF_list[['SHORT_HFCODE', 'LAT', 'LONG']].set_index('SHORT_HFCODE').to_dict()
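Grouping the codes into sets guards against facility names that are shared by several codes. A small illustration of the idiom on toy data (the names and codes below are made up):

```python
import pandas as pd

df = pd.DataFrame({'HF_NAME': ['STA. MARIA RHU', 'STA. MARIA RHU', 'PILAR BHS'],
                   'SHORT_HFCODE': [101, 2345, 67]})

# Each name maps to the *set* of candidate codes, not just one of them
name_to_codes = df.groupby('HF_NAME')['SHORT_HFCODE'].apply(set).to_dict()
print(name_to_codes['STA. MARIA RHU'])   # {101, 2345}
```

Downstream, the ambiguity is resolved by picking the geographically closest candidate from each set.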
rhu = pd.read_excel('cleaned/rhu_cleaned.xlsx')
# Create copies of the dataframe for later use
rhu2 = rhu.copy()
rhu3 = rhu.copy()
cols = ['id', 'HF_CODE', 'SHORT_HFCODE', 'HF_NAME', 'REGION', 'PROVINCE', 'MUNI_CITY', 'BRGY',
'STREET_NAME', 'BUILDING', 'FACILITY_HEAD', 'DETACHED', 'BRGYS',
'SDN', 'SDN_NAME', 'REF1_NAME', 'REF1_SAMEPROV',
'REF1_REF1A', 'REF1A_SAMEPROV', 'REF2_NAME', 'REF3_NAME',
'AMB_ACCESS', 'AMB_OWN', 'PHIC_ACC', 'PHIC_PACKAGES', 'PHIC_MCP',
'PHIC_PCB1', 'PHIC_MALARIA', 'PHIC_TBDOTS', 'PHIC_ABTC',
'PHIC_NCP', 'PHIC_OTH', 'REF1_CODE', 'REF1_DIST', 'REF1_TRAVEL',
'REF2_CODE', 'REF2_DIST', 'REF2_TRAVEL',
'REF3_CODE', 'REF3_DIST', 'REF3_TRAVEL']
rhu = rhu[cols]
# Using the health facility list, complete the RHU data
rhu.loc[rhu['REF1_CODE']==0, 'REF_CODE'] = rhu[rhu['REF1_CODE']==0]['REF1_NAME'].map(HF_dict)
temp = rhu[['SHORT_HFCODE', 'REF_CODE']].dropna().copy()
# This dataframe contains the HF codes of one health facility to other facilities.
temp.head()
| | SHORT_HFCODE | REF_CODE |
|---|---|---|
8 | 2228 | {3313} |
20 | 6698 | {3313} |
29 | 147 | {2703} |
36 | 2033 | {3667} |
37 | 2184 | {2703} |
# Among all candidate referred facilities, the closest one is taken as the actual referred facility.
# Pre-processing: "place" facilities without coordinates outside the country — southwest of the
# Philippine bounding box (lat_min - 10, long_min - 20) for the source, northeast of it
# (lat_max + 10, long_max + 20) for the candidate — so missing coordinates never win the minimum.
temp_dict = pd.DataFrame(temp.apply(lambda x: min([(vincenty((latlong_dict['LAT'][x['SHORT_HFCODE']] if latlong_dict['LAT'][x['SHORT_HFCODE']]==latlong_dict['LAT'][x['SHORT_HFCODE']] else lat_min-10,
latlong_dict['LONG'][x['SHORT_HFCODE']] if latlong_dict['LONG'][x['SHORT_HFCODE']]==latlong_dict['LONG'][x['SHORT_HFCODE']] else long_min-20),
(latlong_dict['LAT'][i] if latlong_dict['LAT'][i]==latlong_dict['LAT'][i] else lat_max+10,
latlong_dict['LONG'][i] if latlong_dict['LONG'][i]==latlong_dict['LONG'][i] else long_max+20)).km, i, x['SHORT_HFCODE']) for i in x['REF_CODE']], key=lambda x: x[0]), axis=1).tolist()).set_index(2).to_dict()
rhu['REF_CODE'] = rhu['SHORT_HFCODE'].map(temp_dict[1])
rhu.loc[rhu['REF1_CODE']!=0, 'REF_CODE'] = rhu.loc[rhu['REF1_CODE']!=0, 'REF1_CODE']
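The nested conditional expressions above are hard to audit. The same logic can be factored into a small helper: substitute far-away sentinels for missing coordinates so they can never win the minimum, then pick the closest candidate. This sketch uses a pure-Python haversine (a great-circle approximation of geopy's ellipsoidal distance), and both function names and sentinel values are introduced here for illustration:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, long) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2)**2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2)**2
    return 2 * 6371 * asin(sqrt(a))

def closest_candidate(src, candidates, sentinel_src=(-10, 90), sentinel_tgt=(30, 150)):
    """Return (distance_km, code) of the candidate nearest to src.

    src: (lat, long) of the referring facility; either value may be None.
    candidates: dict of code -> (lat, long); values may be None.
    Missing coordinates are replaced with sentinels far outside the country,
    so facilities without coordinates are only chosen as a last resort.
    """
    def fill(p, sentinel):
        lat, lon = p
        return (lat if lat is not None else sentinel[0],
                lon if lon is not None else sentinel[1])

    src = fill(src, sentinel_src)
    return min((haversine_km(src, fill(q, sentinel_tgt)), code)
               for code, q in candidates.items())

dist, code = closest_candidate((14.6, 121.0),
                               {3313: (14.7, 121.1), 2703: (10.3, 123.9)})
print(code)   # 3313, the nearer of the two candidates
```

The one-liner in the notebook does the same thing inline, using NaN != NaN as the missing-coordinate test.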
The data contains up to three referred facilities, so the same processing is performed for the second and third referral codes.
rhu2.loc[rhu2['REF2_CODE']==0, 'REF_CODE'] = rhu2[rhu2['REF2_CODE']==0]['REF2_NAME'].map(HF_dict)
temp = rhu2[['SHORT_HFCODE', 'REF_CODE']].dropna().copy()
temp_dict = pd.DataFrame(temp.apply(lambda x: min([(vincenty((latlong_dict['LAT'][x['SHORT_HFCODE']] if latlong_dict['LAT'][x['SHORT_HFCODE']]==latlong_dict['LAT'][x['SHORT_HFCODE']] else lat_min-10,
latlong_dict['LONG'][x['SHORT_HFCODE']] if latlong_dict['LONG'][x['SHORT_HFCODE']]==latlong_dict['LONG'][x['SHORT_HFCODE']] else long_min-20),
(latlong_dict['LAT'][i] if latlong_dict['LAT'][i]==latlong_dict['LAT'][i] else lat_max+10,
latlong_dict['LONG'][i] if latlong_dict['LONG'][i]==latlong_dict['LONG'][i] else long_max+20)).km, i, x['SHORT_HFCODE']) for i in x['REF_CODE']], key=lambda x: x[0]), axis=1).tolist()).set_index(2).to_dict()
rhu2['REF_CODE'] = rhu2['SHORT_HFCODE'].map(temp_dict[1])
rhu2.loc[rhu2['REF2_CODE']!=0, 'REF_CODE'] = rhu2.loc[rhu2['REF2_CODE']!=0, 'REF2_CODE']
rhu3.loc[rhu3['REF3_CODE']==0, 'REF_CODE'] = rhu3[rhu3['REF3_CODE']==0]['REF3_NAME'].map(HF_dict)
temp = rhu3[['SHORT_HFCODE', 'REF_CODE']].dropna().copy()
temp_dict = pd.DataFrame(temp.apply(lambda x: min([(vincenty((latlong_dict['LAT'][x['SHORT_HFCODE']] if latlong_dict['LAT'][x['SHORT_HFCODE']]==latlong_dict['LAT'][x['SHORT_HFCODE']] else lat_min-10,
latlong_dict['LONG'][x['SHORT_HFCODE']] if latlong_dict['LONG'][x['SHORT_HFCODE']]==latlong_dict['LONG'][x['SHORT_HFCODE']] else long_min-20),
(latlong_dict['LAT'][i] if latlong_dict['LAT'][i]==latlong_dict['LAT'][i] else lat_max+10,
latlong_dict['LONG'][i] if latlong_dict['LONG'][i]==latlong_dict['LONG'][i] else long_max+20)).km, i, x['SHORT_HFCODE']) for i in x['REF_CODE']], key=lambda x: x[0]), axis=1).tolist()).set_index(2).to_dict()
rhu3['REF_CODE'] = rhu3['SHORT_HFCODE'].map(temp_dict[1])
rhu3.loc[rhu3['REF3_CODE']!=0, 'REF_CODE'] = rhu3.loc[rhu3['REF3_CODE']!=0, 'REF3_CODE']
rhu.dropna(subset=['REF_CODE'], inplace=True)
rhu2.dropna(subset=['REF_CODE'], inplace=True)
rhu3.dropna(subset=['REF_CODE'], inplace=True)
rhu.rename({'REF1_DIST':'REF_DIST', 'REF1_TRAVEL':'REF_TRAVEL', 'REF1_NAME':'REF_NAME'}, axis=1, inplace=True)
rhu2.rename({'REF2_DIST':'REF_DIST', 'REF2_TRAVEL':'REF_TRAVEL', 'REF2_NAME':'REF_NAME'}, axis=1, inplace=True)
rhu3.rename({'REF3_DIST':'REF_DIST', 'REF3_TRAVEL':'REF_TRAVEL', 'REF3_NAME':'REF_NAME'}, axis=1, inplace=True)
cols2 = ['SHORT_HFCODE', 'REF_CODE']
rhu_edges = pd.concat([rhu[cols2], rhu2[cols2], rhu3[cols2]])  # DataFrame.append was removed in pandas 2.0
# Add a column identifying the type of facility for the later network analysis
rhu_edges['HF_TYPE'] = 'RHU'
HF_list[['SHORT_HFCODE', 'LAT', 'LONG']].describe()
| | SHORT_HFCODE | LAT | LONG |
|---|---|---|---|
count | 29,424.00 | 25,752.00 | 25,752.00 |
mean | 19,505.19 | 12.27 | 122.49 |
std | 11,203.47 | 7.11 | 4.42 |
min | 1.00 | 0.00 | 0.00 |
25% | 9,724.75 | 9.52 | 121.02 |
50% | 19,799.50 | 12.73 | 122.38 |
75% | 29,542.25 | 14.81 | 124.16 |
max | 38,002.00 | 1,000.00 | 126.58 |
rhu_edges['source_lat'] = rhu_edges['SHORT_HFCODE'].map(latlong_dict['LAT'])
rhu_edges['source_long'] = rhu_edges['SHORT_HFCODE'].map(latlong_dict['LONG'])
rhu_edges['target_lat'] = rhu_edges['REF_CODE'].map(latlong_dict['LAT'])
rhu_edges['target_long'] = rhu_edges['REF_CODE'].map(latlong_dict['LONG'])
rhu_edges.loc[~((rhu_edges['source_lat'].between(lat_min, lat_max)) & (rhu_edges['source_long'].between(
long_min, long_max))), ['source_lat', 'source_long']] = np.nan
rhu_edges.loc[~((rhu_edges['target_lat'].between(lat_min, lat_max)) & (rhu_edges['target_long'].between(
long_min, long_max))), ['target_lat', 'target_long']] = np.nan
# Rows with complete coordinates get a geodesic distance; the rest stay NaN for imputation
complete_latlong = ~rhu_edges[['source_lat', 'source_long', 'target_lat', 'target_long']].isnull().sum(axis=1).astype(bool)
rhu_edges.loc[complete_latlong, 'DIST'] = rhu_edges.loc[complete_latlong, ['source_lat', 'source_long', 'target_lat', 'target_long']].apply(lambda x: vincenty((x['source_lat'], x['source_long']), (x['target_lat'], x['target_long'])).km, axis=1)
# Zero distances (co-located or duplicated coordinates) are floored at 1 km
rhu_edges.loc[rhu_edges['DIST']==0, 'DIST'] = 1
Referral distances that are implausibly large are classified as outliers. The threshold used to filter out these outliers is left to the researcher's judgment. Distances are in kilometers. Note that since distances are derived from coordinates, encoding errors shift the facilities' apparent locations, so the distances cannot simply be rescaled (e.g., divided by a constant factor). Instead, other imputation techniques are used to fill in missing distance data.
B = rhu_edges.boxplot('DIST', return_type='both')
# The whisker endpoints approximate the lower/upper Tukey fences of the distance distribution
outliers = [i.get_ydata()[1] for i in B.lines['whiskers']]
rhu_edges.loc[rhu_edges['DIST'] > outliers[1], 'DIST'] = np.nan
outliers
[0.0006040641840882818, 88.56724901828711]
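The whisker trick above depends on matplotlib's boxplot artists. The upper fence can also be computed directly from the quartiles (Tukey's rule, Q3 + 1.5·IQR), which avoids drawing a figure; note that a boxplot whisker is clipped to the most extreme observation inside the fence, so the two values can differ slightly. A sketch on toy data:

```python
import pandas as pd

def tukey_fences(s):
    """Return (lower, upper) Tukey fences for a numeric Series."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

s = pd.Series([1, 2, 3, 4, 100])        # toy distances; 100 is an outlier
low, high = tukey_fences(s)
print(high)                              # 7.0
print(s[s > high].tolist())              # [100]
```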
muni_dict = HF_list[['SHORT_HFCODE', 'MUNI_CITY', 'PROVINCE']].set_index('SHORT_HFCODE').to_dict()
With the municipality information of the facilities, missing distances are imputed using the per-municipality median. The assumption is that, within the same municipality, the distances between referring facilities are roughly similar. For traceability, the source of each imputed distance is recorded.
rhu_edges['muni_city'] = rhu_edges['SHORT_HFCODE'].map(muni_dict['MUNI_CITY'])
median_dist_city = rhu_edges.groupby('muni_city')['DIST'].median().to_dict()
imputed_muni = ~(rhu_edges.loc[rhu_edges['DIST'].isnull(), 'muni_city'].map(median_dist_city).isnull())
imputed_muni = imputed_muni[imputed_muni].index
rhu_edges.loc[imputed_muni, "IMPUTED"] = "MUNI"
rhu_edges.loc[rhu_edges['DIST'].isnull(), 'DIST'] = rhu_edges.loc[rhu_edges['DIST'].isnull(), 'muni_city'].map(median_dist_city)
For those facilities without municipality information, the province is used.
rhu_edges['province'] = rhu_edges['SHORT_HFCODE'].map(muni_dict['PROVINCE'])
median_dist_prov = rhu_edges.groupby('province')['DIST'].median().to_dict()
imputed_prov = ~(rhu_edges.loc[rhu_edges['DIST'].isnull(), 'province'].map(median_dist_prov).isnull())
imputed_prov = imputed_prov[imputed_prov].index
rhu_edges.loc[imputed_prov, "IMPUTED"] = "PROV"
rhu_edges.loc[rhu_edges['DIST'].isnull(), 'DIST'] = rhu_edges.loc[rhu_edges['DIST'].isnull(), 'province'].map(median_dist_prov)
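The map-based imputation above can also be written with `groupby(...).transform`, which aligns each group's median back to the original rows in one step. An equivalent idiom, shown on toy data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'muni_city': ['A', 'A', 'A', 'B', 'B'],
                   'DIST': [2.0, 4.0, np.nan, 10.0, np.nan]})

# Fill each missing distance with the median of its municipality
df['DIST'] = df['DIST'].fillna(df.groupby('muni_city')['DIST'].transform('median'))
print(df['DIST'].tolist())   # [2.0, 4.0, 3.0, 10.0, 10.0]
```

The dict-based version used in the notebook has the advantage of making the imputation source easy to record in the `IMPUTED` column.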
rhu_edges['DIST'].isnull().sum()
17
If, after all these steps, a referral still has no distance information, the edge between the facilities is dropped.
rhu_edges.dropna(subset=['DIST'], inplace=True)
prov_HF_dict = rhu_edges.groupby('province')[['SHORT_HFCODE', 'REF_CODE']].agg(set).rename({'SHORT_HFCODE':'RHU', 'REF_CODE':'HOSP'}, axis=1).to_dict()
# Every row carries the full set of referred facilities with known coordinates;
# this is the candidate pool for the nearest-neighbor augmentation below
rhu_edges['REF_CODES'] = [set(rhu_edges[~rhu_edges[['target_lat', 'target_long']].isnull().sum(axis=1).astype(bool)]['REF_CODE'])] * len(rhu_edges)
rhu_edges = rhu_edges[rhu_edges['REF_CODE']!=rhu_edges['SHORT_HFCODE']]
The referrals above are based on actual data, i.e., referrals reported from facility to facility. This list is supplemented with each facility's three nearest facilities, regardless of whether these coincide with the reported referrals.
n = 3 #num of nearest neighbors to connect
temp = rhu_edges[['SHORT_HFCODE', 'REF_CODES']].dropna().drop_duplicates(subset='SHORT_HFCODE').copy()
df_neighbors = pd.DataFrame(temp.apply(lambda x: sorted([(vincenty((latlong_dict['LAT'][x['SHORT_HFCODE']] if latlong_dict['LAT'][x['SHORT_HFCODE']]==latlong_dict['LAT'][x['SHORT_HFCODE']] else lat_min-10,
latlong_dict['LONG'][x['SHORT_HFCODE']] if latlong_dict['LONG'][x['SHORT_HFCODE']]==latlong_dict['LONG'][x['SHORT_HFCODE']] else long_min-20),
(latlong_dict['LAT'][i] if latlong_dict['LAT'][i]==latlong_dict['LAT'][i] else lat_max+10,
latlong_dict['LONG'][i] if latlong_dict['LONG'][i]==latlong_dict['LONG'][i] else long_max+20)).km, i, x['SHORT_HFCODE']) for i in x['REF_CODES'] if i!=x['SHORT_HFCODE']], key=lambda x: x[0])[:n], axis=1).tolist())#.set_index(2).to_dict()
df_neighbors_edges = pd.DataFrame(pd.concat([df_neighbors[0], df_neighbors[1], df_neighbors[2]]).tolist(), columns=['DIST', 'REF_CODE', 'SHORT_HFCODE'])  # Series.append was removed in pandas 2.0
df_neighbors_edges['IMPUTED'] = 'NEAREST_NEIGHBOR'
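As a sanity check on the nearest-neighbor augmentation, the same idea can be sketched as a brute-force search over coordinate arrays, again with a haversine approximation in place of the ellipsoidal distance (the function name and toy coordinates below are illustrative):

```python
import numpy as np

def k_nearest(src, coords, codes, k=3):
    """Return the k codes whose coordinates are closest to src (lat, long)."""
    lat1, lon1 = np.radians(src)
    lat2 = np.radians(coords[:, 0])
    lon2 = np.radians(coords[:, 1])
    a = (np.sin((lat2 - lat1) / 2)**2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2)**2)
    d = 2 * 6371 * np.arcsin(np.sqrt(a))        # distances in km
    order = np.argsort(d)[:k]
    return [codes[i] for i in order]

coords = np.array([[14.7, 121.1], [10.3, 123.9], [14.61, 121.01], [18.2, 120.6]])
codes = [3313, 2703, 5940, 147]
print(k_nearest((14.6, 121.0), coords, codes, k=3))   # [5940, 3313, 147]
```

For the ~2,500 RHUs here the O(n²) brute force is perfectly adequate; a spatial index only pays off at much larger scales.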
rhu_edges[['SHORT_HFCODE', 'REF_CODE', 'DIST', 'IMPUTED']]
| | SHORT_HFCODE | REF_CODE | DIST | IMPUTED |
|---|---|---|---|---|
0 | 25 | 32,170.00 | 0.00 | NaN |
1 | 105 | 5,940.00 | 0.41 | NaN |
2 | 106 | 3,313.00 | 11.13 | NaN |
4 | 137 | 3,313.00 | 11.29 | NaN |
5 | 1760 | 6,513.00 | 5.98 | NaN |
... | ... | ... | ... | ... |
2296 | 6814 | 273.00 | 51.12 | NaN |
2297 | 7045 | 273.00 | 15.07 | NaN |
2298 | 7696 | 273.00 | 29.09 | NaN |
2299 | 8861 | 273.00 | 16.35 | NaN |
2300 | 27898 | 237.00 | 41.68 | MUNI |
5925 rows × 4 columns
rhu_edges = pd.concat([rhu_edges[['SHORT_HFCODE', 'REF_CODE', 'DIST', 'IMPUTED']], df_neighbors_edges])
rhu_edges.loc[rhu_edges['DIST'] > outliers[1], 'DIST'] = np.nan
rhu_edges.loc[rhu_edges['DIST']==0, 'DIST'] = 1
rhu_edges.describe()
| | DIST | REF_CODE | SHORT_HFCODE |
|---|---|---|---|
count | 12,157.00 | 12,330.00 | 12,330.00 |
mean | 16.11 | 4,600.62 | 5,816.23 |
std | 16.55 | 7,159.54 | 6,469.21 |
min | 0.00 | 1.00 | 10.00 |
25% | 3.39 | 622.00 | 2,307.00 |
50% | 11.07 | 2,850.00 | 4,234.00 |
75% | 22.87 | 5,139.00 | 7,062.00 |
max | 88.57 | 261,101.00 | 36,628.00 |
The same processing done for the rural health units is performed for the barangay health stations.
bhs = pd.read_excel('bhs2018.xlsx', sheet_name='MAIN', na_values='None')
str_cols = ['HF_NAME', 'REGION', 'PROVINCE', 'MUNI_CITY', 'BRGY',
'STREET_NAME', 'BUILDING', 'FACILITY_HEAD', 'DETACHED',
'BRGYS', 'RHU_NAME', 'RHU_SAME_CTY', 'RHU_NOTSAME_CTY',
'AMB_ACCESS']
code_cols = ['id', 'RHU_CODE']
float_cols = ['RHU_DIST', 'RHU_TRAVEL']
# int_cols = ['CATCHMENT', 'MD_NO', 'MD_AUG', 'MD_TOTAL',
# 'MD_FT', 'MD_PT', 'MD_VISIT', 'RN_NO', 'RN_AUG', 'RN_TOTAL', 'RN_FT',
# 'RN_PT', 'RN_VISIT', 'MW_NO', 'MW_AUG', 'MW_TOTAL', 'MW_FT', 'MW_PT',
# 'MW_VISIT', 'BHW_NO']
bhs[str_cols] = bhs[str_cols].apply(lambda x: x.str.upper().str.strip())
bhs[code_cols] = bhs[code_cols].fillna(0).astype(int)
bhs[float_cols] = bhs[float_cols].astype(float)
bhs[str_cols] = bhs[str_cols].fillna('UNKNOWN')
bhs['SHORT_HFCODE'] = bhs['HF_CODE'].apply(lambda x: int(x[-6:]))
bhs.to_excel('cleaned/bhs_cleaned.xlsx')
bhs = pd.read_excel('cleaned/bhs_cleaned.xlsx')
bhs.loc[bhs['RHU_CODE']==0, 'REF_CODE'] = bhs[bhs['RHU_CODE']==0]['RHU_NAME'].map(HF_dict)
temp = bhs[['SHORT_HFCODE', 'REF_CODE']].dropna().copy()
temp_dict = pd.DataFrame(temp.apply(lambda x: min([(vincenty((latlong_dict['LAT'][x['SHORT_HFCODE']] if latlong_dict['LAT'][x['SHORT_HFCODE']]==latlong_dict['LAT'][x['SHORT_HFCODE']] else lat_min-10,
latlong_dict['LONG'][x['SHORT_HFCODE']] if latlong_dict['LONG'][x['SHORT_HFCODE']]==latlong_dict['LONG'][x['SHORT_HFCODE']] else long_min-20),
(latlong_dict['LAT'][i] if latlong_dict['LAT'][i]==latlong_dict['LAT'][i] else lat_max+10,
latlong_dict['LONG'][i] if latlong_dict['LONG'][i]==latlong_dict['LONG'][i] else long_max+20)).km, i, x['SHORT_HFCODE']) for i in x['REF_CODE']], key=lambda x: x[0]), axis=1).tolist()).set_index(2).to_dict()
bhs['REF_CODE'] = bhs['SHORT_HFCODE'].map(temp_dict[1])
bhs.dropna(subset=['REF_CODE'], inplace=True)
cols = ['SHORT_HFCODE', 'REF_CODE']
bhs = bhs[cols]
bhs['HF_TYPE'] = 'BHS'
bhs['source_lat'] = bhs['SHORT_HFCODE'].map(latlong_dict['LAT'])
bhs['source_long'] = bhs['SHORT_HFCODE'].map(latlong_dict['LONG'])
bhs['target_lat'] = bhs['REF_CODE'].map(latlong_dict['LAT'])
bhs['target_long'] = bhs['REF_CODE'].map(latlong_dict['LONG'])
bhs.loc[~((bhs['source_lat'].between(lat_min, lat_max)) & (bhs['source_long'].between(
long_min, long_max))), ['source_lat', 'source_long']] = np.nan
bhs.loc[~((bhs['target_lat'].between(lat_min, lat_max)) & (bhs['target_long'].between(
long_min, long_max))), ['target_lat', 'target_long']] = np.nan
complete_latlong = ~bhs[['source_lat', 'source_long', 'target_lat', 'target_long']].isnull().sum(axis=1).astype(bool)
bhs.loc[complete_latlong, 'DIST'] = bhs.loc[complete_latlong, ['source_lat', 'source_long', 'target_lat', 'target_long']].apply(lambda x: vincenty((x['source_lat'], x['source_long']), (x['target_lat'], x['target_long'])).km, axis=1)
bhs.loc[bhs['DIST']==0, 'DIST'] = 1
B = bhs.boxplot('DIST', return_type='both')
outliers = [i.get_ydata()[1] for i in B.lines['whiskers']]
outliers
[0.0024470114519412256, 16.988283224538524]
bhs.loc[bhs['DIST'] > outliers[1], 'DIST'] = np.nan
muni_dict = HF_list[['SHORT_HFCODE', 'MUNI_CITY', 'PROVINCE']].set_index('SHORT_HFCODE').to_dict()
bhs['muni_city'] = bhs['SHORT_HFCODE'].map(muni_dict['MUNI_CITY'])
mean_dist_city = bhs.groupby('muni_city')['DIST'].mean().to_dict()
imputed_muni = ~(bhs.loc[bhs['DIST'].isnull(), 'muni_city'].map(mean_dist_city).isnull())
imputed_muni = imputed_muni[imputed_muni].index
bhs.loc[imputed_muni, "IMPUTED"] = "MUNI"
bhs.loc[bhs['DIST'].isnull(), 'DIST'] = bhs.loc[bhs['DIST'].isnull(), 'muni_city'].map(mean_dist_city)
bhs['province'] = bhs['SHORT_HFCODE'].map(muni_dict['PROVINCE'])
mean_dist_prov = bhs.groupby('province')['DIST'].mean().to_dict()
imputed_prov = ~(bhs.loc[bhs['DIST'].isnull(), 'province'].map(mean_dist_prov).isnull())
imputed_prov = imputed_prov[imputed_prov].index
bhs.loc[imputed_prov, "IMPUTED"] = "PROV"
bhs.loc[bhs['DIST'].isnull(), 'DIST'] = bhs.loc[bhs['DIST'].isnull(), 'province'].map(mean_dist_prov)
bhs['DIST'].isnull().sum()
6
bhs.dropna(subset=['DIST'], inplace=True)
bhs = bhs[['SHORT_HFCODE', 'REF_CODE', 'DIST', 'IMPUTED']]
bhs = bhs[bhs['REF_CODE'] != bhs['SHORT_HFCODE']]
edge_list = pd.concat([rhu_edges, bhs])
edge_list
| | DIST | IMPUTED | REF_CODE | SHORT_HFCODE |
|---|---|---|---|---|
0 | 0.00 | NaN | 32,170.00 | 25 |
1 | 0.41 | NaN | 5,940.00 | 105 |
2 | 11.13 | NaN | 3,313.00 | 106 |
4 | 11.29 | NaN | 3,313.00 | 137 |
5 | 5.98 | NaN | 6,513.00 | 1760 |
... | ... | ... | ... | ... |
19297 | 8.88 | NaN | 7,207.00 | 17805 |
19325 | 3.56 | NaN | 79.00 | 17893 |
19345 | 0.65 | NaN | 27,529.00 | 27530 |
19367 | 1.68 | NaN | 169.00 | 29236 |
19368 | 12.67 | NaN | 169.00 | 29238 |
12793 rows × 4 columns
edge_list = edge_list.groupby(['REF_CODE', 'SHORT_HFCODE'])[['IMPUTED', 'DIST']].agg({'DIST':'mean', 'IMPUTED':'first'}).reset_index()
edge_list.to_excel('edge_list.xlsx', index=False)
Using the facilities as nodes and the referrals as edges, a tripartite network of BHS, RHUs, and hospitals is created. This section characterizes and explores the healthcare provider network across regions. The main metrics used in the analysis are node degree and path length.
import pandas as pd
import networkx as nx
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import itertools
import random
nodes = pd.read_excel("nodeslist.xlsx")
edges = pd.read_excel("edge_list.xlsx")
For the succeeding analysis, the nearest-neighbor imputed edges of facility 6108 are excluded.
edges = edges[(edges['IMPUTED']!='NEAREST_NEIGHBOR') | (edges['SHORT_HFCODE']!=6108)]
nodes.drop(['LAT', 'LONG'], axis=1, inplace=True)
G = nx.from_pandas_edgelist(edges, source='SHORT_HFCODE', target='REF_CODE', edge_attr='DIST', create_using=nx.DiGraph)
nodes.index = nodes['SHORT_HFCODE']
node_attrib = nodes.to_dict()
for col in nodes.columns:
nx.set_node_attributes(G, node_attrib[col], col)
bhs_nodes = [i for i, j in G.nodes(data=True) if j.get('HF_TYPE')=='BARANGAY HEALTH STATION']
rhu_nodes = [i for i, j in G.nodes(data=True) if j.get('HF_TYPE')=='RURAL HEALTH UNIT']
hosp_nodes = [i for i, j in G.nodes(data=True) if ((j.get('HF_TYPE')=='HOSPITAL') or (j.get('HF_TYPE')=='INFIRMARY'))]
import seaborn as sns
region_deg = {}
for region in nodes['REGION'].unique():
deg = list(dict(G.in_degree(nodes[((nodes['HF_TYPE']=='INFIRMARY') | (nodes['HF_TYPE']=='HOSPITAL')) & (nodes['REGION']==region)]['SHORT_HFCODE'])).values())
region_deg[region] = deg
df_deg = pd.DataFrame(list(itertools.chain(*[list(zip(len(j) * [i], j)) for i, j in region_deg.items()]))).rename({0:'Region', 1:'Degree'}, axis=1)#.boxplot(by=0)
region_map = {'AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)':'ARMM',
'CORDILLERA ADMINISTRA TIVE REGION (CAR)':'CAR',
'REGION IV-B (MIMAROPA)':'IV-B', 'NATIONAL CAPITAL REGION (NCR)':'NCR',
'REGION X (NORTHERN MINDANAO)':'X', 'REGION XI (DAVAO REGION)':'XI',
'REGION XII (SOCCSKSA RGEN)':'XII', 'REGION XIII (CARAGA)':'XIII',
'REGION I (ILOCOS REGION)':'I', 'REGION II (CAGAYAN VALLEY)':'II',
'REGION III (CENTRAL LUZON)':'III', 'REGION IV-A (CALABAR ZON)':'IV-A',
'REGION V (BICOL REGION)':'V', 'REGION VI (WESTERN VISAYAS)':'VI',
'REGION VII (CENTRAL VISAYAS)':'VII', 'REGION VIII (EASTERN VISAYAS)':'VIII',
'REGION IX (ZAMBOANGA PENINSULA)':'IX'}
df_deg['Region'] = df_deg['Region'].map(region_map)
(df_deg.groupby('Region')['Degree']
.agg(['min', 'max', 'mean', 'median', 'count']).sort_values(by='Region')
[['max', 'mean', 'median', 'count']])
| Region | max | mean | median | count |
|---|---|---|---|---|
ARMM | 24 | 6.94 | 4.50 | 18 |
CAR | 32 | 9.50 | 7.00 | 36 |
I | 68 | 14.95 | 10.00 | 39 |
II | 58 | 9.66 | 7.00 | 38 |
III | 153 | 17.21 | 12.50 | 62 |
IV-A | 144 | 11.51 | 8.00 | 61 |
IV-B | 59 | 8.32 | 6.00 | 34 |
IX | 53 | 13.17 | 11.00 | 23 |
NCR | 73 | 29.92 | 28.50 | 48 |
V | 65 | 12.10 | 8.00 | 41 |
VI | 96 | 10.84 | 7.00 | 50 |
VII | 63 | 10.23 | 8.00 | 48 |
VIII | 146 | 15.92 | 11.00 | 39 |
X | 47 | 8.58 | 6.50 | 36 |
XI | 30 | 6.85 | 5.50 | 26 |
XII | 47 | 8.10 | 5.00 | 21 |
XIII | 37 | 7.58 | 6.00 | 26 |
upper_fence = df_deg.groupby('Region')['Degree'].quantile(0.75) + 1.5 * (df_deg.groupby('Region')['Degree'].quantile(0.75) - df_deg.groupby('Region')['Degree'].quantile(0.25))
fig, ax = plt.subplots(figsize=(10,8))
# fig = plt.figure()
sns.boxplot(data=df_deg, x='Region', y='Degree', palette='Blues',
order=upper_fence.sort_values(ascending=False).index)
labels = [l.get_text() for l in ax.get_xticklabels()]
ax.set_xticklabels(labels, ha='right');
df_deg['Outlierness'] = df_deg['Degree'] - df_deg['Region'].map(upper_fence)
deg_outlier = df_deg[df_deg['Outlierness'] > 0].groupby('Region')['Outlierness'].agg(['min', 'max', 'mean', 'median', 'count'])
deg_outlier
| Region | min | max | mean | median | count |
|---|---|---|---|---|---|
ARMM | 6.00 | 6.00 | 6.00 | 6.00 | 1 |
CAR | 3.75 | 10.75 | 7.42 | 7.75 | 3 |
I | 1.75 | 39.75 | 27.00 | 33.25 | 4 |
II | 0.62 | 34.62 | 13.62 | 5.62 | 3 |
III | 4.50 | 123.50 | 38.83 | 27.50 | 6 |
IV-A | 8.00 | 123.00 | 39.50 | 13.50 | 4 |
IV-B | 2.00 | 45.00 | 14.25 | 5.00 | 4 |
IX | 16.00 | 16.00 | 16.00 | 16.00 | 1 |
NCR | 4.88 | 4.88 | 4.88 | 4.88 | 1 |
V | 12.00 | 45.00 | 31.25 | 34.00 | 4 |
VI | 7.62 | 76.62 | 27.38 | 12.62 | 4 |
VII | 1.62 | 41.62 | 14.62 | 7.62 | 4 |
VIII | 0.25 | 121.25 | 34.75 | 8.75 | 4 |
X | 17.00 | 32.00 | 24.50 | 24.50 | 2 |
XI | 8.00 | 10.00 | 9.00 | 9.00 | 2 |
XII | 26.50 | 26.50 | 26.50 | 26.50 | 1 |
XIII | 2.62 | 23.62 | 13.12 | 13.12 | 2 |
import seaborn as sns
region_deg = {}
for region in nodes['REGION'].unique():
deg = list(dict(G.in_degree(nodes[(nodes['HF_TYPE']=='RURAL HEALTH UNIT') & (nodes['REGION']==region)]['SHORT_HFCODE'])).values())
region_deg[region] = deg
df_deg = pd.DataFrame(list(itertools.chain(*[list(zip(len(j) * [i], j)) for i, j in region_deg.items()]))).rename({0:'Region', 1:'Degree'}, axis=1)#.boxplot(by=0)
region_map = {'AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM)':'ARMM',
'CORDILLERA ADMINISTRA TIVE REGION (CAR)':'CAR',
'REGION IV-B (MIMAROPA)':'IV-B', 'NATIONAL CAPITAL REGION (NCR)':'NCR',
'REGION X (NORTHERN MINDANAO)':'X', 'REGION XI (DAVAO REGION)':'XI',
'REGION XII (SOCCSKSA RGEN)':'XII', 'REGION XIII (CARAGA)':'XIII',
'REGION I (ILOCOS REGION)':'I', 'REGION II (CAGAYAN VALLEY)':'II',
'REGION III (CENTRAL LUZON)':'III', 'REGION IV-A (CALABAR ZON)':'IV-A',
'REGION V (BICOL REGION)':'V', 'REGION VI (WESTERN VISAYAS)':'VI',
'REGION VII (CENTRAL VISAYAS)':'VII', 'REGION VIII (EASTERN VISAYAS)':'VIII',
'REGION IX (ZAMBOANGA PENINSULA)':'IX'}
df_deg['Region'] = df_deg['Region'].map(region_map)
(df_deg.groupby('Region')['Degree']
.agg(['min', 'max', 'mean', 'median', 'count']).sort_values(by='Region')
[['max', 'mean', 'median', 'count']])
| Region | max | mean | median | count |
|---|---|---|---|---|
ARMM | 43 | 3.72 | 1.00 | 76 |
CAR | 27 | 7.54 | 6.00 | 98 |
I | 58 | 9.66 | 7.00 | 148 |
II | 45 | 13.63 | 11.00 | 86 |
III | 24 | 6.53 | 5.00 | 264 |
IV-A | 60 | 9.97 | 8.00 | 183 |
IV-B | 50 | 13.84 | 11.00 | 79 |
IX | 39 | 8.04 | 6.50 | 96 |
NCR | 25 | 0.29 | 0.00 | 384 |
V | 40 | 9.53 | 8.00 | 129 |
VI | 47 | 12.17 | 10.00 | 142 |
VII | 37 | 11.29 | 9.00 | 141 |
VIII | 22 | 4.84 | 4.00 | 161 |
X | 73 | 10.98 | 9.00 | 88 |
XI | 41 | 15.94 | 16.00 | 63 |
XII | 49 | 16.78 | 18.00 | 46 |
XIII | 35 | 4.95 | 3.50 | 58 |
upper_fence = df_deg.groupby('Region')['Degree'].quantile(0.75) + 1.5 * (df_deg.groupby('Region')['Degree'].quantile(0.75) - df_deg.groupby('Region')['Degree'].quantile(0.25))
fig, ax = plt.subplots(figsize=(10,8))
# fig = plt.figure()
sns.boxplot(data=df_deg, x='Region', y='Degree', palette='Blues',
order=upper_fence.sort_values(ascending=False).index)
labels = [l.get_text() for l in ax.get_xticklabels()]
ax.set_xticklabels(labels, ha='right');
df_deg['Outlierness'] = df_deg['Degree'] - df_deg['Region'].map(upper_fence)
deg_outlier_rhu = df_deg[df_deg['Outlierness'] > 0].groupby('Region')['Outlierness'].agg(['min', 'max', 'mean', 'median', 'count'])
# CAR has no RHU degree outliers above the fence; add a zero row so it still appears in later plots
deg_outlier_rhu.loc['CAR'] = 0
# Weighted shortest-path lengths between all reachable pairs: sp maps source -> {target: length}
sp = dict(nx.all_pairs_dijkstra_path_length(G, weight='DIST'))
bhs_rhu_ave_sp = {}
for rhu_node in rhu_nodes:
ave_sp = 0
N = 0
for bhs_node in bhs_nodes:
if sp.get(bhs_node) and sp.get(bhs_node, {}).get(rhu_node):
ave_sp += sp[bhs_node][rhu_node]
N += 1
if N:
ave_sp /= N
bhs_rhu_ave_sp[rhu_node] = ave_sp
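The same averaging loop is repeated for each pair of facility layers (BHS→RHU, RHU→hospital, BHS→hospital), so it is worth factoring into a function. A sketch operating on the `sp` dict-of-dicts produced above (the function name is introduced here):

```python
def ave_spl_to(targets, sources, sp):
    """Average shortest-path length from the reachable sources to each target.

    sp: dict mapping source node -> {reachable node: path length}.
    Returns {target: average length}; a target reached by no source gets 0,
    and zero-length paths are skipped, mirroring the loops above.
    """
    result = {}
    for t in targets:
        lengths = [sp[s][t] for s in sources if sp.get(s, {}).get(t)]
        result[t] = sum(lengths) / len(lengths) if lengths else 0
    return result

# Toy check: two of three sources reach target 'h', with lengths 2 and 4
sp_toy = {'a': {'h': 2}, 'b': {'h': 4}, 'c': {}}
print(ave_spl_to(['h'], ['a', 'b', 'c'], sp_toy))   # {'h': 3.0}
```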
df_bhs_rhu = pd.DataFrame(bhs_rhu_ave_sp.items(), columns=['HF_CODE', 'AVE_SPL'])
df_bhs_rhu['REGION'] = df_bhs_rhu['HF_CODE'].map(nodes['REGION']).map(region_map)
bhs_rhu_spl = df_bhs_rhu.groupby('REGION')['AVE_SPL'].agg(['min', 'max', 'mean', 'median', 'count'])
bhs_rhu_spl
| REGION | min | max | mean | median | count |
|---|---|---|---|---|---|
ARMM | 0.65 | 30.09 | 5.56 | 4.06 | 43 |
CAR | 1.52 | 21.32 | 6.02 | 5.03 | 75 |
I | 0.00 | 31.66 | 4.45 | 3.59 | 140 |
II | 2.39 | 31.66 | 5.75 | 5.40 | 82 |
III | 0.09 | 24.48 | 3.19 | 2.70 | 237 |
IV-A | 0.21 | 30.29 | 3.93 | 3.35 | 157 |
IV-B | 2.34 | 51.20 | 8.06 | 5.81 | 75 |
IX | 0.86 | 42.92 | 6.72 | 4.83 | 92 |
NCR | 0.25 | 73.30 | 14.82 | 3.23 | 19 |
V | 0.13 | 32.46 | 6.71 | 4.98 | 118 |
VI | 0.54 | 38.20 | 5.56 | 4.35 | 136 |
VII | 1.09 | 28.91 | 5.10 | 4.27 | 134 |
VIII | 0.04 | 86.02 | 6.28 | 4.21 | 145 |
X | 1.00 | 23.34 | 5.95 | 4.62 | 83 |
XI | 0.79 | 47.85 | 8.91 | 6.29 | 58 |
XII | 2.23 | 67.11 | 9.84 | 6.95 | 44 |
XIII | 0.65 | 34.38 | 6.72 | 5.89 | 43 |
fig, ax = plt.subplots(figsize=(10,8))
ax.plot(deg_outlier_rhu['mean'], bhs_rhu_spl['mean'], 'o', color='deepskyblue',
markersize='15')
for i,j,k in zip(deg_outlier_rhu['mean'].values, bhs_rhu_spl['mean'].values, list(deg_outlier_rhu.index)):
ax.text(i,j,k, fontsize=14)
ax.set_xlabel("Degree Outlierness (RHU)", fontsize=14)
ax.set_ylabel("Average shortest path length (BHS->RHU)", fontsize=14);
rhu_hosp_ave_sp = {}
for hosp_node in hosp_nodes:
ave_sp = 0
N = 0
for rhu_node in rhu_nodes:
if sp.get(rhu_node) and sp.get(rhu_node, {}).get(hosp_node):
ave_sp += sp[rhu_node][hosp_node]
N += 1
if N:
ave_sp /= N
rhu_hosp_ave_sp[hosp_node] = ave_sp
df_rhu_hosp = pd.DataFrame(rhu_hosp_ave_sp.items(), columns=['HF_CODE', 'AVE_SPL'])
df_rhu_hosp['REGION'] = df_rhu_hosp['HF_CODE'].map(nodes['REGION']).map(region_map)
rhu_hosp_spl = df_rhu_hosp.groupby('REGION')['AVE_SPL'].agg(['min', 'max', 'mean', 'median', 'count'])
rhu_hosp_spl
| REGION | min | max | mean | median | count |
|---|---|---|---|---|---|
ARMM | 13.96 | 52.74 | 30.15 | 32.53 | 18 |
CAR | 6.52 | 30.51 | 16.47 | 16.00 | 35 |
I | 5.73 | 34.93 | 16.85 | 14.64 | 36 |
II | 0.22 | 43.62 | 18.39 | 14.65 | 32 |
III | 3.53 | 33.86 | 12.97 | 10.79 | 57 |
IV-A | 3.13 | 72.94 | 16.03 | 15.33 | 57 |
IV-B | 5.94 | 54.63 | 27.00 | 25.51 | 27 |
IX | 0.42 | 56.19 | 23.80 | 22.07 | 21 |
NCR | 1.20 | 33.93 | 7.21 | 3.62 | 45 |
V | 10.95 | 52.01 | 22.74 | 19.48 | 39 |
VI | 7.50 | 40.18 | 18.40 | 16.46 | 47 |
VII | 5.43 | 45.77 | 20.58 | 18.71 | 45 |
VIII | 7.55 | 66.16 | 26.83 | 23.98 | 39 |
X | 1.53 | 53.28 | 20.28 | 17.49 | 36 |
XI | 7.14 | 75.00 | 29.56 | 25.20 | 20 |
XII | 14.24 | 81.92 | 34.14 | 19.55 | 17 |
XIII | 7.66 | 40.07 | 19.27 | 18.85 | 25 |
fig, ax = plt.subplots(figsize=(10,8))
ax.plot(deg_outlier['mean'], rhu_hosp_spl['mean'], 'o', color='deepskyblue',
markersize='15')
for i,j,k in zip(deg_outlier['mean'].values, rhu_hosp_spl['mean'].values,
list(deg_outlier.index)):
ax.text(i,j,k, fontsize=14)
ax.set_xlabel("Degree Outlierness (Hosp)", fontsize=14)
ax.set_ylabel("Average shortest path length (RHU->Hosp)", fontsize=14);
bhs_hosp_ave_sp = {}
for hosp_node in hosp_nodes:
ave_sp = 0
N = 0
for bhs_node in bhs_nodes:
if sp.get(bhs_node) and sp.get(bhs_node, {}).get(hosp_node):
ave_sp += sp[bhs_node][hosp_node]
N += 1
if N:
ave_sp /= N
bhs_hosp_ave_sp[hosp_node] = ave_sp
df_bhs_hosp = pd.DataFrame(bhs_hosp_ave_sp.items(), columns=['HF_CODE', 'AVE_SPL'])
df_bhs_hosp['REGION'] = df_bhs_hosp['HF_CODE'].map(nodes['REGION']).map(region_map)
bhs_hosp_spl = df_bhs_hosp.groupby('REGION')['AVE_SPL'].agg(['min', 'max', 'mean', 'median', 'count'])
bhs_hosp_spl
| REGION | min | max | mean | median | count |
|---|---|---|---|---|---|
ARMM | 4.51 | 65.32 | 35.23 | 38.25 | 17 |
CAR | 10.50 | 40.53 | 22.46 | 21.81 | 35 |
I | 3.78 | 36.87 | 18.46 | 16.85 | 39 |
II | 4.40 | 49.01 | 21.31 | 21.14 | 35 |
III | 1.27 | 37.93 | 15.56 | 12.76 | 62 |
IV-A | 1.00 | 78.27 | 18.08 | 15.30 | 60 |
IV-B | 3.78 | 56.02 | 29.10 | 28.42 | 30 |
IX | 0.87 | 61.03 | 25.67 | 24.71 | 23 |
NCR | 1.21 | 68.62 | 19.19 | 6.63 | 41 |
V | 0.67 | 51.68 | 25.13 | 22.92 | 41 |
VI | 9.23 | 49.10 | 21.48 | 19.70 | 48 |
VII | 6.58 | 62.52 | 23.95 | 23.19 | 48 |
VIII | 12.82 | 88.35 | 33.74 | 29.56 | 39 |
X | 7.83 | 52.84 | 23.69 | 22.38 | 36 |
XI | 1.86 | 74.25 | 29.39 | 30.16 | 25 |
XII | 2.80 | 71.17 | 30.50 | 23.27 | 20 |
XIII | 6.83 | 57.35 | 24.26 | 24.00 | 26 |
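The `sp` lookup used in the loop above is assumed to be a precomputed all-pairs shortest-path-length dictionary built earlier in the notebook. A minimal sketch of constructing one with networkx on a toy weighted graph (the nodes and distances here are illustrative, not from the data):

```python
import networkx as nx

# Toy stand-in for the referral network; 'DIST' mirrors the edge attribute
# used in this notebook (the graph itself is illustrative).
G_toy = nx.Graph()
G_toy.add_weighted_edges_from([(1, 2, 5.0), (2, 3, 7.0), (1, 3, 20.0)],
                              weight='DIST')

# sp[u][v] -> weighted shortest-path length from u to v
sp = dict(nx.all_pairs_dijkstra_path_length(G_toy, weight='DIST'))

print(sp[1][3])  # 12.0: the 1-2-3 route beats the direct 20.0 edge
```

When probing reachability in such a dict, prefer `v in sp[u]` over truthiness, since a path length of 0 is falsy.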
fig, ax = plt.subplots(figsize=(10,8))
ax.plot(deg_outlier['mean'], bhs_hosp_spl['mean'], 'o', color='deepskyblue',
markersize=15)
for i,j,k in zip(deg_outlier['mean'].values, bhs_hosp_spl['mean'].values, list(deg_outlier.index)):
ax.text(i,j,k, fontsize=14)
ax.set_xlabel("Degree Outlierness (Hosp)", fontsize=14)
ax.set_ylabel("Average shortest path length (BHS->Hosp)", fontsize=14);
After characterizing the network, the analysis was extended by simulating what happens to the network as progressively larger percentages of the population enter the system: at each step, facilities that would overload, given their bed capacities, are identified and removed.
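The simulation below distributes each RHU's incoming patients across its neighboring hospitals with weight 1 − d_i/Σd (so nearer hospitals receive larger shares) and removes any hospital whose load would exceed its bed capacity. A self-contained sketch of one allocation step, with purely hypothetical distances, capacities, and hospital names:

```python
# Hypothetical distances (km) from one RHU to its candidate hospitals,
# and hypothetical bed capacities; all values here are illustrative.
distances = {'H1': 10.0, 'H2': 30.0, 'H3': 60.0}
bed_cap = {'H1': 50, 'H2': 120, 'H3': 80}
incoming = 100  # patients entering through this RHU at this step

total = sum(distances.values())
# Same rule as the simulation loop: weight_i = 1 - d_i / sum(d). Note that
# with more than two hospitals these weights sum to len(distances) - 1, so
# they are shares in the notebook's sense, not a normalized distribution.
weights = {h: 1 - d / total for h, d in distances.items()}
patients = {h: incoming * w for h, w in weights.items()}

# A hospital whose load exceeds its capacity would be removed from the network
overloaded = [h for h in patients if patients[h] > bed_cap[h]]
print(patients)    # H1, the nearest, takes the largest share
print(overloaded)  # only H1 (share of ~90 vs 50 beds) overloads
```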
nodes.groupby('HF_TYPE')['CATCHMENT'].describe()
| HF_TYPE | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| BARANGAY HEALTH STATION | 17,435.00 | 3,800.18 | 6,013.85 | 0.00 | 1,302.00 | 2,444.00 | 4,408.50 | 123,708.00 |
| HOSPITAL | 0.00 | nan | nan | nan | nan | nan | nan | nan |
| INFIRMARY | 0.00 | nan | nan | nan | nan | nan | nan | nan |
| RURAL HEALTH UNIT | 2,172.00 | 38,663.14 | 44,947.38 | 0.00 | 16,981.75 | 30,224.00 | 46,929.25 | 903,309.00 |
intake = nodes.loc[(nodes['HF_TYPE']=='RURAL HEALTH UNIT') | (nodes['HF_TYPE']=='BARANGAY HEALTH STATION'), ['REGION', 'CATCHMENT', 'HF_TYPE']]
intake.groupby('REGION')['CATCHMENT'].agg(['mean','median', 'min', 'max'])
| REGION | mean | median | min | max |
|---|---|---|---|---|
| AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM) | 9,252.73 | 2,682.00 | 0.00 | 222,885.00 |
| CORDILLERA ADMINISTRATIVE REGION (CAR) | 3,688.14 | 1,187.00 | 0.00 | 134,461.00 |
| NATIONAL CAPITAL REGION (NCR) | 30,745.29 | 25,084.00 | 0.00 | 290,977.00 |
| REGION I (ILOCOS REGION) | 6,133.94 | 2,556.00 | 0.00 | 204,135.00 |
| REGION II (CAGAYAN VALLEY) | 4,528.67 | 1,826.00 | 0.00 | 158,245.00 |
| REGION III (CENTRAL LUZON) | 9,866.27 | 3,700.50 | 0.00 | 260,691.00 |
| REGION IV-A (CALABARZON) | 9,040.42 | 3,122.50 | 0.00 | 463,727.00 |
| REGION IV-B (MIMAROPA) | 5,363.06 | 2,104.50 | 0.00 | 272,190.00 |
| REGION IX (ZAMBOANGA PENINSULA) | 9,427.20 | 3,773.50 | 0.00 | 903,309.00 |
| REGION V (BICOL REGION) | 7,685.29 | 3,480.50 | 0.00 | 210,047.00 |
| REGION VI (WESTERN VISAYAS) | 7,620.31 | 3,293.00 | 0.00 | 597,740.00 |
| REGION VII (CENTRAL VISAYAS) | 5,827.44 | 2,131.00 | 0.00 | 414,265.00 |
| REGION VIII (EASTERN VISAYAS) | 9,092.89 | 4,577.00 | 0.00 | 387,813.00 |
| REGION X (NORTHERN MINDANAO) | 6,047.59 | 2,369.00 | 0.00 | 197,890.00 |
| REGION XI (DAVAO REGION) | 5,155.10 | 1,818.50 | 0.00 | 175,935.00 |
| REGION XII (SOCCSKSARGEN) | 8,722.27 | 2,168.00 | 0.00 | 453,585.00 |
| REGION XIII (CARAGA) | 7,212.89 | 2,530.00 | 0.00 | 137,527.00 |
num_iter = 1000
# Each iteration feeds 0.001% of every catchment population into the HCPN,
# i.e. a cumulative 1% after all 1,000 iterations.
pop_prop = 0.00001
hosp_zero_cap = nodes[((nodes['HF_TYPE']=='HOSPITAL') | (nodes['HF_TYPE']=='INFIRMARY')) & (nodes['BED_CAP']==0)].SHORT_HFCODE.values
len(hosp_zero_cap)
0
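Since the cumulative fraction after `t` iterations is simply `pop_prop * t`, the iteration index for a given target (e.g. the 0.05% and 0.5% snapshots reported later) can be recovered by division and rounding rather than by exact float comparison. A small sanity check using the same constants (the helper `iter_index` is ours, for illustration):

```python
num_iter = 1000
pop_prop = 0.00001  # 0.001% of each catchment population per iteration

def iter_index(target_frac):
    # Rounding guards against binary-float drift (1e-5 * 50 need not
    # compare exactly equal to the literal 0.0005).
    return int(round(target_frac / pop_prop))

print(iter_index(0.0005))               # 50   -> the 0.05% snapshot
print(iter_index(0.005))                # 500  -> the 0.5% snapshot
print(iter_index(pop_prop * num_iter))  # 1000 -> the final 1% snapshot
```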
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(9, 6))
intake_t = intake['CATCHMENT'] * pop_prop
df_isolates = pd.DataFrame()
df_rem_hosps = pd.DataFrame()
for region in ['NATIONAL CAPITAL REGION (NCR)', 'REGION VIII (EASTERN VISAYAS)']:
rhu_reg = nodes[(nodes['REGION']==region) & (nodes['HF_TYPE']=='RURAL HEALTH UNIT')]['SHORT_HFCODE']
hosp_reg = nodes[(nodes['REGION']==region) & ((nodes['HF_TYPE']=='HOSPITAL') | (nodes['HF_TYPE']=='INFIRMARY'))]['SHORT_HFCODE']
G_reg = G.subgraph(rhu_reg.tolist() + hosp_reg.tolist()).copy()
isolated = set(nx.isolates(G_reg))  # materialize once; nx.isolates returns a generator
hosp_reg_connected = [i for i in hosp_reg if i not in isolated and i not in hosp_zero_cap]
remaining_hosps = [len(hosp_reg_connected)]
G_reg.remove_nodes_from(list(nx.isolates(G_reg)))
G_reg.remove_nodes_from(hosp_zero_cap)
num_isolates = [0]
remaining_hosp = [i for i, j in G_reg.nodes(data=True) if ((j['HF_TYPE']=='HOSPITAL') or (j['HF_TYPE']=='INFIRMARY'))]
for t in range(num_iter):
removed_hosps = []
for rhu in rhu_reg:
try:
nearby_hosps = list(G_reg.neighbors(rhu))
except nx.NetworkXError:
continue  # this RHU is not in the regional subgraph
# Sum DIST over edges where it is defined (x==x is False for NaN)
dist_sum = sum(G_reg[rhu][i]['DIST'] for i in nearby_hosps if G_reg[rhu][i]['DIST']==G_reg[rhu][i]['DIST'])
for hosp in nearby_hosps:
if len(nearby_hosps)==1:
prob_hosp = 1
else:
prob_hosp = (1 - G_reg[rhu][hosp]['DIST'] / dist_sum) if G_reg[rhu][hosp]['DIST']==G_reg[rhu][hosp]['DIST'] else 0
new_patient = intake_t[rhu] * prob_hosp if intake_t[rhu]==intake_t[rhu] else 0
# Stochastic rounding: carry the fractional patient with matching probability
add_one = 1 if random.random() < new_patient % 1 else 0
if (G_reg.nodes()[hosp].get('patients', 0) + int(new_patient) + add_one) > G_reg.nodes()[hosp]['BED_CAP']:
G_reg.remove_node(hosp)
removed_hosps.append(hosp)
else:
nx.set_node_attributes(G_reg, {hosp: G_reg.nodes()[hosp].get('patients', 0) + int(new_patient) + add_one}, 'patients')
remaining_hosp = [i for i, j in G_reg.nodes(data=True) if ((j['HF_TYPE']=='HOSPITAL') or (j['HF_TYPE']=='INFIRMARY'))]
remaining_hosps.append(len(remaining_hosp))
num_isolates.append(len(list(nx.isolates(G_reg))))
# pop_prop * t reaches 0.05% at t = 50 and 0.5% at t = 500; index by
# iteration count rather than testing float equality against np.arange.
idx_005 = int(round(0.0005 / pop_prop))
idx_05 = int(round(0.005 / pop_prop))
df_rem_hosps.loc[region, "0.05%"] = remaining_hosps[idx_005] / remaining_hosps[0]
df_rem_hosps.loc[region, "0.5%"] = remaining_hosps[idx_05] / remaining_hosps[0]
df_isolates.loc[region, "0.05%"] = num_isolates[idx_005] / num_isolates[-1]
df_isolates.loc[region, "0.5%"] = num_isolates[idx_05] / num_isolates[-1]
ax1.plot(pop_prop * np.arange(num_iter+1), [i/remaining_hosps[0] for i in remaining_hosps], label=region)
ax2.plot(pop_prop * np.arange(num_iter+1), [i/num_isolates[-1] for i in num_isolates])
fig.legend()
ax1.set_xlabel('Percent of population in HCPN')
ax2.set_xlabel('Percent of population in HCPN')
ticks = np.arange(0, 0.0101, 0.002)  # pin ticks so labels match positions
ax1.set_xticks(ticks); ax1.set_xticklabels([f'{t:.1%}' for t in ticks])
ax2.set_xticks(ticks); ax2.set_xticklabels([f'{t:.1%}' for t in ticks])
ax1.set_ylabel('Percentage of remaining hospitals')
ax2.set_ylabel('Percentage of isolated RHUs')
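The `add_one` line in the loop above is a stochastic-rounding step: the integer part of the expected load is always assigned, and the fractional remainder becomes one extra patient with matching probability, making the load unbiased in expectation. A small sketch (the function name is ours):

```python
import random

def stochastic_round(x):
    # Keep the integer part; add 1 with probability equal to the fractional
    # part, so E[stochastic_round(x)] == x.
    return int(x) + (1 if random.random() < x % 1 else 0)

random.seed(0)  # reproducible demo
mean = sum(stochastic_round(2.3) for _ in range(10_000)) / 10_000
print(round(mean, 2))  # close to 2.3
```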
fig, [ax1, ax2] = plt.subplots(1, 2, figsize=(9, 6))
intake_t = intake['CATCHMENT'] * pop_prop
df_isolates = pd.DataFrame()
df_rem_hosps = pd.DataFrame()
for region in nodes['REGION'].unique():
rhu_reg = nodes[(nodes['REGION']==region) & (nodes['HF_TYPE']=='RURAL HEALTH UNIT')]['SHORT_HFCODE']
hosp_reg = nodes[(nodes['REGION']==region) & ((nodes['HF_TYPE']=='HOSPITAL') | (nodes['HF_TYPE']=='INFIRMARY'))]['SHORT_HFCODE']
G_reg = G.subgraph(rhu_reg.tolist() + hosp_reg.tolist()).copy()
isolated = set(nx.isolates(G_reg))  # materialize once; nx.isolates returns a generator
hosp_reg_connected = [i for i in hosp_reg if i not in isolated and i not in hosp_zero_cap]
remaining_hosps = [len(hosp_reg_connected)]
G_reg.remove_nodes_from(list(nx.isolates(G_reg)))
G_reg.remove_nodes_from(hosp_zero_cap)
num_isolates = [0]
remaining_hosp = [i for i, j in G_reg.nodes(data=True) if ((j['HF_TYPE']=='HOSPITAL') or (j['HF_TYPE']=='INFIRMARY'))]
for t in range(num_iter):
removed_hosps = []
for rhu in rhu_reg:
try:
nearby_hosps = list(G_reg.neighbors(rhu))
except nx.NetworkXError:
continue  # this RHU is not in the regional subgraph
# Sum DIST over edges where it is defined (x==x is False for NaN)
dist_sum = sum(G_reg[rhu][i]['DIST'] for i in nearby_hosps if G_reg[rhu][i]['DIST']==G_reg[rhu][i]['DIST'])
for hosp in nearby_hosps:
if len(nearby_hosps)==1:
prob_hosp = 1
else:
prob_hosp = (1 - G_reg[rhu][hosp]['DIST'] / dist_sum) if G_reg[rhu][hosp]['DIST']==G_reg[rhu][hosp]['DIST'] else 0
new_patient = intake_t[rhu] * prob_hosp if intake_t[rhu]==intake_t[rhu] else 0
# Stochastic rounding: carry the fractional patient with matching probability
add_one = 1 if random.random() < new_patient % 1 else 0
if (G_reg.nodes()[hosp].get('patients', 0) + int(new_patient) + add_one) > G_reg.nodes()[hosp]['BED_CAP']:
G_reg.remove_node(hosp)
removed_hosps.append(hosp)
else:
nx.set_node_attributes(G_reg, {hosp: G_reg.nodes()[hosp].get('patients', 0) + int(new_patient) + add_one}, 'patients')
remaining_hosp = [i for i, j in G_reg.nodes(data=True) if ((j['HF_TYPE']=='HOSPITAL') or (j['HF_TYPE']=='INFIRMARY'))]
remaining_hosps.append(len(remaining_hosp))
num_isolates.append(len(list(nx.isolates(G_reg))))
# pop_prop * t reaches 0.05% at t = 50 and 0.5% at t = 500; index by
# iteration count rather than testing float equality against np.arange.
idx_005 = int(round(0.0005 / pop_prop))
idx_05 = int(round(0.005 / pop_prop))
df_rem_hosps.loc[region, "0.05%"] = remaining_hosps[idx_005] / remaining_hosps[0]
df_rem_hosps.loc[region, "0.5%"] = remaining_hosps[idx_05] / remaining_hosps[0]
df_isolates.loc[region, "0.05%"] = num_isolates[idx_005] / num_isolates[-1]
df_isolates.loc[region, "0.5%"] = num_isolates[idx_05] / num_isolates[-1]
ax1.plot(pop_prop * np.arange(num_iter+1), [i/remaining_hosps[0] for i in remaining_hosps], label=region)
ax2.plot(pop_prop * np.arange(num_iter+1), [i/num_isolates[-1] for i in num_isolates])
fig.legend()
ax1.set_xlabel('Percent of population in HCPN')
ax2.set_xlabel('Percent of population in HCPN')
ticks = np.arange(0, 0.0101, 0.002)  # pin ticks so labels match positions
ax1.set_xticks(ticks); ax1.set_xticklabels([f'{t:.1%}' for t in ticks])
ax2.set_xticks(ticks); ax2.set_xticklabels([f'{t:.1%}' for t in ticks])
ax1.set_ylabel('Percentage of remaining hospitals')
ax2.set_ylabel('Percentage of isolated RHUs')
print("Remaining hospitals")
(df_rem_hosps*100)
Remaining hospitals
| REGION | 0.05% | 0.5% |
|---|---|---|
| AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM) | 27.78 | 0.00 |
| CORDILLERA ADMINISTRATIVE REGION (CAR) | 41.67 | 2.78 |
| REGION IV-B (MIMAROPA) | 24.24 | 3.03 |
| NATIONAL CAPITAL REGION (NCR) | 26.67 | 8.89 |
| REGION X (NORTHERN MINDANAO) | 22.22 | 5.56 |
| REGION XI (DAVAO REGION) | 38.10 | 23.81 |
| REGION XII (SOCCSKSARGEN) | 38.89 | 22.22 |
| REGION XIII (CARAGA) | 32.00 | 12.00 |
| REGION I (ILOCOS REGION) | 13.89 | 5.56 |
| REGION II (CAGAYAN VALLEY) | 9.09 | 0.00 |
| REGION III (CENTRAL LUZON) | 5.26 | 1.75 |
| REGION IV-A (CALABARZON) | 8.77 | 1.75 |
| REGION V (BICOL REGION) | 5.13 | 2.56 |
| REGION VI (WESTERN VISAYAS) | 8.00 | 2.00 |
| REGION VII (CENTRAL VISAYAS) | 11.11 | 4.44 |
| REGION VIII (EASTERN VISAYAS) | 17.95 | 5.13 |
| REGION IX (ZAMBOANGA PENINSULA) | 19.05 | 14.29 |
print("RHU isolates")
(df_isolates*100)
RHU isolates
| REGION | 0.05% | 0.5% |
|---|---|---|
| AUTONOMOUS REGION IN MUSLIM MINDANAO (ARMM) | 56.60 | 100.00 |
| CORDILLERA ADMINISTRATIVE REGION (CAR) | 17.81 | 100.00 |
| REGION IV-B (MIMAROPA) | 76.09 | 100.00 |
| NATIONAL CAPITAL REGION (NCR) | 48.15 | 92.59 |
| REGION X (NORTHERN MINDANAO) | 52.63 | 100.00 |
| REGION XI (DAVAO REGION) | 5.26 | 100.00 |
| REGION XII (SOCCSKSARGEN) | 40.00 | 100.00 |
| REGION XIII (CARAGA) | 68.57 | 100.00 |
| REGION I (ILOCOS REGION) | 96.12 | 100.00 |
| REGION II (CAGAYAN VALLEY) | 79.49 | 100.00 |
| REGION III (CENTRAL LUZON) | 94.42 | 100.00 |
| REGION IV-A (CALABARZON) | 91.11 | 100.00 |
| REGION V (BICOL REGION) | 82.35 | 100.00 |
| REGION VI (WESTERN VISAYAS) | 79.79 | 100.00 |
| REGION VII (CENTRAL VISAYAS) | 97.47 | 100.00 |
| REGION VIII (EASTERN VISAYAS) | 92.59 | 100.00 |
| REGION IX (ZAMBOANGA PENINSULA) | 100.00 | 100.00 |