2. IFs Dataset

This section describes the steps to process the IFs dataset, applying filters, cleaning data, and transforming it to align with WASH indicators and milestones. It includes specific functions for handling and manipulating IFs data, preparing it for further analysis and visualization.

Final Columns for Output

The following columns are retained in the final dataset for consistency and usability in downstream processing:

final_columns = ['indicator','year','country','unit','value_name','jmp_category',
                 'commitment','value','base_value','initial_value','2030','2050']

Files to Keep

The IFs dataset includes a selected list of files, each representing various indicators or metrics relevant to WASH. These files are stored in IFS_INPUT_DIR and include the following:

files_to_keep = [
    '01. Deaths by Category of Cause - Millions (2nd Dimensions = Diarrhea).csv',
    '06. Poverty Headcount less than $2.15 per Day, Log Normal - Millions.csv',
    '08. State Failure Instability Event - IFs Index.csv',
    '11. Governance Effectiveness - WB index.csv',
    # Additional files listed in the dataset...
]

Year Filtering Configuration

The year filtering configuration determines which years are kept in the final dataset, allowing for analysis by key milestone years (2019, 2030, and 2050).

year_filter_config = {
    "year_range": {
        "years": list(range(2018, 2051)),
        "files": [
            '13. Sanitation Services, Access, percent of population (2nd Dimensions = Basic + Safely Managed).csv',
            '17. Water Services, Access, percent of population (2nd Dimensions = Basic + Safely Managed).csv',
        ]
    },
    "milestone_years": [2019, 2030, 2050]
}

2.A. IFS Functions

The following functions perform specific transformations, cleanups, and mappings on the IFs dataset to ensure consistency with the WASH indicators.

  1. base_jmp_category

    Determines the JMP category based on the “2nd_dimension” field and assigns either “BS” (Basic) or “SM” (Safely Managed) accordingly.

    def base_jmp_category(x):
        if "Base" in str(x["value_name"]):
            if "Basic" in x["2nd_dimension"]:
                return "BS"
            if "Safely" in x["2nd_dimension"]:
                return "SM"
            return np.nan
        return x["jmp_category"]
    
  2. get_ifs_name

    Cleans up the source file name by removing unnecessary components, such as “2nd Dimension” information, and provides a simplified file name.

    def get_ifs_name(source):
        source = re.sub(r"\s*\(2nd Dimension.*?\)", "", source)
        return re.sub(r'^\d+\. ', '', source.replace(f"{IFS_INPUT_DIR}/", "")).replace(".csv", "")
    
  3. get_value_types

    Splits a string to extract value types, handling numeric formatting issues that may be present in file names.

    def get_value_types(lst):
        lst = lst.split('.')[0]
        lst = lst.replace('_0_','_0.').split("_")
        return lst
    
  4. cleanup_data

    Cleans up the data by removing unnecessary characters in the “unit” and “value” columns, replacing empty values with NaN for consistency.

    def cleanup_data(dataframe):
        dataframe['unit'] = dataframe['unit'].apply(lambda x: x.replace("2017","") if x else None)
        dataframe['value'] = dataframe['value'].apply(lambda x: x.replace(' ','') if ' ' in str(x) else x)
        dataframe['value'] = dataframe['value'].apply(lambda x: x if len(str(x)) > 0 else np.nan)
    
  5. filter_dataframe_by_year

    Filters the dataset by year based on the file name, using either a predefined range or milestone years.

    def filter_dataframe_by_year(dataframe, filename):
        filename = filename.split("/")[3]
        if filename in year_filter_config["year_range"]["files"]:
            filtered_df = dataframe[dataframe['year'].isin(year_filter_config["year_range"]["years"])]
        else:
            filtered_df = dataframe[dataframe['year'].isin(year_filter_config["milestone_years"])]
        return filtered_df.reset_index(drop=True)
    
  6. remove_unmatches_jmp_category

    Removes entries where the JMP category does not match the expected “Base” value or is inconsistent with the “2nd_dimension” field.

    def remove_unmatches_jmp_category(x):
        if x["value_name"] != "Base":
            if x["2nd_dimension"] == "Basic" and x["jmp_category"] == "SM":
                return True
            if x["2nd_dimension"] == "SafelyManaged" and x["jmp_category"] == "ALB":
                return True
            if x["2nd_dimension"] == "SafelyManaged" and x["jmp_category"] == "BS":
                return True
        return False
    
  7. remove_unmatch_commitment

    Removes entries where the year does not align with the defined “commitment” target (e.g., 2030 or 2050).

    def remove_unmatch_commitment(x):
        if "2030" in x['commitment']:
            if x["year"] == 2050 or x["year"] > 2030:
                return True
        if "2050" in x['commitment'] and x["year"] == 2030:
            return True
        return False
    
  8. add_initial_value_for_wash

    Adds initial values for WASH indicators based on the earliest year available in the dataset.

    def add_initial_value_for_wash(x, dataframe):
        if x["year"] in [2030, 2050, 2022]:
            value_of_min_year = list(dataframe[
                (dataframe["indicator"] == x["indicator"]) &
                (dataframe["country"] == x["country"]) &
                (dataframe["jmp_category"] == x["jmp_category"]) &
                (dataframe["year"] == dataframe["year"].min())
            ]['value'])
            if len(value_of_min_year):
                return value_of_min_year[0]
        return np.nan
    
  9. add_base_value

    Adds the base value for an indicator, using “Base” entries in the dataset if available.

    def add_base_value(x, dataframe, is_wash_data=True):
        if x["value_name"] != "Base":
            if is_wash_data:
                value_of_base = list(dataframe[
                    (dataframe["indicator"] == x["indicator"]) &
                    (dataframe["country"] == x["country"]) &
                    (dataframe["jmp_category"] == x["jmp_category"]) &
                    (dataframe["year"] == x["year"]) &
                    (dataframe["value_name"] == "Base") &
                    (dataframe["2nd_dimension"] == x["2nd_dimension"])
                ]['value'])
                if len(value_of_base):
                    return value_of_base[0]
            else:
                value_of_base = list(dataframe[
                    (dataframe["indicator"] == x["indicator"]) &
                    (dataframe["country"] == x["country"]) &
                    (dataframe["year"] == x["year"]) &
                    (dataframe["value_name"] == "Base")
                ]['value'])
                if len(value_of_base):
                    return value_of_base[0]
        return np.nan
    
  10. modify_commitment_name

    Modifies the commitment name for clarity, appending relevant indicators such as “Water” or “Sanitation” as needed.

    def modify_commitment_name(x):
        commitment_name = str(x['commitment']).strip()
        if x["value_name"] == "Base":
            return "Base"
        if "2030" in commitment_name or "2050" in commitment_name:
            value_name = x['value_name']
            if 'W' in value_name and 'S' in value_name:
                value_name = "Water and Sanitation"
            if 'W' in value_name:
                value_name = "Water"
            if 'S' in value_name:
                value_name = "Sanitation"
            return f"Full {value_name} Access in {commitment_name}"
        return x['commitment']
    
  11. get_alb_value

    Adjusts values based on “Basic” and “SafelyManaged” categories in the dataset.

    def get_alb_value(x, df):
        if x["2nd_dimension"] == "Basic":
            additional_value = df[
                (df["indicator"] == x["indicator"]) &
                (df["year"] == x["year"]) &
                (df["country"] == x["country"]) &
                (df["commitment"] == x["commitment"]) &
                (df["value_name"] == x["value_name"]) &
                (df["2nd_dimension"] == "SafelyManaged")
            ]
            if not additional_value.empty:
                return x["value"] + additional_value["value"].iloc[0]
        return x["value"]
    

These functions collectively handle filtering, cleaning, and transforming the IFs dataset, ensuring consistency with WASH categories and preparing it for subsequent analysis.

2.B. IFS Data Processing

2.B.1. Combine, Filter, and Remap IFs Values

This section describes the process of transforming and processing IFs data files into a unified DataFrame, combined_df. The transformation involves cleaning, reshaping, and filtering the data to prepare it for analysis.

### Combining and Transforming Data

The following code iterates over each file in the IFs dataset, applies necessary transformations, and combines them into a single DataFrame (combined_df) for analysis.

combined_df = pd.DataFrame(columns=final_columns)
for file in files:
    cleanup_semicolon(file)
    data = pd.read_csv(file, header=[1,2,4,5], sep=',')

    # Update column names
    new_columns = list(data.columns)
    for i, col in enumerate(new_columns):
        if col == ('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2', 'Unnamed: 0_level_3'):
            new_columns[i] = 'Year'
    data.columns = new_columns
    df = pd.DataFrame(data.to_dict('records'))

    # Reshape data
    df_melted = df.melt(id_vars=['Year'], var_name='variable', value_name='value')
    new_data = []
    for value_list in df_melted.to_dict('records'):
        value_type = get_value_types(value_list["variable"][3])
        new_data.append({
            "year": int(value_list["Year"]),
            "country": map_country_name(value_list["variable"][0]),
            "2nd_dimension": value_list["variable"][1],
            "unit": value_list["variable"][2],
            "value_type": list(filter(lambda v: v, value_type)),
            "value": value_list["value"]
        })
    df = pd.DataFrame(new_data)

    # Apply filters and transformations
    df = filter_dataframe_by_year(df, file)
    df_split = pd.DataFrame(df['value_type'].tolist(), index=df.index)
    df_split.columns = ['value_name', 'jmp_category', 'commitment']
    df_final = pd.concat([df, df_split], axis=1)

    # Assign indicator and clean up categories
    df_final['indicator'] = get_ifs_name(file)
    df_final['jmp_category'] = df_final.apply(base_jmp_category, axis=1)
    df_final['jmp_category'] = df_final['jmp_category'].replace({"BS": "ALB"})
    df_final['commitment'] = df_final.apply(modify_commitment_name, axis=1)

    # Adjust values based on category
    if "Water Service" in file or "Sanitation Service" in file:
        df_final['value'] = df_final.apply(lambda x: get_alb_value(x, df_final), axis=1)

    df_final['remove'] = df_final.apply(remove_unmatches_jmp_category, axis=1)
    df_final = df_final[df_final['remove'] == False].reset_index(drop=True)

    # Add initial and base values
    df_final['initial_value'] = np.nan
    df_final['base_value'] = np.nan
    df_final['2030'] = np.nan
    df_final['2050'] = np.nan

    if "Water Service" in file or "Sanitation Service" in file:
        df_final['initial_value'] = df_final.apply(lambda x: add_initial_value_for_wash(x, df_final), axis=1)
        df_final['base_value'] = df_final.apply(lambda x: add_base_value(x, df_final), axis=1)
    else:
        df_final['base_value'] = df_final.apply(lambda x: add_base_value(x, df_final, is_wash_data=False), axis=1)

    if file.split("/")[3] not in year_filter_config["year_range"]["files"]:
        df_final = df_final[df_final['year'] != 2019].reset_index(drop=True)
    else:
        df_final['2030'] = df_final.apply(lambda x: float(x["value"]) if "2030" in x["commitment"] else np.nan, axis=1)
        df_final['2050'] = df_final.apply(lambda x: float(x["value"]) if "2050" in x["commitment"] else np.nan, axis=1)

    df_final = df_final[final_columns]
    combined_df = pd.concat([combined_df.dropna(axis=1, how='all'), df_final], ignore_index=True)

### Filtering Commitments

After combining the data, a final filter is applied to remove any rows that do not align with the target commitment year (2030 or 2050).

combined_df['remove'] = combined_df.apply(lambda x: remove_unmatch_commitment(x), axis=1)

2.B.2. IFS Data Cleanup

The cleanup_data function is applied to the combined_df DataFrame to finalize data cleaning by removing extraneous characters in columns such as “unit” and handling missing values. For details on cleanup_data, refer to *IFS Functions in Section 2.A.

2.C. IFs Table of Keys

In this section, we generate unique key tables for various columns in the IFs dataset, which are used to standardize identifiers across the data. Each key table is saved as a separate CSV file.

2.C.1. Indicators

Generates a unique identifier for each indicator in the dataset.

indicator_table = create_table_key(combined_df, 'indicator')

2.C.2. Units

Creates a key table for the “unit” column to ensure consistent units across all data files.

units_table = create_table_key(combined_df, 'unit')

2.C.3. Value Names

Assigns unique identifiers to each “value_name” in the dataset for standardized referencing.

value_names_table = create_table_key(combined_df, 'value_name')

2.C.4. JMP Categories

Generates a key table for the “jmp_category” column, ensuring consistent JMP categorization.

jmp_categories_table = create_table_key(combined_df, 'jmp_category')

2.C.5. JMP Names Table (Custom)

Defines a custom key table for the JMP names used in the dataset, with a manual assignment of IDs for “Water,” “Sanitation,” and “Water and Sanitation.”

jmp_names_table = pd.DataFrame([
    {"id": 1,"jmp_name": "Water"},
    {"id": 2,"jmp_name": "Sanitation"},
    {"id": 3,"jmp_name": "Water and Sanitation"}
])
jmp_names_table.to_csv(f'{OUTPUT_DIR}/key_jmp_name.csv',index=False)

2.C.6. Commitments

Creates a unique identifier table for the “commitment” column, which is crucial for tracking progress toward specified targets (e.g., 2030 and 2050 goals).

commitments_table = create_table_key(combined_df, 'commitment')

2.C.7. Country

Generates a key table for the “country” column, standardizing country names across all datasets.

countries_table = create_table_key(combined_df, 'country')

These key tables ensure that each essential column in the dataset has a unique identifier, facilitating consistency and accuracy in data processing and subsequent analyses.

2.D. IFS Table Results

This section documents the final steps of processing the IFs dataset by applying custom mappings, creating key table mappings, and preparing the final tables for output. These steps ensure that all data is structured and standardized, ready for visualization and analysis.

2.D.1. Custom Table Mapping (JMP Name)

To standardize WASH indicators in the IFs dataset, we map specific JMP names (such as Water, Sanitation) to unique identifiers using the jmp_dict dictionary. The map_jmp_id function assigns JMP name IDs to each row based on the value_name and indicator columns.

JMP Name Codes

  • FS: Full Sanitation Access

  • FW: Full Water Access

  • FWS: Full Water and Sanitation Access

  • SI: Sanitation Increased

  • WI: Water Increased

  • WSI: Water and Sanitation Increased

Code

jmp_dict = dict(zip(jmp_names_table['jmp_name'], jmp_names_table['id']))

def map_jmp_id(x):
    value_name = x["value_name"]
    # For the Base data
    if value_name == "Base":
        if x["indicator"].startswith("Water"):
            return jmp_dict['Water']
        if x["indicator"].startswith("Sanitation"):
            return jmp_dict['Sanitation']
    if 'W' in value_name and 'S' in value_name:
        return jmp_dict['Water and Sanitation']
    if 'W' in value_name:
        return jmp_dict['Water']
    if 'S' in value_name:
        return jmp_dict['Sanitation']
    return 0

combined_df['jmp_name_id'] = combined_df.apply(map_jmp_id, axis=1)

The map_jmp_id function applies specific conditions to assign the correct JMP name ID for each row in the dataset, enhancing consistency in JMP categories across records.

2.D.2. IFS Key Table Mapping

The following code maps unique IDs from various key tables (such as indicator, unit, value_name, etc.) into the combined_df DataFrame. This ensures each attribute is represented with a standardized identifier for consistent reference.

Code

ifs_table_with_id = merge_id(combined_df, indicator_table, 'indicator')
ifs_table_with_id = merge_id(ifs_table_with_id, units_table, 'unit')
ifs_table_with_id = merge_id(ifs_table_with_id, value_names_table, 'value_name')
ifs_table_with_id = merge_id(ifs_table_with_id, jmp_categories_table, 'jmp_category')
ifs_table_with_id = merge_id(ifs_table_with_id, commitments_table, 'commitment')
ifs_table_with_id = merge_id(ifs_table_with_id, countries_table, 'country')

The above mappings ensure that each key table is referenced consistently, with unique IDs assigned to indicators, units, values, categories, commitments, and countries.

2.D.3. IFS Final Result

After applying key table mappings and ensuring all values are correctly assigned, we prepare the final table by removing rows without valid values and sorting by year.

ifs_table_with_id = ifs_table_with_id[ifs_table_with_id['value'].notna()].reset_index(drop=True)
ifs_table_with_id = ifs_table_with_id.sort_values(by='year').reset_index(drop=True)

This final sorting and filtering step ensures the data is clean and organized, ready for export.

2.D.4. Save IFS Table

We save the final cleaned and filtered data (excluding milestone columns) to a CSV file for analysis and visualization.

final_ifs = ifs_table_with_id[ifs_table_with_id['remove'] == False].reset_index(drop=True)
final_ifs = final_ifs.drop(columns=['remove'])
final_ifs.drop(columns=['2030', '2050']).to_csv(IFS_OUTPUT_FILE, index=False)

This final_ifs file contains the core dataset, cleaned and prepared for analysis.

2.D.5. Save IFS Graph Table

The final graph table is prepared for specific visualizations, duplicating certain entries for compatibility with 2050 projections.

  1. Initial Graph Preparation

    This table retains only specific indicators and adds an actual_year column to track the year of each entry. We then map each year to the nearest milestone year (2030 or 2050).

    graph_with_id = ifs_table_with_id[ifs_table_with_id['indicator_id'].isin([7, 13])].drop(columns=['remove'])
    graph_with_id.loc[:, 'actual_year'] = graph_with_id['year']
    graph_with_id.loc[:, 'year'] = graph_with_id['year'].apply(lambda x: 2030 if x <= 2030 else 2050)
    graph_with_id.loc[:, 'value'] = graph_with_id.apply(lambda x: x['value'] if (x['2030'] is not np.nan or x['2050'] is not np.nan or base_value is np.nan) else np.nan, axis=1)
    graph_with_id = graph_with_id[['actual_year','year','country_id','indicator_id','value_name_id','jmp_name_id','jmp_category_id','commitment_id','value']]
    
  2. Replicating 2030 for 2050

    We duplicate rows with year 2030 to show the same data in 2050 for certain projections.

    replicate_for_2050 = graph_with_id[graph_with_id['year'] == 2030].copy()
    replicate_for_2050.loc[:, 'year'] = 2050
    combined_graph = pd.concat([graph_with_id, replicate_for_2050], ignore_index=True)
    combined_graph = combined_graph.dropna(subset=['value'])
    
  3. Additional Fields and Base Commitment

    The combined_graph DataFrame is expanded to include a full_wash_coverage field and duplicated commitments for visual legend consistency.

    combined_graph['full_wash_coverage'] = 1
    combined_graph['actual_commitment_id'] = combined_graph['commitment_id']
    
    base_commitment = combined_graph[combined_graph['commitment_id'] == 5].copy()
    base_commitment.loc[:, 'full_wash_coverage'] = 0
    for a in range(1, 5):
        base_commitment.loc[:, 'commitment_id'] = a
        combined_graph = pd.concat([combined_graph, base_commitment], ignore_index=True)
    

    The combined_graph DataFrame is saved as the final graph table, ready for visualization.

    combined_graph.to_csv(IFS_GRAPH_OUTPUT_FILE, index=False)
    
  4. Duplicate Commitment Key for Legend

    Finally, we create an additional commitment key table to support graph legends in visualizations.

    actual_commitment = pd.read_csv(f"{OUTPUT_DIR}/key_commitment.csv")
    actual_commitment = actual_commitment.rename(columns={"commitment": "actual_commitment"})
    actual_commitment.to_csv(f"{OUTPUT_DIR}/key_actual_commitment.csv", index=False)
    

This additional commitment key table ensures consistency in legends and provides clarity for commitment types in visual outputs.

This section completes the processing of the IFs dataset, producing final tables for analysis and visualization and establishing consistent mappings and keys for attributes across the data.

2.E. Progress Rates

The Progress Rates section processes IFs data to calculate the yearly increase in WASH indicators, evaluate progress towards full services, and filter data to create a final table of progress rates. This section includes functions for calculating adjustments, creating collections, applying year filters, and saving the final results.

2.E.1 Progress Rates Functions

The get_alb_value_for_progress_rates function calculates adjusted values for indicators categorized as “ALB” by adding corresponding “SM” values when available.

Code

def get_alb_value_for_progress_rates(x, dataframe):
    if x["jmp_category"] == "ALB":
        sm_value = list(dataframe[
            (dataframe["jmp_category"] == "SM") &
            (dataframe["indicator"] == x["indicator"]) &
            (dataframe["country"] == x["country"]) &
            (dataframe["year"] == x["year"])
        ]["value"])[0]
        return x["value"] + sm_value
    return x["value"]

This function ensures that values under “ALB” category reflect combined levels of service access by adding “Safely Managed” (SM) values where applicable.

2.E.2 Progress Rates Collections

The progress_rates_df DataFrame collects progress rates data from relevant IFs files. The code below prepares and processes each file, transforms the data, and appends it to progress_rates_df.

Columns

  • indicator

  • year

  • country

  • jmp_category

  • value_name

  • value

Code

progress_rates_columns = ["indicator", "year", "country", "jmp_category", "value_name", "value"]
progress_rates_df = pd.DataFrame(columns=progress_rates_columns)

for file in files:
    if file.split("/")[3] not in year_filter_config["year_range"]["files"]:
        continue
    print(file)
    data = pd.read_csv(file, header=[1,2,4,5], sep=',')

    # Update column names
    new_columns = list(data.columns)
    for i, col in enumerate(new_columns):
        if col == ('Unnamed: 0_level_0', 'Unnamed: 0_level_1', 'Unnamed: 0_level_2', 'Unnamed: 0_level_3'):
            new_columns[i] = 'Year'
    data.columns = new_columns
    df = pd.DataFrame(data.to_dict('records'))

    # Reshape data
    df_melted = df.melt(id_vars=['Year'], var_name='variable', value_name='value')
    new_data = []
    for value_list in df_melted.to_dict('records'):
        value_type = get_value_types(value_list["variable"][3])
        new_data.append({
            "year": int(value_list["Year"]),
            "country": map_country_name(value_list["variable"][0]),
            "2nd_dimension": value_list["variable"][1],
            "unit": value_list["variable"][2],
            "value_type": list(filter(lambda v: v, value_type)),
            "value": value_list["value"]
        })
    df = pd.DataFrame(new_data)

    # Split value types
    df_split = pd.DataFrame(df['value_type'].tolist(), index=df.index)
    df_split.columns = ['value_name', 'jmp_category', 'commitment']
    df_final = pd.concat([df, df_split], axis=1)

    # Assign indicator and categories
    df_final['indicator'] = get_ifs_name(file)
    df_final['jmp_category'] = df_final.apply(base_jmp_category, axis=1)
    df_final['jmp_category'] = df_final['jmp_category'].replace({"BS": "ALB"})
    df_final['commitment'] = df_final.apply(modify_commitment_name, axis=1)
    df_final = df_final[df_final['commitment'] == "Base"]
    df_final = df_final[progress_rates_columns]
    df_final['value'] = df_final.apply(lambda x: get_alb_value_for_progress_rates(x, df_final), axis=1)
    progress_rates_df = pd.concat([progress_rates_df.dropna(axis=1, how='all'), df_final], ignore_index=True)

progress_rates_df = progress_rates_df.sort_values(by=["indicator", "country", "jmp_category", "value_name", "year"])
progress_rates_df['yearly_increase'] = progress_rates_df.groupby(["indicator", "country", "jmp_category"])['value'].diff()

This code block calculates yearly increases for each indicator and prepares the data for additional filtering.

2.E.3 Progress Rates Year Filters

This step filters the progress rates to identify whether a country has reached full services for each indicator. For each group, the code calculates the average yearly increase and checks if the value has reached or exceeded 99%.

Code

progress_rates_df["full_services"] = progress_rates_df["value"].apply(lambda x: x > 99)
filtered_dfs = []

for name, group in progress_rates_df.groupby(["indicator", "country", "jmp_category", "value_name"]):
    group = group.sort_values(by="value")
    group["yearly_increase"] = group["value"].diff()
    avg_yearly_increase = group["yearly_increase"].mean()

    reached_100 = group[group["value"] >= 99]
    if not reached_100.empty:
        filtered_group = reached_100.iloc[[0]].copy()
        filtered_group.loc[:, "avg_yearly_increase"] = avg_yearly_increase
        filtered_group.loc[:, "full_services"] = True
    else:
        filtered_group = pd.DataFrame({
            "indicator": [name[0]],
            "country": [name[1]],
            "jmp_category": [name[2]],
            "value_name": [name[3]],
            "year": [2100],
            "value": [group["value"].iloc[-1]],
            "avg_yearly_increase": [avg_yearly_increase],
            "full_services": False
        })
    filtered_dfs.append(filtered_group)

progress_rates_df = pd.concat(filtered_dfs, ignore_index=True)
progress_rates_df = progress_rates_df[progress_rates_columns + ["avg_yearly_increase", "full_services"]]

This code identifies the first year that a country achieves “full services” (value >= 99%) for each indicator. If not reached, it sets a default year of 2100.

2.E.4 Progress Rates Key Table Mapping

The final step in processing progress rates is to apply key mappings to standardize jmp_name_id, jmp_category_id, country_id, and indicator_id.

Code

progress_rates_df['jmp_name_id'] = progress_rates_df.apply(map_jmp_id, axis=1)
progress_rates_df = merge_id(progress_rates_df, jmp_categories_table, 'jmp_category')
progress_rates_df = merge_id(progress_rates_df, countries_table, 'country')
progress_rates_df = merge_id(progress_rates_df, indicator_table, 'indicator')
progress_rates_df = progress_rates_df.drop(columns=["value_name"])

This mapping step ensures that all key columns in progress_rates_df have standardized IDs for consistency.

2.E.5 Save Progress Rates Table

Finally, we save the processed progress_rates_df to a CSV file, IFS_PR_OUTPUT_FILE, for future analysis and visualization.

Code

progress_rates_df.to_csv(IFS_PR_OUTPUT_FILE, index=False)

The resulting table contains progress rates, yearly increase calculations, and flags for full services, providing a comprehensive view of progress toward WASH goals.