3. JMP Dataset
This section documents how the JMP data is processed and transformed: columns are renamed, values are categorized, and key tables are mapped to standardize data attributes.
—
3.A. JMP Data Processing
In this step, the JMP data is loaded, columns are renamed for clarity, and values are categorized to prepare the dataset for further analysis.
Loading Data
The JMP dataset is read into a DataFrame.
data = pd.read_csv(JMP_INPUT_FILE, encoding='latin-1')
data.head()
3.A.1 Rename the Columns
Columns are assigned descriptive names by position, and the two manual rate-change columns are then dropped.
data.columns = [
'country',
'year',
'jmp_name',
'total_ALB',
'annual_rate_change_ALB',
'total_SM',
'annual_rate_change_SM',
'manual_rate_change_SM',
'manual_rate_change_ALB'
]
data = data.drop(columns=[
'manual_rate_change_SM',
'manual_rate_change_ALB'
])
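Because the names are assigned by position, a file whose columns arrive in a different order would be mislabeled without any error. A minimal sketch of a guarded rename is shown below; the helper `rename_by_position` and the toy input are hypothetical and not part of the original pipeline.

```python
import pandas as pd

JMP_COLUMNS = [
    'country', 'year', 'jmp_name',
    'total_ALB', 'annual_rate_change_ALB',
    'total_SM', 'annual_rate_change_SM',
    'manual_rate_change_SM', 'manual_rate_change_ALB',
]

def rename_by_position(df, names):
    """Hypothetical helper: rename columns by position after checking the count."""
    if len(df.columns) != len(names):
        raise ValueError(
            f"expected {len(names)} columns, found {len(df.columns)}"
        )
    renamed = df.copy()
    renamed.columns = names
    return renamed

# Toy frame standing in for the raw file (labels and figures are made up).
raw = pd.DataFrame(
    [["Kenya", 2020, "Water", 60.0, 0.5, 30.0, 0.2, -99, -99]],
    columns=[f"col{i}" for i in range(9)],
)
data = rename_by_position(raw, JMP_COLUMNS)
data = data.drop(columns=['manual_rate_change_SM', 'manual_rate_change_ALB'])
print(list(data.columns))
```

The count check only catches added or missing columns; detecting a reordered file would require comparing against the original header names, which are not shown in this section.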
3.A.2 Categorize the Values
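The data is reshaped from wide to long format using pd.melt. Each row is then tagged with a value_type (total or annual_rate_change) and a jmp_category (ALB or SM), both derived from the original column name. Values of -99, which mark missing data, are replaced with NaN.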
data_melted = pd.melt(
data,
id_vars=['country', 'year', 'jmp_name'], # columns to keep
var_name='variable', # melted column
value_name='value' # values column
)
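# Derive the value type and the JMP category from the original column name
data_melted['value_type'] = data_melted['variable'].apply(lambda x: 'total' if 'total' in x else 'annual_rate_change')
data_melted['jmp_category'] = data_melted['variable'].apply(lambda x: 'ALB' if 'ALB' in x else 'SM')
# Map any 'BS' label to 'ALB' (the line above only yields 'ALB' or 'SM', so this is currently a no-op)
data_melted['jmp_category'] = data_melted['jmp_category'].replace({"BS": "ALB"})
# Standardize country names (map_country_name is defined outside this section)
data_melted['country'] = data_melted['country'].apply(map_country_name)
data_melted = data_melted.drop(columns=['variable'])
# Replace the -99 sentinel with NaN
data_melted['value'] = data_melted['value'].replace(-99, np.nan)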
After this transformation, each row holds a single value identified by country, year, jmp_name, value_type, and jmp_category, ready for key mapping.
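The reshaping can be illustrated on a single toy row. The figures below are made up, and the country-name mapping is left out because map_country_name is not defined in this section.

```python
import numpy as np
import pandas as pd

# Toy wide-format input; the figures are illustrative only.
data = pd.DataFrame({
    'country': ['Kenya'],
    'year': [2020],
    'jmp_name': ['Water'],
    'total_ALB': [60.0],
    'annual_rate_change_ALB': [0.5],
    'total_SM': [-99.0],              # sentinel for a missing value
    'annual_rate_change_SM': [0.2],
})

# One wide row becomes four long rows, one per measure column.
data_melted = pd.melt(
    data,
    id_vars=['country', 'year', 'jmp_name'],
    var_name='variable',
    value_name='value',
)
data_melted['value_type'] = data_melted['variable'].apply(
    lambda x: 'total' if 'total' in x else 'annual_rate_change')
data_melted['jmp_category'] = data_melted['variable'].apply(
    lambda x: 'ALB' if 'ALB' in x else 'SM')
data_melted = data_melted.drop(columns=['variable'])
data_melted['value'] = data_melted['value'].replace(-99, np.nan)
print(data_melted)
```

The single input row yields four output rows, and the -99 entry for total_SM appears as NaN.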
—
3.B. JMP Table Keys
To standardize and reference columns consistently, key tables are created for jmp_category and value_type.
3.B.1 JMP Categories
The JMP categories table is extended with additional categories, using the create_table_key function to maintain consistency with the existing IFS table keys.
jmp_categories_table = create_table_key(data_melted, 'jmp_category')
3.B.2 JMP Value Types
A key table for value_type is created, which assigns unique identifiers to each value type in the dataset.
value_types_table = create_table_key(data_melted, 'value_type')
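The create_table_key function is defined outside this section, so its exact behavior is not shown here. As a rough sketch, assuming it returns one row per distinct value with an integer identifier in a `<column>_id` column, a stand-in might look like the following. The name `create_table_key_sketch`, the numbering from 1, and the output column layout are all assumptions.

```python
import pandas as pd

def create_table_key_sketch(df, column):
    """Hypothetical stand-in for create_table_key: one row per distinct value,
    numbered from 1 in order of first appearance."""
    values = df[column].dropna().drop_duplicates().reset_index(drop=True)
    return pd.DataFrame({
        f'{column}_id': list(range(1, len(values) + 1)),
        column: values,
    })

# Toy input with a repeated value.
toy = pd.DataFrame({'value_type': ['total', 'annual_rate_change', 'total']})
value_types_table = create_table_key_sketch(toy, 'value_type')
print(value_types_table)
```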
—
3.C. JMP Table Results
This section details the process of merging identifiers from the key tables, performing data cleanup, and saving the final JMP table.
3.C.1 JMP Key Table Mapping
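Using the merge_id function, we map the key tables onto the main JMP DataFrame (data_melted) so that value_type, country, jmp_name, and jmp_category are each represented by an identifier. The countries_table and jmp_names_table key tables are not created in this section and are assumed to be available from earlier steps.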
jmp_table_with_id = merge_id(data_melted, value_types_table, 'value_type')
jmp_table_with_id = merge_id(jmp_table_with_id, countries_table, 'country')
jmp_table_with_id = merge_id(jmp_table_with_id, jmp_names_table, 'jmp_name')
jmp_table_with_id = merge_id(jmp_table_with_id, jmp_categories_table, 'jmp_category')
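The merge_id function is likewise defined elsewhere. A minimal sketch of what such a mapping could do is given below, assuming a left join on the named column, an identifier of 0 for values missing from the key table, and removal of the original text column. The name `merge_id_sketch` and all three behaviors are assumptions, as is the toy data.

```python
import pandas as pd

def merge_id_sketch(df, key_table, column):
    """Hypothetical stand-in for merge_id: swap `column` for `<column>_id`,
    using 0 for values that are missing from the key table."""
    id_column = f'{column}_id'
    merged = df.merge(key_table[[id_column, column]], on=column, how='left')
    merged[id_column] = merged[id_column].fillna(0).astype(int)
    return merged.drop(columns=[column])

# Toy key table and rows; 'Atlantis' has no entry in the key table.
countries_table = pd.DataFrame({'country_id': [1, 2], 'country': ['Kenya', 'Peru']})
rows = pd.DataFrame({'country': ['Peru', 'Atlantis', 'Kenya'], 'value': [1.0, 2.0, 3.0]})
with_id = merge_id_sketch(rows, countries_table, 'country')
print(with_id)
```

A left join keeps every row of the main table, so unmatched values surface as a placeholder identifier instead of disappearing silently.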
3.C.2 JMP Data Cleanup (Remove Undefined Countries)
After mapping, rows whose country_id is 0, which marks an undefined country, are removed and the index is reset.
jmp_table_with_id = jmp_table_with_id[jmp_table_with_id['country_id'] != 0].reset_index(drop=True)
This step ensures that all rows in the final dataset have a valid country_id.
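When running this filter, it can be useful to record how many rows are discarded. A small sketch on toy data (the figures are made up):

```python
import pandas as pd

# Toy table; a country_id of 0 marks an undefined country.
jmp_table_with_id = pd.DataFrame({
    'country_id': [3, 0, 7, 0],
    'value': [1.0, 2.0, 3.0, 4.0],
})

undefined = jmp_table_with_id['country_id'] == 0
print(f"dropping {undefined.sum()} of {len(jmp_table_with_id)} rows")

# Keep only rows with a defined country and renumber the index from 0.
jmp_table_with_id = jmp_table_with_id[~undefined].reset_index(drop=True)
```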
3.C.3 JMP Final Result
We review the final table to confirm that all mappings and transformations were successful.
jmp_table_with_id.head()
3.C.4 Save JMP Table
The processed JMP data is saved to a CSV file for further analysis or visualization.
jmp_table_with_id.to_csv(JMP_OUTPUT_FILE, index=False)
The saved file provides a complete view of the JMP dataset, including standardized identifiers and organized values.
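With index=False, the DataFrame index is not written as an extra column, and missing values are written as empty fields by default, which read_csv loads back as NaN. A quick round-trip check on a toy table, using an in-memory buffer in place of JMP_OUTPUT_FILE (column names and figures are illustrative):

```python
import io

import numpy as np
import pandas as pd

# Toy table with one missing value.
table = pd.DataFrame({
    'country_id': [1, 2],
    'year': [2020, 2020],
    'value': [60.0, np.nan],
})

buffer = io.StringIO()
table.to_csv(buffer, index=False)   # same call as above, written to memory
buffer.seek(0)
reloaded = pd.read_csv(buffer)

# Raises AssertionError if columns, dtypes, or values differ.
pd.testing.assert_frame_equal(table, reloaded)
```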