Model application manual

This notebook describes how to apply the model presented in the thesis to different data. The following topics are explained:

How to create a readable data set

How to execute the model itself

How to conduct Monte Carlo simulations with the provided code

Creation of a readable data set

The created model requires an Excel file in a specific format. A function was created to generate a blank Excel file in the right format. The function requires a list containing the product categories treated by the model, a string naming the product category that flows into the reuse part of the model, a list indicating the products treated in the reuse part of the model, and the number of use cycles considered in the reuse part. Furthermore, a file name has to be provided. The created file will appear in the "data_model" folder.
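As an illustration, such a helper could look as follows. This is a minimal sketch: the actual function name, signature, and sheet layout in the accompanying code may differ.

```python
# Minimal sketch of the template generator; the function name, signature,
# and sheet layout are assumptions, not the thesis implementation.
import os
import pandas as pd

def create_input_template(product_categories, reuse_category,
                          reuse_products, n_use_cycles, file_name):
    os.makedirs("data_model", exist_ok=True)
    path = f"data_model/{file_name}.xlsx"
    with pd.ExcelWriter(path) as writer:
        # Sheets of the MaTrace part: one row per product category,
        # values to be filled in by the user.
        for sheet in ["MaTrace_end_of_life", "MaTrace_in_use_stock",
                      "MaTrace_D_secondary_material"]:
            pd.DataFrame(index=product_categories).to_excel(
                writer, sheet_name=sheet)
        # Sheet of the reuse part: one row per reuse product.
        pd.DataFrame(index=reuse_products).to_excel(
            writer, sheet_name="Reuse_inflow_split")
        # Store the category feeding the reuse part and the number of
        # use cycles (illustrative layout).
        pd.DataFrame({"reuse_category": [reuse_category],
                      "n_use_cycles": [n_use_cycles]}).to_excel(
            writer, sheet_name="settings", index=False)
    return path

create_input_template(["category_a", "category_b"], "category_a",
                      ["product_1", "product_2"], 3, "data_template")
```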

All sheets of the file are empty and have to be filled by the user.

To showcase how to input data, an example file will be considered.

Things to consider when populating the data set

A few sheets are presented as examples. In general, the inputs must satisfy basic consistency requirements, as explained below.

Ensuring mass conservation when splitting the material flow

The following output shows the first sheet, the split of the initial inflow.

The sum of this column has to equal 1.

The same holds true for the sheet "Reuse_inflow_split".

The same applies to the sheet "MaTrace_D_secondary_material". In this case, the last two rows need to be excluded, since they indicate the export rate and the to-production rate.
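These constraints can be verified with a few lines of pandas. In the sketch below, the name of the first sheet ("Initial_inflow_split") is an assumption; the other two sheet names are quoted above.

```python
# Sketch of a mass-conservation check; "Initial_inflow_split" is an
# assumed sheet name, the other two are quoted in the text.
import pandas as pd

data = pd.read_excel("data_model/data_example.xlsx",
                     sheet_name=None, index_col=0)

assert abs(data["Initial_inflow_split"].iloc[:, 0].sum() - 1) < 1e-9
assert abs(data["Reuse_inflow_split"].iloc[:, 0].sum() - 1) < 1e-9
# Exclude the last two rows (export and to-production rate) here.
assert abs(data["MaTrace_D_secondary_material"].iloc[:-2, 0].sum() - 1) < 1e-9
```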

Furthermore, some pairs of transfer coefficients must sum up to 1. In the model, one transfer coefficient is sufficient to define a flow that splits into two. For the Monte Carlo simulations, however, both are needed.

The following cell shows the example of the sheet "MaTrace_end_of_life":

The column "fraction export eol products" and "fraction collected eol products" as well as the columns "collection to recycling rate" and "postconsumer disposal" have to sum up to 1 for every product category.

Please also consult the system diagram or the Excel file data_example.xlsx to find the respective transfer coefficients.

Defining survival curves

The sheet "MaTrace_in_use_stock" show examples on how to define survival curves in the model implementation:

One can select a distribution via the column "distribution". The normal, lognormal, Weibull, gamma, and Gompertz distributions are pre-implemented. The location, scale, and shape parameters can be set via the corresponding columns. It is also possible to define custom distributions.
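The following sketch shows how the pre-implemented survival curves could be evaluated, assuming a scipy-based implementation; the thesis code may differ in detail.

```python
# Survival probability over age for the pre-implemented distributions
# (sketch assuming scipy; the parameter mapping is an assumption).
import numpy as np
from scipy import stats

def survival_curve(distribution, loc, scale, shape, n_years):
    ages = np.arange(n_years)
    if distribution == "normal":
        return stats.norm.sf(ages, loc=loc, scale=scale)
    if distribution == "lognormal":
        return stats.lognorm.sf(ages, s=shape, loc=loc, scale=scale)
    if distribution == "weibull":
        return stats.weibull_min.sf(ages, c=shape, loc=loc, scale=scale)
    if distribution == "gamma":
        return stats.gamma.sf(ages, a=shape, loc=loc, scale=scale)
    if distribution == "gompertz":
        return stats.gompertz.sf(ages, c=shape, loc=loc, scale=scale)
    raise ValueError(f"unknown distribution: {distribution}")
```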

An example of this is the distribution for the product category "hydroprocessing catalysts poisoning". Its entry reads "defined_distribution_example_dist". This works in the following way: a distribution is defined in the file "defined_distributions.xlsx" (see next cell).

The file holds survival curves defined by the user. Only the first 10 lines are shown. It is important that each distribution is defined for a sufficient number of years, meaning at least the number of considered years.

To use such a distribution in the model, one fills the column "distribution" with "defined_distribution_" followed by the column name. Hence, when the entry in the column "distribution" says "defined_distribution_example_dist", the defined distribution "example_dist" will be used for this product category. If it says "defined_distribution_example_dist_2", the distribution "example_dist_2" will be used.
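A sketch of this lookup; the parsing detail is an assumption consistent with the naming convention described above:

```python
# Resolve "defined_distribution_<column>" to a column of the user file.
import pandas as pd

defined_distributions_pd = pd.read_excel("defined_distributions.xlsx")

entry = "defined_distribution_example_dist"
prefix = "defined_distribution_"
if entry.startswith(prefix):
    column_name = entry[len(prefix):]                    # "example_dist"
    curve = defined_distributions_pd[column_name].to_numpy()
```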

How to execute the model itself

The first step in executing the model is to load the required data. The model receives a dictionary as input which contains the sheets of the mentioned Excel file as pandas data frames. The following code cell creates this dictionary.
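A minimal sketch of such a cell; with sheet_name=None, pandas reads every sheet into one dictionary of data frames, keyed by sheet name:

```python
# Load all sheets of the input file into a dictionary of data frames.
import pandas as pd

data_dic = pd.read_excel("data_model/data_example.xlsx", sheet_name=None)
```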

Furthermore, the number of years and the start year have to be defined. A pandas data frame containing the "defined_distributions.xlsx" file has to be passed as well.

Optionally, one can decide to print the state to get an indication of whether the model is stuck or how much longer it will run. One can also define whether the simplified model output shall differentiate between use cycles. Lastly, one can select the number of considered use cycles (the default is 3).

The following code shows the execution of the model:
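A sketch of such a call; the function name and keyword arguments are placeholders, only the parameters follow the description above:

```python
# Hypothetical call signature; the actual function is defined in the
# thesis code.
matrace_data_dic, reuse_data_dic = run_model(
    data_dic,                              # dictionary of input sheets
    defined_distributions=defined_distributions_pd,
    n_years=51,                            # number of considered years
    start_year=2020,
    print_state=True,                      # optional progress output
    differentiate_use_cycles=False,        # optional output detail
    considered_use_cycles=3,               # default is 3
)
```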

The returned dictionaries "matrace_data_dic" and "reuse_data_dic" are structured in the same way. The first key takes a string containing the considered year. The second key takes a string indicating a stock or a flow in that year. This will then return a pandas data frame containing the values over product categories or products.
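For illustration, such a lookup could read as follows; the concrete key strings are hypothetical:

```python
# First key: year as string; second key: stock or flow in that year.
values = matrace_data_dic["2025"]["in-use stock"]
print(values)  # values over product categories
```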

The following code cell shows the second keys of "matrace_data_dic".

And the following code cell shows the second keys of "reuse_data_dic".

The following code cell shows the content.

The returned pandas data frame "graph_data_pd" contains the stocks and the accumulated outflows.

It is easily possible to create a stacked area chart from it.
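For instance, along these lines (a sketch, assuming years on the index of "graph_data_pd" and one column per stock or accumulated outflow):

```python
# Stacked area chart of stocks and accumulated outflows over the years.
import matplotlib.pyplot as plt

graph_data_pd.plot.area(stacked=True)
plt.xlabel("year")
plt.ylabel("material")  # unit depends on the input data
plt.show()
```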

How to conduct Monte Carlo simulations with the provided code

The code to conduct Monte Carlo simulations consists of three files which need to be executed one after another:

"monte_carlo_simulations.py"

"monte_carlo_evaluation.py"

"monte_carlo_uncertainty.py"

In the following, it will be discussed what each file does and how to use it. To do so, the files and the data used in the cobalt case study will serve as examples. Each of the files has the variable "proof_of_concept" at the very top. If this is set to True, an exemplary Monte Carlo simulation and evaluation considering 10 runs will be conducted. If it is set to False, the results as presented in the thesis will be reproduced.

Executing "monte_carlo_simulations.py"

At the start of the file, the user can adjust the settings. These are the number of runs ("n_runs"), the number of considered years ("n_years"), the start year ("start_year"), and the number of considered use cycles ("considered_use_cycles").

Then, the data to be loaded needs to be specified. Firstly, the file containing the survival curves defined by the user has to be loaded (see above). Secondly, the path to the input data (format as described above) has to be specified. Lastly, the Excel file containing the uncertainty ratings has to be set.
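The settings block at the top of the file could therefore look like this; the variable names follow the text, while the values and the name of the uncertainty file are illustrative:

```python
# Settings of "monte_carlo_simulations.py" (values are examples).
proof_of_concept = True

n_runs = 10
n_years = 51
start_year = 2020
considered_use_cycles = 3

defined_distributions_path = "defined_distributions.xlsx"
input_data_path = "data_model/data_example.xlsx"
uncertainty_scores_path = "data_model/uncertainty_scores.xlsx"  # name assumed
```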

This file must contain the same sheet names as the input file. Instead of the data, the columns must hold the uncertainty scores (0 to 5, as explained in the main body of the thesis).

If an input shall not be varied, the column can either be deleted from the file holding the uncertainty scores, or all its values can be set to 0. Only numerical inputs can be considered.

The following cell shows one sheet of the excel holding the input data. The next one shows the uncertainty ratings of those inputs.

As can be seen, only the uncertainty scores for the columns "shape" and "scale" appear. This is intended, since only numerical values can be varied by this implementation and the location of the Weibull distribution was not considered.

Once the paths and the settings are defined, the script can be run. The next cell executes the script for 10 runs.
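For instance, via the IPython run magic (running "python monte_carlo_simulations.py" from a shell is equivalent):

```python
# Execute the script from within the notebook (IPython magic).
%run monte_carlo_simulations.py
```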

The printed output reflects the steps of the code. After the data is imported, random numbers following the defined distributions are created. Since some split vectors and transfer coefficients will no longer sum up to 1 (see above), they have to be normalized. Afterwards, the experiments are run. The data is collected in a dictionary. If several thousand runs are executed, the dictionary is dumped in splits to decrease the runtime.
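The renormalization step could look like this sketch; the actual implementation may differ:

```python
# Rescale a sampled split vector so it sums to 1 again.
import numpy as np

def normalize_split(sampled_vector):
    sampled_vector = np.clip(sampled_vector, 0, None)  # no negative shares
    return sampled_vector / sampled_vector.sum()
```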

The results and a file containing the used model inputs can be found in the folder "monte_carlo_results/proof_of_concept".

The following cell shows part of the saved inputs. The rows represent runs.

The results are saved in the form of nested dictionaries:

The first key represents runs:

The second key specifies from which part of the model the data comes:

Since this data structure is hard to handle, the script "monte_carlo_evaluation.py" serves to transfer the data into multidimensional pandas data frames.

Executing "monte_carlo_evaluation.py"

In order to execute this script, one has to adjust the settings at the beginning of the file so that they match the ones used in "monte_carlo_simulations.py". The path to the results has to be defined. Then the file is ready to be executed.

This script creates the multidimensional pandas data frame "compact_results". The first key contains the runs, the second the stock or flow, and the third the product or product category.
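Accessing it could look as follows; a three-level MultiIndex and the concrete key strings are assumptions for illustration:

```python
# Select one run, one stock or flow, and one product (or product category).
values = compact_results.loc[(0, "in-use stock", "example_product")]
```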

Executing "monte_carlo_uncertainty.py"

In order to execute this script, one has to adjust the settings at the beginning of the file so that they match the ones used in "monte_carlo_simulations.py" and "monte_carlo_evaluation.py". The path to the results has to be defined. Then the file is ready to be executed.

For all considered years, the file calculates the Spearman correlation and the normalized squared Spearman correlation between all inputs and the total in-use stock, the total hibernating stock, the total disposal flow, and the total export flow.
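A sketch of this measure, assuming scipy's spearmanr; the normalization across inputs is an assumption based on the description:

```python
# Spearman correlation per input and its normalized square across inputs.
import numpy as np
from scipy.stats import spearmanr

def spearman_contributions(input_samples, output_samples):
    # input_samples: runs x n_inputs array; output_samples: vector of runs
    rho = np.array([spearmanr(input_samples[:, i], output_samples)[0]
                    for i in range(input_samples.shape[1])])
    squared = rho ** 2
    return rho, squared / squared.sum()  # absolute and normalized square
```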

The results are written into the files "monte_carlo_results" (results of the mentioned stocks and flows over years and runs), "spearman_results_abs" (Spearman correlation between inputs and outputs over years), and "spearman_results_normalized" (normalized squared Spearman correlation between inputs and outputs over years).

The following cell shows a part of the table displaying the normalized squared Spearman correlation.

Using this data, one can create plots displaying the contribution of an input to the uncertainty of an output. The following graph is not representative, since only 10 runs were conducted.
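The plot could be created along these lines (a sketch, assuming "spearman_results_normalized" was loaded into a data frame with years on the index and one column per input):

```python
# Stacked area chart of the normalized squared Spearman correlations.
import matplotlib.pyplot as plt

spearman_results_normalized.plot.area(stacked=True)
plt.xlabel("year")
plt.ylabel("normalized squared Spearman correlation")
plt.show()
```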