# Artifact Appendix
This document describes the requirements, expected outputs, and usage instructions of the software artifact that accompanies the paper "Optimal Graph Stretching for Distributed Averaging".
This document was modelled after the [Artifact Evaluation file used at the Privacy Enhancing Technologies 2025 symposium](https://petsymposium.org/artifacts.php), though the paper that this artifact accompanies has not been published there.


## Description
The artifact generates random graphs, increases their girth, heuristically optimises their topology, and measures the performance of distributed averaging on the resulting graphs.

### Security/Privacy Issues and Ethical Concerns
No issues or concerns. All code consists of numerical simulations on artificial datasets.


## Basic Requirements
### Hardware Requirements
The artifact requires very high CPU compute power. The experiments run in parallel, so more physical cores mean a shorter running time.

You are unlikely to be able to run the artifact on a personal computer. It is recommended that you use a high-performance computing cluster (HPC).

### Software Requirements
Code is written in MATLAB. This is a proprietary tool that requires a license. The code is _not_ compatible with GNU Octave.

The following packages are required:
* MATLAB R2021b or newer
* MATLAB Parallel Computing Toolbox
* MATLAB Statistics and Machine Learning Toolbox

You can download MATLAB at <https://mathworks.com/downloads/>. For more information on installing MATLAB and its toolboxes, see <https://mathworks.com/help/install/ug/install-products-with-internet-connection.html>.
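
If MATLAB is already installed, the release requirement can be checked from a shell. The following is a minimal sketch, not part of the artifact: the helper name `release_at_least` is hypothetical, and it assumes that `matlab` is on your `PATH` and that `version('-release')` prints a release name such as `2023a`.

```shell
# Succeeds if release $1 (for example "2023a") is the same as or newer than $2.
# Release names such as "2021b" sort chronologically when compared as strings.
release_at_least() {
    have="$1"
    min="$2"
    oldest="$(printf '%s\n' "$have" "$min" | LC_ALL=C sort | head -n 1)"
    [ "$oldest" = "$min" ]
}

# Only query MATLAB if it is on the PATH; otherwise report that it is missing.
if command -v matlab >/dev/null 2>&1; then
    # Keep only the first token that looks like a release name (assumption).
    installed="$(matlab -batch "disp(version('-release'))" | grep -Eo '[0-9]{4}[ab]' | head -n 1)"
    if release_at_least "$installed" "2021b"; then
        echo "MATLAB R$installed is new enough"
    else
        echo "MATLAB R$installed is older than R2021b"
    fi
else
    echo "matlab not found on PATH"
fi
```

To see which toolboxes are installed, `matlab -batch "ver"` prints the list of installed products.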

> **Notes for HPCs**  
> * Many HPCs use [Slurm](https://slurm.schedmd.com/) to schedule tasks. Therefore, example configuration files for Slurm are included in this artifact.
> * Your HPC may already have MATLAB and its toolboxes pre-installed. Be sure to check the documentation of your HPC for detailed information.
> * Using Slurm is _not_ required.

### Estimated Time and Storage Consumption

* **Running experiments**

  | **Physical cores** | 64         | 128       | 256       |
  |--------------------|------------|-----------|-----------|
  | **Time**           | 1000 hours | 500 hours | 250 hours |
  | **Memory**         | 96 GB      | 192 GB    | 384 GB    |
  | **Storage**        | 25 GB      | 25 GB     | 25 GB     |

* **Creating figures**

  | **Physical cores** | 8       |
  |--------------------|---------|
  | **Time**           | 6 hours |
  | **Memory**         | 290 GB  |
  | **Storage**        | 25 GB   |

These times count only the time while scripts are running. They do not include the time needed for setting up or the time spent waiting due to Slurm's scheduling.

> **Physical cores vs logical cores**  
> Note the distinction between physical cores and logical cores. On a typical personal computer, you have two logical cores per physical core. If your computer says it has 16 logical cores, you likely have only 8 physical cores.

> **How many physical cores do I have?**  
> The number of physical cores can typically be found in your operating system's settings, task manager, or in some sort of "About" section. Search for `number of cores [windows / macos / kde / gnome / ...]` in your favourite search engine for more information.
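
On a command-line-only machine, the physical core count can also be estimated from a shell. This is a sketch under assumptions: the function name `count_physical_cores` is made up for illustration, it relies on `lscpu` (Linux) or `sysctl` (macOS) being available, and the last fallback reports logical rather than physical cores.

```shell
# Print the number of physical cores. On Linux, every unique (core, socket)
# pair reported by lscpu corresponds to one physical core.
count_physical_cores() {
    if command -v lscpu >/dev/null 2>&1; then
        lscpu -p=Core,Socket | grep -v '^#' | sort -u | wc -l | tr -d ' '
    elif [ "$(uname)" = "Darwin" ]; then
        sysctl -n hw.physicalcpu
    else
        # Fallback: this is the number of logical cores, not physical ones.
        getconf _NPROCESSORS_ONLN
    fi
}

echo "Physical cores: $(count_physical_cores)"
```

On an HPC, run this on a compute node rather than on the login node, because the two often have different hardware.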

> **Reducing execution time and memory usage**  
> You can reduce execution time and memory usage, at the cost of output accuracy, by lowering the number of times each experiment is repeated. Halving the number of repetitions approximately halves the time and memory requirements. Relevant instructions are provided in the "Experiments" section.


## Environment
### Accessibility
All code is available at <https://doi.org/10.4121/e64c61d3-deb5-4aad-af60-92d92755781f>. Clone the git repository linked on that page.

The distribution artifact (ending in `.cache.7z`) is not required. Download it if you are interested in inspecting the experiment's outputs without reproducing them yourself. Note that this still requires the amount of memory indicated above. Relevant instructions are provided in the "Experiments" section.

### Set up the environment
Instructions depend on whether you have access to a graphical user interface (GUI), or only have a command line.

#### With a GUI
1. \[Optional: Inspect Only\] Extract the distribution artifact `StretchSim.cache.7z` into the git repository. Ensure that the `cache` directory is in the same directory as `StretchSim.m` and is populated with cache files. That is, you should have the file `StretchSim/StretchSim.m`, and the directory `StretchSim/cache/a/a/` should contain many `.mat` files.
2. Launch MATLAB.
3. Install the "Parallel Computing Toolbox" by MathWorks:
   1. At the top of the MATLAB window, click the "Home" tab, and click "Add-Ons". The "Add-On Explorer" will open in a new window.
   2. Search for "Parallel Computing Toolbox" and go to the toolbox's page.
   3. Confirm that the toolbox is published by MathWorks.
   4. Click "Install" and then click "Continue". MATLAB will close and an installer will open.
   5. Walk through the steps of the installer. After completion, the installer will close and MATLAB will open.
4. Similarly, install the "Statistics and Machine Learning Toolbox" by MathWorks.
5. Install the "[Progress bar (cli, gui, parfor)](https://mathworks.com/matlabcentral/fileexchange/121363-progress-bar-cli-gui-parfor)" add-on (v1.3.1) by HyunGwang Cho:
   1. Open the "Add-On Explorer" as described above.
   2. Search for "Progress bar (cli, gui, parfor)" and go to the add-on's page.
   3. Confirm that the add-on is published by HyunGwang Cho.
   4. Click "Add" and then "Add to MATLAB" to install the add-on.
6. Similarly, install the "[matlab2tikz/matlab2tikz](https://mathworks.com/matlabcentral/fileexchange/22022-matlab2tikz-matlab2tikz)" add-on (v1.50.0.0) by Nico Schlömer.
7. Close the add-on explorer.

#### With command line only
1. Install the "Parallel Computing Toolbox" by MathWorks:
   * If you are using an HPC, chances are that all licensed toolboxes have already been installed. Check your HPC's documentation for the details.
   * If the required toolbox is not pre-installed, consult the [silent installation instructions](https://mathworks.com/help/install/ug/install-noninteractively-silent-installation.html) or the [MATLAB Package Manager documentation](https://mathworks.com/help/install/ug/matlab-package-manager.html) for more details on how to install the required toolboxes.
2. Similarly, install the "Statistics and Machine Learning Toolbox" by MathWorks.
3. Install the "[Progress bar (cli, gui, parfor)](https://mathworks.com/matlabcentral/fileexchange/121363-progress-bar-cli-gui-parfor)" add-on (v1.3.1) by HyunGwang Cho:
   1. Visit <https://mathworks.com/matlabcentral/fileexchange/121363-progress-bar-cli-gui-parfor>.
   2. Press the button "Download" to download `ProgressBar.zip`. You may need to log in with your MathWorks account before you can download the file.
   3. Extract `ProgressBar.zip` into a temporary directory of your choosing.
   4. Inside the temporary directory, enter the directory `ProgressBar`, and copy the file `ProgressBar.m` into the directory where `StretchSim.m` resides.
   5. Check that the files `ProgressBar.m` and `StretchSim.m` are in the same directory.
   6. \[Optional\] Delete `ProgressBar.zip`, and delete the temporary directory into which you extracted `ProgressBar.zip`.
4. Install the "[matlab2tikz/matlab2tikz](https://mathworks.com/matlabcentral/fileexchange/22022-matlab2tikz-matlab2tikz)" add-on (v1.50.0.0) by Nico Schlömer:
   1. Visit <https://mathworks.com/matlabcentral/fileexchange/22022-matlab2tikz-matlab2tikz>.
   2. Press the button "Download" to download `matlab2tikz-matlab2tikz-v1.1.0-99-g806c97d.zip`. You may need to log in with your MathWorks account before you can download the file.
   3. Extract `matlab2tikz-matlab2tikz-v1.1.0-99-g806c97d.zip` into a temporary directory of your choosing.
   4. Inside the temporary directory, enter the directory `src`, and copy _all files_ into the directory where `StretchSim.m` resides.
   5. Check that the files were copied successfully:
      * In the directory where `StretchSim.m` resides, you should have (amongst others) the files `cleanfigure.m` and `matlab2tikz.m`, and also the directories `dev` and `private`.
      * Inside the directory `dev`, you should have the file `formatWhitespace.m`.
      * Inside the directory `private`, you should have (amongst others) the files `errorUnknownEnvironment.m` and `versionString.m`.
   6. \[Optional\] Delete `matlab2tikz-matlab2tikz-v1.1.0-99-g806c97d.zip`, and delete the temporary directory into which you extracted it.
5. \[If you use Slurm\] Create `.sbatch` files to run the scripts:
   * Different HPCs have different methods and requirements for running scripts. Therefore, there are no universal usage instructions we can provide here. You must manually consult the documentation of your HPC to learn how to run MATLAB scripts with Slurm.
   * That said, we provide several example Slurm configuration files in the directory `SlurmExamples`. The example files in that directory are suitable for use on the [Delft AI Cluster](https://daic.tudelft.nl/).
     * The high-level structure of the example files is as follows. We split the task into two parts: running experiments and creating figures. Experiments run separately in many sub-tasks using Slurm's array feature. Once all sub-tasks have finished, their results are combined by the final task of creating figures.
     * The method is to first schedule `RunExperiments`, and once you have _manually_ confirmed that all experiments have completed, schedule `CreateFigures`.
     * Each sub-task is given approximately the same workload. Make sure you configure a number of sub-tasks that divides the `repeat_count` parameter, which is explained further in the "Experiments" section. By default, there are 250 sub-tasks and 500 repetitions.
     * Still, some sub-tasks may take longer than others and may be stopped before they finish (for example, when they hit a time limit). Simply re-schedule those sub-tasks. They will continue from where they left off.
     * To change the number of sub-tasks in `RunExperiments.sbatch` to some number `<X>`, change `--array=1-250` to `--array=1-<X>` and change `SLURM_ARRAY_TASK_MAX=250` to `SLURM_ARRAY_TASK_MAX=<X>`.
     * To (re-)run only specific sub-tasks `<X,Y,Z>`, change `--array=1-250` to `--array=<X,Y,Z>`, but do **not** change the value set to `SLURM_ARRAY_TASK_MAX`. For example, to re-run sub-tasks `4,5,6,7,14,34`, set `--array=4-7,14,34`.
     * Don't schedule the tasks yet! Wait for the detailed usage instructions, in the section "Experiments".
   * Place the created `.sbatch` files inside the same directory as `StretchSim.m`.
6. \[Optional: Inspect Only\] The distribution artifact `StretchSim.cache.7z` contains all experiment results. The following steps describe how to extract the results into the right directories.
   1. Inside the `StretchSim/` directory, create the directory `cache_xz/`.
   2. Download the distribution artifact `StretchSim.cache.7z` and place it inside the directory `cache_xz/`.
   3. Extract `StretchSim.cache.7z` into `cache_xz/`:
      ```shell
      # Inside 'StretchSim/cache_xz/'
      7za x -aos StretchSim.cache.7z
      ```
   4. \[If you use Slurm\] Upload the `.tar.xz` files you extracted from `StretchSim.cache.7z` to your server:
      ```shell
      # On your local machine, inside 'StretchSim/'
      scp -Cr "cache_xz/" "<username>@<server>:<remote_directory>/StretchSim/"
      ```
   5. Extract the `.tar.xz` files into the `cache/` directory:
      1. \[If you do not use Slurm\]
         ```shell
         # Inside 'StretchSim/'
         ../SlurmExamples/Unzip.sbatch
         ```
      2. \[If you do use Slurm\]
         ```shell
         # On your remote machine, inside 'StretchSim/'
         cp ../SlurmExamples/Unzip.sbatch ./
         sbatch Unzip.sbatch
         ```
   6. Verify that `StretchSim/cache/0/0/` contains many `.mat` files.
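
For the Slurm configuration in step 5 above, the two edits to `RunExperiments.sbatch` and the divisibility requirement can be combined in a small helper. This is a sketch only: `set_subtasks` is a hypothetical name, and it assumes that the file contains the literal strings `--array=1-<number>` and `SLURM_ARRAY_TASK_MAX=<number>`, as in the default example files.

```shell
# Usage: set_subtasks <sbatch_file> <subtasks> <repeat_count>
# Refuses to edit the file unless <subtasks> divides <repeat_count>.
set_subtasks() {
    file="$1"
    subtasks="$2"
    repeat_count="$3"
    if [ $((repeat_count % subtasks)) -ne 0 ]; then
        echo "error: $subtasks does not divide $repeat_count" >&2
        return 1
    fi
    # Rewrite both occurrences of the array size, then replace the file.
    sed -e "s/--array=1-[0-9][0-9]*/--array=1-$subtasks/" \
        -e "s/SLURM_ARRAY_TASK_MAX=[0-9][0-9]*/SLURM_ARRAY_TASK_MAX=$subtasks/" \
        "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `set_subtasks RunExperiments.sbatch 100 500` switches to 100 sub-tasks for the default 500 repetitions. Do not use this helper to re-run specific sub-tasks, because in that case `SLURM_ARRAY_TASK_MAX` must stay unchanged.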

### Testing the Environment
To test the environment, you will need to run the following commands in MATLAB. In the GUI, this can be done using the "Command Window" (typically in the bottom right). Without a GUI, you can place these commands in a file `test_env.m` and run `matlab -batch test_env`.

1. Run the command `parpool(2)`. A parallel pool should be started using the 'Processes' profile. Wait for the pool to have started. Shut down the parallel pool again by running the command `delete(gcp('nocreate'))`. If you do not get any errors, the "Parallel Computing Toolbox" was installed correctly.
2. Run the command `randsample(10, 1)`. If you do not get any errors, the "Statistics and Machine Learning Toolbox" was installed correctly.
3. Run the command `p = ProgressBar(1)`. A progress bar window should appear. Close the window with `delete(p)`. If you do not get any errors, the add-on "Progress bar (cli, gui, parfor)" was installed correctly.
4. Run the command `matlab2tikz('figurehandle', figure)`. An empty figure and a file selector should appear. You can close both manually. If you do not get any errors beyond "Invalid path. The path must not contain a null character.", the add-on "matlab2tikz/matlab2tikz" was installed correctly.
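
For the command-line route, the four checks above can be written to `test_env.m` in one go. The helper name `write_test_env` is an assumption made for this sketch. The MATLAB commands inside the file are the ones listed above.

```shell
# Write the four environment checks into test_env.m inside directory $1.
write_test_env() {
    cat > "$1/test_env.m" <<'EOF'
% 1. Parallel Computing Toolbox
parpool(2);
delete(gcp('nocreate'));
% 2. Statistics and Machine Learning Toolbox
randsample(10, 1)
% 3. Progress bar add-on
p = ProgressBar(1);
delete(p);
% 4. matlab2tikz add-on
matlab2tikz('figurehandle', figure);
EOF
}
```

Call `write_test_env .` inside the directory where `StretchSim.m` resides, and then run `matlab -batch test_env`. The matlab2tikz check comes last because, as noted above, it may report the "Invalid path" error, and an error ends a batch run.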


## Artifact Evaluation
### Main Results and Claims
#### Main Result 1: StretchSim
(The following is an excerpt from the paper.)

Stretching the girth of a graph increases convergence time proportional to the number of edges removed. Consequently, stretching by iteratively removing the edge that is simultaneously in the largest number of cycles results in the smallest convergence time cost. Furthermore, convergence time can be recuperated using a greedy algorithm to add edges without decreasing girth. Finally, minimising the number of leaves does not affect convergence time.

See Section 6 of the paper, including Figures 1 through 8.

### Experiments
> **I am receiving the error "array exceeds maximum array size preference"**  
> If you receive this error at any point, your machine does not have sufficient memory available to store the necessary data structures. You can resolve this error in two ways:
> 1. Acquire more RAM, either by closing other programs, or by adding more RAM to your machine.
> 2. Allow MATLAB to temporarily store data on disk. To do so, find the "Home" tab at the top of MATLAB, click "Preferences", go to "Workspace", and disable the option "Limit the maximum array size to a percentage of RAM". Note, however, that doing so is likely to significantly slow down the experiments, possibly by several orders of magnitude.

#### Experiment 1: StretchSim
* **Expected result:** Exact replica of Figures 1 through 8.
* **Claim:** See Main Result 1.
* **Requirements:**

  | **Physical cores** | 64        | 1280     |
  |--------------------|-----------|----------|
  | **Time**           | 460 hours | 65 hours |
  | **Memory**         | 375 GB    | 6810 GB  |
  | **Storage**        | 13.5 GB   | 13.5 GB  |

1. Set up the environment as described above.
2. \[If you use a GUI\] Launch MATLAB.
3. Navigate to the directory `StretchSim/`.
4. By default, cached results are loaded if they are present. Therefore:
   * \[Default\] To reproduce the results yourself, ensure that the directory `StretchSim/cache/` is empty or does not exist before your first run. You will see that during step 7 below, the directory `StretchSim/cache/` will be filled; this is fine.
   * \[Optional\] Alternatively, to analyse the generated results without reproducing them yourself, make sure that the distribution artifact `StretchSim.cache.7z` has been extracted properly, as described in the section "Environment".
5. Edit the file `StretchSim.m`:
   * \[Optional\] To limit the number of physical cores used in the experiment, find `parallel_max_workers = Inf` and change `Inf` to the desired value. A value of `Inf` indicates that all physical cores on your machine should be used.
   * \[Optional\] To reduce execution time and memory usage at the cost of accuracy, find `repeat_count = 500` and change `500` to a lower value. Note that doing so will result in figures that are not exactly the same as in the paper.
6. Run `StretchSim.m`:
   * \[Method 1: Slurm\] To run the script on a machine with [Slurm](https://slurm.schedmd.com/), use `sbatch` to schedule the `.sbatch` files you created in the section "Set up the environment".
   * \[Method 2: CLI\] Simply run `matlab -batch StretchSim` to start the script. Alternatively, run `matlab -batch StretchSim > log.txt` to store the output into `log.txt`.
   * \[Method 3: GUI\] With `StretchSim.m` opened in the editor, click the "Editor" tab at the top, and click "Run".
7. Keep track of progress and check for errors:
   * \[Method 1: Slurm\] Check the status of the queued jobs with the command `squeue -u <username>`, where `<username>` is your username. Additionally, check the contents of the log files, whose names start with `slurm-`.
   * \[Method 2: CLI\] Check the output that is written to the screen, or into `log.txt` if configured appropriately.
   * \[Method 3: GUI\] Check the output that is written in the "Command Window". Additionally, a separate progress bar window appears, which will close again once all experiments have completed running.

   Wait for the post-processing to complete. The program is done when the message "StretchSim has completed running." is written to the "Command Window".
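
When the output is redirected to a file, a script can check for the completion message instead of watching the log manually. The sketch below assumes that the message quoted above appears verbatim in the log. The function name `stretchsim_done` is hypothetical.

```shell
# Succeeds once the completion message appears in the log file $1.
stretchsim_done() {
    grep -qF "StretchSim has completed running." "$1"
}
```

For example, after starting `matlab -batch StretchSim > log.txt`, the command `stretchsim_done log.txt && echo "finished"` prints `finished` only once the run has ended successfully.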

The output of each individual experiment is stored in the `cache/` directory. You can safely stop and resume `StretchSim` at any point without losing much progress.
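
Counting the cached files gives a rough progress indicator, and also verifies that an extracted cache is populated. This is a sketch that assumes results are stored as `.mat` files in nested subdirectories of `cache/`. The function name `count_cache_files` is made up for illustration.

```shell
# Print the number of .mat files found anywhere below directory $1.
count_cache_files() {
    find "$1" -type f -name '*.mat' | wc -l | tr -d ' '
}
```

Run `count_cache_files cache` inside `StretchSim/` at two different moments during a run. A growing number indicates that experiments are still completing.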

The created figures are stored in the `figs/` directory, unless configured otherwise.

##### Downloading and archiving the cache
1. Archive the `cache/` directory into `.tar.xz` files:
   1. \[If you do not use Slurm\]
      ```shell
      # Inside 'StretchSim/'
      ../SlurmExamples/Zip.sbatch
      ```
   2. \[If you do use Slurm\]
      ```shell
      # On your remote machine, inside 'StretchSim/'
      cp ../SlurmExamples/Zip.sbatch ./
      sbatch Zip.sbatch
      ```
2. \[If you use Slurm\] Download the `.tar.xz` files you created from your server:
   ```shell
   # On your local machine, inside 'StretchSim/'
   scp -Cr "<username>@<server>:<remote_directory>/StretchSim/cache_xz" "./"
   ```
3. Archive `cache_xz/` into `StretchSim.cache.7z`:
   ```shell
   # Inside 'StretchSim/cache_xz/'
   7za a StretchSim.cache.7z "*.tar.xz"
   ```


## Limitations
None.


## Notes on Reusability
The code is documented, well-structured, and extensible. The code could easily be changed to consider different types of graphs, stretching algorithms, distributed protocols, metrics, and visualisations.
