###########################################################
### Alpinist: an Annotation-Aware GPU Program Optimizer ###
### Artifact number: 109                                ###
###########################################################

#################
### Structure ###
#################
The structure of the Alpinist directory is as follows:
- deps/: all the dependencies of Alpinist and the scripts to generate the evaluation tables
- Alpinist-Examples/: The examples directory of Alpinist. The directories contain the original and optimized programs (starting with orig_ and opt_ respectively)
- scripts/: Helper scripts to generate the evaluation tables. This directory can be ignored.
- table_1.sh: A script to generate Table 1 in the paper
- table_2.sh: A script to generate Table 2 in the paper
- runN.sh: A script to run a specific example multiple times
- setup.sh: A script to install all dependencies
- vercors_wiki.html: A tutorial and documentation of the VerCors verifier
- *.txt: The output of different commands (explained below)
- TACAS_2022_paper_108.pdf: The accompanying paper "Alpinist: an Annotation-Aware GPU Program Optimizer"

###################################
### Data Availability Statement ###
###################################
The artifact is stored at the 4TU archive, under the name "Artifact for paper (Alpinist: an Annotation-Aware GPU Program Optimizer)". The artifact can be set up with the TACAS 22 Artifact Evaluation VM.

Virtual Machine link:
   - https://zenodo.org/record/5562597#.YYTwjVMo9wg

#############
### SETUP ###
#############
The VM image provided by TACAS has been used with the following parameters:
- 2 cores of a processor
- 8 GB of RAM

Unzip the zip file, start the VM, and add the folder created by unzipping the zip file to
the shared folders of the VM (Devices->Shared Folders->Shared Folders Settings... in VirtualBox).
In the VM, open a terminal, navigate to the shared folder (it can be found under /media) and
copy the (entire) unzipped subfolder "Alpinist" into the home directory of the tacas22 user (i.e. /home/tacas22). This can be done by executing:

cp -rf Alpinist ~

Then navigate to the new location, and move inside the Alpinist folder. 

Run `sudo ./setup.sh` to install all dependencies. 

If you want to install the dependencies manually, go into the deps folder:
   There are four Python libraries, which can be installed using the following commands (in order):
   - sudo pip3 install docutils-0.18-py2.py3-none-any.whl
   - sudo pip3 install wcwidth-0.2.5-py2.py3-none-any.whl
   - sudo pip3 install beautifultable-1.0.1-py2.py3-none-any.whl
   - sudo pip3 install statistics-1.0.3.5.tar.gz

   There are two Debian packages, one for Java 11 and one for Alpinist, which can be installed using the following commands (in order):
   - sudo dpkg -i openjdk-11-jre-headless_11.0.11+9-0ubuntu2~20.04_amd64.deb
   - sudo dpkg -i Alpinist.deb

If the scripts are not executable, go to ~/Alpinist, and run:
chmod +x table_1.sh table_2.sh runN.sh setup.sh
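
As a quick check after installation, the sketch below reports whether a set of files is present and executable. The function name check_executables is hypothetical and not part of the artifact. It only inspects file permissions and does not run any of the tools.

```shell
# Report, for each given path, whether it exists and is executable.
# Returns 0 only if every path passes the check.
check_executables() {
    status=0
    for f in "$@"; do
        if [ -x "$f" ]; then
            echo "ok: $f"
        else
            echo "missing or not executable: $f"
            status=1
        fi
    done
    return "$status"
}
```

For example, from ~/Alpinist one could run:
check_executables /usr/bin/vercors /usr/bin/alpinist ./table_1.sh ./table_2.sh ./runN.sh ./setup.sh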

######################################
### The set of examples            ###
######################################

The examples used for the paper can be found in the "Alpinist-Examples" folder. They are grouped by optimization type, and every example folder contains two files, one prefixed "orig_" and one prefixed "opt_". The "orig_" file is the input to Alpinist; the optimization to apply to it is the one corresponding to the subfolder of "Alpinist-Examples" in which the example resides. The "opt_" file is the optimized code as Alpinist produces it. It can be overwritten, but is included here to show what the tool will produce.

The examples are written in the PVL language, which is a language supported by VerCors and Alpinist to write GPU code. Documentation about this language, and VerCors in general, can be found in "vercors_wiki.html", which is included in the "Alpinist" folder of the artifact.

All the examples are annotated with special optimization annotations, as mentioned in the paper. These optimization annotations point Alpinist to where an optimization should be applied. 
The annotations start with the keyword "gpuopt" followed by the optimization and its arguments. The annotations can take the following forms:

* gpuopt loop_unroll <i> <k>
   - <i>: The iteration variable
   - <k>: The number of iterations to unroll
* gpuopt matrix_lin <m> <major> <M> <N>
   - <m>: The name of the matrix
   - <major>: Either R or C for row-major and column-major access respectively
   - <M>: The number of rows
   - <N>: The number of columns
* gpuopt glob_to_reg <a> <l>
   - <a>: The name of the array
   - <l>: The index to prefetch 
* gpuopt iter_merge <i> <k>
   - <i>: The iteration variable
   - <k>: The number of iterations to merge
* gpuopt tile <mode> <c>
   - <mode>: Either inter or intra for inter-tiling and intra-tiling respectively
   - <c>: The stride or chunk size
* gpuopt fuse <k> <tb>
   - <k>: The number of kernels to fuse
   - <tb>: The number of threadblocks


#############################################################
### Documentation for replicating the tables in the paper ###
#############################################################

Before explaining how to replicate the results reported in the tables of the paper, it is important to note the following:
1. Given that the experiments run in a VM, it is to be expected that the measured results will deviate from the reported results, but the general trend should be similar.
2. VerCors uses Z3, whose runtime can vary noticeably when it is applied multiple times to the same problem. This can greatly influence the measured runtimes.
3. We do not include the time it takes for VerCors and Alpinist to parse a problem and prepare the data structures for verification / optimization. This is consistent with the results reported in the paper. We made this decision because the parsing and preparation work is similar in each case, for both VerCors and Alpinist.
4. The scripts do not print the output of Alpinist and VerCors to the console, which keeps the output of the scripts readable. However, for transparency, all output of Alpinist and VerCors is logged into .log files corresponding to each script.

####################################
### Documentation for table_1.sh ###
####################################
The script table_1.sh can be run to generate Table 1 of the paper: 

./table_1.sh <N> <l>

<N>: The number of experiments per program, default 1. This argument is optional. For the paper, we set this to 10. When more than one experiment is conducted per program, the average runtimes are determined, and added to the produced table.
<l>: A comma-separated list of optimizations to run (in order), default is all six optimizations. This argument is optional.
      The following optimizations can be specified: data_prefetch, iteration_merging, kernel_fusion, loop_unroll, matrix_linearization and tiling. Not selecting all optimizations means that not all experiments will be performed, only those related to the selected optimizations.

Every experiment consists of three steps, applied to a particular example / program:
1. Measuring the time it takes to verify the original code of the example using VerCors;
2. Measuring the time it takes to optimize the code using Alpinist;
3. Measuring the time it takes to verify the optimized code using VerCors.
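
When an experiment is repeated, the table reports the average of the measured runtimes. The averaging step itself can be sketched as follows. This is only an illustration of the arithmetic, not the implementation used by table_1.sh, and the function name mean_runtime is hypothetical.

```shell
# Print the arithmetic mean of the numeric arguments, rounded to two
# decimals. Returns 1 if no values are given.
mean_runtime() {
    [ "$#" -gt 0 ] || return 1
    # One value per line; LC_ALL=C keeps "." as the decimal separator.
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }'
}
```

For example, mean_runtime 10 20 30 prints 20.00.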

# Examples
To generate Table 1 entirely, running each experiment once, run:
- ./table_1.sh

Running each experiment once takes approximately 1.5 hours in total.
See table_1.txt for an example output of the script, in which we have removed the intermediate progress messages.


To generate Table 1 entirely with each experiment performed 6 times, run:
- ./table_1.sh 6

To generate Table 1 for only kernel fusion and data prefetching, running each experiment once, run:
- ./table_1.sh 1 kernel_fusion,data_prefetch

This example takes approximately 30 minutes in total to run.
See table_1_kf_dp.txt for the output of the script.

Note: When the list of optimizations <l> is specified, the number of experiments <N> has to be specified as well.

####################################
### Documentation for table_2.sh ###
####################################
The script table_2.sh can be run to generate Table 2 of the paper: 

./table_2.sh <N>

<N>: The number of experiments per program, default 1. This argument is optional. For the paper, we set this to 10. As for Table 1, when multiple experiments are conducted per program, the average runtimes are recorded. Also, an experiment again consists of the three steps outlined above for Table 1.

# Examples
To generate Table 2 entirely, running each experiment once, run:
- ./table_2.sh

Running each experiment once takes approximately 30 minutes in total.
See table_2.txt for an example output of the script, in which we have removed the intermediate progress messages.

#################################
### Documentation for runN.sh ###
#################################
runN.sh runs a given number of experiments for a particular example.

./runN.sh <N> <directory> <optimization>

<N>: The number of experiments per program, default 1.
<directory>: The directory with the example(s) to run. The script assumes that examples in the directory are prefixed with "orig_".
<optimization>: The optimization to apply. The following optimizations are supported: loop_unroll, matrix_lin, glob_to_reg (for data prefetching), iter_merge, tile and fuse

# Examples

To run the data prefetching experiment twice on "Alpinist-Examples/data_prefetch/examples/Register1/orig_Register1-orig.pvl", run:
- ./runN.sh 2 Alpinist-Examples/data_prefetch/examples/Register1/ glob_to_reg 
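
To run every example of one optimization in turn, runN.sh can be called in a loop. The sketch below assumes that all optimizations follow the directory layout "Alpinist-Examples/<folder>/examples/<example>/" seen in the paths in this README, and that it is run from ~/Alpinist. The function name run_all_examples and the RUNN override variable are hypothetical. Note that the folder name (e.g. data_prefetch) and the optimization keyword (e.g. glob_to_reg) can differ.

```shell
# Run runN.sh once for every example directory of one optimization.
# $1: number of experiments per program
# $2: folder name under Alpinist-Examples (e.g. data_prefetch)
# $3: optimization keyword passed to runN.sh (e.g. glob_to_reg)
# RUNN (hypothetical) overrides the script to call; default ./runN.sh.
run_all_examples() {
    n=$1
    folder=$2
    opt=$3
    for dir in "Alpinist-Examples/$folder/examples"/*/; do
        # Skip the unexpanded pattern when no directory matches.
        [ -d "$dir" ] || continue
        ${RUNN:-./runN.sh} "$n" "$dir" "$opt"
    done
}
```

For example:
run_all_examples 2 data_prefetch glob_to_reg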


###################################################
### Verifying a particular example with VerCors ###
###################################################
The scripts table_1.sh and table_2.sh can be used to generate the tables, but they do not report the result of applying verification, only the runtimes.
To get insight into the verification result for a particular file, the file can be verified with VerCors individually.

To verify an example, the following command can be used:

/usr/bin/vercors --silicon --progress <input file>

<input file>: The file to verify.

If the file verifies successfully, the message "The final verdict is Pass." is printed.

Note: A warning could be printed that states "Could not find file: Object.java". 
      This warning is related to the Java frontend of VerCors and can be safely ignored if PVL is used.
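
When checking many files, the pass message can be detected automatically. The sketch below is a hypothetical helper that reads tool output on standard input and succeeds only if it contains the pass message quoted above. Whether VerCors prints this message on standard output or standard error is not stated here, so the usage example merges both streams.

```shell
# Succeed only if the text on standard input contains the pass message.
verdict_is_pass() {
    grep -qF 'The final verdict is Pass.'
}
```

For example:
/usr/bin/vercors --silicon --progress <input file> 2>&1 | verdict_is_pass && echo "verified"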

#####################################################
### Optimizing a particular example with Alpinist ###
#####################################################
To optimize an example, the following command can be used:

/usr/bin/alpinist --silicon --progress --encoded-gpuopt <output file> --gpuopt <optimization> <input file>

<input file>:       The file to optimize.
<output file>:      A name for the file containing the optimized program.
<optimization>:     The optimization to apply. The following optimizations are supported: loop_unroll, matrix_lin, glob_to_reg (for data prefetching), iter_merge, tile and fuse


###############
### Example ###
###############
Suppose we want to optimize and verify the file located at
   /home/tacas22/Alpinist-Examples/loop_unroll/examples/plus/orig_loop_unroll_plus.pvl

To optimize the file we would run (from /home/tacas22):
 
/usr/bin/alpinist --silicon --progress --gpuopt loop_unroll --encoded-gpuopt opt_unrolled_program.pvl Alpinist-Examples/loop_unroll/examples/plus/orig_loop_unroll_plus.pvl

To verify the optimized program we would run (from /home/tacas22):

/usr/bin/vercors --silicon --progress opt_unrolled_program.pvl
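
The two steps above can be combined into one helper. This is a minimal sketch using only the commands and flags shown in this README. The function name optimize_and_verify and the ALPINIST and VERCORS override variables are hypothetical, and the sketch assumes that alpinist exits with a non-zero status on failure, which is not stated here.

```shell
# Optimize a program with Alpinist, then verify the result with VerCors.
# $1: optimization keyword (e.g. loop_unroll)
# $2: input file
# $3: output file for the optimized program
# ALPINIST and VERCORS (hypothetical) override the tool paths.
optimize_and_verify() {
    opt=$1
    input=$2
    output=$3
    # Stop early if the optimization step reports failure (assumed
    # to be signalled by a non-zero exit status).
    "${ALPINIST:-/usr/bin/alpinist}" --silicon --progress \
        --gpuopt "$opt" --encoded-gpuopt "$output" "$input" || return 1
    "${VERCORS:-/usr/bin/vercors}" --silicon --progress "$output"
}
```

For example (from /home/tacas22):
optimize_and_verify loop_unroll Alpinist-Examples/loop_unroll/examples/plus/orig_loop_unroll_plus.pvl opt_unrolled_program.pvl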

