TY - DATA
T1 - Data and Code Underlying the Bachelor Thesis: Binarization of Historical Watermarks
PY - 2025/11/18
AU - Anna Lantink
AU - Martin Skrodzki
UR - 
DO - 10.4121/226cb04e-4370-47d0-b678-792db5be685c.v1
KW - binarization
KW - historical paper
KW - physical watermarks
KW - human evaluation
N2 -
Data and Code Underlying the Bachelor Thesis: Binarization of Historical Watermarks
These files provide the data and code used for the Bachelor's thesis "Binarization of Historical Watermarks: A Review of Thresholding Techniques Applied to Historical Watermark Images." The objective of the thesis was to review how effective different thresholding techniques are when applied to noisy images of historical watermarks. To this end, several thresholding techniques were implemented, and data were collected separately for a qualitative and a quantitative evaluation.
For the qualitative evaluation, watermarks were randomly sampled, with permission, from a private watermark dataset owned and provided by the German Museum of Books and Writing (https://www.dnb.de/EN/Ueber-uns/DBSM/dbsm_node.html##sprg315370). The sampled watermarks were thresholded using several techniques, and consenting participants then completed a survey indicating which technique they considered most effective for each watermark image.
For the quantitative evaluation, a dataset of human sketches [1] was randomly sampled, split into test and validation sets, and noised to resemble watermarks. The F1 score, peak signal-to-noise ratio (PSNR), negative rate metric (NRM), and misclassification penalty metric (MPM) were calculated for each pair of clean and noised images.
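As an illustration of what a global thresholding technique does, the following is a minimal sketch of Otsu's method in NumPy. It assumes an 8-bit grayscale image with dark strokes on a light background; whether Otsu's method is among the techniques implemented in `code.zip` is an assumption, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return Otsu's threshold for an 8-bit grayscale image (hypothetical helper)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # probability of class 0 (values <= t)
    mu = np.cumsum(prob * np.arange(256))    # cumulative mean of class 0
    mu_total = mu[-1]                        # global mean intensity
    denom = omega * (1.0 - omega)
    # Between-class variance for every candidate threshold t.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[~np.isfinite(sigma_b)] = 0.0     # empty classes contribute no variance
    return int(np.argmax(sigma_b))


def binarize(gray: np.ndarray, t: int) -> np.ndarray:
    """Mark pixels at or below t as foreground (True), assuming dark ink."""
    return gray <= t


if __name__ == "__main__":
    img = np.array([[50, 50, 200, 200]] * 4, dtype=np.uint8)
    t = otsu_threshold(img)
    print(t, binarize(img, t).sum())
```

Otsu's method picks a single threshold for the whole image, which is why it can struggle with the uneven backgrounds typical of historical paper; local (adaptive) methods compute a threshold per neighbourhood instead.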
Note that all participant data is anonymized, and no original watermark data is included due to copyright restrictions.
Organization of the data
- `code.zip`: This file contains all the code used during the research, including the implementations of the thresholding techniques and the code used for data processing. None of the watermark data, real or synthetic, is included in this file, so some image paths in the code will not resolve.
- `qualitative_data.zip`: This file contains the anonymized survey responses from participants, as well as the files produced by processing them. A copy of the original survey is not included, since permission to redistribute the watermark images it contains has not been obtained. The code used to generate the processed results can be found in `code.zip`.
- `quantitative_data.zip`: This file contains `.csv` files that detail the images sampled from the Human Drawings dataset [1], and the results after processing and evaluating these images. The code used to generate the processed results can be found in `code.zip`.
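Three of the four quantitative metrics can be computed directly from pixel-wise counts. The sketch below follows the definitions commonly used in document binarization contests such as DIBCO (an assumption; the exact formulations in `code.zip` may differ) and treats `True` as foreground ink in both the binarized result and the clean ground-truth image. MPM, which additionally requires distances to the ground-truth contours, is left out. All function names are hypothetical.

```python
import numpy as np


def confusion(pred: np.ndarray, gt: np.ndarray):
    """Count TP, FP, FN, TN; True marks foreground (ink) pixels."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = int(np.sum(pred & gt))
    fp = int(np.sum(pred & ~gt))
    fn = int(np.sum(~pred & gt))
    tn = int(np.sum(~pred & ~gt))
    return tp, fp, fn, tn


def f1_score(pred, gt) -> float:
    """F1 = 2PR / (P + R) = 2TP / (2TP + FP + FN); 1.0 if both images are empty."""
    tp, fp, fn, _ = confusion(pred, gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def psnr(pred, gt, peak: float = 1.0) -> float:
    """PSNR in dB, treating binary pixels as 0/1 intensities (peak C = 1)."""
    mse = np.mean((pred.astype(np.float64) - gt.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


def nrm(pred, gt) -> float:
    """Negative rate metric: mean of the false-negative and false-positive rates."""
    tp, fp, fn, tn = confusion(pred, gt)
    nr_fn = fn / (fn + tp) if fn + tp else 0.0
    nr_fp = fp / (fp + tn) if fp + tn else 0.0
    return (nr_fn + nr_fp) / 2
```

Higher F1 and PSNR indicate a better binarization, while lower NRM (and MPM) values are better.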
References
[1] M. Eitz, J. Hays, and M. Alexa, “How do humans sketch objects?” ACM Trans. Graph., vol. 31, no. 4, pp. 1–10, Aug. 2012, doi: 10.1145/2185520.2185540.
ER -