Data and Code Underlying the Bachelor Thesis: Binarization of Historical Watermarks

DOI: 10.4121/226cb04e-4370-47d0-b678-792db5be685c.v1
The DOI displayed above is for this specific version of the dataset, which is currently the latest. Newer versions may be published in the future. For a link that always points to the latest version, please use:
DOI: 10.4121/226cb04e-4370-47d0-b678-792db5be685c

Datacite citation style

Lantink, Anna; Skrodzki, Martin (2025): Data and Code Underlying the Bachelor Thesis: Binarization of Historical Watermarks. Version 1. 4TU.ResearchData. dataset. https://doi.org/10.4121/226cb04e-4370-47d0-b678-792db5be685c.v1

Dataset

Data and Code Underlying the Publication: Binarization of Historical Watermarks

These files provide the data and code used for the Bachelor's thesis "Binarization of Historical Watermarks: A Review of Thresholding Techniques Applied to Historical Watermark Images." The objective of the thesis was to review how effective different thresholding techniques are when applied to noisy historical watermark images. To this end, several thresholding techniques were implemented, and data was collected separately for a qualitative and a quantitative evaluation.

For the qualitative evaluation, watermarks were randomly sampled from a private watermark dataset owned and provided by the German Museum of Books and Writing (https://www.dnb.de/EN/Ueber-uns/DBSM/dbsm_node.html##sprg315370), with their permission. The sampled watermarks were thresholded using several techniques, and consenting participants then completed a survey asking which technique they thought was most effective for each watermark image.

For the quantitative evaluation, a dataset of human sketches [1] was randomly sampled, split into test and validation sets, and noised to resemble watermarks. The F1 Score, PSNR, NRM, and MPM metrics were calculated for each pair of clean and noised images.
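As an illustration of the quantitative metrics named above, the following is a minimal sketch of how F1 Score, PSNR, and NRM are commonly defined for a pair of binary images. It is not taken from `code.zip`; the function name `binarization_metrics`, the convention that `True` marks foreground pixels, and the PSNR peak value of 1 are assumptions made for this example. MPM is omitted because it additionally requires distances to the ground-truth contour.

```python
import math

import numpy as np


def binarization_metrics(truth, pred):
    """Compare two binary images of equal shape (True/1 = foreground).

    Hypothetical helper for illustration; not the thesis implementation.
    """
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    if truth.shape != pred.shape:
        raise ValueError("images must have the same shape")

    # Pixel-wise confusion counts.
    tp = int(np.sum(truth & pred))
    fp = int(np.sum(~truth & pred))
    fn = int(np.sum(truth & ~pred))
    tn = int(np.sum(~truth & ~pred))

    # F1 Score: harmonic mean of precision and recall.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # PSNR with pixel values in {0, 1}, so the peak difference is 1 (assumed).
    mse = (fp + fn) / truth.size
    psnr = math.inf if mse == 0 else 10 * math.log10(1.0 / mse)

    # NRM: mean of the false-negative and false-positive rates.
    nr_fn = fn / (fn + tp) if (fn + tp) else 0.0
    nr_fp = fp / (fp + tn) if (fp + tn) else 0.0
    nrm = (nr_fn + nr_fp) / 2

    return {"f1": f1, "psnr": psnr, "nrm": nrm}
```

Higher F1 and PSNR values and lower NRM values indicate closer agreement between the two images.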


Note that all participant data is anonymized, and no original watermark data is included due to copyright restrictions.


Organization of the data

- `code.zip`: This file contains all the code used during the research, including implementations of the thresholding techniques and the code used for data processing. Note that none of the watermark data, real or synthetic, is included in this file, so some image paths in the code will not resolve.

- `qualitative_data.zip`: This file contains the anonymized survey data filled in by participants, as well as the files produced by processing the survey data. A copy of the original survey is not included, because permission has not been obtained to redistribute the watermark images included in the survey. The code used to generate the processed results can be found in `code.zip`.

- `quantitative_data.zip`: This file contains `.csv` files that detail the images sampled from the Human Drawings dataset [1], and the results after processing and evaluating these images. The code used to generate the processed results can be found in `code.zip`.
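
The internal layout of the archives is not documented on this page, so a reasonable first step is to inspect them. The following is a minimal sketch using only the Python standard library; the helper names `list_csv_members` and `read_csv_member` are hypothetical, and UTF-8 encoding with a header row in each `.csv` file is an assumption.

```python
import csv
import io
import zipfile


def list_csv_members(zip_path):
    """Return the sorted names of all .csv files inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name for name in archive.namelist()
            if name.lower().endswith(".csv")
        )


def read_csv_member(zip_path, member):
    """Read one .csv file from the archive into a list of row dicts.

    Assumes UTF-8 text with a header row (not confirmed by the dataset page).
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

For example, `list_csv_members("quantitative_data.zip")` lists the available tables, and any returned name can then be passed to `read_csv_member`.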


References

[1] M. Eitz, J. Hays, and M. Alexa, “How do humans sketch objects?” ACM Trans. Graph., vol. 31, no. 4, pp. 1–10, Aug. 2012, doi: 10.1145/2185520.2185540.

History

  • 2025-11-18 first online, published, posted

Publisher

4TU.ResearchData

Format

zipped folders, including: Code (Python, Java, R); Qualitative Data (CSV, MD); Quantitative Data (TXT, CSV)

Organizations

TU Delft, Faculty of Electrical Engineering, Mathematics and Computer Science
