# Binarization for Historical Watermark Images
This code is used for the Bachelor Thesis titled 'Binarization for Historical Watermark Images' by Anna Lantink. This is part of the 2024 [Research Project](https://github.com/TU-Delft-CSE/Research-Project) of TU Delft.

## Code Structure

- `dataset`: contains information regarding the watermark and synthetic watermark datasets.
  - `groundtruth_processing.py`: converts drawings from the drawings dataset [1] into synthetic ground truths.
  - `image_noising.py`: adds noise to the synthetic ground truths so that they resemble watermarks.
  - `SampleWatermarks.java`: goes through all images in a file, logs them into a `.txt` file, and samples them randomly. Note: I wrote this in Java for efficiency. Since Java projects can be bulky, I only included the class with the relevant code here, so it is not fully configured.
- `evaluation`: contains code and data that evaluates the algorithms.
  - `ac1-coeff`: contains code used to calculate Gwet's AC1 coefficient. This includes `main.r`, which performs the actual coefficient calculation.
  - `evaluate.py`: used to run the evaluation metrics on images.
  - `evaluation_processing.py`: calculates the mean and standard deviation from the files generated by `evaluate.py`.
  - `heat_maps`: contains code for the qualitative heat maps. `heat_map.py` processes the raw survey data to extract median Likert ratings.
- `global_methods`: contains implementations of global thresholding algorithms.
  - `Kavallieratou.py`: implementation of thresholding as explained in Kavallieratou [2].
  - `Mello_Costa.py`: implementation of thresholding as explained in Mello and Costa [3].
  - `Otsu.py`: implementation of thresholding as explained by Otsu [4].
- `hybrid_methods`: contains implementations of hybrid thresholding algorithms.
  - `Rao_et_al.py`: implementation of thresholding as explained in Rao et al. [5].
- `local_methods`: contains implementations of local thresholding algorithms.
  - `Bolan_Su.py`: implementation of thresholding as explained in Bolan Su et al. [6].
  - `Gatos.py`: implementation of thresholding as explained in Gatos et al. [7].
  - `Kamel_Zhao.py`: implementation of thresholding as explained by Kamel and Zhao [8].
  - `Lantink.py`: implementation of a thresholding algorithm designed for this thesis.
  - `Niblack.py`: implementation of thresholding as explained by Niblack [9].
  - `Sauvola.py`: implementation of thresholding as explained by Sauvola and Pietikäinen [10].
  - `SoftwareProject.py`: implementation of thresholding from a previous watermark prototype [11].
- `utils.py`: file containing utility functions that are used in some other files.
- `main.py`: mostly used to run the other files; it does not contain any particularly important code of its own.
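
The difference between the global and local method families listed above can be illustrated with a minimal sketch. This is not the code from this repository: the function names `otsu_threshold` and `sauvola_binarize`, the window size, and the parameters `k` and `r` are assumptions chosen for illustration, and the code uses only the Python standard library on small nested lists rather than real image arrays. A global method such as Otsu [4] picks one threshold for the whole image, while a local method such as Sauvola [10] computes a threshold per pixel from the mean and standard deviation of its neighbourhood.

```python
from statistics import mean, pstdev


def otsu_threshold(pixels):
    """Return one global threshold (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of the values of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2  # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def sauvola_binarize(image, window=3, k=0.5, r=128.0):
    """Binarize with a per-pixel threshold T = m * (1 + k * (s / r - 1)).

    m and s are the mean and standard deviation of the window around
    each pixel; windows are clipped at the image border (an assumption).
    """
    height, width = len(image), len(image[0])
    half = window // 2
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            patch = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            m = mean(patch)
            s = pstdev(patch)
            t = m * (1 + k * (s / r - 1))
            # Pixels brighter than the local threshold become white.
            row.append(255 if image[y][x] > t else 0)
        out.append(row)
    return out


if __name__ == "__main__":
    image = [
        [200, 200, 200],
        [200, 20, 200],
        [200, 200, 200],
    ]
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    print("global threshold:", t)
    print("global result:", [[255 if p > t else 0 for p in row] for row in image])
    print("local result:", sauvola_binarize(image))
```

On this toy image both approaches isolate the dark centre pixel. The two families tend to diverge on images with uneven illumination, where a single global threshold cannot fit every region.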

### References
[1] M. Eitz, J. Hays, and M. Alexa, “How do humans sketch objects?,” ACM Trans. Graph., vol. 31, no. 4, pp. 1–10, Aug. 2012, doi: 10.1145/2185520.2185540.

[2] E. Kavallieratou, “A binarization algorithm specialized on document images and photos,” in Eighth International Conference on Document Analysis and Recognition (ICDAR’05), Seoul, South Korea: IEEE, 2005, pp. 463–467 Vol. 1. doi: 10.1109/ICDAR.2005.1.

[3] C. A. B. Mello and A. H. M. Costa, “Image Thresholding of Historical Documents Using Entropy and ROC Curves,” in Progress in Pattern Recognition, Image Analysis and Applications, A. Sanfeliu and M. L. Cortés, Eds., in Lecture Notes in Computer Science, vol. 3773. Berlin, Heidelberg: Springer Berlin Heidelberg, 2005, pp. 905–916. doi: 10.1007/11578079_93.

[4] N. Otsu, “A Threshold Selection Method from Gray-Level Histograms,” IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, Jan. 1979, doi: 10.1109/TSMC.1979.4310076.

[5] A. V. S. Rao, G. Sunil, N. V. Rao, T. S. K. Prabhu, L. P. Reddy, and A. S. C. S. Sastry, “Adaptive Binarization of Ancient Documents,” in 2009 Second International Conference on Machine Vision, Dubai, UAE: IEEE, 2009, pp. 22–26. doi: 10.1109/ICMV.2009.8.

[6] Bolan Su, Shijian Lu, and Chew Lim Tan, “Robust Document Image Binarization Technique for Degraded Document Images,” IEEE Trans. on Image Process., vol. 22, no. 4, pp. 1408–1417, Apr. 2013, doi: 10.1109/TIP.2012.2231089.

[7] B. Gatos, I. Pratikakis, and S. J. Perantonis, “Adaptive degraded document image binarization,” Pattern Recognition, vol. 39, no. 3, pp. 317–327, Mar. 2006, doi: 10.1016/j.patcog.2005.09.010.

[8] M. Kamel and A. Zhao, “Extraction of Binary Character/Graphics Images from Grayscale Document Images,” CVGIP: Graphical Models and Image Processing, vol. 55, no. 3, pp. 203–217, May 1993, doi: 10.1006/cgip.1993.1015.

[9] W. Niblack, An introduction to digital image processing. Englewood Cliffs, N.J: Prentice-Hall International, 1986.

[10] J. Sauvola and M. Pietikäinen, “Adaptive document image binarization,” Pattern Recognition, vol. 33, no. 2, pp. 225–236, Feb. 2000, doi: 10.1016/S0031-3203(99)00055-2.

[11] D.-M. Banta, S. Kho, A. N. Lantink, A.-R. Marin, and V. Petkov, “A watermark recognition system: An approach to matching similar watermarks,” http://resolver.tudelft.nl/uuid:e8dfbd63-ae54-4159-b786-d1d8c64dc827, 2023.
