Title: AI IN THE SKY: ADVANCING WILDLIFE SURVEY METHODS IN AFRICA WITH DEEP LEARNING AND AERIAL IMAGERY

Author: Zeyu Xu

Department of Natural Resources, Faculty of Geo-Information Science and Earth Observation, University of Twente

Contact: z.xu-1@utwente.nl

====================
General Introduction
====================

This dataset primarily supports the research in my PhD thesis, which consists of six chapters. Chapters 2 to 5 present the core research: Chapter 2 is a review article, and Chapters 3 to 5 are based on image data.

Remote sensing image datasets used in Chapters 3 to 5:

(1) The Aerial Elephant Dataset (AED):

This is a public dataset, available at: https://zenodo.org/records/3234780

(2) The Antelope Dataset:

This dataset was provided by a collaborator, African Parks in South Sudan. I was granted permission to use it for my PhD research, but it cannot be shared.

For access inquiries, please contact African Parks in South Sudan: https://www.africanparks.org/the-parks/badingilo-boma

Deep learning network code used in Chapters 3 to 5:

YOLO: https://github.com/ultralytics/ultralytics
RT-DETR: https://github.com/ultralytics/ultralytics
CenterNet: https://github.com/xingyizhou/CenterNet
U-Net: https://github.com/zeyu-rs/ZyPro
D2-Net: https://github.com/mihaidusmanu/d2-net

================================
Description of the Included Data
================================

(1) AED:

The original AED dataset provides only point-based annotations. I created corresponding bounding box annotations in standard VOC format. These data are used in Chapters 3 and 4.
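For readers unfamiliar with the VOC layout, the following is a minimal sketch of reading bounding boxes from a VOC-style XML annotation using only the Python standard library. The XML snippet, the file name, the image size, and the class name "elephant" are hypothetical illustrations and are not taken from the included files; the actual annotations may contain additional fields.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return a list of (class_name, xmin, ymin, xmax, ymax) from VOC XML text."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        bndbox = obj.find("bndbox")
        # VOC stores corner coordinates in pixels; tolerate values written as floats.
        coords = tuple(
            int(float(bndbox.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((name, *coords))
    return boxes


# Hypothetical annotation for illustration only (not from the dataset).
EXAMPLE_XML = """<annotation>
  <filename>example.jpg</filename>
  <size><width>5472</width><height>3648</height><depth>3</depth></size>
  <object>
    <name>elephant</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>160</xmax><ymax>270</ymax></bndbox>
  </object>
</annotation>"""

print(read_voc_boxes(EXAMPLE_XML))
```

To read an annotation from disk instead, the file contents can be passed to the same function, for example `read_voc_boxes(open(path).read())`, where `path` is the location of one annotation file.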

(2) Supporting scripts:

The following scripts support Chapters 3 to 5:

yolo_visualization.py: Tool for visualizing YOLO-format object detection annotations with customizable class colors and confidence thresholds.

voc_dataset_clipper.py: Utility to split large images and their VOC-format annotations into smaller tiles while maintaining accurate object annotations.

image_size_checker.py: Script to verify and filter images based on size, removing unmatched images and labels.

random_processor.py: Tool for randomly sampling from value ranges in a CSV file to generate balanced subsets.

text_file_splitter.py: Script to split images and YOLO-format labels into tiles, handling objects at tile edges and keeping or discarding clipped annotations based on an intersection-over-union (IoU) threshold.

statistics_analyzer.py: Tool for analyzing and visualizing dataset statistics such as class distribution and bounding box sizes.
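As an illustration of the geometry behind the tiling scripts above, the following is a minimal sketch, not the code of the included scripts. It converts a normalized YOLO box to pixel corners and clips it to a tile, keeping the box only if enough of its area remains. Because the clipped box lies inside the original box, the retained-area fraction equals the IoU between the two. The function names, the `min_keep` parameter, and its default of 0.5 are assumptions made for this example.

```python
def yolo_to_pixels(cx, cy, w, h, img_w, img_h):
    """Convert a normalized YOLO box (centre x, centre y, width, height) to pixel corners."""
    xmin = (cx - w / 2) * img_w
    ymin = (cy - h / 2) * img_h
    xmax = (cx + w / 2) * img_w
    ymax = (cy + h / 2) * img_h
    return xmin, ymin, xmax, ymax


def clip_box_to_tile(box, tile, min_keep=0.5):
    """Clip a pixel box to a tile given as (x0, y0, x1, y1).

    Returns the clipped box in tile-relative coordinates, or None when the box
    misses the tile or less than `min_keep` of its area remains (hypothetical threshold).
    """
    xmin, ymin, xmax, ymax = box
    tx0, ty0, tx1, ty1 = tile
    ix0, iy0 = max(xmin, tx0), max(ymin, ty0)
    ix1, iy1 = min(xmax, tx1), min(ymax, ty1)
    if ix1 <= ix0 or iy1 <= iy0:
        return None  # no overlap with this tile
    area = (xmax - xmin) * (ymax - ymin)
    kept = (ix1 - ix0) * (iy1 - iy0) / area  # equals IoU(original box, clipped box)
    if kept < min_keep:
        return None
    # Shift into the tile's own coordinate system.
    return ix0 - tx0, iy0 - ty0, ix1 - tx0, iy1 - ty0


# Hypothetical 1024 x 1024 image with one box, split into 512 x 512 tiles.
box = yolo_to_pixels(0.5, 0.5, 0.25, 0.5, 1024, 1024)
print(box)                                                        # (384.0, 256.0, 640.0, 768.0)
print(clip_box_to_tile(box, (512, 0, 1024, 512)))                 # None: only 25% of the box remains
print(clip_box_to_tile(box, (512, 0, 1024, 512), min_keep=0.2))   # (0, 256.0, 128.0, 512)
```

A lower threshold keeps more partial objects at tile borders, at the cost of boxes that show only a fragment of the animal; the appropriate value depends on object size relative to tile size.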
