Code underlying the PhD thesis: Label Alchemy: Transforming Noisy Data into Precious Insights in Deep Learning
DOI: 10.4121/b00277a6-9431-47dc-9369-e9a477031e66
Dataset
Labels are essential for training Deep Neural Networks (DNNs): they provide the ground truth that guides learning. Label quality directly affects DNN performance and generalization. Accurate labels foster robust predictions and aid convergence towards an accurate representation of the data distribution, while noisy labels introduce errors that hinder learning. Ensuring label accuracy is therefore vital for effective learning, generalization, and real-world performance. It is, however, also demanding, often requiring considerable time and cost. As datasets grow, methods such as crowdsourcing have gained traction to expedite the labeling process, but this approach is inherently susceptible to errors and inaccuracies. For example, the accuracy of AlexNet in classifying CIFAR-10 images was observed to plummet from 77% to a mere 10% when labels were subjected to random flips. This stark drop exemplifies the influence that corrupted or erroneous labels can exert on DNN performance, and underscores the critical relationship between accurate labels and a DNN's ability to understand and effectively leverage data.
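The random label flips mentioned above (symmetric label noise) can be sketched as follows; this is an illustrative snippet, not the thesis code, and the function name `flip_labels` is our own:

```python
import numpy as np

def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Replace a fraction `noise_rate` of labels with a different class,
    chosen uniformly at random (symmetric label noise). Sketch only."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    n = len(noisy)
    # pick which samples to corrupt, without replacement
    flip_idx = rng.choice(n, size=int(noise_rate * n), replace=False)
    for i in flip_idx:
        # draw the new label from the *other* classes, so it is always wrong
        choices = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(choices)
    return noisy
```

At `noise_rate=1.0` every label is wrong, which corresponds to the regime where CIFAR-10 accuracy collapses to near chance level.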
Ensuring DNN robustness to label noise is vital and involves strategies such as identifying and filtering noisy labels, and integrating noise patterns into training to obtain resilient models. Architectural and loss-function design also combat label-related challenges, enhancing DNN adaptability across applications. This thesis investigates the pivotal role of labels in DNN training and the impact of label quality on model performance. Strategies spanning noise recovery, robust learning frameworks, and multi-label solutions contribute to DNN resilience against noisy labels, advancing both understanding and practical applications.
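One common filtering heuristic alluded to above is small-loss selection: samples with noisy labels tend to incur higher loss early in training, so keeping only the lowest-loss fraction of a batch filters out likely-corrupted labels. A minimal sketch, assuming per-sample losses are already computed (the function name `select_small_loss` is our own):

```python
import numpy as np

def select_small_loss(losses, forget_rate):
    """Return the indices of the (1 - forget_rate) fraction of samples
    with the smallest loss, in ascending index order. Sketch only."""
    losses = np.asarray(losses)
    num_keep = int((1.0 - forget_rate) * len(losses))
    # argsort ascending: smallest losses first
    keep_idx = np.argsort(losses)[:num_keep]
    return np.sort(keep_idx)
```

In practice the kept indices would be used to mask the batch loss before backpropagation, and `forget_rate` is usually ramped up over the first epochs as the estimated noise level.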
**This is the code repository for each chapter of the thesis.**
History
- 2024-04-22 first online, published, posted
Publisher
4TU.ResearchData
Format
compressed files for each chapter (.zip), Python code files (.py)
Organizations
TU Delft, Faculty of Engineering, Mathematics and Computer Science, Distributed Systems Group
DATA
Files (6)
- README.md (1,173 bytes) MD5: f36998e92cf48b454f02875619fda9f4
- chapter2.zip (21,402 bytes) MD5: 9af8d13048c5a11508048fddeb12ad87
- chapter3.zip (8,026,652 bytes) MD5: d440083f9abcc98b5099f6a8c1638356
- chapter4.zip (37,887 bytes) MD5: b34c256fc4d4caeffe9d7a727defbb56
- chapter5.zip (54,891 bytes) MD5: 18d8e26512f6c1790bff33832734da4d
- chapter6.zip (126,943 bytes) MD5: 895d410acfbbc8f494deccdfd4d13fbf
Total: 8,268,948 bytes