FairDeDup, a new training technique developed by an Oregon State University doctoral student and researchers at Adobe, makes artificial intelligence systems less socially biased. Deduplication, the removal of redundant information from training data, cuts the high cost of training AI systems. But datasets gathered from the internet often contain the biases present in society, and once encoded in AI models those biases can perpetuate unfair ideas and behavior. FairDeDup addresses this by building fairness considerations into the deduplication process itself.
The researchers found that while removing redundant data lowers resource requirements and can improve training accuracy, it can also exacerbate the harmful social biases AI systems often learn. FairDeDup was designed as an improvement on an earlier method, SemDeDup, by incorporating fairness considerations into the pruning decision. It thins datasets of image captions collected from the web through a content-aware pruning process, making informed choices about which data to keep and which to discard, and so enables AI training that is cheaper and accurate as well as fairer.
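For reference, semantic deduplication in the SemDeDup style typically embeds each image-caption pair, clusters the embeddings, and discards examples nearly identical to ones already kept. The sketch below is a minimal illustration of that idea, assuming unit-normalized CLIP-style embeddings; the function name, cluster count, and similarity threshold are illustrative choices, not the published implementation.

```python
# A minimal sketch of SemDeDup-style semantic deduplication, the baseline
# that FairDeDup builds on. Assumes `embeddings` holds unit-normalized
# CLIP-style embeddings, one row per image-caption pair; the cluster count
# and similarity threshold are illustrative, not the published settings.
import numpy as np
from sklearn.cluster import KMeans

def semantic_dedup(embeddings: np.ndarray, n_clusters: int = 100,
                   sim_threshold: float = 0.95) -> list[int]:
    """Return indices of the examples kept after pruning near-duplicates."""
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)
    keep: list[int] = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        cluster = embeddings[idx]
        dropped = np.zeros(len(idx), dtype=bool)
        for i in range(len(idx)):
            if dropped[i]:
                continue
            keep.append(int(idx[i]))
            # Cosine similarity to the remaining cluster members
            # (a dot product suffices because the embeddings are unit-length).
            sims = cluster[i + 1:] @ cluster[i]
            dropped[i + 1:] |= sims > sim_threshold
    return keep
```

A plain deduplicator like this keeps whichever near-duplicate it happens to visit first; the fairness question is which member of each duplicate group should survive.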
FairDeDup not only removes redundant data but also incorporates controllable, human-defined dimensions of diversity to mitigate bias. Biases related to occupation, race, gender, age, geography, and culture can otherwise be perpetuated during training. By confronting them during dataset pruning, the researchers aim to create AI systems that are more socially just. Rather than imposing a prescribed notion of fairness, the approach allows fairness to be defined in context, by the users or the specific settings in which the AI system is deployed.
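To make the idea concrete, one plausible way to fold a human-defined diversity dimension into pruning is to let it decide which member of a duplicate group survives. The sketch below assumes each example carries a single illustrative attribute along one such dimension; the attribute labels, the "least-represented" rule, and the helper name are hypothetical, not the paper's published criterion.

```python
# A hedged sketch of a fairness-aware keep decision during pruning. The
# per-example `attribute` (e.g., an inferred occupation or region label)
# and the "keep the least-represented attribute" rule are illustrative
# assumptions, not the published FairDeDup criterion.
from collections import Counter

def fair_keep_choice(duplicate_group: list[int],
                     attribute: dict[int, str],
                     seen: Counter) -> int:
    """From a group of near-duplicates, keep the example whose diversity
    attribute is least represented among the examples kept so far."""
    chosen = min(duplicate_group, key=lambda i: seen[attribute[i]])
    seen[attribute[chosen]] += 1
    return chosen

# Usage: a dedup pass like the one sketched earlier would call this instead
# of keeping an arbitrary member of each duplicate group.
seen: Counter = Counter()
print(fair_keep_choice([12, 47, 88],
                       {12: "nurse", 47: "engineer", 88: "nurse"}, seen))
# -> 12 (both attributes are unseen, so the tie resolves to the lowest index)
```

Under a rule like this, repeated pruning passes would tend to spread the kept examples across the chosen diversity dimension rather than concentrate them on whichever depiction happens to dominate the web data.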
The FairDeDup algorithm, presented at the IEEE/CVF Conference on Computer Vision and Pattern Recognition in Seattle, aims to give AI a pathway to act fairly in the setting and for the user base in which it is deployed. The collaboration between the OSU doctoral student, Adobe researchers, and an assistant professor at the OSU College of Engineering underscores the interdisciplinary nature of the work; together they hope it will yield AI systems that serve diverse representations of people.
In short, FairDeDup pairs the cost savings of deduplication with user-defined notions of fairness. By letting those notions guide which data survives pruning, it offers a path to AI training that is cheaper and accurate while producing systems that behave more equitably in the contexts, and for the user bases, in which they are deployed.