Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
arXiv:2604.03478v1 — Abstract: In high-stakes settings where machine learning models are used to automate decision-making about individuals, the presence of algorithmic bias can exacerbate systemic harm to certain subgroups of people. These biases often stem from the underlying training data. In practice, interventions to "fix the data" depend on the actual additional data sources available -- where many are less than ideal. In these cases, the effects of data scaling on subgroup performance become volatile, as the improvements from increased sample size are counteracted by the introduction of distribution shifts in the training set. In this paper, we investigate the limitations of combining data sources to improve subgroup performance within the context of healthcare. Clinical models are commonly trained on datasets comprised of patient electronic health record (EHR) data from different hospitals or admission departments. Across two such datasets, the eICU Collaborative Research Database and the MIMIC-IV dataset, we find that data addition can both help and hurt model fairness and performance, and many intuitive strategies for data selection are unreliable. We compare model-based post-hoc calibration and data-centric addition strategies to find that the combination of both is important to improve subgroup performance. Our work questions the traditional dogma of "better data" for overcoming fairness challenges by comparing and combining data- and model-based approaches.
Executive Summary
This article presents an ICU case study investigating data interventions for subgroup fairness. By examining the effects of combining two clinical datasets, the authors challenge the traditional notion that 'better data' is the solution to fairness challenges. The study shows that data addition can both improve and harm model fairness and performance, and that intuitive strategies for data selection are often unreliable. The authors find that combining model-based post-hoc calibration with data-centric addition strategies is needed to improve subgroup performance. This research questions the conventional approach to addressing fairness challenges and highlights the value of pairing data- and model-based interventions. The findings have significant implications for healthcare and artificial intelligence, and the methodological approach can be applied to other high-stakes settings.
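To make the "data addition can help or hurt" finding concrete, the sketch below compares per-subgroup AUROC when a model is trained on one data source versus the pooled sources. The two synthetic "hospitals", the covariate shift, and all variable names are illustrative assumptions, not the paper's actual eICU/MIMIC-IV pipeline.

```python
# Hedged sketch: per-subgroup AUROC, training on source A vs. pooled A+B.
# Source B carries a deliberate covariate shift, mimicking the distribution
# shift that can offset the benefit of a larger training set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)

def make_source(n, shift):
    """Synthetic 'hospital': features X, outcome y, subgroup label g."""
    g = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, 4)) + shift
    logit = X @ np.array([0.8, -0.6, 0.4, 0.2]) + 0.4 * g
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    return X, y, g

Xa, ya, ga = make_source(1500, shift=0.0)   # stand-in for dataset A
Xb, yb, gb = make_source(1500, shift=1.0)   # stand-in for dataset B (shifted)
Xt, yt, gt = make_source(1000, shift=0.0)   # test set drawn like A

def subgroup_auroc(model, X, y, g):
    """AUROC computed separately within each subgroup."""
    return {int(grp): roc_auc_score(y[g == grp],
                                    model.predict_proba(X[g == grp])[:, 1])
            for grp in np.unique(g)}

m_a = LogisticRegression(max_iter=1000).fit(Xa, ya)
m_ab = LogisticRegression(max_iter=1000).fit(np.vstack([Xa, Xb]),
                                             np.concatenate([ya, yb]))

print("train on A:  ", subgroup_auroc(m_a, Xt, yt, gt))
print("train on A+B:", subgroup_auroc(m_ab, Xt, yt, gt))
```

Depending on the strength of the shift, the pooled model can land above or below the single-source model within each subgroup, which is the volatility the study documents.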
Key Points
- ▸ The authors challenge the traditional notion that 'better data' is the solution to fairness challenges.
- ▸ Data addition can both improve and harm model fairness and performance.
- ▸ Intuitive strategies for data selection may be unreliable.
- ▸ A combination of model-based post-hoc calibration and data-centric addition strategies is necessary to improve subgroup performance.
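The model-based half of the last point can be sketched as per-subgroup post-hoc calibration: one Platt-scaling calibrator fitted to the base model's scores within each subgroup. The synthetic cohort, the routing helper, and all names here are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch: per-subgroup Platt scaling on top of a pooled base model.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic cohort: features X, binary outcome y, subgroup id g (0 or 1),
# with a mild covariate shift and base-rate difference between subgroups.
n = 2000
g = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 5)) + g[:, None] * 0.5
logits = X @ np.array([1.0, -0.5, 0.3, 0.0, 0.2]) - 0.3 * g
y = (rng.random(n) < 1 / (1 + np.exp(-logits))).astype(int)

# Base model trained on the pooled data.
base = LogisticRegression(max_iter=1000).fit(X, y)
scores = base.decision_function(X)

# One Platt calibrator (logistic fit on the 1-D score) per subgroup.
calibrators = {int(grp): LogisticRegression().fit(scores[g == grp, None],
                                                  y[g == grp])
               for grp in np.unique(g)}

def predict_calibrated(X_new, g_new):
    """Route each row to its subgroup's calibrator."""
    s = base.decision_function(X_new)
    p = np.empty(len(s))
    for grp, cal in calibrators.items():
        m = g_new == grp
        if m.any():
            p[m] = cal.predict_proba(s[m, None])[:, 1]
    return p

probs = predict_calibrated(X, g)
```

In practice the calibrators would be fitted on held-out data rather than the training set; the point of the sketch is only the mechanism of routing scores through subgroup-specific calibration maps.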
Merits
Strength in Methodology
The authors use a rigorous case study approach, combining two different datasets to investigate the effects of data addition on subgroup fairness.
Value in Challenging Conventional Wisdom
The study challenges the traditional notion that 'better data' is the solution to fairness challenges, which is a valuable contribution to the field.
Importance of Data- and Model-Based Approaches
The authors highlight the importance of combining data- and model-based approaches to improve subgroup performance, which is a critical insight for researchers and practitioners.
Demerits
Limitation in Generalizability
The study's findings may not be generalizable to other high-stakes settings, as the authors focus on the ICU context.
Potential for Overemphasis on Technical Solutions
The study's focus on data- and model-based approaches may lead to an overemphasis on technical solutions, potentially overlooking the need for policy and social changes to address fairness challenges.
Expert Commentary
This article is a timely and important contribution to the field of fairness and accountability in machine learning. The rigorous case study design and the emphasis on combining data- and model-based approaches are particularly noteworthy. That said, the limited generalizability beyond the ICU setting and the risk of overemphasizing technical solutions should be kept in mind. As the field continues to evolve, it is essential to pursue both technical and policy measures to address fairness challenges in machine learning.
Recommendations
- ✓ Future studies should investigate the generalizability of the study's findings to other high-stakes settings.
- ✓ Researchers and practitioners should prioritize a combination of data- and model-based approaches to improve subgroup performance.
Sources
Original: arXiv - cs.LG