AMCIS 2024 TREOs

Clustered interpretability: Developing metrics and a framework for explainable machine learning in clustered datasets

Matthew Baucum, Colorado State University

Abstract

Cluster-heterogeneous data (i.e., datasets in which feature distributions systematically differ across subgroups) is commonly encountered in machine learning research. Because cluster-based heterogeneity complicates data analysis, it is especially important that machine learning models trained on clustered datasets follow best practices for interpretability and explainability. Yet the literature offers little guidance on how the principles of interpretable machine learning (IML) should be applied to models trained on clustered datasets. Most IML research focuses on interpretability at either the global level (i.e., understanding feature-outcome relationships across the entire dataset) or the local level (i.e., understanding feature-outcome relationships for individual training instances), with no well-established techniques for understanding feature-outcome relationships at the cluster level. Other research has focused on the interpretability of unsupervised clustering (i.e., clustering features based on their data values), but quantifying and optimizing the interpretability of clusters based on their feature-outcome relationships remains an open research question. In this work, we propose novel metrics for quantifying the interpretability of cluster solutions based on the clusters’ heterogeneous feature-outcome relationships. We then develop a metaheuristic algorithm for fitting cluster partitions that optimize these metrics. Our approach emphasizes both within-cluster interpretability (i.e., ensuring each cluster is well-described by a parsimonious set of feature-outcome relationships) and between-cluster interpretability (i.e., ensuring that clusters’ feature-outcome relationships differ in understandable ways).
Using two real-world healthcare datasets, we demonstrate that machine learning models trained on interpretable cluster solutions perform as well as (or better than) models trained on traditionally fit clusters (e.g., k-means and DBSCAN), and that the heterogeneous feature-outcome relationships across clusters are more interpretable under our approach. We discuss our framework’s implications for the information systems field’s increasing emphasis on explainable AI-driven decision making (Bauer et al. 2023).
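
To make the distinction between global and cluster-level feature-outcome relationships concrete, the following is a minimal illustrative sketch, not the metrics or algorithm proposed in this work. It assumes a synthetic dataset with two known clusters (hypothetical labels "A" and "B") in which a single feature has opposite effects on the outcome, and it compares the least-squares slope fit on the pooled data with the slopes fit within each cluster. All function names and parameter values are assumptions introduced for illustration.

```python
import random
from statistics import mean


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (single feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def make_clustered_data(n_per_cluster=200, seed=0):
    """Synthetic rows (cluster, x, y); the true slope differs by cluster.

    The cluster labels and slopes (+2 and -2) are hypothetical values
    chosen only to make the contrast visible.
    """
    rng = random.Random(seed)
    rows = []
    for cluster, slope in (("A", 2.0), ("B", -2.0)):
        for _ in range(n_per_cluster):
            x = rng.uniform(-1.0, 1.0)
            y = slope * x + rng.gauss(0.0, 0.1)
            rows.append((cluster, x, y))
    return rows


def slopes_by_level(rows):
    """Return (global slope, {cluster label: within-cluster slope})."""
    global_slope = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])
    cluster_slopes = {}
    for label in sorted({c for c, _, _ in rows}):
        xs = [x for c, x, _ in rows if c == label]
        ys = [y for c, _, y in rows if c == label]
        cluster_slopes[label] = ols_slope(xs, ys)
    return global_slope, cluster_slopes


if __name__ == "__main__":
    g, per_cluster = slopes_by_level(make_clustered_data())
    print(f"global slope: {g:.2f}")
    for label, s in per_cluster.items():
        print(f"cluster {label} slope: {s:.2f}")
```

In this toy setting the pooled slope is close to zero even though the feature strongly predicts the outcome within each cluster, which is the kind of relationship a purely global explanation would miss. Real clustered datasets rarely come with known cluster labels, which is why the choice of cluster partition matters.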

Paper Number

tpp1397

Recommended Citation

Baucum, Matthew, "Clustered interpretability: Developing metrics and a framework for explainable machine learning in clustered datasets" (2024). AMCIS 2024 TREOs. 83.
https://aisel.aisnet.org/treos_amcis2024/83
