Abstract

Interpreting machine learning models remains challenging, particularly in high-stakes applications where trust and transparency are vital. We introduce Reliable Gradient Explanations (RGE), a method designed to enhance the stability and consistency of gradient-based feature importance explanations. RGE combines first-order gradient information with second-order Hessian elements, refining feature importance according to the curvature of the model output and reducing the instability seen in traditional gradient-based methods. Preliminary results indicate that RGE improves explanation accuracy and stability across different model architectures. Ongoing research aims to refine RGE, evaluate its performance on diverse datasets, and compare it with established interpretability techniques, ultimately promoting more transparent and reliable AI-driven decisions.
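The abstract does not specify RGE's exact combination rule, but the general idea of tempering first-order gradients with diagonal Hessian curvature can be sketched as follows. The function name `rge_attribution` and the particular damping rule (dividing each gradient by one plus the magnitude of the corresponding diagonal Hessian element) are illustrative assumptions, not the authors' formulation.

```python
import torch

def rge_attribution(model, x, eps=1e-6):
    """Hypothetical sketch of a curvature-adjusted gradient attribution.

    Assumption: feature importance is the first-order gradient, damped where
    the output surface is highly curved (large diagonal Hessian), since the
    gradient is less reliable there. This rule is illustrative only.
    """
    x = x.clone().detach().requires_grad_(True)
    output = model(x).squeeze()

    # First-order term: gradient of the scalar output w.r.t. each input feature.
    grad = torch.autograd.grad(output, x, create_graph=True)[0]

    # Second-order term: diagonal Hessian elements, one per input feature.
    hess_diag = torch.stack([
        torch.autograd.grad(grad.flatten()[i], x, retain_graph=True)[0].flatten()[i]
        for i in range(x.numel())
    ]).reshape(x.shape)

    # Down-weight importance where curvature is large (unstable gradient).
    return (grad / (1.0 + hess_diag.abs() + eps)).detach()
```

In this reading, features whose gradients sit in flat regions of the output surface keep their importance largely unchanged, while features in sharply curved regions are attenuated, which is one plausible way the stability gains described above could arise.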
