AI Explainability through Signal Processing


Large language models (LLMs) have revolutionized machine learning due to their ability to capture complex interactions between input features. Popular post-hoc explanation methods such as SHAP provide marginal feature attributions, but their extensions to interaction importances scale only to small input lengths (around 20 features). We have introduced Spectral Explainer (SPEX), a model-agnostic interaction attribution algorithm that efficiently scales to large input lengths (around 1000 features). SPEX exploits the natural sparsity of interactions, which is common in real-world data, and applies a sparse Fourier transform built on a channel decoding algorithm to efficiently identify the important interactions.
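To make the sparsity idea concrete, here is a minimal, self-contained sketch in Python. It is not the SPEX algorithm (which recovers the large coefficients from far fewer masked queries via its channel-decoding-based sparse Fourier transform); it simply brute-forces the Fourier transform of a toy masked value function over the Boolean cube to show that only a handful of interaction coefficients are nonzero. The `value_function`, the choice of n = 6 features, and the threshold are illustrative assumptions, not part of SPEX.

```python
import itertools
import numpy as np

n = 6  # number of input features (brute force is only feasible for small n)

def value_function(mask):
    """Toy 'model' evaluated on a masked input: a main effect on feature 0
    plus a strong pairwise interaction between features 2 and 4.
    (Illustrative assumption, not part of SPEX.)"""
    return 1.5 * mask[0] + 2.0 * mask[2] * mask[4]

# Query the model on all 2^n masks (SPEX avoids this exhaustive enumeration).
masks = np.array(list(itertools.product([0, 1], repeat=n)))
values = np.array([value_function(m) for m in masks])

# Boolean Fourier coefficient of a subset S of features:
#   f_hat(S) = 2^{-n} * sum_m f(m) * (-1)^{|S intersect m|}
coeffs = {}
for r in range(n + 1):
    for S in itertools.combinations(range(n), r):
        parity = (-1) ** masks[:, list(S)].sum(axis=1)
        coeffs[S] = float(parity @ values) / len(masks)

# Only a few of the 2^n coefficients are non-negligible: this is the sparsity
# among interactions that SPEX exploits to scale to ~1000 features.
important = {S: c for S, c in coeffs.items() if abs(c) > 1e-8}
print(important)  # expect nonzero weights only for (), (0,), (2,), (4,), (2, 4)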

Check out our code on the SHAP-IQ repository!

Publications

  1. L. Butler*, A. Agarwal*, J.S. Kang*, Y.E. Erginbas, B. Yu, K. Ramchandran, “ProxySPEX: Inference-Efficient Interpretability via Sparse Feature Interactions in LLMs”, arXiv 2025.

  2. J.S. Kang*, L. Butler*, A. Agarwal*, Y.E. Erginbas, R. Pedarsani, K. Ramchandran, B. Yu, “SPEX: Scaling Feature Interaction Explanations for LLMs”, ICML 2025.

  3. J.S. Kang, Y.E. Erginbas, L. Butler, R. Pedarsani, K. Ramchandran, “Learning to Understand: Identifying Interactions via the Möbius Transform”, NeurIPS 2024.