research-article

Open Access

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Authors:
Sebastian Bordt

Department of Computer Science, University of Tübingen, Germany

Department of Computer Science, University of Tübingen, Germany
View Profile

,
Michèle Finck

Law Faculty, University of Tübingen, Germany

Law Faculty, University of Tübingen, Germany
View Profile

,
Eric Raidl

Ethics and Philosophy Lab, University of Tübingen, Germany

Ethics and Philosophy Lab, University of Tübingen, Germany
View Profile

,
Ulrike von Luxburg

Department of Computer Science, University of Tübingen, Germany

Department of Computer Science, University of Tübingen, Germany
View Profile

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and TransparencyJune 2022Pages 891–905https://doi.org/10.1145/3531146.3533153

Published:20 June 2022Publication History

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

Pages 891–905

ABSTRACT

Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to “explain”. Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable to achieve the law’s objectives. Indeed, most situations where explanations are requested are adversarial, meaning that the explanation provider and receiver have opposing interests and incentives, so that the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable to achieve the transparency objectives inherent to the legal norms. Instead, there is a need to more explicitly discuss the objectives underlying “explainability” obligations as these can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion regarding the potential and limitations of post-hoc explanations in adversarial contexts, in particular in light of the current negotiations of the European Union’s draft Artificial Intelligence Act.

References

P. Achinstein. 1983. The Nature of Explanation. Oxford University Press, New York.Google Scholar
A. Adadi and M. Berrada. 2018. Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI). IEEE Access 6(2018), 52138–52160.Google ScholarCross Ref
J. Adebayo, J. Gilmer, M. Muelly, I. Goodfellow, M. Hardt, and B. Kim. 2018. Sanity checks for saliency maps. In Neural Information Processing Systems (NeurIPS).Google Scholar
A.Karimi, G. Barthe, B. Schölkopf, and I. Valera. 2021. A survey of algorithmic recourse: definitions, formulations, solutions, and prospects. arxiv:2010.04050Google Scholar
C. Anders, P. Pasliev, A. K. Dombrowski, K. R. Müller, and P. Kessel. 2020. Fairwashing explanations with off-manifold detergent. In International Conference on Machine Learning (ICML).Google Scholar
S. Barocas, M. Hardt, and A. Narayanan. 2019. Fairness and Machine Learning. fairmlbook.org. http://www.fairmlbook.org.Google Scholar
S. Barocas, A. Selbst, and M. Raghavan. 2020. The hidden assumptions behind counterfactual explanations and principal reasons. In ACM Conference on Fairness, Accountability, and Transparency.Google Scholar
R. B. Braithwaite. 1953. Scientific Explanation: A Study of the Function of Theory, Probability and Law in Science. Cambridge University Press, Cambridge.Google Scholar
O. Camburu, E. Giunchiglia, J. Foerster, T. Lukasiewicz, and P. Blunsom. 2019. Can I trust the explainer? Verifying post-hoc explanatory methods. arXiv:1910.02065 (2019).Google Scholar
L. Chazette, W. Brunotte, and T. Speith. 2021. Exploring explainability: A definition, a model, and a knowledge catalogue. In IEEE 29th International Requirements Engineering Conference (RE).Google Scholar
European Commission. 2020. White Paper on Artificial Intelligence-A European approach to excellence and trust. Com (2020) 65 Final (2020).Google Scholar
I. Covert, S. Lundberg, and S.I. Lee. 2021. Explaining by removing: A unified framework for model explanation. Journal of Machine Learning Research (JMLR) 22, 209 (2021), 1–90.Google Scholar
F. Ding, M. Hardt, J. Miller, and L. Schmidt. 2021. Retiring Adult: New Datasets for Fair Machine Learning. In Neural Information Processing Systems (NeurIPS).Google Scholar
L. Edwards and M. Veale. 2017. Slave to the algorithm: Why a right to an explanation is probably not the remedy you are looking for. Duke Law and Technology Review 16 (2017).Google Scholar
D. Garreau and U. von Luxburg. 2020. Explaining the Explainer: A First Theoretical Analysis of LIME. In Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
S. Ghalebikesabi, L. Ter-Minassian, K. DiazOrdaz, and C. C. Holmes. 2021. On locality of local explanation models. In Advances in Neural Information Processing Systems (NeurIPS).Google Scholar
C. Hempel. 1965. Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. Free Press, New York.Google Scholar
M. Hildebrandt. 2019. Privacy as protection of the incomputable self: From agnostic to agonistic machine learning. Theoretical Inquiries in Law 20, 1 (2019), 83–121.Google ScholarCross Ref
A. Z. Jacobs and H. Wallach. 2021. Measurement and fairness. In ACM conference on Fairness, Accountability, and Transparency.Google Scholar
D. Janzing, L. Minorics, and P. Blöbaum. 2020. Feature relevance quantification in explainable AI: A causal problem. In International Conference on Artificial Intelligence and Statistics (AISTATS).Google Scholar
M. Kaminski and J. Urban. 2021. The Right to Contest AI. Columbia Law Review (2021).Google Scholar
L. Kästner, M. Langer, V. Lazar, A. Schomäcker, T. Speith, and S. Sterz. 2021. On the Relation of Trust and Explainability: Why to Engineer for Trustworthiness. In IEEE 29th International Requirements Engineering Conference Workshops (REW).Google Scholar
J. Kleinberg, J. Ludwig, S. Mullainathan, and C. Sunstein. 2018. Discrimination in the Age of Algorithms. Journal of Legal Analysis 10 (2018), 113–174.Google ScholarCross Ref
R. Kommiya Mothilal, D. Mahajan, C. Tan, and A. Sharma. 2021. Towards unifying feature attribution and counterfactual explanations: Different means to the same end. In AAAI/ACM Conference on AI, Ethics, and Society.Google Scholar
S. Krishna, T. Han, A. Gu, J. Pombra, S. Jabbari, S. Wu, and H. Lakkaraju. 2022. The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective. arXiv preprint arXiv:2202.01602(2022).Google Scholar
M. Langer, D. Oster, T. Speith, H. Hermanns, L. Kästner, E. Schmidt, A. Sesing, and K. Baum. 2021. What do we want from Explainable Artificial Intelligence (XAI)? – A stakeholder perspective on XAI and a conceptual model guiding interdisciplinary XAI research. Artificial Intelligence 296 (2021).Google Scholar
E. Lee, D. Braines, Mi. Stiffler, A. Hudler, and D. Harborne. 2019. Developing the sensitivity of LIME for better machine learning explanation. In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications.Google Scholar
D. Lewis. 1973. Counterfactuals. Blackwell.Google Scholar
Q. V. Liao and K. R. Varshney. 2021. Human-Centered Explainable AI (XAI): From Algorithms to User Experiences. arXiv preprint arXiv:2110.10790(2021).Google Scholar
S. Lundberg and S. Lee. 2017. A unified approach to interpreting model predictions. In Neural Information Processing Systems (NeurIPS).Google Scholar
S. M. Lundberg, G. Erion, H. Chen, A. DeGrave, J. M. Prutkin, B. Nair, R. Katz, J. Himmelfarb, N. Bansal, and S. I. Lee. 2020. From local explanations to global understanding with explainable AI for trees. Nature machine intelligence 2, 1 (2020), 56–67.Google Scholar
G. Malgieri and G. Comandé. 2017. Why a Right to Legibility of Automated Decision-Making Exists in the General Data Protection Regulation. International Data Privacy Law 7, 4 (11 2017), 243–265.Google Scholar
C. Molnar. 2020. Interpretable machine learning. Lulu.com.Google Scholar
R. Mothilal, A. Sharma, and C. Tan. 2020. Explaining machine learning classifiers through diverse counterfactual explanations. In ACM Conference on Fairness, Accountability, and Transparency.Google Scholar
High-Level Expert Group on AI. 2019. Ethics Guidelines for Trustworthy AI.Google Scholar
Working Party. 2016. Guidelines on Automated individual decision-making and Profiling for the purposes of RegulationGuidelines on Automated individual decision-making and Profiling for the purposes of Regulation 2016/679.Google Scholar
A. Paullada, I. Raji, E. Bender, E.and Denton, and A. Hanna. 2021. Data and its (dis) contents: A survey of dataset development and use in machine learning research. Patterns 2, 11 (2021).Google Scholar
J. Pearl. 2000. Causality: Models, Reasoning and Inference. Cambridge University Press, Cambridge.Google ScholarDigital Library
K. Popper. 1959. The Logic of Scientific Discovery. Hutchinson, London.Google Scholar
A. Reutlinger and J. Saatsi. 2018. Explanation Beyond Causation; Philosophical Perspectives on Non-Causal Explanations. Oxford University Press, Oxford.Google Scholar
M. T. Ribeiro, S. Singh, and C. Guestrin. 2016. Why should i trust you? Explaining the predictions of any classifier. In 22nd ACM SIGKDD international conference on knowledge discovery and data mining.Google Scholar
C. Rudin. 2019. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1, 5 (2019), 206–215.Google ScholarCross Ref
W. Salmon. 1971. Statistical Explanation and Statistical Relevance. University of Pittsburgh Press, Pittsburgh, PA.Google Scholar
W. Salmon. 1989. Four Decades of Scientific Explanation. In Scientific Explanation, Kitcher and Salmon (Eds.). Minnesota Studies in the Philosophy of Science, Vol. 13. University of Minnesota Press, 3–219.Google Scholar
A. Selbst and J. Powles. 2018. Meaningful Information and the Right to Explanation. In ACM Conference on Fairness, Accountability, and Transparency.Google Scholar
D. Slack, A. Hilgard, S. Singh, and H. Lakkaraju. 2021. Reliable post hoc explanations: Modeling uncertainty in explainability. In Neural Information Processing Systems (NeurIPS).Google Scholar
D. Slack, S. Hilgard, E. Jia, S. Singh, and H. Lakkaraju. 2020. Fooling lime and shap: Adversarial attacks on post hoc explanation methods. In AAAI/ACM Conference on AI, Ethics, and Society.Google Scholar
D. Slack, S. Hilgard, H. Lakkaraju, and S. Singh. 2021. Counterfactual Explanations Can Be Manipulated. arXiv:2106.02666 (2021).Google Scholar
P. Spirtes, C. Glymour, and R. Scheines. 1993. Causation, Prediction, and Search. Springer, Berlin.Google Scholar
W. Spohn. 1980. Stochastic independence, causal independence, and shieldability. Journal of Philosophical Logic 9 (1980), 73–99.Google ScholarCross Ref
M. Sundararajan and A. Najmi. 2020. The many Shapley values for model explanation. In International Conference on Machine Learning (ICML).Google Scholar
R. Tomsett, D. Braines, D. Harborne, A. Preece, and S. Chakraborty. 2018. Interpretable to Whom? A Role-based Model for Analyzing Interpretable Machine Learning Systems. In ICML Workshop on Human Interpretability in Machine Learning.Google Scholar
P. Tschandl, C. Rinner, Z. Apalla, G. Argenziano, N. Codella, A. Halpern, M. Janda, A. Lallas, C. Longo, J. Malvehy, J. Paoli, S. Puig, C. Rosendahl, H. Soyer, I. Zalaudek, and H. Kittler. 2020. Human–computer collaboration for skin cancer recognition. Nature Medicine 26, 8 (2020), 1229–1234.Google ScholarCross Ref
M. Veale and F. Zuiderveen Borgesius. 2021. Demystifying the Draft EU Artificial Intelligence Act—Analysing the good, the bad, and the unclear elements of the proposed approach. Computer Law Review International 22, 4 (2021), 97–112.Google ScholarCross Ref
S. Venkatasubramanian and M. Alfano. 2020. The Philosophical Basis of Algorithmic Recourse. In ACM Conference on Fairness, Accountability, and Transparency.Google Scholar
G. Vilone and L. Longo. 2021. Notions of explainability and evaluation approaches for explainable artificial intelligence. Information Fusion 76(2021), 89–106.Google ScholarDigital Library
W. J. von Eschenbach. 2021. Transparency and the Black Box Problem: Why We Do Not Trust AI. Philos. Technol. 34(2021), 1607–1622.Google Scholar
U. von Luxburg, R. Williamson, and I. Guyon. 2012. Clustering: Science or Art?JMLR Workshop and Conference Proceedings (Workshop on Unsupervised Learning and Transfer Learning)(2012), 65 – 79.Google Scholar
S. Wachter, B. Mittelstadt, and L. Floridi. 2017. Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation. International Data Privacy Law 7, 2 (06 2017), 76–99.Google Scholar
S. Wachter, B. Mittelstadt, and C. Russell. 2017. Counterfactual explanations without opening the black box: Automated decisions and the GDPR. Harv. JL & Tech. 31(2017), 841.Google Scholar
J. Woodward. 2003. Making Things Happen: A Theory of Causal Explanation. Oxford University Press.Google Scholar
J. Woodward and L. Ross. 2003. Scientific Explanation. The Stanford Encyclopedia of Philosophy (Summer Edition 2021) (2003). https://plato.stanford.edu/archives/sum2021/entries/scientific-explanation/Google Scholar
C. Zednik and H. Boelsen. forthcoming. Scientific Exploration and Explainable Artificial Intelligence. Minds and Machines(forthcoming).Google Scholar
Y. Zhang, K. Song, Y. Sun, S. Tan, and M. Udell. 2019. Why Should You Trust My Explanation? Understanding Uncertainty in LIME Explanations. arXiv preprint arXiv:1904.12991(2019).Google Scholar

Index Terms

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Index terms have been assigned to the content through auto-classification.

Recommendations

When Should We Use Linear Explanations?
CIKM '22: Proceedings of the 31st ACM International Conference on Information & Knowledge Management

The increasing interest in transparent and fair AI systems has propelled the research in explainable AI (XAI). One of the main research lines in XAI is post-hoc explainability, the task of explaining the logic of an already deployed black-box model. ...
Read More
Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities
While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the ...
Read More
Counterfactual Explanations for Reinforcement Learning Agents
AAMAS '23: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems

Reinforcement learning (RL) algorithms often use neural networks to represent agent's policy, making them difficult to interpret. Counterfactual explanations are human-friendly explanations which offer users actionable advice on how to change their ...
Read More

Comments

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

Published in

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022
2351 pages
ISBN:9781450393522
DOI:10.1145/3531146

Copyright © 2022 Owner/Author
This work is licensed under a Creative Commons Attribution International 4.0 License.
Sponsors
In-Cooperation
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 20 June 2022
Check for updates
Author Tags
Artificial Intelligence Act
Counterfactual Explanations
Explainability
GDPR
LIME
Regulation
SHAP
Transparency
Qualifiers
- research-article
- Research
- Refereed limited
Conference
Funding Sources
Other Metrics
View Article Metrics

Article Metrics
- 17
  Total Citations
  View Citations
- 1,391
  Total Downloads
- Downloads (Last 12 months)1,071
- Downloads (Last 6 weeks)117
Other Metrics
View Author Metrics
Cited By
View all

PDF Format

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

HTML Format

View this article in HTML Format .

View HTML Format

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

ABSTRACT

References

Cited By

Index Terms

Recommendations

When Should We Use Linear Explanations?

Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

Counterfactual Explanations for Reinforcement Learning Agents

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Other Metrics

Article Metrics

Other Metrics

Cited By

PDF Format

eReader

Digital Edition

HTML Format

Caption

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency

ABSTRACT

References

Cited By

Index Terms

Recommendations

When Should We Use Linear Explanations?

Redefining Counterfactual Explanations for Reinforcement Learning: Overview, Challenges and Opportunities

Counterfactual Explanations for Reinforcement Learning Agents

Comments

Login options

Full Access

Published in

Sponsors

In-Cooperation

Publisher

Publication History

Check for updates

Author Tags

Qualifiers

Conference

Funding Sources

Article Metrics

Other Metrics

PDF Format

eReader

Digital Edition

HTML Format

Share this Publication link

Share on Social Media