DOI: 10.1145/3531146.3533153
Research article
Open Access

Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts

Published: 20 June 2022

ABSTRACT

Existing and planned legislation stipulates various obligations to provide information about machine learning algorithms and their functioning, often interpreted as obligations to “explain”. Many researchers suggest using post-hoc explanation algorithms for this purpose. In this paper, we combine legal, philosophical and technical arguments to show that post-hoc explanation algorithms are unsuitable for achieving the law’s objectives. Indeed, most situations where explanations are requested are adversarial: the explanation provider and receiver have opposing interests and incentives, so the provider might manipulate the explanation for her own ends. We show that this fundamental conflict cannot be resolved because of the high degree of ambiguity of post-hoc explanations in realistic application scenarios. As a consequence, post-hoc explanation algorithms are unsuitable for achieving the transparency objectives inherent in the legal norms. Instead, the objectives underlying “explainability” obligations should be discussed more explicitly, as they can often be better achieved through other mechanisms. There is an urgent need for a more open and honest discussion of the potential and limitations of post-hoc explanations in adversarial contexts, particularly in light of the ongoing negotiations over the European Union’s draft Artificial Intelligence Act.
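
To make the ambiguity claim concrete, the following minimal sketch (not taken from the paper) uses permutation importance as a stand-in for a post-hoc explanation method; the data, model, and configuration choices are illustrative assumptions. It shows how two equally defensible, provider-chosen settings of the same method can rank the features of the same fitted model differently.

```python
# Hypothetical illustration: one fitted model, two provider-chosen
# configurations of the same post-hoc method (permutation importance),
# which can yield different feature rankings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in data and model (purely illustrative).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Two configurations the explanation provider is free to choose between.
for scoring, seed in [("accuracy", 0), ("roc_auc", 1)]:
    result = permutation_importance(model, X, y, scoring=scoring,
                                    n_repeats=10, random_state=seed)
    ranking = np.argsort(result.importances_mean)[::-1]
    print(f"scoring={scoring:<8} seed={seed}  ranking (most to least important): {ranking}")
```

Whether the rankings actually diverge depends on the data and model, but nothing in the method forces the two configurations to agree; this kind of unconstrained degree of freedom is what an adversarial explanation provider could exploit.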


Published in

FAccT '22: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency
June 2022, 2351 pages
ISBN: 9781450393522
DOI: 10.1145/3531146

Copyright © 2022 Owner/Author. This work is licensed under a Creative Commons Attribution 4.0 International License.

Publisher: Association for Computing Machinery, New York, NY, United States


