XAI for Image Captioning using SHAP

Christine Dewi, Rung Ching Chen*, Hui Yu, Xiaoyi Jiang

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


In computer vision (CV) and natural language processing (NLP), image captioning is the task of generating a textual description of a given image. It requires recognizing the significant objects in an image, their attributes, and the relationships among them, and it must also produce sentences that are valid in both syntax and semantics. Deep-learning-based approaches have been widely applied to these challenges in recent years, with relatively positive results. This article presents a simple and effective Explainable Artificial Intelligence (XAI) technique for image-to-text captioning. We obtain image captions from Azure Cognitive Services and from an open-source image captioning model, and we implement XAI for image captioning using SHapley Additive exPlanations (SHAP). Sentence similarity is evaluated using cosine similarity computed with spaCy and with a Term Frequency-Inverse Document Frequency (TF-IDF) transform. We found that Azure Cognitive Services provides better image descriptions than the open-source image captioning model.
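The TF-IDF cosine similarity mentioned in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `caption_similarity`), the simple whitespace tokenizer, and the smoothed IDF formula are all assumptions chosen for illustration. The spaCy-based variant, which compares word-vector representations, is not shown here.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase, split on whitespace, strip punctuation.
    tokens = (w.strip(PUNCT).lower() for w in sentence.split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    # Build one TF-IDF vector per sentence over a shared, sorted vocabulary.
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    vocab = sorted(set().union(*docs))
    # Document frequency: number of sentences containing each term.
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    # Assumed smoothed IDF variant: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return [[d[t] * idf[t] for t in vocab] for d in docs]


def cosine(u, v):
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def caption_similarity(caption_a, caption_b):
    # Score two captions against each other; higher means more shared wording.
    vec_a, vec_b = tfidf_vectors([caption_a, caption_b])
    return cosine(vec_a, vec_b)
```

Under these assumptions, two identical captions score 1, captions with no words in common score 0, and partially overlapping captions fall in between. Such a score could be used to compare a generated caption against a reference description.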

Original language: English
Pages (from-to): 711-724
Number of pages: 14
Journal: Journal of Information Science and Engineering
Issue number: 4
Publication status: Published - 1 Jul 2023


  • API
  • azure cognitive service
  • explainable artificial intelligence
  • image captioning
  • SHAP
