XAI for Image Captioning using SHAP

Christine Dewi, Rung Ching Chen*, Hui Yu, Xiaoyi Jiang

*Corresponding author for this work

    Research output: Contribution to journal > Article > peer-review

    Abstract

    In computer vision (CV) and natural language processing (NLP), image captioning is the task of generating a textual description of a given image. Captioning requires recognizing the significant objects in an image, their attributes, and the relationships among them, and it must also produce sentences that are valid in both syntax and semantics. Deep-learning-based approaches can deal with the intricacies and problems of image captioning; they have been widely applied to the task in recent years, and the results have been relatively positive. This article presents a simple and effective Explainable Artificial Intelligence (XAI) technique for image-to-text captioning. We obtain captions from Azure Cognitive Services and from an open-source image captioning model, and we explain the image-to-text output using SHapley Additive exPlanations (SHAP). To evaluate sentence similarity, we compute cosine similarity with spaCy and with Term Frequency-Inverse Document Frequency (TF-IDF) vectors. We find that Azure Cognitive Services provides better descriptions of images than the open-source image captioning model.
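The TF-IDF cosine similarity mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, a simple whitespace tokenizer, and a common smoothed IDF variant, all of which are assumptions made for illustration. The function names (`tokenize`, `tfidf_vectors`, `caption_similarity`) and the example captions are hypothetical.

```python
import math
from collections import Counter

_PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (word.strip(_PUNCT) for word in text.lower().split())
    return [t for t in tokens if t]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts
        vectors.append({t: count * idf[t] for t, count in tf.items()})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def caption_similarity(caption_a, caption_b):
    """TF-IDF cosine similarity between two captions, in [0, 1]."""
    vec_a, vec_b = tfidf_vectors([caption_a, caption_b])
    return cosine_similarity(vec_a, vec_b)


if __name__ == "__main__":
    # Hypothetical captions, standing in for the outputs of two captioning systems.
    a = "a dog running on the grass"
    b = "a brown dog runs across a grassy field"
    print(round(caption_similarity(a, b), 3))
```

Identical captions score 1 (up to floating-point error) and captions with no shared tokens score 0. Because this sketch matches exact tokens only, "running" and "runs" count as different terms; an embedding-based measure such as the spaCy similarity named in the abstract would treat them as related.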

    Original language: English
    Pages (from-to): 711-724
    Number of pages: 14
    Journal: Journal of Information Science and Engineering
    Volume: 39
    Issue number: 4
    Publication status: Published - 1 Jul 2023

    Keywords

    • API
    • azure cognitive service
    • explainable artificial intelligence
    • image captioning
    • SHAP
