IMAGE CAPTION GENERATION USING DEEP LEARNING ALGORITHM

Authors

  • Shan-E-Fatima, Kratika Gupta, Deepti Goyal, Dr. Suman Kumar Mishra

DOI:

https://doi.org/10.84761/938ybq74

Abstract

This study investigates the effectiveness of an image captioning model that combines the VGG16 and LSTM architectures, evaluated on the Flickr8K dataset. Careful experimentation and evaluation yielded insights into the model's capabilities and limitations in generating descriptive captions for images, and the findings contribute to the broader understanding of image captioning techniques while offering guidance for future advances in the field. The Flickr8K dataset, comprising 8,000 images each paired with textual descriptions, served as the foundation. The pipeline consisted of data preprocessing, feature extraction with VGG16, and training of the LSTM decoder, with model parameters and hyperparameters tuned to achieve optimal performance. Performance was assessed with the BLEU score, a semantic similarity score, and ROUGE scores. The BLEU score indicated moderate overlap with the reference captions, while the semantic similarity score showed that the generated captions closely matched the reference meanings; the ROUGE scores, however, revealed difficulties in maintaining coherence and capturing higher-order linguistic structure. The implications of this research extend to computer vision, natural language processing, and human-computer interaction: by bridging the semantic gap between visual content and textual descriptions, image captioning models can enhance accessibility, improve image understanding, and facilitate human-machine communication. Although the model captures semantic content well, there remains room for improvement, including refining the model architecture, integrating attention mechanisms, and leveraging larger datasets. Continued innovation in image captioning promises more capable systems with applications across industries and disciplines.
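As a rough illustration of the pipeline the abstract describes, the sketch below shows how VGG16 image features and an LSTM decoder are commonly combined in Keras. The layer sizes, vocabulary size, and maximum caption length are placeholder assumptions for illustration, not the configuration reported in the paper.

```python
# Minimal sketch of a VGG16 + LSTM captioning pipeline (illustrative only;
# sizes and hyperparameters are assumptions, not the paper's settings).
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, LSTM, Embedding, Dropout, add

# 1. Feature extraction: VGG16 without its classification layer yields a
#    4096-dimensional feature vector per image (output of the fc2 layer).
vgg = VGG16()
feature_extractor = Model(inputs=vgg.inputs, outputs=vgg.layers[-2].output)

def extract_features(image_path):
    image = load_img(image_path, target_size=(224, 224))
    image = preprocess_input(np.expand_dims(img_to_array(image), axis=0))
    return feature_extractor.predict(image, verbose=0)  # shape (1, 4096)

# 2. Caption decoder: the image features and a partial caption are merged
#    and passed through an LSTM to predict the next word.
vocab_size = 8000   # hypothetical vocabulary size
max_length = 34     # hypothetical maximum caption length

image_input = Input(shape=(4096,))
img_branch = Dropout(0.5)(image_input)
img_branch = Dense(256, activation="relu")(img_branch)

caption_input = Input(shape=(max_length,))
txt_branch = Embedding(vocab_size, 256, mask_zero=True)(caption_input)
txt_branch = Dropout(0.5)(txt_branch)
txt_branch = LSTM(256)(txt_branch)

decoder = add([img_branch, txt_branch])
decoder = Dense(256, activation="relu")(decoder)
output = Dense(vocab_size, activation="softmax")(decoder)

caption_model = Model(inputs=[image_input, caption_input], outputs=output)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
```

In practice the decoder is trained on (image feature, partial caption) to next-word pairs built from the Flickr8K captions, and generated captions can then be scored against the reference captions with, for example, NLTK's corpus_bleu and the rouge-score package, mirroring the BLEU and ROUGE evaluation summarized above.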


Published

2024

Section

Articles

How to Cite

IMAGE CAPTION GENERATION USING DEEP LEARNING ALGORITHM. (2024). Ianna Journal of Interdisciplinary Studies, ISSN (O): 2735-9891, ISSN (P): 2735-9883, 5(2), 179-194. https://doi.org/10.84761/938ybq74
