Title Image Visual Description Model
Authors Rishabh Kanodiya;Smriti Mittal;Shikha Jain
DOI https://doi.org/10.5573/IEIESPC.2020.9.2.169
Page pp.169-176
ISSN 2287-5255
Keywords Computer vision; CNN; Deep learning; GRU; Image context; Image captioning; Visual description; Validity measures
Abstract Image captioning is an active area of research. With the evolution of machine learning and deep learning, different models have been applied to improve the accuracy and time complexity of caption generation. However, further improvement in both accuracy and time complexity remains an open research challenge. This paper's contribution is twofold. First, we propose an image captioning model (ImgCap) that uses a VGG16 Convolutional Neural Network (CNN) and a Gated Recurrent Unit (GRU) to generate captions. Second, we propose a similarity metric (SimM) to compare the generated captions with the expected ones. Furthermore, the proposed model is compared with an existing Long Short-Term Memory (LSTM)-based model. We observe that the proposed model outperforms the existing one in terms of both accuracy and time complexity.
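The encoder-decoder idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' ImgCap implementation: it is a minimal pure-Python GRU decoder with greedy word selection, written under several assumptions. The image feature vector stands in for VGG16 output already projected to the hidden size, the weights are random and untrained (so the words produced are meaningless), biases are omitted, and the names `GRUCell`, `greedy_caption`, `embed`, and `w_out` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    # Small random weights; a trained model would learn these instead.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A bias-free GRU cell with update gate z, reset gate r, candidate n."""

    def __init__(self, input_size, hidden_size, rng):
        self.Wz = rand_matrix(hidden_size, input_size, rng)
        self.Uz = rand_matrix(hidden_size, hidden_size, rng)
        self.Wr = rand_matrix(hidden_size, input_size, rng)
        self.Ur = rand_matrix(hidden_size, hidden_size, rng)
        self.Wn = rand_matrix(hidden_size, input_size, rng)
        self.Un = rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x, h):
        z = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b)
             for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b)
             for a, b in zip(matvec(self.Wn, x), matvec(self.Un, reset_h))]
        # New state: interpolate between the candidate and the previous state.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def greedy_caption(image_features, cell, embed, w_out, vocab, max_len=10):
    # Assumption: the hidden state is initialised from the image features.
    h = list(image_features)
    token = vocab.index("<start>")
    words = []
    for _ in range(max_len):
        h = cell.step(embed[token], h)
        logits = matvec(w_out, h)
        # Greedy decoding: pick the highest-scoring word at each step.
        token = max(range(len(vocab)), key=lambda i: logits[i])
        if vocab[token] == "<end>":
            break
        words.append(vocab[token])
    return words


rng = random.Random(0)
vocab = ["<start>", "<end>", "a", "dog", "runs"]
cell = GRUCell(input_size=4, hidden_size=6, rng=rng)
embed = rand_matrix(len(vocab), 4, rng)   # one embedding row per word
w_out = rand_matrix(len(vocab), 6, rng)   # hidden state -> word scores
features = [0.5] * 6                      # stand-in for projected CNN features
print(greedy_caption(features, cell, embed, w_out, vocab, max_len=5))
```

A GRU has two gates where an LSTM has three plus a separate cell state, so each decoding step involves fewer matrix products. That is one plausible reason a GRU decoder could run faster, though the abstract itself does not attribute the reported speedup to any specific cause.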