Using Image Pre-classification to Improve the Accuracy of Image Captioning Systems
Keywords:
Deep Learning, Image Captioning Systems, FastText, ResNet50, CNN, LSTM, Classification effects, Classified datasets

Abstract
Deep learning for image description and captioning has recently become one of the most promising computer science applications. A captioning system consists of two parts: an image model and a text description model. In previous research, we studied the effect of using different languages and datasets on image description models. In this paper, we study the effect of classifying the image dataset on those models. To this end, a new combined dataset of 12,000 images is built from two international datasets (Flickr2k and MS-COCO). The designed models support both Arabic and English. For the description part, we use two scenarios: in the first, CNN and LSTM models are used, while in the second, ResNet50 and FastText serve as the image and text models, respectively. Training is performed on both indoor and outdoor classes. Test scenarios are applied in two cases and four ways, covering both word-by-word and sentence-by-sentence models. The performance analysis shows that the classified datasets outperform the unclassified ones for both repeating-based and non-repeating-based datasets in all scenarios.
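The pre-classification pipeline summarized above can be sketched as a simple routing step: the image is first classified as indoor or outdoor, and the caption is then generated by the model trained only on that class's images. The sketch below is purely illustrative; `classify_scene` and the per-class captioners are hypothetical stand-ins for the paper's actual CNN/LSTM and ResNet50/FastText networks.

```python
def classify_scene(image):
    """Assign an image to the 'indoor' or 'outdoor' class.

    Placeholder: a real system would run a trained CNN
    (e.g. ResNet50) on the pixels; here we use a toy rule
    on a hypothetical 'luminance' feature for illustration.
    """
    return "indoor" if image.get("luminance", 0.0) < 0.5 else "outdoor"

# One captioning model per class, each trained only on that class's
# images -- the "classified dataset" setting studied in the paper.
# The lambdas below are hypothetical stand-ins for trained models.
CAPTIONERS = {
    "indoor": lambda img: "caption from the indoor-trained model",
    "outdoor": lambda img: "caption from the outdoor-trained model",
}

def caption(image):
    """Route the image to the captioner matching its predicted class."""
    scene = classify_scene(image)
    return CAPTIONERS[scene](image)

if __name__ == "__main__":
    print(caption({"luminance": 0.2}))  # routed to the indoor model
    print(caption({"luminance": 0.9}))  # routed to the outdoor model
```

The design point is that each captioner only ever sees images from its own class, at training and at test time, which is what the abstract's comparison between classified and unclassified datasets measures.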