Image Retrieval with a Mixture of Ensemble Learning Based on Deep Convolutional Neural Networks

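With the rapid advance of the Internet and technology, portable multimedia devices such as digital cameras, tablets, and smartphones have become widespread. As a result, digital image data grows explosively every day, and we have entered the era of Big Data. Faced with large and complex image databases, managing images effectively and retrieving those that match a user's needs is a major challenge for current image retrieval techniques. To learn the best image feature descriptions and obtain stable, accurate retrieval results, this work combines convolutional neural networks of different architectures to improve feature learning.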

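This thesis proposes a Mixture of Ensemble Learning model composed of two Convolutional Neural Networks (CNNs) with different architectures. The image feature descriptions learned by the two deep networks (AlexNet and NIN) are combined by weighted averaging to obtain a feature description that represents the image better, so that correct retrieval results can be obtained quickly. Experimental results show that the CNN ensemble architecture does improve learning, giving higher image classification accuracy than a single CNN. When the features learned by the ensemble are applied to image retrieval, the mean average precision (MAP) reaches 0.867 on the CIFAR-10 database and 0.526 on the CIFAR-100 database.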
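The weighted-average fusion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `fuse_features`, the parameter `weight_a`, and its default of 0.5 are hypothetical, and the sketch assumes both networks produce feature vectors of the same length. The abstract does not state the actual weights or which layer the features are taken from.

```python
def fuse_features(feat_a, feat_b, weight_a=0.5):
    """Weighted average of two feature vectors of equal length.

    feat_a, feat_b: feature vectors from two networks (for example,
    one from AlexNet and one from NIN); equal length is an assumption.
    weight_a: weight for feat_a; feat_b receives 1 - weight_a.
    The default of 0.5 is a placeholder, not a value from the thesis.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(feat_a, feat_b)]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real network outputs.
    print(fuse_features([1.0, 0.0], [0.0, 1.0], weight_a=0.25))
```

In practice the weight would presumably be tuned on validation data, so that the stronger network contributes more to the fused descriptor.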

Keywords: image retrieval, ensemble learning, deep learning, neural networks, convolutional neural networks

Mixture of Deep CNN-based Ensemble Model for Image Retrieval

Rapid Internet deployment and technological development have led us into the era of Big Data. Tablets, smartphones, digital cameras, and other portable multimedia devices continuously produce enormous amounts of digital image data. This raises many challenges, chief among them how to manage large image datasets effectively and retrieve the images a user needs. We propose a model that combines two distinct deep Convolutional Neural Network (CNN) architectures to achieve better image retrieval performance.

This paper proposes an ensemble model based on a mixed architecture of deep CNNs. It uses two deep learning networks, AlexNet and Network In Network (NIN), to extract image features, and computes their weighted average as the feature vector for image retrieval. Our experimental results show that the ensemble architecture enhances learning and achieves higher image classification accuracy than a single CNN. The proposed Mixture of deep CNN-based Ensemble Model (MCNNE) was applied to the CIFAR-10 and CIFAR-100 datasets, where it achieved a Mean Average Precision (MAP) of 0.867 and 0.526, respectively, in image retrieval tasks.
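For readers unfamiliar with the metric, the following Python sketch shows one common way to compute Mean Average Precision for label-based retrieval. It is an assumption-laden illustration: the function names are hypothetical, a retrieved image is treated as relevant when its class label equals the query's label, and average precision is normalized by the number of relevant items found in the ranked list. The abstract does not specify the exact MAP variant used in the experiments.

```python
def average_precision(ranked_labels, query_label):
    """Average precision for one query.

    ranked_labels: class labels of the retrieved images, best match first.
    query_label: class label of the query image.
    """
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            # Precision measured at each rank where a relevant item appears.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of average precision over (query_label, ranked_labels) pairs."""
    if not queries:
        return 0.0
    total = sum(average_precision(ranked, label) for label, ranked in queries)
    return total / len(queries)


if __name__ == "__main__":
    # Toy rankings, not data from the paper.
    toy_queries = [
        ("cat", ["cat", "dog", "cat"]),
        ("dog", ["cat", "dog"]),
    ]
    print(mean_average_precision(toy_queries))
```

A score of 1.0 means every query ranks all of its relevant images ahead of any irrelevant one.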

Keywords: Content-based image retrieval, Ensemble learning, Deep learning, Neural networks, Convolutional neural networks