Content-Based Music Retrieval Based on Temporal Multi-Descriptors
With the rapid development of multimedia compression, mobile devices, and mobile networks, sharing and downloading multimedia content through streaming platforms and social media has become part of daily life. For a song heard in passing that catches a listener's interest, Content Based Music Retrieval (CBMR) can search directly on features of the song itself, such as melody and timbre, avoiding the cases where the user cannot describe suitable keywords or where the song is mislabeled.
To cope with the long matching time incurred by large retrieval databases, this study proposes using a Sparse Auto Encoder (SAE) to transform short-time Chroma features of the audio into information-rich descriptors. By learning the relatively salient features, the method improves retrieval performance while reducing the number of features to be compared, and thus the matching time. Experimental results show that the proposed method not only saves more than 50% of the matching time but also substantially improves the Mean Reciprocal Rank (MRR), indicating that longer-term features better describe the information needed for song retrieval.
Keywords: Music Retrieval, Cover Song, Neural Network, Deep Learning
Temporal Multi-Descriptors for Content Based Music Retrieval
Nowadays, sharing and downloading multimedia resources from the internet has become part of our daily life. However, it is hard to find a particular piece of music in such a tremendous amount of data when only limited information is available for searching. Content Based Music Retrieval (CBMR) can retrieve the desired music directly by using features extracted from the audio content as the search keys.
To deal with massive retrieval databases, we feed Chroma clips into a Sparse Auto Encoder (SAE), which transforms each feature clip into a descriptor before matching. This reduces the number of features to compare while learning which parts of the input are most important. Experimental results show that our method reduces matching time by over 50% and achieves a higher MRR than the traditional approach.
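The pipeline described above can be sketched in a few lines. The following is a minimal, illustrative sparse autoencoder forward pass in NumPy: a flattened Chroma clip (frames × 12 pitch classes) is encoded into a compact descriptor, and a KL-divergence penalty enforces sparsity on the hidden activations. The clip length (20 frames), descriptor size (32), sparsity target `rho`, and penalty weight `beta` are assumptions for illustration, not the thesis's actual settings; the training loop (gradient descent on this loss) is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAutoEncoder:
    """Minimal SAE sketch: maps a flattened Chroma clip to a short descriptor."""

    def __init__(self, input_dim, hidden_dim, rho=0.05, beta=3.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (input_dim, hidden_dim))  # encoder weights
        self.b1 = np.zeros(hidden_dim)
        self.W2 = rng.normal(0.0, 0.1, (hidden_dim, input_dim))  # decoder weights
        self.b2 = np.zeros(input_dim)
        self.rho, self.beta = rho, beta

    def encode(self, x):
        # Hidden activations serve as the descriptor used for matching.
        return sigmoid(x @ self.W1 + self.b1)

    def decode(self, h):
        return sigmoid(h @ self.W2 + self.b2)

    def loss(self, x):
        # Reconstruction error plus KL sparsity penalty on mean activations.
        h = self.encode(x)
        recon = np.mean((x - self.decode(h)) ** 2)
        rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)
        kl = np.sum(self.rho * np.log(self.rho / rho_hat)
                    + (1 - self.rho) * np.log((1 - self.rho) / (1 - rho_hat)))
        return recon + self.beta * kl

# Example: 100 Chroma clips of 20 frames x 12 pitch classes -> 32-dim descriptors.
clips = np.random.default_rng(1).random((100, 20 * 12))
sae = SparseAutoEncoder(input_dim=20 * 12, hidden_dim=32)
descriptors = sae.encode(clips)  # shape (100, 32): far fewer values to match
```

Matching then operates on the 32-dimensional descriptors instead of the raw 240-dimensional clips, which is where the reduction in comparison time comes from.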
Keywords: Music Retrieval, Cover Song, Neural Network, Deep Learning