Fast Cover Song Retrieval in the AAC Compressed Domain Based on Deep Learning

Abstract

 

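As the volume of multimedia data grows, quickly finding the material a user is interested in within a huge database has become an increasingly important problem. Traditional retrieval methods mostly search by keyword, which requires a great deal of manual effort to annotate the data beforehand; as the amount of data grows, keyword annotation becomes less flexible. Content-based retrieval is a more natural approach, and it also avoids the problem of different people assigning different labels to the same song.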

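This thesis targets AAC, a music format common on today's Internet, and proposes fast cover song retrieval carried out in the AAC compressed domain. The method takes the MDCT coefficients obtained after partial decoding, maps them to chroma features, and then groups multiple frames into segments that serve as the input to deep learning, so that learning automatically finds key features that better represent the music. A sparse autoencoder then reduces the dimensionality of each song, which addresses the long matching time of traditional methods. Experimental results show that the proposed method achieves a retrieval performance of 0.505 in MRR and, compared with retrieval methods in the related literature, saves roughly 70% or more of the matching time.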
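The MRR figure quoted above is the mean reciprocal rank: for each query, the reciprocal of the rank at which the first correct cover version appears, averaged over all queries. The following is a minimal sketch of that standard definition; the function name `mean_reciprocal_rank` and the data layout (one ranked list and one set of relevant items per query) are illustrative assumptions, not the evaluation code used in the thesis.

```python
def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Average over queries of 1 / (rank of the first relevant item).

    ranked_lists  : list of ranked result lists, one per query (hypothetical layout)
    relevant_sets : list of sets of correct answers, one per query
    A query whose result list contains no relevant item contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for position, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


if __name__ == "__main__":
    # Two toy queries: first hit at rank 2, then at rank 1.
    results = [["a", "b", "c"], ["x", "y", "z"]]
    truth = [{"b"}, {"x"}]
    print(mean_reciprocal_rank(results, truth))
```

Under this definition, an MRR close to 0.5 means that the first correct cover appears, on a reciprocal-rank average, around the second position of the returned list.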

Keywords: music retrieval, cover songs, AAC, deep learning



 

Fast Cover Song Retrieval in AAC Domain based on Deep Learning

Abstract

 

With the growth of multimedia data, quickly finding items of interest in large databases is becoming increasingly important. Keyword annotation is the traditional approach, but it requires a large amount of manual effort to label the data. As the amount of data increases, keyword annotation becomes infeasible. Content-based retrieval is more natural: it extracts features from the music content itself to create a representation that avoids human labeling errors.

This thesis focuses on the AAC format, which is widely used by Internet streaming sources. The proposed system directly maps the modified discrete cosine transform (MDCT) coefficients, obtained by partial decoding, into a 12-dimensional chroma feature. Frames are combined into segments that serve as the input to deep learning, which can automatically find more meaningful features of the music data. A sparse autoencoder is also applied to reduce the dimensionality of each song. Together, these steps save significant matching time. The experimental results show that the proposed method reaches a mean reciprocal rank (MRR) of 0.505 and saves over 70% of the matching time compared with conventional approaches.
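The first two steps of the pipeline above (MDCT coefficients to 12-dimensional chroma, then frames to segments) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: it assumes 1024 MDCT coefficients per frame (the AAC long-window case), a 44.1 kHz sampling rate, a simple nearest-semitone folding rule, and arbitrary values for the analysed frequency range, the segment length, and the hop. The function names are hypothetical, and the sparse autoencoder stage is not shown.

```python
import numpy as np


def mdct_bin_pitch_classes(n_bins, sample_rate, f_min, f_max):
    """Pitch class (0 = C, ..., 9 = A) of each MDCT bin, plus a validity mask."""
    k = np.arange(n_bins)
    # Centre frequency of MDCT bin k for a transform with n_bins coefficients.
    freqs = (k + 0.5) * sample_rate / (2.0 * n_bins)
    midi = 69.0 + 12.0 * np.log2(freqs / 440.0)      # A4 = 440 Hz = MIDI 69
    pitch_class = np.mod(np.round(midi).astype(int), 12)
    valid = (freqs >= f_min) & (freqs <= f_max)      # assumed analysis band
    return pitch_class, valid


def chroma_from_mdct(mdct_frames, sample_rate=44100.0, f_min=130.0, f_max=5000.0):
    """Fold MDCT energy of each frame into a max-normalised 12-bin chroma vector."""
    mdct_frames = np.asarray(mdct_frames, dtype=float)   # (n_frames, n_bins)
    pitch_class, valid = mdct_bin_pitch_classes(
        mdct_frames.shape[1], sample_rate, f_min, f_max)
    energy = mdct_frames ** 2
    chroma = np.zeros((mdct_frames.shape[0], 12))
    for pc in range(12):
        chroma[:, pc] = energy[:, valid & (pitch_class == pc)].sum(axis=1)
    peak = chroma.max(axis=1, keepdims=True)
    return chroma / np.maximum(peak, 1e-12)              # avoid division by zero


def stack_segments(chroma, frames_per_segment=20, hop=10):
    """Concatenate consecutive chroma frames into fixed-length segment vectors."""
    n_frames, n_dims = chroma.shape
    starts = range(0, n_frames - frames_per_segment + 1, hop)
    segments = [chroma[s:s + frames_per_segment].ravel() for s in starts]
    return np.array(segments).reshape(len(segments), frames_per_segment * n_dims)


if __name__ == "__main__":
    frames = np.zeros((40, 1024))
    frames[:, 20] = 1.0                 # bin 20 is centred near 441 Hz (pitch class A)
    chroma = chroma_from_mdct(frames)
    segments = stack_segments(chroma)
    print(chroma.shape, segments.shape)
```

Each row of `segments` would then be one input vector for the learning stage. Note that with 1024 coefficients at 44.1 kHz the bins are about 21.5 Hz apart, so low pitches are poorly resolved by this simple folding; the `f_min` cut-off used here is an assumption made for the sketch.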

Keywords: music information retrieval, cover song, AAC, deep learning