A Large-Scale Music Retrieval System Based on Machine Learning Methods

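In the era of big data, the volume of multimedia information on the Internet is growing exponentially, and finding a specific piece of multimedia content accurately has become an important research problem.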

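This system builds on the theoretical framework of cover song identification. It uses content-based musical features of songs to factor out the differences in timbre, key, and minor structure that arise when a song is performed with different instruments, in different languages, or by different singers, and it searches the database for songs whose melodic features are similar to those of the input song.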

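In content-based music retrieval, songs differ in duration, so previous studies compared the input song against every song in the database with a high-complexity matching procedure to compute song-to-song similarity, and then output a list of the most similar songs in the database. This approach maximizes identification accuracy but consumes too many computational resources to be feasible on a large-scale database. This study proposes a system that quickly retrieves specific similar songs from a large-scale database. The system extracts spectral features from the music and converts them into fixed-length vectors with a two-dimensional Fourier transform, then uses machine learning to strengthen the pattern features of these vectors, thereby projecting all songs into a single vector space. Finally, the system directly compares the vector distances between songs and returns the most similar music as the result list.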
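The step that turns variable-length songs into fixed-length vectors can be sketched as follows. This is a minimal illustration, not the system's confirmed implementation: the 12-bin chroma input, the patch length of 75 frames, the hop size, and the median pooling over patches are all assumptions chosen for the example, and `fixed_length_embedding` is a hypothetical name. The sketch relies on one known property of the 2D Fourier transform: its magnitude is unchanged by circular shifts, so a key transposition (a circular shift along the pitch axis) leaves the vector unchanged.

```python
import numpy as np

def fixed_length_embedding(chroma, patch_len=75, hop=1):
    """Map a (12, n_frames) chroma matrix to a vector of length 12 * patch_len.

    patch_len, hop, and median pooling are illustrative assumptions.
    """
    n_bins, n_frames = chroma.shape
    if n_frames < patch_len:
        # Zero-pad songs shorter than one patch along the time axis.
        chroma = np.pad(chroma, ((0, 0), (0, patch_len - n_frames)))
        n_frames = patch_len
    # Slice the song into overlapping patches of equal size.
    starts = range(0, n_frames - patch_len + 1, hop)
    patches = [chroma[:, s:s + patch_len] for s in starts]
    # The 2D FFT magnitude discards phase, and with it circular shifts.
    mags = np.array([np.abs(np.fft.fft2(p)).ravel() for p in patches])
    # Pool over patches so every song yields one vector of the same length.
    return np.median(mags, axis=0)

rng = np.random.default_rng(0)
song = rng.random((12, 200))             # stand-in for real chroma features
transposed = np.roll(song, 3, axis=0)    # same song, three semitones higher
same = np.allclose(fixed_length_embedding(song),
                   fixed_length_embedding(transposed))
```

In this sketch, songs of any duration map to vectors of the same dimension, which is what allows a plain vector distance to replace pairwise alignment.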

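The system not only greatly improves the efficiency of content-based music retrieval but also explores the potential of combining music retrieval with machine learning.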

Keywords: music information retrieval, cover song identification, two-dimensional Fourier transform, machine learning.

 

Large-Scale Music Retrieval System Using Machine Learning Approaches

In this work, we propose a music retrieval system that searches for similar music in a large-scale database.

Large-scale similar-music recognition must compute a song-to-song similarity that accommodates differences in timing, key, and tempo. A simple vector distance measure is not powerful enough for this task, while expensive solutions such as dynamic time warping do not scale to millions of instances, which makes them unsuitable for commercial-scale applications. In this work, we use content-based music features of songs as input and transform them into semantic vectors with the 2D Fourier transform. We also explore different machine learning approaches to learn and reinforce the patterns in these semantic vectors. By projecting the songs into the semantic vector space, we can use an efficient nearest-neighbor algorithm to compare song similarity and retrieve the most similar songs in a large-scale database.
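Once every song is a fixed-length vector, retrieval reduces to a nearest-neighbor query. The sketch below is a brute-force version under stated assumptions: Euclidean distance, a database held in memory as a NumPy array, and a hypothetical function name `top_k_similar`. The abstract does not specify which distance or nearest-neighbor algorithm the system uses, and a production system at this scale would likely use an indexed or approximate search instead.

```python
import numpy as np

def top_k_similar(query, database, k=3):
    """Return indices and distances of the k database vectors closest to query.

    query: (d,) vector; database: (n_songs, d) matrix.
    Euclidean distance is an illustrative assumption.
    """
    dists = np.linalg.norm(database - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

rng = np.random.default_rng(0)
database = rng.normal(size=(1000, 32))    # stand-in for song vectors
# A slightly perturbed copy of song 42 plays the role of a cover version.
query = database[42] + 0.01 * rng.normal(size=32)
indices, distances = top_k_similar(query, database, k=3)
```

Each query costs one pass over the database, linear in the number of songs, in contrast to alignment methods such as dynamic time warping, whose per-pair cost also grows with song length.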

The proposed system is not only efficient enough to perform scalable content-based music retrieval, but also demonstrates the potential of machine learning approaches to make similar-music recognition faster and more accurate.

Keywords: music information retrieval, cover song identification, 2D Fourier transform, machine learning