KR101530481B1 - Music classification device using autoregressive model and method thereof

Info

Publication number: KR101530481B1
Application number: KR1020140039279A
Authority: KR
Inventors: 김무영; 변가람
Original assignee: 세종대학교산학협력단
Priority date: 2014-04-02
Filing date: 2014-04-02
Publication date: 2015-06-30

Abstract

The present invention relates to a music classification device using an autoregressive model and a method thereof. The music classification device comprises: an input unit that receives a sound source; a short-term feature extraction unit that extracts a short-term feature vector describing the timbre of the sound source received by the input unit; a long-term feature extraction unit that extracts a long-term feature vector from the short-term feature vector; an AR modeling unit that extracts linear prediction coefficients (LPC) from the extracted short-term feature vector, models the extracted LPC with an autoregressive model to produce a new feature vector of increased order, and converts it into LSP parameters; a feature selection unit that selects top feature vectors having a high recognition rate from among the short-term feature vector, the long-term feature vector, and the new feature vector; a model generation unit that produces a music classification model from the selected feature vectors; and a music classification unit that classifies the genre or mood of input test music based on the classification model. According to the present invention, selecting the top feature vectors after extracting the feature vectors of the music reduces the amount of computation required to classify the genre of input music data, which shortens the time needed to obtain a classification result while achieving a high recognition rate.

Description

The present invention relates to a music classification apparatus and method using an autoregressive model, and more particularly to a music classification apparatus and method using an autoregressive model that can reduce the amount of computation and accurately distinguish the genre or mood of input music.

With so much popular music being produced in recent years, how to classify music data effectively by category, and in particular by genre, has become an issue. Until now, people have classified music data into categories by hand. Classifying vast amounts of digital music data, however, requires algorithms that automatically classify it by composer, singer, and genre.

The boundaries of a genre are not clear-cut across countries or individuals, and a genre can be hard to define because its definition depends on culture, artist, and market. Automatic genre classification based on music data analysis can be applied to a variety of applications, such as efficient data management and music recommendation. It is also economical, because no manual classification is needed.

Genre classification based on music data analysis is being studied and developed through a variety of approaches, including feature vector extraction and classifiers. To classify a music genre or mood, the music data must first be analyzed. Music data can be analyzed according to a variety of criteria, such as beat, rhythm, meter, melody, and chord.

After the music data has been analyzed, the genre is classified with one of a variety of classifiers, such as a Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Nearest Neighbor (NN), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM).

Foote built a histogram of the 12th-order MFCC (Mel-Frequency Cepstral Coefficients) of the music and classified genres using NN as the classifier. Bagci computed 13th-order MFCC and delta values and classified genres using an Inter-Genre Similarity (IGS) model and a GMM classifier.

Jiang took the octaves of music into account and proposed OSC (Octave-based Spectral Contrast), a feature different from MFCC, which improved the genre classification success rate. Using GMM as the classifier, a genre classification success rate of about 82% was obtained for five genres: Jazz, Pop, Romantic, Baroque, and Rock.

Existing music genre classification methods or systems recognize the genre with an SVM using a variety of feature vectors such as MFCC, Chroma, and OSC. Because the recognition success rate of such systems is low, however, techniques for music genre classification methods or systems that promise a higher success rate continue to be discussed.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 2011-0013646 (published February 10, 2011).

An object of the present invention is to provide a music classification apparatus and method using an autoregressive model that reduce the amount of computation and accurately distinguish the genre or mood of input music.

To achieve the above object, a music classification apparatus according to one embodiment of the present invention includes: an input unit that receives a sound source; a short-term feature extraction unit that extracts a short-term feature vector for the timbre features of the input sound source; a long-term feature extraction unit that extracts a long-term feature vector using the short-term feature vector; an AR modeling unit that extracts linear prediction coefficients (LPC) using the extracted short-term feature vector, models the extracted LPC with an autoregressive model to generate a new feature vector of increased order, and then converts it into LSP parameters; a feature selection unit that selects top feature vectors having a high recognition rate from among the short-term feature vector, the long-term feature vector, and the new feature vector; a model generation unit that generates a music classification model using the selected top feature vectors; and a music classification unit that classifies the genre or mood of input test music based on the classification model.

The short-term feature vector, the long-term feature vector, and the new feature vector may include MFCC (Mel-Frequency Cepstral Coefficient) and DFB (Decorrelated Filter Bank) features.

The short-term feature vector includes statistical features of the timbre extracted using a texture window technique, and the statistical features may include a mean and a variance.

The long-term feature vector includes FMSFM (Feature-based Modulation Spectral Flatness Measures) and FMSCM (Feature-based Modulation Spectral Crest Measures), which are extracted using the FMS (Feature-based Modulation Spectrum); the FMSFM and FMSCM may be defined by the following equations.

The FMS and the mean of the FMS may be defined by the following equations.

Here, T is the total number of texture windows, X_t(k, p) is the k-th element of the short-term feature of the p-th frame in the t-th texture window, P_t is the total number of frames in the t-th texture window, and M is the size of the modulation Fourier transform.

The long-term feature vector includes statistical features, which may include a mean and a variance.

When the test music to be classified is input, the music classification unit may extract from the test music the feature vectors corresponding to the top feature vectors and classify the genre or mood of the input test music by comparing the extracted feature vectors against the classification model.

The feature selection unit may select the top feature vectors using an SVM (Support Vector Machine) ranker, and the music classification unit may classify the genre or mood of the test music using a one-against-one SVM.

According to another embodiment of the present invention, a music classification method using a music classification apparatus includes: extracting a short-term feature vector for the timbre features of an input sound source; extracting a long-term feature vector using the short-term feature vector; extracting linear prediction coefficients (LPC) using the extracted short-term feature vector; modeling the extracted LPC with an autoregressive model to generate a new feature vector of increased order and converting it into LSP parameters; selecting top feature vectors having a high recognition rate from among the short-term feature vector, the long-term feature vector, and the new feature vector; generating a music classification model using the selected top feature vectors; and classifying the genre or mood of input test music based on the classification model.

According to the present invention, selecting feature vectors with a high recognition rate from among the feature vectors of the music reduces the amount of computation required to classify the genre of input music, so the time needed to obtain a classification result can be shortened and a high recognition rate can be obtained.

FIG. 1 is a configuration diagram of a music classification apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a music classification method according to an embodiment of the present invention.
FIG. 3 is a waveform diagram of an input music signal.
FIG. 4 is a diagram illustrating the analysis window.
FIG. 5 is a diagram illustrating the texture window.
FIG. 6 is a block diagram of feature extraction using the MFCC algorithm according to an embodiment of the present invention.
FIG. 7 is a block diagram of feature extraction using the DFB algorithm according to an embodiment of the present invention.
FIG. 8 is a block diagram of feature extraction using the OSC algorithm according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating the LSP conversion process according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

First, the music genre recognition apparatus of the present invention is described.

FIG. 1 is a configuration diagram of a music classification apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the music classification apparatus 100 includes an input unit 110, a short-term feature extraction unit 120, a long-term feature extraction unit 130, an AR modeling unit 140, a feature selection unit 150, a model generation unit 160, and a music classification unit 170.

The input unit 110 receives a speech or music signal frame by frame. In one embodiment of the present invention, the input unit 110 may receive a person's voice directly through a microphone or receive stored music data from a music database.

The short-term feature extraction unit 120 extracts the timbre features or statistical features that correspond to the short-term features of the speech or music signal received by the input unit 110.

The long-term feature extraction unit 130 extracts long-term features using the timbre features extracted by the short-term feature extraction unit 120. The long-term feature extraction unit 130 also extracts statistical features.

The AR modeling unit 140 extracts linear prediction coefficients (LPC) using the timbre features extracted by the short-term feature extraction unit 120 and models the extracted LPC with an autoregressive (AR) model to generate a new feature vector of increased order. The AR modeling unit 140 then converts the new feature vector into LSP parameters.

The feature selection unit 150 uses an algorithm called an SVM ranker to select, from among all the feature vectors extracted by the short-term feature extraction unit 120, the long-term feature extraction unit 130, and the AR modeling unit 140, the top feature vectors having a high recognition rate. That is, the feature selection unit 150 measures the recognition rate for the short-term feature vector, the long-term feature vector, and the LSP-converted new feature vector, and selects the top feature vectors whose recognition rate is higher than a reference value. Because only the top feature vectors selected from among all the feature vectors are used for music genre classification, the feature selection unit 150 can reduce the amount of computation required for feature extraction and improve the recognition rate.
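The thresholding step described above can be sketched as follows. This is only an illustration of "keep the features whose measured recognition rate exceeds a reference value"; it does not implement the SVM ranker itself, and the function name `select_top_features` and the dictionary input format are assumptions introduced here, not part of the patent.

```python
def select_top_features(recognition_rates, threshold):
    """Return feature names whose recognition rate exceeds the threshold.

    recognition_rates: dict mapping a feature name to its measured
    recognition rate (hypothetical input format, e.g. produced by
    evaluating each feature with a ranker beforehand).
    The result is ordered from highest to lowest rate.
    """
    ranked = sorted(recognition_rates.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, rate in ranked if rate > threshold]
```

Only the selected names then need to be computed for test music, which is where the reduction in computation comes from.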

The model generation unit 160 generates each music classification model based on the top feature vectors selected by the feature selection unit 150. Specific genres may include Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal.

Based on the music classification model generated by the model generation unit 160, the music classification unit 170 uses a recognizer to classify the genre or mood of the test music from the top feature vectors extracted from the test music.

The music genre recognition method using the music classification apparatus 100 according to an embodiment of the present invention is described in more detail below.

FIG. 2 is a flowchart of a music classification method according to an embodiment of the present invention.

The music genre recognition method according to an embodiment of the present invention broadly comprises a training stage for generating a classification model and a test stage for classifying the genre or mood of input test music with the generated model. As shown in FIG. 2, the training stage comprises steps S210 to S250 and the test stage comprises steps S260 to S300.

To classify the genre or mood of music, the genre or mood must first be recognized, and to recognize it, the music classification model must be generated accurately.

To recognize music genres, the music classification apparatus 100 according to an embodiment of the present invention first extracts feature vectors from the music of each genre. The feature vectors include short-term and long-term feature vectors and the new LSP-converted feature vector derived from the short-term features.

FIG. 3 is a waveform diagram of an input music signal. FIG. 4 illustrates the analysis window, and FIG. 5 illustrates the texture window.

When a music signal such as that shown in FIG. 3 is input through the input unit 110, the short-term feature extraction unit 120 extracts a short-term feature vector for each frame through a window such as that shown in FIG. 4 (S210). Based on the short-term feature vectors extracted from each frame (analysis window), the short-term feature extraction unit 120 extracts statistical feature vectors using the texture window technique of FIG. 5. The statistical features include at least one of the mean and the variance.

Step S210, in which the timbre features corresponding to the short-term feature vector are extracted, is described in more detail below.

Short-term feature extraction according to one embodiment of the present invention includes MFCC (Mel-Frequency Cepstral Coefficients), which are mainly used in speech recognition and speaker recognition; DFB (Decorrelated Filter Bank), which uses a mel-scale band-pass filter and a high-pass filter; and OSC (Octave-based Spectral Contrast), which represents the characteristics of a music signal per octave band and is used in music recognition.

The MFCC algorithm is described below with reference to FIG. 6.

FIG. 6 is a block diagram of feature extraction using the MFCC algorithm according to an embodiment of the present invention.

First, the short-term feature extraction unit 120 applies a Hamming window to each frame in the time domain and then performs a Fast Fourier Transform (FFT). The spectrum is then scaled by a mel-scale band-pass filter bank with B bands, and the weighted sum WS(b), b = 1, ..., B, of the spectrum in each band is computed. The logarithm is then applied to these values (1 ≤ b ≤ B). Finally, the short-term feature extraction unit 120 applies a Discrete Cosine Transform (DCT) to the log weighted sums to extract a K-dimensional MFCC feature vector.
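The four steps above (Hamming window, Fourier transform, mel-scale weighted sums with logarithm, DCT) can be sketched for a single frame as follows. This is a minimal, dependency-free illustration and not the patent's implementation: it uses a naive DFT in place of an FFT, and the triangular filter shape, the power (rather than magnitude) spectrum, the mel formula, the small floor inside the logarithm, the coefficient indexing starting at 0, and the default values of `num_bands` and `num_coeffs` are all assumptions.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window a frame and return |DFT|^2 for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[i] * cmath.exp(-2j * math.pi * k * i / n)
                  for i in range(n))
        spec.append(abs(acc) ** 2)
    return spec

def mel_filterbank(num_bands, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * j / (num_bands + 1)) for j in range(num_bands + 2)]
    bins = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for b in range(1, num_bands + 1):
        lo, mid, hi = bins[b - 1], bins[b], bins[b + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, num_bands=20, num_coeffs=13):
    """Return num_coeffs cepstral coefficients for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_bands, len(frame), sample_rate)
    # log of the weighted sum WS(b) per band; the floor avoids log(0)
    log_ws = [math.log(sum(w * s for w, s in zip(row, spec)) + 1e-12)
              for row in bank]
    # DCT-II across the B log energies
    return [sum(log_ws[b] * math.cos(math.pi * k * (b + 0.5) / num_bands)
                for b in range(num_bands))
            for k in range(num_coeffs)]
```

A practical implementation would replace the naive DFT with an FFT routine; the structure of the computation stays the same.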

The features that the short-term feature extraction unit 120 extracts using MFCC are defined by Equation 1 below.

Next, the DFB (Decorrelated Filter Bank) is described with reference to FIG. 7.

FIG. 7 is a block diagram of feature extraction using the DFB algorithm according to an embodiment of the present invention.

Using the DFB algorithm, the short-term feature extraction unit 120 sums the scaled values in each band, as in MFCC, and then applies the logarithm to produce log(WS(b)) (1 ≤ b ≤ B). Then, as shown in FIG. 7, in the final step the values are passed through an FIR high-pass filter instead of the DCT. Finally, the short-term feature extraction unit 120 extracts a D-dimensional DFB feature vector as in Equation 2 below.
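As a rough illustration of replacing the DCT with an FIR high-pass filter, the sketch below filters the log band energies along the band index. The filter's transfer function is not reproduced in this text, so the default taps (1, -1), i.e. a first-order difference H(z) = 1 - z^-1, are an assumption; the function name `dfb_features` is likewise hypothetical.

```python
def dfb_features(log_energies, taps=(1.0, -1.0)):
    """Filter log(WS(b)) across the band index b with an FIR filter.

    With the assumed two-tap difference filter, B band energies yield
    D = B - 1 output values (only fully overlapped positions are kept).
    """
    out = []
    for b in range(len(taps) - 1, len(log_energies)):
        out.append(sum(h * log_energies[b - j] for j, h in enumerate(taps)))
    return out
```

Under this assumption each output is the difference between adjacent log band energies, which removes the common offset shared by neighbouring bands.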

Next, OSC (Octave-based Spectral Contrast) is described with reference to FIG. 8.

FIG. 8 is a block diagram of feature extraction using the OSC algorithm according to an embodiment of the present invention.

Unlike MFCC and DFB, OSC extracts features based on octaves rather than on an auditory model. Using the OSC algorithm, the short-term feature extraction unit 120 extracts a feature vector that takes into account the peak and valley values of the spectrum in each band. In most music, strong peaks are associated with the harmonic components and strong valleys with the non-harmonic components. By considering the peak and valley values of the spectrum in each band, OSC can therefore represent the harmonic and non-harmonic components of the music.

As shown in FIG. 8, unlike MFCC and DFB, OSC uses an octave-scale band-pass filter rather than a mel-scale band-pass filter. The present invention uses eight octave-scale band-pass filters, as shown in Table 1 below.

Table 1
Band    Frequency (Hz)
1       0-100
2       100-200
3       200-400
4       400-800
5       800-1600
6       1600-3200
7       3200-8000
8       8000-22050

When the frame length and the number of FFT points are both N, the FFT spectrum of each frame can be written as {x_1, x_2, ..., x_N}. If the number of FFT points belonging to the i-th band is K_i, the spectrum of the i-th band can be written as {x_{i,1}, x_{i,2}, ..., x_{i,K_i}}. To obtain the peak and valley, the spectrum of each band is first sorted in descending order. From the sorted spectrum of the i-th band, {x'_{i,1}, x'_{i,2}, ..., x'_{i,K_i}}, the peak and valley can be extracted using the equations below.

Here, α is a constant that sets the size of the neighborhood. Experiments with values from 0.02 to 0.2 showed that α has no significant effect on performance, so it may be set to 0.02. The difference between the peak and valley values obtained from Equations 3 and 4 is computed to obtain the spectral contrast (SC_i), as in Equation 5 below.
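The peak, valley, and contrast computation for one band can be sketched as follows. Equations 3 to 5 are not reproduced in this text, so the sketch follows the commonly published form of octave-based spectral contrast (log of the mean of the top and bottom α·K_i sorted values); treat that exact form, the rounding of α·K_i, and the function name as assumptions. Spectrum values are assumed to be strictly positive magnitudes.

```python
import math

def spectral_contrast(band_spectrum, alpha=0.02):
    """Return (peak, valley, contrast) for the spectrum of one octave band."""
    ordered = sorted(band_spectrum, reverse=True)      # descending order
    n = max(1, int(round(alpha * len(ordered))))       # neighbourhood size
    peak = math.log(sum(ordered[:n]) / n)              # mean of largest values
    valley = math.log(sum(ordered[-n:]) / n)           # mean of smallest values
    return peak, valley, peak - valley
```

Applying this to each of the eight bands and collecting the contrast and valley values would give the 16 values of the OSC feature vector.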

The OSC uses the spectral contrasts and valleys of the I bands, {SC_1, SC_2, ..., SC_I, V_1, V_2, ..., V_I}, as the feature vector. Because the present invention uses eight bands, a 16th-order OSC feature vector is used. For the i-th band, the OSC uses the spectral contrast and valley {SC_i, V_i} as features.

The short-term feature extraction unit 120 extracts the mean (μ_t(k)) and variance (σ_t²(k)) of the feature vectors extracted from the analysis windows using Equations 6 and 7 below.

In the above equations, X_t(k, p) is the k-th element of the timbre feature of the p-th frame in the t-th texture window, and P is the total number of frames contained in the texture window.

Averaging the statistical feature vectors μ_t(k) and σ_t²(k) obtained for the t-th texture window over all texture windows gives Equations 8 and 9 below.
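The texture-window statistics can be sketched as below. Equations 6 to 9 are not reproduced in this text, so the sketch assumes the plain definitions suggested by the surrounding description: a per-dimension mean and a population variance (dividing by P) over the frames of one texture window, followed by a simple average over the T windows. The function names are hypothetical.

```python
def texture_window_stats(window):
    """window: list of P frames, each a list of K feature values.

    Returns (mean, var), each a list of K values computed over the P frames.
    """
    p = len(window)
    k_dims = len(window[0])
    mean = [sum(frame[k] for frame in window) / p for k in range(k_dims)]
    var = [sum((frame[k] - mean[k]) ** 2 for frame in window) / p
           for k in range(k_dims)]
    return mean, var

def average_over_windows(per_window):
    """Average a list of T per-window vectors dimension by dimension."""
    t = len(per_window)
    return [sum(v[k] for v in per_window) / t
            for k in range(len(per_window[0]))]
```

Calling `texture_window_stats` on each window and then `average_over_windows` on the collected means and on the collected variances yields one mean vector and one variance vector per piece of music.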

Once the short-term feature extraction unit 120 has extracted the short-term features as in Equations 8 and 9 above, the long-term feature extraction unit 130 extracts long-term features using the extracted short-term features (S220). A spectro-temporal feature based on the modulation spectrum, one of the long-term features, is described below.

According to an embodiment of the present invention, the long-term feature extraction unit 130 extracts MSFM (Modulation Spectral Flatness Measures) and MSCM (Modulation Spectral Crest Measures) as feature vectors based on the modulation spectrum.

The long-term feature extraction unit 130 may also extract FMSFM (Feature-based Modulation Spectral Flatness Measures) and FMSCM (Feature-based Modulation Spectral Crest Measures), which build on the MSFM/MSCM.

Whereas MSFM/MSCM use the modulation spectrum of the octave band sum (OBS), the FMSFM/FMSCM according to an embodiment of the present invention are extracted using the FMS (Feature-based Modulation Spectrum). The long-term feature extraction unit 130 uses the FMS to extract a modulation spectrum for each feature dimension, which shows how each feature dimension changes over time.

Describing step S220 in more detail, the long-term feature extraction unit 130 first extracts the FMS as in Equation 10 below.

여기서 X_t(k,p)는 t번째 texture window내의 p번째 프레임의 음색 특징(Timbre Feature)의 k번째 요소이고, P_t는 t번째 Texture Window내의 속한 프레임의 총 수이고, M은 변조 푸리에 변환(Modulation Fourier-transform)의 크기이다. 그리고 Long-term 특징 추출부(130)는 전체 Texture Window에 대하여 아래와 같이 평균 FMS를 추출한다.Where X _t (k, p) is the k th element of the timbre feature of the p th frame in the t th texture window, P _t is the total number of frames in the t th texture window, M is the modulated Fourier transform (Modulation Fourier-transform). The long-term feature extraction unit 130 extracts an average FMS for the entire texture window as follows.

Here, T is the total number of texture windows.
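The following is a minimal sketch of the FMS and average FMS computation described above. It assumes that Equation 10 is a magnitude discrete Fourier transform of size M taken across the frames of one texture window, and that Equation 11 is a plain average over the T windows. The function names `fms` and `average_fms` and the toy input are illustrative, not part of the original.

```python
import cmath


def fms(window, M):
    """Magnitude modulation spectrum of one texture window.

    window[k][p] plays the role of X_t(k, p): the k-th timbre feature of
    the p-th frame. Returns out[k][m] for m = 0..M-1 (assumed form of
    Equation 10).
    """
    out = []
    for track in window:  # one feature dimension k
        row = []
        for m in range(M):
            acc = sum(x * cmath.exp(-2j * cmath.pi * m * p / M)
                      for p, x in enumerate(track))
            row.append(abs(acc))
        out.append(row)
    return out


def average_fms(windows, M):
    """Average the per-window FMS over all T texture windows (assumed Equation 11)."""
    T = len(windows)
    spectra = [fms(w, M) for w in windows]
    K = len(spectra[0])
    return [[sum(s[k][m] for s in spectra) / T for m in range(M)]
            for k in range(K)]


# Two toy texture windows, K = 2 feature dimensions, P_t = 4 frames each.
windows = [
    [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]],
    [[2.0, 2.0, 2.0, 2.0], [1.0, -1.0, 1.0, -1.0]],
]
avg = average_fms(windows, M=4)
```

In this toy input the first feature dimension is constant in time, so its energy sits at modulation index 0, while the second alternates every frame and peaks at index 2.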

Next, the long-term feature extraction unit 130 extracts the FMSFM/FMSCM from the results of Equations 10 and 11 as follows.

Small values (for example, 0) and large values (for example, 1 or more) of the FMSFM computed in Equation 12 indicate the peakiness and the flatness of the average FMS, respectively. The FMSCM behaves in the opposite way. If the k-th FMSFM has a very small value, the k-th modulation frequency has a repeating pattern in the input music.
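As an illustration of the peakiness and flatness behaviour described above, the sketch below uses the standard definitions of spectral flatness (geometric mean over arithmetic mean) and spectral crest (maximum over arithmetic mean), applied to one row of the average FMS. Whether Equation 12 has exactly this form is an assumption; the names `fmsfm` and `fmscm` and the small `eps` guard are also illustrative.

```python
import math


def fmsfm(avg_row, eps=1e-12):
    """Flatness of one row of the average FMS: geometric mean / arithmetic mean."""
    vals = [v + eps for v in avg_row]  # eps avoids log(0); an assumption
    geo = math.exp(sum(math.log(v) for v in vals) / len(vals))
    return geo / (sum(vals) / len(vals))


def fmscm(avg_row, eps=1e-12):
    """Crest of one row of the average FMS: peak value / arithmetic mean."""
    vals = [v + eps for v in avg_row]
    return max(vals) / (sum(vals) / len(vals))


flat_row = [1.0, 1.0, 1.0, 1.0]   # flat modulation spectrum
peaky_row = [0.0, 0.0, 4.0, 0.0]  # one dominant modulation frequency
```

With these definitions the flat row gives a flatness close to 1 and a crest close to 1, while the peaky row gives a flatness close to 0 and a crest close to 4, matching the opposite tendencies of the two measures.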

To extract the MSFM/MSCM, the long-term feature extraction unit 130 uses the octave band sum (OBS), which is the energy within an octave band.

Next, the AR modeling unit 140 extracts linear prediction coefficients (LPC) from the timbre features extracted by the short-term feature extraction unit 120, and models the extracted LPC with an autoregressive model (AR model) to generate a new short-term feature vector of increased order (S230). The AR modeling unit 140 then converts the new feature vector into LSP parameters.

The process of generating the LSP-converted short-term feature vector in step S230 is described in more detail below with reference to FIG. 9.

FIG. 9 is a diagram illustrating the LSP conversion process according to an embodiment of the present invention.

The AR modeling unit 140 extracts linear prediction coefficients (Linear Prediction Coding coefficients, LPC) along the time axis. This is similar to the modulation spectrum, but it uses the Levinson algorithm instead of the Fourier transform.

As shown in FIG. 9, modeling the linear prediction coefficients (LPC) with an autoregressive model (AR model) generates a new feature vector whose order is higher than that of the existing short-term feature vector.

For example, if the input short-term feature vector has order 13, the autoregressive model (AR model) generates a new feature vector of order 130, a tenfold increase.
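The sketch below shows one way the time-axis LPC extraction could look: each feature dimension is treated as a sequence over the frames of a texture window, and the Levinson-Durbin recursion is run on its autocorrelation. The AR order of 10 per dimension is inferred from the 13 to 130 example and is an assumption, as are the function names and the random toy data.

```python
import random


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns a[0..order] with a[0] = 1 for the prediction-error filter
    A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a


def ar_features(window, order=10):
    """Concatenate the LPCs (without a[0]) of every feature dimension."""
    feats = []
    for track in window:  # track = X_t(k, 0..P_t-1) for one dimension k
        r = autocorr(track, order)
        feats.extend(levinson(r, order)[1:])
    return feats


rng = random.Random(0)
window = [[rng.gauss(0.0, 1.0) for _ in range(50)] for _ in range(13)]
features = ar_features(window, order=10)  # 13 dimensions x 10 coefficients
coeffs = levinson([1.0, 0.5, 0.25], 2)    # autocorrelation of an AR(1) process
```

With 13 input dimensions and order 10, `features` has 130 entries, which reproduces the tenfold increase in the example. The sketch does not guard against a zero prediction error, which would occur for an all-zero or perfectly predictable track.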

However, because linear prediction coefficients (LPC) have a large dynamic range, recognition errors may occur. The AR modeling unit 140 therefore converts the new feature vectors generated by the autoregressive model (AR model) into LSP (Line Spectrum Pairs) parameters and generates a model for each genre/mood.

LSP parameters have a smaller dynamic range than the linear prediction coefficients (LPC), and the poles and zeros of the spectrum can be accurately inferred from the relative distance between adjacent LSPs, so they perform better than using the LPC alone.
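A minimal sketch of an LPC to LSP conversion follows. It uses the textbook construction: form the symmetric polynomial P(z) = A(z) + z^-(p+1) A(1/z) and the antisymmetric polynomial Q(z) = A(z) - z^-(p+1) A(1/z), then locate their zeros on the unit circle. The grid search with bisection is a simplification chosen to keep the example dependency-free, and it can miss very closely spaced roots; the function name `lpc_to_lsf` and its parameters are illustrative.

```python
import math


def lpc_to_lsf(a, grid=2048, iters=50):
    """Convert LPCs a[0..p] (a[0] = 1) to line spectral frequencies in (0, pi)."""
    p = len(a) - 1
    n = p + 1
    ext = list(a) + [0.0]
    sym = [ext[i] + ext[n - i] for i in range(n + 1)]   # coefficients of P(z)
    asym = [ext[i] - ext[n - i] for i in range(n + 1)]  # coefficients of Q(z)

    # On the unit circle both polynomials reduce to real functions of w.
    def p_real(w):
        return sum(c * math.cos(w * (i - n / 2)) for i, c in enumerate(sym))

    def q_real(w):
        return sum(c * math.sin(w * (i - n / 2)) for i, c in enumerate(asym))

    def zeros(f):
        found = []
        pts = [math.pi * (i + 0.5) / grid for i in range(grid)]
        for lo, hi in zip(pts, pts[1:]):
            flo = f(lo)
            if flo * f(hi) < 0:  # sign change: refine by bisection
                for _ in range(iters):
                    mid = 0.5 * (lo + hi)
                    fmid = f(mid)
                    if flo * fmid <= 0:
                        hi = mid
                    else:
                        lo, flo = mid, fmid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(zeros(p_real) + zeros(q_real))


lsf_flat = lpc_to_lsf([1.0, 0.0, 0.0])  # flat spectrum, p = 2
lsf_ar1 = lpc_to_lsf([1.0, -0.5])       # first-order example
```

For the flat-spectrum case the two frequencies come out evenly spaced at pi/3 and 2*pi/3. Every output lies strictly between 0 and pi, which illustrates the bounded dynamic range mentioned above.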

In this way, the short-term feature extraction unit 120, the long-term feature extraction unit 130, and the AR modeling unit 140 each generate feature vectors. The feature vectors corresponding to the full feature set extracted in an actual system are exemplified in Table 2 below.

Feature vectors                                            Dimension
Texture window                        Mean       MFCC      13
                                                 DFB       13
                                                 OSC       16
                                      Variance   MFCC      13
                                                 DFB       13
                                                 OSC       16
Feature-based modulation spectrum     FMSFM      MFCC      13
                                                 DFB       13
                                                 OSC       16
                                      FMSCM      MFCC      13
                                                 DFB       13
                                                 OSC       16
Feature-based autoregressive model    LSP        MFCC      130
                                                 DFB       130
                                                 OSC       160
Total                                                      588

As shown in Table 2, the short-term feature extraction unit 120, the long-term feature extraction unit 130, and the AR modeling unit 140 can generate 588 feature vectors for the input music. That is, as described for step S210, the short-term feature extraction unit 120 uses the texture window technique to obtain, from the input music signal, 13 feature vectors for the mean of the MFCC, one for each of the 13 orders, and 13 feature vectors for the mean of the DFB, one for each of the 13 orders.

Likewise, as described for step S220, the long-term feature extraction unit 130 uses the feature-based modulation spectrum to obtain 13 feature vectors by computing the FMSFM of the MFCC for each of the 13 orders, and 13 feature vectors by computing the FMSCM of the DFB for each of the 13 orders.

Similarly, as described for step S230, the AR modeling unit 140 uses the feature-based autoregressive model to obtain 130 LSP-converted feature vectors of the MFCC, one for each of the 130 orders produced by the tenfold increase, and 130 LSP-converted feature vectors of the DFB, one for each of the 130 orders.
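The total of 588 in Table 2 can be checked with a few lines of arithmetic. The variable names are illustrative, and the AR order of 10 is inferred from the tenfold increase stated in the text.

```python
# Base dimensions of the three timbre features listed in Table 2.
base = {"MFCC": 13, "DFB": 13, "OSC": 16}
ar_order = 10  # inferred from the 13 -> 130 example

per_stat = sum(base.values())  # 42 values per statistic
texture = 2 * per_stat         # mean + variance
modulation = 2 * per_stat      # FMSFM + FMSCM
lsp = ar_order * per_stat      # LSP-converted AR features
total = texture + modulation + lsp
```

The three groups contribute 84, 84, and 420 values respectively.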

Next, the feature selecting unit 150 selects the top feature vectors from among the extracted short-term or long-term feature vectors (feature selection) (S240). Here, a top feature vector means a feature vector whose recognition rate is higher than a reference value.

Thus, according to an embodiment of the present invention, unlike the prior art, selecting the feature vectors that rank highest among all the feature vectors improves the recognition rate and reduces the amount of computation. Taking Table 2 as an example, the feature selecting unit 150 selects the 160 top feature vectors with the best recognition rates from the 588 feature vectors obtained. Accordingly, in the test stage, only the top feature vectors selected by the feature selecting unit 150 from among all the extracted features are used for genre recognition.

The feature selecting unit 150 according to an embodiment of the present invention can select the top feature vectors by priority using methods such as best-first search and a Support Vector Machine (SVM) ranker. That is, the feature selecting unit 150 can use an SVM classifier called an SVM ranker to evaluate the merit of each feature, rank the features, and select feature vectors accordingly. Ranking with an SVM ranker can be readily carried out by those skilled in the art, so a detailed description is omitted.
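Since the ranking procedure itself is not detailed, the following is only a sketch of one common heuristic: score each feature by the squared weight it receives in a trained linear SVM and keep the highest-scoring features. The weights below are hypothetical placeholders rather than the output of a real SVM, and the function names are illustrative.

```python
def rank_features(weights):
    """Return feature indices sorted from most to least important (score = w^2)."""
    return sorted(range(len(weights)), key=lambda i: weights[i] ** 2, reverse=True)


def select_top(weights, n_top):
    """Keep the n_top best-ranked feature indices, in ascending index order."""
    return sorted(rank_features(weights)[:n_top])


weights = [0.1, -2.0, 0.05, 1.5, -0.3]  # hypothetical linear-SVM weights
top = select_top(weights, n_top=2)      # indices of the two strongest features
```

In the setting of Table 2, `weights` would have 588 entries and `n_top` would be 160.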

Next, the model generation unit 160 generates a music classification model using the selected top feature vectors (S250). Specific genres may include Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal. In detail, the model generation unit 160 performs modeling for each music genre using the extracted top feature vectors. That is, the model generation unit 160 groups the feature values of the top feature vectors corresponding to each music genre and generates a model for each genre.

Once the classification model has been generated, a test stage is carried out to classify the genre or mood of input test music.

First, when a test music signal to be classified is input as shown in FIG. 2 (S260), the music classification unit 170 sequentially extracts, from the input music signal, the short-term feature vectors, the long-term feature vectors, and the new feature vectors generated by the AR model that are included in the top feature vectors (S270, S280, S290). That is, in the case of Table 2, the music classification unit 170 does not need to extract all 588 feature vectors and extracts only the 160 feature vectors selected by priority.

For example, assume that, for the mean of the MFCC obtained through the texture window technique, the feature vectors corresponding to orders 1, 4, 6, 9, and 13 out of the 13 orders are among the selected features. The music classification unit 170 then extracts only the feature vectors corresponding to those five orders from the input test music signal.
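The example above amounts to indexing a feature vector with the selected orders. A minimal sketch follows, with a hypothetical 13-dimensional MFCC mean vector standing in for real data; in a real system only the selected components would need to be computed at all.

```python
selected_orders = [1, 4, 6, 9, 13]            # 1-based orders from the example
mfcc_mean = [float(i) for i in range(1, 14)]  # hypothetical 13-dimensional vector

# Keep only the components that were selected during training.
reduced = [mfcc_mean[order - 1] for order in selected_orders]
```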

Here, the steps of extracting the short-term feature vectors, the long-term feature vectors, and the new feature vectors generated by the AR model (S270, S280, S290) are substantially the same as steps S210, S220, and S230 above, except that only the short-term, long-term, and new feature vectors corresponding to the top feature vectors are extracted, so a redundant description is omitted.

For convenience of description, the music classification unit 170 has been described as extracting the short-term feature vectors, the long-term feature vectors, and the new feature vectors (S270, S280, S290). However, the short-term feature extraction unit 120, the long-term feature extraction unit 130, and the AR modeling unit 140 may also carry out the test process directly.

Next, using the extracted feature vectors corresponding to the top feature vectors, the music classification unit 170 classifies the genre or mood of the input test music based on the classification model generated by the model generation unit 160 (S300). In particular, according to an embodiment of the present invention, the music classification unit 170 can classify the music genre or mood using a one-against-one SVM.
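One-against-one classification trains one binary classifier per pair of classes and lets the pairwise decisions vote. The sketch below shows only the voting step; the pairwise decision functions are hypothetical lambdas standing in for trained binary SVMs, and the tie-breaking rule (first class in list order) is an assumption.

```python
from collections import Counter
from itertools import combinations


def ovo_predict(x, classes, pairwise):
    """Majority vote over pairwise classifiers.

    pairwise[(a, b)] maps x to a decision value: > 0 votes for a, otherwise b.
    """
    votes = Counter()
    for a, b in combinations(classes, 2):
        winner = a if pairwise[(a, b)](x) > 0 else b
        votes[winner] += 1
    return max(classes, key=lambda c: votes[c])


classes = ["Jazz", "Rock", "Pop"]
pairwise = {  # hypothetical linear decision functions
    ("Jazz", "Rock"): lambda x: x[0] - x[1],
    ("Jazz", "Pop"): lambda x: x[0] - 0.5,
    ("Rock", "Pop"): lambda x: x[1] - x[0],
}
label = ovo_predict([1.0, 0.0], classes, pairwise)
n_pairs_for_ten_genres = len(list(combinations(range(10), 2)))
```

For the ten GTZAN genres this scheme needs 45 pairwise classifiers.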

Thus, according to an embodiment of the present invention, only the top feature vectors are extracted in the test stage and used to classify the music genre or mood, so the amount of computation can be greatly reduced compared with the prior art, which extracts all feature vectors of the input test music to classify the genre or mood.

In one experimental example according to an embodiment of the present invention, the GTZAN database and the MIREX mood clustering method were used to evaluate music genre classification performance.

That is, performance was evaluated using the GTZAN database for genre recognition and the MIREX mood clustering method for mood recognition.

GTZAN contains a total of 10 genres: Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal. Each genre has 100 songs, and each song is a 30-second clip in 16-bit, 22050 Hz, mono AU file format.

Classifier             Accuracy (%)          Order
                       Genre      Mood
All feature set        82.4       62.2       588
+ Feature selection    85.0       69.7       160

As the accuracy results in Table 3 show, selecting the top 160 feature vectors and extracting only those feature vectors according to an embodiment of the present invention gives higher recognition accuracy (85.0% versus 82.4% for genre, 69.7% versus 62.2% for mood) than extracting all 588 feature vectors to recognize the genre or mood.

Thus, according to the music genre classification method and apparatus of an embodiment of the present invention, selecting the top feature vectors after extracting the features of the music reduces the amount of computation required to classify the genre of input music data, which shortens the time needed to obtain a classification result. In addition, even when the genre or mood of input music data is classified using only the selected top feature vectors, a higher recognition rate can be obtained than with the conventional method that uses all feature vectors.

In addition, when extracting feature vectors, LSP parameters based on the timbre features are extracted using an autoregressive model (AR model) and used as feature vectors alongside the timbre features, which improves the performance of the genre and mood recognition system.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. Accordingly, the scope of the present invention is not limited to the embodiments described above and should be construed to include the content recited in the claims and various embodiments within a scope equivalent thereto.

100: music classification device, 110: input unit,
120: short-term feature extraction unit, 130: long-term feature extraction unit,
140: AR modeling unit, 150: feature selecting unit,
160: model generation unit, 170: music classification unit

Claims

An input unit for receiving a sound source;
A short-term feature extraction unit for extracting a short-term feature vector for a timbre feature of the input sound source;
A long-term feature extraction unit for extracting a long-term feature vector using the short-term feature vector;
An AR modeling unit for extracting linear prediction coefficients (LPC) using the extracted short-term feature vector, modeling the extracted LPC with an autoregressive model to generate a new feature vector of increased order, and converting the new feature vector into LSP parameters;
A feature selecting unit for selecting a top feature vector having a high recognition rate from among the short-term feature vector, the long-term feature vector, and the new feature vector;
A model generation unit for generating a music classification model using the selected top feature vector; and
A music classification unit for classifying a genre or a mood of input test music based on the classification model,
Wherein the short-term feature vector and the long-term feature vector include
a Mel-Frequency Cepstral Coefficient (MFCC) and a Decorrelated Filter Bank (DFB),
The short-term feature vector includes a statistical feature for a timbre extracted using a texture window technique, and
The statistical feature includes an average value and a variance value.

delete

The apparatus according to claim 1,
Wherein the long-term feature vector includes
Feature-based Modulation Spectral Flatness Measures (FMSFM) and Feature-based Modulation Spectral Crest Measures (FMSCM) extracted using a Feature-based Modulation Spectrum (FMS), and
The FMSFM and the FMSCM are defined by the following equations.

The apparatus according to claim 4,
Wherein the FMS and the average of the FMS
are defined by the following equations:

Here, T is the total number of texture windows, X_t(k, p) is the k-th element of the short-term feature of the p-th frame in the t-th texture window, P_t is the total number of frames in the t-th texture window, and M is the size of the modulation Fourier transform.

The apparatus according to claim 1,
Wherein the long-term feature vector includes a statistical feature, and
The statistical feature includes an average value and a variance value.

The apparatus according to claim 1,
Wherein the music classification unit
extracts, from the test music, a feature vector corresponding to the top feature vector, and classifies the genre or mood of the test music by comparing the extracted feature vector with the classification model.

The apparatus according to claim 1,
Wherein the feature selecting unit
selects the top feature vector using an SVM (Support Vector Machine) ranker, and
The music classification unit
classifies the genre or mood of the test music using a one-against-one SVM (Support Vector Machine).

A music classification method using a music classification apparatus,
Extracting a short-term feature vector for a timbre feature of an input sound source;
Extracting a long-term feature vector using the short-term feature vector;
Extracting linear prediction coefficients (LPC) using the extracted short-term feature vector;
Modeling the extracted LPC with an autoregressive model to generate a new feature vector of increased order for the short-term feature vector, and converting the new feature vector into LSP parameters;
Selecting a top feature vector having a high recognition rate from among the short-term feature vector, the long-term feature vector, and the new feature vector;
Generating a music classification model using the selected top feature vector; and
Classifying the genre or mood of input test music based on the classification model,
Wherein the short-term feature vector and the long-term feature vector include
a Mel-Frequency Cepstral Coefficient (MFCC) and a Decorrelated Filter Bank (DFB),
The short-term feature vector includes a statistical feature for a timbre extracted using a texture window technique, and
The statistical feature includes an average value and a variance value.

delete

The method according to claim 9,
Wherein the long-term feature vector includes
Feature-based Modulation Spectral Flatness Measures (FMSFM) and Feature-based Modulation Spectral Crest Measures (FMSCM) extracted using a Feature-based Modulation Spectrum (FMS), and
The FMSFM and the FMSCM are defined by the following equation.

The method according to claim 12,
Wherein the FMS and the average of the FMS
are defined by the following equation:

The method according to claim 9,
Wherein the long-term feature vector includes a statistical feature, and
The statistical feature includes an average value and a variance value.

The method according to claim 9,
Wherein classifying the genre or mood of the input test music based on the classification model comprises:
Extracting, when the test music to be classified is input, a feature vector corresponding to the top feature vector from the test music; and
Classifying the genre or mood of the test music by comparing the extracted feature vector with the classification model.

The method according to claim 9,
Wherein, in the step of selecting a feature whose recognition rate is higher than a reference value,
the top feature vector is selected using an SVM (Support Vector Machine) ranker, and
In the step of classifying the genre or mood of the input test music based on the classification model,
the genre or mood of the test music is classified using a one-against-one SVM (Support Vector Machine).

The method according to claim 9,
Wherein the step of extracting linear prediction coefficients (LPC) using the extracted short-term feature vector comprises
extracting the linear prediction coefficients (LPC) using a Levinson algorithm.