KR20140134988A - Music genre classification apparatus and method thereof

Info

Publication number: KR20140134988A
Application number: KR1020130055133A
Authority: KR
Inventors: 김무영; 임신철
Original assignee: 세종대학교산학협력단
Priority date: 2013-05-15
Filing date: 2013-05-15
Publication date: 2014-11-25
Also published as: KR101501349B1

Abstract

The present invention relates to a device and a method for classifying music genres. The device comprises: an input unit that receives a sound source; a short-term feature extraction unit that extracts a short-term feature vector from the sound source; a long-term feature extraction unit that extracts a long-term feature vector using the short-term feature vector; a feature selection unit that selects, from among the short-term and long-term feature vectors, top-ranked feature vectors having a high recognition rate; a model generation unit that generates a model for each music genre using the selected top-ranked feature vectors; and a genre classification unit that classifies the genre of input test music based on the per-genre models. Because the top-ranked feature vectors are selected after the feature vectors of the music are extracted, the device and the method reduce the computation required to classify the genre of input music data and thus shorten the time needed to obtain a classification result. Moreover, even though only the main features are used, the device and the method achieve a high recognition rate compared with a device and a method that use the entire feature set.

Description

MUSIC GENRE CLASSIFICATION APPARATUS AND METHOD THEREOF

The present invention relates to a music genre classification apparatus and method, and more particularly to a music genre classification apparatus and method that reduce the amount of computation while accurately distinguishing the genre of input music.

A great deal of popular music is being produced, and how to classify music data effectively by category, in particular by genre, has become an issue. In the past, people classified music data into categories by hand. Classifying massive collections of digital music data, however, requires algorithms that automatically classify music by composer, singer, and genre.

The boundaries of a genre differ from country to country and from person to person, and a genre is hard to define because its meaning varies with culture, singer, and market. Automatic genre classification based on music data analysis can be applied to a variety of applications, such as efficient data management and music recommendation. It is also economical because no manual classification is needed.

Genre classification based on music data analysis has been studied and advanced through various approaches, including feature vector extraction and classifiers. To classify a music genre, the music data must first be analyzed. Music data can be analyzed according to various criteria, such as beat, rhythm, meter, melody, and chord.

After the music data is analyzed, the genre is classified with one of various classifiers, such as a Gaussian Mixture Model (GMM), Hidden Markov Model (HMM), Nearest Neighbor (NN), K-Nearest Neighbor (KNN), or Support Vector Machine (SVM).

Foote built a histogram of 12th-order MFCCs (Mel-Frequency Cepstral Coefficients) of the music and classified genres using NN as the classifier. Bagci computed 13th-order MFCCs and their delta values and classified genres using an Inter-Genre Similarity (IGS) model and a GMM classifier.

Jiang took the octave structure of music into account and proposed OSC (Octave-based Spectral Contrast), a feature different from MFCC, to improve the genre classification success rate. Using a GMM as the classifier, a genre classification success rate of about 82% was obtained for five genres: Jazz, Pop, Romantic, Baroque, and Rock.

Conventional music genre classification methods and systems recognize genres with an SVM using various feature vectors such as MFCC, Chroma, and OSC. Because the recognition success rate of such systems is low, however, techniques for music genre classification methods and systems with a higher expected success rate continue to be discussed.

The background art of the present invention is disclosed in Korean Patent Laid-Open Publication No. 20110013646 (published on February 10, 2011).

An object of the present invention is to provide a music genre classification apparatus and method that reduce the amount of computation for input music and accurately distinguish its genre.

To achieve the above object, a music genre classification apparatus according to one embodiment of the present invention includes: an input unit that receives a sound source; a short-term feature extraction unit that extracts a short-term feature vector of the sound source received by the input unit; a long-term feature extraction unit that extracts a long-term feature vector using the short-term feature vector; a feature selection unit that selects, from among the short-term and long-term feature vectors, top-ranked feature vectors having a high recognition rate; a model generation unit that generates a model for each music genre using the selected top-ranked feature vectors; and a genre classification unit that classifies the genre of input test music based on the per-genre models.

In addition, the short-term feature vector may include at least one of MFCC (Mel-Frequency Cepstral Coefficient), DFB (Decorrelated Filter Bank), and OSC (Octave-based Spectral Contrast).

In addition, the short-term feature vector may further include a statistical feature vector extracted using a texture window technique, and the statistical feature vector may include at least one of a mean, a variance, a maximum, and a minimum.

In addition, the long-term feature vector includes at least one of FMSFM (Feature-based Modulation Spectral Flatness Measures) and FMSCM (Feature-based Modulation Spectral Crest Measures), both extracted using an FMS (Feature-based Modulation Spectrum), wherein the FMSFM and the FMSCM are defined by the following equations.

In addition, the long-term feature vector may include at least one of FMSC (Feature-based Modulation Spectral Contrast) and FMSV (Feature-based Modulation Spectral Valley), and the FMSC and the FMSV may be defined by the following equations.

Here, θ_q is the set of modulation frequencies in the q-th modulation band.

In addition, the FMS and the mean of the FMS may be defined by the following equations.

Here, T is the total number of texture windows, X_t(k,p) is the k-th element of the short-term feature of the p-th frame in the t-th texture window, P_t is the total number of frames in the t-th texture window, and M is the size of the modulation Fourier transform.

In addition, the long-term feature vector may further include a statistical feature, and the statistical feature may include at least one of a mean, a variance, a maximum, and a minimum.

In addition, when the test music to be classified is input, the genre classification unit may extract from the test music the feature vectors corresponding to the top-ranked feature vectors and classify the genre of the input test music by comparing the extracted feature vectors with the per-genre models.

In addition, the feature selection unit may select the top-ranked feature vectors using an SVM (Support Vector Machine) ranker, and the genre classification unit may classify the genre of the test music using a one-against-one SVM.

In addition, a music genre classification method according to one embodiment of the present invention, performed using a music genre classification apparatus, includes: extracting a short-term feature vector from an input sound source; extracting a long-term feature vector using the short-term feature vector; selecting, from among the short-term and long-term feature vectors, top-ranked feature vectors having a high recognition rate; generating a model for each music genre using the selected top-ranked feature vectors; and classifying the genre of input test music based on the per-genre models.

According to the present invention, selecting feature vectors with a high recognition rate from among the feature vectors of music reduces the amount of computation required to classify the genre of input music, which shortens the time needed to obtain a classification result while still achieving a high recognition rate.

FIG. 1 is a configuration diagram of a music genre classification apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart of a music genre classification method according to an embodiment of the present invention.
FIG. 3 is a waveform diagram of an input music signal.
FIG. 4 is a diagram for explaining the analysis window.
FIG. 5 is a diagram for explaining the texture window.
FIG. 6 is a block diagram of feature extraction using the MFCC algorithm according to an embodiment of the present invention.
FIG. 7 is a block diagram of feature extraction using the DFB algorithm according to an embodiment of the present invention.
FIG. 8 is a block diagram of feature extraction using the OSC algorithm according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. The present invention may, however, be embodied in many different forms and is not limited to the embodiments described herein. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are denoted by similar reference numerals throughout the specification.

First, the music genre classification apparatus of the present invention will be described.

FIG. 1 is a configuration diagram of a music genre classification apparatus according to an embodiment of the present invention.

As shown in FIG. 1, the music genre classification apparatus 100 includes an input unit 110, a short-term feature extraction unit 120, a long-term feature extraction unit 130, a feature selection unit 140, a model generation unit 150, and a genre classification unit 160.

The input unit 110 receives a speech or music signal frame by frame. In one embodiment of the present invention, the input unit 110 may receive a human voice directly through a microphone or receive stored music data from a music database.

The short-term feature extraction unit 120 extracts timbre features or statistical features, which are the short-term features of the speech or music signal received by the input unit 110.

The long-term feature extraction unit 130 extracts long-term features using the timbre features extracted by the short-term feature extraction unit 120. The long-term feature extraction unit 130 also extracts statistical features.

The feature selection unit 140 uses an algorithm called an SVM ranker to select, from all of the feature vectors extracted by the short-term feature extraction unit 120 and the long-term feature extraction unit 130, top-ranked feature vectors having a high recognition rate. That is, the feature selection unit 140 measures the recognition rate of every short-term and long-term feature vector and selects the top-ranked feature vectors whose recognition rate exceeds a reference value. Because only the selected top-ranked feature vectors are used for music genre classification, the feature selection unit 140 reduces the amount of computation required for feature extraction and can improve the recognition rate.
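The selection rule described above can be sketched as follows. This is a minimal illustration, assuming each feature dimension already has a recognition-rate score (for example, one produced by an SVM ranker). The function names `select_top_features` and `project`, the score list, and the reference value are hypothetical and do not come from the patent.

```python
def select_top_features(scores, reference):
    """Return indices of features whose score exceeds the reference value,
    ordered from the highest score to the lowest."""
    kept = [i for i, s in enumerate(scores) if s > reference]
    return sorted(kept, key=lambda i: scores[i], reverse=True)


def project(feature_vector, selected):
    """Keep only the selected dimensions of a full feature vector."""
    return [feature_vector[i] for i in selected]


# Hypothetical scores for four feature dimensions.
selected = select_top_features([0.4, 0.9, 0.7, 0.2], reference=0.5)
reduced = project([10.0, 11.0, 12.0, 13.0], selected)
```

At test time only the selected dimensions would need to be computed, which is where the reduction in computation comes from.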

The model generation unit 150 generates a model for each music genre based on the top-ranked feature vectors selected by the feature selection unit 140. The genres may include, for example, Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal.

Based on the per-genre models generated by the model generation unit 150, the genre classification unit 160 uses a recognizer to classify the genre of the test music from the top-ranked feature vectors extracted from the test music.
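The description elsewhere names a one-against-one SVM as the recognizer. The voting step of a one-against-one scheme can be sketched as below. This is only an illustration: the pairwise classifiers are stand-in callables rather than trained SVMs, and the names `one_against_one_predict` and `pairwise` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, genres, pairwise):
    """Run every pairwise classifier on x and return the genre with the most votes.

    pairwise maps a (genre_a, genre_b) tuple to a callable that returns
    either genre_a or genre_b for the feature vector x.
    """
    votes = Counter()
    for a, b in combinations(genres, 2):
        votes[pairwise[(a, b)](x)] += 1
    return votes.most_common(1)[0][0]


# Stand-in pairwise deciders (assumptions, not trained models).
genres = ["Rock", "Jazz", "Pop"]
pairwise = {
    ("Rock", "Jazz"): lambda x: "Rock" if x[0] > 0 else "Jazz",
    ("Rock", "Pop"): lambda x: "Rock" if x[0] > 0 else "Pop",
    ("Jazz", "Pop"): lambda x: "Pop",
}
predicted = one_against_one_predict([1.0], genres, pairwise)
```

With G genres this scheme evaluates G(G-1)/2 pairwise classifiers, for example 45 for ten genres.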

Hereinafter, a music genre classification method using the music genre classification apparatus 100 according to an embodiment of the present invention will be described in more detail.

FIG. 2 is a flowchart of a music genre classification method according to an embodiment of the present invention.

The music genre classification method according to an embodiment of the present invention broadly includes a learning stage for generating the per-genre models and a test stage for classifying the genre of input test music using the generated models. As shown in FIG. 2, the learning stage includes steps S210 to S240, and the test stage includes steps S250 to S280.

Classifying the genre of music ultimately requires first recognizing the genre, and recognizing the genre requires accurately generating a model for each music genre.

To recognize music genres, the music genre classification apparatus 100 according to an embodiment of the present invention first extracts feature vectors from the music of each genre. The feature vectors include short-term and long-term feature vectors, as well as statistical feature vectors derived from the short-term and long-term feature vectors.

FIG. 3 is a waveform diagram of an input music signal. FIG. 4 is a diagram for explaining the analysis window, and FIG. 5 is a diagram for explaining the texture window.

When a music signal such as that shown in FIG. 3 is input through the input unit 110, the short-term feature extraction unit 120 extracts a short-term feature vector for each frame through a window such as that shown in FIG. 4 (S210). Based on the short-term feature vectors extracted from the frames (analysis windows), the short-term feature extraction unit 120 then extracts statistical feature vectors using the texture window technique of FIG. 5. The statistical features include at least one of a mean, a variance, a maximum, and a minimum.

Hereinafter, step S210, in which the timbre features corresponding to the short-term feature vector are extracted, will be described in more detail.

In one embodiment of the present invention, short-term feature extraction uses MFCC (Mel-Frequency Cepstral Coefficient) and DFB (Decorrelated Filter Bank), which are mainly used in speech recognition and speaker recognition, or OSC (Octave-based Spectral Contrast), which is used in music recognition.

The MFCC algorithm will now be described with reference to FIG. 6.

FIG. 6 is a block diagram of feature extraction using the MFCC algorithm according to an embodiment of the present invention.

First, the short-term feature extraction unit 120 applies a Hamming window to each frame in the time domain and then performs a Fast Fourier Transform (FFT). It then scales the spectrum with a Mel-scale band-pass filter bank of B bands and computes the weighted sum of the spectrum in each band, WS(b), b = 1, …, B. The logarithm is then applied to each value (1 ≤ b ≤ B). Finally, the short-term feature extraction unit 120 applies a Discrete Cosine Transform (DCT) to the log weighted sums to extract a K-dimensional MFCC feature vector.

The feature that the short-term feature extraction unit 120 extracts using MFCC is defined by Equation 1 below.
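The last two stages of this pipeline (log compression followed by the DCT) can be sketched as follows. Equation 1 is not reproduced in this text, so the sketch assumes the conventional unnormalized DCT-II; the function name `mfcc_from_band_sums` and the example values are hypothetical, and the Hamming window, FFT, and Mel filter bank stages are taken as already done.

```python
import math


def mfcc_from_band_sums(ws, num_coeffs):
    """Log-compress the B weighted band sums WS(b), then apply a DCT-II
    (assumed form) and keep the first num_coeffs coefficients."""
    B = len(ws)
    log_ws = [math.log(v) for v in ws]  # band sums must be positive
    return [
        sum(log_ws[b] * math.cos(math.pi * k * (b + 0.5) / B) for b in range(B))
        for k in range(num_coeffs)
    ]


# Four equal band sums: all energy lands in coefficient 0.
coeffs = mfcc_from_band_sums([math.e] * 4, num_coeffs=3)
```

A flat log spectrum produces a single non-zero coefficient, which illustrates how the DCT compacts and decorrelates the band energies.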

Next, the DFB (Decorrelated Filter Bank) will be described with reference to FIG. 7.

FIG. 7 is a block diagram of feature extraction using the DFB algorithm according to an embodiment of the present invention.

With the DFB algorithm, as with MFCC, the short-term feature extraction unit 120 sums the scaled values in each band and applies the logarithm to produce log(WS(b)) (1 ≤ b ≤ B). Then, as shown in FIG. 7, the last stage passes the result through an FIR high-pass filter instead of the DCT. Finally, the short-term feature extraction unit 120 extracts a D-dimensional DFB feature vector as in Equation 2 below.
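The DFB stage can be sketched as below. The transfer function of the FIR high-pass filter and Equation 2 are not reproduced in this text, so the sketch assumes a first-order difference, H(z) = 1 - z^-1, applied along the band axis. That filter choice, the function name `dfb_from_band_sums`, and the example values are assumptions for illustration only.

```python
import math


def dfb_from_band_sums(ws):
    """Log-compress the B weighted band sums, then high-pass filter along
    the band index with an assumed first-order difference H(z) = 1 - z^-1."""
    log_ws = [math.log(v) for v in ws]  # band sums must be positive
    return [log_ws[b] - log_ws[b - 1] for b in range(1, len(log_ws))]


# Three band sums with log values 0, 1 and 3.
dfb = dfb_from_band_sums([1.0, math.e, math.e ** 3])
```

Under this assumed filter, B bands yield D = B - 1 values, each describing the slope of the log spectrum between adjacent bands.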

Next, OSC (Octave-based Spectral Contrast) will be described with reference to FIG. 8.

FIG. 8 is a block diagram of feature extraction using the OSC algorithm according to an embodiment of the present invention.

Unlike MFCC and DFB, OSC extracts features based on octaves rather than on an auditory model. Using the OSC algorithm, the short-term feature extraction unit 120 extracts a feature vector that takes into account the peak and valley values of the spectrum in each band. In most music, strong peaks are associated with the harmonic components and strong valleys with the non-harmonic components. By considering the peak and valley values of the spectrum in each band, OSC can therefore represent both the harmonic and non-harmonic components of music.

As shown in FIG. 8, unlike MFCC and DFB, OSC uses an octave-scale band-pass filter rather than a Mel-scale band-pass filter. The present invention uses the eight octave-scale band-pass filters listed in Table 1 below.

Table 1

| Band | Frequency (Hz) |
|------|----------------|
| 1 | 0 to 100 |
| 2 | 100 to 200 |
| 3 | 200 to 400 |
| 4 | 400 to 800 |
| 5 | 800 to 1600 |
| 6 | 1600 to 3200 |
| 7 | 3200 to 8000 |
| 8 | 8000 to 22050 |
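The band edges in Table 1 can be turned into a lookup that assigns each FFT bin to a band, which gives the per-band bin counts used by the OSC computation. This is a minimal sketch: the 44100 Hz sampling rate is inferred from the 22050 Hz upper edge, assigning a boundary frequency to the higher band is an arbitrary choice, and the names `octave_band` and `bins_per_band` are hypothetical.

```python
from bisect import bisect_right

# Upper edges (Hz) of bands 1 to 7 from Table 1; band 8 runs up to 22050 Hz.
INNER_EDGES_HZ = [100, 200, 400, 800, 1600, 3200, 8000]


def octave_band(freq_hz):
    """Return the 1-based Table 1 band for a frequency in [0, 22050] Hz."""
    if not 0 <= freq_hz <= 22050:
        raise ValueError("frequency outside the range covered by Table 1")
    return bisect_right(INNER_EDGES_HZ, freq_hz) + 1


def bins_per_band(n_fft, sample_rate=44100):
    """Count the non-negative-frequency FFT bins that fall in each band."""
    counts = [0] * 8
    for j in range(n_fft // 2 + 1):
        counts[octave_band(j * sample_rate / n_fft) - 1] += 1
    return counts


counts = bins_per_band(1024)
```

With a short FFT the lowest bands receive very few bins, so the neighborhood size used for the peak and valley estimates has to be clamped to at least one bin.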

When the frame length and the number of FFT points are both N, the FFT spectrum of each frame can be written as {x_1, x_2, …, x_N}. If the number of FFT points belonging to the i-th band is K_i, the spectrum of the i-th band can be written as {x_{i,1}, x_{i,2}, …, x_{i,K_i}}. To obtain the peak and the valley, the spectrum of each band is first sorted in descending order. From the sorted spectrum of the i-th band, {x'_{i,1}, x'_{i,2}, …, x'_{i,K_i}}, the peak and the valley can be extracted using the equations below.

Here, α is a constant that sets the size of the neighborhood. Experiments with values from 0.02 to 0.2 showed that α does not significantly affect performance, so it may be set to 0.02. The difference between the peak and valley values obtained from Equations 3 and 4 is computed to obtain the spectral contrast SC_i, as in Equation 5 below.

OSC uses the spectral contrasts and valleys of the I bands, {SC_1, SC_2, …, SC_I, V_1, V_2, …, V_I}, as the feature vector. Because the present invention uses eight bands, a 16-dimensional OSC feature vector is used. For the i-th band, OSC uses the spectral contrast and the valley, {SC_i, V_i}, as features.
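The peak, valley, and contrast computation for one band can be sketched as follows. Equations 3 to 5 are not reproduced in this text, so the sketch assumes the commonly used form in which the peak and the valley are the logarithms of the mean of the αK_i largest and αK_i smallest spectral values. The function name `spectral_contrast`, the rounding of αK_i, and the example data are assumptions.

```python
import math


def spectral_contrast(band_spectrum, alpha=0.02):
    """Return (SC_i, V_i) for one band, assuming log-mean peak and valley."""
    xs = sorted(band_spectrum, reverse=True)   # descending order, x'_{i,1..K_i}
    n = max(1, int(round(alpha * len(xs))))    # neighborhood size, at least 1
    peak = math.log(sum(xs[:n]) / n)           # mean of the n largest values
    valley = math.log(sum(xs[-n:]) / n)        # mean of the n smallest values
    return peak - valley, valley


# 100 bins: two strong peaks, two deep valleys, the rest in between.
band = [math.e ** 2] * 2 + [2.0] * 96 + [1.0] * 2
sc, v = spectral_contrast(band)
```

Applying this to each of the eight bands and concatenating the contrasts and valleys gives the 16-dimensional OSC vector described above.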

The short-term feature extraction unit 120 extracts the mean μ_t(k) and the variance σ_t²(k) of the feature vectors extracted from the analysis windows using Equations 6 and 7 below.

In the above equations, X_t(k,p) is the k-th element of the timbre feature of the p-th frame in the t-th texture window, and P is the total number of frames contained in the texture window.

In the present invention, the short-term feature extraction unit 120 also extracts the minimum MIN_t(k) and the maximum MAX_t(k) of the feature vectors, which are likewise statistical features, as in Equations 8 and 9 below.

The extracted maximum MAX_t(k) and minimum MIN_t(k) indicate the magnitude of the energy within the texture window.

The statistical feature vectors μ_t(k), σ_t²(k), MIN_t(k), and MAX_t(k) obtained for the t-th texture window are each averaged over all texture windows, as given in Equations 10 to 13 below.
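The texture-window statistics and their average over all windows can be sketched as below. Equations 6 to 13 are not reproduced in this text, so the sketch assumes a population variance (division by P); the names `texture_window_stats` and `average_over_windows` and the sample data are hypothetical.

```python
def texture_window_stats(frames):
    """frames: P feature vectors (each of length K) from one texture window.
    Returns the per-dimension mean, variance, minimum and maximum."""
    P = len(frames)
    K = len(frames[0])
    stats = {"mean": [], "var": [], "min": [], "max": []}
    for k in range(K):
        column = [frame[k] for frame in frames]
        mu = sum(column) / P
        stats["mean"].append(mu)
        stats["var"].append(sum((v - mu) ** 2 for v in column) / P)  # assumed 1/P
        stats["min"].append(min(column))
        stats["max"].append(max(column))
    return stats


def average_over_windows(per_window):
    """Average each statistic over all T texture windows."""
    T = len(per_window)
    K = len(per_window[0]["mean"])
    return {
        name: [sum(w[name][k] for w in per_window) / T for k in range(K)]
        for name in per_window[0]
    }


w1 = texture_window_stats([[1.0, 2.0], [3.0, 6.0]])
w2 = texture_window_stats([[0.0, 0.0], [0.0, 0.0]])
song_level = average_over_windows([w1, w2])
```

Each statistic keeps the dimension K of the underlying short-term feature, so four statistics over a 13-dimensional MFCC contribute 52 values.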

Once the short-term feature extraction unit 120 has extracted the short-term features as in Equations 1 to 3 above, the long-term feature extraction unit 130 extracts long-term features using them (S220). A spectro-temporal feature based on the modulation spectrum, one of the long-term features, is described below.

According to an embodiment of the present invention, the long-term feature extraction unit 130 extracts MSFM (Modulation Spectral Flatness Measures), MSCM (Modulation Spectral Crest Measures), MSC (Modulation Spectral Contrast), and MSV (Modulation Spectral Valley) as feature vectors based on the modulation spectrum.

In addition, the long-term feature extraction unit 130 may extract FMSFM (Feature-based Modulation Spectral Flatness Measures) and FMSCM (Feature-based Modulation Spectral Crest Measures), which build on MSFM/MSCM.

Whereas MSFM/MSCM use the modulation spectrum of the Octave Band Sum (OBS), the FMSFM/FMSCM according to an embodiment of the present invention are extracted using the FMS (Feature-based Modulation Spectrum). Using the FMS, the long-term feature extraction unit 130 extracts a modulation spectrum for each feature dimension, which shows how each feature dimension changes over time.

먼저, Long-term 특징 추출부(130)는 다음의 수학식 14와 같이 FMS를 추출한다.First, the long-term feature extraction unit 130 extracts the FMS as shown in Equation (14).

여기서 X_t(k,p)는 t번째 texture window 내의 p번째 프레임의 음색 특징(Timbre Feature)의 k번째 요소이고, P_t는 t번째 texture window내의 속한 프레임의 총 수이고, M은 변조 푸리에 변환(Modulation Fourier-transform)의 크기이다. 그리고 Long-term 특징 추출부(130)는 전체 texture window에 대하여 아래와 같이 평균 FMS를 추출한다.Where and X _t (k, p) is the k th element of the voice characteristics (Timbre Feature) of the p-th frame in the t-th texture window, P _t is the total number of the frame that belong in the t-th texture window, M is modulation Fourier transform (Modulation Fourier-transform). The long-term feature extraction unit 130 extracts an average FMS for the entire texture window as follows.

Here, T is the total number of texture windows.

Next, the long-term feature extraction unit 130 uses the results of Equations 14 and 15 to extract FMSFM/FMSCM as follows.

A small value of the FMSFM computed in Equation 16 (for example, 0) indicates that the average FMS is peaky, and a large value (for example, 1 or more) indicates that it is flat. FMSCM shows the opposite tendency. If the k-th FMSFM has a very small value, the k-th modulation frequency has a repeating pattern in the input music.
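Equations 14 to 17 are not reproduced in this text, so the following Python sketch is only an illustration under stated assumptions. It assumes the FMS is the magnitude of a discrete Fourier transform taken along the frame axis of each feature dimension, that the average FMS is the mean over the T texture windows, and that flatness and crest follow their common definitions (geometric mean over arithmetic mean, and maximum over arithmetic mean). With these assumed definitions flatness lies between 0 and 1. The function names `fms`, `mean_fms`, and `flatness_and_crest` are hypothetical.

```python
import cmath
import math


def fms(window, M):
    """Assumed FMS of one texture window.

    window[p][k] is the k-th timbre feature element of frame p,
    i.e. X_t(k, p) in the text. Returns spec[k][m] for m = 0..M-1.
    """
    P = len(window)          # P_t: frames in this texture window
    K = len(window[0])       # number of feature dimensions
    spec = []
    for k in range(K):
        row = []
        for m in range(M):
            # DFT along the frame axis for feature dimension k
            acc = sum(window[p][k] * cmath.exp(-2j * math.pi * m * p / M)
                      for p in range(P))
            row.append(abs(acc))
        spec.append(row)
    return spec


def mean_fms(windows, M):
    """Average the per-window FMS over all T texture windows."""
    T = len(windows)
    specs = [fms(w, M) for w in windows]
    K = len(specs[0])
    return [[sum(s[k][m] for s in specs) / T for m in range(M)]
            for k in range(K)]


def flatness_and_crest(spectrum_row, eps=1e-12):
    """Assumed flatness (geometric/arithmetic mean) and crest (max/arithmetic mean)."""
    n = len(spectrum_row)
    arith = sum(spectrum_row) / n
    geo = math.exp(sum(math.log(v + eps) for v in spectrum_row) / n)
    return geo / (arith + eps), max(spectrum_row) / (arith + eps)


# A feature dimension that alternates every frame concentrates its
# modulation energy in a single bin, so its flatness is close to 0.
alternating = [[1.0], [-1.0], [1.0], [-1.0]]
flat, crest = flatness_and_crest(fms(alternating, 4)[0])
```

Whether the DC bin (m = 0) is included and whether any normalization is applied are left open here, since the source equations are unavailable.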

Finally, the long-term feature extraction unit 130 uses the FMS to extract FMSC and FMSV as shown in Equations 18 and 19.

θ_q is the set of modulation frequencies in the q-th modulation band. Finally, over all Q modulation bands, the mean and variance of FMSV/FMSC are extracted as shown in Equations 20 and 21.
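Equations 18 to 21 are also missing from this text. As a hedged illustration, the sketch below assumes the common modulation spectral contrast formulation: within each modulation band the valley is the minimum of the average FMS, the peak is the maximum, and the contrast is peak minus valley. It then takes the mean and variance over the Q bands. The band edges, the absence of a logarithm, and the function names are all assumptions.

```python
def contrast_and_valley(mean_spec_row, bands):
    """Per-band contrast and valley for one feature dimension.

    bands is a list of (start, stop) index ranges, one per modulation
    band; each range plays the role of theta_q in the text.
    """
    contrasts, valleys = [], []
    for start, stop in bands:
        segment = mean_spec_row[start:stop]
        peak, valley = max(segment), min(segment)
        contrasts.append(peak - valley)   # assumed definition of contrast
        valleys.append(valley)            # assumed definition of valley
    return contrasts, valleys


def mean_var(values):
    """Population mean and variance of a list of numbers."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, var


def fmsc_fmsv_stats(mean_spec_row, bands):
    """Return (mean FMSC, var FMSC, mean FMSV, var FMSV) over all bands."""
    contrasts, valleys = contrast_and_valley(mean_spec_row, bands)
    return mean_var(contrasts) + mean_var(valleys)


# Hypothetical average FMS for one feature dimension, split into Q = 3 bands.
row = [1.0, 5.0, 2.0, 2.0, 3.0, 9.0, 4.0, 4.0]
stats = fmsc_fmsv_stats(row, [(0, 2), (2, 4), (4, 8)])
```

Each feature dimension contributes one FMSC value and one FMSV value to the mean and again to the variance, which is consistent with the 26-dimensional (2 × 13) and 32-dimensional (2 × 16) entries listed for FMSC/FMSV in Table 2.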

To extract MSFM/MSCM and MSV/MSC, the long-term feature extraction unit 130 uses the octave band sum (OBS), which is the energy within each octave band.

Table 2 below lists the feature vectors that make up the full feature set extracted in the actual system.

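| Feature vectors | Statistic / measure | Base feature | Dimension |
|---|---|---|---|
| Texture window | Mean | MFCC | 13 |
| Texture window | Mean | DFB | 13 |
| Texture window | Mean | OSC | 16 |
| Texture window | Variance | MFCC | 13 |
| Texture window | Variance | DFB | 13 |
| Texture window | Variance | OSC | 16 |
| Texture window | Max | MFCC | 13 |
| Texture window | Max | DFB | 13 |
| Texture window | Max | OSC | 16 |
| Texture window | Min | MFCC | 13 |
| Texture window | Min | DFB | 13 |
| Texture window | Min | OSC | 16 |
| Feature-based modulation spectrum | FMSFM | MFCC | 13 |
| Feature-based modulation spectrum | FMSFM | DFB | 13 |
| Feature-based modulation spectrum | FMSFM | OSC | 16 |
| Feature-based modulation spectrum | FMSCM | MFCC | 13 |
| Feature-based modulation spectrum | FMSCM | DFB | 13 |
| Feature-based modulation spectrum | FMSCM | OSC | 16 |
| Feature-based modulation spectrum | Mean of FMSC/FMSV | MFCC | 26 |
| Feature-based modulation spectrum | Mean of FMSC/FMSV | DFB | 26 |
| Feature-based modulation spectrum | Mean of FMSC/FMSV | OSC | 32 |
| Feature-based modulation spectrum | Variance of FMSC/FMSV | MFCC | 26 |
| Feature-based modulation spectrum | Variance of FMSC/FMSV | DFB | 26 |
| Feature-based modulation spectrum | Variance of FMSC/FMSV | OSC | 32 |
| Octave-based modulation spectrum | MSFM | OBS | 8 |
| Octave-based modulation spectrum | MSCM | OBS | 8 |
| Octave-based modulation spectrum | Mean of MSC/MSV | OBS | 16 |
| Octave-based modulation spectrum | Variance of MSC/MSV | OBS | 16 |
| Total | | | 468 |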

As shown in Table 2, the short-term feature extraction unit 120 and the long-term feature extraction unit 130 can generate 468 feature vectors for the input music. For example, using the texture window technique, 13 feature vectors are obtained for the mean of the MFCC, one for each of its 13 orders, and 13 feature vectors are obtained for the mean of the DFB, one for each of its 13 orders.
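The total of 468 can be checked with a few lines of arithmetic. The sketch below only restates the dimensions listed in Table 2; the variable names are illustrative.

```python
# Per-frame dimensions of the three short-term features (from Table 2).
base = {"MFCC": 13, "DFB": 13, "OSC": 16}
base_total = sum(base.values())                 # 42

# Texture window statistics: mean, variance, max, min.
texture_window = 4 * base_total                 # 168

# Feature-based modulation spectrum: FMSFM and FMSCM (one value per
# dimension each), plus mean and variance of FMSC/FMSV (two values per
# dimension each).
feature_based = 2 * base_total + 2 * (2 * base_total)   # 84 + 168 = 252

# Octave-based modulation spectrum on the 8-dimensional OBS.
obs = 8
octave_based = 2 * obs + 2 * (2 * obs)          # 16 + 32 = 48

total = texture_window + feature_based + octave_based
print(total)  # 468
```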

Next, the feature selecting unit 140 selects upper (top-ranked) feature vectors from among the extracted short-term and long-term feature vectors (S230). Here, an upper feature vector is a feature vector whose recognition rate is higher than a reference value.

Thus, unlike the prior art, an embodiment of the present invention selects the highest-priority feature vectors from among the extracted short-term and long-term feature vectors, which improves the recognition rate and reduces the amount of computation. Taking Table 2 as an example, the feature selecting unit 140 selects the 160 upper feature vectors with the best recognition rates from the 468 feature vectors obtained. In the test stage, therefore, only the upper feature vectors selected by the feature selecting unit 140 are used for genre recognition, rather than the full set of extracted features.

The feature selecting unit 140 according to an embodiment of the present invention can select the highest-priority upper feature vectors by methods such as best-first search or a Support Vector Machine (SVM) ranker. That is, the feature selecting unit 140 can use an SVM classifier called an SVM ranker to evaluate the merit of each feature, rank the features, and select feature vectors accordingly. Ranking with an SVM ranker can be readily carried out by those skilled in the art, so a detailed description is omitted.
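The text does not detail the SVM ranker, so the sketch below covers only the step that follows ranking: given one merit score per feature (assumed to have been computed already by whatever ranker is used), keep the indices of the k best features and project feature vectors onto them. The function names and the example scores are hypothetical.

```python
def select_top_features(scores, k):
    """Return the indices of the k highest-scoring features, in ascending order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])


def project(vector, selected):
    """Keep only the selected feature dimensions of a full feature vector."""
    return [vector[i] for i in selected]


# Hypothetical merit scores for four features; keep the best two.
scores = [0.1, 0.9, 0.3, 0.7]
selected = select_top_features(scores, 2)
reduced = project([10.0, 20.0, 30.0, 40.0], selected)
print(selected, reduced)  # [1, 3] [20.0, 40.0]
```

In the setting of Table 2, `scores` would hold 468 entries and k would be 160.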

Next, the model generating unit 150 generates a model for each music genre using the selected upper feature vectors (S240). The genres may include Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal. More specifically, the model generating unit 150 performs modeling for each music genre using the extracted upper feature vectors. That is, the model generating unit 150 groups the feature values of the upper feature vectors corresponding to each music genre and generates a model for each genre.

Once the genre-specific models have been generated, the test stage for classifying the genre of input test music proceeds.

First, as shown in FIG. 2, when a test music signal to be classified is input (S250), the genre classifying unit 160 sequentially extracts from the input music signal the short-term feature vectors and long-term feature vectors that belong to the upper feature vectors (S260, S270). In the case of Table 2, the genre classifying unit 160 does not need to extract all 468 feature vectors and extracts only the 160 highest-priority feature vectors.

For example, suppose that, for the mean of the MFCC obtained using the texture window technique, the feature vectors corresponding to orders 1, 4, 6, 9, and 13 out of the 13 orders are among the highest-priority features. The genre classifying unit 160 then extracts only those five feature vectors from the input test music signal.

Steps S260 and S270, which extract the short-term and long-term feature vectors, are substantially the same as steps S210 and S220 except that only the short-term and long-term feature vectors belonging to the upper feature vectors are extracted, so redundant description is omitted. For convenience of explanation, the genre classifying unit 160 has been described as extracting the short-term and long-term feature vectors (S260, S270), but the short-term feature extraction unit 120 and the long-term feature extraction unit 130 may perform this extraction in the test stage instead.

Next, the genre classifying unit 160 uses the extracted feature vectors, which correspond to the upper feature vectors, to classify the genre of the input test music based on the genre-specific models generated by the model generating unit 150 (S280). In particular, according to an embodiment of the present invention, the genre classifying unit 160 can classify the music genre using a one-against-one SVM.
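In a one-against-one scheme, one binary classifier is trained for every pair of genres and the final label is chosen by majority vote, so ten genres require 45 pairwise classifiers. The sketch below shows only the voting step. To stay self-contained it uses a nearest-centroid rule as a stand-in for each pairwise SVM, which is not the classifier described here. All names and the toy centroids are hypothetical.

```python
from collections import Counter
from itertools import combinations


def one_against_one_predict(x, genres, pairwise_decide):
    """Vote over all genre pairs; ties go to the genre listed first."""
    votes = Counter()
    for a, b in combinations(genres, 2):
        votes[pairwise_decide(a, b, x)] += 1
    return max(genres, key=lambda g: votes[g])


def make_centroid_decider(centroids):
    """Stand-in pairwise classifier (NOT an SVM): pick the nearer centroid."""
    def squared_distance(x, c):
        return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

    def decide(a, b, x):
        da = squared_distance(x, centroids[a])
        db = squared_distance(x, centroids[b])
        return a if da <= db else b

    return decide


# Toy two-dimensional "genre models" for illustration only.
centroids = {"Jazz": [0.0, 0.0], "Rock": [5.0, 5.0], "Pop": [0.0, 5.0]}
decide = make_centroid_decider(centroids)
label = one_against_one_predict([4.5, 5.2], list(centroids), decide)
print(label)  # Rock
```

In a real system each call to `pairwise_decide` would evaluate a binary SVM trained on the two genres in question.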

Thus, according to an embodiment of the present invention, only the upper feature vectors are extracted in the test stage and used to classify the music genre, so the amount of computation can be greatly reduced compared with the prior art, which extracts all feature vectors of the input test music to classify its genre.

As one experimental example according to an embodiment of the present invention, the public GTZAN and ISMIR2004 databases were used to evaluate music genre classification performance. GTZAN contains ten genres (Classical, Country, Disco, Hiphop, Jazz, Rock, Blues, Reggae, Pop, and Metal), with 100 songs per genre. Each song is 30 seconds long and is stored as a 16-bit, 22,050 Hz, mono AU file.

Table 3

| Classifier | Accuracy (%), GTZAN | Accuracy (%), ISMIR2004 | Dim. |
|---|---|---|---|
| All feature set | 84.0 | 84.8 | 468 |
| + Feature selection | 85.0 | 86.3 | 160 |
| + SVM RBF kernel | 87.4 | 89.9 | 160 |

Table 4

| Genre classification method | Accuracy (%), GTZAN | Accuracy (%), ISMIR2004 |
|---|---|---|
| Gaussian supervector (MIREX2009: winner) | 82.1 | 79.0 |
| Block-level features (MIREX2010: winner) | 85.5 | 88.2 |
| Gaussian supervector + visual features (MIREX2011: second place) | 86.1 | 86.1 |
| Embodiment of the present invention | 87.4 | 89.9 |

Table 3 shows the performance with all features, the performance after applying the feature selection algorithm, and finally the performance after replacing the linear kernel with an RBF kernel. Table 4 compares the performance with other systems. The embodiment of the present invention achieves the highest recognition rate on both databases.

As described above, the music genre classification method and apparatus according to an embodiment of the present invention select upper feature vectors after extracting the features of the music. This reduces the amount of computation required to classify the genre of input music data and therefore shortens the time needed to obtain a classification result. In addition, even though the genre is classified using only the selected upper feature vectors, a higher recognition rate can be obtained than with an apparatus or method that uses the full feature set.

The present invention has been described above with reference to its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention can be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. Accordingly, the scope of the present invention is not limited to the embodiments described above and should be construed to include the various embodiments within the scope of the claims and their equivalents.

110: input unit 120: short-term feature extraction unit
130: long-term feature extraction unit 140: feature selecting unit
150: model generating unit 160: genre classifying unit

Claims

A music genre classification apparatus comprising:
an input unit for receiving a sound source;
a short-term feature extraction unit for extracting a short-term feature vector of the sound source input to the input unit;
a long-term feature extraction unit for extracting a long-term feature vector using the short-term feature vector;
a feature selecting unit for selecting an upper feature vector having a higher recognition rate from among the short-term or long-term feature vectors;
a model generating unit for generating a model for each genre of music using the selected upper feature vector; and
a genre classifying unit for classifying a genre of input test music based on the genre-specific model.

The apparatus according to claim 1,
wherein the short-term feature vector includes at least one of Mel-frequency cepstral coefficients (MFCC), a decorrelated filter bank (DFB), and an octave-based spectral contrast (OSC).

The apparatus according to claim 1,
wherein the short-term feature vector further includes a statistical feature vector extracted using a texture window technique, and
wherein the statistical feature vector includes at least one of an average value, a variance value, a maximum value, and a minimum value.

The apparatus according to claim 1,
wherein the long-term feature vector includes at least one of Feature-based Modulation Spectral Flatness Measures (FMSFM) and Feature-based Modulation Spectral Crest Measures (FMSCM), each extracted using a Feature-based Modulation Spectrum (FMS), and
wherein the FMSFM and the FMSCM are defined by the following formulas.

The apparatus according to claim 1,
wherein the long-term feature vector includes at least one of Feature-based Modulation Spectral Contrast (FMSC) and Feature-based Modulation Spectral Valley (FMSV), and
wherein the FMSC and the FMSV are defined by the following formulas:

Here, θ_q is the set of modulation frequencies in the q-th modulation band.

The apparatus according to claim 4 or 5,
wherein the FMS and the average of the FMS are defined by the following equations:

Here, T is the total number of texture windows, X_t(k,p) is the k-th element of the short-term feature of the p-th frame in the t-th texture window, P_t is the total number of frames in the t-th texture window, and M is the size of the modulation Fourier transform.

The apparatus according to claim 1,
wherein the long-term feature vector further includes a statistical feature, and
wherein the statistical feature includes at least one of an average value, a variance value, a maximum value, and a minimum value.

The apparatus according to claim 1,
wherein the genre classifying unit extracts, from the test music, a feature vector corresponding to the upper feature vector, and classifies the genre of the test music by comparing the extracted feature vector with the genre-specific model.

The apparatus according to claim 1,
wherein the feature selecting unit selects the upper feature vector using a Support Vector Machine (SVM) ranker, and
wherein the genre classifying unit classifies the genre of the test music using a one-against-one Support Vector Machine (SVM).

A music genre classification method using a music genre classification apparatus, the method comprising:
extracting a short-term feature vector for an input sound source;
extracting a long-term feature vector using the short-term feature vector;
selecting an upper feature vector having a higher recognition rate from among the short-term or long-term feature vectors;
generating a model for each genre of music using the selected upper feature vector; and
classifying a genre of input test music based on the genre-specific model.

The method of claim 10,
wherein the short-term feature vector includes at least one of Mel-frequency cepstral coefficients (MFCC), a decorrelated filter bank (DFB), and an octave-based spectral contrast (OSC).

The method of claim 10,
wherein the short-term feature vector further includes a statistical feature vector extracted using a texture window technique, and
wherein the statistical feature vector includes at least one of an average value, a variance value, a maximum value, and a minimum value.

The method of claim 10,
wherein the long-term feature vector includes at least one of Feature-based Modulation Spectral Flatness Measures (FMSFM) and Feature-based Modulation Spectral Crest Measures (FMSCM), each extracted using a Feature-based Modulation Spectrum (FMS), and
wherein the FMSFM and the FMSCM are defined by the following formulas.

The method of claim 10,
wherein the long-term feature vector includes at least one of Feature-based Modulation Spectral Contrast (FMSC) and Feature-based Modulation Spectral Valley (FMSV), and
wherein the FMSC and the FMSV are defined by the following formulas:

Here, θ_q is the set of modulation frequencies in the q-th modulation band.

The method according to claim 13 or 14,
wherein the FMS and the average of the FMS are defined by the following equations:

The method of claim 10,
wherein the long-term feature vector further includes a statistical feature, and
wherein the statistical feature includes at least one of an average value, a variance value, a maximum value, and a minimum value.

The method of claim 10,
wherein classifying the genre of the input test music based on the genre-specific model comprises:
extracting, when the test music to be classified is input, a feature vector corresponding to the upper feature vector from the test music; and
classifying the genre of the test music by comparing the extracted feature vector with the genre-specific model.

The method of claim 10,
wherein, in the step of selecting the upper feature vector whose recognition rate is higher than a reference value, the upper feature vector is selected using a Support Vector Machine (SVM) ranker, and
wherein, in the step of classifying the genre of the input test music based on the genre-specific model, the genre is classified using a one-against-one Support Vector Machine (SVM).