KR101520621B1

KR101520621B1 - Method and apparatus for query by singing/humming

Info

Publication number: KR101520621B1
Application number: KR1020080084038A
Authority: KR
Inventors: 엄기완; 덩징; 쑤안 주; 이재원; 원원 시
Original assignee: 삼성전자주식회사
Priority date: 2007-09-28
Filing date: 2008-08-27
Publication date: 2015-05-15
Also published as: KR20090032972A; CN101398827A; CN101398827B

Abstract

The present invention relates to a method of query by singing/humming. A method according to an embodiment of the present invention comprises inputting a frame-level query sequence extracted from arbitrary singing/humming, inputting a set of template sequences segmented by bar, performing matching between the query sequence and the template sequences, and outputting query candidates according to the matching result.

Singing, humming, query

Description

Method and apparatus for query by singing/humming

The present invention relates to an automatic search system and method, and more particularly to a method and apparatus for query by singing/humming with which a user can quickly retrieve a desired song or piece of music from a music database.

Conventionally, when a user wishes to listen to music, the user must type the title of the music, the name of the singer, or other information about the specific piece in order to look up the song in a music database. However, as the number of songs stored in music databases keeps growing, it has become difficult to remember enough information about the songs to find the music the user wants to hear.

A content-based search method, namely query by singing/humming (QBSH), has been proposed. A QBSH system retrieves music by accepting singing, humming, or whistling as the query, and is intended to help a user find, in a music database, a song whose details the user has forgotten.

US Patent Application Publication No. 2007/131094 discloses a QBH system. It focuses on music information retrieval using a 3D search algorithm. It computes similarity measures that independently indicate how similar the melody and the lyrics are between the query and the database. The search is performed in a three-dimensional space: t for time, S for acoustic-phonetic speech features, and H for the UDS string. That invention combines a standard speech recognizer, additional 'da' words, and a QBH system.

US 6,188,010 discloses another method of query by singing/humming. It makes it possible to search for a song title when only the melody of the tune is known. A piano-roll music notation interface or a piano keyboard interface can be used to input the melody.

US 6,121,530 and US 5,784,686 relate to music retrieval methods.

In general, a conventional QBSH system comprises four main components: (a) the melody is extracted from the query by pitch tracking and converted to semitones (notes); (b) note segmentation is performed on the melody, and note durations are ignored; (c) the melody is converted into a common representation such as relative note differences or a UDS string; (d) the search uses string matching, DTW, Viterbi search, and the like.

However, in conventional QBSH systems the recognition rate drops sharply for large music databases, which makes them impractical for real applications. Processing time also grows for large music databases. There is therefore a need for a QBSH method and system that can quickly find a desired song by singing/humming in a large music database.

To solve the above problems, the present invention provides a method and apparatus for query by singing/humming with which a user can quickly find a desired song in a large music database by singing/humming. The user need not sing well or know much about the song, and may query by singing with or without lyrics, by singing part of a song, or by whistling.

In a method of query by singing/humming according to an embodiment of the present invention, a method of generating a song template comprises extracting a main melody contour from a song, performing bar segmentation on the extracted main melody contour, and converting the bar-segmented main melody contour into a frame-level note sequence and storing it as a template sequence.

According to an embodiment of the present invention, performing the bar segmentation comprises obtaining information related to the melody of the song, finding the start position and end position of each bar according to the obtained information, and labeling the start and end points on the melody contour.

Here, converting the main melody contour into a frame-level note sequence comprises sampling the main melody contour at a predetermined frame shift to obtain the melody note sequence.

The method according to an embodiment of the present invention further comprises converting the melody note contour into a mean-subtracted melody note sequence.

A method of query by singing/humming according to an embodiment of the present invention comprises inputting a frame-level query sequence extracted from singing/humming, inputting bar-segmented template sequences, performing matching between the query sequence and the template sequences, and outputting query candidates according to the matching result.

Here, the matching between the query sequence and the template sequence comprises:

extracting a low-resolution query sequence from the input query sequence at a first frame shift; extracting a low-resolution template sequence from the input template sequence at the first frame shift; performing a first matching between the low-resolution query sequence and the low-resolution template sequence and obtaining a first set of candidate songs from the song database according to the matching result; extracting a high-resolution query sequence from the input query sequence at a second frame shift smaller than the first frame shift; extracting high-resolution template sequences, at the second frame shift, from the input template sequences of the candidate templates; performing a second matching between the high-resolution query sequence and the high-resolution template sequences on the basis of the first set of candidate songs; and obtaining a final set of candidate songs from the song database according to the second matching result.
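The two-pass search described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `downsample`, `mean_abs_distance`, and `two_stage_search`, the parameters `coarse_step` and `keep`, and the plain mean absolute difference used as the distance are all assumptions made for the example.

```python
def downsample(seq, step):
    """Keep every step-th frame, i.e. use a frame shift step times larger."""
    return seq[::step]


def mean_abs_distance(a, b):
    """Stand-in distance (assumption): mean absolute difference over the overlap."""
    n = min(len(a), len(b))
    return sum(abs(x - y) for x, y in zip(a, b)) / n


def two_stage_search(query, templates, coarse_step=2, keep=2):
    """templates maps a song name to its high-resolution note sequence."""
    # First pass: cheap comparison at low resolution over the whole database.
    coarse_query = downsample(query, coarse_step)
    ranked = sorted(
        templates,
        key=lambda name: mean_abs_distance(
            coarse_query, downsample(templates[name], coarse_step)
        ),
    )
    shortlist = ranked[:keep]
    # Second pass: full-resolution comparison restricted to the shortlist.
    return sorted(shortlist, key=lambda name: mean_abs_distance(query, templates[name]))
```

The first pass touches every template but with fewer frames; the second pass uses all frames but only for the shortlisted songs, which is where the speed-up for a large database would come from.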

Here, the matching is performed using a linear scaling algorithm with fast tempo scaling.

Here, matching the query sequence against the template sequence comprises:
(a) inputting a query sequence Q and a template sequence T;
(b) setting the scaling factor r to r₀ and the offset δ of the scaling factor to δ₀;
(c) computing, with a linear scaling algorithm, Q₁ = Q×r and the distance d = |Q₁ − T| between the sequence Q₁ and the template sequence T;
(d) computing, with the linear scaling algorithm, Q_high = Q×(r+δ) and the distance d_high = |Q_high − T| between the sequence Q_high and the template sequence T;
(e) computing, with the linear scaling algorithm, Q_low = Q×(r−δ) and the distance d_low = |Q_low − T| between the sequence Q_low and the template sequence T;
(f) comparing the distances d, d_high, and d_low;
(g) if d_high is the smallest, setting r to r+δ and d to d_high; if d_low is the smallest, setting r to r−δ and d to d_low; if d is the smallest, leaving r and d unchanged;
(h) determining whether δ is greater than or equal to a predetermined value;
(i) if δ is greater than or equal to the predetermined value, setting δ to δ/2 and returning to step (d);
(j) if δ is less than or equal to the predetermined value, outputting the distance d; and
(k) determining whether the query sequence and the template sequence match according to whether the distance d is less than or equal to a predetermined threshold.

Here, r₀ and δ₀ are 1.4 and 0.4, respectively, and the predetermined value is 0.09.
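A minimal sketch of the tempo search in steps (a) to (j), using the stated values r₀ = 1.4, δ₀ = 0.4, and a stopping value of 0.09. The text does not define the resampling or the distance in detail, so `linear_scale` (nearest-neighbour stretching) and `distance` (mean absolute difference over the overlapping frames) are assumptions, as are all function names. When δ equals the stopping value exactly, this sketch halves it and continues.

```python
def linear_scale(seq, r):
    """Stretch or compress seq in time by factor r (nearest-neighbour, assumed)."""
    assert r > 0, "scaling factor must be positive"
    n = max(1, round(len(seq) * r))
    return [seq[min(len(seq) - 1, int(i / r))] for i in range(n)]


def distance(a, b):
    """Mean absolute difference over the overlapping frames (assumed metric)."""
    n = min(len(a), len(b))
    return sum(abs(a[i] - b[i]) for i in range(n)) / n


def tempo_search(q, t, r0=1.4, delta0=0.4, min_delta=0.09):
    """Return (r, d): the best scaling factor found and its distance to t."""
    r, delta = r0, delta0
    d = distance(linear_scale(q, r), t)                    # step (c)
    while True:
        d_high = distance(linear_scale(q, r + delta), t)   # step (d)
        d_low = distance(linear_scale(q, r - delta), t)    # step (e)
        if d_high < d and d_high <= d_low:                 # steps (f), (g)
            r, d = r + delta, d_high
        elif d_low < d:
            r, d = r - delta, d_low
        if delta < min_delta:                              # steps (h), (j)
            return r, d
        delta /= 2                                         # step (i)
```

With these defaults δ takes the values 0.4, 0.2, 0.1, and 0.05, so the loop evaluates at most eight scaled copies of the query after the initial one, instead of scanning a dense grid of tempo factors.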

Here, when the query sequence is matched against the template sequence, the distance between the query sequence and the template sequence is computed using a recursive alignment algorithm. If the distance is less than or equal to a predetermined threshold, the song corresponding to the template sequence is output as the query result; otherwise the template is discarded and matching is performed between the query sequence and another template sequence.

Here, the matching of the query sequence against the template sequence comprises:
(a) inputting the query sequence Q = (q₁, q₂, ..., q_N) and the template sequence T = (t₁, t₂, ..., t_M), where N and M are the numbers of frames of Q and T, respectively;
(b) inputting a recursion depth D and setting j = [N/2] and i = 0;
(c) splitting the sequence Q at point j into two sequences Q₁ = (q₁, q₂, ..., q_j) and Q₂ = (q_{j+1}, q_{j+2}, ..., q_N);
(d) computing the sum sum(Q₁) of Q₁ and the sum sum(Q₂) of Q₂, and computing the ratio R₀ = sum(Q₁)/sum(Q);
(e) setting k = [M/2] and splitting the template sequence T at point k into two sequences T₁ = (t₁, t₂, ..., t_k) and T₂ = (t_{k+1}, t_{k+2}, ..., t_M);
(f) computing, with a linear scaling algorithm, the distance d₁ between Q₁ and T₁ and the distance d₂ between Q₂ and T₂, and setting S₁ = d₁ + d₂;
(g) splitting the template sequence T at point h into two sequences T₃ = (t₁, t₂, ..., t_h) and T₄ = (t_{h+1}, t_{h+2}, ..., t_M) such that the ratio of the sum of T₃ to the sum of T is R₀;
(h) computing, with the linear scaling algorithm, the distance d₃ between Q₁ and T₃ and the distance d₄ between Q₂ and T₄, and setting S₂ = d₃ + d₄;
(i) comparing S₁ and S₂ and, if S₁ is smaller than S₂, setting S = S₁ and i = k, and otherwise setting S = S₂ and i = h;
(j) if D is 0, outputting S; if D is not 0, splitting the template sequence T into two sequences T₁ = (t₁, t₂, ..., t_i) and T₂ = (t_{i+1}, t_{i+2}, ..., t_M), setting Q = Q₁ and T = T₁ and returning to step (a) to obtain S₁, setting Q = Q₂ and T = T₂ and returning to step (a) to obtain S₂, then setting S = S₁ + S₂ and outputting S; and
(k) if S is smaller than a predetermined value, outputting the song corresponding to the template as the query result, and otherwise performing matching between the query sequence and another template sequence.
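The recursive alignment in steps (a) to (j) can be sketched as below. Several details are assumptions rather than statements from the text: the recursion depth is decremented on each recursive call, the linear scaling distance stretches the query segment to the template segment's length, sequences shorter than two frames are compared directly, and the ratio-based split point h falls back to the midpoint when a sum is zero. All function names are hypothetical.

```python
def stretch(seq, n):
    """Rescale seq to exactly n frames by nearest-neighbour lookup (assumed)."""
    return [seq[i * len(seq) // n] for i in range(n)]


def ls_distance(q, t):
    """Linear scaling distance (assumed): stretch q to len(t), mean abs difference."""
    scaled = stretch(q, len(t))
    return sum(abs(a - b) for a, b in zip(scaled, t)) / len(t)


def split_by_ratio(t, ratio):
    """Find h so that sum(t[:h]) / sum(t) first reaches ratio, step (g)."""
    total = sum(t)
    if total == 0:
        return len(t) // 2
    running = 0
    for h, value in enumerate(t, start=1):
        running += value
        if running / total >= ratio:
            return min(max(h, 1), len(t) - 1)
    return len(t) - 1


def recursive_align(q, t, depth):
    """Return the alignment cost S between non-empty sequences q and t."""
    if len(q) < 2 or len(t) < 2:
        return ls_distance(q, t)
    j, k = len(q) // 2, len(t) // 2                           # steps (b), (e)
    q1, q2 = q[:j], q[j:]                                     # step (c)
    total_q = sum(q)
    ratio = sum(q1) / total_q if total_q else 0.5             # step (d)
    h = split_by_ratio(t, ratio)                              # step (g)
    s1 = ls_distance(q1, t[:k]) + ls_distance(q2, t[k:])      # step (f)
    s2 = ls_distance(q1, t[:h]) + ls_distance(q2, t[h:])      # step (h)
    s, i = (s1, k) if s1 < s2 else (s2, h)                    # step (i)
    if depth == 0:                                            # step (j)
        return s
    return recursive_align(q1, t[:i], depth - 1) + recursive_align(q2, t[i:], depth - 1)
```

The ratio-based split assumes the sequence values are non-negative (for example raw semitone numbers); with mean-subtracted sequences the sums can cancel, which is why the sketch clamps h and falls back to the midpoint.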

According to another embodiment of the present invention, an apparatus for generating a song template for query by singing/humming includes an extraction unit that extracts a main melody contour from each song in a song database, a bar segmentation unit that performs bar segmentation on the extracted main melody contour, and a conversion unit that converts the bar-segmented melody contour into a frame-level note sequence and stores it as a template sequence.

According to yet another embodiment of the present invention, an apparatus for query by singing/humming includes a query sequence input unit that inputs a frame-level query sequence extracted from singing/humming, a template sequence input unit that inputs frame-level template sequences segmented by bar, a matching unit that performs matching between the query sequence and the template sequences, and an output unit that outputs query candidates according to the result of the matching.

According to yet another embodiment of the present invention, an apparatus includes a template generation unit that generates a template set by extracting the main melody of each song in a song database and performing bar segmentation on the melody contour, a melody extraction unit that extracts a query melody contour from the singing/humming, and a melody matching unit that performs matching between the query melody contour and each template of the template set and outputs candidate songs according to the matching result.

According to yet another embodiment of the present invention, the apparatus further includes a multi-resolution template sequence generation unit that extracts a note sequence from the main melody at a predetermined frame shift to generate multi-resolution frame-level melody template sequences.

According to yet another embodiment of the present invention, the apparatus further includes a multi-resolution query sequence generation unit that extracts a note sequence from the query melody contour to generate multi-resolution frame-level query sequences.

According to yet another embodiment of the present invention, the melody matching unit includes a distance calculation unit that computes the distance between the multi-resolution frame-level melody template sequence and the multi-resolution frame-level query sequence, and an output unit that determines whether the distance is less than or equal to a predetermined threshold. If the distance is less than or equal to the threshold, the output unit determines that the template sequence matches the query sequence and outputs the song corresponding to the template sequence; otherwise matching is performed between the query sequence and another template sequence.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of an apparatus 100 for query by singing/humming according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus 100 for query by singing/humming includes a template generation unit 110, a melody extraction unit 120, and a melody matching unit 130.

The template generation unit 110 generates a template set by extracting the main melody of each song in the song database and performing bar segmentation on the melody contour. Template set generation produces the templates used in the subsequent matching process, and it can be performed offline. The main melody of each song in the song database is extracted, and bar segmentation is then performed on the extracted main melody to build a multi-resolution template set. Template generation is described below with reference to FIGS. 3 and 4.

The melody extraction unit 120 extracts a query melody contour from the user's singing/humming. That is, a melody contour for the subsequent matching process is obtained from the user's singing/humming.

First, the user's singing/humming is input. This may be singing with or without lyrics, a passage of a song, or whistling. Any of these query types can be stored as a wave-format file.

Next, pitch tracking is performed. In this process, pitches are extracted at the highest frame resolution, that is, at the smallest frame shift. When a lower-resolution melody contour is required, the corresponding melody sequence is derived from the melody sequence extracted at the smallest frame shift. For example, if the smallest frame shift is 0.1 s and the audio is 8 s long, 80 frames of data are obtained. If a melody sequence with a frame shift of 0.2 s is required, the even-numbered (or odd-numbered) frames are taken from the 80 frames. The pitch extraction algorithm is a conventional autocorrelation algorithm, and a detailed description is omitted.
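The frame arithmetic in this example can be illustrated with a short sketch. The function name `decimate` and its parameters are hypothetical; the frame values are placeholders standing in for tracked pitches.

```python
def decimate(frames, factor, offset=0):
    """Keep every factor-th frame; offset 0 or 1 picks even or odd frames."""
    return frames[offset::factor]


# 8 s of audio at a 0.1 s frame shift gives 80 frames (placeholder values).
frames_0_1s = list(range(80))
# A 0.2 s frame shift keeps every second frame of that sequence.
frames_0_2s = decimate(frames_0_1s, 2)
```

Deriving the coarse sequence from the fine one means pitch tracking runs only once per query, at the smallest frame shift.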

Next, melody contour smoothing is performed. Because pitch frequencies cannot be extracted perfectly, the tracker may return half the true pitch frequency, a multiple of it, or isolated outlier points, so the extracted pitch contour needs to be smoothed. In this process, median filtering and linear filtering are applied to smooth the pitch contour and remove the non-ideal pitch frames produced by pitch tracking.
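As an illustration of the median-filtering part of this step, the sketch below removes an isolated doubled-frequency frame. The window width of 3 and the handling of the edges (shorter windows) are assumptions; the text does not give the filter parameters, and the linear filter is not shown.

```python
from statistics import median


def median_smooth(pitch, width=3):
    """Replace each frame by the median of a window centred on it."""
    half = width // 2
    return [
        median(pitch[max(0, i - half): i + half + 1])
        for i in range(len(pitch))
    ]


# One frame was tracked at double the true frequency (an octave error).
smoothed = median_smooth([220, 220, 440, 220, 220])
```

A median filter suits this kind of error because a single outlier never becomes the median of its window, whereas an averaging filter would smear it into the neighbouring frames.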

After pitch tracking, the pitches are converted to semitones. Each pitch frequency is converted to a semitone value as in Equation 1.

Here, freq is the pitch frequency.
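Equation 1 itself is not reproduced in this text. As an illustration only, the sketch below uses the common MIDI-style mapping from frequency to semitone number (A4 = 440 Hz = note 69); whether the patent uses exactly this reference pitch and offset is an assumption. Mapping unvoiced frames (frequency 0) to 0 is also an assumption, chosen so that they can be discarded as zero values later.

```python
import math


def hz_to_semitone(freq, ref_hz=440.0, ref_note=69):
    """MIDI-style mapping (assumed): 12 semitones per doubling of frequency."""
    if freq <= 0:
        return 0.0  # unvoiced frame, kept as zero
    return ref_note + 12 * math.log2(freq / ref_hz)
```

On this scale one octave is always 12 units regardless of register, which makes note differences comparable between a low and a high singer.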

Next, mean subtraction is applied to the converted semitones. Different people sing or hum the same song in different keys, and it is difficult to hum exactly the notes stored in the song database. To reduce the effect of this on matching, the note sequence needs to be converted to a normalized representation for the matching process. In an embodiment of the present invention, the semitone sequence is converted to a mean-subtracted sequence, which is used in the subsequent matching. Specifically, the non-zero values in the semitone sequence are summed and their mean is computed; the mean is then subtracted from each non-zero value to give the sequence used for matching. Values equal to zero in the semitone sequence are discarded. Main melody extraction is performed in the same manner.
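A minimal sketch of the mean subtraction described here; the function name `mean_subtract` is hypothetical, and returning an empty list for an all-zero input is an assumption.

```python
def mean_subtract(semitones):
    """Drop zero frames, then subtract the mean of the remaining values."""
    voiced = [s for s in semitones if s != 0]
    if not voiced:
        return []
    mean = sum(voiced) / len(voiced)
    return [s - mean for s in voiced]


# The same tune hummed five semitones higher gives the same result.
low_key = mean_subtract([60, 0, 62, 64])
high_key = mean_subtract([65, 0, 67, 69])
```

Because a change of key adds the same constant to every note, removing the mean makes the query independent of the key the user happens to sing in.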

Finally, the frame-level melody contour is output. In this process, a simple sampling method is used to obtain frame-level note contours of other resolutions from the highest-resolution frame-level note contour. This operation is similar to generating templates of different resolutions for each song and is not described further.

The melody matching unit 130 performs matching between the query melody contour and each template of the template set and outputs candidate songs according to the matching result. After the melody contour has been extracted from the user's singing/humming, it is matched against the template set to obtain the result the user is looking for. The matching operation is described below with reference to FIG. 5.

FIG. 2 is a detailed block diagram of the apparatus 200 for query by singing/humming shown in FIG. 1.

Compared with FIG. 1, the apparatus 200 for query by singing/humming shown in FIG. 2 further includes a multi-resolution template sequence generation unit 210 and a multi-resolution query sequence generation unit 220, and the melody matching unit 130 includes a distance calculation unit 230 and an output unit 240.

The multi-resolution template sequence generation unit 210 extracts a note sequence from the main melody at a predetermined frame shift to generate multi-resolution frame-level melody template sequences.

The multi-resolution query sequence generation unit 220 extracts a note sequence from the query melody contour to generate multi-resolution frame-level query sequences.

The distance calculation unit 230 computes the distance between the multi-resolution frame-level melody template sequence output from the multi-resolution template sequence generation unit 210 and the multi-resolution frame-level query sequence output from the multi-resolution query sequence generation unit 220.

The output unit 240 determines whether the distance output from the distance calculation unit 230 is less than or equal to a predetermined threshold. If the distance is less than or equal to the threshold, the output unit determines that the template sequence matches the query sequence and outputs the song corresponding to the template sequence; otherwise matching is performed between the query sequence and another template sequence.

FIG. 3 is a detailed block diagram of the song template generation unit 110 shown in FIG. 1.

Referring to FIG. 3, the song template generation unit 110 includes an extraction unit 300, a bar segmentation unit 310, and a conversion unit 320.

The extraction unit 300 extracts the main melody contour from each song in the song database. The bar segmentation unit 310 performs bar segmentation on the main melody contour extracted by the extraction unit 300. The conversion unit 320 converts the bar-segmented melody contour into a frame-level note sequence and stores it as a template sequence.

The detailed operation of the extraction unit 300, the bar segmentation unit 310, and the conversion unit 320 is described with reference to FIG. 4.

FIG. 4 is a flowchart illustrating a method of generating a song template according to another embodiment of the present invention.

Referring to FIG. 4, a song database is first input. In step 400, the main melody of each song is extracted. The extraction method depends on the MIDI type. A monophonic MIDI song has only one track, so the melody stored in that track is the main melody. In a polyphonic MIDI song the main melody is usually on the first track, but this is not true of every polyphonic MIDI song. The following tracks can be identified as the track carrying the main melody: (1) a track whose title contains one of the words "MELODIES", "VOCAL", "SING", "SOLO", "LEAD", or "VOICE"; (2) the track with the highest mean note intensity; (3) the track with the longest total note duration, where note duration means the time from the start of the first note to the end of the last note. A track that satisfies one of these three conditions is regarded as the track containing the main melody, and the note information of that track is extracted as the main melody.
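The three track-selection conditions can be sketched as follows. The `Note` and `Track` types are hypothetical stand-ins for parsed MIDI data, and the order in which the conditions are tried (title keyword first, then mean intensity with total duration as a tie-breaker) is an assumption; the text only says that a track satisfying one of the conditions is taken as the main melody.

```python
from dataclasses import dataclass

KEYWORDS = ("MELODIES", "VOCAL", "SING", "SOLO", "LEAD", "VOICE")


@dataclass
class Note:            # hypothetical parsed note
    start: float       # seconds
    end: float         # seconds
    velocity: int      # note intensity


@dataclass
class Track:           # hypothetical parsed track
    title: str
    notes: list


def mean_intensity(track):
    return sum(n.velocity for n in track.notes) / len(track.notes)


def total_duration(track):
    """Time from the start of the first note to the end of the last note."""
    return max(n.end for n in track.notes) - min(n.start for n in track.notes)


def pick_main_track(tracks):
    if len(tracks) == 1:                       # monophonic file
        return tracks[0]
    for track in tracks:                       # condition (1): title keyword
        if any(word in track.title.upper() for word in KEYWORDS):
            return track
    voiced = [t for t in tracks if t.notes]    # conditions (2) and (3)
    return max(voiced, key=lambda t: (mean_intensity(t), total_duration(t)))
```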

In step 402, other information is extracted from the tracks. For a song, the tempo, the time signature (for example 4/4 or 2/4), the quarter-note duration, and the like are very useful information and are generally written in the header of each track. This information can therefore be obtained from the tracks of each MIDI song.

In step 404, the start time and end time of each bar are determined from the information obtained in step 402. Specifically, the start and end positions of each bar in the MIDI data are labeled using the note duration and the time signature.
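A minimal sketch of how bar boundaries could be computed from the quarter-note duration and the time signature. The function name and parameters are hypothetical, and a constant tempo and time signature over the whole song are assumed.

```python
def bar_boundaries(total_s, quarter_note_s, beats_per_bar, beat_unit):
    """Return (start, end) times in seconds for each bar of the song."""
    # One beat lasts quarter_note_s * 4 / beat_unit seconds.
    bar_s = beats_per_bar * quarter_note_s * 4 / beat_unit
    bars = []
    index = 0
    while index * bar_s < total_s:
        bars.append((index * bar_s, min((index + 1) * bar_s, total_s)))
        index += 1
    return bars


# 4/4 time with a 0.5 s quarter note: each bar lasts 2 s.
bars = bar_boundaries(5.0, 0.5, 4, 4)
```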

Steps 402 and 404 together are referred to as bar segmentation. When a user sings or hums a song, the user typically starts at the beginning of a bar and stops at the end of a bar. Exploiting this habit can improve query accuracy and speed up the query. In addition, the user may start the query from any part of the song; that is, the user may sing or hum any part of it. Automatic bar segmentation is therefore adopted in the present invention.

In step 406, frame-level multi-resolution melody contours are generated. Melody note sequences of different resolutions are obtained by using different frame shifts. Low resolution means sampling the notes of each song with a large frame shift; the number of notes obtained is small and the representation of the melody is relatively coarse. High resolution means sampling the notes of each song with a small frame shift; the number of notes obtained is relatively large and the representation of the melody is relatively fine. In general, two or more frame-level melody sequences can be obtained from one song. Finally, mean subtraction is applied to the resulting melody note sequences.
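The sampling in step 406 can be sketched as below. The note representation, a list of `(start, end, pitch)` tuples in seconds, the decision to skip frames that fall in a rest, and the function names are assumptions for illustration only.

```python
def sample_contour(notes, frame_shift):
    """Sample the pitch of a note list every `frame_shift` seconds."""
    end_time = notes[-1][1]
    n_frames = int(end_time / frame_shift)
    seq = []
    for f in range(n_frames):
        t = f * frame_shift
        for start, end, pitch in notes:
            if start <= t < end:
                seq.append(pitch)
                break  # frames that hit no note (rests) are skipped
    return seq

def subtract_mean(seq):
    """Remove the average pitch so sequences sung in different keys are comparable."""
    mean = sum(seq) / len(seq)
    return [v - mean for v in seq]

# A large frame shift gives a short, coarse sequence; a small one a long, fine sequence.
notes = [(0.0, 1.0, 60), (1.0, 2.0, 64)]
coarse = subtract_mean(sample_contour(notes, 0.5))
fine = subtract_mean(sample_contour(notes, 0.25))
```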

In step 408, the template set for the matching process is output, that is, the melody sequences of different resolutions extracted from each song, which together form the set of templates used for matching.

FIG. 5 is a flowchart illustrating a query method by singing/humming according to another embodiment of the present invention.

Referring to FIG. 5, in step 500, a frame-level melody contour extracted from singing/humming is input. In the subsequent matching process, this frame-level melody contour is referred to as the singing/humming query sequence. In step 502, the template set obtained from the template generation unit is input. In the subsequent matching process, each template in the template set is referred to as a template sequence.

In step 504, low-resolution melody contours are generated. In this step, melody contours of the same resolution are obtained from the highest-resolution frame-level query melody and from the highest-resolution frame-level template set, respectively. Because a low-resolution melody contour has few sampling points, fast matching can be performed between the query sequence and the template sequences to obtain candidates in a short time.

In step 506, a fast matching process is performed to select a small candidate set for the subsequent comparison. Specifically, matching is performed between the extracted melody contour (the singing/humming query sequence Q) and the template set (the template sequences T). Any kind of matching algorithm may be used for this purpose. In one embodiment of the present invention, linear scaling (LS) with fast tempo scaling is proposed, which is the conventional linear scaling algorithm combined with a fast tempo-scaling operation. The proposed algorithm is referred to as Algorithm 1 and is detailed in Table 1 below.

Here, the integers M and N are the numbers of frames in the sequences, r is the scaling factor, δ is the offset of r, and r₀ and δ₀ are two preset values, set to 1.4 and 0.4. Q×r denotes scaling the query sequence from length N to length N×r; for example, if the length of Q is 100 and r is 1.5, the scaled Q has length 150. LS(Q,T) denotes the linear scaling function, and the distance function between query and template is defined as d = |q − t|.
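Table 1 is not reproduced in this text, so the following sketch of linear scaling with fast tempo scaling follows the step list given in the claims instead (start at r₀ = 1.4 and δ₀ = 0.4, compare the distances at r, r + δ and r − δ, keep the best, halve δ, and stop once δ falls below 0.09). The nearest-index resampling, the comparison over the overlapping frames, and the normalization by length are assumptions, as are all function names.

```python
def resample(seq, new_len):
    """Stretch or compress seq to new_len frames by nearest-index lookup."""
    n = len(seq)
    return [seq[min(n - 1, i * n // new_len)] for i in range(new_len)]

def ls_distance(query, template, r):
    """Scale the query by r and return the mean |q - t| over the overlapping frames."""
    scaled = resample(query, max(1, round(len(query) * r)))
    overlap = min(len(scaled), len(template))
    return sum(abs(q - t) for q, t in zip(scaled, template)) / overlap

def fast_tempo_ls(query, template, r0=1.4, delta0=0.4, min_delta=0.09):
    """Search the scaling factor by repeatedly halving the offset delta."""
    r, delta = r0, delta0
    d = ls_distance(query, template, r)
    while True:
        d_high = ls_distance(query, template, r + delta)
        d_low = ls_distance(query, template, r - delta)
        if d_high < d and d_high <= d_low:
            r, d = r + delta, d_high
        elif d_low < d:
            r, d = r - delta, d_low
        if delta < min_delta:
            return d, r
        delta /= 2
```

With these defaults the offsets tried are 0.4, 0.2, 0.1 and 0.05, so each template costs at most nine distance evaluations instead of a dense sweep over scaling factors.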

LS(Q,T) is described in detail with reference to FIG. 6. As shown in FIG. 6, the horizontal axis represents the template sequence and the vertical axis represents the query sequence. The distance is calculated using the points on the diagonal, that is, by taking the absolute value of the difference between the two sequences.

After Algorithm 1 has been performed and the distance has been output, four further filters are used to reduce the number of candidates: variance comparison, sub-segment mean comparison, centroid comparison, and note variance comparison.

Linear scaling filter with fast tempo scaling: if the obtained distance is less than a predetermined value, the corresponding template sequence is retained; otherwise it is discarded.

Variance comparison filter: the variance of the query sequence is compared with that of the template sequence; if the result is less than a predetermined variance threshold, the corresponding template sequence is retained; otherwise it is discarded.

Sub-segment mean comparison filter: the query and the template are each divided evenly into four segments, and the mean of each sub-segment is calculated to form a four-dimensional vector for each sequence. The distance between the query sequence vector and the template sequence vector is then compared with a threshold. If the distance is less than the predetermined threshold, the corresponding template sequence is retained; otherwise it is discarded.
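A minimal sketch of the sub-segment mean comparison is shown below. The source does not name the vector distance, so the sum of absolute differences used here is an assumption, and the function names are hypothetical. Sequences are assumed to have at least four frames.

```python
def segment_means(seq, parts=4):
    """Split seq into `parts` nearly equal segments and return the mean of each."""
    n = len(seq)
    bounds = [i * n // parts for i in range(parts + 1)]
    return [sum(seq[a:b]) / (b - a) for a, b in zip(bounds, bounds[1:])]

def passes_subsegment_filter(query, template, threshold):
    """Keep the template only if the two four-dimensional mean vectors are close."""
    qv = segment_means(query)
    tv = segment_means(template)
    dist = sum(abs(a - b) for a, b in zip(qv, tv))  # assumed L1 distance
    return dist < threshold
```

Because each sequence is reduced to four numbers, this check is independent of the sequence lengths and is cheap enough to run on every template.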

Centroid comparison filter: the centroids of the query sequence and the template sequence are calculated, and the distance between the centroids is compared with a threshold. If the distance is less than the predetermined threshold, the corresponding template sequence is retained; otherwise it is discarded.

The centroid is calculated by Equation 2 below.

Here, S(i) is the i-th value in the sequence and N is the length of the sequence.

Note variance comparison filter: the variances of the notes of the query sequence and of the template sequence are calculated separately and sorted from largest to smallest. The three notes with the largest values are used to form a three-dimensional vector, and the distance between the vector of the query sequence and the vector of the template sequence is compared with a threshold. If the distance is less than the predetermined threshold, the template sequence is retained; otherwise it is discarded.

The five filters above are mainly used to filter the template sequences for the subsequent precise matching.

After the filtering in step 506, the number of candidate songs is reduced, and in step 508 a small-scale template set is output.

In step 510, high-resolution melody contours are generated. In this step, the high-resolution query sequence and the high-resolution templates of the same resolution are output for the final precise matching.

In step 512, precise matching is performed, using fast recursive alignment with mean shift. Fast recursive alignment with mean shift is based on the recursive alignment (RA) algorithm, referred to here as Algorithm 2, as follows.

FIGS. 7A and 7B are diagrams for explaining the conventional RA algorithm. As shown in FIG. 7A, the bold line is the optimal alignment path between the query sequence and the template sequence. In the first step of the recursion, three alignment paths are compared: the solid line, the dotted line, and the two-dot chain line. If the score obtained along the dotted line is the smallest, that path is split into two in the next recursion step, as shown in FIG. 7B. The calculation is then repeated for the two groups of paths, and the smallest value is obtained in each group. The final path is obtained by joining the two paths with the smallest scores.
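Algorithm 2 itself is not reproduced in this text. The sketch below follows the recursive alignment steps as enumerated in the claims, which compare two candidate split points of the template (its midpoint, and the point whose cumulative-sum ratio matches that of the query's first half) rather than the three paths of FIG. 7A. Decrementing the depth on each recursion, the guard for short sequences, the fallback when a sum is zero, and the length-normalized linear scaling distance are assumptions; pitch values are assumed to be positive.

```python
def ls(q, t):
    """Linear scaling distance: stretch q to len(t), mean absolute difference."""
    n, m = len(q), len(t)
    return sum(abs(q[i * n // m] - t[i]) for i in range(m)) / m

def split_by_ratio(t, ratio):
    """Index h (1 <= h < len(t)) whose prefix-sum ratio is closest to `ratio`."""
    total = sum(t)
    if total == 0:
        return len(t) // 2  # assumed fallback when the ratio is undefined
    best_h, best_err, running = 1, float("inf"), 0
    for h in range(1, len(t)):
        running += t[h - 1]
        err = abs(running / total - ratio)
        if err < best_err:
            best_h, best_err = h, err
    return best_h

def recursive_align(q, t, depth):
    """Alignment score between query q and template t with recursion depth `depth`."""
    if len(q) < 2 or len(t) < 2:
        return ls(q, t)  # too short to split further
    j = len(q) // 2
    q1, q2 = q[:j], q[j:]
    ratio = sum(q1) / sum(q) if sum(q) else 0.5
    k = len(t) // 2
    s1 = ls(q1, t[:k]) + ls(q2, t[k:])      # split the template at its midpoint
    h = split_by_ratio(t, ratio)
    s2 = ls(q1, t[:h]) + ls(q2, t[h:])      # split the template by sum ratio
    s, i = (s1, k) if s1 < s2 else (s2, h)
    if depth == 0:
        return s
    return recursive_align(q1, t[:i], depth - 1) + recursive_align(q2, t[i:], depth - 1)
```

Because the score is a sum over the leaf segments, it grows with the recursion depth, so any acceptance threshold would have to be chosen for a fixed depth.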

The fast RA algorithm differs from the conventional RA algorithm in that the bending direction of the optimal path is determined first and only paths with the same bending direction are then calculated, which saves part of the computation. For example, in FIG. 7A, once it is determined that the optimal path bends upward, only the path corresponding to the dotted line is calculated, and the computation of the remaining paths is saved.

In general, female singers have a higher pitch frequency than male singers. That is, different people sing the same song with different pitch frequency values (note values), which affects the accuracy of the matching process. To deal with this problem, a mean shift is applied so that the minimum score between the query and the template is attained. Fast RA with mean shift is proposed here and is described in Algorithm 3.

In Algorithm 3, δ₀ is set to 2. Q±δ denotes adding δ to, or subtracting δ from, every value of Q, where δ is the shift value.

Here, the mean shift is an algorithm of the same kind as the mean subtraction described earlier. Because singing/humming is not exact, there is a difference between the mean of the query sequence and the mean of the template sequence. The mean shift algorithm is used to remove this difference. That is, the mean of the query sequence is moved up or down to a suitable position so that the distance between the query sequence and the template sequence is minimized.

FIG. 8 is a diagram for explaining the mean shift algorithm. In FIG. 8, t denotes the mean of the query sequence and δ is set to 2. The distance between the query sequence and the template sequence is calculated with the mean of the query sequence at t, t−2, and t+2, respectively, and the calculated distances are compared. If the distance at t+2 is the smallest, t+2 is denoted t′ and δ is divided by 2. The distances at t′, t′−1, and t′+1 are then compared, and the smallest distance is the final result.
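The search illustrated by FIG. 8 can be sketched as follows. Algorithm 3 is not reproduced in this text, so the stopping rule (two rounds, matching the δ = 2 then δ = 1 example) is an assumption, and the distance function is passed in as a parameter: the source uses fast RA there, while any sequence distance works for the sketch.

```python
def mean_shift_distance(query, template, distance, delta0=2.0, rounds=2):
    """Shift the whole query up or down to find the smallest distance."""
    shift, delta = 0.0, delta0
    best = distance(query, template)
    for _ in range(rounds):
        center = shift
        # Compare the current position with one step down and one step up.
        for cand in (center - delta, center + delta):
            d = distance([q + cand for q in query], template)
            if d < best:
                best, shift = d, cand
        delta /= 2
    return best, shift
```

For example, with a query that is exactly three semitones below the template, the first round moves the query up by 2 and the second round by a further 1, reaching a distance of zero.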

In step 308, the matching result is output. Specifically, if the distance between the query sequence and a template sequence is less than a predetermined value, the two sequences are regarded as matching, and the song in the database that corresponds to the template is regarded as a candidate song. The precise matching further reduces the number of candidates, so the user can quickly select the desired song from the output candidates.

According to one embodiment of the present invention, the matching processes are performed at different resolutions. First, matching is performed between low-resolution melody contours in order to select a small candidate set from a large database. Then, based on the small candidate set, precise matching is performed between high-resolution melody contours to obtain the query result. This process increases the speed of the query.
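The coarse-to-fine flow described above can be summarized in a short sketch. The template store (a dict mapping a song identifier to a pair of low- and high-resolution sequences), the two distance functions passed in as `coarse` and `fine`, and the thresholds are all hypothetical placeholders.

```python
def two_stage_query(query_lo, query_hi, templates, coarse, fine, coarse_thr, fine_thr):
    """Return matching song ids, best first, using a cheap pass then a precise pass."""
    # Stage 1: fast matching on low-resolution sequences prunes the database.
    candidates = [
        song for song, (lo, _hi) in templates.items()
        if coarse(query_lo, lo) < coarse_thr
    ]
    # Stage 2: precise matching on high-resolution sequences, candidates only.
    scored = [(fine(query_hi, templates[song][1]), song) for song in candidates]
    return [song for dist, song in sorted(scored) if dist < fine_thr]
```

The expensive `fine` distance is evaluated only for the templates that survive the first stage, which is where the speed-up comes from.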

The present invention has the following advantages. In melody extraction, the melody is automatically segmented into bars and the query is performed bar by bar, which increases accuracy. Matching is performed at different resolutions, which increases processing speed. The user does not need to sing well, and errors in the singing/humming are tolerated. The present invention can be applied to PCs, mobile phones, and MP3 players.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device in which data readable by a computer system is stored.

Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium may also be distributed over computer systems connected through a network, so that the computer-readable code is stored and executed in a distributed manner. Functional programs, code, and code segments for implementing the present invention can be readily inferred by programmers skilled in the art to which the present invention pertains.

The present invention has been described above with reference to preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be embodied in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

FIG. 2 is a detailed block diagram of the singing/humming query apparatus (200) shown in FIG. 1.

FIG. 6 is a diagram for explaining the conventional LS algorithm.

FIG. 7 is a diagram for explaining the conventional RA algorithm.

FIG. 8 is a diagram for explaining the mean shift algorithm.

Claims

1. A method for generating a song template for query by singing or humming, the method comprising:

extracting a main melody contour from an arbitrary song;

performing bar segmentation on the extracted main melody contour; and

converting the melody contour into a frame-level note sequence and storing the converted melody contour as a template sequence,

wherein the converting of the melody contour into a frame-level note sequence and storing of the melody contour as a template sequence comprises

sampling the melody contour with a predetermined frame shift to obtain the frame-level note sequence.

2. The method of claim 1,

wherein the performing of the bar segmentation comprises:

obtaining information on the melody of the song;

finding a start position and an end position of each bar according to the obtained information; and

labeling the start point and the end point of each bar in the melody contour.

3. (Deleted)

4. The method of claim 1,

further comprising converting the frame-level note sequence into a mean-subtracted melody note sequence.

5. A method of query by singing or humming, the method comprising:

inputting a frame-level query sequence extracted from arbitrary singing or humming;

performing matching between the query sequence and bar-segmented frame-level template sequences that are stored in a song database and correspond to respective songs; and

if a template sequence matches the query sequence, outputting the song corresponding to the template as a result,

wherein each template sequence

is obtained by sampling the melody contour of the corresponding song with a predetermined frame shift.

6. The method of claim 5,

wherein the predetermined frame shift includes a first frame shift and a second frame shift, and

wherein the matching of the query sequence and the template sequence comprises:

sampling the input query sequence with the first frame shift to extract a low-resolution query sequence;

extracting, from the template sequences, low-resolution template sequences sampled with the first frame shift;

performing a first matching between the low-resolution query sequence and the low-resolution template sequences, and obtaining candidate template sequences from the template sequences according to the result of the matching;

sampling the input query sequence with the second frame shift, which is smaller than the first frame shift, to extract a high-resolution query sequence;

extracting, from the candidate template sequences, high-resolution template sequences sampled with the second frame shift; and

performing a second matching between the high-resolution query sequence and the high-resolution template sequences.

7. The method of claim 5,

wherein the matching of the query sequence and the template sequence is performed using a linear scaling algorithm.

8. The method of claim 5,

wherein the matching of the query sequence and the template sequence comprises:

(a) inputting a query sequence Q and a template sequence T;

(b) setting the scaling factor r to r₀ and the offset δ of the scaling factor r to δ₀;

(c) calculating, using a linear scaling algorithm, Q₁ = Q × r and the distance value d = |Q₁ − T| between the sequence Q₁ and the template sequence T;

(d) calculating, using the linear scaling algorithm, Q_high = Q × (r + δ) and the distance value d_high = |Q_high − T| between the sequence Q_high and the template sequence T;

(e) calculating, using the linear scaling algorithm, Q_low = Q × (r − δ) and the distance value d_low = |Q_low − T| between the sequence Q_low and the template sequence T;

(f) comparing the distance values d, d_high, and d_low;

(g) if d_high is the smallest, setting r = r + δ and d = d_high; if d_low is the smallest, setting r = r − δ and d = d_low; otherwise leaving r and d unchanged;

(h) determining whether δ is greater than or equal to a predetermined value;

(i) if δ is greater than or equal to the predetermined value, setting δ = δ/2 and returning to step (d);

(j) outputting the distance value d when δ is less than the predetermined value; and

(k) determining whether the query sequence and the template sequence match according to whether the distance value d is less than or equal to a predetermined threshold value.

9. The method of claim 8,

wherein r₀ and δ₀ are 1.4 and 0.4, respectively, and the predetermined value is 0.09.

10. The method of claim 5,

wherein the matching of the query sequence and the template sequence comprises:

calculating a distance value between the query sequence and the template sequence using a recursive alignment algorithm; outputting the song corresponding to the template sequence as a query result if the distance value is less than or equal to a predetermined threshold value; and otherwise discarding the template and performing matching between the query sequence and another template sequence.

11. The method of claim 5,

wherein the matching of the query sequence and the template sequence comprises:

(a) inputting the query sequence Q = (q₁, q₂, ..., q_N) and the template sequence T = (t₁, t₂, ..., t_M), where N and M are the numbers of frames of Q and T, respectively;

(b) inputting a recursion depth D and setting j = [N/2] and i = 0;

(c) splitting the sequence Q at point j into two sequences Q₁ = (q₁, q₂, ..., q_j) and Q₂ = (q_{j+1}, q_{j+2}, ..., q_N);

(d) calculating the sum sum(Q₁) of Q₁ and the sum sum(Q₂) of Q₂, and calculating the ratio R₀ = sum(Q₁)/sum(Q);

(e) setting k = [M/2] and splitting the template sequence T at point k into two sequences T₁ = (t₁, t₂, ..., t_k) and T₂ = (t_{k+1}, t_{k+2}, ..., t_M);

(f) calculating, using a linear scaling algorithm, the distance value d₁ between Q₁ and T₁ and the distance value d₂ between Q₂ and T₂, and setting S₁ = d₁ + d₂;

(g) splitting the template sequence T at a point h into two sequences T₃ = (t₁, t₂, ..., t_h) and T₄ = (t_{h+1}, t_{h+2}, ..., t_M) such that the ratio of the sum of T₃ to the sum of T is R₀;

(h) calculating, using the linear scaling algorithm, the distance value d₃ between Q₁ and T₃ and the distance value d₄ between Q₂ and T₄, and setting S₂ = d₃ + d₄;

(i) comparing S₁ and S₂, and setting S = S₁ and i = k if S₁ is less than S₂, and otherwise setting S = S₂ and i = h;

(j) outputting S if D is 0; and, if D is not 0, splitting the template sequence T into two sequences T₁ = (t₁, t₂, ..., t_i) and T₂ = (t_{i+1}, t_{i+2}, ..., t_M), returning to step (a) with Q = Q₁ and T = T₁ and repeating the operation to obtain S₁, returning to step (a) with Q = Q₂ and T = T₂ to obtain S₂, then setting S = S₁ + S₂ and outputting S; and

(k) outputting the song corresponding to the template as a query result if S is less than a predetermined value, and otherwise performing matching between the query sequence and another template sequence.

12. An apparatus for generating a song template for query by singing or humming, the apparatus comprising:

an extraction unit for extracting a main melody contour from each song in a song database;

a bar segmentation unit for performing bar segmentation on the extracted main melody contour; and

a conversion unit for converting the melody contour into a frame-level note sequence and storing the converted melody contour as a template sequence,

wherein the conversion unit

samples the melody contour with a predetermined frame shift to obtain the frame-level note sequence.

13. An apparatus for query by singing or humming, the apparatus comprising:

a query sequence input unit for inputting a frame-level query sequence extracted from singing or humming;

a matching unit for performing matching between the query sequence and bar-segmented frame-level template sequences corresponding to respective songs; and

an output unit for outputting query candidates according to the result of the matching,

wherein each template sequence

is obtained by sampling the melody contour of the corresponding song with a predetermined frame shift.

14. An apparatus for query by singing or humming, the apparatus comprising:

a template generation unit for generating a template set by extracting the main melody of each song in a song database and performing bar segmentation on the melody contour;

a melody extraction unit for extracting a query melody contour from the singing or humming; and

a melody matching unit for performing matching between the query melody contour and each template of the template set and outputting candidate songs according to the matching result,

the apparatus further comprising a multi-resolution template sequence generator for extracting note sequences from the main melody with a predetermined frame shift to generate multi-resolution frame-level melody template sequences.

15. The apparatus of claim 14,

further comprising a multi-resolution query sequence generator for extracting note sequences from the query melody contour to generate multi-resolution frame-level query sequences.

16. The apparatus of claim 15, wherein the melody matching unit comprises:

a distance calculator for calculating a distance value between the multi-resolution frame-level melody template sequence and the multi-resolution frame-level query sequence; and

an output unit for determining whether the distance value is less than or equal to a predetermined threshold value, determining that the template sequence matches the query sequence and outputting the song corresponding to the template sequence if the distance value is less than or equal to the threshold value, and otherwise performing matching between the query sequence and another template sequence.

17. The method of claim 1,

wherein the sampling with the predetermined frame shift comprises:

sampling the melody contour with a first frame shift to obtain a first frame-level melody note sequence; and

sampling the melody contour with a second frame shift to obtain a second frame-level melody note sequence.

18. The apparatus of claim 12,

wherein the conversion unit

samples the melody contour with a first frame shift to obtain a first frame-level melody note sequence, and

samples the melody contour with a second frame shift to obtain a second frame-level melody note sequence.

19. The apparatus of claim 13,

wherein the matching unit

samples the input query sequence with the first frame shift to extract a low-resolution query sequence,

extracts, from the template sequences, low-resolution template sequences sampled with the first frame shift,

performs a first matching between the low-resolution query sequence and the low-resolution template sequences and obtains candidate template sequences from the template sequences according to the result of the matching,

samples the input query sequence with the second frame shift, which is smaller than the first frame shift, to extract a high-resolution query sequence,

extracts, from the candidate template sequences, high-resolution template sequences sampled with the second frame shift, and

performs a second matching between the high-resolution query sequence and the high-resolution template sequences.