KR100512143B1

KR100512143B1 - Method and apparatus for searching of musical data based on melody

Info

Publication number: KR100512143B1
Application number: KR10-2002-0007550A
Authority: KR
Inventors: 송정민; 배소영; 윤경로
Original assignee: 엘지전자 주식회사
Priority date: 2002-02-08
Filing date: 2002-02-08
Publication date: 2005-09-02
Also published as: KR20030067377A

Abstract

The present invention relates to a method of extracting feature information from an input melody and searching a music database for music data based on the extracted melody feature information, and to a search apparatus for the same.

The present invention is a melody-based music data search apparatus and a melody-based music data search method characterized by extracting melody features from the music data in a music database, extracting melody features from the music data input as a query, comparing the melody feature information extracted from the database's music data with the melody feature information of the input query music, and outputting search results for the music data according to their similarity.

Description

METHOD AND APPARATUS FOR SEARCHING OF MUSICAL DATA BASED ON MELODY

There are two ways to search multimedia data such as movies, photographs, and music: searching by bibliographic information attached to the multimedia, such as creator information, title, and creation date, and searching based on the content of the multimedia.

However, the former method often cannot adequately reflect the user's search intent. For example, a user who wants to find a photograph of a blue sky over a white sandy beach cannot find it if the photograph carries no bibliographic information describing the scene. The latter method, in contrast, stores color information, which can be extracted automatically from the photograph itself by image processing techniques, as feature information for content-based retrieval of that photograph. The user supplies an image palette or a similar photograph as the query image, and the desired photograph is found by comparing the feature information extracted from the query image with the feature information of the images being searched.

When a user searches multimedia using information about its content in this way, a more accurate search that matches the user's intent becomes convenient. A user may indeed search by information such as the title or creator of the multimedia, but may be more interested in motion and color information for video, color and shape information for photographs, and tempo and mood information for music. Such content information can be entered directly by the multimedia creator or extracted automatically to build a database for searching.

When music is searched by content, as in the content-based multimedia search described above, queries about content beyond bibliographic information (title, performer, composer, year of publication, and so on) allow a search that matches the user's intent more closely. After listening to a piece of music, people remember its imagery, melody, tempo, and the like, and these are easier to remember and last longer than the music's bibliographic information. Searching with this content information is therefore an efficient way to find the desired music. Content-based music search methods fall into roughly four categories according to the type of query, as follows.

1. Music search by melody query input

In this method, the user inputs the melody of a piece of music through a microphone to find music that contains that melody.

2. Music style search

In this method, the user inputs a sample of music in the desired style, or adjusts several parameters that define the style, to find music in that style.

3. Tempo search

In this method, the user enters the desired tempo as a number to find music at that tempo.

4. Sample input search

In this method, a stored music file, or music being played and captured by a microphone, is input to find the same music in the database.

Among the music data search methods above, consider music search by melody query input. Of the various elements of music, melody is one that listeners remember easily. Even when listeners do not remember the whole of a piece heard through an audio medium, they tend to remember the melody of its important parts. Music search by melody query input exploits this: the user inputs the remembered part of the melody by humming or singing into a microphone, or through an input device such as a keyboard, and the search returns music from the database that contains a part corresponding to the input melody.

A characteristic of such melody query search techniques is that they target databases in which the melody features of the music data have been indexed manually. That is, the score information, pitch, duration, and so on of the original music are manually indexed in advance. For a hummed query, the signal from the microphone is analyzed to extract pitch, duration, and so on automatically for the search; such information can also be entered from a keyboard. Melody matching therefore becomes a string matching problem like text matching, and methods for solving it efficiently have been published.

One melody matching method performs string matching between a symbolically represented melody query and the music by borrowing an approach from general text search, which matches on the semantic information of strings, and searches on the musical semantic information extracted from the melody symbol sequence.

Other methods include string matching based on the MIDI format, the standard form of computer music representation; computing the similarity of two MIDI files based on melody; and extracting melody strings, rhythm strings, and chord strings from MIDI files for string matching. Representations of MIDI files for melody matching have also been introduced.

Also for melody matching, methods of extracting melody elements from sound input and methods for matching MIDI files against humming have been introduced, as has a method that represents a melody by the differences between successive notes, thereby expressing and matching the melody contour.
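The idea of representing a melody by the differences between successive notes can be sketched as follows. This is only an illustration of that prior-art idea, not the method of this patent; the use of MIDI note numbers and the function name `intervals` are assumptions made for the example.

```python
def intervals(pitches):
    """Return the semitone difference between each pair of successive notes."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


# The same tune written in two keys, as MIDI note numbers (assumed encoding).
tune_in_c = [60, 60, 67, 67, 69, 69, 67]
tune_in_d = [p + 2 for p in tune_in_c]  # transposed up two semitones

# Both keys give the same interval sequence, so matching on intervals
# does not depend on the key in which the query was hummed.
print(intervals(tune_in_c))
print(intervals(tune_in_d))
```

Because only differences are kept, a query sung in a different key from the stored music still yields the same sequence.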

The existing melody search methods introduced above search databases from which musical information (pitch, duration) has already been extracted. Because this is not the music-signal form in which users ordinarily create, transmit, acquire, and consume music, the musical information of every piece must be indexed manually before it can be searched. Since manual indexing is not feasible for a large music database, these methods cannot be applied to applications such as music search over the Internet or personal music library management.

An object of the present invention is to provide a method and a search apparatus for melody-based music data search that automatically extract the musical information of the database to be searched from the music itself, and automatically extract musical information from the melody input query.

Another object of the present invention is to provide a method and a search apparatus for searching music stored in a database not by textual information such as the title, singer, or album name, but by inputting a signal corresponding to the melody, a musical feature that people remember easily, to find music containing that melody.

In particular, an object of the present invention is to provide a method and a search apparatus for music search by melody input that do not target a music database from which musical information has been manually extracted in advance, but instead automatically extract information corresponding to musical information from ordinary music clips and use it for the search. Melody-based music data search can then be performed on the music data users commonly encounter without any additional data, namely manually indexed musical information.

To achieve these objects, the melody-based music search method of the present invention is characterized by automatically extracting the melody features of music data from the music to build a melody feature database, automatically extracting the melody features of music input as a query, and searching the music data by comparing the similarity between the extracted query melody features and the melody features in the melody feature database.

Also to achieve these objects, the melody-based music search apparatus of the present invention is characterized by comprising music data melody feature extraction means for extracting melody features from the music data to be searched, query melody feature extraction means for extracting melody features from data input as a query, and similarity measurement means for measuring the similarity between the query melody features and the melody features in the music melody feature database.

FIG. 1 shows an embodiment of the melody-based music search apparatus of the present invention. In this example, melody features are automatically extracted from the music data in a music database to build a melody feature database; when the query input is the user's humming through a microphone, melody features are automatically extracted from the hummed sound; and music is searched in the melody feature database using the extracted query melody feature information.

The melody-based music search apparatus of FIG. 1 comprises a music database (1) that stores the music data to be searched, a music data melody feature extraction unit (2) that extracts melody features from the music data in the music database, a melody feature database (3) that stores the extracted music data melody feature information, a query input melody feature extraction unit (4) that automatically extracts melody features from the query input data, and a search unit (5) that compares the melody features extracted from the music data with those extracted from the query melody, measures their similarity, and outputs the search results.

The music database (1) stores the music data to be searched. The music data melody feature extraction unit (2) extracts melody feature information from the music data supplied by the music database (1), and the extracted melody feature information is stored in the melody feature database (3). The query melody feature extraction unit (4) extracts feature information for the query melody from the music data input as a query. The search unit (5) compares the feature information extracted from the query melody with the feature information stored in the melody feature database (3), computes the similarity between them, and outputs the search results in descending order of similarity, thereby finding music data similar to the query input.
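The data flow among units (1) to (5) can be sketched as below. This is a minimal sketch under stated assumptions: the function names `build_feature_db` and `search` are hypothetical, and the feature extractor and similarity measure are passed in as callables because the text at this point does not fix either of them.

```python
def build_feature_db(music_db, extract):
    """Units (1)-(3): extract melody features from every stored piece."""
    return {title: extract(signal) for title, signal in music_db.items()}


def search(query_signal, feature_db, extract, similarity):
    """Units (4)-(5): extract query features, then rank by similarity."""
    query_features = extract(query_signal)
    scored = [
        (similarity(query_features, features), title)
        for title, features in feature_db.items()
    ]
    # Highest similarity first, as the search unit (5) is described to do.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored]
```

A toy usage, with an identity extractor and a negated absolute-difference similarity standing in for the real components: `search([1, 2, 3], build_feature_db({"a": [1, 2, 3], "b": [5, 5, 5]}, lambda s: s), lambda s: s, lambda x, y: -sum(abs(p - q) for p, q in zip(x, y)))` ranks `"a"` ahead of `"b"`.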

The music database (1) and the melody feature database (3) can be stored on storage such as a single computer storage device, multiple distributed and connected computer storage devices, or a storage device attached to a music player. The melody query input can be a melody hummed or sung by the user through an audio signal input means such as a microphone, a melody input by selecting another music file, a melody entered as a symbol sequence representing the melody through a keyboard or other symbol input means, or a file in which such a symbol sequence is stored.

The melody features of the music data are extracted automatically, by a computer algorithm, a signal processor, or the like, from digital music data of any form, compressed or uncompressed, to build the melody feature database against which music is searched. The melody query may likewise be expressed as digital music data of any form, compressed or uncompressed. A symbolically expressed melody query may also be converted to an audio signal, from which the melody features are automatically extracted and searched against the melody feature database. Similarly, the feature database may be built by converting symbolically expressed music data to audio signals and automatically extracting the melody features, and then searched.

The apparatus that extracts music melody features, that is, the music data melody feature extraction unit (2), and the query melody feature extraction apparatus, that is, the query melody feature extraction unit (4), include means and/or processes for spectrogram construction, partial enhancement, harmonic summation, per-frame note energy vector calculation, note boundary segmentation, and note fragment construction. FIG. 2 shows such a melody feature extraction unit.

Referring to FIG. 2, the melody feature extraction apparatus according to the present invention comprises a first feature information extraction unit (6) for extracting melody feature information, a second feature information extraction unit (7) for performing note segmentation using the information extracted by the first unit, and a third feature information extraction unit (8) for extracting the final melody feature information using the information extracted by the first and second units. Using PCM audio with a 16 kHz sampling rate, 8-bit sample resolution, and a single channel as the input for melody feature extraction reduces the amount of information in the original music signal without losing the audio signal characteristics needed for music search.

The first feature information extraction unit (6) extracts feature information frame by frame and comprises a spectrogram construction unit and/or process (601), a partial enhancement unit and/or process (602), a harmonic summation unit and/or process (603), and a per-frame note energy vector calculation unit and/or process (604). The second feature information extraction unit (7) performs segmentation and comprises a unit and/or process (701) that calculates per-frame energy from the partial enhancement information and a unit and/or process (702) that extracts minimum peak points from the calculated per-frame energy. The third feature information extraction unit (8) extracts feature information segment by segment and comprises a unit and/or process (801) that calculates segment note energy vectors from the per-frame note energy vectors and the minimum peak points, and a unit and/or process (802) that constructs note fragments from the segment note energy vectors.

The process of extracting melody feature information is described with reference to FIG. 2 and to FIGS. 3 to 8 discussed below. First, when music data is taken from a music CD for automatic extraction of melody features, the audio format is converted before input to the feature extraction apparatus. The digital signal extracted losslessly from the CD, with a 44 kHz sampling rate, stereo channels, and 16-bit resolution, is converted to 16 kHz, mono, and 8 bits, and features are extracted from the result. This sampling rate can cover the notes from C0 (16.532 Hz) to B8 (7902.1 Hz), and it allows eight partials to be extracted for B5 (987.7 Hz), the upper limit of the band in which the notes regarded as melody appear in music.
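The two claims about the 16 kHz sampling rate follow from the Nyquist limit of 8 kHz, as the short check below illustrates. The frequencies are the ones quoted in the text; the variable names are chosen only for this example.

```python
SAMPLE_RATE_HZ = 16_000
nyquist_hz = SAMPLE_RATE_HZ / 2  # highest representable frequency: 8000 Hz

b8_hz = 7902.1  # highest note of the scale, as quoted in the text
b5_hz = 987.7   # upper limit of the melody band, as quoted in the text

# B8 itself lies below the Nyquist limit, so the whole scale is covered.
covers_scale = b8_hz < nyquist_hz

# Number of partials of B5 (987.7, 1975.4, ... Hz) that fit below 8000 Hz.
partials_of_b5 = int(nyquist_hz // b5_hz)

print(covers_scale, partials_of_b5)
```

The eighth partial of B5 sits at 8 x 987.7 = 7901.6 Hz, just under the limit, while a ninth would exceed it.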

When the query input is a person's humming through a microphone, the query is converted to the same format in which music data is input to the melody feature extraction apparatus.

1. Spectrogram construction

For melody feature extraction, a spectrogram is constructed so that the frequency characteristics of the signal can be analyzed over time. The spectrogram construction unit and/or process (601) builds the spectrogram using the fast Fourier transform (FFT), as in Equation 1 below. The signal, sampled at 16 kHz, is transformed with an FFT size of 1024 samples and an overlap of 512 samples between frames, and each frame is windowed with a Hamming window before the transform.

Here, T is the frame size of the audio clip and N is the FFT size (1024).

The energy spectrum of the transformed signal is obtained as in Equation 2 below.
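The spectrogram and energy spectrum steps can be sketched with the stated parameters (FFT size 1024, overlap 512, Hamming window). This is an illustrative sketch, not the patent's Equations 1 and 2: taking the energy as the squared magnitude of each FFT bin is an assumption, and a small pure-Python radix-2 FFT is used only to keep the example self-contained (in practice a library routine such as `numpy.fft.rfft` would be the usual choice).

```python
import cmath
import math

FFT_SIZE = 1024  # samples per frame, from the text
HOP = 512        # frame advance, giving the 512-sample overlap


def hamming(n):
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out


def energy_spectrogram(samples, fft_size=FFT_SIZE, hop=HOP):
    """Return one energy spectrum (bins 0 .. fft_size/2 - 1) per frame."""
    window = hamming(fft_size)
    frames = []
    for start in range(0, len(samples) - fft_size + 1, hop):
        chunk = [samples[start + i] * window[i] for i in range(fft_size)]
        spectrum = fft(chunk)
        # Assumed energy definition: squared magnitude of each bin.
        frames.append([abs(c) ** 2 for c in spectrum[: fft_size // 2]])
    return frames
```

At 16 kHz each bin spans 16000 / 1024 = 15.625 Hz, so a 1000 Hz tone peaks at bin 64.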

2. Partial enhancement

Partial enhancement, performed by the partial enhancement unit and/or process (602), is a processing step for extracting the predominant sound (the sound that people hear easily, that is, the distinct sound) from ordinary polyphonic music clips. Music consists of several component sounds, and the predominant one is determined as the component sound with large energy and clear partials. In music where several sounds are mixed, the partials of one component sound disappear or lose clarity because of the partials of the other component sounds. To extract clear (distinct) partials, the partial enhancement of Equation 3 below is therefore performed.


In Equation 3, the value of W can be determined in proportion to the main-lobe width of the window applied to the signal before the FFT; a value of 4 or 8 is used. To extract distinct partials, this step enhances the partials by averaging the differences between the energy at the current FFT index and the energies at the surrounding FFT indices. The number of surrounding indices to average over is set by the main-lobe width of the window applied before the FFT, so the energies of either the 8 or the 16 surrounding indices are considered.

After this step, a partial whose energy is small relative to its surroundings takes a small value even if its absolute energy is large, and a partial whose energy is large relative to its surroundings takes a large value even if its absolute energy is small. That is, as in the example of FIG. 3, a partial's value in the enhanced partial spectrum is determined by its energy relative to its surroundings rather than by its absolute energy. FIG. 3(a) shows the energy spectrum by frequency before partial enhancement, and FIG. 3(b) shows it after partial enhancement. As FIG. 3 shows, partial enhancement (602) reflects the fact that partials smaller than the surrounding energy are hard for the human ear to hear while relatively larger partials are heard well, which has the effect of strengthening the partials of the sounds that people hear easily. The result of partial enhancement is supplied to the subsequent harmonic summation unit and/or process (603) and also to the per-frame energy calculation unit and/or process (701) for segmentation.
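The averaging described for partial enhancement can be sketched as follows. This is a sketch of the verbal description only, not of Equation 3 itself: the function name `enhance_partials`, the treatment of the spectrum edges (using only the neighbours that exist), and leaving negative results unclipped are all assumptions.

```python
def enhance_partials(energy, w=4):
    """For each bin, average its energy difference to the 2*w nearest bins."""
    n = len(energy)
    enhanced = []
    for k in range(n):
        # Neighbours within w bins on either side, excluding the bin itself.
        neighbours = [
            energy[j] for j in range(max(0, k - w), min(n, k + w + 1)) if j != k
        ]
        mean_diff = sum(energy[k] - e for e in neighbours) / len(neighbours)
        enhanced.append(mean_diff)
    return enhanced


# A strong bin (90) buried among stronger neighbours, and a weak bin (5)
# standing alone above a silent floor.
spectrum = [100, 100, 90, 100, 100, 0, 0, 5, 0, 0]
result = enhance_partials(spectrum, w=2)
print(result[2], result[7])
```

Although 90 is far larger than 5 in absolute terms, the isolated weak bin ends up with the larger enhanced value, which is the behaviour the text describes.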

3. Harmonic sum

With the enhanced partials as input, harmonic summation is performed by the harmonic summation unit and/or process (603). Harmonic summation sums the energy of the enhanced partials at equal intervals in the frequency domain to extract the pitch value for each frame. The sum is normalized by the number of partials summed, and the range considered for summation is set to less than half the FFT size.

The most important factor in perceiving a tone is its harmonicity. In sounds produced by the human voice or by playing a musical instrument, the partials appear at regular intervals in the frequency domain because of the characteristics of the sound source. Perceiving a tone is the process of recognizing how harmonic these partials are. Pitch extraction by harmonic summation has been reported to be more successful than any other method, and the present invention performs harmonic summation on the enhanced partials as in Equation 4 below.

Here, [x] denotes the largest integer not exceeding x. Harmonic summation gives the strength of the tone whose fundamental frequency is p. FIG. 4 illustrates the effect of harmonic summation. As FIG. 4 shows, performing harmonic summation on the enhanced partial spectrum reveals the strength of the tone at each fundamental frequency p, that is, the energy by frequency.
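Harmonic summation as described, summing at equally spaced multiples of a candidate fundamental and normalizing by the number of partials summed, might look like the sketch below. It is not a reconstruction of Equation 4: the function name `harmonic_sum`, the cap `max_partials`, and the exact floor expression for how many multiples fit are assumptions.

```python
def harmonic_sum(enhanced, max_partials=8):
    """Normalized sum of enhanced energy at multiples of each candidate bin p."""
    n_bins = len(enhanced)  # at most half the FFT size
    sums = [0.0] * n_bins
    for p in range(1, n_bins):
        # Number of multiples of p that still fall inside the spectrum.
        count = min(max_partials, (n_bins - 1) // p)
        total = sum(enhanced[h * p] for h in range(1, count + 1))
        sums[p] = total / count  # normalize by the number of partials summed
    return sums


# A tone whose partials sit at bins 5, 10, 15 and 20.
toy = [0.0] * 64
for b in (5, 10, 15, 20):
    toy[b] = 1.0
sums = harmonic_sum(toy, max_partials=4)
print(max(range(64), key=sums.__getitem__))
```

Bin 5 collects all four partials, while candidates such as bin 10 or bin 20 collect only some of them, so the fundamental stands out.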

4. Note energy calculation

The per-frame note energy vector calculation computes a note energy vector for each frame from the harmonic summation results, and is performed by the per-frame note energy vector calculation unit and/or process (604). The harmonic sum values are computed for each band of the standard musical scale; 108 standard scale bands are used when calculating note energy. The energy value at each note band boundary is obtained by interpolating the energy values at the FFT indices.

Because human pitch perception has a limited resolution, the signal expressed as harmonic sums per fundamental frequency is divided into bands. Music uses a standard scale, so the signal is represented by the bands corresponding to that scale. The standard scale spans from 16.532 Hz (C0) to 7902.1 Hz (B8), and an upper and a lower limit are determined for each note. Modern music divides the octave into 12 notes, and the relationship between notes expressed in this scale is given by Equation 5 below.

In Equation 5, I is the interval between two notes on the standard scale and R is the ratio of their fundamental frequencies. By this equation, the ratio between two adjacent notes of the scale is 1.059463.
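Given the 12-note octave and the quoted adjacent-note ratio, the relation between I and R is presumably the equal-temperament formula. The form below is an assumed reconstruction consistent with those two facts, not a copy of the original equation.

```latex
R = 2^{I/12}, \qquad
R\big|_{I=1} = 2^{1/12} \approx 1.059463, \qquad
R\big|_{I=12} = 2
```

One semitone (I = 1) gives the ratio 1.059463 stated in the text, and twelve semitones (I = 12) give exactly one octave, a doubling of the fundamental frequency.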

The band-wise energy of the harmonic sum is computed using the standard scale and its bands. This quantity is called the note energy and is obtained as in Equation 6 below.

In Equation 6, M is 108, the total size of the standard scale, so the note energy of each frame is a vector with 108 elements. Expressing sound as note energy allows small temporal variations of a note (such as vibrato) to be represented as a single note, and it simplifies the pitch shift step in the later matching process. FIG. 5 shows how note energy is computed: (a) shows the harmonic sum values at the FFT indices together with values interpolated from adjacent harmonic sum values, (b) shows the FFT index, and (c) shows the note index. As FIG. 5 illustrates, sound is expressed as note energy, that is, the band-wise energy of the harmonic sum values obtained from the periodic nature of the audio signal. Based on the harmonic sum values in (a), the energies for notes C2, C#2, D2, D#2, E2, and F2 are computed as shown in (c).
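A minimal sketch of this band-wise accumulation follows. It is one possible reading, not the patent's Equation 6: each note energy is taken as the harmonic sum values at the FFT indices inside the band plus linearly interpolated values at the two band edges. The function names and the parameters `bands_hz`, `sample_rate`, and `fft_size` are hypothetical.

```python
import math


def interpolate(values, position):
    """Linearly interpolate `values` at a fractional FFT index."""
    lo = int(math.floor(position))
    hi = min(lo + 1, len(values) - 1)
    frac = position - lo
    return values[lo] * (1.0 - frac) + values[hi] * frac


def note_energy_vector(harmonic_sum, bands_hz, sample_rate, fft_size):
    """Return one energy value per (low_hz, high_hz) band.

    harmonic_sum[k] is the harmonic sum at FFT index k for one frame.
    """
    bin_hz = sample_rate / fft_size
    last = len(harmonic_sum) - 1
    energies = []
    for low_hz, high_hz in bands_hz:
        low, high = low_hz / bin_hz, high_hz / bin_hz
        if low >= last:
            energies.append(0.0)  # band lies beyond the available spectrum
            continue
        high = min(high, last)
        # Interpolated values at the two band edges.
        total = interpolate(harmonic_sum, low) + interpolate(harmonic_sum, high)
        # Values at the FFT indices strictly inside the band.
        k = int(math.floor(low)) + 1
        while k < high:
            total += harmonic_sum[k]
            k += 1
        energies.append(total)
    return energies
```

Passing the 108 standard scale bands as `bands_hz` would yield a 108-element vector per frame, as described above.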

The frame-by-frame note energy vectors obtained in this way are passed on for computing the segment note energy vectors.

5. Note segmentation

Note segmentation groups frames with the same tonal character into a single segment. Expressing the melody features per segment reduces the amount of data needed for storage and matching. Specifically, segment boundaries are chosen from among the local minima of the frame-by-frame energy of the enhanced (improved) partials: a local minimum is selected as a boundary when it is also the minimum within a fixed frame interval.

To produce a single note, a person or an instrument generates a transitional sound for a certain time, during which the energy of the sound is low. To capture the energy changes of the important notes in music where several sounds are mixed, the energy is computed from the enhanced partial spectrum. This energy varies over time, and those frame-by-frame local minima that are also the minimum within a fixed frame interval are set as segment boundaries.

FIG. 6 shows an example of the note segmentation process. The second feature information extraction unit 7 performs note segmentation through the frame-by-frame energy calculation unit (or process) 701 and the minimum peak point extraction unit (or process) 702. Taking the enhanced partials produced by partial enhancement as input, it computes the frame energy of the enhanced partials and sets as segment boundaries those local minima of the frame energy that are also the minimum within a fixed frame interval. The resulting boundary information is provided to the segment note energy vector calculation unit (or process) 801 of the third feature information extraction unit 8.
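The boundary selection rule can be sketched as follows, assuming the frame energy of the enhanced partials is already available as a list. The name `segment_boundaries`, the `interval` parameter, and the handling of ties and of the first and last frames are assumptions made for illustration.

```python
def segment_boundaries(frame_energy, interval):
    """Return frame indices that are local minima of the energy and also
    the minimum within `interval` frames on either side."""
    boundaries = []
    n = len(frame_energy)
    for t in range(1, n - 1):
        e = frame_energy[t]
        # Local minimum: lower than the previous frame, not higher than the next.
        if not (e < frame_energy[t - 1] and e <= frame_energy[t + 1]):
            continue
        lo, hi = max(0, t - interval), min(n, t + interval + 1)
        # Keep it only if nothing in the surrounding window is lower.
        if e == min(frame_energy[lo:hi]):
            boundaries.append(t)
    return boundaries
```

A larger `interval` suppresses shallow dips inside a sustained note, at the cost of possibly merging very short notes.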

6. Note fragment construction

Note fragment construction represents the segment note energy as the element-wise average of the frame note energy vectors contained in one segment. The note energy vectors within the note boundaries found during note segmentation are averaged element by element, and several maxima are extracted to obtain the note energy vector that represents the segment. This segment vector may be normalized by dividing each element by the mean of the vector's elements. Fewer maxima are extracted for a query input than for music data. For example, when the query is humming that the user inputs through a microphone, seven maxima are extracted from the note energy vector of the music data and three from the note energy vector of the humming query.

Note fragment construction is carried out in the third feature information extraction unit 8. It uses the note segmentation information obtained in the note segmentation process and the frame-by-frame note energy vectors obtained by the frame-by-frame note energy vector calculation unit (or process) 604 of the second feature information extraction unit 6. The segment note energy vector calculation unit (or process) 801 averages the frame-by-frame note energy vectors to obtain the segment note energy vector, and the note fragment construction unit (or process) 802 builds the note fragment from it.

The note fragment is obtained as in Equation 7 below.

Here, C is the number of frames contained in segment S_l, l_s is the index of the segment's first frame, and l_e is the index of its last frame. Finally, the note energy vector is normalized by dividing each element by the mean of the vector's element values.

The important notes of the segment are then extracted according to the magnitude of the note energy: the top seven element values for a music clip, and the top three for a humming query input through a microphone.
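The three steps described above (element-wise averaging over the segment, normalization by the mean element value, and keeping the largest elements) can be sketched as below. This is an illustrative reading rather than the patent's Equation 7; the function name, the dictionary output, and the `top_k` parameter are hypothetical.

```python
def note_fragment(frame_vectors, start, end, top_k):
    """Build a note fragment for frames start..end (inclusive).

    frame_vectors[l][n] is the note energy of note n in frame l.
    Returns {note_index: normalised_energy} for the top_k strongest notes.
    """
    count = end - start + 1                      # C: number of frames in the segment
    size = len(frame_vectors[start])
    # Element-wise average of the frame note energy vectors.
    mean_vec = [
        sum(frame_vectors[l][n] for l in range(start, end + 1)) / count
        for n in range(size)
    ]
    # Normalise each element by the mean of all elements.
    overall = sum(mean_vec) / size
    if overall > 0:
        mean_vec = [v / overall for v in mean_vec]
    # Keep the top_k largest elements (e.g. 7 for music, 3 for humming).
    keep = sorted(range(size), key=lambda n: mean_vec[n], reverse=True)[:top_k]
    return {n: mean_vec[n] for n in sorted(keep)}
```

Under the figures given in the text, a music clip would call this with `top_k=7` and a humming query with `top_k=3`.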

The melody feature information is extracted in this way. The melody feature information produced by the melody feature information extraction unit (or process) described above belongs either to music data from the music database or to the input query data.

Accordingly, as shown in FIG. 1, melody feature information extracted from the music database 1 is stored in the corresponding database 3, while melody feature information extracted from the input query data is supplied to the search unit 5. The search result is then output by computing the similarity between the two.

7. Calculation of similarity

This section describes the similarity matching method. The similarity is computed by taking into account the temporal differences between the music melody features and the query melody features, partial variations, and the overall pitch difference between the music and the query input.

When the query data is humming, its overall length differs from that of the music clip, some notes may be wrong or may differ in duration, and incorrect note information may have been extracted during steps such as pitch extraction. DP matching (dynamic programming matching) is therefore used, since it matches two signals that differ in temporal length and vary locally while tolerating partial errors. In addition, because there is an overall pitch difference between the humming and the music clip, that is, a pitch shift, DP matching is performed while shifting the index of the segment note energy vector when computing the similarity values of the matrix elements.

DP matching is performed as follows, with reference to FIG. 7 and FIG. 8.

Assume two time series R and Q with lengths NR and NQ, respectively. R and Q can then be written as follows.

R = r_0, r_1, r_2, ..., r_i, ..., r_{NR-1}

Q = q_0, q_1, q_2, ..., q_j, ..., q_{NQ-1}

To match the two series R and Q, a matrix with rows indexed by i and columns indexed by j is formed. When the pitch shift is ps, the (i, j) element of the matrix holds the similarity value d_ps(r_i, q_j). The similarity value is expressed as in Equation 8 below.

In Equation 8, each matrix element (i, j) corresponds to the matching of r_i with q_j. The matching path C_ps for pitch shift ps is defined as a set of contiguous matrix elements that determines the matching between R and Q. The k-th element of the matching path C_ps is defined as c_{ps,k} = (i, j), so the matching path C_ps is expressed as follows.

For a given pitch shift ps, many matching paths C_ps may exist. The path that minimizes the matching cost, as in Equation 9 below, is selected as the optimal matching path.

In Equation 9, K_ps in the denominator compensates for the fact that different matching paths have different lengths.

The similarity between the music melody features and the humming melody features is then determined, as in Equation 10 below, as the minimum of the minimum path costs computed over the various pitch shift values.

Here the accumulated matching cost serves as the similarity measure: the smaller the matching cost, the smaller the difference between the melody feature elements, and therefore the higher the similarity between the query data and the target music data.
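One way to write the quantities described above is given here, offered as a reading consistent with the surrounding definitions rather than as the exact form of Equations 9 and 10. The symbols D_ps and D are introduced only for illustration.

```latex
C_{ps} = c_{ps,1},\; c_{ps,2},\; \ldots,\; c_{ps,K_{ps}}, \qquad c_{ps,k} = (i, j)

D_{ps}(R, Q) = \min_{C_{ps}} \; \frac{1}{K_{ps}} \sum_{k=1}^{K_{ps}} d_{ps}\!\left(c_{ps,k}\right)

D(R, Q) = \min_{ps} \; D_{ps}(R, Q)
```

Dividing by K_ps makes paths of different lengths comparable, and the outer minimum over ps selects the pitch shift under which the query and the music agree best.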

In this way the similarity between the music melody features and the query melody features, that is, the humming melody features, is determined, and the music data is sorted in descending order of similarity to produce the search result the user wants.

In FIG. 7, each intersection of the grid represents a pair of note fragments: the horizontal axis represents the note fragments of the query data and the vertical axis represents the note fragments of the music data.

The minimum path can be determined from either three-direction or five-direction matching path candidates.

When the minimum path is determined from three-direction matching path candidates, the matching path is obtained by the recurrence in Equation 11 below.

When the minimum path is determined from five-direction matching path candidates, the matching path is obtained by the recurrence in Equation 12 below.

FIG. 8(a) shows an example of three-direction matching path candidates, and FIG. 8(b) shows an example of five-direction matching path candidates. FIG. 8(a) shows candidates in three directions from the current note fragment to the adjacent note fragments: horizontal, vertical, and diagonal. FIG. 8(b) shows candidates in five directions, which additionally include the direction between the vertical and the diagonal and the direction between the horizontal and the diagonal (drawn as dotted lines).
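A compact sketch of DP matching with the three-direction step pattern and a pitch shift search follows. Several points are assumptions made for illustration: note fragments are modelled as dense lists of note energies, the local distance is a Euclidean distance divided by the vector length (one possible normalization), the accumulated cost is minimized first and then divided by the length of the chosen path, and the default shift range of -2 to +2 semitones is arbitrary. The function names are hypothetical.

```python
import math


def shift(vec, ps):
    """Shift note indices up by ps semitones, padding with zeros."""
    m = len(vec)
    return [vec[n - ps] if 0 <= n - ps < m else 0.0 for n in range(m)]


def distance(r, q):
    """Euclidean distance normalised by vector length (assumed normalisation)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r, q)) / len(r))


def dp_cost(R, Q, ps):
    """Length-normalised cost of the cheapest three-direction path at shift ps."""
    Qs = [shift(q, ps) for q in Q]
    nr, nq = len(R), len(Qs)
    cost = [[0.0] * nq for _ in range(nr)]   # accumulated cost up to (i, j)
    steps = [[0] * nq for _ in range(nr)]    # number of path elements up to (i, j)
    for i in range(nr):
        for j in range(nq):
            d = distance(R[i], Qs[j])
            if i == 0 and j == 0:
                cost[i][j], steps[i][j] = d, 1
                continue
            candidates = []
            if i > 0:
                candidates.append((cost[i - 1][j], steps[i - 1][j]))          # vertical
            if j > 0:
                candidates.append((cost[i][j - 1], steps[i][j - 1]))          # horizontal
            if i > 0 and j > 0:
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))  # diagonal
            best_cost, best_steps = min(candidates)
            cost[i][j] = best_cost + d
            steps[i][j] = best_steps + 1
    return cost[-1][-1] / steps[-1][-1]


def best_match(R, Q, shifts=range(-2, 3)):
    """Return (cost, pitch_shift) with the smallest normalised cost."""
    return min((dp_cost(R, Q, ps), ps) for ps in shifts)
```

A lower returned cost means a closer match, so ranking the database by ascending cost corresponds to ranking by descending similarity. A five-direction variant would add the two intermediate predecessors to `candidates`.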

In the present invention, the DP matching described so far can be applied to the similarity calculation as follows.

First, when the similarity measuring device measures similarity in consideration of the temporal differences between the music melody features and the query melody features, partial variations, and the overall pitch difference between the music and the query input, it performs DP matching on the results of the melody feature extraction described above, in particular on the constructed note fragments.

When DP matching is used to measure similarity, each element of the DP matrix is computed as the normalized Euclidean distance between the music melody feature vector and the query melody feature vector, each expressed as a note energy vector representing a segment.

When the elements of the DP matrix are computed, the distance is calculated using the pitch-shifted feature vector. DP matching is performed on the DP matrices computed for the various pitch shift values, the pitch shift that yields the smallest of the resulting minimum-cost matching paths is determined, and this minimum value is used as the similarity between the music melody features and the query melody features.

In the similarity calculation using DP matching, when the user's humming through a microphone is used as the query input and DP matching is performed at various pitch shift values, the search can be restricted to pitch shift values between the maximum and minimum shifts that a typical user can produce.

When DP matching is used to measure similarity, the minimum-cost matching path up to the current matching point can be found by taking the minimum over three-direction paths or over five-direction paths.

When the overall minimum-cost matching path is found, its cost can be normalized by the length of the path.

Alternatively, when the minimum-cost matching path up to the current matching point is found as described above, each candidate direction's path cost can be normalized by the matching length up to the current matching point.

Several modifications of the DP matching method can improve its performance. The present invention uses the following modifications.

A matching window is used so that the edge regions of the DP matrix are excluded from matching, which increases the matching speed.

A similarity calculation that reflects how far the minimum-cost matching path deviates from the diagonal matching path can also be used. The deviation is measured as the length of the minimum-cost path divided by the sum of the length of the music melody feature and the length of the query melody feature. This deviation value can be added to, or multiplied by, the cost of the minimum-cost matching path computed above.
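The deviation measure is simple enough to sketch directly. The function names and the `mode` switch are hypothetical; only the ratio itself and the choice between adding and multiplying come from the text.

```python
def deviation_factor(path_length, music_length, query_length):
    """Length of the minimum-cost path divided by the sum of both feature lengths."""
    return path_length / (music_length + query_length)


def adjusted_cost(path_cost, path_length, music_length, query_length, mode="multiply"):
    """Combine the path cost with the deviation factor by adding or multiplying."""
    factor = deviation_factor(path_length, music_length, query_length)
    if mode == "multiply":
        return path_cost * factor
    if mode == "add":
        return path_cost + factor
    raise ValueError("mode must be 'multiply' or 'add'")
```

For two sequences of equal length N, a purely diagonal path has length N and gives a factor of 0.5, while a path that wanders far from the diagonal is longer and gives a factor closer to 1, so it is penalized more.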

In addition, a five-direction matching step pattern is used so that elements with partially erroneous similarity values can be skipped during matching. Alternatively, vertical or horizontal matching steps are prevented from occurring two or more times in succession, so that matchings that deviate greatly from the diagonal are not performed.

Using the similarity values obtained with the DP matching method described so far, a predetermined number or more of pieces are output as the search result, in order starting from the piece that contains the most similar segment. Melody-based music retrieval is thereby achieved.

The present invention does not rely on a music database in which musical information has been extracted manually in advance. Instead, it automatically extracts information corresponding to the musical information from ordinary music clips and uses it for retrieval. It thus provides a method and an apparatus that can perform melody-based music data retrieval on the music data users commonly encounter, without additional data such as manually indexed musical information.

FIG. 1 is a block diagram of the melody-based music retrieval apparatus of the present invention.

FIG. 2 is a diagram showing the melody feature extraction process of the present invention.

FIG. 3 is a diagram for explaining partial enhancement in the present invention.

FIG. 4 is a diagram for explaining harmonic summation in the present invention.

FIG. 5 is a diagram for explaining note energy calculation in the present invention.

FIG. 6 is a diagram for explaining note segmentation in the present invention.

FIG. 7 is a diagram for explaining an example of a matching path in the present invention.

FIG. 8 is a diagram for explaining three-direction and five-direction matching path candidates in the present invention.

Claims

A melody-based music search method comprising:

automatically extracting melody features of music data from music to construct a melody feature database;

automatically extracting melody features of music input as a query; and

searching for music data by comparing the similarity between the extracted query melody features and the melody features in the melody feature database,

wherein extracting the melody features includes constructing a spectrogram of the data from which feature information is to be extracted and enhancing partials.

The method of claim 1, wherein extracting the melody feature information further includes, after the partial enhancement step, performing harmonic summation, calculating a note energy vector for each frame, dividing note boundaries, and constructing note fragments.

The method of claim 1, wherein the melody query is input in a form selected from: a melody query that a user inputs by humming or singing; a melody query input by selecting another music file; a melody query input as a symbol string representing a melody through a keyboard or other symbol input means; and a melody query input by selecting a file that stores a symbol string representing a melody.

The method of claim 1, wherein extracting the melody feature information includes constructing a spectrogram of the data from which feature information is to be extracted, and wherein, when the spectrogram is constructed, the audio is processed with an FFT using frames that overlap in time by half the frame size.

The method of claim 1, wherein extracting the melody feature information includes enhancing partials of the data from which feature information is to be extracted, and wherein the partials are enhanced by averaging the differences between the energy value at the current FFT index and the energy values at the surrounding FFT indices, so as to extract distinct partials.

The method of claim 5, wherein, when the partials are enhanced by averaging the differences between the energy value at the current FFT index and the energy values at the surrounding FFT indices, the number of surrounding indices to be averaged is determined according to the main lobe of the window applied before the FFT is performed.

The method of claim 1, wherein extracting the melody feature information includes harmonic summation of the data from which feature information is to be extracted, and wherein, for the harmonic summation, the enhanced partials are summed at equal intervals in the frequency domain to extract a harmonic sum value for each fundamental frequency, the summed value is normalized by the sum of the summed partials, or the range to be summed is set smaller than half of the FFT size.

The method of claim 1, wherein extracting the melody feature information includes calculating a note energy vector for each frame of the data from which feature information is to be extracted, and wherein the note energy vector is calculated by obtaining the harmonic sum for each standard scale band of music.

The method of claim 8, wherein, when the note energy is calculated, the energy value at each note band boundary is interpolated from the energy values at the FFT indices.

The method of claim 1, wherein extracting the melody feature information includes dividing note boundaries of the data from which feature information is to be extracted, and wherein, to divide the note boundaries, those local minima of the frame-by-frame energy of the enhanced partials that are also the minimum within a fixed frame interval are selected as segment boundaries.

The method of claim 1, wherein extracting the melody feature information includes constructing note fragments of the data from which feature information is to be extracted, and wherein, to construct a note fragment, the note energy vectors within the note boundaries are averaged element by element and a plurality of maxima are extracted to obtain the note energy vector that represents the segment.

The method of claim 11, wherein, when the note energy vector representing the segment is calculated to construct the note fragment, the note energy vector is calculated by dividing each element value by the mean of the element values of the note energy vector, or the note fragment is constructed by extracting fewer maxima for the query input than for the music data.

The method of claim 1, wherein the similarity between the melody features of the music data and the query melody features is calculated in consideration of temporal differences, partial variations, and the overall pitch difference between the music and the query input, and is measured through DP matching that uses the constructed note fragments as the target melody feature information.

The method of claim 13, wherein, in the DP matching for measuring similarity, the elements of the DP matrix are calculated using the normalized Euclidean distance between the music melody feature vector and the query melody feature vector, each expressed as a segment note energy vector.

The method of claim 13, wherein, in the DP matching for measuring similarity, the cost of the overall minimum-cost matching path is normalized by the length of the path, or each candidate direction's path cost is normalized by the matching length up to the current matching point.

The method of claim 13, wherein the similarity is calculated so as to reflect the degree to which the minimum-cost matching path deviates from the diagonal matching path, wherein the degree of deviation is measured as the length of the minimum-cost path divided by the sum of the length of the music melody feature and the length of the query melody feature, and wherein the value measuring the degree of deviation is added to or multiplied by the cost of the minimum-cost matching path.

A melody-based music search apparatus comprising:

music data melody feature extraction means for extracting melody features of music data to be searched from a music database; and

similarity measuring means for measuring the similarity between the melody features of the music data and the query melody features,

wherein, in extracting the melody features of the music data and the query melody features, the feature information is extracted through a process of constructing a spectrogram of the data from which feature information is to be extracted and a partial enhancement process.

The apparatus of claim 17, further comprising a melody feature database for storing the music melody feature information extracted by the music data melody feature extracting means, wherein the feature database is built by automatically extracting the melody features of the music data on a storage device such as a computer storage device, a plurality of distributed and connected computer storage devices, or a storage device attached to a music playback device.

The apparatus of claim 17, wherein the music melody feature extracting means and the query melody feature extracting means comprise: spectrogram constructing means for constructing a spectrogram of an input sound; partial enhancing means for enhancing partials from the spectrogram; harmonic summation means for performing harmonic summation on the enhanced partials; note energy vector calculating means for calculating a note energy vector for each frame using the harmonic summation information; note segmentation means for dividing note boundaries using the enhanced partials; and note fragment constructing means for constructing note fragments using the note boundary information and the frame-by-frame note energy vectors.