KR20060019096A

KR20060019096A - Humming-based audio source query/retrieval system and method

Info

Publication number: KR20060019096A
Application number: KR1020040067575A
Authority: KR
Inventors: 허성필; 한평희; 원성기
Original assignee: 주식회사 케이티
Priority date: 2004-08-26
Filing date: 2004-08-26
Publication date: 2006-03-03

Abstract

1. Technical field to which the claimed invention belongs

The present invention relates to a humming-based sound source query/retrieval system and method, and to a computer-readable recording medium on which a program for implementing the method is recorded.

2. Technical problem to be solved by the invention

An object of the present invention is to provide a humming-based sound source retrieval system and method that, taking into account pitch and tempo that differ from person to person, singing errors, network load, and the like, use a remote user's humming as the search key to retrieve and provide the sound source the user is looking for from audio information (sound source information) stored on a server, and to provide a computer-readable recording medium on which a program for implementing the method is recorded.

3. Summary of the solution

The present invention provides a sound source query/retrieval system comprising: humming query input means for receiving, as humming from a user, a part of the piece of music to be searched for; segment detection means for identifying the boundary between the start point and end point of each note in the input humming query signal; feature extraction means for extracting feature information, namely the pitch and duration, of the notes detected by the segment detection means; symbol melody representation means for generating a symbol melody sequence by taking relative values between notes; metadata generation means for generating metadata information for each item of audio information (sound source information); metadata storage means for storing the metadata information; similarity calculation means for receiving the symbol melody sequence from the symbol melody representation means and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata storage means; distance-based classification means for sorting the distance values calculated by the similarity calculation means and extracting the corresponding metadata information from the metadata storage means; and search result output means for outputting the search result from the distance-based classification means.

4. Important uses of the invention

The present invention is used for sound source query/retrieval and the like.

Humming, audio, sound source, query, search, metadata, similarity

Description

Humming-based audio source query / retrieval system and method

FIG. 1 is a block diagram of one embodiment of a humming-based sound source query/retrieval system according to the present invention;

FIG. 2 shows an example of the application record structure of the metadata DB used in the present invention;

FIG. 3 illustrates the note-length (duration, IOI) characteristics used in the present invention;

FIG. 4 shows example pieces of music in the sound source DB according to an embodiment of the present invention;

FIG. 5 illustrates the DP matching used in the present invention; and

FIG. 6 is a flowchart of one embodiment of a humming-based sound source query/retrieval method according to the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: humming query input unit    20: melody extractor

21: segment detector    22: feature extractor

23: symbol melody representer    30: search result output unit

40: search engine    41: melody similarity calculator

42: distance-based classifier    50: metadata generator

60: sound source DB    70: metadata DB

The present invention relates to a humming-based sound source query/retrieval system and method, and to a computer-readable recording medium on which a program for implementing the method is recorded. More particularly, it relates to such a system, method, and recording medium that use a remote user's humming as the search key to retrieve and provide, from audio information (sound source information) stored on a server, the sound source the user is looking for.

In recent years, with the development of the Internet, wireless networks, high-speed networks, and the like, the use of digital multimedia content (audio, video, images, music videos, etc.) has increased sharply. A wide variety of content is being developed and produced in fields such as education, entertainment, and science, and is used in applications such as multimedia databases, digital TV, Internet broadcasting, and music videos. The rise of such multimedia information has a great influence on modern life and, in particular, is bringing major changes to the way people think and use media.

To use and manage such a vast amount of multimedia content efficiently, an information retrieval system is required.

However, multimedia information retrieval methods have been developed mainly around text and images, whereas demand for retrieval of video and audio information has recently been increasing. In particular, as the Internet has become widespread and audio encryption and decryption technologies have advanced, services that provide music over the Internet are increasing. These Internet sites, known as Music On Demand (MOD) or Audio On Demand (AOD), store a vast number of sound sources (pieces of music) and selectively provide music at the user's request.

Audio information retrieval methods can be broadly classified into text-based retrieval and content-based retrieval.

Text-based retrieval is a method of searching that uses, as keywords, bibliographic information characterizing a piece of music, such as the song title, singer name, lyrics, composer, and performer name.

Content-based retrieval, on the other hand, uses the musical content itself, such as how the music is performed, as the search key. One such method is query-by-humming (QBH), which takes the user's humming as the input key. The present invention is a technique in which the user sings the main melody of a piece of music, and this is used as the search clue to retrieve the audio information (sound source information) the user wants.

The background to the introduction of audio retrieval by humming query is described below.

In general, a user finds a desired song by selectively navigating web pages that the service provider has classified and organized in advance according to certain criteria, or sometimes by entering part of the song title as a character string and searching. However, such pre-classification-based or title-based search methods work only when the user already knows the title of the song being sought. Moreover, as the amount of music material to be searched grows, classifying it becomes difficult and complicated, and title-based search techniques run up against their limits.

With so many popular songs around, people often cannot recall a song's title or lyrics but do remember the tune or rhythm of a bar or two. For example, it frequently happens that a few phrases of a tune linger in the mind but the title does not come to mind. In such cases, existing title-based or pre-classification-based methods of searching music material are of little help. What is really needed is a content-based method of searching music material.

"Content-based" here means, as described above, searching by the song itself, that is, by its melody, rather than by its title or lyrics. Advances in digital signal processing, in techniques for representing and storing music material, and in computer hardware have made such content-based music retrieval possible. For example, music libraries are said to often receive requests to find a particular piece of music from patrons who sing or hum a bar or two. However, given that such libraries hold a vast number of recordings, finding a particular piece of music in this way is an enormous task.

Considering that the development of electronic libraries is making it possible to search such vast collections over the Internet, an effective melody-based method of searching music material is essential. With content-based music retrieval, a user need only sing or hum a few bars to look up and find all pieces that contain the same tune. Such a search method therefore helps both music hobbyists and professionals.

Content-based music retrieval will be an indispensable component of future electronic music libraries and the like. A professional musician could also use it to analyze the music of a particular composer, for example to find and analyze recurring themes or frequently occurring melodies.

Content-based audio retrieval methods have of course existed in the past. However, to provide a high-efficiency, high-speed audio retrieval service by humming in a real client-server environment, particular issues must be resolved: accommodating pitch and tempo that differ from person to person, and handling singing errors such as inserted or dropped notes caused by vague recollection while humming. Another consideration is the size of the transmitted data. If the hummed audio signal is sent to the server as it is, the transmitted data is very large; when many users request searches from the server at the same time, network traffic increases and the server's computational load becomes excessive. A more efficient and effective melody extraction method is therefore required.

Serious research on audio information retrieval systems using humming began in the 1990s and has become more active in recent years. Examples include systems such as Query-by-Humming, MiDiLib, MELDEX, SuperMBox, and SoundCompass.

As noted above, the issues to be considered in implementing an efficient audio retrieval system include accommodating pitch and tempo that differ from person to person, and singing errors such as inserted or dropped notes caused by vague recollection while humming. Separately, it must also be taken into account that even when the hummed query is perfect, it is difficult to convert the input humming signal accurately into the musical notation used for melody matching.

Although various methods of melody representation and matching have been tried from many angles, their performance has not yet been satisfactory. In addition, to operate an implemented retrieval system in an Internet environment, effort is needed to reduce the size of the transmitted data in view of network load. To solve these diverse and wide-ranging problems, a melody representation method is needed that uses network resources efficiently and is robust to user errors, together with a more effective method of extracting information during melody extraction.

Accordingly, there is a pressing need for a scheme that, taking into account pitch and tempo that differ from person to person, singing errors, network load, and the like, can efficiently represent the main melody from a humming query and effectively retrieve and provide the corresponding sound source.

The present invention has been proposed to meet the above need. Its object is to provide a humming-based sound source retrieval system and method that, taking into account pitch and tempo that differ from person to person, singing errors, network load, and the like, use a remote user's humming as the search key to retrieve and provide the sound source the user is looking for from audio information (sound source information) stored on a server, and to provide a computer-readable recording medium on which a program for implementing the method is recorded.

Other objects and advantages of the present invention can be understood from the following description and will become more apparent from the embodiments of the present invention. It will also be readily appreciated that the objects and advantages of the present invention can be realized by the means set forth in the claims and combinations thereof.

To achieve the above object, the present invention provides a client-side sound source query system comprising: humming query input means for receiving, as humming from a user, a part of the piece of music to be searched for; segment detection means for identifying the boundary between the start point and end point of each note in the input humming query signal; feature extraction means for extracting feature information, namely the pitch and duration, of the notes detected by the segment detection means; symbol melody representation means for generating a symbol melody sequence by taking relative values between notes and transmitting the sequence to a server; and search result output means for outputting the search result received from the server, namely the metadata information obtained by measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in a database.

The present invention also provides a server-side sound source retrieval system comprising: metadata generation means for generating metadata information for each item of audio information (sound source information); metadata storage means for storing the metadata information; similarity calculation means for receiving from a client a symbol melody sequence obtained by taking relative values between the notes of the user's humming query signal, and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata storage means; and distance-based classification means for sorting the distance values calculated by the similarity calculation means, extracting the corresponding metadata information from the metadata storage means, and transmitting it to the client.

The present invention further provides a sound source query/retrieval system comprising: humming query input means for receiving, as humming from a user, a part of the piece of music to be searched for; segment detection means for identifying the boundary between the start point and end point of each note in the input humming query signal; feature extraction means for extracting feature information, namely the pitch and duration, of the notes detected by the segment detection means; symbol melody representation means for generating a symbol melody sequence by taking relative values between notes; metadata generation means for generating metadata information for each item of audio information (sound source information); metadata storage means for storing the metadata information; similarity calculation means for receiving the symbol melody sequence from the symbol melody representation means and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata storage means; distance-based classification means for sorting the distance values calculated by the similarity calculation means and extracting the corresponding metadata information from the metadata storage means; and search result output means for outputting the search result from the distance-based classification means.

To achieve the above object, the present invention also provides a sound source query method applied to a sound source query/retrieval system, comprising: a humming query input step of receiving, as humming from a user, a part of the piece of music to be searched for; a segment detection step of identifying the boundary between the start point and end point of each note in the input humming query signal; a feature extraction step of extracting feature information, namely the pitch and duration, of the detected notes; a symbol melody representation step of generating a symbol melody sequence by taking relative values between notes and transmitting the sequence to a server; and a search result output step of outputting the search result received from the server, namely the metadata information obtained by measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in a database.

The present invention also provides a sound source retrieval method applied to a sound source query/retrieval system, comprising: a metadata generation and storage step of generating metadata information for each item of audio information (sound source information) and storing it in a metadata DB; a similarity calculation step of receiving from a client a symbol melody sequence obtained by taking relative values between the notes of the user's humming query signal, and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata DB; and a distance-based classification and result transmission step of sorting the calculated distance values, extracting the corresponding metadata information from the metadata DB, and transmitting it to the client.

The present invention further provides a sound source query/retrieval method applied to a sound source query/retrieval system, comprising: a humming query input step of receiving, as humming from a user, a part of the piece of music to be searched for; a segment detection step of identifying the boundary between the start point and end point of each note in the input humming query signal; a feature extraction step of extracting feature information, namely the pitch and duration, of the detected notes; a symbol melody representation step of generating a symbol melody sequence by taking relative values between notes; a similarity calculation step of receiving the symbol melody sequence and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the metadata information (the features of each sound source) of each item of audio information (sound source information); and a distance-based classification and search result output step of sorting the calculated distance values and outputting the corresponding metadata information.

To achieve the above object, the present invention also provides, for humming-based sound source queries, a computer-readable recording medium on which is recorded a program for implementing, in a sound source query system having a processor: a humming query input function of receiving, as humming from a user, a part of the piece of music to be searched for; a segment detection function of identifying the boundary between the start point and end point of each note in the input humming query signal; a feature extraction function of extracting feature information, namely the pitch and duration, of the detected notes; a symbol melody representation function of generating a symbol melody sequence by taking relative values between notes and transmitting the sequence to a server; and a search result output function of outputting the search result received from the server, namely the metadata information obtained by measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in a database.

The present invention also provides, for humming-based sound source retrieval, a computer-readable recording medium on which is recorded a program for implementing, in a sound source retrieval system having a processor: a metadata generation and storage function of generating metadata information for each item of audio information (sound source information) and storing it in a metadata DB; a similarity calculation function of receiving from a client a symbol melody sequence obtained by taking relative values between the notes of the user's humming query signal, and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata DB; and a distance-based classification and result transmission function of sorting the calculated distance values, extracting the corresponding metadata information from the metadata DB, and transmitting it to the client.

With the development of the Internet, wireless networks, high-speed networks, and the like, multimedia content is being developed and produced in many fields such as education, entertainment, and science, and is used in applications such as multimedia databases, digital TV, Internet broadcasting, and music videos. The present invention can be widely applied in the technical field of using humming to search such multimedia databases for the audio information (sound source information) a user wants.

The above objects, features, and advantages will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a block diagram of one embodiment of a humming-based sound source query/retrieval system according to the present invention, showing an example in which audio information (sound source information) is retrieved for a humming query in a client-server environment.

As shown in FIG. 1, the humming-based sound source query/retrieval system according to the present invention includes: a humming query input unit (10) for receiving, as humming from a user, a part of the piece of music to be searched for; a segment detector (21) for identifying the boundary between the start point and end point of each note in the input humming query signal; a feature extractor (22) for extracting feature information, namely the pitch and duration (IOI), of the notes detected by the segment detector (21); a symbol melody representer (23) for generating a symbol melody sequence by taking relative values between notes; a metadata generator (50) for generating metadata information for each item of audio information (sound source information); a metadata database (70) for storing the metadata information; a melody similarity calculator (41) for receiving the symbol melody sequence from the symbol melody representer (23) and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the features of each sound source stored in the metadata database (70); a distance-based classifier (42) for sorting the distance values calculated by the melody similarity calculator (41) and extracting the corresponding metadata information from the metadata database (70); and a search result output unit (30) for outputting the search result from the distance-based classifier (42).

The server builds a database (60) of audio information (sound source information) from sources such as musical scores, piano, electronic instruments, MP3 files, and music CDs. It also extracts the main melody of each item and builds a metadata DB (70) that holds, in metadata form, the sequences of relative pitch differences and relative duration ratios. Accordingly, as shown in FIG. 2, the metadata DB (70) stores metadata for each audio item (sound source), including the symbol melody sequence (metadata), audio name, file name, file size, and playing time.
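One way to picture a record in the metadata DB (70) is the sketch below. The field names, types, and sample values are assumptions made for illustration; the description only lists which kinds of information are stored.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MelodyMetadata:
    """One metadata DB entry (hypothetical field names)."""
    audio_name: str
    file_name: str
    file_size_bytes: int
    play_time_s: float
    # Symbol melody sequence: one (deltaPitch in cents, IOIratio in %)
    # pair per transition between consecutive notes.
    melody: List[Tuple[float, float]]


# Placeholder values, not taken from any real catalogue.
entry = MelodyMetadata(
    audio_name="Example song",
    file_name="example.mp3",
    file_size_bytes=3_500_000,
    play_time_s=215.0,
    melody=[(200, 50), (200, 200), (300, 100)],
)
print(entry.audio_name, len(entry.melody))
```

Keeping the symbol melody sequence next to the descriptive fields means a match on the sequence can be turned into a displayable search result without a second lookup.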

The sound source query/search system according to the present invention has a client-server structure. The melody extractor (20), which is the core of the system, consists of the section detector (21), the feature extractor (22), and the symbol melody representation unit (23), and the search engine (40) consists of the melody similarity calculator (41) and the distance-based classifier (42). The sound source DB (60) and the metadata DB (70) can be implemented as a single DB or as separate DBs.

The client side is connected to the server side through an information communication network such as the Internet (Web), wireless Internet (WAP), a LAN, a fixed wireless LAN, or a mobile wireless LAN (portable Internet).

In outline, the humming query/search process is as follows. The user freely hums the melody of the piece of music to be found, and the client's melody extractor (20) detects the notes in the input hum and extracts the feature quantities needed for the search. It then generates a symbol melody sequence, which consists of relativized feature values.

Thereafter, the server-side search engine (40) uses DP (dynamic programming) matching to measure the similarity between the input humming and each piece of music (sound source) in the database. As the search result, it presents the user with a list of audio information (sound source information) sorted in descending order of similarity, with details such as song title, similarity value, and matched portion.

The user can then select a desired title from the list presented as the search result and listen to it, or perform a follow-up operation such as downloading.

The functions of each component of the humming query/search system according to the present invention are now described in more detail.

The melody is input as a humming query when the user sings, through a microphone, a portion of a piece of music from memory at any pitch and any tempo. The humming method is either whistling or singing each note as a syllable that begins with an unvoiced plosive and continues with a voiced vowel (for example, "ta-ta-ta" or "cha-cha-cha"). The advantage of this method is that stable melody information is easy to extract for the system as a whole, and it is a natural way for the user to input a melody.

The section detector (21) identifies the boundary between the start point (onset) and the end point (offset) of each note in the input humming signal.

The conventional amplitude-based section detection method detects events by setting a threshold value suited to detecting a single note section. However, with this method a single note may be detected as two or three separate notes, and conversely several notes are likely to be merged and judged as one. Section detection errors can therefore seriously degrade the performance of the audio retrieval system.
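The weakness of a fixed amplitude threshold can be illustrated with a small sketch. The function name, the frame-level envelope, and the threshold values below are assumptions chosen for illustration, not parameters taken from the invention.

```python
from typing import List, Tuple


def threshold_segments(envelope: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (onset, offset) frame indices where the envelope is at or
    above the threshold; the offset index is exclusive."""
    segments = []
    onset = None
    for idx, amp in enumerate(envelope):
        if amp >= threshold and onset is None:
            onset = idx                      # note starts
        elif amp < threshold and onset is not None:
            segments.append((onset, idx))    # note ends
            onset = None
    if onset is not None:                    # note still sounding at the end
        segments.append((onset, len(envelope)))
    return segments


# One sustained note whose amplitude dips briefly in the middle.
envelope = [0.0, 0.6, 0.8, 0.7, 0.3, 0.7, 0.8, 0.1, 0.0]
print(threshold_segments(envelope, 0.5))  # [(1, 4), (5, 7)]: the dip splits the note
print(threshold_segments(envelope, 0.2))  # [(1, 7)]: a lower threshold keeps it whole
```

The same lower threshold that keeps this note whole would merge two notes separated by a shallow dip, which is the trade-off described above.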

The note duration, the property used in section detection, can be defined either as "Duration" or as IOI (Inter-Onset Interval), as shown in FIG. 3. Here, Duration is the difference between the start time and the end time of a note, whereas IOI, or span, is the difference between the start time of a note and the start time of the next note.

The present invention uses IOI as the feature quantity for note duration. When singing, users generally do not hold long notes (for example, a half note or longer) for their full length but often cut them short in a staccato manner, so using IOI gives better search results than using Duration. In addition, because no IOI information exists for the last note of a hum, better results are obtained by not using that note.
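The difference between the two definitions can be sketched as follows, assuming each detected note is given as a hypothetical (onset, offset) pair in milliseconds.

```python
from typing import List, Tuple


def durations_and_iois(notes: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """notes: (onset_ms, offset_ms) per note, in time order."""
    durations = [off - on for on, off in notes]
    # IOI needs the onset of the following note, so the last note has none.
    iois = [notes[k + 1][0] - notes[k][0] for k in range(len(notes) - 1)]
    return durations, iois


# Three notes on a steady 500 ms beat; the second is sung staccato.
notes = [(0, 480), (500, 650), (1000, 1470)]
durations, iois = durations_and_iois(notes)
print(durations)  # [480, 150, 470]
print(iois)       # [500, 500]
```

The staccato note shortens its Duration sharply but leaves the IOI unchanged, and the IOI list is one element shorter than the note list because the last note has no following onset.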

The feature extractor (22) extracts feature quantities by identifying the pitch and the note duration (IOI). The duration information is obtained from the section detection process described above. The pitch is obtained by extracting the fundamental frequency through cepstrum analysis of the stable-amplitude portion of each detected note.

The symbol melody representation unit (23) converts the sequences of identified pitches and note durations (IOI) into relative pitch differences (deltaPitch) and relative duration ratios (IOIratio), respectively, to form a symbol melody representation sequence, and transmits it to the server. The relative pitch difference (deltaPitch) is normalized so that a difference of one semitone equals 100 cents, and the relative duration ratio (IOIratio) expresses each note's duration as a percentage (%) of the immediately preceding note's duration. The acoustic signal is not transmitted as-is; only the information needed for the search is converted into a symbol melody sequence and sent to the server, in order to minimize the required traffic.

The server holds the sound source (audio) DB (60) and the metadata DB (70) and performs the music search. The search uses DP matching to obtain the similarity between the time series received from the client and each music sequence stored in the metadata DB (70), measured as a Euclidean distance or a sum of absolute differences. Pieces are then returned to the client as answer candidates, starting with the piece whose sequence is closest to the input sequence.

Even when the user hums a song that exists in the sound source database (60), the following points about the user's humming must be considered. First, because of individual differences, the pitch and tempo of the hum may differ from those of the reference pattern in the database. Second, because the user's memory is imperfect, notes may be inserted or dropped during humming.

Solving the first problem therefore requires normalization of the humming data. A common melody representation normalizes the data using, for consecutive notes, the relative span ratio (IOIratio) and the relative pitch difference (deltaPitch). The second problem can be addressed by using a DP matching algorithm.
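The effect of this normalization can be checked numerically. The sketch below assumes that deltaPitch is computed as 1200 times the base-2 logarithm of the frequency ratio (the usual definition of cents) and that IOIratio is the percentage ratio of consecutive IOIs; the function name and the sample melody are illustrative only.

```python
import math


def relativize(pitches_hz, iois_ms):
    """Return (deltaPitch in cents, IOIratio in %) for consecutive notes."""
    delta_pitch = [1200 * math.log2(b / a) for a, b in zip(pitches_hz, pitches_hz[1:])]
    ioi_ratio = [100 * b / a for a, b in zip(iois_ms, iois_ms[1:])]
    return delta_pitch, ioi_ratio


pitches = [262.0, 294.0, 330.0, 392.0]
iois = [250.0, 125.0, 250.0, 250.0]

# The same melody hummed a fourth higher (x 4/3) and 20 % slower (x 1.2).
higher = [p * 4 / 3 for p in pitches]
slower = [t * 1.2 for t in iois]

original = relativize(pitches, iois)
shifted = relativize(higher, slower)

same_pitch = all(math.isclose(x, y, abs_tol=1e-6) for x, y in zip(original[0], shifted[0]))
same_rhythm = all(math.isclose(x, y, abs_tol=1e-6) for x, y in zip(original[1], shifted[1]))
print(same_pitch, same_rhythm)  # True True
```

Because both features are ratios between neighbouring notes, a constant transposition or a constant tempo change cancels out, which is why a hum in the user's own key and tempo can still be compared with the reference pattern.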

As a detailed example of DP matching, assume that the feature extractor (22) has obtained the following time series from the input acoustic signal.

- Humming duration time series: {245, 123, 247, 250, 249, 124, 125, 510} [unit: ms]
- Humming pitch time series: {260, 292, 331, 393, 392, 348, 330, 261} [unit: Hz]

For the identified duration and pitch time series, the symbol melody representation unit (23) performs relativization as follows.

- Humming relative duration ratio (IOIratio): {50.2, 200.8, 101.2, 99.2, 49.8, 100.8, 408.0} [unit: %]
- Humming relative pitch difference (deltaPitch): {198, 201, 297, 2, -203, -98, -394} [unit: cent]
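The relativization step can be reproduced approximately from the raw humming time series. This sketch assumes the standard cent formula (1200 times the base-2 logarithm of the frequency ratio) and simple rounding. The listed figures look like rounded, illustrative values, so the output is close to them rather than identical.

```python
import math

durations_ms = [245, 123, 247, 250, 249, 124, 125, 510]
pitches_hz = [260, 292, 331, 393, 392, 348, 330, 261]

# Each value is expressed relative to the immediately preceding note.
ioi_ratio = [round(100 * b / a, 1) for a, b in zip(durations_ms, durations_ms[1:])]
delta_pitch = [round(1200 * math.log2(b / a)) for a, b in zip(pitches_hz, pitches_hz[1:])]

print(ioi_ratio)    # percentages, one fewer than the number of notes
print(delta_pitch)  # cents, one fewer than the number of notes
```

Eight notes give seven relative values per feature, which matches the length of the sequences listed above.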

When a song in the sound source database (60) consists of the combination of notes shown in FIG. 4, it can be written as time series as follows.

- DB song duration time series: {quarter note, eighth note, quarter note, quarter note, quarter note, eighth note, eighth note, half note}
- DB song pitch time series: {Do, Re, Mi, Sol, Sol, Fa, Mi, Do}

The relativized values in the metadata DB (70) are as follows.

- DB song relative duration ratio (IOIratio): {50, 200, 100, 100, 50, 100, 400} [unit: %]
- DB song relative pitch difference (deltaPitch): {200, 200, 300, 0, -200, -100, -400} [unit: cent]
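The DB-side values follow directly from the notated melody. The sketch below assumes that all notes lie within one octave, that there are no rests (so each notated length equals the IOI), and that solfège syllables map to semitone offsets of the major scale; the lookup tables are illustrative helpers.

```python
# Semitone offsets of the major scale and note lengths in beats (assumed tables).
SEMITONES = {"Do": 0, "Re": 2, "Mi": 4, "Fa": 5, "Sol": 7, "La": 9, "Ti": 11}
BEATS = {"half": 2.0, "quarter": 1.0, "eighth": 0.5}

pitches = ["Do", "Re", "Mi", "Sol", "Sol", "Fa", "Mi", "Do"]
lengths = ["quarter", "eighth", "quarter", "quarter",
           "quarter", "eighth", "eighth", "half"]

# One semitone is 100 cents; each length is a percentage of the previous one.
delta_pitch = [100 * (SEMITONES[b] - SEMITONES[a]) for a, b in zip(pitches, pitches[1:])]
ioi_ratio = [round(100 * BEATS[b] / BEATS[a]) for a, b in zip(lengths, lengths[1:])]

print(ioi_ratio)    # [50, 200, 100, 100, 50, 100, 400]
print(delta_pitch)  # [200, 200, 300, 0, -200, -100, -400]
```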

The search engine (40) computes the similarity between the relativized values of the input humming and those of the DB song using DP matching, and outputs the search result.

Generalized and expressed mathematically, this is as follows.

For a time series R of a piece of music in the sound source database (60) with I notes and a time series Q of an input hum with J notes, the DP matching matrix g(i, j) at a point (i, j) is obtained as shown in FIG. 5.

Let d denote the distance cost between elements of Q and R. If the database time series $R = \{i_0, i_1, i_2, \ldots, i_{I-1}\}$ with I notes and the input humming time series $Q = \{j_0, j_1, j_2, \ldots, j_{J-1}\}$ with J notes each consist of sets of relative duration ratios and relative pitch differences, then the similarity $d(i, j) = |i - j|$ between elements of the two time series can be computed iteratively, and under this assumption the distance for the similarity is obtained by [Equation 1] below.
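A minimal sketch of DP matching over two symbol melody sequences is given below. It assumes a standard dynamic-time-warping recurrence with a sum-of-absolute-differences local cost, which may differ from the exact form of [Equation 1]; the equal weighting of cents and percent, the function names, and the matrix layout are all assumptions.

```python
from typing import Sequence, Tuple

Note = Tuple[float, float]  # (deltaPitch in cents, IOIratio in %)


def local_cost(q: Note, r: Note) -> float:
    """Sum of absolute differences between two relative-value pairs."""
    return abs(q[0] - r[0]) + abs(q[1] - r[1])


def dp_distance(query: Sequence[Note], ref: Sequence[Note]) -> float:
    """Accumulated cost of the cheapest monotonic alignment of query and ref."""
    inf = float("inf")
    J, I = len(query), len(ref)
    g = [[inf] * (I + 1) for _ in range(J + 1)]
    g[0][0] = 0.0
    for j in range(1, J + 1):
        for i in range(1, I + 1):
            d = local_cost(query[j - 1], ref[i - 1])
            # Diagonal step matches a note pair; the other two steps let one
            # element align with several, absorbing inserted or dropped notes.
            g[j][i] = d + min(g[j - 1][i - 1], g[j - 1][i], g[j][i - 1])
    return g[J][I]


hum = list(zip([198, 201, 297, 2, -203, -98, -394],
               [50.2, 200.8, 101.2, 99.2, 49.8, 100.8, 408.0]))
db_song = list(zip([200, 200, 300, 0, -200, -100, -400],
                   [50, 200, 100, 100, 50, 100, 400]))

print(dp_distance(db_song, db_song))  # 0.0 for identical sequences
print(dp_distance(hum, db_song))      # small value: the hum is close to the DB song
```

With the example sequences the cheapest path runs along the diagonal, so the hum's distance to this song is the sum of the seven element-wise costs. A song with a different contour would accumulate a much larger value.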

The search process for the humming query is described in more detail below with reference to FIG. 6.

First, when the user makes a humming query (601), the client-side melody extractor (20) extracts the corresponding symbol melody sequence (602 to 604) and passes it to the server-side search engine (40) through an information communication network such as the Internet.

That is, on the client side, the humming query input by the user is automatically segmented into individual note sections by the section detector (21) (602). The feature extractor (22) then extracts pitch and duration (span) information, the components of a note, from each detected note (603). Because the hummed melody differs in tempo and tonality from the piece in the database, the symbol melody representation unit (23) then takes relative (normalized) values between notes to generate a compact symbol melody sequence (604) and transmits it to the server.

Thereafter, the search engine (40) measures the similarity between the symbol melody sequence received over the network and the metadata in the metadata DB (70) (605), takes the audio items with high similarity as the search result (606), and transmits them to the client through an information communication network such as the Internet (607).

That is, on the server side, the audio files (sound source files) in the sound source DB (audio DB) (60) are recorded in advance in the metadata DB (70), in the metadata form of FIG. 2, by way of the metadata generator (50). The melody similarity calculator (41) then computes the similarity between the input symbol melody sequence and the feature quantity of each audio item in the metadata DB (70) (605). The similarity is computed with DP matching, using a measure of the distance between two vectors such as the Euclidean distance or the sum of absolute differences.

Thereafter, the distance-based classifier (42) sorts the calculated distances in ascending order (606), extracts the corresponding metadata from the metadata DB (70), and delivers it to the client over the Internet as search result information (607).
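The sorting and metadata lookup performed in steps 606 and 607 can be sketched as follows. The dictionary-based store, the song identifiers, the distance values, and the `top_n` cut-off are hypothetical stand-ins for the metadata DB (70) and its contents.

```python
from typing import Dict, List, Tuple


def rank_by_distance(distances: Dict[str, float],
                     metadata: Dict[str, dict],
                     top_n: int = 3) -> List[Tuple[float, dict]]:
    """Sort candidates by ascending distance and attach their metadata."""
    ordered = sorted(distances.items(), key=lambda item: item[1])
    return [(dist, metadata[song_id]) for song_id, dist in ordered[:top_n]]


# Illustrative distances, as a similarity calculator might return them.
distances = {"s1": 412.5, "s2": 31.0, "s3": 120.0}
metadata = {
    "s1": {"audio_name": "Song A", "file_name": "a.mp3"},
    "s2": {"audio_name": "Song B", "file_name": "b.mp3"},
    "s3": {"audio_name": "Song C", "file_name": "c.mp3"},
}

results = rank_by_distance(distances, metadata, top_n=2)
for dist, meta in results:
    print(dist, meta["audio_name"])
```

A smaller distance means a higher similarity, so the ascending sort yields the list in descending order of similarity that is shown to the user.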

The user can then download and listen to the audio (sound source) via the links in the result list.

The method of the present invention as described above may be implemented as a program and stored in computer-readable form on a recording medium (CD-ROM, RAM, ROM, floppy disk, hard disk, magneto-optical disk, etc.). Since this can be readily carried out by those of ordinary skill in the art, it is not described in further detail.

The present invention described above is not limited by the foregoing embodiment and the accompanying drawings, since those of ordinary skill in the art can make various substitutions, modifications, and changes without departing from the technical idea of the invention.

As described above, the present invention allows sound source (audio) information to be searched simply and efficiently by humming over an information communication network such as the Internet, and can therefore be used effectively in applications such as content-based audio retrieval.

Claims

In the sound source query system of the client,

Humming input means for receiving a portion of a piece of music to be searched by a user for humming;

Section detecting means for identifying a boundary between a start point and an end point of a note from the input humming signal;

Feature amount extracting means for extracting feature information, namely the pitch and note duration, of the note detected by said section detecting means;

Symbol melody expression means for taking relative values between notes to generate a symbol melody sequence and transmitting it to a server; and

Search result output means for outputting metadata information (a search result) received from the server according to the result of measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in a database;

A humming-based sound source query system comprising the above means.

The system of claim 1,

wherein the note duration used as the characteristic in the section detecting means

is the Inter-Onset Interval (IOI), the difference between the start time of a note and the start time of the next note.

The system of claim 2,

wherein the feature amount extracting means

extracts the feature amount by identifying the pitch and the note duration (IOI), the duration information being obtained from the section detecting means and the pitch information being obtained by extracting the fundamental frequency through spectral analysis of the stable-amplitude section of the detected note.

The system of claim 1,

wherein the symbol melody representation means

generates a symbol melody representation sequence by converting the sequences of identified pitches and note durations (IOI) into relative pitch differences (deltaPitch) and relative duration ratios (IOIratio), respectively.

The system according to any one of claims 1 to 4,

wherein the server,

using DP matching, obtains the similarity between the time series input from the sound source query system and each music sequence stored in the metadata DB as a Euclidean distance or a sum of absolute differences, and transmits pieces to the sound source query system as answer candidates, starting with the piece whose sequence is closest to the input sequence.

In the sound source search system of the server,

Metadata generating means for generating metadata information of each audio information (sound source information);

Metadata storage means for storing the metadata information;

Similarity calculating means for receiving from the client a symbol melody sequence obtained by taking relative values between the notes of a user's humming signal, and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in the metadata storage means; and

Distance-based classification means for arranging the distance information calculated by the similarity calculation means, extracting the metadata information from the metadata storage means, and transmitting the metadata information to the client.

A humming-based sound source search system comprising the above means.

The system of claim 6,

wherein the similarity calculating means

calculates, using DP matching, the similarity between the input symbol melody sequence and the feature amount of each sound source in the metadata storage means, with the distance between two vectors measured as a Euclidean distance or a sum of absolute differences.

The system according to claim 6 or 7,

wherein the distance-based classification means

sorts the calculated distance information in ascending order of distance, extracts the corresponding metadata information from the metadata storage means, and delivers it to the client as search result information.

In the sound source query / search system,

Section detecting means for identifying a boundary between a start point and an end point of a note from an input humming signal;

Symbol melody representation means for taking a relative value between each note to generate a symbol melody sequence;

Metadata storage means for storing the metadata information;

Similarity calculating means for receiving the symbol melody sequence from the symbol melody representation means and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in the metadata storage means;

Distance-based classification means for arranging the distance information calculated by the similarity calculation means and extracting the metadata information from the metadata storage means; And

Search result output means for outputting a search result from the distance-based classification means

A humming-based sound source query/search system comprising the above means.

In the sound source query method applied to the sound source query / search system,

A humming input step of receiving a portion of a piece of music to be searched by a user through humming;

An interval detection step of identifying a boundary between a start point and an end point of a note from the input humming signal;

A feature amount extraction step of extracting feature information, namely the pitch and note duration, of the detected note;

A symbol melody representation step of generating a symbol melody sequence by taking a relative value between each note and transmitting it to a server; And

A search result output step of outputting metadata information (a search result) received from the server according to the result of measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in a database;

A humming-based sound source query method comprising the above steps.

In the sound source search method applied to the sound source query / search system,

A metadata generation and storage step of generating metadata information of each audio information (sound source information) and storing the metadata information in a metadata DB;

A similarity calculation step of receiving from the client a symbol melody sequence obtained from a user's humming signal and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in the metadata DB; and

A distance-based classification and result transmission step of sorting the calculated distance information, extracting the corresponding metadata information from the metadata DB, and transmitting it to the client;

A humming-based sound source search method comprising the above steps.

In the sound source query / search method applied to the sound source query / search system,

A symbol melody expression step of generating a symbol melody sequence by taking a relative value between each note;

A similarity calculation step of receiving the symbol melody sequence and calculating similarity between the symbol melody sequence and metadata information (characteristic amount of each sound source) of each audio information (sound source information) using dynamic programming (DP) matching; And

Distance-based classification and search result outputting step of sorting the calculated distance information and outputting corresponding metadata information

A humming-based sound source query/search method comprising the above steps.

For a humming-based sound source query, in a sound source query system having a processor,

A humming input function for receiving a part of a piece of music to be searched from a user by humming;

A section detection function for identifying a boundary between a start point and an end point of a note from the input humming signal;

A feature amount extraction function for extracting feature information, namely the pitch and note duration, of the detected note;

A symbol melody expression function for generating a symbol melody sequence by taking a relative value between each note and transmitting it to a server; And

A search result output function for outputting metadata information (a search result) received from the server according to the result of measuring, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in a database

A computer-readable recording medium having recorded thereon a program for realizing this.

In the sound source search system equipped with a processor for a humming-based sound source search,

A metadata generation and storage function for generating metadata information of each audio information (sound source information) and storing the metadata information in a metadata DB;

A similarity calculation function for receiving from the client a symbol melody sequence obtained from a user's humming signal and calculating, using dynamic programming (DP) matching, the similarity between the symbol melody sequence and the feature amount of each sound source stored in the metadata DB; and

The distance-based classification and result transmission function of sorting the calculated distance information, extracting the corresponding metadata information from the metadata DB, and transmitting the extracted metadata information to the client