KR20220125389A

KR20220125389A - Music search method and device using inflection point of melody line

Info

Publication number: KR20220125389A
Application number: KR1020210029148A
Authority: KR
Inventors: 김명; 이보현
Original assignee: 이화여자대학교 산학협력단
Priority date: 2021-03-05
Filing date: 2021-03-05
Publication date: 2022-09-14
Also published as: KR102451800B1

Abstract

The disclosed technology relates to a music search method and device using an inflection point of a melody line. The method includes: selecting, by the music search device, a plurality of inflection points included in an input music source; extracting, by the music search device, inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points; calculating a similarity between sequences by comparing the inflection point information of original music with the extracted inflection point information; and normalizing by dividing the input music source by the length of a representative phrase of the original music.

Description

Music search method and device using inflection point of melody line {MUSIC SEARCH METHOD AND DEVICE USING INFLECTION POINT OF MELODY LINE}

The disclosed technology relates to a method and apparatus for searching music using an inflection point of a melody line.

Existing song search algorithms have been studied mainly in terms of pitch changes and duration analysis. A typical approach uses a chromagram, which folds the components of the audio data that differ by octaves so that the entire frequency range is represented within a single octave. Calculating the similarity between chromagrams is an essential component of a cover song search system.

However, while a chromagram accurately captures note-name information, it has the disadvantage that it cannot account for relationships such as the relative height between pitches or changes in the tempo of a song. For example, when searching for a cover song that modifies an original song, search accuracy can be guaranteed if the key, tempo, and overall structure are modified according to a fixed rule, but accuracy drops when the key is changed freely in stages or the tempo is varied freely.

In addition, a search system using chromagrams requires not only accurate search results but also a chromagram coding method that reduces the storage space and amount of computation needed to compare chromagrams.

Korean Patent Application Publication No. 10-2018-0103639

The disclosed technology is intended to provide a music search method and apparatus using an inflection point of a melody line.

To achieve the above technical object, a first aspect of the disclosed technology provides a music search method using an inflection point of a melody line, the method comprising: selecting, by a search apparatus, a plurality of inflection points included in an input sound source; extracting, by the search apparatus, inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points; calculating a similarity between sequences by comparing inflection point information of an original song with the extracted inflection point information; and normalizing by dividing the input sound source by the length of a representative phrase of the original song.

To achieve the above technical object, a second aspect of the disclosed technology provides a music search apparatus using an inflection point of a melody line, the apparatus comprising: an input device that receives a sound source; a storage device that stores a plurality of sound sources; and a processing device that selects a plurality of inflection points included in the input sound source, extracts inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points, and searches for the original song of the input sound source by comparing the inflection point information of each of the plurality of stored sound sources with the inflection point information of the input sound source.

Embodiments of the disclosed technology may have effects including the following advantages. However, this does not mean that every embodiment must include all of them, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

According to an embodiment of the disclosed technology, the music search method and apparatus using an inflection point of a melody line can find an original song from a cover song or from humming.

In addition, searching for the original song of a sound source inserted in broadcast content helps protect the copyright of the sound source's author.

In addition, plagiarism can be assessed by comparing an original song with a cover song.

In addition, the required storage space and amount of computation are reduced compared with a conventional chromagram search system.

FIG. 1 is a diagram illustrating a music search process using an inflection point of a melody line according to an embodiment of the disclosed technology.
FIG. 2 is a flowchart of a music search method using an inflection point of a melody line according to an embodiment of the disclosed technology.
FIG. 3 is a block diagram of a music search apparatus using an inflection point of a melody line according to an embodiment of the disclosed technology.
FIG. 4 is a diagram schematically illustrating a plurality of inflection points included in a sound source.
FIG. 5 is a diagram illustrating the calculation of similarity in a sliding manner.
FIG. 6 is a diagram showing the result of selecting inflection points.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as first, second, A, and B may be used to describe various components, but the components are not limited by these terms, which are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be called a second component, and similarly a second component may be called a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

As used herein, a singular expression should be understood to include the plural unless the context clearly indicates otherwise. Terms such as "comprising" mean that the stated feature, number, step, operation, component, part, or combination thereof is present, and should be understood as not excluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before describing the drawings in detail, it should be made clear that the components in this specification are divided merely according to the main function each is responsible for. That is, two or more components described below may be combined into one component, or one component may be divided into two or more components by more finely subdivided function.

In addition to its own main function, each component described below may additionally perform some or all of the functions of other components, and some of the main functions of each component may of course be carried out exclusively by another component. Therefore, the presence or absence of each component described in this specification should be interpreted functionally.

FIG. 1 is a diagram illustrating a music search process using an inflection point of a melody line according to an embodiment of the disclosed technology. Referring to FIG. 1, the search apparatus can search for the original song of an input cover song using inflection points instead of the full sequence of the melody line, and can output the search result through an output device. Here, an inflection point is a point marking a place on the score where the pitch changes, and the points can be connected by lines to make them easier to understand. From the extracted inflection points, the search apparatus can select those at which a meaningful change in the data appears and generate inflection point information. By comparing inflection point information, it can then measure the similarity between the input sound source and an original song.

Although FIG. 1 takes the input of a cover song as an example, a recording of someone humming the melody of a particular song may also be input as the sound source. The sound source input to the search apparatus may be a melody line obtained by converting a score into data. Such a melody line may be expressed as frequency values or as MIDI note numbers. The search apparatus includes a database storing a plurality of sound sources and can calculate similarity by comparing the stored sound sources with the input sound source. Specifically, the search apparatus compares the inflection point information extracted from the input cover song with the inflection point information extracted from each pre-stored sound source and calculates the similarity between the two inflection point sequences. For example, the sequence of inflection point information of the representative phrase of an original song may be slid one step at a time along the sequence of inflection point information of the input sound source to compute whether the representative phrase is contained in the input sound source. Here, a stored sound source means the sound source of an original song. Once the similarity has been calculated in this way, the input sound source is normalized by dividing by the length of the representative phrase of the stored original song, so that the stored song with the highest similarity can be retrieved as the original song of the input sound source.

In conventional music search algorithms, the main causes of performance degradation are changes due to transposition, changes in the overall tempo of the sound source, changes in song structure, and memory problems arising from extracting feature values from the sound source at every frame. Of these, research has concentrated on search algorithms that cope with transposition, and as a result search systems have mainly used sequence similarity algorithms such as the edit distance (Levenshtein Distance, Edit Distance) algorithm and the longest common substring (Longest Common Substring, LSC) algorithm. These algorithms operate on character strings, but since a string can be treated as a kind of sequence, they can also be used to analyze sequence data of a sound source. The edit distance algorithm defines the edit distance as the number of insertions, deletions, and substitutions needed to make two strings identical, and returns the minimum such number. That is, the smaller the edit distance, the more similar the two strings. The longest common substring algorithm finds the longest of the substrings that the given strings have in common. Substrings are counted even if they are not contiguous, and the longer the longest common substring, the more similar the two strings.
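The two conventional measures described above can be sketched as follows. This is a minimal illustration, not code from the patent: the function names `edit_distance` and `lcs_length` are hypothetical, and all edit operations are assumed to cost 1. Because the text counts non-contiguous matches, the second function computes what is usually called the longest common subsequence.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))          # distance from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete x
                curr[j - 1] + 1,            # insert y
                prev[j - 1] + (x != y),     # substitute, or free match
            ))
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence (matches need not be contiguous)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

Both functions accept any sequence, so a melody given as a list of MIDI note numbers can be compared the same way as a character string.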

Besides these algorithms, the similarity of two time series can also be measured with dynamic time warping. For example, the x-axis of the data being compared can be stretched by a factor of n and matched against the reference data to check whether the two graphs coincide. The shorter the path distance through the matrix between the two data series, the higher the similarity.
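A textbook dynamic time warping computation is sketched below to make the "path distance through the matrix" concrete. The function name `dtw_distance` and the use of the absolute difference as the local cost are assumptions for illustration; the patent does not specify either.

```python
def dtw_distance(a, b):
    """Accumulated cost of the cheapest warping path between numeric series a and b."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: cheapest alignment of the first i items of a with the first j items of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])      # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],                   # a advances, b repeats
                cost[i][j - 1],                   # b advances, a repeats
                cost[i - 1][j - 1],               # both advance
            )
    return cost[n][m]
```

With this local cost, a melody and a copy of it in which every note is held twice as long have a distance of 0, which is the time-stretching behavior the paragraph describes.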

The disclosed technology, by contrast, uses an edit cost algorithm, which is a modification of the edit distance algorithm. Using the edit cost algorithm, the search apparatus can calculate similarity by comparing the characteristics of inflection points, which are the characteristic parts of a sequence. The edit cost algorithm differs from the edit distance algorithm in two main ways: it applies different weights when calculating similarity, and it takes as its result the minimum value in the last row of the matrix rather than the final cell. The different weights here are the slope change at each inflection point and the distance ratio between inflection points. For example, among the plurality of inflection points, the feature values of a specific inflection point at which the slope change and the distance ratio begin to change may be extracted and used as inflection point information.

The search apparatus extracts a plurality of inflection points from the input cover song and then extracts inflection point information according to the weights described above. It can then calculate the similarity between sequences by comparing the inflection point information of the input sound source with the pre-stored inflection point information of an original song.

When the input data have lengths M and N, the conventional edit distance algorithm takes the (M, N) entry of the computed matrix as its result. The edit cost algorithm, however, aims to find whether the representative phrase of an original song is contained in the cover song, so it computes all the edit costs against the cover song with respect to the length of the representative phrase and takes the smallest of them as the result.
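The difference from the plain edit distance can be sketched as below. This is one possible reading, not the patent's implementation: rows follow the representative phrase, columns follow the cover song, and the result is the minimum of the last row. Two points are assumptions made here: the first row is initialized to zero so that the phrase may begin anywhere in the cover song, and the unit gap cost and default `sub_cost` are placeholders, since the text gives no formula for the slope-change and distance-ratio weights. The name `min_last_row_edit_cost` is hypothetical.

```python
def min_last_row_edit_cost(phrase, cover, sub_cost=None):
    """Cheapest cost of aligning the whole phrase to some stretch of the cover."""
    if sub_cost is None:
        # Placeholder weight: 0 for equal items, 1 otherwise (assumption).
        sub_cost = lambda p, c: 0.0 if p == c else 1.0
    # Row 0 is all zeros: skipping any prefix of the cover is free (assumption).
    prev = [0.0] * (len(cover) + 1)
    for i, p in enumerate(phrase, start=1):
        curr = [float(i)]                       # phrase items with no cover items left
        for j, c in enumerate(cover, start=1):
            curr.append(min(
                prev[j] + 1.0,                  # phrase item missing from the cover
                curr[j - 1] + 1.0,              # extra item in the cover
                prev[j - 1] + sub_cost(p, c),   # weighted match or mismatch
            ))
        prev = curr
    # Minimum over the last row instead of the (M, N) cell.
    return min(prev)
```

Taking the minimum over the last row lets the match end at any position in the cover song, so material after the representative phrase does not add to the cost.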

As described above, instead of using the melody line sequence as is, the search apparatus extracts inflection points and calculates the sequence similarity between the cover song and the original song from their characteristics. Because similarity is compared at inflection points, for a passage where the same note is sustained or where the rate of change stays constant, only the start or end point needs to be stored in memory and the rest can be omitted, which reduces memory usage compared with conventional algorithms. In addition, because the slopes before and after each inflection point and the distances between inflection points are taken into account, even the omitted information that is not stored can be inferred. Memory usage and computation are therefore reduced while search accuracy can be higher than before. Furthermore, since a relative measure can be applied to each sequence based on its inflection points, search accuracy can be maintained even when songs differ in tempo in places.

FIG. 2 is a flowchart of a music search method using an inflection point of a melody line according to an embodiment of the disclosed technology. Referring to FIG. 2, the music search method 200 may be performed sequentially by the search apparatus and includes: selecting a plurality of inflection points included in an input sound source (210); extracting inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points (220); calculating a similarity between sequences by comparing the inflection point information of an original song with the inflection point information of the input sound source (230); and normalizing by dividing the input sound source by the length of a representative phrase of the original song (240).

In step 210, the detection device selects a plurality of inflection points included in the input sound source. The inflection points selected in step 210 include those whose pitch change is constant or whose distance ratio is constant. That is, every feature point that the detection device judges to be an inflection point may be extracted.

In step 220, the detection device extracts inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points. The inflection points extracted here may be those whose slope change and distance ratio differ from the preceding ones. For example, the feature values of a specific inflection point at which the slope change and the distance ratio begin to change may be extracted as inflection point information. By selecting such inflection points and arranging them in time order, inflection point information in the form of sequence data can be extracted.
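Steps 210 and 220 might look like the sketch below for a melody line given as one MIDI note number per frame. The exact feature definitions are not given in the text, so everything here is an assumption for illustration: an inflection point is taken to be a frame where the frame-to-frame difference changes, the slope change is the outgoing slope minus the incoming slope between neighboring inflection points, and the distance ratio is the gap to the next inflection point divided by the gap from the previous one. The names `select_inflection_points` and `inflection_info` are hypothetical.

```python
def select_inflection_points(melody):
    """Step 210 (sketch): frame indices where the frame-to-frame difference changes."""
    points = []
    for t in range(1, len(melody) - 1):
        before = melody[t] - melody[t - 1]
        after = melody[t + 1] - melody[t]
        if before != after:
            points.append(t)
    return points


def inflection_info(melody, points):
    """Step 220 (sketch): (slope_change, distance_ratio) per interior inflection point."""
    info = []
    for k in range(1, len(points) - 1):
        p_prev, p, p_next = points[k - 1], points[k], points[k + 1]
        slope_in = (melody[p] - melody[p_prev]) / (p - p_prev)
        slope_out = (melody[p_next] - melody[p]) / (p_next - p)
        distance_ratio = (p_next - p) / (p - p_prev)
        info.append((slope_out - slope_in, distance_ratio))
    return info


melody = [60, 60, 62, 64, 64, 64, 64, 62, 62]   # made-up example data
points = select_inflection_points(melody)        # [1, 3, 6, 7]
features = inflection_info(melody, points)       # time-ordered sequence data
```

Under these definitions the distance ratio is unchanged when every gap between inflection points is scaled by the same factor, which is one way a relative measure can tolerate tempo differences.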

In step 230, the detection device calculates the similarity between sequences by comparing the inflection point information of the original song with the inflection point information of the input sound source. Because the inflection point information was converted to sequence data in step 220, the sequence of the original song and the sequence of the input sound source can be compared for similarity. To do so, the detection device slides the inflection point sequence of the representative phrase of the original song one step at a time along the inflection point sequence of the input sound source and computes whether the representative phrase is contained in the input sound source. The detection device may use the edit cost algorithm for this computation.

In step 240, the input sound source is normalized by dividing by the length of the representative phrase of the original song. Normalization here means dividing by the length of the representative phrase of the stored original song, that is, normalizing the edit cost for the input sound source. For example, the sound source with the lowest edit cost may be judged to be the original song of the input sound source. Among the original songs stored in the database, the search apparatus may determine the one with the lowest edit cost relative to the input sound source to be its original song and output information about that song as the search result.
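One way to read step 240 is sketched below: each candidate's raw edit cost is divided by the length of its representative phrase, and candidates are ranked by the normalized value. The function name `rank_by_normalized_cost`, the dictionary layout, and the song titles are hypothetical, and the numbers are made-up illustration data.

```python
def rank_by_normalized_cost(raw_costs, phrase_lengths):
    """Return (title, normalized cost) pairs, lowest normalized edit cost first."""
    scored = {
        title: raw_costs[title] / phrase_lengths[title]
        for title in raw_costs
    }
    return sorted(scored.items(), key=lambda item: item[1])


# Song B has the larger raw cost but a much longer representative phrase,
# so after normalization it is the better match.
raw_costs = {"Song A": 3.0, "Song B": 4.0}
phrase_lengths = {"Song A": 5, "Song B": 20}
ranking = rank_by_normalized_cost(raw_costs, phrase_lengths)
best_title = ranking[0][0]
```

Without the division, longer representative phrases would be penalized simply for having more positions at which a mismatch can occur.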

FIG. 3 is a block diagram of a music search apparatus using an inflection point of a melody line according to an embodiment of the disclosed technology. Referring to FIG. 3, the music search apparatus 300 includes an input device 310, a storage device 320, and a processing device 330.

The input device 310 receives a sound source. The input device 310 may receive a sound source input by a user or a sound source transmitted from an external terminal. The input device 310 may be implemented as an input interface of the music search apparatus 300. For example, it may be an interface through which sound source data can be entered, such as a keyboard or a mouse.

The storage device 320 stores a plurality of sound sources. The storage device may be implemented as a memory with enough capacity to store the plurality of sound sources. This memory may be a non-volatile memory, which retains stored information even when power is not supplied, and some of the stored sound sources may be updated according to an administrator's input.

The processing device 330 selects a plurality of inflection points included in the input sound source, extracts inflection point information based on the slope change at each of the plurality of inflection points and the distance ratio between the inflection points, and searches for the original song of the input sound source by comparing the inflection point information of each of the plurality of sound sources with the inflection point information of the input sound source. The processing device 330 may be implemented as a processor or an AP of the music search apparatus 300. The processing device 330 can find the original song among the plurality of sound sources by calculating the sequence similarity between the inflection point information of each sound source stored in the storage device 320 and the inflection point information of the input sound source.

The music search apparatus 300 may further include an output device for outputting the search result. The output device may output, as the search result, information about the sound source that is most similar to the input sound source among the plurality of sound sources.

The music search apparatus 300 described above may also be implemented as a program (or application) containing an executable algorithm that can be run on a computer. The program may be provided stored in a transitory or non-transitory computer readable medium.

A non-transitory readable medium is not a medium that stores data for a short moment, such as a register, cache, or memory, but a medium that stores data semi-permanently and can be read by a device. Specifically, the various applications or programs described above may be provided stored in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, ROM (read-only memory), PROM (programmable read-only memory), EPROM (erasable PROM), EEPROM (electrically EPROM), or flash memory.

A transitory readable medium refers to various kinds of RAM, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synclink DRAM (SLDRAM), and direct Rambus RAM (DRRAM).

FIG. 4 is a diagram schematically illustrating a plurality of inflection points included in a sound source. In FIG. 4, the x-axis represents the timestamp and the y-axis represents the MIDI note number. As marked with circles, a place where the behavior of the data changes can be defined as an inflection point. For example, where the difference in MIDI note number is the same, as in rows 5 and 6, the value either stays the same or rises or falls steadily with no change in slope, so such points can be excluded from the inflection point information. This reduces unnecessary memory consumption.

FIG. 5 is a diagram illustrating similarity calculation in a sliding manner. In FIG. 5, the x-axis represents the timestamp and the y-axis represents an arbitrary data value. To determine whether the cover song data, shown as a black line, contains the representative phrase data of the original song, shown as a blue line below it, the representative phrase data can be slid sideways one position at a time and compared at each position. As shown in FIG. 5, a matching phrase exists at timestamps 18 to 22, which confirms that the cover song input to the search device contains the representative phrase of the original song.
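The sliding comparison of FIG. 5 might look like the sketch below. The source does not define its edit cost algorithm, so a plain Levenshtein distance stands in for it here as an assumption. The names `edit_distance` and `sliding_matches` and the `max_cost` threshold are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences.

    Assumption: used as a stand-in for the "edit cost algorithm",
    whose exact cost model the source does not specify.
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]


def sliding_matches(cover, phrase, max_cost=0):
    """Slide `phrase` over `cover` one position at a time.

    Returns every start offset whose window is within `max_cost`
    edits of the phrase (max_cost=0 means an exact match).
    """
    n, m = len(cover), len(phrase)
    return [s for s in range(n - m + 1)
            if edit_distance(cover[s:s + m], phrase) <= max_cost]


if __name__ == "__main__":
    cover_seq = [0, 1, 0, 2, -1, 3, 0]
    phrase_seq = [2, -1, 3]
    print(sliding_matches(cover_seq, phrase_seq))
```

A non-empty result corresponds to the situation in FIG. 5, where the representative phrase is found inside the cover song.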

FIG. 6 is a diagram illustrating the result of selecting inflection points. In the edit cost algorithm described above with reference to FIGS. 4 and 5, when the same value persists across several frames, both its start and its end are recognized as inflection points, so the result may diverge from the inflection points perceived when actually listening to the song. In addition, when the amount of slope change is constant, for example when the value rises or falls at a steady rate, points are excluded from the inflection point information as intended if the value changes every frame; but if the same value persists for two or more frames, all of those points may be included in the inflection point information even though the amount of slope change is constant. In other words, unnecessary memory consumption may occur.

Accordingly, to overcome this limitation, the disclosed technology may perform the selection of inflection points in two stages. For example, as shown in FIG. 6, rows 1 to 11 are points where the value first begins to change and where the slope differs before and after, so they can be treated as inflection point candidates. Naturally, the start point in row 1 and the end point in row 11 are features that are always included in the inflection point information. Rows 3 and 4, where the slope does not change, can be excluded from the selection. In FIG. 6, O marks a selected candidate inflection point and X marks an inflection point excluded from the selection. This completes the first stage of inflection point selection.

In the second stage, the search device screens the candidate inflection points again and extracts the final inflection point information. Only candidates whose slope is not constant before and after the point are finally selected. For example, the inflection point in row 3 of FIG. 6 can be excluded because the slope remains the same before and after it. By selecting inflection points in these two stages, inflection point information in the form of a sequence can be generated. The search device can find the original song by extracting, through the same process, sequences for the plurality of original songs stored in the database and comparing each of them with the sequence of the input sound source.
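One possible reading of the two-stage selection is sketched below. It is an interpretation rather than the disclosed implementation. It assumes that stage 1 keeps the frame where each new value starts, plus the first and last frames, and that stage 2 compares slopes measured between neighboring candidates instead of neighboring frames. The function names are hypothetical.

```python
from fractions import Fraction


def stage1_candidates(notes):
    """Stage 1 (assumed reading): frames where a new value starts,
    plus the first and last frame, which are always kept."""
    if not notes:
        return []
    idx = [0] + [i for i in range(1, len(notes)) if notes[i] != notes[i - 1]]
    last = len(notes) - 1
    if idx[-1] != last:
        idx.append(last)
    return idx


def stage2_select(notes, candidates):
    """Stage 2 (assumed reading): drop candidates whose slope toward the
    previous candidate equals the slope toward the next candidate."""
    if not candidates:
        return []

    def slope(i, j):
        # Exact rational slope between two candidate frames.
        return Fraction(notes[j] - notes[i], j - i)

    kept = [candidates[0]]
    for prev, cur, nxt in zip(candidates, candidates[1:], candidates[2:]):
        if slope(prev, cur) != slope(cur, nxt):
            kept.append(cur)
    if len(candidates) > 1:
        kept.append(candidates[-1])
    return kept


if __name__ == "__main__":
    # Each value is held for two frames while rising at a steady rate.
    melody = [60, 60, 62, 62, 64, 64, 63, 62, 62]
    cands = stage1_candidates(melody)
    print(cands, stage2_select(melody, cands))
```

In the sample melody, frame 2 lies on a steady rise even though each value is held for two frames, so it survives stage 1 but is dropped in stage 2. This is the case the single-pass approach handles poorly.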

The music search method and apparatus using an inflection point of a melody line according to an embodiment of the disclosed technology have been described with reference to the embodiments shown in the drawings to aid understanding. These embodiments are merely illustrative, and those of ordinary skill in the art will understand that various modifications and other equivalent embodiments are possible. Accordingly, the true technical scope of protection of the disclosed technology should be defined by the appended claims.

Claims

A music search method using an inflection point of a melody line, the method comprising:
selecting, by a search device, a plurality of inflection points included in an input sound source;
extracting, by the search device, inflection point information based on a change in slope at each of the plurality of inflection points and a distance ratio between the inflection points;
calculating a similarity between sequences by comparing inflection point information of an original song with the extracted inflection point information; and
normalizing the input sound source by dividing it according to the length of a representative phrase of the original song.

The method of claim 1, wherein calculating the similarity comprises:
determining whether the representative phrase of the original song is included in the input sound source by sliding the sequence of inflection point information of the representative phrase of the original song, one position at a time, along the sequence of inflection point information of the input sound source.

The method of claim 1,
wherein the search device calculates the similarity between the sequences by using an edit cost algorithm.

The method of claim 1,
wherein the search device extracts, as the inflection point information, a feature value of a specific inflection point, among the plurality of inflection points, at which the change in slope and the distance ratio begin.

The method of claim 1,
wherein the search device extracts inflection point information for each of a plurality of sound sources stored in a database and compares each extracted item with the inflection point information of the input sound source.

The method of claim 5,
wherein the search device outputs, as a search result, information on the sound source having the highest similarity to the input sound source among the plurality of sound sources.

A music search apparatus using an inflection point of a melody line, the apparatus comprising:
an input device for receiving a sound source;
a storage device for storing a plurality of sound sources; and
a processing device that selects a plurality of inflection points included in the input sound source, extracts inflection point information based on a change in slope at each of the plurality of inflection points and a distance ratio between the inflection points, and compares the inflection point information of each of the plurality of sound sources with the inflection point information of the input sound source to search for an original song of the input sound source.

The apparatus of claim 7,
wherein the processing device calculates a sequence similarity between the inflection point information of each of the plurality of sound sources and the inflection point information of the input sound source, and searches for the original song among the plurality of sound sources.

The apparatus of claim 8,
wherein the processing device normalizes the input sound source by dividing it according to the length of the representative phrase of the original song.

The apparatus of claim 7,
wherein the processing device determines whether the representative phrase of the original song is included in the input sound source by sliding the sequence of inflection point information of the representative phrase of the original song, one position at a time, along the sequence of inflection point information of the input sound source.

The apparatus of claim 7,
wherein the processing device extracts, as the inflection point information, a feature value of a specific inflection point, among the plurality of inflection points, at which the change in slope and the distance ratio begin.

The apparatus of claim 7,
further comprising an output device,
wherein the output device outputs, as the search result, information on the sound source having the highest similarity to the input sound source among the plurality of sound sources.