KR102037221B1

KR102037221B1 - Audio finger print matching method

Info

Publication number: KR102037221B1
Application number: KR1020170146711A
Authority: KR
Inventors: 이정환; 방경식
Original assignee: 주식회사 아이티밥
Priority date: 2017-11-06
Filing date: 2017-11-06
Publication date: 2019-10-29
Also published as: KR20190051265A

Abstract

An audio fingerprint matching method is disclosed. The method comprises: receiving, by a query input module, a query sound source; generating, by an audio fingerprint generation module, an audio fingerprint from the query sound source received by the query input module; extracting and generating, by a sub-fingerprint generation module, sub-fingerprints from the audio fingerprint generated by the audio fingerprint generation module; performing, by a sequence matching module, sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module and the sub-fingerprints of reference sound sources stored in advance in a reference database, to determine whether the sequences match each other; and outputting, by a similar sound source output module, the reference sound source corresponding to the query sound source according to the match determined by the sequence matching module.

Description

Audio fingerprint matching method {AUDIO FINGER PRINT MATCHING METHOD}

The present invention relates to audio fingerprints and, more particularly, to an audio fingerprint matching method.

Recently, systems that listen to a sound source and identify which recording it is have come into wide use, for example by music copyright associations. However, because the query must be compared against millions of recordings, the task imposes a considerable load in terms of time and computation.

When a sound source of clean quality is input as a query, the identification rate can be high regardless of time or computation. However, a sound source that contains a lot of background music or noise is often difficult to identify accurately.

In particular, accurate identification is difficult when noise such as applause or laughter is mixed into a TV sound source. Accordingly, an audio identification means that is robust against such noise is required.

FIGS. 1 to 3 are schematic diagrams showing a similar sound source search method using an audio fingerprint according to the prior art.

FIG. 1 shows a conventional method of generating an audio fingerprint. First, a query sound source is converted into a spectrogram, and characteristic frequencies are extracted from the spectrogram per unit of time to generate an audio fingerprint. The fingerprint is then compared with the fingerprints of millions of reference sound sources stored in a database to find a similar sound source. FIGS. 2 and 3 show this series of comparison steps.
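The conventional fingerprint generation described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the characteristic frequency of each time frame is the DFT bin with the largest magnitude; the patent does not specify the selection rule, and the function name `dominant_bins`, the frame size, and the hop size are hypothetical.

```python
import cmath
import math


def dominant_bins(samples, frame_size=64, hop=32):
    """Return, for each time frame, the index of the strongest DFT bin.

    Assumption: "characteristic frequency" is taken to be the bin with the
    largest magnitude. A naive DFT is used so the sketch needs no libraries.
    """
    peaks = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        magnitudes = []
        # Only the first half of the bins is unique for a real-valued signal.
        for k in range(frame_size // 2):
            coeff = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            magnitudes.append(abs(coeff))
        peaks.append(max(range(len(magnitudes)), key=magnitudes.__getitem__))
    return peaks


# A pure tone that completes 5 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 5 * n / 64) for n in range(256)]
fingerprint = dominant_bins(tone)
```

The resulting list of per-frame bin indices plays the role of the time-ordered fingerprint that is later compared against the reference fingerprints.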

More specifically, in FIG. 2(B), when the vertical axis of the similarity matrix is the query sound source and the horizontal axis is the reference sound source, the sections in which the strings match each other appear as diagonal lines.

However, such a matching process inevitably incurs an enormous load in terms of computation and time. If every reference sound source must be compared, it is very difficult to identify the sound source accurately within a limited time.

10-0862616 10-2006-0037403

An object of the present invention is to provide an audio fingerprint matching method.

To achieve the above object, an audio fingerprint matching method according to the present invention may comprise: receiving, by a query input module, a query sound source; generating, by an audio fingerprint generation module, an audio fingerprint from the query sound source received by the query input module; extracting and generating, by a sub-fingerprint generation module, sub-fingerprints from the audio fingerprint generated by the audio fingerprint generation module; performing, by a sequence matching module, sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module and the sub-fingerprints of reference sound sources stored in advance in a reference database, to determine whether the sequences match each other; and outputting, by a similar sound source output module, the reference sound source corresponding to the query sound source according to the match determined by the sequence matching module.

Here, the step in which the sub-fingerprint generation module extracts and generates sub-fingerprints from the audio fingerprint generated by the audio fingerprint generation module may be configured such that a binarization unit binarizes the audio fingerprint generated by the audio fingerprint generation module, a sub-fingerprint extraction unit sequentially extracts the binarized audio fingerprint in units of a predetermined number of bits to generate sub-fingerprints, and a pointer string generation unit generates a pointer string composed of pointers that designate the sub-fingerprints generated by the sub-fingerprint extraction unit.

The step in which the sequence matching module performs sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module and the sub-fingerprints of the reference sound sources stored in advance in the reference database, to determine whether the sequences match each other, may be configured such that a fast approximate matching (coarse matching) unit forms a similarity matrix composed of the pointer string of the query sound source and the pointer string of the reference sound source and forms a diagonal matching line on the similarity matrix to perform fast approximate matching, and, when a diagonal matching line has been formed on the similarity matrix by the fast approximate matching unit, a detailed matching (fine matching) unit performs detailed matching on the diagonal matching line through local edge detection.

According to the audio fingerprint matching method described above, an audio fingerprint is divided into sub-fingerprints, and fast approximate matching (coarse matching) is applied first by comparing the pointer values that designate the respective sub-fingerprints. As a result, a reference sound source with relatively high similarity can be found quickly and accurately, and the search is robust against noise.

FIGS. 1 to 3 are schematic diagrams showing a similar sound source search method using an audio fingerprint according to the prior art.
FIG. 4 is a block diagram of an audio fingerprint matching system according to an embodiment of the present invention.
FIG. 5 is a schematic diagram showing sub-fingerprint matching according to an embodiment of the present invention.
FIG. 6 is a schematic diagram showing fast approximate matching according to an embodiment of the present invention.
FIG. 7 is an exemplary view showing a similarity matrix according to an embodiment of the present invention.
FIG. 8 is a schematic diagram showing detailed matching according to an embodiment of the present invention.
FIG. 9 is a flowchart of an audio fingerprint matching method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. In describing the drawings, similar reference numerals are used for similar components.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be termed a second component, and similarly a second component may be termed a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 4 is a block diagram of an audio fingerprint matching system according to an embodiment of the present invention. FIG. 5 is a schematic diagram showing sub-fingerprint matching according to an embodiment of the present invention, FIG. 6 is a schematic diagram showing fast approximate matching according to an embodiment of the present invention, FIG. 7 is an exemplary view showing a similarity matrix according to an embodiment of the present invention, and FIG. 8 is a schematic diagram showing detailed matching according to an embodiment of the present invention.

Referring first to FIG. 4, an audio fingerprint matching system 100 according to an embodiment of the present invention may be configured to include a reference database 110, an audio fingerprint generation module 120, a sub-fingerprint generation module 130, a sequence matching module 140, and a similar sound source output module 150.

The audio fingerprint matching system 100 is configured to binarize an audio fingerprint, divide it into sub-fingerprints, and search the reference sound sources by comparison using an invert table composed of the pointer values of those sub-fingerprints. The amount of computation is drastically reduced, the computation time is shortened, and a sound source search that is robust against noise becomes possible.

The detailed configuration is described below.

The reference database 110 may be configured to store in advance information on a large number of reference sound sources (reference audio sources).

Unlike the prior art, the reference database 110 may be configured to store in advance not only the audio fingerprint of each reference sound source but also the sub-fingerprints generated from it.

A sub-fingerprint is an element generated by dividing an audio fingerprint.

The audio fingerprint is first binarized, and the resulting binary code is then divided in units of a predetermined number of bits to generate sub-fingerprints. As shown in FIG. 5, sub-fingerprints of 10 bits each may be generated, and the sub-fingerprints may be generated so that they overlap one another by a certain number of bits.

When a sub-fingerprint consists of 10 bits, there are 1024 possible values in total. Each sub-fingerprint can be replaced with a single value that designates it, namely a pointer. There can be 1024 pointers, and each sub-fingerprint in turn can be replaced with one of the 1024 pointers. Each sub-fingerprint can be expressed as a pointer using an invert table that associates sub-fingerprints with their corresponding pointers.
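The count of 1024 follows from 2^10 distinct 10-bit patterns. As a hedged illustration, the sketch below treats the pointer as the integer value of the bit pattern; the patent only states that an invert table associates each sub-fingerprint with a pointer, so this particular mapping and the names `bits_to_pointer` and `build_invert_table` are assumptions.

```python
from itertools import product

SUB_FP_BITS = 10                  # bits per sub-fingerprint, as in FIG. 5
NUM_POINTERS = 2 ** SUB_FP_BITS   # 1024 possible sub-fingerprints


def bits_to_pointer(bits):
    """Interpret a 10-bit pattern (most significant bit first) as an integer.

    Assumption: the pointer is simply the numeric value of the pattern.
    """
    if len(bits) != SUB_FP_BITS:
        raise ValueError("expected exactly %d bits" % SUB_FP_BITS)
    pointer = 0
    for bit in bits:
        pointer = (pointer << 1) | bit
    return pointer


def build_invert_table():
    """Map every possible 10-bit pattern (as a tuple) to its pointer."""
    return {
        pattern: bits_to_pointer(pattern)
        for pattern in product((0, 1), repeat=SUB_FP_BITS)
    }
```

With this mapping the table has one entry per pattern, so a stream of sub-fingerprints collapses into a stream of small integers that are cheap to store and compare.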

For the audio fingerprint of each reference sound source, the reference database 110 may store in advance data expressed as the pointers of its sub-fingerprints. When the query sound source is likewise compared against these sub-fingerprint pointers, the reference sound source can be found quickly and accurately.
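One plausible way to store and look up such pointer data is an inverted index from pointer value to the places where it occurs. The sketch below is an assumption about the storage layout, which the patent does not spell out; `build_index`, `rank_candidates`, and the track identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def build_index(references):
    """Build pointer -> [(track_id, position), ...] from stored pointer strings.

    `references` maps a track identifier to its list of pointers.
    """
    index = defaultdict(list)
    for track_id, pointers in references.items():
        for position, pointer in enumerate(pointers):
            index[pointer].append((track_id, position))
    return index


def rank_candidates(index, query_pointers):
    """Count, per reference track, how many query pointers occur in it."""
    hits = Counter()
    for pointer in query_pointers:
        for track_id, _position in index.get(pointer, ()):
            hits[track_id] += 1
    return hits.most_common()


references = {"track_a": [1, 2, 3, 4], "track_b": [9, 8, 3, 7]}
index = build_index(references)
ranking = rank_candidates(index, [2, 3, 4])
```

Only the tracks that share at least one pointer with the query are touched, which is what lets a lookup of this kind avoid scanning every reference sound source.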

A query sound source may be mixed with various kinds of noise, such as applause, ambient noise, speech, and traffic. If neighboring sub-fingerprints are generated so that they overlap one another by a certain number of bits, the comparison algorithm can be very robust against such noise.

The query input module 120 may be configured to receive a query sound source. The query sound source is not limited in kind and may be, for example, background music on television or live music from a music show.

The audio fingerprint generation module 130 may be configured to generate an audio fingerprint from the query sound source received by the query input module 120. The audio fingerprint may be generated through the process of FIG. 2.

The sub-fingerprint generation module 130 may be configured to extract and generate sub-fingerprints from the audio fingerprint of the query sound source. Strictly speaking, it may be configured to generate a pointer string whose pointers designate the respective sub-fingerprints.

The sub-fingerprint generation module 130 may be configured to include a binarization unit 131, a sub-fingerprint extraction unit 132, and a pointer string generation unit 133.

Here, the binarization unit 131 may be configured to binarize the audio fingerprint of the query sound source and output it as binary code. The sub-fingerprint extraction unit 132 may be configured to sequentially extract the binarized audio fingerprint in units of a predetermined number of bits to generate sub-fingerprints. FIG. 5 illustrates extraction of sub-fingerprints in units of 10 bits. The sub-fingerprints may be extracted in sequence so that they overlap one another by a few bits; FIG. 5 shows extraction with an overlap of 3 bits. Because noise can cause error bits in the query sound source, extracting with a 3-bit overlap takes this into account and provides a search means that is robust against noise. When a sub-fingerprint consists of 10 bits, the number of possible binary-coded sub-fingerprints is 1024. The pointer string generation unit 133 may convert these 1024 kinds of sub-fingerprints into 1024 pointers using an invert table provided in advance. Each sub-fingerprint is converted through the invert table, so that the sequence of sub-fingerprints is expressed as a pointer string. The result is a very simple set of values compared with an audio fingerprint, which carries a large amount of data.
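The three units described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the patent does not say how the fingerprint is binarized, so a simple threshold is used here, and the pointer is taken to be the integer value of each 10-bit window. The function names are hypothetical.

```python
def binarize(values, threshold):
    """Turn fingerprint values into bits (assumed rule: 1 if value >= threshold)."""
    return [1 if value >= threshold else 0 for value in values]


def extract_sub_fingerprints(bits, width=10, overlap=3):
    """Slide a `width`-bit window over the bits so neighbors share `overlap` bits."""
    if not 0 <= overlap < width:
        raise ValueError("overlap must satisfy 0 <= overlap < width")
    stride = width - overlap  # 10-bit windows with 3-bit overlap advance by 7
    return [
        bits[start:start + width]
        for start in range(0, len(bits) - width + 1, stride)
    ]


def to_pointer_string(sub_fingerprints):
    """Replace each sub-fingerprint with an integer pointer (assumed mapping)."""
    return [int("".join(map(str, sub)), 2) for sub in sub_fingerprints]


bits = binarize([0.9, 0.1] * 12, threshold=0.5)   # 24 alternating bits
windows = extract_sub_fingerprints(bits)          # windows start at 0, 7, 14
pointers = to_pointer_string(windows)
```

With a 10-bit window and a 3-bit overlap, each window starts 7 bits after the previous one, so the last 3 bits of one sub-fingerprint are the first 3 bits of the next.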

The sequence matching module 140 may be configured to perform sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module 130 and the sub-fingerprints of the reference sound sources stored in advance in the reference database 110, to determine whether the sequences match each other.

The sequence matching module 140 may be configured to include a fast approximate matching unit 141 and a detailed matching unit 142.

As shown in FIG. 6, the fast approximate matching unit 141 may be configured to form a similarity matrix whose two axes are the pointers of the query sound source and the pointers of the reference sound source, and to find the regions of the similarity matrix in which the pointer values match. Such regions may appear as points at various places in the similarity matrix, and false detections may occur at various places because of error bits caused by noise in the query sound source.

When the pointer strings are laid out in order along both axes, the similar portions of the two sound sources appear as a diagonal matching line. Approximate matching can therefore be performed very quickly and accurately.
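The way a diagonal matching line emerges from the similarity matrix can be sketched as follows. The example assumes a binary matrix (1 where the pointers are equal) and picks the diagonal by counting matching points per offset; the patent does not state how the line is selected, so the voting rule and the names `similarity_matrix` and `best_diagonal` are assumptions.

```python
def similarity_matrix(query, reference):
    """Rows follow the query pointers, columns follow the reference pointers."""
    return [[1 if q == r else 0 for r in reference] for q in query]


def best_diagonal(matrix):
    """Return (offset, count) for the diagonal holding the most matching points.

    A point at row i, column j lies on the diagonal with offset j - i.
    Returns None when the matrix contains no matching point.
    """
    counts = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value:
                counts[j - i] = counts.get(j - i, 0) + 1
    if not counts:
        return None
    offset = max(counts, key=counts.get)
    return offset, counts[offset]


reference = [5, 9, 100, 200, 300, 400, 7, 5]
query = [100, 200, 999, 400]      # 999 stands in for a noise-corrupted pointer
matrix = similarity_matrix(query, reference)
line = best_diagonal(matrix)
```

In this toy input the three intact query pointers all fall on the diagonal with offset 2, while the corrupted one contributes nothing, which mirrors how isolated false points are outvoted by a consistent diagonal.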

When a diagonal matching line has been formed on the similarity matrix by the fast approximate matching unit 141, the detailed matching unit 142 may be configured to perform detailed matching on the diagonal matching line through local edge detection.

As shown in FIG. 8, the Hamming distance is first computed, for each point on the diagonal matching line, in the vicinity above and below the line, and the diagonal matching line is extended accordingly. The number of such points is then accumulated to detect a peak, and a candidate diagonal is derived. Local edges are then detected on the candidate diagonal, merged, and verified to perform detailed matching.
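The first two steps of this refinement (Hamming distance near the diagonal, then peak detection over accumulated counts) can be sketched as follows. The band width, the distance threshold, and the function names are assumptions; the local edge detection, merging, and verification steps are not covered because the patent does not describe them in enough detail to reproduce.

```python
def hamming(a, b):
    """Number of differing bits between two integer-coded sub-fingerprints."""
    return bin(a ^ b).count("1")


def refine_diagonal(query, reference, offset, band=1, max_dist=2):
    """Count near-matches on each diagonal within `band` of the coarse offset.

    A pair counts when its Hamming distance is at most `max_dist`, so
    sub-fingerprints with a few noise-induced error bits still contribute.
    Returns the offset with the highest count (the peak) and all counts.
    """
    counts = {}
    for candidate in range(offset - band, offset + band + 1):
        total = 0
        for qi, q in enumerate(query):
            ri = qi + candidate
            if 0 <= ri < len(reference) and hamming(q, reference[ri]) <= max_dist:
                total += 1
        counts[candidate] = total
    peak = max(counts, key=counts.get)
    return peak, counts


reference = [0b0000000000, 0b1111111111, 0b1010101010,
             0b0101010101, 0b1100110011, 0b0011001100]
# Query sub-fingerprints: two of the three carry a single flipped bit.
query = [0b1010101011, 0b0101010101, 0b1100110111]
peak, counts = refine_diagonal(query, reference, offset=2)
```

Exact pointer equality would miss the two query entries with a flipped bit, whereas the Hamming tolerance lets all three support the same diagonal.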

The similar sound source output module 150 may be configured to output the reference sound source corresponding to the query sound source according to the match determined by the sequence matching module 140. Basic information about the reference sound source, such as its title and code, may be output.

FIG. 9 is a flowchart of an audio fingerprint matching method according to an embodiment of the present invention.

Referring to FIG. 9, the query input module 120 receives a query sound source (S101).

Next, the audio fingerprint generation module 130 generates an audio fingerprint from the query sound source received by the query input module 120 (S102).

Next, the sub-fingerprint generation module 140 extracts and generates sub-fingerprints from the audio fingerprint generated by the audio fingerprint generation module 130 (S103).

Here, the binarization unit 141 may binarize the audio fingerprint generated by the audio fingerprint generation module 130, the sub-fingerprint extraction unit 142 may sequentially extract the binarized audio fingerprint in units of a predetermined number of bits to generate sub-fingerprints, and the pointer string generation unit 143 may generate a pointer string composed of pointers that designate the sub-fingerprints generated by the sub-fingerprint extraction unit 142.

Next, the sequence matching module 150 performs sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module 140 and the sub-fingerprints of the reference sound sources stored in advance in the reference database 110, to determine whether the sequences match each other (S104).

At this time, the fast approximate matching unit 151 may form a similarity matrix composed of the pointer string of the query sound source and the pointer string of the reference sound source and form a diagonal matching line on the similarity matrix to perform fast approximate matching, and, when a diagonal matching line has been formed on the similarity matrix by the fast approximate matching unit 151, the detailed matching unit 152 may perform detailed matching on the diagonal matching line through local edge detection.

Next, the similar sound source output module 160 outputs the reference sound source corresponding to the query sound source according to the match determined by the sequence matching module 150 (S105).

Although the invention has been described with reference to the above embodiments, those skilled in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the invention as set forth in the claims below.

110: reference database
120: audio fingerprint generation module
130: sub-fingerprint generation module
131: binarization unit
132: sub-fingerprint extraction unit
133: pointer string generation unit
140: sequence matching module
141: fast approximate matching unit
142: detailed matching unit
150: similar sound source output module

Claims

An audio fingerprint matching method comprising:
receiving, by a query input module, a query sound source;
generating, by an audio fingerprint generation module, an audio fingerprint from the query sound source received by the query input module;
extracting and generating, by a sub-fingerprint generation module, sub-fingerprints from the audio fingerprint generated by the audio fingerprint generation module;
performing, by a sequence matching module, sequence matching between the sub-fingerprints generated by the sub-fingerprint generation module and the sub-fingerprints of reference sound sources stored in advance in a reference database, to determine whether the sequences match each other; and
outputting, by a similar sound source output module, the reference sound source corresponding to the query sound source according to the match determined by the sequence matching module,
wherein, in the step of extracting and generating the sub-fingerprints, a binarization unit binarizes the audio fingerprint generated by the audio fingerprint generation module, a sub-fingerprint extraction unit sequentially extracts the binarized audio fingerprint in units of a predetermined number of bits to generate the sub-fingerprints, and a pointer string generation unit generates a pointer string composed of pointers that designate the sub-fingerprints generated by the sub-fingerprint extraction unit, and
wherein, in the step of performing the sequence matching, a fast approximate matching (coarse matching) unit forms a similarity matrix composed of the pointer string of the query sound source and the pointer string of the reference sound source and forms a diagonal matching line on the similarity matrix to perform fast approximate matching, and, when the diagonal matching line has been formed on the similarity matrix by the fast approximate matching unit, a detailed matching (fine matching) unit performs detailed matching on the diagonal matching line through local edge detection.

delete