KR20110036407A

KR20110036407A - Method for determining distance metric used in fingerprint matching of fingerprint system by learning

Info

Publication number: KR20110036407A
Application number: KR1020090094063A
Authority: KR
Inventors: 유창동; 장달원
Original assignee: 한국과학기술원
Priority date: 2009-10-01
Filing date: 2009-10-01
Publication date: 2011-04-07
Also published as: KR101071728B1

Abstract

PURPOSE: A method for efficiently learning a distance metric by minimizing a cost function is provided, improving fingerprint recognition performance by learning the distance metric from training data. CONSTITUTION: Training data is composed of fingerprints of original content and fingerprints of distorted content. Using the training data, a distance metric that improves recognition performance is determined through learning (S200). In determining the distance metric, a parameterized distance metric is generated, a cost function is generated, and the parameters of the distance metric are determined by minimizing the cost function.

Description

METHODS FOR DETERMINING DISTANCE METRIC USED IN FINGERPRINT MATCHING OF FINGERPRINT SYSTEM BY LEARNING

The present invention relates to content-based content recognition, in particular fingerprinting, and more particularly to improving the performance of the fingerprint matching process in a fingerprinting system.

Demand for the protection, management, and indexing of digital content is increasing, and fingerprinting is attracting growing interest as a feasible solution. Fingerprinting is a technique for identifying unknown content using a short feature vector called a fingerprint. Recently, various audio, video, and image fingerprinting techniques have been proposed (Refs. [1]-[7]).

A fingerprinting system for content recognition generally consists of three essential elements: 1) fingerprint extraction, 2) database search, and 3) fingerprint matching (Ref. [4]).

In the fingerprint extraction step, a query fingerprint is extracted from the query content. In the database search step, a set of candidate fingerprints close to the query fingerprint is retrieved from the database. In the fingerprint matching step, the distance between each candidate fingerprint and the query fingerprint is computed using a distance metric. The fingerprinting system then returns the metadata of the candidate fingerprint closest to the query fingerprint.
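The matching step described above can be illustrated with a minimal Python sketch. The function names `squared_euclidean` and `match`, the layout of `candidates`, and the toy data are hypothetical, and the squared Euclidean distance merely stands in for whichever distance metric the system uses.

```python
def squared_euclidean(x, y):
    """Squared Euclidean distance between two equal-length real vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def match(query, candidates, distance=squared_euclidean):
    """Return (metadata, distance) of the candidate closest to the query.

    `candidates` is a list of (metadata, fingerprint) pairs, as a database
    search step might return them (hypothetical layout).
    """
    best_meta, best_fp = min(candidates, key=lambda c: distance(query, c[1]))
    return best_meta, distance(query, best_fp)


# Toy data: three candidate fingerprints and one query.
candidates = [
    ("song A", [0.0, 1.0, 2.0]),
    ("song B", [5.0, 5.0, 5.0]),
    ("song C", [0.1, 0.9, 2.2]),
]
query = [0.0, 1.0, 2.1]
print(match(query, candidates))
```

Passing a different `distance` callable is all that is needed to swap in a learned metric, which is why the extraction step and the database can stay untouched.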

Fingerprint extraction and matching affect identification performance more than the database search, which mainly determines the computational efficiency of the system. The performance of a fingerprinting system may degrade when its application environment changes, and a countermeasure is needed. To improve identification performance, either the fingerprint extraction process or the matching process must be redesigned. Redesigning the extraction process requires rebuilding the fingerprint database. If only the distance metric used in fingerprint matching is replaced, however, performance can be improved while the existing extraction process and fingerprint database are kept unchanged. This lets the system adapt to a new application environment while preserving as much of the existing system as possible.

Methods for learning a distance metric for classification and clustering (learning here means determining the matrix when the distance metric is parameterized in matrix form) are discussed in Refs. [8]-[10]. Recent work has shown that distance metric learning can improve classification and clustering performance (Ref. [11]). However, existing distance metric learning methods have not been applied to fingerprinting systems, and a learning method suited to fingerprinting systems is needed.

The present invention aims to solve the problems of the prior art described above. Its object is to determine an appropriate distance metric for a fingerprinting system through a learning process, and thereby to improve fingerprint identification performance.

To solve the above technical problem, the present invention uses a distance metric learned from training data consisting of original and distorted content, thereby improving the fingerprint identification performance of an existing fingerprinting system. Performance is improved while the fingerprint database and the fingerprint extraction process of the existing system are retained.

Assume that training data consisting of original and distorted content is available. The distance metric is learned from the training data so that identification performance improves. To be learnable, the distance metric must have a parameterized form, and the learning process determines its parameters. Learning uses a cost function designed to take lower values as identification performance improves. The learning process determines a single distance metric by finding the parameters that minimize this cost function.

The fingerprints of the original content are assumed to be stored in the database, and the fingerprints of the distorted content are treated as query fingerprints. To improve identification performance, the distance between the fingerprint of a distorted content item and the fingerprint of its original (the corresponding content) should be small, and the distance between the fingerprint of a distorted content item and the fingerprint of any other original (non-corresponding content) should be large. Under this premise the cost function can take many forms. As one embodiment, the present invention proposes a cost function based on the following two principles.

(Principle 1)

The distance between the fingerprint of the original content (corresponding content) and the fingerprint of its distorted content must be smaller than the distance between the fingerprint of the original content and the fingerprint of any other original content (non-corresponding content).

(Principle 2)

The distance margin between the fingerprint of the distorted content and the fingerprint of non-corresponding content should be as large as possible (Ref. [10]).

A cost function is designed from these two principles and used to learn the distance metric. The cost function is minimized when both principles are satisfied. In other words, the cost function is designed so that its value increases as the fingerprint of the distorted content lies farther from the fingerprint of the corresponding content than from the fingerprint of non-corresponding content.

For ease of minimization, the cost function is preferably convex. In that case the minimization can be carried out by convex optimization.

In short, when the distance metric is parameterized in a specific form, distance metric learning is the process of determining its parameters. The parameterized distance metric can take various forms; as one embodiment, the present invention uses a generalized form of the Mahalanobis distance.

Accordingly, when the distance metric is parameterized in matrix form, distance metric learning is the process of determining this matrix, that is, the value of each parameter that makes up the matrix.

The distance metric learning addressed in the present invention is valid only for real-valued fingerprints, so the following description assumes that fingerprints are real-valued.

More specifically, the present invention is a method of determining, through learning, the distance metric used in the matching process of a fingerprinting system that identifies content by matching the fingerprint x_{i,j} of distorted content, extracted from a distorted version of original content, against the fingerprint x_i of the original content. The method comprises: (A) preparing training data consisting of the fingerprints x_i of the original content and the fingerprints x_{i,j} of the distorted content (S 100); and (B) determining through learning, using the training data, a distance metric that yields improved identification performance (S 200).

Step (B) comprises: (B-1) generating a parameterized distance metric in order to determine the distance metric (S 210); (B-2) generating a cost function ε(·) that is minimized when the distance between the fingerprint x_i of the original content and the fingerprint x_{i,j} of the distorted content is made small and the distance between the fingerprint x_i of the original content and the fingerprint x_k of other original content is made large (S 220); and (B-3) determining the parameters of the distance metric by finding where the cost function ε(·) is minimized (S 230).

In step (B-1), as one embodiment, the distance metric may be defined in a form parameterized by a single matrix A, as follows.

\[ D_A(x, x') = \lVert \phi(x) - \phi(x') \rVert^2 = (x - x')^T A (x - x') \]

[where φ(x) = Wx (W is an N × N matrix) and A = W^T W]

In step (B-2), the cost function ε(A) is preferably defined by the following equation.

\[ \varepsilon(A) = \sum_{i,j} \left[ M + D_A(x_i, x_{i,j}) - D_A(x_{\xi(i,j)}, x_{i,j}) \right]_+ \]

[where [z]₊ = max(z, 0), M is the margin, and x_{ξ(i,j)} is the non-matching fingerprint closest to the fingerprint x_{i,j} of the distorted content]

In step (B-2), the cost function ε(A) is preferably a convex function.

In step (B-3), the minimum of the cost function ε(A) is preferably found using the projected gradient method.

The present invention improves fingerprint identification performance by learning a distance metric from training data consisting of original and distorted content.

In particular, the present invention generates a cost function ε(A) that is minimized when the distance between the fingerprint x_i of the original content and the fingerprint x_{i,j} of the distorted content is smaller than the distance between the fingerprint x_i of the original content and the fingerprint x_k of other original content, and learns the distance metric effectively by minimizing ε(A).

In addition, the distance metric according to the present invention takes a generalized form of the Mahalanobis distance, so the distance metric can be parameterized easily and effectively.

In addition, because the cost function ε(A) according to the present invention is convex, its minimum can be found with a method such as the projected gradient method.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings.

In the following description, "learning" means the process of determining the matrix when the distance metric is parameterized in matrix form.

FIGS. 1A and 1B are flowcharts of the process of determining, through learning, the distance metric used in the matching process of a fingerprinting system according to the present invention.

As shown in FIG. 1A, the method according to the present invention consists of a first step of generating training data (S 100) and a second step of determining the distance metric by learning from the training data (S 200).

As shown in FIG. 1B, the second step (S 200) consists of generating a parameterized distance metric (S 210), generating a cost function (S 220), and determining (learning) the parameters of the distance metric by finding where the cost function is minimized (S 230). Each step is described in detail below.

1. Preparing training data (S 100)

Learning the distance metric requires a set of training data consisting of the fingerprints x_i of original content and the fingerprints x_{i,j} of distorted content (i = 1, 2, ..., I and j = 1, 2, ..., J). The fingerprint x_{i,j} is the fingerprint extracted from the j-th distorted version of the i-th original content, and x_i is the fingerprint of the i-th original content.

In the learning process according to the present invention, x_i represents a fingerprint stored in the database and x_{i,j} represents a query fingerprint.

A fingerprint pair (x_i, x_{i,j}) is a matching pair, and a fingerprint pair (x_k, x_{i,j}) with k ≠ i is a non-matching pair.
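The pairing rule can be made concrete with a short Python sketch that enumerates the index pairs. The function name `build_pairs` and the tuple layout are hypothetical, and indices are 0-based here whereas the text counts from 1.

```python
def build_pairs(num_originals, num_distortions):
    """Enumerate index pairs for training.

    Returns two lists:
      matching     -> (i, (i, j)): original i with its own j-th distortion
      non_matching -> (k, (i, j)): original k != i with distortion (i, j)
    """
    matching, non_matching = [], []
    for i in range(num_originals):
        for j in range(num_distortions):
            matching.append((i, (i, j)))
            for k in range(num_originals):
                if k != i:
                    non_matching.append((k, (i, j)))
    return matching, non_matching


matching, non_matching = build_pairs(num_originals=3, num_distortions=2)
print(len(matching), len(non_matching))
```

With I originals and J distortions each there are I·J matching pairs and I·J·(I − 1) non-matching pairs, so a practical implementation would likely avoid materializing the non-matching list for large I.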

The distorted content in the training data is chosen to reflect the distortions that occur frequently in the actual application. If the fingerprinting system is placed in a new distortion environment, the distance metric can be adapted to that environment by training with distorted data that matches it.

2. Determining the distance metric through learning (S 200)

2.1. Generating a parameterized distance metric (S 210)

The distance metric gives the distance between two N-dimensional fingerprints x and x' and can generally be written in the form ∥φ(x) - φ(x')∥², where φ is a mapping function from the N-dimensional real space R^N to another space. The function φ serves as the parameter of the distance metric. As one embodiment, the present invention considers a linear projection as the mapping function, so φ(x) = Wx is assumed (W is an N × N matrix). With this mapping function, the distance metric is defined by Equation 1 below.

\[ D_A(x, x') = \lVert Wx - Wx' \rVert^2 = (x - x')^T A (x - x') \qquad \text{(Equation 1)} \]

(where A = W^T W)

When a linear projection is considered, as in the equation above, the result is a distance metric parameterized by the matrix A. The distance metric is not limited to this specific form: it can take various forms depending on the function φ(·), and can accordingly have various kinds of parameters.

The example in the equation above can be regarded as adopting a generalized form of the Mahalanobis distance. The Mahalanobis distance is the distance most widely used in cluster analysis; it accounts not only for the plain distance between two points but also for the standard deviations and correlation coefficients that characterize the variables. The Mahalanobis distance between p and q is given by Equation 1a.

\[ \mathrm{mahalanobis}(p, q) = (p - q)\, \Sigma^{-1} (p - q)^T \qquad \text{(Equation 1a)} \]

Here, Σ^{-1} is the inverse of the covariance matrix and the superscript T denotes the transpose.

In the following description, the distance between x and x' is defined as D_A(x, x') = ∥φ(x) - φ(x')∥².

Learning the distance metric means determining its parameters, which in this embodiment means determining the matrix A. If the matrix A is the identity matrix, D_A(x, x') coincides with the squared Euclidean distance.
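A small NumPy sketch can confirm the two facts used here: that ∥Wx - Wx'∥² equals (x - x')^T A (x - x') when A = W^T W, and that A = I gives the squared Euclidean distance. The function name `distance_A` and the random test data are illustrative assumptions, not part of the patent.

```python
import numpy as np


def distance_A(x, x_prime, A):
    """D_A(x, x') = (x - x')^T A (x - x')."""
    d = np.asarray(x, dtype=float) - np.asarray(x_prime, dtype=float)
    return float(d @ A @ d)


rng = np.random.default_rng(0)   # fixed seed, illustrative data only
N = 4
W = rng.standard_normal((N, N))  # linear projection phi(x) = W x
A = W.T @ W                      # parameter matrix of the metric
x, x_prime = rng.standard_normal(N), rng.standard_normal(N)

# The projected form and the quadratic form of the metric agree.
projected = float(np.sum((W @ x - W @ x_prime) ** 2))
print(np.isclose(distance_A(x, x_prime, A), projected))

# With A = I the metric reduces to the squared Euclidean distance.
euclidean_sq = float(np.sum((x - x_prime) ** 2))
print(np.isclose(distance_A(x, x_prime, np.eye(N)), euclidean_sq))
```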

2.2. Generating the cost function (S 220)

The parameters of the distance metric, that is, the matrix A, are determined by minimizing a cost function. The cost function is not limited to a specific form; any form can be used that is built on the premise that the distance between the fingerprint of a distorted content item and the fingerprint of its original (corresponding content) should be small, and the distance between the fingerprint of a distorted content item and the fingerprint of any other original (non-corresponding content) should be large.

One example is a cost function that is minimized when D_A(x_i, x_{i,j}) is smaller than D_A(x_k, x_{i,j}) (k ≠ i). For query content that is a distorted version of original content to be identified correctly, the query fingerprint x_{i,j} must be closer to x_i than to any x_k (k ≠ i). A cost function according to the present invention that satisfies this condition may be implemented as in Equation 2 below.

\[ \varepsilon(A) = \sum_{i,j} \left[ M + D_A(x_i, x_{i,j}) - D_A(x_{\xi(i,j)}, x_{i,j}) \right]_+ \qquad \text{(Equation 2)} \]

Here, [z]₊ = max(z, 0) is the standard hinge loss function, M is the margin, and x_{ξ(i,j)} is the non-matching fingerprint closest to the query fingerprint x_{i,j}. The index ξ(i,j) is given by Equation 3 below.

\[ \xi(i, j) = \arg\min_{k \neq i} D_A(x_k, x_{i,j}) \qquad \text{(Equation 3)} \]

Including the constant M and the hinge loss in Equation 2 and minimizing ε(A) drives the solution toward D_A(x_{ξ(i,j)}, x_{i,j}) ≥ M + D_A(x_i, x_{i,j}) (Ref. [10]). The distance metric is therefore learned so that the distance between the query fingerprint x_{i,j} and the fingerprint of any non-corresponding content is at least M + D_A(x_i, x_{i,j}).
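The hinge-loss cost described above, a sum over (i, j) of [M + D_A(x_i, x_{i,j}) - D_A(x_{ξ(i,j)}, x_{i,j})]₊ with ξ(i,j) the closest non-matching original, can be sketched in NumPy as follows. The function names, the array shapes, and the toy data are assumptions made for illustration.

```python
import numpy as np


def distance_A(x, y, A):
    """D_A(x, y) = (x - y)^T A (x - y)."""
    d = x - y
    return float(d @ A @ d)


def cost(A, originals, distorted, margin=1.0):
    """Hinge-loss cost over all distorted fingerprints.

    originals: array of shape (I, N), with I >= 2
    distorted: array of shape (I, J, N); distorted[i, j] is x_{i,j}
    """
    total = 0.0
    num_originals, num_distortions = distorted.shape[:2]
    for i in range(num_originals):
        for j in range(num_distortions):
            q = distorted[i, j]
            d_match = distance_A(originals[i], q, A)
            # Distance to the closest non-matching original x_{xi(i,j)}.
            d_nonmatch = min(
                distance_A(originals[k], q, A)
                for k in range(num_originals) if k != i
            )
            total += max(margin + d_match - d_nonmatch, 0.0)
    return total


originals = np.array([[0.0, 0.0], [4.0, 0.0]])
distorted = np.array([[[0.5, 0.0]], [[2.0, 0.0]]])
print(cost(np.eye(2), originals, distorted))  # 1.0
```

In this toy case the first query is well inside its margin and contributes nothing, while the second query is equidistant from both originals and contributes exactly the margin.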

FIG. 2 illustrates the meaning of the cost function.

As shown in FIG. 2(a), when x_{ξ(i,j)} lies outside the sphere centered at x_{i,j} with radius M + D_A(x_i, x_{i,j}), the corresponding summand of ε(A), [M + D_A(x_i, x_{i,j}) - D_A(x_{ξ(i,j)}, x_{i,j})]₊, is zero.

However, as shown in FIG. 2(b), when x_{ξ(i,j)} lies inside the sphere centered at x_{i,j} with radius M + D_A(x_i, x_{i,j}), a cost of M + D_A(x_i, x_{i,j}) - D_A(x_{ξ(i,j)}, x_{i,j}) is added to the cost function. Because A can be rescaled by M, setting M = 1 below entails no loss of generality.
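The remark that M = 1 loses no generality can be checked with a short derivation. This is a sketch; the notation ε_M for the cost with margin M is introduced here only for illustration.

```latex
\begin{align*}
D_{cA}(x, x') &= (x - x')^T (cA) (x - x') = c\, D_A(x, x') \qquad (c > 0), \\
M + D_A(x_i, x_{i,j}) - D_A(x_{\xi(i,j)}, x_{i,j})
  &= M \left[ 1 + D_{A/M}(x_i, x_{i,j}) - D_{A/M}(x_{\xi(i,j)}, x_{i,j}) \right], \\
\varepsilon_M(A) &= M\, \varepsilon_1(A / M).
\end{align*}
```

Positive scaling does not change which non-matching fingerprint is closest, so ξ(i,j) is the same for A and A/M, and A/M is positive semidefinite exactly when A is. Minimizing with margin M is therefore equivalent to minimizing with margin 1 and multiplying the resulting matrix by M.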

2.3. Convexity of the cost function

The cost function is convex in A, so its global minimum can be found. To prove convexity, ε(A) of Equation 2 is rewritten as Equation 4.

\[ \varepsilon(A) = \sum_{i,j} \left[ K(A, i, j) \right]_+ \qquad \text{(Equation 4)} \]

Here, K(A, i, j) is defined by Equation 5.

\[ K(A, i, j) = M + D_A(x_i, x_{i,j}) - D_A(x_{\xi(i,j)}, x_{i,j}) \qquad \text{(Equation 5)} \]

(Proof of convexity)

A sum of convex functions is convex.

Therefore, if [K(A, i, j)]₊ is convex, ε(A) is convex as well. Moreover, if K(A, i, j) is convex, [K(A, i, j)]₊ is also convex. K(A, i, j) is the sum of a constant M and two functions that are linear in A, so K(A, i, j) is linear in A and hence convex. Consequently, ε(A) is convex.
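The linearity step can be spelled out as follows. This is a sketch of one way to read the argument; the trace identity and the explicit treatment of the dependence of ξ(i,j) on A are added here and are not stated in the source.

```latex
\begin{align*}
D_A(x, x') &= (x - x')^T A (x - x')
            = \operatorname{tr}\!\left( A\, (x - x')(x - x')^T \right), \\
-D_A(x_{\xi(i,j)}, x_{i,j}) &= \max_{k \neq i} \left[ -D_A(x_k, x_{i,j}) \right], \\
K(A, i, j) &= M + D_A(x_i, x_{i,j}) + \max_{k \neq i} \left[ -D_A(x_k, x_{i,j}) \right].
\end{align*}
```

The first line shows that D_A(x, x') is linear in A for fixed x and x'. Because ξ(i,j) selects the closest non-matching fingerprint, the last term of K is a pointwise maximum of functions linear in A, which is convex. K(A, i, j) is thus convex even when ξ(i,j) changes with A, [K]₊ = max(K, 0) is convex, and so is the sum ε(A).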

2.4. Optimization: determining the parameter matrix A of the distance metric (S 230)

To find the matrix A, the projected gradient method described in Ref. [12] is used.

Because the distance metric must be non-negative and satisfy the triangle inequality, the matrix A must be positive semidefinite (Ref. [8]).

The projected gradient method proceeds in two steps.

First, for unconstrained minimization, the gradient descent method is used (find the direction of negative gradient at the current position, move in that direction to a new position, and repeat to reach a local minimum of the function).

Next, the matrix A is projected onto the set of positive semidefinite matrices. The projection uses the semidefinite programming described in Ref. [12].

Expressed mathematically, the procedure for finding the matrix A is given by Equation 6.

\[
\begin{aligned}
\text{Step 1:}\quad & A' = A - \beta\, \nabla_A\, \varepsilon(A) \\
\text{Step 2:}\quad & A \leftarrow \arg\min_{\hat{A} \succeq 0} \lVert \hat{A} - A' \rVert_F
\end{aligned}
\qquad \text{(Equation 6)}
\]

Here, β is the step size and ∥·∥_F is the Frobenius norm, that is,

\[ \lVert A \rVert_F = \sqrt{\sum_{m,n} a_{mn}^2} \]
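The two-step procedure, a gradient step followed by projection onto the positive semidefinite matrices, can be sketched in NumPy as below. Two assumptions are made loudly: the projection is done by clipping negative eigenvalues, which is the closed-form Frobenius-norm projection and is swapped in here in place of the semidefinite programming the text cites from Ref. [12]; and the function names, step size, iteration count, and toy data are all hypothetical.

```python
import numpy as np


def project_psd(A):
    """Closest PSD matrix in Frobenius norm: clip negative eigenvalues."""
    A = (A + A.T) / 2.0                        # symmetrize first
    eigvals, eigvecs = np.linalg.eigh(A)
    return (eigvecs * np.clip(eigvals, 0.0, None)) @ eigvecs.T


def cost_and_grad(A, originals, distorted, margin=1.0):
    """Hinge cost and one subgradient with respect to A."""
    cost, grad = 0.0, np.zeros_like(A)
    num_originals, num_distortions = distorted.shape[:2]
    for i in range(num_originals):
        for j in range(num_distortions):
            diffs = originals - distorted[i, j]             # shape (I, N)
            dists = np.einsum("kn,nm,km->k", diffs, A, diffs)
            dists_non = dists.copy()
            dists_non[i] = np.inf                           # exclude the match
            xi = int(np.argmin(dists_non))                  # closest non-match
            k_val = margin + dists[i] - dists[xi]
            if k_val > 0:                                   # active hinge term
                cost += k_val
                grad += (np.outer(diffs[i], diffs[i])
                         - np.outer(diffs[xi], diffs[xi]))
    return cost, grad


def learn_metric(originals, distorted, step=0.01, iterations=200):
    """Projected (sub)gradient descent starting from the Euclidean metric."""
    A = np.eye(originals.shape[1])
    for _ in range(iterations):
        _, grad = cost_and_grad(A, originals, distorted)
        A = project_psd(A - step * grad)       # gradient step, then projection
    return A


# Toy data: the second coordinate is heavily distorted, so the Euclidean
# metric confuses the two originals.
originals = np.array([[0.0, 0.0], [1.0, 2.0]])
distorted = np.array([[[0.1, 1.5]], [[0.9, 0.5]]])
A = learn_metric(originals, distorted)
initial_cost = cost_and_grad(np.eye(2), originals, distorted)[0]
final_cost = cost_and_grad(A, originals, distorted)[0]
print(initial_cost, final_cost)
```

On this toy set the learned matrix down-weights the distorted coordinate, and the cost falls from its Euclidean starting value to zero. A fixed step size is used for brevity; a practical implementation would likely add a stopping criterion or step-size schedule.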

4. Experimental results

4.1. Experimental setup

To demonstrate the performance gain from distance metric learning, the distance metric learning method according to the present invention was applied to the audio fingerprinting system described in Ref. [4].

In Ref. [4], a 16-dimensional fingerprint is extracted from each 371.5 ms frame (with a shift, the time offset between the start or end points of successive frames, of 185.7 ms), and the Euclidean distance is used for fingerprint matching.

Fingerprint matching was performed on audio clips 5 or 10 seconds long (27 or 54 frames), so N = 432 (= 27 × 16) or N = 864 (= 54 × 16). Since each frame yields a 16-dimensional fingerprint, an N-dimensional fingerprint is extracted from N/16 (= 27 or 54) frames.

Because fingerprint matching performance is the main concern of the present invention, the database search step of the fingerprinting system was omitted in this embodiment.

Because N is large, learning an N × N matrix is computationally demanding. In this experiment, therefore, an M × M matrix A_s (M < N) is learned instead of the N × N matrix A. (In this section M denotes this reduced dimension, not the margin.)

The matrix A_s is obtained using the M-dimensional fingerprints produced by dividing each N-dimensional fingerprint into M-dimensional segments. Since each frame yields a 16-dimensional fingerprint, an M-dimensional fingerprint is extracted from M/16 frames.

The distance between two N-dimensional fingerprints x and x' is given by Equation 7.

\[ D(x, x') = \sum_{k=1}^{N/M} D_{A_s}\!\left( x_s^{(k)}, {x_s'}^{(k)} \right) \qquad \text{(Equation 7)} \]

Here, x_s^(k) and x_s'^(k) are the k-th M-dimensional fingerprints obtained by dividing the N-dimensional fingerprints x and x', respectively. In this embodiment M = 48, so each summand in Equation 7 is the distance between fingerprints extracted from 3 (= M/16 = 48/16) frames.
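The segment-wise distance, in which one M × M matrix A_s is shared by all N/M consecutive segments, can be sketched in NumPy as follows. The function name `blockwise_distance` and the random data are illustrative assumptions; N = 432 and M = 48 are the values given in the text.

```python
import numpy as np


def blockwise_distance(x, x_prime, A_s):
    """Sum of D_{A_s} over consecutive M-dimensional segments of x and x'."""
    M = A_s.shape[0]
    if x.size % M != 0:
        raise ValueError("fingerprint length must be a multiple of M")
    diffs = (x - x_prime).reshape(-1, M)         # one row per segment
    return float(np.einsum("km,mn,kn->", diffs, A_s, diffs))


N, M = 432, 48                                   # values from the experiment
rng = np.random.default_rng(0)                   # illustrative data only
x, x_prime = rng.standard_normal(N), rng.standard_normal(N)

# With A_s = I the segment-wise sum equals the squared Euclidean distance.
print(np.isclose(blockwise_distance(x, x_prime, np.eye(M)),
                 np.sum((x - x_prime) ** 2)))
```

This is equivalent to using an N × N block-diagonal matrix with A_s repeated N/M times on the diagonal, so only M² parameters are learned instead of N².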

4.2. Training set

One hundred different songs were used for distance metric learning. In this embodiment, I = 8000 and J = 4.

Here, I is the number of fingerprints of original content (x_i; i = 1, 2, ..., I), and J is the number of distorted versions (x_{i,j}; j = 1, 2, ..., J) of the content corresponding to each original fingerprint x_i (see Section 3.1).

The audio distortions used in this embodiment are described in reference [2] and are listed in Table 1 below.

| Symbol | Audio distortion | Description |
|---|---|---|
| L1 | EQ1 (octave-band equalization) | Attenuation of adjacent octave bands (set alternately to -6 dB and +6 dB) |
| L2 | E (echo) | Filter emulation of old-time radio |
| L3 | BPF (band-pass filtering) | 0.4 - 4 kHz band-pass filter |
| L4 | WMA (WMA encoding) | 64 kbps WMA encoding |

MP3 encoding at 96 kbps was performed for all distortions.

4.3. Comparison test

To evaluate the performance of the present invention, 100 songs entirely distinct from those in the training set described above were used for the comparison test. In addition to the four distortions described above (EQ1, E, BPF, WMA), the three distortions listed in Table 2 below were added, so that seven distortions were used in the comparison test.

| Symbol | Audio distortion | Description |
|---|---|---|
| T1 | TD (time delay) | 92.9 ms shift |
| T2 | SR (sampling rate change) | Downsampling to 16 kHz and upsampling to 44.1 kHz |
| T3 | EQ2 (1/3-octave band equalization) | 30-band pop equalization |

Test sets combining three or more of the above distortions were also considered.

The test sets include not only distortions to which learning was applied but also distortions to which learning was not applied.

FIG. 3 is a set of comparison graphs that use ROC (Receiver Operating Characteristic) curves to show the performance of the embodiment according to the present invention (the learned distance metric, labeled "Learned") and the performance of a comparative example (the conventional Euclidean distance without learning, labeled "Euclidean").

The ROC graphs of FIG. 3 plot the false positive (FP) rate, i.e., the rate of errors in which an actual negative is judged positive, against the false negative (FN) rate, i.e., the rate of errors in which an actual positive is judged negative (FN vs. FP).

That is, in this experiment the false negative (FN) rate is defined as the proportion of matching fingerprint pairs that are judged to be non-matching, and the false positive (FP) rate is defined as the proportion of non-matching fingerprint pairs that are judged to be matching.
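The two error rates defined above can be computed as in the following sketch. It assumes, for illustration only, that a pair is declared a match when its distance is at or below a threshold; the threshold rule, the function names, and the toy distances are hypothetical and are not taken from the experiment.

```python
from typing import List, Sequence, Tuple


def error_rates(matched: Sequence[float], unmatched: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (FN rate, FP rate) for the ASSUMED rule 'match if distance <= threshold'."""
    fn = sum(1 for d in matched if d > threshold)      # matching pairs judged non-matching
    fp = sum(1 for d in unmatched if d <= threshold)   # non-matching pairs judged matching
    return fn / len(matched), fp / len(unmatched)


def roc_points(matched: Sequence[float], unmatched: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float]]:
    """Sweep the threshold to trace an FN-vs-FP curve."""
    return [error_rates(matched, unmatched, t) for t in thresholds]


# Toy distances, invented for illustration only.
matched_d = [0.1, 0.2, 0.3, 0.9]
unmatched_d = [0.5, 0.8, 1.2, 1.5, 2.0]
print(error_rates(matched_d, unmatched_d, 0.6))  # (0.25, 0.2)
```

Sweeping the threshold trades one error for the other, so a metric that separates matching from non-matching pairs more cleanly yields a curve closer to the origin.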

Each experiment used 60,000 matching fingerprint pairs and 100,000,000 non-matching fingerprint pairs.

FIGS. 3(a) to 3(d) show the performance when only distortions to which learning was applied are present.

FIGS. 3(e) to 3(g) show the performance when distortions to which learning was not applied are also included.

FIGS. 3(h) to 3(j) show the performance when three or more distortions, drawn from those to which learning was applied and those to which it was not, are combined.

The distortions applied in FIGS. 3(a) to 3(j) are listed in Table 3 below.

| Drawing | Applied distortion |
|---|---|
| FIG. 3(a) | EQ1 + MP3 |
| FIG. 3(b) | E + MP3 |
| FIG. 3(c) | BPF + MP3 |
| FIG. 3(d) | WMA + MP3 |
| FIG. 3(e) | TD + MP3 |
| FIG. 3(f) | EQ2 + MP3 |
| FIG. 3(g) | SR + MP3 |
| FIG. 3(h) | WMA + EQ2 + SR + MP3 |
| FIG. 3(i) | TD + E + BPF + EQ2 + MP3 |
| FIG. 3(j) | EQ1 + BPF + EQ2 + MP3 |

As shown in FIG. 3, the embodiment according to the present invention (the learned distance metric, labeled "Learned") outperforms the comparative example (the conventional Euclidean distance without learning, labeled "Euclidean").

That is, under the same conditions the curve of the embodiment (Learned) lies to the lower left of the curve of the comparative example (Euclidean), which means that the embodiment has lower false positive (FP) and false negative (FN) rates than the comparative example.

In particular, as FIGS. 3(b) and 3(c) clearly show, the benefit of learning the distance metric according to the present invention is most pronounced for the severe E (echo) and BPF (band-pass filtering) distortions.

In addition, as FIGS. 3(i) and 3(j) show, even when three or more distortions are combined, recognition performance improves more markedly when the E (echo) and BPF (band-pass filtering) distortions are included.

Consequently, in every case of FIG. 3, the embodiment of the present invention to which distance metric learning was applied (Learned) showed no degradation in performance under distortion relative to the comparative example to which learning was not applied (Euclidean).

According to the present invention, a method has been proposed for improving the fingerprint matching process by learning a distance metric. The distance metric is learned by minimizing a cost function related to recognition performance. The cost function is designed to be minimized when the query content is correctly identified.
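A cost of this kind can be sketched as a margin-based hinge sum. The form below is an assumption made for illustration, not the cost function of the specification: for each distorted fingerprint, it penalizes the case where the distance to its own original is not smaller, by at least a margin, than the distance to the nearest other original. The function names, the default squared Euclidean distance, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_euclidean(x: Vector, y: Vector) -> float:
    """Placeholder distance; a learned metric would be substituted here."""
    return sum((u - v) ** 2 for u, v in zip(x, y))


def hinge_cost(originals: List[Vector], distorted: List[List[Vector]], margin: float,
               dist: Callable[[Vector, Vector], float] = squared_euclidean) -> float:
    """ASSUMED cost: sum of max(d_match - d_nearest_other + margin, 0).

    originals[i] is the fingerprint of the i-th original content and
    distorted[i] holds the fingerprints of its distorted versions.
    At least two originals are required.
    """
    total = 0.0
    for i, x_i in enumerate(originals):
        for x_ij in distorted[i]:
            d_match = dist(x_i, x_ij)
            d_nearest_other = min(
                dist(x_k, x_ij) for k, x_k in enumerate(originals) if k != i
            )
            total += max(d_match - d_nearest_other + margin, 0.0)
    return total


# Toy data, invented for illustration only.
originals = [[0.0, 0.0], [4.0, 0.0]]
distorted = [[[1.0, 0.0]], [[3.0, 0.0]]]
print(hinge_cost(originals, distorted, margin=1.0))   # 0.0
print(hinge_cost(originals, distorted, margin=10.0))  # 4.0
```

The cost is zero when every distorted fingerprint is closer to its own original than to any other original by at least the margin, which corresponds to the correctly identified case described above.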

Experiments applying the present invention to an audio fingerprinting system showed that distance metric learning according to the present invention improves fingerprinting performance.

Although specific embodiments of the present invention have been described above, those of ordinary skill in the art to which the present invention pertains will understand that the spirit and scope of the present invention are not limited to these specific embodiments and that various modifications and variations are possible without departing from the gist of the present invention.

Accordingly, the embodiments described above are provided to fully convey the scope of the invention to those of ordinary skill in the art to which the present invention pertains. They should be understood as illustrative in all respects and not restrictive, and the present invention is defined only by the scope of the claims.

<References>

The following references are incorporated as part of this specification.

[1] J. Haitsma and T. Kalker, "A highly robust audio fingerprinting system", Proc. Int. Conf. Music Information Retrieval, 2002.

[2] E. Allamanche, J. Herre, O. Hellmuth, B. Fröba, T. Kastner, and M. Cremer, "Content-based identification of audio material using MPEG-7 low level description", Proc. Int. Symposium on Music Information Retrieval, 2001.

[3] C. Burges, J. Platt, and S. Jana, "Distortion discriminant analysis for audio fingerprinting", IEEE Trans. Speech and Audio Processing, vol. 11, no. 3, pp. 165-174, May 2003.

[4] J. S. Seo, M. Jin, S. Lee, D. Jang, S. Lee, and C. D. Yoo, "Audio fingerprinting based on normalized spectral subband moments", IEEE Signal Processing Letters, vol. 13, no. 4, pp. 209-212, Apr. 2006.

[5] J. Oostveen, T. Kalker, and J. Haitsma, "Feature extraction and a database strategy for video fingerprinting", Proc. Int. Conf. on Visual Information and Information Systems, pp. 117-128, 2002.

[6] S. Lee and C. D. Yoo, "Robust video fingerprinting for content-based video identification", IEEE Trans. Circuits and Systems for Video Technology, vol. 18, no. 7, pp. 983-988, July 2008.

[7] C. Kim, "Content-based image copy detection", Signal Processing: Image Communication, vol. 18, no. 3, pp. 169-184, March 2003.

[8] E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell, "Distance metric learning, with application to clustering with side-information", Proc. NIPS, 2003.

[9] A. Globerson and S. Roweis, "Metric learning by collapsing classes", Proc. NIPS, 2006.

[10] K. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin nearest neighbor classification", Proc. NIPS, 2006.

[11] L. Yang and R. Jin, "Distance metric learning: A comprehensive survey", Technical report, Department of Computer Science and Engineering, Michigan State University, 2006.

[12] S. Boyd and L. Vandenberghe, "Convex Optimization", Cambridge University Press, 2004.

FIGS. 1a and 1b are flowcharts showing the steps of the method according to the present invention.

FIG. 2 is a diagram for explaining the meaning of the cost function.

FIG. 3 is a set of comparison graphs that use ROC (Receiver Operating Characteristic) curves to compare the performance of the embodiment according to the present invention with that of the comparative example.

* Description of reference symbols for the main parts of the drawings

x_i : fingerprint of the i-th original content (i = 1, 2, ..., I)

x_{i,j} : fingerprint extracted from the j-th distorted version of the i-th original content (j = 1, 2, ..., J)

M : margin

D_A(x, x') : distance between x and x'

: non-matching fingerprint closest to the fingerprint x_{i,j}

Claims

A method for determining, through learning, a distance metric used in the matching process of a fingerprinting system that identifies content by matching a fingerprint (x_{i,j}) extracted from distorted content of original content against the fingerprint (x_i) of the original content, the method comprising:

(A) providing training data consisting of fingerprints (x_i) of the original content and fingerprints (x_{i,j}) of the distorted content; and

(B) determining, through learning using the training data, a distance metric capable of producing improved recognition performance.

The method, wherein step (B) of determining, through learning using the training data, a distance metric capable of producing improved recognition performance comprises:

(B-1) generating a parameterized distance metric by parameterizing the distance metric (S210);

(B-2) generating a cost function ε(A) that is minimized when the distance between the fingerprint (x_i) of the original content and the fingerprint (x_{i,j}) of its distorted content is made small and the distance between the fingerprint (x_i) of the original content and the fingerprint (x_k) of other original content is made large (S220); and

(B-3) determining each parameter of the distance metric by finding the case in which the cost function ε(A) is minimized (S230).

The method, wherein step (B) comprises:

(B-3) determining each parameter of the distance metric by finding the case in which the cost function ε(A) is minimized (S230),

and wherein, in step (B-1), the distance metric matrix A is defined by the following equation:

[where the function φ(·) is φ(x) = Wx, W is an N x N matrix, and A = W^T W]

The method for determining, through learning, a distance metric used in the matching process of a fingerprinting system that identifies content by matching a fingerprint (x_{i,j}) extracted from distorted content of original content against the fingerprint (x_i) of the original content,

wherein, in step (B-2) of step (B), the cost function ε(A) is defined by the following equation:

[where [z]₊ = max(z, 0) and M denotes a margin]

The method, wherein step (B) comprises:

(B-3) determining each parameter of the distance metric by finding the case in which the cost function ε(A) is minimized (S230),

and wherein, in step (B-2), the cost function ε(A) is a convex function.

The method, wherein, in step (B-3) of step (B), the case in which the cost function ε(A) is minimized is found using a projected gradient method.