KR101302567B1

KR101302567B1 - System of recognizing contents and method of extracting binary fingerprint using weighted voting

Info

Publication number: KR101302567B1
Application number: KR1020110142529A
Authority: KR
Inventors: 이석필; 양창모; 장세진; 장달원
Original assignee: 전자부품연구원
Priority date: 2011-12-26
Filing date: 2011-12-26
Publication date: 2013-09-02
Also published as: KR20130074460A

Abstract

A content recognition system using binary feature points obtained through weighted voting is disclosed. The system includes a database and a content recognition unit. The database stores, for each content item, binary feature points for content search. The content recognition unit performs the following steps:
normalizes input content;
applies different calculation methods to the normalized data to compute feature values;
compares each computed feature value with its own threshold to extract binary feature points, each consisting of a series of binary values;
applies a preset weight to each binary value of the extracted binary feature points;
extracts a final binary feature point from the weighted binary feature points; and
searches the database for the metadata of content having the same binary feature point as the final binary feature point.

Description

System of recognizing contents and method of extracting binary fingerprint using weighted voting

The present invention relates to content recognition technology, and more particularly to technology for recognizing content using binary feature points.

Systems that recognize content (video or audio) are well known. Such a system extracts feature points, commonly called fingerprints or hashes, from the input content. It then looks them up in a previously built database to recognize the input content. The feature points used can be broadly divided into real-valued feature points and binary feature points. Binary feature points are especially widely used in content recognition systems because they reduce both the storage space they occupy in the database and the search time.
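To make the storage argument concrete, the sketch below packs a binary feature point into a single integer. It then compares the packed size with storing the same number of real-valued features. The 32-bit length, the bit pattern, and the use of 64-bit floats for the real-valued alternative are illustrative assumptions, not values taken from this patent.

```python
import struct

def pack_bits(bits):
    """Pack a sequence of 0/1 values into one integer, most significant bit first."""
    value = 0
    for b in bits:
        if b not in (0, 1):
            raise ValueError("a binary feature point may contain only 0s and 1s")
        value = (value << 1) | b
    return value

# Hypothetical 32-bit binary feature point (the byte 10110010 repeated four times).
bits = [1, 0, 1, 1, 0, 0, 1, 0] * 4
packed = pack_bits(bits)

# Storage comparison: one unsigned 32-bit integer vs. 32 double-precision floats.
binary_size = len(struct.pack("<I", packed))
real_size = len(struct.pack("<32d", *([0.0] * 32)))
print(binary_size, real_size)  # 4 256
```

Because the packed feature point is an ordinary integer, exact-match search also reduces to integer equality or a hash-table lookup.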

As shown in FIG. 1, binary feature points are generally generated in three steps: normalization (S10), calculation (S20), and threshold application (S30). First, the content recognition system normalizes the input content into data of a fixed format that is unaffected by certain distortions. It then applies various mathematical calculations to the normalized data to derive features that are robust to distortion. Finally, it extracts binary feature points by comparing these feature values with a threshold.

The feature points extracted from content in this way are used for searching. When a user wants information about an unknown piece of content, the user submits that content to the content recognition system as a query. The system extracts feature points from the query and compares them with the feature points stored in the database. It then retrieves the content having the same feature points and provides that content's information to the user.
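The query flow above can be sketched as an exact-match index from binary feature points to content metadata. The class name `FingerprintDatabase`, the string key format, and the metadata fields are hypothetical choices for illustration. Only exact matching is shown, since the text retrieves content "having the same feature points".

```python
from typing import Dict, Optional, Sequence

def to_key(bits: Sequence[int]) -> str:
    """Turn a binary feature point into a hashable key such as '1011'."""
    return "".join(str(b) for b in bits)

class FingerprintDatabase:
    """Hypothetical in-memory database: binary feature point -> content metadata."""

    def __init__(self) -> None:
        self._index: Dict[str, dict] = {}

    def add(self, bits: Sequence[int], metadata: dict) -> None:
        self._index[to_key(bits)] = metadata

    def lookup(self, bits: Sequence[int]) -> Optional[dict]:
        # Exact match only: metadata is returned when the stored feature point is identical.
        return self._index.get(to_key(bits))

db = FingerprintDatabase()
db.add([1, 0, 1, 1], {"title": "Song A"})
db.add([0, 1, 1, 0], {"title": "Song B"})
print(db.lookup([1, 0, 1, 1]))  # {'title': 'Song A'}
print(db.lookup([1, 1, 1, 1]))  # None
```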

Two prior methods follow the extraction scheme of FIG. 1. Non-Patent Document [1] normalizes audio according to a predetermined format and applies a fast Fourier transform (FFT). It then computes the time-frequency difference of band energies and sets each feature bit to 1 or 0 depending on whether the computed value is positive or negative. Non-Patent Document [2] normalizes video and computes differences in brightness. It sets each feature bit to 0 or 1 depending on whether the value is positive or negative.

With approaches like that of FIG. 1, however, each feature point depends entirely on the result of a single calculation. Feature points must be robust to many kinds of distortion. If the value obtained in the calculation step is vulnerable to a particular distortion, the quality of the feature point itself degrades.
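As an illustration of this single-calculation approach, the sketch below applies the bit rule published in Non-Patent Document [1] to a precomputed matrix of band energies E[n][m] (frame n, band m). Under that rule, a bit is 1 when E(n,m) - E(n,m+1) - (E(n-1,m) - E(n-1,m+1)) is positive, and 0 otherwise. Framing, the FFT, and band filtering are assumed to have been done elsewhere, and the toy energies are made up. Every output bit depends on this one calculation, which is the weakness described above.

```python
from typing import List

def band_energy_bits(energies: List[List[float]]) -> List[List[int]]:
    """Bits from band energies E[n][m] using the time-frequency difference rule of [1].

    Returns (bands - 1) bits per frame, starting from the second frame.
    """
    bits = []
    for n in range(1, len(energies)):
        cur, prev = energies[n], energies[n - 1]
        frame_bits = []
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            frame_bits.append(1 if diff > 0 else 0)  # sign of the difference decides the bit
        bits.append(frame_bits)
    return bits

# Toy band energies: 3 frames x 4 bands.
E = [
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 2.0, 2.0, 3.0],
    [1.0, 1.0, 4.0, 2.0],
]
print(band_energy_bits(E))  # [[1, 0, 0], [0, 0, 1]]
```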

[1] J. Haitsma and T. Kalker, "A highly robust audio fingerprinting system," Proc. Int. Conf. Music Information Retrieval, 2002.
[2] J. Oostveen, T. Kalker, and J. Haitsma, "Feature extraction and a database strategy for video fingerprinting," Proc. Int. Conf. on Visual Information and Information Systems, pp. 117-128, 2002.

An object of the present invention is to provide a technical solution that improves content recognition performance.

To achieve this object, one aspect of the present invention provides a content recognition system using binary feature points obtained through weighted voting. The system includes a database and a content recognition unit. The database stores, for each content item, binary feature points for content search. The content recognition unit performs the following steps:
normalizes input content;
applies different calculation methods to the normalized data to compute feature values;
compares each computed feature value with its own threshold to extract binary feature points, each consisting of a series of binary values;
applies a preset weight to each binary value of the extracted binary feature points;
extracts a final binary feature point from the weighted binary feature points; and
searches the database for the metadata of content having the same binary feature point as the final binary feature point.

Here, the weights are set individually for the respective positions of the binary values, and they sum to 1. For each position, the content recognition unit determines the final bit value from the total weighted sum of the binary values at that position across the binary feature points. The resulting bits form the final binary feature point.

Another aspect of the present invention provides a binary feature point extraction method using weighted voting. The method includes the following steps:
computing feature values by applying different calculation methods to normalized input content;
extracting binary feature points, each consisting of a series of binary values, by comparing the computed feature values with their respective thresholds;
applying a preset weight to each binary value of the extracted binary feature points; and
extracting a final binary feature point from the weighted binary feature points.
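The four method steps above can be sketched end to end as follows. The three toy "calculations" are hypothetical stand-ins for real feature computations such as band-energy differences or band centroids. The zero thresholds and the weights (0.25, 0.375, 0.375) are likewise made up; only the requirement that the weights sum to 1 and the strict "> 0.5" decision come from the text.

```python
from typing import Callable, List, Sequence

Calculation = Callable[[Sequence[float]], List[float]]

def extract_final_feature_point(normalized: Sequence[float],
                                calculations: Sequence[Calculation],
                                thresholds: Sequence[float],
                                weights: Sequence[float]) -> List[int]:
    """Sketch of the method: N calculations -> N thresholds -> weighted vote."""
    if not (len(calculations) == len(thresholds) == len(weights)):
        raise ValueError("need one threshold and one weight per calculation")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Steps 1-2: each calculation yields feature values, binarized by its own threshold.
    feature_points = [
        [1 if value > thr else 0 for value in calc(normalized)]
        for calc, thr in zip(calculations, thresholds)
    ]
    length = len(feature_points[0])
    if any(len(fp) != length for fp in feature_points):
        raise ValueError("all binary feature points must have the same length")
    # Steps 3-4: weight every bit; the final bit is 1 only where the weighted sum exceeds 0.5.
    return [
        1 if sum(w * fp[k] for w, fp in zip(weights, feature_points)) > 0.5 else 0
        for k in range(length)
    ]

# Hypothetical toy calculations on a normalized sequence.
def first_difference(x: Sequence[float]) -> List[float]:
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def pairwise_sum(x: Sequence[float]) -> List[float]:
    return [x[i] + x[i + 1] for i in range(len(x) - 1)]

def shifted(x: Sequence[float]) -> List[float]:
    return list(x[1:])

final = extract_final_feature_point(
    normalized=[0.5, -1.0, 2.0, 1.5, -0.5],
    calculations=[first_difference, pairwise_sum, shifted],
    thresholds=[0.0, 0.0, 0.0],
    weights=[0.25, 0.375, 0.375],
)
print(final)  # [0, 1, 1, 0]
```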

The content recognition system according to the present invention extracts binary feature points using several calculation methods. It then forms a new final binary feature point through weighted voting and searches the database based on it. Because this increases the robustness of the feature points, it improves the performance of the system.

FIG. 1 is a diagram illustrating a conventional binary feature point extraction process.
FIG. 2 is a block diagram of a content recognition system using binary feature points obtained through weighted voting according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a binary feature point extraction process using weighted voting according to an embodiment of the present invention.

The above and further aspects of the present invention will become more apparent through the preferred embodiments described below with reference to the accompanying drawings. These embodiments are described in detail so that those skilled in the art can readily understand and reproduce the invention.

FIG. 2 is a block diagram of a content recognition system using binary feature points obtained through weighted voting according to an embodiment of the present invention. FIG. 3 illustrates a binary feature point extraction process using weighted voting according to an embodiment of the present invention.

As shown, the content recognition system includes a database 100 and a content recognition unit 210. The database 100 stores, for each content item, metadata and binary feature points for content search. Here, metadata means detailed information about the content. The binary feature points stored in the database 100 for each content item are final binary feature points extracted as shown in FIG. 3. The content recognition unit 210 recognizes input content by searching the database 100 using the binary feature points of the input content. As shown, the content recognition unit 210 may be a software module implemented on a digital signal processor 200.

As illustrated in FIG. 3, the content recognition unit 210 extracts the final binary feature point from the input content in four steps: normalization (S200), calculation (S300), threshold application (S400), and weighted voting (S500).

First, the content recognition unit 210 normalizes the content submitted to the system as a query (S200). This converts it into data of a fixed format that is unaffected by certain distortions. This normalization is the same as in the conventional method.

Once normalization is complete, the content recognition unit 210 applies different calculation methods (calculation 1, calculation 2, calculation 3, ..., calculation N) to the normalized data (S300). This yields distortion-robust feature values (feature 1, feature 2, feature 3, ..., feature N). Well-known calculation methods can be used here, such as:
computing the time-frequency difference of band energies after an FFT;
computing the center value of each frequency band and the band-to-band differences of those center values;
computing the x-th order center of gravity (centroid) of each frequency band, or the band-to-band differences of those centroids;
the spectral flatness of each frequency band; and
mel-frequency cepstral coefficients (MFCC).
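As a sketch of two of the listed calculations, the code below computes a first-order band centroid (the x-th order center of gravity with x = 1) and the spectral flatness of each band for one frame's power spectrum. It also computes the band-to-band differences of the centroids. The band edges (bin index ranges), the centroid expressed in bin-index units, and the small `eps` added before taking logarithms are illustrative assumptions. MFCC and higher-order centroids are not shown.

```python
import math
from typing import List, Sequence, Tuple

def band_centroid(power: Sequence[float], lo: int, hi: int) -> float:
    """Power-weighted mean bin index of bins lo..hi-1 (first-order centroid)."""
    total = sum(power[lo:hi])
    if total == 0:
        return (lo + hi - 1) / 2.0  # silent band: fall back to the band's middle bin
    return sum(i * power[i] for i in range(lo, hi)) / total

def band_flatness(power: Sequence[float], lo: int, hi: int, eps: float = 1e-12) -> float:
    """Spectral flatness of bins lo..hi-1: geometric mean / arithmetic mean, in (0, 1]."""
    band = [p + eps for p in power[lo:hi]]  # eps avoids log(0)
    geometric = math.exp(sum(math.log(p) for p in band) / len(band))
    arithmetic = sum(band) / len(band)
    return geometric / arithmetic

def band_differences(values: Sequence[float]) -> List[float]:
    """Differences between adjacent bands, e.g. centroid[m + 1] - centroid[m]."""
    return [values[m + 1] - values[m] for m in range(len(values) - 1)]

# Toy power spectrum of one frame, split into two hypothetical bands of four bins each.
power = [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 8.0, 0.0]
bands: List[Tuple[int, int]] = [(0, 4), (4, 8)]
centroids = [band_centroid(power, lo, hi) for lo, hi in bands]
flatness = [band_flatness(power, lo, hi) for lo, hi in bands]
print(centroids, band_differences(centroids))  # [1.5, 6.0] [4.5]
print([round(f, 3) for f in flatness])  # [1.0, 0.0]
```

A flat band gives a flatness near 1, and a band dominated by one bin gives a value near 0. Each of these feature vectors could then be binarized with its own threshold.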

When the calculation step is complete, the content recognition unit 210 compares each feature value with its own threshold (threshold application 1, threshold application 2, threshold application 3, ..., threshold application N). A value greater than its threshold is set to 1, and a smaller value is set to 0. This yields binary feature points (binary feature point 1, binary feature point 2, binary feature point 3, ..., binary feature point N), each consisting of a series of binary values (S400).

When the threshold application step is complete, the content recognition unit 210 performs weighted voting to determine a final binary feature point from these binary feature points (S500). The vote proceeds as follows:
each binary feature point, consisting of 0s and 1s, is multiplied by its weight, with the weights summing to 1;
the weighted values are added together;
if the total is greater than 0.5, bit 1 is assigned to the final binary feature point, and otherwise 0 is assigned.

That is, denoting the k-th bit of the final binary feature point by $F_k$, $F_k$ is given by Equation 1:

$$F_k = \begin{cases} 1, & \text{if } \sum_{n=1}^{N} w_n b_{n,k} > 0.5 \\ 0, & \text{otherwise} \end{cases} \qquad \text{(Equation 1)}$$

where $w_n$ is the n-th weight and $b_{n,k}$ is the k-th bit of binary feature point n, with $\sum_{n=1}^{N} w_n = 1$.

Once the final binary feature point has been extracted through weighted voting, the content recognition unit 210 searches the database 100 for the corresponding content using that final binary feature point. It then outputs the metadata of the retrieved content as the result.
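A small worked instance of the weighted vote shows how it differs from a plain majority vote. Write $w_n$ for the n-th weight and $b_{n,k}$ for the k-th bit of binary feature point n. The values N = 3 and $(w_1, w_2, w_3) = (0.5, 0.3, 0.2)$ are hypothetical and chosen only for illustration.

```latex
% Hypothetical weights (not from the patent): N = 3, (w_1, w_2, w_3) = (0.5, 0.3, 0.2)
\begin{aligned}
(b_{1,k}, b_{2,k}, b_{3,k}) = (1, 0, 1) &:\quad 0.5(1) + 0.3(0) + 0.2(1) = 0.7 > 0.5 \;\Rightarrow\; F_k = 1 \\
(b_{1,k}, b_{2,k}, b_{3,k}) = (0, 1, 1) &:\quad 0.5(0) + 0.3(1) + 0.2(1) = 0.5 \le 0.5 \;\Rightarrow\; F_k = 0
\end{aligned}
```

In the second case, two of the three feature points vote 1, yet $F_k = 0$. The heavier weight lets the more reliable calculation outweigh a simple majority. A sum of exactly 0.5 gives 0 because the rule requires the sum to be strictly greater than 0.5.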

The weights used in weighted voting are determined through training (S100). Training finds the weights that perform best on training data, and it is carried out when the system is built. It can proceed as follows.

First, training data are required, and they come in pairs. A binary feature point is used to judge whether two pieces of content are the same or different. Accordingly, each training pair is labeled with whether its two binary feature points were extracted from the same content or from different content.

For convenience of explanation, the training data set is defined as pairs $(x_i, y_i)$, $i = 1, \dots, I$. For $i \le I_1$, the two members of a pair come from the same content and should yield the same bit value. For the remaining pairs ($i > I_1$), they come from different content and should yield different bit values.

To elaborate: ordinary training data consist of actual data plus the correct answer for each item. For example, training data for a celebrity face recognition algorithm might look like this:
(JangDongGun_face1.jpg, "Jang Dong-gun"), (JangDongGun_face2.jpg, "Jang Dong-gun"), ..., (JangDongGun_face100.jpg, "Jang Dong-gun"),
(HanYeSeul_face1.jpg, "Han Ye-seul"), ..., (HanYeSeul_face100.jpg, "Han Ye-seul"),
(HanHyoJoo_face1.jpg, "Han Hyo-joo"), ..., (HanHyoJoo_face100.jpg, "Han Hyo-joo").
In other words, each item has the form (data, correct answer for the data). The training data needed here are different: their label states whether the two members of a pair are the same or different. This pairwise formulation has been used in several papers, including the following:

[Reference 1] Dalwon Jang, Chang D. Yoo, and Ton Kalker, "Distance metric learning for content identification," IEEE Trans. Information Forensics and Security, vol. 5, no. 4, Dec. 2010.

[Reference 2] Dalwon Jang, Chang D. Yoo, Sunil Lee, Sungwoong Kim, and Ton Kalker, "Pairwise Boosted Audio Fingerprint," IEEE Trans. Information Forensics and Security, vol. 4, no. 4, pp. 995-1004, Dec. 2009.

Accordingly, each "same" label and each "different" label is attached to a pair of items. Audio provides an example. For a song A, distorted versions are denoted A', A'', A''', and so on. The distortions include added noise, playback through a speaker followed by re-recording, delay, and compression. There are also other songs B, C, D, and so on. In this case, the training data may include:
((A, A'), "same song"), ((A, A''), "same song"), ((A, A'''), "same song"),
((A, B), "different song"), ((A, C), "different song"), ((A, D), "different song").
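The pair construction above can be sketched as follows. The input format is a hypothetical mapping from each original track ID to the IDs of its distorted copies. Forming "different song" pairs from every combination of originals is likewise an assumption, since the text lists only pairs involving A. Same-song pairs are placed first so that the first $I_1$ entries are the "same song" pairs, matching the indexing convention used in this section.

```python
from itertools import combinations
from typing import Dict, List, Tuple

LabeledPair = Tuple[Tuple[str, str], str]

def make_training_pairs(distorted: Dict[str, List[str]]) -> List[LabeledPair]:
    """Build labeled pairs from original tracks and their distorted copies.

    'same song' pairs come first, followed by 'different song' pairs.
    """
    same = [((orig, copy), "same song")
            for orig, copies in distorted.items() for copy in copies]
    different = [((a, b), "different song") for a, b in combinations(distorted, 2)]
    return same + different

pairs = make_training_pairs({"A": ["A'", "A''", "A'''"], "B": [], "C": [], "D": []})
print(len(pairs))  # 9
print(pairs[3])    # (('A', 'B'), 'different song')
```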

From this training data, N kinds of binary feature points can be extracted. The corresponding bits of each pair can be written as $(x_{n,i}, y_{n,i})$, where n = 1, ..., N indexes the kind of binary feature point.

For convenience of explanation, the pairs with $i \le I_1$ are those labeled "same song". Thus, if M pairs are labeled "same song" and P bits are produced per song, then $I_1 = M \times N \times P$. Here P depends on how the binary feature points are generated. For example, it differs depending on whether 8 bits or 32 bits are produced every 0.1 seconds.

In this way, $I_1$ bits are formed from identical songs. Conversely, $I_2$ bits are formed from different songs, with $I_1 + I_2 = I$. The entire training set can then be written as $\{(x_{n,i}, y_{n,i})\}_{n=1}^{N}$ for $i = 1, 2, \dots, I$, and training is performed through Equation 2.

$$\max_{w_1, \dots, w_N} \; \sum_{i=1}^{I_1} \Big(\sum_{n=1}^{N} w_n x_{n,i} - 0.5\Big)\Big(\sum_{n=1}^{N} w_n y_{n,i} - 0.5\Big)$$
subject to $\Big(\sum_{n=1}^{N} w_n x_{n,i} - 0.5\Big)\Big(\sum_{n=1}^{N} w_n y_{n,i} - 0.5\Big) < 0$ for $i = I_1 + 1, \dots, I$,
subject to $\sum_{n=1}^{N} w_n = 1$, (Equation 2)
where $x_{n,i}$ and $y_{n,i}$ are the bits defined above.

That is, for $i \le I_1$ the two members of a pair should receive the same bit value. The weighted sums for both members should therefore be both greater than 0.5 or both less than 0.5, and the farther they are from 0.5, the better. For this reason the product of the two deviations from 0.5 is maximized, while for $i > I_1$ this product must be kept negative. This yields Equation 2, and the values $w_n$ (n = 1, ..., N) that optimize it must be found. They can be obtained by semi-definite programming or by gradient descent.
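Equation 2 trains the weights on labeled pairs. Let $s_x = \sum_n w_n x_{n,i}$ and $s_y = \sum_n w_n y_{n,i}$ be the weighted sums of a pair's two members. For same-content pairs, the product $(s_x - 0.5)(s_y - 0.5)$ is pushed up. For different-content pairs, that product must stay negative, and the weights sum to 1.

The sketch below takes the gradient route as one possible approach; it is not the authors' procedure. It rests on several assumptions not stated in the source:
the weights are also kept non-negative, by projection onto the probability simplex;
the constraint for different-content pairs is replaced by a hinge penalty with a hypothetical coefficient `penalty`;
the step size `lr` and the iteration count `steps` are arbitrary.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[int], Sequence[int]]  # (x_i, y_i): bits of the N feature points

def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_n >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        cumsum += uj
        t = (cumsum - 1.0) / j
        if uj - t > 0:
            theta = t  # the last j satisfying this condition fixes the shift
    return [max(x - theta, 0.0) for x in v]

def product_and_grad(w: Sequence[float], x: Sequence[int], y: Sequence[int]):
    """f = (w.x - 0.5)(w.y - 0.5) and its gradient with respect to w."""
    dx = sum(wn * xn for wn, xn in zip(w, x)) - 0.5
    dy = sum(wn * yn for wn, yn in zip(w, y)) - 0.5
    return dx * dy, [xn * dy + yn * dx for xn, yn in zip(x, y)]

def train_weights(same: Sequence[Pair], different: Sequence[Pair], n: int,
                  lr: float = 0.1, penalty: float = 1.0, steps: int = 200) -> List[float]:
    """Projected gradient ascent on a hinge-penalized version of Equation 2."""
    w = [1.0 / n] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in same:  # objective term: push same-content products up
            _, g = product_and_grad(w, x, y)
            grad = [a + b for a, b in zip(grad, g)]
        for x, y in different:  # penalty term: active only while the product is positive
            f, g = product_and_grad(w, x, y)
            if f > 0:
                grad = [a - penalty * b for a, b in zip(grad, g)]
        w = project_to_simplex([wn + lr * gn for wn, gn in zip(w, grad)])
    return w

# Toy data: feature 1 agrees on same-content pairs and disagrees on different-content
# pairs; features 2 and 3 behave the opposite way.
same_pairs = [((1, 1, 0), (1, 0, 1)), ((0, 1, 0), (0, 0, 1)),
              ((1, 0, 1), (1, 1, 0)), ((0, 0, 1), (0, 1, 0))]
different_pairs = [((1, 1, 1), (0, 1, 1)), ((0, 0, 0), (1, 0, 0))]
weights = train_weights(same_pairs, different_pairs, n=3)
print([round(wn, 3) for wn in weights])  # [1.0, 0.0, 0.0]
```

On this toy data, all of the weight moves to the one feature that separates same-content pairs from different-content pairs. The simplex projection keeps the weights usable directly as voting weights in the weighted-voting rule.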

The present invention has been described above with reference to its preferred embodiments. Those of ordinary skill in the art to which the invention pertains will understand that it may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents should be construed as included in the present invention.

100: database
200: digital signal processor
210: content recognition unit

Claims

1. A content recognition system using binary feature points obtained through weighted voting, comprising:
a database storing, for each content item, binary feature points for content search; and
a content recognition unit that:
normalizes input content;
applies different calculation methods to the normalized data to compute feature values;
compares the computed feature values with respective threshold values to extract binary feature points, each consisting of a series of binary values;
applies a predetermined weight to each of the binary values constituting the extracted binary feature points;
extracts a final binary feature point from the weighted binary feature points; and
searches the database for metadata of content having the same binary feature point as the final binary feature point.

2. The content recognition system of claim 1, wherein the weights sum to 1.

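3. The content recognition system of claim 2, wherein the content recognition unit determines the final binary bit value of the final binary feature point according to Equation 1:
[Equation 1]
$$F_k = \begin{cases} 1, & \text{if } \sum_{n=1}^{N} w_n b_{n,k} > 0.5 \\ 0, & \text{otherwise} \end{cases}$$
(where $F_k$ is the k-th bit of the final binary feature point, $w_n$ is the n-th weight, and $b_{n,k}$ is the k-th bit of binary feature point n)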

4. A binary feature point extraction method using weighted voting, performed by a content recognition system including a database and a content recognition unit that is a software module implemented on a digital signal processor, the method comprising:
computing, by the content recognition unit, feature values by applying different calculation methods to normalized input content;
extracting, by the content recognition unit, binary feature points, each consisting of a series of binary values, by comparing the computed feature values with respective threshold values;
applying, by the content recognition unit, a predetermined weight to each of the binary values constituting the extracted binary feature points; and
extracting, by the content recognition unit, a final binary feature point from the weighted binary feature points.

5. The binary feature point extraction method of claim 4, wherein the weights sum to 1.

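6. The binary feature point extraction method of claim 5, wherein extracting the final binary feature point comprises determining the final binary bit value of the final binary feature point according to Equation 1:
[Equation 1]
$$F_k = \begin{cases} 1, & \text{if } \sum_{n=1}^{N} w_n b_{n,k} > 0.5 \\ 0, & \text{otherwise} \end{cases}$$
(where $F_k$ is the k-th bit of the final binary feature point, $w_n$ is the n-th weight, and $b_{n,k}$ is the k-th bit of binary feature point n)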