KR102004929B1

KR102004929B1 - System and method for searching similarity of multimedia files

Info

Publication number: KR102004929B1
Application number: KR1020170026160A
Authority: KR
Inventors: 고영웅; 김민자
Original assignee: 한림대학교 산학협력단 (Industry-Academic Cooperation Foundation, Hallym University)
Priority date: 2017-02-28
Filing date: 2017-02-28
Publication date: 2019-07-29
Also published as: KR20180099126A

Abstract

The present invention relates to a multimedia file similarity search system and method. A multimedia file similarity search system according to an embodiment of the present invention includes four units. A video shot classification unit groups consecutive similar frames of a multimedia file into one video shot. A representative frame selection unit selects a predetermined frame from each classified video shot as its representative frame. A fingerprint application unit extracts video features by applying a video fingerprinting technique to the selected representative frames. A similarity measurement unit measures similarity by comparing the video features extracted from an original multimedia file with those extracted from a modified multimedia file. By searching for similarity between multimedia files in this way, the system can determine whether original content has been illegally distributed or modified.

Description

{System and method for searching similarity of multimedia files}

The present invention relates to a multimedia file similarity search system and method. More particularly, it relates to a system and method based on media recognition information detection and video fingerprint-based matching.

Similarity and redundancy analysis of files is a primary method of file forensics. File similarity and duplicate detection techniques are also important mechanisms for file synchronization and network deduplication.

Block hashing is the key element of file similarity and redundancy search. The file is divided into many small blocks, and a hash function extracts a hash key that identifies each block. Dividing a file into small blocks is called chunking, and each resulting unit is a chunk. Chunk-based deduplication is a highly effective data deduplication technique.

There are two methods of dividing a file into blocks: fixed-length chunking and variable-length chunking.

Fixed-length chunking divides a file into blocks of a fixed size and then applies a hash function to extract a hash key from each block. The method is very simple and easier to implement than variable-length chunking. Its drawback is that inserting or deleting data near a block boundary shifts the contents of every subsequent block, so the resulting hash values can change completely. This method therefore generally cannot be expected to give accurate similarity analysis.
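As an illustration only, a minimal Python sketch reproduces the boundary-shift problem. The 8-byte block size and the use of SHA-256 as the block hash are assumptions chosen for readability; this document does not specify either. Inserting a single byte at the front of the data shifts every block boundary, so none of the original block hashes survive:

```python
import hashlib

def fixed_length_chunks(data: bytes, block_size: int = 8) -> list[bytes]:
    """Split data into consecutive blocks of block_size bytes (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def block_hashes(data: bytes, block_size: int = 8) -> list[str]:
    """Return the SHA-256 hex digest (hash key) of every fixed-size block."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in fixed_length_chunks(data, block_size)]

original = b"AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDD"   # four 8-byte blocks
edited = b"X" + original                       # one byte inserted at the front

orig_hashes = block_hashes(original)
edit_hashes = block_hashes(edited)
shared = set(orig_hashes) & set(edit_hashes)
print(len(orig_hashes), len(edit_hashes), len(shared))  # prints: 4 5 0
```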

Variable-length chunking places block boundaries at a predefined special value called an anchor, which avoids the data-shifting problem of fixed-length chunking. It is therefore more flexible than fixed-length chunking, but it requires more computation time to perform well in finding file similarities.
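A minimal sketch of variable-length (content-defined) chunking follows. It assumes a polynomial rolling hash over a 16-byte window as the anchor test, declaring a boundary after any byte where the window hash modulo 64 equals 63. The window size, divisor, hash base and the name cdc_chunks are illustrative assumptions. Because each anchor depends only on the bytes inside the window, a one-byte insertion at the front changes at most the first chunk, and every later chunk keeps its hash:

```python
import hashlib
import random

def cdc_chunks(data: bytes, window: int = 16, divisor: int = 64) -> list[bytes]:
    """Content-defined chunking: cut after byte i whenever the rolling hash of the
    last `window` bytes satisfies hash % divisor == divisor - 1 (the anchor)."""
    base, mod = 257, (1 << 61) - 1
    pow_w = pow(base, window, mod)           # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * base + byte) % mod          # push the newest byte
        if i >= window:
            h = (h - data[i - window] * pow_w) % mod  # pop the oldest byte
        if i + 1 >= window and h % divisor == divisor - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])          # trailing chunk without an anchor
    return chunks

def digests(chunks: list[bytes]) -> list[str]:
    """SHA-256 hex digest of each chunk."""
    return [hashlib.sha256(c).hexdigest() for c in chunks]

rng = random.Random(42)
original = bytes(rng.randrange(256) for _ in range(4096))
edited = b"X" + original                     # one byte inserted at the front
orig_d = digests(cdc_chunks(original))
edit_d = digests(cdc_chunks(edited))
shared = set(orig_d) & set(edit_d)
print(len(orig_d), len(edit_d), len(shared))
```

Production deduplication systems usually also enforce minimum and maximum chunk sizes. Those limits are omitted here to keep the anchor rule visible.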

The byte-level comparison and block-based techniques conventionally used to measure file similarity therefore carry too much computational and processing-time overhead for large files such as multimedia. Accuracy is a second problem. Two multimedia files that look identical to the eye can have completely different byte contents, so similarity measurements can become much less accurate. Applying existing file similarity search methods to multimedia files unchanged thus gives a highly inefficient similarity search.

To solve the above problems, an object of the present invention is to provide a multimedia file similarity search system and method based on media recognition information detection and video fingerprint-based matching.

To achieve the above object, a multimedia file similarity search system according to an embodiment of the present invention includes: a video shot classification unit that groups consecutive similar frames of a multimedia file into one video shot; a representative frame selection unit that selects a predetermined frame from each video shot classified by the video shot classification unit as its representative frame; a fingerprint application unit that extracts video features by applying a video fingerprinting technique to the representative frames selected by the representative frame selection unit; and a similarity measurement unit that measures similarity by comparing the video features of an original multimedia file with those of a modified multimedia file, both extracted by the fingerprint application unit.

The video shot classification unit may detect the video shot boundaries used to classify video shots by comparing histogram similarity in the hue-saturation-value (HSV) color space.

The representative frame selection unit may select the first frame of each video shot as the representative frame.

As its video fingerprinting technique, the fingerprint application unit may apply a perceptual image hashing technique to the representative frames selected by the representative frame selection unit in order to extract video features.

The perceptual image hashing technique may use a DCT-based hash to obtain the low-frequency DCT coefficients that relate to the perceptual content of the image.

A multimedia file similarity search method according to an embodiment of the present invention achieves the above object through four steps: a video shot classification step of grouping consecutive similar frames of a multimedia file into one video shot; a representative frame selection step of selecting a predetermined frame from each video shot classified in the video shot classification step as its representative frame; a fingerprint application step of extracting video features by applying a video fingerprinting technique to the representative frames selected in the representative frame selection step; and a similarity measurement step of measuring similarity by comparing the video features of an original multimedia file with those of a modified multimedia file, both extracted in the fingerprint application step.

With the configuration described above, the present invention can determine whether original content has been illegally distributed or modified by searching for similarity between multimedia files.

In addition, because the invention relies on video shot boundary detection, it remains independent of changes made by common video processing steps such as lossy compression, resizing, and frame rate or format conversion.

Further, because the present invention uses video fingerprints, storage complexity is reduced and comparison matching time is shortened.

FIG. 1 is a schematic block diagram of a multimedia file search system according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating boundary detection of video shots according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating selection of a representative frame according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating generation of a hash list according to an embodiment of the present invention.
FIG. 5 is a diagram showing an example comparison between the hash list of an original multimedia file and the hash list of a modified multimedia file, generated according to an embodiment of the present invention.
FIG. 6 is a flowchart of a multimedia file search method according to an embodiment of the present invention.

Hereinafter, embodiments of the multimedia file similarity search system and method according to the present invention are described with reference to the accompanying drawings. For reference, the terms used below for the components of the present invention are chosen to reflect each component's function. They should not be understood as limiting the technical components of the present invention.

FIG. 1 is a schematic block diagram of a multimedia file search system according to an embodiment of the present invention. FIG. 2 illustrates boundary detection of video shots, and FIG. 3 illustrates selection of a representative frame. FIG. 4 illustrates generation of a hash list. FIG. 5 shows an example comparison between the hash list of an original multimedia file and the hash list of a modified multimedia file. FIGS. 2 to 5 each relate to an embodiment of the present invention.

As shown in FIG. 1, the multimedia file search system includes a video shot classification unit 110, a representative frame selection unit 120, a fingerprint application unit 130, and a similarity measurement unit 140.

The video shot classification unit 110 groups consecutive similar frames of a multimedia file into one video shot. It finds the shot boundaries by comparing histogram similarity in the hue-saturation-value (HSV) color space.

Video shot boundary detection is the first step of video analysis. It makes a multimedia file search system easier to build because the system does not have to refer to every video frame. The goal is to represent the video data more compactly by detecting the boundaries between video shots. Using visual similarity and temporal relationships, the video is divided into segments of similar frames, which locates the start and end of each video shot.

A video is a sequence of frames and can be divided into several video shots by grouping similar frames. The first frame of each resulting video shot can then serve as the boundary anchor of the corresponding video chunk.

The following four approaches can be used to detect video shot boundaries.

· Color/luminance histogram: the color of every pixel in each frame is computed and used to build a histogram. The histograms of adjacent frames are then compared in succession.

· Luminance values: luminance values are compared, but this approach is sensitive to lighting changes. To overcome this drawback, one or more average values in a suitable color space are used.

· Edges: edges are detected in each frame, and the edge information of consecutive frames is compared. A change detected by this comparison indicates a video shot boundary.

· Motion: the motion information of objects in the video is used. A sudden change in motion indicates that the video shot has changed.

A video shot detection algorithm may also combine these approaches. Object recognition algorithms such as SIFT (Scale-Invariant Feature Transform) or SURF (Speeded-Up Robust Features) can also be used for boundary detection. Video shot boundary detection is the most important process in content-based video analysis, and it directly affects the accuracy and efficiency of media recognition information retrieval.

In one embodiment of the present invention, video shot boundaries are detected by comparing the histogram similarity of video frames in the HSV (hue, saturation, value) color space. The HSV model separates luminance, that is, image brightness, from color information. Removing the intensity value therefore lets HSV cope with many lighting changes. Several histogram matching methods can measure the similarity between consecutive frames frame_i and frame_(i+1), including correlation, chi-square, histogram intersection, and Bhattacharyya distance. One embodiment of the present invention uses the chi-square distance.
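The boundary test described above can be sketched in pure Python under the following assumptions:
- frames are lists of (r, g, b) pixels;
- the histogram has 8 hue bins by 4 saturation bins, with the V channel discarded;
- the distance is the symmetric chi-square form sum((a - b)^2 / (a + b));
- a new shot starts when the distance exceeds 0.5.

This document does not specify the bin counts, the symmetric variant, or the threshold. For comparison, OpenCV's HISTCMP_CHISQR uses the asymmetric form sum((H1 - H2)^2 / H1).

```python
import colorsys

def hs_histogram(frame, h_bins=8, s_bins=4):
    """Normalized hue-saturation histogram of a frame of (r, g, b) pixels in 0..255.
    The V (brightness) channel is dropped so pure lighting changes do not move it."""
    hist = [0.0] * (h_bins * s_bins)
    for r, g, b in frame:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        hb = min(int(h * h_bins), h_bins - 1)
        sb = min(int(s * s_bins), s_bins - 1)
        hist[hb * s_bins + sb] += 1.0
    total = sum(hist) or 1.0
    return [x / total for x in hist]

def chi_square(h1, h2):
    """Symmetric chi-square distance of two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)

def shot_starts(frames, threshold=0.5):
    """Index of the first frame of every shot: a new shot starts at frame i+1 when
    the distance between frame_i and frame_(i+1) exceeds the threshold."""
    if not frames:
        return []
    starts, prev = [0], hs_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = hs_histogram(frames[i])
        if chi_square(prev, cur) > threshold:
            starts.append(i)
        prev = cur
    return starts

red = [(255, 0, 0)] * 16
dark_red = [(128, 0, 0)] * 16     # same hue and saturation, lower brightness
blue = [(0, 0, 255)] * 16
frames = [red, red, dark_red, blue, blue]
print(shot_starts(frames))        # prints: [0, 3]
```

The dark red frame has the same hue and saturation as the bright red frames, so the lighting change does not trigger a boundary. The switch to blue does.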

FIG. 2 illustrates boundary detection of video shots according to an embodiment of the present invention. This approach stays independent of changes made by common video processing steps, including lossy compression, resizing, and frame rate or format conversion.

The representative frame selection unit 120 selects a predetermined frame from each video shot classified by the video shot classification unit 110 as its representative frame. For example, it may select the first frame of each video shot.

Representative frame selection finds the most representative frames in a video. Choosing one representative frame per video segment removes redundant data and greatly reduces the amount of video data without sacrificing what is needed for comparison. One option is simply to use the I frames of the video stream as representative frames, but this has limitations. An original video may be re-encoded, or edited by cutting out or concatenating some of its frames. Even when the original and edited videos share content, the two versions can have completely different binaries, and the I frames of the edited version may fall at different positions. Because the I frames of an edited video can differ from those of the original, they cannot reliably serve as representative frames.

In one embodiment of the present invention, representative frames are selected relative to the video shot boundaries found by the video shot classification unit 110. Common choices are the first, middle, or last frame of each video shot, or the first and last frames together. This embodiment selects only the first frame as the representative frame.
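The selection policies listed above can be expressed as a small helper. The function names are hypothetical. Shots are lists of frame indices, and the "first" policy corresponds to this embodiment:

```python
def split_into_shots(num_frames: int, shot_starts: list[int]) -> list[list[int]]:
    """Group frame indices 0..num_frames-1 into shots, given the first index of each shot."""
    bounds = list(shot_starts) + [num_frames]
    return [list(range(bounds[k], bounds[k + 1])) for k in range(len(shot_starts))]

def select_representatives(shots: list[list[int]], policy: str = "first") -> list[int]:
    """Pick representative frame indices per shot. 'first' is the policy of this
    embodiment; 'middle', 'last' and 'first_last' are the alternatives named in the text."""
    picks: list[int] = []
    for shot in shots:
        if policy == "first":
            picks.append(shot[0])
        elif policy == "middle":
            picks.append(shot[len(shot) // 2])
        elif policy == "last":
            picks.append(shot[-1])
        elif policy == "first_last":
            picks.extend([shot[0], shot[-1]] if len(shot) > 1 else [shot[0]])
        else:
            raise ValueError(f"unknown policy: {policy}")
    return picks

shots = split_into_shots(10, [0, 4, 7])
print(select_representatives(shots))            # prints: [0, 4, 7]
print(select_representatives(shots, "middle"))  # prints: [2, 5, 8]
```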

FIG. 3 illustrates selection of a representative frame according to an embodiment of the present invention.

The fingerprint application unit 130 extracts video features by applying a video fingerprinting technique to the representative frames selected by the representative frame selection unit 120. The fingerprinting technique it applies is a perceptual image hashing technique, which uses a DCT-based hash to obtain the low-frequency DCT coefficients that relate to the perceptual content of the image.

The main function of the fingerprinting technique is to generate video fingerprints and to measure performance based on video fingerprint matching. Video fingerprints reduce storage complexity and shorten comparison matching time.

Fingerprinting techniques use cryptographic hash functions such as MD5, SHA-1, and SHA-256. A cryptographic hash function is a mathematical function that takes input data and returns a short binary string of fixed size. The returned value is called a hash value, a message digest, or a digital fingerprint.

Cryptographic hashing is commonly used to identify specific objects (or chunks) in binary data such as executable files. Because cryptographic hashes are highly sensitive to even small changes, they are very hard to apply to multimedia content that exists in different representations, for example after format conversion or compression. Multimedia files can undergo many distortions, such as resizing and changes of format or frame rate. Directly comparing multimedia files at the binary level to find changes and similarity can therefore be meaningless.
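The sensitivity that makes cryptographic hashing unsuitable here, known as the avalanche effect, is easy to observe. In this sketch the payload is arbitrary stand-in data rather than real video. Flipping a single bit produces an unrelated SHA-256 digest:

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

payload = bytes(range(256)) * 4      # stand-in for a piece of encoded video data
tweaked = bytearray(payload)
tweaked[100] ^= 0x01                 # flip a single bit
d1 = hashlib.sha256(payload).digest()
d2 = hashlib.sha256(bytes(tweaked)).digest()
print(d1 == d2, bit_difference(d1, d2), "of 256 digest bits differ")
```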

Identifying or authenticating different versions of multimedia content therefore requires a robust video hashing technique for video fingerprinting. An embodiment of the present invention applies a perceptual image hashing technique. Perceptual image hashing algorithms come in several types, such as the BMB (block mean value based) hash, the DCT (DCT-based) hash, the MH (Marr-Hildreth operator based) hash, and the RADIAL (radial variance based) hash.

One embodiment of the present invention selects the DCT hash algorithm in order to focus on "accuracy", one of the system design goals for video fingerprint generation. The reason is that, owing to the properties of the DCT, the low-frequency DCT coefficients can capture the "perceptual essence" of an image. The algorithm that generates a DCT-hash video fingerprint proceeds as follows.

The fingerprint application unit 130 first converts each representative frame selected by the representative frame selection unit 120 from RGB to YCbCr to remove unnecessary high frequencies. It then applies a median filter to the Y (luminance) matrix to reduce noise. Next, it normalizes the filtered image to a predetermined size, for example 32 x 32. Finally, it applies the DCT to the normalized Y matrix to obtain DCT coefficients, and hashes those coefficients to generate the hash value of the representative frame.
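The steps above can be sketched in pure Python. Several details are assumptions, not statements of this document:
- the BT.601 luma weights for the Y channel;
- a 3 x 3 median filter with clamped edges;
- nearest-neighbour resizing to 32 x 32;
- the final hashing step, which follows the widely used pHash approach: take the 8 x 8 low-frequency block (skipping the DC row and column) and set each bit by comparing its coefficient with the block's median.

The frames are synthetic random images.

```python
import math
import random
import statistics

def to_luma(img):
    """RGB -> Y (luma) with ITU-R BT.601 weights; Cb and Cr are not used by the hash."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in img]

def median3x3(y):
    """3 x 3 median filter with clamped edges, to reduce noise in the Y matrix."""
    h, w = len(y), len(y[0])

    def px(i, j):
        return y[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    return [[sorted(px(i + di, j + dj) for di in (-1, 0, 1) for dj in (-1, 0, 1))[4]
             for j in range(w)] for i in range(h)]

def resize_nearest(y, size=32):
    """Normalize the Y matrix to size x size by nearest-neighbour sampling."""
    h, w = len(y), len(y[0])
    return [[y[i * h // size][j * w // size] for j in range(size)] for i in range(size)]

def dct_1d(v):
    """Unnormalized 1-D DCT-II."""
    n = len(v)
    return [sum(v[x] * math.cos(math.pi * (x + 0.5) * k / n) for x in range(n))
            for k in range(n)]

def dct_2d(m):
    """Separable 2-D DCT-II: transform rows, then columns; returns coeffs[u][v]."""
    rows = [dct_1d(r) for r in m]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dct_hash64(img):
    """64-bit DCT-based perceptual hash of one representative frame."""
    coeffs = dct_2d(resize_nearest(median3x3(to_luma(img))))
    low = [coeffs[u][v] for u in range(1, 9) for v in range(1, 9)]  # 8 x 8 block, no DC row/column
    med = statistics.median(low)
    bits = 0
    for c in low:
        bits = (bits << 1) | int(c > med)  # one bit per low-frequency coefficient
    return bits

def random_frame(seed, size=48):
    """Synthetic size x size RGB frame (stand-in for a decoded video frame)."""
    rng = random.Random(seed)
    return [[(rng.randrange(200), rng.randrange(200), rng.randrange(200))
             for _ in range(size)] for _ in range(size)]

frame = random_frame(1)
brighter = [[(r + 10, g + 10, b + 10) for (r, g, b) in row] for row in frame]
other = random_frame(2)
h1, h2, h3 = dct_hash64(frame), dct_hash64(brighter), dct_hash64(other)
print(f"{h1:016x} {h2:016x} {h3:016x}")
```

Adding a constant to every pixel changes only the DC coefficient, which is excluded. The brightened frame therefore yields the same 64-bit value, while an unrelated frame yields a different one.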

Fingerprint-based matching eliminates the overhead of byte-by-byte comparison. Video shot boundary detection divides the video into several chunks, and a representative frame is selected as the key of each chunk. One video fingerprint is generated per representative frame, namely the hash value that a hash function extracts from that frame. A hash list is built by computing the hash value of each chunk with Hash64, a 64-bit hash.
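A hash list can be stored compactly because each entry is small. The record layout in this sketch, a 4-byte representative-frame index followed by the 8-byte fingerprint, is hypothetical, as are the names HashEntry, build_hash_list, serialize and deserialize. This document only states that a 64-bit hash (Hash64) is computed for each chunk. The fingerprint values are made-up examples.

```python
import struct
from dataclasses import dataclass

# Hypothetical layout: big-endian, no padding, 4-byte frame index + 8-byte fingerprint.
ENTRY = struct.Struct(">IQ")

@dataclass(frozen=True)
class HashEntry:
    start_frame: int   # index of the shot's representative (first) frame
    fingerprint: int   # 64-bit perceptual hash of that frame

def build_hash_list(shot_starts: list[int], fingerprints: list[int]) -> list[HashEntry]:
    """Pair each shot's representative frame index with its 64-bit fingerprint."""
    return [HashEntry(s, f & 0xFFFFFFFFFFFFFFFF) for s, f in zip(shot_starts, fingerprints)]

def serialize(hash_list: list[HashEntry]) -> bytes:
    """Pack the hash list into ENTRY.size bytes per shot."""
    return b"".join(ENTRY.pack(e.start_frame, e.fingerprint) for e in hash_list)

def deserialize(blob: bytes) -> list[HashEntry]:
    """Inverse of serialize()."""
    return [HashEntry(*ENTRY.unpack_from(blob, off)) for off in range(0, len(blob), ENTRY.size)]

hash_list = build_hash_list([0, 120, 305],
                            [0x8F3A00C1D2E4F607, 0x0123456789ABCDEF, 0xFEDCBA9876543210])
blob = serialize(hash_list)
print(len(blob), "bytes for", len(hash_list), "shots")  # prints: 36 bytes for 3 shots
```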

FIG. 4 illustrates generation of a hash list according to an embodiment of the present invention. Because fingerprints are small objects, the fingerprint-based matching technique of the present invention has low storage overhead, matches efficiently, and needs little computation to detect similarity.

The similarity measurement unit 140 measures similarity by comparing the video features of the original multimedia file 10 with those of the modified multimedia file 20, both extracted by the fingerprint application unit 130.

For a forensic investigation of multimedia files, the similarity measurement unit 140 checks similarity and redundancy using the hash lists. FIG. 5 shows an example comparison between the hash list of an original multimedia file and that of a modified multimedia file, generated according to an embodiment of the present invention. The basic idea of fingerprint-based matching is to find similarity and redundancy by comparing the fingerprints of the original and the modified multimedia.
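This document does not state how two hash lists are compared. The sketch assumes Hamming distance between 64-bit fingerprints, a common choice for perceptual hashes. An original fingerprint counts as matched if some fingerprint in the modified list lies within a bit threshold, set arbitrarily to 10 bits below. Similarity is the fraction of matched original fingerprints. The function names are hypothetical.

```python
def hamming64(a: int, b: int) -> int:
    """Number of differing bits between two 64-bit fingerprints."""
    return bin((a ^ b) & 0xFFFFFFFFFFFFFFFF).count("1")

def similarity_ratio(original: list[int], modified: list[int], max_bits: int = 10) -> float:
    """Fraction of original fingerprints with a near match (<= max_bits) in the modified list."""
    if not original:
        return 0.0
    matched = sum(1 for h in original if any(hamming64(h, m) <= max_bits for m in modified))
    return matched / len(original)

original_list = [0x0000000000000000, 0xFFFFFFFFFFFFFFFF, 0x00000000FFFFFFFF]
modified_list = [0x0000000000000003, 0xFFFFFFFFFFFFFFF0]  # two shots slightly changed, one removed
print(similarity_ratio(original_list, modified_list))
```

Matching does not depend on the order of the entries, so shots that were cut, reordered, or re-encoded can still be matched. The cost is a comparison against every entry of the modified list.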

FIG. 6 is a flowchart of a multimedia file search method according to an embodiment of the present invention.

When the original multimedia file 10 is provided (S602), the video shot classification unit 110 groups consecutive similar frames of the file into video shots. It does so by comparing histogram similarity in the hue-saturation-value (HSV) color space to detect the shot boundaries, then classifying the frames into shots (S604).

The representative frame selection unit 120 selects a predetermined frame from each video shot classified by the video shot classification unit 110 as its representative frame (S606). For example, it may select the first frame of each video shot.

The fingerprint application unit 130 extracts video features by applying a video fingerprinting technique to the representative frames selected by the representative frame selection unit 120 (S608). The technique is perceptual image hashing, which uses a DCT-based hash to obtain the low-frequency DCT coefficients that relate to the perceptual content of the image.

The fingerprint application unit 130 builds a hash list from the video fingerprints extracted for each representative frame and stores the hash list of the original multimedia file 10 (S610).

When the modified multimedia file 20 is provided (S612), the video shot classification unit 110 groups consecutive similar frames of the file into video shots. It does so by comparing histogram similarity in the hue-saturation-value (HSV) color space to detect the shot boundaries, then classifying the frames into shots (S614).

The representative frame selection unit 120 selects a predetermined frame from each video shot classified by the video shot classification unit 110 as its representative frame (S616). For example, it may select the first frame of each video shot.

The fingerprint application unit 130 extracts video features by applying a video fingerprinting technique to the representative frames selected by the representative frame selection unit 120 (S618). The technique is perceptual image hashing, which uses a DCT-based hash to obtain the low-frequency DCT coefficients that relate to the perceptual content of the image.

The fingerprint application unit 130 generates a hash registry from the video fingerprint extracted for each representative frame and stores the hash registry of the changed multimedia file 20 (S620).

The similarity measurement unit 140 measures similarity by comparing two hash registry files produced by the fingerprint application unit 130: one storing the video features of the original multimedia file 10, and one storing the video features of the changed multimedia file 20 (S622).
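The description does not define how two hash registries are compared. The sketch below therefore assumes a Hamming-distance scheme. A representative-frame hash from the changed file counts as matched when it lies within `max_distance` bits (assumed here to be 10 of 64) of any hash in the original file. The similarity score is the matched fraction. Both the threshold and the scoring rule are illustrative assumptions. The inputs are plain lists of integer hashes, such as the hash values taken from each registry's (index, hash) pairs.

```python
def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def registry_similarity(original, changed, max_distance=10):
    """Fraction of changed-file hashes that match some original-file hash.

    A hash matches when it is within `max_distance` bits of any hash in
    `original`; the threshold and the scoring rule are assumptions.
    """
    if not changed:
        return 0.0
    matched = sum(
        1 for h in changed
        if any(hamming(h, o) <= max_distance for o in original)
    )
    return matched / len(changed)
```

A score near 1.0 suggests that most shots in the changed file were derived from the original, even if frames were re-encoded or lightly edited.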

The scope of protection of the present invention should be interpreted according to the following claims. Those of ordinary skill in the art to which the present invention pertains may make various modifications and variations without departing from the essential characteristics of the invention. All technical ideas within a scope equivalent to the present invention should therefore be interpreted as falling within the scope of its rights.

10: original multimedia file 20: changed multimedia file
110: video shot classification unit 120: representative frame selection unit
130: fingerprint application unit 140: similarity measurement unit

Claims

1. A multimedia file similarity search system for comparing the similarity between an original multimedia file and a multimedia file changed from the original multimedia file, the system comprising:
a video shot classification unit that groups consecutive similar frames in the original multimedia file or the changed multimedia file into one video shot;
a representative frame selection unit that selects a predetermined frame from among the frames of the video shot classified by the video shot classification unit as a representative frame;
a fingerprint application unit that extracts video features by applying a video fingerprint technique to the representative frame selected by the representative frame selection unit; and
a similarity measurement unit that measures similarity by comparing the video feature extracted by the fingerprint application unit from the representative frame of the original multimedia file with the video feature extracted by the fingerprint application unit from the representative frame of the changed multimedia file.

2. The multimedia file similarity search system according to claim 1,
wherein, in detecting video shot boundaries for classifying the video shots, the video shot classification unit detects the boundaries by comparing histogram similarity in the hue, saturation, and value (HSV) color space.

3. The multimedia file similarity search system according to claim 1,
wherein the representative frame selection unit selects the first frame of each video shot as the representative frame.

4. The multimedia file similarity search system according to any one of claims 1 to 3,
wherein the fingerprint application unit extracts the video feature by applying, as the video fingerprint technique, a perceptual image hashing technique to the representative frame selected by the representative frame selection unit.

5. The multimedia file similarity search system according to claim 4,
wherein the perceptual image hashing technique uses a DCT-based hashing technique to obtain low-frequency DCT coefficients relevant to image perception.

6. A multimedia file similarity search method for comparing the similarity between an original multimedia file and a multimedia file edited from the original multimedia file, the method comprising:
a video shot classification step of grouping consecutive similar frames from the original multimedia file or the changed multimedia file into one video shot;
a representative frame selection step of selecting a predetermined frame from among the frames of the video shot classified in the video shot classification step as a representative frame;
a fingerprint applying step of extracting a video feature by applying a video fingerprint technique to the representative frame selected in the representative frame selection step; and
a similarity measurement step of measuring similarity by comparing the video feature extracted in the fingerprint applying step from the representative frame of the original multimedia file with the video feature extracted in the fingerprint applying step from the representative frame of the changed multimedia file.

7. The multimedia file similarity search method according to claim 6,
wherein the fingerprint applying step extracts the video feature by applying, as the video fingerprint technique, a perceptual image hashing technique to the representative frame selected in the representative frame selection step.