KR101341808B1

KR101341808B1 - Video summary method and system using visual features in the video

Info

Publication number: KR101341808B1
Application number: KR1020120125545A
Authority: KR
Inventors: 설상훈; 조은희; 차승욱
Original assignee: 고려대학교 산학협력단
Priority date: 2011-11-30
Filing date: 2012-11-07
Publication date: 2013-12-17
Also published as: KR20130061058A

Abstract

The present invention relates to a video summarization method and system using visual features in a video. More specifically, the method includes: a keyframe detection step of detecting at least one video shot from a video using a low-level feature vector and then detecting a keyframe representing each detected video shot; a character detection step of detecting faces in the detected keyframes and clustering the keyframes in which a face is detected to detect characters; an importance calculation step of detecting, based on the number of exposures of the characters in the keyframes in which characters are detected and on the low-level feature vectors of the keyframes in which a face is detected, a keyframe cluster containing keyframes whose inter-keyframe distance is smaller than a preset threshold, and calculating an importance for each keyframe in the detected keyframe cluster; and a selection step of selecting and extracting keyframes in descending order of importance according to the number of keyframes requested by the user. With this configuration, the method and system calculate the importance of the keyframes in the video to be summarized using visual features such as the characters in the video, the shot lengths of the keyframes, and color information, and select keyframes based on the calculated importance, so that the video can be summarized easily.

Description

Video summary method and system using visual features in the video

The present invention relates to a video summarization method and system using visual features in a video, and in particular to a method and system capable of efficiently summarizing a video using visual features such as the characters appearing in the video, the shot lengths of keyframes, and color information.

Recently, with the rapid development of electronics, communications, and related technologies, and with the sharp increase in video data driven by advances in camera technology and the spread of smartphones, the amount of information generated as multimedia data such as images, video, and audio has grown explosively. Accordingly, demand is increasing for techniques that accurately and quickly extract the information a user wants from multimedia data.

In particular, such multimedia data consists of various media such as still images, video, graphics, animation, sound, music, and text. Because the information contained in multimedia data is far larger in volume than conventional text-oriented information, it has been difficult to provide an objective and convenient search environment for multimedia data using existing annotations alone.

In this connection, the prior art relating to video summarization methods and systems using visual features in a video is as follows.

Prior art 1 is Korean Registered Patent No. 0792016 (December 28, 2007), which relates to a character-based video summarization apparatus and method using audio and video information. Prior art 1 comprises: a speaker recognition unit that detects the main speaker through speaker recognition using auditory information and provides a summary per specific actor; a face recognition unit that detects keyframes in which a specific person appears through face region detection and face recognition using visual information; and a video summarization unit that performs character-based video summarization using the speaker-centered video summarization result from the speaker recognition unit and the face recognition result from the face recognition unit. It thereby provides a per-character video summary using audio and video information.

Prior art 2 is Korean Registered Patent No. 0708337 (April 10, 2007), which relates to an automatic video summarization apparatus and method using a fuzzy-based OC-SVM. Prior art 2 reflects subjective human judgment in order to generate effective video summaries, proposes a way of generating video summary information in a flexible form suited to the user's environment or requirements, and extracts important video segments from a given video and a series of keyframes from those segments, so that the content of the video can be grasped at a glance and a desired scene can be accessed directly.

To solve the problems of the prior art described above, the present invention aims to provide a video summarization method and system using visual features in a video that calculate the importance of the keyframes of the video using visual features such as the characters appearing in the video, the shot lengths of keyframes, and color information, and then readily summarize the video based on the calculated importance.

A video summarization method using visual features in a video according to an embodiment of the present invention for solving the above problem includes: a keyframe detection step of detecting at least one video shot from a video using a low-level feature vector and then detecting a keyframe representing each detected video shot; a character detection step of detecting faces in the detected keyframes and clustering the keyframes in which a face is detected to detect characters; an importance calculation step of detecting, based on the number of exposures of the characters in the keyframes in which characters are detected and on the low-level feature vectors of the keyframes in which a face is detected, a keyframe cluster containing keyframes whose inter-keyframe distance is smaller than a preset threshold, and calculating an importance for each keyframe in the detected keyframe cluster; and a selection step of selecting and extracting keyframes in descending order of importance according to the number of keyframes requested by the user.

More preferably, the keyframe detection step may check whether a character is present in the detected video shot, detect the frame of the video shot in which the character is present as the keyframe when a character is present, and detect the first frame of the detected video shot as the keyframe when no character is present.

In particular, the low-level feature vector may be a color histogram or a color correlogram.

More preferably, the character detection step may include: a face detection process of detecting a face in the keyframe based on the edge portions of the eyes, nose, and mouth; and a character detection process of detecting the characters in the keyframes by clustering the keyframes in which a face is detected according to color information.

More preferably, the importance calculation step may include: an inter-character distance calculation process of calculating the similarity between characters according to the number of exposures of the characters in the keyframes in which characters are detected, and calculating the distance between the characters in two keyframes according to the calculated similarity; an inter-keyframe distance calculation process of calculating the distance between the two keyframes based on a weight for the characters, the low-level feature vectors of the two keyframes, and the distance between the characters in the two keyframes; a hierarchical tree structure formation process of detecting a keyframe cluster containing keyframes whose inter-keyframe distance is smaller than a preset threshold and forming a hierarchical tree structure for the keyframes in the detected cluster; and an importance calculation process of calculating the importance of each keyframe in the hierarchical tree structure.

More preferably, the hierarchical tree structure formation process may use the semi-Hausdorff distance algorithm to detect the keyframe cluster containing keyframes whose inter-keyframe distance is smaller than the preset threshold.

In particular, the hierarchical tree structure may have three levels: one upper keyframe, at least one lower keyframe belonging to the upper keyframe, and at least one lowest keyframe belonging to each lower keyframe.

In particular, the shot length of the upper keyframe may be the sum of the shot lengths of the at least one lower keyframe belonging to the upper keyframe.

In particular, the importance calculation process may calculate the importance of each keyframe by multiplying the distance between that keyframe and the upper keyframe in the hierarchical tree structure by the shot length of that keyframe.
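The two rules just stated (the shot length of the upper keyframe as a sum, and importance as distance times shot length) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample numbers are hypothetical and do not come from the patent.

```python
def upper_shot_length(lower_shot_lengths):
    """Shot length of the upper keyframe: the sum over its lower keyframes."""
    return sum(lower_shot_lengths)


def importance(distance_to_upper, shot_length):
    """Importance of a keyframe: its distance to the upper keyframe times its shot length."""
    return distance_to_upper * shot_length


# Hypothetical values: three lower keyframes with shot lengths in frames.
print(upper_shot_length([10, 20, 30]))
print(importance(0.5, 40))
```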

A video summarization system using visual features in a video according to an embodiment of the present invention for solving the above problem includes: a keyframe detection unit that detects at least one video shot from a video using a low-level feature vector and then detects a keyframe representing each detected video shot; a character detection unit that detects faces in the detected keyframes and clusters the keyframes in which a face is detected to detect characters; an importance calculation unit that detects, based on the number of exposures of the characters in the keyframes in which characters are detected and on the low-level feature vectors of the keyframes in which a face is detected, a keyframe cluster containing keyframes whose inter-keyframe distance is smaller than a preset threshold, and calculates an importance for each keyframe in the detected keyframe cluster; and a selection unit that selects and extracts keyframes in descending order of importance, up to the number of keyframes requested by the user.

The video summarization method and system using visual features in a video according to the present invention calculate the importance of the keyframes in the video to be summarized using visual features such as the characters in the video, the shot lengths of keyframes, and color information, and then select keyframes based on the calculated importance, so that the video can be summarized easily.

In addition, the video summarization method and system cluster the keyframes according to the color information contained in the at least one keyframe representing the video and remove the many keyframes that resemble one another, so that the video can be summarized more efficiently.

FIG. 1 is a block diagram of a video summarization system using visual features in a video according to an embodiment of the present invention.
FIG. 2 is a flowchart of a video summarization method using visual features in a video according to another embodiment of the present invention.
FIG. 3 is a flowchart showing the detailed process of the importance calculation step of FIG. 2.
FIG. 4 shows the video shot detection process for a video.
FIG. 5 shows the character detection results.
FIG. 6 shows the exposure of characters in keyframes.
FIG. 7 is a graph showing the relationship between the appearance frequency of characters and the weight for the characters.
FIG. 8 shows the hierarchical tree structure for keyframes.
FIG. 9 shows video summarization results obtained using the present invention.
FIG. 10 shows video summarization results obtained using the prior art and the present invention, respectively.

Hereinafter, the present invention is described in detail with reference to preferred embodiments and the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The invention may, however, be implemented in many different forms and is not limited to the embodiments described here.

Hereinafter, a video summarization system using visual features in a video according to an embodiment of the present invention is described in detail with reference to FIG. 1.

FIG. 1 is a block diagram of a video summarization system using visual features in a video according to an embodiment of the present invention.

As shown in FIG. 1, the video summarization system 100 using visual features in a video according to the present invention includes a keyframe detection unit 120, a character detection unit 140, an importance calculation unit 160, and a selection unit 180.

The keyframe detection unit 120 detects at least one video shot from the video using a low-level feature vector, and detects at least one keyframe representing each detected video shot.

The character detection unit 140 detects the faces of characters in the detected keyframes and clusters the keyframes in which a face is detected to detect the characters.

Based on the number of exposures of the characters in the keyframes in which a face is detected and on the low-level feature vectors of those keyframes, the importance calculation unit 160 detects a keyframe cluster containing keyframes whose inter-keyframe distance is smaller than a preset threshold, forms a tree structure over the keyframes in the detected cluster, and calculates the importance of each keyframe in the tree structure.

The selection unit 180 selects and extracts keyframes in descending order of importance, up to the number of keyframes requested by the user.
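The selection performed by the selection unit 180 amounts to a sort followed by a cut. A minimal sketch, assuming the importance values are already available as a mapping from keyframe identifier to score (the name `select_keyframes` and the sample scores are hypothetical):

```python
def select_keyframes(importance_by_keyframe, requested_count):
    """Return the requested number of keyframe ids, most important first."""
    ranked = sorted(
        importance_by_keyframe.items(),
        key=lambda item: item[1],
        reverse=True,
    )
    return [keyframe for keyframe, _ in ranked[:requested_count]]


# Hypothetical importance scores for three keyframes.
scores = {"f1": 0.2, "f2": 0.9, "f3": 0.5}
print(select_keyframes(scores, 2))
```

If the user requests more keyframes than exist, the slice simply returns all of them.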

Hereinafter, a video summarization method using visual features in a video according to another embodiment of the present invention is described in detail with reference to FIG. 2.

FIG. 2 is a flowchart of a video summarization method using visual features in a video according to another embodiment of the present invention.

As shown in FIG. 2, the keyframe detection unit 120 first detects at least one video shot in the video to be summarized using a low-level feature vector, and then detects a keyframe representing each detected video shot (S210). Here, the keyframe detection unit 120 may detect the at least one video shot using a low-level feature vector such as a color histogram or a color correlogram.
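The patent does not spell out how the low-level feature vector is turned into shot boundaries. One common approach, shown here purely as an assumed illustration, is to compare the color histograms of consecutive frames and start a new shot when the difference exceeds a threshold. The frame representation (a list of RGB tuples), the bin count, and the threshold value are all hypothetical choices.

```python
def color_histogram(pixels, bins=8):
    """Normalised per-channel colour histogram of a frame given as (r, g, b) tuples."""
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            index = min(value * bins // 256, bins - 1)
            hist[channel * bins + index] += 1.0
    total = sum(hist)
    return [count / total for count in hist]


def detect_shot_starts(frames, threshold=0.5):
    """Indices of frames that start a new shot (assumed rule: L1 histogram jump)."""
    starts = [0]
    previous = color_histogram(frames[0])
    for index in range(1, len(frames)):
        current = color_histogram(frames[index])
        difference = sum(abs(a - b) for a, b in zip(current, previous))
        if difference > threshold:
            starts.append(index)
        previous = current
    return starts


# Toy video: three all-black frames followed by three all-white frames.
black = [(0, 0, 0)] * 16
white = [(255, 255, 255)] * 16
print(detect_shot_starts([black] * 3 + [white] * 3))
```

A color correlogram could be substituted for the histogram without changing the boundary rule.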

After detecting the video shots, the keyframe detection unit 120 detects, from the detected video shots, a keyframe representing each video shot. For example, the keyframe detection unit 120 first checks whether a character is present in a detected video shot; if a character is present, it detects the frame in which the character is present as the keyframe of that video shot. If, on the other hand, no character is present in the detected video shot, it detects the first frame of the video shot as the keyframe representing that shot.
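The keyframe rule for a single shot can be sketched as below. The detector is passed in as a callable because the patent describes it separately; `has_character` is a hypothetical stand-in, and choosing the first frame that contains a character (when several do) is an assumption, since the text does not say which one is taken.

```python
def pick_keyframe(shot_frames, has_character):
    """Return the index of the keyframe within one shot.

    has_character: hypothetical callable returning True when a frame shows a character.
    """
    for index, frame in enumerate(shot_frames):
        if has_character(frame):
            return index  # assumed: first frame in which a character is present
    return 0  # no character in the shot: fall back to the first frame


# Toy shot where frames are just labels.
shot = ["background", "background", "face"]
print(pick_keyframe(shot, lambda frame: frame == "face"))
```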

As shown in FIG. 4, the video shots detected in this way may be at least one video shot having no common features or no mutual correlation.

Accordingly, as can be seen from FIG. 4, detected video shots 1 to 5 share no common character and no common background.

Next, the character detection unit 140 detects the faces of characters in the detected keyframes and clusters the keyframes in which a face is detected to detect the characters (S220). From the keyframes detected by the keyframe detection unit 120 in step S210, the character detection unit 140 detects whether a face is present using the edge portions corresponding to the facial features of a character in the keyframe, that is, the eyes, nose, and mouth. The character detection unit 140 then detects the characters by clustering the keyframes in which a face is detected according to color information. Because keyframes in which a human face is detected may repeatedly show the same person or a background of similar color, clustering those keyframes according to color information allows the characters in the keyframes to be detected effectively.

Keyframes in which characters have been detected in this way may appear as shown in FIG. 5.

FIG. 5 shows the character detection results.

As shown in FIG. 5, faces are detected in multiple keyframes using the edge portions of the eyes, nose, and mouth, and characters 1 to 4 can be detected based on the color information of the face and background in the keyframes in which a face is detected.

In particular, the keyframes detected for each character all show the same character against a similar background, while the keyframes corresponding to characters 1 to 4 differ from one character to another in both background and character.

Next, the importance calculation unit 160 detects the two keyframes having the minimum distance, based on the number of exposures of the characters, which indicates how many times each character is exposed in the detected keyframes, and on the low-level feature vectors of the keyframes in which a face is detected, and calculates the importance of the detected keyframes (S230).

Hereinafter, the importance calculation for keyframes is described in more detail with reference to FIG. 3.

FIG. 3 is a flowchart showing the detailed process of the importance calculation step of FIG. 2.

As shown in FIG. 3, the importance calculation unit 160 first calculates the number of exposures to determine how many times each character is exposed in the detected keyframes. The process of calculating the number of exposures of characters in the keyframes is described in detail below with reference to FIG. 6.

FIG. 6 shows the exposure of characters in keyframes.

As shown in FIG. 6, suppose keyframes 1 to 6 have been detected, and let a denote character 1, b character 2, and c character 3. Let A be the set of keyframes in which character 1 appears, B the set of keyframes in which character 2 appears, and C the set of keyframes in which character 3 appears. As FIG. 6 shows, character 1 appears in keyframes f₂, f₃, f₄, and f₅, and this set of keyframes is denoted A. Likewise, character 2 appears in keyframes f₁, f₃, and f₅, and this set is denoted B; character 3 appears in keyframes f₄ and f₅, and this set is denoted C.

Accordingly, the number of exposures for set A is n(A) = 4, for set B is n(B) = 3, and for set C is n(C) = 2, and the number of keyframes in which no character is exposed is n(O) = 1. In addition, the number of keyframes in which characters 1 and 2 are exposed together is n(A,B) = n(B,A) = 2, the number in which characters 1 and 3 are exposed together is n(A,C) = n(C,A) = 2, and the number in which characters 2 and 3 are exposed together is n(C,B) = n(B,C) = 1.
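The counts in this example follow directly from set sizes and intersections, which the short Python check below reproduces. The sets use the keyframe indices 1 to 6 from the FIG. 6 example; the helper name `n` mirrors the notation in the text and is otherwise an illustrative choice.

```python
# Keyframe indices in which each character appears (FIG. 6 example).
A = {2, 3, 4, 5}   # character 1
B = {1, 3, 5}      # character 2
C = {4, 5}         # character 3
all_keyframes = set(range(1, 7))


def n(*sets):
    """Number of keyframes in which all of the given characters appear together."""
    return len(set.intersection(*sets))


# Keyframes in which no character appears.
no_character = all_keyframes - (A | B | C)

print(n(A), n(B), n(C), len(no_character))
print(n(A, B), n(A, C), n(B, C))
```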

Next, the importance calculation unit 160 uses the number of exposures of each character to calculate the similarity between characters through Equation 1 below.

[Equation 1]

S(A,B) =

S(A,C) =

S(B,C) =

S(A,O) =

The importance calculation unit 160 then applies the result of the similarity calculation between characters to Equation 2 below to calculate the distance P(A, B) between the characters in two keyframes (S231).

[Equation 2]

$$P(A,B) = 1 - S(A,B)$$

Next, as shown in Equation 3 below, the importance calculation unit 160 calculates the distance D(f_i, f_j) between two keyframes as the sum of two terms: the low-level feature term C(f_i, f_j) of the two keyframes multiplied by (1 - α), where α is the weight for the characters, and the inter-character distance P(f_i, f_j) of the two keyframes obtained through Equation 2 multiplied by α (S232).

[Equation 3]

$$D(f_i, f_j) = (1-\alpha) \times C(f_i, f_j) + \alpha \times P(f_i, f_j)$$
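Equations 2 and 3 combine into a simple weighted sum. The sketch below treats the color term C(f_i, f_j) as an already computed scalar, which is an assumption made for illustration; the function names and sample values are hypothetical.

```python
def character_distance(similarity):
    """Equation 2: P = 1 - S."""
    return 1.0 - similarity


def keyframe_distance(color_distance, char_distance, alpha):
    """Equation 3: D = (1 - alpha) * C + alpha * P."""
    return (1.0 - alpha) * color_distance + alpha * char_distance


# Hypothetical inputs: colour term 0.2, character similarity 0.5, weight 0.5.
p = character_distance(0.5)
print(keyframe_distance(0.2, p, 0.5))
```

With α close to 0 the distance is driven by color, and with α close to 1 it is driven by which characters the two keyframes share.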

Here, the low-level feature term is calculated as C(f_i, f_j) = correlogram(f_i) - correlogram(f_j), that is, the difference between the color correlograms of the two keyframes. The weight α for the characters varies with the appearance frequency FP of the characters. As shown in Equation 4 below, the appearance frequency FP of the characters in the video is obtained by dividing the total length of the character appearances by the total length of the video and expressing the result as a percentage.

[Equation 4]

$$FP = \frac{\text{total length of character appearances}}{\text{total length of the video}} \times 100$$

The weight α for the characters is assigned according to the appearance frequency FP as follows. When the appearance frequency FP calculated through Equation 4 is 50 or more, α is assigned 0.1; when FP is 0.5 or less, α is assigned 0.8. When FP is greater than 0.5 and less than 50, α is assigned the value obtained by subtracting the product of FP and 0.02 from 1.
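The appearance frequency and the piecewise weight rule can be written down directly. The sketch follows the thresholds exactly as stated in this paragraph (0.5 and 50); since the description of FIG. 7 quotes 5% for the lower point, the lower bound is exposed as a parameter rather than fixed. Function and parameter names are hypothetical.

```python
def appearance_frequency(character_length, video_length):
    """Appearance frequency FP: character length over video length, as a percentage."""
    return 100.0 * character_length / video_length


def character_weight(fp, low=0.5, high=50.0):
    """Weight alpha for the characters, following the piecewise rule in the text."""
    if fp >= high:
        return 0.1
    if fp <= low:
        return 0.8
    return 1.0 - 0.02 * fp


# Hypothetical video: characters visible for 30 s out of 120 s.
fp = appearance_frequency(30, 120)
print(fp, character_weight(fp))
```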

The relationship between the appearance frequency of the characters and the weight is described below with reference to FIG. 7.

FIG. 7 is a graph showing the relationship between the appearance frequency of characters and the weight for the characters.

As shown in FIG. 7, the weight α for the characters as a function of the appearance frequency FP is 0.1 when FP is 50% and 0.8 when FP is 5%. Of the curves in FIG. 7, the solid line is the optimum obtained through experiment, and the dotted line can be regarded as an approximation of it.

The importance calculation unit 160 repeatedly computes the distance between two keyframes described above while changing the keyframes, and thereby detects keyframe clusters, each containing keyframes whose distance is smaller than a preset threshold. The distance between two keyframes may be computed using the semi-Hausdorff distance algorithm. In particular, the preset threshold may be 0.06 or 0.3, and the user may change it.

Every keyframe is included in one of the clusters.
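A minimal sketch of threshold-based grouping in which every keyframe ends up in one cluster is given below. It is an assumption-laden illustration, not the patented procedure: the function `cluster_keyframes`, the greedy "compare with the first member of each cluster" strategy, and the pluggable `distance` callable are all hypothetical, and the example uses a simple absolute difference in place of the semi-Hausdorff distance named in the text.

```python
from typing import Callable, List, Sequence, TypeVar

K = TypeVar("K")


def cluster_keyframes(
    keyframes: Sequence[K],
    distance: Callable[[K, K], float],
    threshold: float = 0.3,
) -> List[List[K]]:
    """Greedily group keyframes whose distance to a cluster seed is below threshold.

    Each keyframe is placed in exactly one cluster: the first cluster whose
    seed (first member) is closer than `threshold`, or a new cluster.
    """
    clusters: List[List[K]] = []
    for kf in keyframes:
        for cluster in clusters:
            if distance(cluster[0], kf) < threshold:
                cluster.append(kf)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([kf])
    return clusters


if __name__ == "__main__":
    # Stand-in one-dimensional "features"; real keyframes would carry
    # feature vectors and a keyframe-to-keyframe distance.
    features = [0.10, 0.12, 0.50, 0.55, 0.90]
    print(cluster_keyframes(features, lambda a, b: abs(a - b), threshold=0.06))
```

The default threshold of 0.3 and the 0.06 used in the example are the two values mentioned in the description; any other value can be passed in, matching the statement that the user may change the threshold.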

The importance calculation unit 160 then separates the keyframes in each detected keyframe cluster into layers and forms a tree structure (S233). The hierarchical tree structure of the keyframes in a keyframe cluster is described in detail below with reference to FIG. 8.

FIG. 8 is a diagram showing a hierarchical tree structure for keyframes.

As shown in FIG. 8, the hierarchical tree structure for keyframes may have three levels: one upper keyframe (1), at least one lower keyframe (2, 3, 4) included in the upper keyframe (1), and at least one lowest keyframe (5 to 11) included in each of the lower keyframes (2, 3, 4). In other words, a hierarchical tree structure is formed over all keyframes.

Here, the shot length (shot duration) of each lower keyframe (2, 3, 4) equals the sum of the shot lengths of the lowest keyframes (5 to 11) included in that lower keyframe. Likewise, the shot length of the upper keyframe (1) equals the sum of the shot lengths of the lower keyframes (2, 3, 4) included in the upper keyframe (1).

For example, the shot length of lower keyframe (2) is 25, the sum of the shot length 10 of lowest keyframe (5) and the shot length 15 of lowest keyframe (6), both of which are included in lower keyframe (2). Similarly, the shot length of the upper keyframe (1) is 60, the sum of the shot length 25 of lower keyframe (2), the shot length 30 of lower keyframe (3), and the shot length 5 of lower keyframe (4).
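The bottom-up summation in this example can be reproduced with a small recursive sketch. The class `KeyframeNode` and function `shot_length` are hypothetical names introduced for illustration. Because the text gives only the totals for lower keyframes (3) and (4), not the shot lengths of their lowest keyframes, the sketch enters those totals directly as an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KeyframeNode:
    label: int
    duration: float = 0.0  # used only when the node has no children
    children: List["KeyframeNode"] = field(default_factory=list)


def shot_length(node: KeyframeNode) -> float:
    """Shot length of a node: its own duration if it is a leaf,
    otherwise the sum of the shot lengths of its children."""
    if not node.children:
        return node.duration
    return sum(shot_length(child) for child in node.children)


# Lower keyframe (2) contains lowest keyframes (5) and (6).
node2 = KeyframeNode(2, children=[KeyframeNode(5, 10), KeyframeNode(6, 15)])
# Totals for (3) and (4) are entered directly (their children are not listed).
node3 = KeyframeNode(3, 30)
node4 = KeyframeNode(4, 5)
root = KeyframeNode(1, children=[node2, node3, node4])

if __name__ == "__main__":
    print(shot_length(node2), shot_length(root))
```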

Having formed the hierarchical tree structure for the keyframes as described above, the importance calculation unit 160 uses Equation 5 below to compute the importance IF_i of every keyframe in the tree by multiplying the previously computed distance D_i between the keyframe and the upper keyframe by the shot length W_i of the keyframe (S234). Every keyframe in a keyframe cluster can be represented in the hierarchical tree structure.

[Equation 5]

IF_i = D_i × W_i

The selection unit 180 then sorts the keyframes by importance IF in descending order and selects and extracts, starting from the highest importance, as many keyframes as the user requested (S240).
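The importance computation of Equation 5 and the selection step S240 can be sketched together as follows. The `Keyframe` record, its field names, and the sample values are hypothetical and chosen only to illustrate the ranking; the sketch assumes D_i and W_i have already been computed for each keyframe.

```python
from typing import List, NamedTuple


class Keyframe(NamedTuple):
    time: float         # position of the keyframe in the video (illustrative)
    distance: float     # D_i: distance to the upper keyframe
    shot_length: float  # W_i: shot length of the keyframe


def importance(kf: Keyframe) -> float:
    """IF_i = D_i * W_i (Equation 5)."""
    return kf.distance * kf.shot_length


def select_keyframes(keyframes: List[Keyframe], n: int) -> List[Keyframe]:
    """Return the n keyframes with the highest importance, highest first."""
    ranked = sorted(keyframes, key=importance, reverse=True)
    return ranked[:n]


frames = [
    Keyframe(time=0, distance=0.2, shot_length=10),
    Keyframe(time=30, distance=0.5, shot_length=15),
    Keyframe(time=75, distance=0.1, shot_length=30),
    Keyframe(time=120, distance=0.4, shot_length=5),
]

if __name__ == "__main__":
    for kf in select_keyframes(frames, 2):
        print(kf.time, importance(kf))
```

If a time-ordered storyboard is wanted, the selected keyframes can be re-sorted afterwards with `sorted(selected, key=lambda kf: kf.time)`.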

A video summarization experiment applying the present invention is described below.

Table 1 shows the videos used in this experiment. As listed in Table 1 below, the video summarization experiment used four program genres (TV entertainment, drama, documentary, and news) with two videos per genre. The selected videos average about 30 minutes in length, and each summary is presented as a storyboard containing the number of still images requested by the user, arranged in chronological order.

[Table 1]

| Program | Title | Length | Character appearance frequency | Bit rate |
|---|---|---|---|---|
| TV entertainment | 나는 남자다 (I'm a Man) | 40 min | 32% | 30 fps |
| TV entertainment | 남자의 자격 (Man's Qualifications) | 40 min | 35% | 30 fps |
| Drama | LOST | 30 min | 55% | 30 fps |
| Drama | 몽땅 내사랑 (All My Love) | 35 min | 60% | 30 fps |
| Documentary | 세계테마기행_유럽의 지붕 알프스 (World Theme Travel: The Alps, the Roof of Europe) | 30 min | 3% | 30 fps |
| Documentary | 세계테마기행_도교의 성지 무당산 (World Theme Travel: Mudangsan, the Sacred Place of Taoism) | 30 min | 5% | 30 fps |
| News | MBC 뉴스데스크 (MBC News Desk) | 40 min | 25% | 30 fps |
| News | CBS 노컷뉴스 (CBS No Cut News) | 20 min | 28% | 30 fps |

Comparing the number of events reported in the original videos with the number of events included in the summaries shows an average event detection accuracy of about 90%.

[Table 2]

| Video | Number of events in the original video | Number of events in the video summary | Accuracy |
|---|---|---|---|
| News 1: MBC 뉴스데스크 (MBC News Desk) | 19 | 18 | 94% |
| News 2: MBC 정오뉴스 (MBC Noon News) | 10 | 9 | 90% |
| News 3: KBS 9뉴스 (KBS 9 News) | 25 | 22 | 88% |
| News 4: CBS 노컷뉴스 (CBS No Cut News) | 20 | 18 | 90% |
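The accuracy figures can be checked with a few lines of arithmetic. The sketch below assumes accuracy is the number of events in the summary divided by the number of events in the original video, expressed as a whole percent and truncated rather than rounded, which is what reproduces the 94% listed for News 1. The dictionary layout and variable names are illustrative only.

```python
# (events in original video, events in summary) per news program
events = {
    "MBC News Desk": (19, 18),
    "MBC Noon News": (10, 9),
    "KBS 9 News": (25, 22),
    "CBS No Cut News": (20, 18),
}

# Whole-percent accuracy, truncated (integer division).
accuracy = {
    name: 100 * in_summary // in_original
    for name, (in_original, in_summary) in events.items()
}

# Mean of the per-program percentages.
mean_accuracy = sum(accuracy.values()) / len(accuracy)

if __name__ == "__main__":
    print(accuracy)
    print(mean_accuracy)
```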

FIG. 9(a) shows the result of summarizing a video into 5 images, and FIG. 9(b) shows the result of summarizing a video into 15 images.

In addition, user satisfaction was measured by comparing experimental results obtained with the present invention against results obtained with the prior art.

[Table 3]

| Program | Kim's algorithm | Shingo's algorithm | Marian's algorithm | Present invention |
|---|---|---|---|---|
| TV entertainment | 3.5 | 3.6 | 4.0 | 4.5 |
| Drama | 3.0 | 3.9 | 4.5 | 4.5 |
| Documentary | 3.0 | 3.0 | 4.0 | 4.8 |
| News | 3.6 | 3.6 | 4.3 | 4.8 |

To measure user satisfaction, 10 users were given the original video and the storyboard of summarized still images, and their satisfaction with each video was measured on a scale of 1 to 5. As Table 3 above shows, the video summarization according to the present invention achieves relatively high average satisfaction compared with the prior-art methods used in this test.

The difference between the summarization results of the prior art and of the present invention can also be seen in FIG. 10.

FIG. 10 is a diagram showing video summarization results obtained with the prior art and with the present invention, respectively.

As shown in FIG. 10, the left-hand video is a TV entertainment program in which characters appear relatively frequently (character ratio of 32%), and the right-hand video is a documentary in which characters appear relatively rarely (character ratio below 5%).

For the left-hand video, all 15 images in summary (a), produced with the first prior-art method (Marian's method), contain people, while summary (b), produced with the second prior-art method (Shingo's method), selects scenes that do not help in grasping the content of the video, such as strongly lit scenes. Summary (c) uses only the color features of the video, so adjacent shots are repeated when the color changes because of lighting. In contrast, summary (d), produced with the present invention, appropriately shows only the main scenes with characters and clear scene changes, which can improve user satisfaction.

For the right-hand documentary video, in which the character appearance frequency is below 5%, the prior-art summaries (a) to (c) repeatedly show similar natural scenery, whereas summary (d), produced with the present invention, appropriately includes both interview scenes in which characters appear and other scenes. The present invention thus improves user satisfaction through effective summarization even for documentary videos in which characters appear relatively rarely.

The video summarization method and system using visual features in a video may also be stored on a computer-readable recording medium on which a program for execution by a computer is recorded. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples of computer-readable recording devices include ROM, RAM, CD-ROM, DVD±ROM, DVD-RAM, magnetic tape, floppy disks, hard disks, and optical data storage devices. The computer-readable recording medium may also be distributed over computer devices connected by a network, so that computer-readable code is stored and executed in a distributed manner.

The video summarization method and system using visual features in a video according to the present invention compute the importance of the keyframes in a video using visual features such as the characters in the video to be summarized, the shot lengths of the keyframes, and color information, and then select keyframes based on the computed importance, so that the video can be summarized easily.

Although preferred embodiments of the present invention have been described above, the present invention is not limited to them. It may be practiced with various modifications within the scope of its technical idea, and such modifications naturally fall within the scope of the appended claims.

120: keyframe detection unit 140: character detection unit
160: importance calculation unit 180: selection unit

Claims

A video summarization method using visual features in a video, comprising:
a keyframe detection step of detecting at least one video shot from a video using a low-level feature vector, and then detecting a keyframe representing the detected video shot;
a character detection step of detecting a face in the detected keyframe and clustering the keyframes in which a face is detected to detect a character;
an importance calculation step of detecting, based on the number of exposures of the character in the keyframes in which the character is detected and on the low-level feature vectors of the keyframes in which a face is detected, a keyframe cluster including keyframes whose inter-keyframe distance is smaller than a preset threshold, and calculating an importance for each keyframe in the detected keyframe cluster; and
a selection step of selecting and extracting keyframes in descending order of importance according to the number of keyframes requested by a user.

The method of claim 1,
wherein the keyframe detection step
checks whether a character is present in the detected video shot and, when a character is present in the detected video shot, detects the corresponding video shot as a keyframe, or, when no character is present in the detected video shot, detects the first frame of the detected video shot as a keyframe.

The method of claim 1,
wherein the low-level feature vector
is a color histogram or a color correlogram.

The method of claim 1,
wherein the character detection step comprises:
a face detection process of detecting a face from the keyframe based on the edges of the eyes, nose, and mouth; and
a character detection process of detecting characters in the keyframes by clustering the keyframes in which a face is detected according to color information.

The method of claim 1,
wherein the importance calculation step comprises:
a process of calculating a distance between characters according to the number of exposures of the characters in the keyframes in which the characters are detected, and calculating the distance between the characters in two keyframes according to the calculated similarity;
an inter-keyframe distance calculation process of calculating the distance between the two keyframes based on the weights for the characters, the low-level feature vectors of the two keyframes, and the distance between the characters in the two keyframes;
a hierarchical tree structure formation process of detecting a keyframe cluster including keyframes for which the distance between the two keyframes is smaller than a preset threshold, and forming a hierarchical tree structure for the keyframes in the detected keyframe cluster; and
an importance calculation process of calculating an importance for each keyframe in the hierarchical tree structure.

The method of claim 5,
wherein the hierarchical tree structure formation process
uses a semi-Hausdorff distance algorithm to detect a keyframe cluster including keyframes for which the distance between two keyframes is smaller than a preset threshold.

The method of claim 5,
wherein the hierarchical tree structure
has three levels comprising one upper keyframe, at least one lower keyframe included in the upper keyframe, and at least one lowest keyframe included in the lower keyframe.

The method of claim 7,
wherein the shot length of the upper keyframe is the sum of the shot lengths of the at least one lower keyframe included in the upper keyframe.

The method of claim 7,
wherein the importance calculation process
calculates the importance of each keyframe by multiplying the distance between that keyframe and the upper keyframe in the hierarchical tree structure by the shot length of that keyframe.

A computer-readable recording medium on which is recorded a program for executing, on a computer, the method according to any one of claims 1 to 9.

A video summarization system using visual features in a video, comprising:
a keyframe detection unit that detects at least one video shot from a video using a low-level feature vector, and then detects a keyframe representing the detected video shot;
a character detection unit that detects a face in the detected keyframe and clusters the keyframes in which a face is detected to detect a character;
an importance calculation unit that detects, based on the number of exposures of the character in the keyframes in which the character is detected and on the low-level feature vectors of the keyframes in which a face is detected, a keyframe cluster including keyframes whose inter-keyframe distance is smaller than a preset threshold, and calculates an importance for each keyframe in the detected keyframe cluster; and
a selection unit that selects and extracts keyframes in descending order of importance according to the number of keyframes requested by a user.