KR100502429B1

KR100502429B1 - Apparatus and method for detecting user specified actor by using audio and visual features in drama genre video contents

Info

Publication number: KR100502429B1
Application number: KR10-2003-0007531A
Authority: KR
Inventors: 노용만; 배태면; 진성호; 추진호; 강경옥; 이한규; 이희경
Original assignee: 학교법인 한국정보통신학원; 한국전자통신연구원
Priority date: 2003-02-06
Filing date: 2003-02-06
Publication date: 2005-07-20
Also published as: KR20030038572A

Abstract

The present invention relates to a retrieval apparatus and method that use audio and visual information to detect the sections of drama-genre video content in which a specific person appears. The apparatus comprises: a scene section search unit that computes motion information, a color histogram, and a silent-section frequency from the video content and uses these values to find and provide scene sections within the content; a video analysis unit that analyzes the scene sections provided by the scene section search unit and provides their audio and video feature information; a search unit that finds the index of the section containing the frame number selected by the user, retrieves the feature information of that section, compares it with the feature information of the other sections, and provides the indices of the sections with high similarity; a candidate section display unit that displays the first scene of each high-similarity section provided by the search unit; and a video content player that is operated by the user and passes the specific scene the user wants to find to the search unit through a user interface unit. The invention can therefore also be used in video editing systems as a function for dividing video content into meaningful scene sections.

Description

Retrieval apparatus and method using audio and visual information for detecting the appearance sections of a specific person in drama genre video contents {APPARATUS AND METHOD FOR DETECTING USER SPECIFIED ACTOR BY USING AUDIO AND VISUAL FEATURES IN DRAMA GENRE VIDEO CONTENTS}

The present invention relates to a retrieval apparatus and method that use audio and visual information to detect the sections of drama-genre video content in which a specific person appears, and more particularly to an apparatus and method for retrieving scenes with specific content from drama-genre video content.

Video content retrieval systems are typically studied using the descriptors of MPEG-7, the standard established by the Moving Picture Experts Group (MPEG), a working group of subcommittee SC29 under JTC1 of ISO/IEC (International Organization for Standardization / International Electrotechnical Commission), the international standardization body for multimedia.

In such systems, the approach under study for finding specific content is to extract signal-level and statistical characteristics from the input video, attach the extracted characteristics to the video, and search by comparing those characteristics against one another. Audio information, where it is used for retrieval, serves mainly to classify the genre of the video content.

However, a search that assigns meaning to a scene, such as finding the scenes in which a specific person chosen by the viewer appears, cannot be carried out simply by comparing input images or audio information.

For example, when the video content is news, methods under study detect a person's face in the images and use it to find other scenes in which the same face appears, or extract the scenes in which the person's name is spoken by means of a speech recognition system. In drama-genre content, unlike news, however, face detection is not accurate.

The present invention has been devised to solve the above problem. Its object is to provide a retrieval apparatus and method that use audio information together with visual feature information, so that when a user watching TV designates a scene in which a desired person appears, scenes in the video content that are likely to contain the same person can be retrieved.

To achieve the above object, the retrieval apparatus according to the present invention, which uses audio and visual information to detect the appearance sections of a specific person in drama-genre video content, comprises: a scene section search unit that computes motion information, a color histogram, and a silent-section frequency from the video content and uses these values to find and provide scene sections within the content; a video analysis unit that analyzes the scene sections provided by the scene section search unit and provides their audio and video feature information; a search unit that finds the index of the section containing the frame number selected by the user, retrieves the feature information of that section, compares it with the feature information of the other sections, and provides the indices of the sections with high similarity; a candidate section display unit that displays the first scene of each high-similarity section provided by the search unit; and a video content player that is operated by the user and passes the specific scene the user wants to find to the search unit through a user interface unit.

Likewise, to achieve the above object, the retrieval method according to the present invention, which uses audio and visual information to detect the appearance sections of a specific person in drama-genre video content, comprises the steps of: in the scene section search unit, comparing the current frame with the previous frame within a scene of the video content to obtain motion prediction reliability values, and then averaging the motion information of the blocks over the whole image; in the scene section search unit, if the motion prediction reliability value is greater than or equal to a threshold, treating the frame as one at which a scene change may occur, computing its color histogram and the distance to the color histogram of the previous frame, and, if that distance is greater than or equal to a threshold, determining that a scene change has occurred and computing the silent-section frequency; in the scene section search unit, if the computed silent-section frequency is greater than or equal to a threshold, setting the frame as the first frame of a new section; in the video analysis unit, assigning indices in time order to the scene sections that can be used for retrieval, computing the LPCC of the audio within each scene section, and applying vector quantization to the computed LPCCs to extract a codebook as feature information; in the video content player, providing the specific scene that the user wants to find; and in the search unit, finding the index of the section containing the frame number selected by the user, retrieving the feature information of that section, comparing it with the feature information of the other sections, and displaying the indices of the high-similarity sections so that the user can select and view them.

Hereinafter, an embodiment of the present invention is described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an apparatus for retrieving the appearance sections of a specific person from drama-genre video content according to the present invention. It comprises a scene section search unit 10, a video analysis unit 20, a search unit 30, a candidate section display unit 40, a user interface unit 50, and a video content player 60.

The scene section search unit 10 is the block that analyzes the video content. It computes the motion information, color histogram, and silent-section frequency of the images, uses them to detect scene changes, and provides the resulting scene sections to the video analysis unit 20.

Here, the scene section search unit 10 records a section as unusable for retrieval when the scene selected by the user contains no speech, when the silent-section frequency is low, or when the motion is too fast.

For each scene section provided by the scene section search unit 10, the video analysis unit 20 extracts the audio and video feature information, namely the motion information, color histogram, and silent-section frequency of the section and a codebook obtained from Linear Prediction Coefficient Cepstrum (LPCC) coefficients, and provides it to the search unit 30.

That is, the video content to be searched is divided into several sections by the scene section search unit 10, and the video analysis unit 20 extracts and stores audio-visual feature information for each section.

The search unit 30 uses the frame number (or playback time) to find the section to which the user-selected frame belongs and takes it as the input video section for the search. It selects N sections using the similarity between the LPCC codebook of the input section and those of the other sections in the video content, then uses the codebook and color histogram information to select, from those N sections, the M most likely sections, and provides the first scene of each of the M sections to the candidate section display unit 40.

The candidate section display unit 40 displays the first scene of each of the M sections provided by the search unit 30 so that the user can select and view them.

The user interface unit 50 is the block that interfaces the search unit 30 with the video content player 60; it passes the temporal position of the scene supplied by the video content player 60 to the search unit 30.

The video content player 60 is operated by the user. When a person the user wants to find appears while the user is watching TV, the temporal position of the selected scene in the content, that is, its frame number or time, is entered through an input device (not shown) such as a remote control or keyboard linked to the video content player 60 and provided to the search unit 30 through the user interface unit 50.

Based on the configuration described above, the retrieval method according to the present invention, which uses audio and visual information to detect the appearance sections of a specific person in drama-genre video content, is now described in detail with reference to the flowchart of FIG. 2.

First, the scene section search unit 10 divides the video content into meaningful scene sections as a preprocessing step. Within a scene, it compares the current frame with the previous frame to obtain motion information (step 203).

Here, when motion prediction from the current frame to the previous frame is performed, the frames are regarded as a continuous scene if the ratio of the number of blocks whose prediction reliability is below a threshold to the total number of blocks in the image is higher than a threshold and the color histogram difference, computed next, is smaller than its threshold.

Next, for a section regarded as a continuous scene, the scene section search unit 10 computes the silent-section frequency and records the section as unsearchable if this value is greater than a set threshold or if the motion information value is smaller than a set threshold. The color histogram stored as feature information is computed not over the whole frame but only over the region in which motion is present.

Specifically, to obtain the motion information used by the scene section search unit 10, the image is divided into 16×16 blocks, a motion prediction reliability value is computed for each block against the previous image using the mean square error (MSE), and the motion information of the blocks is then averaged over the whole image (step 204).

The motion information is computed as expressed in Equation 1.

It is then determined whether the motion prediction reliability value obtained for each block by the MSE method is lower than a threshold (step 205).

If the motion prediction reliability value is lower than the threshold in determination step 205, the block is regarded as one whose motion prediction cannot be trusted, and the motion prediction reliability of the frame is obtained by dividing the number of unreliable blocks by the number of blocks in one image frame (step 206).
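The block-level test of steps 204 to 206 can be sketched in Python as follows. This is only an illustration under stated assumptions: Equations 1 and 2 are not reproduced in this text, so the sketch compares co-located blocks without any motion search and treats a block as unreliable when its MSE exceeds a limit. The function names and the `mse_threshold` parameter are hypothetical.

```python
def block_mse(cur, prev, top, left, size):
    """Mean squared error between co-located size x size blocks of two frames."""
    total = 0.0
    for y in range(top, top + size):
        for x in range(left, left + size):
            d = cur[y][x] - prev[y][x]
            total += d * d
    return total / (size * size)


def unreliable_block_ratio(cur, prev, mse_threshold, size=16):
    """Fraction of blocks whose prediction from the previous frame is unreliable.

    Assumption: a high MSE stands for a low motion prediction reliability.
    Frames are 2-D lists of luminance values; partial edge blocks are skipped.
    """
    height, width = len(cur), len(cur[0])
    unreliable = 0
    blocks = 0
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            blocks += 1
            if block_mse(cur, prev, top, left, size) > mse_threshold:
                unreliable += 1
    return unreliable / blocks if blocks else 0.0
```

A real implementation would probably search a neighbourhood in the previous frame for the best-matching block before computing the MSE; the zero-displacement comparison here just keeps the sketch short.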

If the motion prediction reliability is greater than or equal to the threshold in determination step 205, the frame is regarded as one at which a scene change may occur, and its color histogram is computed together with the distance to the color histogram of the previous frame (step 207). The motion prediction reliability of an image frame is expressed as Equation 2.

Next, once the distance between the color histograms has been calculated, it is checked whether that distance is lower than a threshold (step 208).

If the distance between the color histograms is lower than the threshold in check step 208, the next frame is input (step 206).

If the distance between the color histograms is greater than or equal to the threshold in check step 208, it is determined that a scene change has occurred, and the silent-section frequency is calculated (step 209).

After the silent-section frequency has been calculated, it is determined whether it is lower than a threshold (step 210).

If the silent-section frequency is lower than the threshold in determination step 210, the next frame is input (step 206).

If the silent-section frequency is greater than or equal to the threshold in determination step 210, a scene change is considered to have occurred, and the frame is set as the first frame of a new section (step 211).
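The cascade of checks in steps 205 to 211 can be summarised in a short Python sketch. It assumes the three per-frame measurements have already been computed; the function name and the three threshold parameters are hypothetical, since the text gives no concrete values.

```python
def is_new_section_start(unreliable_ratio, hist_distance, silence_freq,
                         motion_thr, hist_thr, silence_thr):
    """Return True when a frame should open a new scene section.

    The checks run in the order described in the text: motion first,
    then color histogram distance, then silent-section frequency.
    """
    if unreliable_ratio < motion_thr:
        return False  # motion prediction is still trusted: same scene
    if hist_distance < hist_thr:
        return False  # colors barely changed: move on to the next frame
    # A scene change was detected visually; the audio check decides
    # whether this frame becomes the first frame of a new section.
    return silence_freq >= silence_thr
```

Ordering the checks this way means the histogram and the audio measure only need to be evaluated for frames that already failed the cheaper motion test.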

The silent-section frequency is expressed as Equation 3: the audio signal is divided into audio frames of 20 msec, and the silent-section frequency is the ratio of the number of audio frames whose signal magnitude is below a threshold to the total number of audio frames.
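A minimal Python sketch of this ratio follows. Equation 3 itself is not reproduced in this text, so the sketch assumes that the "magnitude" of a 20 msec frame is its mean absolute amplitude; the function and parameter names are hypothetical.

```python
def silence_frequency(samples, sample_rate, amplitude_threshold, frame_ms=20):
    """Ratio of silent audio frames to all audio frames.

    samples: sequence of PCM sample values.
    A frame counts as silent when its mean absolute amplitude is below
    amplitude_threshold (an assumed definition of signal magnitude).
    """
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        return 0.0
    # Only complete frames are considered; a trailing partial frame is dropped.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    if not frames:
        return 0.0
    silent = sum(1 for f in frames
                 if sum(abs(s) for s in f) / len(f) < amplitude_threshold)
    return silent / len(frames)
```

For example, at an 8 kHz sampling rate each frame holds 160 samples, and a section whose frames are half silent yields a value of 0.5.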

When the scene section search is finished, the video content has been divided into sections, and the video analysis unit 20 receives its input section by section.

For each input section, the video analysis unit 20 measures the color histogram of each image frame using Equation 4 and the distance between the color histograms of the previous and current frames using Equation 5 (step 212).
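The histogram measurement and comparison can be illustrated with the Python sketch below. Equations 4 and 5 are not reproduced in this text, so the quantisation (a joint RGB histogram with a few bins per channel) and the distance (sum of absolute bin differences) are assumptions chosen for illustration, and all names are hypothetical.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized joint RGB histogram of an iterable of (r, g, b) pixels (0..255)."""
    pixels = list(pixels)
    step = 256 / bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        ri = min(int(r / step), bins_per_channel - 1)
        gi = min(int(g / step), bins_per_channel - 1)
        bi = min(int(b / step), bins_per_channel - 1)
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    n = len(pixels)
    return [h / n for h in hist] if n else hist


def histogram_distance(h1, h2):
    """L1 distance between two normalized histograms (0 = identical, 2 = disjoint)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))
```

To follow the description, the `pixels` argument would hold only the pixels of the region in which motion was detected, not the whole frame.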

The video analysis unit 20 assigns indices in time order to those scene sections detected by the scene section search unit 10 that can be used for retrieval, computes the LPCC of the audio within each scene section in units of 20 msec audio frames, applies vector quantization to the computed LPCCs to extract a codebook of 32 code vectors as feature information, and provides it to the search unit 30 (step 213).

The vector quantization uses the LBG algorithm. The motion information, color histogram, and silent-section frequency calculated by the scene section search unit 10 and the VQ codebook of the LPCC coefficients are stored, together with the section index, as the feature information of the scene section.
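For readers unfamiliar with the LBG (Linde-Buzo-Gray) algorithm, the Python sketch below shows its usual splitting form: start from the global centroid, split every code vector into two perturbed copies, and refine with nearest-neighbour reassignment. It is a simplified illustration, not the implementation used in the patent; the additive perturbation `eps`, the fixed iteration count, and all function names are assumptions.

```python
def _nearest(vec, codebook):
    """Index of the code vector closest to vec (squared Euclidean distance)."""
    best, best_d = 0, float("inf")
    for i, c in enumerate(codebook):
        d = sum((a - b) ** 2 for a, b in zip(vec, c))
        if d < best_d:
            best, best_d = i, d
    return best


def _centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg_codebook(vectors, size, eps=0.01, iterations=20):
    """Train a codebook of `size` code vectors (size should be a power of two)."""
    codebook = [_centroid(vectors)]
    while len(codebook) < size:
        # Split every code vector into two slightly perturbed copies.
        codebook = ([[x + eps for x in c] for c in codebook] +
                    [[x - eps for x in c] for c in codebook])
        for _ in range(iterations):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[_nearest(v, codebook)].append(v)
            # Empty cells keep their previous code vector.
            codebook = [_centroid(cell) if cell else old
                        for cell, old in zip(cells, codebook)]
    return codebook
```

With `size=32` and the LPCC vectors of one scene section as input, the result would correspond to the per-section codebook described above.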

The LPCC coefficients of the audio can be obtained by applying Equation 6.
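Equation 6 is not reproduced in this text. As a point of reference, the Python sketch below implements the textbook recursion that converts linear prediction coefficients into cepstral coefficients, assuming the all-pole model H(z) = 1 / (1 - Σ a_k z^-k). The patent's own equation may use a different sign convention or include a gain term, and the function name is hypothetical.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients c_1..c_n_ceps.

    a[k - 1] holds a_k of the predictor x[n] ~ sum_k a_k * x[n - k].
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k / n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p. The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

A quick sanity check is the single-pole case: for `a = [0.5]` the cepstrum of 1 / (1 - 0.5 z^-1) is 0.5^n / n, which the recursion reproduces.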

Meanwhile, when a scene the user wants to find (for example, one showing a specific person) appears while the user is watching TV with the video content player 60, the temporal position of the scene selected by the user, that is, its frame number or time, is entered through an input device (not shown) such as a remote control or keyboard and provided to the search unit 30 through the user interface unit 50 (step 214).

The search unit 30 finds the index of the section to which the user-selected frame belongs, that is, the section containing the frame number (or playback time) (step 215), retrieves the feature information of the section with that index (step 216), compares it with the feature information of the other sections, and outputs the indices of the sections with high similarity to the candidate section display unit 40 (step 217).
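The lookup in step 215 amounts to finding which time-ordered section contains a given frame number. A minimal Python sketch, assuming the sections are stored as a sorted list of their first frame numbers (a hypothetical data layout not specified in the text), is:

```python
from bisect import bisect_right


def find_section_index(section_starts, frame_number):
    """Index of the section containing frame_number.

    section_starts: sorted list with the first frame number of each section.
    """
    i = bisect_right(section_starts, frame_number) - 1
    if i < 0:
        raise ValueError("frame precedes the first section")
    return i
```

The same function works with playback times instead of frame numbers, as long as the section boundaries are expressed in the same unit.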

That is, the search unit 30 compares the codebook of the section containing the input frame number with the codebooks of the other sections and ranks the N most similar sections in order of similarity. The similarity between codebook sets is determined by applying Equation 7.

The search unit 30 then measures the similarity of the selected N sections to the input section using the color histogram information. The color histogram similarity is measured for the sections other than those whose codebook-set similarity exceeds a threshold, the M sections with the highest similarity are selected, and the first scene of each of the M sections is provided to the candidate section display unit 40. Here, M is an integer smaller than N and greater than 0.
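The two-stage selection (N candidates by audio codebook, then M by color histogram) can be sketched in Python as below. Equation 7 is not reproduced in this text, so the codebook comparison used here, a symmetrised mean nearest-code-vector distance, is an assumption, as are the dictionary layout and the function names. The additional threshold-based exclusion mentioned in the description is left out of the sketch.

```python
def codebook_distance(cb_a, cb_b):
    """Symmetrised mean squared distance from each code vector to its nearest match."""
    def one_way(src, dst):
        return sum(min(sum((x - y) ** 2 for x, y in zip(u, v)) for v in dst)
                   for u in src) / len(src)
    return 0.5 * (one_way(cb_a, cb_b) + one_way(cb_b, cb_a))


def rank_candidates(query, sections, n, m):
    """Return the indices of the m best sections among the n audio-closest ones.

    query and each element of sections are dicts with the keys
    'index', 'codebook' (list of vectors) and 'histogram' (list of floats).
    """
    # Stage 1: keep the n sections whose codebooks are closest to the query.
    by_audio = sorted(
        sections,
        key=lambda s: codebook_distance(query["codebook"], s["codebook"]))[:n]
    # Stage 2: re-rank those n sections by color histogram distance.
    by_color = sorted(
        by_audio,
        key=lambda s: sum(abs(a - b)
                          for a, b in zip(query["histogram"], s["histogram"])))[:m]
    return [s["index"] for s in by_color]
```

Using the audio feature as the first filter reflects the motivation given earlier in the description: in drama content the voice is a more dependable cue for a person than the face, and the histogram then refines the shortlist.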

The candidate section display unit 40 displays the first scene of each of the M sections provided by the search unit 30 so that the user can select and view them (step 218).

As described above, the present invention provides a retrieval method that uses audio information together with visual feature information, so that when a user watching TV designates a scene in which a desired person appears, scenes in the video content likely to contain the same person are retrieved. The invention can also be used in video editing systems as a function for dividing video content into meaningful scene sections.

FIG. 1 is a block diagram of an apparatus for retrieving the appearance sections of a specific person from drama-genre video content according to the present invention.

FIG. 2 is a detailed flowchart of the retrieval method using audio and visual information for detecting the appearance sections of a specific person in drama-genre video content according to the present invention.

<Description of the reference numerals for the main parts of the drawings>

10: scene section search unit
20: video analysis unit

30: search unit
40: candidate section display unit

50: user interface unit
60: video content player

Claims

A retrieval apparatus using audio and visual information for detecting the appearance sections of a specific person in drama-genre video content, the apparatus comprising:

a scene section search unit that obtains motion information, a color histogram, and a silent-section frequency from the video content and uses the obtained values to find and provide scene sections within the video content;

a video analysis unit that analyzes the scene sections provided by the scene section search unit and provides their audio feature information and video feature information;

a search unit that finds the index of the section containing the frame number selected by the user, retrieves the feature information of the section with that index, compares it with the feature information of the other sections, and provides the indices of the sections with high similarity;

a candidate section display unit that displays the first scene of each high-similarity section provided by the search unit; and

a video content player that is operated by the user and provides the specific scene the user wants to find to the search unit through a user interface unit.

The apparatus of claim 1,

wherein the scene section search unit records a section as unusable for retrieval when the scene selected by the user contains no speech, when the silent-section frequency is low, or when the motion is too fast.

The apparatus of claim 1,

wherein the feature information comprises the motion information within the scene section, the color histogram, the silent-section frequency, and a codebook obtained from Linear Prediction Coefficient Cepstrum (LPCC) coefficients.

The apparatus of claim 1,

wherein the video content to be searched is divided into several sections by the scene section search unit, and the video analysis unit extracts and stores audio-visual feature information for each section.

The apparatus of claim 1,

wherein the user performs the search by selecting a scene in which the specific person appears.

The apparatus of claim 4,

wherein the scene sections that can be used for retrieval are selected from among the divided sections using the motion information and the silent-section frequency.

A retrieval method using audio and visual information for detecting the appearance sections of a specific person in drama-genre video content, performed with a scene section search unit, a video analysis unit, a video content player, and a search unit, the method comprising:

in the scene section search unit, comparing the current frame with the previous frame within a scene of the video content to obtain motion prediction reliability values, and then averaging the motion information of the blocks over the whole image;

in the scene section search unit, if the motion prediction reliability value is greater than or equal to a threshold, treating the frame as one at which a scene change may occur, computing its color histogram and the distance to the color histogram of the previous frame, and, if that distance is greater than or equal to a threshold, determining that a scene change has occurred and calculating the silent-section frequency;

in the scene section search unit, setting the frame as the first frame of a new section if the calculated silent-section frequency is greater than or equal to a threshold;

in the video analysis unit, assigning indices in time order to the scene sections that can be used for retrieval, calculating the LPCC of the audio within each scene section, and applying vector quantization to the calculated LPCCs to extract a codebook as feature information;

in the video content player, providing the specific scene that the user wants to find; and

in the search unit, finding the index of the section containing the frame number selected by the user, retrieving the feature information of the section with that index, comparing it with the feature information of the other sections, and displaying the indices of the high-similarity sections so that the user can select and view them.

The method of claim 7, wherein

when motion is estimated from the current frame against the previous frame, and the ratio of the number of blocks whose motion prediction reliability is smaller than a threshold to the total number of blocks in the image is higher than a threshold, the difference between the color histograms is obtained, and the frames are regarded as a continuous scene when that difference is smaller than a threshold.

The method of claim 7, wherein

the motion information is expressed and calculated by

Equation 1

The method of claim 7, wherein

the motion prediction reliability of the image frame is expressed and calculated by

Equation 2

The method of claim 7, wherein

the silence interval frequency is expressed by

Equation 3

where the audio is divided into 20 msec speech frames, and the silence interval frequency is the ratio of the number of speech frames that fall below a speech threshold to the total number of speech frames.
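As an illustration of the ratio described in this claim, the sketch below splits an audio signal into 20 msec frames and counts the share of frames below a threshold. Equation 3 is not reproduced in this text, so using mean squared amplitude as the per-frame measure is an assumption, and the function and parameter names are hypothetical.

```python
from typing import Sequence


def silence_interval_frequency(samples: Sequence[float],
                               sample_rate: int,
                               energy_threshold: float,
                               frame_ms: int = 20) -> float:
    """Ratio of low-energy frames to all frames (frame measure is an assumption)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("sample_rate too low for the chosen frame length")
    n_frames = len(samples) // frame_len  # drop the trailing partial frame
    if n_frames == 0:
        return 0.0
    silent = 0
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len  # mean squared amplitude
        if energy < energy_threshold:
            silent += 1
    return silent / n_frames
```

For example, a signal whose first half is silent and whose second half is loud yields a frequency of 0.5.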

The method of claim 7, wherein

the color histogram of each frame is measured by applying

Equation 4

and the distance between the color histograms of the previous frame and the current frame is measured by applying

Equation 5
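A histogram comparison of this kind might look like the sketch below. Equations 4 and 5 are not reproduced in this text, so the uniform RGB quantization, the normalization, and the sum-of-absolute-differences distance are all assumptions chosen for illustration; the function names are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple


def color_histogram(pixels: Iterable[Tuple[int, int, int]],
                    bins_per_channel: int = 4) -> List[float]:
    """Normalized RGB histogram of 8-bit pixels (quantization is an assumption)."""
    b = bins_per_channel
    hist = [0.0] * (b ** 3)
    count = 0
    for r, g, bl in pixels:
        # Map each 0..255 channel value to one of b bins, then flatten to one index.
        idx = ((r * b // 256) * b + (g * b // 256)) * b + (bl * b // 256)
        hist[idx] += 1.0
        count += 1
    if count:
        hist = [h / count for h in hist]
    return hist


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Sum of absolute bin differences (an assumed stand-in for Equation 5)."""
    return sum(abs(a - c) for a, c in zip(h1, h2))
```

With normalized histograms this distance ranges from 0 (identical color distributions) to 2 (no shared bins), which makes a fixed threshold easier to choose.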

The method of claim 7, wherein

the vector quantization uses the LBG algorithm, and the motion information, the color histogram, and the silence interval frequency calculated by the scene section search unit, together with the VQ codebook of the LPCC coefficients, are stored with the section index as the feature information of the scene section.
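For readers unfamiliar with the LBG (Linde-Buzo-Gray) algorithm named in this claim, the following is a minimal sketch of its usual splitting form. The additive perturbation, the fixed iteration count, and the function names are assumptions, and `size` is assumed to be a power of two; the patent text does not state these details.

```python
from typing import List, Sequence


def _sqdist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg_codebook(vectors: Sequence[Sequence[float]], size: int,
                 eps: float = 0.01, iters: int = 20) -> List[List[float]]:
    """Grow a codebook by splitting codewords and refining with Lloyd iterations."""
    codebook = [_mean(vectors)]  # start from the global centroid
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells: List[list] = [[] for _ in codebook]
            for v in vectors:
                nearest = min(range(len(codebook)),
                              key=lambda k: _sqdist(v, codebook[k]))
                cells[nearest].append(v)
            # Move each codeword to the centroid of its cell; keep it if the cell is empty.
            codebook = [_mean(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook
```

Applied to the LPCC vectors of one scene section, the resulting codebook would serve as that section's compact audio signature.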

The method of claim 13,

the LPCC coefficients are obtained by applying

Equation 6
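Equation 6 is not reproduced in this text. As a point of reference only, the sketch below uses the standard textbook recursion that converts linear prediction coefficients into cepstral coefficients, assuming the predictor convention s[n] ≈ Σ a_k s[n-k]; whether the patent uses exactly this form is an assumption, and the function name is hypothetical.

```python
from typing import List, Sequence


def lpc_to_lpcc(a: Sequence[float], n_ceps: int) -> List[float]:
    """Cepstral coefficients c_1..c_n_ceps from LPC coefficients a_1..a_p.

    c_n = a_n + sum_{k} (k / n) * c_k * a_{n-k}, with a_n taken as 0 for n > p
    and k running over the indexes for which both c_k and a_{n-k} exist.
    """
    p = len(a)
    c: List[float] = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

A quick sanity check is the single-pole case: for one coefficient a, the cepstrum is a^n / n, so `lpc_to_lpcc([0.5], 3)` should give approximately 0.5, 0.125, and 0.0417.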

The method of claim 7, wherein

the codebook extracted as the feature information is compared with the codebooks of the other sections, an arbitrary number of sections are sorted in ascending order of similarity, and the similarity is determined by applying

Equation 7
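The comparison and ranking step can be illustrated as below. Equation 7 is not reproduced in this text, so the measure used here (the average squared distance from each query codeword to its nearest codeword in the candidate codebook, where a smaller value means a closer match) is an assumption, as are the function names and the dictionary layout.

```python
from typing import Dict, List, Sequence

Codebook = Sequence[Sequence[float]]


def codebook_distortion(query: Codebook, candidate: Codebook) -> float:
    """Average nearest-codeword squared distance (assumed stand-in for Equation 7)."""
    total = 0.0
    for q in query:
        total += min(sum((x - y) ** 2 for x, y in zip(q, c)) for c in candidate)
    return total / len(query)


def rank_sections(query_index: int, codebooks: Dict[int, Codebook],
                  top_n: int) -> List[int]:
    """Return the top_n section indexes whose codebooks best match the query section."""
    query = codebooks[query_index]
    scored = [(codebook_distortion(query, cb), idx)
              for idx, cb in codebooks.items() if idx != query_index]
    scored.sort()  # ascending distortion: closest sections first
    return [idx for _, idx in scored[:top_n]]
```

The returned indexes correspond to the candidate sections whose first scenes would be shown to the user for selection.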

The method of claim 7, wherein

when the motion prediction reliability value of a block is lower than a threshold, the block is regarded as a block whose motion prediction is unreliable, and the motion prediction reliability of the frame is obtained by dividing the number of unreliable blocks by the number of blocks in one image frame.
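The frame-level value described in this claim is a simple ratio, sketched below. The per-block reliability scores are taken as given inputs, since the formula that produces them is not reproduced in this text; the function and parameter names are hypothetical.

```python
from typing import Sequence


def frame_motion_reliability(block_reliabilities: Sequence[float],
                             block_threshold: float) -> float:
    """Number of unreliable blocks divided by the number of blocks in the frame."""
    if not block_reliabilities:
        raise ValueError("a frame must contain at least one block")
    unreliable = sum(1 for r in block_reliabilities if r < block_threshold)
    return unreliable / len(block_reliabilities)
```

Note that, as defined here, a larger value means more blocks could not be predicted from the previous frame, which is why a value at or above the frame threshold marks the frame as a scene-change candidate.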

The method of claim 7, wherein

the method further comprises inputting the next frame if the distance between the color histograms is lower than the threshold.

The method of claim 7, wherein

the method further comprises inputting the next frame if the silence interval frequency is lower than the threshold.

19. A recording medium on which a program implementing the method of any one of claims 7 to 18 is recorded.