KR20030067133A

KR20030067133A - Intelligent DVR system using a content based automatic video parsing

Info

Publication number: KR20030067133A
Application number: KR1020020006996A
Authority: KR
Inventors: 민병우; 박진우; 이병설; 박준모; 김장환; 윤현진; 김덕현
Original assignee: (주)지토
Priority date: 2002-02-07
Filing date: 2002-02-07
Publication date: 2003-08-14

Abstract

PURPOSE: An intelligent digital video recording system utilizing a technique of automatically segmenting a motion picture is provided to improve accessibility of motion pictures and to easily find a desired portion of a motion picture. CONSTITUTION: An apparatus for automatically segmenting digital motion-picture data includes a motion picture capturing unit(101) for converting an NTSC signal into a digital signal to extract standard continuous frames, and a calculator(103) for analyzing the frames generated by the motion picture capturing unit to calculate a characteristic quantity of a pixel value difference and histogram difference. The characteristic quantity of the pixel value difference and histogram difference, calculated by the characteristic quantity calculator, and the frames generated by the motion picture capturing unit are analyzed to calculate a characteristic quantity of histogram dispersion of a differential image and pixel value dispersion of each frame.

Description

Intelligent DVR system using a content based automatic video parsing

The present invention relates to a digital video recorder (DVR) system that, while recording, automatically segments and structures digital video data based on its content and selects a representative frame from each segmented unit video, thereby enabling flexible video search and browsing. More specifically, it relates to a system that integrates a digital video recorder with built-in functions for automatically segmenting video scenes and extracting representative frames that represent the content of each segmented scene, together with a content-based video browser that uses the structure information of the recorded video.

In conventional digital video recorder systems, the development of various compression techniques has largely resolved the problem of enormous data size. However, because video information is inherently sequential, browsing remains very difficult and time-consuming.

In general, one of the greatest advantages of digitized video is that, unlike videotape, any point can be accessed immediately. Videotape allows only linear access that depends on the FF (Fast Forward) and FB (Fast Backward) functions, whereas digitized video allows nonlinear access that moves directly to a desired point in time. In a digital video recorder, the main access points are precisely the parts where the picture changes, such as the movement of an object or a change in lighting.

The practical problem, however, is that even when a jump function is provided, the user does not know where to jump. Players in digital video recorder systems generally offer a control such as a slider bar for moving to a desired position in the video, but this is of little use when the user does not know which part of the video contains the desired footage. As a result, even with recorded digital video the user has no choice but to rely on the FF and FB functions to find the desired part, just as with a VCR. This not only wastes one of the greatest advantages of digitized media but is also at odds with the purpose of a digital video recorder system.

As a countermeasure, approaches that structure video data by segmenting sequential video based on its content have been explored, and much research has been done. Until now, however, segmentation has often been performed manually, so segmentation and structuring take far too much time and waste labor.

To solve this problem, research on automating the segmentation task has been actively pursued. Various methods have been proposed: first, finding scene change points by calculating the difference in pixel values between consecutive frames; second, finding scene change points by calculating the difference in histograms between consecutive frames; and third, finding scene change points by analyzing the motion between consecutive frames.

However, the accuracy of these methods is still insufficient, and they have difficulty handling gradual scene changes.

In addition, to present the segmented video, it is necessary to select a frame that can represent each segmented video unit, that is, a representative frame. Conventional research has taken a mechanical approach to this, such as selecting one frame per second or selecting the first, last, or middle frame of each segmented unit video.

However, the representative frame is an important factor that determines the quality of service when the user browses or searches by image features, so a more careful selection is required.

An object of the present invention is to solve the above problems and improve the accessibility of video by providing a digital video recorder system, and a method therefor, that applies an automatic digital video data segmentation apparatus. The system extracts internal structure information from the video being recorded by the digital video recorder to create a TOC (Table Of Contents) of the video data, and structures the entire video by treating the span from one scene change to the next scene change as one level. This makes it easy to find a desired part and also makes it possible to select representative frames that are meaningful in terms of content.

A first means for achieving the object of the present invention comprises: feature quantity calculating means that analyzes digital image frames obtained by analog-to-digital conversion of an NTSC signal and calculates feature quantities including the pixel value difference, the histogram difference, the histogram variance of the difference image, and the pixel value variance of each frame; first scene change point detecting means that determines scene changes by integrating, with an artificial neural network, the pixel value difference, the histogram difference, and the histogram variance of the difference image calculated by the feature quantity calculating means; second scene change point detecting means that detects gradual scene change points by monitoring the pixel value variance of each frame calculated by the feature quantity calculating means; and representative frame selecting means that, for the unit videos delimited by the scene change points detected by the first and second scene change point detecting means, classifies the trend of the pixel value difference into patterns and applies a selection method suited to each pattern to select representative frames that are meaningful in terms of content.

A second means for achieving the object of the present invention comprises: a feature quantity calculating step of analyzing digital image frames obtained by analog-to-digital conversion of an NTSC signal and calculating feature quantities including the pixel value difference, the histogram difference, the histogram variance of the difference image, and the pixel value variance of each frame; a first scene change point detecting step of determining scene changes by integrating, with an artificial neural network, the pixel value difference, the histogram difference, and the histogram variance of the difference image calculated in the feature quantity calculating step; a second scene change point detecting step of detecting gradual scene change points by monitoring the pixel value variance of each frame calculated in the feature quantity calculating step; and a representative frame selecting step of, for the unit videos delimited by the scene change points detected in the first and second scene change point detecting steps, classifying the trend of the pixel value difference into patterns and applying a selection method suited to each pattern to select representative frames that are meaningful in terms of content.

The search and browsing information for the video data obtained through this process is stored in a single file format, one developed for storing information related to a video file. A further feature of the present invention is that it provides a browser that analyzes the sequentially organized video into a hierarchical structure and displays it visually, so that the structured video can be presented and the desired location easily found.

FIG. 1 is a block diagram of a digital video recorder system according to an embodiment of the present invention.

FIG. 2 is a representative distribution of the difference-image histogram for consecutive frames.

FIG. 3 is a representative distribution of the difference-image histogram when a scene change occurs.

FIG. 4 shows the change in the difference-image histogram when there is a momentary change in screen brightness.

FIG. 5 is a structural diagram of the artificial neural network used to integrate the three feature quantities: the pixel value difference, the histogram difference, and the variance of the difference-image histogram.

FIG. 6 is a conceptual diagram of the method for detecting fading from changes in the variance value.

FIG. 7 is an example showing representative patterns of change in the pixel value difference.

<Explanation of symbols for the main parts of the drawings>

100: camera
101: video capture unit
102: DVR recording unit
103: feature quantity calculation unit
104: artificial neural network
105, 106: scene change point detection units
107: representative frame selection unit
108: video browser

The intelligent digital video recorder system according to the present invention, which applies automatic video segmentation, presents a hierarchical data model in which the span from one scene change to the next scene change is classified as one level, in order to enable nonlinear access to the points where the scene changes.
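The one-level segmentation described above can be illustrated with a minimal sketch. The `Shot` type, the `build_shots` function, and the convention that a cut point is the index of the first frame of a new scene are assumptions made for illustration; the patent does not specify a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Shot:
    start: int  # index of the first frame in the shot
    end: int    # index of the last frame in the shot (inclusive)


def build_shots(num_frames: int, cut_points: List[int]) -> List[Shot]:
    """Split frames 0..num_frames-1 into shots.

    Each cut point is assumed to be the index of the first frame
    of a new scene (hypothetical convention).
    """
    shots: List[Shot] = []
    start = 0
    for cut in sorted(set(cut_points)):
        if 0 < cut < num_frames:
            shots.append(Shot(start, cut - 1))
            start = cut
    if num_frames > 0:
        shots.append(Shot(start, num_frames - 1))
    return shots


# A 10-frame recording with scene changes at frames 4 and 7.
toc = build_shots(10, [4, 7])
```

Each `Shot` is one entry of a table of contents, so a player could jump directly to `shot.start` instead of scanning with FF or FB.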

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a digital video recorder system applying the automatic video segmentation technique according to an embodiment of the present invention.

As shown in FIG. 1, the automatic digital video data segmentation apparatus according to an embodiment of the present invention comprises: a video capture unit (101) that A/D-converts an NTSC signal to generate a sequence of individual frames; a feature quantity calculation unit (103) that analyzes the sequence of frames generated by the video capture unit (101) and calculates feature quantities including the pixel value difference, the histogram difference, the histogram variance of the difference image, and the pixel value variance of each frame; a scene change point detection unit (105) that determines scene changes by integrating, with an artificial neural network (104), the pixel value difference, the histogram difference, and the histogram variance of the difference image calculated by the feature quantity calculation unit (103); a scene change point detection unit (106) that detects gradual scene change points by monitoring the pixel value variance of each frame calculated by the feature quantity calculation unit (103); a representative frame selection unit (107) that selects representative frames by monitoring the trend of the pixel value difference within the units delimited by the scene change points detected by the scene change point detection units (105) and (106); and a video browser (108) that provides a browsing service to the user using the search and browsing information for the video data obtained by the representative frame selection unit (107).

The operation of the digital video recorder system applying the automatic video segmentation technique according to the embodiment of the present invention, configured as described above, is described in detail below with reference to FIGS. 2 and 7.

First, the video capture unit (101) A/D-converts the NTSC signal to generate individual digital frames. The feature quantity calculation unit (103) calculates, from the generated frames, feature quantities that measure the change between consecutive frames. As shown in FIG. 1, the feature quantities obtained here are the pixel value difference, the histogram difference, and the variance of the difference-image histogram, which is a distinguishing feature of the present invention.

Next, the scene change point detection unit (105) inputs these feature quantities into the artificial neural network (104) and detects scene changes based on the integrated result.

At the same time, the scene change point detection unit (106) obtains the variance of the pixel values of each frame; when the value falls below a certain threshold, it determines that fading is occurring and identifies a scene change point.

Meanwhile, after the scene change points are detected, the representative frame selection unit (107) selects, from each segmented unit video, representative frames that can represent that unit. To do so, it analyzes the pattern of change in the pixel value difference, which is particularly sensitive to motion, and selects representative frames that are meaningful in terms of content.

The operation of each part is described in detail below.

As described above, the feature quantities that have conventionally been used for scene change point detection, as calculated by the feature quantity calculation unit (103), include the pixel value difference and the histogram difference, expressed by Equations (1) and (2) below.

[Equation (1)]

Here, I_m(x, y) is the pixel value at coordinate (x, y) of the m-th frame, and N is the number of pixels in the frame.

[Equation (2)]

Here, H_m is the histogram of the m-th frame, and K is the range of pixel values.
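The formulas for Equations (1) and (2) are not reproduced in this text. The sketch below assumes a common form that is consistent with the symbol definitions: the pixel value difference as the mean absolute difference over the N pixels, and the histogram difference as the sum of absolute bin differences over the K bins, divided by N. The function names and the normalization are assumptions, not the patent's exact formulas.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame as rows of pixel values in 0..K-1


def pixel_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two frames (assumed form of Eq. 1)."""
    n = sum(len(row) for row in a)
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / n


def histogram(frame: Frame, k: int = 256) -> List[int]:
    """Count how many pixels take each of the k possible values."""
    h = [0] * k
    for row in frame:
        for p in row:
            h[p] += 1
    return h


def histogram_difference(a: Frame, b: Frame, k: int = 256) -> float:
    """Sum of absolute bin differences divided by N (assumed form of Eq. 2)."""
    ha, hb = histogram(a, k), histogram(b, k)
    n = sum(ha)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / n


f1 = [[10, 10], [20, 20]]
f2 = [[20, 20], [10, 10]]  # same content rearranged, a crude stand-in for motion
f3 = [[15, 15], [25, 25]]  # same layout, uniformly brighter

motion_pixel = pixel_difference(f1, f2)           # 10.0
motion_hist = histogram_difference(f1, f2)        # 0.0
brightness_hist = histogram_difference(f1, f3)    # 2.0
```

In this toy example the pixel value difference reacts to rearranged content while the histogram difference does not, and the histogram difference reacts to a uniform brightness shift.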

However, these feature quantities have drawbacks, such as being particularly weak against motion or vulnerable to momentary changes in brightness. In contrast, the present invention compensates for these drawbacks by using the histogram variance of the difference image between consecutive frames.

When the histogram of the difference image between consecutive frames is computed, the values are generally concentrated at the low end, as shown in FIG. 2. In contrast, when a scene change occurs, the frames before and after the boundary are entirely unrelated, so the difference image has a random distribution and its histogram is nearly flat, as shown in FIG. 3.

Therefore, shot boundaries can be detected by exploiting this difference in the character of the histogram.

In the present invention, the variance of the difference-image histogram, defined by Equation (3) below, is used as the feature quantity for measuring this character of the histogram.

[Equation (3)]

Here, H_{m,m+1} is the histogram of the difference image between the m-th frame and the (m+1)-th frame, K is the range of the difference values, and H̄ is the mean value of H_{m,m+1}.
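The formula for Equation (3) is not reproduced in this text. One form consistent with the symbol definitions, offered as an assumption rather than the patent's exact expression, is the variance of the K bin counts about their mean, written here with the mean as \bar{H}:

```latex
\sigma^2_{m,m+1} = \frac{1}{K} \sum_{k=0}^{K-1} \left( H_{m,m+1}(k) - \bar{H} \right)^2,
\qquad
\bar{H} = \frac{1}{K} \sum_{k=0}^{K-1} H_{m,m+1}(k)
```

Under this form, a nearly flat histogram gives a value close to zero, and a histogram with one dominant peak gives a large value regardless of where the peak sits.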

This value becomes small where a scene change occurs and large elsewhere. A particular advantage of this feature quantity is that when the overall brightness changes, as shown in FIGS. 4(a) and 4(b), only the position of the peak shifts and the variance is largely unaffected.

This property compensates for the weakness of the ordinary histogram difference, which is particularly sensitive to such brightness changes.
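The behavior described above can be checked with a small sketch. It assumes the difference image holds absolute pixel differences and that the feature is the variance of the histogram bin counts about their mean; both are assumptions, as are the function names and the tiny 8-level value range.

```python
from typing import List

Frame = List[List[int]]


def difference_histogram(a: Frame, b: Frame, k: int) -> List[int]:
    """Histogram of absolute pixel differences |a - b| (assumed definition)."""
    h = [0] * k
    for ra, rb in zip(a, b):
        for pa, pb in zip(ra, rb):
            h[abs(pa - pb)] += 1
    return h


def histogram_variance(h: List[int]) -> float:
    """Variance of the bin counts about their mean."""
    mean = sum(h) / len(h)
    return sum((v - mean) ** 2 for v in h) / len(h)


K = 8  # differences fall in 0..7 in this toy example
prev = [[7, 7, 7, 7, 7, 7, 7, 7]]
still = [[7, 7, 7, 7, 7, 7, 7, 7]]           # unchanged scene
brighter = [[10, 10, 10, 10, 10, 10, 10, 10]]  # uniform brightness shift of 3
cut = [[0, 1, 2, 3, 4, 5, 6, 7]]             # unrelated content

v_still = histogram_variance(difference_histogram(prev, still, K))
v_bright = histogram_variance(difference_histogram(prev, brighter, K))
v_cut = histogram_variance(difference_histogram(prev, cut, K))
```

The unchanged and brightened cases both give 7.0 because the single peak only moves from bin 0 to bin 3, while the unrelated frame gives a flat histogram and a variance of 0.0.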

Next, the method by which the scene change point detection unit (105) integrates multiple feature quantities using an artificial neural network is described.

The feature quantities used in prior inventions and the histogram variance of the difference image proposed in the present invention are each suited to different situations. Therefore, combining these feature quantities appropriately enables more accurate detection of scene change points.

In the present invention, these feature quantities are integrated using an artificial neural network. Because of its own characteristics based on aperiodic time-constant variables, an artificial neural network can be trained with a relatively small amount of supervised training data, so it is easy to train, and it is also robust against noise.

The structure of the artificial neural network used in the present invention is shown in FIG. 5. The three feature quantities described above are normalized and fed to the input layer, and the output layer has two neurons indicating whether or not a scene change occurs. There is a single hidden layer, with the same number of neurons as the input layer.
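A forward pass through a network of this shape (3 inputs, 3 hidden neurons, 2 outputs) might look like the sketch below. The sigmoid activation, the decision rule, and all weight values are assumptions chosen by hand for illustration; the patent gives no activation function or trained weights.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """One fully connected layer with sigmoid activation."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


# Hand-picked illustrative weights, NOT trained values from the patent.
W_HIDDEN = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, -4.0]]
B_HIDDEN = [-2.0, -2.0, 2.0]
W_OUT = [[-2.0, -2.0, -2.0], [2.0, 2.0, 2.0]]
B_OUT = [3.0, -3.0]


def is_scene_change(features: Sequence[float]) -> bool:
    """features = [pixel difference, histogram difference,
    difference-image histogram variance], each normalized to [0, 1]."""
    hidden = dense(features, W_HIDDEN, B_HIDDEN)       # 3 hidden neurons
    no_change, change = dense(hidden, W_OUT, B_OUT)    # 2 output neurons
    return change > no_change


cut_like = is_scene_change([0.9, 0.8, 0.1])    # large differences, flat histogram
same_shot = is_scene_change([0.1, 0.1, 0.9])   # small differences, peaked histogram
```

With these placeholder weights, large differences combined with a low histogram variance favor the "scene change" output, matching the qualitative behavior described in the text.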

Next, the method by which the scene change point detection unit (106) detects gradual scene changes is described.

A characteristic of a gradual scene change is that a single-color frame (a frame occupied entirely by one color with no pattern, such as an all-black screen) always appears. In such a frame the variance of the pixel values becomes extremely low, so the point where the scene changes can be easily detected.

That is, as shown in FIG. 6, while the sequence is fading out, the variance of the pixel values decreases rapidly, and once the fade-in begins the value increases again. The goal of shot boundary detection is to find point S in FIG. 6.

To detect point S, the variance of the pixel values is monitored. When the value falls below a certain threshold, shown as a dotted line in FIG. 6, the successive variance values are stored in memory until the variance rises above the threshold again; that is, the variance values between points A and B are stored. The point with the smallest value among the stored values is then recognized as the scene change point.
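The procedure above can be sketched as a single pass over the per-frame variances. The function name, the threshold value, and the choice to ignore a run that never rises back above the threshold are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def detect_fade_points(variances: Sequence[float], threshold: float) -> List[int]:
    """Return the frame index of the minimum variance within each run
    of frames whose variance is at or below the threshold."""
    points: List[int] = []
    stored: List[Tuple[float, int]] = []  # (variance, frame index) between A and B
    for i, v in enumerate(variances):
        if v <= threshold:
            stored.append((v, i))           # still inside the fade
        elif stored:
            points.append(min(stored)[1])   # variance rose again: pick point S
            stored = []
    # A run that never rises above the threshold again is ignored here.
    return points


# Fade-out to near black and fade-in again, with a threshold of 100.
per_frame_variance = [900, 850, 400, 90, 30, 5, 40, 95, 500, 880]
fade_points = detect_fade_points(per_frame_variance, 100)
```

In this example, frames 3 to 7 are stored and frame 5, which has the lowest variance, is reported as the scene change point.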

In particular, the variance of the pixel values can be obtained easily from the histogram already computed for scene change detection, so it adds almost no processing time. That is, once the histogram of a frame is available, the variance of the pixel values can be obtained simply by Equation (4) below.

[Equation (4)]

Here, H is the histogram, K is the range of pixel values, and N is the number of pixels. M is the mean of the pixel values in the frame, which can also be obtained easily from the histogram H.
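The formula for Equation (4) is not reproduced in this text, but the standard way to compute a variance from a histogram matches the symbols defined above. The sketch below assumes that form: M = (1/N) Σ k·H(k) and variance = (1/N) Σ H(k)·(k − M)², with k running over the K pixel values. The function name is hypothetical.

```python
from typing import List, Sequence


def variance_from_histogram(h: Sequence[int]) -> float:
    """Pixel value variance computed from histogram h, where h[k] is the
    number of pixels with value k (assumed form of Eq. 4)."""
    n = sum(h)                                       # N: number of pixels
    mean = sum(k * c for k, c in enumerate(h)) / n   # M: mean pixel value
    return sum(c * (k - mean) ** 2 for k, c in enumerate(h)) / n


def variance_direct(pixels: Sequence[int]) -> float:
    """Reference computation straight from the pixel values."""
    mean = sum(pixels) / len(pixels)
    return sum((p - mean) ** 2 for p in pixels) / len(pixels)


pixels = [0, 0, 2, 2]
hist = [2, 0, 2]           # two pixels of value 0, two of value 2
from_hist = variance_from_histogram(hist)
from_pixels = variance_direct(pixels)
black_frame = variance_from_histogram([4, 0, 0])  # single-color frame
```

Both routes give the same value, and the histogram route loops over K bins instead of N pixels, which is why reusing the histogram adds little processing time.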

Next, the representative frame selection method used by the representative frame selection unit (107) is described.

Selecting a representative frame means choosing, from many consecutive frames, the frame that carries the subject of the shot, so in principle it requires analysis of high-level semantic information. With current image processing technology, however, understanding arbitrary images is not realistically feasible. It is therefore important to discover, through analysis of various kinds of video data, general rules for selecting semantically important frames.

The present invention selects representative frames by analyzing the pattern of change in the pixel value difference, which among the feature quantities used for scene change point detection is particularly sensitive to motion.

FIG. 7 shows the patterns of change in the pixel value difference that commonly appear. Selecting representative frames on the basis of these patterns yields relatively meaningful representative frames. The selection method for each pattern is described next.

Pattern A is the case where there is almost no change on the screen. In this case it makes little difference which frame is selected, and a single frame is enough to represent the content of the entire shot. The frame at the midpoint is therefore selected.

Pattern B is the case where an abrupt camera operation occurs or a large object moves quickly. In such cases, the region where the pixel value difference becomes large is usually transitional in nature and often unimportant. Therefore, as indicated by the dots, the frame before the motion begins and the frame after the motion ends are selected as representative frames.

Pattern C corresponds to cases such as the camera panning or following a moving object. Here there is no cue for picking out a frame of particular importance, and in practice frames of similar importance often change continuously. In this case representative frames are selected by skipping ahead at suitable intervals; the specific method is described below.
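The selection rules for patterns A and B can be sketched as follows. The patent does not say how a shot is classified into a pattern, so the threshold test used here to locate the motion burst is an assumption, as are the function names.

```python
from typing import List, Sequence


def select_middle(start: int, end: int) -> List[int]:
    """Pattern A: pick the frame at the midpoint of the shot [start, end]."""
    return [(start + end) // 2]


def select_around_burst(diffs: Sequence[float], start: int,
                        threshold: float) -> List[int]:
    """Pattern B: pick the frame just before the motion begins and the
    frame just after it ends.

    diffs[i] is the pixel value difference between frames start+i and
    start+i+1, so the shot covers frames start..start+len(diffs).
    """
    active = [i for i, d in enumerate(diffs) if d > threshold]
    if not active:
        # No burst found: fall back to the pattern A rule.
        return select_middle(start, start + len(diffs))
    first, last = active[0], active[-1]
    return [start + first, start + last + 1]


# A shot starting at frame 100 with a burst of motion in the middle.
diffs = [1, 1, 30, 45, 38, 2, 1]
pattern_b = select_around_burst(diffs, 100, threshold=10)
pattern_a = select_middle(0, 8)
```

Here the burst spans the differences at positions 2 to 4, so frame 102 (before the motion) and frame 105 (after the motion) are selected.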

In pattern C the content of the video changes continuously, so frames that are close in time show similar image features, whereas frames far apart in time have entirely different image features.

The problem here is how to extract sufficiently changed frames from a sequence of continuously changing frames.

In the present invention, representative frames for pattern C are selected by the following method.

In step 1, F1 is selected as a representative frame, F1 is assigned to Fs (Fs = F1), and n is set to 2 (n = 2).

In step 2, if the inter-frame difference function D(Fs, Fn) is greater than a threshold (D > threshold), Fn is selected as a representative frame and Fn is assigned to Fs. Then n is incremented by 1.

In step 3, step 2 is repeated until the last frame of the segmented unit video is reached.

Here, Fn denotes the n-th frame of the segmented unit video.

The difference function is implemented concretely as the histogram difference.
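Steps 1 to 3 can be sketched as a single loop over the frames of one shot. The sketch uses 0-based indices (so F1 is index 0), takes one precomputed histogram per frame, and assumes the histogram difference is the sum of absolute bin differences divided by the pixel count; the normalization, the threshold value, and the function names are assumptions.

```python
from typing import List, Sequence


def histogram_difference(ha: Sequence[int], hb: Sequence[int]) -> float:
    """D(Fs, Fn): sum of absolute bin differences over the pixel count."""
    n = sum(ha)
    return sum(abs(x - y) for x, y in zip(ha, hb)) / n


def select_pattern_c(histograms: Sequence[Sequence[int]],
                     threshold: float) -> List[int]:
    """Select representative frames in a continuously changing shot."""
    if not histograms:
        return []
    selected = [0]                 # step 1: F1 is selected, Fs = F1
    reference = histograms[0]
    for n in range(1, len(histograms)):        # step 3: repeat to the last frame
        if histogram_difference(reference, histograms[n]) > threshold:
            selected.append(n)                 # step 2: Fn is selected
            reference = histograms[n]          #         and becomes the new Fs
    return selected


# Six frames of a slow pan, 8 pixels each, 4 histogram bins.
pan = [
    [8, 0, 0, 0],
    [7, 1, 0, 0],
    [5, 3, 0, 0],
    [4, 4, 0, 0],
    [3, 4, 1, 0],
    [0, 4, 4, 0],
]
key_frames = select_pattern_c(pan, threshold=0.5)
```

Because each comparison is made against the last selected frame rather than the previous frame, slow drift accumulates until it crosses the threshold, which yields frames 0, 2, and 5 in this example.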

To browse the structured video, the video browser provides not only basic video playback but also a hierarchical browser that presents the added information. The hierarchical browser loads in an integrated manner all the information about the entire video clip, not just the specific scene returned by a search, so the user can easily see which part of the video the retrieved portion corresponds to.

As described above, the present invention applies automatic video segmentation to a digital video recorder and segments scenes in a way suited to the recorder's purpose, thereby efficiently solving the problems of existing digital video recorders (problems of browsing and of flexible search and management).

By segmenting the video data based on its content, the video data is structured automatically, and by selecting a representative frame for each segmented unit, a more flexible search and browsing service can be provided to the user.

In addition to the pixel value difference and the histogram difference between consecutive frames used in prior inventions, the present invention adopts the histogram variance of the difference image as a feature for detecting scene change points, and integrates these three features using an artificial neural network, which has the effect of improving the accuracy of automatic segmentation.
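As a rough illustration of the three features named above, the sketch below computes them for one pair of consecutive frames. The text does not give formulas, so the definitions here are assumptions: the pixel value difference is taken as the mean absolute difference, the histogram difference as the sum of absolute bin differences, and the histogram variance of the difference image as the variance of the bin index under the difference image's normalized histogram. The function name `scene_features` and the flat grayscale frame format are likewise hypothetical. The neural network that integrates the features is not shown, since its structure and weights are not described.

```python
from typing import List, Sequence, Tuple

# Assumption: frames are equal-length, non-empty flat sequences of 8-bit values.
Frame = Sequence[int]


def _hist(values: Sequence[int], bins: int = 16) -> List[float]:
    """Normalized histogram of 8-bit values."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // 256] += 1
    n = len(values)
    return [c / n for c in counts]


def scene_features(prev: Frame, curr: Frame, bins: int = 16) -> Tuple[float, float, float]:
    """Return (pixel difference, histogram difference, difference-image histogram variance)."""
    diff_image = [abs(a - b) for a, b in zip(prev, curr)]

    # Feature 1: mean absolute pixel value difference.
    pixel_diff = sum(diff_image) / len(diff_image)

    # Feature 2: L1 distance between the two frame histograms.
    hp, hc = _hist(prev, bins), _hist(curr, bins)
    hist_diff = sum(abs(x - y) for x, y in zip(hp, hc))

    # Feature 3: variance of the bin index under the difference image's histogram.
    hd = _hist(diff_image, bins)
    mean_bin = sum(i * w for i, w in enumerate(hd))
    diff_hist_var = sum(w * (i - mean_bin) ** 2 for i, w in enumerate(hd))

    return pixel_diff, hist_diff, diff_hist_var
```

Under these assumed definitions, the third feature is small when the whole frame changes by a similar amount and large when only part of the frame changes, so it carries information the first two features do not.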

The present invention adopts a method that uses the variance value of each frame to handle gradual scene changes, which were previously difficult to handle. Using this method has the effect of providing information useful for selecting representative frames that can represent each unit of the segmented video.
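One way the per-frame variance could be monitored is sketched below. The detection rule is an assumption, not taken from the text: during a fade toward a uniform image the pixel variance falls toward zero and rises again on the fade in, so the sketch flags frames where the variance reaches a local minimum below a hypothetical threshold `low_var`. The names `pixel_variance` and `fade_candidates` are introduced only for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a non-empty flat sequence of 8-bit grayscale values.
Frame = Sequence[int]


def pixel_variance(frame: Frame) -> float:
    """Population variance of the pixel values of one frame."""
    n = len(frame)
    mean = sum(frame) / n
    return sum((p - mean) ** 2 for p in frame) / n


def fade_candidates(frames: Sequence[Frame], low_var: float = 25.0) -> List[int]:
    """Return 0-based indices where the variance bottoms out below low_var."""
    v = [pixel_variance(f) for f in frames]
    found = []
    for i in range(1, len(v) - 1):
        # Local minimum: not above the previous frame, strictly below the next.
        if v[i] < low_var and v[i] <= v[i - 1] and v[i] < v[i + 1]:
            found.append(i)
    return found
```

For a run of several near-uniform frames, this rule reports the last frame of the run, which is the point where the next scene begins to fade in.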

For the problem of selecting representative frames, which has a decisive influence on search and browsing based on the visual content of a video, the present invention adopts a method of analyzing the change pattern of the pixel value differences used for scene segmentation. This makes it possible to move away from the fixed-interval sampling used in prior inventions and to select representative frames that are more faithful to the content of the video.

The present invention has the effect of enabling the user to search and browse desired video information easily and quickly, based on the unique visual content of that video information.

Claims

In a digital video data automatic segmentation device having a video capture unit that extracts standard continuous frames by analog-to-digital conversion of an NTSC signal, and a feature calculator that analyzes the frames generated by the video capture unit to calculate the features of pixel value difference and histogram difference,

feature calculation means for calculating the features of the histogram variance of the difference image and the pixel value variance of each frame by analyzing the pixel value difference and histogram difference features calculated by the feature calculator and the frames generated by the video capture unit;

first scene change point detection means for determining a scene change by integrating, using an artificial neural network, the features of pixel value difference, histogram difference, and histogram variance of the difference image calculated by the feature calculation means;

second scene change point detection means for detecting gradual scene change points by monitoring the pixel value variance of each frame calculated by the feature calculation means; and

representative frame selection means for classifying into patterns the change trend of the pixel value differences according to the scene change points detected by the first and second scene change point detection means, and selecting frames that are representative in terms of content by adopting a selection method suited to each pattern; a digital video recorder incorporating a digital video data automatic segmentation device characterized by comprising the above means.

In a digital video data automatic segmentation method having a process of extracting continuous frames from an NTSC signal by analog-to-digital conversion, and a process of analyzing the extracted frames to calculate the features of pixel value difference and histogram difference,

a feature calculation process of calculating the features of the histogram variance of the difference image and the pixel value variance of each frame by analyzing the pixel value difference and histogram difference features calculated in the feature calculation process and the frames generated in the frame extraction process;

a first scene change point detection process of determining a scene change by integrating, using an artificial neural network, the calculated features of pixel value difference, histogram difference, and histogram variance of the difference image;

a second scene change point detection process of detecting gradual scene change points such as fade out and fade in by monitoring the pixel value variance of each frame calculated in the feature calculation process; and

a representative frame selection process of classifying into patterns the change trend of the pixel value differences according to the scene change points detected in the first and second scene change point detection processes, and selecting frames that are representative in terms of content by adopting a selection method suited to each pattern; a digital video recorder applying a digital video data automatic segmentation method characterized by comprising the above processes.

A video browser capable of searching the video recorded by the digital video recorder, based on the search and browsing information of the video data.