KR100683501B1

KR100683501B1 - An image extraction device of anchor frame in the news video using neural network and method thereof

Info

Publication number: KR100683501B1
Application number: KR1020050046074A
Authority: KR
Inventors: 강현철; 이진성
Original assignee: 인천대학교 산학협력단
Priority date: 2005-02-23
Filing date: 2005-05-31
Publication date: 2007-02-15
Also published as: KR20060094006A

Abstract

In an apparatus for extracting anchor screens from news video using a neural network technique, the news video is classified into anchor screens, which have the structural characteristics of an anchor screen, and non-anchor screens, which do not. When news image data is input from outside, scene changes are detected, and, based on the neural network technique, the input pattern corresponding to each detected scene change is compared with a plurality of classification patterns corresponding to anchor screens in order to detect anchor screens from the input patterns. The classification patterns are modeled as a plurality of models according to anchor screen configuration variables that include at least one of the number of anchors, the presence or absence of a news icon, and the position of the news icon.

Therefore, anchor screens can be extracted accurately and efficiently from news video by using the neural network technique.

Neural network, news video, anchor screen, scene change

Description

An image extraction device of anchor frame in the news video using neural network and method

FIG. 1 is a diagram showing the spatial structure of a news video according to an embodiment of the present invention.

FIG. 2 is a structural diagram of an anchor screen extraction apparatus according to an embodiment of the present invention.

FIG. 3 is a diagram showing the detailed structure of the scene change detector and the anchor screen extractor shown in FIG. 2.

FIG. 4 is a diagram in which the spatial configuration of an anchor screen according to an embodiment of the present invention is modeled as four neural network input models.

FIGS. 5A and 5B are diagrams in which the spatial configuration of a non-anchor screen is modeled as error input models of the neural network in order to extract anchor screens according to an embodiment of the present invention.

FIG. 6 is a flowchart of an anchor screen extraction method according to an embodiment of the present invention.

FIG. 7 is a diagram showing a scene change extraction process according to an embodiment of the present invention.

FIG. 8 is an example showing the result of detecting scene changes by the process shown in FIG. 7.

FIG. 9 is a structural diagram of the ADALINE network of the neural network model according to an embodiment of the present invention.

FIG. 10 is a diagram showing the result of extracting anchor screens using the neural network technique according to an embodiment of the present invention.

FIG. 11 is a screen layout diagram of a program used to verify the effectiveness of an embodiment of the present invention.

FIG. 12 is an example structure of a digital news video archive system to which an anchor screen extraction apparatus according to an embodiment of the present invention can be applied.

The present invention relates to an image extraction apparatus and method, and more particularly to an apparatus and method for extracting anchor screens from news video in order to implement a digital news video archive system.

Among the various kinds of knowledge assets, video content has generally been produced, stored, and distributed in analog form, but it has recently been digitized at a rapid pace. Examples include the digitalization of broadcasting, the replacement of VHS tapes by DVDs, the spread of digital camcorders, and the digitalization of the security surveillance market.

Digital video can be stored on magnetic disks, magnetic tapes, optical discs, and the like. To make efficient use of digital video stored on such media, it is essential to be able to find a desired scene easily and quickly within a vast amount of footage. The transition to digital broadcasting also inevitably creates demand for searching digital video. As digital broadcasting becomes commonplace, and as knowledge assets in the form of digital images and video grow rapidly in companies, institutions, and government agencies outside the broadcasting industry, solutions for efficiently storing, managing, and searching such digital video are indispensable.

To meet this need, digital news video archive systems are being developed so that large and varied collections of digitized news can easily be stored in databases and searched.

A digital news video archive system must store and retrieve news video event by event, and to do so it must be able to detect the anchor screen that marks the beginning of each event. An anchor screen (anchorperson frame) is the segment of a news video in which the anchor introduces an event. Because the anchor screen marks the start of an event, it is important for structurally segmenting and storing the news. It also contains the news icon and captions that summarize the event about to be reported. Extracting anchor screens is therefore an essential prerequisite for implementing a news video archive. Anchor screen extraction also matters for extracting topic words or index terms to be used later as search terms.

Extracting anchor screens requires analyzing the news video and segmenting it by content. The most widely used approach is to detect the points where the content changes, that is, scene changes, and to split the news video into scenes at those boundaries. A scene change occurs where brightness or color changes sharply between consecutive frames. The amount of change between the current and previous frames can therefore be reduced to a single number, for example the difference in brightness or color values or the difference between brightness or color histograms, and points where this number is large can be detected as scene changes.
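The histogram-difference idea described above can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the 16-bin histogram, and the fixed threshold are assumptions chosen for the example, and frames are represented as flat lists of 0 to 255 brightness values.

```python
def histogram(frame, bins=16, max_value=256):
    """Count the 0-255 intensities of a flat pixel list into equal-width bins."""
    counts = [0] * bins
    width = max_value // bins
    for value in frame:
        counts[min(value // width, bins - 1)] += 1
    return counts


def histogram_difference(prev_frame, curr_frame, bins=16):
    """Sum of absolute bin-wise differences between two frame histograms."""
    h_prev = histogram(prev_frame, bins)
    h_curr = histogram(curr_frame, bins)
    return sum(abs(a - b) for a, b in zip(h_prev, h_curr))


def detect_scene_changes(frames, threshold):
    """Return every index i at which frame i differs strongly from frame i-1.

    The threshold is a hypothetical tuning parameter, not a value from the text.
    """
    return [
        i
        for i in range(1, len(frames))
        if histogram_difference(frames[i - 1], frames[i]) > threshold
    ]


# Two dark frames followed by two bright frames: one cut, at index 2.
frames = [[10] * 100, [12] * 100, [200] * 100, [205] * 100]
cuts = detect_scene_changes(frames, threshold=50)
```

Reducing each frame pair to one number keeps the detector cheap, at the cost of a threshold that has to be tuned for the material.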

Conventional methods that use scene change detection to find anchor screens include treating a screen whose color matches the specific background color of screens in which the anchor appears as an anchor screen, computing the anchor's face color and treating screens in which that color appears as anchor screens, and recognizing the anchor's face. However, these methods frequently produce false detections on screens that contain many colors similar to the background color or the face color.

Another method assumes that the anchor moves little and treats screens with a small amount of motion as anchor screens, but the amount of motion alone is not enough to extract anchor screens accurately.

Accordingly, the technical problem addressed by the present invention is to solve the problems of the prior art by accurately extracting anchor screens from news video using a neural network technique.

In particular, the technical problem addressed by the present invention is to extract anchor screens accurately by taking into account the structural characteristics of anchor screens in news video.

To achieve this technical object, an anchor screen extraction apparatus according to one aspect of the present invention extracts anchor screens from news image data corresponding to a news video and includes: an image input unit that receives the news image data applied from outside; a scene change detector that detects scene changes in the news image data and outputs an input pattern corresponding to each detected scene change; and an anchor screen detector that includes a classification pattern storage module storing a plurality of classification patterns corresponding to anchor screens, and a pattern comparison module that compares the input pattern with the stored classification patterns based on a neural network technique to detect anchor screens from the input pattern.

In the present invention having these features, in order to extract anchor screens using the neural network technique, the classification patterns consist of a plurality of classification patterns modeled on anchor screen configuration variables that include at least one of the number of anchors, the presence or absence of a news icon, and the position of the news icon. Each classification pattern is classified as either an anchor screen or a non-anchor screen. Anchor screens may be modeled with at least four models, and non-anchor screens with at least eight models.

An anchor screen extraction method according to another aspect of the present invention is performed by an apparatus that extracts anchor screens from news image data corresponding to a news video. The method includes: when the news image data, which consists of an MPEG (Moving Picture Experts Group) bit stream that includes I-frames, is input from outside, detecting, by the apparatus, scene changes using only the screens corresponding to I-frames in the news image data, and outputting an input pattern corresponding to each detected scene change; and detecting, by the apparatus, anchor screens from the input pattern by comparing the input pattern with a plurality of classification patterns corresponding to anchor screens based on a neural network technique. The classification patterns consist of a plurality of classification patterns modeled on anchor screen configuration variables that include at least one of the number of anchors, the presence or absence of a news icon, and the position of the news icon.

Hereinafter, the most preferred embodiment, which allows a person of ordinary skill in the art to practice the present invention easily, is described in detail with reference to the accompanying drawings. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here.

A news broadcast is organized regularly in time: a segment in which the anchor speaks is followed by a non-anchor segment in which a reporter explains the story. FIG. 1 shows a typical news video. As shown in FIG. 1, a news video consists of an anchor segment, captions, a news icon, and so on. To search such news video efficiently, the news must be structurally divided by event and stored, which requires detecting the anchor segments in the news video accurately.

Accordingly, the embodiment of the present invention uses a neural network technique to extract anchor segments, that is, anchor screens. This is an essential task in implementing a digital news video archive system, which belongs to the broader category of digital video archive systems.

FIG. 2 shows the structure of an apparatus for extracting anchor screens from news video using a neural network technique according to an embodiment of the present invention (hereinafter called the "anchor screen extraction apparatus" for convenience). FIG. 3 shows the detailed structure of the scene change detector and the anchor screen extractor shown in FIG. 2.

As shown in FIG. 2, the anchor screen extraction apparatus according to an embodiment of the present invention includes an image input unit (10), a buffer (20), a scene change detector (30), an anchor screen extractor (40), an encoder (50), and an output unit (60).

The image input unit (10) receives news image data applied from outside and buffers it in the buffer (20). Here, the news image data is data captured by a camera or the like and processed so that it can be handled electrically. For example, the news image data is a bit stream encoded according to the MPEG (Moving Picture Experts Group) standard.

The scene change detector (30) detects scene changes in the news image data and divides the news image data into scenes at those boundaries.

As shown in FIG. 3, the scene change detector (30) according to an embodiment of the present invention includes a DC image generation module (31) that obtains a DC image for scene change detection from the news image data, a binarization module (32) that binarizes the DC image, and a normalization module (33) that normalizes the binarized DC image.

The scene change detector (30) structured in this way detects scene changes using the cumulative histogram difference obtained from a combination of the color components as well as the brightness component of the image data. It then normalizes each detected scene change and outputs it as an input pattern for anchor screen detection.

Meanwhile, the anchor screen extractor (40) extracts anchor screens from the detected scene changes, and the encoder (50) encodes the extracted anchor screens and outputs or stores them through the output unit (60).

To extract anchor screens using a neural network technique, the anchor screen extractor (40) according to an embodiment of the present invention includes, as shown in FIG. 3, a pattern storage module (41) and a pattern comparison module (42). The pattern storage module (41) stores data corresponding to the classification patterns used by the neural network technique for anchor screen extraction. The pattern comparison module (42) extracts anchor screens by comparing the stored classification patterns with the input pattern output by the scene change detector (30) and determining which classification pattern the input pattern corresponds to.

A neural network is a technique widely used in pattern classification. It classifies an unknown input pattern by assigning it to whichever of a set of predefined pattern classes is most similar to it. In the embodiment of the present invention, anchor screens are modeled as a plurality of anchor models so that they can be detected with a neural network.

Next, a method of generating a plurality of classification patterns by modeling anchor screens according to an embodiment of the present invention is described.

To extract anchor screens, the spatial structure of screens in which an anchor is present is first analyzed in the news image data to establish prior knowledge about the anchor's face region. Based on this, the anchor's face is detected, and screens correctly recognized as containing the anchor are set as key frames.

Specifically, the following spatial characteristics of anchor screens can be exploited: the anchor's face appears at a constant size, the face moves very little, and the background is similar throughout the video. These spatial characteristics can therefore be modeled as pattern classes of a neural network.

An anchor screen may or may not contain a news icon that summarizes the news to be reported. Regardless of whether a news icon is present, however, the anchor's face region stays constant. To cover all of these cases as neural network pattern classes, anchor screens were modeled as four models according to the spatial configuration of the anchor screen, which is determined by the number of anchors, the presence or absence of a news icon, and the position of the news icon. Hereinafter, for convenience, the number of anchors, the presence or absence of a news icon, and the position of the news icon, which characterize the spatial configuration of an anchor screen, are called "anchor screen configuration variables".

FIG. 4 is an example showing anchor screens modeled as four models according to the anchor screen configuration variables in an embodiment of the present invention.

As shown in FIG. 4, anchor screens are modeled according to the anchor screen configuration variables as a first model (M1) with a single anchor and no news icon, a second model (M2) with a single anchor and a news icon to the upper (or lower) right of the anchor, a third model (M3) with a single anchor and a news icon to the upper (or lower) left of the anchor, and a fourth model (M4) in which multiple anchors appear.

Non-anchor screens, that is, screens in which no anchor appears, can also be modeled according to the distribution of brightness in the screen. FIGS. 5A and 5B show non-anchor screens modeled according to an embodiment of the present invention. As shown in FIGS. 5A and 5B, non-anchor screens can be modeled as eight error models according to the distribution of screen brightness.

As described above, the embodiment of the present invention defines two top-level pattern classes, anchor screen and non-anchor screen, divides them into four and eight subclasses respectively, and trains the neural network on them. The pattern class most similar to the input pattern derived from subsequently input news image data is then found, which determines whether the input pattern is an anchor screen or a non-anchor screen, and anchor screens are extracted accordingly.
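The two-level class structure can be written down as a small lookup, sketched below. The labels M1 to M4 come from the description of FIG. 4; the labels E1 to E8 for the eight error models and the function name are hypothetical, since the text does not name them here.

```python
# The four anchor models described for FIG. 4.
ANCHOR_MODELS = {
    "M1": "single anchor, no news icon",
    "M2": "single anchor, news icon to the upper (or lower) right",
    "M3": "single anchor, news icon to the upper (or lower) left",
    "M4": "multiple anchors",
}

# Eight non-anchor (error) models; the E1..E8 labels are placeholders
# invented for this sketch, each standing for one brightness distribution.
NON_ANCHOR_MODELS = {
    f"E{i}": f"non-anchor brightness distribution {i}" for i in range(1, 9)
}


def top_level_class(subclass):
    """Map one of the 12 subclasses to its top-level class."""
    if subclass in ANCHOR_MODELS:
        return "anchor"
    if subclass in NON_ANCHOR_MODELS:
        return "non-anchor"
    raise KeyError(f"unknown subclass: {subclass}")
```

Only the top-level answer matters for extraction: a scene change is kept when its best-matching subclass maps to "anchor".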

Next, the operation of the anchor screen extraction apparatus according to an embodiment of the present invention is described in more detail, based on the structure described above.

When news image data transmitted from an external broadcast system, or captured by a camera or the like and stored on a separate recording medium (e.g., videotape or DVD), is input through the image input unit (10), it is buffered in the buffer (20) (S100). This news image data consists of an MPEG bit stream.

In general, MPEG does not compress every frame as an individual still image. Instead, it exploits the many similarities between adjacent frames and uses prediction and interpolation for motion compensation. However, for several reasons, such as the need to support VCR-style controls like random access, MPEG video data must regularly include frames that can be reconstructed from their own information alone.

MPEG video data therefore mixes, in a regular pattern, I (intra-coded) frames compressed as still images, P (predictive-coded) frames that use prediction only, and B (bidirectionally predictive-coded) frames that use bidirectional interpolation. An I-frame can appear anywhere in the data stream, is used for random access to the data, and is encoded without reference to other pictures. An I-frame is divided into the 8×8 blocks designated within its macroblocks and then encoded block by block. A sequence of consecutive pictures that begins with an I-frame is called a GOP (Group of Pictures).
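The frame layout described above can be illustrated with a short sketch that works on a string of frame-type letters. The letter sequence and function names are hypothetical, and the sketch ignores the difference between display order and coding order in a real MPEG stream.

```python
def i_frame_indices(frame_types):
    """Indices of the intra-coded frames, the only ones decodable on their own."""
    return [i for i, kind in enumerate(frame_types) if kind == "I"]


def split_into_gops(frame_types):
    """Group frames so that each group starts at an I-frame."""
    gops = []
    for kind in frame_types:
        if kind == "I" or not gops:
            gops.append([])  # an I-frame opens a new group of pictures
        gops[-1].append(kind)
    return gops


sequence = "IBBPBBPIBBP"  # illustrative pattern, not taken from the text
indices = i_frame_indices(sequence)
gops = split_into_gops(sequence)
```

Because every group opens with an I-frame, sampling only those indices gives one independently decodable picture per group.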

Because at least one I-frame is encoded in every GOP, the scene change detector (30) according to an embodiment of the present invention detects scene changes using only the screens corresponding to I-frames in the MPEG news image data. That is, rather than fully decoding the MPEG bit stream, it performs partial decoding, decoding only the parts of the bit stream that carry the needed information, which shortens the decoding time. FIG. 7 shows an example of scene change detection according to an embodiment of the present invention.

Specifically, referring to FIG. 7, the DC image generation module (31) of the scene change detector (30) takes only the I-frames in the news image data and partially decodes them, using only the DC coefficient that represents the pixel values of each block (8×8 block) of the I-frame, to obtain a DC image (40×30) (S110 to S120).
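For an 8×8 DCT block, the DC coefficient is proportional to the mean of the block's pixels, so a DC image behaves like a block-averaged thumbnail. The sketch below emulates that on an already decoded brightness array. It is only an illustration: the described apparatus reads the DC coefficients from the bit stream without full decoding, and the 320×240 frame size is an assumption inferred from the 40×30 DC image.

```python
def dc_image(luma, block=8):
    """Average each block x block tile of a 2-D brightness array.

    `luma` is a list of rows. The result has one value per tile, which is
    what the DC coefficients would give up to a constant scale factor.
    """
    height, width = len(luma), len(luma[0])
    image = []
    for top in range(0, height - block + 1, block):
        row = []
        for left in range(0, width - block + 1, block):
            total = sum(
                luma[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        image.append(row)
    return image


# Assumed 320x240 frame of uniform brightness 100.
frame = [[100] * 320 for _ in range(240)]
dc = dc_image(frame)  # 30 rows by 40 columns
```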

Next, the binarization module (32) of the scene change detector (30) binarizes the obtained DC image (S130). Specifically, the mean brightness of the DC image is used as the threshold, and each pixel of the DC image is binarized by comparing its brightness with the threshold. For example, a pixel whose brightness is greater than the threshold is assigned the value corresponding to white (e.g., +1), and a pixel whose brightness is less than the threshold is assigned the value corresponding to black (e.g., -1). The threshold is not limited to the mean and may be set differently depending on the environment.
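The mean-threshold binarization can be sketched as follows. The function name is hypothetical, and the handling of a pixel exactly equal to the threshold is an assumption (it is mapped to -1 here), since the text only covers the greater-than and less-than cases.

```python
def binarize(dc, threshold=None):
    """Map each DC-image value to +1 (white) or -1 (black).

    If no threshold is given, the mean brightness of the image is used.
    Values equal to the threshold map to -1, which is an assumption.
    """
    if threshold is None:
        pixels = [value for row in dc for value in row]
        threshold = sum(pixels) / len(pixels)
    return [[1 if value > threshold else -1 for value in row] for row in dc]


# The mean of this tiny image is 120, so 200 and 220 become +1.
binary = binarize([[10, 200], [50, 220]])
```

Passing an explicit `threshold` covers the case mentioned above in which a value other than the mean is preferred.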

The normalization module (33) then normalizes the binarized image to produce a normalized pattern, that is, the input pattern, as illustrated in FIG. 7 (S140). Here the binarized image is normalized to 10×10, but the size is not limited to this. FIG. 8 shows an example of scene changes detected from input news image data.
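The text does not say how the binarized image is reduced to 10×10. One plausible reading, sketched below purely as an assumption, is a majority vote over each tile of the binarized DC image (4×3 cells per tile for a 40×30 image). The function name and the tie rule are also assumptions.

```python
def normalize_pattern(binary, out_rows=10, out_cols=10):
    """Shrink a +1/-1 image to out_rows x out_cols by majority vote per tile.

    Ties are resolved to -1. Both the voting scheme and the tie rule are
    illustrative choices, not taken from the source.
    """
    in_rows, in_cols = len(binary), len(binary[0])
    pattern = []
    for r in range(out_rows):
        y0, y1 = r * in_rows // out_rows, (r + 1) * in_rows // out_rows
        row = []
        for c in range(out_cols):
            x0, x1 = c * in_cols // out_cols, (c + 1) * in_cols // out_cols
            total = sum(
                binary[y][x] for y in range(y0, y1) for x in range(x0, x1)
            )
            row.append(1 if total > 0 else -1)
        pattern.append(row)
    return pattern


# 30x40 test image: left half white (+1), right half black (-1).
image = [[1 if x < 20 else -1 for x in range(40)] for _ in range(30)]
pattern = normalize_pattern(image)
```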

Once a scene change has been detected as described above, the input pattern corresponding to it is classified by the neural network technique into one of two classes: anchor screen or non-anchor screen.

The embodiment of the present invention uses an ADALINE neural network to classify input patterns into the two classes, but is not limited to this. FIG. 9 shows the structure of the ADALINE neural network used in the embodiment. In general, as shown in FIG. 9, an ADALINE network consists of an input layer and an output layer. The input layer contains an input node for each input to the network, plus a bias node. The output layer contains only the ADALINE node, which computes the network's output. Each node in the input layer is connected to the ADALINE node by one link.

Because an input pattern fed to the ADALINE network must have components normalized to a predetermined range (typically -1.0 to 1.0), the image data for each detected scene change is normalized as described above. This normalization prevents any single component of the input pattern from dominating and disturbing the operation of the network.

The output required of an ADALINE network must always be one of two values corresponding to the two classification classes. In the embodiment of the present invention, therefore, the output for an input pattern must be either anchor screen or non-anchor screen. The ADALINE network here has 100 inputs, and the input pattern is set so that a pixel value of 0 in the normalized input pattern gives an input of -1 and a pixel value of 255 gives an input of +1.

Based on the neural network described above, the pattern comparison module 41 of the anchor screen extractor 40 compares the input pattern provided by the scene change extractor 30 with the classification patterns stored in the storage module 42 and determines the classification class of the input pattern (S150 to S160). For example, in the ADALINE neural network, each value of the input pattern received through an input node is multiplied by the value of its corresponding link, and the products are summed. A threshold function is then applied to the weighted sum, forcing the output to the value of one of the two classification classes. The threshold function converts any positive sum to the value associated with classification class A (anchor screen) and any negative sum to the value associated with classification class B (non-anchor screen).
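A minimal Python sketch of this weighted-sum-and-threshold step is given below. The class codes `ANCHOR = +1` and `NON_ANCHOR = -1`, the function names, and the treatment of a sum of exactly zero as non-anchor are assumptions made for illustration; the text specifies only the behavior for positive and negative sums.

```python
from typing import Sequence

ANCHOR = 1       # assumed output value for classification class A (anchor screen)
NON_ANCHOR = -1  # assumed output value for classification class B (non-anchor screen)


def weighted_sum(inputs: Sequence[float], weights: Sequence[float],
                 bias_weight: float) -> float:
    """Sum of input * link value, plus the bias node (fixed input 1.0) times its link."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return bias_weight * 1.0 + sum(x * w for x, w in zip(inputs, weights))


def classify(inputs: Sequence[float], weights: Sequence[float],
             bias_weight: float) -> int:
    """Apply the threshold function to the weighted sum."""
    total = weighted_sum(inputs, weights, bias_weight)
    # Assumption: a sum of exactly zero is assigned to the non-anchor class.
    return ANCHOR if total > 0 else NON_ANCHOR
```

For example, `classify([1.0, -1.0], [0.5, 0.25], 0.0)` yields a weighted sum of 0.25 and therefore the anchor class under these assumptions.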

The ADALINE neural network learns by modifying the weight values associated with its links during training. Training usually adjusts the link values using a learning rule known as the delta rule.

The delta rule uses the error produced by the ADALINE (that is, the sorting of a pattern into the wrong category) to correct the link values, so that the correct answer is produced the next time the input pattern is presented.

The weight value obtained under the delta rule according to an embodiment of the present invention may be calculated according to Equation 1, whose terms denote the weight value, the learning rate, the error, and the input value (a given pixel value of the input pattern).
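The delta-rule training described above can be sketched in Python as follows. Because the symbols of Equation 1 are not reproduced in this text, the sketch assumes the common Widrow-Hoff form, in which each weight is increased by the learning rate times the error times the input, with the error taken as the target minus the weighted sum; the patent's exact formula may differ. The function names, the default learning rate and epoch count, and the toy samples are hypothetical.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (bipolar input pattern, target of +1 or -1)


def predict(inputs: Sequence[float], weights: Sequence[float],
            bias_weight: float) -> int:
    """Threshold the weighted sum: +1 for a positive sum, otherwise -1."""
    total = bias_weight + sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > 0 else -1


def train_adaline(samples: Sequence[Sample], learning_rate: float = 0.1,
                  epochs: int = 50) -> Tuple[List[float], float]:
    """Adjust link weights with the (assumed) Widrow-Hoff delta rule."""
    weights = [0.0] * len(samples[0][0])
    bias_weight = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            total = bias_weight + sum(x * w for x, w in zip(inputs, weights))
            error = target - total  # assumption: error measured on the weighted sum
            # weight <- weight + learning_rate * error * input
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias_weight += learning_rate * error  # the bias node input is fixed at 1.0
    return weights, bias_weight


# Toy data (hypothetical): the class is decided by the first input component.
samples: List[Sample] = [
    ([1.0, 1.0], 1), ([1.0, -1.0], 1),
    ([-1.0, 1.0], -1), ([-1.0, -1.0], -1),
]
trained_weights, trained_bias = train_adaline(samples)
```

The learning rate must be small relative to the squared length of the input vector for the updates to remain stable; with 100 bipolar inputs, a value well below the 0.1 used for this two-input toy example would be needed.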

FIG. 10 illustrates the screen layout of a program used to verify the effectiveness of the anchor screen extraction apparatus according to an embodiment of the present invention, which operates as described above, and FIG. 11 illustrates anchor screen results extracted using the neural network technique according to the embodiment. As FIGS. 10 and 11 show, anchor screens are accurately detected in the news video.

The anchor screen extraction apparatus according to this embodiment may be applied to a digital news video archive system, which may be implemented with the structure shown in FIG. 12.

Although the present invention has been described with reference to what is considered the most practical and preferred embodiment, the invention is not limited to the disclosed embodiment and also covers various modifications and equivalents falling within the scope of the following claims.

As described above, according to an embodiment of the present invention, anchor screens can be accurately detected using a neural network technique, which supports the implementation of a digital news video archive system for efficient storage and retrieval of news video.

The present invention can be applied to fields such as image search and editing. It is also expected to become an essential technology for implementing digital news archives, thereby contributing to the early establishment of digital broadcasting stations, and to be applicable to other digital image archives such as digital newspapers and films.

Claims

A device for extracting an anchor screen from news image data corresponding to a news video, the device comprising:

an image input unit that receives the news image data applied from the outside;

a scene change detection unit that detects a scene change from the news image data and outputs an input pattern corresponding to the detected scene change; and

an anchor screen detection unit including a classification pattern storage module that stores a plurality of classification patterns corresponding to an anchor screen, and a pattern comparison module that compares the input pattern with the stored classification patterns based on a neural network technique and detects an anchor screen from the input pattern,

wherein the scene change detection unit comprises:

a DC image generation module that obtains a DC image for scene change detection from the news image data;

a binarization module that binarizes the DC image; and

a normalization module that normalizes the binarized DC image and outputs the result as the input pattern.

The anchor screen extracting device of claim 1,

wherein the classification patterns comprise a plurality of classification patterns modeled based on anchor screen configuration variables including at least one of the number of anchors, the presence or absence of a news icon, and the location of a news icon.

The anchor screen extracting device of claim 1 or 2,

wherein each classification pattern is classified as one of an anchor screen and a non-anchor screen, and the classification patterns corresponding to the anchor screen include a first model with a single anchor and no news icon, a second model with a single anchor and a news icon to the right of, and above or below, the anchor, a third model with a single anchor and a news icon to the left of, and above or below, the anchor, and a fourth model in which a plurality of anchors appear.

The anchor screen extracting device of claim 1,

wherein the binarization module of the scene change detection unit uses the average of the brightness values of the obtained DC image as a threshold value and performs the binarization by comparing the brightness value of each pixel constituting the DC image with the threshold value.

The anchor screen extracting device of claim 1 or 4,

wherein the news image data comprises an MPEG bit stream including I-frames, and the scene change detection unit detects scene changes only for screens corresponding to I-frames in the MPEG news image data.

The anchor screen extracting device of claim 3,

wherein the classification pattern corresponding to the anchor screen is modeled as one of the first to fourth models, and the classification pattern corresponding to the non-anchor screen is modeled as one of a plurality of preset error models.

An anchor screen extraction method performed by a device for extracting an anchor screen from news image data corresponding to a news video, the method comprising:

when news image data consisting of an MPEG bit stream including I-frames is input from the outside, obtaining, by the device, a DC image for a screen corresponding to an I-frame from the news image data, and binarizing and then normalizing the obtained DC image to output an input pattern corresponding to a scene change; and

detecting the anchor screen from the input pattern by comparing the input pattern with a plurality of classification patterns corresponding to the anchor screen based on a neural network technique.

The method of claim 7,

wherein the classification pattern corresponds to any one of the following models, modeled according to anchor screen configuration variables: a first model with a single anchor and no news icon; a second model with a single anchor and a news icon to the right of, and above or below, the anchor; a third model with a single anchor and a news icon to the left of, and above or below, the anchor; and a fourth model in which a plurality of anchors appear.