KR101262352B1

KR101262352B1 - Analysis method for video stream data using evolutionary particle filtering

Info

Publication number: KR101262352B1
Application number: KR1020120016980A
Authority: KR
Inventors: 유준희; 석호식; 장병탁
Original assignee: 서울대학교산학협력단
Priority date: 2012-02-20
Filing date: 2012-02-20
Publication date: 2013-05-08

Abstract

PURPOSE: An analysis method for video stream data using evolutionary particle filtering is provided to analyze a multimodal video stream in real time by extracting the dependency relations of the video stream. CONSTITUTION: A video stream analysis system generates a particle set representing the dominant features of the video stream data and divides the video stream data into segment sets (S1110, S1120). The system performs evolutionary particle filtering on the segment sets (S1130). The system estimates a sequence-dependent structure for the filtered particle set and converts it into a transition probability matrix (S1140). [Reference numerals] (AA) Start; (BB) End; (S1110) Generate a particle set for video stream data; (S1120) Divide the video stream data into segments using the particle set; (S1130) Perform evolutionary particle filtering on the segment set; (S1140) Convert into a transition probability matrix by estimating a sequence-dependent structure

Description

ANALYSIS METHOD FOR VIDEO STREAM DATA USING EVOLUTIONARY PARTICLE FILTERING

The present invention relates to a filtering technique for video stream data, and more particularly to a video stream analysis method using evolutionary particle filtering that divides stream data into segment sets, each represented by a set of particles, and flexibly analyzes the video stream through evolutionary learning.

With the development of imaging technology, various kinds of analysis and learning are being performed on video data. In particular, various attempts have recently been made to segment video streams semantically.

However, conventional techniques merely detect changes in the on-screen representation (composition) in order to separate or compare images. In other words, they analyze only the variability of the image itself, and are therefore limited in that they cannot classify or filter video stream data according to multimodal values.

An object of the present invention is to provide a video stream analysis method using evolutionary particle filtering that can analyze a given multimodal video stream in real time by extracting the dependency relations of the video stream.

Another object of the present invention is to provide a video stream analysis method using evolutionary particle filtering that, by dividing stream learning into a dominant feature extraction stage and a dependency learning stage, can analyze a stream in a very flexible manner without using prior knowledge.

Among the embodiments, a video stream analysis method includes (a) generating a set of particles representing dominant features of video stream data and dividing the video stream data into segment sets, (b) performing evolutionary particle filtering on the plurality of divided segment sets so as to generate sequential features, and (c) estimating an order-dependency structure for the set of particles on which the evolutionary particle filtering has been performed and converting it into a transition probability matrix.

In one embodiment, step (a) may include (a-1) extracting features from an original image and generating a visual word set using the extracted features, (a-2) generating a particle set having position information by associating randomly selected visual words, and (a-3) generating the segment set so as to correspond to the particle set.
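Step (a-2) can be illustrated with a minimal sketch. It assumes that feature extraction and visual word assignment (step (a-1)) have already been done, so each visual word is simply an id with a keypoint position. The names `VisualWord`, `make_particles`, `n_particles`, and `max_nodes` are hypothetical and do not come from the patent.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class VisualWord:
    word_id: int   # index of the visual word the descriptor was assigned to
    x: float       # keypoint position in the frame
    y: float

def make_particles(words, n_particles, max_nodes, rng):
    """Build particles by randomly associating visual words found in one frame.

    Assumes len(words) >= 2 so that every particle can hold at least two nodes.
    """
    particles = []
    for _ in range(n_particles):
        k = rng.randint(2, min(max_nodes, len(words)))
        # sample() draws without replacement, so a particle never repeats a word
        particles.append(tuple(rng.sample(words, k)))
    return particles

rng = random.Random(0)
frame_words = [VisualWord(i, rng.uniform(0, 640), rng.uniform(0, 480)) for i in range(20)]
particles = make_particles(frame_words, n_particles=5, max_nodes=4, rng=rng)
```

Each particle here carries position information through the `x` and `y` fields of its words; how the real method chooses the number of words per particle is not stated in the text, so the uniform choice above is only a placeholder.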

In one embodiment, step (a-1) may extract the features using SIFT.

In one embodiment, each individual particle in the particle set may be represented as a pair of nodes and edges, and the nodes may be selected from the visual word set.

In one embodiment, step (b) may include selecting particles using the fitness as the importance distribution q.

In one embodiment, the fitness may be calculated in consideration of the distance between the nodes constituting a particle, the image representation ability of the particle, and a penalty.

In one embodiment, the evolutionary particle filtering samples the particles using the distribution of the fitness values, and the number of particles constituting any one particle set at a given time may be fixed.

In one embodiment, to prevent population degeneracy caused by resampling, the evolutionary particle filtering may construct a new population using the fitness of both the results of the genetic operators and the parent-generation population.

In one embodiment, step (c) may include (c-1) comparing and searching mutually different particle sets, (c-2) estimating that a new dominant image has appeared when the comparison shows that the difference between the two sets is greater than or equal to a specific reference value, and (c-3) adding the new dominant image, if one exists, to the transition probability data structure.
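Steps (c-1) to (c-3) can be sketched as follows. The text does not say how the difference between two particle sets is measured, so this sketch assumes a Jaccard distance over the visual word ids used by each set. The function names and the default `threshold` are hypothetical.

```python
from collections import defaultdict

def jaccard_distance(a, b):
    """Assumed difference measure: 1 - |a & b| / |a | b| over visual word ids."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def learn_transitions(populations, threshold=0.5):
    """Label each population with a dominant-image id and estimate transitions."""
    prototypes = []                                   # one word set per dominant image
    counts = defaultdict(lambda: defaultdict(int))    # counts[src][dst]
    prev = None
    for pop in populations:
        dists = [jaccard_distance(pop, p) for p in prototypes]
        if not dists or min(dists) >= threshold:
            prototypes.append(set(pop))               # (c-2), (c-3): new dominant image
            cur = len(prototypes) - 1
        else:
            cur = dists.index(min(dists))             # closest known dominant image
        if prev is not None and prev != cur:
            counts[prev][cur] += 1
        prev = cur
    # Row-normalise the counts into transition probabilities.
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return prototypes, probs

populations = [{1, 2, 3}, {1, 2, 3, 4}, {7, 8, 9}, {1, 2, 3}, {7, 8, 9}, {10, 11}]
prototypes, probs = learn_transitions(populations)
```

A nested dictionary is used instead of a dense matrix so that new dominant images can be added without resizing, which matches the "add to the transition probability data structure" wording of step (c-3).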

In one embodiment, the video stream analysis method may further include (d) regenerating an image representing the segment using the particle set.

According to the present invention, a given multimodal video stream can be analyzed in real time by extracting the dependency relations of the video stream.

In addition, according to the present invention, dividing stream learning into a dominant feature extraction stage and a dependency learning stage makes it possible to analyze a stream in a very flexible manner without using prior knowledge.

FIG. 1 is a reference diagram showing an example of each step of the image representation method according to the present invention.
FIG. 2 is a reference diagram illustrating the algorithm for the evolutionary particle filtering stage according to the present invention.
FIG. 3 is a reference diagram illustrating the algorithm for sequential dependency learning according to the present invention.
FIG. 4 is a reference diagram for explaining the evolution concept described above.
FIG. 5 is a graph showing the evaluation results of 19 evaluators for one evaluation episode.
FIG. 6 shows the distribution of the human evaluation results.
FIG. 7 is a reference diagram comparing the human evaluation with the results of the present invention.
FIG. 8 is a graph showing the fitness curve of the individual with the highest fitness in the population constituting each segment.
FIG. 9 is an example of images regenerated from the evolved population.
FIG. 10 is a reference diagram showing the image order estimated from the estimated transition probability matrix.
FIG. 11 is a flowchart illustrating an embodiment of the video stream analysis method according to the present invention.

The description of the present invention is merely an embodiment for structural or functional explanation, and the scope of the present invention should not be construed as being limited by the embodiments described herein. That is, since the embodiments may be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical idea.

The terms used in the present invention should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights should not be limited by these terms. For example, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component.

When a component is referred to as being "connected" to another component, it may be directly connected to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" to another component, it should be understood that no other components exist in between. Other expressions describing relationships between components, such as "between" and "directly between" or "adjacent to" and "directly adjacent to", should be interpreted in the same way.

Singular expressions should be understood to include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" are intended to specify the presence of a stated feature, number, step, operation, component, part, or combination thereof, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In each step, identification codes (for example, a, b, c, and so on) are used for convenience of explanation and do not describe the order of the steps. Unless a specific order is clearly stated in context, the steps may occur in an order different from the stated one. That is, the steps may occur in the stated order, may be performed substantially simultaneously, or may be performed in the reverse order.

The present invention can be implemented as computer-readable code on a computer-readable recording medium, and the computer-readable recording medium includes all kinds of recording devices in which data readable by a computer system is stored. Examples of the computer-readable recording medium include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include implementation in the form of a carrier wave (for example, transmission over the Internet). In addition, the computer-readable recording medium may be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Unless otherwise defined, all terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the present invention belongs. Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted as having ideal or excessively formal meanings unless explicitly defined in the present invention.

The present invention introduces a method for learning the dependencies inherent in a video stream. The invention converts given sequential stream data into a dependency structure composed of particle populations. Each particle population summarizes its corresponding segment. A feature of the invention is that segment summarization and dependency learning proceed through unsupervised online learning without assuming prior knowledge. Learning proceeds in two stages. In the first stage, based on the evolutionary particle methodology, the dominant image that appears in common across a segment is found, and segments are estimated from changes in the dominant image. Each dominant image is represented by a combination of image descriptors. Features that can explain the dominant image are selected through evolution. Genetic operators introduce diversity into the population to prevent sample impoverishment. In the second stage, the transition probabilities between the estimated segments are calculated and stored. The invention was used to extract the dependencies present in episodes of a TV drama, and its performance was confirmed by comparison with the evaluations of human evaluators.

I. Introduction

The present invention can observe stream data to identify the order of the stream, and can analyze the stream data more efficiently using the identified order. The invention discloses a method of analyzing a multimodal sequential stream and learning the dependencies inherent in it.

The present invention divides continuous stream data into a series of segment sets and then extracts the dependency relations inherent in the generated segment sets. Each segment is summarized by a set of particles representing the dominant features of that segment. Through the segmentation process, a given sequential stream is converted into a sequence of particle populations, and the transition probabilities between segments can be calculated. The dependency relations among the segment-summarizing particle sets serve as another way of memorizing the sequential stream.

To extract the dependency relations, the present invention may use an evolutionary particle filtering (PF) method. Instead of assuming a prior distribution over the hidden variables that explain a segment, the invention can approximate that distribution with a group of particles.

Unlike traditional particle filter methods, the present invention can represent a particle set using dominant image patterns that describe a segment. Dividing a video stream into segments is not easy, because the actual data making up the stream is subject to irregular distortion and the length of a segment is not fixed. The present invention addresses this difficulty with two-stage learning, which may include an evolution stage and a dependency learning stage. In the evolution stage, dominant features are extracted through particle evolution. In the dependency learning stage, the dependency structure is estimated from the results of the evolution stage, and the estimate is stored as a transition probability matrix. By dividing stream learning into dominant feature extraction and dependency learning, the disclosed technique can analyze a stream in a very flexible manner without using prior knowledge.

II. Related Work

One method of dividing stream data into dominant-image sections is a layered mixture model for unsupervised stream clustering based on multimodal feature distributions. Because this dynamic mixture model exploits the repeated shots observable in news broadcasts, it can find story boundaries relatively easily. However, the dynamic mixture model cannot perform meaningful segmentation on stream video other than news broadcasts, that is, video in which repeated fixed frames cannot be expected (for example, TV dramas).

The disclosed technique assumes the existence of a hidden variable capable of generating all frames that share a dominant image, and can approximate the hidden distribution using a particle set. In addition, to handle the composition of visual features during the segmentation of a TV drama, the disclosed technique may select the main features from preprocessed images and reselect them whenever the dominant image is judged to have changed.

The disclosed technique may use a particle filtering method to construct a generative model for a given stream. Particle filtering has excellent inference capability even for latent states. This is because hidden variable models have great potential for explaining observed variables, owing to the simplicity of local independence: if a hidden variable underlies a set of observed variables, the observed variables become statistically independent once that hidden variable is known.

The present invention proposes a modified particle filtering method. In a typical genetic algorithm approach, the best chromosome comes to dominate the population, so the approach of the present invention may differ from the general genetic algorithm approach: the invention may use genetic operators to maintain the diversity of the population. In the present invention, a particle set represents an image cooperatively. In other words, by using a particle set that combines the partial representations of single particles, the invention can overcome the limited representation ability of a single particle.

Because the information content of real-world tasks depends not only on static or spatial characteristics but also on temporal order, it is multimodal, and modeling the dependency structure of sequential data is therefore an important research topic. A correlation structure for vector values based on a sparse Gaussian graphical model has been applied to the unsupervised analysis of sequential streams with multiple change points. That is, when a stream contains K unknown segments and those segments are independent of one another, the stream observed from time 1 to T can be expressed as [Equation 1].

Here, the probability that the data from time t to time s belong to the same segment may be defined as in [Equation 2].

The existing technique assumed a negative binomial distribution as the prior over the time between successive change points. Unlike the existing technique, the goal of the present invention is not to find the regression model that best represents a given stream, but to estimate the segment structure of a stream for which assuming an underlying distribution is impractical. Therefore, instead of calculating [Equation 2], the present invention calculates the likelihood of a new frame given the current particle population.

To this end, the present invention can capture and exploit changes in the dominant image. The invention may use a Gaussian mixture model in modeling image changes. In a Gaussian mixture model, pixels satisfying a given confidence level may be regarded as foreground pixels. The invention may use visual words produced by the scale-invariant feature transform (SIFT) without distinguishing foreground from background pixels.

III. Evolutionary Particle Filtering and Dependency Learning

The dependency learning method according to the present invention is described below.

The present invention can segment a given stream by estimating changes in the dominant image. The evolved particles serve as a generative model that summarizes the associated segment and can regenerate that segment.

To create particle sets from dominant images and convert the generated particle sets into order information, the present invention may provide a learning method consisting of an evolutionary particle filtering stage (Algorithm 1) and an order-dependency learning stage (Algorithm 2).

FIG. 1 is a reference diagram showing an example of each step of the image representation method according to the present invention. As described above, the invention extracts features from the original image using SIFT or a similar method (panel (a)) and generates a visual word set from the extracted features (panel (b)). Particles with position information can then be generated from randomly selected visual words (panel (c)). Each particle can cover a certain portion of the image based on its features and their positions (panel (d)).

FIG. 1 also illustrates the chromosome structure according to the present invention. Since the invention aims to extract the dominant features of a given segment, each particle may consist of a set of SIFT features of unrestricted size so that the particles can express those features cooperatively. This can be expressed as [Equation 3].

Each circle shown in FIG. 1 represents a specific feature of the image, and each rectangle corresponds to a SIFT feature representing a specific group of image features. Filtering the particles through evolution yields a particle set that expresses only the essential features of the given image.

FIG. 2 is a reference diagram illustrating the algorithm for evolutionary particle filtering according to the present invention. Evolutionary particle filtering is described below with reference to FIG. 2.

The goal of the evolutionary particle filtering stage is to create a particle filter that captures the essential features of a segment. Expressed in a way that takes time into account, this goal can be written as Equation (3) in FIG. 2, or [Equation 4] below. Here, [symbol] denotes the first frame of the current segment.

When a new frame is input, the present invention can supplement the existing particle population to reflect the features of the new frame. However, because the possibility that a new segment has begun cannot be ignored, the invention uses the likelihood that the current population generates the new frame when supplementing the existing particle population.
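The decision between refining the current population and starting a new segment can be sketched as below. The patent's actual likelihood formula is not reproduced in this text, so the sketch substitutes a simple stand-in: the share of particles at least half of whose visual words appear in the new frame. The function names and the `threshold` value are hypothetical.

```python
def frame_likelihood(population, frame_word_ids):
    """Stand-in likelihood: share of particles mostly matched by the frame.

    A particle counts as matched when at least half of its visual word ids
    occur in the frame. This is an assumed proxy, not the patent's formula.
    """
    frame = set(frame_word_ids)
    if not population:
        return 0.0
    hits = sum(1 for p in population if len(set(p) & frame) * 2 >= len(p))
    return hits / len(population)

def update_or_split(population, frame_word_ids, threshold=0.5):
    """Return 'update' to refine the population, or 'new_segment' to restart."""
    likely = frame_likelihood(population, frame_word_ids) >= threshold
    return "update" if likely else "new_segment"

population = [{1, 2}, {2, 3}, {8, 9}]
decision_same_scene = update_or_split(population, {1, 2, 3})
decision_new_scene = update_or_split(population, {7, 8, 9})
```

Under this proxy a frame that still contains most of the population's words keeps the segment going, while a frame that matches few particles is treated as the start of a new segment.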

Using a concept similar to a hypernetwork, the present invention can represent an individual particle as <N, E> (N: nodes, E: edges connecting the nodes). Nodes can be selected from the visual word set, and edges connect the nodes. What is new here is that multiple particles represent an image cooperatively. Because the dimensionality of an image is so large, assuming a single data type capable of representing the entire image is impractical, whereas a data structure representing a single feature (a single SIFT feature) may be too limited. The invention therefore takes a middle path and captures the dominant pattern with a particle set. That is, although an individual particle has limited representation ability, a set of particles that each represent part of the whole image can capture the dominant pattern. Within the whole population, the group of particles with high fitness can be used to represent the image.
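The <N, E> representation and the idea of cooperative coverage can be illustrated with a small sketch. The text does not say which node pairs are joined by edges, so the `fully_connected` helper below simply assumes every pair is connected. `Particle`, `fully_connected`, and `coverage` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass(frozen=True)
class Particle:
    nodes: frozenset   # N: visual word ids chosen from the visual word set
    edges: frozenset   # E: unordered pairs of node ids

def fully_connected(node_ids):
    """Hypothetical constructor: connect every pair of the chosen nodes."""
    nodes = frozenset(node_ids)
    edges = frozenset(frozenset(pair) for pair in combinations(sorted(nodes), 2))
    return Particle(nodes, edges)

def coverage(particles, frame_word_ids):
    """Fraction of the frame's visual words matched by at least one particle."""
    frame = set(frame_word_ids)
    covered = set().union(*(p.nodes for p in particles)) & frame
    return len(covered) / len(frame) if frame else 0.0

p1 = fully_connected([1, 2, 3])
p2 = fully_connected([3, 4])
single = coverage([p1], range(1, 6))        # one particle alone
together = coverage([p1, p2], range(1, 6))  # two particles cooperating
```

The point of the example is that `together` is larger than `single`: no one particle has to describe the whole frame, which is the middle path between a whole-image descriptor and a single-feature descriptor described above.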

The search process according to the present invention may apply a search bias so that the search reflects the current particle set and its structural characteristics. The fitness calculation can be expressed as Equation (4) in FIG. 2, or [Equation 5] below.

[Equation 5], which calculates the fitness, consists of three sub-criteria.

The first sub-criterion evaluates the distance between the nodes that make up a particle. If the nodes of a particle are too far apart, some of them are likely to disappear when the frame changes, so the particle may not be sufficiently robust. This sub-criterion therefore favors particles whose nodes have a short average distance.

The second sub-criterion represents the ability of a single particle to represent the image.

The third sub-criterion is a penalty. A particle with too many nodes is likely to hold overly detailed information. To keep the representation at a more general level, the present invention prefers particles with fewer constituent nodes.

Because the particle set represents an image cooperatively, equation (5) shown in FIG. 2 is a function of the particle set. Fitness, in contrast, is applied to an individual chromosome, so equation (4) is a function of the individual particle.
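The three sub-criteria of the fitness can be illustrated with a short sketch. The exact form of [Equation 5] is not reproduced in this text, so everything below is an assumption chosen only to show the intended preferences: the weighted sum, the weights `w_dist`, `w_rep` and `w_pen`, the use of visual-word overlap as the representation term, and the representation of a particle as a list of `(word_id, (x, y))` nodes.

```python
import math
from itertools import combinations


def mean_node_distance(positions):
    """Average Euclidean distance over all node pairs (0.0 for a single node)."""
    pairs = list(combinations(positions, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def fitness(particle, frame_words, w_dist=1.0, w_rep=1.0, w_pen=0.1):
    """Illustrative fitness; `particle` is a list of (word_id, (x, y)) nodes.

    frame_words is the set of visual word ids observed in the current frame.
    """
    positions = [pos for _, pos in particle]
    words = {word for word, _ in particle}
    distance_term = -mean_node_distance(positions)                    # compact particles score higher
    representation_term = len(words & frame_words) / max(len(words), 1)  # share of nodes seen in the frame
    penalty_term = -len(particle)                                     # fewer nodes score higher
    return w_dist * distance_term + w_rep * representation_term + w_pen * penalty_term
```

With these assumed terms, a particle whose nodes lie close together scores higher than one with the same visual words spread across the frame.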

To approximate the true underlying distribution, particle filtering methods require an importance distribution q. In the present invention, particles are sampled based on the distribution of fitness values ([Equation 5]) rather than on a conventional importance distribution. The number of particles that make up a particle population at any given time may be fixed.
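Sampling particles according to their fitness while keeping the population size fixed can be sketched as follows. The function name `sample_by_fitness` and the shift that makes all weights positive are assumptions of this example; the source only states that sampling follows the distribution of fitness values.

```python
import random


def sample_by_fitness(particles, fitness_values, n, rng=None):
    """Draw n particles with probability proportional to shifted fitness."""
    rng = rng or random.Random()
    lowest = min(fitness_values)
    # Shift so every weight is positive even when fitness values are negative.
    weights = [value - lowest + 1e-9 for value in fitness_values]
    return rng.choices(particles, weights=weights, k=n)
```

Because `k=n` is passed explicitly, the sampled population always has the same fixed size regardless of how the fitness values are distributed.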

New particles are created through evolution. Excessive resampling in particle filtering can cause population degeneracy; to address this, the present invention introduces new particles through genetic operators. This can be expressed as [Equation 6].

To maintain the fixed population size, the new population is constructed from the genetic operator output and the parent-generation population according to their fitness.
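One way to picture this step is a generation update that pools parents with offspring and keeps only the fittest, so the population size never changes. [Equation 6] is not reproduced in this text, so the sketch below is an assumption throughout: particles are plain tuples of visual word ids, and `crossover`, `mutate` and `next_generation` are hypothetical helpers.

```python
import random


def crossover(a, b):
    """Join the first half of parent `a` to the second half of parent `b`."""
    return a[: len(a) // 2] + b[len(b) // 2 :]


def mutate(particle, vocabulary, rng):
    """Replace one randomly chosen node with a random visual word."""
    index = rng.randrange(len(particle))
    return particle[:index] + (rng.choice(vocabulary),) + particle[index + 1 :]


def next_generation(parents, fitness, vocabulary, rng=None):
    """Return a population of the same size drawn from parents plus offspring."""
    rng = rng or random.Random()
    offspring = []
    for _ in range(len(parents)):
        a, b = rng.sample(parents, 2)          # needs at least two parents
        offspring.append(mutate(crossover(a, b), vocabulary, rng))
    pool = parents + offspring                 # new list; `parents` is not modified
    pool.sort(key=fitness, reverse=True)
    return pool[: len(parents)]                # fixed population size
```

Since parents compete with their offspring, the best fitness in the population can never decrease from one generation to the next under this scheme.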

Because the particle set carries the main characteristics of an image, image regeneration can be described as a search for the image that maximizes the likelihood that the given particle set regenerates it. This search can be expressed as [Equation 7].

Because this search can incur an excessive computational load, the present invention may instead generate the image directly from the SIFT features held by each node. In Equation 7, this direct approach is represented by the corresponding term.

An evolved particle population represents the dominant image of a particular segment, but when a frame belonging to a new segment is observed, a new particle population must be generated. To this end, the present invention determines that a new segment has begun if the calculated likelihood is smaller than a reference value. The likelihood of a new frame can be estimated using [Equation 8].

Here, the parameter appearing in [Equation 8] denotes the hyperparameter of the prior knowledge.
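The segment-boundary rule (start a new particle population when the likelihood of the incoming frame falls below a reference value) can be sketched as a simple loop. [Equation 8] is not reproduced here, so `likelihood`, `init_population` and `evolve` are hypothetical callables supplied by the caller, and `segment_stream` is a name introduced only for this example.

```python
def segment_stream(frames, likelihood, init_population, evolve, threshold):
    """Split a frame sequence into segments, each with its own population.

    likelihood(population, frame) -> float
    init_population(frame)        -> population for a new segment
    evolve(population, frame)     -> population after one more frame
    """
    segments = []
    population = None
    for index, frame in enumerate(frames):
        if population is None or likelihood(population, frame) < threshold:
            # Likelihood below the reference value: treat the frame as a segment start.
            population = init_population(frame)
            segments.append({"start": index, "population": population})
        else:
            # Same segment: keep evolving the current population.
            population = evolve(population, frame)
            segments[-1]["population"] = population
    return segments
```

For a toy run, frames can be sets of visual word ids, with set overlap standing in for the likelihood.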

FIG. 3 is a reference diagram illustrating the algorithm for sequential dependency learning according to the present invention. Sequential dependency learning is described below with reference to FIG. 3.

When evolutionary particle filtering is complete, a set of particle populations, Pf, is obtained. In the second step (Algorithm 2), Pf is converted into a transition probability matrix. The conversion method proposed in Algorithm 2 is described below.

|Pf| cannot be known in advance, but the actual number of distinct segments will be smaller than the estimated |Pf|. This is because, in a drama episode, a limited cast of characters acts against a limited set of backgrounds. The key point of this learning step is therefore to find the segments that share the same dominant image. The present invention solves this problem by comparing particle populations to find similar populations that can be presumed to share the same dominant image. The equation shown in FIG. 3 describes how to find the population that corresponds to a given population. That is, two different particle populations are compared, and if the difference between them is greater than or equal to a certain reference value, a new dominant image is presumed and added to the transition probability data structure T. In this way the number of distinct particle populations is reduced.
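The grouping of similar populations and the conversion into a transition probability matrix can be sketched as below. The equation of FIG. 3 is not reproduced in this text, so the sketch makes its own assumptions: a population is reduced to a set of visual word ids, the difference measure is the Jaccard distance, and transition probabilities are row-normalised counts of consecutive states. The names `jaccard_distance` and `build_transition_matrix` are hypothetical.

```python
def jaccard_distance(a, b):
    """Difference between two populations, each given as a set of visual word ids."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def build_transition_matrix(populations, threshold):
    """Merge similar populations into states and count state-to-state transitions."""
    states = []     # one representative population per dominant image
    sequence = []   # state index of every segment, in temporal order
    for population in populations:
        for index, representative in enumerate(states):
            if jaccard_distance(population, representative) < threshold:
                sequence.append(index)          # same dominant image as an earlier segment
                break
        else:
            states.append(population)           # difference >= threshold: new dominant image
            sequence.append(len(states) - 1)

    size = len(states)
    counts = [[0] * size for _ in range(size)]
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1

    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([count / total if total else 0.0 for count in row])
    return states, sequence, matrix
```

The number of states is at most the number of segments, which mirrors the observation that the actual number of distinct segments is smaller than |Pf|.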

After evolution, each segment has its own particle population, so the present invention can use that population to regenerate an image representing the segment. Regeneration is straightforward because the previously extracted SIFT features remain in the nodes. Each particle can be interpreted as a set of SIFT features, each holding an image patch and the position of that patch (its center point). A single particle represents only a small part of the image, but when many particles are combined, the image patches and position information held in their nodes accumulate sufficiently to reconstruct an image that a person can recognize.
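Regenerating an image by accumulating the patches held in the nodes can be sketched as pasting each patch onto a blank canvas at its stored center point. This is only an illustration under assumptions: patches are small 2-D lists of pixel values, later patches simply overwrite earlier ones, and `regenerate_image` is a hypothetical name.

```python
def regenerate_image(particles, height, width, background=0):
    """Paste every node's patch onto a canvas, centred on the node position.

    Each particle is a list of (patch, (center_row, center_col)) nodes,
    where patch is a 2-D list of pixel values.
    """
    canvas = [[background] * width for _ in range(height)]
    for particle in particles:
        for patch, (center_row, center_col) in particle:
            patch_height, patch_width = len(patch), len(patch[0])
            top = center_row - patch_height // 2
            left = center_col - patch_width // 2
            for dy in range(patch_height):
                for dx in range(patch_width):
                    y, x = top + dy, left + dx
                    if 0 <= y < height and 0 <= x < width:   # clip at the image border
                        canvas[y][x] = patch[dy][dx]
    return canvas
```

With only a few particles most of the canvas stays at the background value; coverage grows as more particles contribute patches.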

FIG. 4 is a reference diagram for explaining the evolution concept described above.

Referring to FIG. 4, the present invention extracts a series of SIFT features from the first frame of a segment and uses them as chromosome nodes. After the initial population is selected, evolution is repeated each time a new frame is observed. When a new frame is observed, the likelihood that the current population regenerates that frame is calculated first and used as the criterion for a new segment (the start of a segment). The particle set represents the segment and can be used to regenerate the dominant visual features of that segment.

IV. Experimental Results

Below, the results of the present invention on an actual drama series are compared with the judgments of 19 people. The total running time of the evaluation episodes was 125 minutes 30 seconds, and a total of 7,530 frames (one per second of video) were extracted for processing the test data.

Before the estimates produced by the present invention are discussed, the human judgments are summarized in FIG. 5, FIG. 6 and [Table 1].

FIG. 5 is a graph showing the evaluation results of the 19 evaluators for one evaluation episode, and FIG. 6 shows the distribution of the human evaluation results. [Table 1] gives a statistical summary of the change intervals.

[Table 1]
            Mean     Std. dev.   Min      Max
Episode 1   3.85 s   2.33 s      1.00 s   9.00 s
Episode 2   3.48 s   2.37 s      1.00 s   9.00 s
Episode 3   3.17 s   2.05 s      1.00 s   7.00 s

As shown, the evaluators did not all agree on any single change point, so adjacent change points were grouped into change intervals (FIG. 5). [Table 1] summarizes the statistics of the resulting intervals. The mean interval lengths for the three episodes are 3.85 seconds, 3.48 seconds and 3.17 seconds. The minimum interval length of 1 second is reasonable, but the maximum of 9 seconds can be regarded as excessively long. That longest interval corresponds to a scene in which a city skyline changes continuously, where it is very difficult to judge at which moment the dominant image changes. Although the number of segments cannot be known in advance, basic building blocks such as visual words must be prepared beforehand for image analysis to be possible. To obtain features that are robust to change, the images were preprocessed using SIFT (Scale-Invariant Feature Transform).

FIG. 7 and [Table 2] compare the human evaluation with the results of the present invention.

[Table 2]
                                                Episode 1   Episode 2
Human evaluation    Number of change points     169         191
                    Number of intervals         41          50
Present invention   Number of change points     763         744
                    Precision                   0.076       0.101
                    Recall                      0.343       0.393

As [Table 2] shows, in episode 1 the 19 evaluators marked 169 change points, from which 41 change intervals were constructed. In episode 2 they marked 191 change points, from which 50 change intervals were constructed. The present invention, in contrast, estimated 763 change points for episode 1 and 744 for episode 2. In video stream analysis it is unreasonable to expect the points chosen by human evaluators and those estimated by a computer to coincide exactly. An estimated change point was therefore counted as correct if it fell within a human-evaluated change interval. To present performance in a more conventional form, precision and recall are also reported. Recall was defined as the proportion of human-evaluated change points that are included in the set of computer-estimated change points. Precision was defined as the ratio of correctly estimated change points to all estimated change points. The present invention achieved a precision of 0.076 in episode 1 and 0.101 in episode 2, and a recall of 0.3431 and 0.3926 respectively. From the standpoint of improving precision and recall, the likelihood calculation leaves much room for improvement; for example, change points could be estimated by focusing on the trend of the likelihood. Although these results are far below human performance, they are meaningful given that only SIFT features were used.
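The reported precision and recall values can be checked with a short calculation. The number of correctly estimated change points is not stated in the source; the counts 58 (episode 1) and 75 (episode 2) used below are inferred here only because they reproduce the reported figures to three decimal places. The helper names `count_correct` and `precision_recall` are hypothetical.

```python
def count_correct(estimated_points, human_intervals):
    """Count estimated change points that fall inside any human change interval."""
    return sum(
        any(start <= point <= end for start, end in human_intervals)
        for point in estimated_points
    )


def precision_recall(correct, estimated, reference):
    """Precision = correct / estimated; recall = correct / reference."""
    return correct / estimated, correct / reference


# Inferred counts of correct estimates: 58 and 75 (assumption, see above).
episode1 = precision_recall(58, 763, 169)   # 763 estimated, 169 human change points
episode2 = precision_recall(75, 744, 191)   # 744 estimated, 191 human change points
```

Under this reading, roughly one estimated change point in ten falls inside a human-marked interval, which matches the short spacing between estimated change points.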

In FIG. 7, the estimates of the proposed method are compared with the human evaluation results for the interval from 35 minutes 34 seconds to 37 minutes 34 seconds of episode 1. As FIG. 7 shows, the spacing between estimated change points is short.

FIG. 8 is a graph showing the fitness curve of the fittest individual in the population of a segment. The curve is step-shaped because a chromosome in the population is replaced only when a later generation contains an individual with higher fitness. For the segment shown in FIG. 8, a total of 60 generations of evolution took place while the segment lasted. Because the present invention targets a methodology for analyzing streams in real time, no fixed limit was placed on the number of generations. It is therefore necessary to find an optimal way of achieving both diversity and convergence in video stream analysis within a limited time.

FIG. 9 is an example of an image regenerated from an evolved population. The image was generated from 20 particles; 20 particles alone were enough to produce a recognizable image. Because each population holds enough information to regenerate the dominant image of its segment, the proposed method can be interpreted as a new way of summarizing a given stream.

FIG. 10 is a reference diagram showing an image sequence estimated from the estimated transition probability matrix.

The series of pictures shown in FIG. 10 was regenerated based on the transition probability matrix (hereinafter, the matrix) estimated by Algorithm 2. In FIG. 10, the top row is the image sequence of the original segments and the bottom row is the image sequence of the estimated segments.

FIG. 10 was generated as follows. First, a seed image is presented, and the segment that follows the seed image's segment is estimated from the matrix. An image is regenerated from the estimated segment, that regenerated image is set as the next seed image, and the segment following it is again estimated from the matrix. Four segments were estimated in this way, and the representative image of each was regenerated to check the result. Of the four segments in FIG. 10, the second is not the correct segment. However, the second image is still partially correct, because the dominant character (a bald man) is present in both the original and the estimated segment. If the second image is counted as partially correct, the method estimated the correct segment order.
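The chained estimation used to produce FIG. 10 can be sketched as repeatedly looking up the next state in the transition probability matrix, starting from the state of the seed image. Picking the most probable successor at every step is an assumption of this sketch (the source only says the next segment is estimated from the matrix), and `predict_sequence` is a hypothetical name.

```python
def predict_sequence(matrix, seed_state, steps):
    """Follow the most probable transition `steps` times from `seed_state`."""
    sequence = [seed_state]
    for _ in range(steps):
        row = matrix[sequence[-1]]
        if not any(row):        # no outgoing transition was ever observed
            break
        # Index of the largest probability in the row (first one on ties).
        sequence.append(max(range(len(row)), key=row.__getitem__))
    return sequence
```

Each predicted state would then be turned back into a representative image by regenerating it from that state's particle population.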

In other words, by building the transition probability matrix, the present invention can represent a video stream with a smaller and simpler data structure, and can reproduce a partial segment order from the constructed matrix.

FIG. 11 is a flowchart illustrating an embodiment of the video stream analysis method according to the present invention. The description below assumes that the method is performed by a video stream analysis system. The system may include the components needed to perform the method, for example an evolutionary particle filter and matrix conversion means. Because the configuration of the system corresponds to the method described below, a more detailed description of the system is omitted; those skilled in the art will readily understand the system from the description of the method.

Referring to FIG. 11, the video stream analysis system generates a particle set representing the dominant features of the video stream data (step S1110) and divides the video stream data into a segment set (step S1120).

The video stream analysis system performs evolutionary particle filtering to generate sequential features for the segment set (step S1130), then estimates the sequence-dependent structure of the filtered particle set and converts it into a transition probability matrix (step S1140).

In an embodiment of steps S1110 and S1120, the video stream analysis system extracts features from the original image and generates a visual word set from the extracted features. The system then generates a particle set having position information by associating randomly selected visual words, and generates a segment set corresponding to the generated particle set.
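The random association of visual words into an initial particle set can be sketched as drawing a few words, without repetition, for each particle. The source does not state how many words a particle holds or how many particles are created, so `n_particles`, `nodes_per_particle` and the function name `init_particles` are assumptions, as is the encoding of a visual word as `(word_id, (x, y))`.

```python
import random


def init_particles(visual_words, n_particles, nodes_per_particle, rng=None):
    """Create particles by randomly associating visual words.

    visual_words is a list of (word_id, (x, y)) entries, so every node
    keeps the position information of its patch.
    """
    rng = rng or random.Random()
    return [
        tuple(rng.sample(visual_words, nodes_per_particle))
        for _ in range(n_particles)
    ]
```

Passing a seeded `random.Random` instance makes the initial population reproducible, which is convenient when comparing runs.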

Here, the video stream analysis system may extract the features using SIFT.

Each individual particle in the particle set can be expressed as a pair of nodes and edges, where the nodes are selected from the visual word set.

In an embodiment of step S1130, the video stream analysis system selects particles using the fitness as the importance distribution q.

As described above, the fitness is calculated in consideration of the distance between the nodes constituting a particle, the image representation ability of a particle, and a penalty. Evolutionary particle filtering samples particles using the distribution of fitness values, and the number of particles constituting any one particle set at a given time may be fixed.

In addition, to prevent population degeneracy caused by resampling, evolutionary particle filtering constructs the new population using the fitness of the genetic operator output and of the parent-generation population.

In an embodiment of step S1140, the video stream analysis system compares different particle sets and, if the difference between two sets is greater than or equal to a certain reference value, presumes a new dominant image. If a new dominant image exists, the system adds it to the transition probability data structure.

In an embodiment, the video stream analysis system regenerates an image representing a segment using the particle set.

V. Conclusion

The present invention discloses a method that segments a video stream and estimates its dependencies. For stream segmentation, the particle filtering method was modified so that segments are estimated from a set of particles operating cooperatively. Through evolution, the dominant visual words come to be represented in the particles. Re-expressing each segment by its particle set makes it possible to represent the segment despite minor distortions caused by camera work or lighting. The segmented sections can then be used to learn the temporal dependencies in the video stream. Similar segments were grouped into states, the resulting segment order was converted into a transition probability matrix, and the matrix was used to predict the next segment for a given cue. Because the resulting state sequence is linked to the particle sets, the proposed method can be interpreted as a new way of compressing a given video stream.

The performance of the proposed method was verified by comparison with human evaluation results. Compared with the human results, the proposed method estimated about four times as many change points as the human evaluators marked, with a precision between 0.07 and 0.1. Considering the difficulty of the problem, this is a fairly meaningful result.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

A method of analyzing stream data of a video stream, the method comprising:
(a) generating a particle set representing a dominant feature of the video stream data, and dividing the video stream data into a segment set;
(b) performing evolutionary particle filtering to generate sequential features for the segment set; and
(c) estimating a sequence-dependent structure for the particle set on which the evolutionary particle filtering has been performed, and converting it into a transition probability matrix.

The method of claim 1, wherein step (a) comprises:
(a-1) extracting features from an original image and generating a visual word set using the extracted features;
(a-2) generating a particle set having position information by associating randomly selected visual words; and
(a-3) generating the segment set so as to correspond to the particle set.

The method of claim 2, wherein step (a-1) comprises extracting the features using SIFT.

The method of claim 1, wherein each individual particle included in the particle set is expressible as a pair of nodes and edges, and the nodes are selected from the visual word set.

The method of claim 4, wherein step (b) comprises selecting particles using the fitness as the importance distribution q.

The method of claim 5, wherein the fitness is calculated in consideration of the distance between the nodes constituting a particle, the image representation ability of a particle, and a penalty.

The method of claim 6, wherein the evolutionary particle filtering samples particles using the distribution of fitness values, and the number of particles constituting any one particle set at a given time is fixed.

The method of claim 7, wherein the evolutionary particle filtering constructs a new population using the fitness of the genetic operator output and of the parent-generation population, in order to prevent population degeneracy caused by resampling.

The method of claim 1, wherein step (c) comprises:
(c-1) comparing different particle sets;
(c-2) presuming a new dominant image when, as a result of the comparison, the difference between the two sets is greater than or equal to a certain reference value; and
(c-3) if a new dominant image exists, adding it to the transition probability data structure.

The method of claim 1, further comprising:
(d) regenerating an image representing a segment using the particle set.