KR20080002915A

KR20080002915A - Apparatus and method for processing video data

Info

Publication number: KR20080002915A
Application number: KR1020077025308A
Authority: KR
Inventors: Charles Pace; John Weiss
Original assignee: Euclid Discoveries, LLC
Priority date: 2005-03-31
Filing date: 2006-03-30
Publication date: 2008-01-04
Also published as: CN101167363A; EP1878256A1; CN101167363B; WO2006105470A1; CA2590869C; JP5065451B2; AU2006230545B2; CA2590869A1; EP1878256A4; JP2008537391A; JP4573895B2; JP2010259087A; KR101216161B1; AU2006230545A1

Abstract

An apparatus and methods for processing video data are described. The invention provides a representation of video data that can be used to assess agreement between the data and a fitting model for a particular parameterization of the data. This allows the comparison of different parameterization techniques and the selection of the optimum one for continued video processing of the particular data. The representation can be utilized in intermediate form as part of a larger process or as a feedback mechanism for processing video data. When utilized in its intermediate form, the invention can be used in processes for storage, enhancement, refinement, feature extraction, compression, coding, and transmission of video data. The invention serves to extract salient information in a robust and efficient manner while addressing the problems typically associated with video data sources.

Description

Apparatus and method for processing video data

This application claims priority to U.S. Provisional Application No. 60/667,532, filed March 31, 2005, entitled "System and Method For Video Compression Employing Principal Component Analysis," and to U.S. Provisional Application No. 60/670,951, filed April 13, 2005, entitled "Apparatus and Methods for Processing Video Data." This application is a continuation-in-part of U.S. Application No. 11/336,366, filed January 20, 2006, and of U.S. Application No. 11/280,625, filed November 16, 2005, which is a continuation-in-part of U.S. Application No. 11/230,686, filed September 20, 2005, which is a continuation-in-part of U.S. Application No. 11/191,562, filed July 28, 2005. Each of the foregoing applications is incorporated herein by reference in its entirety.

The present invention relates generally to the field of digital signal processing, more particularly to computer apparatus and computer-implemented methods for the efficient representation and processing of signal or image data, and most particularly to computer apparatus and computer-implemented methods for the efficient representation and processing of video data.

A general description of the prior art systems to which the present invention pertains can be expressed as in FIG. 1. Here, the block diagram shows a typical prior art video processing system. Such a system typically includes the following stages: an input stage 102, a processing stage 104, an output stage 106, and one or more data storage mechanisms 108.

The input stage 102 may include camera sensors, camera sensor arrays, range-finding sensors, or a means of retrieving data from a storage mechanism. The input stage provides video data representing time-correlated sequences of man-made and/or naturally occurring phenomena. The salient component of the data may be masked or contaminated by noise or other unwanted signals.

The video data, in the form of a data stream, array, or packet, may be presented to the processing stage 104 directly or through an intermediate storage element 108 in accordance with a predefined transfer protocol. The processing stage 104 may take the form of dedicated analog or digital devices, or of programmable devices such as a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA), to execute a desired set of video data processing operations. The processing stage 104 typically includes one or more CODECs (coder/decoders).

The output stage 106 produces a signal, display, or other response capable of affecting a user or an external apparatus. Typically, an output device is used to generate an indicator signal, a display, a hard copy, or a representation of processed data in storage, or to initiate transmission of data to a remote site. It may also be used to provide an intermediate signal or control parameter for use in subsequent processing operations.

Storage is presented as an optional element in such a system. When used, the storage element 108 may be non-volatile, such as read-only storage media, or volatile, such as dynamic random access memory (RAM). It is not uncommon for a single video processing system to include several types of storage elements having various relationships to the input, processing, and output stages. Examples of such storage elements include input buffers, output buffers, and processing caches.

The primary objective of the video processing system of FIG. 1 is to process input data to produce an output that is meaningful for a specific application. To accomplish this goal, a variety of processing operations may be used, including noise reduction or cancellation, feature extraction, object segmentation and/or normalization, data categorization, event detection, editing, data selection, data re-coding, and transcoding.

Many data sources that produce poorly constrained data are important to people, especially sound and visual images. In most cases, the essential characteristics of these source signals adversely affect the goal of efficient data processing. The intrinsic variability of the source data is an obstacle to processing the data reliably and efficiently without introducing errors that arise from the naive empirical and heuristic methods used in deriving engineering assumptions. This variability is lessened for applications in which the input data are naturally or deliberately constrained to narrowly defined characteristic sets (such as a limited set of symbol values or a narrow bandwidth). All too often, these constraints result in processing techniques of low commercial value.

The design of a single processing system is influenced by the expected characteristics of the source signal used as input and by the intended use of the system. In most cases, the required performance efficiency will also be a significant design factor. Performance efficiency, in turn, is affected by the amount of data to be processed relative to the available data storage and by the computational complexity of the application relative to the available computing power.

Conventional video processing methods suffer from a number of inefficiencies that appear in the form of slow data communication speeds, large storage requirements, and disturbing perceptual artifacts. These can be serious problems because of the variety of ways in which people want to use and manipulate video data and because of the innate sensitivity people have to some forms of visual information.

An "optimal" video processing system is efficient, reliable, and robust in performing a desired set of processing operations. Such operations include storage, transmission, display, compression, editing, encryption, enhancement, categorization, feature detection, and recognition of the data. Secondary operations include integration of such processed data with other information sources. Equally important, in the case of a video processing system, the output should be compatible with human vision by avoiding the introduction of perceptual artifacts.

A video processing system may be described as "robust" if its speed, efficiency, and quality do not depend on the specifics of any particular characteristic of the input data. Robustness also relates to the ability to perform operations when some of the input is erroneous. Many video processing systems are not robust enough to serve general classes of applications; they apply only to the same narrowly constrained data that was used in the development of the system.

Salient information can be lost when a continuous-valued data source is discretized, because the sampling rate of the input element does not match the signal characteristics of the sensed phenomena. Loss also occurs when the strength of the signal exceeds the limits of the sensor, resulting in saturation. Similarly, information is lost when the precision of the input data is reduced, as happens in any quantization process in which the full range of values in the input data is represented by a set of discrete values, thereby reducing the precision of the data representation.

Ensemble variability refers to any unpredictability in a class of data or information sources. Data representing visual information has a very large degree of ensemble variability because visual information is typically unconstrained. Visual data may represent any spatial array sequence or spatio-temporal sequence that can be formed by light incident on a sensor array.

In modeling visual phenomena, video processors generally impose some set of constraints and/or structure on the manner in which the data is represented or interpreted. As a result, such methods can introduce systematic errors that affect the quality of the output, the confidence with which the output may be regarded, and the types of subsequent processing tasks that can reliably be performed on the data.

Quantization methods reduce the precision of the data in the video frames while attempting to retain the statistical variation of that data. Typically, the video data is analyzed so that the distribution of data values is collected into probability distributions. There are also methods that project the data into phase space in order to characterize the data as a mixture of spatial frequencies, allowing the precision reduction to be diffused in a less objectionable manner. When used heavily, these quantization methods often produce perceptually implausible colors and can induce abrupt pixilation in originally smooth areas of the video frame.
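
The precision loss described above can be illustrated with a minimal sketch of a uniform quantizer. The function name `quantize`, the 8-bit value range, and the sample ramp are assumptions chosen for illustration; the source does not specify any particular quantizer.

```python
def quantize(values, levels, lo=0, hi=255):
    """Map each value in [lo, hi] to the midpoint of one of `levels` equal bins."""
    step = (hi - lo + 1) / levels
    out = []
    for v in values:
        index = min(int((v - lo) / step), levels - 1)
        out.append(lo + (index + 0.5) * step)
    return out

# A smooth intensity ramp (hypothetical data): 16 samples from 0 to 240.
ramp = list(range(0, 256, 16))

# Heavy quantization to 4 levels collapses the ramp into 4 flat steps,
# which is the kind of banding the text refers to in smooth areas.
coarse = quantize(ramp, levels=4)
distinct = sorted(set(coarse))
worst_error = max(abs(a - b) for a, b in zip(ramp, coarse))
```

With only four levels, sixteen distinct input values collapse to four output values, and the worst-case error is half a bin width.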

Differential coding is also typically used to exploit the local spatial similarity of data. Data in one part of a frame tends to be clustered around similar data in that frame, and also at a similar position in subsequent frames. Representing the data in terms of its spatially adjacent data can then be combined with quantization, and the net result is that, for a given precision, representing the differences is more accurate than using the absolute values of the data. This assumption works well when the spectral resolution of the original video data is limited, as in black-and-white video or low-color video. As the spectral resolution of the video increases, the assumption of similarity breaks down significantly. The breakdown is due to the inability to selectively preserve the precision of the video data.
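
A minimal sketch of differential coding along one row of pels follows. The helper names `delta_encode` and `delta_decode` and the sample row are hypothetical; the point is only that neighbouring values differ by small amounts, so the differences span a much smaller range than the absolute values.

```python
def delta_encode(row):
    """Replace each value with its difference from the previous value."""
    previous = 0
    deltas = []
    for value in row:
        deltas.append(value - previous)
        previous = value
    return deltas

def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    row = []
    for delta in deltas:
        total += delta
        row.append(total)
    return row

# Hypothetical row of slowly varying intensities.
row = [100, 102, 103, 103, 105, 108, 110, 111]
deltas = delta_encode(row)
largest_step = max(abs(d) for d in deltas[1:])
```

After the first sample, every difference here fits in a few bits, whereas the absolute values need seven. The advantage shrinks when neighbouring values are less similar.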

Residual coding is similar to differential encoding in that the error of the representation is further differentially encoded in order to restore the precision of the original data to a desired level of accuracy.

Variations of these methods attempt to transform the video data into alternative representations that expose data correlations in spatial phase and scale. Once the video data has been transformed in this way, quantization and differential coding methods can be applied to the transformed data, increasing the preservation of the salient image features. Two of the most prevalent of these transform video compression techniques are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). Error in the DCT manifests as a wide variation in video data values, and therefore the DCT is typically applied to blocks of video data in order to localize these false correlations. The artifacts from this localization often appear along the borders of the blocks. For the DWT, more complex artifacts occur when there is a mismatch between the basis function and certain textures, which causes a blurring effect. To counteract the negative effects of the DCT and DWT, the precision of the representation is increased to lower the distortion, at the cost of precious bandwidth.
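
The behaviour of block transform coding can be sketched with a one-dimensional orthonormal DCT written from its textbook definition. The function names, the 8-sample block length, and the crude "keep the first coefficients" rule are assumptions for illustration and are not taken from any particular codec.

```python
import math

def dct(block):
    """Orthonormal DCT-II of a 1-D list of samples."""
    n = len(block)
    coeffs = []
    for k in range(n):
        total = sum(block[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * total)
    return coeffs

def idct(coeffs):
    """Inverse of dct() (an orthonormal DCT-III)."""
    n = len(coeffs)
    samples = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        samples.append(total)
    return samples

def code_block(block, keep):
    """Transform a block, zero all but the first `keep` coefficients, invert."""
    coeffs = dct(block)
    truncated = coeffs[:keep] + [0.0] * (len(coeffs) - keep)
    return idct(truncated)

flat = [50.0] * 8                    # smooth block: one coefficient suffices
edge = [10.0] * 4 + [200.0] * 4      # block containing a sharp edge
flat_error = max(abs(a - b) for a, b in zip(flat, code_block(flat, keep=1)))
edge_error = max(abs(a - b) for a, b in zip(edge, code_block(edge, keep=1)))
```

The smooth block survives aggressive truncation essentially unchanged, while the block containing an edge is reconstructed as its mean value, a large error confined to that block. This is one way to see why artifacts of block-based coding tend to follow block boundaries.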

The present invention is a computer-implemented video processing method that provides both computational and analytical advantages over existing state-of-the-art methods. The principal inventive method is the integration of a linear decomposition method, a spatial segmentation method, and a spatial normalization method. Spatially constraining the video data greatly increases the robustness and applicability of linear decomposition methods. Additionally, spatial segmentation of the data corresponding to the spatial normalization can further increase the benefits derived from spatial normalization alone.

In particular, the present invention provides a means by which signal data can be efficiently processed into one or more useful representations. The invention is efficient at processing many commonly occurring data sets and is particularly efficient at processing video and image data. The inventive method analyzes the data and provides one or more concise representations of that data to facilitate its processing and encoding. Each new, more concise data representation allows a reduction in computational processing, transmission bandwidth, and storage requirements for many applications, including, but not limited to, the encoding, compression, transmission, analysis, storage, and display of video data. The invention includes methods for identifying and extracting the salient components of the video data, which allows prioritization in the processing and representation of the data. Noise and other unwanted parts of the signal are identified as lower priority, so that further processing can focus on analyzing and representing the higher-priority parts of the video signal. As a result, the video signal is represented far more concisely than was previously possible, and the loss in accuracy is concentrated in the parts of the video signal that are perceptually unimportant.

FIG. 1 is a block diagram illustrating a prior art video processing system.

FIG. 2 is a block diagram providing an overview of the invention, showing the major modules for processing video.

FIG. 3 is a block diagram illustrating the motion estimation method of the invention.

FIG. 4 is a block diagram illustrating the global registration method of the invention.

FIG. 5 is a block diagram illustrating the normalization method of the invention.

FIG. 6 is a block diagram illustrating the hybrid spatial normalization compression method.

FIG. 7 is a block diagram illustrating the mesh generation method of the invention used in local normalization.

FIG. 8 is a block diagram illustrating the mesh-based normalization method of the invention used in local normalization.

FIG. 9 is a block diagram illustrating the combined global and local normalization method of the invention.

FIG. 10 is a block diagram illustrating the GPCA-based polynomial fitting and differentiation method of the invention.

FIG. 11 is a block diagram illustrating the iterative GPCA refinement method of the invention.

FIG. 12 is a block diagram illustrating the background decomposition method.

FIG. 13 is a block diagram illustrating the object segmentation method of the invention.

FIG. 14 is a block diagram illustrating the object interpolation method of the invention.

In video signal data, frames of video are assembled into a sequence of images that usually depicts a three-dimensional scene as projected and imaged onto a two-dimensional imaging surface. Each frame, or image, is composed of picture elements (pels) that represent the response of an imaging sensor to the sampled signal. Often, the sampled signal corresponds to reflected, refracted, or emitted energy (e.g., electromagnetic, acoustic, etc.) sampled by a two-dimensional sensor array. Successive sequential sampling produces a spatiotemporal data stream with two spatial dimensions per frame and a temporal dimension corresponding to the order of the frame in the video sequence.

As illustrated in FIG. 2, the present invention analyzes signal data and identifies the salient components. When the signal consists of video data, analysis of the spatiotemporal stream reveals salient components that are often specific objects, such as faces. The identification process quantifies the existence and significance of the salient components and chooses one or more of the most significant of those quantified salient components. This does not limit the identification and processing of other, less salient components after or concurrently with the presently described processing. The aforementioned salient components are then further analyzed to identify their variant and invariant subcomponents. The identification of invariant subcomponents is the process of modeling some aspect of the component, thereby revealing a parameterization of the model that allows the component to be synthesized to a desired level of accuracy.

In one embodiment, a foreground object is detected and tracked. The pels of the object are identified and segmented from each frame of the video. Block-based motion estimation is applied to the segmented object in multiple frames. These motion estimates are then integrated into a higher-order motion model. The motion model is used to warp instances of the object to a common spatial configuration. For certain data, more of the features of the object are aligned in this configuration. This normalization allows the linear decomposition of the values of the object's pels over multiple frames to be represented compactly. The salient information pertaining to the appearance of the object is contained in this compact representation.
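
To make the idea of a compact linear decomposition concrete, the following sketch fits a mean vector plus a single basis vector to a handful of already-aligned frames, using power iteration as a stand-in for a full principal component analysis. The function `linear_appearance_model`, the four-pel "frames", and the choice of a single component are all assumptions for illustration; the source does not prescribe this procedure.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def linear_appearance_model(frames, iterations=50):
    """Return (mean, basis, coefficients) for a one-component linear model.

    `frames` is a list of equal-length pel vectors that are assumed to be
    spatially normalized already. The basis is found by power iteration on
    the covariance of the mean-centered frames.
    """
    count = len(frames)
    mean = [sum(column) / count for column in zip(*frames)]
    centered = [[v - m for v, m in zip(frame, mean)] for frame in frames]
    # Deterministic, non-uniform starting vector for the power iteration.
    basis = [float(i + 1) for i in range(len(mean))]
    for _ in range(iterations):
        weights = [dot(row, basis) for row in centered]
        updated = [sum(w * row[j] for w, row in zip(weights, centered))
                   for j in range(len(mean))]
        norm = math.sqrt(dot(updated, updated))
        if norm == 0.0:  # no variation left to model
            break
        basis = [u / norm for u in updated]
    coefficients = [dot(row, basis) for row in centered]
    return mean, basis, coefficients

# Hypothetical aligned frames: only one appearance parameter t varies.
frames = [[10 + t, 20, 30 - t, 40] for t in (-2, -1, 0, 1, 2)]
mean, basis, coefficients = linear_appearance_model(frames)
rebuilt = [[m + c * b for m, b in zip(mean, basis)] for c in coefficients]
```

Because the aligned frames vary along a single direction, each frame is recovered from one coefficient plus the shared mean and basis. Misaligned frames would spread the variation over many more components, which is the motivation the text gives for normalizing first.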

A preferred embodiment of the present invention details the linear decomposition of a foreground video object. The object is spatially normalized, thereby yielding a compact linear appearance model. A further preferred embodiment additionally segments the foreground object from the background of the video frame prior to spatial normalization.

A preferred embodiment of the invention applies the present invention to a video of a person speaking into a camera while undergoing a small amount of motion.

A preferred embodiment of the invention applies the present invention to any object in a video that can be represented well through spatial transformations.

A preferred embodiment of the invention uses block-based motion estimation to determine finite differences between two or more frames of video. A higher-order motion model is factored from the finite differences in order to provide a more effective linear decomposition.
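
Block-based motion estimation can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block in the current frame and candidate blocks in the previous frame. The function names, the SAD cost, the search radius, and the tiny synthetic frames are assumptions for illustration; the source does not specify the matching criterion.

```python
def block_sad(a, b, ay, ax, by, bx, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
               for dy in range(size) for dx in range(size))

def best_motion_vector(prev, curr, y, x, size, radius):
    """Return (dy, dx) so that the block at (y, x) in `curr` best matches
    the block at (y + dy, x + dx) in `prev`."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            if py < 0 or px < 0 or py + size > height or px + size > width:
                continue  # candidate block falls outside the frame
            cost = block_sad(curr, prev, y, x, py, px, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best[1], best[2]

def make_frame(top, left, side=8):
    """Hypothetical test frame: a 2 x 2 textured patch on a black background."""
    frame = [[0] * side for _ in range(side)]
    patch = [[9, 8], [7, 6]]
    for dy in range(2):
        for dx in range(2):
            frame[top + dy][left + dx] = patch[dy][dx]
    return frame

prev = make_frame(2, 2)
curr = make_frame(4, 3)   # the patch moved 2 rows down and 1 column right
vector = best_motion_vector(prev, curr, 4, 3, size=2, radius=3)
```

The returned vector points from the block in the current frame back to where it came from in the previous frame. A set of such per-block vectors is the kind of finite-difference data from which a higher-order motion model could then be fitted.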

Detection and Tracking

It is known in the art to detect an object in a frame and to track that object through a predetermined number of later frames. Among the algorithms and programs that can be used to perform the object detection function is Viola/Jones: P. Viola and M. Jones, "Robust Real-time Object Detection," in Proc. 2nd Int'l Workshop on Statistical and Computational Theories of Vision -- Modeling, Learning, Computing and Sampling, Vancouver, Canada, July 2001. Likewise, there are numerous algorithms and programs that can be used to track the detected object through successive frames. An example is: C. Edwards, C. Taylor, and T. Cootes, "Learning to identify and track faces in an image sequence," Proc. Int'l Conf. Auto. Face and Gesture Recognition, pages 260-265, 1998.

The result of the object detection process is a data set that specifies the general position of the center of the object in the frame and an indication of the scale (size) of the object. The result of the tracking process is a data set that provides a temporal label for the object and establishes, to a certain level of probability, that the object detected in successive frames is the same object.
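
The two data sets described above might be modelled as follows. The class names `Detection` and `Track`, their fields, and the nearest-center linking rule are hypothetical; the rule is a simple stand-in for the probabilistic trackers cited in the text, not a reproduction of them.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    frame: int       # index of the frame in which the object was found
    center: tuple    # approximate (x, y) position of the object's center
    scale: float     # indication of the object's size

@dataclass
class Track:
    label: int           # temporal label shared by all linked detections
    detections: list

def link_detections(frames, max_jump):
    """Assign a temporal label by linking each detection to the nearest
    track whose last detection is in an earlier frame and within max_jump."""
    tracks = []
    for detections in frames:
        for det in detections:
            best = None
            for track in tracks:
                last = track.detections[-1]
                if last.frame == det.frame:
                    continue  # track already extended in this frame
                dist = math.hypot(det.center[0] - last.center[0],
                                  det.center[1] - last.center[1])
                if dist <= max_jump and (best is None or dist < best[0]):
                    best = (dist, track)
            if best is None:
                tracks.append(Track(len(tracks), [det]))
            else:
                best[1].detections.append(det)
    return tracks

# Hypothetical detections of two objects over three frames.
path_a = [(10, 10), (12, 11), (14, 12)]
path_b = [(50, 50), (49, 52), (48, 54)]
frames = [[Detection(i, path_a[i], 1.0), Detection(i, path_b[i], 2.0)]
          for i in range(3)]
tracks = link_detections(frames, max_jump=5)
```

A distance threshold is the crudest possible way to decide that two detections are the same object. A practical tracker would replace it with a likelihood based on appearance and motion.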

The object detection and tracking algorithms may be applied to a single object in the frames or to two or more objects in the frames.

It is also known to track one or more features of the detected object within a group of sequential frames. If the object is a human face, for example, the features could be an eye or a nose. In one technique, a feature is represented by the intersection of "lines" that can loosely be described as a "corner." Preferably, "corners" that are both strong and spatially separated from each other are selected as features. The features may be identified through a spatial intensity field gradient analysis. Employing a hierarchical multi-resolution estimation of the optical flow allows the translational displacement of the features in successive frames to be determined. M. J. Black and Y. Yacoob, "Tracking and recognizing rigid and non-rigid facial motion using local parametric models of image motions," in Proceedings of the International Conference on Computer Vision, pages 374-381, Boston, Mass., June 1995, is an example of an algorithm that uses this technique to track features.

Once the constituent salient components of the signal have been determined, these components may be retained, and all other signal components may be diminished or removed. The process of detecting the salient component is shown in FIG. 2, where the video frame 202 is processed by one or more object detection 206 processes, so that one or more objects are identified and subsequently tracked. The retained components represent the intermediate form of the video data. This intermediate data can then be encoded using techniques that are not typically available to existing video processing methods. Because the intermediate data exists in several forms, standard video encoding techniques can also be used to encode several of these intermediate forms. For each instance, the present invention determines and then uses the most efficient encoding technique.

In one preferred embodiment, a saliency analysis process detects and classifies salient signal modes. One embodiment of this process uses a combination of specifically designed spatial filters to generate a response signal whose strength is related to the detected saliency of an object in the video frame. The classifier is applied at different spatial scales and at different positions in the video frame. The strength of the response from the classifier indicates the likelihood that a salient signal mode is present. When centered over a strongly salient object, the process classifies it with a correspondingly strong response. The detection of the salient signal mode distinguishes the present invention by enabling subsequent processing and analysis of the salient information in the video sequence.

Given the detected location of a salient signal mode in one or more video frames, the present invention analyzes the invariant features of the salient signal mode. Additionally, the invention analyzes the residual of the signal, that is, the "less salient" signal modes, for invariant features. Identification of invariant features provides a basis for reducing redundant information and for segmenting (i.e., separating) signal modes.

Feature Point Tracking

In one embodiment of the present invention, spatial positions in one or more frames are determined through gradient analysis of the spatial intensity field. These features correspond to intersections of "lines", which can be loosely described as "corners". This embodiment further selects a set of such corners that are both strong corners and spatially separated from one another; these are referred to herein as the feature points. In addition, a hierarchical multi-resolution estimate of the optical flow allows the translational displacement of the feature points over time to be determined.

In FIG. 2, the object tracking (220) process is shown drawing together the detection instances from the object detection processes (208), and a further process (222) is shown identifying correspondences between features of one or more detected objects across multiple video frames (202 and 204).

A non-limiting embodiment of feature tracking can be employed such that the features are used to qualify a more regular gradient analysis method, such as block-based motion estimation.

Another embodiment anticipates the prediction of motion estimates based on feature tracking.

Object-Based Detection and Tracking

In one non-limiting embodiment of the present invention, a robust object classifier is employed to track faces in frames of video. The classifier is based on a cascaded response to oriented edges and has been trained on faces. In this classifier, the edges are defined as a set of basic Haar features together with those features rotated by 45 degrees. The cascaded classifier is a variant of the AdaBoost algorithm. In addition, the response calculations can be optimized through the use of summed-area tables.
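The summed-area table optimization mentioned above can be sketched as follows. This is a minimal illustration, not the classifier of the invention: the function names `summed_area_table`, `box_sum`, and `haar_vertical_edge` are hypothetical, the image is a plain list of lists, and only one upright two-rectangle Haar-like feature is shown (the 45-degree rotated variants need a rotated table and are omitted).

```python
from typing import List

Image = List[List[int]]


def summed_area_table(img: Image) -> List[List[int]]:
    """Return an (h+1) x (w+1) table where sat[y][x] is the sum of all
    pixels img[r][c] with r < y and c < x."""
    h, w = len(img), len(img[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Sum of the pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def haar_vertical_edge(sat: List[List[int]], top: int, left: int, height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: right half minus left half.
    `width` is assumed to be even."""
    half = width // 2
    left_sum = box_sum(sat, top, left, height, half)
    right_sum = box_sum(sat, top, left + half, height, half)
    return right_sum - left_sum
```

Once the table is built, every rectangle sum costs four lookups regardless of rectangle size, which is why evaluating many features at many scales and positions becomes cheap.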

Local Registration

Registration involves assigning correspondences between elements of identified objects in two or more video frames. These correspondences become the basis for modeling the spatial relationships between video data at temporally distinct points in the video data.

Various non-limiting means of registration are described for the present invention in order to illustrate specific embodiments, and their associated reductions to practice, in terms of well-known algorithms and derivatives of those algorithms.

One means of modeling the apparent optical flow in a spatio-temporal sequence is to generate a finite difference field from two or more frames of the video data. The optical flow field can be sparsely estimated if the correspondences conform to certain constancy constraints in both the spatial and the intensity sense.

As shown in FIG. 3, a frame (302 or 304) is spatially sub-sampled, for example through a decimation process (306) or some other sub-sampling process (e.g., a low-pass filter). These spatially reduced images (310 and 312) can be sub-sampled further as well.
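Repeated spatial sub-sampling of this kind yields a multi-resolution pyramid. The sketch below is one simple way to build it, assuming 2x2 block averaging as the combined low-pass and decimation step; the patent does not specify the filter, and the names `decimate` and `build_pyramid` are illustrative only.

```python
def decimate(img):
    """Halve the resolution by averaging each non-overlapping 2x2 block.
    A trailing odd row or column is dropped."""
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def build_pyramid(img, levels):
    """Return [full resolution, half, quarter, ...] with at most `levels` entries.
    Stops early once an image is too small to decimate again."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break
        pyramid.append(decimate(top))
    return pyramid
```

A coarse-to-fine motion search would typically start at the last (smallest) level and refine the estimate at each finer level.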

Diamond Search

Given a non-overlapping partitioning of a video frame into blocks, the previous frame of video is searched for a match to each block. Full search block-based (FSBB) motion estimation finds the position in the previous frame that has the lowest error when compared with a block in the current frame. Performing FSBB is computationally very expensive, and it often does not yield a better match than other motion estimation schemes that assume localized motion. Diamond search block-based (DSBB) gradient descent motion estimation is a common alternative to FSBB that uses diamond-shaped search patterns of various sizes to iteratively traverse the error gradient toward the best match for a block.
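The contrast between the two searches can be sketched as follows. This is an illustrative implementation under stated assumptions, not the method of the invention: the error measure is the sum of absolute differences (SAD), the large and small diamond patterns are the commonly used 9-point and 5-point patterns, and all function and constant names are hypothetical.

```python
def sad(cur, prev, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at (bx, by)
    and the block of `prev` displaced by (dx, dy). Out-of-frame candidates
    get an infinite error so they are never selected."""
    h, w = len(prev), len(prev[0])
    if not (0 <= bx + dx and bx + dx + n <= w and 0 <= by + dy and by + dy + n <= h):
        return float("inf")
    return sum(abs(cur[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search(cur, prev, bx, by, n, radius):
    """FSBB: test every displacement in a (2*radius+1)^2 window."""
    candidates = [(dx, dy)
                  for dy in range(-radius, radius + 1)
                  for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda d: sad(cur, prev, bx, by, d[0], d[1], n))


# Offsets relative to the current search center; (0, 0) is listed first so
# that ties keep the center and the search terminates.
LARGE_DIAMOND = [(0, 0), (0, -2), (0, 2), (-2, 0), (2, 0),
                 (-1, -1), (1, -1), (-1, 1), (1, 1)]
SMALL_DIAMOND = [(0, 0), (0, -1), (0, 1), (-1, 0), (1, 0)]


def diamond_search(cur, prev, bx, by, n, max_steps=16):
    """DSBB: walk the large diamond downhill until its center is the best
    point, then refine once with the small diamond."""
    cx, cy = 0, 0
    for _ in range(max_steps):
        best = min(LARGE_DIAMOND,
                   key=lambda o: sad(cur, prev, bx, by, cx + o[0], cy + o[1], n))
        if best == (0, 0):
            break
        cx, cy = cx + best[0], cy + best[1]
    best = min(SMALL_DIAMOND,
               key=lambda o: sad(cur, prev, bx, by, cx + o[0], cy + o[1], n))
    return cx + best[0], cy + best[1]
```

The full search evaluates every candidate in its window, whereas the diamond search evaluates at most nine candidates per step; the price is that it can settle in a local minimum of the error surface when the motion is not localized.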

In one embodiment of the present invention, DSBB is employed in the analysis of the image gradient field between one or more frames of video in order to generate finite differences whose values are later factored into higher-order motion models.

Those skilled in the art will appreciate that block-based motion estimation can be viewed as equivalent to an analysis of the vertices of a regular mesh.

Mesh-Based Motion Estimation

Mesh-based prediction uses a geometric mesh of vertices connected by edges to delineate discrete regions of the video frame, and then predicts the deformation and movement of those regions in subsequent frames through deformation models controlled by the positions of the mesh vertices. To predict the current frame, as the vertices are moved, the pixels within the regions defined by the vertices are moved as well. The relative movement, and the resulting approximation of the original pixel values, is computed through some interpolation method that relates each pixel position to the positions of the vertices in its neighborhood. Modeling scaling and rotation in addition to pure translation can produce a more accurate prediction of the frame's pixels when such motions are present in the video signal.

In general, mesh models can be defined as either regular or adaptive. Regular mesh models are laid out without considering the underlying signal characteristics, whereas adaptive methods attempt to arrange vertices and edges spatially in relation to features of the underlying video signal.

Provided that the imaged objects in the video have spatial discontinuities that correspond well to the edges of the mesh, a regular mesh representation provides a means by which the motion, or equivalently the deformation inherent in the motion, can be predicted or modeled.

Adaptive meshes are formed with substantially greater consideration for the features of the underlying video signal than regular meshes. Additionally, the adaptive nature of such a mesh allows for various refinements of the mesh over time.

The present invention adjusts the vertex search order using homogeneity criteria in order to perform mesh registration, and equivalently pixel registration. Vertices that are spatially associated with heterogeneous intensity gradients are motion-estimated before those associated with more homogeneous gradients.

In a preferred embodiment, the vertex motion estimation of the mesh is additionally prioritized through a spatial flood-filling of motion estimation for vertices of equal or nearly equal homogeneity.

In a preferred embodiment, the original mesh spatial configuration and the final mesh configuration are mapped to each other at the facet level by filling a mapping image with facet identifiers using standard graphical filling routines. The affine transformation associated with each triangle can then be looked up quickly in a transformation table, and the pixel positions associated with a facet in one mesh can be quickly transformed into positions in the other mesh.
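The facet map and transformation table described above might look like the following sketch. It rests on several assumptions: each facet is a triangle given as three (x, y) vertices, the "standard graphical filling routine" is replaced by a brute-force point-in-triangle test for brevity, and the names `affine_from_triangles`, `fill_facet_map`, and `map_pixel` are hypothetical.

```python
def affine_from_triangles(src, dst):
    """Return ((a, b, c), (d, e, f)) such that x' = a*x + b*y + c and
    y' = d*x + e*y + f maps each vertex of `src` onto the matching vertex of `dst`."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate source triangle")
    rows = []
    for k in (0, 1):  # k = 0 solves for x', k = 1 solves for y'
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        b = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
        c = u0 - a * x0 - b * y0
        rows.append((a, b, c))
    return rows[0], rows[1]


def inside(tri, x, y):
    """True if (x, y) lies in the triangle or on its boundary."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    d0 = (x1 - x0) * (y - y0) - (y1 - y0) * (x - x0)
    d1 = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
    d2 = (x0 - x2) * (y - y2) - (y0 - y2) * (x - x2)
    return (d0 >= 0 and d1 >= 0 and d2 >= 0) or (d0 <= 0 and d1 <= 0 and d2 <= 0)


def fill_facet_map(triangles, width, height):
    """Mapping image: each pixel holds the index of the facet covering it, or -1.
    Pixels on a shared edge go to the lower-numbered facet."""
    facet_map = [[-1] * width for _ in range(height)]
    for fid, tri in enumerate(triangles):
        for y in range(height):
            for x in range(width):
                if facet_map[y][x] < 0 and inside(tri, x, y):
                    facet_map[y][x] = fid
    return facet_map


def map_pixel(facet_map, table, x, y):
    """Look up the facet of (x, y) and apply that facet's affine transform."""
    fid = facet_map[y][x]
    if fid < 0:
        return None
    (a, b, c), (d, e, f) = table[fid]
    return (a * x + b * y + c, d * x + e * y + f)
```

The table would be built once per frame pair, for example `table = [affine_from_triangles(s, d) for s, d in zip(src_tris, dst_tris)]`, after which each pixel is mapped with one lookup and one affine evaluation.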

In a preferred embodiment, a preliminary motion estimation is made for the vertices in order to assess the residual error associated with each motion estimation match. This preliminary estimate is additionally used to prioritize the motion estimation order of the vertices. The advantage of such a residual error analysis is that motion estimates associated with less distortion will tend to maintain a more plausible mesh topology.

In a preferred embodiment, mesh vertex motion estimates are scaled down to some limited range, and multiple motion estimations are made over several iterations so that the mesh can approach a more globally optimal and topologically correct solution.

In a preferred embodiment, block-based motion estimates that use a rectangular tile neighborhood centered on each vertex are used to determine the vertex displacement, taking the interpolated polygon neighborhood into account. In addition to avoiding the spatial interpolation and warping of pixels that error gradient descent would require, this technique also allows the motion estimates to be computed in parallel.

Phase-Based Motion Estimation

In the prior art, block-based motion estimation is typically implemented as a spatial search that results in one or more spatial matches. Phase-based normalized cross correlation (PNCC), as illustrated in FIG. 3, transforms a block from the current frame and a block from the previous frame into "phase space" and finds the cross correlation of the two blocks. The cross correlation is represented as a field of values whose positions correspond to the "phase shifts" of edges between the two blocks. These positions are isolated through thresholding and then transformed back into spatial coordinates. The spatial coordinates are distinct edge displacements and correspond to motion vectors.
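A closely related and widely known technique, phase correlation, illustrates the idea. The sketch below is offered under clear assumptions: it uses a naive discrete Fourier transform (adequate only for small blocks), assumes a circular shift between the blocks, reports only the single strongest peak rather than all peaks above a threshold, and gives integer rather than sub-pixel displacements. It should not be read as the PNCC procedure of the patent, and all names are hypothetical.

```python
import cmath


def dft2(block, inverse=False):
    """Naive 2-D discrete Fourier transform of a small block."""
    n, m = len(block), len(block[0])
    sign = 1 if inverse else -1
    out = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            s = 0j
            for y in range(n):
                for x in range(m):
                    angle = 2 * cmath.pi * (u * y / n + v * x / m)
                    s += block[y][x] * cmath.exp(sign * 1j * angle)
            out[u][v] = s / (n * m) if inverse else s
    return out


def phase_correlation(cur, prev, eps=1e-9):
    """Correlation surface whose peak position is the shift of `cur` relative to `prev`."""
    f_cur, f_prev = dft2(cur), dft2(prev)
    n, m = len(cur), len(cur[0])
    cross = [[0j] * m for _ in range(n)]
    for u in range(n):
        for v in range(m):
            p = f_cur[u][v] * f_prev[u][v].conjugate()
            mag = abs(p)
            # Keep only the phase; discarding magnitude is what makes the
            # result insensitive to a uniform gain change.
            cross[u][v] = p / mag if mag > eps else 0j
    surface = dft2(cross, inverse=True)
    return [[c.real for c in row] for row in surface]


def peak_shift(surface):
    """Return (dx, dy) of the strongest peak, unwrapping shifts past half the block size."""
    n, m = len(surface), len(surface[0])
    y, x = max(((r, c) for r in range(n) for c in range(m)),
               key=lambda p: surface[p[0]][p[1]])
    dy = y - n if y > n // 2 else y
    dx = x - m if x > m // 2 else x
    return dx, dy
```

Because only the phase of the cross spectrum is kept, multiplying one block by a constant gain leaves the peak position unchanged, which matches the gain and exposure tolerance attributed to PNCC in this section.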

The advantages of PNCC include contrast masking, which allows gain and exposure adjustments in the video stream to be tolerated. PNCC also produces, in a single step, results that might take many iterations with a spatially based motion estimator. Furthermore, the motion estimates are accurate to sub-pixel precision.

One embodiment of the present invention uses PNCC in the analysis of the image gradient field between one or more frames of video in order to generate finite differences whose values are later factored into higher-order motion models.

Global Registration

In one embodiment, the present invention generates a correspondence model by using the relationships between corresponding elements of a detected object in two or more video frames. These relationships are analyzed by factoring one or more linear models from a field of finite difference estimates. The term "field" refers to each finite difference having a spatial position. These finite differences may be the translational displacements of corresponding object features in disparate frames of video, as described in the detection and tracking sections. The field from which such sampling occurs is referred to herein as the general population of finite differences. The described method employs a robust estimation similar to that of the RANSAC algorithm described in M. A. Fischler and R. C. Bolles, "Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography," Comm. of the ACM, Vol. 24, pp. 381-395, 1981.

As shown in FIG. 4, in the case of global motion modeling the finite differences are translational motion estimates (402) that are collected into a general population pool (404). The pool (404) is processed iteratively by random sampling of these motion estimates (410), and a linear model is factored out of the samples (420). The results are then used to adjust the population (404) so as to better clarify the linear model by excluding the outliers to the model that the random process has found.

The present invention can use one or more robust estimators, one of which may be the RANSAC robust estimation process. Robust estimators are well documented in the prior art.

In one embodiment of the linear model estimation algorithm, the motion model estimator is based on a linear least squares solution. Because of this dependency, the estimator is thrown off by outlier data. Based on RANSAC, the disclosed method is a robust way of countering the effect of outliers through iterative estimation on subsets of the data, probing for a motion model that describes a significant subset of the data. The model generated by each probe is tested for the percentage of the data that it represents. Given a sufficient number of iterations, a model will be found that fits the largest subset of the data. A description of how to perform such robust linear least squares regression is given in R. Dutter and P. J. Huber, "Numerical methods for the nonlinear robust regression problem," Journal of Statistical Computation and Simulation, 13:79-113, 1981.

As conceived and illustrated in FIG. 4, the present invention discloses innovations beyond the RANSAC algorithm in the form of alterations to the algorithm that involve the initial sampling of finite differences (samples) and the least squares estimation of a linear model. The synthesis error is assessed for all samples in the general population using the solved linear model. A rank is assigned to the linear model based on the number of samples whose residual conforms to a preset threshold. This rank is considered the "candidate consensus".

The initial sampling, solving, and ranking are performed iteratively until the termination criteria are satisfied. Once the criteria are satisfied, the linear model with the highest rank is considered the final consensus of the population.
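The sample, solve, and rank loop can be sketched in a few lines. To keep the example short it makes simplifying assumptions that are not taken from the patent: the field is one-dimensional, each sample is a pair (position x, displacement d), the linear model is d = a*x + b, and the termination criterion is either an iteration limit or a rank reaching a chosen fraction of the population. The names `fit_line`, `rank`, and `consensus_model` are hypothetical.

```python
import random


def fit_line(samples):
    """Least squares fit of d = a*x + b to (x, d) samples.
    Returns None when the positions do not determine a line."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sd = sum(d for _, d in samples)
    sxx = sum(x * x for x, _ in samples)
    sxd = sum(x * d for x, d in samples)
    denom = n * sxx - sx * sx
    if denom == 0:
        return None
    a = (n * sxd - sx * sd) / denom
    b = (sd - a * sx) / n
    return a, b


def rank(model, population, threshold):
    """Candidate consensus: number of samples whose residual is within the threshold."""
    a, b = model
    return sum(1 for x, d in population if abs(d - (a * x + b)) <= threshold)


def consensus_model(population, threshold, sample_size=2,
                    max_iters=200, accept_fraction=0.8, seed=0):
    """Repeat sampling, solving, and ranking; return (best model, its rank)."""
    rng = random.Random(seed)
    best_model, best_rank = None, -1
    for _ in range(max_iters):
        model = fit_line(rng.sample(population, sample_size))
        if model is None:
            continue
        r = rank(model, population, threshold)
        if r > best_rank:
            best_model, best_rank = model, r
        if best_rank >= accept_fraction * len(population):
            break  # termination criterion met
    return best_model, best_rank
```

A two-dimensional motion field would use the same loop with a model such as an affine transform in place of the line, and a correspondingly larger minimum sample size.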

An optional refinement step involves iteratively analyzing subsets of samples in order of best fit to the candidate model, and increasing the subset size until adding one more sample would exceed a residual error threshold for the whole subset.
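One possible reading of this refinement step is sketched below. The patent does not say how the residual error of the whole subset is measured, so the root-mean-square residual against the fixed candidate model is used here purely as an assumption; samples are (position x, displacement d) pairs and the model is d = a*x + b, as in a one-dimensional motion field. The names `residual` and `refine_subset` are hypothetical.

```python
import math


def residual(model, sample):
    """Absolute residual of one (x, d) sample against the model d = a*x + b."""
    a, b = model
    x, d = sample
    return abs(d - (a * x + b))


def refine_subset(model, population, max_rms):
    """Grow a subset in order of best fit to `model`, stopping just before
    the sample whose addition would push the subset's RMS residual above max_rms."""
    ordered = sorted(population, key=lambda s: residual(model, s))
    subset, sq_sum = [], 0.0
    for sample in ordered:
        r = residual(model, sample)
        trial_rms = math.sqrt((sq_sum + r * r) / (len(subset) + 1))
        if trial_rms > max_rms:
            break
        subset.append(sample)
        sq_sum += r * r
    return subset
```

The returned subset could then be used to re-estimate the model by least squares, which is one way a consensus model might be tightened after the ranking stage.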

As shown in FIG. 4, the global model estimation process (450) is iterated until the consensus rank acceptability test is satisfied (452). When the rank has not been achieved, the population of finite differences (404) is sorted relative to the discovered model in an effort to reveal the linear model. The best (highest-rank) motion model is added to a solution set in process (460). The model is then re-estimated in process (470). Upon completion, the population (404) is re-sorted.

The described non-limiting embodiments of the invention can be further generalized as a general method of sampling a vector space, described above as a field of finite difference vectors, in order to determine subspace manifolds in another parameter vector space that correspond to a particular linear model.

A further result of the global registration process is that the difference between it and the local block-based process yields a local registration residual. This residual is the error of the global model in approximating the local model.

Normalization

Normalization refers to the resampling of spatial intensity fields toward a standard, or common, spatial configuration. When the relative spatial configurations are related by invertible spatial transformations, the resampling and the accompanying interpolation of pixels are also invertible, up to a topological limit. The normalization method of the present invention is illustrated in FIG. 5.

When two or more spatial intensity fields are normalized, increased computational efficiency may be achieved by preserving intermediate normalization calculations.

The spatial transformation models used to resample images for registration, or equivalently for normalization, include global and local models. Global models are of increasing order, from translational to projective. Local models are finite differences that imply an interpolant on a neighborhood of pixels, determined in the basic case by a block or, in the more complex case, by a piece-wise linear mesh.

Interpolating the original intensity fields to normalized intensity fields increases the linearity of PCA appearance models based on subsets of the intensity field.

As shown in FIG. 2, the object pixels (232 and 234) can be resampled (240) to yield a normalized version of the object pixels (242 and 244).

Mesh-Based Normalization

A further embodiment of the present invention tessellates the feature points into a triangle-based mesh. The vertices of the mesh are tracked, and the relative positions of each triangle's vertices are used to estimate the three-dimensional surface normal of the plane coincident with those three vertices. When the surface normal coincides with the projective axis of the camera, the imaged pixels can provide the least distorted rendering of the object corresponding to the triangle. Creating a normalized image that tends to favor the orthogonal surface normal can produce a pixel-preserving intermediate data type that increases the linearity of subsequent appearance-based PCA models.

Another embodiment uses conventional block-based motion estimation to implicitly model a global motion model. In one non-limiting embodiment, the method factors a global affine motion model from the motion vectors described by conventional block-based motion estimation/prediction.
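Factoring a global affine model from block motion vectors can be sketched as an ordinary least squares fit. The example assumes each block contributes one sample (x, y, dx, dy), where (x, y) is the block center and (dx, dy) its motion vector, and it solves the normal equations directly without any outlier rejection. The names `solve3` and `fit_affine_motion` are hypothetical.

```python
def solve3(a_matrix, rhs):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    m = [row[:] + [b] for row, b in zip(a_matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: block centers are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [mr - factor * mc for mr, mc in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def fit_affine_motion(vectors):
    """Least squares fit of dx = a1*x + a2*y + a3 and dy = b1*x + b2*y + b3
    to samples (x, y, dx, dy). Returns ((a1, a2, a3), (b1, b2, b3))."""
    ptp = [[0.0] * 3 for _ in range(3)]   # P^T P, where each row of P is [x, y, 1]
    ptdx = [0.0] * 3                       # P^T dx
    ptdy = [0.0] * 3                       # P^T dy
    for x, y, dx, dy in vectors:
        row = (x, y, 1.0)
        for i in range(3):
            ptdx[i] += row[i] * dx
            ptdy[i] += row[i] * dy
            for j in range(3):
                ptp[i][j] += row[i] * row[j]
    return tuple(solve3(ptp, ptdx)), tuple(solve3(ptp, ptdy))
```

In practice this plain fit is sensitive to blocks that follow a different motion, which is where a robust sampling scheme of the kind described under Global Registration would be combined with it.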

The method of the present invention uses one or more global motion estimation techniques, including the linear solution to a set of affine projective equations. Other projective models and solution methods are described in the prior art.

FIG. 9 illustrates the method of combining global and local normalization.

Progressive Geometric Normalization

Classifications of spatial discontinuities are used to align the tessellated mesh so that discontinuities are modeled implicitly wherever they coincide with mesh edges.

The boundaries of homogeneous regions are approximated by polygon contours. Each contour is approximated repeatedly at successively lower precision in order to determine the saliency priority of each polygon vertex. Vertex priority is propagated across regions in order to preserve the priority of shared vertices.

In one embodiment of the present invention, a polygon decomposition method allows the prioritization of boundaries associated with a homogeneous classification of a field. Pixels are classified according to some homogeneity criterion, such as spectral similarity, and the classification labels are then spatially connected into regions. In a further preferred non-limiting embodiment, 4- or 8-connectedness criteria are applied to determine spatial connectedness.
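Connecting classification labels into regions under a 4- or 8-connectedness criterion can be sketched with a breadth-first flood fill. The input is assumed to be a grid of per-pixel class labels that some earlier classifier has produced; the function name `label_regions` is hypothetical.

```python
from collections import deque


def label_regions(classes, connectivity=4):
    """Group equal-class pixels into connected regions.
    Returns (labels, region_count), where labels[y][x] is a region index."""
    if connectivity == 4:
        offsets = [(0, 1), (1, 0), (0, -1), (-1, 0)]
    elif connectivity == 8:
        offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                   if (dy, dx) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    h, w = len(classes), len(classes[0])
    labels = [[-1] * w for _ in range(h)]
    region_count = 0
    for y in range(h):
        for x in range(w):
            if labels[y][x] != -1:
                continue
            # Start a new region and flood-fill every reachable pixel of the same class.
            labels[y][x] = region_count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and classes[ny][nx] == classes[y][x]):
                        labels[ny][nx] = region_count
                        queue.append((ny, nx))
            region_count += 1
    return labels, region_count
```

The choice of criterion matters for thin diagonal structures: on a 2x2 checkerboard of two classes, 4-connectedness produces four single-pixel regions while 8-connectedness joins the diagonal neighbors into two regions.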

In a preferred embodiment, the boundaries of these spatial regions are then discretized into polygons. The spatial overlay of all the polygons for all the homogeneous regions is then tessellated and joined together into a preliminary mesh. The vertices of this mesh are decomposed using several criteria to reveal simpler mesh representations that retain much of the perceptual saliency of the original mesh.

In a preferred embodiment, the image registration method disclosed elsewhere in this specification is biased toward these high-priority vertices, which have strong image gradients. The resulting deformation models tend to preserve the spatial discontinuities associated with the geometry of the imaged object.

In a preferred embodiment, active contours are used to refine region boundaries. The active contour for each polygon region is allowed to propagate for one iteration. The "deformation", or motion, of each active contour vertex in the different regions is combined in an averaging operation to allow a constrained propagation of the implied mesh of which they are all members.

In a preferred embodiment, each vertex is assigned a count of the adjacent vertices it has in the mesh that are also part of the contour of a different region. These other vertices are defined as being in opposition. A vertex with a count of 1 has no opposing vertex and therefore needs to be preserved. If two adjacent opposing vertices each have a count of 1 (meaning that the two vertices are in different polygons and are adjacent to each other), then one vertex can be resolved to the other. When a vertex with a count of 1 opposes a neighboring polygon vertex with a count of 2, the vertex with a count of 1 is resolved to the vertex with a count of 2, and that vertex's count becomes 1. If another neighboring opposing vertex is present, this vertex can therefore be resolved again. For this case it is important to save the original vertex count, so that when a vertex is resolved the direction of resolution can be biased according to the original count. In this way, once vertex a has been resolved to vertex b, vertex b will not be resolved to vertex c; instead, vertex c should be resolved to vertex b, because b has already been used in one resolution.

In a preferred embodiment, T-junction points are processed specially. These are points in one polygon that have no corresponding point in the adjacent polygon. In this case, each polygon vertex is first plotted on an image point map, which identifies the spatial position of the vertex and its polygon identifier. Each polygon perimeter is then traversed and tested to see whether there are any adjacent vertices from another polygon. If there are neighboring vertices from another region, each of them is tested to see whether it already has a neighboring vertex from the current polygon. If it does not, the current point is added as a vertex of the current polygon. This extra test ensures that isolated vertices in another polygon are used to generate the T-junction points. Without it, new vertices would simply be added where the region already had a matching vertex. An opposing vertex is therefore added only if the neighboring vertex is not already opposed by the current region. In a further embodiment, the efficiency of detecting T-junctions is increased by using a mask image. The polygon vertices are visited sequentially, and the mask is updated so that the pixels of the vertices are identified as belonging to a polygon vertex. The polygon perimeter pixels are then traversed, and any that coincide with a polygon vertex are recorded as vertices of the current polygon.

In a preferred embodiment, when a spectral region has been remapped by one or more overlapping homogeneous image gradient regions, and another homogeneous spectral region also overlaps, all of the regions that were remapped previously are given the same label as the regions currently being remapped. In essence, if a spectral region is overlapped by two homogeneous regions, then all of the spectral regions overlapped by those two homogeneous regions receive the same label, so that the spectral region is effectively covered by one homogeneous region instead of two.

In one embodiment of the invention, it is advantageous to process region maps rather than region lists when searching for adjacent regions to merge. In a further embodiment, the spectral segmentation classifier can be modified so that it is trained using non-homogeneous regions. This allows the processing to focus on the edges of the spectral regions. Additionally, adding a different segmentation based on edges (for example, from a Canny edge detector) and then feeding it to an active contour to identify the initial set of polygons allows greater discrimination of homogeneous regions.

Local Normalization

The present invention provides a means by which pixels in the spatiotemporal stream can be registered in a "local" manner.

One such localized method applies a geometric mesh spatially to provide a means of analyzing the pixels, so that localized coherency in the imaged phenomena is accounted for when resolving the apparent image brightness constancy ambiguities in relation to the local deformation of the imaged phenomena, or specifically of an imaged object.

Such a mesh is employed to provide a piece-wise linear model of surface deformation in the image plane as a means of local normalization. The imaged phenomena often correspond to such a model when the temporal resolution of the video stream is high compared with the motion in the video. Exceptions to the model assumptions are handled through a variety of techniques, including topological constraints, neighbor vertex restrictions, and analysis of the homogeneity of pixel and image gradient regions.

In one embodiment, feature points are used to generate a mesh made up of triangular elements whose vertices correspond to the feature points. The corresponding feature points in other frames imply an interpolated "warping" of the triangles, and correspondingly of the pixels, which yields a local deformation model.

FIG. 7 illustrates the generation of such an object mesh. FIG. 8 illustrates the use of such an object mesh to locally normalize frames.

In one preferred embodiment, a triangle map is generated that identifies the triangle from which each pixel of the map comes. Further, the affine transform corresponding to each triangle is pre-computed as an optimization. Additionally, when creating the local deformation model, the anchor image is traversed using the spatial coordinates to determine the coordinates of the source pixel to sample. This sampled pixel replaces the current pixel location.

In another embodiment, local deformation is performed after global deformation. In a previously disclosed specification, global normalization was described as the process by which a global registration method is used to spatially normalize pixels in two or more frames of video. The resulting globally normalized video frames can be further normalized locally. Combining the two methods constrains the local normalization to a refinement of the globally derived solution. This greatly reduces the ambiguity that the local method is required to resolve.

In another non-limiting embodiment, feature points, or vertex points in the case of a "regular mesh", are qualified through analysis of the image gradient in the neighborhood of those points. This image gradient can be calculated directly or through an indirect calculation such as a Harris response. Additionally, these points can be filtered by a spatial constraint and by the motion estimation error associated with a descent of the image gradient. The qualified points can be used as the basis for a mesh by any of many tessellation techniques, resulting in a mesh whose elements are triangles. For each triangle, an affine model is generated based on the points and their residual motion vectors.

The method of the present invention uses one or more image intensity gradient analysis methods, including the Harris response. Other image intensity gradient analysis methods are described in the prior art.

In a preferred embodiment, a list of the triangle affine parameters is maintained. The list is iterated over, and a current/previous point list is constructed (using a vertex lookup map). The current/previous point list is passed to a routine that estimates a transform, which computes the affine parameters for that triangle. The affine parameters, or model, are then saved in the triangle affine parameter list.
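The per-triangle estimation step can be sketched as follows. This is a minimal illustration, assuming exactly three vertex correspondences per triangle, in which case the affine parameters follow from a closed-form solve. The names `estimate_affine` and `apply_affine` are hypothetical; the source does not specify how its estimation routine is implemented.

```python
def estimate_affine(src, dst):
    """Return (a, b, c, d, e, f) such that
    x' = a*x + b*y + c and y' = d*x + e*y + f
    maps the three src vertices onto the three dst vertices."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if det == 0:
        raise ValueError("degenerate source triangle")

    def solve_row(v0, v1, v2):
        # Solve one output coordinate by Cramer's rule.
        a = ((v1 - v0) * (y2 - y0) - (v2 - v0) * (y1 - y0)) / det
        b = ((x1 - x0) * (v2 - v0) - (x2 - x0) * (v1 - v0)) / det
        c = v0 - a * x0 - b * y0
        return a, b, c

    (p0, q0), (p1, q1), (p2, q2) = dst
    return solve_row(p0, p1, p2) + solve_row(q0, q1, q2)


def apply_affine(params, x, y):
    """Map one point through the affine parameters."""
    a, b, c, d, e, f = params
    return a * x + b * y + c, d * x + e * y + f


# Hypothetical usage: previous-frame vertices and their current-frame positions.
previous = [(0, 0), (16, 0), (0, 16)]
current = [(2, -1), (18, -1), (2, 15)]
triangle_affines = [estimate_affine(previous, current)]
```

With more than three correspondences, a least-squares or robust estimator would be needed instead of the exact solve shown here.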

In a further embodiment, the method traverses a triangle identifier image map, in which each pixel contains the identifier of the mesh triangle to which that pixel belongs. For each pixel that belongs to a triangle, the corresponding global deformation coordinates and local deformation coordinates are calculated. Those coordinates are in turn used to sample the corresponding pixel and to apply its value at the corresponding "normalized" position.
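The traversal of the triangle identifier map can be illustrated with a short sketch. It assumes that frames are lists of rows, that a negative identifier marks pixels outside the mesh, that each triangle's affine is stored as a tuple `(a, b, c, d, e, f)`, and that nearest-neighbor sampling is acceptable. All of these are illustrative choices, and only the local (per-triangle) transform is applied here.

```python
def normalize_frame(frame, tri_map, affines, fill=0):
    """Build a normalized frame by sampling `frame` through per-triangle affines.

    frame   : 2-D list of pixel values (rows of equal length)
    tri_map : 2-D list of triangle identifiers, negative meaning "no triangle"
    affines : list of (a, b, c, d, e, f) tuples indexed by triangle identifier
    """
    height, width = len(frame), len(frame[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            tri = tri_map[y][x]
            if tri < 0:
                continue  # pixel is not covered by the mesh
            a, b, c, d, e, f = affines[tri]
            # Coordinates of the source pixel to sample (nearest neighbor).
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < width and 0 <= sy < height:
                out[y][x] = frame[sy][sx]
    return out
```

A fuller version would compose the global deformation with the local one before sampling and would likely interpolate between source pixels.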

In a further embodiment, spatial constraints are applied to the points based on density and on the image intensity correspondence strength resulting from the search of the image gradient. After motion estimation, the points are sorted by some norm of the image intensity residual. The points are then filtered based on a spatial density constraint.
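One plausible reading of the sort-then-filter step is a greedy pass, sketched below. The residual norm, the minimum spacing `min_dist`, and the function name `filter_by_density` are assumptions made for illustration; the source names neither the norm nor the density threshold.

```python
def filter_by_density(points, min_dist):
    """Keep the best points subject to a minimum spacing.

    points   : list of (residual, x, y); a lower residual means a stronger match
    min_dist : minimum allowed distance between any two kept points
    """
    kept = []
    for residual, x, y in sorted(points):  # best (lowest residual) first
        too_close = any(
            (x - kx) ** 2 + (y - ky) ** 2 < min_dist ** 2
            for _, kx, ky in kept
        )
        if not too_close:
            kept.append((residual, x, y))
    return kept
```

Because stronger points are visited first, a weak point is dropped whenever it crowds a stronger one, never the other way around.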

In a further embodiment, spectral spatial segmentation is employed, and small homogeneous spectral regions are merged with neighboring regions based on spatial proximity and on the similarity of their intensity and/or color. Homogeneous merging is then used to combine spectral regions based on their overlap with a region of homogeneous texture (image gradient). A further embodiment then uses center-surround points, where a small region is surrounded by a larger region, as qualified interest points for supporting a vertex of the mesh. In a further non-limiting embodiment, a center-surround point is defined as a region whose bounding box is within one pixel of being 3×3, 5×5, or 7×7 pixels in dimension, and for which the spatial image gradient over that bounding box has a corner shape. The center of the region can be classified as a corner, which further qualifies that position as an advantageous vertex position.

In a further embodiment, the horizontal and vertical pixel finite difference images are used to classify the strength of each mesh edge. If an edge has many finite differences coincident with its spatial location, then the edge, and hence the vertices of that edge, are considered highly critical to the local deformation of the imaged phenomena. If there is a large derivative difference between the averages of the sums of the finite differences of the edge, then the region edge most likely corresponds to a texture change edge rather than a quantization step.

In a further embodiment, a spatial density model termination condition is employed to optimize the processing of the mesh vertices. Processing can be terminated once a sufficient number of points has been examined to cover most of the spatial area of the outset of the detection rectangle. The termination generates a score, and the vertex and feature points entering the processing are sorted by this score. If a point is spatially too close to an existing point, or if it does not correspond to an edge in the image gradient, it is discarded. Otherwise, the image gradient in the neighborhood of the point is descended, and if the residual of the gradient exceeds a limit, that point is also discarded.

In a preferred embodiment, the local deformation modeling is performed iteratively, converging on a solution as the vertex displacements per iteration diminish.

In another embodiment, local deformation modeling is performed, and the model parameters are discarded if the global deformation has already provided the same normalization benefit.

Regular Mesh Normalization

The present invention extends the aforementioned local normalization method by using a regular mesh. This mesh is constructed without regard to the underlying pixels, yet it is positioned and sized to correspond to a detected object.

Given a detected object region, a spatial frame position and a scale indicating the size of the surface are used to generate a regular mesh over an outset of the surface region. In a preferred embodiment, a non-overlapping set of tiles is used to delineate a rectangular mesh, and the tiles are then partitioned diagonally to yield a regular mesh with triangular elements. In a further preferred embodiment, the tiles are proportional to those used in conventional video compression algorithms (e.g., MPEG-4 AVC).

In a preferred embodiment, the vertices associated with the aforementioned mesh are prioritized through analysis of the pixel regions surrounding these vertices in specific frames of the video used for training. Analyzing the gradient of such regions provides a confidence measure for any processing associated with each vertex that relies on the local image gradient (such as block-based motion estimation).

Correspondences of vertex positions across multiple frames are found through a simple descent of the image gradient. In a preferred embodiment, this is achieved through block-based motion estimation. In the present embodiment, high-confidence vertices allow high-confidence correspondences. Lower-confidence vertex correspondences are arrived at implicitly, by resolving ambiguous image gradients through inference from the higher-confidence vertex correspondences.

In one preferred embodiment, a regular mesh is made over the outset tracking rectangle. Tiles are created at 16×16 and cut diagonally to form a triangular mesh. The vertices of these triangles are motion estimated. The motion estimation depends on the type of texture at each point. The texture is divided into three classes (corner, edge, and homogeneous), which also define the order in which the vertices are processed. A corner vertex uses the neighboring vertex estimates: the motion estimates of the neighboring points (if available) are used as predictive motion vectors, and motion estimation is applied to each one. The motion vector that gives the lowest MAD (mean absolute difference) error is used as the motion vector for this vertex. The search strategy used for corners is all of them (wide, small, and origin). For edges, the nearest neighboring motion vectors are again used as predictive motion vectors, and the one with the least error is used. The search strategy for edges is small and origin. For homogeneous regions, the neighboring vertices are searched and the motion estimate with the lowest error is used.
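The tiling and diagonal cut can be sketched as below. This is a minimal sketch, assuming the tracking rectangle is given by its top-left corner and size, that partial tiles at the right and bottom edges are dropped, and that every tile is cut along the same diagonal. The function name `regular_triangle_mesh` is hypothetical.

```python
def regular_triangle_mesh(x0, y0, width, height, tile=16):
    """Return (vertices, triangles) for a regular mesh over a rectangle.

    vertices  : list of (x, y) grid points, row by row
    triangles : list of (i, j, k) index triples into `vertices`
    """
    cols = width // tile   # whole tiles only; any remainder is ignored
    rows = height // tile
    vertices = [
        (x0 + i * tile, y0 + j * tile)
        for j in range(rows + 1)
        for i in range(cols + 1)
    ]
    triangles = []
    for j in range(rows):
        for i in range(cols):
            top_left = j * (cols + 1) + i
            top_right = top_left + 1
            bottom_left = top_left + cols + 1
            bottom_right = bottom_left + 1
            # Cut each tile along the top-left to bottom-right diagonal.
            triangles.append((top_left, top_right, bottom_right))
            triangles.append((top_left, bottom_right, bottom_left))
    return vertices, triangles
```

For a 32×32 rectangle this produces a 3×3 grid of vertices and eight triangles, two per tile.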

In one preferred embodiment, the image gradient for each triangle vertex is calculated, and the vertices are sorted by class and magnitude. Corners come before edges, and edges come before homogeneous regions. Among corners, strong corners come before weak corners; among edges, strong edges come before weak edges.
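This ordering amounts to a two-level sort key, as in the following sketch. The dictionary fields `cls` and `magnitude` and the name `processing_order` are assumptions made for illustration; how the class and gradient magnitude are computed is outside the sketch.

```python
# Lower rank is processed earlier: corners, then edges, then homogeneous regions.
CLASS_RANK = {"corner": 0, "edge": 1, "homogeneous": 2}


def processing_order(vertices):
    """Sort vertices by texture class, then by descending gradient magnitude."""
    return sorted(
        vertices,
        key=lambda v: (CLASS_RANK[v["cls"]], -v["magnitude"]),
    )


# Hypothetical usage with four classified vertices.
example = [
    {"id": 0, "cls": "edge", "magnitude": 0.4},
    {"id": 1, "cls": "corner", "magnitude": 0.2},
    {"id": 2, "cls": "homogeneous", "magnitude": 0.0},
    {"id": 3, "cls": "corner", "magnitude": 0.9},
]
ordered_ids = [v["id"] for v in processing_order(example)]
```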

In one preferred embodiment, the local deformation for each triangle is based on a motion estimate associated with that triangle. An affine transform is estimated for each triangle. If the triangle does not topologically invert or become degenerate, the pixels that are part of the triangle are used to sample the current image based on the estimated affine transform.
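A common way to test for inversion or degeneracy is to compare signed areas before and after the deformation; the sketch below takes that approach. The source does not state which test it uses, so the signed-area criterion, the `min_area` tolerance, and the function names are assumptions.

```python
def signed_area(triangle):
    """Signed area of a triangle; the sign encodes the vertex winding."""
    (x0, y0), (x1, y1), (x2, y2) = triangle
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


def is_usable(before, after, min_area=1e-6):
    """True if the deformed triangle is neither degenerate nor flipped."""
    area_before = signed_area(before)
    area_after = signed_area(after)
    if abs(area_after) < min_area:
        return False  # collapsed to a line or a point
    # A change of sign means the winding flipped, i.e. the triangle inverted.
    return (area_before > 0) == (area_after > 0)
```

Triangles that fail the test would simply be skipped when sampling the current image.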

Segmentation

The spatial discontinuities identified through the segmentation processes described below are encoded efficiently through geometric parameterization of their respective boundaries, referred to as spatial discontinuity models. These spatial discontinuity models may be encoded in a progressive manner, allowing ever more concise boundary descriptions that correspond to subsets of the encoding. Progressive encoding provides a robust means of prioritizing the spatial geometry while retaining most of the salient aspects of the spatial discontinuities.

A preferred embodiment of the present invention combines a multi-resolution segmentation analysis with gradient analysis of the spatial intensity field, and further employs a temporal stability constraint in order to achieve a robust segmentation.

As shown in FIG. 2, once the correspondences of an object's features have been tracked over time (220) and modeled (224), adherence to this motion/deformation model can be used to segment the pixels corresponding to the object (230). This process can be repeated for the multiple detected objects (206, 208) in the video (202, 204). The result of this processing is the segmented object pixels (232).

One form of invariant feature analysis employed by the present invention focuses on the identification of spatial discontinuities. These discontinuities manifest as edges, shadows, occlusions, lines, corners, or any other visible characteristic that causes an abrupt and identifiable separation between pixels in one or more imaged frames of video. Additionally, subtle spatial discontinuities between similarly colored and/or textured objects may only manifest when the pixels of the objects in the video frame undergo coherent motion relative to the objects themselves but different motion relative to each other. The present invention uses a combination of spectral, texture, and motion segmentation to robustly identify the spatial discontinuities associated with a salient signal mode.

Temporal Segmentation

The temporal integration of translational motion vectors, or equivalently of finite difference measurements, into a higher-order motion model is a form of motion segmentation described in the prior art.

In one embodiment of the invention, a dense field of motion vectors is produced, representing the finite differences of object motion in the video. These derivatives are grouped together spatially, either through a regular partitioning into tiles or by some initialization procedure such as spatial segmentation. The "derivatives" of each group are integrated into a higher-order motion model using a linear least squares estimator. The resulting motion models are then clustered as vectors in the motion model space using the k-means clustering technique. The derivatives are classified according to which cluster fits them best. The cluster labels are then spatially clustered as an evolution of the spatial partitioning. The process continues until the spatial partitioning is stable.
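The clustering step alone can be sketched with a plain k-means loop over motion-model vectors. For brevity the sketch assumes each group's model is already available as a tuple of numbers (here two-component translations) and that initial cluster centers are supplied by the caller. The least-squares fitting and the spatial re-partitioning described above are not shown, and the name `kmeans` is illustrative.

```python
def kmeans(models, centers, iterations=20):
    """Cluster motion-model vectors; return (labels, centers).

    models  : list of equal-length tuples, one motion model per group
    centers : initial cluster centers, in the same format
    """
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iterations):
        # Assignment step: label each model with its nearest center.
        labels = [
            min(
                range(len(centers)),
                key=lambda k: sum((m - c) ** 2 for m, c in zip(model, centers[k])),
            )
            for model in models
        ]
        # Update step: move each center to the mean of its members.
        updated = []
        for k, old in enumerate(centers):
            members = [m for m, label in zip(models, labels) if label == k]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster where it was
        if updated == centers:
            break  # assignments can no longer change
        centers = updated
    return labels, centers
```

In the full process, the returned labels would be mapped back to tile positions and the partitioning repeated until it no longer changes.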

In a further embodiment of the invention, the motion vectors for a given aperture are interpolated to a set of pixel positions corresponding to the aperture. When the block defined by this interpolation spans pixels corresponding to an object boundary, the resulting classification is some anomalous diagonal partitioning of the block.

In the prior art, the least squares estimator used to integrate the derivatives is highly sensitive to outliers. This sensitivity can produce motion models that bias the motion model clustering method so heavily that the iterations diverge widely.

In the present invention, the motion segmentation methods identify spatial discontinuities through analysis of apparent pixel motion over two or more frames of video. The apparent motion is analyzed for consistency over the frames of video and integrated into parametric motion models. Spatial discontinuities associated with such consistent motion are identified. Motion segmentation can also be referred to as temporal segmentation, because temporal changes may be caused by motion. However, temporal changes may also be caused by other phenomena such as local deformation or illumination changes.

Through the described method, the salient signal mode that corresponds to the normalization method can be identified and separated from the ambient signal mode (background or non-object) using one of several background subtraction methods. Often, these methods statistically model the background as the pixels that exhibit the least change at each time instance. Change can be characterized as a pixel value difference.

Segmentation-perimeter-based global deformation modeling is achieved by creating a perimeter around the object and then collapsing it toward the detected center of the object until the perimeter vertices reach positions that coincide with a heterogeneous image gradient. Motion estimates are gathered for these new vertex positions, and robust affine estimation is used to find the global deformation model.

Finite differences based on image gradient descent at the segmented mesh vertices are integrated into the global deformation model.

Object Segmentation

The block diagram in FIG. 13 shows one preferred embodiment of object segmentation. The process begins with an ensemble of normalized images (1302), which are then differenced pair-wise among the ensemble. These differences are accumulated element-wise (1306) into an accumulation buffer. The accumulation buffer is thresholded (1310) to identify the more significant error regions. The thresholded element mask is then morphologically analyzed (1312) to determine the spatial support of the accumulated error regions (1310). The resulting extraction (1314) of the morphological analysis (1312) is then compared with the detected object position (1320) in order to focus subsequent processing on the accumulated error regions that coincide with the object. The boundary of the isolated spatial region (1320) is then approximated with a polygon (1322), of which a convex hull is generated (1324). The contour of the hull is then adjusted (1330) to better initialize the vertex positions for active contour analysis (1332). Once the active contour analysis (1332) has converged on a low-energy solution in the accumulated error space, the contour is used as the final contour (1334). The pixels enclosed by the contour are considered object pixels, and the pixels outside the contour are considered non-object pixels.
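The first stages of this pipeline (pair-wise differencing, element-wise accumulation, and thresholding) can be sketched in a few lines. The sketch assumes grayscale frames stored as lists of rows, absolute differences as the difference measure, and a caller-supplied threshold. The later morphological and active contour stages are not shown, and the function names are hypothetical.

```python
from itertools import combinations


def accumulate_differences(frames):
    """Sum absolute pair-wise differences over an ensemble of frames."""
    height, width = len(frames[0]), len(frames[0][0])
    acc = [[0] * width for _ in range(height)]
    for first, second in combinations(frames, 2):  # every unordered pair once
        for y in range(height):
            for x in range(width):
                acc[y][x] += abs(first[y][x] - second[y][x])
    return acc


def threshold_mask(acc, limit):
    """Binary mask of the accumulation buffer entries above `limit`."""
    return [[1 if value > limit else 0 for value in row] for row in acc]
```

Pixels that keep changing across the normalized ensemble accumulate large values and survive the threshold, while stable pixels do not.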

In a preferred embodiment, motion segmentation can be achieved given the detected position and scale of the salient image mode. A distance transform is used to determine the distance of every pixel from the detected position. If the pixel values associated with the maximum distance are retained, a reasonable model of the background can be resolved. In other words, the ambient signal is re-sampled temporally using a signal difference metric.

A further embodiment includes using a distance transform relative to the current detection position to assign a distance to each pixel. If the distance to a pixel is greater than the distance stored in a maximum pixel distance table, the pixel value is recorded. After a suitable training period, a pixel is assumed to have the highest probability of being a background pixel if its maximum distance is large.

Given a model of the ambient signal, the complete salient signal mode at each time instance can be differenced against it. Each of these differences can be re-sampled into spatially normalized signal differences (absolute differences). These differences are then aligned relative to each other and accumulated. Since the differences have been spatially normalized relative to the salient signal mode, the peaks of difference will mostly correspond to pixel positions associated with the salient signal mode.

In one embodiment of the invention, a training period is defined in which object detection positions are determined, and the centroid of those positions is used to select the optimal frame numbers, namely frames whose detection positions are far from this centroid, so that frame differencing yields background pixels with the highest probability of being non-object pixels.

In one embodiment of the present invention, active contour modeling is used to segment the foreground object from the non-object background by determining contour vertex positions in the accumulated error "image". In a preferred embodiment, the active contour edges are subdivided commensurate with the scale of the detected object to yield a greater number of degrees of freedom. In a preferred embodiment, the final contour positions are snapped to the nearest regular mesh vertices to yield a regularly spaced contour.
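Snapping contour vertices to a regular mesh can be sketched as rounding each coordinate to the nearest grid position. The grid origin `(x0, y0)`, the spacing `tile`, and the name `snap_to_mesh` are assumptions for illustration. Note that Python's built-in `round` sends exact ties to the nearest even multiple.

```python
def snap_to_mesh(contour, x0, y0, tile):
    """Move each contour vertex to the nearest vertex of a regular grid.

    contour : list of (x, y) vertex positions
    x0, y0  : position of one grid vertex (the mesh origin)
    tile    : grid spacing in pixels
    """
    return [
        (
            x0 + round((x - x0) / tile) * tile,
            y0 + round((y - y0) / tile) * tile,
        )
        for x, y in contour
    ]
```

After snapping, neighboring contour vertices may coincide; a follow-up pass could drop the duplicates.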

In one non-limiting embodiment of object segmentation, an oriented kernel is employed to generate error image filter responses for temporally pair-wise images. The response to a filter oriented orthogonally to the gross motion direction tends to enhance the error surface when motion relative to the background arises from occlusion and revealing of the background.

The normalized image frame intensity vectors of an ensemble of normalized images are differenced from one or more reference frames, creating residual vectors. These residual vectors are accumulated element-wise to form an accumulated residual vector. The accumulated residual vector is then probed spatially to define a spatial object boundary for spatial segmentation of the object and non-object pixels.

In one preferred embodiment, an initial statistical analysis of the accumulated residual vector is performed to arrive at a statistical threshold value that can be used to threshold the accumulated residual vector. A preliminary object region mask is created through an erosion followed by a dilation morphological operation. The contour polygon points of the region are then analyzed to reveal the convex hull of these points. The convex hull is used as the initial contour for an active contour analysis method. The active contour is propagated until it converges on the spatial boundaries of the object's accumulated residual. In a further preferred embodiment, the edges of the preliminary contour are further subdivided by adding midpoint vertices until a minimal edge length is achieved for all edges. This further embodiment is intended to increase the degrees of freedom of the active contour model so that it fits the outline of the object more accurately.
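The convex hull and midpoint subdivision steps can be sketched as follows. The hull uses Andrew's monotone chain algorithm, which is one standard choice; the source does not say which hull algorithm it uses. The `max_edge` parameter (assumed positive) and both function names are illustrative.

```python
import math


def convex_hull(points):
    """Convex hull by Andrew's monotone chain, without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(sequence):
        chain = []
        for p in sequence:
            # Pop points that would make a non-left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain

    lower = half_hull(pts)
    upper = half_hull(reversed(pts))
    return lower[:-1] + upper[:-1]  # endpoints are shared, so drop one copy


def subdivide(contour, max_edge):
    """Insert midpoint vertices until no edge is longer than max_edge (> 0)."""
    pts = list(contour)
    changed = True
    while changed:
        changed = False
        refined = []
        for i, p in enumerate(pts):
            q = pts[(i + 1) % len(pts)]  # the contour is closed
            refined.append(p)
            if math.hypot(q[0] - p[0], q[1] - p[1]) > max_edge:
                refined.append(((p[0] + q[0]) / 2, (p[1] + q[1]) / 2))
                changed = True
        pts = refined
    return pts
```

The subdivided hull would then serve as the initial contour handed to the active contour analysis.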

In a preferred embodiment, the refined contour is used to generate a pixel mask indicating the pixels of the object, by overlaying the polygon implied by the contour on the normalized images.

Non-Object Resolution

The block diagram shown in FIG. 12 discloses one preferred embodiment of non-object segmentation, or equivalently, background resolution. With the initialization of a background buffer (1206) and an initial maximum distance value buffer (1204), the process works to determine the most stable non-object pixels by associating "stability" with the greatest distance from the detected object position (1202). Given a newly detected object position (1202), the process checks each pixel position (1210). For each pixel position (1210), the distance from the detected object position (1202) is calculated using a distance transform. If the distance for that pixel is greater (1216) than the value previously stored in the maximum distance buffer (1204), the previous value is replaced with the current value (1218) and the pixel value is recorded in the pixel buffer (1220).
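The buffer update described for FIG. 12 can be sketched as follows. This is an illustrative sketch only: the function name `update_background`, the use of the Euclidean distance to a single object position as the distance transform, and the initial value of -1 for the maximum distance buffer are assumptions.

```python
import numpy as np

def update_background(frame, object_pos, max_dist, background):
    """Keep, per pixel, the value observed when the object was farthest away."""
    rows, cols = np.indices(frame.shape)
    # Distance of every pixel position from the detected object position.
    dist = np.hypot(rows - object_pos[0], cols - object_pos[1])
    farther = dist > max_dist                 # comparison against stored maxima
    max_dist[farther] = dist[farther]         # replace previous maximum distance
    background[farther] = frame[farther]      # record pixel value in the buffer
    return max_dist, background


height, width = 4, 6
max_dist = np.full((height, width), -1.0)     # initial maximum distance buffer
background = np.zeros((height, width))        # background buffer

# Two toy frames: background intensity 10, object intensity 99 at a moving position.
for pos in [(1, 1), (2, 4)]:
    frame = np.full((height, width), 10.0)
    frame[pos] = 99.0
    update_background(frame, pos, max_dist, background)
```

In this toy sequence the object pixel recorded from the first frame is overwritten once the object has moved away, so the background buffer ends up free of the object.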

Given a resolved background image, the error between this image and the current frame can be normalized spatially and accumulated temporally. Such a resolved background image is described in the "Background Resolution" section. The resolution of the background by this method is considered a time-based occlusion filter process.

The resulting accumulated error is then thresholded to provide an initial contour. The contour is then propagated spatially so as to balance the error residual against contour deformation.

In an alternative embodiment, the absolute differences between the current frame and the resolved background frames are computed. The element-wise absolute difference is then segmented into distinct spatial regions. The average pixel value of the bounding box of each such region is computed, so that when the resolved background is updated, the difference between the average pixel values of the current frame and the resolved background can be used to perform a contrast shift, allowing the current region to blend more effectively with the resolved background. In another embodiment, the vertices within the normalized frame mask are motion estimated and saved for each frame. These are then processed using SVD to generate a local deformation prediction for each of the frames.

Gradient Segmentation

The texture segmentation method, or equivalently intensity gradient segmentation, analyzes the local gradient of the pixels in one or more frames of video. The gradient response is a statistical measure that characterizes the spatial discontinuities local to a pixel position in the video frame. One of several spatial clustering techniques is then used to combine the gradient responses into spatial regions. The boundaries of these regions are useful for identifying spatial discontinuities in one or more of the video frames.

In one embodiment of the invention, the summed-area table concept from computer graphics texture generation is employed to expedite the calculation of the gradient of the intensity field. A field of progressively summed values is generated, which allows any rectangle of the original field to be summed through four lookups combined with four addition operations.
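A summed-area table and the four-lookup rectangle sum can be sketched as follows. The function names and the zero-padded table layout are choices made for this sketch, not details taken from the disclosure.

```python
import numpy as np

def summed_area_table(field):
    """Field of progressively summed values, padded with a zero row and column."""
    table = np.zeros((field.shape[0] + 1, field.shape[1] + 1), dtype=np.float64)
    table[1:, 1:] = field.cumsum(axis=0).cumsum(axis=1)
    return table

def rect_sum(table, top, left, bottom, right):
    """Sum of field[top:bottom, left:right] obtained from four table lookups."""
    return (table[bottom, right] - table[top, right]
            - table[bottom, left] + table[top, left])


field = np.arange(12, dtype=np.float64).reshape(3, 4)
table = summed_area_table(field)
total = rect_sum(table, 1, 1, 3, 3)   # rectangle covering the values 5, 6, 9, 10
```

Because the cost of `rect_sum` does not depend on the rectangle size, box-filtered quantities such as windowed gradient sums can be evaluated at constant cost per pixel.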

A further embodiment employs the Harris response generated for an image, in which the neighborhood of each pixel is classified as homogeneous, an edge, or a corner. A response value is generated from this information and indicates the degree of edge-ness or corner-ness for each element in the frame.
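One common formulation of the Harris response is sketched below for illustration. The 3 x 3 box window, the constant `k = 0.05`, and the helper names are assumptions of this sketch; the disclosure does not specify them. With this formulation a clearly positive response suggests a corner, a clearly negative response an edge, and a response near zero a homogeneous neighborhood.

```python
import numpy as np

def box_sum(a, radius=1):
    """Sum of each pixel's (2*radius+1)^2 neighborhood, zero-padded at borders."""
    padded = np.pad(a, radius, mode="constant")
    out = np.zeros_like(a)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.05):
    """Harris response det(M) - k * trace(M)^2 of the local structure tensor M."""
    gy, gx = np.gradient(image.astype(np.float64))   # gradients along rows, columns
    sxx, syy, sxy = box_sum(gx * gx), box_sum(gy * gy), box_sum(gx * gy)
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace


# Toy image: a bright quadrant whose corner sits at row 4, column 4.
image = np.zeros((12, 12))
image[4:, 4:] = 1.0
response = harris_response(image)
```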

Multi-Scale Gradient Analysis

An embodiment of the present invention further constrains the image gradient support by generating the image gradient values across several spatial scales. This method can help qualify the image gradient, in that spatial discontinuities at different scales are used to support each other: as long as an "edge" is discriminated at several different spatial scales, the edge should be "salient". A more qualified image gradient tends to correspond to a more salient feature.

In a preferred embodiment, the texture response field is generated first, and the values of this field are then quantized into several bins based on k-means binning/partitioning. The original image gradient values are then processed progressively, using each bin as an interval of values to which a single iteration can apply watershed segmentation. The benefit of this approach is that homogeneity is defined in a relative sense with a strong spatial bias.
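The k-means binning of response values can be sketched as follows. This is a minimal one-dimensional sketch; the evenly spaced initial centers, the fixed iteration count, and the function name `kmeans_bins` are assumptions, and the per-bin watershed segmentation is not shown.

```python
import numpy as np

def kmeans_bins(values, k, iterations=20):
    """Quantize scalar response values into k bins with 1-D k-means."""
    values = np.asarray(values, dtype=np.float64)
    # Deterministic initialization: centers spread evenly over the value range.
    centers = np.linspace(values.min(), values.max(), k)
    for _ in range(iterations):
        labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):              # leave empty bins unchanged
                centers[j] = values[labels == j].mean()
    labels = np.argmin(np.abs(values[:, None] - centers[None, :]), axis=1)
    return labels, centers


response = np.array([0.1, 0.2, 0.15, 5.0, 5.2, 9.9, 10.1])
labels, centers = kmeans_bins(response, k=3)
```

Each resulting bin defines an interval of response values that a later segmentation pass could process in turn.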

Spectral Segmentation

The spectral segmentation method analyzes the statistical probability distribution of the black-and-white, grayscale, or color pixels in the video. A spectral classifier is constructed by performing clustering operations on the probability distribution of those pixels. The classifier is then used to classify one or more pixels as belonging to a probability class. The resulting probability class and its pixels are given a class label. These class labels are then spatially associated into regions of pixels with distinct boundaries. These boundaries identify spatial discontinuities in one or more of the video frames.

The present invention may use spatial segmentation based on spectral classification to segment pixels in frames of the video. Further, correspondence between regions may be determined based on the overlap of spectral regions with regions in previous segmentations.

When video frames are roughly composed of continuous color regions that are spatially connected into larger regions corresponding to objects in the frame, identifying and tracking the colored (or spectral) regions can facilitate the subsequent segmentation of objects in the video sequence.

Background Segmentation

The described invention includes a method for video frame background modeling that is based on the temporal maximum of spatial distance measurements between a detected object and each individual pixel in each frame of video. Given the detected position of the object, a distance transform is applied, creating a scalar distance value for each pixel in the frame. For each pixel, a map of the maximum distance over all of the video frames is retained. When the maximum value is initially assigned, or subsequently updated with a new and different value, the corresponding pixel of that video frame is retained in a "resolved background" frame.

Appearance Modeling

A common goal of video processing is to model and preserve the appearance of a sequence of video frames. The present invention aims to allow constrained appearance modeling techniques to be applied in robust and widely applicable ways through preprocessing. The registration, segmentation, and normalization described above serve this purpose.

The present invention discloses a means of appearance variance modeling. In the case of a linear model, the primary basis of appearance variance modeling is the analysis of feature vectors to reveal a compact basis that exploits linear correlations. Feature vectors representing spatial intensity field pixels can be assembled into an appearance variance model.

In an alternative embodiment, the appearance variance model is calculated from a segmented subset of the pixels. Further, the feature vector can be separated into spatially non-overlapping feature vectors. Such spatial decomposition may be achieved with spatial tiling. Computational efficiency may be achieved by processing these temporal ensembles without sacrificing the dimensionality reduction of the more global PCA method.

When generating an appearance variance model, spatial intensity field normalization can be employed to reduce the amount of spatial variation that PCA has to model.

Deformation Modeling

Local deformation can be modeled as vertex displacements, and an interpolation function can be used to determine the resampling of pixels according to the vertices associated with those pixels. These vertex displacements may exhibit a large amount of variation in motion when viewed as a single parameter set across many vertices. Correlations among these parameters can greatly reduce the dimensionality of this parameter space.

PCA

The preferred means of generating an appearance variance model is to assemble frames of video as pattern vectors into a training matrix, or ensemble, and to apply Principal Component Analysis (PCA) to the training matrix. When this expansion is truncated, the resulting PCA transformation matrix is employed to analyze and synthesize subsequent frames of video. Depending on the level of truncation, varying levels of quality of the original appearance of the pixels can be achieved.
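A truncated PCA of a training ensemble, with analysis (encoding) and synthesis (decoding) of a frame, can be sketched as follows. This is an illustrative sketch: computing PCA through an SVD of the mean-centered training matrix is a standard approach, while the function names and the synthetic rank-2 training data are assumptions made here.

```python
import numpy as np

def fit_pca(training, rank):
    """training: one pattern vector per row. Returns the mean and a truncated basis."""
    mean = training.mean(axis=0)
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:rank]                 # rows are principal components

def encode(pattern, mean, basis):
    """Analysis: project a pattern vector onto the truncated basis."""
    return basis @ (pattern - mean)

def decode(coeffs, mean, basis):
    """Synthesis: rebuild a pattern vector from its PCA coefficients."""
    return mean + basis.T @ coeffs


rng = np.random.default_rng(0)
# Synthetic ensemble: 20 "frames" of 16 pixels that vary within a 2-D subspace.
training = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 16)) + 5.0

frame = training[3]
mean2, basis2 = fit_pca(training, rank=2)
error_rank2 = np.linalg.norm(frame - decode(encode(frame, mean2, basis2), mean2, basis2))
mean1, basis1 = fit_pca(training, rank=1)
error_rank1 = np.linalg.norm(frame - decode(encode(frame, mean1, basis1), mean1, basis1))
```

Retaining fewer components shortens the coefficient vector at the cost of a larger reconstruction error, which is the trade-off that the level of truncation controls.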

The specific means of constructing and decomposing the pattern vectors are well known to those skilled in the art.

Given the spatial segmentation of the salient signal mode from the ambient signal and the spatial normalization of this mode, the pixels themselves, or equivalently the appearance of the resulting normalized signal, can be factored into linearly correlated components with a low-rank parameterization that allows a direct trade-off between approximation error and bit rate for the representation of the pixel appearance. One method of achieving a low-rank approximation is to truncate bytes and/or bits of the encoded data. A low-rank approximation is considered a compression of the original data, as determined by the specific application of this technique. For example, in video compression, if the truncation of data does not unduly degrade the perceptual quality, then the application-specific goal is achieved along with compression.

As shown in FIG. 2, the normalized object pixels (242 & 244) can be projected into a vector space, and the linear correspondences can be modeled using a decomposition process (250) such as PCA in order to yield a dimensionally concise version of the data (252 & 254).

Sequential PCA

PCA encodes patterns into PCA coefficients using a PCA transform. The better the patterns are represented by the PCA transform, the fewer coefficients are needed to encode them. Recognizing that pattern vectors may degrade as time passes between the acquisition of the training patterns and the acquisition of the patterns to be encoded, updating the transform can help to counteract the degradation. As an alternative to generating a new transform, sequential updating of the existing patterns is more computationally efficient in certain cases.

Many state-of-the-art video compression algorithms predict a frame of video from one or more other frames. The prediction model is commonly based on partitioning each predicted frame into non-overlapping tiles, each of which is matched to a corresponding patch in another frame, with the associated translational displacement parameterized by an offset motion vector. This spatial displacement, optionally coupled with a frame index, provides the "motion predicted" version of the tile. If the prediction error is below a certain threshold, the pixels of the tile are suitable for residual encoding, and there is a corresponding gain in compression efficiency. Otherwise, the pixels of the tile are encoded directly. This type of tile-based (alternatively termed block-based) motion prediction method models the video by translating tiles of pixels. When the imaged phenomena in the video adhere to this type of modeling, the corresponding encoding efficiency increases. This modeling constraint assumes that a certain level of temporal resolution, or number of frames per second, is present for imaged objects undergoing motion, so that they conform to the translational assumption inherent in block-based prediction. Another requirement of this translational model is that the spatial displacement for a given temporal resolution must be limited; that is, the time difference between the frames from which the prediction is derived and the frame being predicted must be a relatively short amount of absolute time. These temporal resolution and motion limitations facilitate the identification and modeling of certain redundant video signal components present in the video stream.
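The tile-based motion prediction summarized above can be sketched as an exhaustive block-matching search. This is an illustration only: the sum of absolute differences as the error measure, the search radius, the threshold value, and the function name `best_motion_vector` are assumptions rather than details of any particular codec.

```python
import numpy as np

def best_motion_vector(tile, reference, top, left, search=2):
    """Exhaustive search for the offset that minimizes the sum of absolute differences."""
    size = tile.shape[0]
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + size > reference.shape[0] or c + size > reference.shape[1]:
                continue                       # candidate patch falls outside the frame
            err = np.abs(tile - reference[r:r + size, c:c + size]).sum()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err


rng = np.random.default_rng(1)
previous = rng.random((16, 16))
current = np.roll(previous, shift=(1, 2), axis=(0, 1))   # content moves down 1, right 2
tile = current[4:8, 4:8]
vector, error = best_motion_vector(tile, previous, top=4, left=4)

threshold = 1.0                                # hypothetical error threshold
mode = "residual" if error < threshold else "direct"
```

The search only succeeds because the displacement stays within the search radius, which mirrors the limited-displacement requirement of the translational model.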

In the present inventive method, sequential PCA is combined with an embedded zero-tree wavelet to further enhance the utility of the hybrid compression method. The sequential PCA technique provides a means by which conventional PCA can be enhanced for signals that have temporal coherency or temporally local smoothness. The embedded zero-tree wavelet provides a means by which a locally smooth spatial signal can be decomposed into a space-scale representation in order to increase the robustness of certain processing as well as the computational efficiency of the algorithm. In the present invention, these two techniques are combined to increase the representational power of the variance models and to provide a representation of those models that is compact and ordered, such that much of the representational power of the basis is retained by a truncation of the basis.

In another embodiment, sequential PCA is applied with a fixed input block size and a fixed tolerance, which increases the weighting bias toward the first and most energetic PCA components. For longer data sequences, this first PCA component is often the only PCA component. This affects the visual quality of the reconstruction and can limit the utility of the described approach in some ways. The present invention employs a different norm for the selection of PCA components, which is preferable to the conventionally used least-squares norm. This form of model selection avoids over-approximation by the first PCA component.

In another embodiment, a block PCA process with a fixed input block size and a prescribed number of PCA components per data block is employed, which provides a beneficially uniform reconstruction at the cost of using relatively more components. In a further embodiment, block PCA is used in combination with sequential PCA, where a block PCA step reinitializes the sequential PCA after a set number of steps. This provides a beneficially uniform approximation with a reduced number of PCA components.

In another embodiment, the invention capitalizes on the situation in which the PCA components are visually similar before and after encoding-decoding. The quality of the image sequence reconstructions before and after encoding-decoding may also be visually similar, and this often depends on the degree of quantization employed. The present inventive method decodes the PCA components and then renormalizes them to have unit norm. For moderate quantization, the decoded PCA components are approximately orthogonal. At higher levels of quantization, the decoded PCA components are partially restored by applying SVD to obtain an orthogonal basis and a modified set of reconstruction coefficients.
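One way to renormalize decoded components and recover an orthogonal basis with modified coefficients is sketched below. This is an assumption-laden illustration: coarse rounding stands in for the encode-decode step, and the function name `restore_basis` and the way the norms and singular values are folded into the coefficients are choices made for this sketch.

```python
import numpy as np

def restore_basis(decoded, coeffs):
    """decoded: k components as rows (k x n); coeffs: one row of k coefficients per frame."""
    # Renormalize every decoded component to unit norm.
    norms = np.linalg.norm(decoded, axis=1, keepdims=True)
    unit = decoded / norms
    # SVD of the component matrix: unit = u @ diag(s) @ vt, with orthonormal rows in vt.
    u, s, vt = np.linalg.svd(unit, full_matrices=False)
    # Fold the norms and the factor u @ diag(s) into the coefficients so that
    # new_coeffs @ vt reproduces coeffs @ decoded.
    new_coeffs = ((coeffs * norms.T) @ u) * s
    return vt, new_coeffs


rng = np.random.default_rng(2)
q, _ = np.linalg.qr(rng.normal(size=(8, 3)))
basis = q.T                                   # 3 orthonormal components of length 8
coeffs = rng.normal(size=(5, 3))              # coefficients for 5 frames
decoded = np.round(basis * 4.0) / 4.0         # coarse quantization as a stand-in codec
new_basis, new_coeffs = restore_basis(decoded, coeffs)
```

The restored basis is exactly orthonormal, and the modified coefficients synthesize the same frames as the decoded, non-orthogonal components did.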

In another embodiment, a variable and adaptable block size is applied with a hybrid sequential PCA method in order to produce improved results with regard to synthesis quality. The present invention bases the block size on a maximum number of PCA components and a given error tolerance for those blocks. The method then expands the current block size until the maximum number of PCA components is reached. In a further embodiment, the sequence of PCA components is treated as a data stream, which leads to a further reduction in dimensionality. The method performs a post-processing step in which the variable data blocks are collected for the first PCA component from each block and SVD is applied to further reduce the dimensionality. The same process is then applied to the collections of second, third, and subsequent components.

Symmetric Decomposition

In one embodiment of the invention, a decomposition is performed based on a symmetric ensemble. This ensemble represents a square image as the sum of six orthogonal components. Each component corresponds to a different symmetry of the square. By symmetry, each orthogonal component is determined by a "fundamental region" that is mapped into the complete component by the action of the symmetry. The sum of the fundamental regions has the same cardinality as the input image, assuming that the input image itself has no particular symmetry.

Residual-Based Decomposition

In MPEG video compression, the current frame is constructed by motion compensating the previous frame using motion vectors, followed by the application of a residual update to the compensated blocks; finally, any blocks that do not have a sufficient match are encoded as new blocks.

The pixels corresponding to residual blocks are mapped to pixels in the previous frame through the motion vector. The result is a temporal path of pixels through the video that can be synthesized through the successive application of residual values. These pixels are identified as ones that can be best represented using PCA.

Occlusion-Based Decomposition

A further enhancement of the invention determines whether the motion vectors applied to blocks will cause any pixels of the previous frame to be occluded (covered) by moving pixels. For each occlusion event, the occluding pixels are split into a new layer. There will also be revealed pixels that have no history. The revealed pixels are placed onto any layer that fits them in the current frame and for which a historical fit can be made.

The temporal continuity of pixels is supported through the splicing and grafting of pixels onto different layers. Once a stable layer model has been reached, the pixels in each layer can be grouped based on their membership in coherent motion models.

Sub-Band Temporal Quantization

An alternative embodiment of the present invention uses the discrete cosine transform (DCT) or the discrete wavelet transform (DWT) to decompose each frame into sub-band images. Principal Component Analysis (PCA) is then applied to each of these "sub-band" videos. The concept is that sub-band decomposition of a video frame decreases the spatial variance in any one of the sub-bands compared with the original video frame.

For video of a moving object (a person), the spatial variance tends to dominate the variance modeled by PCA. Sub-band decomposition reduces the spatial variance in any one decomposition video.

For the DCT, the decomposition coefficients for any one sub-band are arranged spatially into a sub-band video. For instance, the DC coefficients are taken from each block and arranged into a sub-band video that looks like a postage-stamp version of the original video. This is repeated for all the other sub-bands, and each of the resulting sub-band videos is processed using PCA.
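The arrangement of one DCT coefficient per block into a sub-band image can be sketched as follows. The 8 x 8 block size, the orthonormal DCT-II, and the helper names are assumptions of this sketch. Under that normalization the (0, 0) coefficient of a block equals eight times the block mean, which is why this sub-band resembles a reduced copy of the frame.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)
    return m

def subband_image(frame, u, v, block=8):
    """Collect coefficient (u, v) of every block's 2-D DCT into one small image."""
    d = dct_matrix(block)
    rows, cols = frame.shape[0] // block, frame.shape[1] // block
    out = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            tile = frame[r * block:(r + 1) * block, c * block:(c + 1) * block]
            out[r, c] = (d @ tile @ d.T)[u, v]   # separable 2-D DCT of the tile
    return out


frame = np.arange(16 * 16, dtype=np.float64).reshape(16, 16)
dc = subband_image(frame, 0, 0)                  # one value per 8 x 8 block
```

Applying `subband_image` to every frame for a fixed `(u, v)` yields one sub-band video, which could then be assembled into a PCA training matrix.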

For the DWT, the sub-bands are already arranged in the manner described for the DCT.

In a non-limiting embodiment, the truncation of the PCA coefficients is varied.

Wavelet

When data is decomposed using the discrete wavelet transform (DWT), multiple band-pass data sets result at lower spatial resolutions. The transformation process can be applied recursively to the derived data until only a single scalar value remains. The scalar elements in the decomposed structure are typically related in a hierarchical parent/child fashion. The resulting data contains a multi-resolution hierarchical structure together with finite differences.
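The recursive decomposition down to a single scalar can be sketched with a one-dimensional Haar transform. This is an illustration under stated assumptions: the unnormalized averaging form of the Haar step and the function names are choices made here, and the input length is assumed to be a power of two.

```python
def haar_step(signal):
    """One level: pairwise averages (low-pass) and pairwise half-differences (details)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    averages = [(a + b) / 2.0 for a, b in pairs]
    details = [(a - b) / 2.0 for a, b in pairs]
    return averages, details

def haar_decompose(signal):
    """Apply the step recursively to the averages until one scalar remains."""
    levels = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)              # finite differences at this resolution
    return current[0], levels

def haar_reconstruct(scalar, levels):
    """Invert the decomposition from the coarsest level back to full resolution."""
    current = [scalar]
    for details in reversed(levels):
        finer = []
        for avg, det in zip(current, details):
            finer.extend([avg + det, avg - det])   # each parent yields two children
        current = finer
    return current


signal = [4.0, 6.0, 10.0, 12.0, 8.0, 8.0, 0.0, 2.0]
scalar, levels = haar_decompose(signal)
```

Each coefficient at one level is the parent of two coefficients at the next finer level, which is the hierarchical relationship that zero-tree style coding exploits.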

When the DWT is applied to spatial intensity fields, many naturally occurring image phenomena are represented with little perceptual loss by the first or second low-band-pass derived data structures, owing to their low spatial frequency. Truncating the hierarchical structure provides a compact representation when high-frequency spatial data is either absent or regarded as noise.

While PCA may be used to achieve accurate reconstruction with a small number of coefficients, the transform itself can be quite large. To reduce the size of this "initial" transform, an embedded zero-tree (EZT) construction of a wavelet decomposition can be used to build a progressively more accurate version of the transformation matrix.

Subspace Classification

As is well understood by those skilled in the art, discretely sampled phenomena data and derivative data can be represented as a set of data vectors corresponding to an algebraic vector space. These data vectors include, in a non-limiting way, the pixels in the normalized appearance of the segmented object, the motion parameters, and the structural positions of features or vertices in two or three dimensions. Each of these vectors exists in a vector space, and analysis of the geometry of the space can be used to yield concise representations of the sampled, or parameter, vectors. Beneficial geometric conditions are typified by parameter vectors that form compact subspaces. When one or more subspaces are mixed, creating a seemingly more complex single subspace, the constituent subspaces can be difficult to discern. There are several segmentation methods that allow such subspaces to be separated by examining the data in a higher-dimensional vector space created through some interaction of the original vectors (such as the inner product).

One method of segmenting the vector space involves projecting the vectors into a Veronese vector space representing polynomials. This method is well known in the art as Generalized PCA, or GPCA. Through such a projection, the normals to the polynomials are found and grouped, and the original vectors associated with those normals can be grouped together. An example of the utility of this technique is the factoring of two-dimensional spatial point correspondences tracked over time into a three-dimensional structure model and the motion of that three-dimensional model.
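For noise-free data, the projection into a Veronese space and the grouping by normals can be sketched on two lines through the origin in the plane. This is a toy illustration only: the degree-2 embedding, the two example lines, the similarity threshold of 0.9, and all function names are assumptions, and the sketch is not the extended method of the invention.

```python
import numpy as np

def veronese2(points):
    """Degree-2 Veronese embedding of 2-D points: (x, y) -> (x^2, x*y, y^2)."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x * x, x * y, y * y], axis=1)

def normal_at(point, c):
    """Unit gradient of p(x, y) = c0*x^2 + c1*x*y + c2*y^2 at a data point."""
    x, y = point
    g = np.array([2 * c[0] * x + c[1] * y, c[1] * x + 2 * c[2] * y])
    return g / np.linalg.norm(g)


# Noise-free samples from two 1-D subspaces: the lines y = 2x and y = -x.
t = np.array([-2.0, -1.0, 1.0, 2.0])
points = np.vstack([np.stack([t, 2 * t], axis=1), np.stack([t, -t], axis=1)])

# The coefficients of the polynomial vanishing on both lines span the null
# space of the embedded data matrix (last right singular vector).
_, _, vt = np.linalg.svd(veronese2(points))
c = vt[-1]

# The gradient at each point is normal to the subspace that contains the point.
normals = np.array([normal_at(p, c) for p in points])
# Group points whose normal is (anti)parallel to the normal of the first point.
labels = (np.abs(normals @ normals[0]) < 0.9).astype(int)
```

With noisy data the null space is no longer exact and the normals scatter, which is the limitation of the basic technique.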

The GPCA technique is incomplete when applied as defined, yielding results only when the data vectors are generated with little noise. The prior art assumes supervisory user intervention to guide the GPCA algorithm. This constraint greatly limits the potential of the technique.

The present invention extends the conceptual basis of the GPCA method to robustly handle the identification and segmentation of multiple subspaces in the presence of noise and mixed co-dimension. This innovation provides an unsupervised improvement of the technique over the state of the art.

In the prior art, GPCA operates on the normal vectors of the polynomials in the Veronese map without regard to the tangent space of those normal vectors. The present inventive method extends GPCA to find the tangent space that is orthogonal to the space of the normal vectors normally found in the Veronese map. This "tangent space", or subspace of the Veronese map, is then used to factor the Veronese map.

The tangent space is identified through plane-wave expansion and the application of the Legendre transformation between position and tangent-plane coordinates, which reveals the duality in the representation of geometric objects, specifically the tangents of the normals to the polynomials of the Veronese map. The discrete Legendre transformation is applied through convex analysis to define a constrained form of the derivative corresponding to the normal vectors. This approach is used to segment the data vectors by computing the normal vectors in the presence of noise. This convex analysis is integrated with GPCA to provide a more robust algorithm.

The present invention uses an iterative factorization approach when applying GPCA. In particular, the derivative-based implementation found in the prior art is extended to refine the ensemble of classified data vectors through the very same GPCA method described herein. Applied iteratively, this technique can be used to robustly find candidate normal vectors in the Veronese mapping, and then to further qualify those vectors using the extended GPCA technique. For the factorization step, the original data associated with the refined set of vectors is removed from the original data set. The remaining data set can be analyzed in the same way with the extended GPCA technique. This innovation is critical to using the GPCA algorithm in an unsupervised manner. FIG. 11 illustrates the iterative refinement of data vectors.

The extension of the GPCA technique provided by the invention is particularly advantageous where there are multiple roots in the Veronese polynomial vector space. In addition, prior-art techniques encounter a degenerate case when the normals in the Veronese map are parallel to a vector space axis, whereas the present method does not degenerate.

FIG. 10 illustrates the method of basic polynomial fitting and differentiation.

In a preferred embodiment, GPCA is implemented with polynomial differentiation for subspaces of arbitrary co-dimension. SVD is used to obtain the dimension of the normal space at each data point, and the data points are clustered according to normal-space dimension. Within each cluster, data points are assigned to the same subspace when they all belong to a maximal set whose rank equals the common normal-space dimension. This method is recognized to be optimal for noise-free data.

Another non-limiting embodiment of GPCA using polynomial differentiation handles subspaces of arbitrary co-dimension. It is an adaptation of the "polynomial differentiation" method. Because noise tends to increase the rank of a set of nearly aligned normal vectors, the polynomial division step is initialized by clustering the data points according to SVD dimension and then selecting the point with the smallest residual error in the cluster with the smallest co-dimension. The normal space at this point is then applied with polynomial division to approximately reduce the Veronese map.

In a further embodiment, the gradient-weighted residual error is minimized over all data points, and SVD is applied at the optimal point to estimate the co-dimension and the basis vectors. The basis vectors are then applied with polynomial division to approximately reduce the Veronese map.

In a preferred embodiment, the RCOP error is used to set the numerical tolerance, because it scales linearly with the noise level. In the preferred embodiment, GPCA is implemented so as to apply SVD to the normal vectors estimated at each point and to identify the points whose normal-vector SVD has the same rank. A sequential SVD is then applied to each collection of normal vectors at points with the same rank. The points at which the sequential SVD changes rank are identified as belonging to different subspaces.
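The rank test with a noise-scaled tolerance can be sketched as follows. This is only an illustration under stated assumptions: `noise_level` stands in for the RCOP error named above, the multiplier `scale` is a hypothetical constant, and the function names are not taken from the source.

```python
import numpy as np

def numerical_rank(vectors, noise_level, scale=3.0):
    """Count singular values above a tolerance that grows linearly with the noise level."""
    singular_values = np.linalg.svd(np.atleast_2d(vectors), compute_uv=False)
    tolerance = scale * noise_level  # assumed linear relation; `scale` is hypothetical
    return int(np.sum(singular_values > tolerance))

def rank_change_points(normals, noise_level, scale=3.0):
    """Apply SVD to a growing stack of normals and report where the rank changes."""
    normals = np.asarray(normals, dtype=float)
    changes = []
    previous = numerical_rank(normals[:1], noise_level, scale)
    for count in range(2, len(normals) + 1):
        current = numerical_rank(normals[:count], noise_level, scale)
        if current != previous:
            changes.append(count - 1)  # index of the normal that changed the rank
            previous = current
    return changes

# Illustrative data: three nearly aligned normals followed by two from another subspace.
normals = [
    [1.0, 0.0, 0.0],
    [1.0, 0.001, 0.0],
    [1.0, -0.001, 0.001],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.001],
]
boundaries = rank_change_points(normals, noise_level=0.01)
```

Without the tolerance, the small perturbations in the first three normals would already raise the rank, which is the effect of noise described in the preceding embodiments.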

Hybrid Spatial Normalization Compression

The present invention extends the efficiency of block-based motion-predicted coding schemes by adding the segmentation of the video stream into two or more "normalized" streams. These streams are then encoded separately so that the translational motion assumptions of conventional codecs are valid. When the normalized streams are decoded, they are de-normalized into their proper positions and composited together to yield the original video sequence.

In one embodiment, one or more objects are detected in the video stream, and the pixels associated with each individual object are subsequently segmented, leaving the non-object pixels. Next, a global spatial motion model is generated for the object and non-object pixels. The global model is used to spatially normalize the object and non-object pixels. Such normalization effectively removes non-translational motion from the video stream and provides a set of videos in which occlusion interaction is minimized. Both are beneficial features of the present method.

The new videos of object and non-object pixels, spatially normalized, are provided as input to a conventional block-based compression algorithm. When the videos are decoded, the global motion model parameters are used to de-normalize the decoded frames, and the object pixels are composited together and onto the non-object pixels to yield an approximation of the original video stream.
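The normalize, decode, de-normalize, and composite sequence can be sketched in Python as below. The sketch makes several assumptions that are not in the source: the global motion model is a 2x3 affine matrix, resampling is nearest-neighbor, the codec is treated as lossless, and the function names are hypothetical.

```python
import numpy as np

def warp_affine_nearest(image, matrix, fill=0):
    """Resample so that output pixel (x, y) takes the input value at matrix @ (x, y, 1)."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = matrix @ coords
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.full(h * w, fill, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)

def invert_affine(matrix):
    """Invert a 2x3 affine matrix by lifting it to 3x3 homogeneous form."""
    full = np.vstack([matrix, [0.0, 0.0, 1.0]])
    return np.linalg.inv(full)[:2]

def composite(object_pixels, object_mask, non_object_pixels):
    """Place object pixels over non-object pixels wherever the mask is set."""
    return np.where(object_mask, object_pixels, non_object_pixels)

# Illustrative frame: a 2x2 object (value 9) on a flat background (value 1).
frame = np.ones((6, 6), dtype=np.uint8)
frame[3:5, 3:5] = 9
mask = frame == 9
global_motion = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])  # assumed model parameters

# Encoder side: segment, then normalize the object layer and its mask.
normalized_object = warp_affine_nearest(np.where(mask, frame, 0).astype(np.uint8), global_motion)
normalized_mask = warp_affine_nearest(mask.astype(np.uint8), global_motion)

# Decoder side: de-normalize with the inverse model, then composite onto the non-object layer.
inverse = invert_affine(global_motion)
restored_object = warp_affine_nearest(normalized_object, inverse)
restored_mask = warp_affine_nearest(normalized_mask, inverse).astype(bool)
non_object = np.where(mask, 0, frame).astype(np.uint8)
reconstructed = composite(restored_object, restored_mask, non_object)
```

In this toy case the motion is an integer translation, so the round trip is exact; with a lossy codec or non-integer motion the result would only approximate the original frame, as the text states.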

As shown in FIG. 6, the previously detected object instances (206 & 208) for one or more objects (630 & 650) are each processed with a separate instance of a conventional video compression method (632). Additionally, the non-object (602) resulting from the segmentation (230) of the objects is also compressed using conventional video compression (632). The result of each of these separate compression encodings (632) is a separate conventional encoded stream (634), one for each corresponding video stream. At some point, possibly after transmission, these intermediate encoded streams (234) can be decompressed (636) into a composite of the normalized non-object (610) and a multitude of objects (638 & 658). These composited pixels can be de-normalized (640) into their de-normalized versions (622, 642 & 662) to position the pixels spatially correctly relative to one another, so that a compositing process (670) can combine the object and non-object pixels into a composite of the full frame (672).

In a preferred embodiment, switching between encoding modes is performed based on a statistical distortion metric, such as PSNR, which allows a frame of video to be encoded with either the conventional method or the subspace method.
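A minimal sketch of PSNR-driven mode switching is given below. The source does not state the decision rule, so the comparison used here (pick whichever decoded frame has the higher PSNR, with ties going to the subspace method) is an assumption, as are the function names and the 8-bit peak value.

```python
import numpy as np

def psnr(original, decoded, peak=255.0):
    """Peak signal-to-noise ratio in decibels between two frames of equal shape."""
    diff = original.astype(float) - decoded.astype(float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def choose_encoding_mode(original, conventional_decoded, subspace_decoded):
    """Assumed rule: keep the mode whose reconstruction has the higher PSNR."""
    if psnr(original, subspace_decoded) >= psnr(original, conventional_decoded):
        return "subspace"
    return "conventional"
```

A practical encoder would likely weigh distortion against bit cost as well; that trade-off is outside what this passage describes.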

In another embodiment of the invention, the encoded parameters of appearance, global deformation, and local deformation are interpolated to yield predictions of intermediate frames that would not otherwise have to be encoded. The interpolation method can be any standard interpolation method such as linear, cubic, or spline.
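The linear case of this parameter interpolation can be sketched as follows. The layout of the parameter vector (one row per encoded frame, one column per parameter) and the function name are assumptions made for illustration; cubic or spline interpolation would replace `np.interp` with a different interpolant.

```python
import numpy as np

def interpolate_parameters(key_times, key_params, query_times):
    """Linearly interpolate each parameter column between encoded key frames."""
    key_params = np.asarray(key_params, dtype=float)
    columns = [
        np.interp(query_times, key_times, key_params[:, j])
        for j in range(key_params.shape[1])
    ]
    return np.stack(columns, axis=1)

# Illustrative use: two encoded frames at t = 0 and t = 4, each with two parameters.
predicted = interpolate_parameters([0, 4], [[0.0, 10.0], [4.0, 2.0]], [1, 2, 3])
```

Each row of `predicted` holds the parameters from which an intermediate frame could be synthesized without being encoded itself.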

As shown in FIG. 14, the object interpolation method can be achieved through interpolation analysis (1408) of a series of normalized objects (1402, 1404 & 1406), as represented by appearance and deformation parameters. The analysis determines the temporal range (1410) over which an interpolating function can be applied. The range specification (1410) can then be combined with the normalized object specifications (1414 & 1420) to approximate and ultimately synthesize the interim normalized objects (1416 & 1418).

Integration of a Hybrid Codec

In combining a conventional block-based compression algorithm with the normalization-segmentation scheme described in the present invention, several inventive methods result. Primarily, there are specialized data structures and communication protocols that are required.

The primary data structures include global spatial deformation parameters and object segmentation specification masks. The primary communication protocols are layers that include the transmission of the global spatial deformation parameters and the object segmentation specification masks.
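One possible shape for such a data structure and its transmitted layer is sketched below. Everything about the layout is hypothetical: the source does not define field names, a byte format, or the form of the deformation parameters, so the 2x3 affine matrix, the `ObjectLayer` class, and the little-endian header are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
import struct
import numpy as np

HEADER_FORMAT = "<HHH"  # hypothetical header: object id, mask height, mask width
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)
AFFINE_BYTES = 6 * 4    # six float32 deformation parameters

@dataclass
class ObjectLayer:
    object_id: int
    affine: np.ndarray  # assumed 2x3 global spatial deformation parameters
    mask: np.ndarray    # boolean object segmentation mask, height x width

    def to_bytes(self):
        """Serialize header, deformation parameters, and bit-packed mask."""
        height, width = self.mask.shape
        header = struct.pack(HEADER_FORMAT, self.object_id, height, width)
        params = np.asarray(self.affine).astype("<f4").tobytes()
        bits = np.packbits(self.mask.astype(np.uint8)).tobytes()
        return header + params + bits

    @classmethod
    def from_bytes(cls, payload):
        """Rebuild a layer from the bytes produced by to_bytes."""
        object_id, height, width = struct.unpack_from(HEADER_FORMAT, payload, 0)
        affine = np.frombuffer(payload, dtype="<f4", count=6, offset=HEADER_SIZE).reshape(2, 3)
        bits = np.frombuffer(payload, dtype=np.uint8, offset=HEADER_SIZE + AFFINE_BYTES)
        mask = np.unpackbits(bits)[: height * width].reshape(height, width).astype(bool)
        return cls(object_id, affine, mask)
```

Such a layer would travel alongside the conventionally encoded streams so that the decoder can de-normalize and composite each object.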

Claims

A computer implemented method of generating an encoded form of video signal data from a plurality of video frames, the method comprising:

Detecting one or more objects in two or more video frames;

Tracking the one or more objects through the two or more frames of the video frames;

Identifying corresponding elements of one or more objects in the two or more video frames;

Analyzing the corresponding elements to create relationships between the corresponding elements;

Generating a correspondence model using the relationships between the corresponding elements;

Resampling pixel data associated with the one or more objects in the two or more video frames using the correspondence model, thereby generating resampled pixel data, wherein the resampled pixel data represents a first intermediate form of the data; and

Restoring a spatial position of the resampled pixel data using the correspondence model, thereby generating a reconstructed pixel,

No detection dictates indirect detection for the entire frame,

The detecting and tracking step includes using a Viola/Jones face detection algorithm,

Computer-implemented method for generating an encoded form of video signal data from a plurality of video frames.

Detecting one or more objects in two or more video frames;

Dividing pixel data associated with the one or more objects from other pixel data in the two or more video frames to produce a second intermediate form of data, wherein the partitioning uses spatial partitioning of the pixel data;

Integrating the relationships between the corresponding elements into a model of global motion;

Resampling pixel data associated with the one or more objects in the two or more video frames using the correspondence model, thereby generating resampled pixel data, wherein the resampled pixel data represents a first intermediate form of the data;

Restoring a spatial position of the resampled pixel data using the correspondence model, thereby generating a reconstructed pixel; and

Recombining the reconstructed pixels with an associated portion of the second intermediate form of the data to produce an original video frame,

No detection dictates indirect detection for the entire frame,

The detecting and tracking step includes using a face detection algorithm,

Generating the correspondence model includes using a robust estimator for a solution of a multidimensional projecting motion model,

Analyzing the corresponding elements comprises using appearance based motion estimation between the two or more video frames;

The method of claim 1,

Dividing the pixel data associated with the one or more objects from other pixel data in the two or more video frames to produce a second intermediate form of the data, wherein the partitioning uses temporal integration; and

Reassembling the reconstructed pixel with the associated portion of the second intermediate form of data to produce an original video frame,

The method of claim 1, comprising a method of factoring the correspondence model into a global model, the method comprising:

Incorporating relationships between the corresponding elements into a model of global motion,

Generating the correspondence model includes using robust sampling consensus for a solution of a two-dimensional affine motion model,

Analyzing the corresponding element comprises using a sampling population based on finite differences generated from block-based motion estimation between two or more of the video frames;

The method of claim 1, comprising encoding a first intermediate form of the data, the encoding comprising:

Decomposing the resampled pixel data into an encoded representation, the encoded representation representing a third intermediate form of the data;

Truncating zero or more bytes of the encoded representation; and

Reconstructing the resampled pixel data from the encoded representation,

The decomposing and reconstructing steps use Principal Component Analysis,

The method of claim 3, comprising a method of factoring the correspondence models into global models, the method comprising:

Decomposing the resampled pixel data into an encoded representation, the encoded representation representing a fourth intermediate form of the data;

Truncating zero or more bytes of the encoded representation; and

Reconstructing the resampled pixel data from the encoded representation,

The decomposing and reconstructing steps use Principal Component Analysis,

Generating the correspondence models includes using robust sampling consensus for a solution of a two-dimensional affine motion model,

Analyzing the corresponding elements comprises using a sampling population based on finite differences generated from block based motion estimation between two or more of the video frames;

The method of claim 6, wherein each of the two or more video frames comprises object pixels and non-object pixels, the method comprising:

Identifying corresponding elements in non-object pixels in the two or more video frames;

Analyzing corresponding elements in the non-object pixels to create a relationship between corresponding elements in the non-object pixels;

Generating second correspondence models using the relationship between corresponding elements in the non-object pixels;

The analysis of the corresponding elements comprises a time based occlusion filter,

The method of claim 7, wherein

Factoring the corresponding models into global models;

Decomposing the resampled pixel data into an encoded representation, the encoded representation representing a fifth intermediate form of the data;

Truncating zero or more bytes of the encoded representation; and

Reconstructing the resampled pixel data from the encoded representation,

Each of the decomposing and reconstructing steps utilizes a conventional video compression/decompression process,

A computer-implemented method of separating data vectors present in discrete linear subspaces,

(a) performing subspace segmentation on the data vectors; and

(b) forcing subspace segmentation criteria through the application of tangent vector analysis to the implicit vector space,

Performing the subspace partitioning comprises using GPCA; the implicit vector space includes a Veronese map; the tangent vector analysis includes a Legendre transformation,

The method of claim 9,

Holding a subset of the set of data vectors;

Performing (a) and (b) on a subset of the set of data vectors,

The method of claim 5,

(a) performing subspace partitioning on the first intermediate form of the data;

(b) forcing subspace partitioning criteria through the application of tangent vector analysis to the implicit vector space;

Retaining a subset of the first intermediate form of the data;

Performing (a) and (b) on a subset of said first intermediate form of said data,

Performing the subspace partitioning comprises using GPCA;

The implicit vector space includes a Veronese Map;

The tangent vector analysis includes a Legendre transformation,

The method of claim 7, comprising a method for factoring the correspondence models into global models, the method comprising:

(a) integrating the relationships between the corresponding elements into a model of global motion;

(b) performing subspace segmentation on the data vectors, wherein performing the subspace segmentation comprises using GPCA;

(c) forcing subspace partitioning criteria through the application of tangent vector analysis to the implicit vector space;

(d) holding a subset of the set of data vectors;

(e) performing (b) and (c) on the subset of the set of data vectors, wherein the implicit vector space comprises a Veronese map and the tangent vector analysis comprises a Legendre transformation;

After (a) to (e) has been carried out, the method is:

(f) decomposing the resampled pixel data into an encoded representation, the encoded representation representing a fourth intermediate form of the data;

(g) truncating zero or more bytes of the encoded representation; and

(h) reconstructing the resampled pixel data from the encoded representation,

Analyzing the corresponding elements comprises using a sampling population based on a finite difference generated from block based motion estimation between the two or more video frames;

The method of claim 1, comprising factoring the correspondence models into local deformation models, the method comprising:

Defining a two-dimensional mesh on top of pixels corresponding to one or more objects, the mesh based on a regular grid of vertices and edges; and

Generating a model of local motion from the relationships between the corresponding elements, wherein the relationships include vertex displacements based on finite differences generated from block-based motion estimation between two or more of the video frames,

The method of claim 13, wherein the vertex corresponds to discrete image features, and the method includes identifying salient image features corresponding to the object using analysis of an image gradient Harris response.

The method of claim 4, wherein

Forwarding a first intermediate form of the data to be factored into a local deformation model;

The method of claim 6,

Sending a fourth intermediate form of data to be factored into a local deformation model;

The local motion model is based on residual motion not approximated by the global motion model,

The method of claim 12,

The method of claim 2, comprising encoding the first intermediate form of the data, the encoding comprising:

Truncating zero or more bytes of the encoded representation; and

Reconstructing the resampled pixel data from the encoded representation,

Each of the decomposition and reconstitution steps uses principal component analysis,

The method of claim 2, comprising a method of factoring the correspondence model into a global model, the method comprising:

Truncating zero or more bytes of the encoded representation;

Reconstructing the resampled pixel data from the encoded representation,

Each of said decomposition and reconstitution steps uses principal component analysis,

Generating the correspondence model includes using a robust estimator for the solution of the multidimensional projecting motion model,

Analyzing the corresponding elements comprises using a sampling population based on a finite difference generated from block-based motion estimation between the two or more video frames.

The method of claim 19, wherein each of the two or more video frames comprises an object pixel and a non-object pixel, the method comprising:

The method of claim 20,

Factoring the corresponding models into global models;

Truncating zero or more bytes of the encoded representation; and

Reconstructing the resampled pixel data from the encoded representation,

The method of claim 20, comprising a method of factoring the correspondence models into global models, the method comprising:

(d) holding a subset of the set of data vectors;

After (a) to (e) has been carried out, the method is:

(g) truncating zero or more bytes of the encoded representation; and

(h) reconstructing the resampled pixel data from the encoded representation,

The method of claim 2, comprising a method for factoring the correspondence models into local deformation models, the method comprising:

The method of claim 22, wherein the vertex corresponds to discrete image features, and the method includes identifying salient image features corresponding to the object using analysis of image density gradients.

The method of claim 19,

Forwarding a fourth intermediate form of the data to be factored into a local deformation model;

The method of claim 23, wherein