KR20070086350A

KR20070086350A - Apparatus and method for processing video data

Info

Publication number: KR20070086350A
Application number: KR1020077013724A
Authority: KR
Inventors: John Weiss (존 웨이즈); Charles Paul Pace (찰스 폴 페이스)
Original assignee: Euclid Discoveries, LLC (유클리드 디스커버리스, 엘엘씨)
Priority date: 2004-11-17
Filing date: 2005-11-16
Publication date: 2007-08-27
Also published as: AU2005306599C1; CN101103364A; EP1815397A2; CN101103364B; EP1815397A4; WO2006055512A3; JP2008521347A; AU2005306599A1; WO2006055512A2; AU2005306599B2

Abstract

An apparatus and methods for processing video data are described. The invention provides a representation of video data that can be used to assess agreement between the data and a fitting model for a particular parameterization of the data. This allows the comparison of different parameterization techniques and the selection of the optimum one for continued video processing of the particular data. The representation can be utilized in intermediate form as part of a larger process or as a feedback mechanism for processing video data. When utilized in its intermediate form, the invention can be used in processes for storage, enhancement, refinement, feature extraction, compression, coding, and transmission of video data. The invention serves to extract salient information in a robust and efficient manner while addressing the problems typically associated with video data sources.

Description

Apparatus and method for processing video data {APPARATUS AND METHOD FOR PROCESSING VIDEO DATA}

This application claims priority to US Provisional Application No. 60/628,861, "System and Method For Video Compression Employing Principal Component Analysis," filed November 17, 2004, and to US Provisional Application No. 60/628,819, "Apparatus and Methods for Processing and Coding Video Data," filed November 17, 2004. This application is a continuation-in-part of US Application No. 11/230,686, filed September 20, 2005, which is a continuation-in-part of US Application No. 11/191,562, filed July 28, 2005. Each of the foregoing applications is incorporated herein by reference in its entirety.

The present invention relates generally to the field of digital signal processing, and more particularly to computer apparatus and computer-implemented methods for the efficient representation and processing of signal or image data, and most particularly video data.

The general system description of the prior art to which the present invention pertains can be expressed as in FIG. 1. Here, a block diagram shows a typical prior art video processing system. Such systems typically include the following stages: an input stage 102, a processing stage 104, an output stage 106, and one or more data storage mechanisms 108.

The input stage 102 may include elements such as camera sensors, camera sensor arrays, range finding sensors, or a means of retrieving data from a storage mechanism. The input stage provides video data representing time-correlated sequences of man-made and/or naturally occurring phenomena. The salient component of the data may be contaminated or masked by noise or other unwanted signals.

The video data, in the form of a data stream, array, or packet, may be presented to the processing stage 104 directly or through an intermediate storage element 108 in accordance with a predefined transfer protocol. The processing stage 104 may take the form of dedicated analog or digital devices, or programmable devices such as central processing units (CPUs), digital signal processors (DSPs), or field programmable gate arrays (FPGAs), to execute a desired set of video data processing operations. The processing stage 104 typically includes one or more codecs (coders/decoders).

The output stage 106 produces a signal, display, or other response capable of affecting a user or an external apparatus. Typically, an output device is employed to generate an indicator signal, a display, a hardcopy, or a representation of processed data in storage, or to initiate transmission of data to a remote site. It may also be employed to provide an intermediate signal or control parameter for use in subsequent processing operations.

Storage is presented as an optional element in this system. When employed, the storage element 108 may be either non-volatile, such as read-only storage media, or volatile, such as dynamic random access memory (RAM). It is not uncommon for a single video processing system to include several types of storage elements, with the elements having various relationships to the input, processing, and output stages. Examples of such storage elements include input buffers, output buffers, and processing caches.

The primary objective of the video processing system in FIG. 1 is to process input data to produce an output that is meaningful for a specific application. To accomplish this goal, a variety of processing operations may be utilized, including noise reduction or cancellation, feature extraction, object segmentation and/or normalization, data categorization, event detection, editing, data selection, data re-coding, and transcoding.

Many data sources that produce poorly constrained data are of importance to people, especially sound and visual images. In most cases, the essential characteristics of these source signals conflict with the goal of efficient data processing. The intrinsic variability of the source data is an obstacle to processing the data in a reliable and efficient manner without introducing errors that arise from the inherently empirical and heuristic methods used to derive engineering estimates. This variability is lessened for applications in which the input data are naturally or deliberately constrained to narrowly defined characteristic sets (such as a limited set of symbol values or a narrow bandwidth). All too often, such constraints result in processing techniques of low commercial value.

The design of a signal processing system is influenced by the intended use of the system and the expected characteristics of the source signal used as an input. In most cases, the required performance efficiency is also a significant design factor. Performance efficiency, in turn, is affected by the amount of data to be processed compared with the data storage available, as well as by the computational complexity of the application compared with the computing power available.

Conventional video processing methods suffer from a number of inefficiencies that manifest themselves as slow data communication speeds, large storage requirements, and disturbing perceptual artifacts. These can be serious problems because of the variety of ways people wish to use and manipulate video data and because of the innate sensitivity people have to some forms of visual information.

An "optimal" video processing system is efficient, reliable, and robust in performing a desired set of processing operations. Such operations may include the storage, transmission, display, compression, editing, encryption, enhancement, categorization, feature detection, and recognition of the data. Secondary operations may include integration of such processed data with other information sources. Equally important, in the case of a video processing system, the outputs should be compatible with human vision by avoiding the introduction of perceptual artifacts.

A video processing system may be described as "robust" if its speed, efficiency, and quality do not depend strongly on the specifics of any particular characteristics of the input data. Robustness is also related to the ability to perform operations when some of the input is erroneous. Many video processing systems fail to be robust enough to serve general classes of applications; they are applicable only to the same narrowly constrained data that was used in the development of the system.

Salient information can be lost in the discretization of a continuous-valued data source when the sampling rate of the input element does not match the signal characteristics of the sensed phenomenon. Loss also occurs when the signal's strength exceeds the sensor's limits, resulting in saturation. Similarly, information is lost whenever the precision of the input data is reduced in a quantization process, in which the full range of values in the input data is represented by a set of discrete values, thereby reducing the precision of the data representation.

Ensemble variability refers to any unpredictability in a class of data or information sources. Data representative of visual information has a very large degree of ensemble variability because visual information is typically unconstrained. Visual data may represent any spatial array sequence or spatio-temporal sequence that can be formed by light incident on a sensor array.

In modeling visual phenomena, video processors generally impose some set of constraints and/or structure on the manner in which the data is represented or interpreted. As a result, such methods can introduce systematic errors that affect the quality of the output, the confidence with which the output may be regarded, and the type of subsequent processing tasks that can reliably be performed on the data.

Quantization methods reduce the precision of the data in the video frames while attempting to retain the statistical variation of that data. Typically, the video data is analyzed such that the distributions of data values are collected into probability distributions. There are also methods that project the data into phase space in order to characterize the data as a mixture of spatial frequencies, thereby allowing the precision reduction to be diffused in a less objectionable manner. When utilized heavily, these quantization methods often result in perceptually implausible colors and can induce abrupt pixelation in originally smooth areas of the video frame.
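As an illustration of how heavy quantization produces the banding described above, the following is a minimal sketch of uniform scalar quantization. The function name `quantize` and its parameters are hypothetical and are not taken from the source; practical codecs use more elaborate, often non-uniform, quantizers.

```python
def quantize(values, levels, lo=0, hi=255):
    """Map each value in [lo, hi] to the nearest of `levels` evenly spaced values.

    Hypothetical helper for illustration only; values outside the range are clipped,
    which also mimics sensor saturation.
    """
    if levels < 2:
        raise ValueError("need at least two levels")
    step = (hi - lo) / (levels - 1)
    out = []
    for v in values:
        clipped = min(max(v, lo), hi)
        out.append(lo + round((clipped - lo) / step) * step)
    return out


# A smooth ramp of 16 samples collapses into a few flat bands
# once only four representable values remain.
ramp = list(range(0, 256, 17))
coarse = quantize(ramp, levels=4)
```

With only four levels, neighbouring ramp samples that were distinct become identical, which is the abrupt stepping that appears in originally smooth image regions.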

Differential coding is also typically used to capitalize on the local spatial similarity of data. Data in one part of a frame tends to be clustered around similar data in that frame, and also in a similar position in subsequent frames. Representing the data in terms of its spatially adjacent data can then be combined with quantization, and the net result is that, for a given precision, representing the differences is more accurate than using the absolute values of the data. This assumption works well when the spectral resolution of the original video data is limited, as in black and white video or low-color video. As the spectral resolution of the video increases, the assumption of similarity breaks down significantly. The breakdown is due to the inability to selectively preserve the precision of the video data.
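The idea of representing data through differences from adjacent samples can be sketched as follows. This is a simple lossless delta coder over one row of pixel values, assuming a left-neighbour predictor; the function names are hypothetical and the sketch does not represent any particular codec.

```python
def delta_encode(samples):
    """Replace each sample by its difference from the previous sample (first one from 0)."""
    prev = 0
    out = []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    total = 0
    out = []
    for d in deltas:
        total += d
        out.append(total)
    return out


row = [100, 101, 103, 103, 102, 104]
deltas = delta_encode(row)
```

Apart from the first entry, the differences are much smaller than the absolute values, so a fixed number of bits covers them with finer resolution. When neighbouring samples stop being similar, the differences grow and this advantage disappears.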

Residual coding is similar to differential encoding in that the error of the representation is further differentially encoded in order to restore the precision of the original data to a desired level of accuracy.

Variations of these methods attempt to transform the video data into alternate representations that expose data correlations in spatial phase and scale. Once the video data has been transformed in this way, quantization and differential coding methods can be applied to the transformed data, resulting in increased preservation of the salient image features. Two of the most prevalent of these transform video compression techniques are the discrete cosine transform (DCT) and the discrete wavelet transform (DWT). Error in the DCT manifests itself as a wide variation in video data values, and therefore the DCT is typically applied to blocks of video data in order to localize these false correlations. The artifacts from this localization often appear along the borders of the blocks. For the DWT, more complex artifacts occur when there is a mismatch between the basis function and certain textures, and this causes a blurring effect. To counteract the negative effects of the DCT and DWT, the precision of the representation is increased to lower distortion, at the cost of precious bandwidth.
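For reference, the block transform mentioned above can be sketched with a one-dimensional orthonormal DCT-II and its inverse, written directly from the textbook definition. The function names are hypothetical; real codecs apply a separable two-dimensional transform to each block (commonly 8x8) and use fast algorithms rather than this direct summation.

```python
import math


def dct_1d(x):
    """Orthonormal DCT-II of a sequence, computed by direct summation."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out


# A flat block puts all of its energy into the first (DC) coefficient.
flat_coeffs = dct_1d([10.0] * 8)
```

Because each block is transformed and quantized independently, reconstruction errors do not line up across block borders, which is one way to understand the boundary artifacts noted in the text.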

The present invention is a computer-implemented video processing method that provides both computational and analytical advantages over existing conventional methods. The principle of the inventive method is the integration of a linear decompositional method, a spatial segmentation method, and a spatial normalization method. Spatially constraining the video data greatly increases the robustness and applicability of linear decompositional methods. Additionally, spatial segmentation of the data corresponding to the spatial normalization can further increase the benefits derived from spatial normalization alone.

In particular, the present invention provides a means by which signal data can be efficiently processed into one or more beneficial representations. The present invention is efficient at processing many commonly occurring data sets and is particularly efficient at processing video and image data. The inventive method analyzes the data and provides one or more concise representations of that data to facilitate its processing and encoding.

Each new, more concise data representation allows a reduction in computational processing, transmission bandwidth, and storage requirements for many applications, including, but not limited to, the encoding, compression, transmission, analysis, storage, and display of video data. The invention includes methods for identifying and extracting salient components of the video data, allowing prioritization in the processing and representation of the data. Noise and other unwanted parts of the signal are identified as lower priority so that further processing can focus on analyzing and representing the higher priority parts of the video signal. As a result, the video signal is represented more accurately than was previously possible, and the loss in precision is concentrated in the parts of the video signal that are perceptually less important.

FIG. 1 is a block diagram illustrating a prior art video processing system.

FIG. 2 is a block diagram providing an overview of the invention that shows the major modules for processing video.

FIG. 3 is a block diagram illustrating the motion estimation method of the invention.

FIG. 4 is a block diagram illustrating the global registration method of the invention.

FIG. 5 is a block diagram illustrating the normalization method of the invention.

FIG. 6 is a block diagram illustrating the hybrid spatial normalization compression method.

FIG. 7 is a block diagram illustrating the mesh generation method of the invention employed in local normalization.

FIG. 8 is a block diagram illustrating the mesh-based normalization method of the invention employed in local normalization.

FIG. 9 is a block diagram illustrating the combined global and local normalization method of the invention.

FIG. 10 is a block diagram illustrating the GPCA basic polynomial fitting and differentiation method of the invention.

FIG. 11 is a block diagram illustrating the recursive GPCA refinement method of the invention.

In video signal data, frames of video are assembled into a sequence of images that usually depict a three-dimensional scene as projected and imaged onto a two-dimensional imaging surface. Each frame, or image, is composed of picture elements (pels) that represent an imaging sensor response to the sampled signal. Often, the sampled signal corresponds to some reflected, refracted, or emitted energy (e.g., electromagnetic or acoustic) sampled by a two-dimensional sensor array. Successive sequential sampling results in a spatiotemporal data stream with two spatial dimensions per frame and a temporal dimension corresponding to the frame's order in the video sequence.

The present invention, as illustrated in FIG. 2, analyzes signal data and identifies the salient components. When the signal consists of video data, analysis of the spatiotemporal stream reveals salient components that are often specific objects, such as faces. The identification process qualifies the existence and significance of the salient components and selects one or more of the most significant of those qualified salient components. This does not limit the identification and processing of other, less salient components after, or concurrently with, the presently described processing. The aforementioned salient components are then further analyzed to identify their variant and invariant subcomponents. The identification of invariant subcomponents is the process of modeling some aspect of the component, thereby revealing a parameterization of the model that allows the component to be synthesized to a desired level of accuracy.

In one embodiment of the invention, a foreground object is detected and tracked. The object's pels are identified and segmented from each frame of the video. Block-based motion estimation is applied to the segmented object across multiple frames. These motion estimates are then integrated into a higher order motion model. The motion model is employed to warp instances of the object to a common spatial configuration. In this configuration, for certain data, more of the features of the object are aligned. This normalization allows the linear decomposition of the values of the object's pels over multiple frames to be compactly represented. The salient information pertaining to the appearance of the object is contained in this compact representation.
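The compact representation obtained from a linear decomposition of normalized object pels can be illustrated with a small sketch. It assumes principal component analysis as the linear decomposition (the technique named in the title of one of the priority applications) and extracts only the dominant component by power iteration. The function names and toy data are hypothetical, and the source does not specify this procedure.

```python
import math


def principal_component(vectors, iterations=100):
    """Return (mean, unit direction) of the dominant principal component.

    Uses power iteration on the unnormalized covariance. Assumption: the uniform
    starting guess is not orthogonal to the dominant component; a production
    implementation would use an SVD routine instead.
    """
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[j] for v in vectors) / n for j in range(dim)]
    centered = [[v[j] - mean[j] for j in range(dim)] for v in vectors]
    direction = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        # Covariance times direction: sum over samples of (x . d) x
        proj = [sum(c[j] * direction[j] for j in range(dim)) for c in centered]
        nxt = [sum(proj[i] * centered[i][j] for i in range(n)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        direction = [x / norm for x in nxt]
    return mean, direction


def project(vector, mean, direction):
    """Coefficient of one frame's pel vector along the component."""
    return sum((vector[j] - mean[j]) * direction[j] for j in range(len(mean)))


# Three tiny "normalized frames" (3 pels each) that vary along a single direction.
frames = [[1.0, 2.0, 5.0], [2.0, 4.0, 5.0], [3.0, 6.0, 5.0]]
mean, direction = principal_component(frames)
coefficients = [project(f, mean, direction) for f in frames]
```

In this toy case each frame is described by one coefficient plus the shared mean and direction. The better the object instances are aligned by normalization, the fewer components are needed to reach a given accuracy.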

A preferred embodiment of the present invention details the linear decomposition of a foreground video object. The object is normalized spatially, thereby yielding a compact linear appearance model. A further preferred embodiment additionally segments the foreground object from the background of the video frame prior to spatial normalization.

A preferred embodiment of the invention applies the present invention to a video of a person speaking into a camera while undergoing a small amount of motion.

A preferred embodiment of the invention applies the present invention to any object in a video that can be represented well through spatial transformations.

A preferred embodiment of the invention specifically employs block-based motion estimation to determine finite differences between two or more frames of video. A higher order motion model is factored from the finite differences in order to provide a more effective linear decomposition.

Detection & Tracking

Once the constituent salient components of the signal have been determined, these components will be retained, and all other signal components will be diminished or removed. The process of detecting the salient component is shown in FIG. 2, where the video frame 202 is processed by one or more object detection 206 processes, resulting in one or more objects being identified and subsequently tracked. The retained components represent the intermediate form of the video data. This intermediate data can then be encoded using techniques that are typically not available to existing video processing methods. Because the intermediate data exists in several forms, standard video encoding techniques can also be used to encode several of these intermediate forms. In one instance, the present invention determines and then employs the most efficient encoding technique.

In one preferred embodiment, a saliency analysis process detects and classifies salient signal modes. One embodiment of this process employs a combination of spatial filters specifically designed to generate a response signal whose strength is relative to the detected saliency of an object in the video frame. The classifier is applied at differing spatial scales and in different positions of the video frame. The strength of the response from the classifier indicates the likelihood of the presence of a salient signal mode. When centered over a strongly salient object, the process classifies it with a correspondingly strong response. The detection of the salient signal mode distinguishes the present invention by enabling subsequent processing and analysis of the salient information in the video sequence.

Given the detected location of a salient signal mode in one or more frames of video, the present invention analyzes the salient signal mode's invariant features. Additionally, the invention analyzes the residual of the signal, the "less-salient" signal modes, for invariant features. Identification of invariant features provides a basis for reducing redundant information and for segmenting (i.e., separating) signal modes.

Feature Point Tracking

In one embodiment of the present invention, spatial positions in one or more frames are determined through spatial intensity field gradient analysis. These features correspond to some intersection of "lines," which can be loosely described as a "corner." Such an embodiment further selects a set of such corners that are both strong corners and spatially disparate from each other, herein referred to as the feature points. Further, employing a hierarchical multi-resolution estimation of the optical flow allows the translational displacement of the feature points to be determined over time.

In FIG. 2, the object tracking 220 process is shown to pull together the detection instances from the object tracking processes 208 and, further, to identify correspondences 222 of features of one or more of the detected objects over a multitude of video frames 202 & 204.

A non-limiting embodiment of feature tracking can be employed such that the features are used to qualify a more regular gradient analysis method, such as block-based motion estimation.

Another embodiment anticipates the prediction of motion estimates based on feature tracking.

Object-Based Detection and Tracking

In one non-limiting embodiment of the present invention, a robust object classifier is employed to track faces in frames of video. Such a classifier is based on a cascaded response to oriented edges that has been trained on faces. In this classifier, the edges are defined as a set of basic Haar features and the rotation of those features by 45 degrees. The cascaded classifier is a variant of the AdaBoost algorithm. Additionally, response calculations can be optimized through the use of summed area tables.
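The summed area table optimization mentioned above can be sketched as follows. After one pass over the image, the sum over any axis-aligned rectangle takes four lookups, so an upright Haar-like feature costs a constant number of operations regardless of its size. The function names are hypothetical, and the sketch covers only upright rectangles, not the 45-degree rotated features.

```python
def summed_area_table(image):
    """Build a table where sat[y][x] is the sum of image rows 0..y-1, columns 0..x-1.

    The table has one extra leading row and column of zeros to avoid edge cases.
    """
    h, w = len(image), len(image[0])
    sat = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += image[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_sum(sat, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 using four lookups."""
    return sat[bottom][right] - sat[top][right] - sat[bottom][left] + sat[top][left]


def haar_two_rect_horizontal(sat, top, left, height, width):
    """Left half minus right half of a window (width assumed even): a basic edge feature."""
    half = width // 2
    left_sum = box_sum(sat, top, left, top + height, left + half)
    right_sum = box_sum(sat, top, left + half, top + height, left + width)
    return left_sum - right_sum


# A bright-to-dark vertical edge gives a strong positive response.
edge = [[9, 9, 1, 1], [9, 9, 1, 1]]
edge_response = haar_two_rect_horizontal(summed_area_table(edge), 0, 0, 2, 4)
```

A cascaded classifier evaluates many such features per window at many positions and scales, which is why constant-time rectangle sums matter.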

Local Registration

Registration involves the assignment of correspondences between elements of objects identified in two or more video frames. These correspondences become the basis for modeling the spatial relationships between video data at temporally distinct points in the video data.

Various non-limiting means of registration are described for the present invention in order to illustrate specific embodiments and their associated reductions to practice in terms of well-known algorithms and inventive derivatives of those algorithms.

One means of modeling the apparent optical flow in a spatio-temporal sequence is the generation of a finite difference field from two or more frames of the video data. The optical flow field can be weakly estimated if the correspondences conform to certain constancy constraints in both a spatial and an intensity sense.

As shown in FIG. 3, a frame (302 or 304) is spatially sub-sampled, possibly through a decimation process (306) or some other sub-sampling process (for example, a low-pass filter). The spatially reduced images (310 & 312) can be further sub-sampled as well.
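
As an illustration of repeated spatial sub-sampling, the sketch below builds a small image pyramid. It assumes a 2x2 box average as the low-pass filter and a factor-of-two decimation. Both are choices made for the example, since the text does not fix the filter or the factor, and the function names are hypothetical.

```python
def decimate_2x(img):
    """Halve each dimension by averaging 2x2 blocks (box low-pass, then subsample)."""
    h = len(img) // 2 * 2
    w = len(img[0]) // 2 * 2
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, w, 2)]
        for y in range(0, h, 2)
    ]


def build_pyramid(img, levels):
    """Return [full resolution, half, quarter, ...] with at most `levels` entries."""
    pyramid = [img]
    for _ in range(levels - 1):
        top = pyramid[-1]
        if len(top) < 2 or len(top[0]) < 2:
            break  # cannot reduce any further
        pyramid.append(decimate_2x(top))
    return pyramid


frame = [[float(4 * y + x) for x in range(4)] for y in range(4)]
pyramid = build_pyramid(frame, 3)
```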

Diamond Search

Given a non-overlapping partitioning of a video frame into blocks, the previous frame of video is searched for a match to each block. Full search block-based (FSBB) motion estimation finds the position in the previous frame of video that has the lowest error when compared with a block in the current frame. Performing FSBB can be computationally quite expensive, and it often does not yield a better match than other motion estimation schemes based on the estimation of localized motion. Diamond search block-based (DSBB) gradient descent motion estimation is a common alternative to FSBB that uses diamond-shaped search patterns of various sizes to iteratively traverse an error gradient toward the best match for a block.
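
A diamond search of this kind can be sketched as below. The sketch assumes a sum-of-absolute-differences error, the commonly used nine-point large and five-point small diamond patterns, and hypothetical function names. None of these specifics is stated in the text.

```python
LARGE_DIAMOND = [(0, 0), (0, -2), (1, -1), (2, 0), (1, 1), (0, 2), (-1, 1), (-2, 0), (-1, -1)]
SMALL_DIAMOND = [(0, 0), (0, -1), (1, 0), (0, 1), (-1, 0)]


def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a current block and a displaced reference block."""
    h, w = len(ref), len(ref[0])
    if not (0 <= bx + dx and bx + dx + size <= w and 0 <= by + dy and by + dy + size <= h):
        return float("inf")  # candidate falls outside the reference frame
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def diamond_search(cur, ref, bx, by, size=4, max_steps=16):
    """Return the displacement (dx, dy) into `ref` that best matches the block at (bx, by)."""
    def cost(d):
        return sad(cur, ref, bx, by, d[0], d[1], size)

    best = (0, 0)
    for _ in range(max_steps):
        candidates = [(best[0] + ox, best[1] + oy) for ox, oy in LARGE_DIAMOND]
        winner = min(candidates, key=cost)  # ties keep the centre, which is listed first
        if winner == best:
            break  # the centre is the minimum: switch to the small pattern
        best = winner
    candidates = [(best[0] + ox, best[1] + oy) for ox, oy in SMALL_DIAMOND]
    return min(candidates, key=cost)


# Synthetic frames: an intensity ramp, with the current frame displaced by (3, 1).
ref_frame = [[10 * x + 100 * y for x in range(16)] for y in range(16)]
cur_frame = [[10 * (x + 3) + 100 * (y + 1) for x in range(16)] for y in range(16)]
vector = diamond_search(cur_frame, ref_frame, 4, 4)
```

On this fixture the search evaluates a few dozen candidate positions, whereas a full search over the same range would evaluate every displacement.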

In one embodiment of the present invention, DSBB is employed in the analysis of the image gradient field between one or more frames of video in order to generate finite differences whose values are later factored into higher-order motion models.

One skilled in the art will recognize that block-based motion estimation can be seen as equivalent to an analysis of the vertices of a regular mesh.

Phase-Based Motion Estimation

In the prior art, block-based motion estimation has typically been implemented as a spatial search resulting in one or more spatial matches. Phase-based normalized cross correlation (PNCC), as illustrated in FIG. 3, transforms a block from the previous frame and a block from the current frame into "phase space" and finds the cross correlation of the two blocks. The cross correlation is represented as a field of values whose positions correspond to the 'phase shifts' of edges between the two blocks. These positions are isolated through thresholding and then transformed back into spatial coordinates. The spatial coordinates are distinct edge displacements and correspond to motion vectors.
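
The general idea of correlating two blocks in phase space can be sketched with a plain phase correlation, shown below. This is an assumption-laden illustration rather than the PNCC of the embodiment: it uses a naive discrete Fourier transform on square blocks, picks the single strongest peak instead of thresholding, returns only integer displacements, and all names are hypothetical.

```python
import cmath


def dft2(block, inverse=False):
    """Naive separable 2-D DFT of a square block (adequate for small blocks)."""
    n = len(block)
    sign = 1 if inverse else -1
    w = [cmath.exp(sign * 2j * cmath.pi * k / n) for k in range(n)]
    rows = [[sum(block[y][x] * w[(u * x) % n] for x in range(n)) for u in range(n)]
            for y in range(n)]
    out = [[sum(rows[y][u] * w[(v * y) % n] for y in range(n)) for u in range(n)]
           for v in range(n)]
    if inverse:
        out = [[value / (n * n) for value in row] for row in out]
    return out


def phase_correlation(prev_block, cur_block):
    """Return the integer (dx, dy) by which content moved from prev_block to cur_block."""
    n = len(prev_block)
    fp, fc = dft2(prev_block), dft2(cur_block)
    cross = [[0j] * n for _ in range(n)]
    for v in range(n):
        for u in range(n):
            product = fc[v][u] * fp[v][u].conjugate()
            magnitude = abs(product)
            # Keep only the phase, which makes the result insensitive to gain changes.
            cross[v][u] = product / magnitude if magnitude > 1e-12 else 0j
    surface = dft2(cross, inverse=True)
    peak_y, peak_x = max(((y, x) for y in range(n) for x in range(n)),
                         key=lambda p: surface[p[0]][p[1]].real)
    # Peaks past the midpoint represent negative (wrapped) displacements.
    dx = peak_x - n if peak_x > n // 2 else peak_x
    dy = peak_y - n if peak_y > n // 2 else peak_y
    return dx, dy


N = 8
prev_block = [[(7 * x * x + 13 * y + 5 * x * y * y + 3) % 17 for x in range(N)]
              for y in range(N)]
# Circularly shift the content by dx = 2, dy = -1.
cur_block = [[prev_block[(y + 1) % N][(x - 2) % N] for x in range(N)] for y in range(N)]
shift = phase_correlation(prev_block, cur_block)
```

Because the correlation surface is obtained in one transform pair, the displacement is found without an iterative spatial search.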

Advantages of PNCC include contrast masking, which allows tolerance of gain/exposure adjustment in the video stream. PNCC also yields, in a single step, results that might take many iterations with a spatially based motion estimator. Further, the motion estimates are sub-pixel accurate.

One embodiment of the present invention utilizes PNCC in the analysis of the image gradient field between one or more frames of video in order to generate finite differences whose values are later factored into higher-order motion models.

Global Registration

In one embodiment, the present invention factors one or more linear models from a field of finite difference estimations. The field from which this sampling occurs is referred to herein as the general population of finite differences. The described method employs robust estimation similar to that of the RANSAC algorithm.

As shown in FIG. 4, in the case of global motion modeling, the finite differences are translational motion estimates (402) that are collected into a general population pool (404). The pool is processed iteratively by a random sampling of those motion estimates (410), and a linear model is factored out of the samples (420). The results are then used to adjust the population (404) so as to better clarify the linear model through the exclusion of outliers to the model, as found through the random process.

In one embodiment of the linear model estimation algorithm, the motion model estimator is based on a linear least squares solution. This dependency causes the estimator to be thrown off by outlier data. Based on RANSAC, the disclosed method is a robust way of countering the effect of outliers through the iterative estimation of subsets of the data, probing for a motion model that describes a significant subset of the data. The model generated by each probe is tested for the percentage of the data that it represents. Given a sufficient number of iterations, a model will be found that fits the largest subset of the data.

As conceived and illustrated in FIG. 4, the present invention discloses innovations beyond the RANSAC algorithm, an iterative form of the algorithm that involves the initial sampling of finite differences (samples) and the least squares estimation of a linear model. The synthesis error is assessed for all samples in the general population using the solved linear model. A rank is assigned to the linear model based on the number of samples whose residual meets a preset threshold. This rank is considered the "candidate consensus".

The initial sampling, solving, and ranking are performed iteratively until the termination criteria are satisfied. Once the criteria are satisfied, the linear model with the highest rank is considered to be the final consensus of the population.
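
The sample, solve, and rank loop can be sketched as follows. The sketch assumes the simplest linear model, a pure translation, whose least squares solution is the mean of the sampled vectors. It also assumes a fixed iteration count as the termination criterion. The function names, threshold, and sample size are hypothetical, and the population re-sorting and refinement steps described in this section are not modeled.

```python
import random


def fit_translation(samples):
    """Least squares translation for a set of motion vectors: their mean."""
    n = len(samples)
    return (sum(v[0] for v in samples) / n, sum(v[1] for v in samples) / n)


def consensus_rank(model, population, threshold):
    """Number of vectors whose residual against the model is within the threshold."""
    return sum(
        1 for vx, vy in population
        if ((vx - model[0]) ** 2 + (vy - model[1]) ** 2) ** 0.5 <= threshold
    )


def estimate_global_motion(population, sample_size=3, threshold=0.5, iterations=50, seed=0):
    """Repeat sample -> solve -> rank and keep the model with the highest rank."""
    rng = random.Random(seed)
    best_model, best_rank = None, -1
    for _ in range(iterations):
        model = fit_translation(rng.sample(population, sample_size))
        rank = consensus_rank(model, population, threshold)
        if rank > best_rank:
            best_model, best_rank = model, rank
    return best_model, best_rank


# Twelve vectors near (2, -1) plus four outliers.
population = [
    (2.0, -1.0), (2.1, -1.0), (1.9, -0.9), (2.0, -1.1),
    (2.05, -0.95), (1.95, -1.05), (2.1, -0.9), (1.9, -1.1),
    (2.0, -0.9), (2.0, -1.0), (2.1, -1.1), (1.9, -1.0),
    (10.0, 10.0), (-7.0, 3.0), (5.0, -9.0), (-4.0, -6.0),
]
model, rank = estimate_global_motion(population)
```

A plain mean over the whole population would be pulled far from (2, -1) by the four outliers, whereas the consensus model is fitted only to a sample that agrees with most of the data.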

An optional refinement step iteratively analyzes subsets of samples in order of best fit to the candidate model, increasing the subset size until adding one more sample would exceed a residual error threshold for the whole subset.

As shown in FIG. 4, the global model estimation process (450) is iterated until the consensus rank acceptability test is satisfied (452). When the rank has not been achieved, the population of finite differences (404) is sorted relative to the discovered model in an effort to reveal the linear model. The best (highest rank) motion model is added to a solution set in process (460). The model is then re-evaluated in process (470). Upon completion, the population (404) is re-sorted.

The described non-limiting embodiments of the present invention can be further generalized as a general method of sampling a vector space, described above as a field of finite difference vectors, in order to determine subspace manifolds in another parameter vector space that may correspond to a particular linear model.

A further result of the global registration process is that the difference between it and the local registration process yields a local registration residual. This residual is the error of the global model in approximating the local model.

Normalization

Normalization refers to the resampling of spatial intensity fields toward a standard, or common, spatial configuration. When the relative spatial configurations are related by invertible spatial transformations, the resampling and the accompanying interpolation of pixels are also invertible up to a topological limit. The normalization method of the present invention is illustrated in FIG. 5.

When two or more spatial intensity fields are normalized, increased computational efficiency can be achieved by preserving intermediate normalization calculations.

The spatial transformation models used to resample images for registration, or equivalently for normalization, include global and local models. Global models are of increasing order, from translational to projective. Local models are finite differences that imply an interpolant on a neighborhood of pixels, as determined basically by a block or, in a more complex manner, by a piece-wise linear mesh.

Interpolation of the original intensity fields to the normalized intensity field increases the linearity of PCA appearance models based on subsets of the intensity field.

As shown in FIG. 2, the object pixels (232 & 234) can be resampled (240) to yield a normalized version of the object pixels (242 & 244).

Mesh-Based Normalization

A further embodiment of the present invention tessellates the feature points into a triangle-based mesh. The vertices of the mesh are tracked, and the relative positions of each triangle's vertices are used to estimate the three-dimensional surface normal for the plane coincident with those three vertices. When the surface normal coincides with the projective axis of the camera, the imaged pixels provide a least-distorted rendering of the object corresponding to the triangle. Creating a normalized image that tends to favor the orthogonal surface normal can produce a pixel-preserving intermediate data type that will increase the linearity of subsequent appearance-based PCA models.

Another embodiment utilizes conventional block-based motion estimation to implicitly model a global motion model. In one non-limiting embodiment, the method factors a global affine motion model from the motion vectors described by conventional block-based motion estimation/prediction.

FIG. 9 illustrates a method of combining global and local normalization.

Progressive Geometric Normalization

Classifications of spatial discontinuities are used to align the tessellated mesh in order to model discontinuities implicitly, since they coincide with mesh edges.

Homogeneous region boundaries are approximated by a polygon contour. The contour is successively approximated at successively lower accuracy in order to determine the significance priority of each polygon vertex. Vertex priority is propagated across regions in order to preserve the vertex priority of shared vertices.

In one embodiment of the present invention, a polygon decomposition method allows prioritization of the boundaries associated with a homogeneous classification of a field. Pixels are classified according to a common homogeneity criterion, such as spectral similarity, and the classification labels are then spatially connected into regions. In a further preferred non-limiting embodiment, 4- or 8-connectedness criteria are applied to determine spatial connectedness.
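
Connecting classification labels into regions under a 4- or 8-connectedness criterion can be sketched with a breadth-first flood fill. The function name and the input, a per-pixel grid of class labels, are assumptions made for the example.

```python
from collections import deque

NEIGHBORS_4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
NEIGHBORS_8 = NEIGHBORS_4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def label_regions(classes, connectivity=4):
    """Group equally classified pixels into connected regions; return (label map, region count)."""
    offsets = NEIGHBORS_4 if connectivity == 4 else NEIGHBORS_8
    h, w = len(classes), len(classes[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue  # already assigned to a region
            labels[sy][sx] = next_label
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                for dx, dy in offsets:
                    nx, ny = x + dx, y + dy
                    if (0 <= nx < w and 0 <= ny < h and labels[ny][nx] == -1
                            and classes[ny][nx] == classes[sy][sx]):
                        labels[ny][nx] = next_label
                        queue.append((nx, ny))
            next_label += 1
    return labels, next_label


# A diagonal of class 1 separates two groups of class 0.
classes = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
_, count_4 = label_regions(classes, connectivity=4)
_, count_8 = label_regions(classes, connectivity=8)
```

The same classification yields five regions under 4-connectedness and two under 8-connectedness, which shows how the choice of criterion changes the boundaries that are later turned into polygons.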

In a preferred embodiment, the boundaries of these spatial regions are discretized into polygons. The spatial overlay of all the polygons for all the homogeneous regions is then tessellated and joined into a preliminary mesh. The vertices of this mesh are decomposed using several criteria to reveal simpler mesh representations that retain much of the perceptual saliency of the original mesh.

In a preferred embodiment, an image registration method, as disclosed in another part of this specification, is biased toward these high-priority vertices with strong image gradients. The resulting deformation models tend to preserve the spatial discontinuities associated with the geometry of the imaged object.

In a preferred embodiment, active contours are used to refine region boundaries. The active contour for each polygonal region is allowed to propagate for one iteration. The motion, or "deformation", of each active contour vertex in the different regions is combined in an averaging operation to allow a constrained propagation of the implied mesh in which they all have membership.

In a preferred embodiment, each vertex is assigned a count of the number of adjacent vertices it has in the mesh that are also part of the contour of a different region. These other vertices are defined as being in opposition. A vertex with a count of 1 has no opposing vertex and therefore needs to be preserved. If two adjacent opposing vertices each have a count of 1 (that is, the two vertices are in different polygons and are adjacent to each other), then one vertex is resolved to the other. When a vertex with a count of 1 opposes a neighboring polygon vertex that has a value of 2, the vertex with a count of 1 is resolved into the vertex with a count of 2, and that vertex's count becomes 1. So if another neighboring opposing vertex is present, this vertex can be resolved again. In this case it is important to preserve the original vertex count, so that when a vertex is resolved, the direction of resolving can be biased based on the original vertex count. That is, when vertex (a) is resolved to vertex (b), then rather than vertex (b) being resolved to vertex (c), vertex (c) should be resolved to vertex (b), since (b) has already been used in one resolution.

In a preferred embodiment, T-junction points are processed specially. These are points of a polygon that have no corresponding point in the adjacent polygon. In this case, each polygon vertex is first plotted on an image point map, which identifies the spatial position of the vertex and its polygon identifier. Each polygon perimeter is then traversed and tested to see whether there are any adjacent vertices from another polygon. If there are neighboring vertices from another region, each of them is tested to see whether it already has a neighboring vertex from the current polygon. If it does not, the current point is added as a vertex of the current polygon. This extra test ensures that isolated vertices in another polygon are used to generate the T-junction points. Otherwise, the method would simply add new vertices where the region already had a matching vertex. So an opposing vertex is added only if the neighboring vertex is not already opposed by the current region. In a further embodiment, the efficiency of detecting T-junctions is increased by employing a mask image. The polygon vertices are visited sequentially, and the mask is updated so that the pixels of the vertices are identified as belonging to a polygon vertex. The polygon perimeter pixels are then traversed, and if they coincide with a polygon vertex, they are recorded as a vertex within the current polygon.

In a preferred embodiment, when a spectral region has been remapped by one or more overlapping homogeneous image gradient regions, and another homogeneous spectral region also overlaps, all of the regions that were remapped previously are given the same label as the regions that are currently being remapped. In essence, if a spectral region is overlapped by two homogeneous regions, then all of the spectral regions overlapped by those two homogeneous regions receive the same label, and thus that one spectral region is effectively covered by one homogeneous region instead of two.

In one embodiment of the present invention, it is advantageous to process region maps rather than region lists for the purpose of finding adjacency merge criteria. In a further embodiment, the spectral segmentation classifier can be modified so that the classifier is trained using non-homogeneous regions. This allows the processing to focus on the edges of the spectral regions. Additionally, adding a different segmentation based on the use of edges, such as a Canny edge detector, and then feeding it to an active contour to identify the initial set of polygons allows greater discrimination of homogeneous regions.

Local Normalization

The present invention provides a means by which the spatio-temporal stream can be registered in a 'local' manner.

One such localized method employs the spatial application of a geometric mesh to provide a means of analyzing the pixels, so that localized coherency in the imaged phenomena is accounted for when resolving the apparent image brightness constancy ambiguities in relation to the local deformation of the imaged phenomena or of an imaged object.

Such a mesh is employed to provide a piece-wise linear model of surface deformation in the image plane as a means of local normalization. The imaged phenomena often correspond to such a model when the temporal resolution of the video stream is high compared with the motion in the video. Exceptions to the model assumptions are handled through a variety of techniques, including topological constraints, neighbor vertex restrictions, and analysis of the homogeneity of pixel and image gradient regions.

In one embodiment, feature points are used to generate a mesh made up of triangular elements whose vertices correspond to the feature points. The corresponding feature points in other frames imply an interpolated "warping" of the triangles, and correspondingly of the pixels, which is used to generate a local deformation model.

FIG. 7 illustrates the generation of such an object mesh. FIG. 8 illustrates the use of such an object mesh to locally normalize frames.

In one preferred embodiment, a triangle map is generated that identifies the triangle from which each pixel of the map comes. Further, the affine transformation corresponding to each triangle is pre-computed as an optimization step. And further, when creating the local deformation model, the anchor (previous) image is traversed using the spatial coordinates to determine the coordinates of the source pixel to sample. This sampled pixel replaces the current pixel location.

In another embodiment, local deformation is performed after global deformation. In an earlier part of this specification, global normalization was described as the process by which a global registration method is used to spatially normalize pixels in two or more frames of video. The resulting globally normalized video frames can further be locally normalized. The combination of these two methods constrains the local normalization to a refinement of the globally obtained solution. This can greatly reduce the ambiguity that the local method is required to resolve.

In another non-limiting embodiment, feature points, or in the case of a "regular mesh" the vertex points, are qualified through analysis of the image gradient in the neighborhood of those points. This image gradient can be calculated directly or through some indirect calculation such as a Harris response. Additionally, these points can be filtered by a spatial constraint and by the motion estimation error associated with a descent of the image gradient. The qualified points can be used as the basis for a mesh by one of many tessellation techniques, resulting in a mesh whose elements are triangles. For each triangle, an affine model is generated based on the points and their residual motion vectors.

In a preferred embodiment, a list of the triangles' affine parameters is maintained. The list is iterated, and a current/previous point list is constructed (using a vertex look-up map). The current/previous point list is passed to a routine that estimates a transform, which computes the affine parameters for that triangle. The affine parameters, or model, are then saved in the triangle affine parameter list.
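
Computing the affine parameters of each triangle from its three vertex correspondences can be sketched as below. The sketch assumes the model maps previous-frame coordinates to current-frame coordinates as x' = a·x + b·y + c and y' = d·x + e·y + f. The direction, the parameter ordering, and the function names are assumptions for illustration, not details given in the text.

```python
def affine_from_triangle(prev_pts, cur_pts):
    """Solve (a, b, c, d, e, f) exactly from three point correspondences."""
    (x0, y0), (x1, y1), (x2, y2) = prev_pts
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if abs(det) < 1e-12:
        raise ValueError("degenerate triangle: vertices are collinear")
    params = []
    for axis in (0, 1):  # axis 0 solves (a, b, c); axis 1 solves (d, e, f)
        u0, u1, u2 = (p[axis] for p in cur_pts)
        a = ((u1 - u0) * (y2 - y0) - (u2 - u0) * (y1 - y0)) / det
        b = ((x1 - x0) * (u2 - u0) - (x2 - x0) * (u1 - u0)) / det
        c = u0 - a * x0 - b * y0
        params.extend([a, b, c])
    return tuple(params)


def apply_affine(params, point):
    """Map a previous-frame point into the current frame."""
    a, b, c, d, e, f = params
    x, y = point
    return (a * x + b * y + c, d * x + e * y + f)


def triangle_affine_list(triangles, prev_vertices, cur_vertices):
    """Build the per-triangle parameter list; each triangle is a tuple of vertex indices."""
    return [
        affine_from_triangle([prev_vertices[i] for i in tri],
                             [cur_vertices[i] for i in tri])
        for tri in triangles
    ]


# Two triangles covering a square that is scaled by 2 and shifted by (1, 2).
prev_vertices = [(0, 0), (4, 0), (0, 4), (4, 4)]
cur_vertices = [(1, 2), (9, 2), (1, 10), (9, 10)]
triangles = [(0, 1, 2), (1, 3, 2)]
models = triangle_affine_list(triangles, prev_vertices, cur_vertices)
```

Three non-collinear correspondences determine the six parameters exactly, so no least squares fit is needed at the level of a single triangle.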

In a further embodiment, the method traverses a triangle identifier image map, in which each pixel contains the identifier of the triangle in the mesh to which the pixel belongs. For each pixel that belongs to a triangle, the corresponding global deformation and local deformation coordinates for that pixel are calculated. Those coordinates are in turn used to sample the corresponding pixel and to apply its value at the corresponding "normalized" position.

In a further embodiment, spatial constraints are applied to the points based on density and on the image intensity correspondence strength resulting from the search of the image gradient. The points are sorted after motion estimation is done, based on some norm of the image intensity residual. The points are then filtered based on a spatial density constraint.

In a further embodiment, spectral spatial segmentation is employed, and small homogeneous spectral regions are merged with neighboring regions based on spatial affinity and on the similarity of their intensity and/or color. Homogeneous merging is then used to combine spectral regions based on their overlap with a region of homogeneous texture (image gradient). A further embodiment uses center-surround points, which are small regions surrounded by a larger region, as qualified interest points for supporting a vertex of the mesh. In a further non-limiting embodiment, a center-surround point is defined as a region whose bounding box is within one pixel of being 3×3, 5×5, or 7×7 pixels in dimension and whose spatial image gradient over that bounding box is corner-shaped. The center of the region can be classified as a corner, further qualifying that position as an advantageous vertex position.

In a further embodiment, the horizontal and vertical pixel finite difference images are used to classify the strength of each mesh edge. If an edge has many finite differences coincident with its spatial position, then the edge, and hence the vertices of that edge, are considered highly critical to the local deformation of the imaged phenomena. If there is a large derivative difference between the averages of the sums of the finite differences of the edge, then the region edge most likely corresponds to a texture change edge and not to a quantization step.

In another embodiment, a spatial density model termination condition is used to optimize the processing of the mesh vertices. When a sufficient number of points covering most of the spatial area of the outset of the detection rectangle have been examined, the processing can be terminated. The termination generates a score. Vertices and feature points entering the processing are sorted by this score. If a point is spatially too close to an existing point, or if it does not correspond to an edge in the image gradient, it is discarded. Otherwise, the image gradient in the neighborhood of the point is descended, and if the residual of the gradient exceeds a limit, that point is also discarded.

Regular Mesh Normalization

The present invention extends the local normalization described above by utilizing a regular mesh. The mesh is constructed without regard to the underlying pixels, but it is positioned and sized to correspond to a detected object.

Given a detected object region, a spatial frame position and a scale indicating the size of the face are used to generate a regular mesh over the outset of the face region. In a preferred embodiment, a non-overlapping set of tiles is used to delineate a rectangular mesh, and a diagonal partitioning of the tiles is then performed to yield a regular mesh with triangular mesh elements. In another preferred embodiment, the tiles are proportional to those used in conventional video compression algorithms (for example, MPEG-4 AVC).
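By way of non-limiting illustration, the tiling and diagonal partitioning described above might be sketched as follows. The function name `regular_triangle_mesh`, its parameters, and the choice of splitting every tile along the same diagonal are assumptions made for this sketch and are not taken from the specification.

```python
import numpy as np

def regular_triangle_mesh(x0, y0, width, height, tile=16):
    """Cover a detection rectangle with tile x tile squares and split
    each square along one diagonal into two triangles.

    Returns (vertices, triangles): vertices is an (N, 2) array of (x, y)
    positions and triangles is an (M, 3) array of vertex indices.
    """
    cols = int(np.ceil(width / tile))
    rows = int(np.ceil(height / tile))
    xs = x0 + tile * np.arange(cols + 1)
    ys = y0 + tile * np.arange(rows + 1)
    gx, gy = np.meshgrid(xs, ys)            # gx[r, c] == xs[c]
    vertices = np.stack([gx.ravel(), gy.ravel()], axis=1)

    triangles = []
    for r in range(rows):
        for c in range(cols):
            tl = r * (cols + 1) + c         # top-left vertex of the tile
            tr = tl + 1                     # top-right
            bl = tl + cols + 1              # bottom-left
            br = bl + 1                     # bottom-right
            # Hypothetical convention: cut along the tl-br diagonal.
            triangles.append((tl, tr, br))
            triangles.append((tl, br, bl))
    return vertices, np.array(triangles)
```

The mesh depends only on the position and scale of the detected region, which matches the statement that it is constructed without regard to the underlying pixels.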

In a preferred embodiment, the vertices associated with the aforementioned mesh are prioritized through analysis of the pixel regions surrounding these vertices in specific frames of the video used for training. Analyzing the gradient of these regions provides a confidence for any processing associated with each vertex that relies on the local image gradient (such as block-based motion estimation).

Correspondences of vertex positions across multiple frames are found through a simple descent of the image gradient. In a preferred embodiment, this is achieved through block-based motion estimation. In this embodiment, high-confidence vertices allow for high-confidence correspondences. Lower-confidence vertex correspondences are arrived at implicitly, by resolving ambiguous image gradients through inference from the higher-confidence vertex correspondences.

In one preferred embodiment, a regular mesh is constructed over the outset of the tracking rectangle. Tiles of 16×16 pixels are created and cut diagonally to form a triangular mesh. The vertices of these triangles are motion estimated. The motion estimation depends on the type of texture at each point. The texture is divided into three classes, corner, edge, and homogeneous, which also define the order in which the vertices are processed. A corner vertex uses neighboring vertex estimates: the motion estimates of the neighboring points (if available) are used as predictive motion vectors, and motion estimation is applied to each one. The motion vector that gives the lowest MAD error is used as the motion vector of this vertex. The search strategy used for corners is all of wide, small, and origin. For edges, the nearest-neighbor motion vectors are again used as predictive motion vectors, and the one with the least error is used. The search strategy for edges is small and origin. For homogeneous vertices, the neighboring vertices are searched and the motion estimate with the lowest error is used.

In one preferred embodiment, the image gradient at each triangle vertex is computed, and the vertices are sorted by class and magnitude. Corners therefore come before edges, which come before homogeneous vertices. Among corners, strong corners come before weak corners, and among edges, strong edges come before weak edges.
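The ordering described above amounts to a two-key sort, first by texture class and then by descending gradient magnitude. A minimal sketch follows; the tuple layout `(vertex_id, texture_class, gradient_magnitude)` and the names `CLASS_ORDER` and `processing_order` are hypothetical.

```python
# Hypothetical class ranking: lower rank is processed earlier.
CLASS_ORDER = {"corner": 0, "edge": 1, "homogeneous": 2}

def processing_order(vertices):
    """Sort vertices so that corners precede edges, edges precede
    homogeneous vertices, and stronger gradients come first within
    each class.

    vertices: iterable of (vertex_id, texture_class, gradient_magnitude).
    """
    return sorted(vertices, key=lambda v: (CLASS_ORDER[v[1]], -v[2]))
```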

In one preferred embodiment, the local deformation for each triangle is based on a motion estimate associated with that triangle. An affine transformation is estimated for each triangle. If the triangle does not topologically invert or become degenerate, the pixels that are part of the triangle are used to sample the current image based on the estimated affine transformation.
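As a sketch of the per-triangle affine estimate, three vertex correspondences determine a 2×3 affine matrix exactly. The signed-area test used below to detect inversion or degeneracy is one possible check and is an assumption of this example, as are the names `signed_area` and `triangle_affine`.

```python
import numpy as np

def signed_area(tri):
    """Signed area of a triangle given as three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def triangle_affine(src, dst, eps=1e-9):
    """Return the 2x3 affine matrix A with A @ [x, y, 1] mapping each
    src vertex to its dst vertex, or None if either triangle is
    degenerate or the motion flips the triangle's orientation."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    a_src, a_dst = signed_area(src), signed_area(dst)
    if abs(a_src) < eps or abs(a_dst) < eps:
        return None                       # degenerate (zero area)
    if np.sign(a_src) != np.sign(a_dst):
        return None                       # topologically inverted
    m = np.hstack([src, np.ones((3, 1))]) # rows are [x, y, 1]
    return np.linalg.solve(m, dst).T      # 2x3 affine matrix
```

A triangle for which the function returns `None` would simply be skipped when sampling the current image.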

Segmentation

The spatial discontinuities identified through the segmentation processes described below are encoded efficiently through geometric parameterization of their respective boundaries, referred to as spatial discontinuity models. These spatial discontinuity models may be encoded in a progressive manner that allows for ever more concise boundary descriptions corresponding to subsets of the encoding. Progressive encoding provides a robust means of prioritizing the spatial geometry while retaining many of the salient aspects of the spatial discontinuities.

A preferred embodiment of the present invention combines multi-resolution segmentation analysis with gradient analysis of the spatial intensity field, and further uses a temporal stability constraint in order to achieve robust segmentation.

As shown in FIG. 2, once the correspondences of the features of an object have been tracked over time (220) and modeled (224), adherence to this motion/deformation model can be used to segment the pixels corresponding to the object (230). This process can be repeated for a multitude of objects (206 & 208) detected in the video (202 & 204). The result of this processing is the segmented object pixels (232).

One form of invariant feature analysis used by the present invention focuses on the identification of spatial discontinuities. These discontinuities manifest as edges, shadows, occlusions, lines, corners, or any other visible characteristic that causes an abrupt and identifiable separation between pixels in one or more imaged frames of the video. In addition, subtle spatial discontinuities between similarly colored and/or textured objects may manifest only when the pixels of the objects in the video frame undergo motion that is coherent relative to the objects themselves but different relative to each other. The present invention utilizes a combination of spectral, texture, and motion segmentation to robustly identify the spatial discontinuities associated with a salient signal mode.

Temporal Segmentation

The temporal integration of translational motion vectors, or equivalently of finite difference measurements in the spatial intensity field, into a higher-order motion model is a form of motion segmentation described in the prior art.

In one embodiment of the invention, a dense field of motion vectors is produced, representing the finite differences of object motion in the video. These derivatives are grouped spatially through a regular partitioning of tiles or by some initialization procedure such as spatial segmentation. The "derivatives" of each group are integrated into a higher-order motion model using a linear least squares estimator. The resulting motion models are then clustered as vectors in the motion model space using the k-means clustering technique. The derivatives are classified based on which cluster best fits them. The cluster labels are then spatially clustered as an evolution of the spatial partitioning. The process continues until the spatial partitioning is stable.
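The two numerical steps named above, the least squares integration of a group of motion vectors and the clustering of the resulting models, might be sketched as follows. The six-parameter affine form of the higher-order model, the naive initialization of k-means from the first k models, and the names `fit_affine_motion` and `kmeans` are assumptions of this example; the specification does not fix them.

```python
import numpy as np

def fit_affine_motion(positions, vectors):
    """Least squares fit of u = a0 + a1*x + a2*y, v = b0 + b1*x + b2*y
    to the motion vectors (u, v) observed at positions (x, y).
    Returns [a0, a1, a2, b0, b1, b2]."""
    positions = np.asarray(positions, dtype=float)
    vectors = np.asarray(vectors, dtype=float)
    design = np.hstack([np.ones((len(positions), 1)), positions])
    params, *_ = np.linalg.lstsq(design, vectors, rcond=None)  # 3x2
    return params.T.ravel()

def kmeans(models, k, iters=20):
    """Plain k-means over motion-model vectors (iters must be >= 1).
    Assumption: centers start at the first k models, which must be
    distinct."""
    models = np.asarray(models, dtype=float)
    centers = models[:k].copy()
    for _ in range(iters):
        dist = np.linalg.norm(models[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = models[labels == j].mean(axis=0)
    return labels, centers
```

In the iterative scheme of the text, the labels returned by `kmeans` would be spatially clustered to form the next partitioning, and the fit and clustering would be repeated until the partitioning stops changing.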

In another embodiment of the invention, the motion vectors for a given aperture are interpolated to a set of pixel positions corresponding to the aperture. When the block defined by this interpolation spans pixels corresponding to an object boundary, the resulting classification is some anomalous diagonal partitioning of the block.

In the prior art, the least squares estimator used to integrate the derivatives is highly sensitive to outliers. This sensitivity can generate motion models that bias the motion model clustering method so heavily that the iterations diverge widely.

In the present invention, the motion segmentation methods identify spatial discontinuities through analysis of apparent pixel motion over two or more frames of video. The apparent motion is analyzed for consistency over the frames of the video and integrated into parametric motion models. Spatial discontinuities associated with such consistent motion are identified. Motion segmentation can also be referred to as temporal segmentation, because temporal changes may be caused by motion. However, temporal changes may also be caused by other phenomena such as local deformation, illumination changes, and so on.

Through the described method, the salient signal mode that corresponds to the normalization method can be identified and separated from the ambient signal mode (background or non-object) by one of several background subtraction methods. Often, these methods statistically model the background as the pixels that exhibit the least amount of change at each time instance. Change can be characterized as a pixel value difference. Alternatively, motion segmentation can be achieved given the detected position and scale of the salient image mode. A distance transform can be used to determine the distance of every pixel from the detected position. If the pixel values associated with the maximum distance are retained, a reasonable model of the background can be resolved. In other words, the ambient signal is re-sampled temporally using a signal difference metric.

Given a model of the ambient signal, the complete salient signal mode at each time instance can be differenced. Each of these differences can be re-sampled into spatially normalized signal differences (absolute differences). These differences are then aligned relative to each other and accumulated. Since the differences have been spatially normalized relative to the salient signal mode, peaks of difference will mostly correspond to pixel positions that are associated with the salient signal mode.

Resolution of Non-Object

Given a resolved background image, the error between that image and the current frame can be spatially normalized and temporally accumulated. Such a resolved background image is described in the "Background Resolution" section.

The resulting accumulated error is then thresholded to provide an initial contour. The contour is then propagated spatially to balance the error residual against contour deformation.

Gradient Segmentation

The texture segmentation methods, or equivalently intensity gradient segmentation, analyze the local gradient of the pixels in one or more frames of video. The gradient response is a statistical measure that characterizes the spatial discontinuities local to a pixel position in the video frame. One of several spatial clustering techniques is then used to combine the gradient responses into spatial regions. The boundaries of these regions are useful for identifying spatial discontinuities in one or more of the video frames.

In one embodiment of the invention, the summed area table concept from computer graphics texture generation is used to expedite computation of the gradient of the intensity field. A field of progressively summed values is generated, which makes it possible to sum any rectangle of the original field through four lookups combined with four addition operations.
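A summed area table can be illustrated with a short sketch. The zero-padded first row and column and the half-open rectangle convention are choices made for this example, and the names `summed_area_table` and `rect_sum` are hypothetical.

```python
import numpy as np

def summed_area_table(field):
    """Table whose entry [r, c] holds the sum of field[:r, :c].
    A leading row and column of zeros avoids boundary special cases."""
    field = np.asarray(field, dtype=np.float64)
    sat = np.zeros((field.shape[0] + 1, field.shape[1] + 1))
    sat[1:, 1:] = field.cumsum(axis=0).cumsum(axis=1)
    return sat

def rect_sum(sat, top, left, bottom, right):
    """Sum of field[top:bottom, left:right] using four table lookups."""
    return sat[bottom, right] - sat[top, right] - sat[bottom, left] + sat[top, left]
```

Once the table is built, the cost of summing a rectangle is independent of the rectangle's size, which is what makes windowed gradient statistics inexpensive.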

Another embodiment uses the Harris response generated for an image, and the neighborhood of each pixel is classified as homogeneous, an edge, or a corner. A response value is generated from this information and indicates the degree of edge-ness or cornered-ness for each element of the frame.
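One common formulation of the Harris response is sketched below for illustration. The central-difference gradients, the box window, the constant `k = 0.05`, the symmetric threshold, and all function names are assumptions of this example rather than details from the specification.

```python
import numpy as np

def box_sum(a, radius=1):
    """Sum of a over a (2*radius+1)^2 window, with edge replication."""
    padded = np.pad(a, radius, mode="edge")
    out = np.zeros(a.shape, dtype=np.float64)
    size = 2 * radius + 1
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out

def harris_response(image, k=0.05, radius=1):
    """R = det(M) - k * trace(M)^2 for the windowed structure tensor M."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    sxx = box_sum(gx * gx, radius)
    syy = box_sum(gy * gy, radius)
    sxy = box_sum(gx * gy, radius)
    return sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2

def classify(response, threshold):
    """Strongly positive R: corner; strongly negative R: edge;
    otherwise homogeneous."""
    labels = np.full(response.shape, "homogeneous", dtype=object)
    labels[response > threshold] = "corner"
    labels[response < -threshold] = "edge"
    return labels
```

The magnitude of the response can serve as the degree of cornered-ness or edge-ness mentioned in the text.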

Multi-Scale Gradient Analysis

An embodiment of the present invention further constrains the image gradient support by generating the image gradient values at several spatial scales. This method can help qualify the image gradient so that spatial discontinuities at different scales are used to support each other: as long as an "edge" is discriminated at several different spatial scales, the edge should be "salient". A more qualified image gradient will tend to correspond to a more salient feature.

In a preferred embodiment, the texture response field is generated first, and the values of this field are then quantized into several bins based on k-means binning/partitioning. The original image gradient values are then processed progressively, using each bin as an interval of values to which a single iteration of watershed segmentation can be applied. The benefit of this approach is that homogeneity is defined in a relative sense with a strong spatial bias.

Spectral Segmentation

The spectral segmentation methods analyze the statistical probability distribution of the black-and-white, grayscale, or color pixels in the video signal. A spectral classifier is constructed by performing clustering operations on the probability distribution of those pixels. The classifier is then used to classify one or more pixels as belonging to a probability class. The resulting probability class and its pixels are then given a class label. These class labels are then spatially associated into regions of pixels with distinct boundaries. These boundaries identify spatial discontinuities in one or more of the video frames.

The present invention utilizes spatial segmentation based on spectral classification to segment the pixels of video frames. Further, the correspondence between regions can be determined based on the overlap of spectral regions with regions in previous segmentations.

It can be observed that when video frames are roughly made up of continuous color regions that are spatially connected into larger regions corresponding to objects in the video frame, the identification and tracking of the colored (or spectral) regions can facilitate the subsequent segmentation of objects in a video sequence.

Background Resolution

The described invention includes a method for video frame background modeling that is based on the temporal maximum of spatial distance measurements between a detected object and each individual pixel in each frame of video. Given the detected position of the object, a distance transform is applied, creating a scalar distance value for each pixel in the frame. A map of the maximum distance over all of the video frames is retained for each pixel. When the maximum value is initially assigned, or is subsequently updated with a new and different value, the corresponding pixel of that video frame is retained in a "resolved background" frame.
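The per-frame update of the maximum-distance map and the resolved background frame might look like the following sketch. Using the Euclidean distance to a single detected point as the distance transform is a simplifying assumption, and the function and argument names are hypothetical.

```python
import numpy as np

def update_resolved_background(frame, position, max_dist, background):
    """Update the maximum-distance map and resolved background in place.

    frame:      2-D array of pixel values for the current frame.
    position:   (row, col) of the detected object in this frame.
    max_dist:   running per-pixel maximum distance; initialize to -1 so
                that the first frame assigns every pixel.
    background: running "resolved background" frame.
    """
    rows, cols = np.indices(frame.shape[:2])
    dist = np.hypot(rows - position[0], cols - position[1])
    farther = dist > max_dist            # maximum is assigned or updated
    max_dist[farther] = dist[farther]
    background[farther] = frame[farther]
    return max_dist, background
```

Each pixel of the resolved background is thereby taken from the frame in which the detected object was farthest from that pixel.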

Appearance Modeling

A common goal of video processing is often to model and preserve the appearance of a sequence of video frames. The present invention aims to allow constrained appearance modeling techniques to be applied in robust and widely applicable ways through the use of preprocessing. The registration, segmentation, and normalization described previously are expressly for this purpose.

The present invention discloses a means of appearance variance modeling. The primary basis of appearance variance modeling is, in the case of a linear model, the analysis of feature vectors to reveal a compact basis that exploits linear correlations. Feature vectors representing spatial intensity field pixels can be assembled into an appearance variance model.

In an alternative embodiment, the appearance variance model is computed from a segmented subset of the pixels. Further, the feature vector can be separated into spatially non-overlapping feature vectors. Such spatial decomposition may be achieved by spatial tiling. Computational efficiency may be achieved by processing these temporal ensembles without sacrificing the dimensionality reduction of the more global PCA method.

When generating an appearance variance model, spatial intensity field normalization can be used to decrease PCA modeling of spatial transformations.

PCA

The preferred means of generating an appearance variance model is to assemble frames of video as pattern vectors into a training matrix, or ensemble, and to apply principal component analysis (PCA) to the training matrix. When such an expansion is truncated, the resulting PCA transformation matrix is used to analyze and synthesize subsequent frames of video. Depending on the level of truncation, varying levels of quality of the original appearance of the pixels can be achieved.
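The assembly, truncation, analysis, and synthesis steps can be sketched with a singular value decomposition. Computing PCA through `numpy.linalg.svd` of the mean-centered ensemble is one standard route and is assumed here; the function names `train_pca`, `analyze`, and `synthesize` are hypothetical.

```python
import numpy as np

def train_pca(frames, rank):
    """Stack frames as pattern vectors (one per row) and return the
    ensemble mean and the first `rank` principal directions."""
    ensemble = np.stack([np.ravel(f) for f in frames]).astype(np.float64)
    mean = ensemble.mean(axis=0)
    _, _, vt = np.linalg.svd(ensemble - mean, full_matrices=False)
    return mean, vt[:rank]               # truncated transform

def analyze(frame, mean, basis):
    """Encode a frame as PCA coefficients."""
    return basis @ (np.ravel(frame) - mean)

def synthesize(coeffs, mean, basis, shape):
    """Reconstruct a frame from its PCA coefficients."""
    return (mean + basis.T @ coeffs).reshape(shape)
```

Lowering `rank` shortens the coefficient vector for each subsequent frame at the cost of a larger reconstruction error, which corresponds to the quality levels tied to truncation in the text.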

The specific means of construction and decomposition of the pattern vectors is well known to those skilled in the art.

Given the spatial segmentation of the salient signal mode from the ambient signal and the spatial normalization of this mode, the pixels themselves, or equivalently the appearance of the resulting normalized signal, can be factored into linearly correlated components with a low-rank parameterization that allows for a direct trade-off between approximation error and bit-rate for the representation of the pixel appearance.

As shown in FIG. 2, the normalized object pixels (242 & 244) can be projected into a vector space, and the linear correspondences can be modeled using a decomposition process (250) such as PCA in order to yield a dimensionally concise version of the data (252 & 254).

Sequential PCA

PCA encodes patterns into PCA coefficients using a PCA transform. The better the patterns are represented by the PCA transform, the fewer coefficients are needed to encode the pattern. Recognizing that pattern vectors may degrade as time passes between the acquisition of the training patterns and the patterns to be encoded, updating the transform can help counteract the degradation. As an alternative to generating a new transform, sequential updating of existing patterns is more computationally efficient in certain cases.

Many state-of-the-art video compression algorithms predict a frame of video from one or more other frames. The prediction model is commonly based on partitioning each predicted frame into non-overlapping tiles, each of which is matched to a corresponding patch in another frame with an associated translational displacement parameterized by an offset motion vector. This spatial displacement, optionally coupled with a frame index, provides the "motion predicted" version of the tile. If the error of the prediction is below a certain threshold, the pixels of the tile are suitable for residual encoding, and there is a corresponding gain in compression efficiency. Otherwise, the pixels of the tile are encoded directly. This type of tile-based, alternatively termed block-based, motion prediction method models the video by translating tiles containing pixels. When the imaged phenomena in the video adhere to this type of modeling, the corresponding encoding efficiency increases. This modeling constraint assumes that a certain level of temporal resolution, or number of frames per second, is present for imaged objects undergoing motion, so that they conform to the translational assumption inherent in block-based prediction. Another requirement of this translational model is that the spatial displacement for a certain temporal resolution must be limited; that is, the time difference between the frames from which the prediction is derived and the frame being predicted must be a relatively short amount of absolute time. These temporal resolution and motion limitations facilitate the identification and modeling of certain redundant video signal components that are present in the video stream.
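The tile-based prediction model described above might be illustrated with an exhaustive block-matching search. The mean absolute difference error measure, the search range, the return format, and the function names are assumptions of this sketch and do not reproduce any particular codec.

```python
import numpy as np

def best_motion_vector(prev, cur, top, left, tile=16, search=4):
    """Full search for the (dy, dx) offset into `prev` that best predicts
    the tile of `cur` at (top, left), scored by mean absolute difference."""
    block = cur[top:top + tile, left:left + tile].astype(np.float64)
    best, best_err = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + tile > prev.shape[0] or c + tile > prev.shape[1]:
                continue                  # candidate patch leaves the frame
            err = np.abs(block - prev[r:r + tile, c:c + tile]).mean()
            if err < best_err:
                best, best_err = (dy, dx), err
    return best, best_err

def encode_tile(prev, cur, top, left, threshold, tile=16, search=4):
    """Residual-encode the tile if the prediction error is below
    `threshold`; otherwise fall back to encoding the pixels directly."""
    (dy, dx), err = best_motion_vector(prev, cur, top, left, tile, search)
    block = cur[top:top + tile, left:left + tile].astype(np.float64)
    if err < threshold:
        pred = prev[top + dy:top + dy + tile, left + dx:left + dx + tile]
        return "residual", (dy, dx), block - pred
    return "direct", None, block
```

The bounded `search` parameter reflects the requirement that the spatial displacement between the reference frame and the predicted frame be limited.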

Residual-Based Decomposition

In MPEG video compression, the current frame is constructed by motion compensating the previous frame using motion vectors, followed by the application of a residual update to the compensation blocks; finally, any blocks that do not have a sufficient match are encoded as new blocks.

The pixels corresponding to residual blocks are mapped to pixels in the previous frame through the motion vector. The result is a temporal path of pixels through the video that can be synthesized through the successive application of residual values. These pixels are identified as the ones that can be best represented using PCA.

Occlusion-Based Decomposition

A further enhancement of the invention determines whether the motion vectors applied to blocks cause any pixels from the previous frame to be occluded (covered) by moving pixels. For each occlusion event, the occluding pixels are split into a new layer. There will also be revealed pixels without a history. The revealed pixels are placed onto any layer that fits them in the current frame and for which a historical fit can be made.

The temporal continuity of pixels is supported through the splicing and grafting of pixels to different layers. Once a stable layer model is arrived at, the pixels in each layer can be grouped based on membership in coherent motion models.

Sub-Band Temporal Quantization

An alternative embodiment of the present invention uses the discrete cosine transform (DCT) or the discrete wavelet transform (DWT) to decompose each frame into sub-band images. Principal component analysis (PCA) is then applied to each of these "sub-band" videos. The concept is that sub-band decomposition of a frame of video decreases the spatial variance in any one of the sub-bands compared with the original video frame.

For video of a moving object (a person), the spatial variance tends to dominate the variance modeled by PCA. Sub-band decomposition reduces the spatial variance in any one decomposition video.

For the DCT, the decomposition coefficients for any one sub-band are arranged spatially into a sub-band video. For example, the DC coefficients are taken from each block and arranged into a sub-band video that looks like a postage-stamp version of the original video. This is repeated for all the other sub-bands, and the resulting sub-band videos are each processed using PCA.
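The rearrangement of block DCT coefficients into sub-band images can be sketched as follows. The 8×8 block size, the orthonormal DCT-II scaling, and the names `dct_matrix` and `subband_images` are assumptions made for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def subband_images(frame, block=8):
    """Apply a block DCT and regroup coefficients by frequency.

    Returns an array of shape (block, block, H/block, W/block) in which
    entry [u, v] is the sub-band image made of coefficient (u, v) from
    every block. Entry [0, 0] is the DC "postage stamp" image.
    """
    frame = np.asarray(frame, dtype=np.float64)
    h, w = frame.shape
    if h % block or w % block:
        raise ValueError("frame dimensions must be multiples of block")
    c = dct_matrix(block)
    blocks = frame.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    coeffs = c @ blocks @ c.T            # 2-D DCT of every block
    return coeffs.transpose(2, 3, 0, 1)
```

Stacking `subband_images(frame)[u, v]` over successive frames yields the sub-band video for coefficient (u, v), to which PCA would then be applied.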

For the DWT, the sub-bands are already arranged in the manner described for the DCT.

In a non-limiting embodiment, the truncation of the PCA coefficients is varied.

Wavelet

When data is decomposed using the discrete wavelet transform (DWT), multiple band-pass data sets result at lower spatial resolutions. The transformation process can be applied recursively to the derived data until only single scalar values result. The scalar elements in the decomposed structure are typically related in a hierarchical parent/child fashion. The resulting data contains a multi-resolution hierarchical structure as well as finite differences.
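The recursive structure described above can be seen in a one-dimensional sketch. The Haar wavelet is used here only because it is the simplest case and makes the finite differences explicit; the specification does not name a particular wavelet, and the function names are hypothetical.

```python
import numpy as np

def haar_step(signal):
    """One Haar analysis step: scaled pairwise sums (low band) and
    scaled pairwise finite differences (high band)."""
    s = np.asarray(signal, dtype=np.float64)
    low = (s[0::2] + s[1::2]) / np.sqrt(2.0)
    high = (s[0::2] - s[1::2]) / np.sqrt(2.0)
    return low, high

def haar_decompose(signal):
    """Recursively transform the low band until one scalar remains.
    The signal length must be a power of two. Returns the final low
    band and the list of detail bands, finest first."""
    low = np.asarray(signal, dtype=np.float64)
    details = []
    while len(low) > 1:
        low, high = haar_step(low)
        details.append(high)
    return low, details
```

Each coefficient in a coarser detail band covers the positions of two coefficients in the next finer band, which gives the parent/child relationship mentioned in the text.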

When DWT is applied to spatial intensity fields, many phenomena in naturally occurring images are represented with little perceptual loss by the first or second low band-pass derived data structures, because of their low spatial frequency. Truncating the hierarchical structure provides a compact representation when high-frequency spatial data is either not present or considered noise.
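A small numerical illustration of this truncation follows. It assumes the same averaging Haar transform on a one-dimensional signal, keeps only the low band left after `drop` levels, and reconstructs with all discarded detail coefficients set to zero. The function name `haar_lowpass` and the test signal are assumptions.

```python
import numpy as np

def haar_lowpass(signal, drop):
    """Keep only the low band left after `drop` Haar levels, then expand back."""
    x = np.asarray(signal, dtype=float)
    for _ in range(drop):
        x = x.reshape(-1, 2).mean(axis=1)     # discard the detail band at this level
    return np.repeat(x, 2 ** drop)            # reconstruction with zeroed details

t = np.linspace(0.0, 1.0, 64)
smooth = np.sin(2 * np.pi * t)                # low spatial frequency content
approx = haar_lowpass(smooth, drop=2)         # 16 values stand in for 64 samples
rel_err = np.linalg.norm(smooth - approx) / np.linalg.norm(smooth)
```

For a signal made only of the highest frequency (alternating `+1, -1`), the same truncation removes everything, which is acceptable only when that content is treated as noise.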

While PCA can achieve accurate reconstruction with a small number of coefficients, the transform itself can be quite large. To reduce the size of this "initial" transform, an embedded zero tree (EZT) construction of a wavelet decomposition can be used to build progressively more accurate versions of the transformation matrix.

Subspace Classification

As those skilled in the art will appreciate, discretely sampled phenomena data and derivative data can be represented as a set of data vectors corresponding to an algebraic vector space. These data vectors include, in a non-limiting way, the pixels in the normalized appearance of the segmented object, the motion parameters, and any structural positions of features or vertices in two or three dimensions. Each of these vectors exists in a vector space, and analysis of the geometry of that space can be used to yield concise representations of the sampled, or parameter, vectors. Beneficial geometric conditions are typified by parameter vectors that form compact subspaces. When one or more subspaces are mixed, creating a seemingly more complex single subspace, the constituent subspaces can be difficult to discern. Several segmentation methods allow such subspaces to be separated by examining the data in a higher-dimensional vector space created through some interaction of the original vectors (such as the inner product).

One method of segmenting the vector space involves projecting the vectors into a Veronese vector space that represents polynomials. This method is known to those skilled in the art as the generalized PCA, or GPCA, technique. Through such a projection, the normals to the polynomials are found and grouped, and the original vectors associated with those normals can be grouped together. One example of the utility of this technique is the factoring of two-dimensional spatial point correspondences tracked over time into a three-dimensional structure model and the motion of that three-dimensional model.
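The projection and grouping steps can be sketched for the smallest non-trivial case: noise-free points on two lines through the origin in the plane. The degree-2 Veronese embedding, the null-space fit, and the midpoint threshold used for grouping are assumptions chosen for illustration. This is the basic prior-art idea, not the extended method of the invention, and the function names are hypothetical.

```python
import numpy as np

def veronese2(points):
    """Degree-2 Veronese embedding of 2-D points: (x, y) -> (x^2, x*y, y^2)."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x * x, x * y, y * y], axis=1)

def gpca_two_lines(points):
    """Label noise-free points that lie on two lines through the origin."""
    # The polynomial p(x, y) = c . veronese2(x, y) vanishes on both lines,
    # so c spans the null space of the embedded data matrix.
    c = np.linalg.svd(veronese2(points))[2][-1]
    x, y = points[:, 0], points[:, 1]
    # The gradient of p at a data point is normal to the line through that point.
    grads = np.stack([2 * c[0] * x + c[1] * y, c[1] * x + 2 * c[2] * y], axis=1)
    normals = grads / np.linalg.norm(grads, axis=1, keepdims=True)
    # Group normals by direction (up to sign), using the first point as reference.
    sim = np.abs(normals @ normals[0])
    return (sim < 0.5 * (sim.min() + sim.max())).astype(int)

t = np.array([-2.0, -1.0, 1.0, 2.0, 3.0])     # the origin itself is excluded
line_a = np.outer(t, [2.0, 1.0])
line_b = np.outer(t, [1.0, -3.0])
labels = gpca_two_lines(np.vstack([line_a, line_b]))
```

The sketch relies on the data being noise-free and on no sample sitting at the intersection of the lines, where the gradient vanishes.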

The GPCA technique is incomplete when applied as defined: it yields results only when the data vectors are generated with little noise. The prior art assumes supervisory user intervention to guide the GPCA algorithm. This constraint greatly limits the potential of the technique.

The present invention extends the conceptual basis of the GPCA method to robustly handle the identification and segmentation of multiple subspaces in the presence of noise or mixed co-dimension. This innovation provides an unsupervised improvement of the technique over the state of the art.

In the prior art, GPCA operates on the normal vectors of the polynomials of the Veronese map without regard to the tangent space of those normal vectors. The present inventive method extends GPCA to find the tangent space that is orthogonal to the space of the normal vectors normally found in the Veronese map. This "tangent space," or subspace of the Veronese map, is then used to factor the Veronese map.

The tangent space is identified through plane wave expansions and the application of the Legendre transformation between position and tangential plane coordinates, which reveal a duality in the representation of geometric objects, specifically the tangents to the normals of the polynomials of the Veronese map. The discrete Legendre transformation is applied through convex analysis to define a constrained form of the derivative corresponding to the normal vectors. This approach is used to segment the data vectors by computing the normal vectors in the presence of noise. The convex analysis is incorporated into GPCA to provide a more robust algorithm.
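For readers unfamiliar with the transform, the duality between position and tangent (slope) coordinates can be shown with a discrete Legendre-Fenchel transform, f*(p) = max over x of (p·x − f(x)), evaluated on sampled data. The sketch is a generic illustration on a one-dimensional convex function. How the transform is coupled to the Veronese map is not detailed in this passage and is not attempted here. The function name and grids are assumptions.

```python
import numpy as np

def discrete_legendre(x, f, slopes):
    """Discrete Legendre-Fenchel transform: f*(p) = max_i (p * x_i - f_i)."""
    # Rows index slopes p, columns index sample positions x_i.
    return np.max(np.outer(slopes, x) - f, axis=1)

x = np.linspace(-3.0, 3.0, 601)       # sample positions
f = 0.5 * x ** 2                      # convex test function
p = np.linspace(-1.0, 1.0, 5)         # slopes (tangent coordinates)
f_star = discrete_legendre(x, f, p)   # for this f, the dual is 0.5 * p**2
```

Each value `f_star[j]` describes the tangent line of slope `p[j]` that supports the sampled function, so the function is represented by its tangents rather than by its positions.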

The present invention capitalizes on an iterative factorization approach when applying GPCA. In particular, the derivative-based implementation found in the prior art is extended to refine the ensemble of classified data vectors through the very same GPCA method described herein. Applied iteratively, this technique can be used to robustly find candidate normal vectors in the Veronese mapping, and then to further qualify those vectors using the extended GPCA technique. For the factorization step, the original data associated with the refined set of vectors is removed from the original data set. The remaining data set can then be analyzed with the same innovative GPCA technique. This innovation is critical to using the GPCA algorithm in an unsupervised manner. FIG. 11 illustrates the recursive refinement of data vectors.
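The find, remove, and repeat loop of the factorization step can be sketched as below for slightly noisy points on lines through the origin in the plane. This is a simplified stand-in, not the invention's extended GPCA: it fits a Veronese polynomial by least squares, takes the gradient at the best-fitting point as a candidate normal, removes the points within a tolerance `tol` of that line, and repeats on the remainder. All names, the tolerance, and the test data are assumptions.

```python
import numpy as np

def veronese(points, n):
    """Degree-n Veronese embedding of 2-D points: monomials x^(n-k) * y^k."""
    x, y = points[:, 0], points[:, 1]
    return np.stack([x ** (n - k) * y ** k for k in range(n + 1)], axis=1)

def gradient(points, c, n):
    """Gradient of p(x, y) = sum_k c[k] * x^(n-k) * y^k at each point."""
    x, y = points[:, 0], points[:, 1]
    gx = sum(c[k] * (n - k) * x ** (n - k - 1) * y ** k for k in range(n))
    gy = sum(c[k] * k * x ** (n - k) * y ** (k - 1) for k in range(1, n + 1))
    return np.stack([gx, gy], axis=1)

def peel_lines(points, n_lines, tol):
    """Find one line at a time, remove its points, and repeat on the remainder."""
    remaining = np.arange(len(points))
    labels = np.full(len(points), -1)
    for label, n in enumerate(range(n_lines, 0, -1)):
        pts = points[remaining]
        emb = veronese(pts, n)
        c = np.linalg.svd(emb)[2][-1]                  # least-squares polynomial fit
        g = gradient(pts, c, n)
        gnorm = np.linalg.norm(g, axis=1)
        # First-order distance of each point to the zero set of the polynomial.
        dist = np.abs(emb @ c) / np.maximum(gnorm, 1e-12)
        best = np.argmin(dist)
        normal = g[best] / gnorm[best]                 # candidate normal vector
        member = np.abs(pts @ normal) <= tol           # points explained by it
        labels[remaining[member]] = label
        remaining = remaining[~member]                 # factor them out
        if remaining.size == 0:
            break
    return labels

rng = np.random.default_rng(0)
t = np.array([-3.0, -2.5, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 2.5, 3.0])
d1 = np.array([2.0, 1.0]) / np.sqrt(5.0)
d2 = np.array([1.0, -3.0]) / np.sqrt(10.0)
clean = np.vstack([np.outer(t, d1), np.outer(t, d2)])
noisy = clean + 0.005 * rng.standard_normal(clean.shape)
labels = peel_lines(noisy, n_lines=2, tol=0.2)
```

Which line is found first depends on the data, so only the grouping is meaningful, not the label values themselves.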

It will further be recognized that the present invention's extension of the GPCA technique has a greater advantage in cases where there are multiple roots in the Veronese polynomial vector space. In addition, whereas the prior art technique encounters a degenerate case when normals in the Veronese map are parallel to a vector space axis, the present method does not degenerate.

FIG. 10 illustrates the method of fundamental polynomial fitting and differentiation.

Hybrid Spatial Normalization Compression

The present invention extends the efficiency of block-based, motion-predicted coding schemes by adding the segmentation of the video stream into two or more "normalized" streams. These streams are then encoded separately so that the translational motion assumptions of conventional codecs remain valid. Upon decoding, the normalized streams are de-normalized into their proper positions and composited together to yield the original video sequence.

In one embodiment, one or more objects are detected in the video stream, and the pixels associated with each individual object are subsequently segmented, leaving the non-object pixels. Next, a global spatial motion model is generated for the object and non-object pixels. The global model is used to spatially normalize the object and non-object pixels. This normalization effectively removes non-translational motion from the video stream and provides a set of videos in which occlusion interaction is minimized. Both are beneficial features of the present inventive method.

The new videos of object and non-object, with their pixels spatially normalized, are provided as input to a conventional block-based compression algorithm. Upon decoding of the videos, the global motion model parameters are used to de-normalize the decoded frames, and the object pixels are composited together and onto the non-object pixels to yield an approximation of the original video stream.
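The normalize, encode, decode, de-normalize, and composite sequence described above can be sketched with a toy example. To keep it short, the global motion model is assumed to be a whole-pixel translation per frame, the non-object layer is assumed static (so its normalization is the identity), and `fake_codec` is a lossless stand-in for a conventional block-based codec. None of these names or simplifications come from the specification.

```python
import numpy as np

def shift(img, dy, dx):
    """Translate an image by whole pixels (wraps around; adequate for this toy case)."""
    return np.roll(img, (dy, dx), axis=(0, 1))

def fake_codec(video):
    """Stand-in for a conventional encoder/decoder pair (lossless here)."""
    return video.copy()

def round_trip(frames, masks, motion):
    # Segment, then normalize the object layer so the object stays at its frame-0 position.
    obj = np.stack([shift(f * m, -dy, -dx) for f, m, (dy, dx) in zip(frames, masks, motion)])
    obj_mask = np.stack([shift(m, -dy, -dx) for m, (dy, dx) in zip(masks, motion)])
    bg = np.stack([f * (1 - m) for f, m in zip(frames, masks)])    # non-object layer
    obj_dec, bg_dec = fake_codec(obj), fake_codec(bg)              # encoded separately
    out = []
    for o, om, b, (dy, dx) in zip(obj_dec, obj_mask, bg_dec, motion):
        o, om = shift(o, dy, dx), shift(om, dy, dx)                # de-normalize
        out.append(np.where(om == 1, o, b))                        # composite onto non-object
    return np.stack(out), obj

bg0 = np.arange(64).reshape(8, 8) % 7           # static background texture
motion = [(0, 0), (1, 2), (2, 4)]               # object displacement relative to frame 0
masks = np.zeros((3, 8, 8), dtype=int)
frames = np.stack([bg0] * 3)
for t, (dy, dx) in enumerate(motion):
    masks[t, 1 + dy:3 + dy, 1 + dx:3 + dx] = 1  # 2x2 object mask
    frames[t][masks[t] == 1] = 9                # object pixels
recon, obj = round_trip(frames, masks, motion)
```

After normalization the object layer is identical in every frame, which is the situation a block-based motion-compensated codec handles most cheaply. With a real lossy codec the composite would only approximate the original frames.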

As shown in FIG. 6, the previously detected object instances (206 & 208) for one or more objects (630 & 650) are each processed by a separate instance of a conventional video compression method (632). In addition, the non-object (602) resulting from the segmentation (230) of the objects is also compressed using conventional video compression (632). The result of each of these separate compression encodings (632) is a separate conventionally encoded stream (634) for each corresponding video stream. At some point, possibly after transmission, these intermediate encoded streams (234) can be decompressed (636) into a synthesis of the normalized non-object (610) and a multitude of objects (638 & 658). The synthesized pixels can be de-normalized (640) into their de-normalized versions (622, 642 & 662), positioning the pixels correctly relative to one another so that a compositing process (670) can combine the object and non-object pixels into a synthesis of the full frame (672).

Integration of Hybrid Codec

Combining a conventional block-based compression algorithm with a normalization-segmentation scheme, as described in the present invention, gives rise to several inventive methods. Primarily, specialized data structures and communication protocols are required.

The primary data structures include global spatial deformation parameters and object segmentation specification masks. The primary communication protocols are layers that carry the global spatial deformation parameters and the object segmentation specification masks.
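One possible shape for such a data structure is sketched below. The field names, the choice of a 2x3 affine matrix per frame for the global spatial deformation parameters, and the one-bit-per-pixel mask packing are all assumptions made for illustration. The specification does not define a concrete layout or wire format.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectLayer:
    """Side information carried with one normalized stream (hypothetical layout)."""
    object_id: int
    deformation: list        # per-frame 2x3 affine parameters, flattened to 6 floats
    mask_shape: tuple        # (height, width) of the segmentation mask
    mask_bits: bytes         # mask packed one bit per pixel

def pack_layer(object_id, affines, mask):
    """Build a layer from affine matrices and a binary segmentation mask."""
    params = [list(map(float, np.ravel(a))) for a in affines]
    bits = np.packbits(mask.astype(np.uint8)).tobytes()
    return ObjectLayer(object_id, params, mask.shape, bits)

def unpack_mask(layer):
    """Recover the binary mask from its packed form."""
    n = layer.mask_shape[0] * layer.mask_shape[1]
    bits = np.unpackbits(np.frombuffer(layer.mask_bits, dtype=np.uint8))[:n]
    return bits.reshape(layer.mask_shape)

mask = np.arange(35).reshape(5, 7) % 3 == 0      # assumed toy mask
layer = pack_layer(1, [np.eye(2, 3)], mask)      # identity deformation for one frame
```

A decoder would read one such layer alongside each conventionally encoded stream and use it to de-normalize and composite the decoded pixels.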

Claims

A computer apparatus for generating an encoded form of video signal data from a plurality of video frames, comprising:

means for identifying corresponding elements of an object between two or more frames;

means for modeling the correspondences to generate modeled correspondences;

means for resampling pixel data in the video frames associated with the object, utilizing the modeled correspondences; and

means for restoring the spatial positions of the resampled pixel data, utilizing the modeled correspondences,

wherein the object is one or more objects, and

wherein the resampled data is an intermediate form of the data.

The apparatus of claim 1,

wherein the object is tracked by a tracking method,

the apparatus further comprising:

means for detecting an object in the video frame sequence; and

means for tracking the object through two or more frames of the video frame sequence,

wherein the object detecting and tracking means comprises a Viola/Jones face detection algorithm.

The apparatus of claim 1,

wherein the object is segmented from the video frame by a segmentation method,

the apparatus further comprising:

means for segmenting the pixel data associated with the object from other pixel data in the video frame sequence; and

means for assembling the restored pixels together with associated segmentation data to create an original video frame,

wherein the segmenting means comprises temporal integration.

The apparatus of claim 1,

wherein the correspondence models are factored into global models,

the apparatus further comprising means for integrating correspondence measurements into a model of global motion,

wherein the correspondence modeling means comprises a robust sampling consensus for the solution of a two-dimensional affine motion model, and a sampling population based on finite differences generated from block-based motion estimation between two or more video frames in the sequence.

The apparatus of claim 1,

wherein the intermediate data is further encoded,

the apparatus further comprising:

means for decomposing the normalized object pixel data into an encoded representation; and

means for reconstructing the normalized object pixel data from the encoded representation,

wherein the decomposing means comprises principal component analysis, and

wherein the reconstructing means comprises principal component analysis.

The apparatus of claim 5,

wherein non-object pixels of the frame are modeled in the same way as the object,

the object in this case being the residual non-object of the frame when the other objects are removed.

The apparatus of claim 5,

wherein the segmented and resampled pixels are combined with a conventional video compression/decompression process,

the apparatus further comprising:

means for supplying the resampled pixels as standard video data to a conventional video compression process; and

means for storing and transmitting the correspondence model data along with the corresponding encoded video data,

whereby the compression/decompression method enables the conventional video compression method to achieve increased compression efficiency.

The apparatus of claim 1,

wherein the correspondence models are factored into local deformation models,

the apparatus further comprising:

means for defining a two-dimensional mesh overlying the pixels corresponding to the object; and

means for integrating correspondence measurements into a model of local motion,

wherein the mesh defining means is based on a regular grid of vertices and edges, and the correspondence measurements comprise vertex displacements based on finite differences generated from block-based motion estimation between two or more video frames in the sequence.

The apparatus of claim 8,

wherein the vertices correspond to discrete image features,

the apparatus further comprising:

means for identifying significant image features corresponding to the object,

wherein the identifying means is an analysis of the image gradient Harris response.

A computer apparatus for separating data vectors residing in discrete linear subspaces, comprising:

means for performing subspace segmentation on a set of data vectors; and

means for constraining the subspace segmentation criteria through the application of tangent vector analysis in an implicit vector space,

wherein the subspace segmentation method is GPCA,

wherein the implicit vector space is a Veronese map, and

wherein the tangent space constraint is a Legendre transformation.