KR20020087957A

KR20020087957A - Preprocessing method applied to textures of arbitrarily shaped objects

Info

Publication number: KR20020087957A
Application number: KR1020027013368A
Authority: KR
Inventors: Stephane E. Valente (발렌테스테판이)
Original assignee: Koninklijke Philips Electronics N.V.
Priority date: 2001-02-06
Filing date: 2002-01-29
Publication date: 2002-11-23
Also published as: JP2004519155A; US6768495B2; US20020168114A1; CN1456014A; EP1360841A1; WO2002063883A1; JP3876227B2; CN1215720C

Abstract

The present invention relates to a method of preprocessing input data corresponding to arbitrarily shaped objects, the data comprising, for each object, a texture part and an object mask. For each object plane associated with an object, the method comprises the steps of: (1) partitioning the object plane into two-dimensional blocks; (2) introducing a set of basis vectors chosen so that an estimate of the original pixel values can be expressed as a linear combination of the basis vectors; (3) defining a cost function ψ that measures the distortion between the original representation of the pixel values and the estimate of that representation; and (4) finding the coefficients that minimize the cost function ψ. The finding step itself comprises an initialization operation, an operation of extracting the basis vectors restricted to the opaque pixels and computing the projection coefficients, an iteration operation, and an operation of interrupting the iterations according to a predetermined criterion.

Description

Preprocessing method applied to textures of arbitrarily shaped objects

The MPEG-4 standard, issued in 1999, proposed a unified way of efficiently encoding natural visual objects and synthetic images. In an encoder that processes such objects (typically composed of several layers, which may in turn contain arbitrarily shaped objects), each object takes the form of two components: an object mask, which may consist of binary or gray-level pixels and represents the alpha channel values used by the decoder for scene composition, and a texture part, which holds the values of the object's pixels. A white pixel in the mask means that the corresponding pixel in the texture part is opaque and therefore replaces the pixels of any object behind it in the layer hierarchy, whereas a black pixel means that the corresponding texture pixel is completely transparent, i.e. invisible. More specifically, the present invention addresses the encoding of the texture part.

To encode moving textures in an MPEG-4 encoder, the conventional method is to apply a DCT (discrete cosine transform) to image blocks. More precisely, the plane to be encoded is partitioned into macroblocks of 16 × 16 pixels, and the 16 × 16 luminance information is further partitioned into four 8 × 8 blocks, each encoded by a two-dimensional 8 × 8 DCT (the same 2D transform is used again for the two 8 × 8 blocks containing the U and V chrominance information). For arbitrarily shaped objects, an 8 × 8 block can be of three kinds: it contains only transparent pixels (no texture information needs to be encoded), only opaque pixels (the standard rectangular 8 × 8 DCT is used to encode the texture information), or at least one opaque pixel and at least one transparent pixel. The problem to be solved in this third case is the efficient encoding, in terms of bit consumption, of this partial texture information.
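The three block kinds can be illustrated with a minimal Python sketch. The function names `classify_block` and `split_macroblock`, the labels, and the list-of-rows layout are illustrative assumptions, not part of the patent; any non-zero mask value is treated as opaque.

```python
def classify_block(mask_block):
    """Classify an 8x8 mask block given as rows of values (0 = transparent)."""
    flat = [v for row in mask_block for v in row]
    if not any(flat):
        return "transparent"   # no texture information to encode
    if all(flat):
        return "opaque"        # standard rectangular 8x8 DCT applies
    return "boundary"          # mixed block: the case addressed here


def split_macroblock(plane, top, left):
    """Return the four 8x8 blocks of the 16x16 macroblock at (top, left)."""
    blocks = []
    for dy in (0, 8):
        for dx in (0, 8):
            blocks.append([row[left + dx:left + dx + 8]
                           for row in plane[top + dy:top + dy + 8]])
    return blocks


# Hypothetical 16x16 mask whose 10 leftmost columns are opaque.
mask = [[1 if x < 10 else 0 for x in range(16)] for _ in range(16)]
kinds = [classify_block(b) for b in split_macroblock(mask, 0, 0)]
print(kinds)
```

Only the blocks labelled "boundary" need the special treatment discussed in the rest of the description.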

A first approach is to fill the empty spaces by extending the texture boundary pixels (each sample at the boundary of the opaque region is replicated horizontally, to the left or to the right, to replace the transparent areas), after which the textures can be DCT encoded in the usual way as rectangular macroblocks. However, this padding method introduces patterns that may not be optimal from the point of view of the frequency spectrum: they may be flat in the horizontal direction yet vary randomly in the vertical direction, which results in unnecessary frequency components that consume more bits when the macroblocks are DCT encoded.
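A simplified sketch of such horizontal padding is given below. It is an illustration only: the helper name `pad_row` is hypothetical, and the rule of copying the nearest opaque sample in the row (ties resolved towards the left) is an assumption, since the text only says that boundary samples are replicated to the left or right.

```python
def pad_row(values, opaque):
    """Fill the transparent samples of one row with the nearest opaque sample."""
    idx = [i for i, o in enumerate(opaque) if o]
    if not idx:
        return list(values)            # nothing to replicate in this row
    out = []
    for i, v in enumerate(values):
        if opaque[i]:
            out.append(v)
        else:
            # Nearest opaque position; on a tie the leftmost one wins.
            nearest = min(idx, key=lambda j: (abs(j - i), j))
            out.append(values[nearest])
    return out


row = [0, 0, 5, 7, 0, 0, 0, 9]
row_mask = [0, 0, 1, 1, 0, 0, 0, 1]
padded = pad_row(row, row_mask)
print(padded)  # → [5, 5, 5, 7, 7, 7, 9, 9]
```

Because each row is padded independently, neighbouring rows can end up with unrelated values in the filled area, which is the source of the vertical irregularity mentioned above.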

Another solution, normalized within the MPEG-4 standard, is the so-called shape-adaptive DCT, which proceeds in two steps to encode a pattern such as that of Fig. 1 (given as an example). As shown in Fig. 2, all opaque pixels are first shifted to the uppermost position in the block to be encoded, and an adaptive one-dimensional n-DCT is then applied to each column, where n is the number of opaque pixels in that column (in the example of Fig. 2, 1-, 4-, 7-, 5-, 7- and 1-DCTs are applied in the vertical direction, from left to right). The resulting vertical DCT coefficients are then similarly shifted to the leftmost position in the block, which yields the pattern of Fig. 3, and a one-dimensional n-DCT is applied to each row (n being the number of opaque pixels in the row considered). Unfortunately, this method requires specific functions in the associated MPEG-4 decoder (as opposed to the classical 8 × 8 DCT algorithm used for completely opaque blocks). Moreover, the shift operations usually introduce high frequencies, because they bring together pixels or coefficients that are spatially separated and therefore have very little correlation.
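The vertical pass of this prior-art scheme can be sketched as follows. This is a rough illustration under stated assumptions: the orthonormal DCT-II scaling, the function names, and the example mask (built only to reproduce the column lengths 1, 4, 7, 5, 7, 1 quoted for Fig. 2, since the figure itself is not available here) are all hypothetical.

```python
import math


def dct_1d(x):
    """Orthonormal n-point DCT-II of a list of n samples."""
    n = len(x)
    out = []
    for u in range(n):
        scale = math.sqrt(1.0 / n) if u == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            x[t] * math.cos((2 * t + 1) * u * math.pi / (2 * n))
            for t in range(n)))
    return out


def shift_columns_up(block, mask):
    """For each column, keep only the opaque pixels, packed towards the top."""
    size = len(block)
    return [[block[r][c] for r in range(size) if mask[r][c]]
            for c in range(size)]


def sa_dct_vertical(block, mask):
    """Apply an n-point DCT to every shifted column (n = opaque pixel count)."""
    return [dct_1d(col) if col else []
            for col in shift_columns_up(block, mask)]


# Hypothetical mask: opaque pixels sit at the bottom of each column.
counts = [1, 4, 7, 5, 7, 1, 0, 0]
mask = [[1 if r >= 8 - counts[c] else 0 for c in range(8)] for r in range(8)]
block = [[10] * 8 for _ in range(8)]          # flat texture
coeffs = sa_dct_vertical(block, mask)
lengths = [len(col) for col in coeffs]
print(lengths)
```

The horizontal pass would repeat the same idea on the rows of shifted coefficients; it is omitted to keep the sketch short.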

The present invention relates to a method of preprocessing input data corresponding to picture elements (pixels) that represent arbitrarily shaped objects. For each object, the input data comprise a texture part, corresponding to the values of the pixels of the object, and an object mask, which subdivides the input data into a first and a second subset of data corresponding respectively to the fully or partially opaque pixels and to the transparent pixels of the texture part. The preprocessing method is provided for determining DCT (discrete cosine transform) coefficients corresponding to the opaque pixels and comprises, for each object considered, the steps of:

(1) partitioning the object plane into two-dimensional blocks;

(2) introducing a set of basis vectors chosen so that an estimate of the original pixel values in the picture area defined by a block can be expressed as a linear combination of the basis vectors;

(3) defining a cost function ψ that measures the distortion between the original representation of the pixel values and the estimate of that representation; and

(4) finding the coefficients that minimize the cost function ψ.

The invention, which efficiently encodes arbitrarily shaped textures, is particularly useful in relation to the MPEG-4 standard but is not limited to that application.

Figs. 1 to 3 illustrate a prior-art method (the shape-adaptive DCT) used to encode the texture pixels of an arbitrarily shaped object.

Fig. 4 is a flowchart showing the main steps of the preprocessing method according to the invention.

It is therefore an object of the present invention to provide a preprocessing method that avoids introducing such undesirable frequencies and leads to better coding efficiency.

To this end, the invention relates to a method as defined in the opening paragraph of the description, which is moreover characterized in that:

(a) the cost function ψ is given by a relation of the type

$$\psi = \left\| f_{\text{opaque}} - f^{E}_{\text{opaque}} \right\|^{2}, \qquad f^{E}_{\text{opaque}} = \sum_{i=1}^{64} c_i \, b_{\text{opaque}(i)}$$

where $f$ is the column vector of the pixels of the block concerned, $(b_i)$, $i \in (1 \text{ to } 64)$, are the basis vectors of the 8 × 8 DCT, $f_{\text{opaque}}$ is the restriction of $f$ to the opaque pixels of the block, $(b_{\text{opaque}(i)})$, $i \in (1 \text{ to } 64)$, is the restriction of the basis vectors to the locations of the opaque pixels of the block, and $f^{E}_{\text{opaque}}$ is called the reconstruction of $f_{\text{opaque}}$;

(b) the finding step itself comprises:

- an operation of initializing the parameters: iteration parameter $k = 0$, initial estimate $f^{E}_{\text{opaque}} = 0$, and initial reconstruction coefficients $c_i^{0} = 0$;

- an operation of extracting the basis vectors restricted to the opaque pixels and of computing the projection coefficients

$$p_i^{0} = \{ (f_{\text{opaque}} - f^{E}_{\text{opaque}}),\; b_{\text{opaque}(i)} \}$$

where $\{\,\}$ denotes the cross-correlation function, $i$ varies from 1 to 64, and the $(b_{\text{opaque}(i)})$ are the restricted basis vectors;

- an operation of iteration(s), each iteration being provided for carrying out the sub-steps of:

[a] finding the index $i^*$ of the basis vector that contributes most to minimizing the cost function;

[b] updating the reconstruction of $f_{\text{opaque}}$ according to the relation $f^{E}_{\text{opaque}}(k+1) = f^{E}_{\text{opaque}}(k) + p^{k}_{i^*} \cdot b_{\text{opaque}(i^*)}$;

[c] updating the reconstruction coefficients, with $c_i^{k+1} = c_i^{k}$ for $i \neq i^*$ and $c_{i^*}^{k+1} = c_{i^*}^{k} + p_{i^*}^{k}$, and the projection coefficients $p_i^{k+1}$;

- an operation of interrupting the iterations when the cost function ψ falls below a given threshold or when a predetermined number of iterations has been reached.
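The effect of one iteration on the cost can be worked out explicitly. The derivation below is a sketch that assumes the cross-correlation $\{\cdot,\cdot\}$ is the ordinary Euclidean inner product $\langle\cdot,\cdot\rangle$ and that the 8 × 8 DCT basis is orthonormal; the residual symbol $r^{k}$ is introduced here for convenience and does not appear in the source.

```latex
\begin{aligned}
r^{k} &= f_{\text{opaque}} - f^{E}_{\text{opaque}}(k), \qquad
\psi^{k} = \|r^{k}\|^{2}, \qquad
p^{k}_{i} = \langle r^{k}, b_{\text{opaque}(i)} \rangle, \\
r^{k+1} &= r^{k} - p^{k}_{i^*}\, b_{\text{opaque}(i^*)}, \\
\psi^{k+1} &= \psi^{k} - 2\,p^{k}_{i^*} \langle r^{k}, b_{\text{opaque}(i^*)} \rangle
            + (p^{k}_{i^*})^{2}\, \|b_{\text{opaque}(i^*)}\|^{2}
          = \psi^{k} - (p^{k}_{i^*})^{2} \bigl( 2 - \|b_{\text{opaque}(i^*)}\|^{2} \bigr), \\
p^{k+1}_{i} &= \langle r^{k+1}, b_{\text{opaque}(i)} \rangle
          = p^{k}_{i} - p^{k}_{i^*} \langle b_{\text{opaque}(i^*)}, b_{\text{opaque}(i)} \rangle .
\end{aligned}
```

Since each restricted vector is a sub-vector of a unit-norm vector, $\|b_{\text{opaque}(i^*)}\|^{2} \le 1$, so under these assumptions the cost never increases from one iteration to the next, and the projection coefficients can be updated without recomputing the residual.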

The invention will now be described, by way of example, with reference to the accompanying drawings.

The preprocessing method according to the invention is based on the classical 8 × 8 DCT, so that the existing decoder structure specified in MPEG-4 can be used, but it provides better coding efficiency by computing the DCT coefficients that best reconstruct the opaque pixels, regardless of the transparent ones, while minimizing the number of non-zero coefficients. The method is an adaptation of the approach described in "A projection onto the overcomplete basis approach for block loss recovery", J.H. Chang et al., Proceedings ICIP-97, October 26-29, 1997, Santa Barbara, California, USA, vol. II, pp. 93-96.

The method described in that document was originally intended as a technique for concealing lost information in corrupted MPEG-4 video streams (in which even a small error can spread over a large number of blocks, so that damaged image blocks must be identified and reconstructed). In that method, the basic idea is to estimate the original, undamaged pixel values from a set of undamaged values by introducing an overcomplete basis, representing the estimate of the lost block as a linear combination of the basis vectors, and finding the projection coefficients onto the basis vectors that minimize a distortion measure. With the following notations:

D = the damaged block,

N = the undamaged neighborhood of that block,

U = the union of D and N (= a larger block),

it is proposed to estimate the larger block U, which contains the damaged block D, from the undamaged neighborhood information N. If $f = (f_{ij})$, $(i,j) \in U$, denotes the original, undamaged pixel values (known only for $(i,j) \in N$), the task is to estimate $f$. Since the pixel values are known on N, the distortion of an estimate $f^{E}$ of $f$ can be evaluated there, and the distortion measure is defined as the following squared difference:

$$\psi = \left\| f_N - f^{E}_N \right\|^{2} = \sum_{(i,j) \in N} \left( f_{ij} - f^{E}_{ij} \right)^{2}$$

If $(b_\ell) = (b^{\ell}_{i,j})$ is a basis of U, and the set of basis vectors is chosen so that the original $f$ can be represented as a linear combination of the basis vectors, then, owing to intra-block correlation and under some associated assumptions, the damaged block and its neighborhood are likely to have similar spectral characteristics. Therefore, the projection coefficient computed on the neighborhood,

$$a^{N}_{\ell} = \langle f_N, b^{N}_{\ell} \rangle ,$$

can be a good estimate of the original coefficient

$$a_{\ell} = \langle f_U, b_{\ell} \rangle .$$

Thus, if the coefficients $a_\ell$ are such that $f^{E}_N = \sum_\ell a_\ell b_\ell$ is a good approximation of $f_N$, then $f^{E}_U = \sum_\ell a_\ell b_\ell$ is a good estimate of $f_U$ (the subscripts N and U denote the domain of the vectors).

The problem of reconstructing the damaged block then amounts to finding the coefficients $a_\ell$ that minimize ψ, which is possible with the iterative algorithm described in the document. For the preprocessing method that is the object of the present invention, the problem is now formulated with modified notations and considerations:

- $f$ is the column vector of the pixels of the macroblock to be encoded;

- $f_{\text{opaque}}$ is the restriction of $f$ to the opaque pixels of the macroblock;

- $B$ denotes the basis functions of the 8 × 8 DCT: $B = (b_i)$, $i \in (1 \text{ to } 64)$;

- $B_{\text{opaque}} = (b_{\text{opaque}(i)})$, $i \in (1 \text{ to } 64)$, denotes the restriction of these basis vectors to the locations of the opaque pixels.

The problem is then to find a compact set of coefficients $(c_i)$, with the maximum number of zero coefficients, that best reconstructs $f_{\text{opaque}}$ in the least-mean-square sense by minimizing the cost function

$$\psi = \left\| f_{\text{opaque}} - \sum_{i=1}^{64} c_i \, b_{\text{opaque}(i)} \right\|^{2} .$$

If the macroblock $f$ were completely opaque, there would be exactly one combination of DCT coefficients capable of reconstructing all its pixels (namely the rectangular 8 × 8 DCT). However, if only a specific part of $f$ is to be reconstructed, there is an infinity of DCT coefficient sets that reconstruct a block containing the same opaque pixels. In fact, determining the appropriate DCT coefficients (and the most compact set of coefficients) is not straightforward, because the basis functions of the DCT are no longer orthonormal once they are restricted to the locations of the opaque pixels. To find the coefficients that minimize the cost function ψ, the following iterative algorithm (shown in Fig. 4) is proposed, which successively searches for the projection coefficient with the maximum energy:
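The loss of orthonormality can be checked numerically. The sketch below builds the 64 basis vectors of an orthonormal 8 × 8 DCT-II (the exact scaling and the row-major flattening are assumptions), restricts them to a hypothetical mask whose left half is opaque, and compares a few inner products before and after restriction.

```python
import math


def dct_basis():
    """64 basis vectors of the orthonormal 8x8 DCT, each flattened row-major."""
    def a(u):
        return math.sqrt(1.0 / 8) if u == 0 else math.sqrt(2.0 / 8)
    return [[a(u) * a(v)
             * math.cos((2 * y + 1) * u * math.pi / 16)
             * math.cos((2 * x + 1) * v * math.pi / 16)
             for y in range(8) for x in range(8)]
            for u in range(8) for v in range(8)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


basis = dct_basis()

# Hypothetical mask: the four leftmost columns of the block are opaque.
mask = [1 if (j % 8) < 4 else 0 for j in range(64)]
opaque = [j for j in range(64) if mask[j]]
b_opaque = [[b[j] for j in opaque] for b in basis]   # restricted basis vectors

full_norm = dot(basis[0], basis[0])                  # unit norm on the full block
full_cross = dot(basis[0], basis[1])                 # orthogonal on the full block
restricted_norm = dot(b_opaque[0], b_opaque[0])      # shrinks to 32/64
restricted_cross = dot(b_opaque[0], b_opaque[1])     # no longer zero

print(full_norm, full_cross, restricted_norm, restricted_cross)
```

With 32 opaque pixels there are 32 equations for 64 unknown coefficients, which is why infinitely many coefficient sets reproduce the same opaque pixels and why a simple projection onto each vector no longer gives the answer directly.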

(1) First step (initialization INIT):

$k = 0$ (number of iterations)

$f^{E}_{\text{opaque}} = 0$ (initial estimate)

$c_i^{0} = 0$ (initial reconstruction coefficients)

(2) Second step (extraction sub-step EXTR and calculation sub-step CALC):

The projection coefficients $p_i^{0} = \{ (f_{\text{opaque}} - f^{E}_{\text{opaque}}),\; b_{\text{opaque}(i)} \}$ are computed, where $\{\,\}$ denotes the correlation function, $i$ varies from 1 to 64, and the $(b_{\text{opaque}(i)})$ are the extracted 8 × 8 DCT basis vectors restricted to the opaque pixels (the pixels with texture PWT, delimited by the shape mask SM).

(3) Third step (k iterations refining the estimate); each iteration, for example the k-th, itself comprises the following operations:

(a) finding (sub-step FIND) the index $i^*$ of the basis vector that captures the maximum energy of the residual:

$$i^* = \arg\max_{i = 1 \ldots 64} \left\| p^{k}_{i} \cdot b_{\text{opaque}(i)} \right\|^{2}$$

(b) updating (sub-step UOEA) the reconstruction of $f_{\text{opaque}}$ according to the relation:

$$f^{E}_{\text{opaque}}(k+1) = f^{E}_{\text{opaque}}(k) + p^{k}_{i^*} \cdot b_{\text{opaque}(i^*)}$$

(c) updating (sub-step UPDA) the reconstruction coefficients:

$$c_i^{k+1} = c_i^{k} \ \text{for } i \neq i^*, \qquad c_{i^*}^{k+1} = c_{i^*}^{k} + p_{i^*}^{k}$$

(d) updating (sub-step UPDA) the remaining projection coefficients:

$$p_i^{k+1} = p_i^{k} - p_{i^*}^{k} \, \{ b_{\text{opaque}(i^*)},\; b_{\text{opaque}(i)} \}$$

(4) Fourth step (interruption test of the iterative algorithm, sub-step TEST):

if the residual falls below a given threshold ε,

$$\left\| f_{\text{opaque}} - f^{E}_{\text{opaque}}(k+1) \right\|^{2} < \varepsilon ,$$

or if a predetermined number of iterations $k_{\max}$ has been reached, the iterative process stops (answer YES to the test); otherwise, as long as neither condition is satisfied, the iterations of the third step (3) continue (answer NO to the test). At the end of the algorithm, the $c_i^{k}$ are the 8 × 8 DCT coefficients that generate the arbitrarily shaped opaque pixels.
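The four steps can be put together in a short pure-Python sketch. It is an illustration under stated assumptions rather than a reference implementation: the correlation $\{\cdot,\cdot\}$ is taken to be the Euclidean inner product, the DCT is the orthonormal 8 × 8 DCT-II flattened row-major, the projection coefficients are updated by subtracting the contribution of the selected vector (which follows from the definition of $p_i$ by linearity), and the names `fit_opaque`, `eps` and `k_max` as well as their default values are hypothetical.

```python
import math


def dct_basis():
    """64 basis vectors of the orthonormal 8x8 DCT, each flattened row-major."""
    def a(u):
        return math.sqrt(1.0 / 8) if u == 0 else math.sqrt(2.0 / 8)
    return [[a(u) * a(v)
             * math.cos((2 * y + 1) * u * math.pi / 16)
             * math.cos((2 * x + 1) * v * math.pi / 16)
             for y in range(8) for x in range(8)]
            for u in range(8) for v in range(8)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_opaque(block, mask, eps=1e-3, k_max=500):
    """Greedy search for DCT coefficients that reproduce the opaque pixels.

    block, mask: flat lists of 64 values (mask: non-zero = opaque).
    Returns (c, history); history[k] is the cost psi after k iterations.
    """
    opaque = [j for j in range(64) if mask[j]]
    f_op = [block[j] for j in opaque]
    b_op = [[b[j] for j in opaque] for b in dct_basis()]    # EXTR
    gram = [[dot(bi, bj) for bj in b_op] for bi in b_op]    # {b_op(i), b_op(j)}

    c = [0.0] * 64                                          # INIT: c_i^0 = 0
    f_est = [0.0] * len(opaque)                             # INIT: f^E_opaque = 0
    p = [dot(f_op, bi) for bi in b_op]                      # CALC: p_i^0
    psi = dot(f_op, f_op)
    history = [psi]
    k = 0
    while psi >= eps and k < k_max:                         # TEST
        # FIND: vector capturing the maximum energy of the residual.
        i_star = max(range(64), key=lambda i: p[i] ** 2 * gram[i][i])
        step = p[i_star]
        # UOEA: update the reconstruction of the opaque pixels.
        f_est = [e + step * b for e, b in zip(f_est, b_op[i_star])]
        # UPDA: update the reconstruction and projection coefficients.
        c[i_star] += step
        p = [p[i] - step * gram[i_star][i] for i in range(64)]
        residual = [x - e for x, e in zip(f_op, f_est)]
        psi = dot(residual, residual)
        history.append(psi)
        k += 1
    return c, history


# Hypothetical boundary block: opaque on and below the diagonal, ramp texture.
mask = [1 if x <= y else 0 for y in range(8) for x in range(8)]
block = [100 + 10 * x + 5 * y for y in range(8) for x in range(8)]
c, history = fit_opaque(block, mask)
nonzero = sum(1 for v in c if abs(v) > 1e-9)
print(f"{len(history) - 1} iterations, {nonzero} non-zero coefficients, "
      f"final cost {history[-1]:.4f}")
```

The values stored at transparent positions of `block` are never read, so the resulting coefficients depend only on the opaque pixels; feeding `c` to a standard 8 × 8 inverse DCT then yields a full block whose opaque positions approximate the original texture.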

The preprocessing method thus described may then be followed by the usual operations generally provided for texture encoding, i.e. quantization, individual coefficient prediction (if needed) to further reduce their entropy, scanning, and variable-length encoding of the coefficients, as provided for the DCT coefficients of a fully opaque block in a conventional MPEG coding strategy. Although the invention has been described above in relation to the specific example of the MPEG-4 standard, it is not restricted or limited to it. The invention is not limited to any particular coding strategy applied to the resulting output bitstream.
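As a rough illustration of why a compact coefficient set helps these later stages, the sketch below quantizes 64 coefficients uniformly and reorders them in a zig-zag scan. Everything in it is an assumption made for illustration: the text does not specify the quantizer, the step size, or the scan order, and coefficient prediction and variable-length coding are left out.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))


def quantize(coeffs, step):
    """Uniform quantization, rounding to the nearest level."""
    def q(v):
        level = int(abs(v) / step + 0.5)
        return level if v >= 0 else -level
    return [q(v) for v in coeffs]


def scan(levels):
    """Reorder 64 row-major levels in zig-zag order and drop trailing zeros."""
    ordered = [levels[r * 8 + c] for r, c in zigzag_order()]
    while ordered and ordered[-1] == 0:
        ordered.pop()
    return ordered


# Hypothetical sparse coefficient set with three non-zero low-frequency terms.
coeffs = [0.0] * 64
coeffs[0], coeffs[1], coeffs[8] = 800.0, -37.0, 12.4
scanned = scan(quantize(coeffs, 8))
print(scanned)  # → [100, -5, 2]
```

The fewer non-zero coefficients survive quantization, the shorter the scanned sequence that the variable-length coder has to represent.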

Claims

A method of preprocessing input data corresponding to picture elements (pixels) that represent arbitrarily shaped objects, the input data comprising, for each object, a texture part corresponding to the values of the pixels of the object, and an object mask that subdivides the input data into first and second subsets of data corresponding respectively to fully or partially opaque pixels and to transparent pixels in the texture part, the preprocessing method being provided for determining DCT (discrete cosine transform) coefficients corresponding to the opaque pixels and comprising, for each object considered, the steps of:

(1) partitioning the object plane into two-dimensional blocks,

(2) introducing a set of basis vectors chosen so that an estimate of the original pixel values in the picture area defined by a block can be expressed as a linear combination of the basis vectors,

(3) defining a cost function ψ that measures the distortion between the original representation of the pixel values and the estimate of that representation, and

(4) finding the coefficients that minimize the cost function ψ,

the method of preprocessing the input data being characterized in that:

(a) the cost function ψ is given by a relation of the type

$$\psi = \left\| f_{\text{opaque}} - f^{E}_{\text{opaque}} \right\|^{2}, \qquad f^{E}_{\text{opaque}} = \sum_{i=1}^{64} c_i \, b_{\text{opaque}(i)}$$

where $f$ is the column vector of the pixels of the block concerned, $(b_i)$, $i \in (1 \text{ to } 64)$, are the basis vectors of the 8 × 8 DCT, $f_{\text{opaque}}$ is the restriction of $f$ to the opaque pixels of the block, $(b_{\text{opaque}(i)})$, $i \in (1 \text{ to } 64)$, is the restriction of the basis vectors to the locations of the opaque pixels of the block, and $f^{E}_{\text{opaque}}$ is called the reconstruction of $f_{\text{opaque}}$;

(b) the finding step itself comprises:

- an operation of initializing the parameters: iteration parameter $k = 0$, initial estimate $f^{E}_{\text{opaque}} = 0$, and initial reconstruction coefficients $c_i^{0} = 0$;

- an operation of extracting the basis vectors restricted to the opaque pixels and of computing the projection coefficients

$$p_i^{0} = \{ (f_{\text{opaque}} - f^{E}_{\text{opaque}}),\; b_{\text{opaque}(i)} \}$$

where $\{\,\}$ denotes the cross-correlation function, $i$ varies from 1 to 64, and the $(b_{\text{opaque}(i)})$ are the restricted basis vectors;

- an operation of iteration(s), each iteration being provided for carrying out the sub-steps of:

[a] finding the index $i^*$ of the basis vector that contributes most to minimizing the cost function;

[b] updating the reconstruction of $f_{\text{opaque}}$ according to the relation $f^{E}_{\text{opaque}}(k+1) = f^{E}_{\text{opaque}}(k) + p^{k}_{i^*} \cdot b_{\text{opaque}(i^*)}$;

[c] updating the reconstruction coefficients, with $c_i^{k+1} = c_i^{k}$ for $i \neq i^*$ and $c_{i^*}^{k+1} = c_{i^*}^{k} + p_{i^*}^{k}$, and the projection coefficients $p_i^{k+1}$;

- an operation of interrupting the iterations if the cost function ψ is below a given threshold or if a predetermined number of iterations has been reached.

The method of claim 1, characterized in that the cost function ψ is defined by the relation

$$\psi = \left\| f_{\text{opaque}} - \sum_{i=1}^{64} c_i \, b_{\text{opaque}(i)} \right\|^{2}$$

and the sub-step [a] is provided for finding the index $i^*$ of the basis vector that captures the maximum energy of the residual:

$$i^* = \arg\max_{i = 1 \ldots 64} \left\| p^{k}_{i} \cdot b_{\text{opaque}(i)} \right\|^{2}$$

A method of coding input data corresponding to the texture of arbitrarily shaped objects, the coding method comprising at least the operations of DCT transformation of the texture input data, quantization of the coefficients obtained from the transformation, differential prediction of the data to be coded, and variable-length encoding of the quantized coefficients, characterized in that the DCT coefficients are obtained by carrying out the preprocessing method according to claim 1 and are the coefficients $c_i^{k}$ corresponding to the opaque pixels of the arbitrarily shaped objects.

An encoding device for coding input data corresponding to the texture of arbitrarily shaped objects, the encoding device comprising at least means for carrying out a DCT transformation of the texture input data, quantization of the coefficients obtained from the transformation, differential prediction of the data to be coded, and variable-length encoding of the quantized coefficients, characterized in that the DCT coefficients are obtained by carrying out the preprocessing method according to claim 1 and are the coefficients $c_i^{k}$ corresponding to the opaque pixels of the arbitrarily shaped objects.