KR20000031283A - Image coding device

Info

Publication number: KR20000031283A
Application number: KR1019980047239A
Authority: KR
Inventors: 정영안
Original assignee: 구자홍; 엘지전자 주식회사
Priority date: 1998-11-02
Filing date: 1998-11-02
Publication date: 2000-06-05
Also published as: KR100323701B1

Abstract

PURPOSE: An image coding device is provided that improves the efficiency of zerotree coding by rearranging DCT (Discrete Cosine Transform) coefficients and then zerotree-coding them. CONSTITUTION: The image coding device comprises a transform unit, a rearrangement unit (201), and a zerotree coding unit (202). The transform unit divides an input frame into a plurality of blocks and transforms each block from the spatial domain to the frequency domain. The rearrangement unit classifies the transform coefficients from the transform unit according to their importance to the image information. The zerotree coding unit codes the positions and magnitudes of the rearranged coefficients in order of importance.

Description

Video encoding device

The present invention relates to an apparatus for encoding video signals, and more particularly to an encoding apparatus suited to low-bit-rate video transmission for videophone systems over the public switched telephone network (PSTN).

International standards for conventional image coding include JPEG (Joint Photographic Experts Group) for still-image coding and decoding, MPEG (Moving Picture Experts Group) for video coding and decoding, and the low-bit-rate video standards H.261 and H.263.

In particular, as the need for video communication over the existing public switched telephone network (PSTN) grows, research on very low bit rate video coding is active in both coding algorithms and standardization. A representative example is Recommendation H.263 of the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector; formerly CCITT), which uses a motion-compensated hybrid DPCM/DCT (Differential Pulse Code Modulation / Discrete Cosine Transform) coding method suited to very low bit rate transmission for videophone systems.

Such a video coding method applies the DCT to the input digital video signal and quantizes the result, then reconstructs the quantized signal, detects its error relative to the original video signal, estimates motion, and controls the quantization step to reach the desired bit rate.

FIG. 1 is a block diagram of a conventional discrete cosine transform (DCT) based video encoding apparatus. Video encoding is broadly divided into I (intra) frame coding and P (predictive) frame coding.

For an I frame, the input video bitstream is passed unchanged to the DCT unit (101). For a P frame, the output of the subtractor (110), that is, the difference between the motion-predicted data and the current input bitstream, is passed to the DCT unit (101).

The DCT unit (101) removes correlation in the data through a two-dimensional transform. To do so, it divides the input frame into blocks and transforms each block, converting the image of each block from the spatial domain to the frequency domain. The transformed data tend to concentrate in one direction (the low-frequency side), and the quantization unit quantizes only these concentrated data. Quantization uses quantization parameters, namely a weighting matrix and a quantizer scale code. The weighting matrix gives the weight of each DCT coefficient, and the quantizer scale code determines the quantization step.
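The concentration of energy toward the low-frequency side can be illustrated with a minimal sketch. It assumes an 8×8 block size and an orthonormal DCT-II; the function name `dct2` and the test block are illustrative and do not come from the patent.

```python
import math

N = 8  # assumed block size; this paragraph only says "blocks"

def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def scale(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * s
    return out

# A smooth block: a gentle horizontal ramp (hypothetical pixel values).
block = [[100 + 2 * y for y in range(N)] for _ in range(N)]
coeffs = dct2(block)

spatial_energy = sum(p * p for row in block for p in row)
total_energy = sum(q * q for row in coeffs for q in row)
low_energy = sum(coeffs[u][v] ** 2 for u in range(2) for v in range(2))
print(f"share of energy in the 2x2 low-frequency corner: {low_energy / total_energy:.4f}")
```

Because the transform is orthonormal, the total energy is the same before and after the transform; only its distribution over the 64 coefficients changes.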

Each quantized coefficient is output to the entropy coding unit (103). The entropy coding unit (103) applies variable length coding (VLC), representing frequent values with few bits and rare values with many bits to reduce the total number of bits, and then transmits the result over the channel (104).

The quantized data are also dequantized by the inverse quantization unit (105), inverse-transformed by the IDCT unit (106), and output to the adder (107). The adder (107) adds the motion-predicted data from the motion prediction unit (109) to the IDCT output and stores the sum in the frame memory (108).

Meanwhile, in pictures that are consecutive in time, motion of people or objects occurs mainly in the central part of the screen, and the motion prediction unit (109) uses this property to remove temporal redundancy. That is, parts of the screen that have not changed, or that have moved but remain similar, are taken from the immediately preceding picture, which greatly reduces the amount of data to be transmitted.

That is, when the output of the motion prediction unit (109) and the IDCT output are added by the adder (107) and stored in the frame memory (108), the data stored in the frame memory (108) serve as the immediately preceding picture when the motion prediction unit (109) estimates the motion of the current input picture.

Finding the most similar block between pictures is called motion estimation, and the displacement indicating how far the block has moved is called the motion vector (MV). The motion vector is transmitted together with the VLC-coded transform coefficient information. The motion vector is itself VLC-coded to obtain maximum coding efficiency.
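As an illustration of motion estimation, the following sketch performs a full search with the sum of absolute differences (SAD) as the matching cost. The cost measure, the 4×4 block size, the ±2 search range, and all names are assumptions made for the example; the text does not specify them.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(n) for i in range(n))

def estimate_motion(cur, ref, bx, by, n=4, search=2):
    """Full search over a +/-search window; returns (dx, dy, best_sad)."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + n <= w
                      and 0 <= by + dy and by + dy + n <= h)
            if not inside:
                continue  # candidate block would leave the reference picture
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best

# Hypothetical reference picture with a distinctive texture. The current
# picture is the reference shifted right by one pixel, so the content at
# column x of `cur` comes from column x - 1 of `ref`.
ref = [[(x * 7 + y * 13) % 31 for x in range(12)] for y in range(12)]
cur = [[ref[y][max(x - 1, 0)] for x in range(12)] for y in range(12)]

mv = estimate_motion(cur, ref, bx=4, by=4)
print(mv)  # (-1, 0, 0): best match lies one pixel to the left, with zero error
```

The first two entries of the result are the motion vector; the third is the residual cost, which is zero here because the shifted block matches exactly.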

That is, for the motion prediction unit (109) to perform motion estimation, the motion vector (MV) must first be obtained. Up to four MVs may occur per macroblock, and sending them as they are would cost too many bits, so only the difference from the MV of the immediately preceding macroblock is VLC-coded and transmitted.
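The differential coding of motion vectors described here can be sketched as follows. The predictor for the first vector is assumed to be (0, 0), and the function names are illustrative; the VLC step that would follow is omitted.

```python
def mv_differences(mvs):
    """Difference between each motion vector and the previous one.
    The predictor for the first vector is assumed to be (0, 0)."""
    prev = (0, 0)
    out = []
    for mv in mvs:
        out.append((mv[0] - prev[0], mv[1] - prev[1]))
        prev = mv
    return out

def mv_reconstruct(diffs):
    """Decoder side: accumulate the differences to recover the vectors."""
    prev = (0, 0)
    out = []
    for d in diffs:
        prev = (prev[0] + d[0], prev[1] + d[1])
        out.append(prev)
    return out

mvs = [(3, -1), (3, -1), (4, -1), (4, 0)]  # hypothetical macroblock MVs
diffs = mv_differences(mvs)
print(diffs)  # [(3, -1), (0, 0), (1, 0), (0, 1)]
```

Neighboring macroblocks tend to move together, so most differences are zero or small, and a variable length code can represent them with short codewords.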

The motion compensation process of the motion prediction unit (109) uses forward and backward prediction blocks, and there are two kinds of motion-compensated frame. A P frame is motion-compensated by forward prediction only and is itself used to predict the next P frame. A P frame is also used for the forward and backward prediction of B frames (bidirectionally predicted frames). A B frame, however, is not itself used for prediction.

An I frame, by contrast, is the reference picture when compression-coding a sequence; DCT and quantization are applied to the original signal block by block, removing only spatial redundancy.

That is, the first frame is basically coded as an I frame and, depending on the standard, when transmitted packets are lost the receiver may request an I frame at any time, upon which the transmitter sends one. Because the I frame is used for motion prediction of P and B frames, the I-frame coding result strongly affects the coding efficiency of the subsequent P and B frames. Moreover, the background of the image is hardly coded in the subsequent P frames until a scene change occurs after I-frame coding, so the I-frame coding result continues to affect subjective image quality. In short, if the I frame is coded well, the subsequent P frames are also coded well.

P-frame coding is divided into two main parts: motion-compensated prediction, which exploits the high visual correlation between adjacent frames, and coding of the DFD (Displaced Frame Difference), which is the prediction error after motion compensation.

The DFD is the output of the subtractor (110), that is, the difference signal between the current frame and the previous frame shifted by the motion vector. DFD coding usually accounts for most of the bits generated for a P frame.

In most standards, the DFD is coded in the same way as an I-frame image. This, however, fails to exploit the fact that natural images and DFD images have different characteristics. A DFD image has less spatial correlation than a natural image, which consists mainly of smooth regions; that is, it has far more mid- and high-frequency components. Its energy compaction under the DCT is therefore lower than that of a natural image, and as a result the efficiency of conventional run-length coding with zigzag scanning also drops.

Moreover, at very low bit rates the number of DCT coefficients that are coded is very small and each coefficient is represented with a very coarse quantization level, so blocking and ringing artifacts become pronounced in the reconstructed image.

Another issue in video coding is bit rate control. Because bits are generated with a fixed quantization parameter and coded coefficient by coefficient, an iterative procedure is needed to meet a target bit rate, and even then accurate bit rate control is very difficult.

In addition, I-frame coding controls the bit rate simply by adjusting the quantization interval (quantization parameter × 2). At very low bit rates the large quantization interval therefore increases the error in the DCT coefficients, and this error keeps affecting the coding of P frames, including motion estimation and compensation, becoming a major cause of degraded overall coding performance.
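The effect of a larger quantization interval can be seen in a small sketch. It assumes a plain uniform quantizer with step 2 × QP, truncation toward zero, and reconstruction at the lower edge of each interval; the actual reconstruction rule of a standard such as H.263 differs in detail, and the coefficient values are hypothetical.

```python
def quantize(coeff, qp):
    """Uniform quantizer with step 2 * qp, truncating toward zero."""
    level = abs(coeff) // (2 * qp)
    return level if coeff >= 0 else -level

def dequantize(level, qp):
    """Reconstruct at the lower edge of the interval (a simplification)."""
    return level * 2 * qp

def total_error(coeffs, qp):
    """Sum of absolute reconstruction errors for one list of coefficients."""
    return sum(abs(c - dequantize(quantize(c, qp), qp)) for c in coeffs)

coeffs = [310, -47, 22, 9, -5, 3]  # hypothetical DCT coefficients
for qp in (2, 8, 16):
    levels = [quantize(c, qp) for c in coeffs]
    print(qp, levels, total_error(coeffs, qp))
```

As QP grows, more of the small coefficients fall to level 0 and the total error rises, which is the behavior the paragraph attributes to very low bit rates.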

Furthermore, at very low bit rates (12 to 48 kbps) the bits spent on I-frame coding usually account for 40% to 70% of all generated bits, so efficient I-frame coding is essential to improving overall coding performance.

Meanwhile, since Shapiro's EZW (Embedded Zerotree Wavelet) concept was introduced, embedded zerotree image coding, which shows better rate-distortion performance in still-image compression than the conventional DCT-based image coder (JPEG), has been actively studied.

Embedded zerotree coding is used mostly for coding wavelet-transformed coefficients. Using the zerotree structure of the wavelet coefficients, position and amplitude information is coded in order of significance, producing a bitstream ordered by significance, that is, an embedded bitstream.

This method offers not only excellent compression performance but also a simple algorithm, scalability across resolutions and quality levels, and precise bit rate control. Even if transmission of the bitstream stops at an arbitrary point, a good-quality image for the bit rate received so far is obtained, and rate control is very easy.

The main features of embedded zerotree coding are that it predicts the positions of significant coefficients across bands using the self-similarity of the wavelet transform, and that it performs successive approximation quantization (SAQ), in which the magnitudes of the wavelet coefficients are approximated progressively.

The method is outlined as follows.

The input image is decomposed by the wavelet transform into subbands of different resolutions.

The coarsest band holds the low-frequency components of the original image, and the other bands hold the detailed high-frequency components. Except in the highest-frequency bands, every coefficient in a given band is related to coefficients of similar orientation in the next finer band.

Accordingly, a coefficient in a coarse band is called a parent, and the set of coefficients at the same spatial location and similar orientation in the next finer band is called its children.

A parent node in the lowest-frequency band has three children, one in each of the other orientations.

That is, EZW builds a data structure called a zerotree from the parent-children relationship described above.
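The parent-children relationship can be written as an index mapping. The sketch below assumes the common square dyadic layout in which the lowest band occupies the top-left corner and each finer band is twice as large; the function name and the layout convention are assumptions, not taken from the patent.

```python
def children(r, c, size, levels):
    """Children of coefficient (r, c) in a square dyadic wavelet layout.
    `size` is the side length of the coefficient array; the lowest band is
    the top-left square of side size >> levels."""
    ll = size >> levels
    if r < ll and c < ll:
        # Lowest band: one child in each of the three orientations.
        return [(r, c + ll), (r + ll, c), (r + ll, c + ll)]
    if 2 * r >= size or 2 * c >= size:
        return []  # coefficients in the finest bands have no children
    return [(2 * r, 2 * c), (2 * r, 2 * c + 1),
            (2 * r + 1, 2 * c), (2 * r + 1, 2 * c + 1)]

# 8 x 8 array, two decomposition levels: the lowest band is 2 x 2.
print(children(0, 0, 8, 2))  # [(0, 2), (2, 0), (2, 2)]
print(children(0, 2, 8, 2))  # [(0, 4), (0, 5), (1, 4), (1, 5)]
print(children(0, 4, 8, 2))  # []
```

Following `children` repeatedly from a lowest-band coefficient enumerates the whole tree that a single zerotree symbol can cover.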

The zerotree structure exploits the property that if a wavelet coefficient in a coarse band is smaller than a given threshold, its children are also likely to be small. This structure closely resembles the zigzag scanning and EOB (End Of Block) concepts commonly used to code DCT coefficients.

For example, EZW scans the coefficients band by band.

That is, a parent's children are scanned only after all neighboring parents in the same band have been scanned.

Each coefficient is compared with the current threshold. If the absolute value of the coefficient is greater than the threshold, it is coded as either a negative or a positive significance symbol.

The zerotree root symbol codes a parent for which every child in the zerotree structure is below the threshold.

The isolated zero symbol codes a coefficient that is itself below the threshold but has at least one child at or above the threshold.
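The four symbols can be summarized in a short classification routine. This is a sketch under stated assumptions: the tree is given as a hypothetical dictionary from node to children, significance is tested with "at or above the threshold", and the symbol names `POS`, `NEG`, `ZTR`, and `IZ` are labels chosen for the example.

```python
def descendants(node, tree):
    """Yield every descendant of `node`; `tree` maps a node to its children."""
    for child in tree.get(node, []):
        yield child
        yield from descendants(child, tree)

def classify(node, value, tree, threshold):
    """Symbol for one coefficient at the given threshold.
    `value` maps each node to its coefficient."""
    v = value[node]
    if abs(v) >= threshold:
        return "POS" if v > 0 else "NEG"
    if all(abs(value[d]) < threshold for d in descendants(node, tree)):
        return "ZTR"  # zerotree root: the node and all descendants are insignificant
    return "IZ"       # isolated zero: insignificant, but some descendant is significant

# Hypothetical five-node tree and coefficient values.
tree = {"a": ["b", "c"], "b": ["d"], "c": ["e"]}
value = {"a": 5, "b": -40, "c": 3, "d": 33, "e": -2}

symbols = {n: classify(n, value, tree, 32) for n in value}
print(symbols)
```

In a full coder the descendants of a zerotree root are skipped during the scan, which is where the bit savings come from; this sketch classifies every node only to show the rule.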

EZW then further codes the coefficients found to be significant using successive approximation quantization (SAQ). Successive approximation of the wavelet coefficients produces an embedded bitstream ordered from most to least significant bit.
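Successive approximation of a single coefficient magnitude can be sketched as follows. The reconstruction at the midpoint of the current uncertainty interval and the function name are assumptions made for illustration.

```python
def saq_magnitude(magnitude, t0, passes):
    """Approximate one magnitude with thresholds t0, t0/2, t0/4, ...
    Returns the decoder's reconstruction after each pass."""
    recon = []
    low = None  # lower edge of the uncertainty interval; None = not yet significant
    t = t0
    for _ in range(passes):
        if low is None:
            if magnitude >= t:
                low = t      # newly significant: magnitude lies in [t, 2t)
        elif magnitude >= low + t:
            low += t         # refinement bit 1: upper half of the interval
        # After either step the interval is [low, low + t); use its midpoint.
        recon.append(0 if low is None else low + t / 2)
        t /= 2
    return recon

print(saq_magnitude(45, 32, 4))  # [48.0, 40.0, 44.0, 46.0]
print(saq_magnitude(10, 32, 4))  # [0, 0, 12.0, 10.0]
```

Each pass halves the width of the interval, so truncating the bitstream after any pass still leaves a usable approximation of every significant coefficient.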

Because wavelets carry both frequency and spatial information, EZW allows spatial grouping and quantization of the data.

The key to improving the efficiency of the successive approximation algorithm for bands of the same orientation is to predict zero and nonzero values efficiently.

Among the many wavelet-based image coders, the SPIHT (Set Partitioning in Hierarchical Trees) coder of Said and Pearlman, which uses an improved zerotree method, is an embedded image coder with excellent compression performance.

The SPIHT method removes redundancy between coefficients efficiently by removing tree nodes that have already been coded and updating the significant nodes against a threshold that is halved at each pass.

The main reason this method improves so much on EZW is its improved zerotree structure, which exploits the property that significant coefficients are concentrated mainly in the lowest band.

These methods are very efficient for still-image compression, which mainly uses high-resolution (512×512) images. In low-bit-rate video compression, which mainly uses low resolutions such as QCIF (176×144), coding performance drops markedly because of the filter characteristics and insufficient band decomposition, that is, degraded wavelet spatial-frequency characteristics, which lower zerotree coding efficiency.

Also, when EZW is applied to DFD coding, the conventional wavelet-based pyramid decomposition compacts energy into the upper levels less efficiently and fails to represent the mid- and high-frequency components properly, so the overall coding efficiency is actually lower than that of DCT-based methods.

Furthermore, since most video compression standards use the DCT to code I frames and the error with respect to the motion-compensated image (that is, the DFD), compatibility with existing video codecs is also a significant problem.

The present invention is intended to solve the above problems. Its object is to provide a video encoding apparatus that rearranges DCT coefficients into a two-level pyramid structure and then zerotree-codes them, thereby raising zerotree coding efficiency while remaining compatible with existing DCT-based coders.

FIG. 1 is a block diagram of a conventional DCT-based video encoding apparatus.

FIG. 2 is a block diagram of a video encoding apparatus according to the present invention.

FIG. 3 illustrates the relationship between the DCT and wavelets.

FIG. 4 shows how the DCT coefficients of FIG. 2 are rearranged into a two-level pyramid structure.

FIG. 5 shows the parent-children relationship in the rearrangement of the DCT coefficients of FIG. 2 into a two-level pyramid structure.

FIG. 6 shows an example performance comparison between the present invention and the prior art.

Explanation of reference numerals for main parts of the drawings

101 : DCT unit    102 : quantization unit

103 : entropy coding unit    104 : channel

105 : inverse quantization unit    106 : IDCT unit

107 : adder    108 : frame memory

109 : motion prediction unit    110 : subtractor

111 : bit rate control unit    201 : rearrangement unit

202 : zerotree coding unit    203 : inverse rearrangement unit

The video encoding apparatus according to the present invention comprises: a transform unit that divides an input frame into a plurality of blocks and transforms each block from the spatial domain to the frequency domain; a rearrangement unit that classifies the transform coefficients from the transform unit in order of importance, according to how much information needed for image reproduction they carry, and rearranges them; and a zerotree coding unit that codes the position and magnitude information of the rearranged coefficients in order of importance and outputs a bitstream ordered by importance.

For an I frame, the transform unit transforms the input data to the frequency domain by the DCT.

For a P frame, the transform unit performs motion-compensated prediction using the previous I or P frame and then transforms the difference between the motion-compensated data and the current input data to the frequency domain by the DCT.

The transform unit codes each block of a P frame in intra or inter mode.

The rearrangement unit classifies the DCT coefficients from the transform unit using the wavelet interpretation and then rearranges them into a two-level pyramid structure.

The two-level pyramid structure of the rearrangement unit is the structure obtained by independently wavelet-transforming the four bands produced by uniform band decomposition.

The rearrangement unit decomposes the DCT coefficients into the DC coefficient and the AC coefficients by frequency and, to exploit the spatial correlation of the DC values, classifies and rearranges them such that only the lowest-frequency band is rearranged once more on the basis of the defined parent-children relationship.

The rearrangement unit biases the DC values for quantization of the rearranged DCT coefficients.

For an I frame, the rearrangement unit biases the DC values by the DC mean, subtracting the mean of the DC coefficients from each DC coefficient.

For a P frame, the rearrangement unit biases the DC coefficients by the DC mean, subtracting from each DC coefficient the mean of the DCT coefficients of the intra blocks in that frame.

For inter-mode blocks of a P frame, the rearrangement unit sets the DC mean to 0 and does not bias the DC values.
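The DC biasing rules for I frames, intra blocks of P frames, and inter blocks can be sketched together. The sketch assumes that the mean is taken over the DC coefficients of the intra blocks and that the mean is returned so a decoder could add it back; the function name, the mode labels, and the sample values are illustrative.

```python
def bias_dc(dc_values, modes=None):
    """Subtract the DC mean from the DC coefficients of intra blocks.
    dc_values: one DC coefficient per block.
    modes: "intra" or "inter" per block; None means all intra (an I frame).
    Returns (biased DC values, mean that was subtracted)."""
    if modes is None:
        modes = ["intra"] * len(dc_values)
    intra = [dc for dc, m in zip(dc_values, modes) if m == "intra"]
    mean = sum(intra) / len(intra) if intra else 0.0
    biased = [dc - mean if m == "intra" else dc  # inter blocks: mean taken as 0
              for dc, m in zip(dc_values, modes)]
    return biased, mean

# I frame: every block is intra.
print(bias_dc([800, 840, 860, 820]))
# P frame: only the intra blocks are biased.
print(bias_dc([12, 820, -5, 840], ["inter", "intra", "inter", "intra"]))
```

Removing the mean brings the intra DC values to the same scale as the other coefficients, so fewer threshold passes are spent on the DC offset alone.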

The zerotree coding unit compares each rearranged coefficient with a series of thresholds and codes the positions and signs of the significant coefficients. The initial threshold is the largest power of 2 smaller than the maximum coefficient magnitude, and it is halved at each pass.
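The threshold schedule can be computed directly from the coefficient magnitudes. The sketch below reads "smaller than the maximum" as strictly smaller and assumes the largest magnitude exceeds 1; the function names are illustrative.

```python
def initial_threshold(coeffs):
    """Largest power of two strictly smaller than the largest magnitude.
    Assumes the largest magnitude is greater than 1."""
    peak = max(abs(c) for c in coeffs)
    t = 1
    while t * 2 < peak:
        t *= 2
    return t

def thresholds(coeffs, passes):
    """Threshold used in each pass: the initial value, then halved each time."""
    t = initial_threshold(coeffs)
    out = []
    for _ in range(passes):
        out.append(t)
        t /= 2
    return out

print(initial_threshold([3, -57, 12]))  # 32
print(thresholds([3, -57, 12], 4))      # [32, 16.0, 8.0, 4.0]
```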

The zerotree coding unit operates in two stages: a sorting pass that codes the positions and signs of significant coefficients, and a refinement pass needed to successively approximate the significant coefficients.

The output bitstream of the zerotree coding unit consists of significance test results, signs, and refinement bits.

The zerotree coding unit transmits the most important information progressively by first sending approximations of the most significant coefficients and then refining all significant coefficient values one bit at a time.

The zerotree coding unit further includes an entropy coding unit that losslessly entropy-codes the significance test results, signs, and refinement symbols produced during zerotree coding by adaptive arithmetic coding.

The output of the zerotree coding unit is inversely rearranged to have a plurality of uniform decomposition bands for motion compensation and prediction.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The present invention uses the wavelet interpretation that an 8×8 DCT is a wavelet transform with 64 uniform decomposition bands, and rearranges the DCT coefficients into a two-level zerotree structure to raise the efficiency of zerotree coding.
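The view of a block DCT as a decomposition into 64 uniform bands can be made concrete by gathering the coefficient with the same frequency index (u, v) from every block into one subband. The sketch below shows only this regrouping and its inverse; it does not reproduce the two-level pyramid grouping of FIG. 4, and the function names and array layout are assumptions.

```python
def to_subbands(blocks, n=8):
    """blocks[by][bx] is the n x n coefficient array of block (bx, by).
    Returns one 2-D array in which coefficient (u, v) of every block is
    placed in subband (u, v), at the position given by the block index."""
    rows, cols = len(blocks), len(blocks[0])
    out = [[0] * (cols * n) for _ in range(rows * n)]
    for by in range(rows):
        for bx in range(cols):
            for u in range(n):
                for v in range(n):
                    out[u * rows + by][v * cols + bx] = blocks[by][bx][u][v]
    return out

def from_subbands(bands, rows, cols, n=8):
    """Inverse of to_subbands: restore the per-block coefficient arrays."""
    return [[[[bands[u * rows + by][v * cols + bx] for v in range(n)]
              for u in range(n)]
             for bx in range(cols)]
            for by in range(rows)]

# Hypothetical 2 x 3 grid of blocks whose values encode (by, bx, u, v).
rows, cols = 2, 3
blocks = [[[[1000 * by + 100 * bx + 10 * u + v for v in range(8)]
            for u in range(8)]
           for bx in range(cols)]
          for by in range(rows)]

bands = to_subbands(blocks)
dc_band = [row[:cols] for row in bands[:rows]]
print(dc_band)  # [[0, 100, 200], [1000, 1100, 1200]]
```

After the regrouping, the top-left subband holds the DC coefficients of all blocks and looks like a small low-resolution copy of the picture, much like the lowest band of a wavelet decomposition.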

모의 실험 결과 본 발명에 따른 영상 부호화 장치는 효율적인 제로트리 구조를 만들어 저전송 비트율 비디오 압축 표준안인 H.263의 I프레임 부호화 방법보다 화질면에서 뛰어난 성능을 나타내었으며, 더우기 기존의 사피로(Shapiro)의 웨이브렛을 이용한 EZW 방법보다도 더 높은 비트율 왜곡(rate distortion) 향상을 나타냈다.Simulation results show that the video encoding apparatus according to the present invention has an efficient zero-tree structure and has superior performance in terms of image quality than the H.263 I-frame encoding method, which is a low bit rate video compression standard. It shows higher rate distortion improvement than EZW method using wavelet of.

또한, 정지영상 압축에서도 JPEG 및 다른 대표적인 웨이브렛 임베디드 영상 부호화기의 인용된 결과에 필적할 만한 비트율 왜곡 향상을 나타내었다.Still image compression also shows bit rate distortion improvements comparable to those of JPEG and other representative wavelet embedded image encoders.

도 2는 본 발명에 따른 영상 부호화 장치의 구성 블록도로서, 양자화기(200)를 제외한 나머지 블록은 상기된 종래의 도 1과 동일하므로 동일 블록에 대해서는 동일 부호를 사용하고 상세한 설명을 생략한다.FIG. 2 is a block diagram illustrating a video encoding apparatus according to the present invention. Since the remaining blocks except for the quantizer 200 are the same as in FIG. 1, the same reference numerals are used for the same block and detailed description thereof will be omitted.

즉, 본 발명은 계수별로 부호화하는 기존의 접근 방법을 탈피해 각 계수를 일정 비트로 점진적으로 양자화하는 임베디드 제로트리 양자화기(200)를 채택한다.That is, the present invention adopts an embedded zero-tree quantizer 200 that quantizes each coefficient gradually by a predetermined bit, deviating from the conventional approach of encoding by coefficient.

이때, 도 2와 같은 영상 부호화기로 입력되는 비디오의 각 프레임은 첫 번째 프레임(I)을 위해서 인트라(Intra), 나머지 프레임(P)은 인터(Inter)로 부호화된다.In this case, each frame of the video input to the image encoder as shown in FIG. 2 is intra for the first frame I, and the remaining frames P are encoded with Inter.

Coding is performed in macroblock units, and the luminance and chrominance components are in 4:2:0 format.

That is, for an I frame, the luminance (Y) and chrominance (Cb, Cr) components of the first input image are divided into 8×8 blocks, and each block is input to the DCT unit (101). The DCT unit (101) transforms the image of each block from the spatial domain to the frequency domain by DCT, removing spatial redundancy.

The coefficients of each block transformed by the DCT unit (101) are input to the rearrangement unit (201) of the quantizer (200) and rearranged into a two-level pyramid structure. That is, the DCT coefficients transformed into the frequency domain by the DCT unit (101) are classified in descending order of importance and rearranged into a two-level pyramid structure.

The DCT coefficients rearranged into the two-level pyramid structure by the rearrangement unit (201) are coded into an embedded zerotree structure, and thereby quantized, in the zerotree coding unit (202). The SPIHT coding method may be used here.

Each quantized coefficient is entropy-coded in the entropy coding unit (103) and then transmitted over the channel (104).

For all subsequent frames of the video sequence, that is, for P frames, the motion estimation and compensation method used in H.263+ is employed. Accordingly, the video encoder of the present invention can selectively use block motion estimation, Annex D (Unrestricted Motion Vector mode), or Annex F (Advanced Prediction mode).

The residual errors remaining after motion prediction of each block are also transformed by the DCT unit (101) and then rearranged by the rearrangement unit (201) into a residual frame with the two-level pyramid structure for quantization.

Prediction fails for macroblocks that involve very fast motion, regions occluded or uncovered by object motion, or a scene change. In such cases, coding the block of the original image is preferable, in terms of both coding efficiency and subjective image quality, to coding the DFD (displaced frame difference) after motion estimation.

The present invention codes such blocks by selecting intra mode for them.

The intra/inter mode decision is given by Equation 1 below and is similar to the mode decision of the H.263 encoder.

If A < (SAD(x,y) - T), intra mode is selected and motion estimation is not performed. Here T is a given threshold that may vary with the motion state of each block, and the intra/inter mode is transmitted as side information, as in H.263. MB_mean is the mean value of the macroblock; A is the difference between each pixel of the macroblock and that mean, that is, the deviation of the pixels; and SAD is the difference from the same position in the previous frame.
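The decision rule above can be sketched as follows. Equation 1 itself is not reproduced in this text, so the sketch assumes the form commonly used in H.263 test-model encoders: MB_mean is the arithmetic mean of the macroblock pixels, A is the sum of absolute deviations from that mean, and SAD is the sum of absolute differences against the co-located macroblock of the previous frame. The function names and the threshold value used in the example are illustrative assumptions, not taken from the patent.

```python
def mb_mean(mb):
    """Mean pixel value of a macroblock given as a list of rows."""
    pixels = [p for row in mb for p in row]
    return sum(pixels) / len(pixels)


def deviation_a(mb):
    """A: sum of absolute differences between each pixel and MB_mean (assumed form)."""
    mean = mb_mean(mb)
    return sum(abs(p - mean) for row in mb for p in row)


def sad_same_position(cur, prev):
    """SAD against the macroblock at the same position in the previous frame."""
    return sum(abs(c - p)
               for cur_row, prev_row in zip(cur, prev)
               for c, p in zip(cur_row, prev_row))


def choose_intra(cur, prev, threshold):
    """True when A < SAD - T, i.e. the block is coded in intra mode."""
    return deviation_a(cur) < sad_same_position(cur, prev) - threshold


# Hypothetical 16x16 macroblocks: a flat block that differs strongly from
# the previous frame is better coded on its own (intra).
flat = [[100] * 16 for _ in range(16)]
dark = [[0] * 16 for _ in range(16)]
print(choose_intra(flat, dark, threshold=500))  # large SAD, zero deviation
print(choose_intra(flat, flat, threshold=500))  # identical to previous frame
```

A block with little internal variation but a large frame difference is cheap to code directly, which is why the comparison favors intra mode in the first call and inter mode in the second.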

After the DCT, the frequency spectrum of a block selected for intra mode differs greatly from that of the inter-mode blocks. In particular, the DC coefficient of an intra-mode block is large.

Therefore, in the present invention, the mean value of the DCT coefficients of the intra blocks in each frame is subtracted from the DC coefficients. Biasing the DC coefficients by the DC mean in this way reduces the number of bits wasted on unnecessary scanning caused by large DC coefficients.
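A minimal sketch of this biasing step follows. It assumes that the mean being subtracted is the mean of the DC coefficients of the intra blocks in the frame (the text says "DCT coefficients"), and that inter blocks are left unchanged. The function name and data layout are hypothetical, and how the mean is signalled to the decoder is not stated in the text.

```python
def bias_intra_dc(dc_values, is_intra):
    """Subtract the mean intra DC value from the DC of every intra block.

    dc_values: one DC coefficient per block in the frame.
    is_intra:  parallel list of booleans, True for intra-coded blocks.
    Returns (biased_dc_values, mean); inter blocks are passed through.
    """
    intra_dc = [v for v, intra in zip(dc_values, is_intra) if intra]
    mean = sum(intra_dc) / len(intra_dc) if intra_dc else 0.0
    biased = [v - mean if intra else v
              for v, intra in zip(dc_values, is_intra)]
    return biased, mean


biased, mean = bias_intra_dc([100, 120, 5], [True, True, False])
print(biased, mean)
```

After biasing, the intra DC values are centered on zero, so the initial threshold of the embedded coder is no longer dominated by a few large DC magnitudes.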

Inter blocks, of course, are rearranged as they are, without biasing.

The DCT coefficients of the two-level pyramid structure formed after this processing are quantized by embedded zerotree coding in the zerotree coding unit (202), as in I-frame coding.

The purpose of rearranging the DCT coefficients into a two-level pyramid structure in the rearrangement unit (201) is to make the DCT combine well with embedded zerotree coding: the rearrangement increases the spatial and inter-frequency interdependence of the coefficients and thereby raises the efficiency of embedded zerotree coding.

FIG. 3 is a simple diagram that gives an intuitive view of the relationship between the DCT and the wavelet transform. Taking a 2×2 block DCT as an example, each transformed block has, to varying degrees, four kinds of transform coefficients ranging from a (lowest frequency) to d (highest frequency), as shown on the left of FIG. 3.

If the same image is decomposed with suitable 2×2 analysis filters and then subsampled by 2 in each direction, the resulting band-decomposed image is equivalent to the transformed blocks. That is, as shown on the right of FIG. 3, the top band (low-pass/low-pass) is the collection of the lowest-frequency components of all the DCT blocks. The other transform coefficients are gathered into the remaining bands in the same way.

Extending this concept, an 8×8 block DCT can be regarded as a wavelet transform with 64 decomposition bands.
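The regrouping of block-transform coefficients into uniform bands described above can be sketched as below. The function takes a 2-D array of coefficients laid out block by block and moves coefficient (u, v) of every b×b block into band (u, v); with b = 2 it gives the four bands of FIG. 3, and with b = 8 the 64 bands mentioned in the text. The function name and list-of-lists layout are illustrative assumptions.

```python
def gather_bands(coeffs, b):
    """Regroup block-transform coefficients into b*b uniform bands.

    coeffs: 2-D list whose dimensions are multiples of b, holding the
            transform coefficients of consecutive b x b blocks.
    Returns a 2-D list of the same size in which band (u, v) occupies
    the tile at tile-row u, tile-column v.
    """
    height, width = len(coeffs), len(coeffs[0])
    blocks_down, blocks_across = height // b, width // b
    out = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            block_row, u = divmod(r, b)   # which block, which frequency
            block_col, v = divmod(c, b)
            out[u * blocks_down + block_row][v * blocks_across + block_col] = coeffs[r][c]
    return out


# Four 2x2 blocks with coefficients a (lowest) to d (highest frequency).
blocks = [
    ["a0", "b0", "a1", "b1"],
    ["c0", "d0", "c1", "d1"],
    ["a2", "b2", "a3", "b3"],
    ["c2", "d2", "c3", "d3"],
]
for row in gather_bands(blocks, 2):
    print(row)
```

In the output, the top-left tile holds the a coefficient of every block, matching the statement that the low-pass/low-pass band is the collection of the lowest-frequency components.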

FIG. 4 shows the process of rearranging the DCT coefficients into the two-level pyramid structure. Reference numeral 401 denotes the DCT coefficients, and 402 denotes their rearrangement into the two-level pyramid structure.

That is, the rearrangement unit (201) regards each 8×8 DCT block as a three-level wavelet pyramid structure of the form defined in EZW. To describe the parent-children relationship that expresses the spatial correlation among the coefficients, a number is assigned to each coefficient position, as in FIG. 4.

If i is a coefficient from 1 to 63, the parent of coefficient i is the integer part of i/4. If j is a coefficient from 1 to 15, the children of coefficient j are {4j, 4j+1, 4j+2, 4j+3}.

The DC coefficient 0 is the root of the tree and has only three children: coefficients 1, 2, and 3.
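The index arithmetic of this tree can be written out directly. The sketch below works only on the coefficient numbers 0 to 63; where each number sits inside the 8×8 block is defined by FIG. 4, which is not reproduced here, so no positions are assumed. The function names are illustrative.

```python
def parent(i):
    """Parent of coefficient i (1..63): the integer part of i / 4."""
    if not 1 <= i <= 63:
        raise ValueError("coefficient index must be in 1..63")
    return i // 4


def children(j):
    """Children of coefficient j within one 8x8 block."""
    if j == 0:
        return [1, 2, 3]                  # the DC root has only three children
    if 1 <= j <= 15:
        return [4 * j + k for k in range(4)]
    if 16 <= j <= 63:
        return []                         # leaves of the tree
    raise ValueError("coefficient index must be in 0..63")


def descendants(j):
    """All coefficients below j, listed level by level."""
    found = []
    frontier = children(j)
    while frontier:
        found.extend(frontier)
        frontier = [c for p in frontier for c in children(p)]
    return found


print(children(1))          # [4, 5, 6, 7]
print(parent(37))           # 9
print(len(descendants(0)))  # 63: every AC coefficient hangs off the DC root
```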

Since the important information in most images is contained in the DC coefficient and the first few AC coefficients, the DC coefficient and its children 1, 2, 3 of each DCT block are mapped to the top band of an image-sized memory. The children of parents 1, 2, and 3 (that is, 4, 5, 6, 7; 8, 9, 10, 11; and 12, 13, 14, 15) are mapped to the next band, and the children of the parents in that band are mapped to the band after it.
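As a rough illustration of this mapping, the sketch below sorts the 64 coefficients of each block into three groups by coefficient number: 0 to 3 (top band), 4 to 15 (next band), and 16 to 63 (last band). It assumes each block is supplied as a list of 64 values already ordered by the tree numbering of FIG. 4, and it collects each band as a flat list; the actual spatial placement inside the image-sized memory is defined by the figures and is not reproduced. The names are hypothetical.

```python
def band_of(index):
    """Band that a coefficient number (0..63) is mapped to."""
    if not 0 <= index <= 63:
        raise ValueError("coefficient index must be in 0..63")
    if index <= 3:
        return 0   # DC coefficient and its children 1, 2, 3
    if index <= 15:
        return 1   # children of parents 1, 2, 3
    return 2       # children of the parents in the previous band


def split_into_bands(blocks):
    """Collect the coefficients of all blocks band by band."""
    bands = [[], [], []]
    for block in blocks:
        for index, value in enumerate(block):
            bands[band_of(index)].append(value)
    return bands


# Two hypothetical blocks whose values simply echo their coefficient numbers.
blocks = [list(range(64)), list(range(100, 164))]
top, middle, last = split_into_bands(blocks)
print(top)                     # coefficients 0..3 of each block
print(len(middle), len(last))  # 12 and 48 coefficients per block
```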

FIG. 5 shows the parent-children relationship (501-502-502'-503-503') of the structure according to the present invention.

The difference between the structure of the present invention and the conventional wavelet two-level pyramid structure lies in the frequency characteristics that result from the decomposition method.

That is, the conventional wavelet two-level pyramid structure uses hierarchical band decomposition, applying the wavelet transform again only to the single low-frequency band. The present invention, in contrast, treats as a two-level pyramid the structure obtained by independently wavelet-transforming each of the four bands produced by uniform band decomposition.

The rearrangement structure of the present invention also differs from plain uniform band decomposition in that only the lowest-frequency band is rearranged once more, using the parent-children relationship defined to exploit the spatial correlation of the DC coefficients.

Consequently, whereas the conventional wavelet two-level pyramid structure represents low-frequency components well but intermediate-frequency components poorly, the structure of the present invention represents low-frequency components somewhat less well than the conventional wavelet two-level structure but represents intermediate-frequency components well.

This property is an advantage over hierarchical decomposition in low-bit-rate transmission. At low bit rates the coder generally cannot reach the high-frequency components, so representing the intermediate-frequency components well raises coding efficiency, and it does so without losing the decaying-spectrum property that determines the efficiency of zerotree coding.

Meanwhile, the DCT coefficients rearranged into the two-level pyramid structure are quantized by an improved EZW method, that is, by coding them into a zerotree structure in the zerotree coding unit (202).

That is, the zerotree coding unit (202) codes from the top band downward, so that the coefficients with the greatest visual significance are coded first. For an I frame, the mean of the DC coefficients of the blocks is subtracted, which biases the DC coefficients by their mean and improves coding efficiency.

First, the mean of the DC components is subtracted from each DC component (each node) of the top band. Because the DC values of neighboring blocks are highly spatially correlated, each DC value is thereby biased toward the mean to some extent, and efficiency improves because fewer bits are wasted on unnecessary scanning during zerotree coding.

For a P frame, however, the spatial correlation among the DC values of neighboring blocks is not large, so the DC mean is set to 0 and the above process is skipped.

Each coefficient is compared with a series of thresholds in order to code the positions and signs of the significant coefficients. The initial threshold is the largest power of 2 smaller than the maximum coefficient value, and it is halved at each subsequent step.
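The threshold schedule can be sketched as follows, assuming integer coefficient magnitudes. The text says the initial threshold is the largest power of 2 "smaller than" the maximum coefficient; the sketch uses the largest power of 2 that does not exceed the maximum magnitude, which is the usual EZW/SPIHT convention and differs only when the maximum is itself a power of 2. The function names are illustrative.

```python
def initial_threshold(coeffs):
    """Largest power of 2 not exceeding the largest coefficient magnitude.

    Assumes integer coefficients; returns 0 when every coefficient is 0.
    """
    max_magnitude = max(abs(c) for c in coeffs)
    if max_magnitude == 0:
        return 0
    return 1 << (max_magnitude.bit_length() - 1)


def threshold_schedule(coeffs):
    """Yield the thresholds used by successive passes: T, T/2, ..., 1."""
    threshold = initial_threshold(coeffs)
    while threshold >= 1:
        yield threshold
        threshold //= 2


print(initial_threshold([3, -37, 12]))
print(list(threshold_schedule([3, -37, 12])))
```

Each halving of the threshold corresponds to moving down one bit plane of the coefficient magnitudes.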

A coefficient larger than the threshold is classified as significant, and a coefficient smaller than the threshold is classified as insignificant.

That is, three lists are used according to significance: the LIS (list of insignificant sets), the LIP (list of insignificant pixels), and the LSP (list of significant pixels).

The LIP is initialized with the nodes (roots) of the top level, and the LIS is initialized with the sets of children of each parent in the LIP.

Coding consists of two stages: a sorting pass, which codes the positions and signs of the significant information, and a refinement pass, which is needed to approximate the significant coefficients successively.

In the sorting pass, the LIP pixels are first compared with the current threshold. If a pixel is significant, its sign is output and the pixel is moved to the LSP.

Next, the entries of the LIS are examined. If all the coefficients in a set are insignificant, the set can be represented with a single bit. If the set contains a significant pixel, it is partitioned into subsets rooted at the descendants of the current root, and the procedure is repeated.

When one sorting pass over the LIS and LIP is finished, a refinement pass is performed on the LSP, refining each coefficient value by one bit.

When one complete pass is finished, the threshold is divided by 2 and the next sorting pass begins.
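The alternation of sorting and refinement passes can be illustrated with a deliberately simplified coder. This sketch keeps only the LIP and LSP: every insignificant coefficient is tested individually, so it omits the LIS and the set partitioning that let a whole insignificant tree be coded with one bit. It is therefore not SPIHT or the coder of the patent, only a bit-plane skeleton showing the pass structure and the embedded property. Integer coefficients are assumed, and all names are hypothetical.

```python
def encode(coeffs, num_passes):
    """Return (initial_threshold, bits) for a simplified embedded coder."""
    max_magnitude = max(abs(c) for c in coeffs)
    if max_magnitude == 0:
        return 0, []
    threshold = 1 << (max_magnitude.bit_length() - 1)
    initial = threshold
    lip = list(range(len(coeffs)))   # indices not yet significant
    lsp = []                         # indices found significant earlier
    bits = []
    for _ in range(num_passes):
        if threshold < 1:
            break
        newly_significant, still_insignificant = [], []
        for i in lip:                                # sorting pass
            if abs(coeffs[i]) >= threshold:
                bits.append(1)                       # significance bit
                bits.append(0 if coeffs[i] >= 0 else 1)  # sign bit
                newly_significant.append(i)
            else:
                bits.append(0)
                still_insignificant.append(i)
        lip = still_insignificant
        for i in lsp:                                # refinement pass
            bits.append(1 if abs(coeffs[i]) & threshold else 0)
        lsp.extend(newly_significant)
        threshold //= 2
    return initial, bits


def decode(initial, bits, count, num_passes):
    """Mirror of encode: replay the same decisions from the bit list."""
    magnitudes, signs = [0] * count, [1] * count
    lip, lsp = list(range(count)), []
    stream = iter(bits)
    threshold = initial
    for _ in range(num_passes):
        if threshold < 1:
            break
        newly_significant, still_insignificant = [], []
        for i in lip:
            if next(stream):
                signs[i] = -1 if next(stream) else 1
                magnitudes[i] = threshold
                newly_significant.append(i)
            else:
                still_insignificant.append(i)
        lip = still_insignificant
        for i in lsp:
            if next(stream):
                magnitudes[i] += threshold
        lsp.extend(newly_significant)
        threshold //= 2
    return [s * m for s, m in zip(signs, magnitudes)]


data = [34, -20, 3, 0]
t0, stream_bits = encode(data, num_passes=6)
print(decode(t0, stream_bits, len(data), num_passes=6))  # all bit planes
print(decode(t0, stream_bits, len(data), num_passes=2))  # truncated early
```

Decoding only the first passes of the same bit list gives a coarser approximation of the same coefficients, which is the embedded behavior the text relies on for rate control.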

The output bitstream consists of significance test results, signs, and refinement bits. Because the encoder and decoder share the same algorithm and every decision made during coding is output, the positions of the significant coefficients need not be transmitted separately.

As a result, this method first transmits approximations of the most significant coefficients and then refines all significant coefficient values one bit at a time, giving a progressive transmission that always selects the most important information. If the DCT, like the wavelet transform, is unitary and preserves the Euclidean norm, this progressive transmission is the optimal way to reduce the mean squared error (MSE).
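The reasoning behind this optimality statement can be written out briefly. The notation below is introduced here for illustration and is not taken from the patent: x is the vector of N pixel values, T is the transform, c = Tx are the coefficients, and hats denote the decoder's reconstruction.

```latex
% Unitary transform: T^{\mathsf{T}} T = I, so Euclidean norms are preserved.
\[
  D_{\mathrm{MSE}}
  = \frac{1}{N}\,\lVert x - \hat{x} \rVert^{2}
  = \frac{1}{N}\,\lVert T(x - \hat{x}) \rVert^{2}
  = \frac{1}{N}\sum_{i}\bigl(c_i - \hat{c}_i\bigr)^{2}.
\]
% A coefficient not yet received is reconstructed as \hat{c}_i = 0 and
% contributes c_i^{2}/N to the distortion; once received exactly it
% contributes nothing.
```

The distortion measured on pixels therefore equals the distortion measured on coefficients, and sending the largest-magnitude coefficients (and, within them, the most significant bits) first removes the largest terms of the sum first.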

Meanwhile, the entropy coding unit (103) losslessly entropy-codes the significance test results, signs, and refinement symbols produced by the embedded zerotree coding in the zerotree coding unit (202), using adaptive arithmetic coding.

Because this method conveys the model to the decoder implicitly, no prior information about the image is needed, and the statistical dependence among significant coefficients can be exploited very efficiently.

The present invention uses the adaptive arithmetic coding algorithm of Witten et al.

Because the present invention is an embedded coder, the bit rate control unit (111) can apply any existing rate-distortion-based rate control algorithm at the output stage, and the bit rate is easily adjusted.

Meanwhile, the output of the zerotree coding unit (202) must be inverse-quantized for motion prediction. To this end, the inverse rearrangement unit (203) rearranges the output of the zerotree coding unit (202) back to its state before input to the rearrangement unit (201), that is, so that it has a plurality of uniform decomposition bands, which yields the inverse-quantized result.

The output of the inverse rearrangement unit (203) is inverse-DCT-transformed by the IDCT unit (106) and output to the adder (107). Subsequent operation is the same as in FIG. 1.

FIG. 6 is a graph comparing the performance of conventional encoders with that of the encoder according to the present invention. It shows that the encoder of the present invention achieves better rate-distortion performance in I-frame coding than the H.263+ video encoder and the H.263+ video encoder with Annex I. Accordingly, excellent coding efficiency can be obtained for regions that are uncovered or occluded in a scene because of scene changes or fast object motion.

Thus, by rearranging the DCT coefficients, the algorithm of the present invention codes the quantized coefficients from MSB to LSB and exploits both intra-band dependence (the spatial correlation of the DC coefficients) and inter-band spatial dependence (the decaying spectrum of a block from low to high frequency), which raises the coding efficiency of the embedded zerotree.

This method shares some characteristics with bit-plane image coding, but it differs in that bits are not transmitted line by line. Instead, a zerotree structure is used in which a single coded symbol lets the decoder infer that nearly all bits in a large region of the bit plane are 0.

Meanwhile, the video decoder that decodes the coded and transmitted video uses the same algorithm as the video encoder. Video decoding according to the present invention is performed by carrying out the same process as video encoding.

According to the video encoder of the present invention, picture data in 8×8 block units is DCT-transformed, the DCT coefficients are converted into a wavelet structure so that they are classified in order of importance, and the result is coded by embedded zerotree coding and transmitted. This provides the following advantages.

First, owing to its compatibility with existing encoders and its improved rate-distortion performance, it is suitable for low-bit-rate video transmission for videophone systems over the public switched telephone network (PSTN).

Second, the zerotree coding used in the present invention offers scalability and very easy bit-rate control. The bitstream can therefore be truncated at any time, for example when no bit budget remains or when the buffer exceeds a given threshold. This feature is especially useful for browsing image databases.

Third, owing to the strong rate-distortion performance of I-frame coding, excellent coding efficiency can be obtained for regions that are uncovered or occluded in a scene because of scene changes or fast object motion (motion-failed regions). In particular, for natural video input directly from a camera, as opposed to MPEG-4 or H.263 test sequences, intra macroblocks occur frequently, so an even greater gain in coding efficiency can be obtained. That is, since efficient I-frame coding is critical to the performance of a low-bit-rate video encoder, overall coding efficiency improves when the encoder of the present invention is applied to an actual video encoder.

Fourth, blocking artifacts are reduced. This results from coding the DC coefficients first and from representing the intermediate- and high-frequency components.

Fifth, the present invention meets the requirements of MPEG-4, H.263++, and H.26L, whose standardization is currently in progress, and is applicable in practice.

Claims

A video encoding apparatus that divides one frame into a plurality of blocks and encodes them, the apparatus comprising:

converting means for dividing an input frame into a plurality of blocks and converting each block from the spatial domain into the frequency domain;

repositioning means for classifying the transform coefficients of the converting means in order of importance, according to how much information necessary for image reproduction they contain, and rearranging them; and

zero-tree encoding means for encoding the position and magnitude information of the rearranged coefficients in order of importance and outputting a bit stream arranged according to importance.

The apparatus of claim 1, wherein, for an I frame, the converting means converts the input data into the frequency domain by a discrete cosine transform (DCT).

The apparatus of claim 1, wherein, for a P frame, the converting means performs motion-compensated prediction using previous I and P frames and then converts the difference between the motion-compensated data and the currently input data into the frequency domain by a discrete cosine transform (DCT).

The apparatus of claim 3, wherein, when encoding a P frame, the converting means determines the intra mode or inter mode of each block according to the following equation,

where MB_mean is the mean value of the macroblock and A is the difference between each pixel of the macroblock and that mean value.

The apparatus of claim 4, wherein the converting means selects the intra mode, in which motion estimation is not performed, if A < (SAD(x,y) - T), where SAD is the difference from the same position in the previous frame and T is a given threshold.

The apparatus of claim 5, wherein the threshold T of the converting means is variable according to the motion state of the picture.

The apparatus of claim 1, wherein the repositioning means classifies the DCT coefficients of the converting means using the wavelet interpretation and rearranges them into a two-level pyramid structure.

The apparatus of claim 7, wherein the two-level pyramid structure of the repositioning means is a structure obtained by wavelet-transforming four bands separated by a uniform band decomposition method.

The apparatus of claim 7, wherein the repositioning means classifies the DCT coefficients into DC and AC coefficients according to frequency and rearranges them such that only the lowest-frequency band is rearranged again, using the parent-children relationship defined to exploit the spatial correlation of the DC coefficients.

The apparatus of claim 1, wherein the repositioning means biases the DC values for quantization of the rearranged DCT coefficients.

The apparatus of claim 11, wherein, for an I frame, the repositioning means subtracts the mean value of the DC coefficients from the DC coefficients, thereby biasing the DC values by the DC mean.

The apparatus of claim 11, wherein, for a P frame, the repositioning means subtracts the mean value of the DCT coefficients of the intra blocks in each frame from the DC coefficients, thereby biasing the DC coefficients by the DC mean.

The apparatus of claim 11, wherein, for an inter-mode block of a P frame, the repositioning means sets the DC mean value to 0 when biasing the DC value.

The apparatus of claim 1, wherein the zero-tree encoding means compares each of the rearranged coefficients with a series of thresholds to encode the positions and signs of the significant coefficients, the initial threshold being the largest power of 2 smaller than the maximum coefficient value and being halved at each step.

The apparatus of claim 14, wherein the zero-tree encoding means performs encoding in two stages: a sorting pass that encodes the positions and signs of the significant information, and a refinement pass needed to approximate the significant coefficients successively.

The apparatus of claim 14, wherein the output bit stream of the zero-tree encoding means consists of significance test results, signs, and refinement bits.

The apparatus of claim 14, wherein the zero-tree encoding means performs progressive transmission of the most important information by first transmitting approximations of the most significant coefficients and then refining all significant coefficient values one bit at a time.

The apparatus of claim 1, wherein the zero-tree encoding means further comprises an entropy encoding unit that performs lossless entropy encoding by an adaptive arithmetic coding method.

The apparatus of claim 1, wherein the output of the zero-tree encoding means is inversely rearranged so as to have a plurality of uniform decomposition bands, for motion compensation and prediction.