KR100480751B1

KR100480751B1 - Digital video encoding/decoding method and apparatus

Info

Publication number: KR100480751B1
Application number: KR10-1998-0042434A
Authority: KR
Inventors: 신재섭; 손세훈; 조대성; 서양석
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 1998-10-10
Filing date: 1998-10-10
Publication date: 2005-05-16
Also published as: KR20000025386A

Abstract

The present invention relates to a video encoding/decoding method and apparatus having both a spatial hierarchy and an image-quality hierarchy, which efficiently encode and transmit an arbitrarily shaped object in continuous motion. The method encodes video input data, consisting of the shape information of an object and the texture information inside the object, into a spatial hierarchy and an image-quality hierarchy, and comprises: (a) downsampling the shape information and the texture information at predetermined ratios to construct a spatial hierarchy including one base layer and one or more upper layers; (b) encoding the shape information and texture information of the base layer to generate a basic bitstream of the base layer, frequency-transform coding the difference between the decoded texture information and the original texture information, and constructing an image-quality hierarchy by frequency band; and (c) for each of the one or more upper layers, encoding the difference between the shape information upsampled from the base layer and the shape information of the upper layer to generate a basic bitstream of the upper layer, frequency-transform coding the difference between the texture information decoded in step (b) and the texture information of the upper layer, and constructing an image-quality hierarchy by frequency band. Because encoding/decoding can be applied to objects of arbitrary shape, a separate image-quality service can be provided for each arbitrary object appearing on the screen.

Description

Video encoding/decoding method and apparatus {Digital video encoding/decoding method and apparatus}

The present invention relates to data encoding and decoding, and more particularly to a video encoding/decoding method and apparatus having both a spatial hierarchy and an image-quality hierarchy, which efficiently encode and transmit an arbitrarily shaped object in continuous motion.

Most of the encoding/decoding schemes studied so far deal with rectangular images of fixed size, such as a TV screen. Examples include MPEG-1, MPEG-2, H.261, and H.263.

Most existing coding schemes offer only a very limited layered service, so they cannot actively cope with environments such as the Internet/intranet or wireless networks, where the state of the transmission channel changes frequently. MPEG-2 (ISO/IEC JTC1/SC29/WG11 13818-2: MPEG-2 video), a representative existing scheme, proposes spatial scalable coding with two spatial layers and SNR scalable coding with two to three layers for rectangular-screen video, but the limited number of layers restricts its use in real application areas. MPEG-4 (ISO/IEC JTC1/SC29/WG11 14496-2: MPEG-4 video), which proposes efficient compression for arbitrarily shaped objects, also proposes coding with spatial and temporal hierarchies, but no scheme has yet been proposed that provides an image-quality hierarchy at a given spatial resolution within the bitstream, which limits the quality of service that can be offered.

The technical object of the present invention is to provide a coding scheme having a spatial/image-quality hierarchy that enables a variety of quality of service (QoS) levels.

Another technical object of the present invention is to provide a scalable coding scheme that allows data to be transmitted differentially during encoding according to the constraints of the transmission channel or the reception capability of the receiver, and to provide scalable coding not only for rectangular images but also for objects of arbitrary shape.

A further technical object of the present invention is to provide, together with spatial scalable coding, image-quality scalable coding (SNR or fine granular scalable coding) that can variably determine the image quality at a given spatial resolution, to provide a finer-grained structure of service quality, and to provide a method of decoding the encoded data.

In order to achieve the above objects, a video encoding method according to the present invention encodes video input data, consisting of the shape information of an object and the texture information inside the object, into a spatial hierarchy and an image-quality hierarchy, and comprises: (a) downsampling the shape information and the texture information at predetermined ratios to construct a spatial hierarchy including one base layer and one or more upper layers; (b) encoding the shape information and texture information of the base layer to generate a basic bitstream of the base layer, frequency-transform coding the difference between the decoded texture information and the original texture information, and constructing an image-quality hierarchy by frequency band; and (c) for each of the one or more upper layers, encoding the difference between the shape information upsampled from the base layer and the shape information of the upper layer to generate a basic bitstream of the upper layer, frequency-transform coding the difference between the texture information decoded in step (b) and the texture information of the upper layer, and constructing an image-quality hierarchy by frequency band.

In order to achieve the above objects, a video encoding method according to the present invention encodes video input data, consisting of the shape information of an object and the texture information inside the object, into a spatial hierarchy and an image-quality hierarchy, and comprises: (a) downsampling the shape information and the texture information to construct a spatial hierarchy including one base layer downsampled at the largest ratio and one or more upper layers downsampled at ratios smaller than that of the base layer; (b) for the shape information and texture information of the base layer, (b1) shape-coding the shape information of the base layer, (b2) padding, frequency-transform coding, and quantizing the texture information of the base layer, (b3) collecting the data generated in steps (b1) and (b2) and variable-length coding them to generate a basic bitstream of the base layer, (b4) inverse-quantizing and inverse-frequency-transforming the data generated in step (b2) and obtaining the difference between the reconstructed texture information and the texture information of the base layer, and (b5) frequency-transform coding the difference of step (b4) and classifying the result by frequency to generate a bitstream for each frequency band; and (c) for the shape information and texture information of each upper layer, (c1) shape-coding the difference between the shape information obtained by upsampling the base-layer shape information to the upper layer and the shape information of the upper layer, and variable-length coding the result to generate an upper-layer basic bitstream, (c2) obtaining the difference between the texture information obtained by upsampling the texture information reconstructed in step (b4) to the upper layer and padding it, and the texture information of the upper layer, and (c3) frequency-transform coding the difference of step (c2) and classifying the result by frequency to generate a bitstream for each frequency band.
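Steps (b2), (b4), and (b5) can be illustrated on a single row of texture samples. The following is a minimal sketch, not the patented implementation: it assumes an orthonormal 1-D DCT, a uniform quantizer with a hypothetical step size `qstep`, and it omits padding, shape coding, and variable-length coding. The names `dct`, `idct`, and `encode_base_row` are illustrative only.

```python
import math

def dct(x):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def encode_base_row(texture, qstep):
    """Sketch of steps (b2), (b4), (b5) for one row; qstep is an assumed parameter."""
    # (b2) frequency transform and quantization (uniform quantizer assumed)
    levels = [round(c / qstep) for c in dct(texture)]
    # (b4) inverse quantization and inverse transform give the reconstruction
    recon = idct([level * qstep for level in levels])
    # (b4) difference between the base-layer texture and its reconstruction
    residual = [t - r for t, r in zip(texture, recon)]
    # (b5) transform of the difference; these coefficients are later split by band
    residual_coeffs = dct(residual)
    return levels, recon, residual, residual_coeffs
```

Because the transform is linear and orthonormal, the residual coefficients in this sketch equal the quantization error of each coefficient, so their magnitude never exceeds half the step size.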

Preferably, the shape information coding of steps (b1) and (b2) of the present invention is scalable shape information coding using scan interleaving (alternate-scan-line processing).

Preferably, the frequency-transform coding of the present invention is a discrete cosine transform.

Preferably, the frequency-transform coding of the present invention is a discrete wavelet transform.

Preferably, steps (b5) and (c3) of the present invention each: for the luminance signal of the difference, divide it into main blocks of N×N pixels, divide each main block into four sub-blocks, and divide the discrete-cosine-transformed luminance signal of each sub-block into a predetermined number of bands; for the chrominance signals corresponding to each main block of the luminance signal, divide each into blocks of size N/2×N/2 and divide the discrete-cosine-transformed chrominance signal of each block into a predetermined number of bands; and recombine, band by band, the transformed luminance and chrominance signals to form the layers of the image-quality hierarchy (where N is a predetermined integer greater than 2).
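The division of a transformed sub-block into bands can be sketched as follows. The text does not give the band boundaries here, so the rule used below, `band_of(u, v) = max(u, v)`, is purely an assumption chosen because it yields eight bands for an 8×8 block with the DC coefficient alone in band 0. The names `band_of` and `split_into_bands` are hypothetical.

```python
def band_of(u, v):
    """Hypothetical band index of coefficient (u, v): 0 is DC, larger is higher frequency."""
    return max(u, v)

def split_into_bands(block):
    """Group the coefficients of a square transformed block by band index.

    Returns a list of bands; each band is a list of ((u, v), value) pairs.
    """
    n = len(block)
    bands = [[] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            bands[band_of(u, v)].append(((u, v), block[u][v]))
    return bands
```

With N = 16, each sub-block and each chrominance block is 8×8, so the same function could be applied to all six blocks of a main block under this assumed layout.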

Preferably, the band-by-band recombination of the present invention is performed per main block of the luminance signal.

Preferably, the band-by-band recombination of the present invention is performed over the whole image.

Preferably, each image-quality layer recombined by band is coded again by a predetermined method.

Preferably, the present invention applies the discrete cosine transform only to sub-blocks or blocks that contain shape information, and not to sub-blocks or blocks that contain none.
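One way to read this preference is as a per-block test against the shape mask. The sketch below assumes a binary mask in which nonzero means "inside the object" and treats a block as containing shape information when at least one of its pixels is nonzero; both the representation and the function name `blocks_to_transform` are assumptions for illustration.

```python
def blocks_to_transform(mask, block_size):
    """Return the (row, col) origins of blocks that contain at least one object pixel.

    Blocks not listed would be skipped by the transform stage in this sketch.
    """
    rows, cols = len(mask), len(mask[0])
    selected = []
    for r in range(0, rows, block_size):
        for c in range(0, cols, block_size):
            has_shape = any(
                mask[y][x]
                for y in range(r, min(r + block_size, rows))
                for x in range(c, min(c + block_size, cols))
            )
            if has_shape:
                selected.append((r, c))
    return selected
```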

In order to achieve the above objects, a video encoding apparatus according to the present invention encodes video input data, consisting of the shape information of an object and the texture information inside the object, into a spatial hierarchy and an image-quality hierarchy, and comprises: a downsampling unit that downsamples the shape information and the texture information to construct a spatial hierarchy including one base layer downsampled at the largest ratio and one or more upper layers downsampled at ratios smaller than that of the base layer; a base-layer encoder comprising shape encoder 1, which shape-codes the shape information of the base layer, a texture encoder that pads, frequency-transform codes, and quantizes the texture information of the base layer, a first variable-length encoder that collects the data output from shape encoder 1 and the texture encoder and variable-length codes them to generate a basic bitstream of the base layer, a texture decoder that inverse-quantizes and inverse-frequency-transforms the data output from the texture encoder to reconstruct texture information, a first difference-image generator that generates the difference between the texture information reconstructed by the texture decoder and the texture information of the base layer, and a first image-quality hierarchy generator that frequency-transform codes the difference generated by the first difference-image generator and classifies the result by frequency to generate a bitstream for each frequency band; and one or more upper-layer encoders, each comprising an upsampling unit that upsamples the shape information of the base layer to the upper layer and upsamples the texture information reconstructed by the texture decoder to the upper layer, shape encoder 2, which shape-codes the difference between the upsampled shape information and the shape information of the upper layer, a second variable-length encoder that variable-length codes the output of shape encoder 2 to generate an upper-layer basic bitstream, a second difference-image generator that obtains the difference between the texture information obtained by padding the output of the upsampling unit and the texture information of the upper layer, and a second image-quality hierarchy generator that frequency-transform codes the difference generated by the second difference-image generator and classifies the result by frequency to generate a bitstream for each frequency band.

Preferably, shape encoder 1 and shape encoder 2 of the present invention are scalable shape information encoders using scan interleaving (alternate-scan-line processing).

Preferably, the texture encoder, the first image-quality hierarchy generator, and the second image-quality hierarchy generator of the present invention each include a discrete cosine transformer.

Preferably, the texture encoder, the first image-quality hierarchy generator, and the second image-quality hierarchy generator of the present invention each include a discrete wavelet transformer.

Preferably, the first and second image-quality hierarchy generators of the present invention each: for the luminance signal of the difference, divide it into main blocks of N×N pixels, divide each main block into four sub-blocks, and divide the discrete-cosine-transformed luminance signal of each sub-block into a predetermined number of bands; for the chrominance signals corresponding to each main block of the luminance signal, divide each into blocks of size N/2×N/2 and divide the discrete-cosine-transformed chrominance signal of each block into a predetermined number of bands; and recombine, band by band, the transformed luminance and chrominance signals to form the layers of the image-quality hierarchy (where N is a predetermined integer greater than 2).

Preferably, the first and second image-quality hierarchy generators of the present invention each apply the discrete cosine transform only to sub-blocks or blocks that contain shape information, and not to sub-blocks or blocks that contain none.

In order to achieve the above objects, a video decoding method according to the present invention decodes a bitstream encoded with a spatial hierarchy and an image-quality hierarchy, and comprises: (a) variable-length decoding the bitstream while separating it into a base-layer bitstream and one or more upper-layer bitstreams; (b) shape-decoding the coded shape information contained in the base-layer bitstream to generate the shape information of the base layer; (c) inverse-quantizing and inverse-frequency-transforming the coded texture information contained in the base-layer bitstream to generate the texture information of the base layer; (d) sequentially inverse-frequency-transforming the bitstreams selected from the image-quality hierarchy contained in the base-layer bitstream and adding them to the texture information of the base layer; and (e) for each upper layer up to a selected upper layer among the one or more upper layers, sequentially repeating (e1) shape-decoding the shape information of the upper layer contained in the upper-layer bitstream and adding it to the upsampled shape information of the lower layer, and (e2) sequentially inverse-frequency-transforming the bitstreams selected from the image-quality hierarchy contained in the upper-layer bitstream and adding them to the upsampled texture information of the lower layer.
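The texture path of steps (d) and (e2) amounts to a loop of "add the selected refinements, then upsample and repeat". The sketch below is only an illustration under several assumptions: the refinements are given as already inverse-transformed pixel-domain images, upsampling is nearest-neighbour by a factor of two, and the shape path of steps (b) and (e1) is left out. All function names are hypothetical.

```python
def upsample2(img):
    """Nearest-neighbour 2x upsampling (an assumed interpolation filter)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_images(a, b):
    """Pixel-wise sum of two images of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def decode_texture(base, base_refinements, upper_refinements, target_layer):
    """Rebuild texture up to `target_layer` upper layers (0 means base layer only).

    base_refinements: selected image-quality refinements of the base layer.
    upper_refinements: one list of selected refinements per upper layer.
    """
    tex = base
    for r in base_refinements:              # step (d)
        tex = add_images(tex, r)
    for refinements in upper_refinements[:target_layer]:
        tex = upsample2(tex)                # lower layer brought to upper-layer size
        for r in refinements:               # step (e2)
            tex = add_images(tex, r)
    return tex
```

Passing fewer refinements per layer, or a smaller `target_layer`, corresponds to a receiver that selects a lower image quality or a lower resolution from the same bitstream.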

In order to achieve the above objects, a video decoding apparatus according to the present invention decodes a bitstream encoded with a spatial hierarchy and an image-quality hierarchy, and comprises: a variable-length decoder that variable-length decodes the bitstream while separating it into a base-layer bitstream and one or more upper-layer bitstreams; a base-layer decoder comprising shape decoder 1, which shape-decodes the coded shape information contained in the base-layer bitstream to generate the shape information of the base layer, a texture decoder that inverse-quantizes and inverse-frequency-transforms the coded texture information contained in the base-layer bitstream to generate the texture information of the base layer, and a first image-quality hierarchy decoder that sequentially inverse-frequency-transforms the bitstreams selected from the image-quality hierarchy contained in the base-layer bitstream and adds them to the texture information of the base layer; and one or more upper-layer decoders, each comprising an upsampling unit that upsamples to the upper layer the shape information and the texture information of the layer immediately below the upper layer in the spatial hierarchy, shape decoder 2, which shape-decodes the shape information of the upper layer contained in the upper-layer bitstream and adds it to the upsampled shape information of the lower layer, and a second image-quality hierarchy decoder that sequentially inverse-frequency-transforms the bitstreams selected from the image-quality hierarchy contained in the upper-layer bitstream and adds them to the upsampled texture information of the lower layer.

Hereinafter, the present invention is described in detail with reference to the accompanying drawings. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or practice of users or operators. Their definitions should therefore be based on the content of this specification as a whole.

As shown in FIG. 2, mask 2 (104) and the padding 1 (105) image are produced from data obtained by reducing (103) the input video (102) and mask 1 (101), the shape information, to half size. Shape encoder 1 (106) and a DCT (Discrete Cosine Transform; 107), a frequency-transform coder, are then used to generate the base layer, that is, the bitstream BL (131). The result is inverse-quantized (110) and passed through the IDCT (Inverse Discrete Cosine Transform; 111), an inverse frequency transformer, and the difference (112) between the reconstructed image and the original image is obtained.

A DCT (113) is then applied to this difference, and the coefficient distributor (114) classifies the coefficients by frequency and forms a bitstream for each frequency band. This generates BSL0 (115), BSL1 (116), ..., BSL(n) (118). As bitstreams are added, adding the low-frequency component BSL0 (115) to the base layer (131) yields BSNR(0) (132), with slightly improved image quality, and adding up to the highest-frequency component BSL(n) (118) yields the BSNR(n) bitstream with the best image quality, so that the image quality can be adjusted variably. The same principle is applied at a higher resolution to an image spatially enlarged from this reconstructed low-resolution image: for mask 3 (120) and the padding 2 image (122), both enlarged by a factor of two, the difference (123) between the original image (102) at the same resolution and the padding 2 image (122) is obtained, a DCT (124) is applied to the difference, and the coefficient distributor (125) classifies the coefficients by frequency and forms a bitstream for each frequency band. This generates ESL0 (127), ESL1 (128), ..., ESL(n) (130), so that, starting from EL (136), the basis of the upper layer, adding bitstreams yields ESNR0 (137), ESNR1 (138), ..., ESNR(n) (140), and the image quality can be improved progressively at the higher resolution as well.
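The progressive improvement from BSNR(0) to BSNR(n) can be illustrated in one dimension. This is a minimal sketch under stated assumptions: an orthonormal 1-D inverse DCT stands in for the IDCT, the residual coefficients are given directly, and `bands` is a hypothetical list of coefficient-index groups ordered from low to high frequency. The names `idct` and `progressive_rows` are illustrative.

```python
import math

def idct(coeffs):
    """Orthonormal 1-D inverse DCT (DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = coeffs[0] * math.sqrt(1.0 / n)
        for k in range(1, n):
            s += coeffs[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def progressive_rows(base_row, residual_coeffs, bands):
    """Return the reconstruction obtained after each additional band is received."""
    received = [0.0] * len(residual_coeffs)
    steps = []
    for band in bands:
        for k in band:
            received[k] = residual_coeffs[k]   # coefficients carried by this band
        refinement = idct(received)
        steps.append([b + r for b, r in zip(base_row, refinement)])
    return steps
```

With an orthonormal transform, the squared error remaining after each step equals the energy of the coefficients not yet received, so every added band can only reduce the error in this model.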

The bitstreams produced by this encoder are decoded by the decoder shown in FIG. 3, which provides reconstructed images in a spatial/image-quality hierarchy. As shown in FIG. 3, the input bitstream (201) is parsed by the variable-length decoder VLD1 (202) and separated into a base-layer bitstream (203) and an upper-layer bitstream (210). Within the base-layer bitstream (203), the part relating to shape information passes through shape decoder 1 (204) to produce mask 4 (205), which becomes the shape information (208) of the base layer, and the part relating to image information passes through inverse quantization (206) and the inverse frequency transformer (IDCT; 207) to produce the texture information (209) of the base layer. If there is an upper layer, the images (213, 218) obtained by enlarging the base-layer shape and texture information by a factor of two are added to the information produced by passing the upper layer through a decoder of the same kind as the base layer's, yielding the upper-layer shape information (215) and texture information (220).

The operating principle of the present invention is as follows.

FIG. 2 shows the overall structure of the spatial/image-quality scalable encoder according to the present invention. As shown in FIG. 2, the input data consists of mask 1 (101), which provides the shape information of the object, and video (102), which provides the object's internal texture information. When the whole rectangular screen is encoded rather than an arbitrary shape, the mask information (masks 1, 2, and 3) is not needed. The input images are converted by the downsampling unit (103) into images of half the width and half the height; mask 1 (101) becomes mask 2 (104), which shape encoder 1 (106) compresses into the information needed for transmission. Shape encoder 1 (106) uses the scalable shape coding scheme currently adopted in the MPEG-4 version 2 Working Draft.
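The half-size reduction of the video and the mask can be sketched as below. The text does not state which downsampling filters are used, so both rules here are assumptions: texture is averaged over each 2×2 block, and a reduced mask pixel is marked as object if any pixel of its 2×2 block is. The function names are hypothetical, and even image dimensions are assumed.

```python
def downsample_texture(img):
    """Halve width and height by averaging each 2x2 block (assumed filter)."""
    return [
        [(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
         for x in range(0, len(img[0]) - 1, 2)]
        for y in range(0, len(img) - 1, 2)
    ]

def downsample_mask(mask):
    """Halve a binary mask; a 2x2 block counts as object if any pixel is set (assumed rule)."""
    return [
        [1 if (mask[y][x] or mask[y][x + 1] or mask[y + 1][x] or mask[y + 1][x + 1]) else 0
         for x in range(0, len(mask[0]) - 1, 2)]
        for y in range(0, len(mask) - 1, 2)
    ]
```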

The video (102) information is padded to fit the shape to form the padding 1 (105) image; the padding uses the technique of the MPEG-4 video part (14496-2). The padding 1 (105) image passes through the DCT (107), a kind of frequency-transform coder, and the quantizer (108), and the first variable-length encoder (VLC1; 109) combines the result with the previously coded shape information to generate the basic bitstream of the base layer (BL; 131). This is the bitstream that produces the most basic image at the lowest resolution. The frequency coefficients quantized by the quantizer (108) pass through the inverse quantizer (110) and the IDCT (111), an inverse frequency transformer, and the difference (112) from the reduced original image is obtained. The DCT (113), a frequency-transform coder, is then applied to this difference signal, and the coefficient distributor (114) classifies the DCT coefficients by predetermined frequency band into BSL(0) (115), BSL(1) (116), ..., BSL(n) (118).

As an example of how each band is constructed, an arbitrary region of the picture is divided into a number of N x N sub-blocks (251), as shown in FIG. 4(a), and for each sub-block the output data of the frequency transform coder is then classified by band. Classification for an arbitrary sub-block k proceeds as follows. The N x N luminance component (251) and the N/2 x N/2 chrominance component data (256, 257) are classified. The luminance data consists of four sub-blocks (252, 253, 254, 255); the corresponding bands of these sub-blocks are gathered to form one sub-band data set, and one chrominance component of each kind is added to it to form each frequency band group (MELk) for the unit block.

FIG. 4(a) shows the block configuration of the luminance and chrominance components. FIG. 4(b) shows the band-by-band classification of the coefficients that have passed through the frequency transform coder for each unit block; here a total of eight different bands, ELk 0 (260), ELk 1 (261), ..., ELk 7 (267), can be generated. FIG. 4(c) shows how the coefficients in the four sub-blocks that make up the luminance component are grouped by band to form unit band groups, in which each coefficient is arranged at a position fixed by its sub-block. That is, the set of DC components of each sub-band is classified as MELk 0 (270), the set of AC coefficients of the next most important low-frequency components as MELk 1 (271), ..., and the set of AC coefficients in the last band as MELk 7 (277). The composition of the frequency coefficient values in each band is shown in FIGS. 5 and 6.
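The grouping of coefficients into bands ELk 0 to ELk 7 and then into band groups MELk 0 to MELk 7 can be sketched as follows. The exact assignment of coefficients to bands is defined by FIGS. 5 and 6, which are not reproduced here, so the rule in `band_of` (anti-diagonal index, clamped to the last band) is purely a hypothetical stand-in. The 8x8 sub-block size, the dummy coefficient values, and the function names are likewise assumptions, and the chrominance blocks are omitted for brevity.

```python
def band_of(u, v, n_bands=8):
    """Hypothetical band rule: anti-diagonal index u + v, clamped to the last band."""
    return min(u + v, n_bands - 1)


def split_into_bands(coef, n_bands=8):
    """Return EL[0..n_bands-1] for one sub-block of transform coefficients."""
    bands = [[] for _ in range(n_bands)]
    for u, row in enumerate(coef):
        for v, value in enumerate(row):
            bands[band_of(u, v, n_bands)].append(value)
    return bands


def merge_subblocks(subblock_coefs, n_bands=8):
    """Return MEL[b]: band b of every sub-block, kept in sub-block order."""
    per_block = [split_into_bands(c, n_bands) for c in subblock_coefs]
    return [[blk[b] for blk in per_block] for b in range(n_bands)]


# Four 8x8 luminance sub-blocks filled with dummy coefficients so that
# the origin of each value (sub-block k, row u, column v) stays visible.
subblocks = [[[100 * k + 8 * u + v for v in range(8)] for u in range(8)]
             for k in range(4)]

mel = merge_subblocks(subblocks)
# mel[0] holds the DC coefficient of each of the four sub-blocks,
# mel[7] holds the highest-frequency coefficients of each sub-block.
```

Keeping the sub-block order fixed inside every band group is what lets a decoder put each coefficient back at its own position, which matches the statement that coefficients are arranged at a position fixed by their sub-block.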

FIG. 5 shows how, for luminance data made up of four sub-blocks (301, 302, 303, 304), the layers MELk 0 (305), MELk 1 (306), ..., MELk 7 (312) are formed from the arrangement of the frequency components in each block across the eight hierarchy levels. FIG. 6 shows the coefficient composition of the frequency components for the two chrominance signals (Cr (401), Cb (403)). As with the luminance component, the frequency coefficients for each layer are organized as MELk 0, MELk 1, ..., MELk 7.

Once the hierarchy for an arbitrary unit block k has been built in this way, the data in the same band are gathered across all sub-blocks to build the hierarchy for the entire picture; the result is BSL(0) (115), BSL(1) (116), ..., BSL(n) (118). To handle an arbitrary shape, it must be signalled whether each sub-block lies inside the shape, on its boundary, or outside it. As shown in FIG. 7, the states of the four luminance sub-blocks are expressed as 4 bits of data, so that, using a pre-agreed order, it is easy to tell which blocks belong to the shape. If all sub-blocks are outside the shape, as in FIG. 7(a), the code is "0000" (502); if only the upper-left sub-block is outside the shape, as in FIG. 7(b), the code is "0111" (504); and so on, up to "1111" (532) when all blocks are inside the shape. This makes it possible to tell which block the data placed in each layer belongs to. For the chrominance components, if at least one piece of shape information exists in the luminance component, the chrominance component is judged to exist as well.

In this way, a bitstream with n image-quality layers is produced for the spatial base layer. At the same time, the base-layer data is enlarged by a factor of two in the upsampling unit (119). Mask 3 (120), the enlarged shape information, is coded by shape encoder 2 (121) to form the upper layer, and the padding 2 (122) image is produced in the enlarged region with reference to mask 3 (120). This image is subtracted from the original video (102), and the difference goes through the same process as in the base layer to produce ESL(0) (127), ESL(1) (128), ..., ESL(n) (130), yielding a bitstream with both a spatial and an image-quality hierarchy.
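The 4-bit sub-block status code of FIG. 7 can be sketched as follows. The text fixes only that the upper-left sub-block corresponds to the first bit ("0111" when only the upper-left is outside the shape); the remaining bit order (upper-right, lower-left, lower-right) and the rule that a boundary sub-block is marked "1" are assumptions made for this illustration, as are the function names and sample masks.

```python
def subblock_code(mask):
    """4-bit string for a square luminance block split into four sub-blocks.

    Assumed bit order: upper-left, upper-right, lower-left, lower-right.
    A bit is '1' when the sub-block holds at least one shape pixel.
    """
    half = len(mask) // 2
    bits = []
    for y0 in (0, half):
        for x0 in (0, half):
            has_shape = any(mask[y][x]
                            for y in range(y0, y0 + half)
                            for x in range(x0, x0 + half))
            bits.append("1" if has_shape else "0")
    return "".join(bits)


def chroma_present(code):
    """Chrominance is coded if at least one luminance sub-block has shape information."""
    return "1" in code


# Hypothetical 4x4 masks (each sub-block is 2x2 here)
all_outside = [[0] * 4 for _ in range(4)]
upper_left_outside = [[0, 0, 1, 1],
                      [0, 0, 1, 1],
                      [1, 1, 1, 1],
                      [1, 1, 1, 1]]
all_inside = [[1] * 4 for _ in range(4)]

codes = [subblock_code(m) for m in (all_outside, upper_left_outside, all_inside)]
```

With these inputs, `codes` corresponds to the three cases named in the text: all outside, only the upper-left outside, and all inside.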

The process of decoding the bitstreams produced in this way is shown in FIG. 3. As shown in FIG. 3, the variable-length decoder VLD1 (202) separates the input bitstreams into the base layer bitstream (203) and the upper layer bitstream (210). The base layer bitstream (203) is further divided into shape information and texture information. The shape information is decoded by shape decoder 1 (204) to produce mask 4 (205), which becomes the base layer shape information (208). The texture information passes through inverse quantization (206) and the IDCT (207), the inverse frequency transformer, to produce the base layer texture information (209). This completes the decoding of the base layer.

If an upper layer bitstream (210) is present, it is likewise divided into shape information and texture information. The shape information is decoded by shape decoder 2 (211) to produce mask 5 (212); this is combined in the adder (214) with the mask reproduced in the base layer after that mask has been enlarged by a factor of two in the upsampling unit (213), giving the upper layer shape information (215). The texture information passes through inverse quantization (216) and the IDCT (217), the inverse frequency transformer; the resulting image is combined in the adder (219) with the texture information reproduced in the base layer after it has been enlarged by a factor of two in the upsampling unit (218), giving the upper layer texture information (220). This completes the decoding of the upper layer. When all of these steps are finished, the receiver can reproduce images at multiple levels of the spatial and image-quality hierarchies.
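The relationship between the upper-layer difference formed in the encoder and the addition performed in the decoder's adder (219) can be sketched for the texture path. The interpolation used for upsampling is not specified in the text, so pixel replication is assumed here. The transform, quantization and band split of the difference are deliberately skipped, which is why the reconstruction in this sketch is exact. Function names and sample values are hypothetical.

```python
def upsample2(img):
    """Double width and height by pixel replication (assumed interpolator)."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def subtract(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical data: full-size original and decoded base-layer texture (209)
original = [[12, 14, 20, 22],
            [13, 15, 21, 23],
            [30, 32, 40, 42],
            [31, 33, 41, 43]]
base = [[13, 21],
        [31, 41]]

# Encoder side: difference between the original and the upsampled base layer
difference = subtract(original, upsample2(base))

# Decoder side: upsampling unit (218) followed by the adder (219)
upper = add(upsample2(base), difference)
```

In the scheme described above the difference is additionally transform coded and divided into quality layers, so a receiver that decodes only some of those layers obtains an approximation of `difference` rather than the exact values.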

What has been described above is only one embodiment for carrying out the video encoding/decoding method and apparatus according to the present invention. The present invention is not limited to this embodiment, and anyone with ordinary skill in the field to which the invention belongs can make various modifications without departing from the gist of the invention as claimed in the following claims.

As described above, the video encoding/decoding method and apparatus according to the present invention make it possible to provide, through a single bitstream, image-quality information at various levels and in various sizes for an object of arbitrary shape. That is, information with the basic picture quality of the base layer is reproduced from a minimum of data, and finely subdivided bitstreams are then provided on top of it, so that the reproduced picture quality can vary with the capability of the transmission channel or the receiver. Moreover, because the same operation can be repeated for the spatially enlarged upper layers, information at various quality levels can also be provided as the resolution changes.
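The way picture quality varies with the number of quality layers a receiver decodes can be illustrated with a one-dimensional sketch. This is an illustration under stated assumptions rather than the method of the invention: a length-8 orthonormal DCT-II is used, each coefficient is treated as its own band, and the residual values and function names are hypothetical.

```python
import math


def dct(x):
    """Orthonormal 1-D DCT-II."""
    n = len(x)
    return [math.sqrt((1 if u == 0 else 2) / n) *
            sum(x[i] * math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for i in range(n))
            for u in range(n)]


def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [sum(math.sqrt((1 if u == 0 else 2) / n) * c[u] *
                math.cos((2 * i + 1) * u * math.pi / (2 * n))
                for u in range(n))
            for i in range(n)]


def decode_with_bands(coef, kept):
    """Keep the first `kept` bands and zero the rest, as a receiver
    that stops reading the quality layers early would."""
    return idct(coef[:kept] + [0.0] * (len(coef) - kept))


residual = [4, -3, 7, 1, 0, -2, 5, -6]   # hypothetical difference signal
coef = dct(residual)

errors = []
for kept in range(len(coef) + 1):
    approx = decode_with_bands(coef, kept)
    errors.append(sum((a - b) ** 2 for a, b in zip(residual, approx)))
```

Because the transform is orthonormal, the squared error after decoding `kept` bands equals the energy of the discarded coefficients, so `errors` never increases as more bands are decoded and reaches zero (up to rounding) when all bands are used.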

In addition, because the video encoding/decoding method and apparatus according to the present invention can be applied to objects of arbitrary shape, a separate picture-quality service can be provided for each object that appears on the screen. That is, services of varying quality (QoS; Quality of Service) become possible, in which the level of picture quality for a given object is determined as the user or the provider intends.

FIG. 1 shows the hierarchical relationship of an image in space and in image quality.

FIG. 2 shows the overall structure of the spatial/image-quality scalable encoder according to the present invention.

FIG. 3 shows the overall structure of the spatial/image-quality scalable decoder according to the present invention.

FIG. 4 shows the process of implementing the image-quality hierarchy by dividing the frequency bands.

FIG. 5 shows the sets of frequency components that make up each layer of the luminance information.

FIG. 6 shows the sets of frequency components that make up each layer of the chrominance information.

FIG. 7 is an example of the structure of the code that represents the information type according to the presence or absence of shape information.

Claims

A video encoding method having both a spatial hierarchy and an image-quality hierarchy, in which video input data consisting of shape information of an object and texture information inside the object is encoded into a spatial hierarchy and an image-quality hierarchy, the method comprising:

(a) downsampling the shape information and the texture information at respective predetermined ratios to form a spatial hierarchy including one base layer and one or more upper layers;

(b) encoding the shape information and texture information of the base layer to generate a base bitstream of the base layer, frequency-transform coding the difference between the decoded texture information and the original texture information, and constructing an image-quality hierarchy for each frequency band; and

(c) for each of the one or more upper layers, encoding the difference between the shape information upsampled from the base layer and the shape information of the upper layer to generate a base bitstream of the upper layer, frequency-transform coding the difference between the texture information decoded in step (b) and the texture information of the upper layer, and constructing an image-quality hierarchy for each frequency band.

A video encoding method having both a spatial hierarchy and an image-quality hierarchy, comprising:

(a) downsampling the shape information and the texture information, respectively, to form a spatial hierarchy including one base layer downsampled at the largest ratio and one or more upper layers downsampled at ratios smaller than that of the base layer;

(b) for the shape information and texture information of the base layer,

(b1) shape-coding the shape information of the base layer;

(b2) padding, frequency-transforming, and quantizing the texture information of the base layer;

(b3) collecting the data generated in steps (b1) and (b2) and variable-length coding them to generate a base bitstream of the base layer;

(b4) obtaining the difference between the texture information of the base layer and texture information reproduced by inverse-quantizing and inverse-frequency-transforming the data generated in step (b2);

(b5) frequency-transform coding the difference of step (b4) and classifying the result by frequency to generate a bitstream for each frequency band; and

(c) for the shape information and texture information of each upper layer,

(c1) shape-coding the difference between the shape information of the base layer upsampled to the upper layer and the shape information of the upper layer, and variable-length coding the result to generate an upper-layer base bitstream;

(c2) upsampling the texture information reproduced in step (b4) to the upper layer, padding it, and obtaining the difference between the padded texture information and the texture information of the upper layer; and

(c3) frequency-transform coding the difference of step (c2) and classifying the result by frequency to generate a bitstream for each frequency band.

The method of claim 2,

wherein the shape information encoding in steps (b1) and (b2) is hierarchical shape information encoding using a bilinear processing method.

The method of claim 2,

wherein the frequency-transform coding is a discrete cosine transform.

The method of claim 2,

wherein the frequency-transform coding is a discrete wavelet transform.

The method of claim 4, wherein each of step (b5) and step (c3) comprises:

dividing the luminance signal of the difference into main blocks of N × N pixels, dividing each main block into four sub-blocks, and dividing the discrete-cosine-transformed luminance signal of each sub-block into a predetermined number of bands;

dividing the chrominance signal corresponding to each main block of the luminance signal into blocks of N/2 × N/2 size, and dividing the discrete-cosine-transformed chrominance signal of each block into the predetermined number of bands; and

recombining the discrete-cosine-transformed luminance signal and the discrete-cosine-transformed chrominance signal band by band to form the image-quality hierarchy (where N is a predetermined integer greater than 2).

The method of claim 6,

wherein the recombination for each band is performed in units of the main blocks of the luminance signal.

The method of claim 6,

wherein the recombination for each band is performed in units of the entire image.

The method of claim 6,

wherein each layer of the image-quality hierarchy recombined for each band is re-encoded by a predetermined method.

The method of claim 6,

wherein the discrete cosine transform is performed only on sub-blocks or blocks that have shape information, and is not performed on sub-blocks or blocks that have no shape information.

A video encoding apparatus having both a spatial hierarchy and an image-quality hierarchy, for encoding video input data consisting of shape information of an object and texture information inside the object into a spatial hierarchy and an image-quality hierarchy, the apparatus comprising:

a downsampling unit that downsamples the shape information and the texture information, respectively, to form a spatial hierarchy including one base layer downsampled at the largest ratio and one or more upper layers downsampled at ratios smaller than that of the base layer;

a base layer encoding unit including: a shape encoding unit 1 that shape-codes the shape information of the base layer; a texture encoding unit that pads, frequency-transforms, and quantizes the texture information of the base layer; a first variable-length encoder that variable-length codes the data output from the shape encoding unit 1 and the texture encoding unit to generate a base bitstream of the base layer; a texture decoding unit that inverse-quantizes and inverse-frequency-transforms the data output from the texture encoding unit to reproduce texture information; a first difference-image generating unit that generates the difference between the texture information reproduced by the texture decoding unit and the texture information of the base layer; and a first image-quality hierarchy generating unit that frequency-transform codes the difference generated by the first difference-image generating unit and classifies the result by frequency to generate a bitstream for each frequency band; and

at least one upper layer encoding unit including: an upsampling unit that upsamples the shape information of the base layer to the upper layer and upsamples the texture information reproduced by the texture decoding unit to the upper layer; a shape encoding unit 2 that shape-codes the difference between the upsampled shape information and the shape information of the upper layer; a second variable-length encoder that variable-length codes the output data of the shape encoding unit 2 to generate an upper-layer base bitstream; a second difference-image generating unit that obtains the difference between texture information produced by padding the output data of the upsampling unit and the texture information of the upper layer; and a second image-quality hierarchy generating unit that frequency-transform codes the difference generated by the second difference-image generating unit and classifies the result by frequency to generate a bitstream for each frequency band.

The apparatus of claim 11,

wherein the shape encoding unit 1 and the shape encoding unit 2 are hierarchical shape information encoding units using a bilinear processing method.

The apparatus of claim 11, wherein the texture encoding unit, the first image-quality hierarchy generating unit, and the second image-quality hierarchy generating unit each include a discrete cosine transformer.

The apparatus of claim 11,

wherein the texture encoding unit, the first image-quality hierarchy generating unit, and the second image-quality hierarchy generating unit each include a discrete wavelet transformer.

The apparatus of claim 13,

wherein each of the first image-quality hierarchy generating unit and the second image-quality hierarchy generating unit divides the luminance signal of the difference into main blocks of N × N size, divides each main block into four sub-blocks, and divides the discrete-cosine-transformed luminance signal of each sub-block into a predetermined number of bands,

and recombines the discrete-cosine-transformed luminance signal and the discrete-cosine-transformed chrominance signal band by band to form the image-quality hierarchy (where N is a predetermined integer greater than 2).

The apparatus of claim 15, wherein each of the first image-quality hierarchy generating unit and the second image-quality hierarchy generating unit performs the discrete cosine transform only on sub-blocks or blocks that have shape information, and does not perform it on sub-blocks or blocks that have no shape information.

A video decoding method having both a spatial hierarchy and an image-quality hierarchy, for decoding a bitstream encoded with a spatial hierarchy and an image-quality hierarchy, the method comprising:

(a) variable-length decoding the bitstream and separating it into a base layer bitstream and one or more upper layer bitstreams;

(b) shape-decoding the encoded shape information included in the base layer bitstream to generate shape information of the base layer;

(c) inverse-quantizing and inverse-frequency-transforming the encoded texture information included in the base layer bitstream to generate texture information of the base layer;

(d) sequentially inverse-frequency-transforming bitstreams selected from the image-quality hierarchy included in the base layer bitstream and adding them to the texture information of the base layer; and

(e) for each upper layer, up to a selected upper layer among the one or more upper layers, sequentially repeating the steps of:

(e1) shape-decoding the shape information of the upper layer included in the upper layer bitstream and adding it to the upsampled shape information of the lower layer; and

(e2) sequentially inverse-frequency-transforming bitstreams selected from the image-quality hierarchy included in the upper layer bitstream and adding them to the upsampled texture information of the lower layer.

A video decoding apparatus having both a spatial hierarchy and an image-quality hierarchy, for decoding a bitstream encoded with a spatial hierarchy and an image-quality hierarchy, the apparatus comprising:

a variable-length decoding unit that variable-length decodes the bitstream and separates it into a base layer bitstream and one or more upper layer bitstreams;

a base layer decoding unit including: a shape decoding unit 1 that shape-decodes the encoded shape information included in the base layer bitstream to generate shape information of the base layer; a texture decoding unit that inverse-quantizes and inverse-frequency-transforms the encoded texture information included in the base layer bitstream to generate texture information of the base layer; and a first image-quality hierarchy decoding unit that sequentially inverse-frequency-transforms bitstreams selected from the image-quality hierarchy included in the base layer bitstream and adds them to the texture information of the base layer; and

at least one upper layer decoding unit including: an upsampling unit that upsamples, to the upper layer, the shape information and the texture information of the layer immediately below the upper layer in the spatial hierarchy; a shape decoding unit 2 that shape-decodes the shape information of the upper layer included in the upper layer bitstream and adds it to the upsampled shape information of the lower layer; and a second image-quality hierarchy decoding unit that sequentially inverse-frequency-transforms bitstreams selected from the image-quality hierarchy included in the upper layer bitstream and adds them to the upsampled texture information of the lower layer.