KR20060101847A

KR20060101847A - Method for scalably encoding and decoding video signal

Info

Publication number: KR20060101847A
Application number: KR1020050084745A
Authority: KR
Inventors: 윤도현; 전병문; 박지호; 박승욱
Original assignee: 엘지전자 주식회사
Priority date: 2005-03-21
Filing date: 2005-09-12
Publication date: 2006-09-26

Abstract

The present invention relates to a method for scalably encoding and decoding a video signal. A frame of the base layer or of the enhanced layer is selected based on encoding level information of the enhanced layer, and a macroblock is encoded by an inter-layer prediction method using the selected frame. Consequently, when a video signal is scalably encoded, the inter-layer prediction method can be applied even to an enhanced layer frame that has no temporally coincident frame in the base layer, and a corresponding base layer frame can be reasonably selected and used even when encoding level information cannot be obtained from the base layer bit stream, which improves coding efficiency.

MCTF, base layer, enhanced layer, inter-layer prediction, temporal decomposition level, prediction operation

Description

Method for scalably encoding and decoding video signal

FIG. 1 illustrates an example of generating and encoding predicted images for a video frame sequence in a multi-layer structure consisting of two layers, a base layer and an enhanced layer;

FIG. 2 illustrates the configuration of a video signal encoding apparatus to which the scalable video signal coding method according to the present invention is applied;

FIG. 3 illustrates a configuration for temporally decomposing a video signal at one temporal decomposition level;

FIG. 4 illustrates an embodiment according to the present invention of a method of selecting the base layer frame to be used when a prediction operation is performed by an inter-layer prediction method on an enhanced layer frame that has no temporally coincident frame in the base layer;

FIG. 5 illustrates the configuration of an apparatus for decoding a data stream encoded by the apparatus of FIG. 2; and

FIG. 6 illustrates a configuration for temporally composing an 'H' frame sequence and an 'L' frame sequence of temporal decomposition level N into an 'L' frame sequence of decomposition level N-1.

<Description of reference numerals for the main parts of the drawings>

100: EL encoder 101: estimator/predictor

102: updater 105: BL decoder

110: texture encoder 120: motion coding unit

130: muxer 150: BL encoder

200: demuxer 210: texture decoder

220: motion decoding unit 230: EL decoder

231: inverse updater 232: inverse predictor

233: motion vector decoder 234: arranger

240: BL decoder

The present invention relates to a method for scalably encoding and decoding a video signal and, more particularly, to a method for scalably encoding and decoding a video signal by applying an inter-layer prediction method.

It is difficult to allocate a bandwidth as wide as that of TV signals to the digital video signals transmitted and received wirelessly by mobile phones and notebook computers, which are already in wide use, and by mobile TVs and handheld PCs, which are expected to come into wide use. Accordingly, a standard for the video compression scheme used by such mobile devices must offer higher compression efficiency.

Moreover, such mobile devices inevitably differ in their processing and presentation capabilities. Compressed video must therefore be prepared in advance in correspondingly many forms, which means that, for a single video source, video data of various qualities must be held for each combination of variables such as frames transmitted per second, resolution, and bits per pixel. This places a heavy burden on content providers.

For this reason, a content provider holds compressed video data at a high bit rate for each video source and, when a mobile device makes a request, decodes the compressed video and re-encodes it into video data suited to the video processing capability of the requesting device. This approach necessarily involves transcoding (decoding + scaling + encoding), so some delay arises in delivering the requested video. Transcoding also requires complex hardware and algorithms because the target encodings vary widely.

The Scalable Video Codec (SVC) was proposed to overcome these disadvantages. In this scheme, a video signal is encoded at the highest picture quality, yet a certain level of picture quality is still ensured when only a partial sequence of the resulting picture (frame) sequence (a sequence of frames selected intermittently from the whole sequence) is decoded.

MCTF (Motion Compensated Temporal Filter, or Filtering) is one example of an encoding scheme proposed for use in such a scalable video codec. Because the MCTF scheme is likely to be applied in transmission environments with limited bandwidth, such as mobile communication, it requires high compression efficiency, that is, high coding efficiency, to lower the number of bits transmitted per second.

As mentioned above, a certain level of picture quality is ensured even when only part of a picture sequence encoded by the scalable MCTF scheme is received and processed, but picture quality degrades considerably when the bit rate is lowered. To address this, a separate auxiliary picture sequence for low transmission rates may be provided, for example a picture sequence with a smaller picture size and/or a lower frame rate.

The auxiliary picture sequence is called the base layer, and the main picture sequence is called the enhanced (or enhancement) layer. Because the base layer and the enhanced layer encode the same video content at different spatial resolutions, frame rates, and so on, redundancy exists between the video signals of the two layers. Therefore, to raise the coding efficiency of the enhanced layer, the video signal of the enhanced layer is predicted and encoded using motion information and/or texture information of the base layer. This is called the inter-layer prediction method.

FIG. 1 illustrates an example of generating and encoding predicted images for a video frame sequence in a multi-layer structure consisting of two layers, a base layer and an enhanced layer. In FIG. 1, the base layer encodes the input video signal using hierarchical B pictures, and the enhanced layer encodes the input video signal through temporal decomposition in an MCTF structure.

For a picture (frame) of the enhanced layer that has a temporally coincident picture (frame) in the base layer, an inter-layer prediction method using that temporally coincident base layer picture can be applied. In FIG. 1, the H pictures at temporal decomposition levels 1 and 2 of the enhanced layer are such pictures.

In contrast, the inter-layer prediction method cannot be applied to an enhanced layer picture that has no temporally coincident picture in the base layer, such as the H pictures at temporal decomposition level 0 of the enhanced layer in FIG. 1.

To apply the inter-layer prediction method to an enhanced layer picture that has no temporally coincident picture in the base layer, a corresponding base layer picture must be selected, but no reasonable method for doing so has been proposed. In particular, when the base layer is encoded without encoding information such as a temporal decomposition level, as with H.264, there is no criterion for selecting from the base layer the picture that corresponds to an enhanced layer picture.

The present invention was devised to solve this problem. An object of the present invention is to provide a method that allows the inter-layer prediction method to be applied even to an enhanced layer picture that has no temporally coincident picture in the base layer, so that coding efficiency can be improved.

A specific object of the present invention is to provide a reasonable method for selecting the base layer picture or enhanced layer picture that corresponds to an enhanced layer picture.

To achieve the above objects, a method of encoding a video signal according to one embodiment of the present invention comprises: generating a bit stream of a first layer by scalably encoding the video signal; and generating a bit stream of a second layer by encoding the video signal in a predetermined manner, wherein generating the bit stream of the first layer includes selecting, for an arbitrary frame of the first layer that contains a video block to be encoded, a frame of the second layer to be used for inter-layer prediction based on encoding level information of the first layer.

In the above embodiment, generating the bit stream of the first layer further includes recording, in a header area of a video block within the arbitrary frame, which has no temporally coincident frame in the second layer, information indicating that the video block has been encoded by the inter-layer prediction method using the second layer.

In one embodiment, among the frames of the first layer that have a temporally coincident frame in the second layer and that have the lowest encoding level, the frame temporally closest to the arbitrary frame is found, and the frame of the second layer that is temporally coincident with the found first layer frame may be selected.

Alternatively, in another embodiment, among the frames of the first layer that have the lowest encoding level, the frame temporally closest to the arbitrary frame is found, and the found first layer frame may be selected as the frame of the second layer.

The encoding level information is temporal decomposition level information or information on the order in which frames are decoded.

In the above embodiment, when two or more frames of the first layer are found, the earliest of them is selected as the frame of the first layer.

A method of decoding an encoded video bit stream according to another embodiment of the present invention comprises: decoding a bit stream of a second layer that is received encoded in a predetermined second manner; and decoding, using the decoded second layer, a bit stream of a first layer that is received scalably encoded, wherein decoding the first layer includes selecting, for an arbitrary frame of the first layer that contains a video block encoded by an inter-layer prediction method using the second layer, the frame of the second layer that was used for inter-layer prediction, based on encoding level information of the first layer.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 2 illustrates the configuration of a video signal encoding apparatus to which the scalable video signal coding method according to the present invention is applied.

The video signal encoding apparatus of FIG. 2 comprises: an enhanced layer (EL) encoder 100 that scalably encodes an input video signal in units of macroblocks, for example by the MCTF scheme, and generates appropriate management information; a texture coding unit 110 that converts the data of each encoded macroblock into a compressed bit string; a motion coding unit 120 that codes the motion vectors of video blocks obtained by the EL encoder 100 into a compressed bit string in a specified manner; a base layer (BL) encoder 150 that encodes the input video signal in a specified manner, for example MPEG 1, 2, or 4, or H.261 or H.264, and generates, as needed, a sequence of small pictures, for example pictures at 25% of the original size; and a muxer 130 that encapsulates the output data of the texture coding unit 110, the small-picture sequence of the BL encoder 150, and the output vector data of the motion coding unit 120 in a predetermined format, then multiplexes them into a predetermined transmission format and outputs the result.

The EL encoder 100 performs a prediction operation that subtracts, from a macroblock in an arbitrary video frame (or picture), a reference block found by motion estimation, and also performs an update operation that adds the image difference between the macroblock and the reference block to that reference block. The update operation may be omitted as needed.

The EL encoder 100 separates the input video frame sequence into frames that will hold error values and frames to which the error values will be added, for example odd and even frames, and performs the prediction and update operations over several encoding levels, for example until the number of L frames (frames produced by the update operation) for one GOP (Group Of Pictures) becomes one. FIG. 3 illustrates the configuration associated with the prediction and update operations at one of those levels.
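The repeated prediction and update steps can be sketched as follows. This is only an illustrative simplification, not the encoder described here: each frame is reduced to a single number, motion estimation is omitted, and the weights (a full difference for the 'H' frame, half of it added back for the 'L' frame) are an assumed Haar-style choice. The function names `decompose_level` and `decompose_gop` are hypothetical.

```python
def decompose_level(frames):
    """One temporal decomposition level over a list of scalar 'frames'.

    Assumption: frames alternate reference / predicted, and the list
    has an even length.
    """
    refs = frames[0::2]       # frames that receive the update
    predicted = frames[1::2]  # frames that become residuals
    # Prediction: each predicted frame is replaced by its difference
    # from the neighbouring reference frame (an 'H' frame).
    h = [p - r for p, r in zip(predicted, refs)]
    # Update: half of the residual is added back to the reference
    # frame (an 'L' frame). The factor 1/2 is an assumed normalization.
    l = [r + d / 2 for r, d in zip(refs, h)]
    return l, h


def decompose_gop(frames):
    """Repeat the level step until a single 'L' frame remains.

    Assumption: the GOP length is a power of two.
    """
    h_levels = []
    low = list(frames)
    while len(low) > 1:
        low, high = decompose_level(low)
        h_levels.append(high)
    return low[0], h_levels


final_l, h_levels = decompose_gop([2, 4, 6, 8])
print(final_l, h_levels)
```

With these assumed weights the single remaining 'L' value is the mean of the input values, which is why the residual 'H' frames plus one 'L' frame are enough to describe the whole GOP.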

The configuration of FIG. 3 includes: a base layer (BL) decoder 105 that extracts encoding information, such as the frame rate and macroblock modes, from the stream of the small-picture sequence encoded by the BL encoder 150, and that decodes the base layer stream to produce macroblocks or frames composed of macroblocks; and an estimator/predictor 101 that performs a prediction operation in which, for each frame that will hold residual data, for example each odd frame, it finds through motion estimation a reference block for each macroblock in the frame and calculates the image difference from the reference block (the difference between corresponding pixels) and a motion vector. The configuration may further include an updater 102 that performs an update operation in which, for each frame containing a reference block for the macroblock, for example each even frame, the image difference calculated for the macroblock is normalized and added to the reference block.

Here, the block with the smallest image difference is the block with the highest correlation. The magnitude of the image difference is defined, for example, as the sum or the mean of the pixel-to-pixel differences. Among the blocks whose image difference is at or below a predetermined threshold, the macroblock or blocks with the smallest image difference are called the reference block(s).
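The reference block criterion can be sketched as a small search over candidate blocks. The sketch assumes the sum of absolute pixel differences as the image-difference measure (one of the options mentioned above) and represents blocks as nested lists; `image_difference` and `find_reference_block` are hypothetical names, and the candidate list stands in for whatever search window an actual encoder would scan.

```python
def image_difference(block_a, block_b):
    """Sum of absolute pixel-to-pixel differences between two equal-sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def find_reference_block(target, candidates, threshold):
    """Return (index, cost) of the candidate with the smallest image
    difference among those at or below the threshold, or (None, None)
    if no candidate qualifies."""
    best_index, best_cost = None, None
    for index, candidate in enumerate(candidates):
        cost = image_difference(target, candidate)
        if cost > threshold:
            continue  # not similar enough to serve as a reference block
        if best_cost is None or cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost


target = [[1, 2], [3, 4]]
candidates = [
    [[0, 0], [0, 0]],
    [[1, 2], [3, 5]],
    [[9, 9], [9, 9]],
]
print(find_reference_block(target, candidates, threshold=5))
```

Returning `(None, None)` when nothing passes the threshold leaves the caller free to fall back to another coding mode for that macroblock.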

The operation performed by the estimator/predictor 101 is called the 'P' operation, and a frame produced by the 'P' operation is called an 'H' frame. Likewise, the operation performed by the updater 102 is called the 'U' operation, and a frame produced by the 'U' operation is called an 'L' frame.

The estimator/predictor 101 and the updater 102 of FIG. 3 may operate not on whole frames but simultaneously and in parallel on a plurality of slices into which a frame is divided. The term 'frame' used in the following embodiments should naturally be interpreted as including 'slice' wherever substituting 'slice' preserves technical equivalence.

The estimator/predictor 101 divides each input video frame, or each odd frame among the 'L' frames obtained at the previous level, into macroblocks of a predetermined size, searches the preceding and following even frames at the same temporal decomposition level, or the frame itself, for the block whose image is most similar to each divided macroblock, produces a predicted image, and obtains a motion vector.

Alternatively, the estimator/predictor 101 may find the reference block for a macroblock in a base layer frame: either in the temporally coincident base layer frame reconstructed by the BL decoder 105 or, when no temporally coincident base layer frame exists, in a base layer frame selected by the method according to the present invention described below. In this case, the estimator/predictor 101 encodes the macroblock by an inter-layer prediction method that uses the texture information and/or motion vector information of the reference block found in the base layer.

Furthermore, the estimator/predictor 101 inserts into the header area of the macroblock information indicating that the enhanced layer macroblock was encoded using texture information and/or motion vector information of the base layer, thereby notifying the decoder.

The estimator/predictor 101 performs the above process for every macroblock in a frame to complete an 'H' frame, which is the predicted image for that frame. It likewise completes an 'H' frame for every odd frame among the input video frames or the 'L' frames obtained at the previous level.

Meanwhile, as described above, the updater 102 adds the image difference in each macroblock of an 'H' frame generated by the estimator/predictor 101 to the 'L' frame containing the corresponding reference block (an even frame among the input video frames or the 'L' frames obtained at the previous level).

With reference to FIG. 4, the following describes a method of selecting the base layer frame (BasePic) to be used when a prediction operation is performed by the inter-layer prediction method on a current frame (CurrPic) of the enhanced layer that has no temporally coincident frame in the base layer.

As in FIG. 1, the enhanced layer in FIG. 4 is drawn as if the prediction and update operations used only immediately neighboring frames, but this is only to keep the drawing simple. There is no restriction on which frames are used for the prediction and update operations in the enhanced layer.

Also, in the enhanced layer, except for the 'L8' frame generated at the last level, the temporal decomposition level to which an 'L' frame belongs is determined by the level of the 'H' frame that is temporally coincident with it. In FIG. 4, the 'L' frame coincident with frame 2 belongs to level 1 because it is coincident with 'H2'; the two 'L' frames coincident with frame 4 belong to level 2 because they are coincident with 'H4'; and the 'L' frame coincident with frame 6 belongs to level 1 because it is coincident with 'H6'. The two 'L' frames coincident with frame 8 belong to level 3 because they are coincident with 'L8'.

First, a frame (refPic) that simultaneously satisfies the following three conditions is searched for in the enhanced layer.

The first condition is that the frame satisfies at least one of the following three sub-conditions: i) it is the first reference frame among the List_0 reference frames or the List_1 reference frames of the current frame (CurrPic); ii) it is an active reference frame in List_0 or List_1; or iii) it is a frame in the current GOP (Group Of Pictures) or in the Decoded Picture Buffer.

Here, reference frames are frames that can be used when the prediction operation is performed on the current frame (CurrPic), and List_0 and List_1 may each contain a predetermined number of reference frames. The first reference frame in List_0 and in List_1 is the first frame in the respective frame list. Among the reference frames in List_0 and List_1, those that the current frame actually uses for the prediction operation and/or the update operation are called active reference frames.

The second condition is that the frame has a temporally coincident frame in the base layer.

The third condition is that the frame has the lowest temporal decomposition level among the frames satisfying both the first and second conditions.

Next, when two or more frames in the enhanced layer satisfy all three conditions, refPic is the frame with the smallest time difference (in absolute value) from the current frame (CurrPic). When more than one frame satisfies this as well, the earliest of them, that is, the frame preceding the current frame (CurrPic), is determined to be refPic.

The base layer frame that is temporally coincident with the determined refPic may then be determined to be BasePic. The prediction operation on the current frame (CurrPic) of the enhanced layer may be performed using the texture and/or motion vector information of the determined BasePic.
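The selection of refPic can be sketched as a filter followed by two tie-breaks. The sketch assumes the caller has already gathered the frames that satisfy the first condition; the `Frame` record, its field names, and the `require_base` flag are hypothetical and introduced only for illustration. Setting `require_base=False` drops the second condition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Frame:
    name: str
    time: int        # temporal position of the frame
    level: int       # temporal decomposition level
    has_base: bool   # True if the base layer has a temporally coincident frame


def select_ref_pic(curr_time: int,
                   candidates: List[Frame],
                   require_base: bool = True) -> Optional[Frame]:
    """Pick refPic from frames that already satisfy the first condition."""
    # Second condition: keep frames with a coincident base layer frame.
    pool = [f for f in candidates if f.has_base or not require_base]
    if not pool:
        return None
    # Third condition: keep only the lowest temporal decomposition level.
    lowest = min(f.level for f in pool)
    pool = [f for f in pool if f.level == lowest]
    # Tie-breaks: smallest absolute time difference, then the earlier frame.
    return min(pool, key=lambda f: (abs(f.time - curr_time), f.time))


# CurrPic H3 of FIG. 4 with H2 (level 1) and H4 (level 2) as candidates.
h2 = Frame("H2", time=2, level=1, has_base=True)
h4 = Frame("H4", time=4, level=2, has_base=True)
print(select_ref_pic(3, [h2, h4]).name)  # prints "H2"
```

BasePic is then the base layer frame at the same time position as the returned frame; looking it up is left out because it depends on how the decoded base layer frames are stored.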

Alternatively, instead of predicting the current frame (CurrPic) of the enhanced layer from the base layer frame BasePic that is temporally coincident with refPic, the refPic determined above may itself be treated as the temporally coincident base layer frame BasePic and used to predict the current frame (CurrPic). In this case the second condition need not be considered; that is, a frame with no temporally coincident frame in the base layer may also be determined to be refPic.

Here, the value of frame_num present in each frame or slice may be used instead of the condition on the temporal decomposition level. In that case, the condition that the frame have the lowest temporal decomposition level is replaced by the condition that it have the highest frame_num. frame_num is a number assigned sequentially according to the order in which frames are encoded (which is also the transmission order and the decoding order). This substitution is possible because a frame with a higher frame_num has a temporal decomposition level lower than or equal to that of a frame with a lower frame_num.
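The frame_num variant can be sketched by swapping the level test for a highest-frame_num test. The tuple layout `(name, time, frame_num)`, the function name `select_by_frame_num`, and the frame_num values in the example are all hypothetical; the values only respect the stated ordering, in which a higher frame_num implies a lower or equal temporal decomposition level. The tie-breaks on time difference are assumed to carry over unchanged.

```python
def select_by_frame_num(curr_time, candidates):
    """Pick refPic using frame_num in place of the temporal decomposition level.

    candidates: list of (name, time, frame_num) tuples for frames that
    already satisfy the first and second conditions.
    """
    if not candidates:
        return None
    # Replacement for the third condition: keep the highest frame_num.
    highest = max(frame_num for _, _, frame_num in candidates)
    pool = [c for c in candidates if c[2] == highest]
    # Same tie-breaks as before: nearest in time, then the earlier frame.
    return min(pool, key=lambda c: (abs(c[1] - curr_time), c[1]))


# Hypothetical frame_num values: H4 (level 2) is coded before H2 (level 1).
candidates = [("H2", 2, 5), ("H4", 4, 3)]
print(select_by_frame_num(3, candidates)[0])  # prints "H2"
```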

For example, suppose sub-condition i) of the first condition is applied. Two frames in total satisfy sub-condition i), one in List_0 and one in List_1. It is determined whether the two frames have the same temporal decomposition level. If they do, the frame with the smaller time difference from the current frame (CurrPic) is determined to be refPic, and if the time differences are also equal, the earlier frame, the one in List_0, is determined to be refPic. Conversely, if their temporal decomposition levels differ, the frame with the lower temporal decomposition level is determined to be refPic. BasePic is then either the determined refPic or the base layer frame temporally coincident with refPic.

Applying this method of selecting the base layer frame to be used for prediction to FIG. 4 gives the following.

Consider selecting, in FIG. 4, the base layer frame (BasePic) corresponding to the current enhanced layer frame (CurrPic) H3. The frames belonging to List_0 are H2 and L0, and the frames belonging to List_1 are H4, H6, and L8, listed here in order of proximity to CurrPic H3. The first frames of List_0 and List_1 that satisfy detailed condition i) of the first condition are therefore H2 and H4. Both H2 and H4 satisfy the second condition, and their temporal decomposition levels are 1 and 2, respectively, so H2 becomes refPic under the third condition. Accordingly, either refPic H2 or B2, the base layer frame simultaneous with H2, becomes BasePic, and H3 can be encoded by an inter-layer prediction method that uses the texture and/or motion information of H2 in the enhanced layer or of B2 in the base layer.
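The comparison between the two candidate frames can be sketched in Python as follows. This is a minimal illustration, not the encoder's actual implementation: the `Frame` type, the function name `select_ref_pic`, and the time indices are hypothetical, the level given to H3 is an unused placeholder, and only the level, time-difference, and List_0 tie-break rules described above are modeled (the first and second conditions are assumed to have been checked already).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    name: str   # label such as "H2"
    time: int   # temporal position (assumed index)
    level: int  # temporal decomposition level


def select_ref_pic(curr: Frame, list0_first: Frame, list1_first: Frame) -> Frame:
    """Choose refPic between the first candidate of List_0 and of List_1."""
    # Different levels: the frame with the lower level wins.
    if list0_first.level != list1_first.level:
        return min(list0_first, list1_first, key=lambda f: f.level)
    # Same level: the frame closer in time to CurrPic wins.
    d0 = abs(curr.time - list0_first.time)
    d1 = abs(curr.time - list1_first.time)
    # Equal distance: the earlier frame, i.e. the one in List_0, wins.
    return list1_first if d1 < d0 else list0_first


# Values from the FIG. 4 walk-through: H2 has level 1, H4 has level 2.
curr = Frame("H3", time=3, level=0)  # level of H3 is a placeholder
h2 = Frame("H2", time=2, level=1)
h4 = Frame("H4", time=4, level=2)

ref = select_ref_pic(curr, h2, h4)

# Base layer frame simultaneous with refPic, as stated in the text (B2 for H2).
simultaneous_base = {2: "B2"}
base_pic = simultaneous_base[ref.time]

print(ref.name, base_pic)  # H2 B2
```

Either `ref` itself or `base_pic` may then serve as BasePic, matching the two options the text allows.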

In the same way, for H1, H5, and H7, which have no simultaneous frame in the base layer, H2, H4, and H6, or B2, B6, and B6, respectively, can be selected as BasePic.

As described above, because the method uses the temporal decomposition level information of the enhanced layer, it can be applied even when information on the temporal decomposition level cannot be obtained from the base layer bit stream.

The data stream encoded by the method described so far is transmitted to a decoding apparatus by wire or wirelessly, or delivered via a recording medium, and the decoding apparatus restores the original video signal according to the method described below.

FIG. 5 is a block diagram of an apparatus that decodes a data stream encoded by the apparatus of FIG. 2. The decoding apparatus of FIG. 5 comprises: a demuxer 200 that separates the compressed motion vector stream and the compressed macroblock information stream from the received data stream; a texture decoding unit 210 that restores the compressed macroblock information stream to its original uncompressed state; a motion decoding unit 220 that restores the compressed motion vector stream to its original uncompressed state; an enhanced layer (EL) decoder 230 that inversely transforms the decompressed macroblock information stream and motion vector stream into the original video signal; and a base layer (BL) decoder 240 that decodes the base layer stream in a predetermined manner, for example MPEG-4 or H.264. The EL decoder 230 uses encoding information of the base layer, such as the frame rate and macroblock modes, and/or decoded base layer frames (or macroblocks). The EL decoder 230 may perform the inverse transform into the original video signal according to, for example, the MCTF scheme.

The EL decoder 230 restores the original frame sequence from the input stream. FIG. 6 shows the main components of the EL decoder 230 in detail, using the MCTF scheme as an example.

FIG. 6 shows a configuration that temporally composes the 'H' frame sequence and the 'L' frame sequence of temporal decomposition level N into the 'L' frame sequence of temporal decomposition level N-1. The configuration of FIG. 6 includes: an inverse updater 231 that selectively subtracts the pixel difference values of an input 'H' frame from an input 'L' frame; an inverse predictor 232 that restores an 'L' frame having the original image, using the 'L' frame from which the image difference of the 'H' frame has been subtracted together with that 'H' frame; a motion vector decoder 233 that decodes the input motion vector stream and supplies the motion vector information of each block in an 'H' frame to the inverse updater 231 and the inverse predictor 232 of each stage; and an arranger 234 that inserts the 'L' frames completed by the inverse predictor 232 between the output 'L' frames of the inverse updater 231 to produce an 'L' frame sequence in normal order. If the update process was omitted during encoding, the inverse updater 231 may likewise be omitted.

The 'L' frames output by the arranger 234 form the level N-1 'L' frame sequence 701. Together with the input level N-1 'H' frame sequence 702, this sequence is restored again to an 'L' frame sequence by the inverse updater and inverse predictor of the next stage. This process is repeated for as many levels as were performed during encoding, restoring the original video frame sequence.

The restoration (temporal composition) process at level N, in which a received level N 'H' frame and a level N 'L' frame generated at level N+1 are restored to level N-1 'L' frames, is now described in more detail.

First, for an arbitrary 'L' frame (level N), the inverse updater 231 refers to the motion vectors provided by the motion vector decoder 233 to identify every 'H' frame (level N) whose image difference was obtained, during encoding, using as a reference block a block in the original 'L' frame (level N-1) that was updated into that 'L' frame (level N). It then subtracts the error values of the macroblocks in those 'H' frames from the pixel values of the corresponding blocks in the 'L' frame, thereby restoring the original 'L' frame.

This inverse update operation is performed on every block of the current 'L' frame (level N) that was updated during encoding by the error values of a macroblock in an 'H' frame, restoring the frame to a level N-1 'L' frame.

For a macroblock in an arbitrary 'H' frame, the inverse predictor 232 refers to the motion vector provided by the motion vector decoder 233 to identify the reference block in the 'L' frame (the 'L' frame inversely updated and output by the inverse updater 231), and then restores the original image by adding the pixel values of the reference block to the pixel difference values (error values) of the macroblock.

Alternatively, when the header of a macroblock in an arbitrary 'H' frame contains information indicating that the macroblock was encoded using texture information and/or motion vector information of the base layer, the inverse predictor 232 restores the original image of the macroblock using the header information in the stream provided by the BL decoder 240 and the decoded base layer frame. In this case, the inverse predictor 232 selects the base layer frame (BasePic) according to the method described above and performs the inverse prediction operation using the texture information and/or motion vector information of the reference block in the selected frame.

Once every macroblock in the current 'H' frame has been restored to its original image by the above operations and the macroblocks have been combined into a restored 'L' frame, the arranger 234 places this 'L' frame alternately with the 'L' frames restored by the inverse updater 231 and outputs the result to the next stage.
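The interplay of inverse update, inverse prediction, and arrangement at one level can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the patented decoder: motion vectors are taken to be zero, each 'H' frame is predicted from a single preceding frame, and the update step adds half of the error value (a Haar-style lifting weight that the text does not specify). All function names are hypothetical.

```python
def analyze(frames):
    """One encoder-side stage (assumed Haar-style lifting, zero motion)."""
    even, odd = frames[0::2], frames[1::2]
    # Prediction: each 'H' frame holds pixel differences (error values).
    h = [[o - e for o, e in zip(of, ef)] for of, ef in zip(odd, even)]
    # Update: add half of the error values to the reference frame.
    l = [[e + (d >> 1) for e, d in zip(ef, hf)] for ef, hf in zip(even, h)]
    return l, h


def inverse_update(l_frames, h_frames):
    """Subtract the 'H' error contribution from the 'L' frames."""
    return [[p - (d >> 1) for p, d in zip(lf, hf)]
            for lf, hf in zip(l_frames, h_frames)]


def inverse_predict(h_frames, ref_frames):
    """Add reference pixel values to the error values of the 'H' frames."""
    return [[d + r for d, r in zip(hf, rf)]
            for hf, rf in zip(h_frames, ref_frames)]


def arrange(from_updater, from_predictor):
    """Interleave the two restored frame sets into normal order."""
    out = []
    for a, b in zip(from_updater, from_predictor):
        out.extend([a, b])
    return out


def synthesize(l_frames, h_frames):
    """Level N 'L' and 'H' frames -> level N-1 'L' frames."""
    refs = inverse_update(l_frames, h_frames)
    rebuilt = inverse_predict(h_frames, refs)
    return arrange(refs, rebuilt)


# Four tiny two-pixel frames, used only as toy data.
frames = [[10, 20], [12, 25], [9, 30], [7, 31]]
l1, h1 = analyze(frames)
restored = synthesize(l1, h1)
print(restored == frames)
```

Because the same integer shift is used in the update and the inverse update, the round trip is exact in this sketch; repeating `synthesize` once per decomposition level mirrors the stage-by-stage restoration described above.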

According to the method described above, the encoded data stream is restored to a complete video frame sequence. In particular, if the prediction and update operations were performed N times per GOP in the encoding process described using the MCTF scheme as an example, performing the inverse update and inverse prediction operations N times in the MCTF decoding process yields the picture quality of the original video signal, while performing them fewer times yields a video frame sequence with somewhat lower picture quality but a lower bit rate. The decoding apparatus is therefore designed to perform the inverse update and inverse prediction operations to the extent suited to its performance.

The decoding apparatus described above may be mounted in a mobile communication terminal or the like, or in an apparatus that plays back a recording medium.

The preferred embodiments of the present invention described above are disclosed for illustrative purposes. Those skilled in the art may improve, modify, substitute, or add to them to produce various other embodiments within the technical spirit and scope of the invention set forth in the appended claims.

Accordingly, when a video signal is scalably encoded, the inter-layer prediction method can be applied even to enhanced layer frames that have no simultaneous frame in the base layer. In addition, even when information on the temporal decomposition level cannot be obtained from the base layer bit stream, the corresponding base layer frame can be reasonably selected and used, which improves coding efficiency.

Claims

A method of encoding a video signal, comprising: generating a bit stream of a first layer by scalably encoding the video signal; and generating a bit stream of a second layer by encoding the video signal in a predetermined manner, wherein generating the bit stream of the first layer comprises selecting, for an arbitrary frame of the first layer that includes an image block to be encoded, a frame of the second layer to be used for inter-layer prediction, based on encoding level information of the first layer.

The method of claim 1, wherein generating the bit stream of the first layer further comprises recording, in a header area of the image block in the arbitrary frame, for which no simultaneous frame exists in the second layer, information indicating that the image block is encoded according to an inter-layer prediction method using the second layer.

The method of claim 1, wherein selecting the frame of the second layer comprises: searching, among the frames of the first layer that have a simultaneous frame in the second layer and have the lowest encoding level, for the frame temporally closest to the arbitrary frame; and selecting the frame of the second layer that is simultaneous with the found frame of the first layer.

The method of claim 1, wherein selecting the frame of the second layer comprises: searching, among the frames of the first layer that have the lowest encoding level, for the frame temporally closest to the arbitrary frame; and selecting the found frame of the first layer as the frame of the second layer.

The method of claim 1, wherein the encoding level information is temporal decomposition level information or information on the order in which frames are decoded.

The method of claim 3, wherein, when two or more frames of the first layer are found, the earliest frame among them is selected as the frame of the first layer.

A method of decoding an encoded video bit stream, comprising: decoding a received bit stream of a second layer encoded in a predetermined manner; and decoding, using the second layer, a bit stream of a first layer that has been scalably encoded, wherein decoding the first layer comprises selecting, for an arbitrary frame of the first layer that includes an image block encoded according to an inter-layer prediction method using the second layer, a frame of the second layer used for the inter-layer prediction, based on encoding level information of the first layer.

The method of claim 7, wherein decoding the first layer further comprises reading, from a header area of the image block, information indicating that the image block has been encoded according to the inter-layer prediction method using the second layer.

The method of claim 7, wherein selecting the frame of the second layer comprises selecting the frame of the second layer that is simultaneous with the found frame of the first layer.

The method of claim 7, wherein selecting the frame of the second layer comprises selecting the found frame of the first layer as the frame of the second layer.

The method of claim 7, wherein the encoding level information is temporal decomposition level information or information on the order in which frames are decoded.

The method of claim 9, wherein, when two or more frames of the first layer are found, the earliest frame among them is selected as the frame of the first layer.