KR20140106450A - Method and apparatus for scalable video encoding considering memory bandwidth and calculation complexity, method and apparatus for scalable video decoding considering memory bandwidth and calculation complexity - Google Patents

Info

Publication number
KR20140106450A
Authority
KR
South Korea
Prior art keywords
interpolation filtering
prediction
unit
encoding
image
Prior art date
Application number
KR1020140022179A
Other languages
Korean (ko)
Inventor
Elena Alshina
Alexander Alshin
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of KR20140106450A

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/103 — Selection of coding mode or of prediction mode
    • H04N 19/105 — Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/102 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N 19/117 — Filters, e.g. for pre-processing or post-processing
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/17 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N 19/176 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a block, e.g. a macroblock
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/10 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N 19/169 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N 19/187 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a scalable video layer
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/30 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
    • H04N 19/34 — Scalability techniques involving progressive bit-plane based encoding of the enhancement layer, e.g. fine granular scalability [FGS]
    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/50 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N 19/503 — Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N 19/51 — Motion estimation or motion compensation
    • H04N 19/513 — Processing of motion vectors
    • H04N 19/517 — Processing of motion vectors by encoding
    • H04N 19/52 — Processing of motion vectors by encoding by predictive encoding

Abstract

The present invention relates to scalable video encoding and decoding methods for optimizing memory bandwidth and computation amount when inter-layer prediction is performed. Proposed is a scalable video encoding method comprising: determining a reference layer image from among base layer images so as to inter-layer predict an enhancement layer image; when an upsampled reference layer image is determined by performing inter-layer (IL) interpolation filtering on the reference layer image, determining not to perform inter prediction between enhancement layer images for the enhancement layer image; and encoding a residue component between the upsampled reference layer image and the enhancement layer image.

Description

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a scalable video encoding method and apparatus and to a scalable video decoding method and apparatus. More particularly, the present invention relates to a scalable video encoding method and a scalable video decoding method that optimize memory bandwidth and computation amount when inter-layer prediction is performed.

2. Description of the Related Art

As hardware for playing back and storing high-resolution or high-quality video content is being developed and distributed, the need for a video codec that effectively encodes or decodes such content is increasing. According to a conventional video codec, video is encoded by a limited encoding method based on a macroblock of a predetermined size.

Image data in the spatial domain is transformed into coefficients in the frequency domain by frequency transformation. For fast computation of the frequency transformation, a video codec divides an image into blocks of a predetermined size, performs a DCT on each block, and encodes the frequency coefficients block by block. Compared with image data in the spatial domain, coefficients in the frequency domain are easy to compress. In particular, since image pixel values in the spatial domain are expressed as prediction errors through inter prediction or intra prediction of the video codec, many data may be transformed to 0 when the frequency transformation is performed on the prediction errors. The video codec reduces the amount of data by replacing consecutively repeated data with data of a smaller size.
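As a rough illustration of the block-based transform coding described above (a sketch under assumed parameters: 8x8 blocks, a DCT-II, and a uniform quantization step, none of which are mandated by this disclosure):

```python
# Minimal sketch: transform a block of prediction errors to the frequency
# domain, quantize (driving many coefficients to 0), and reconstruct.
import numpy as np
from scipy.fft import dctn, idctn

def encode_block(block, qstep=16.0):
    coeffs = dctn(block, norm='ortho')            # frequency-domain coefficients
    return np.round(coeffs / qstep).astype(int)   # quantization zeroes small ones

def decode_block(qcoeffs, qstep=16.0):
    return idctn(qcoeffs * qstep, norm='ortho')   # inverse quantization + IDCT

prediction_error = np.random.randint(-8, 9, (8, 8)).astype(float)
q = encode_block(prediction_error)
print(f"{np.sum(q == 0)} of {q.size} coefficients are zero")
```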

A multi-layer video codec encodes and decodes base layer video and one or more enhancement layer videos. The amount of data of the base layer video and the enhancement layer video can be reduced by eliminating the temporal/spatial redundancy of each. Depending on the playback capability of the receiving end, only the base layer video may be decoded, or both the base layer video and the enhancement layer video may be decoded.

Provided are various embodiments of a scalable video encoding method and apparatus that limit the number of interpolation filtering operations so as to minimize the increase in memory bandwidth and computation amount caused by the interpolation filtering performed for inter prediction and inter-layer prediction. Also provided are various embodiments of a scalable video decoding method and apparatus in which inter prediction and inter-layer prediction in an enhancement layer are limited under predetermined conditions.

A scalable video encoding method according to various embodiments includes: determining a reference layer image from among base layer images so as to inter-layer predict an enhancement layer image; performing inter-layer (IL) interpolation filtering on the determined reference layer image to generate an upsampled reference layer image; when the upsampled reference layer image is determined through the IL interpolation filtering, determining not to perform inter prediction between enhancement layer images for the enhancement layer image; and encoding a residue component between the upsampled reference layer image and the enhancement layer image.

According to various embodiments, the computation amount of interpolation filtering for inter-layer prediction may be limited so as not to exceed the sum of a first computation amount of MC interpolation filtering for inter prediction between the base layer images and a second computation amount of MC interpolation filtering for inter prediction between the enhancement layer images.

According to various embodiments, the encoding of the residue component may include encoding a reference index indicating that the reference image of the enhancement layer image is the upsampled reference layer image, and encoding the motion vector for inter prediction between the enhancement layer images so as to represent 0.

According to various embodiments, the number of MC interpolation filtering operations and the number of IL interpolation filtering operations may be limited based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering for inter prediction, the number of taps of the IL interpolation filter for the IL interpolation filtering, and the size of the prediction unit of the enhancement layer image.

A scalable video decoding method according to various embodiments includes: obtaining a residue component and a reference index indicating a reference layer image for inter-layer prediction of an enhancement layer image; determining, based on the reference index, not to perform inter prediction between enhancement layer images, and determining the reference layer image from among base layer images; generating an upsampled reference layer image by performing IL interpolation filtering on the determined reference layer image; and reconstructing the enhancement layer image by using the residue component of the inter-layer prediction and the upsampled reference layer image.

According to various embodiments, when the reference index of the enhancement layer image indicates the upsampled reference layer image, the determining of the reference layer image may include determining the motion vector for inter prediction between the enhancement layer images to be 0.

A scalable video encoding apparatus according to various embodiments includes: a base layer encoding unit that performs inter prediction on base layer images; and an enhancement layer encoding unit that determines a reference layer image from among the base layer images, determines not to perform inter prediction between enhancement layer images for an enhancement layer image when an upsampled reference layer image is determined by performing IL interpolation filtering on the determined reference layer image, and encodes a residue component between the upsampled reference layer image and the enhancement layer image.

A scalable video decoding apparatus according to various embodiments includes: a base layer decoding unit that performs motion compensation to reconstruct base layer images; and an enhancement layer decoding unit that obtains a residue component and a reference index indicating a reference layer image for inter-layer prediction of an enhancement layer image, determines, based on the reference index, not to perform inter prediction between enhancement layer images, determines the reference layer image from among the base layer images, performs IL interpolation filtering on the determined reference layer image to generate an upsampled reference layer image, and reconstructs the enhancement layer image by using the residue component of the inter-layer prediction and the upsampled reference layer image.

The present invention provides a computer-readable recording medium on which a computer program for implementing a scalable video encoding method according to various embodiments is recorded.

The present invention provides a computer-readable recording medium on which a computer program for implementing a scalable video decoding method according to various embodiments is recorded.

FIG. 1 shows a block diagram of a scalable video encoding apparatus according to various embodiments.
FIG. 2 shows a block diagram of a scalable video decoding apparatus according to various embodiments.
FIG. 3 shows a detailed structure of a scalable video encoding apparatus according to various embodiments.
FIG. 4 shows an individual prediction structure of base layer images and enhancement layer images.
FIG. 5 shows an inter-layer prediction structure of base layer images and enhancement layer images.
FIG. 6 shows the memory bandwidth required for interpolation filtering for a block.
FIG. 7 shows a memory access pattern.
FIG. 8 illustrates the memory bandwidth for MC interpolation filtering, which varies according to the inter prediction mode and the block size, according to an embodiment.
FIG. 9 illustrates the memory bandwidth for IL interpolation filtering, which varies according to the block size, according to an embodiment.
FIG. 10 shows the number of interpolation filtering operations performed in base layer encoding and enhancement layer encoding without restriction.
FIG. 11 illustrates the number of interpolation filtering operations performed in base layer encoding and enhancement layer encoding under a predetermined condition, according to an embodiment.
FIG. 12 lists combinations of MC interpolation filtering and IL interpolation filtering that may be performed under predetermined conditions, according to various embodiments.
FIG. 13 shows a flowchart of a scalable video encoding method according to various embodiments.
FIG. 14 shows a flowchart of a scalable video decoding method according to various embodiments.
FIG. 15 shows a block diagram of a video encoding apparatus based on coding units according to a tree structure, according to various embodiments.
FIG. 16 shows a block diagram of a video decoding apparatus based on coding units according to a tree structure, according to various embodiments.
FIG. 17 illustrates the concept of coding units according to various embodiments.
FIG. 18 shows a block diagram of an image encoding unit based on coding units, according to various embodiments.
FIG. 19 shows a block diagram of an image decoding unit based on coding units, according to various embodiments.
FIG. 20 illustrates coding units and partitions according to depths, according to various embodiments.
FIG. 21 illustrates the relationship between a coding unit and transformation units, according to various embodiments.
FIG. 22 illustrates encoding information according to depths, according to various embodiments.
FIG. 23 shows coding units according to depths, according to various embodiments.
FIGS. 24, 25, and 26 show the relationship between coding units, prediction units, and transformation units, according to various embodiments.
FIG. 27 shows the relationship between a coding unit, a prediction unit, and a transformation unit according to the encoding mode information of Table 1.
FIG. 28 illustrates the physical structure of a disc on which a program according to various embodiments is stored.
FIG. 29 shows a disc drive for recording and reading a program by using a disc.
FIG. 30 shows the overall structure of a content supply system for providing a content distribution service.
FIGS. 31 and 32 illustrate the external structure and the internal structure of a mobile phone to which a video encoding method and a video decoding method according to various embodiments are applied.
FIG. 33 shows a digital broadcasting system to which a communication system is applied.
FIG. 34 shows the network structure of a cloud computing system using a video encoding apparatus and a video decoding apparatus, according to various embodiments.

A scalable video encoding apparatus, a scalable video decoding apparatus, a scalable video encoding method, and a scalable video decoding method according to various embodiments are described below with reference to FIGS. 1 to 14. A video encoding apparatus, a video decoding apparatus, a video encoding method, and a video decoding method based on coding units of a tree structure according to various embodiments are disclosed with reference to FIGS. 15 to 27. Various embodiments to which the scalable video encoding method, the scalable video decoding method, the video encoding method, and the video decoding method of the embodiments of FIGS. 1 to 27 are applicable are described with reference to FIGS. 28 to 34.

Hereinafter, an 'image' may be a still image of a video or a moving picture, that is, the video itself.

Hereinafter, a 'sample' means data assigned to a sampling position of an image, that is, the data to be processed. For example, pixel values assigned to pixels of an image in the spatial domain may be samples.

Hereinafter, a 'symbol' refers to the value of each syntax element determined by encoding an image. Bit strings generated by performing entropy encoding on the symbols may be successively output to form a bitstream. The symbols may be recovered by performing entropy decoding on the bit strings parsed from the bitstream, and the images may be reconstructed by performing decoding using the symbols.

First, a scalable video encoding apparatus, a scalable video encoding method, a scalable video decoding apparatus, and a scalable video decoding method according to an embodiment are disclosed with reference to FIGS. 1 to 14.

FIG. 1 shows a block diagram of a scalable video encoding apparatus 10 according to various embodiments.

The scalable video encoding apparatus 10 according to various embodiments includes a base layer encoding unit 11 and an enhancement layer encoding unit 13.

The scalable video encoding apparatus 10 according to various embodiments may classify a plurality of image sequences by layer according to a scalable video coding scheme, encode each of them, and output a separate stream containing the data encoded for each layer. The scalable video encoding apparatus 10 may encode a base layer image sequence and an enhancement layer image sequence into different layers.

The base layer encoding unit 11 may encode base layer images and output a base layer stream including encoded data of base layer images.

The enhancement layer encoding unit 13 may encode enhancement layer images and output an enhancement layer stream including encoded data of the enhancement layer images.

For example, according to a scalable video coding scheme based on spatial scalability, low resolution images can be encoded as base layer images, and high resolution images can be encoded as enhancement layer images. The encoding result of the base layer images may be output to the base layer stream and the encoding result of the enhancement layer images may be output to the enhancement layer stream.

As another example, according to a scalable video coding scheme based on SNR (signal-to-noise ratio) scalability, the base layer images and the enhancement layer images have the same resolution and size but differ in the quantization parameter (QP). The larger the QP, the larger the quantization interval, and thus the lower the quality of the reconstructed image. Low-quality images with a relatively large QP may be encoded as the base layer images, and high-quality images with a relatively small QP may be encoded as the enhancement layer images.
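The QP/quality trade-off can be sketched numerically; the step-size rule below (roughly doubling every 6 QP, as in HEVC-style codecs) is an approximation used only for illustration:

```python
# Toy uniform quantizer: a larger QP means a larger quantization interval,
# hence a larger reconstruction error (lower quality).
import numpy as np

def quantize(samples, qp):
    step = 2.0 ** (qp / 6.0)                 # step roughly doubles every 6 QP
    return np.round(samples / step) * step

samples = np.linspace(0.0, 255.0, 64)
for qp in (12, 36):                          # small QP: enhancement; large QP: base
    err = np.abs(samples - quantize(samples, qp)).mean()
    print(f"QP={qp:2d} mean reconstruction error={err:.2f}")
```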

As another example, multi-view video may be encoded according to a scalable video coding scheme. Left view images may be encoded as base layer images, and right view images may be encoded as enhancement layer images. Alternatively, center view images, left view images, and right view images may all be encoded; among them, the center view images may be encoded as base layer images, the left view images as first enhancement layer images, and the right view images as second enhancement layer images.

As another example, a scalable video coding scheme may be performed according to temporal hierarchical prediction based on temporal scalability. A base layer stream including encoding information generated by encoding images of a basic frame rate may be output. Images of a high frame rate may be further encoded by referring to the images of the basic frame rate, and an enhancement layer stream including the encoding information of the high frame rate may be output.

The scalable video encoding apparatus 10 according to various embodiments may perform inter prediction in which a current image is predicted by referring to images in a single layer. Through the inter prediction, a motion vector indicating motion information between the current image and the reference image and a residual between the current image and the reference image can be generated.

In addition, the scalable video encoding apparatus 10 according to various embodiments may perform inter-layer prediction in which enhancement layer images are predicted by referring to base layer images. Here, the current layer image to be subjected to the interlayer prediction is an enhancement layer image, and the reference layer image used for interlayer prediction may be a base layer image. Through inter-layer prediction, a position difference component between a current image and a reference image of another layer and a residual component between the current image and a reference image of another layer can be generated.

The inter-layer prediction structure is described later with reference to FIG. 5.

The scalable video encoding apparatus 10 according to various embodiments encodes each image of each layer block by block. The block may be square, rectangular, or any geometric shape; it is not limited to a data unit of a fixed size. A block according to an embodiment may be a maximum coding unit, a coding unit, a prediction unit, a transformation unit, or the like among coding units according to a tree structure. The maximum coding unit including coding units of the tree structure is also variously named a coding block tree, a block tree, a root block tree, a coding tree, a coding root, a tree trunk, or the like. A video encoding and decoding method based on coding units according to a tree structure is described later with reference to FIGS. 15 to 27.

Inter prediction and inter-layer prediction may be performed based on a data unit of an encoding unit, a prediction unit, or a conversion unit.

The base layer encoding unit 11 according to various embodiments may generate symbol data by performing source coding operations, including inter prediction or intra prediction, on base layer images. For example, the base layer encoding unit 11 may generate symbol data by performing inter prediction or intra prediction, transformation, and quantization on samples of the data units of the base layer images, and may generate a base layer stream by performing entropy encoding on the symbol data.

The enhancement layer encoding unit 13 may encode the enhancement layer images based on coding units of a tree structure. The enhancement layer encoding unit 13 may generate symbol data by performing inter/intra prediction, transformation, and quantization on samples of coding units of the enhancement layer images, and may generate an enhancement layer stream by performing entropy encoding on the symbol data.

The enhancement layer encoding unit 13 according to various embodiments may perform inter-layer prediction, which predicts an enhancement layer image by using reconstruction samples of a base layer image. In order to encode an enhancement layer original image of the enhancement layer image sequence through the inter-layer prediction structure, the enhancement layer encoding unit 13 may generate an enhancement layer prediction image by using a base layer reconstructed image and encode a prediction error between the enhancement layer original image and the enhancement layer prediction image.

The enhancement layer encoding unit 13 may perform inter-layer prediction on the enhancement layer image for each block, such as a coding unit or prediction unit. A block of the base layer image to be referred to by a block of the enhancement layer image may be determined.

For example, a reconstructed image of the base layer image having the same POC (Picture Order Count) as the enhancement layer image may be determined as the reference image. Also, among the blocks of the base layer reconstructed image, a block located at a position corresponding to the position of the current block in the enhancement layer image may be determined as the reference block. The enhancement layer encoding unit 13 may determine an enhancement layer prediction block by using the base layer reconstruction block corresponding to the enhancement layer block.
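For example, under dyadic (2x) spatial scalability the co-located base layer block can be found by scaling the block coordinates; the scale factor here is an assumption for illustration:

```python
# Sketch: map an enhancement layer block onto its co-located base layer block.
def colocated_base_block(x, y, w, h, scale=2):
    """Return (x, y, w, h) of the base layer block corresponding to an
    enhancement layer block, assuming a uniform spatial scale factor."""
    return x // scale, y // scale, w // scale, h // scale

print(colocated_base_block(16, 32, 16, 16))  # -> (8, 16, 8, 8)
```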

The enhancement layer encoding unit 13 may encode the enhancement layer image sequence by referring to base layer reconstructed images through the inter-layer prediction structure. The enhancement layer encoding unit 13 according to various embodiments may also encode the enhancement layer image sequence according to a single layer prediction structure without referring to other layer samples. As another example, the enhancement layer encoding unit 13 may combine inter prediction within a single layer with inter-layer prediction.

Hereinafter, the case where enhancement layer images are encoded with reference to base layer images according to inter-layer prediction is described in detail.

The base layer encoding unit 11 may reconstruct the samples included in the current maximum coding unit by decoding the encoded samples of each coding unit of the tree structure of the base layer image through inverse quantization, inverse transformation, intra prediction, or motion compensation. A reconstructed image may be generated by encoding a previous slice and then decoding it. The reconstructed image of the previous slice may be referred to for inter prediction of the current slice.

The enhancement layer encoding unit 13 according to an embodiment may use the enhancement layer prediction block, determined from the base layer reconstruction block according to the inter-layer prediction structure, as a reference image for inter-layer prediction of the enhancement layer original block. The enhancement layer encoding unit 13 may encode the error between the sample values of the enhancement layer prediction block determined using the base layer reconstructed image and the sample values of the enhancement layer original block, that is, the residue component according to the inter-layer prediction.

If the resolution differs between the base layer image and the enhancement layer image, as in spatial scalability, the image sizes also differ. Therefore, in order to generate a reference layer image for the enhancement layer image, the enhancement layer encoding unit 13 may perform interpolation filtering for upsampling the base layer reconstructed image to the resolution of the enhancement layer image.
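A minimal sketch of such upsampling, assuming 2x dyadic scalability and a separable interpolation filter (the 8-tap half-pel coefficients below are illustrative, not the IL filter defined by this disclosure):

```python
import numpy as np

HALF_PEL = np.array([-1, 4, -11, 40, 40, -11, 4, -1]) / 64.0  # example 8-tap filter

def upsample_1d(line, taps=HALF_PEL):
    """Double a 1-D signal: keep integer positions, interpolate half-pel
    positions with the tap filter (edge samples replicated for padding)."""
    t = len(taps)
    padded = np.pad(line, (t // 2 - 1, t // 2), mode='edge')
    half = np.convolve(padded, taps[::-1], mode='valid')
    out = np.empty(2 * len(line))
    out[0::2], out[1::2] = line, half
    return out

def upsample_2d(image):
    rows = np.apply_along_axis(upsample_1d, 1, image)   # horizontal pass
    return np.apply_along_axis(upsample_1d, 0, rows)    # vertical pass

base_block = np.arange(16.0).reshape(4, 4)
print(upsample_2d(base_block).shape)  # (8, 8): upsampled to enhancement resolution
```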

Also, when inter prediction is performed on a sub-pixel basis, interpolation filtering may be performed to determine a reference block on a sub-pixel basis.

Hereinafter, interpolation filtering for inter prediction is referred to as 'MC (motion compensation) interpolation filtering', and interpolation filtering for inter-layer prediction is referred to as 'IL (inter-layer) interpolation filtering'. MC interpolation filtering and IL interpolation filtering are collectively referred to as 'interpolation filtering'.

In general, since interpolation filtering uses samples surrounding the current sample, interpolation filtering for the current block requires not only the samples of the block itself but also some samples of adjacent blocks. Therefore, as the number of interpolation filtering operations for inter prediction and inter-layer prediction increases, the memory bandwidth and the computation burden increase greatly.
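The sample-fetch overhead can be made concrete: an N-tap separable filter applied over a WxH block must read a (W+N-1)x(H+N-1) window of reference samples, so smaller blocks pay a disproportionately high per-sample cost (simple accounting; real bandwidth also depends on memory alignment and burst size):

```python
# Reference samples fetched for N-tap interpolation of a WxH block.
def ref_samples(w, h, taps):
    return (w + taps - 1) * (h + taps - 1)

for w, h in ((4, 8), (8, 8), (16, 16)):
    fetched = ref_samples(w, h, taps=8)          # 8-tap MC interpolation
    print(f"{w:2d}x{h:<2d} block: {fetched:4d} samples fetched, "
          f"{fetched / (w * h):.2f} per predicted sample")
```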

Therefore, in order to reduce the memory bandwidth and the computation burden, the enhancement layer encoding unit 13 according to various embodiments may selectively determine whether to perform inter prediction within a single layer or inter-layer prediction using a reference layer image.

According to an exemplary embodiment, an upsampled base layer reconstructed image, which serves as the reference layer image, is required for the inter-layer prediction structure. Therefore, the base layer encoding unit 11 may perform inter prediction on the base layer images to generate base layer reconstructed images.

The enhancement layer encoding unit 13 can determine a reference layer image from the base layer images to inter-layer predict the enhancement layer image. The enhancement layer encoding unit 13 may perform an IL interpolation filtering on the determined reference layer image to generate an upsampled reference layer image.

For example, when the scalable video encoding apparatus 10 according to various embodiments performs inter prediction in the base layer and inter-layer prediction in the enhancement layer, the resulting computation burden should be no larger than the computation burden of performing inter prediction separately in each of the base layer and the enhancement layer.

The numbers of times MC interpolation filtering and IL interpolation filtering are performed can be used to evaluate the memory bandwidth or the computation amount required for inter prediction or inter-layer prediction. Hereinafter, the number of filtering operations referred to in this specification indicates the number of times interpolation filtering is performed for one sample.

As described above, in order to perform inter-layer prediction once for an enhancement layer image, MC interpolation filtering for inter prediction in the base layer occurs once to generate the base layer reconstructed image, and IL interpolation filtering for upsampling the base layer reconstructed image for inter-layer prediction occurs once.

In a typical individual layer prediction structure, if inter prediction is performed once in the base layer and once in the enhancement layer, interpolation filtering is performed at least twice in total, since MC interpolation filtering is performed once in each layer.

Therefore, when inter-layer prediction is performed for the enhancement layer image, no interpolation filtering should be added beyond the MC interpolation filtering for base layer inter prediction and the IL interpolation filtering for inter-layer prediction.
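In other words, the filtering budget of the two-layer individual prediction structure is the ceiling; a minimal accounting sketch of that constraint (unit cost per filtering pass is an assumption):

```python
# Per enhancement layer sample: interpolation passes in each structure.
individual_structure = {"base layer MC": 1, "enhancement layer MC": 1}
inter_layer_structure = {"base layer MC": 1, "IL upsampling": 1}  # no enh. MC

assert sum(inter_layer_structure.values()) <= sum(individual_structure.values())
print("inter-layer structure stays within the individual-structure budget")
```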

Therefore, when the upsampled reference layer image is determined through the IL interpolation filtering, the enhancement layer encoding unit 13 may determine not to perform inter prediction between the enhancement layer images for the enhancement layer image. In that case, the enhancement layer encoding unit 13 performs only inter-layer prediction, without inter prediction, and encodes the residue component between the upsampled reference layer image and the enhancement layer image.

Accordingly, the enhancement layer encoding unit 13 may encode a reference index indicating that the reference image of the enhancement layer image is the upsampled reference layer image, and may encode the motion vector for inter prediction between the enhancement layer images so as to represent 0.
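A sketch of that signaling, with hypothetical field names (the actual syntax elements are defined by the bitstream specification, not by this sketch):

```python
from dataclasses import dataclass

@dataclass
class EnhBlockPredictionInfo:
    ref_idx: int            # which picture in the reference list is used
    mv: tuple               # motion vector (x, y)

UPSAMPLED_REF_IDX = 0       # assumed list position of the upsampled IL reference

info = EnhBlockPredictionInfo(ref_idx=UPSAMPLED_REF_IDX, mv=(0, 0))
# A decoder seeing this combination performs inter-layer prediction only,
# skipping enhancement-layer motion compensation entirely.
print(info.ref_idx == UPSAMPLED_REF_IDX and info.mv == (0, 0))
```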

The scalable video encoding apparatus 10 according to another embodiment may limit the number of MC interpolation filtering operations and the number of IL interpolation filtering operations based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering of the enhancement layer image, the number of taps of the IL interpolation filter for IL interpolation filtering, and the size of the prediction unit of the enhancement layer image.

For example, interpolation filtering for blocks of size 8x8 or larger may be limited to i) a combination of two 8-tap MC interpolation filtering operations, or ii) a combination of 8-tap MC interpolation filtering and one 8-tap IL interpolation filtering operation.

As another example, interpolation filtering for blocks of size 4x8 or larger may be limited to iii) one 8-tap IL interpolation filtering operation, iv) two 6-tap IL interpolation filtering operations, v) two 4-tap IL interpolation filtering operations, or vi) three 2-tap IL interpolation filtering operations.

As another example, interpolation filtering for blocks of size 8x16 or larger may be limited to vii) a combination of two 8-tap MC interpolation filtering operations and one 4-tap IL interpolation filtering operation, viii) a combination of four 2-tap MC interpolation filtering operations and four 2-tap IL interpolation filtering operations, ix) a combination of two 8-tap MC interpolation filtering operations and two 2-tap IL interpolation filtering operations, or x) a combination of 8-tap MC interpolation filtering and 8-tap IL interpolation filtering.
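These combinations can be read as a lookup table keyed by the minimum block size; the encoding below transcribes the examples above (the data representation itself is a sketch, not a normative syntax):

```python
# Allowed filtering combinations per minimum block size.
# "MC8" = one 8-tap MC interpolation filtering, "IL4" = one 4-tap IL filtering.
ALLOWED_COMBINATIONS = {
    (8, 8):  [("MC8", "MC8"), ("MC8", "IL8")],
    (4, 8):  [("IL8",), ("IL6", "IL6"), ("IL4", "IL4"), ("IL2", "IL2", "IL2")],
    (8, 16): [("MC8", "MC8", "IL4"),
              ("MC2",) * 4 + ("IL2",) * 4,
              ("MC8", "MC8", "IL2", "IL2"),
              ("MC8", "IL8")],
}

def is_allowed(min_block_size, combo):
    return tuple(combo) in ALLOWED_COMBINATIONS.get(min_block_size, [])

print(is_allowed((8, 8), ("MC8", "MC8")))         # True
print(is_allowed((8, 8), ("MC8", "MC8", "IL8")))  # False: exceeds the budget
```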

The scalable video encoding apparatus 10 according to various embodiments may include a central processor (not shown) that collectively controls the base layer encoding unit 11 and the enhancement layer encoding unit 13. Alternatively, the base layer encoding unit 11 and the enhancement layer encoding unit 13 may each be operated by their own processor (not shown), and the processors may operate organically with one another so that the scalable video encoding apparatus 10 operates as a whole. Alternatively, the base layer encoding unit 11 and the enhancement layer encoding unit 13 may be controlled by an external processor (not shown) of the scalable video encoding apparatus 10.

The scalable video encoding apparatus 10 according to an embodiment may include one or more data storage units (not shown) in which the input/output data of the base layer encoding unit 11 and the enhancement layer encoding unit 13 are stored. The scalable video encoding apparatus 10 may include a memory control unit (not shown) that controls the data input/output of the data storage unit (not shown).

The scalable video encoding apparatus 10 according to an embodiment may operate in cooperation with an internal video encoding processor or an external video encoding processor to output a video encoding result, thereby performing video encoding operations including transformation. The internal video encoding processor of the scalable video encoding apparatus 10 according to an embodiment may be a separate processor, or the scalable video encoding apparatus 10, a central processing unit, or a graphics processing unit may include a video encoding processing module to implement basic video encoding operations.

Therefore, the base layer encoding unit 11 of the scalable video encoding apparatus 10 may encode the base layer image sequence to generate a base layer bitstream, and the enhancement layer encoding unit 13 may encode the enhancement layer image sequence to generate an enhancement layer bitstream.

The scalable video decoding apparatus 20, which receives and decodes the base layer bitstream and the enhancement layer bitstream generated by the scalable video encoding apparatus 10 described above, is described below with reference to FIG. 2.

FIG. 2 shows a block diagram of a scalable video decoding apparatus 20 according to various embodiments.

The scalable video decoding apparatus 20 according to various embodiments includes a base layer decoding unit 21 and an enhancement layer decoding unit 23.

The scalable video decoding apparatus 20 according to various embodiments may receive bitstreams on a layer-by-layer basis according to a scalable encoding scheme. The number of layers of the bitstreams received by the scalable video decoding apparatus 20 is not limited. However, for convenience of explanation, an embodiment in which the base layer decoding unit 21 of the scalable video decoding apparatus 20 receives and decodes a base layer stream and the enhancement layer decoding unit 23 receives and decodes an enhancement layer stream is described in detail.

For example, the scalable video decoding apparatus 20 based on spatial scalability may receive a stream in which image sequences of different resolutions are encoded in different layers. The base layer stream may be decoded to reconstruct a low-resolution image sequence, and the enhancement layer stream may be decoded to reconstruct a high-resolution image sequence.

As another example, the scalable video decoding apparatus 20 based on SNR scalability may receive a bitstream in which images are encoded with different QPs in the base layer and the enhancement layer. Low-quality images with a relatively large QP may be decoded from the base layer bitstream, and high-quality images with a relatively small QP may be decoded from the enhancement layer bitstream.

As another example, multi-view video may be decoded according to a scalable video coding scheme. When a stereoscopic video stream is received in multiple layers, the left view images can be reconstructed by decoding the base layer stream. The right view images can be reconstructed by further decoding the enhancement layer stream in addition to the base layer stream.

Alternatively, when a multi-view video stream is received in multiple layers, the center view images can be reconstructed by decoding the base layer stream. The left view images can be reconstructed by further decoding the first enhancement layer stream in addition to the base layer stream, and the right view images can be reconstructed by further decoding the second enhancement layer stream in addition to the base layer stream.

As another example, a scalable video coding scheme based on temporal scalability may be performed. Images of a basic frame rate can be reconstructed by decoding the base layer stream. Images of a higher frame rate can be reconstructed by further decoding the enhancement layer stream in addition to the base layer stream.

The scalable video decoding apparatus 20 may obtain encoded data of base layer images and enhancement layer images from the base layer stream and the enhancement layer stream, and may further obtain a motion vector generated by inter prediction and prediction information generated by inter-layer prediction.

For example, the scalable video decoding apparatus 20 may decode inter-predicted data for each layer and decode inter-layer predicted data between a plurality of layers. Reconstruction may be performed through motion compensation and inter-layer decoding based on a coding unit or a prediction unit according to an embodiment.

For each layer stream, images can be reconstructed by performing motion compensation for the current image with reference to reconstructed images predicted through inter prediction within the same layer. Motion compensation refers to an operation of reconstructing the current image by synthesizing the residue component of the current image with a reference image determined by using the motion vector of the current image.
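A minimal sketch of this motion compensation (integer-pel motion only; sub-pel motion would additionally require MC interpolation filtering as discussed above):

```python
import numpy as np

def motion_compensate(reference, mv, residual, x, y):
    """Reconstruct a block at (x, y): reference block addressed by the motion
    vector plus the decoded residue component."""
    h, w = residual.shape
    rx, ry = x + mv[0], y + mv[1]
    prediction = reference[ry:ry + h, rx:rx + w]
    return prediction + residual

reference_picture = np.random.rand(32, 32)
residue = np.zeros((8, 8))
block = motion_compensate(reference_picture, mv=(2, -1), residual=residue, x=8, y=8)
print(block.shape)  # (8, 8)
```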

In addition, the scalable video decoding apparatus 20 according to an embodiment may perform inter-layer decoding with reference to base layer images in order to reconstruct an enhancement layer image predicted through inter-layer prediction. Inter-layer decoding refers to an operation of reconstructing the current image by synthesizing the reference image of another layer, determined to predict the current image, with the residue component of the current image.

The inter-layer prediction structure is described later with reference to FIG. 5.

The scalable video decoding apparatus 20 decodes each video block by block. A block according to an exemplary embodiment may be a maximum encoding unit, an encoding unit, a prediction unit, a conversion unit, or the like among the encoding units according to the tree structure.

The base layer decoding unit 21 may decode the base layer images by using the encoding symbols of the parsed base layer images. If the scalable video decoding apparatus 20 receives streams encoded based on coding units of a tree structure, the base layer decoding unit 21 may perform decoding on the basis of the coding units of the tree structure for each maximum coding unit of the base layer stream.

The base layer decoding unit 21 may perform entropy decoding for each maximum coding unit to obtain encoding information and encoded data. The base layer decoding unit 21 may perform inverse quantization and inverse transformation on the encoded data obtained from the stream to reconstruct the residue component. The base layer decoding unit 21 according to another embodiment may directly receive a bitstream of quantized transform coefficients; in that case, the residue components of the images may be reconstructed by performing inverse quantization and inverse transformation on the quantized transform coefficients.

The base layer decoding unit 21 may reconstruct the base layer images by combining the prediction image and the residue component through motion compensation between images of the same layer.

The enhancement layer decoding unit 23 may perform inter-layer prediction on the enhancement layer image for each block, such as a coding unit or prediction unit. A block of the base layer image to be referred to by a block of the enhancement layer image may be determined. For example, a reconstruction block of the base layer image located at a position corresponding to the position of the current block in the enhancement layer image may be determined. The enhancement layer decoding unit 23 may determine an enhancement layer prediction block by using the base layer reconstruction block corresponding to the enhancement layer block.

Specifically, the base layer decoding unit 21 may reconstruct the samples included in the current maximum coding unit by decoding the encoded samples of each coding unit of the tree structure of the base layer image through inverse quantization, inverse transformation, intra prediction, or motion compensation. A reconstructed image may be generated by decoding a previous slice. The reconstructed image of the previous slice may be referred to for inter prediction of the current slice; therefore, the reconstructed image of the previous slice may be used as the prediction image for the current slice.

According to the interlayer prediction structure, an enhancement layer prediction image can be generated using samples of a base layer reconstruction image. The enhancement layer decoding unit 23 can decode the enhancement layer stream to obtain a prediction error according to the interlayer prediction. The enhancement layer decoding unit 23 can generate an enhancement layer restored image by combining the enhancement layer prediction image with a prediction error.

As described above, the enhancement layer decoding unit 23 may reconstruct the enhancement layer images by referring to the base layer reconstructed images through the inter-layer prediction structure. The enhancement layer decoding unit 23 according to various embodiments may also reconstruct enhancement layer images according to a single layer prediction structure without referring to other layer samples. As another example, the enhancement layer decoding unit 23 may combine inter prediction (motion compensation) within a single layer with inter-layer prediction.

Hereinafter, the case where the enhancement layer images are decoded using the base layer reconstructed images according to the interlayer prediction will be described in detail.

The base layer decoding unit 21 may reconstruct the samples included in the current maximum coding unit by decoding the encoded symbols of each coding unit of the tree structure of the base layer image through inverse quantization, inverse transformation, intra prediction, or motion compensation. A reconstructed image may be generated by performing decoding on the previous slice. The reconstructed image of the previous slice may be referred to for motion compensation of the current slice.

The enhancement layer decoding unit 23 according to an embodiment may use the enhancement layer prediction block, determined from the base layer reconstruction block according to the inter-layer prediction structure, as a reference image for inter-layer prediction of the enhancement layer original block. The enhancement layer decoding unit 23 may reconstruct the enhancement layer block by combining the residue component according to the inter-layer prediction, that is, the error between the sample values of the enhancement layer prediction block and the sample values of the enhancement layer original block, with the enhancement layer prediction block determined using the base layer reconstructed image.

If the resolution differs between the base layer image and the enhancement layer image, as in spatial scalability, the image sizes also differ. Therefore, in order to generate a reference layer image for the enhancement layer image, the enhancement layer decoding unit 23 may perform interpolation filtering for upsampling the base layer reconstructed image to the resolution of the enhancement layer image.

Also, when inter prediction on the sub-pixel basis is performed, interpolation filtering can be performed to determine a reference block on a sub-pixel basis.

Therefore, in order to reduce the memory bandwidth and the computation burden, the enhancement layer decoding unit 23 according to various embodiments may selectively determine whether to perform inter prediction within a single layer or inter-layer prediction using a reference layer image.

As described above, an upsampled base layer reconstructed image, which serves as the reference layer image, is required for the inter-layer prediction structure. Therefore, the base layer decoding unit 21 may perform motion compensation on the base layer images to generate base layer reconstructed images.

The enhancement layer decoding unit 23 may determine a reference layer image from among the base layer reconstructed images so as to perform inter-layer prediction on the enhancement layer image. For example, the reconstructed image of the base layer image having the same POC as the enhancement layer image may be determined as the reference layer image. The enhancement layer decoding unit 23 may perform IL interpolation filtering on the determined reference layer image to generate an upsampled reference layer image.

In the scalable video decoding apparatus 20 according to various embodiments, the computation burden of performing inter prediction in the base layer and inter-layer prediction in the enhancement layer should be no larger than the computation burden of performing inter prediction separately in each of the base layer and the enhancement layer.

As described above with reference to FIG. 1, when the inter-layer prediction structure is compared with the individual layer prediction structure, performing inter-layer prediction for the enhancement layer image should add no interpolation filtering beyond the MC interpolation filtering for base layer inter prediction and the IL interpolation filtering for inter-layer prediction.

Therefore, when the upsampled reference layer image is determined through the IL interpolation filtering, the enhancement layer decoding unit 23 may determine not to perform inter prediction between the enhancement layer images for the enhancement layer image.

For example, the enhancement layer decoding unit 23 may obtain a residue component and a reference index indicating a reference layer image for inter-layer prediction of an enhancement layer image. Based on the reference index, it may be determined not to perform inter prediction between the enhancement layer images, and the reference layer image may be determined from among the base layer images.

Therefore, the enhancement layer decoding unit 23 may perform IL interpolation filtering on the reference layer image to obtain the upsampled reference layer image. The enhancement layer decoding unit 23 may reconstruct the enhancement layer block by synthesizing, block by block, the reference block of the upsampled reference layer image and the residue component. That is, the residue component of the inter-layer prediction is combined with the upsampled reference layer image so that the enhancement layer image can be reconstructed.
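Putting the decoder path together as a sketch (a nearest-neighbour upsampler stands in for the IL interpolation filter here; any conforming IL filter could be substituted):

```python
import numpy as np

def reconstruct_enh_block(base_recon_block, il_residual, upsample):
    prediction = upsample(base_recon_block)   # IL interpolation filtering
    return prediction + il_residual           # no enhancement-layer MC is run

base_block = np.full((4, 4), 100.0)
residual = np.random.randint(-2, 3, (8, 8)).astype(float)
enh_block = reconstruct_enh_block(
    base_block, residual,
    upsample=lambda b: np.kron(b, np.ones((2, 2))))  # placeholder 2x upsampler
print(enh_block.shape)  # (8, 8)
```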

The scalable video decoding apparatus 20 according to another embodiment may limit the number of MC interpolation filtering operations and the number of IL interpolation filtering operations based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering for inter prediction, the number of taps of the IL interpolation filter for IL interpolation filtering, and the size of the prediction unit of the enhancement layer image.

For example, interpolation filtering for blocks of size 8x8 or larger may be limited to i) a combination of two 8-tap MC interpolation filtering operations, or ii) a combination of 8-tap MC interpolation filtering and one 8-tap IL interpolation filtering operation.

As another example, interpolation filtering for blocks of size 4x8 or larger may be limited to iii) one 8-tap IL interpolation filtering operation, iv) two 6-tap IL interpolation filtering operations, v) two 4-tap IL interpolation filtering operations, or vi) three 2-tap IL interpolation filtering operations.

As another example, interpolation filtering for blocks of size 8x16 or larger may be limited to vii) a combination of two 8-tap MC interpolation filtering operations and one 4-tap IL interpolation filtering operation, viii) a combination of four 2-tap MC interpolation filtering operations and four 2-tap IL interpolation filtering operations, ix) a combination of two 8-tap MC interpolation filtering operations and two 2-tap IL interpolation filtering operations, or x) a combination of 8-tap MC interpolation filtering and 8-tap IL interpolation filtering.

Therefore, the base layer decoding unit 21 of the scalable video decoding apparatus 20 may decode the base layer stream to reconstruct the base layer image sequence, and the enhancement layer decoding unit 23 may decode the enhancement layer stream to reconstruct the enhancement layer image sequence.

The scalable video decoding apparatus 20 according to various embodiments may include a central processor (not shown) that collectively controls the base layer decoding unit 21 and the enhancement layer decoding unit 23. Alternatively, the base layer decoding unit 21 and the enhancement layer decoding unit 23 may each be operated by their own processor (not shown), and the processors may operate organically with one another so that the scalable video decoding apparatus 20 operates as a whole. Alternatively, the base layer decoding unit 21 and the enhancement layer decoding unit 23 may be controlled by an external processor (not shown) of the scalable video decoding apparatus 20 according to various embodiments.

The scalable video decoding apparatus 20 according to various embodiments may include one or more data storage units (not shown) in which the input/output data of the base layer decoding unit 21 and the enhancement layer decoding unit 23 are stored. The scalable video decoding apparatus 20 may include a memory control unit (not shown) that controls the data input/output of the data storage unit (not shown).

The scalable video decoding apparatus 20 according to various embodiments may operate in cooperation with an internal video decoding processor or an external video decoding processor to reconstruct video through video decoding, thereby performing video decoding operations including inverse transformation. The internal video decoding processor of the scalable video decoding apparatus 20 according to various embodiments may be a separate processor, or the scalable video decoding apparatus 20, a central processing unit, or a graphics processing unit may include a video decoding processing module to implement basic video decoding operations.

According to the scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 of the various embodiments described above with reference to FIGS. 1 and 2, whether to perform inter prediction within a single layer or inter-layer prediction using a reference layer image may be selectively determined based on at least one of the partition size, the partition type, and the like.

Therefore, in the scalable video encoding apparatus 10 according to various embodiments, even if an enhancement layer image is encoded according to the inter-layer prediction structure, the memory bandwidth and the computation amount need not increase compared with the case where individual layer prediction is performed for each single layer.

Similarly, in the scalable video decoding apparatus 20 according to various embodiments, the base layer images and the enhancement layer images can be reconstructed without an increase in memory bandwidth and computation amount, compared with the individual prediction structure in which the base layer images and the enhancement layer images are reconstructed in each single layer.

FIG. 3 shows the detailed structure of a scalable video encoding apparatus 10 according to various embodiments.

The inter-layer encoding system 1600 includes a base layer encoding stage 1610, an enhancement layer encoding stage 1660, and an inter-layer prediction stage 1650 between the base layer encoding stage 1610 and the enhancement layer encoding stage 1660. The base layer encoding stage 1610 and the enhancement layer encoding stage 1660 may show concrete configurations of the base layer encoding unit 11 and the enhancement layer encoding unit 13, respectively.

The base layer encoding stage 1610 receives a base layer image sequence and encodes each image. The enhancement layer encoding stage 1660 receives an enhancement layer image sequence and encodes each image. Operations common to the base layer encoding stage 1610 and the enhancement layer encoding stage 1660 are described together below.

An input image (a low-resolution image or a high-resolution image) is divided into a maximum coding unit, a coding unit, a prediction unit, a transformation unit, and the like by the block dividing units 1618 and 1668. To encode the coding units output from the block dividing units 1618 and 1668, intra prediction or inter prediction may be performed for each prediction unit of the coding units. Depending on whether the prediction mode of the prediction unit is the intra prediction mode or the inter prediction mode, the prediction switches 1648 and 1698 allow either inter prediction, performed by referring to a previous reconstructed image output from the motion compensation units 1640 and 1690, or intra prediction, performed by using a neighboring prediction unit of the current prediction unit in the current input image output from the intra prediction units 1645 and 1695. Residual information may be generated for each prediction unit through inter prediction.

For each prediction unit of the coding unit, the residue component between the prediction unit and its prediction image is input to the transformation/quantization units 1620 and 1670. The transformation/quantization units 1620 and 1670 may perform transformation and quantization for each transformation unit, based on the transformation units of the coding unit, and output quantized transform coefficients.

The scaling/inverse transform units 1625 and 1675 may perform scaling and inverse transform on the conversion coefficients quantized for each conversion unit of the encoding unit to generate a residue component of the spatial domain. When controlled in the inter mode by the prediction switches 1648 and 1698, the residue component is combined with the previous reconstructed image or the neighboring prediction unit to generate a reconstructed image including the current prediction unit, and the reconstructed image is stored in the storages 1630 and 1680. The current reconstructed image may be transmitted to the intra prediction units 1645 and 1695 or the motion compensation units 1640 and 1690 according to the prediction mode of the prediction unit to be encoded next.

In particular, in the inter mode, the in-loop filtering units 1635 and 1685 may perform at least one of deblocking filtering and SAO (sample adaptive offset) filtering on each reconstructed image stored in the storages 1630 and 1680. At least one of deblocking filtering and SAO filtering may be performed on an encoding unit and on at least one of a prediction unit and a conversion unit included in the encoding unit.

Deblocking filtering is filtering to mitigate blocking artifacts of data units, and SAO filtering is filtering to compensate pixel values that are distorted by data encoding and decoding. The data filtered by the in-loop filtering units 1635 and 1685 can be transmitted to the motion compensation units 1640 and 1690 on a prediction unit basis. A residue component between the current reconstructed image output from the motion compensation units 1640 and 1690 and the next encoding unit output from the block dividing units 1618 and 1668 may then be generated.

In this way, the above-described encoding operation can be repeated for each encoding unit of the input image.

For inter-layer prediction, the enhancement layer encoding stage 1660 may refer to the reconstructed image stored in the storage 1630 of the base layer encoding stage 1610. The encoding control unit 1615 of the base layer encoding stage 1610 controls the storage 1630 of the base layer encoding stage 1610 so that the reconstructed image of the base layer encoding stage 1610 is transmitted to the enhancement layer encoding stage 1660. The transmitted base layer reconstructed image can be used as an enhancement layer prediction image.

When the resolution of the base layer differs from that of the enhancement layer, the upsampling unit 1655 of the inter-layer prediction stage 1650 may upsample the base layer reconstructed image and transmit it to the enhancement layer encoding stage 1660. In that case, the upsampled base layer reconstructed image can be used as the enhancement layer prediction image.

When the encoding control unit 1665 of the enhancement layer encoding stage 1660 controls the switch 1698 to perform inter-layer prediction, the enhancement layer image may be predicted with reference to the base layer reconstructed image transmitted through the inter-layer prediction stage 1650.

In order to encode an image, various encoding modes can be set for the encoding unit, the prediction unit, and the conversion unit. For example, a depth or a split flag can be set as an encoding mode for the encoding unit. A prediction mode, a partition type, intra direction information, reference list information, and the like can be set as encoding modes for the prediction unit. A conversion depth or split information can be set as an encoding mode for the conversion unit.

The base layer encoding stage 1610 may perform encoding by applying various depths for the encoding unit, various prediction modes, various partition types, various intra directions, and various reference lists for the prediction unit, and various conversion depths for the conversion unit, and may determine, according to the results, the coding depth, prediction mode, partition type, intra direction/reference list, conversion depth, and the like that yield the highest coding efficiency. The encoding modes determined at the base layer encoding stage 1610 are not limited to those enumerated above.

The encoding control unit 1615 of the base layer encoding stage 1610 can control the various encoding modes to be suitably applied to the operation of the respective components. For inter-layer encoding at the enhancement layer encoding stage 1660, the encoding control unit 1615 can also control the enhancement layer encoding stage 1660 to determine its encoding mode or residue component by referring to the encoding result of the base layer encoding stage 1610.

For example, the enhancement layer encoding stage 1660 may use the encoding mode of the base layer encoding stage 1610 as the encoding mode for the enhancement layer image as it is, or may determine the encoding mode for the enhancement layer image by referring to the encoding mode of the base layer encoding stage 1610. In this case, the encoding control unit 1615 of the base layer encoding stage 1610 may control the control signal of the encoding control unit 1665 of the enhancement layer encoding stage 1660 so that the enhancement layer encoding stage 1660 determines the current encoding mode from the encoding mode of the base layer encoding stage 1610.

Similarly to the inter-layer coding system 1600 according to the inter-layer prediction scheme shown in FIG. 3, an inter-layer decoding system according to an inter-layer prediction scheme can also be implemented. That is, the inter-layer decoding system can receive a base layer bitstream and an enhancement layer bitstream. The base layer decoding stage of the inter-layer decoding system can decode the base layer bitstream to restore the base layer images. The enhancement layer decoding stage of the inter-layer decoding system for multi-layer video can reconstruct the enhancement layer images by decoding the enhancement layer bitstream using the base layer reconstructed images and the parsed encoding information.

If the enhancement layer encoding unit 13 of the scalable video encoding apparatus 10 according to the various embodiments performs inter-layer prediction, the enhancement layer decoding unit 23 of the scalable video decoding apparatus 20 can also reconstruct the enhancement layer images according to the inter-layer decoding system described above.

The memory bandwidths of the individual prediction structure and the inter-layer prediction structure are compared below with reference to FIGS. 4 and 5.

FIG. 4 shows an individual prediction structure of base layer images and enhancement layer images.

According to the individual prediction structure, inter prediction can be performed in the base layer and the enhancement layer, respectively.

That is, in the base layer, inter prediction for the base layer image BL 45 can be performed using at least one of the reference images 48 and 49 belonging to the L0 reference list and the reference images belonging to the L1 reference list. Also, in the enhancement layer, inter prediction for the enhancement layer image EL 40 can be performed using at least one of the reference images 43 and 44 belonging to the L0 reference list and the reference images belonging to the L1 reference list.

In order to perform motion prediction or motion compensation on a sub-pixel basis, interpolation filtering on the reference image is required. Therefore, to inter-predict the current image in each layer, MC (motion compensation) interpolation filtering is performed once in the base layer and once in the enhancement layer.

FIG. 5 shows an interlayer prediction structure of base layer images and enhancement layer images.

According to the interlayer prediction structure, inter prediction is performed in the base layer, and inter prediction and inter layer prediction can be performed in the enhancement layer.

That is, inter prediction for the base layer image BL (45) in the base layer can be performed. MC interpolation filtering can be performed once for sub-pixel motion prediction or motion compensation in the base layer.

In the enhancement layer, inter prediction for the enhancement layer image EL 40 can be performed using at least one of the reference images 53 and 54 belonging to the L0 reference list and the reference images 51 and 52 belonging to the L1 reference list. In the enhancement layer, MC interpolation filtering may be performed once for sub-pixel motion prediction or motion compensation.

Also, up-sampled reference layer images 56 and 55 may be generated by up-sampling the reconstructed image of the base layer image 45 for inter-layer prediction. The up-sampled reference layer images 56 and 55 may be used for inter-layer prediction of the enhancement layer image EL 40. To perform the inter-layer prediction, IL (inter-layer) interpolation filtering for up-sampling the base layer reconstructed image may be performed once.

In order to perform motion prediction or motion compensation on a sub-pixel basis, interpolation filtering on the reference image is required. Therefore, to encode the current image in each layer, interpolation filtering may be performed once for inter prediction of the base layer and once for inter prediction of the enhancement layer.

Hereinafter, the computational complexity of the individual prediction structure and the interlayer prediction structure will be compared. The computational complexity can be evaluated in terms of the memory bandwidth required for the computation, the number of operations of multiplication and addition, the dynamic range of the sample to be computed, the memory size in which the filter coefficients are stored, and the computational latency. In the present specification, the computational complexity is evaluated using the memory bandwidth and the amount of computation (the number of computations) required for inter prediction and inter-layer prediction.

Hereinafter, the memory efficiency of the interpolation filtering occurring in the individual prediction structure and the inter-layer prediction structure is described in detail with reference to FIGS. 6 and 7.

FIG. 6 shows the memory bandwidth required for interpolation filtering for a block.

First, consider how many adjacent pixels must be stored in memory for inter prediction of one sample in a block.

In the case of unidirectional prediction, interpolation filtering in one direction L0 or L1 is required. In the case of bidirectional prediction, a memory bandwidth is needed in which adjacent pixels for interpolation filtering in both directions L0 and L1 can be stored.

In the case of inter-layer prediction, a memory bandwidth is required in which adjacent pixels for interpolation filtering for up-sampling from the resolution of the base layer image can be stored.

When inter-layer prediction and inter prediction (unidirectional or bidirectional) are combined, both the memory bandwidth for interpolation filtering for inter-layer prediction and the memory bandwidth for interpolation filtering for inter prediction are required.

When interpolation is performed for each color component, the required memory bandwidth increases in proportion to the number of color components stored at different locations in the memory.

In FIG. 6, the width and height of the interpolation block including the samples to be interpolated are denoted by W and H, respectively. The width and height of the memory pattern, that is, the sample area that the memory can read at one time, are denoted by w and h, respectively.

The filter length (number of filter taps) of the interpolation filter for the luma block is denoted by TL, and the filter length of the interpolation filter for the chroma block is denoted by TC.

The resolution of the enhancement layer to be predicted is S_EL, and the resolution of the base layer to be the reference object is represented by S_BL.

Let S (= S_BL / S_EL) denote the resolution ratio between the base layer and the enhancement layer. For x2 spatial scalability, S is 1/2; for x1.5 spatial scalability, S is 2/3. In the case of SNR scalability or plain motion compensation, S may be determined to be 1.

The luma block 60 is a rectangular block of width W and height H, and the chroma block 62 is a rectangular block of width W and height H/2. In the 4:2:0 color format, the Cb and Cr components are arranged in an interleaved manner.

Memory bandwidth here refers to the amount of data that must be accessed from memory at one time. The larger the memory bandwidth required to perform inter prediction or inter-layer prediction, the lower the memory efficiency. For example, the closer S is to 1, the lower the memory efficiency.

The largest memory bandwidth is required for two-dimensional interpolation filtering, in which horizontal interpolation filtering and vertical interpolation filtering are performed sequentially. For example, the pixel blocks necessary for two-dimensional interpolation filtering are determined by extending the luma block 60 and the chroma block 62 in the vertical and horizontal directions by the size of the interpolation filter, and a memory bandwidth large enough to access the extended pixel block is required.

That is, for the interpolation filtering of the luma block 60, a memory bandwidth for accessing the memory area 61 of width (W + TL - 1) and height (H + TL - 1), determined by the luma interpolation filter size TL, is required.

For the interpolation filtering of the chroma block 62, a memory bandwidth for accessing the memory area 63 of width (W + 2*TC - 2) and height (H/2 + TC - 1), determined by the chroma interpolation filter size TC, is required.

If the enhancement layer block and the reference layer block have different spatial resolutions, scaling according to the resolution ratio is required. That is, the size of the memory area for interpolation filtering of the luma block 60 can be determined as ((W + TL - 1)/S) x ((H + TL - 1)/S).

FIG. 7 shows a memory access pattern.

Also, instead of reading samples one at a time from memory, it is possible to read a memory pattern of width w and height h at a time. However, if the upper left corner of the memory area 70 required for interpolation filtering does not coincide with the sample area 71 that can be accessed according to the memory pattern, additional, otherwise unnecessary memory bandwidth is needed and memory efficiency may be lowered.

The memory bandwidth is largest, and thus the memory efficiency lowest, when the size of the memory pattern must be added on to the memory area for interpolation filtering.

That is, the maximum memory bandwidth required for interpolation filtering of the luma block 60 corresponds to an area of width ((W + TL - 1)/S + w - 1) and height ((H + TL - 1)/S + h - 1). In a similar manner, the maximum memory bandwidth required for interpolation filtering of the chroma block 62 corresponds to an area of width ((W + 2TC - 2)/S + w - 1) and height ((H/2 + TC - 1)/S + h - 1).

Finally, for an enhancement layer block, the maximum memory bandwidth required for interpolation filtering, per sample and covering all of the YCbCr components, can be determined by the following equation.

[((W + TL - 1)/S + w - 1) * ((H + TL - 1)/S + h - 1) + ((W + 2TC - 2)/S + w - 1) * ((H/2 + TC - 1)/S + h - 1)] / (W * H)

The above-mentioned memory bandwidth is the memory size to be accessed in interpolation filtering for unidirectional inter prediction or inter-layer prediction. For bidirectional inter prediction, twice the memory bandwidth will be required compared to the memory bandwidth described above. When inter-layer prediction and inter prediction are combined, a memory bandwidth equal to the sum of the memory bandwidths in all interpolation filtering is required.
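
As an illustration of the equation above, the following Python sketch (the function name and the default values in the example call are ours, for illustration only) evaluates the worst-case per-sample memory bandwidth and doubles it for bidirectional prediction, as just described.

```python
def max_memory_bandwidth(W, H, TL, TC, w, h, S=1.0, bidirectional=False):
    """Worst-case number of samples read from memory per predicted
    YCbCr sample of a WxH enhancement layer block.

    TL, TC : luma / chroma interpolation filter tap counts
    w, h   : width / height of the memory access pattern
    S      : resolution ratio S_BL / S_EL (1 for SNR scalability)
    """
    luma = ((W + TL - 1) / S + w - 1) * ((H + TL - 1) / S + h - 1)
    chroma = ((W + 2 * TC - 2) / S + w - 1) * ((H / 2 + TC - 1) / S + h - 1)
    per_sample = (luma + chroma) / (W * H)
    # Bidirectional inter prediction reads reference samples in both the
    # L0 and L1 directions, doubling the required bandwidth.
    return 2 * per_sample if bidirectional else per_sample

# Example: 8x8 block, 8-tap luma / 4-tap chroma filters, 4x4 memory pattern.
print(max_memory_bandwidth(8, 8, 8, 4, 4, 4, S=1.0, bidirectional=True))
```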

Further, according to the equation described above with reference to FIGS. 6 and 7, it can be seen that the memory bandwidth varies depending on the block size and the inter prediction mode.

Accordingly, the scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 according to an embodiment can adjust the number of interpolation filtering operations according to at least one of the filter size, the block size, and the inter prediction mode, so that the memory bandwidth and the computational burden required for interpolation filtering do not increase excessively. Embodiments that limit the number of interpolation filtering operations according to the filter size, the block size, and the inter prediction mode are described below with reference to the tables of FIGS. 8 and 9.

FIG. 8 illustrates a memory bandwidth for MC interpolation filtering that varies according to the inter prediction mode and the block size according to an embodiment.

Referring to FIG. 8, the memory bandwidth required for interpolation filtering for prediction of the enhancement layer decreases from left to right.

The partition type of a block indicates its classification according to the size and shape of the block.

According to one embodiment, bidirectional inter prediction is restricted for 4x8 and 8x4 blocks in the enhancement layer.

On the other hand, bidirectional inter prediction for 8x8, 4x16, and 16x4 blocks, unidirectional inter prediction for 4x8 blocks, bidirectional inter prediction for 8x16 blocks, and unidirectional inter prediction for 8x4 blocks are allowed in the enhancement layer. Also, in the listed order, the memory bandwidth required by the interpolation filtering for each inter prediction gradually decreases.

FIG. 9 illustrates a memory bandwidth for IL interpolation filtering that varies with block size according to one embodiment.

If two-dimensional IL interpolation filtering is performed for inter-layer prediction, the memory bandwidth required for IL interpolation filtering is equal to the memory bandwidth for MC interpolation filtering for unidirectional prediction. However, the smaller the size of the predicted block, the larger the memory bandwidth for IL interpolation filtering.

Referring to FIG. 9, the memory bandwidth for IL interpolation filtering decreases in the order of 4x8, 8x4, 8x8, 4x16, 16x4, 8x16, and 8x32 blocks.

The scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 according to various embodiments can set a constraint on whether both inter-layer prediction and inter prediction may be performed in the enhancement layer. For example, under certain conditions, inter prediction cannot be performed in the enhancement layer and only inter-layer prediction is possible.

Hereinafter, with reference to FIGS. 10 and 11, the number of interpolation filtering operations required for base layer coding and enhancement layer coding is compared for the case in which no constraint is imposed.

FIG. 10 shows the number of interpolation filtering performed in base layer coding and enhancement layer coding, without restriction.

For motion compensation, either unidirectional prediction or bidirectional prediction can be performed, but bidirectional prediction requires twice as much memory bandwidth. To consider the case in which motion compensation requires the most memory bandwidth, bidirectional motion compensation is assumed below.

First, in the two-layer individual coding structure, base layer inter prediction and enhancement layer inter prediction are performed. Motion compensation can be performed in the base layer inter prediction and the enhancement layer inter prediction, respectively.

For motion compensation, horizontal MC interpolation filtering and vertical MC interpolation filtering are necessary because the reference block may be determined at sub-pixel positions. In addition, to compensate motion in both the L0 direction and the L1 direction, horizontal MC interpolation filtering and vertical MC interpolation filtering can be performed in each prediction direction.

Therefore, for base layer inter prediction in the two-layer independent coding structure, horizontal MC interpolation filtering and vertical MC interpolation filtering can be performed in the L0 direction. Likewise, for enhancement layer inter prediction in the two-layer independent coding structure, horizontal MC interpolation filtering and vertical MC interpolation filtering can be performed in the L0 direction. Therefore, in the two-layer independent coding structure, the memory bandwidth and calculation burden of four interpolation filtering operations can occur.

In the two-layer reference coding structure considered next, the prediction scheme may differ depending on whether inter-layer prediction is performed.

First, if inter-layer prediction is not performed even in the two-layer reference coding structure, horizontal MC interpolation filtering and vertical MC interpolation filtering are performed in the L0 direction and in the L1 direction for base layer inter prediction. Likewise, for enhancement layer inter prediction, horizontal MC interpolation filtering and vertical MC interpolation filtering are performed in the L0 direction and in the L1 direction. That is, the memory bandwidth and computational burden of up to eight interpolation filtering operations may be required.

However, if inter-layer prediction is performed in the two-layer reference coding structure, horizontal IL interpolation filtering and vertical IL interpolation filtering may additionally be performed to generate the upsampled reference image for inter-layer prediction. Therefore, when inter-layer prediction is performed, the memory bandwidth and computational burden of up to ten interpolation filtering operations may be required.
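
The counting above can be made explicit with a small tally (a sketch; the structure names are ours, and the 'independent' case counts the L0-direction filtering described for FIG. 10):

```python
def interpolation_filtering_count(structure):
    """1-D interpolation passes needed to encode one image in each layer.

    One motion compensation in one prediction direction costs two passes
    (horizontal + vertical MC interpolation); IL upsampling costs two more.
    """
    mc = 2  # horizontal + vertical MC interpolation per prediction direction
    if structure == "independent":       # BL and EL each filter once (L0)
        return mc + mc                   # -> 4
    if structure == "reference":         # bidirectional MC in both layers
        return 2 * mc + 2 * mc           # (L0 + L1) x (BL + EL) -> 8
    if structure == "reference_with_il": # plus horizontal + vertical IL
        return 2 * mc + 2 * mc + 2       # -> 10
    raise ValueError(structure)

for s in ("independent", "reference", "reference_with_il"):
    print(s, interpolation_filtering_count(s))
```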

FIG. 11 illustrates the number of interpolation filtering performed in base layer coding and enhancement layer coding under a predetermined condition according to an embodiment.

The scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 according to an embodiment may restrict inter prediction for enhancement layer prediction when additional memory bandwidth and computational burden would be incurred by inter-layer prediction.

The scalable video decoding apparatus 20 can perform four MC interpolation filtering operations, comprising horizontal interpolation filtering and vertical interpolation filtering in each of the L0 and L1 directions, on the base layer image. In addition, the scalable video decoding apparatus 20 can perform two IL interpolation filtering operations, comprising horizontal IL interpolation filtering and vertical IL interpolation filtering, on the reference layer image to generate the upsampled reference layer image for inter-layer prediction.

In the scalable video decoding apparatus 20, the interpolation filtering for inter-layer prediction can thus be limited to the four MC interpolation filtering operations for base layer inter prediction and the two IL interpolation filtering operations for inter-layer prediction, that is, to a total of six interpolation filtering operations.

Therefore, according to FIG. 11, the scalable video decoding apparatus 20 can determine the motion vector for enhancement layer inter prediction to be 0 when inter-layer prediction is performed. That is, since the motion vector is 0, MC interpolation filtering for inter prediction can be omitted.

In addition, when the reference index for the current layer image indicates the upsampled reference layer image, that is, when inter-layer prediction is performed, the scalable video decoding apparatus 20 according to another embodiment may determine the variables related to the motion vector, for example, a merge index indicating a merge target block, an mvp flag indicating whether a motion vector predictor is used, the reference index of the motion vector, motion vector difference information (mvd), and a flag (mvd zero flag) indicating whether mvd is 0, so that they indicate a zero motion vector.
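
A minimal decoder-side sketch of this constraint follows; all names (MotionInfo, decode_enh_block_motion, and so on) are illustrative placeholders, not syntax elements of the patent or of any standard.

```python
from dataclasses import dataclass

@dataclass
class MotionInfo:
    merge_idx: int = 0        # merge target block index
    mvp_flag: int = 0         # whether a motion vector predictor is used
    mvd: tuple = (0, 0)       # motion vector difference
    mv: tuple = (0, 0)        # resulting motion vector

def decode_enh_block_motion(ref_idx, upsampled_ref_idx, parse_fn):
    """Return the motion info for one enhancement layer block."""
    if ref_idx == upsampled_ref_idx:
        # Inter-layer prediction: motion-related syntax need not be
        # parsed; everything is inferred to a zero motion vector, so no
        # MC interpolation filtering is required for this block.
        return MotionInfo()
    return parse_fn()  # ordinary inter prediction: parse motion syntax

# Usage: a block whose reference index points at the upsampled image.
print(decode_enh_block_motion(0, 0, parse_fn=lambda: None).mv)  # (0, 0)
```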

Accordingly, as shown in FIG. 11, the computation amount of interpolation filtering for inter-layer prediction may be no larger than the sum of the first computation amount of MC interpolation filtering for inter prediction between base layer images and the second computation amount of MC interpolation filtering for inter prediction between enhancement layer images.

According to another embodiment, if both inter-layer prediction and inter prediction can be performed in the enhancement layer, the scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 may compare the filter size for IL interpolation filtering with the filter size for MC interpolation filtering. FIG. 12 below suggests combinable MC interpolation filtering and IL interpolation filtering, provided that the filter size for IL interpolation filtering is not larger than the filter size for MC interpolation filtering.

FIG. 12 lists combinations of MC interpolation filtering and IL interpolation filtering that can be performed under certain conditions according to various embodiments.

According to the H.265/HEVC standard, when inter prediction is performed on a block of size 8x8 or larger, an 8-tap interpolation filter is used for MC interpolation filtering of the luma block and a 4-tap interpolation filter is used for MC interpolation filtering of the chroma block. According to the H.264 standard, when inter prediction is performed on a block of size 4x4 or larger, a 6-tap interpolation filter is used for MC interpolation filtering of the luma block and a 2-tap interpolation filter is used for MC interpolation filtering of the chroma block.

When inter-layer prediction is performed, the scalable video decoding apparatus 20 uses an 8-tap interpolation filter for IL interpolation filtering of the luma block and a 4-tap interpolation filter for IL interpolation filtering of the chroma block, and the IL interpolation filtering can be performed twice in succession. The minimum block size allowed is 8x8.

To combine inter prediction and inter-layer prediction for blocks of size 8x8 or larger, the scalable video decoding apparatus 20 can use an 8-tap MC interpolation filter for MC interpolation filtering of the luma block and an 8-tap IL interpolation filter for IL interpolation filtering. For the chroma block, a 4-tap MC interpolation filter can be used for MC interpolation filtering and a 4-tap IL interpolation filter for IL interpolation filtering.

The scalable video decoding apparatus 20 permits only one IL interpolation filtering operation when inter-layer prediction is performed using an 8-tap IL interpolation filter for a luma block of size 4x8 or larger, and inter prediction can be restricted. Only one IL interpolation filtering operation may likewise be allowed for the chroma block of the corresponding block.

When inter-layer prediction is performed using a 6-tap or 4-tap IL interpolation filter for a luma block of size 4x8 or larger, two IL interpolation filtering operations are allowed, but inter prediction can be restricted. For the chroma block of the corresponding block, two 2-tap IL interpolation filtering operations may be allowed.

Three IL interpolation filtering operations are allowed when inter-layer prediction is performed using a 2-tap IL interpolation filter for a luma block of size 4x8 or larger, but inter prediction can be restricted. Three 2-tap IL interpolation filtering operations may be allowed for the chroma block of the corresponding block.

When the scalable video decoding apparatus 20 performs inter-layer prediction using a 4-tap or 2-tap IL interpolation filter for a luma block of size 8x16 or larger, one IL interpolation filtering operation and two MC interpolation filtering operations may be allowed. For the chroma block of the corresponding block, two 4-tap MC interpolation filtering operations and one 2-tap IL interpolation filtering operation can be performed.

In addition, when the scalable video decoding apparatus 20 performs two IL interpolation filtering operations using a 2-tap IL interpolation filter for a luma block of size 8x16 or larger, four MC interpolation filtering operations using a 2-tap MC interpolation filter can be allowed. For the chroma block of the corresponding block, four 2-tap MC interpolation filtering operations and two 2-tap IL interpolation filtering operations can be performed.

In addition, when the scalable video decoding apparatus 20 performs two IL interpolation filtering operations using a 2-tap IL interpolation filter for a luma block of size 8x16 or larger, two MC interpolation filtering operations using an 8-tap MC interpolation filter can be allowed. For the chroma block of the corresponding block, two 4-tap MC interpolation filtering operations and two 4-tap IL interpolation filtering operations can be performed.

In addition, when the scalable video decoding apparatus 20 performs one IL interpolation filtering operation using an 8-tap IL interpolation filter for a luma block of size 8x16 or larger, two MC interpolation filtering operations using an 8-tap MC interpolation filter can be allowed. For the chroma block of the corresponding block, two 4-tap MC interpolation filtering operations and one 4-tap IL interpolation filtering operation can be performed.

From the various combinations of inter prediction and inter-layer prediction that can be performed in the enhancement layer, as described above, it can be confirmed that the minimum block size for which the combination of inter-layer prediction and inter prediction can be allowed is 8x8.
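
Read as a table, the combinations above can be checked with a lookup; this sketch transcribes only the luma cases, the pass counts for the 8x8 row are assumptions inferred from the description, and all names are ours.

```python
# (min block size, luma IL filter taps) -> (max IL passes, max MC passes),
# transcribed from the combinations above. Max MC passes of 0 means inter
# prediction is restricted for that combination.
LUMA_COMBOS = [
    ((8, 8),  8, 2, 2),   # combined IL + MC case (pass counts assumed)
    ((4, 8),  8, 1, 0),
    ((4, 8),  6, 2, 0),
    ((4, 8),  4, 2, 0),
    ((4, 8),  2, 3, 0),
    ((8, 16), 4, 1, 2),
    ((8, 16), 2, 2, 4),
    ((8, 16), 8, 1, 2),
]

def combo_allowed(w, h, il_taps, il_passes, mc_passes):
    """True if some transcribed row permits the requested combination."""
    return any(
        w >= mw and h >= mh and il_taps == taps
        and il_passes <= max_il and mc_passes <= max_mc
        for (mw, mh), taps, max_il, max_mc in LUMA_COMBOS
    )

print(combo_allowed(8, 8, 8, il_passes=2, mc_passes=2))  # True
print(combo_allowed(4, 8, 8, il_passes=1, mc_passes=2))  # False: MC restricted
```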

FIG. 13 shows a flowchart of a scalable video encoding method according to various embodiments.

In step 131, the scalable video coding apparatus 10 can determine a reference layer image from the base layer images to inter-layer-predict an enhancement layer image.

First, the scalable video encoding apparatus 10 can perform motion compensation on the base layer image through four MC interpolation filtering operations, comprising horizontal interpolation filtering and vertical interpolation filtering in each of the L0 and L1 prediction directions. A base layer reconstructed image can be generated through the motion compensation, and the reference layer image for the enhancement layer image can be determined from among the base layer reconstructed images.

In step 133, the scalable video coding apparatus 10 may perform IL interpolation filtering on the reference layer image to generate an upsampled reference layer image.

The scalable video encoding apparatus 10 can perform two IL interpolation filtering operations, comprising horizontal IL interpolation filtering and vertical IL interpolation filtering, on the reference layer image to generate the upsampled reference layer image.
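
For concreteness, a sketch of such two-pass separable upsampling for x2 spatial scalability follows; the 4-tap kernel is an arbitrary illustrative half-pel filter, not the actual IL filter coefficients of the patent or of any standard.

```python
import numpy as np

ILLUSTRATIVE_TAPS = np.array([-1.0, 5.0, 5.0, -1.0]) / 8.0  # assumed kernel

def upsample_2x(ref):
    """Two-pass IL interpolation: horizontal, then vertical 1-D filtering."""
    def filt_1d(x):
        # Interleave original samples with filtered half-pel samples.
        pad = np.pad(x, ((0, 0), (2, 1)), mode="edge")
        half = sum(ILLUSTRATIVE_TAPS[k] * pad[:, k:k + x.shape[1]]
                   for k in range(4))
        out = np.empty((x.shape[0], 2 * x.shape[1]))
        out[:, 0::2], out[:, 1::2] = x, half
        return out

    horizontal = filt_1d(ref)        # pass 1: horizontal IL filtering
    return filt_1d(horizontal.T).T   # pass 2: vertical IL filtering

base_recon = np.random.rand(8, 8)
print(upsample_2x(base_recon).shape)  # (16, 16) upsampled reference image
```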

If the scalable video encoding apparatus 10 performs inter-layer prediction on one of the enhancement layer images, then to prevent an increase in the computation amount compared with the two-layer independent prediction structure, the total interpolation filtering may be limited to the four MC interpolation filtering operations and the two IL interpolation filtering operations.

In step 135, when the up-sampled reference layer image is determined through the IL interpolation filtering, the scalable video encoding apparatus 10 may decide not to perform inter prediction between the enhancement layer images for the enhancement layer image. The scalable video encoding apparatus 10 can encode the residual components between the upsampled reference layer video and the enhancement layer video and output the enhancement layer bitstream.

The scalable video encoding apparatus 10 can encode a reference index indicating the reference image for each block. For a block on which inter-layer prediction is performed, a reference index indicating that the reference image of the enhancement layer image is the upsampled reference image can be encoded. In this case, since the scalable video encoding apparatus 10 omits enhancement layer inter prediction, the motion vector for inter prediction between enhancement layer images can be encoded to indicate 0.

FIG. 14 shows a flowchart of a scalable video decoding method according to various embodiments.

In step 141, the scalable video decoding apparatus 20 can obtain a residue component and a reference index indicating a reference layer image for inter-layer prediction of the enhancement layer image.

In step 143, the scalable video decoding apparatus 20 decides not to perform inter prediction between the enhancement layer images based on the reference index, and can determine the reference layer image from the base layer images.

When the reference index of the enhancement layer image indicates the upsampled reference image, the scalable video decoding apparatus 20 can determine that the motion vector for inter prediction between the enhancement layer images is 0.

First, the scalable video decoding apparatus 20 can determine a reference image for motion compensation of the base layer image. To determine sub-pixel reference blocks in the reference image, four MC interpolation filtering operations, comprising horizontal interpolation filtering and vertical interpolation filtering, can be performed on the reference image in each of the L0 and L1 prediction directions. When motion compensation is performed on the base layer image, a base layer reconstructed image can be generated.

In step 145, the scalable video decoding apparatus 20 may perform the IL interpolation filtering on the reference layer image determined in the previous step to generate the upsampled reference layer image.

The scalable video decoding apparatus 20 can perform the two IL interpolation filtering operations, comprising horizontal IL interpolation filtering and vertical IL interpolation filtering, on the reference layer image determined from the base layer reconstructed images, thereby generating the upsampled reference layer image.

If inter-layer prediction is performed on one of the enhancement layer images, the total interpolation filtering may be limited to the four MC interpolation filtering operations of step 143 and the two IL interpolation filtering operations of step 145.

In step 147, the scalable video decoding apparatus 20 can restore the enhancement layer video using the residue component of the interlayer prediction and the upsampled reference layer video.
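
Putting steps 141 to 147 together, a minimal sketch of the decoding flow follows (all names are illustrative; the nearest-neighbour stand-in replaces the real IL filter only to keep the example self-contained).

```python
import numpy as np

def decode_enh_image(residue, base_recon, ref_idx, upsampled_ref_idx, upsample):
    """Reconstruct one inter-layer predicted enhancement layer image."""
    # Step 143: the reference index indicates the upsampled reference layer
    # image, so enhancement layer inter prediction is skipped and the
    # motion vector is inferred to be zero (no MC interpolation needed).
    assert ref_idx == upsampled_ref_idx
    motion_vector = (0, 0)

    # Step 145: generate the upsampled reference layer image (two IL passes
    # in the scheme described above; a stand-in upsampler is injected here).
    prediction = upsample(base_recon)

    # Step 147: enhancement layer image = prediction + residue.
    return prediction + residue, motion_vector

nearest = lambda x: np.kron(x, np.ones((2, 2)))  # stand-in for IL filtering
el, mv = decode_enh_image(np.zeros((16, 16)), np.random.rand(8, 8), 0, 0, nearest)
print(el.shape, mv)  # (16, 16) (0, 0)
```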

The scalable video encoding apparatus 10 and the scalable video decoding apparatus 20 according to other embodiments may limit the number of MC interpolation filtering operations and IL interpolation filtering operations for the enhancement layer image based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering for inter prediction, the number of taps of the IL interpolation filter for IL interpolation filtering, and the size of the prediction unit of the enhancement layer image.

Accordingly, the computation amount of interpolation filtering can be adjusted so as not to exceed the sum of the first computation amount of MC interpolation filtering for inter prediction between base layer images and the second computation amount of MC interpolation filtering for inter prediction between enhancement layer images.

In the scalable video encoding apparatus 10 according to an embodiment and the scalable video decoding apparatus 20 according to an embodiment, as described above, the blocks into which the video data is divided are encoding units of a tree structure, and encoding units, prediction units, and conversion units are used for inter-layer prediction or inter prediction. Hereinafter, with reference to FIGS. 15 to 27, a video encoding method and apparatus and a video decoding method and apparatus based on encoding units and conversion units of a tree structure according to an embodiment are disclosed.

In principle, in the encoding/decoding process for multi-layer video, the encoding/decoding process for base layer images and the encoding/decoding process for enhancement layer images are performed separately. That is, when inter-layer prediction is performed on multi-layer video, the encoding/decoding results of single layer videos can be cross-referenced, but a separate encoding/decoding process occurs for each single layer video.

Therefore, for convenience of explanation, the video encoding process and the video decoding process based on encoding units of a tree structure described below with reference to FIGS. 15 to 27 are a video encoding process and a video decoding process for single layer video. However, as described above with reference to FIGS. 1 to 14, inter-layer prediction and compensation between base view images and enhancement layer images are performed for encoding/decoding a video stream.

Therefore, in order for the encoding unit 12 of the scalable video encoding apparatus 10 according to an embodiment to encode multi-layer video based on encoding units of a tree structure, the scalable video encoding apparatus 10 includes as many video encoding apparatuses 100 of FIG. 15 as the number of layers of the multi-layer video so as to perform video encoding for each single layer video, and controls each video encoding apparatus 100 to encode the single layer video allocated to it. In addition, the scalable video encoding apparatus 10 can perform inter-view prediction using the encoding result of the separate single view of each video encoding apparatus 100. Accordingly, the encoding unit 12 of the scalable video encoding apparatus 10 can generate a base view video stream and an enhancement layer video stream containing the encoding results for each layer.

Similarly, in order for the decoding unit 26 of the scalable video decoding apparatus 20 according to an embodiment to decode multi-layer video based on encoding units of a tree structure, the scalable video decoding apparatus 20 includes as many video decoding apparatuses 200 of FIG. 16 as the number of layers of the multi-layer video so as to perform video decoding on the received base layer video stream and enhancement layer video stream layer by layer, and controls each video decoding apparatus 200 to decode the single layer video allocated to it. The scalable video decoding apparatus 20 can then perform inter-layer compensation using the decoding result of the separate single layer of each video decoding apparatus 200. Accordingly, the decoding unit 26 of the scalable video decoding apparatus 20 can generate the base layer images and the enhancement layer images reconstructed for each layer.

FIG. 15 shows a block diagram of a video coding apparatus 100 based on a coding unit according to a tree structure according to an embodiment of the present invention.

The video encoding apparatus 100, which involves video prediction based on encoding units according to a tree structure according to an embodiment, includes an encoding unit determination unit 120 and an output unit 130. For convenience of explanation, the video encoding apparatus 100 involving video prediction based on encoding units according to a tree structure according to an embodiment is abbreviated below as 'video encoding apparatus 100'.

The encoding unit determination unit 120 may partition the current picture based on the maximum encoding unit, which is the encoding unit of maximum size for the current picture of the image. If the current picture is larger than the maximum encoding unit, the image data of the current picture may be divided into at least one maximum encoding unit. The maximum encoding unit according to an embodiment may be a data unit of size 32x32, 64x64, 128x128, 256x256, or the like, a square data unit whose width and height are powers of two.

An encoding unit according to an embodiment may be characterized by a maximum size and a depth. The depth indicates the number of times the coding unit is spatially divided from the maximum coding unit. As the depth increases, the depth coding unit can be divided from the maximum coding unit to the minimum coding unit. The depth of the maximum encoding unit is the highest depth and the minimum encoding unit can be defined as the least significant encoding unit. As the depth of the maximum encoding unit increases, the size of the depth-dependent encoding unit decreases, so that the encoding unit of the higher depth may include a plurality of lower-depth encoding units.

As described above, according to the maximum size of an encoding unit, the image data of the current picture is divided into a maximum encoding unit, and each maximum encoding unit may include encoding units divided by depth. Since the maximum encoding unit according to an embodiment is divided by depth, image data of a spatial domain included in the maximum encoding unit can be hierarchically classified according to depth.

The maximum depth for limiting the total number of times the height and width of the maximum encoding unit can be hierarchically divided and the maximum size of the encoding unit may be preset.

The encoding unit determination unit 120 encodes at least one divided area into which the area of the maximum encoding unit is divided for each depth, and determines the depth at which the final encoding result is to be output for each of the at least one divided areas. That is, the encoding unit determination unit 120 encodes the image data in depth-based encoding units for each maximum encoding unit of the current picture, selects the depth at which the smallest encoding error occurs, and determines it as the coding depth. The determined coding depth and the image data of each maximum encoding unit are output to the output unit 130.

The image data in the maximum encoding unit is encoded based on the depth encoding unit according to at least one depth below the maximum depth, and the encoding results based on the respective depth encoding units are compared. As a result of the comparison of the encoding error of the depth-dependent encoding unit, the depth with the smallest encoding error can be selected. At least one coding depth may be determined for each maximum coding unit.

As the depth of the maximum encoding unit increases, the encoding unit is hierarchically divided and the number of encoding units increases. In addition, even for encoding units of the same depth included in one maximum encoding unit, the encoding error of the data of each encoding unit is measured and whether to divide into a lower depth is determined. Therefore, even for data included in one maximum encoding unit, the encoding error per depth differs according to position, so the coding depth may be determined differently according to position. Accordingly, one or more coding depths may be set for one maximum encoding unit, and the data of the maximum encoding unit may be partitioned according to the encoding units of the one or more coding depths.

Therefore, the encoding unit determiner 120 according to the embodiment can determine encoding units according to the tree structure included in the current maximum encoding unit. The 'encoding units according to the tree structure' according to an exemplary embodiment includes encoding units of depth determined by the encoding depth, among all depth encoding units included in the current maximum encoding unit. The coding unit of coding depth can be hierarchically determined in depth in the same coding area within the maximum coding unit, and independently determined in other areas. Similarly, the coding depth for the current area can be determined independently of the coding depth for the other area.

The maximum depth according to one embodiment is an index related to the number of divisions from the maximum encoding unit to the minimum encoding unit. The first maximum depth according to an exemplary embodiment may indicate the total number of division from the maximum encoding unit to the minimum encoding unit. The second maximum depth according to an exemplary embodiment may represent the total number of depth levels from the maximum encoding unit to the minimum encoding unit. For example, when the depth of the maximum encoding unit is 0, the depth of the encoding unit in which the maximum encoding unit is divided once may be set to 1, and the depth of the encoding unit that is divided twice may be set to 2. In this case, if the coding unit divided four times from the maximum coding unit is the minimum coding unit, since the depth levels of depth 0, 1, 2, 3 and 4 exist, the first maximum depth is set to 4 and the second maximum depth is set to 5 .
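
As a toy calculation mirroring the example above (the function name is ours):

```python
def max_depths(max_cu_size, min_cu_size):
    """First and second maximum depth for dyadic splitting of coding units."""
    divisions = (max_cu_size // min_cu_size).bit_length() - 1  # halvings
    first_max_depth = divisions        # total number of divisions
    second_max_depth = divisions + 1   # number of depth levels (0..divisions)
    return first_max_depth, second_max_depth

print(max_depths(64, 4))  # (4, 5): depth levels 0, 1, 2, 3, 4 exist
```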

Prediction encoding and conversion of the maximum encoding unit can be performed. Likewise, predictive coding and conversion are performed on the basis of the depth coding unit for each maximum coding unit and for each depth below the maximum depth.

Since the number of coding units per depth is increased every time the maximum coding unit is divided by depth, the coding including prediction coding and conversion should be performed for every depth coding unit as the depth increases. For convenience of explanation, predictive encoding and conversion will be described based on a current encoding unit of at least one of the maximum encoding units.

The video encoding apparatus 100 according to an exemplary embodiment may select various sizes or types of data units for encoding image data. In order to encode the image data, the steps of predictive encoding, conversion, entropy encoding, and the like are performed. The same data unit may be used for all steps, and the data unit may be changed step by step.

For example, the video coding apparatus 100 can select not only a coding unit for coding image data but also a data unit different from the coding unit in order to perform predictive coding of the image data of the coding unit.

For predictive encoding of the maximum encoding unit, predictive encoding may be performed based on an encoding unit of coding depth according to an embodiment, that is, an encoding unit that is no longer divided. Hereinafter, the encoding unit that is no longer divided and on which predictive encoding is based will be referred to as a 'prediction unit'. The partitions into which the prediction unit is divided may include the prediction unit itself and data units obtained by dividing at least one of the height and the width of the prediction unit. A partition is a data unit into which the prediction unit of an encoding unit is divided, and the prediction unit may be a partition of the same size as the encoding unit.

For example, if an encoding unit of size 2Nx2N (where N is a positive integer) is no longer divided, it becomes a prediction unit of size 2Nx2N, and the size of a partition may be 2Nx2N, 2NxN, Nx2N, NxN, or the like. The partition type according to an embodiment includes not only symmetric partitions in which the height or width of the prediction unit is divided in a symmetric ratio, but also partitions divided in an asymmetric ratio such as 1:n or n:1, partitions divided into geometric forms, arbitrary-shaped partitions, and the like, as illustrated below.
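
As an illustration, the symmetric partitions and one family of asymmetric partitions of a 2Nx2N prediction unit can be enumerated as follows; the 1:3 ratio is just one instance of the 1:n splits mentioned above, and the function name is ours.

```python
def candidate_partitions(n):
    """Candidate partition sizes (width, height) for a 2Nx2N coding unit."""
    two_n = 2 * n
    symmetric = [(two_n, two_n), (two_n, n), (n, two_n), (n, n)]
    # Height or width divided in a 1:3 ratio (one example of 1:n splits).
    asymmetric = [(two_n, two_n // 4), (two_n, 3 * two_n // 4),
                  (two_n // 4, two_n), (3 * two_n // 4, two_n)]
    return symmetric + asymmetric

print(candidate_partitions(8))  # partitions of a 16x16 prediction unit
```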

The prediction mode of the prediction unit may be at least one of an intra mode, an inter mode, and a skip mode. For example, intra mode and inter mode can be performed for partitions of 2Nx2N, 2NxN, Nx2N, NxN sizes. In addition, the skip mode can be performed only for a partition of 2Nx2N size. Encoding is performed independently for each prediction unit within an encoding unit, and a prediction mode having the smallest encoding error can be selected.

In addition, the video encoding apparatus 100 according to an exemplary embodiment may perform conversion of image data of an encoding unit based on not only an encoding unit for encoding image data but also a data unit different from the encoding unit. For conversion of a coding unit, the conversion may be performed based on a conversion unit having a size smaller than or equal to the coding unit. For example, the conversion unit may include a data unit for the intra mode and a conversion unit for the inter mode.

The conversion unit in the encoding unit is also recursively divided into smaller conversion units in a manner similar to the encoding units according to the tree structure according to an embodiment, so that the residual data of the encoding unit can be partitioned according to conversion units of the tree structure along conversion depths.

For a conversion unit according to an embodiment, a conversion depth indicating the number of divisions until the height and width of the encoding unit are divided down to the conversion unit can be set. For example, for a current encoding unit of size 2Nx2N, the conversion depth is 0 if the size of the conversion unit is 2Nx2N, 1 if the size of the conversion unit is NxN, and 2 if the size of the conversion unit is N/2xN/2. That is, conversion units according to a tree structure can be set according to conversion depths.
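
The mapping from conversion unit size to conversion depth in this example is simply a count of halvings, e.g. (function name ours):

```python
def conversion_depth(cu_size, tu_size):
    """Number of halvings from the coding unit size down to the TU size."""
    depth = 0
    while cu_size > tu_size:
        cu_size //= 2
        depth += 1
    return depth

n = 16
print(conversion_depth(2 * n, 2 * n))   # 0
print(conversion_depth(2 * n, n))       # 1
print(conversion_depth(2 * n, n // 2))  # 2
```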

The coding information according to the coding depth needs not only the coding depth but also prediction related information and conversion related information. Therefore, the coding unit determination unit 120 can determine not only the coding depth at which the minimum coding error is generated, but also the partition type in which the prediction unit is divided into partitions, the prediction mode for each prediction unit, and the size of the conversion unit for conversion.

The encoding unit, the prediction unit / partition, and the conversion unit determination method according to the tree structure of the maximum encoding unit according to the embodiment will be described later in detail with reference to FIGS. 17 to 27.

The encoding unit determination unit 120 may measure the encoding error of the depth-dependent encoding unit using a Lagrangian Multiplier-based rate-distortion optimization technique.

The output unit 130 outputs, in the form of a bit stream, video data of the maximum encoding unit encoded based on at least one encoding depth determined by the encoding unit determination unit 120 and information on the depth encoding mode.

The encoded image data may be a result of encoding residual data of the image.

The information on the depth-dependent coding mode may include coding depth information, partition type information of a prediction unit, prediction mode information, size information of a conversion unit, and the like.

The coding depth information can be defined using depth division information indicating whether or not coding is performed at the lower depth coding unit without coding at the current depth. If the current depth of the current encoding unit is the encoding depth, the current encoding unit is encoded in the current depth encoding unit, so that the division information of the current depth can be defined so as not to be further divided into lower depths. On the other hand, if the current depth of the current encoding unit is not the encoding depth, the encoding using the lower depth encoding unit should be tried. Therefore, the division information of the current depth may be defined to be divided into the lower depth encoding units.

If the current depth is not the encoding depth, encoding is performed on the encoding unit divided into lower-depth encoding units. Since there are one or more lower-level coding units in the current-depth coding unit, the coding is repeatedly performed for each lower-level coding unit so that recursive coding can be performed for each coding unit of the same depth.

Since encoding units of a tree structure are determined in one maximum encoding unit and information on at least one encoding mode is determined for each encoding unit of coding depth, information on at least one encoding mode is determined for one maximum encoding unit. Also, since the data of the maximum encoding unit is hierarchically divided according to depth and the coding depth may differ by position, information on the coding depth and the encoding mode may be set for the data.

Accordingly, the output unit 130 according to an embodiment can allocate the coding depth and the encoding information on the encoding mode to at least one of the encoding unit, the prediction unit, and the minimum unit included in the maximum encoding unit.

The minimum unit according to an embodiment is a square data unit of the size obtained by dividing the minimum encoding unit of the lowest coding depth into four. The minimum unit according to an embodiment may be the largest square data unit that can be included in all encoding units, prediction units, partition units, and conversion units included in the maximum encoding unit.

For example, the encoding information output through the output unit 130 may be classified into encoding information per depth unit and encoding information per prediction unit. The encoding information for each depth coding unit may include prediction mode information and partition size information. The encoding information to be transmitted for each prediction unit includes information about the estimation direction of the inter mode, information about the reference picture index of the inter mode, information on the motion vector, information on the chroma component of the intra mode, information on the interpolation mode of the intra mode And the like.

Information on the maximum size of a coding unit defined for each picture, slice or GOP, and information on the maximum depth can be inserted into a header, a sequence parameter set, or a picture parameter set of a bitstream.

Information on the maximum size of the conversion unit allowed for the current video and information on the minimum size of the conversion unit can also be output through a header, a sequence parameter set, or a picture parameter set or the like of the bit stream. The output unit 130 can encode and output reference information, prediction information, slice type information, and the like related to the prediction.

According to the simplest embodiment of the video coding apparatus 100, the coding unit for depth is a coding unit which is half the height and width of the coding unit of one layer higher depth. That is, if the size of the current depth encoding unit is 2Nx2N, the size of the lower depth encoding unit is NxN. In addition, the current encoding unit of 2Nx2N size can include a maximum of 4 sub-depth encoding units of NxN size.

Therefore, the video encoding apparatus 100 determines the encoding unit of the optimal shape and size for each maximum encoding unit based on the size and the maximum depth of the maximum encoding unit determined in consideration of the characteristics of the current picture, Encoding units can be configured. In addition, since each encoding unit can be encoded by various prediction modes, conversion methods, and the like, an optimal encoding mode can be determined in consideration of image characteristics of encoding units of various image sizes.

Therefore, if an image having a very high image resolution or a very large data amount is encoded in units of existing macroblocks, the number of macroblocks per picture becomes excessively large. This increases the amount of compression information generated for each macroblock, so that the burden of transmission of compressed information increases and the data compression efficiency tends to decrease. Therefore, the video encoding apparatus according to an embodiment can increase the maximum size of the encoding unit in consideration of the image size, and adjust the encoding unit in consideration of the image characteristic, so that the image compression efficiency can be increased.

The scalable video encoding apparatus 10 described above with reference to FIG. 1 may include as many video encoding apparatuses 100 as the number of layers in order to encode the single layer images for each layer of multi-layer video. For example, the base layer encoding unit 12 may include one video encoding apparatus 100, and the enhancement layer encoding unit 14 may include as many video encoding apparatuses 100 as the number of enhancement layers.

When the video encoding apparatus 100 encodes base layer images, the encoding unit determination unit 120 determines a prediction unit for inter-image prediction for each encoding unit according to the tree structure for each maximum encoding unit, and inter-image prediction can be performed for each prediction unit.

Even when the video encoding apparatus 100 encodes enhancement layer images, the encoding unit determination unit 120 determines encoding units and prediction units according to the tree structure for each maximum encoding unit, and inter prediction can be performed for each prediction unit.

FIG. 16 shows a block diagram of a video decoding apparatus 200 based on a coding unit according to a tree structure according to various embodiments.

A video decoding apparatus 200 that performs video prediction based on coding units according to a tree structure according to an embodiment includes a receiving unit 210, an image data and encoding information extracting unit 220, and an image data decoding unit 230. For convenience of explanation, the video decoding apparatus 200 that performs video prediction based on coding units according to a tree structure according to an embodiment is referred to as the 'video decoding apparatus 200' in short.

Definitions of various terms such as coding unit, depth, prediction unit, and conversion unit, and information about various encoding modes for the decoding operation of the video decoding apparatus 200 according to an embodiment, are the same as described above with reference to FIG. 15 and the video encoding apparatus 100.

The receiving unit 210 receives and parses the bitstream of the encoded video. The image data and encoding information extracting unit 220 extracts, from the parsed bitstream, the image data encoded for each coding unit according to the coding units of the tree structure for each maximum coding unit, and outputs the extracted image data to the image data decoding unit 230. The image data and encoding information extracting unit 220 can extract information about the maximum size of the coding unit of the current picture from the header, sequence parameter set, or picture parameter set for the current picture.

Also, the image data and encoding information extracting unit 220 extracts information on the encoding depth and the encoding mode for the encoding units according to the tree structure for each maximum encoding unit from the parsed bit stream. The information on the extracted coding depth and coding mode is output to the image data decoding unit 230. That is, the video data of the bit stream can be divided into the maximum encoding units, and the video data decoding unit 230 can decode the video data per maximum encoding unit.

The information about the coding depth and encoding mode per maximum coding unit may be set for one or more pieces of coding depth information, and the information about the encoding mode for each coding depth may include partition type information of the coding unit, prediction mode information, size information of the conversion unit, and the like. In addition, depth-based division information may be extracted as the coding depth information.

The information about the coding depth and encoding mode extracted by the image data and encoding information extracting unit 220 is information about the coding depth and encoding mode determined to generate the minimum coding error by repeatedly performing encoding for each depth-based coding unit for each maximum coding unit at the encoding end, such as the video encoding apparatus 100 according to an embodiment. Therefore, the video decoding apparatus 200 can restore the image by decoding the data according to the encoding scheme that generated the minimum coding error.

The encoding information about the coding depth and the encoding mode according to an embodiment may be allocated to a predetermined data unit among the coding unit, the prediction unit, and the minimum unit, so the image data and encoding information extracting unit 220 can extract the information about the coding depth and encoding mode for each predetermined data unit. If the information about the coding depth and encoding mode of the corresponding maximum coding unit is recorded for each predetermined data unit, the predetermined data units holding the same information about the coding depth and encoding mode can be inferred to be data units included in the same maximum coding unit.

The image data decoding unit 230 decodes the image data of each maximum coding unit based on the information about the coding depth and encoding mode for each maximum coding unit, to reconstruct the current picture. That is, the image data decoding unit 230 can decode the encoded image data based on the read partition type, prediction mode, and conversion unit for each coding unit among the coding units according to the tree structure included in the maximum coding unit. The decoding process may include a prediction process including intra prediction and motion compensation, and an inverse conversion process.

The image data decoding unit 230 may perform intra prediction or motion compensation according to each partition and prediction mode for each coding unit, based on the partition type information and prediction mode information of the prediction unit of each coding unit of the coding depth.

In addition, the image data decoding unit 230 may read the conversion unit information according to the tree structure for each encoding unit for inverse conversion according to the maximum encoding unit, and perform inverse conversion based on the conversion unit for each encoding unit. Through the inverse transformation, the pixel value of the spatial domain of the encoding unit can be restored.

The image data decoding unit 230 can determine the coding depth of the current maximum coding unit by using the depth-based division information. If the division information indicates that it is no longer divided at the current depth, the current depth is the coding depth. Therefore, the image data decoding unit 230 can decode the coding unit of the current depth for the image data of the current maximum coding unit by using the partition type, prediction mode, and conversion unit size information of the prediction unit.
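A sketch of how the depth-based division information can drive this determination on the decoding side (hypothetical names; the apparatus parses these values from the bitstream rather than from a dictionary):

```python
def find_coding_depth(split_info, depth=0):
    """Follow depth-based division information until a value of 0 is met.

    split_info maps a depth to its division information: 1 means the
    coding unit is divided further, 0 means this depth is the coding depth.
    """
    while split_info.get(depth, 0) == 1:
        depth += 1
    return depth

# Division information 1 at depths 0 and 1, 0 at depth 2:
print(find_coding_depth({0: 1, 1: 1, 2: 0}))  # 2
```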

In other words, the encoding information set for a predetermined data unit among the coding unit, the prediction unit, and the minimum unit is observed, and the data units holding the encoding information including the same division information are gathered, so that the image data decoding unit 230 can regard them as one data unit to be decoded in the same encoding mode. For each coding unit determined in this manner, the information about the encoding mode can be obtained and decoding of the current coding unit can be performed.

The scalable video decoding apparatus 20 described above with reference to FIG. 2 may include as many video decoding apparatuses 200 as the number of viewpoints, in order to decode the received base layer video stream and enhancement layer video stream and reconstruct the base layer images and the enhancement layer images.

When the base layer video stream is received, the image data decoding unit 230 of the video decoding apparatus 200 can divide the samples of the base layer images, extracted from the base layer video stream by the extracting unit 220, into coding units according to the tree structure of the maximum coding unit. The image data decoding unit 230 may restore the base layer images by performing motion compensation for each prediction unit for inter-image prediction, for each coding unit according to the tree structure of the samples of the base layer images.

When the enhancement layer video stream is received, the image data decoding unit 230 of the video decoding apparatus 200 can divide the samples of the enhancement layer images, extracted from the enhancement layer video stream by the extracting unit 220, into coding units according to the tree structure of the maximum coding unit. The image data decoding unit 230 may restore the enhancement layer images by performing motion compensation for each prediction unit for inter-image prediction, for each coding unit of the samples of the enhancement layer images.

As a result, the video decoding apparatus 200 can obtain information about the coding unit that generated the minimum coding error when encoding was recursively performed for each maximum coding unit in the encoding process, and use it for decoding the current picture. That is, the encoded image data of the coding units according to the tree structure, determined as the optimal coding units for each maximum coding unit, can be decoded.

Accordingly, even for an image of high resolution or an excessively large data amount, the image data can be efficiently decoded and restored according to the size of the coding unit and the encoding mode adaptively determined for the image characteristics, by using the information about the optimal encoding mode transmitted from the encoding end.

FIG. 17 shows the concept of coding units according to various embodiments.

As an example of coding units, the size of a coding unit is expressed as width x height, and the coding units may include sizes 32x32, 16x16, and 8x8, starting from a coding unit of size 64x64. A coding unit of size 64x64 can be divided into partitions of size 64x64, 64x32, 32x64, and 32x32; a coding unit of size 32x32 into partitions of size 32x32, 32x16, 16x32, and 16x16; a coding unit of size 16x16 into partitions of size 16x16, 16x8, 8x16, and 8x8; and a coding unit of size 8x8 into partitions of size 8x8, 8x4, 4x8, and 4x4.
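The partition lists above all follow one pattern: keep the unit, halve the height, halve the width, or halve both. A sketch reproducing them (the helper name is hypothetical):

```python
def partitions(width, height):
    """Enumerate the 2Nx2N, 2NxN, Nx2N, and NxN partitions of a coding unit."""
    return [(width, height), (width, height // 2),
            (width // 2, height), (width // 2, height // 2)]

print(partitions(64, 64))  # [(64, 64), (64, 32), (32, 64), (32, 32)]
print(partitions(8, 8))    # [(8, 8), (8, 4), (4, 8), (4, 4)]
```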

With respect to the video data 310, the resolution is set to 1920 x 1080, the maximum size of the encoding unit is set to 64, and the maximum depth is set to 2. For the video data 320, the resolution is set to 1920 x 1080, the maximum size of the encoding unit is set to 64, and the maximum depth is set to 3. With respect to the video data 330, the resolution is set to 352 x 288, the maximum size of the encoding unit is set to 16, and the maximum depth is set to 1. The maximum depth shown in FIG. 17 represents the total number of divisions from the maximum encoding unit to the minimum encoding unit.

It is preferable that the maximum size of the coding unit is relatively large in order to improve coding efficiency as well as to accurately reflect the image characteristics when the resolution or the data amount is large. Therefore, the maximum size of the coding unit of the video data 310 and 320, which have higher resolution than the video data 330, can be selected as 64.

Since the maximum depth of the video data 310 is 2, the coding units 315 of the video data 310 may include a maximum coding unit with a long-axis size of 64, and coding units with long-axis sizes of 32 and 16, as the depth is deepened by two layers by being divided twice. On the other hand, since the maximum depth of the video data 330 is 1, the coding units 335 of the video data 330 may include a maximum coding unit with a long-axis size of 16, and coding units with a long-axis size of 8, as the depth is deepened by one layer by being divided once.

Since the maximum depth of the video data 320 is 3, the coding units 325 of the video data 320 may include a maximum coding unit with a long-axis size of 64, and coding units with long-axis sizes of 32, 16, and 8, as the depth is deepened by three layers by being divided three times. The deeper the depth, the better the ability to express detail.
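Since the maximum depth counts the total number of divisions from the maximum coding unit to the minimum coding unit, the long-axis sizes present in each hierarchy follow by repeated halving; a sketch under that reading (the helper name is hypothetical):

```python
def long_axis_sizes(max_cu_size, max_depth):
    """List the long-axis sizes from the maximum coding unit down,
    halving once per division, for max_depth total divisions."""
    return [max_cu_size >> d for d in range(max_depth + 1)]

print(long_axis_sizes(64, 2))  # video data 310: [64, 32, 16]
print(long_axis_sizes(64, 3))  # video data 320: [64, 32, 16, 8]
print(long_axis_sizes(16, 1))  # video data 330: [16, 8]
```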

FIG. 18 shows a block diagram of an image encoding unit 400 based on an encoding unit according to various embodiments.

The image encoding unit 400 according to an embodiment performs the operations for encoding image data in the encoding unit determination unit 120 of the video encoding apparatus 100. That is, the intra prediction unit 410 performs intra prediction on intra-mode coding units of the current frame 405, and the motion estimation unit 420 and the motion compensation unit 425 perform inter estimation and motion compensation on inter-mode coding units by using the current frame 405 and the reference frame 495.

The data output from the intra prediction unit 410, the motion estimation unit 420, and the motion compensation unit 425 is output as quantized transform coefficients through the conversion unit 430 and the quantization unit 440. The quantized transform coefficients are restored to spatial-domain data through the inverse quantization unit 460 and the inverse transform unit 470, and the restored spatial-domain data is post-processed through the deblocking unit 480 and the offset compensation unit 490 and output as the reference frame 495. The quantized transform coefficients may be output as the bitstream 455 via the entropy encoding unit 450.
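The data flow just described, reduced to its stage order, can be sketched as follows; the stage functions are trivial stand-ins (assumptions, not the apparatus's actual operations) so the sketch runs end to end:

```python
# Trivial stand-ins (assumptions) so the sketch runs; the real stages
# (conversion, quantization, entropy encoding, in-loop filtering) are
# far more involved.
transform = lambda r: r
quantize = lambda c: int(round(c))
dequantize = lambda c: c
inverse_transform = lambda c: c
entropy_encode = lambda c: format(c & 0xFF, '08b')
deblock = lambda x: x
offset_compensate = lambda x: x

def encode_sample(sample, prediction):
    """Stage order of the image encoding unit 400 for a single value."""
    residual = sample - prediction
    coeffs = quantize(transform(residual))  # conversion + quantization
    bits = entropy_encode(coeffs)           # entropy encoding -> bitstream
    # Reconstruction loop: inverse quantization/transform plus
    # post-processing produce the reference data that later frames
    # are predicted from.
    recon = prediction + inverse_transform(dequantize(coeffs))
    reference = offset_compensate(deblock(recon))
    return bits, reference

print(encode_sample(130, 128))  # ('00000010', 130)
```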

In order to be applied to the video encoding apparatus 100 according to an embodiment, all the components of the image encoding unit 400, that is, the intra prediction unit 410, the motion estimation unit 420, the motion compensation unit 425, the conversion unit 430, the quantization unit 440, the entropy encoding unit 450, the inverse quantization unit 460, the inverse transform unit 470, the deblocking unit 480, and the offset compensation unit 490, need to perform operations based on each coding unit among the coding units according to the tree structure, in consideration of the maximum depth for each maximum coding unit.

In particular, the intra prediction unit 410, the motion estimation unit 420, and the motion compensation unit 425 determine the partition and prediction mode of each coding unit among the coding units according to the tree structure, in consideration of the maximum size and maximum depth of the current maximum coding unit, and the conversion unit 430 determines the size of the conversion unit within each coding unit among the coding units according to the tree structure.

FIG. 19 shows a block diagram of an image decoding unit 500 based on an encoding unit according to various embodiments.

As the bitstream 505 passes through the parsing unit 510, the encoded image data to be decoded and the encoding-related information necessary for decoding are parsed. The encoded image data is output as inverse-quantized data through the entropy decoding unit 520 and the inverse quantization unit 530, and the image data in the spatial domain is restored via the inverse transform unit 540.

For the image data in the spatial domain, the intra prediction unit 550 performs intra prediction on intra-mode coding units, and the motion compensation unit 560 performs motion compensation on inter-mode coding units by using the reference frame 585.

The spatial-domain data that has passed through the intra prediction unit 550 and the motion compensation unit 560 may be post-processed through the deblocking unit 570 and the offset compensation unit 580 and output as the reconstruction frame 595. Further, the data post-processed through the deblocking unit 570 and the offset compensation unit 580 may be output as the reference frame 585.

In order to decode the image data in the image data decoding unit 230 of the video decoding apparatus 200, operations after the parsing unit 510 of the image decoding unit 500 according to the embodiment may be performed.

In order to be applied to the video decoding apparatus 200 according to an embodiment, all the components of the image decoding unit 500, that is, the parsing unit 510, the entropy decoding unit 520, the inverse quantization unit 530, the inverse transform unit 540, the intra prediction unit 550, the motion compensation unit 560, the deblocking unit 570, and the offset compensation unit 580, perform operations based on the coding units according to the tree structure for each maximum coding unit.

In particular, the intra prediction unit 550 and the motion compensation unit 560 determine the partition and prediction mode for each coding unit according to the tree structure, and the inverse transform unit 540 determines the size of the conversion unit for each coding unit.

The coding operation of Fig. 18 and the decoding operation of Fig. 19 are described above for the video stream coding operation and the decoding operation in a single layer, respectively. Therefore, if the scalable video encoding apparatus 10 of FIG. 1 encodes video streams of two or more layers, the image encoding unit 400 may be included in each layer. Similarly, if the scalable video decoding apparatus 20 of FIG. 2 decodes video streams of two or more layers, the video decoding unit 500 may be included in each layer.

FIG. 20 illustrates depth-based coding units and partitions according to various embodiments.

The video encoding apparatus 100 and the video decoding apparatus 200 according to an embodiment use hierarchical coding units in order to consider image characteristics. The maximum height, maximum width, and maximum depth of the coding unit may be adaptively determined according to the characteristics of the image, or may be variously set according to user demand. The sizes of the depth-based coding units may be determined according to the preset maximum size of the coding unit.

The hierarchical structure 600 of coding units according to an embodiment shows the case where the maximum height and width of the coding unit is 64 and the maximum depth is 3. In this case, the maximum depth indicates the total number of divisions from the maximum coding unit to the minimum coding unit. Since the depth deepens along the vertical axis of the hierarchical structure 600 of coding units according to an embodiment, the height and width of the depth-based coding units are each divided. In addition, along the horizontal axis of the hierarchical structure 600 of coding units, the prediction units and partitions that serve as the basis of predictive encoding of each depth-based coding unit are shown.

That is, the coding unit 610 is the maximum coding unit in the hierarchical structure 600 of coding units, with a depth of 0 and a coding unit size, that is, height and width, of 64x64. The depth deepens along the vertical axis, and there are a coding unit 620 of depth 1 with size 32x32, a coding unit 630 of depth 2 with size 16x16, and a coding unit 640 of depth 3 with size 8x8. The coding unit 640 of depth 3 with size 8x8 is the minimum coding unit.

The prediction units and partitions of each coding unit are arranged along the horizontal axis for each depth. That is, if the coding unit 610 of depth 0 and size 64x64 is a prediction unit, the prediction unit may be divided into a partition 610 of size 64x64, partitions 612 of size 64x32, partitions 614 of size 32x64, and partitions 616 of size 32x32, included in the coding unit 610 of size 64x64.

Likewise, the prediction unit of the coding unit 620 of depth 1 and size 32x32 may be divided into a partition 620 of size 32x32, partitions 622 of size 32x16, partitions 624 of size 16x32, and partitions 626 of size 16x16, included in the coding unit 620 of size 32x32.

Likewise, the prediction unit of the coding unit 630 of depth 2 and size 16x16 may be divided into a partition 630 of size 16x16, partitions 632 of size 16x8, partitions 634 of size 8x16, and partitions 636 of size 8x8, included in the coding unit 630 of size 16x16.

Likewise, the prediction unit of the coding unit 640 of depth 3 and size 8x8 may be divided into a partition 640 of size 8x8, partitions 642 of size 8x4, partitions 644 of size 4x8, and partitions 646 of size 4x4, included in the coding unit 640 of size 8x8.

In order to determine the coding depth of the maximum coding unit 610, the encoding unit determination unit 120 of the video encoding apparatus 100 according to an embodiment performs encoding for the coding units of each depth included in the maximum coding unit 610.

The number of coding units per depth to include data of the same range and size increases as the depth of the coding unit increases. For example, for data containing one coding unit at depth 1, four coding units at depth 2 are required. Therefore, in order to compare the encoding results of the same data by depth, they should be encoded using a single depth 1 encoding unit and four depth 2 encoding units, respectively.

For encoding at each depth, a representative coding error, which is the smallest coding error at that depth, can be selected by performing encoding for each prediction unit of the depth-based coding unit along the horizontal axis of the hierarchical structure 600 of coding units. Also, as the depth deepens along the vertical axis of the hierarchical structure 600 of coding units, encoding can be performed for each depth and the representative coding errors per depth compared, so that the minimum coding error is retrieved. The depth and partition at which the minimum coding error occurs in the maximum coding unit 610 can be selected as the coding depth and partition type of the maximum coding unit 610.
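The two-axis search described above can be sketched as follows, with hypothetical per-partition coding errors standing in for the real rate-distortion measurements:

```python
def best_depth(candidates):
    """Pick the depth whose representative (smallest) error is minimal.

    candidates maps depth -> list of (partition, coding_error) pairs,
    mirroring the horizontal/vertical search over the hierarchy 600.
    """
    representative = {
        d: min(pairs, key=lambda p: p[1]) for d, pairs in candidates.items()
    }
    depth, (partition, error) = min(representative.items(),
                                    key=lambda kv: kv[1][1])
    return depth, partition, error

# Hypothetical per-partition errors at depths 0 and 1:
errors = {0: [("64x64", 9.1), ("64x32", 8.7)],
          1: [("32x32", 7.9), ("32x16", 8.3)]}
print(best_depth(errors))  # (1, '32x32', 7.9)
```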

FIG. 21 illustrates the relationship between a coding unit and conversion units, according to various embodiments.

The video coding apparatus 100 or the video decoding apparatus 200 according to an embodiment encodes or decodes an image in units of coding units smaller than or equal to the maximum coding unit for each maximum coding unit. The size of the conversion unit for conversion during the encoding process can be selected based on a data unit that is not larger than each encoding unit.

For example, in the video encoding apparatus 100 or the video decoding apparatus 200 according to an embodiment, when the current coding unit 710 has a size of 64x64, conversion can be performed using the conversion unit 720 of size 32x32.

In addition, the data of the coding unit 710 of size 64x64 may be encoded by converting it with each of the conversion units of size 32x32, 16x16, 8x8, and 4x4, all of size 64x64 or smaller, and then the conversion unit with the smallest error relative to the original may be selected.

FIG. 22 shows depth-specific encoding information, in accordance with various embodiments.

The output unit 130 of the video encoding apparatus 100 according to an embodiment may encode and transmit, as information about the encoding mode for each coding unit of each coding depth, information 800 about the partition type, information 810 about the prediction mode, and information 820 about the conversion unit size.

The partition type information 800 represents information about the shape of the partition into which the prediction unit of the current coding unit is divided, as a data unit for predictive encoding of the current coding unit. For example, the current coding unit CU_0 of size 2Nx2N may be divided and used as any one of a partition 802 of size 2Nx2N, a partition 804 of size 2NxN, a partition 806 of size Nx2N, and a partition 808 of size NxN. In this case, the information 800 about the partition type of the current coding unit indicates one of the partition 802 of size 2Nx2N, the partition 804 of size 2NxN, the partition 806 of size Nx2N, and the partition 808 of size NxN.

The prediction mode information 810 indicates the prediction mode of each partition. For example, through the prediction mode information 810, it can be set whether the partition indicated by the partition type information 800 is predictive-encoded in one of the intra mode 812, the inter mode 814, and the skip mode 816.

In addition, the information 820 about the conversion unit size indicates the conversion unit on which conversion of the current coding unit is to be based. For example, the conversion unit may be one of a first intra conversion unit size 822, a second intra conversion unit size 824, a first inter conversion unit size 826, and a second inter conversion unit size 828.

The image data and encoding information extracting unit 220 of the video decoding apparatus 200 according to an embodiment can extract, for each depth-based coding unit, the information 800 about the partition type, the information 810 about the prediction mode, and the information 820 about the conversion unit size, and use them for decoding.

FIG. 23 shows depth-based coding units according to various embodiments.

Division information may be used to indicate a change in depth. The division information indicates whether the coding unit of the current depth is divided into coding units of a lower depth.

The prediction unit 910 for predictive encoding of the coding unit 900 of depth 0 and size 2N_0x2N_0 may include a partition type 912 of size 2N_0x2N_0, a partition type 914 of size 2N_0xN_0, a partition type 916 of size N_0x2N_0, and a partition type 918 of size N_0xN_0. Only the partitions 912, 914, 916, and 918 into which the prediction unit is divided at symmetric ratios are illustrated, but as described above, the partition type is not limited thereto and may include asymmetric partitions, arbitrary-shaped partitions, geometric partitions, and the like.

For each partition type, predictive encoding must be repeatedly performed for one partition of size 2N_0x2N_0, two partitions of size 2N_0xN_0, two partitions of size N_0x2N_0, and four partitions of size N_0xN_0. For the partitions of size 2N_0x2N_0, size N_0x2N_0, size 2N_0xN_0, and size N_0xN_0, predictive encoding can be performed in the intra mode and the inter mode. Predictive encoding in the skip mode can be performed only on the partition of size 2N_0x2N_0.

If the encoding error caused by one of the partition types 912, 914, and 916 of the sizes 2N_0x2N_0, 2N_0xN_0 and N_0x2N_0 is the smallest, there is no need to further divide into lower depths.

If the coding error of the partition type 918 of size N_0xN_0 is the smallest, the depth 0 is changed to 1 and division is performed (920), and encoding is repeatedly performed on the coding units 930 of depth 1 and partition type of size N_0xN_0 to search for the minimum coding error.

The prediction unit 940 for predictive encoding of the coding unit 930 of depth 1 and size 2N_1x2N_1 (= N_0xN_0) may include a partition type 942 of size 2N_1x2N_1, a partition type 944 of size 2N_1xN_1, a partition type 946 of size N_1x2N_1, and a partition type 948 of size N_1xN_1.

If the coding error of the partition type 948 of size N_1xN_1 is the smallest, the depth 1 is changed to depth 2 and division is performed (950), and encoding is repeatedly performed on the coding units 960 of depth 2 and size N_2xN_2 to search for the minimum coding error.

If the maximum depth is d, depth-based coding units are set up to depth d-1, and division information can be set up to depth d-2. That is, when encoding is performed from depth d-2 up to depth d-1, the prediction unit 990 for predictive encoding of the coding unit 980 of depth d-1 and size 2N_(d-1)x2N_(d-1) may include a partition type 992 of size 2N_(d-1)x2N_(d-1), a partition type 994 of size 2N_(d-1)xN_(d-1), a partition type 996 of size N_(d-1)x2N_(d-1), and a partition type 998 of size N_(d-1)xN_(d-1).

Among the partition types, predictive encoding is repeatedly performed for one partition of size 2N_(d-1)x2N_(d-1), two partitions of size 2N_(d-1)xN_(d-1), two partitions of size N_(d-1)x2N_(d-1), and four partitions of size N_(d-1)xN_(d-1), so that the partition type at which the minimum coding error occurs can be retrieved.

Even if the coding error of the partition type 998 of size N_(d-1)xN_(d-1) is the smallest, since the maximum depth is d, the coding unit CU_(d-1) of depth d-1 is no longer divided into lower depths; the coding depth for the current maximum coding unit 900 is determined as depth d-1, and the partition type can be determined as N_(d-1)xN_(d-1). Also, since the maximum depth is d, division information is not set for the coding unit 952 of depth d-1.

The data unit 999 may be referred to as the 'minimum unit' for the current maximum coding unit. The minimum unit according to an embodiment may be a square data unit obtained by dividing the minimum coding unit, which has the lowest coding depth, into four. Through this iterative encoding process, the video encoding apparatus 100 according to an embodiment compares the coding errors of the coding unit 900 by depth, selects the depth at which the smallest coding error occurs to determine the coding depth, and can set the corresponding partition type and prediction mode as the encoding mode of the coding depth.

In this way, the minimum coding errors of all the depths 0, 1, ..., d-1, d are compared, and the depth with the smallest error is selected and determined as the coding depth. The coding depth, and the partition type and prediction mode of the prediction unit, can be encoded and transmitted as information about the encoding mode. In addition, since the coding unit must be divided from depth 0 to the coding depth, only the division information of the coding depth is set to '0', and the depth-based division information of every depth except the coding depth is set to '1'.
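The division-information convention just described can be sketched as follows (a hypothetical encoder-side helper, not the apparatus's bitstream syntax): every depth above the coding depth carries division information '1', and the coding depth itself carries '0'.

```python
def division_info_for(coded_depth):
    """Depth-based division information from depth 0 down to the coding
    depth: 1 at every depth that is divided further, 0 at the coding
    depth itself."""
    return {depth: (0 if depth == coded_depth else 1)
            for depth in range(coded_depth + 1)}

print(division_info_for(2))  # {0: 1, 1: 1, 2: 0}
```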

The image data and encoding information extracting unit 220 of the video decoding apparatus 200 according to an embodiment extracts the information about the coding depth and the prediction unit for the coding unit 900, and uses it to decode the coding unit 912. The video decoding apparatus 200 according to an embodiment can grasp the depth whose division information is '0' as the coding depth by using the depth-based division information, and use the information about the encoding mode for that depth for decoding.

FIGS. 24, 25, and 26 show the relationship among coding units, prediction units, and conversion units, according to various embodiments.

The coding units 1010 are the coding units of the coding depths determined by the video encoding apparatus 100 according to an embodiment for the maximum coding unit. The prediction units 1060 are the partitions of the prediction units of each of the coding units of the coding depths in the coding units 1010, and the conversion units 1070 are the conversion units of each of the coding units of the coding depths.

When the depth of the maximum coding unit is 0, the depth of the coding units 1012 and 1054 is 1; the depth of the coding units 1014, 1016, 1018, 1028, 1050, and 1052 is 2; the depth of the coding units 1020, 1022, 1024, 1026, 1030, 1032, and 1048 is 3; and the depth of the coding units 1040, 1042, 1044, and 1046 is 4.

Some partitions 1014, 1016, 1022, 1032, 1048, 1050, 1052, and 1054 among the prediction units 1060 are of a form in which the coding unit is divided. That is, the partitions 1014, 1022, 1050, and 1054 are of the 2NxN partition type, the partitions 1016, 1048, and 1052 are of the Nx2N partition type, and the partition 1032 is of the NxN partition type. The prediction units and partitions of the depth-based coding units 1010 are smaller than or equal to their respective coding units.

The image data of part 1052 of the conversion units 1070 is converted or inversely converted in a data unit smaller in size than the coding unit. Also, the conversion units 1014, 1016, 1022, 1032, 1048, 1050, 1052, and 1054 are data units of different sizes or shapes when compared with the corresponding prediction units and partitions among the prediction units 1060. In other words, the video encoding apparatus 100 according to an embodiment and the video decoding apparatus 200 according to an embodiment can perform the intra prediction/motion estimation/motion compensation operations and the conversion/inverse conversion operations for the same coding unit, each based on a separate data unit.

Thus, encoding is recursively performed on each of the hierarchically structured coding units in each region of each maximum coding unit and the optimal coding unit is determined, so that coding units according to a recursive tree structure can be constructed. The encoding information may include division information about the coding unit, partition type information, prediction mode information, and conversion unit size information. Table 1 below shows an example that can be set in the video encoding apparatus 100 according to an embodiment and the video decoding apparatus 200 according to an embodiment.

Table 1

Division information 0 (encoding performed on the coding unit of size 2Nx2N at the current depth d):
- Prediction mode: intra; inter; skip (2Nx2N only)
- Partition type:
  - symmetric partition types: 2Nx2N, 2NxN, Nx2N, NxN
  - asymmetric partition types: 2NxnU, 2NxnD, nLx2N, nRx2N
- Conversion unit size:
  - conversion unit division information 0: 2Nx2N
  - conversion unit division information 1: NxN (symmetric partition type); N/2xN/2 (asymmetric partition type)

Division information 1: encoding is repeatedly performed for each of the four coding units of the lower depth d+1.

The output unit 130 of the video encoding apparatus 100 according to an embodiment outputs the encoding information for the coding units according to the tree structure, and the encoding information extracting unit 220 of the video decoding apparatus 200 according to an embodiment can extract the encoding information for the coding units according to the tree structure from the received bitstream.

The division information indicates whether the current coding unit is divided into coding units of a lower depth. If the division information of the current depth d is 0, the current coding unit is no longer divided and the current depth is the coding depth, so the partition type information, prediction mode, and conversion unit size information can be defined for the coding depth. If the coding unit is to be divided one step further according to the division information, encoding must be performed independently for each of the four divided coding units of the lower depth.

The prediction mode may be represented by one of an intra mode, an inter mode, and a skip mode. Intra mode and inter mode can be defined in all partition types, and skip mode can be defined only in partition type 2Nx2N.

The partition type information indicates the symmetric partition types 2Nx2N, 2NxN, Nx2N, and NxN, in which the height or width of the prediction unit is divided at symmetric ratios, and the asymmetric partition types 2NxnU, 2NxnD, nLx2N, and nRx2N, in which it is divided at asymmetric ratios. The asymmetric partition types 2NxnU and 2NxnD divide the height at ratios of 1:3 and 3:1, respectively, and the asymmetric partition types nLx2N and nRx2N divide the width at ratios of 1:3 and 3:1, respectively.
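The 1:3 and 3:1 ratios determine the partition dimensions directly; a sketch (the helper name and returned layout are illustrative assumptions) computing the two partitions for each asymmetric type of a coding unit of size 2Nx2N:

```python
def asymmetric_partitions(size):
    """Partition dimensions (width, height) for the asymmetric types
    of a 2Nx2N coding unit, split at ratios 1:3 / 3:1."""
    q = size // 4  # one quarter of 2N
    return {
        "2NxnU": [(size, q), (size, 3 * q)],   # height split 1:3
        "2NxnD": [(size, 3 * q), (size, q)],   # height split 3:1
        "nLx2N": [(q, size), (3 * q, size)],   # width split 1:3
        "nRx2N": [(3 * q, size), (q, size)],   # width split 3:1
    }

print(asymmetric_partitions(64)["2NxnU"])  # [(64, 16), (64, 48)]
```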

The conversion unit size can be set to two kinds of sizes in the intra mode and two kinds of sizes in the inter mode. That is, if the conversion unit division information is 0, the size of the conversion unit is set to 2Nx2N, the size of the current coding unit. If the conversion unit division information is 1, a conversion unit of a size into which the current coding unit is divided can be set. Also, if the partition type of the current coding unit of size 2Nx2N is a symmetric partition type, the size of the conversion unit may be set to NxN, and if it is an asymmetric partition type, to N/2xN/2.
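A small sketch of this sizing rule, assuming the conversion unit division information and the partition symmetry are already known (the helper name is hypothetical):

```python
def conversion_unit_size(cu_size, tu_division_info, partition_symmetric):
    """Conversion unit size for a coding unit of size 2Nx2N.

    Division information 0 keeps the coding-unit size; division
    information 1 gives NxN for symmetric partition types and
    N/2xN/2 for asymmetric ones.
    """
    if tu_division_info == 0:
        return cu_size                       # 2Nx2N
    return cu_size // 2 if partition_symmetric else cu_size // 4

print(conversion_unit_size(64, 1, True))    # 32 (NxN)
print(conversion_unit_size(64, 1, False))   # 16 (N/2xN/2)
```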

The encoding information of the coding units according to the tree structure according to an embodiment may be allocated to at least one of the coding unit of the coding depth, the prediction unit, and the minimum unit. The coding unit of the coding depth may include one or more prediction units and minimum units holding the same encoding information.

Therefore, by checking the encoding information held by adjacent data units, it can be confirmed whether they are included in the coding unit of the same coding depth. In addition, since the coding unit of the corresponding coding depth can be identified by using the encoding information held by a data unit, the distribution of coding depths within the maximum coding unit can be inferred.

Therefore, in this case, when the current coding unit is predicted with reference to neighboring data units, the encoding information of the data units in the depth-based coding units adjacent to the current coding unit can be directly referenced and used.

In another embodiment, when predictive encoding of the current coding unit is performed with reference to neighboring coding units, data adjacent to the current coding unit within the depth-based coding units can be searched by using the encoding information of the adjacent depth-based coding units, and the neighboring coding units thus found may be referenced.

FIG. 27 shows the relationship between the encoding unit, the prediction unit, and the conversion unit according to the encoding mode information in Table 1.

The maximum coding unit 1300 includes coding units 1302, 1304, 1306, 1312, 1314, 1316, and 1318 of coding depths. Since the coding unit 1318 is a coding unit of a coding depth, its division information may be set to 0. The partition type information of the coding unit 1318 of size 2Nx2N may be set to one of the partition types 2Nx2N 1322, 2NxN 1324, Nx2N 1326, NxN 1328, 2NxnU 1332, 2NxnD 1334, nLx2N 1336, and nRx2N 1338.

The conversion unit division information (TU size flag) is a kind of conversion index, and the size of the conversion unit corresponding to the conversion index can change according to the prediction unit type or partition type of the coding unit.

For example, when the partition type information is set to one of the symmetric partition types 2Nx2N 1322, 2NxN 1324, Nx2N 1326, and NxN 1328, the conversion unit 1342 of size 2Nx2N is set if the conversion unit division information is 0, and the conversion unit 1344 of size NxN can be set if the conversion unit division information is 1.

When the partition type information is set to one of the asymmetric partition types 2NxnU 1332, 2NxnD 1334, nLx2N 1336, and nRx2N 1338, the conversion unit 1352 of size 2Nx2N is set if the conversion unit division information (TU size flag) is 0, and the conversion unit 1354 of size N/2xN/2 can be set if the conversion unit division information is 1.

The conversion unit division information (TU size flag) described above with reference to FIG. 27 is a flag with a value of 0 or 1, but the conversion unit division information according to an embodiment is not limited to a 1-bit flag; depending on the setting, it may increase as 0, 1, 2, 3, and so on, and the conversion unit may be divided hierarchically. The conversion unit division information can be used as an embodiment of the conversion index.

In this case, if the conversion unit division information according to the embodiment is used together with the maximum size of the conversion unit and the minimum size of the conversion unit, the size of the conversion unit actually used can be expressed. The video encoding apparatus 100 according to an exemplary embodiment may encode the maximum conversion unit size information, the minimum conversion unit size information, and the maximum conversion unit division information. The encoded maximum conversion unit size information, the minimum conversion unit size information, and the maximum conversion unit division information may be inserted into the SPS. The video decoding apparatus 200 according to an exemplary embodiment can use the maximum conversion unit size information, the minimum conversion unit size information, and the maximum conversion unit division information for video decoding.

For example, (a) if the current coding unit is of size 64x64 and the maximum conversion unit size is 32x32, then (a-1) the size of the conversion unit is 32x32 when the conversion unit division information is 0, (a-2) 16x16 when the conversion unit division information is 1, and (a-3) 8x8 when the conversion unit division information is 2.

As another example, (b) if the current coding unit is of size 32x32 and the minimum conversion unit size is 32x32, the size of the conversion unit may be set to 32x32 when the conversion unit division information is 0. Since the size of the conversion unit cannot be smaller than 32x32, no further conversion unit division information can be set.

As another example, (c) if the current encoding unit is 64x64 and the maximum conversion unit division information is 1, the conversion unit division information may be 0 or 1, and other conversion unit division information can not be set.

Therefore, when the maximum conversion unit division information is defined as 'MaxTransformSizeIndex', the minimum conversion unit size as 'MinTransformSize', and the conversion unit size when the conversion unit division information is 0 as 'RootTuSize', the minimum conversion unit size 'CurrMinTuSize' possible in the current coding unit can be defined by the following relation (1):

CurrMinTuSize = max(MinTransformSize, RootTuSize / (2^MaxTransformSizeIndex))   (1)
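Relation (1) translates directly into executable form; a minimal Python sketch (the function and variable names mirror the text, and the example values come from example (a) above):

```python
def curr_min_tu_size(min_transform_size, root_tu_size, max_transform_size_index):
    """Relation (1): the smallest conversion unit size available in the
    current coding unit."""
    return max(min_transform_size,
               root_tu_size // (2 ** max_transform_size_index))

# Example (a): RootTuSize 32, up to two divisions, assumed minimum size 4.
print(curr_min_tu_size(4, 32, 2))  # 8
```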

Compared with the minimum conversion unit size 'CurrMinTuSize' possible in the current coding unit, 'RootTuSize', the conversion unit size when the conversion unit division information is 0, can represent the maximum conversion unit size that the system can adopt. That is, according to relation (1), 'RootTuSize/(2^MaxTransformSizeIndex)' is the conversion unit size obtained by dividing 'RootTuSize' the number of times corresponding to the maximum conversion unit division information, and 'MinTransformSize' is the minimum conversion unit size, so the larger of these two values may be the minimum conversion unit size 'CurrMinTuSize' currently possible in the current coding unit.

The maximum conversion unit size RootTuSize according to an exemplary embodiment may vary depending on the prediction mode.

For example, if the current prediction mode is the inter mode, RootTuSize can be determined according to the following relation (2). In the relation (2), 'MaxTransformSize' indicates the maximum conversion unit size and 'PUSize' indicates the current prediction unit size.

RootTuSize = min(MaxTransformSize, PUSize)   (2)

That is, if the current prediction mode is the inter mode, 'RootTuSize' which is the conversion unit size when the conversion unit division information is 0 can be set to a smaller value of the maximum conversion unit size and the current prediction unit size.

If the prediction mode of the current partition unit is the intra mode, 'RootTuSize' can be determined according to the following relation (3), where 'PartitionSize' represents the size of the current partition unit:

RootTuSize = min(MaxTransformSize, PartitionSize)   (3)

That is, if the current prediction mode is the intra mode, 'RootTuSize' which is the conversion unit size when the conversion unit division information is 0 can be set to a smaller value among the maximum conversion unit size and the size of the current partition unit.
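Relations (2) and (3) differ only in which size bounds the result; a minimal sketch combining both (the names mirror the text, and the mode strings are assumptions):

```python
def root_tu_size(max_transform_size, prediction_mode, pu_size, partition_size):
    """Relations (2) and (3): conversion unit size when the conversion
    unit division information is 0, depending on the prediction mode."""
    if prediction_mode == "inter":
        return min(max_transform_size, pu_size)        # relation (2)
    return min(max_transform_size, partition_size)     # relation (3), intra

print(root_tu_size(32, "inter", 64, 64))  # 32
print(root_tu_size(32, "intra", 16, 16))  # 16
```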

However, it should be noted that the current maximum conversion unit size 'RootTuSize' according to an embodiment, which varies according to the prediction mode of the partition unit, is only one embodiment, and the factor determining the current maximum conversion unit size is not limited thereto.

According to the video encoding technique based on coding units of a tree structure described above with reference to FIGS. 15 to 27, image data in the spatial domain is encoded for each coding unit of the tree structure; according to the video decoding technique based on coding units of a tree structure, decoding is performed for each maximum coding unit and the image data in the spatial domain is reconstructed, so that the pictures and the video, which is a picture sequence, can be reconstructed. The restored video can be played back by a playback apparatus, stored in a storage medium, or transmitted over a network.

The above-described embodiments of the present invention can be embodied as a program executable by a computer, and implemented in a general-purpose digital computer that runs the program by using a computer-readable recording medium. The computer-readable recording medium includes storage media such as magnetic storage media (e.g., ROM, floppy disks, hard disks, etc.) and optical reading media (e.g., CD-ROMs, DVDs, etc.).

For convenience of explanation, the scalable video encoding method and/or the video encoding method described above with reference to FIGS. 1 to 27 are collectively referred to as the 'video encoding method of the present invention'. In addition, the scalable video decoding method and/or the video decoding method described above with reference to FIGS. 1 to 27 are collectively referred to as the 'video decoding method of the present invention'.

The video encoding apparatus composed of the scalable video encoding apparatus 10, the video encoding apparatus 100, or the image encoding unit 400 described above with reference to FIGS. 1 to 27 is collectively referred to as the 'video encoding apparatus of the present invention'. The video decoding apparatus composed of the scalable video decoding apparatus 20, the video decoding apparatus 200, or the image decoding unit 500 described above with reference to FIGS. 1 to 27 is collectively referred to as the 'video decoding apparatus of the present invention'.

An embodiment in which the computer-readable storage medium storing the program according to an embodiment is a disk 26000 is described in detail below.

FIG. 28 illustrates the physical structure of a disk 26000 on which a program according to various embodiments is stored. The disk 26000, as a storage medium, may be a hard disk, a CD-ROM disk, a Blu-ray disk, or a DVD disk. The disk 26000 is composed of a plurality of concentric tracks (tr), and the tracks are divided into a predetermined number of sectors (Se) along the circumferential direction. In a specific area of the disk 26000 storing the program according to the above-described embodiment, a program implementing the quantization parameter determination method, the video encoding method, and the video decoding method described above may be allocated and stored.

A computer system achieved using a storage medium that stores a program for implementing the above-described video encoding method and video decoding method is described below with reference to FIG. 29.

FIG. 29 shows a disk drive 26800 for recording and reading programs using the disk 26000. The computer system 26700 may use the disk drive 26800 to store, on the disk 26000, a program for implementing at least one of the video encoding method and the video decoding method of the present invention. The program may be read from the disk 26000 by the disk drive 26800 and transferred to the computer system 26700, so that the program stored on the disk 26000 can be executed on the computer system 26700.

A program for implementing at least one of the video encoding method and the video decoding method of the present invention may be stored not only on the disk 26000 illustrated in FIGS. 28 and 29, but also on a memory card, a ROM cassette, or a solid state drive (SSD).

A system to which the video coding method and the video decoding method according to the above-described embodiments are applied will be described later.

FIG. 30 shows the overall structure of a content supply system 11000 for providing a content distribution service. The service area of the communication system is divided into cells of a predetermined size, and wireless base stations 11700, 11800, 11900, and 12000 are installed in the respective cells.

The content supply system 11000 includes a plurality of independent devices. For example, independent devices such as a computer 12100, a personal digital assistant (PDA) 12200, a video camera 12300, and a cellular phone 12500 are connected to the Internet via an Internet service provider 11200, the communication network 11400, and the wireless base stations 11700, 11800, 11900, and 12000.

However, the content supply system 11000 is not limited to the structure shown in FIG. 30, and the devices may be selectively connected. The independent devices may also be directly connected to the communication network 11400 without going through the wireless base stations 11700, 11800, 11900, and 12000.

The video camera 12300 is an imaging device capable of capturing video images, such as a digital video camera. The cellular phone 12500 may adopt at least one of various protocols, such as Personal Digital Cellular (PDC), code division multiple access (CDMA), wideband code division multiple access (W-CDMA), Global System for Mobile Communications (GSM), and Personal Handyphone System (PHS).

The video camera 12300 may be connected to the streaming server 11300 via the wireless base station 11900 and the communication network 11400. The streaming server 11300 may stream content transmitted by a user using the video camera 12300 in a real-time broadcast. The content received from the video camera 12300 may be encoded by the video camera 12300 or the streaming server 11300. Video data captured by the video camera 12300 may also be transmitted to the streaming server 11300 via the computer 12100.

Video data captured by the camera 12600 may also be transmitted to the streaming server 11300 via the computer 12100. The camera 12600 is an imaging device capable of capturing both still images and video images, like a digital camera. The video data received from the camera 12600 may be encoded by the camera 12600 or the computer 12100. The software for video encoding and decoding may be stored in a computer-readable recording medium accessible by the computer 12100, such as a CD-ROM disk, a floppy disk, a hard disk drive, an SSD, or a memory card.

Also, when video is captured by a camera mounted on the cellular phone 12500, the video data may be received from the cellular phone 12500.

The video data can be encoded by a large scale integrated circuit (LSI) system mounted on the video camera 12300, the cellular phone 12500, or the camera 12600.

In the content supply system 11000 according to an embodiment, content recorded by a user using the video camera 12300, the camera 12600, the cellular phone 12500, or another imaging device, such as content recorded at a concert, is encoded and transmitted to the streaming server 11300. The streaming server 11300 may stream the content data to other clients that request the content data.

Clients are devices capable of decoding the encoded content data, and may be, for example, the computer 12100, the PDA 12200, the video camera 12300, or the cellular phone 12500. Thus, the content supply system 11000 allows the clients to receive and reproduce the encoded content data. In addition, the content supply system 11000 allows the clients to receive the encoded content data and decode and reproduce it in real time, thereby enabling personal broadcasting.

The video encoding apparatus and the video decoding apparatus of the present invention can be applied to the encoding operations and decoding operations of the independent devices included in the content supply system 11000.

An embodiment of the cellular phone 12500 of the content supply system 11000 is described in detail below with reference to FIGS. 31 and 32.

FIG. 31 shows the external structure of the cellular phone 12500 to which the video encoding method and the video decoding method of the present invention according to various embodiments are applied. The cellular phone 12500 may be a smartphone whose functions are not limited and whose functions can be modified or extended through application programs.

The cellular phone 12500 includes an internal antenna 12510 for exchanging RF signals with the wireless base station 12000, and includes a display screen 12520, such as an LCD (Liquid Crystal Display) or OLED (Organic Light Emitting Diodes) screen, for displaying images captured by the camera 12530 or images received via the antenna 12510 and decoded. The cellular phone 12500 includes an operation panel 12540 including control buttons and a touch panel. If the display screen 12520 is a touch screen, the operation panel 12540 further includes the touch-sensing panel of the display screen 12520. The cellular phone 12500 includes a speaker 12580 or another type of sound output unit for outputting voice and sound, and a microphone 12550 or another type of sound input unit for inputting voice and sound. The cellular phone 12500 further includes the camera 12530, such as a CCD camera, for capturing video and still images. The cellular phone 12500 may also include a storage medium 12570 for storing encoded or decoded data, such as video or still images captured by the camera 12530, received via e-mail, or acquired in other ways, and a slot 12560 for mounting the storage medium 12570 in the cellular phone 12500. The storage medium 12570 may be an SD card or another type of flash memory, such as an electrically erasable and programmable read-only memory (EEPROM) embedded in a plastic case.

FIG. 32 shows the internal structure of the cellular phone 12500. To systematically control each part of the cellular phone 12500 including the display screen 12520 and the operation panel 12540, a power supply circuit 12700, an operation input control unit 12640, an image encoding unit 12720, a camera interface 12630, an LCD control unit 12620, an image decoding unit 12690, a multiplexer/demultiplexer 12680, a recording/reading unit 12670, a modulation/demodulation unit 12660, and a sound processing unit 12650 are connected to a central control unit 12710 via a synchronization bus 12730.

The power supply circuit 12700 supplies power from the battery pack to each part of the cellular phone 12500, so that the cellular phone 12500 can be set to an operation mode.

The central control unit 12710 includes a CPU, a ROM (Read Only Memory), and a RAM (Random Access Memory).

A digital signal is generated in the cellular phone 12500 under the control of the central control unit 12710. For example, a digital sound signal is generated in the sound processing unit 12650, a digital image signal is generated in the image encoding unit 12720, and text data of a message can be generated through the operation panel 12540 and the operation input control unit 12640. When a digital signal is transmitted to the modulation/demodulation unit 12660 under the control of the central control unit 12710, the modulation/demodulation unit 12660 modulates the frequency band of the digital signal, and the communication circuit 12610 performs D/A conversion and frequency conversion on the band-modulated digital signal. The transmission signal output from the communication circuit 12610 can be transmitted to a voice communication base station or the wireless base station 12000 through the antenna 12510.

For example, a sound signal obtained by the microphone 12550 while the cellular phone 12500 is in a call mode is converted into a digital sound signal by the sound processing unit 12650 under the control of the central control unit 12710. The digital sound signal is converted into a transmission signal via the modulation/demodulation unit 12660 and the communication circuit 12610, and can be transmitted through the antenna 12510.

When a text message such as an e-mail is transmitted in the data communication mode, the text data of the message is input using the operation panel 12540 and transmitted to the central control unit 12710 through the operation input control unit 12640. Under the control of the central control unit 12710, the text data is converted into a transmission signal through the modulation/demodulation unit 12660 and the communication circuit 12610, and is sent to the wireless base station 12000 through the antenna 12510.

In order to transmit image data in the data communication mode, the image data captured by the camera 12530 is provided to the image encoding unit 12720 through the camera interface 12630. The captured image data can also be displayed directly on the display screen 12520 through the camera interface 12630 and the LCD control unit 12620.

The structure of the image encoding unit 12720 may correspond to the structure of the video encoding apparatus of the present invention described above. The image encoding unit 12720 encodes the image data provided from the camera 12530 according to the video encoding method of the present invention described above, converts it into compression-encoded image data, and outputs the encoded image data to the multiplexer/demultiplexer 12680. Sound signals obtained by the microphone 12550 of the cellular phone 12500 during recording by the camera 12530 are also converted into digital sound data via the sound processing unit 12650, and the digital sound data is transmitted to the multiplexer/demultiplexer 12680.

The multiplexer/demultiplexer 12680 multiplexes the encoded image data provided from the image encoding unit 12720 together with the sound data provided from the sound processing unit 12650. The multiplexed data is converted into a transmission signal through the modulation/demodulation unit 12660 and the communication circuit 12610, and can be transmitted through the antenna 12510.

In the process of receiving communication data from outside the cellular phone 12500, the signal received through the antenna 12510 is converted into a digital signal through frequency recovery and A/D (analog-to-digital) conversion. The modulation/demodulation unit 12660 demodulates the frequency band of the digital signal. The band-demodulated digital signal is transmitted to the video decoding unit 12690, the sound processing unit 12650, or the LCD control unit 12620 according to its type.

When the cellular phone 12500 is in the call mode, it amplifies the signal received through the antenna 12510 and generates a digital sound signal through frequency conversion and A/D (analog-to-digital) conversion. The received digital sound signal is converted into an analog sound signal through the modulation/demodulation unit 12660 and the sound processing unit 12650 under the control of the central control unit 12710, and the analog sound signal is output through the speaker 12580.

In the data communication mode, when data of a video file accessed from a web site on the Internet is received, the signal received from the wireless base station 12000 through the antenna 12510 is processed by the modulation/demodulation unit 12660, and the multiplexed data is transmitted to the multiplexer/demultiplexer 12680.

To decode the multiplexed data received through the antenna 12510, the multiplexing/demultiplexing unit 12680 demultiplexes the multiplexed data into an encoded video data stream and an encoded audio data stream. The encoded video data stream is supplied to the video decoding unit 12690 via the synchronization bus 12730, and the encoded audio data stream is supplied to the sound processing unit 12650.
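The demultiplexing step can be sketched as the inverse of the multiplexer sketch above. The packet format is the same illustrative assumption; the routing targets named in the comments follow the description above.

    # Split the multiplexed stream back into an encoded video stream
    # (for the video decoding unit 12690) and an encoded audio stream
    # (for the sound processing unit 12650).

    def demultiplex(stream):
        video, audio = [], []
        for kind, _, payload in stream:
            (video if kind == "V" else audio).append(payload)
        return video, audio

    stream = [("V", 0, b"frame0"), ("A", 0, b"chunk0"), ("V", 1, b"frame1")]
    video_stream, audio_stream = demultiplex(stream)
    # video_stream goes to the video decoder via the synchronization bus;
    # audio_stream goes to the sound processing unit.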

The structure of the video decoding unit 12690 may correspond to the structure of the video decoding apparatus of the present invention described above. The video decoding unit 12690 decodes the encoded video data using the video decoding method of the present invention described above to generate reconstructed video data, and provides the reconstructed video data to the display screen 12520 through the LCD control unit 12620.

Accordingly, the video data of the video file accessed on the Internet web site can be displayed on the display screen 12520. At the same time, the sound processing unit 12650 can convert the audio data into an analog sound signal and provide it to the speaker 12580, so that the audio data contained in the video file accessed on the Internet web site can also be played back through the speaker 12580.

The cellular phone 12500 or another type of communication terminal may be a transmitting and receiving terminal including both the video encoding apparatus and the video decoding apparatus of the present invention, a transmitting terminal including only the video encoding apparatus, or a receiving terminal including only the video decoding apparatus.

The communication system of the present invention is not limited to the structure described above. For example, FIG. 33 shows a digital broadcasting system to which a communication system according to various embodiments is applied. The digital broadcasting system of the embodiment of FIG. 33 can receive digital broadcasts transmitted through a satellite or terrestrial network by using the video encoding apparatus and the video decoding apparatus of the present invention.

Specifically, the broadcasting station 12890 transmits a video data stream to a communication satellite or broadcast satellite 12900 through radio waves. The broadcast satellite 12900 transmits a broadcast signal, which is received by the antenna 12860 of a satellite broadcast receiver in each home. In each home, the encoded video stream may be decoded and played back by a TV receiver 12810, a set-top box 12870, or another device.

By implementing the video decoding apparatus of the present invention in a reproducing apparatus 12830, the reproducing apparatus 12830 can read and decode an encoded video stream recorded on a storage medium 12820 such as a disc or a memory card. The reconstructed video signal can thus be reproduced, for example, on a monitor 12840.

The video decoding apparatus of the present invention may also be installed in the set-top box 12870 connected to the antenna 12860 for satellite/terrestrial broadcasts or to the cable antenna 12850 for cable TV reception. The output data of the set-top box 12870 can then be played back on the TV monitor 12880.

As another example, the video decoding apparatus of the present invention may be mounted in the TV receiver 12810 itself, instead of in the set-top box 12870.

An automobile 12920 having a suitable antenna 12910 may receive signals transmitted from the satellite 12800 or the radio base station 11700. Decoded video can be reproduced on the display screen of a car navigation system 12930 mounted in the automobile 12920.

A video signal can be encoded by the video encoding apparatus of the present invention and recorded on a storage medium. Specifically, the video signal may be stored on a DVD disc 12960 by a DVD recorder, or on a hard disk by a hard disk recorder 12950. As another example, the video signal may be stored on an SD card 12970. If the hard disk recorder 12950 includes the video decoding apparatus of the present invention according to an embodiment, a video signal recorded on the DVD disc 12960, the SD card 12970, or another type of storage medium can be reproduced on the monitor 12880.

The car navigation system 12930 may not include the camera 12530, the camera interface 12630, and the image encoding unit 12720 described above. For example, the computer 12100 and the TV receiver 12810 likewise may not include the camera 12530, the camera interface 12630, and the image encoding unit 12720.

FIG. 34 shows a network structure of a cloud computing system using a video encoding apparatus and a video decoding apparatus according to various embodiments.

The cloud computing system of the present invention may include a cloud computing server 14000, a user DB 14100, computing resources 14200, and user terminals.

The cloud computing system provides an on-demand outsourcing service of computing resources through an information communication network such as the Internet, in response to requests from user terminals. In a cloud computing environment, a service provider integrates, using virtualization technology, computing resources of data centers located at physically different sites and provides them to users as a service. Service users do not install computing resources such as applications, storage, an operating system, and security software on their own terminals; instead, they can select and use, as desired, services in a virtual space created through virtualization technology.

A user terminal of a specific service user accesses the cloud computing server 14000 through an information communication network including the Internet and a mobile communication network. The user terminal can receive a cloud computing service, in particular a moving-image playback service, from the cloud computing server 14000. The user terminal may be any electronic device capable of accessing the Internet, such as a desktop PC 14300, a smart TV 14400, a smartphone 14500, a notebook computer 14600, a portable multimedia player (PMP) 14700, or a tablet PC 14800.

The cloud computing server 14000 can integrate a plurality of computing resources 14200 distributed over a cloud network and provide the integrated result to user terminals. The plurality of computing resources 14200 includes various data services and may include data uploaded from user terminals. In this way, the cloud computing server 14000 integrates, using virtualization technology, moving-image databases distributed in various places, and provides the service requested by a user terminal.

The user DB 14100 stores information on users subscribed to the cloud computing service. The user information may include login information and personal information such as an address and a name. The user information may also include an index of moving images; the index may include a list of moving images that have already been played back, a list of moving images currently being played back, and the stopping point of a moving image that was being played back.
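A minimal sketch of this per-user moving-image index is given below. The class and field names (VideoIndex, stop_points, and so on) are illustrative assumptions; the patent specifies only what the index contains, not how it is stored.

    # Per-user index in the user DB 14100: played-back list,
    # in-progress list, and stopping points keyed by video ID.

    from dataclasses import dataclass, field

    @dataclass
    class VideoIndex:
        played: list[str] = field(default_factory=list)       # finished videos
        playing: list[str] = field(default_factory=list)      # in-progress videos
        stop_points: dict[str, float] = field(default_factory=dict)  # video -> seconds

    user_db: dict[str, VideoIndex] = {}

    def record_stop(user_id: str, video_id: str, position_s: float) -> None:
        """Store the stopping point so playback can resume on another device."""
        idx = user_db.setdefault(user_id, VideoIndex())
        if video_id not in idx.playing:
            idx.playing.append(video_id)
        idx.stop_points[video_id] = position_s

    record_stop("alice", "movie42", 1325.0)  # paused on the notebook computer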

Information on moving images stored in the user DB 14100 can be shared among user devices. Accordingly, when a predetermined moving-image service is provided to the notebook computer 14600 in response to a playback request from the user, the playback history of that service is stored in the user DB 14100. When a request to play the same moving-image service is received from the smartphone 14500, the cloud computing server 14000 refers to the user DB 14100, finds the service, and plays it back. When the smartphone 14500 receives the moving-image data stream through the cloud computing server 14000, its operation of decoding the stream and reproducing the video is similar to the operation of the cellular phone 12500 described above.

The cloud computing server 14000 may also refer to the playback history of the predetermined moving-image service stored in the user DB 14100. For example, the cloud computing server 14000 receives, from a user terminal, a request to play back a moving image stored in the user DB 14100. If the moving image was played back before, the cloud computing server 14000 selects a streaming method depending on whether the user terminal chooses to play from the beginning or to resume from the previous stopping point. When the user terminal requests playback from the beginning, the cloud computing server 14000 streams the moving image from its first frame; when the terminal requests playback from the previous stopping point, the cloud computing server 14000 streams the moving image from the frame at the stopping point.
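The resume-or-restart decision just described can be sketched as follows. The helper name and the fixed frame rate are illustrative assumptions used only to turn a stopping time into a stopping frame.

    # Decide which frame to stream from: frame 0, or the frame at the
    # previously stored stopping point.

    def start_frame(stop_points: dict[str, float], video_id: str,
                    resume: bool, frame_rate: float = 30.0) -> int:
        """Return the index of the first frame to stream to the terminal."""
        if not resume or video_id not in stop_points:
            return 0                                    # play from the beginning
        return int(stop_points[video_id] * frame_rate)  # resume at the stopping frame

    # The smartphone asks to resume the video paused earlier on the notebook:
    frame = start_frame({"movie42": 1325.0}, "movie42", resume=True)
    # frame == 39750; the server streams the moving image from this frame.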

At this time, the user terminal may include the video decoding apparatus of the present invention described above with reference to Figs. 1 to 27. As another example, the user terminal may include the video encoding apparatus of the present invention described above with reference to Figs. 1 to 27. The user terminal may also include both the video encoding apparatus and the video decoding apparatus of the present invention described above with reference to Figs. 1 to 27.

Various embodiments in which the video encoding method, the video decoding method, the video encoding apparatus, and the video decoding apparatus described above with reference to Figs. 1 to 27 are utilized have been described with reference to Figs. 28 to 34. However, embodiments in which the video encoding method and the video decoding method described above with reference to Figs. 1 to 27 are stored in a storage medium, or in which the video encoding apparatus and the video decoding apparatus are implemented in a device, are not limited to the embodiments of Figs. 28 to 34.

It will be understood by those skilled in the art that the various embodiments disclosed herein may be embodied in modified forms without departing from their essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present disclosure is defined by the following claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present disclosure.

Claims (18)

A scalable video encoding method comprising:
determining a reference layer image from among base layer images for inter-layer prediction of an enhancement layer image;
generating an upsampled reference layer image by performing inter-layer (IL) interpolation filtering on the determined reference layer image; and
determining, for the enhancement layer image, not to perform inter-prediction between enhancement layer images, and encoding a residue component of the inter-layer prediction between the upsampled reference layer image and the enhancement layer image.
2. The method of claim 1, wherein the determining of the reference layer image comprises:
performing four motion-compensation (MC) interpolation filterings, including horizontal-direction interpolation filtering and vertical-direction interpolation filtering in each of the L0 and L1 prediction directions, on the base layer image,
wherein the generating of the upsampled reference layer image comprises:
performing two IL interpolation filterings, including horizontal-direction IL interpolation filtering and vertical-direction IL interpolation filtering, on the determined reference layer image, and
wherein the interpolation filtering for inter-layer prediction of the enhancement layer image is limited to the four MC interpolation filterings and the two IL interpolation filterings.
3. The method of claim 2,
wherein the amount of computation of interpolation filtering for inter-layer prediction of the enhancement layer image is limited so as not to exceed the sum of a first calculation amount of MC interpolation filtering for inter-prediction between the base layer images and a second calculation amount of MC interpolation filtering for inter-prediction between the enhancement layer images.
4. The method of claim 1, wherein the encoding of the residue component between the enhancement layer images comprises:
encoding a reference index indicating that the reference image of the enhancement layer image is the upsampled reference layer image, and encoding a motion vector for inter-prediction between the enhancement layer images so as to indicate 0.
The method according to claim 1,
wherein, for the enhancement layer image, whether to perform inter-prediction is determined based on the size, shape, and prediction direction of a block, and
wherein the number of taps of the IL interpolation filter used for the IL interpolation filtering is limited so as not to be greater than the number of taps of the MC interpolation filter used for the MC interpolation filtering.
The method according to claim 1,
wherein the number of MC interpolation filterings and the number of IL interpolation filterings are limited based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering for inter-prediction of the enhancement layer image, the number of taps of the IL interpolation filter for IL interpolation filtering, and the size of a prediction unit of the enhancement layer image.
The method according to claim 6,
wherein interpolation filtering for blocks of size 8x8 or greater is limited to i) two 8-tap MC interpolation filterings, or ii) a combination of one 8-tap MC interpolation filtering and one 8-tap IL interpolation filtering, or iii) one 8-tap IL interpolation filtering, or iv) two 6-tap IL interpolation filterings, or v) two 4-tap IL interpolation filterings, or vi) one 2-tap IL interpolation filtering, or vii) a combination of two 8-tap MC interpolation filterings and one 4-tap IL interpolation filtering, or viii) a combination of four 2-tap MC interpolation filterings and four 2-tap IL interpolation filterings, or ix) a combination of two 8-tap MC interpolation filterings and two 2-tap IL interpolation filterings, or x) a combination of eight 8-tap MC interpolation filterings and one 8-tap IL interpolation filtering.
A scalable video decoding method comprising:
obtaining a residue component and a reference index indicating a reference layer image for inter-layer prediction of an enhancement layer image;
determining, based on the reference index, not to perform inter-prediction between enhancement layer images, and determining the reference layer image from among base layer images;
generating an upsampled reference layer image by performing IL interpolation filtering on the determined reference layer image; and
reconstructing the enhancement layer image using the residue component of the inter-layer prediction and the upsampled reference layer image.
9. The method of claim 8, wherein the determining of the reference layer image comprises:
performing four MC interpolation filterings, including horizontal-direction interpolation filtering and vertical-direction interpolation filtering in each of the L0 and L1 prediction directions, on the base layer image,
wherein the generating of the upsampled reference layer image comprises:
performing two IL interpolation filterings, including horizontal-direction IL interpolation filtering and vertical-direction IL interpolation filtering, on the determined reference layer image, and
wherein the interpolation filtering for inter-layer prediction of the enhancement layer image is limited to the four MC interpolation filterings and the two IL interpolation filterings.
10. The method of claim 9,
wherein the amount of computation of interpolation filtering for inter-layer prediction of the enhancement layer image is limited so as not to exceed the sum of a first calculation amount of MC interpolation filtering for inter-prediction between the base layer images and a second calculation amount of MC interpolation filtering for inter-prediction between the enhancement layer images.
11. The method of claim 8, wherein the determining of the reference layer image comprises:
determining a motion vector for inter-prediction between the enhancement layer images to be 0 when the reference index of the enhancement layer image indicates the upsampled reference layer image.
12. The method of claim 8,
wherein, for the enhancement layer image, whether to perform inter-prediction is determined based on the size, shape, and prediction direction of a block, and
wherein the number of taps of the IL interpolation filter used for the IL interpolation filtering is limited so as not to be greater than the number of taps of the MC interpolation filter used for the MC interpolation filtering.
13. The method of claim 8,
wherein the number of MC interpolation filterings and the number of IL interpolation filterings are limited based on at least one of the number of taps of the MC interpolation filter for MC interpolation filtering for inter-prediction of the enhancement layer image, the number of taps of the IL interpolation filter for IL interpolation filtering, and the size of a prediction unit of the enhancement layer image.
14. The method of claim 13,
wherein interpolation filtering for blocks of size 8x8 or greater is limited to i) two 8-tap MC interpolation filterings, or ii) a combination of one 8-tap MC interpolation filtering and one 8-tap IL interpolation filtering, or iii) one 8-tap IL interpolation filtering, or iv) two 6-tap IL interpolation filterings, or v) two 4-tap IL interpolation filterings, or vi) one 2-tap IL interpolation filtering, or vii) a combination of two 8-tap MC interpolation filterings and one 4-tap IL interpolation filtering, or viii) a combination of four 2-tap MC interpolation filterings and four 2-tap IL interpolation filterings, or ix) a combination of two 8-tap MC interpolation filterings and two 2-tap IL interpolation filterings, or x) a combination of eight 8-tap MC interpolation filterings and one 8-tap IL interpolation filtering.
A scalable video encoding apparatus comprising:
A base layer encoding unit for performing inter-prediction on base layer images; and
An enhancement layer encoding unit for determining a reference layer image from among the base layer images for inter-layer prediction of an enhancement layer image, performing IL interpolation filtering on the determined reference layer image to generate an upsampled reference layer image, determining, for the enhancement layer image, not to perform inter-prediction between enhancement layer images, and encoding a residue component of the inter-layer prediction between the upsampled reference layer image and the enhancement layer image.
A scalable video decoding apparatus comprising:
A base layer decoding unit for performing motion compensation to reconstruct base layer images; and
An enhancement layer decoding unit for, when a residue component and a reference index indicating a reference layer image for inter-layer prediction of an enhancement layer image are obtained, determining, based on the reference index, not to perform inter-prediction between enhancement layer images, determining the reference layer image from among the base layer images, generating an upsampled reference layer image by performing IL interpolation filtering on the determined reference layer image, and reconstructing the enhancement layer image using the residue component of the inter-layer prediction and the upsampled reference layer image.
A computer-readable recording medium on which a computer program for implementing the scalable video encoding method according to any one of claims 1 to 7 is recorded.

A computer-readable recording medium on which a computer program for implementing the scalable video decoding method according to any one of claims 8 to 14 is recorded.
KR1020140022179A 2013-02-25 2014-02-25 Method and apparatus for scalable video encoding considering memory bandwidth and calculation complexity, method and apparatus for scalable video decoding considering memory bandwidth and calculation complexity KR20140106450A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201313768838A 2013-02-25 2013-02-25
US61/768,838 2013-02-25

Publications (1)

Publication Number Publication Date
KR20140106450A true KR20140106450A (en) 2014-09-03

Family

ID=51754877

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020140022179A KR20140106450A (en) 2013-02-25 2014-02-25 Method and apparatus for scalable video encoding considering memory bandwidth and calculation complexity, method and apparatus for scalable video decoding considering memory bandwidth and calculation complexity

Country Status (1)

Country Link
KR (1) KR20140106450A (en)

Similar Documents

Publication Publication Date Title
KR101838112B1 (en) Method and apparatus for inter-prediction and motion-compensation
KR101712109B1 (en) Method and apparatus for inter layer video decoding using depth-based disparity vector, method and apparatus for inter layer video encoding using depth-based disparity vector.
KR102273025B1 (en) Method and apparatus for scalable video encoding using switchable de-noising filtering, Method and apparatus for scalable video decoding using switchable de-noising filtering
KR20140089487A (en) Method and apparatus for scalable video encoding using image upsampling based on phase-shift, method and apparatus for scalable video decoding using image upsampling based on phase-shift
KR20160132859A (en) Method and device for configuring merge candidate list for decoding and encoding of interlayer video
KR20140043037A (en) Method and apparatus for compensating sample adaptive offset for encoding inter layer prediction error
KR20140122196A (en) Method and apparatus for video encoding, method and apparatus for video decoding
KR20140034053A (en) Method and appratus for inter-layer encoding of prediction information in scalable video encoding based on coding units of tree structure, method and appratus for inter-layer decoding of prediction information in scalable video decoding based on coding units of tree structure
KR20140091493A (en) Method and apparatus for video encoding for illumination compensation, method and apparatus for video decoding for illu mination compensation
KR20170023000A (en) Method and device for transmitting prediction mode of depth image for interlayer video encoding and decoding
KR20140007293A (en) Method and apparatus for multi-layer video encoding for random access, method and apparatus for multi-layer video decoding for random access
KR20140122195A (en) Method and apparatus for video encoding for Determining Prediction Candidate, method and apparatus for Determining Inter Prediction Candidate
KR20140122202A (en) Method and apparatus for video stream encoding according to layer ID extention, method and apparatus for video stream decoding according to layer ID extention
KR20130116832A (en) Method and appratus for multiview video encoding based on coding units of tree structure, method and appratus for multiview video decoding based on coding units of tree structure
KR20150105264A (en) Method and apparatus for inter layer video decoding for performing a prediction based on sub-block and method and apparatus for inter layer video encoding for performing a prediction based on sub-block
KR20150043217A (en) Method and apparatus for multi-layer video encoding, method and apparatus for multi-layer video decoding
EP2961167A1 (en) Device and method for scalable video encoding considering memory bandwidth and computational quantity, and device and method for scalable video decoding
KR20150012223A (en) Method and apparatus for determining motion vector
KR20170019363A (en) Inter-layer video encoding method for compensating for luminance difference and device therefor, and video decoding method and device therefor
KR20130119379A (en) Method and apparatus for multiview video encoding using reference picture list for multiview video prediction, method and apparatus for multiview video decoding using reference picture list for multiview video prediction
KR101770300B1 (en) Method and apparatus for video encoding, method and apparatus for video decoding
KR20150043218A (en) Method and apparatus for multi-layer video encoding for encoding auxiliary pictures, method and apparatus for multi-layer video decoding for decoding auxiliary pictures
KR20150043226A (en) Method and apparatus for depth intra encoding and method and apparatus for depth intra decoding
KR20150083444A (en) Apparatus and Method for Scalable Vedio Encoding/Decoding
KR20150073132A (en) Method and apparatus for video encoding for illumination compensation, method and apparatus for video decoding for residual prediction

Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination