WO2015137786A1 - Scalable video encoding/decoding method and apparatus - Google Patents
Scalable video encoding/decoding method and apparatus
- Publication number
- WO2015137786A1 (PCT/KR2015/002532)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- layer
- phase difference
- luma
- chroma
- current layer
- Prior art date
Classifications
- H—ELECTRICITY; H04—ELECTRIC COMMUNICATION TECHNIQUE; H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION; H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/30—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability
- H04N19/105—Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
- H04N19/132—Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
- H04N19/186—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a colour or a chrominance component
- H04N19/44—Decoders specially adapted therefor, e.g. video decoders which are asymmetric with respect to the encoder
Definitions
- the present invention relates to a method and apparatus for video encoding / decoding through image upsampling.
- one picture is divided into macroblocks to encode an image. Then, each macroblock is predictively encoded using inter prediction or intra prediction.
- Inter prediction is a method of compressing an image by removing temporal redundancy between pictures
- motion estimation coding is a representative example.
- In motion estimation encoding, each block of the current picture is predicted using at least one reference picture.
- A predetermined evaluation function is used to search, within a predetermined search range, for the reference block most similar to the current block.
- the current block is predicted based on the reference block, and the residual block generated by subtracting the prediction block generated as a result of the prediction from the current block is encoded.
- To perform prediction more accurately, interpolation is performed on the search range of the reference picture to generate subsamples in units smaller than an integer sample, and inter prediction is performed based on the generated subsamples.
- Provided are a method of determining a phase of samples included in a predictive picture by using information about a phase difference, and an apparatus for performing the method. Also provided are a method of determining information relating to a phase difference, and an apparatus for performing the method.
- a video decoding method includes determining a predictive picture of a current layer.
- the phase of the luma samples included in the prediction picture may be adjusted according to the luma vertical phase difference and the luma horizontal phase difference, and the phase of the chroma samples included in the prediction picture may be adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference.
- the luma vertical phase difference and the chroma vertical phase difference may be determined by a scanning method of the reference layer.
- the luma vertical phase difference and the chroma vertical phase difference are determined by a scanning method of the reference layer and an alignment method of the reference layer and the current layer. The alignment method may include zero-phase alignment, which aligns the reference layer and the current layer with respect to their upper-left samples, and symmetric alignment, which aligns the reference layer and the current layer with respect to their centers.
- the video decoding method may further include: acquiring, from the bitstream, reference layer size information indicating a height and a width of the reference layer, reference layer offset information for defining a reference region used for inter-layer prediction from the reference layer, current layer size information indicating a height and a width of the current layer, and current layer offset information for defining an extended reference region corresponding to the reference region from the current layer; determining a size of the reference region from the reference layer size information and the reference layer offset information; determining a size of the extended reference region from the current layer size information and the current layer offset information; and determining, according to the size of the reference region and the size of the extended reference region, an accumulation ratio indicating a ratio of the size between the reference region and the extended reference region.
- in the determining of the prediction picture, the predictive picture may be determined by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, the reference layer offset information, the current layer offset information, and the accumulation ratio.
- the video decoding method may further include obtaining, from the bitstream, residual data including difference values between sample values included in the current layer and sample values included in a reference picture of the current layer, and reconstructing the current picture using the residual data and the predictive picture.
- a video decoding apparatus including a decoder configured to determine a predictive picture of the current layer.
- the phase of the luma samples included in the prediction picture may be adjusted according to the luma vertical phase difference and the luma horizontal phase difference, and the phase of the chroma samples included in the prediction picture may be adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference.
- the luma vertical phase difference and the chroma vertical phase difference may be determined by a scanning method of the reference layer.
- the luma vertical phase difference and the chroma vertical phase difference are determined by a scanning method of the reference layer and an alignment method of the reference layer and the current layer. The alignment method can include zero-phase alignment, which aligns the reference layer and the current layer with respect to their upper-left samples, and symmetric alignment, which aligns the reference layer and the current layer with respect to their centers.
- the reception extractor may obtain, from the bitstream, reference layer size information indicating a height and a width of the reference layer, reference layer offset information for defining a reference region used for inter-layer prediction from the reference layer, current layer size information indicating a height and a width of the current layer, and current layer offset information for defining an extended reference region corresponding to the reference region from the current layer.
- a size of the reference region may be determined from the reference layer size information and the reference layer offset information, a size of the extended reference region may be determined from the current layer size information and the current layer offset information, and an accumulation ratio indicating a ratio of the size between the reference region and the extended reference region may be determined according to the size of the reference region and the size of the extended reference region.
- the reception extractor obtains, from the bitstream, residual data including differences between sample values included in the current layer and sample values included in a reference picture of the current layer, and the decoder may reconstruct the current picture using the residual data and the predictive picture.
- According to an embodiment, a video encoding method is provided, the method including: determining a scanning method of a current layer and a reference layer; determining a field of the reference layer when the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method; determining a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting the phases of luma samples and chroma samples included in a prediction picture of the current layer, based on the scanning method and the field of the reference layer; determining the prediction picture of the current layer by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference; determining residual data including difference values between sample values of the current layer and sample values of the prediction picture of the current layer; and outputting a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- According to an embodiment, a video encoding apparatus is provided, including: an encoder configured to determine a scanning method of a current layer and a reference layer, determine a field of the reference layer when the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method, determine a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting the phases of luma samples and chroma samples included in a prediction picture of the current layer based on the scanning method and the field of the reference layer, and determine residual data; and an output unit configured to output a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- a computer-readable recording medium having recorded thereon a program for executing the video decoding method and the video encoding method is provided.
- the phases of the samples included in the current layer are adjusted according to encoding conditions.
- In the decoding process, the phases of the samples are adjusted in the same way as in the encoding process. Coding efficiency is increased by adjusting the phases of the samples in the resampling process.
- FIG. 1A illustrates a block diagram of a scalable video decoding apparatus, according to an embodiment.
- 1B is a flowchart of a scalable video decoding method, according to an embodiment.
- FIG. 2A is a block diagram of a scalable video encoding apparatus, according to an embodiment.
- 2B is a flowchart of a scalable video encoding method, according to an embodiment.
- 3A and 3B are diagrams for describing a luma-chroma phase difference according to an embodiment.
- 4A is a diagram for describing an interlaced scanning method, according to an exemplary embodiment.
- 4B is a diagram for describing a reference area, an extended reference area, and an accumulation ratio, according to an embodiment.
- FIG. 5 is a diagram illustrating syntax for describing a process of acquiring encoding information, according to an embodiment.
- FIGS. 6A and 6B illustrate block diagrams of the scalable video encoding apparatus 600 according to an embodiment.
- FIGS. 7A and 7B illustrate block diagrams of a scalable video decoding apparatus 700, according to an embodiment.
- FIG. 8A is a block diagram of a video encoding apparatus based on coding units having a tree structure, according to an embodiment.
- FIG. 8B is a block diagram of a video decoding apparatus based on coding units having a tree structure, according to an embodiment.
- FIG. 9 illustrates a concept of coding units, according to an embodiment.
- 10A is a block diagram of an image encoder based on coding units, according to an embodiment.
- 10B is a block diagram of an image decoder based on coding units, according to an embodiment.
- FIG. 11 is a diagram of deeper coding units according to depths, and partitions, according to an embodiment.
- FIG. 12 illustrates a relationship between a coding unit and transformation units, according to an embodiment.
- 13 is a diagram of deeper encoding information, according to an embodiment.
- FIG. 14 is a diagram of deeper coding units according to depths, according to an exemplary embodiment.
- 15, 16, and 17 illustrate a relationship between a coding unit, a prediction unit, and a transformation unit, according to an embodiment.
- FIG. 18 illustrates a relationship between a coding unit, a prediction unit, and a transformation unit, according to encoding mode information of Table 1.
- FIG. 19 illustrates a physical structure of a disk in which a program is stored.
- FIG. 20 shows a disc drive for recording and reading a program by using the disc.
- FIG. 21 illustrates the overall structure of a content supply system for providing a content distribution service.
- 22 and 23 illustrate an external structure and an internal structure of a mobile phone to which a video encoding method and a video decoding method are applied, according to an embodiment.
- FIG. 24 illustrates a digital broadcasting system employing a communication system, according to an embodiment.
- a video decoding method includes determining a predictive picture of a current layer.
- the phase of the luma samples included in the prediction picture may be adjusted according to the luma vertical phase difference and the luma horizontal phase difference, and the phase of the chroma samples included in the prediction picture may be adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference.
- the luma vertical phase difference and the chroma vertical phase difference may be determined by a scanning method of the reference layer.
- a video decoding apparatus including a decoder configured to determine a predictive picture of the current layer.
- the phase of the luma samples included in the prediction picture may be adjusted according to the luma vertical phase difference and the luma horizontal phase difference, and the phase of the chroma samples included in the prediction picture may be adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference.
- the luma vertical phase difference and the chroma vertical phase difference may be determined by a scanning method of the reference layer.
- According to an embodiment, a video encoding method is provided, the method including: determining a scanning method of a current layer and a reference layer; determining a field of the reference layer when the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method; determining a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting the phases of luma samples and chroma samples included in a prediction picture of the current layer, based on the scanning method and the field of the reference layer; determining the prediction picture of the current layer by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference; determining residual data including difference values between sample values of the current layer and sample values of the prediction picture of the current layer; and outputting a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- According to an embodiment, a video encoding apparatus is provided, including: an encoder configured to determine a scanning method of a current layer and a reference layer, determine a field of the reference layer when the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method, determine a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting the phases of luma samples and chroma samples included in a prediction picture of the current layer based on the scanning method and the field of the reference layer, and determine residual data; and an output unit configured to output a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- 'image' may refer to a generic image, including a still image as well as a moving image such as a video.
- 'picture' described in the present specification means a still image to be encoded or decoded.
- the scalable coding method refers to a method of hierarchically encoding a single image so that various resolutions, frame rates, quality, and the like are supported. Since one bitstream includes images having various resolutions, frame rates, and image quality, the content consumer may extract a portion of the bitstream to play an image satisfying a desired resolution, frame rate, and image quality.
- An image encoded according to the scalable encoding method has two or more layers. Each layer may have at least one of an upper layer and a lower layer.
- the layer may be classified into a current layer and a reference layer.
- the current layer refers to an upper layer of the reference layer that is encoded / decoded with reference to pictures of the reference layer.
- the reference layer refers to a lower layer of the current layer that provides a picture necessary for encoding / decoding of the current layer.
- pictures of a reference layer are inferior in terms of resolution, frame rate, and picture quality to pictures of the current layer.
- the current layer and the reference layer are relative concepts. For example, if there is a first layer, a second layer, and a third layer from an upper layer, the second layer may be a reference layer with respect to the first layer. In contrast, the second layer may be the current layer with respect to the third layer.
- the term 'current layer' is used interchangeably with the term 'enhancement layer'.
- the term 'reference layer' is used interchangeably with the term 'base layer'.
- the enhancement layer used herein has the same meaning as the current layer.
- the base layer used herein has the same meaning as the reference layer.
- resampling refers to a series of processes for re-determining the number and properties of samples constituting a picture. Resampling includes downsampling and upsampling.
- Downsampling in the present specification means a series of processes for reducing the number of samples constituting the picture. For example, when the number of samples constituting the picture is 32x32, downsampled pictures having the number of samples 16x16 may be obtained using downsampling. The rate of samples decreasing due to downsampling may vary from embodiment to embodiment.
- Upsampling in the present specification refers to a series of processes of increasing the number of samples constituting a picture, as opposed to downsampling. For example, when the number of samples constituting the picture is 16x16, an upsampled picture having 32x32 samples may be obtained using upsampling. The rate at which the number of samples increases due to upsampling may vary from embodiment to embodiment.
- In the present specification, downsampling and upsampling are used with respect to resolution: the current layer may be downsampled to generate a reference layer, and the reference layer may be upsampled to obtain a predictive picture of the current layer.
- the offset refers to a displacement difference between the entire area of the layer defined on the basis of luma sample units and a partial area of the layer to be subjected to upsampling or downsampling.
- the horizontal offset means a displacement difference in the horizontal direction.
- the vertical offset means a displacement difference in the vertical direction.
- the unit of the offset is defined by setting the interval between horizontally and vertically adjacent luma samples as one. For example, when the sample located four samples to the right of and two samples below sample A is sample B, the horizontal offset of sample B with respect to sample A is 4, and the vertical offset is 2.
- the phase refers to a sample-to-sample displacement.
- the phase can include vertical or horizontal components.
- the phase difference refers to the displacement between the position of a sample after adjustment and its position before adjustment when the position of the sample is adjusted in the upsampling or downsampling process.
- the phase difference can only be expressed as an integer. For example, when the distance between horizontally adjacent samples is defined as 16, the phase difference may be expressed with an accuracy of 1/16 of the distance between adjacent samples.
- the phases of the luma samples and the chroma samples may be adjusted during downsampling and upsampling to improve encoding efficiency.
- In the upsampling process during decoding, the phases of the luma samples and the chroma samples are adjusted in the same way as in the encoding step. Therefore, when performing upsampling for the current layer during decoding, information about the phases that were adjusted when downsampling and upsampling the original image during encoding is required.
- the present specification describes a method of determining the phase of samples in the upsampling process. More specifically, various embodiments of the present disclosure provide a method of changing the phases of the luma samples and the chroma samples during the decoding process by using information about the phases changed according to the alignment method of the reference layer and the current layer and the scanning method of the reference layer, and an apparatus for performing the method. Also described herein is the overall process that occurs during upsampling with respect to phase variations.
- upsampling of an image in consideration of offsets of a reference layer and a current layer is proposed.
- scalable video encoding and decoding using upsampling in consideration of the offset of the reference layer and the current layer is proposed.
- encoding and decoding of a video based on coding units having a tree structure performed in each layer of the scalable video system are proposed.
- FIG. 1A illustrates a block diagram of the scalable video decoding apparatus 100, according to an embodiment.
- the scalable video decoding apparatus 100 may include a reception extractor 110 and a decoder 120.
- In FIG. 1A, the reception extractor 110 and the decoder 120 are represented as separate components, but according to an exemplary embodiment, the reception extractor 110 and the decoder 120 may be combined and implemented in the same structural unit.
- Although the reception extractor 110 and the decoder 120 are represented as structural units located in one device, the devices in charge of the respective functions of the reception extractor 110 and the decoder 120 do not have to be physically adjacent to each other. Therefore, according to an exemplary embodiment, the reception extractor 110 and the decoder 120 may be distributed.
- the reception extractor 110 and the decoder 120 of FIG. 1A may be implemented by one processor according to an exemplary embodiment. In some embodiments, they may also be implemented by a plurality of processors.
- the scalable video decoding apparatus 100 may include storage (not shown) for storing data generated by the reception extractor 110 and the decoder 120.
- the reception extractor 110 and the decoder 120 may extract and use data stored in a storage (not shown).
- the scalable video decoding apparatus 100 of FIG. 1A is not limited to a physical apparatus.
- some of the functions of the scalable video decoding apparatus 100 may be implemented in software instead of hardware.
- the reception extractor 110 may obtain upsampling phase set information.
- the upsampling phase set information indicates whether the phases of the samples included in the current layer are adjusted during the upsampling process.
- the upsampling phase set information may have a value of zero or one. For example, if the upsampling phase set information indicates 1, the phases of the samples included in the current layer are adjusted. On the contrary, when the upsampling phase set information indicates 0, the phases of the samples included in the current layer are not adjusted. In contrast to the above example, according to an embodiment, the phases of the samples included in the current layer may be adjusted when the upsampling phase set information indicates 0.
- the reception extractor 110 may obtain a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference from the bitstream.
- the luma vertical phase difference indicates how the phase of the luma samples varies in the vertical direction.
- the luma horizontal phase difference indicates how much the phase of the luma samples varies in the horizontal direction.
- the chroma vertical phase difference indicates how much the phases of the chroma samples vary in the vertical direction.
- the chroma horizontal phase difference indicates how much the phases of the chroma samples vary in the horizontal direction.
- Luma vertical phase difference, luma horizontal phase difference, chroma vertical phase difference, and chroma horizontal phase difference may be determined in an encoding step.
- a method of determining luma vertical phase difference, luma horizontal phase difference, chroma vertical phase difference, and chroma horizontal phase difference will be described.
- the chroma vertical phase difference and the chroma horizontal phase difference may be determined by the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference.
- the vertical luma-chroma phase difference represents the phase difference in the vertical direction between the luma sample and the chroma sample.
- the horizontal luma-chroma phase difference represents the phase difference in the horizontal direction between the luma sample and the chroma sample. Referring to FIGS. 3A and 3B, the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference are described.
- FIG. 3A discloses six cases according to the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference.
- FIG. 3B is a diagram illustrating the relative positions of the chroma samples with respect to the luma samples when the color format is 4:2:0, for each case shown in FIG. 3A.
- the X-axis phase difference of FIG. 3A means horizontal luma-chroma phase difference
- the Y-axis phase difference means vertical luma-chroma phase difference.
- the square symbol of FIG. 3B means luma sample
- the circular symbol means chroma sample.
- the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference may have values of 0 to 2. However, according to an embodiment, they may have values outside the range of 0 to 2.
- When the color format is 4:2:0, one chroma sample corresponds to a 2x2 luma sample grid of four luma samples. If there is no phase difference between the luma sample and the chroma sample, the chroma sample is located at the position of the luma sample at the upper left of the 2x2 luma sample grid.
- the distance between adjacent luma samples in the vertical or horizontal direction is defined as two.
- For example, if the chroma sample 312 moves from the position of the luma sample 310 to the position of the luma sample 314, the vertical luma-chroma phase difference is two. If the chroma sample 312 moves from the position of the luma sample 310 to the intermediate position between the luma sample 310 and the luma sample 314 as in case b, the vertical luma-chroma phase difference is one.
- In case a, the horizontal luma-chroma phase difference and the vertical luma-chroma phase difference are both zero. Since there is no phase difference between the luma sample and the chroma sample, the chroma sample 302 is located at the position of the luma sample at the upper left of the 2x2 luma sample grid.
- In case b, the horizontal luma-chroma phase difference is zero and the vertical luma-chroma phase difference is one.
- Therefore, the chroma sample 312 is located at the center between the upper-left luma sample 310 and the lower-left luma sample 314 of the 2x2 luma sample grid.
- In the next case, the horizontal luma-chroma phase difference is 1 and the vertical luma-chroma phase difference is 0.
- The chroma sample is thus located at the center between the upper-left luma sample 320 and the upper-right luma sample 324 of the 2x2 luma sample grid.
- In the case where the horizontal luma-chroma phase difference is 1 and the vertical luma-chroma phase difference is 1, the chroma sample 338 is located at the center of the four luma samples 330, 332, 334, and 336.
- In the case where the horizontal luma-chroma phase difference is 1 and the vertical luma-chroma phase difference is 2, the chroma sample 344 is located at the center between the lower-left luma sample 340 and the lower-right luma sample 342 of the 2x2 luma sample grid.
- In the case where the horizontal luma-chroma phase difference is 0 and the vertical luma-chroma phase difference is 2, the chroma sample 352 is located at the position of the lower-left luma sample 350 of the 2x2 luma sample grid.
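- As a rough numerical illustration of this geometry (a sketch based on the definitions above; the function name and the list of configurations are illustrative, not taken from the patent), the chroma position on the luma grid follows directly from the two luma-chroma phase differences:

```python
def chroma_position_in_luma_units(horizontal_phase, vertical_phase):
    # Phase differences are expressed with the spacing between adjacent luma
    # samples defined as two, so dividing by two converts a luma-chroma phase
    # difference into luma-sample coordinates relative to the upper-left luma
    # sample of the 2x2 grid, which sits at (0.0, 0.0).
    return horizontal_phase / 2.0, vertical_phase / 2.0

# The six configurations of FIG. 3A/3B described above:
for phases in [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (0, 2)]:
    print(phases, "->", chroma_position_in_luma_units(*phases))
# (0, 0) lands on the upper-left luma sample, (1, 1) on the grid centre at
# (0.5, 0.5), and (0, 2) on the lower-left luma sample at (0.0, 1.0).
```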
- the chroma vertical phase difference and the chroma horizontal phase difference may be determined using the determined vertical luma-chroma phase difference and horizontal luma-chroma phase difference. Since the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference represent the relative phase change of the chroma sample with respect to the luma sample, they cause no change in the luma vertical phase difference and the luma horizontal phase difference. For example, if the vertical luma-chroma phase difference is 1 and the horizontal luma-chroma phase difference is 2, there is no phase variation of the luma sample due to the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference. However, the phase of the chroma sample is shifted by 1 in the vertical direction and by 2 in the horizontal direction due to the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference.
- the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference may be determined according to an alignment method indicating an alignment method between the reference layer and the current layer.
- Alignment schemes include zero-phase alignment and symmetric alignment.
- the zero-phase alignment method is a method of aligning samples of the reference layer and the current layer based on the upper-left samples of the reference layer and the current layer. For example, when the luma sample and the chroma sample are aligned by the zero-phase alignment method, the chroma sample is positioned at the luma sample located at the upper left of the four luma samples, as shown in case a of FIG. 3B. Therefore, if the samples are aligned according to the zero-phase alignment scheme and the color format is 4:2:0, there is no phase adjustment according to the alignment scheme, because the chroma sample is already located at the position of the upper-left luma sample of the 2x2 luma sample grid.
- the symmetric alignment method aligns samples of the reference layer and the current layer based on the centers of the reference layer and the current layer.
- In the symmetric alignment method, the sample distribution is symmetric about the center. Therefore, by the symmetric alignment method, the phases of the luma sample and the chroma sample are adjusted.
- the luma vertical phase difference and the chroma vertical phase difference may be determined according to the scanning scheme of the current layer and the reference layer and the fields of the reference layer.
- Scanning methods include a progressive scan method and an interlace scan method.
- the progressive scanning method refers to a method of displaying, storing, and transmitting an image in which one frame contains one whole picture. Therefore, each frame of the image corresponds to one intact picture. For example, when a picture is acquired every 1/30 second, 30 frames corresponding to 30 pictures are generated in one second. An image of 30 frames per second generated by the progressive scanning method is denoted as 30p.
- interlaced scanning refers to a method of displaying, storing, and transmitting an image including an odd field or an even field of a picture in one frame.
- the odd field includes only samples positioned in an odd line among samples constituting the picture.
- the even field includes only samples located on an even line among samples constituting the picture.
- Frames with odd fields are alternately played with frames with even fields to produce the same effect as the entire picture.
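- A minimal sketch of how a picture is split into the two fields described above (an illustration only; the patent does not prescribe this representation):

```python
def split_into_fields(picture):
    # picture: a list of sample rows, with lines counted from one.
    # The odd field keeps the 1st, 3rd, 5th, ... lines; the even field keeps
    # the 2nd, 4th, 6th, ... lines, so the two fields never overlap.
    odd_field = picture[0::2]
    even_field = picture[1::2]
    return odd_field, even_field
```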
- 4A is a diagram for describing in detail the features of the interlaced scanning method introduced above.
- a frame 402 including an odd field is displayed on the left side
- a frame 404 including an even field is displayed on the right side.
- the gray line means a sample line in which the samples to be scanned are located.
- the white line means a sample line without samples to be scanned.
- n is an integer of 1 or more.
- the frame 402 including the odd field includes only the odd field of the picture acquired at (2n-2) / 60 seconds.
- the frame 404 including even fields includes only even fields of pictures acquired in (2n-1) / 60 seconds. Accordingly, the frame 402 including the odd field and the frame 404 including the even field are alternately displayed, stored, and transmitted. As shown in FIG. 4A, an image of 60 frames per second generated by the interlaced scanning method is displayed as 60i.
- the data amount of one frame of an image generated by the interlaced scanning method is half of the data amount of one frame of an image generated by the progressive scanning method. Therefore, the data amount of an image generated at 60 frames per second by the interlaced scanning method is equal to half of the data amount of an image generated at 60 frames per second by the progressive scanning method. Thus, interlaced scanning requires less data.
- Since the progressive scanning method reproduces frames each containing an intact picture, it can provide better image quality than the interlaced scanning method.
- the interlaced scanning method displays only even-numbered scan lines (even fields) or odd-numbered scan lines (odd fields) in one frame.
- the positions of the samples of the frame including the even field should be adjusted so that the area in which the odd field is displayed and the area in which the even field is displayed do not overlap.
- Therefore, when the reference layer is scanned by the interlaced scanning method and the fields of the reference layer are even fields, the luma vertical phase difference and the chroma vertical phase difference are adjusted.
- Equations 1 and 2 below are equations for determining the vertical phase difference and the horizontal phase difference based on the alignment method and the scanning method.
- the encoder 210 of the scalable video encoding apparatus 200 to be described below may determine the vertical phase difference and the horizontal phase difference based on Equation 1 and Equation 2.
- In Equations 1 and 2, '<<' means a left shift operator. Specifically, '(bit string) << N' is interpreted as appending N zeros to the right of the bit string. For example, '11 << 2' is interpreted as '1100'.
- '?' and ':' together denote a conditional operator. Specifically, '(conditional statement) ? (expression 1) : (expression 2)' is interpreted as the result of expression 1 if the conditional statement is true, or the result of expression 2 if the conditional statement is false.
- phaseX means horizontal phase difference
- phaseY means vertical phase difference
- CIdx denotes a color component index
- cross_layer_phase_alignment_flag denotes alignment scheme indication information
- VertPhasePositionAdjustFlag means scanning method indication information
- VertPhasePositionFlag means field indication information.
- For a luma sample, the color component index is determined to be 0.
- For a chroma sample, the color component index is determined to be 1 or 2.
- If the current layer and the reference layer are aligned by the zero-phase alignment scheme, the alignment scheme indication information is determined to be 0. If the current layer and the reference layer are aligned by the symmetric alignment scheme, the alignment scheme indication information is determined to be 1.
- When the progressive scanning method is applied to both the current layer and the reference layer, the scanning method indication information is determined to be 0.
- When the current layer is scanned by the progressive scanning method and the reference layer is scanned by the interlaced scanning method, the scanning method indication information is determined to be 1.
- the field indication information is determined when the scanning method indication information is 1. If the reference layer includes an odd field, the field indication information is determined to be 0. If the reference layer includes an even field, the field indication information is determined to be 1.
- In Equation 1, the value of the color component index is checked first. If the color component index indicates 0, cross_layer_phase_alignment_flag << 1 is calculated to determine the horizontal phase difference with respect to the luma sample. If the alignment scheme indication information indicates 1, the horizontal phase difference for the luma sample is determined to be 2, and if the alignment scheme indication information indicates 0, the horizontal phase difference for the luma sample is determined to be 0.
- If the color component index indicates 1 or 2, cross_layer_phase_alignment_flag is used to determine the horizontal phase difference with respect to the chroma sample. If the alignment scheme indication information indicates 1, the horizontal phase difference for the chroma sample is determined to be 1, and if the alignment scheme indication information indicates 0, the horizontal phase difference for the chroma sample is determined to be 0.
- In Equation 2, the scanning method indication information is interpreted first. If the scanning method indication information indicates 1, the progressive scanning method is applied to the current layer and the interlaced scanning method is applied to the reference layer. When the scanning method indication information indicates 1, VertPhasePositionFlag << 2 is calculated to obtain the vertical phase difference. If the field indication information indicates 1, the vertical phase difference is determined to be 4, and if the field indication information indicates 0, the vertical phase difference is determined to be 0.
- If the scanning method indication information indicates 0, it is determined that the progressive scanning method is applied to both the current layer and the reference layer.
- the value of the color component index is then determined.
- If the color component index indicates 0, cross_layer_phase_alignment_flag << 1 is calculated to determine the vertical phase difference with respect to the luma sample. If the alignment scheme indication information indicates 1, the vertical phase difference with respect to the luma sample is determined to be 2, and if the alignment scheme indication information indicates 0, the vertical phase difference with respect to the luma sample is determined to be 0.
- If the color component index indicates 1 or 2, cross_layer_phase_alignment_flag + 1 is calculated to determine the vertical phase difference with respect to the chroma sample. If the alignment scheme indication information indicates 1, the vertical phase difference with respect to the chroma sample is determined to be 2, and if the alignment scheme indication information indicates 0, the vertical phase difference with respect to the chroma sample is determined to be 1.
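- Collecting the conditions above, Equations 1 and 2 (which are not reproduced in this text and are therefore reconstructed from the conditions described) can be sketched as the following function; the Python names mirror the syntax elements quoted in the text:

```python
def derive_phase(c_idx, cross_layer_phase_alignment_flag,
                 vert_phase_position_adjust_flag, vert_phase_position_flag):
    # Equation 1 (horizontal phase difference): luma (c_idx == 0) uses the
    # alignment flag shifted left by one (0 or 2); chroma uses the flag itself.
    if c_idx == 0:
        phase_x = cross_layer_phase_alignment_flag << 1
    else:
        phase_x = cross_layer_phase_alignment_flag

    # Equation 2 (vertical phase difference): when the current layer is
    # progressive and the reference layer is interlaced (adjust flag set),
    # the field flag shifted left by two gives 4 for an even field and 0 for
    # an odd field; otherwise the alignment flag decides, with chroma using
    # the flag plus one.
    if vert_phase_position_adjust_flag:
        phase_y = vert_phase_position_flag << 2
    elif c_idx == 0:
        phase_y = cross_layer_phase_alignment_flag << 1
    else:
        phase_y = cross_layer_phase_alignment_flag + 1
    return phase_x, phase_y

# Example: symmetric alignment, both layers progressive.
print(derive_phase(0, 1, 0, 0))  # luma   -> (2, 2)
print(derive_phase(1, 1, 0, 0))  # chroma -> (1, 2)
```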
- the reception extractor 110 may obtain reference layer size information, reference layer offset information, current layer size information, and current layer offset information from the bitstream.
- the reference region refers to a region used for inter-layer prediction in a reference layer picture.
- the entire area of the reference layer may be determined as the reference area.
- only a part of the reference layer may be determined as the reference region.
- a picture is encoded / decoded based on coding units. Since the minimum coding unit is 8x8, if the resolution of the reference layer and the current layer is not a multiple of 8, upsampling cannot be performed quickly. Therefore, when the resolution of the reference layer is not a multiple of 8, a reference region of which the resolution is a multiple of 8 may be set. Similarly, when the resolution of the current layer is not a multiple of 8, an extended reference region having a resolution of a multiple of 8 may be set.
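- As a small illustration of this constraint (the rounding policy shown is an assumption, not something the patent specifies), a region with dimensions that are multiples of 8 could be chosen as follows:

```python
def largest_region_multiple_of_8(width, height):
    # The minimum coding unit is 8x8, so when a layer's resolution is not a
    # multiple of 8, pick the largest region whose width and height both are.
    return (width // 8) * 8, (height // 8) * 8

print(largest_region_multiple_of_8(1366, 768))  # -> (1360, 768)
```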
- the extended reference region refers to a region in which a picture generated by upsampling the reference region is located.
- Generally, the resolution of the current layer picture is larger than the resolution of the reference layer picture.
- Accordingly, the resolution of the current layer picture is also larger than the resolution of the reference region that is part of the reference layer picture. Therefore, it is difficult to predict a current layer picture having a high resolution directly from a reference region having a low resolution. For this reason, the reference region is upsampled, and the current layer prediction picture is predicted using the extended reference region with increased resolution. Similar to the reference region, the entire area of the current layer may be determined as the extended reference region. However, according to an embodiment, only a part of the current layer may be determined as the extended reference region.
- the reference region is determined by reference layer size information and reference layer offset information.
- the reference layer size information means information on the height and width of the reference layer picture.
- Reference layer offset information means an offset between a reference layer picture and a reference region.
- the reference layer offset information may include a reference layer left offset, a reference layer right offset, a reference layer top offset, and a reference layer bottom offset.
- the reference layer left offset is a horizontal offset between the luma sample at the top left of the reference layer picture and the luma sample at the top left of the reference region.
- the reference layer top offset is the vertical offset of the luma sample at the top left of the reference layer picture and the luma sample at the top left of the reference region.
- the reference layer right offset is the horizontal offset of the luma sample at the bottom right of the reference layer picture and the luma sample at the bottom right of the reference area.
- the reference layer bottom offset is the vertical offset between the luma sample at the bottom right of the reference layer picture and the luma sample at the bottom right of the reference region.
- the extended reference region is determined by the current layer size information and the current layer offset information.
- the current layer size information means information on the height and width of the current layer picture.
- the current layer offset information means an offset between the current layer picture and the extended reference region.
- the current layer offset information may include a current layer left offset, a current layer right offset, a current layer top offset, and a current layer bottom offset.
- the current layer left offset is a horizontal offset between the luma sample at the top left of the current layer picture and the luma sample at the top left of the extended reference region.
- the current layer top offset is the vertical offset of the luma sample at the top left of the current layer picture and the luma sample at the top left of the extended reference region.
- the current layer right offset is the horizontal offset between the luma sample at the bottom right of the current layer picture and the luma sample at the bottom right of the extended reference region.
- the current layer bottom offset is the vertical offset between the luma sample at the bottom right of the current layer picture and the luma sample at the bottom right of the extended reference region.
- the reference layer offset information and the current layer offset information may be expressed in luma sample units. For example, if the reference layer left offset is 4 and the reference layer top offset is 2, the luma sample located four luma samples to the right of and two luma samples below the upper-left luma sample of the reference layer picture becomes the upper-left luma sample of the reference region.
- In the above description, the reference layer offset information and the current layer offset information are expressed in luma sample units.
- However, according to an embodiment, the reference layer offset information and the current layer offset information may be expressed in chroma sample units.
- When the color format is 4:2:0, the vertical offsets and the horizontal offsets included in the reference layer offset information and the current layer offset information, when expressed in luma sample units, may be twice the corresponding values expressed in chroma sample units.
- When the color format is 4:4:4, the luma samples and the chroma samples match one to one. Therefore, all offsets of the reference layer offset information and the current layer offset information have the same value whether expressed in luma sample units or in chroma sample units.
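- A minimal sketch of this unit relationship (the function name and the color-format strings are illustrative assumptions):

```python
def offset_in_luma_units(offset_in_chroma_units, color_format):
    # In 4:2:0 one chroma sample spans two luma samples in each direction,
    # so an offset signalled in chroma sample units doubles when expressed in
    # luma sample units. In 4:4:4 luma and chroma samples match one to one,
    # so the value is unchanged.
    return offset_in_chroma_units * 2 if color_format == "4:2:0" else offset_in_chroma_units

print(offset_in_luma_units(2, "4:2:0"))  # -> 4
print(offset_in_luma_units(2, "4:4:4"))  # -> 2
```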
- a method of determining the reference region and the extended reference region from the reference layer size information, the reference layer offset information, the current layer size information, and the current layer offset information in the decoder 120 to be described later will be described.
- the reception extractor 110 may obtain residual data used for reconstruction of the current layer from the bitstream.
- the residual data includes difference values between the sample values of the prediction picture of the current layer, generated by upsampling in the encoding process, and the sample values of the original image of the current layer.
- the decoder 120 to be described later reconstructs the current layer by using the residual data and the prediction image of the current layer generated by upsampling the reference layer.
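- A minimal sketch of this reconstruction step, assuming 8-bit samples and equally shaped NumPy arrays (the patent does not prescribe this implementation):

```python
import numpy as np

def reconstruct(prediction, residual, bit_depth=8):
    # Add the signalled residual to the inter-layer prediction picture and
    # clip the result back into the valid sample value range.
    samples = prediction.astype(np.int32) + residual.astype(np.int32)
    return np.clip(samples, 0, (1 << bit_depth) - 1).astype(np.uint16)
```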
- the decoder 120 upsamples the reference layer based on the information obtained by the reception extractor 110.
- the decoder 120 predicts the current layer using the upsampled reference layer.
- the decoder 120 may determine the size of the reference region from the reference layer size information and the reference layer offset information. For example, the decoder 120 may determine the height of the reference region by subtracting the reference layer top offset and the reference layer bottom offset from the height of the reference layer. The decoder 120 may determine the width of the reference region by subtracting the reference layer right offset and the reference layer left offset from the width of the reference layer.
- the decoder 120 may determine the size of the extended reference region from the current layer size information and the current layer offset information. For example, the decoder 120 may determine the height of the extended reference region by subtracting the current layer top offset and the current layer bottom offset from the height of the current layer. The decoder 120 may determine the width of the extended reference region by subtracting the current layer right offset and the current layer left offset from the width of the current layer.
- When all offset values of the reference layer offset information indicate 0, the decoder 120 may determine the entire region of the reference layer as the reference region. Similarly, when all offset values of the current layer offset information indicate 0, the decoder 120 may determine the entire area of the current layer as the extended reference region.
- the decoder 120 may determine an accumulation ratio indicating a ratio of the size between the reference area and the extended reference area according to the size of the reference area and the size of the extended reference area.
- the accumulation ratio represents the ratio of the size between the reference region and the extended reference region.
- the accumulation ratio may include a horizontal accumulation ratio representing the ratio of the width of the reference region to the width of the extended reference region, and a vertical accumulation ratio representing the ratio of the height of the reference region to the height of the extended reference region. For example, when both the vertical accumulation ratio and the horizontal accumulation ratio are 1:2 and the number of luma samples of the reference region is 16x16, the number of luma samples of the extended reference region may be 32x32.
- the decoder 120 determines the accumulation ratio from the size of the reference region and the size of the extended reference region.
- the decoder 120 may determine the horizontal accumulation ratio by comparing the width of the reference region with the width of the extended reference region.
- the decoder 120 may determine the vertical accumulation ratio by comparing the height of the reference region with the height of the extended reference region.
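- The size and ratio derivation described above can be sketched as follows; the function and variable names are illustrative, not the patent's notation:

```python
def derive_scale_ratios(ref_size, ref_offsets, cur_size, cur_offsets):
    # ref_size / cur_size: (width, height) of the reference and current layer
    # pictures; offsets: (left, top, right, bottom) in luma samples.
    ref_region_w = ref_size[0] - ref_offsets[0] - ref_offsets[2]
    ref_region_h = ref_size[1] - ref_offsets[1] - ref_offsets[3]
    ext_region_w = cur_size[0] - cur_offsets[0] - cur_offsets[2]
    ext_region_h = cur_size[1] - cur_offsets[1] - cur_offsets[3]
    # Accumulation ratios: extended reference region size over reference
    # region size, horizontally and vertically.
    return ext_region_w / ref_region_w, ext_region_h / ref_region_h

# Example from the text: a 16x16 reference region upsampled into a 32x32
# extended reference region gives a scaling factor of 2 (a 1:2 ratio) in
# both directions.
print(derive_scale_ratios((16, 16), (0, 0, 0, 0), (32, 32), (0, 0, 0, 0)))
```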
- Referring to FIG. 4B, a method of determining the reference region, the extended reference region, and the accumulation ratio among the functions of the decoder 120 will be described in detail.
- FIG. 4B shows a current layer picture 410 and a reference layer picture 430.
- An extended reference region 420 is defined in the current layer picture 410, and a reference region 440 is defined in the reference layer picture 430.
- the width 422a and the height 422b of the extended reference region 420 may be determined based on the width 412a and the height 412b of the current layer picture 410 and the current layer offset information 414a, 414b, 414c, and 414d.
- the current layer offset information 414a, 414b, 414c, and 414d may include a current layer left offset 414a, a current layer top offset 414b, a current layer right offset 414c, and a current layer bottom offset 414d.
- the width 422a of the extended reference region 420 may be determined by subtracting the current layer left offset 414a and the current layer right offset 414c from the width 412a of the current layer picture 410.
- the height 422b of the extended reference region 420 may be determined by subtracting the current layer top offset 414b and the current layer bottom offset 414d from the height 412b of the current layer picture 410.
- the width 442a and the height 442b of the reference region 440 may be determined based on the width 432a and the height 432b of the reference layer picture 430 and the reference layer offset information 434a, 434b, 434c, and 434d.
- the reference layer offset information 434a, 434b, 434c, and 434d may include a reference layer left offset 434a, a reference layer top offset 434b, a reference layer right offset 434c, and a reference layer bottom offset 434d.
- the width 442a of the reference region 440 may be determined by subtracting the reference layer left offset 434a and the reference layer right offset 434c from the width 432a of the reference layer picture 430.
- the height 442b of the reference region 440 may be determined by subtracting the reference layer upper offset 434b and the reference layer lower offset 434d from the height 432b of the reference layer picture 430.
- the horizontal accumulation ratio may be determined by comparing the width 442a of the reference region 440 with the width 422a of the extended reference region 420. In more detail, a value obtained by dividing the width 422a of the extended reference region 420 by the width 442a of the reference region 440 may be determined as the horizontal accumulation ratio.
- the vertical accumulation ratio may be determined by comparing the height 442b of the reference region 440 with the height 422b of the extended reference region 420. In more detail, a value obtained by dividing the height 422b of the extended reference region 420 by the height 442b of the reference region 440 may be determined as the vertical accumulation ratio.
- the decoder 120 may determine the predictive picture of the extended reference region by upsampling the reference region according to the reference layer offset information, the current layer offset information, the horizontal accumulation ratio, and the vertical accumulation ratio.
- upsampling of a reference layer is interpreted as the same as upsampling of a reference region.
- the decoder 120 may adjust the phases of the luma samples and the chroma samples included in the prediction picture of the current layer. In addition, the decoder 120 may adjust the phases of the luma samples and the chroma samples included in the prediction picture of the extended reference region, which is determined during upsampling using the reference layer offset information, the current layer offset information, the horizontal accumulation ratio, and the vertical accumulation ratio.
- the decoder 120 determines the sample values of the samples of the extended reference region by interpolation based on the sample values of the reference region. In the interpolation process, the phases of the samples of the extended reference region, the filter coefficients of the interpolation filter set, and the sample values of the reference region are used.
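- A simplified one-dimensional sketch of this interpolation (an assumption-laden stand-in, not the normative resampling filter: the 1/16-phase mapping, the sign of the phase adjustment, and the two-tap weights are all illustrative):

```python
def upsample_row(ref_row, ext_width, horizontal_ratio, phase_x_16):
    # Map each extended-region sample to a reference position in 1/16-sample
    # units, shift it by the signalled horizontal phase difference expressed
    # in the same units, then interpolate between the two nearest reference
    # samples (a two-tap filter stands in for the codec's longer filter).
    out = []
    for x in range(ext_width):
        pos16 = int(x * 16 / horizontal_ratio) - phase_x_16
        pos16 = max(0, min(pos16, (len(ref_row) - 1) * 16))
        index, frac = divmod(pos16, 16)
        left = ref_row[index]
        right = ref_row[min(index + 1, len(ref_row) - 1)]
        out.append(((16 - frac) * left + frac * right + 8) >> 4)
    return out
```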
- the decoder 120 may determine the prediction values of the region not included in the extended reference region in the current layer.
- the sample values may be determined based on sample values of samples included in the predictive picture of the extended reference region.
- the decoder 120 may determine the prediction values of the region of the current layer that is not included in the extended reference region from the sample values of the samples of the extended reference region, using methods such as padding, cropping, upsampling, and downsampling. Therefore, the prediction picture of the current layer is determined using the prediction picture of the extended reference region and the prediction values of the region not included in the extended reference region.
- the decoder 120 may reconstruct the current layer by using the prediction picture and the residual data acquired by the reception extractor 110.
- reception extractor 110 and the decoder 120 described above is described in detail with reference to FIGS. 5A to 5D.
- FIG. 1B is a flowchart of a scalable video encoding method 10 performed by the scalable video encoding apparatus 100 according to an embodiment.
- In step 11, in the process of determining the sample values of the samples included in the current layer based on the sample values of the samples of the reference layer, upsampling phase set information indicating whether the phases of the samples included in the current layer are adjusted is obtained from the bitstream.
- In step 12, when the upsampling phase set information indicates that the phase is adjusted, the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference are obtained from the bitstream.
- In addition, reference layer size information indicating the height and width of the reference layer, reference layer offset information for defining the reference region used for inter-layer prediction from the reference layer, current layer size information indicating the height and width of the current layer, and current layer offset information for defining an extended reference region corresponding to the reference region from the current layer may be obtained from the bitstream.
- In addition, residual data including differences between sample values included in the current layer and sample values included in the reference picture of the current layer may be obtained from the bitstream.
- Steps 11 and 12 are performed by the reception extractor 110.
- In step 13, the predictive picture of the current layer is determined by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference.
- In step 13, the size of the reference region is determined from the reference layer size information and the reference layer offset information, and the size of the extended reference region is determined from the current layer size information and the current layer offset information. According to the size of the reference region and the size of the extended reference region, the accumulation ratio representing the ratio of the sizes of the reference region and the extended reference region may be determined.
- the determined reference layer offset information, current layer offset information, and accumulation ratio may be used for upsampling the reference layer.
- the current layer picture may be reconstructed using the residual data and the predictive picture.
- Step 13 is performed by the decoder 120.
- FIG. 2A is a block diagram of the scalable video encoding apparatus 200 according to an embodiment.
- the scalable video encoding apparatus 200 may include an encoder 210 and an output unit 220.
- the encoder 210 and the output unit 220 are represented by separate structural units. However, according to an embodiment, the encoder 210 and the output unit 220 may be combined to be implemented in the same structural unit.
- Although the encoder 210 and the output unit 220 are expressed as structural units located in one device, the devices in charge of the respective functions of the encoder 210 and the output unit 220 need not be physically adjacent to each other. Therefore, in some embodiments, the encoder 210 and the output unit 220 may be distributed.
- the encoder 210 and the output unit 220 of FIG. 2A may be implemented by one processor according to an exemplary embodiment. In some embodiments, the present invention may also be implemented by a plurality of processors.
- the scalable video encoding apparatus 200 may include storage (not shown) for storing data generated by the encoder 210 and the output unit 220.
- the encoder 210 and the output unit 220 may extract and use data stored in a storage (not shown).
- the scalable video encoding apparatus 200 of FIG. 2A is not limited to a physical apparatus.
- some of the functions of the scalable video encoding apparatus 200 may be implemented in software instead of hardware.
- the encoder 210 encodes the original image input to the scalable video encoding apparatus 200.
- an original image is input to the current layer, and an image obtained by downsampling the original image is input to the reference layer.
- the reference layer and the current layer are encoded.
- the encoder 210 determines the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference based on the phase difference of the chroma samples relative to the luma samples that is produced when the original image is downsampled.
- the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference are generally determined as 1 and 0, respectively.
- the vertical luma-chroma phase difference and the horizontal luma-chroma phase difference may be determined as different values according to embodiments.
- the encoder 210 may adjust the phases of the samples included in the prediction picture of the current layer according to the alignment method used when downsampling the original image.
- the encoder 210 may adjust the phases of the samples included in the prediction picture of the current layer, according to a scanning scheme. If interlaced scanning is used, the phase of the samples included in the prediction picture of the current layer may be adjusted when the downsampled image is an even field.
- the encoder 210 may determine a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference according to Formulas 1 and 2 described above with reference to FIG. 1.
- the encoder 210 determines a reference region to be used for inter-layer prediction of the current layer from the reference layer, and generates an extended reference region by upsampling the reference region.
- the encoder 210 may encode the reference layer independently of the current layer. In addition, the encoder 210 may encode the reference layer according to a method of encoding a single layer picture based on a tree structure.
- the encoder 210 may encode the current layer picture by using the reference region. According to an embodiment, the encoder 210 may encode the current layer independently of the reference region without using the reference region.
- Interlayer prediction of the encoder 210 will be described in more detail with reference to FIGS. 6A and 6B. Coding based on the tree structure will be described in more detail with reference to FIGS. 8 to 17.
- the encoder 210 may determine reference layer size information and reference layer offset information from the reference layer and the reference region.
- the encoder 210 may determine the current layer size information and the current layer offset information from the current layer and the extended reference region.
- the encoder 210 may upsample the reference layer based on the vertical luma-chroma phase difference, the horizontal luma-chroma phase difference, alignment scheme indication information, scan scheme indication information, and information about the reference region and the extended reference region.
- the encoder 210 may generate residual data by comparing the current layer with the prediction picture of the current layer generated by upsampling the reference layer.
- the output unit 220 transmits a bitstream including the reference layer size information, reference layer offset information, current layer size information, current layer offset information, luma vertical phase difference, luma horizontal phase difference, chroma vertical phase difference, chroma horizontal phase difference, and residual data determined by the encoder 210.
- FIG. 2B is a flowchart of a scalable video encoding method 20 performed by the scalable video encoding apparatus 200 according to an embodiment.
- In step 21, scanning scheme information indicating the scanning schemes of the current layer and the reference layer is determined.
- In step 22, when the scanning scheme information indicates that the current layer is scanned by the progressive scan method and the reference layer is scanned by the interlaced scan method, the field of the reference layer is determined.
- In step 23, a horizontal phase difference and a vertical phase difference for determining the phases of the luma samples and chroma samples included in the prediction picture of the current layer are determined based on the scanning scheme information and the field of the reference layer.
- In step 24, the predictive picture of the current layer is determined by upsampling the reference layer based on the horizontal phase difference and the vertical phase difference.
- In step 25, residual data including difference values between the sample values of the current layer and the sample values of the prediction picture of the current layer is determined.
- Steps 21 to 25 may be performed by the encoder 210.
- In step 26, a bitstream including the horizontal phase difference, the vertical phase difference, and the residual data is output.
- Step 26 is performed by the output unit 220.
- FIG. 5 is a diagram illustrating syntax for describing a process of acquiring encoding information, according to an embodiment.
- calculation equations for explaining a method of upsampling using encoding information are provided together.
- the picture parameter set includes information commonly applied to slice segments included in a picture.
- the syntax for the parameters related to upsampling in FIG. 5 is shown below.
- phase_hor_luma [ref_loc_offset_layer_id [i]]
- phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]]
- phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]]
- num_ref_loc_offsets means the maximum value of the number of upsampling information sets.
- an image may be encoded in two or more layers.
- the upsampling information set includes reference layer offset information, current layer offset information, and phase information necessary for the upsampling process. If n layers exist, n-1 upsampling may occur, so the maximum value of num_ref_loc_offsets becomes n-1. Therefore, when the video is encoded into n layers, the scalable video decoding apparatus 100 determines the number of upsampling information sets by parsing num_ref_loc_offsets.
- the scalable video decoding apparatus 100 obtains reference layer offset information, current layer offset information, and phase information for the layer corresponding to i, for i = 0, 1, ..., (num_ref_loc_offsets - 1).
- ref_loc_offset_layer_id [i] represents an identification number of the i-th upsampling information set. For example, if an image is encoded into four layers from the first layer, which is the lowest layer, to the fourth layer, which is the highest layer, num_ref_loc_offsets represents 3, ref_loc_offset_layer_id [0] indicates the upsampling information set between the first layer and the second layer, and ref_loc_offset_layer_id [2] indicates the upsampling information set between the third layer and the fourth layer.
- the scalable video decoding apparatus 100 may obtain current layer offset information from the bitstream, and the video decoding apparatus 100 may determine the extended reference region of the current layer from the current layer offset information. Related syntax and calculations are described below.
- scaled_ref_layer_offset_present_flag [i] indicates whether the current layer offset information is included in the i-th upsampling information set. If scaled_ref_layer_offset_present_flag [i] indicates 1, the current layer offset information is included in the i-th upsampling information set; if it indicates 0, the current layer offset information is not included in the i-th upsampling information set. If scaled_ref_layer_offset_present_flag [i] does not exist, the value of scaled_ref_layer_offset_present_flag [i] is considered to be zero.
- When scaled_ref_layer_offset_present_flag [i] indicates 1, the scalable video decoding apparatus 100 may obtain scaled_ref_layer_left_offset [ref_loc_offset_layer_id [i]], scaled_ref_layer_top_offset [ref_loc_offset_layer_id [i]], scaled_ref_layer_right_offset [ref_loc_offset_layer_id [i]], and scaled_ref_layer_bottom_offset [ref_loc_offset_layer_id [i]] from the bitstream.
- scaled_ref_layer_left_offset [ref_loc_offset_layer_id [i]] represents the current layer left offset corresponding to ref_loc_offset_layer_id [i]. If scaled_ref_layer_left_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines scaled_ref_layer_left_offset [ref_loc_offset_layer_id [i]] to be zero.
- scaled_ref_layer_top_offset [ref_loc_offset_layer_id [i]] indicates the current layer top offset corresponding to ref_loc_offset_layer_id [i]. If scaled_ref_layer_top_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines scaled_ref_layer_top_offset [ref_loc_offset_layer_id [i]] to be zero.
- scaled_ref_layer_right_offset [ref_loc_offset_layer_id [i]] indicates the current layer right offset corresponding to ref_loc_offset_layer_id [i]. If scaled_ref_layer_right_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines scaled_ref_layer_right_offset [ref_loc_offset_layer_id [i]] as 0.
- scaled_ref_layer_bottom_offset [ref_loc_offset_layer_id [i]] represents the current layer bottom offset corresponding to ref_loc_offset_layer_id [i]. If scaled_ref_layer_bottom_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines scaled_ref_layer_bottom_offset [ref_loc_offset_layer_id [i]] as 0.
- If the offsets are expressed in chroma sample units, the scalable video decoding apparatus 100 may convert the offsets into luma sample units according to the color format.
- the scalable video decoding apparatus 100 may determine the height and width of the extended reference region using the obtained scaled_ref_layer_left_offset [ref_loc_offset_layer_id [i]], scaled_ref_layer_top_offset [ref_loc_offset_layer_id [i]], scaled_ref_layer_right_offset [ref_loc_offset_layer_id [i]], and scaled_ref_layer_bottom_offset [ref_loc_offset_layer_id [i]].
- the height and width of the extended reference region can be determined according to Equations 3 and 4. In Equations 3 and 4, the height and width of the extended reference region are determined based on luma samples.
- ScaledRefLayerRegionWidthInSamplesY = PicWidthInSamplesCurrY - ScaledRefLayerRegionLeftOffset - ScaledRefLayerRegionRightOffset [Equation 3]
- ScaledRefLayerRegionHeightInSamplesY = PicHeightInSamplesCurrY - ScaledRefLayerRegionTopOffset - ScaledRefLayerRegionBottomOffset [Equation 4]
- ScaledRefLayerRegionWidthInSamplesY represents the width of the extended reference area
- ScaledRefLayerRegionHeightInSamplesY represents the height of the extended reference area.
- PicWidthInSamplesCurrY represents the width of the current layer
- PicHeightInSamplesCurrY represents the height of the current layer.
- ScaledRefLayerRegionLeftOffset, ScaledRefLayerRegionTopOffset, ScaledRefLayerRegionRightOffset, and ScaledRefLayerRegionBottomOffset mean the current layer left offset, the current layer top offset, the current layer right offset, and the current layer bottom offset, respectively, obtained by the scalable video decoding apparatus 100 from the bitstream.
- ScaledRefLayerRegionWidthInSamplesY is determined by subtracting ScaledRefLayerRegionLeftOffset and ScaledRefLayerRegionRightOffset from PicWidthInSamplesCurrY.
- ScaledRefLayerRegionHeightInSamplesY is determined by subtracting ScaledRefLayerRegionTopOffset and ScaledRefLayerRegionBottomOffset from PicHeightInSamplesCurrY.
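- Written as code, Equations 3 and 4 reduce to two subtractions. The sketch below is a minimal illustration that treats the variables of the equations as plain integers in luma sample units.

def scaled_ref_layer_region_size(pic_width_curr_y, pic_height_curr_y,
                                 left_off, top_off, right_off, bottom_off):
    """Equations 3 and 4: extended reference region size in luma samples."""
    scaled_width = pic_width_curr_y - left_off - right_off    # Equation 3
    scaled_height = pic_height_curr_y - top_off - bottom_off  # Equation 4
    return scaled_width, scaled_height

# e.g. a 1920x1080 current layer with all offsets equal to zero keeps the
# full picture as the extended reference region: (1920, 1080).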
- the scalable video decoding apparatus 100 may obtain reference layer offset information from the bitstream, and the video decoding apparatus 100 may determine a reference region of the reference layer from the reference layer offset information. Related syntax and calculations are described below.
- ref_region_offset_present_flag [i] indicates whether reference layer offset information is included in the i-th upsampling information set. If ref_region_offset_present_flag [i] indicates 1, the reference layer offset information is included in the i-th upsampling information set, and if 0, the reference layer offset information is not included in the i-th upsampling information set. If ref_region_offset_present_flag [i] does not exist, the value of ref_region_offset_present_flag [i] is considered to be zero.
- When ref_region_offset_present_flag [i] indicates 1, the scalable video decoding apparatus 100 may obtain ref_region_left_offset [ref_loc_offset_layer_id [i]], ref_region_top_offset [ref_loc_offset_layer_id [i]], ref_region_right_offset [ref_loc_offset_layer_id [i]], and ref_region_bottom_offset [ref_loc_offset_layer_id [i]] from the bitstream.
- ref_region_left_offset [ref_loc_offset_layer_id [i]] indicates a reference layer left offset corresponding to ref_loc_offset_layer_id [i]. If ref_region_left_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines that ref_region_left_offset [ref_loc_offset_layer_id [i]] is zero.
- ref_region_top_offset [ref_loc_offset_layer_id [i]] indicates a reference layer top offset corresponding to ref_loc_offset_layer_id [i]. If ref_region_top_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines ref_region_top_offset [ref_loc_offset_layer_id [i]] to be zero.
- ref_region_right_offset [ref_loc_offset_layer_id [i]] indicates a reference layer right offset corresponding to ref_loc_offset_layer_id [i]. If ref_region_right_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines ref_region_right_offset [ref_loc_offset_layer_id [i]] as 0.
- ref_region_bottom_offset [ref_loc_offset_layer_id [i]] indicates a reference layer bottom offset corresponding to ref_loc_offset_layer_id [i]. If ref_region_bottom_offset [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines ref_region_bottom_offset [ref_loc_offset_layer_id [i]] as 0.
- ref_region_left_offset [ref_loc_offset_layer_id [i]], ref_region_top_offset [ref_loc_offset_layer_id [i]], ref_region_right_offset [ref_loc_offset_layer_id [i]], and ref_region_bottom_offset [ref_loc_offset_layer_id [i]] may be expressed in luma sample units or chroma sample units. If the offsets are expressed in chroma sample units, the scalable video decoding apparatus 100 may convert the offsets into luma sample units according to the color format.
- the scalable video decoding apparatus 100 may determine the height and width of the reference region using the obtained ref_region_left_offset [ref_loc_offset_layer_id [i]], ref_region_top_offset [ref_loc_offset_layer_id [i]], ref_region_right_offset [ref_loc_offset_layer_id [i]], and ref_region_bottom_offset [ref_loc_offset_layer_id [i]].
- the height and width of the reference region can be determined according to Equations 5 and 6. In Equations 5 and 6, the height and width of the reference region are determined based on luma samples.
- RefLayerRegionWidthInSamplesY = PicWidthInSamplesRefLayerY - RefLayerRegionLeftOffset - RefLayerRegionRightOffset [Equation 5]
- RefLayerRegionHeightInSamplesY = PicHeightInSamplesRefLayerY - RefLayerRegionTopOffset - RefLayerRegionBottomOffset [Equation 6]
- RefLayerRegionWidthInSamplesY represents the width of the reference region and RefLayerRegionHeightInSamplesY represents the height of the reference region.
- PicWidthInSamplesRefLayerY represents the width of the reference layer and PicHeightInSamplesRefLayerY represents the height of the reference layer.
- RefLayerRegionLeftOffset, RefLayerRegionTopOffset, RefLayerRegionRightOffset, and RefLayerRegionBottomOffset mean the reference layer left offset, the reference layer top offset, the reference layer right offset, and the reference layer bottom offset, respectively, obtained by the scalable video decoding apparatus 100 from the bitstream.
- RefLayerRegionWidthInSamplesY is determined by subtracting RefLayerRegionLeftOffset and RefLayerRegionRightOffset from PicWidthInSamplesRefLayerY according to Equation 5.
- RefLayerRegionHeightInSamplesY is determined by subtracting RefLayerRegionTopOffset and RefLayerRegionBottomOffset from PicHeightInSamplesRefLayerY according to Equation 6.
- the scalable video decoding apparatus 100 may determine the horizontal accumulation ratio and the vertical accumulation ratio by using the height and width of the reference region and the height and width of the extended reference region.
- the horizontal accumulation ratio and the vertical accumulation ratio may be determined by Equations 7 and 8 below. In Equations 7 and 8, the horizontal accumulation ratio and the vertical accumulation ratio are determined based on luma samples.
- SpatialScaleFactorHorY = ( ( RefLayerRegionWidthInSamplesY << 16 ) + ( ScaledRefLayerRegionWidthInSamplesY >> 1 ) ) / ScaledRefLayerRegionWidthInSamplesY [Equation 7]
- SpatialScaleFactorVerY = ( ( RefLayerRegionHeightInSamplesY << 16 ) + ( ScaledRefLayerRegionHeightInSamplesY >> 1 ) ) / ScaledRefLayerRegionHeightInSamplesY [Equation 8]
- SpatialScaleFactorHorY and SpatialScaleFactorVerY indicate the horizontal and vertical accumulation ratios for the luma samples, respectively.
- SpatialScaleFactorHorY is equal to the value obtained by shifting RefLayerRegionWidthInSamplesY to the left by 16 bits and then dividing the result by ScaledRefLayerRegionWidthInSamplesY. Since the samples included in the prediction picture of the current layer must match the coordinate plane of the reference layer, the width of the reference layer is divided by the width of the current layer. RefLayerRegionWidthInSamplesY is multiplied by 2^16, the maximum value of the ratio of the extended reference region to the reference region, so that the horizontal accumulation ratio is always greater than 1.
- SpatialScaleFactorVerY is equal to the value obtained by shifting RefLayerRegionHeightInSamplesY to the left by 16 bits and then dividing the result by ScaledRefLayerRegionHeightInSamplesY. Since the samples included in the prediction picture of the current layer must match the coordinate plane of the reference layer, the height of the reference layer is divided by the height of the current layer. Therefore, RefLayerRegionHeightInSamplesY is multiplied by 2^16, the maximum value of the ratio of the extended reference region to the reference region, so that the vertical accumulation ratio is always greater than 1.
- SpatialScaleFactorHorY divided by 2^16 is the actual horizontal accumulation ratio, and SpatialScaleFactorVerY divided by 2^16 is the actual vertical accumulation ratio.
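- The sketch below restates Equations 5 to 8 as integer arithmetic: the reference region size follows from the reference layer size and offsets, and the accumulation ratios are stored as fixed-point values with 16 fractional bits. The rounding term of half the divisor is an assumption inferred from the truncated '+' in Equations 7 and 8, so the sketch is an illustration rather than the exact claimed formula.

def ref_layer_region_size(pic_width_ref_y, pic_height_ref_y,
                          left_off, top_off, right_off, bottom_off):
    """Equations 5 and 6: reference region size in luma samples."""
    width = pic_width_ref_y - left_off - right_off
    height = pic_height_ref_y - top_off - bottom_off
    return width, height

def spatial_scale_factors(ref_region_w, ref_region_h,
                          scaled_region_w, scaled_region_h):
    """Equations 7 and 8: accumulation ratios in 16-bit fixed point.
    The '+ (scaled >> 1)' rounding term is an assumption (see text)."""
    hor = ((ref_region_w << 16) + (scaled_region_w >> 1)) // scaled_region_w
    ver = ((ref_region_h << 16) + (scaled_region_h >> 1)) // scaled_region_h
    return hor, ver

# Example: a 960x540 reference region upsampled to a 1920x1080 extended
# reference region gives hor == ver == 0x8000, i.e. 0.5 * 2**16, since the
# fixed-point value corresponds to reference size divided by extended size.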
- the scalable video decoding apparatus 100 may obtain information related to a phase for adjusting the phase of samples in the upsampling process.
- resample_phase_set_present_flag [i] indicates whether phase information is included in the i-th upsampling information set. If resample_phase_set_present_flag [i] indicates 1, the phase information is included in the i-th upsampling information set; if it indicates 0, the phase information is not included in the i-th upsampling information set. If resample_phase_set_present_flag [i] does not exist, the value of resample_phase_set_present_flag [i] is considered to be zero.
- When resample_phase_set_present_flag [i] indicates 1, the scalable video decoding apparatus 100 may obtain phase_hor_luma [ref_loc_offset_layer_id [i]], phase_ver_luma [ref_loc_offset_layer_id [i]], phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]], and phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] from the bitstream.
- phase_hor_luma [ref_loc_offset_layer_id [i]] represents the horizontal phase difference of the luma component corresponding to ref_loc_offset_layer_id [i]. For example, when the horizontal phase difference of the luma component corresponding to ref_loc_offset_layer_id [i] is 1, phase_hor_luma [ref_loc_offset_layer_id [i]] represents 1. If phase_hor_luma [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines that phase_hor_luma [ref_loc_offset_layer_id [i]] is zero.
- phase_ver_luma [ref_loc_offset_layer_id [i]] represents the vertical phase difference of the luma component corresponding to ref_loc_offset_layer_id [i]. For example, when the vertical phase difference of the luma component corresponding to ref_loc_offset_layer_id [i] is 2, phase_ver_luma [ref_loc_offset_layer_id [i]] represents 2. If phase_ver_luma [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines that phase_ver_luma [ref_loc_offset_layer_id [i]] is zero.
- phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]] represents a value obtained by adding 8 to the horizontal phase difference of the chroma component corresponding to ref_loc_offset_layer_id [i]. For example, when the horizontal phase difference of the chroma component corresponding to ref_loc_offset_layer_id [i] is 1, phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]] represents 9.
- If phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines that phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]] is zero.
- phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] represents a value obtained by adding 8 to the vertical phase difference of the chroma component corresponding to ref_loc_offset_layer_id [i]. For example, when the vertical phase difference of the chroma component corresponding to ref_loc_offset_layer_id [i] is 1, phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] represents 9. If phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] is not present in the bitstream, the scalable video decoding apparatus 100 determines that phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] is zero.
- The phase information of the i-th upsampling information set thus consists of phase_hor_luma [ref_loc_offset_layer_id [i]], phase_ver_luma [ref_loc_offset_layer_id [i]], phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]], and phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]].
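- To make the structure of the syntax described above concrete, the sketch below walks the loop over num_ref_loc_offsets upsampling information sets and applies the default value 0 whenever a present flag is absent or indicates 0. The bitstream reader object and its read_uint() / read_int() / read_flag() methods are hypothetical placeholders; the actual descriptors of the syntax elements are not restated in this section, so the sketch illustrates only the parsing flow.

def parse_upsampling_info_sets(r):
    """Hedged sketch of the parsing flow for the upsampling information sets.
    `r` is a hypothetical bitstream reader; its methods stand in for the
    actual descriptors of each syntax element."""
    info = {}
    num_ref_loc_offsets = r.read_uint()
    for i in range(num_ref_loc_offsets):
        lid = r.read_uint()                      # ref_loc_offset_layer_id[i]
        entry = {
            # current layer offsets default to 0 when not present
            "scaled_ref_layer_offsets": (0, 0, 0, 0),
            # reference layer offsets default to 0 when not present
            "ref_region_offsets": (0, 0, 0, 0),
            # phase syntax elements default to 0 when not present
            "phase_hor_luma": 0, "phase_ver_luma": 0,
            "phase_hor_chroma_plus8": 0, "phase_ver_chroma_plus8": 0,
        }
        if r.read_flag():                        # scaled_ref_layer_offset_present_flag[i]
            entry["scaled_ref_layer_offsets"] = tuple(r.read_int() for _ in range(4))
        if r.read_flag():                        # ref_region_offset_present_flag[i]
            entry["ref_region_offsets"] = tuple(r.read_int() for _ in range(4))
        if r.read_flag():                        # resample_phase_set_present_flag[i]
            entry["phase_hor_luma"] = r.read_uint()
            entry["phase_ver_luma"] = r.read_uint()
            entry["phase_hor_chroma_plus8"] = r.read_uint()
            entry["phase_ver_chroma_plus8"] = r.read_uint()
        info[lid] = entry
    return info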
- the scalable video decoding apparatus 100 may upsample the reference region by using the previously obtained reference layer offset information, current layer offset information, vertical accumulation ratio, horizontal accumulation ratio, vertical phase difference, and horizontal phase difference.
- In Equations 9 to 20, the upsampling method of the reference region is described.
- In Equations 9 to 12, offset values used for upsampling are defined.
- currOffsetX means the horizontal offset of the current layer, and currOffsetY means the vertical offset of the current layer.
- When the color component index represents a luma sample, currOffsetX has the same value as ScaledRefLayerLeftOffset, the current layer left offset, and currOffsetY has the same value as ScaledRefLayerTopOffset, the current layer top offset.
- When the color component index represents a chroma sample, currOffsetX has the same value as ScaledRefLayerLeftOffset / SubWidthCurrC, and currOffsetY has the same value as ScaledRefLayerTopOffset / SubHeightCurrC.
- refOffsetX means the horizontal offset of the reference layer, and refOffsetY means the vertical offset of the reference layer.
- When the color component index represents a luma sample, refOffsetX has the value obtained by shifting RefLayerRegionLeftOffset, the reference layer left offset, to the left by 4, and refOffsetY has the value obtained by shifting RefLayerRegionTopOffset, the reference layer top offset, to the left by 4.
- When the color component index represents a chroma sample, refOffsetX has the value obtained by shifting RefLayerRegionLeftOffset / SubWidthRefLayerC to the left by 4, and refOffsetY has the value obtained by shifting RefLayerRegionTopOffset / SubHeightRefLayerC to the left by 4.
- SubWidthCurrC, SubHeightCurrC, SubWidthRefLayerC, and SubHeightRefLayerC are values used for upsampling of chroma samples and may have a value of 1 or 2 depending on the color format. For example, if the color format of the reference layer is 4:2:0, SubWidthRefLayerC and SubHeightRefLayerC are both 2. If the color format of the reference layer is 4:2:2, SubWidthRefLayerC is 2 and SubHeightRefLayerC is 1. If the color format of the reference layer is 4:4:4, SubWidthRefLayerC and SubHeightRefLayerC are both 1.
- Likewise, if the color format of the current layer is 4:2:0, SubWidthCurrC and SubHeightCurrC are both 2. If the color format of the current layer is 4:2:2, SubWidthCurrC is 2 and SubHeightCurrC is 1. If the color format of the current layer is 4:4:4, SubWidthCurrC and SubHeightCurrC are both 1.
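- The mapping from color format to the chroma sub-sampling factors described above can be written as a small lookup; a minimal sketch, with the color format given as a string for readability.

def chroma_subsampling_factors(color_format):
    """Return (SubWidthC, SubHeightC) for the color formats discussed above."""
    table = {
        "4:2:0": (2, 2),  # chroma subsampled in both directions
        "4:2:2": (2, 1),  # chroma subsampled horizontally only
        "4:4:4": (1, 1),  # no chroma subsampling
    }
    return table[color_format]

# Usable for either layer, e.g.:
# SubWidthRefLayerC, SubHeightRefLayerC = chroma_subsampling_factors("4:2:0")
# SubWidthCurrC, SubHeightCurrC = chroma_subsampling_factors("4:2:2")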
- In Equations 13 to 16, the phase difference and the accumulation ratio used for upsampling are defined.
- phaseX means the horizontal phase difference. phaseX may have a different value depending on the color component index. When the color component index represents a luma sample, phaseX has the same value as PhaseHorY. When the color component index represents a chroma sample, phaseX has the same value as PhaseHorC.
- phaseY means the vertical phase difference. phaseY may have a different value depending on the color component index. When the color component index represents a luma sample, phaseY has the same value as PhaseVerY. When the color component index represents a chroma sample, phaseY has the same value as PhaseVerC.
- PhaseHorY and PhaseVerY are the same as phase_hor_luma [ref_loc_offset_layer_id [i]] and phase_ver_luma [ref_loc_offset_layer_id [i]], respectively.
- PhaseHorC is the same as phase_hor_chroma_plus8 [ref_loc_offset_layer_id [i]] - 8, and PhaseVerC is the same as phase_ver_chroma_plus8 [ref_loc_offset_layer_id [i]] - 8.
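- Following the relations just listed, the working phase variables can be derived from the parsed syntax elements as below; a minimal sketch that assumes the parsed values are available as plain integers.

def derive_phases(phase_hor_luma, phase_ver_luma,
                  phase_hor_chroma_plus8, phase_ver_chroma_plus8):
    """Luma phases are taken as-is; the chroma phases are the '*_plus8'
    syntax element values minus 8."""
    PhaseHorY = phase_hor_luma
    PhaseVerY = phase_ver_luma
    PhaseHorC = phase_hor_chroma_plus8 - 8
    PhaseVerC = phase_ver_chroma_plus8 - 8
    return PhaseHorY, PhaseVerY, PhaseHorC, PhaseVerC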
- scaleX means the horizontal accumulation ratio. scaleX may have a different value depending on the color component index. When the color component index represents a luma sample, scaleX has the same value as SpatialScaleFactorHorY. When the color component index represents a chroma sample, scaleX has the same value as SpatialScaleFactorHorC.
- scaleY means the vertical accumulation ratio. scaleY may have a different value depending on the color component index. When the color component index represents a luma sample, scaleY has the same value as SpatialScaleFactorVerY. When the color component index represents a chroma sample, scaleY has the same value as SpatialScaleFactorVerC.
- addX and addY represent the change in sample position, according to the accumulation ratio, that results from the phase adjustment during the upsampling process.
- addX is determined as the negative of the product of scaleX and phaseX, and addY is determined as the negative of the product of scaleY and phaseY.
- The products scaleX * phaseX and scaleY * phaseY are rounded at the fourth binary digit and then shifted to the right by four.
- xRef16 and yRef16 are values indicating which position of the reference region the sample of the extended reference region corresponds to.
- xP and yP indicate the position of a sample of the extended reference region. Therefore, by subtracting currOffsetX and currOffsetY from xP and yP, respectively, the position of the sample of the extended reference region excluding the current layer offset is determined.
- The values of xRef16 and yRef16 are then determined by adding refOffsetX and refOffsetY, which represent the offset of the reference region, to the resulting values.
- Since only the lower 12 bits of the intermediate values are removed, values corresponding to 16 times the width and height of the reference region become the maximum values of xRef16 and yRef16. Therefore, xRef16 and yRef16 divided by 16 correspond to the coordinates of the reference region. For example, a sample of the extended reference region whose xRef16 is 96 is matched with the sample of the reference region whose x coordinate is 6. If xRef16 is 88, the sample of the extended reference region is matched with the point at which the x coordinate of the reference region is 5.5. Therefore, the interpolation process is performed with an accuracy of 1/16 of the distance between adjacent samples.
- the samples of the reference region for interpolating the samples of the extended reference region are determined according to the values of xRef16 and yRef16. Of the samples of the reference region, samples around xRef16 and yRef16 are used for interpolation. According to an embodiment, an 8-tap filter using 8 samples is mainly used in the interpolation process. Interpolation is performed in at least one of a vertical direction and a horizontal direction depending on the sample position.
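- The position mapping described in the preceding paragraphs can be sketched as follows. The (1 << 11) rounding constant and the left shift of the reference region offsets by 4 (to express them in 1/16 sample units) are assumptions consistent with the described removal of 12 fractional bits and the 1/16 sample accuracy; the sketch is an illustration, not the claimed formula.

def map_to_reference_position(xP, yP, scaleX, scaleY, phaseX, phaseY,
                              currOffsetX, currOffsetY,
                              refRegionLeftOffset, refRegionTopOffset):
    """Map a sample position (xP, yP) of the extended reference region to a
    1/16-sample-accurate position (xRef16, yRef16) in the reference region.
    The rounding constants and the <<4 on the region offsets are assumptions."""
    # phase adjustment term: -(scale * phase) rounded at the 4th bit, then >>4
    addX = -((scaleX * phaseX + 8) >> 4)
    addY = -((scaleY * phaseY + 8) >> 4)
    # reference region offsets expressed in 1/16 sample units (assumed << 4)
    refOffsetX = refRegionLeftOffset << 4
    refOffsetY = refRegionTopOffset << 4
    # remove the current layer offset, scale, round, drop 12 fractional bits
    xRef16 = (((xP - currOffsetX) * scaleX + addX + (1 << 11)) >> 12) + refOffsetX
    yRef16 = (((yP - currOffsetY) * scaleY + addY + (1 << 11)) >> 12) + refOffsetY
    return xRef16, yRef16

# With zero offsets and phases and scaleX == 0x8000 (fixed-point ratio 0.5),
# xP == 12 maps to xRef16 == 96, i.e. x coordinate 6 of the reference region,
# matching the example given above.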
- FIG. 6A is a block diagram of the scalable video encoding apparatus 600 according to an embodiment.
- the scalable video encoding apparatus 600 may include a downsampler 605, a reference layer encoder 610, an upsampler 650, a current layer encoder 660, and a multiplexer 690.
- the down sampling device 605 receives the current layer picture 602.
- the down sampling apparatus 605 downsamples the input current layer picture 602 to generate a reference layer picture 607.
- the reference layer encoder 610 receives a reference layer picture 607.
- the reference layer encoder 610 encodes the reference layer picture 607.
- the reference layer encoder 610 may encode the reference layer picture 607 according to a single layer encoding method.
- the reference layer encoder 610 may store the reconstructed reference layer picture 607 in storage (not shown) by encoding and decoding the reference layer picture 607 and then decoding the reference layer picture 607 again.
- the reference layer encoder 610 may determine the reference region 651 from the reference layer picture 607.
- the upsampling unit 650 receives the reference region 651 from the reference layer encoder 610.
- the upsampling unit 650 upsamples the reference region 651 to determine the extended reference region 652.
- the current layer encoder 660 receives the current layer picture 602 and the extended reference region 652.
- the current layer encoder 660 may encode the current layer picture 602 according to a single layer encoding scheme.
- the current layer encoder 660 may generate a predictive picture of the current layer picture 602 according to the extended reference region 652, and encode the current layer picture 602.
- the reference layer encoder 610 transmits a bitstream including encoding information of the reference layer picture 607 to the multiplexer 690.
- the current layer encoder 660 transmits the bitstream including the encoding information of the current layer picture 602 to the multiplexer 690.
- the multiplexer 690 combines the bitstreams transmitted from the reference layer encoder 610 and the current layer encoder 660 to generate a scalable bitstream 695.
- the reference layer encoder 610 or the current layer encoder 660 may determine the vertical phase difference and the horizontal phase difference of the sample.
- the multiplexer 690 may output the scalable bitstream 695 including the determined vertical phase difference and the horizontal phase difference.
- the downsampling unit 605, the reference layer encoder 610, the upsampling unit 650, and the current layer encoder 660 correspond to the encoder 210 of FIG. 2A.
- the multiplexer 690 corresponds to the output 220 of FIG. 2A.
- FIG. 6B is a block diagram of the scalable video encoding apparatus 600 according to an embodiment. FIG. 6B illustrates the encoding process of the reference layer encoder 610 and the current layer encoder 660 in more detail.
- the reference layer encoder 610 may split and encode the reference layer picture 607 into a maximum coding unit, a coding unit, a prediction unit, a transformation unit, and the like.
- the intra predictor 622 may determine the optimal coding mode according to the intra mode and the coding depth to predict the reference layer picture 607.
- the motion compensator 624 may predict the reference layer picture 607 with reference to the reference picture list stored in the storage.
- the reference picture list includes reference layer pictures input to the reference layer encoder 610. Residual data may be generated for each prediction unit through intra prediction or inter prediction.
- the transformer / quantizer 630 generates a quantized transform coefficient by frequency transforming and quantizing the residual data.
- the entropy encoder 632 entropy encodes the quantized transform coefficients.
- the entropy coded quantized transform coefficients are transmitted to the multiplexer 690 together with the encoding information generated during the encoding process.
- the inverse transformer / inverse quantizer 634 inverse quantizes and inverse transforms the quantized transform coefficients to restore the residual data.
- the intra predictor 622 or the motion compensator 624 reconstructs the reference layer picture 607 using the residual data and the encoding information.
- the in-loop filter 636 may include at least one of a deblocking filter and a sample adaptive offset (SAO) filter.
- the reconstructed reference layer picture 607 may be stored in the storage 638.
- the reconstructed reference layer picture 607 may be transmitted to the motion compensator 624 and used for prediction of another reference layer picture.
- the reference region 651 of the reference layer picture 607 stored in the storage 638 may be upsampled by the upsampling unit 650.
- the upsampling unit 650 may transmit the extended reference region, obtained by upsampling the reference region 651, to the storage 688 of the current layer encoder 660.
- the motion compensator 624 may generate inter-layer motion prediction information 654 by scaling the motion prediction information used for inter prediction according to the accumulation ratio between the current layer picture and the reference layer picture.
- the motion compensator 624 may transmit inter-layer motion prediction information 654 to the motion compensator 674 of the current layer encoder 660.
- the encoding operation on the reference layer pictures may be repeated.
- the current layer encoder 660 may divide and encode the current layer picture 602 into a maximum coding unit, a coding unit, a prediction unit, a transformation unit, and the like.
- the intra predictor 672 may predict the current layer picture 602 by determining an optimal coding mode according to the intra mode and the coding depth.
- the motion compensator 674 may predict the current layer picture 602 by referring to the reference picture list stored in the storage. In addition, the motion compensator 674 may use the inter-layer motion prediction information 654 generated by the motion compensator 624 of the reference layer encoder 610 for inter prediction.
- the reference picture list includes current layer pictures input to the current layer encoder 660 and an extended reference region 652 upsampled by the upsampling unit 650. Residual data may be generated for each prediction unit through the intra prediction or the inter prediction.
- the transformer / quantizer 680 frequency transforms and quantizes the residual data to generate quantized transform coefficients.
- the entropy encoder 682 entropy encodes the quantized transform coefficients.
- the entropy coded quantized transform coefficients are transmitted to the multiplexer 690 together with the encoding information generated during the encoding process.
- the inverse transformer / inverse quantizer 684 dequantizes and inverse transforms the quantized transform coefficients to restore the residual data.
- the intra predictor 672 or the motion compensator 674 reconstructs the current layer picture 602 using the residual data and the encoding information.
- the in-loop filter 686 may include at least one of a deblocking filter and a sample adaptive offset (SAO) filter.
- the restored current layer picture 602 may be stored in the storage 688.
- the reconstructed current layer picture 602 may be transmitted to the motion compensator 674 and used for prediction of another current layer picture.
- the encoding operation on the current layer pictures may be repeated.
- FIG. 7A illustrates a block diagram of a scalable video decoding apparatus 700, according to an embodiment.
- the scalable video decoding apparatus 700 may include a demultiplexer 705, a reference layer decoder 710, an upsampling unit 750, and a current layer decoder 760.
- the demultiplexer 705 receives the scalable bitstream 702.
- the demultiplexer 705 parses the scalable bitstream 702 to separate it into the bitstream for the current layer picture 797 and the bitstream for the reference layer picture 795.
- the bitstream associated with the current layer picture 797 is transmitted to the current layer decoder 760.
- the bitstream for the reference layer picture 795 is transmitted to the reference layer decoder 710.
- the reference layer decoder 710 decodes the bitstream of the received reference layer picture 795.
- the reference layer decoder 710 may decode the reference layer picture 795 according to a single layer decoding method.
- the reference layer decoder 710 may store the decoded reference layer picture 795 in a storage (not shown).
- the reference layer decoder 710 may determine the reference region 751 from the reference layer picture 795.
- the reference layer picture 795 may be output through the decoding process of the reference layer decoder 710.
- the upsampling unit 750 receives the reference region 751 from the reference layer decoder 710. The upsampling unit 750 upsamples the reference area 751 to determine the extended reference area 752.
- the current layer decoder 760 receives the bitstream for the current layer picture 797 and the extended reference region 752.
- the current layer decoder 760 may decode the current layer picture 797 according to a single layer decoding method.
- the current layer decoder 760 may generate a predictive picture of the current layer picture 797 according to the extended reference region 752, and decode the current layer picture 797.
- the current layer picture 797 may be output through the decoding process of the current layer decoder 760.
- the demultiplexer 705 can obtain the vertical phase difference and the horizontal phase difference from the scalable bitstream 702.
- the upsampling unit 750 may upsample the reference region 751 based on the vertical phase difference and the horizontal phase difference of the samples.
- the demultiplexer 705 corresponds to the reception extractor 110 of FIG. 1A.
- the reference layer decoder 710, the upsampling unit 750, and the current layer decoder 760 correspond to the decoder 120 of FIG. 1A.
- FIG. 7B is a block diagram of the scalable video decoding apparatus 700, according to an embodiment. FIG. 7B illustrates the decoding process of the reference layer decoder 710 and the current layer decoder 760 in more detail.
- the entropy decoder 720 entropy decodes the bitstream for the reference layer picture 795 to generate quantized transform coefficients.
- the inverse transformer / inverse quantizer 722 inversely quantizes and inverse transforms the quantized transform coefficients to restore the residual data.
- the intra predictor 732 may predict the reference layer picture 795 according to the residual data and the encoding information.
- the motion compensator 734 may predict the reference layer picture 795 according to the residual data and the reference picture list stored in the storage 738.
- the reference picture list includes reference layer pictures reconstructed by the reference layer decoder 710.
- the in-loop filter 724 may include at least one of a deblocking filter and a sample adaptive offset (SAO) filter.
- the reconstructed reference layer picture 795 may be stored in the storage 738.
- the reconstructed reference layer picture 795 may be transmitted to the motion compensator 734 and used for prediction of another reference layer picture.
- the reference region 751 of the reference layer picture 795 stored in the storage 738 may be upsampled by the upsampling unit 750.
- the upsampling unit 750 may transmit the extended reference region, obtained by upsampling the reference region 751, to the storage 788 of the current layer decoder 760.
- the motion compensator 734 may generate inter-layer motion prediction information 754 by scaling the motion prediction information used for inter prediction according to the accumulation ratio between the current layer picture and the reference layer picture.
- the motion compensator 734 may transmit the inter-layer motion prediction information 754 to the motion compensator 784 of the current layer decoder 760.
- the decoding operation on the reference layer pictures may be repeated.
- the entropy decoder 770 entropy decodes the bitstream for the current layer picture 797 to generate quantized transform coefficients.
- the inverse transformer / inverse quantizer 772 dequantizes and inverse transforms the quantized transform coefficients to restore the residual data.
- the intra predictor 782 may predict the current layer picture 797 according to the residual data and the encoding information.
- the motion compensator 784 may predict the current layer picture 797 according to the residual data and the reference picture list stored in the storage 788.
- the motion compensator 784 may use inter-layer motion prediction information 754 generated by the motion compensator 734 of the reference layer decoder 710 for inter prediction.
- the reference picture list includes current layer pictures reconstructed by the current layer decoder 760 and the extended reference region 752 upsampled by the upsampling unit 750.
- the in-loop filter 774 may include at least one of a deblocking filter and a sample adaptive offset (SAO) filter.
- the reconstructed current layer picture 797 may be stored in storage 788.
- the reconstructed current layer picture 797 may be transmitted to the motion compensator 784 and used for prediction of another current layer picture.
- the decoding operation on the current layer pictures may be repeated.
- the reference layer picture 795 may be output from the reference layer decoder 710, and the current layer picture 797 may be output from the current layer decoder 760.
- FIGS. 6 and 7 illustrate a scalable video encoding / decoding apparatus including only two layers.
- the encoding / decoding principle shown in FIGS. 6 and 7 may also be applied to a scalable video encoding / decoding apparatus including three or more layers.
- An extended reference region and inter-layer motion prediction information for inter-layer prediction may be generated in the encoding process of the first-layer and second-layer encoders.
- Likewise, an extended reference region and inter-layer motion prediction information for inter-layer prediction may be generated in the encoding process of the second-layer and third-layer encoders.
- the video encoding process and the video decoding process based on coding units having a tree structure, described below with reference to FIGS. 8A to 18, are a video encoding process and a video decoding process for single-layer video, in which inter prediction and motion compensation are described in detail. However, as described above with reference to FIGS. 6A through 7B, inter-layer prediction and compensation between reference layer pictures and current layer pictures are performed for encoding/decoding of a video stream.
- the encoder 810 of the video encoding apparatus 800 may perform encoding for each single-layer video.
- the scalable video encoding apparatus may include as many video encoding apparatuses 800 as the number of layers of the multi-layer video, and may control each video encoding apparatus 800 to encode the single-layer video allocated to it.
- the scalable video encoding apparatus may perform inter-layer prediction by using the encoding results of the separate single layers of each video encoding apparatus 800. Accordingly, the scalable video encoding apparatus may generate a reference layer video stream and a current layer video stream that contain the encoding results for each layer.
- in order for the decoder 870 of the scalable video decoding apparatus 850 to decode the multi-layer video based on coding units having a tree structure, the received reference layer video stream and current layer video stream are decoded.
- the scalable video decoding apparatus 850 of FIG. 14 may include as many video decoding apparatuses as the number of layers of the multi-layer video, and may control each video decoding apparatus to decode the single-layer video allocated to it.
- the scalable video decoding apparatus 850 may perform inter-layer compensation by using the decoding result of the separate single layer of each video decoding apparatus. Accordingly, the scalable video decoding apparatus 850 may generate reconstructed reference layer images and reconstructed current layer images for each layer.
- FIG. 8A is a block diagram of a video encoding apparatus 800 based on coding units having a tree structure, according to various embodiments.
- the video encoding apparatus 800 including video prediction based on coding units having a tree structure includes an encoder 810 and an output unit 820.
- Hereinafter, the video encoding apparatus 800 that performs video prediction based on coding units having a tree structure according to an embodiment is abbreviated as 'video encoding apparatus 800'.
- the encoder 810 may partition the current picture based on a maximum coding unit, which is a coding unit of the maximum size for the current picture of the image. If the current picture is larger than the maximum coding unit, the image data of the current picture may be split into at least one maximum coding unit.
- the maximum coding unit may be a data unit having a size of 32x32, 64x64, 128x128, 256x256, or the like, and may be a square data unit whose horizontal and vertical sizes are powers of two.
- the coding unit according to an embodiment may be characterized by a maximum size and depth.
- the depth indicates the number of times the coding unit is spatially divided from the maximum coding unit, and as the depth increases, the coding unit for each depth may be split from the maximum coding unit to the minimum coding unit.
- the depth of the largest coding unit is the highest depth and the minimum coding unit may be defined as the lowest coding unit.
- As the depth increases from the maximum coding unit, the size of the coding unit for each depth decreases; thus, a coding unit of a higher depth may include a plurality of coding units of lower depths.
- the image data of the current picture may be divided into maximum coding units according to the maximum size of the coding unit, and each maximum coding unit may include coding units divided by depths. Since the maximum coding unit is divided according to depths, image data of a spatial domain included in the maximum coding unit may be hierarchically classified according to depths.
- the maximum depth and the maximum size of the coding unit that limit the total number of times of hierarchically dividing the height and the width of the maximum coding unit may be preset.
- the encoder 810 encodes at least one divided region obtained by dividing the region of the largest coding unit for each depth, and determines a depth at which the final encoding result is output for each of the at least one divided region. That is, the encoder 810 encodes the image data in coding units according to depths for each maximum coding unit of the current picture, and selects a depth at which the smallest coding error occurs to determine the coding depth. The determined coded depth and the image data for each maximum coding unit are output to the output unit 820.
- Image data in the largest coding unit is encoded based on coding units according to depths according to at least one depth less than or equal to the maximum depth, and encoding results based on the coding units for each depth are compared. As a result of comparing the encoding error of the coding units according to depths, a depth having the smallest encoding error may be selected. At least one coding depth may be determined for each maximum coding unit.
- As the depth increases, coding units are divided hierarchically and the number of coding units increases.
- a coding error of each data is measured and it is determined whether to divide into lower depths. Therefore, even in the data included in one largest coding unit, since the encoding error for each depth is different according to the position, the coding depth may be differently determined according to the position. Accordingly, one or more coding depths may be set for one maximum coding unit, and data of the maximum coding unit may be partitioned according to coding units of one or more coding depths.
- the encoder 810 may determine coding units having a tree structure included in the current maximum coding unit.
- the coding units having a tree structure according to an embodiment include coding units having a depth determined as a coding depth among all deeper coding units included in the maximum coding unit.
- the coding unit of the coding depth may be hierarchically determined according to the depth in the same region within the maximum coding unit, and may be independently determined for the other regions.
- the coded depth for the current region may be determined independently of the coded depth for the other region.
- the maximum depth according to an embodiment is an index related to the number of divisions from the maximum coding unit to the minimum coding unit.
- the maximum depth according to an embodiment may represent the total number of splits from the maximum coding unit to the minimum coding unit. For example, when the depth of the largest coding unit is 0, the depth of the coding unit obtained by dividing the largest coding unit once may be set to 1, and the depth of the coding unit divided twice may be set to 2. In this case, if the coding unit divided four times from the maximum coding unit is the minimum coding unit, since depth levels of depths 0, 1, 2, 3, and 4 exist, the maximum depth may be set to 4.
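- The relation between depth and coding unit size described above can be illustrated briefly; a minimal sketch that assumes square coding units whose side length is halved at each split.

def coding_unit_size_at_depth(max_cu_size, depth):
    """Side length of a coding unit at the given depth, assuming the height
    and width are halved at each split from the maximum coding unit."""
    return max_cu_size >> depth

# With a 64x64 maximum coding unit and maximum depth 4, the depth-wise coding
# unit sizes are 64, 32, 16, 8 and 4, where 4x4 is the minimum coding unit.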
- Predictive encoding and transformation of the largest coding unit may be performed. Similarly, prediction encoding and transformation are performed based on depth-wise coding units for each maximum coding unit and for each depth less than or equal to the maximum depth.
- encoding including prediction encoding and transformation should be performed on all the coding units for each depth generated as the depth deepens.
- Hereinafter, the prediction encoding and the transformation will be described based on a coding unit of the current depth in at least one maximum coding unit.
- the video encoding apparatus 800 may variously select a size or shape of a data unit for encoding image data.
- the encoding of the image data is performed through prediction encoding, transforming, entropy encoding, and the like.
- the same data unit may be used in every step, or the data unit may be changed in steps.
- the video encoding apparatus 800 may select not only a coding unit for encoding the image data but also a data unit different from the coding unit in order to perform predictive encoding of the image data in the coding unit.
- prediction encoding may be performed based on a coding unit of a coded depth, that is, a coding unit that is no longer split, according to an embodiment.
- Hereinafter, a coding unit that is no longer split and becomes the basis of prediction encoding is referred to as a 'prediction unit'.
- a partition obtained by dividing the prediction unit may include the prediction unit itself and a data unit obtained by dividing at least one of the height and the width of the prediction unit.
- the partition may be a data unit in which the prediction unit of the coding unit is split, and the prediction unit may be a partition having the same size as the coding unit.
- the partition type may include not only symmetric partitions in which the height or width of the prediction unit is divided in a symmetrical ratio, but also partitions divided in an asymmetrical ratio such as 1:n or n:1, partitions divided in a geometric form, partitions of arbitrary shapes, and the like.
- the prediction mode of the prediction unit may be at least one of an intra mode, an inter mode, and a skip mode.
- the intra mode and the inter mode may be performed on partitions having sizes of 2N ⁇ 2N, 2N ⁇ N, N ⁇ 2N, and N ⁇ N.
- the skip mode may be performed only for partitions having a size of 2N ⁇ 2N.
- the encoding may be performed independently for each prediction unit within the coding unit to select a prediction mode having the smallest encoding error.
- the video encoding apparatus 800 may perform the transformation of the image data of the coding unit based on not only a coding unit for encoding the image data but also a data unit different from the coding unit.
- the transformation may be performed based on a transformation unit having a size smaller than or equal to the coding unit.
- the transformation unit may include a data unit for intra mode and a transformation unit for inter mode.
- the transformation unit in the coding unit may also be recursively divided into smaller transformation units, so that the residual data of the coding unit may be partitioned according to transformation units having a tree structure according to transformation depths.
- a transformation depth indicating the number of times the height and width of the coding unit are divided to reach the transformation unit may be set. For example, if the size of the transformation unit of a current coding unit of size 2Nx2N is 2Nx2N, the transformation depth may be 0; if the size of the transformation unit is NxN, the transformation depth may be 1; and if the size of the transformation unit is N/2xN/2, the transformation depth may be 2. That is, transformation units having a tree structure may also be set according to transformation depths.
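- The transformation depth example just given corresponds to counting how many times the coding unit size must be halved to reach the transformation unit size; a minimal sketch.

def transform_depth(cu_size, tu_size):
    """Number of times the height and width are halved from the coding unit
    to reach the transformation unit (0 when they are equal)."""
    depth = 0
    while tu_size < cu_size:
        tu_size *= 2
        depth += 1
    return depth

# transform_depth(64, 64) == 0, transform_depth(64, 32) == 1, and
# transform_depth(64, 16) == 2, matching the 2Nx2N / NxN / (N/2)x(N/2) example.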
- the encoded information for each coded depth requires not only the coded depth but also prediction related information and transformation related information. Accordingly, the encoder 810 may determine not only the coded depth that generated the minimum encoding error, but also a partition type obtained by dividing a prediction unit into partitions, a prediction mode for each prediction unit, and a size of a transformation unit for transformation.
- a method of determining a coding unit, a prediction unit / partition, and a transformation unit according to a tree structure of a maximum coding unit according to an embodiment will be described later in detail with reference to FIGS. 15 to 24.
- the encoder 810 may measure a coding error of coding units according to depths using a Lagrangian Multiplier-based rate-distortion optimization technique.
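- a minimal sketch of how such a Lagrangian rate-distortion comparison might look (the cost values, candidate list, and lambda below are illustrative assumptions, not the apparatus's actual measurements):

```python
def rd_cost(distortion: float, rate_bits: float, lagrange_multiplier: float) -> float:
    """Classic Lagrangian cost J = D + lambda * R used to compare candidates."""
    return distortion + lagrange_multiplier * rate_bits

# Pick the candidate (e.g. a per-depth coding result) with the smallest cost.
candidates = [
    {"name": "depth0", "distortion": 1500.0, "rate_bits": 120.0},
    {"name": "depth1", "distortion": 1100.0, "rate_bits": 210.0},
]
lam = 6.5  # hypothetical lambda value, typically tied to the quantization parameter
best = min(candidates, key=lambda c: rd_cost(c["distortion"], c["rate_bits"], lam))
print(best["name"])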
- the output unit 820 outputs the image data of the maximum coding unit encoded and the information about the encoding modes according to depths in the form of a bit stream based on the at least one coded depth determined by the encoder 810.
- the encoded image data may be a result of encoding residual data of the image.
- the information about the encoding modes according to depths may include encoding depth information, partition type information of a prediction unit, prediction mode information, size information of a transformation unit, and the like.
- the coded depth information may be defined using split information according to depths, which indicates whether encoding is performed in a coding unit of a lower depth rather than the current depth. If the current depth of the current coding unit is the coded depth, the current coding unit is encoded in the coding unit of the current depth, so the split information of the current depth may be defined so that it is no longer split into a lower depth. On the contrary, if the current depth of the current coding unit is not the coded depth, encoding should be attempted using a coding unit of a lower depth, and thus the split information of the current depth may be defined so that it is split into coding units of the lower depth.
- encoding is then performed on each coding unit split into the coding units of the lower depth. Since at least one coding unit of a lower depth exists within the coding unit of the current depth, encoding is repeatedly performed on each coding unit of each lower depth, and recursive encoding may be performed for each coding unit of the same depth.
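- the recursive per-depth encoding described above can be sketched roughly as follows (a simplified illustration; cost_fn, split_fn, and the toy usage are hypothetical stand-ins for the real rate-distortion measurement and quadtree split):

```python
def encode_recursive(block, depth, max_depth, cost_fn, split_fn):
    """Return (best_cost, split_info_tree) for one coding unit.

    cost_fn(block, depth) -> cost of encoding the block undivided at this depth.
    split_fn(block)       -> the four quadrant sub-blocks of the block.
    split information 0 means 'coded at this depth', 1 means 'split into lower depth'.
    """
    undivided_cost = cost_fn(block, depth)
    if depth == max_depth:
        return undivided_cost, {"split": 0}
    children = [encode_recursive(b, depth + 1, max_depth, cost_fn, split_fn)
                for b in split_fn(block)]
    divided_cost = sum(c for c, _ in children)
    if divided_cost < undivided_cost:
        return divided_cost, {"split": 1, "children": [t for _, t in children]}
    return undivided_cost, {"split": 0}

# Toy usage: each block is a list of sample values; "cost" is a crude spread measure.
def toy_cost(block, depth):
    mean = sum(block) / len(block)
    return sum((s - mean) ** 2 for s in block) + 10 * depth  # crude rate penalty per depth

def toy_split(block):
    quarter = len(block) // 4
    return [block[i * quarter:(i + 1) * quarter] for i in range(4)]

cost, tree = encode_recursive(list(range(16)), depth=0, max_depth=2,
                              cost_fn=toy_cost, split_fn=toy_split)
```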
- since coding units having a tree structure are determined in one maximum coding unit and information about at least one encoding mode should be determined for each coding unit of a coded depth, information about at least one encoding mode may be determined for one maximum coding unit.
- the coding depth may be different for each location, and thus information about the coded depth and the coding mode may be set for the data.
- the output unit 820 may allocate encoding information about a corresponding coded depth and an encoding mode to at least one of a coding unit, a prediction unit, and a minimum unit included in the maximum coding unit.
- the minimum unit according to an embodiment is a square data unit having a size obtained by dividing the minimum coding unit, which is the lowest coding depth, into four divisions.
- the minimum unit according to an embodiment may be a square data unit having a maximum size that may be included in all coding units, prediction units, partition units, and transformation units included in the maximum coding unit.
- the encoding information output through the output unit 820 may be classified into encoding information according to depth coding units and encoding information according to prediction units.
- the encoding information for each coding unit according to depth may include prediction mode information and partition size information.
- the encoding information transmitted for each prediction unit may include information about an estimation direction of the inter mode, information about a reference image index of the inter mode, information about a motion vector, information about a chroma component of the intra mode, information about an interpolation method of the intra mode, and the like.
- Information about the maximum size and information about the maximum depth of a coding unit defined for each picture, slice segment, or GOP may be inserted into a header, a sequence parameter set, or a picture parameter set of a bitstream.
- the information on the maximum size of the transform unit and the minimum size of the transform unit allowed for the current video may also be output through a header, a sequence parameter set, a picture parameter set, or the like of the bitstream.
- the output unit 820 may encode and output reference information, prediction information, and slice segment type information related to prediction.
- the coding units according to depths are coding units whose size is obtained by halving the height and width of the coding unit of the depth one layer above. That is, if the size of the coding unit of the current depth is 2Nx2N, the size of the coding unit of the lower depth is NxN.
- the current coding unit having a size of 2N ⁇ 2N may include up to four lower depth coding units having a size of N ⁇ N.
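- as a small worked illustration of this halving relationship (hypothetical helper names, not part of the embodiment):

```python
def coding_unit_size(max_cu_size: int, depth: int) -> int:
    """Side length of a coding unit at a given depth: each depth halves the
    height and width of the coding unit one depth above it."""
    return max_cu_size >> depth

assert coding_unit_size(64, 0) == 64   # 2Nx2N
assert coding_unit_size(64, 1) == 32   # NxN
assert coding_unit_size(64, 1) ** 2 * 4 == coding_unit_size(64, 0) ** 2  # four NxN cover 2Nx2N
```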
- the video encoding apparatus 800 may determine coding units having an optimal shape and size for each maximum coding unit based on the size and the maximum depth of the maximum coding unit determined in consideration of the characteristics of the current picture, so that coding units having a tree structure may be configured. In addition, since each maximum coding unit may be encoded in various prediction modes and transformation methods, an optimal encoding mode may be determined in consideration of the image characteristics of coding units of various image sizes.
- the video encoding apparatus may adjust the coding unit in consideration of the image characteristics while increasing the maximum size of the coding unit in consideration of the size of the image, thereby increasing image compression efficiency.
- the scalable video encoding apparatus 600 described above with reference to FIG. 6A may include as many video encoding apparatuses 800 as the number of layers for encoding single layer images for each layer of a multi-layer video.
- the reference layer encoder 610 may include one video encoding apparatus 800
- the current layer encoder 660 may include as many video encoding apparatuses 800 as the number of current layers.
- the encoder 810 determines a prediction unit for inter-image prediction for each coding unit having a tree structure for each maximum coding unit, and may perform inter-image prediction for each prediction unit.
- the encoder 810 may determine a coding unit and a prediction unit having a tree structure for each maximum coding unit, and may perform inter prediction for each prediction unit.
- the video encoding apparatus 800 may encode an interlayer prediction error for predicting a current layer image using SAO. Accordingly, the prediction error of the current layer image may be encoded using only information on the SAO type and the offset, based on the sample value distribution of the prediction error, without having to encode the prediction error for each sample position.
- the encoder 810 may perform the functions of the encoder 210 and the encoding information determiner 110 of FIG. 1.
- the output unit 820 may perform a function of the bitstream transmitter 130.
- FIG. 8B is a block diagram of a video decoding apparatus 850 based on coding units having a tree structure, according to various embodiments.
- a video decoding apparatus 850 including video prediction based on coding units having a tree structure includes a receiver 210, image data and encoding information reception extractor 860, and a decoder 870.
- the video decoding apparatus 850 including video prediction based on coding units having a tree structure according to an embodiment is referred to as a short term 'video decoding apparatus 850'.
- Definitions of various terms, such as a coding unit, a depth, a prediction unit, a transformation unit, and information about various encoding modes, for the decoding operation of the video decoding apparatus 850 according to an embodiment are the same as those described above with reference to FIG. 8 and the video encoding apparatus 800.
- the reception extractor 860 receives and parses a bitstream of an encoded video.
- the image data and encoding information reception extractor 860 extracts image data encoded for each coding unit according to the coding units having a tree structure from the parsed bitstream, and outputs the encoded image data to the decoder 870.
- the image data and encoding information reception extractor 860 may extract information about a maximum size of a coding unit of the current picture from a header, a sequence parameter set, or a picture parameter set for the current picture.
- the image data and encoding information reception extractor 860 extracts information about a coded depth and an encoding mode of coding units having a tree structure for each maximum coding unit, from the parsed bitstream.
- the extracted information about the coded depth and the coding mode is output to the decoder 870. That is, the image data of the bit string may be divided into maximum coding units so that the decoder 870 may decode the image data for each maximum coding unit.
- the information about the coded depth and the encoding mode for each maximum coding unit may be set for one or more pieces of coded depth information, and the information about the encoding mode according to coded depths may include partition type information, prediction mode information, transformation unit size information, and the like of the corresponding coding unit.
- split information for each depth may be extracted as the coded depth information.
- the information about the coded depth and the encoding mode for each maximum coding unit extracted by the image data and the encoding information reception extractor 860 may be different according to the depths for each of the maximum coding units, as in the video encoding apparatus 800 according to an exemplary embodiment.
- the image data and encoding information reception extractor 860 may extract the information about the coded depth and the encoding mode for each predetermined data unit. If the information about the coded depth and the encoding mode of the corresponding maximum coding unit is recorded for each predetermined data unit, the predetermined data units having the same information about the coded depth and the encoding mode may be inferred as data units included in the same maximum coding unit.
- the decoder 870 reconstructs the current picture by decoding the image data of each maximum coding unit based on the information about the coded depth and the encoding mode for each maximum coding unit. That is, the decoder 870 may decode the encoded image data based on the read partition type, prediction mode, and transformation unit for each coding unit among the coding units having a tree structure included in the maximum coding unit.
- the decoding process may include a prediction process including intra prediction and motion compensation, and an inverse transform process.
- the decoder 870 may perform intra prediction or motion compensation according to each partition and prediction mode for each coding unit, based on partition type information and prediction mode information of the prediction unit of the coding unit for each coding depth.
- the decoder 870 may read transform unit information according to a tree structure for each coding unit and perform inverse transform based on the transformation unit for each coding unit for inverse transformation for each coding unit. Through inverse transformation, the pixel value of the spatial region of the coding unit may be restored.
- the decoder 870 may determine the coded depth of the current maximum coding unit by using the split information for each depth. If the split information indicates that no further splitting is performed at the current depth, the current depth is the coded depth. Therefore, the decoder 870 may decode the coding unit of the current depth with respect to the image data of the current maximum coding unit by using the partition type, the prediction mode, and the transformation unit size information of the prediction unit.
- in other words, data units holding encoding information including the same split information may be gathered and regarded by the decoder 870 as one data unit to be decoded in the same encoding mode.
- the decoding of the current coding unit may be performed by obtaining information about an encoding mode for each coding unit determined in this way.
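- on the decoding side, following the split information down to the coded depth can be sketched as below (the dictionary-based tree and helper name are purely illustrative assumptions):

```python
def coded_depth(split_info_tree, depth=0):
    """Follow the per-depth split information down the tree: a split flag of 1
    means 'decode at a lower depth'; 0 means the current depth is the coded
    depth at which partition type, prediction mode and transformation unit
    size information apply."""
    if split_info_tree["split"] == 0:
        return depth
    # A real decoder would descend into all four children of the current
    # coding unit; here we just follow one illustrative child.
    return coded_depth(split_info_tree["children"][0], depth + 1)

tree = {"split": 1, "children": [{"split": 0}] * 4}
assert coded_depth(tree) == 1
```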
- the scalable video decoding apparatus 700 described above with reference to FIG. 7A may include as many video decoding apparatuses 850 as the number of viewpoints, in order to decode the received reference layer image stream and the current layer image stream and reconstruct the reference layer images and the current layer images.
- in this case, the decoder 870 of the video decoding apparatus 850 may split the samples of the reference layer images, extracted from the reference layer image stream by the reception extractor 860, into coding units having a tree structure for each maximum coding unit. The decoder 870 may reconstruct the reference layer images by performing motion compensation, for each prediction unit for inter-image prediction, on each coding unit according to the tree structure of the samples of the reference layer images.
- likewise, the decoder 870 of the video decoding apparatus 850 may split the samples of the current layer images, extracted from the current layer image stream by the reception extractor 860, into coding units having a tree structure for each maximum coding unit. The decoder 870 may reconstruct the current layer images by performing motion compensation, for each prediction unit for inter-image prediction, on each coding unit of the samples of the current layer images.
- the reception extractor 860 may acquire the SAO type and the offset from the received current layer bitstream and determine the SAO category according to the distribution of sample values for each sample of the current layer prediction image.
- the offset for each SAO category may thus be obtained. Therefore, even if a prediction error is not received for each sample, the decoder 870 may compensate each sample of the current layer prediction image by the offset of its category, and may determine the current layer reconstruction image by referring to the compensated current layer prediction image.
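- a rough sketch of that per-category offset compensation is shown below; a band-offset-style categorisation over equal value ranges is assumed here purely for illustration, since the embodiment only requires that a category be derived from the sample value distribution:

```python
def compensate_sao(prediction, offsets, num_categories=4, max_value=255):
    """Add, to each sample of the prediction image, the offset of the category
    that its sample value falls into (equal-width value bands are assumed
    here as the categorisation rule)."""
    band_width = (max_value + 1) // num_categories
    compensated = []
    for row in prediction:
        out_row = []
        for sample in row:
            category = min(sample // band_width, num_categories - 1)
            out_row.append(min(max(sample + offsets[category], 0), max_value))
        compensated.append(out_row)
    return compensated

restored = compensate_sao([[10, 200], [90, 130]], offsets=[2, -1, 3, 0])
```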
- the video decoding apparatus 850 may obtain information about a coding unit that generates a minimum coding error by recursively encoding the maximum coding units in the encoding process, and use the same to decode the current picture. That is, decoding of encoded image data of coding units having a tree structure determined as an optimal coding unit for each maximum coding unit can be performed.
- by using the information about the optimal encoding mode transmitted from the encoding end, the image data may be efficiently decoded and reconstructed according to the coding unit size and the encoding mode that are adaptively determined according to the characteristics of the image.
- the reception extractor 860 may perform a function of the encoding information obtainer 210 of FIG. 2.
- the decoder 870 may perform the functions of the accumulation ratio determiner 220 and the upsampling unit 230 of FIG. 2.
- FIG. 9 illustrates a concept of coding units, according to various embodiments.
- a size of a coding unit may be expressed as width x height, and may be 64x64, 32x32, 16x16, or 8x8.
- A coding unit of size 64x64 may be split into partitions of size 64x64, 64x32, 32x64, and 32x32; a coding unit of size 32x32 into partitions of size 32x32, 32x16, 16x32, and 16x16; a coding unit of size 16x16 into partitions of size 16x16, 16x8, 8x16, and 8x8; and a coding unit of size 8x8 into partitions of size 8x8, 8x4, 4x8, and 4x4.
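- a small illustrative helper (hypothetical, not part of the embodiment) that enumerates the symmetric partition sizes listed above for a given coding unit size:

```python
def partition_sizes(cu_size: int):
    """Symmetric partitions of a square coding unit of side cu_size:
    2Nx2N, 2NxN, Nx2N and NxN, returned as (width, height) pairs."""
    n = cu_size // 2
    return [(cu_size, cu_size), (cu_size, n), (n, cu_size), (n, n)]

assert partition_sizes(64) == [(64, 64), (64, 32), (32, 64), (32, 32)]
assert partition_sizes(8) == [(8, 8), (8, 4), (4, 8), (4, 4)]
```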
- regarding the video data 910, the resolution is set to 1920x1080, the maximum size of the coding unit is set to 64, and the maximum depth is set to 2.
- regarding the video data 920, the resolution is set to 1920x1080, the maximum size of the coding unit is set to 64, and the maximum depth is set to 3.
- regarding the video data 930, the resolution is set to 352x288, the maximum size of the coding unit is set to 16, and the maximum depth is set to 1.
- the maximum depth illustrated in FIG. 9 represents the total number of divisions from the maximum coding unit to the minimum coding unit.
- when the resolution is high or the amount of data is large, it is advantageous for the maximum size of the coding unit to be relatively large, not only to improve coding efficiency but also to accurately reflect the image characteristics. Accordingly, the video data 910 and 920 having higher resolution than the video data 930 may be selected to have a maximum coding unit size of 64.
- since the maximum depth of the video data 910 is 2, the coding unit 915 of the video data 910 is split twice from the maximum coding unit having a long axis size of 64, so the depth deepens by two layers and coding units having long axis sizes of 32 and 16 may be included.
- since the maximum depth of the video data 930 is 1, the coding unit 935 of the video data 930 is split once from coding units having a long axis size of 16, so the depth deepens by one layer and coding units having a long axis size of 8 may be included.
- since the maximum depth of the video data 920 is 3, the coding unit 925 of the video data 920 is split three times from the maximum coding unit having a long axis size of 64, so the depth deepens by three layers and coding units having long axis sizes of 32, 16, and 8 may be included. As the depth deepens, the ability to express detailed information may be improved.
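- the long axis sizes quoted above follow directly from the maximum coding unit size and the maximum depth; a minimal sketch (hypothetical helper name) is:

```python
def long_axis_sizes(max_cu_size: int, max_depth: int):
    """Long-axis sizes of the coding units obtained by splitting the maximum
    coding unit max_depth times (the maximum depth is the total number of splits)."""
    return [max_cu_size >> d for d in range(max_depth + 1)]

assert long_axis_sizes(64, 2) == [64, 32, 16]      # e.g. the video data with maximum depth 2
assert long_axis_sizes(64, 3) == [64, 32, 16, 8]
assert long_axis_sizes(16, 1) == [16, 8]
```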
- FIG. 10A is a block diagram of an image encoder 1000 based on coding units, according to various embodiments.
- the image encoder 1000 includes operations performed by the encoder 910 of the video encoding apparatus 900 to encode image data. That is, the intra predictor 1004 performs intra prediction on coding units of the intra mode in the current frame 1002, and the motion estimator 1006 and the motion compensator 1008 perform inter-frame estimation and motion compensation on coding units of the inter mode in the current frame 1002 by using the reference frame 1026.
- Data output from the intra predictor 1004, the motion estimator 1006, and the motion compensator 1008 is output as a quantized transform coefficient through the transform unit 1010 and the quantization unit 1012.
- the quantized transform coefficients are reconstructed into data of the spatial domain through the inverse quantizer 1018 and the inverse transformer 1020, and the reconstructed data of the spatial domain is post-processed through the deblocking unit 1022 and the offset compensator 1024 and output as the reference frame 1026.
- the quantized transform coefficients may be output to the bitstream 1016 via the entropy encoder 1014.
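- the data flow of the image encoder described above can be sketched roughly as follows; all stage functions are caller-supplied placeholders standing in for the units 1004 to 1024, so this is an illustrative outline rather than the actual implementation:

```python
def encode_block(block, predictor, transform, quantize, entropy_code,
                 dequantize, inverse_transform, deblock, sao):
    """Sketch of the data flow: prediction residual -> transform -> quantisation
    -> entropy coding, while the quantised coefficients are also reconstructed
    and post-processed to build the reference used for later prediction.
    Every argument except 'block' is a caller-supplied stage function."""
    prediction = predictor(block)
    residual = [b - p for b, p in zip(block, prediction)]
    coeffs = quantize(transform(residual))
    bits = entropy_code(coeffs)

    # Reconstruction path that feeds the reference frame buffer.
    recon_residual = inverse_transform(dequantize(coeffs))
    reconstruction = [p + r for p, r in zip(prediction, recon_residual)]
    reference = sao(deblock(reconstruction))
    return bits, reference

# Toy usage with identity stages, just to show the plumbing.
identity = lambda x: x
bits, ref = encode_block([10, 12, 9, 11], predictor=lambda b: [10] * len(b),
                         transform=identity, quantize=identity, entropy_code=identity,
                         dequantize=identity, inverse_transform=identity,
                         deblock=identity, sao=identity)
```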
- the components of the image encoder 1000, namely the intra predictor 1004, the motion estimator 1006, the motion compensator 1008, the transformer 1010, the quantizer 1012, the entropy encoder 1014, the inverse quantizer 1018, the inverse transformer 1020, the deblocking unit 1022, and the offset compensator 1024, should all perform operations based on each coding unit among the coding units having a tree structure, in consideration of the maximum depth, for each maximum coding unit.
- in particular, the intra predictor 1004, the motion estimator 1006, and the motion compensator 1008 should determine the partitions and the prediction mode of each coding unit among the coding units having a tree structure in consideration of the maximum size and the maximum depth of the current maximum coding unit, and the transformer 1010 should determine the size of the transformation unit in each coding unit among the coding units having a tree structure.
- FIG. 10B is a block diagram of an image decoder 1050 based on coding units, according to various embodiments.
- the bitstream 1052 is parsed through the parser 1054, and the encoded image data to be decoded and information about encoding necessary for decoding are parsed.
- the encoded image data is output as inverse quantized data through the entropy decoding unit 1056 and the inverse quantization unit 1058, and the image data of the spatial domain is restored through the inverse transformation unit 1060.
- for the prediction decoding, the intra predictor 1062 performs intra prediction on coding units of the intra mode, and the motion compensator 1064 performs motion compensation on coding units of the inter mode by using the reference frame 1070.
- Data in the spatial domain that has passed through the intra predictor 1062 and the motion compensator 1064 may be post-processed through the deblocking unit 1066 and the offset compensator 1068 and output to the reconstructed frame 1072.
- the data post-processed through the deblocking unit 1066 and the offset compensator 1068 may be output as the reference frame 1070.
- in order for the image data to be decoded, step-by-step operations after the parser 1054 of the image decoder 1050 may be performed in sequence.
- in order to be applied to the video decoding apparatus 950, the parser 1054, the entropy decoder 1056, the inverse quantizer 1058, the inverse transformer 1060, the intra predictor 1062, the motion compensator 1064, the deblocking unit 1066, and the offset compensator 1068, which are components of the image decoder 1050, must all perform operations based on coding units having a tree structure for each maximum coding unit.
- in particular, the intra predictor 1062 and the motion compensator 1064 should determine a partition and a prediction mode for each coding unit having a tree structure, and the inverse transformer 1060 should determine the size of a transformation unit for each coding unit.
- the encoding operation of FIG. 10A and the decoding operation of FIG. 10B describe the video stream encoding operation and the decoding operation in a single layer, respectively. Therefore, when the scalable video encoding apparatus 1200 of FIG. 12A encodes a video stream of two or more layers, the scalable video encoding apparatus 1200 may include an image encoder 1000 for each layer. Similarly, if the scalable video decoding apparatus 1250 of FIG. 12B decodes video streams of two or more layers, it may include an image decoder 1050 for each layer.
- FIG. 11 is a diagram illustrating deeper coding units according to depths, and partitions, according to various embodiments.
- the video encoding apparatus 800 and the video decoding apparatus 850 use hierarchical coding units to consider image characteristics.
- the maximum height, width, and maximum depth of the coding unit may be adaptively determined according to the characteristics of the image, and may be variously set according to a user's request. According to the maximum size of the preset coding unit, the size of the coding unit for each depth may be determined.
- the hierarchical structure 1100 of a coding unit illustrates a case in which a maximum height and a width of a coding unit are 64 and a maximum depth is three.
- the maximum depth indicates the total number of divisions from the maximum coding unit to the minimum coding unit. Since the depth deepens along the vertical axis of the hierarchical structure 1100 of the coding unit, the height and the width of the coding unit for each depth are respectively divided.
- in addition, a prediction unit and partitions that are the basis of the prediction encoding of each deeper coding unit are shown along the horizontal axis of the hierarchical structure 1100 of the coding unit.
- the coding unit 1110 has a depth of 0 as a maximum coding unit of the hierarchical structure 1100 of the coding unit, and a size, that is, a height and a width, of the coding unit is 64x64.
- as the depth deepens along the vertical axis, there are a coding unit 1120 of depth 1 having a size of 32x32, a coding unit 1130 of depth 2 having a size of 16x16, and a coding unit 1140 of depth 3 having a size of 8x8.
- a coding unit 1140 of depth 3 having a size of 8 ⁇ 8 is a minimum coding unit.
- Prediction units and partitions of the coding units are arranged along the horizontal axis for each depth. That is, if the coding unit 1110 of size 64x64 having a depth of 0 is a prediction unit, the prediction unit may be split into a partition 1110 of size 64x64, partitions 1112 of size 64x32, partitions 1114 of size 32x64, and partitions 1116 of size 32x32, which are included in the coding unit 1110 of size 64x64.
- likewise, the prediction unit of the coding unit 1120 of size 32x32 having a depth of 1 may be split into a partition 1120 of size 32x32, partitions 1122 of size 32x16, partitions 1124 of size 16x32, and partitions 1126 of size 16x16, which are included in the coding unit 1120 of size 32x32.
- the prediction unit of the coding unit 1130 of size 16x16 having a depth of 2 may be split into a partition 1130 of size 16x16, partitions 1132 of size 16x8, partitions 1134 of size 8x16, and partitions 1136 of size 8x8, which are included in the coding unit 1130 of size 16x16.
- the prediction unit of the coding unit 1140 of size 8x8 having a depth of 3 may be split into a partition 1140 of size 8x8, partitions 1142 of size 8x4, partitions 1144 of size 4x8, and partitions 1146 of size 4x4, which are included in the coding unit 1140 of size 8x8.
- in order to determine the coded depth of the maximum coding unit 1110, the encoder 810 of the video encoding apparatus 100 should perform encoding on each coding unit of each depth included in the maximum coding unit 1110.
- the number of coding units according to depths for covering data of the same range and size increases as the depth deepens. For example, four coding units of depth 2 are required to cover the data included in one coding unit of depth 1. Therefore, in order to compare the encoding results of the same data for each depth, one coding unit of depth 1 and four coding units of depth 2 should each be encoded.
- for each depth, encoding may be performed for each prediction unit of the coding units according to depths along the horizontal axis of the hierarchical structure 1100 of the coding unit, and a representative encoding error, which is the smallest encoding error at the corresponding depth, may be selected.
- in addition, as the depth deepens along the vertical axis of the hierarchical structure 1100 of the coding unit, encoding may be performed for each depth, and a minimum encoding error may be searched for by comparing the representative encoding errors according to depths.
- the depth and the partition in which the minimum coding error occurs in the maximum coding unit 1110 may be selected as the coding depth and the partition type of the maximum coding unit 1110.
- FIG. 12 illustrates a relationship between a coding unit and transformation units, according to various embodiments.
- the video encoding apparatus 800 encodes or decodes an image in coding units having a size smaller than or equal to the maximum coding unit for each maximum coding unit.
- the size of a transformation unit for transformation in the encoding process may be selected based on a data unit that is not larger than each coding unit.
- for example, if the current coding unit 1210 has a size of 64x64, transformation may be performed by using the transformation unit 1220 having a size of 32x32.
- in addition, the data of the coding unit 1210 having a size of 64x64 may be transformed and encoded using each of the transformation units having sizes of 32x32, 16x16, 8x8, and 4x4, which are equal to or smaller than 64x64, and then the transformation unit having the least error with respect to the original may be selected.
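- a minimal sketch of that selection step (the candidate list and the error measure are illustrative assumptions):

```python
def best_transform_unit_size(cu_size, residual, transform_error):
    """Try every square transformation unit size not larger than the coding
    unit (here 64 -> 32 -> 16 -> 8 -> 4 as in the figure) and keep the one
    whose reconstruction error against the original is smallest.
    transform_error(residual, tu_size) is a caller-supplied error measure."""
    candidates = []
    size = cu_size
    while size >= 4:
        candidates.append(size)
        size //= 2
    return min(candidates, key=lambda tu: transform_error(residual, tu))

# Toy usage: pretend the error happens to be smallest for 8x8 transformation units.
toy_errors = {64: 5.0, 32: 4.0, 16: 3.5, 8: 2.0, 4: 2.5}
best = best_transform_unit_size(64, residual=None,
                                transform_error=lambda r, tu: toy_errors[tu])
assert best == 8
```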
- FIG. 13 is a diagram of deeper encoding information according to depths, according to various embodiments.
- the output unit 820 of the video encoding apparatus 100 may encode and transmit, as information about an encoding mode, information 1300 about a partition type, information 1310 about a prediction mode, and information 1320 about a transformation unit size for each coding unit of each coded depth.
- the information 1300 about the partition type indicates the shape of a partition into which the prediction unit of the current coding unit is split, as a data unit for prediction encoding of the current coding unit.
- for example, the current coding unit CU_0 of size 2Nx2N may be split into and used as any one of a partition 1302 of size 2Nx2N, a partition 1304 of size 2NxN, a partition 1306 of size Nx2N, and a partition 1308 of size NxN.
- in this case, the information 1300 about the partition type of the current coding unit is set to indicate one of the partition 1302 of size 2Nx2N, the partition 1304 of size 2NxN, the partition 1306 of size Nx2N, and the partition 1308 of size NxN.
- Information 1310 about the prediction mode indicates the prediction mode of each partition. For example, through the information 1310 about the prediction mode, it may be set whether the partition indicated by the information 1300 about the partition type is prediction-encoded in one of the intra mode 1312, the inter mode 1314, and the skip mode 1316.
- the information 1320 about the size of the transformation unit indicates the transformation unit on which transformation is to be performed for the current coding unit.
- for example, the transformation unit may be one of a first intra transformation unit size 1322, a second intra transformation unit size 1324, a first inter transformation unit size 1326, and a second inter transformation unit size 1328.
- the reception extractor 860 of the video decoding apparatus 850 may extract the information 1300 about the partition type, the information 1310 about the prediction mode, and the information 1320 about the transformation unit size for each deeper coding unit and use them for decoding.
- FIG. 14 is a diagram illustrating deeper coding units according to depths, according to various embodiments.
- Split information may be used to indicate a change of depth.
- the split information indicates whether a coding unit of a current depth is split into coding units of a lower depth.
- the prediction unit 1410 for prediction encoding of the coding unit 1400 having depth 0 and size 2N_0x2N_0 may include a partition type 1412 of size 2N_0x2N_0, a partition type 1414 of size 2N_0xN_0, a partition type 1416 of size N_0x2N_0, and a partition type 1418 of size N_0xN_0. Although only the partitions 1412, 1414, 1416, and 1418 in which the prediction unit is split in symmetric ratios are illustrated, as described above, the partition type is not limited thereto and may include asymmetric partitions, arbitrary-form partitions, geometric partitions, and the like.
- for each partition type, prediction encoding must be performed repeatedly on one partition of size 2N_0x2N_0, two partitions of size 2N_0xN_0, two partitions of size N_0x2N_0, and four partitions of size N_0xN_0.
- for the partitions of size 2N_0x2N_0, size N_0x2N_0, size 2N_0xN_0, and size N_0xN_0, prediction encoding may be performed in the intra mode and the inter mode. The skip mode may be performed only for prediction encoding on the partition of size 2N_0x2N_0.
- the depth 0 is changed to 1 and split (1420), and encoding is iteratively performed on the coding units 1430 of depth 2 and the partition type of size N_0xN_0.
- the depth 1 is changed to depth 2 and split (1450), and encoding is repeatedly performed on the coding units 1460 of depth 2 and size N_2xN_2.
- in each case, the encoding may be performed to search for a minimum encoding error.
- coding units according to depths may be set up to depth d-1, and split information may be set up to depth d-2. That is, when encoding is performed from depth d-2 up to depth d-1, the prediction unit 1490 for prediction encoding of the coding unit 1480 of depth d-1 and size 2N_(d-1)x2N_(d-1) may include a partition type 1492 of size 2N_(d-1)x2N_(d-1), a partition type 1494 of size 2N_(d-1)xN_(d-1), a partition type 1496 of size N_(d-1)x2N_(d-1), and a partition type 1498 of size N_(d-1)xN_(d-1).
- prediction encoding is repeatedly performed on one partition of size 2N_(d-1)x2N_(d-1), two partitions of size 2N_(d-1)xN_(d-1), two partitions of size N_(d-1)x2N_(d-1), and four partitions of size N_(d-1)xN_(d-1) among these partition types, so that a partition type having a minimum encoding error may be searched for.
- since the coding unit CU_(d-1) of depth d-1 is no longer split into lower depths, the coded depth for the current maximum coding unit 1400 may be determined as depth d-1 and the partition type may be determined as N_(d-1)xN_(d-1), without going through a splitting process into lower depths.
- split information is not set for the coding unit 1452 of the depth d-1.
- the data unit 1499 may be referred to as a 'minimum unit' for the current maximum coding unit.
- the minimum unit may be a square data unit having a size obtained by dividing the minimum coding unit, which is the lowest coding depth, into four divisions.
- the video encoding apparatus 100 compares the encoding errors for each depth of the coding unit 1400, selects a depth at which the smallest encoding error occurs, and determines a coding depth.
- the partition type and the prediction mode may be set to the encoding mode of the coded depth.
- the depth with the smallest error can be determined by comparing the minimum coding errors for all depths of depths 0, 1, ..., d-1, d, and can be determined as the coding depth.
- the coded depth, the partition type of the prediction unit, and the prediction mode may be encoded and transmitted as information about an encoding mode.
- since the coding unit must be split from depth 0 to the coded depth, only the split information of the coded depth is set to '0', and the split information for each depth other than the coded depth should be set to '1'.
- the image data and encoding information reception extractor 860 of the video decoding apparatus 850 may extract the information about the coded depth and the prediction unit for the coding unit 1400 and use it to decode the coding unit 1412.
- the video decoding apparatus 850 may identify the depth whose split information is '0' as the coded depth by using the split information according to depths, and may use the information about the encoding mode of the corresponding depth for decoding.
- 15, 16, and 17 illustrate a relationship between a coding unit, a prediction unit, and a transformation unit, according to various embodiments.
- the coding units 1510 are coding units according to coding depths, which are determined by the video encoding apparatus 100 according to an embodiment with respect to the maximum coding unit.
- the prediction units 1560 are partitions of the prediction units of each coding unit according to coded depths among the coding units 1510, and the transformation units 1570 are transformation units of each coding unit according to coded depths.
- among the coding units 1510 according to depths, the maximum coding unit has a depth of 0
- the coding units 1512 have a depth of 1
- the coding units 1514, 1516, 1518, 1528, 1550, and 1552 have a depth of 2.
- the coding units 1520, 1522, 1524, 1526, 1530, 1532, and 1548 have a depth of three
- the coding units 1540, 1542, 1544, and 1546 have a depth of four.
- partitions 1514, 1516, 1522, 1532, 1548, 1550, 1552, and 1554 of the prediction units 1560 have a form in which coding units are split. That is, partitions 1514, 1522, 1550, and 1554 are partition types of 2NxN, partitions 1516, 1548, and 1552 are partition types of Nx2N, and partitions 1532 are partition types of NxN. Prediction units and partitions of the coding units 1510 according to depths are smaller than or equal to each coding unit.
- the image data of the part 1552 of the transformation units 1570 is transformed or inversely transformed in a data unit whose size is smaller than that of the coding unit.
- the transformation units 1514, 1516, 1522, 1532, 1548, 1550, 1552, and 1554 are data units having different sizes or shapes when compared to the corresponding prediction units and partitions among the prediction units 1560. That is, the video encoding apparatus 800 and the video decoding apparatus 850 according to the embodiment may perform the intra prediction / motion estimation / motion compensation operations and the transformation / inverse transformation operations on the same coding unit based on separate data units, respectively.
- accordingly, encoding is recursively performed on each coding unit having a hierarchical structure for each maximum coding unit to determine an optimal coding unit, so that coding units having a recursive tree structure may be configured.
- the encoding information may include split information about a coding unit, partition type information, prediction mode information, and transformation unit size information. Table 1 below shows an example that can be set in the video encoding apparatus 800 and the video decoding apparatus 850 according to an embodiment.
- the output unit 820 of the video encoding apparatus 100 outputs encoding information about coding units having a tree structure, and the image data and encoding information reception extractor 860 of the video decoding apparatus 850 according to an embodiment may extract the encoding information about the coding units having a tree structure from the received bitstream.
- the split information indicates whether the current coding unit is split into coding units of a lower depth. If the split information of the current depth d is 0, the current coding unit is no longer split into lower coding units, so the current depth is a coded depth and partition type information, prediction mode information, and transformation unit size information may be defined for the coded depth. If the split information indicates that the coding unit is to be further split, encoding should be performed independently on each of the four split coding units of the lower depth.
- the prediction mode may be represented by one of an intra mode, an inter mode, and a skip mode.
- Intra mode and inter mode can be defined in all partition types, and skip mode can be defined only in partition type 2Nx2N.
- the partition type information indicates the symmetric partition types 2Nx2N, 2NxN, Nx2N and NxN, in which the height or width of the prediction unit is divided by the symmetrical ratio, and the asymmetric partition types 2NxnU, 2NxnD, nLx2N, nRx2N, which are divided by the asymmetrical ratio.
- the asymmetric partition types 2NxnU and 2NxnD are obtained by splitting the height in ratios of 1:3 and 3:1, respectively, and the asymmetric partition types nLx2N and nRx2N are obtained by splitting the width in ratios of 1:3 and 3:1, respectively.
- the transformation unit size may be set to two kinds of sizes in the intra mode and two kinds of sizes in the inter mode. That is, if the transformation unit split information is 0, the size of the transformation unit is set to 2Nx2N, the size of the current coding unit. If the transformation unit split information is 1, a transformation unit obtained by splitting the current coding unit may be set. In addition, if the partition type of the current coding unit of size 2Nx2N is a symmetric partition type, the size of the transformation unit may be set to NxN, and if it is an asymmetric partition type, to N/2xN/2.
- Encoding information of coding units having a tree structure may be allocated to at least one of a coding unit of a coded depth, a prediction unit, and a minimum unit.
- the coding unit of the coding depth may include at least one prediction unit and at least one minimum unit having the same encoding information.
- accordingly, if the encoding information held by adjacent data units is checked, it may be determined whether the adjacent data units are included in a coding unit of the same coded depth.
- in addition, since the coding unit of the corresponding coded depth may be identified by using the encoding information held by a data unit, the distribution of coded depths within the maximum coding unit may be inferred.
- the encoding information of the data unit in the depth-specific coding unit adjacent to the current coding unit may be directly referred to and used.
- in another embodiment, when prediction encoding is performed on the current coding unit by referring to a neighboring coding unit, data adjacent to the current coding unit may be searched for within the deeper coding units by using the encoding information of the adjacent deeper coding units, and the neighboring coding units found in this way may be referred to.
- FIG. 18 illustrates a relationship between a coding unit, a prediction unit, and a transformation unit, according to encoding mode information of Table 1.
- the maximum coding unit 1800 includes coding units 1802, 1804, 1806, 1812, 1814, 1816, and 1818 of a coded depth. Since one coding unit 1818 is a coding unit of a coded depth, split information may be set to zero.
- the partition type information of the coding unit 1818 having a size of 2Nx2N may be set to one of the partition types 2Nx2N (1822), 2NxN (1824), Nx2N (1826), NxN (1828), 2NxnU (1832), 2NxnD (1834), nLx2N (1836), and nRx2N (1838).
- the transform unit split information (TU size flag) is a type of transform index, and a size of a transform unit corresponding to the transform index may be changed according to a prediction unit type or a partition type of a coding unit.
- when the partition type information is set to one of the symmetric partition types 2Nx2N (1822), 2NxN (1824), Nx2N (1826), and NxN (1828), if the transformation unit split information (TU size flag) is 0, a transformation unit 1842 of size 2Nx2N is set, and if the transformation unit split information is 1, a transformation unit 1844 of size NxN may be set.
- when the partition type information is set to one of the asymmetric partition types 2NxnU (1832), 2NxnD (1834), nLx2N (1836), and nRx2N (1838), if the transformation unit split information (TU size flag) is 0, a transformation unit 1852 of size 2Nx2N is set, and if the transformation unit split information is 1, a transformation unit 1854 of size N/2xN/2 may be set.
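- the two cases above can be summarised in a small illustrative helper (hypothetical names; a square 2Nx2N coding unit is assumed, as in the figure):

```python
def transform_unit_size(cu_size, partition_type, tu_size_flag):
    """Transformation unit side length for a 2Nx2N coding unit: flag 0 keeps
    the coding unit size; flag 1 gives NxN for symmetric partition types and
    N/2xN/2 for asymmetric ones (as in the 1842/1844 and 1852/1854 cases)."""
    symmetric = {"2Nx2N", "2NxN", "Nx2N", "NxN"}
    if tu_size_flag == 0:
        return cu_size
    return cu_size // 2 if partition_type in symmetric else cu_size // 4

assert transform_unit_size(64, "2NxN", 1) == 32    # NxN
assert transform_unit_size(64, "nLx2N", 1) == 16   # N/2 x N/2
assert transform_unit_size(64, "NxN", 0) == 64     # 2Nx2N
```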
- the transformation unit split information (TU size flag) described above with reference to FIG. 12 is a flag having a value of 0 or 1, but the transformation unit split information according to an embodiment is not limited to a 1-bit flag and may increase as 0, 1, 2, 3, and so on according to a setting, so that the transformation unit may be split hierarchically.
- the transformation unit partition information may be used as an embodiment of the transformation index.
- in this case, if the transformation unit split information according to an embodiment is used together with the maximum transformation unit size and the minimum transformation unit size, the size of the transformation unit actually used may be expressed.
- the video encoding apparatus 100 may encode maximum transform unit size information, minimum transform unit size information, and maximum transform unit split information.
- the encoded maximum transform unit size information, minimum transform unit size information, and maximum transform unit split information may be inserted into the SPS.
- the video decoding apparatus 850 may use the maximum transformation unit size information, the minimum transformation unit size information, and the maximum transformation unit split information for video decoding.
- for example, when the maximum transformation unit split information is defined as 'MaxTransformSizeIndex', the minimum transformation unit size is defined as 'MinTransformSize', and the transformation unit size when the transformation unit split information is 0 is defined as 'RootTuSize', the minimum transformation unit size 'CurrMinTuSize' possible in the current coding unit can be defined as in relation (1) below.
- CurrMinTuSize = max(MinTransformSize, RootTuSize / (2^MaxTransformSizeIndex)) ......... (1)
- 'RootTuSize', the transformation unit size when the transformation unit split information is 0, may indicate the maximum transformation unit size that can be adopted in the system. That is, according to relation (1), 'RootTuSize/(2^MaxTransformSizeIndex)' is the transformation unit size obtained by splitting 'RootTuSize', the transformation unit size when the transformation unit split information is 0, the number of times corresponding to the maximum transformation unit split information, and 'MinTransformSize' is the minimum transformation unit size; therefore, the larger of these two values may be the minimum transformation unit size 'CurrMinTuSize' possible in the current coding unit.
- the maximum transform unit size RootTuSize may vary depending on a prediction mode.
- for example, if the current prediction mode is the inter mode, 'RootTuSize' may be determined according to the following relation (2).
- 'MaxTransformSize' represents the maximum transform unit size
- 'PUSize' represents the current prediction unit size.
- RootTuSize = min(MaxTransformSize, PUSize) ......... (2)
- 'RootTuSize' which is a transform unit size when the transform unit split information is 0, may be set to a smaller value among the maximum transform unit size and the current prediction unit size.
- if the prediction mode of the current partition unit is the intra mode, 'RootTuSize' may be determined according to relation (3) below.
- 'PartitionSize' represents the size of the current partition unit.
- RootTuSize = min(MaxTransformSize, PartitionSize) ........... (3)
- the conversion unit size 'RootTuSize' when the conversion unit split information is 0 may be set to a smaller value among the maximum conversion unit size and the current partition unit size.
- the current maximum conversion unit size 'RootTuSize' according to an embodiment that changes according to the prediction mode of the partition unit is only an embodiment, and a factor determining the current maximum conversion unit size is not limited thereto.
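- putting relations (1) to (3) together as a small sketch (hypothetical helper names; the max() form of relation (1) is reconstructed from the surrounding description):

```python
def root_tu_size(max_transform_size, prediction_mode, pu_size, partition_size):
    """Relations (2) and (3): the transformation unit size when the split
    information is 0, depending on the prediction mode."""
    if prediction_mode == "inter":
        return min(max_transform_size, pu_size)        # relation (2)
    return min(max_transform_size, partition_size)     # relation (3)

def curr_min_tu_size(min_transform_size, root_tu_size_value, max_transform_size_index):
    """Relation (1): the minimum transformation unit size possible in the
    current coding unit."""
    return max(min_transform_size,
               root_tu_size_value // (2 ** max_transform_size_index))

root = root_tu_size(32, "inter", pu_size=64, partition_size=64)   # -> 32
assert curr_min_tu_size(4, root, max_transform_size_index=1) == 16
```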
- according to the video encoding method based on coding units having a tree structure described above, image data of the spatial domain is encoded for each coding unit of the tree structure, and according to the video decoding method based on coding units having a tree structure, decoding is performed for each maximum coding unit, so that image data of the spatial domain may be reconstructed and the picture and the video, which is a picture sequence, may be reconstructed.
- the reconstructed video can be played back by a playback device, stored in a storage medium, or transmitted over a network.
- the above-described embodiments of the present invention can be written as a program that can be executed in a computer, and can be implemented in a general-purpose digital computer that operates the program using a computer-readable recording medium.
- the computer-readable recording medium may include a storage medium such as a magnetic storage medium (eg, a ROM, a floppy disk, a hard disk, etc.) and an optical reading medium (eg, a CD-ROM, a DVD, etc.).
- the scalable video encoding method and / or video encoding method described above with reference to FIGS. 6A to 18 are collectively referred to as the video encoding method of the present invention.
- the scalable video decoding method and / or video decoding method described above with reference to FIGS. 6A to 18 are referred to as the video decoding method of the present invention.
- a video encoding apparatus including the scalable video decoding apparatus 1200, the video encoding apparatus 800, or the image encoder 1000 described above with reference to FIGS. 6A to 18 may be referred to as the video encoding apparatus of the present invention.
- the video decoding apparatus including the scalable video decoding apparatus 1250, the video decoding apparatus 850, or the image decoding unit 1050 described above with reference to FIGS. 6A to 18 may be referred to as the video decoding apparatus of the present invention.
- a computer-readable storage medium in which a program is stored according to an embodiment of the present invention will be described in detail below.
- the disk 26000 described above as a storage medium may be a hard drive, a CD-ROM disk, a Blu-ray disk, or a DVD disk.
- the disk 26000 is composed of a plurality of concentric tracks tr, and the tracks are divided into a predetermined number of sectors Se in the circumferential direction.
- a program for implementing the above-described quantization parameter determination method, video encoding method, and video decoding method according to the above-described embodiment may be allocated to and stored in a specific region of the disc 26000.
- a computer system achieved using a storage medium storing a program for implementing the above-described video encoding method and video decoding method will be described below with reference to FIG. 21.
- the computer system 26700 may store a program for implementing at least one of the video encoding method and the video decoding method of the present invention on the disc 26000 using the disc drive 26800.
- the program may be read from the disk 26000 by the disk drive 26800, and the program may be transferred to the computer system 26700.
- a program for implementing at least one of the video encoding method and the video decoding method may also be stored in a memory card, a ROM cassette, or a solid state drive (SSD).
- FIG. 21 illustrates the overall structure of a content supply system 11000 for providing a content distribution service.
- the service area of the communication system is divided into cells of a predetermined size, and wireless base stations 11700, 11800, 11900, and 12000 that serve as base stations are installed in each cell.
- the content supply system 11000 includes a plurality of independent devices.
- independent devices such as a computer 12100, a personal digital assistant (PDA) 12200, a video camera 12300, and a mobile phone 12500 are connected to the Internet 11100 through an Internet service provider 11200, the communication network 11400, and the wireless base stations 11700, 11800, 11900, and 12000.
- the content supply system 11000 is not limited to the structure shown in FIG. 21, and devices may be selectively connected.
- the independent devices may be directly connected to the communication network 11400 without passing through the wireless base stations 11700, 11800, 11900, and 12000.
- the video camera 12300 is an imaging device capable of capturing video images like a digital video camera.
- the mobile phone 12500 may adopt at least one communication scheme among various protocols such as Personal Digital Communications (PDC), code division multiple access (CDMA), wideband code division multiple access (W-CDMA), Global System for Mobile Communications (GSM), and Personal Handyphone System (PHS).
- the video camera 12300 may be connected to the streaming server 11300 through the wireless base station 11900 and the communication network 11400.
- the streaming server 11300 may stream and transmit the content transmitted by the user using the video camera 12300 through real time broadcasting.
- Content received from the video camera 12300 may be encoded by the video camera 12300 or the streaming server 11300.
- Video data captured by the video camera 12300 may be transmitted to the streaming server 11300 via the computer 12100.
- Video data captured by the camera 12600 may also be transmitted to the streaming server 11300 via the computer 12100.
- the camera 12600 is an imaging device capable of capturing both still and video images, like a digital camera.
- Video data received from the camera 12600 may be encoded by the camera 12600 or the computer 12100.
- Software for video encoding and decoding may be stored in a computer readable recording medium such as a CD-ROM disk, a floppy disk, a hard disk drive, an SSD, or a memory card that the computer 12100 may access.
- video data may be received from the mobile phone 12500.
- the video data may be encoded by a large scale integrated circuit (LSI) system installed in the video camera 12300, the mobile phone 12500, or the camera 12600.
- content recorded by a user using the video camera 12300, the camera 12600, the mobile phone 12500, or another imaging device is encoded and transmitted to the streaming server 11300.
- the streaming server 11300 may stream and transmit content data to other clients who have requested the content data.
- the clients are devices capable of decoding the encoded content data, and may be, for example, a computer 12100, a PDA 12200, a video camera 12300, or a mobile phone 12500.
- the content supply system 11000 allows clients to receive and play encoded content data.
- the content supply system 11000 enables clients to receive and decode and reproduce encoded content data in real time, thereby enabling personal broadcasting.
- the video encoding apparatus and the video decoding apparatus of the present invention may be applied to encoding and decoding operations of independent devices included in the content supply system 11000.
- the mobile phone 12500 is not limited in functionality and may be a smart phone that can change or expand a substantial portion of its functions through an application program.
- the mobile phone 12500 includes a built-in antenna 12510 for exchanging RF signals with the wireless base station 12000, and a display screen 12520, such as an LCD (Liquid Crystal Display) or an OLED (Organic Light Emitting Diodes) screen, for displaying images captured by the camera 1530 or images received through the antenna 12510 and decoded.
- the smartphone 12510 includes an operation panel 12540 including a control button and a touch panel. When the display screen 12520 is a touch screen, the operation panel 12540 further includes a touch sensing panel of the display screen 12520.
- the smart phone 12510 includes a speaker 12580 or another type of audio output unit for outputting voice and sound, and a microphone 12550 or another type of audio input unit for inputting voice and sound.
- the smartphone 12510 further includes a camera 1530 such as a CCD camera for capturing video and still images.
- the smartphone 12510 further includes a storage medium 12570 for storing encoded or decoded data, such as video or still images captured by the camera 1530, received by e-mail, or obtained in another form, and a slot 12560 for mounting the storage medium 12570 to the mobile phone 12500.
- the storage medium 12570 may be another type of flash memory such as an electrically erasable and programmable read only memory (EEPROM) embedded in an SD card or a plastic case.
- FIG. 23 illustrates an internal structure of the mobile phone 12500.
- the power supply circuit 12700, the operation input controller 12640, the image encoder 12720, the camera interface 12630, the LCD controller 12620, the image decoder 12690, the multiplexer/demultiplexer 12680, the recording/reading unit 12670, the modulation/demodulation unit 12660, and the sound processor 12650 are connected to the central controller 12710 through the synchronization bus 1730.
- the power supply circuit 12700 supplies power to each part of the mobile phone 12500 from the battery pack, so that the mobile phone 12500 can be set to an operating mode.
- the central controller 12710 includes a CPU, a read only memory (ROM), and a random access memory (RAM).
- a digital signal is generated in the mobile phone 12500 under the control of the central controller 12710; for example, a digital sound signal is generated in the sound processor 12650, a digital image signal is generated in the image encoder 12720, and text data of a message is generated through the operation panel 12540 and the operation input controller 12640.
- the modulation/demodulation unit 12660 modulates the frequency band of the digital signal, and the communication circuit 12610 performs digital-to-analog conversion and frequency conversion on the band-modulated digital sound signal.
- the transmission signal output from the communication circuit 12610 may be transmitted to the voice communication base station or the radio base station 12000 through the antenna 12510.
- the sound signal acquired by the microphone 12550 is converted into a digital sound signal by the sound processor 12650 under the control of the central controller 12710.
- the generated digital sound signal may be converted into a transmission signal through the modulation / demodulation unit 12660 and the communication circuit 12610 and transmitted through the antenna 12510.
- the text data of the message is input using the operation panel 12540, and the text data is transmitted to the central controller 12610 through the operation input controller 12640.
- the text data is converted into a transmission signal through the modulator / demodulator 12660 and the communication circuit 12610, and transmitted to the radio base station 12000 through the antenna 12510.
- the image data photographed by the camera 1530 is provided to the image encoder 12720 through the camera interface 12630.
- the image data photographed by the camera 1252 may be directly displayed on the display screen 12520 through the camera interface 12630 and the LCD controller 12620.
- the structure of the image encoder 12720 may correspond to the structure of the video encoding apparatus as described above.
- The image encoder 12720 encodes the image data provided from the camera 12530 according to the video encoding method of the present invention described above, converts it into compression-encoded image data, and outputs the encoded image data to the multiplexer/demultiplexer 12680.
- A sound signal obtained by the microphone 12550 of the mobile phone 12500 during recording by the camera 12530 is also converted into digital sound data through the sound processor 12650, and the digital sound data may be delivered to the multiplexer/demultiplexer 12680.
- The multiplexer/demultiplexer 12680 multiplexes the encoded image data provided from the image encoder 12720 together with the sound data provided from the sound processor 12650.
- The multiplexed data may be converted into a transmission signal through the modulation/demodulation unit 12660 and the communication circuit 12610 and transmitted through the antenna 12510.
- A signal received through the antenna 12510 is converted into a digital signal through frequency recovery and analog-to-digital conversion.
- the modulator / demodulator 12660 demodulates the frequency band of the digital signal.
- the band demodulated digital signal is transmitted to the video decoder 12690, the sound processor 12650, or the LCD controller 12620 according to the type.
- When the mobile phone 12500 is in the call mode, it amplifies a signal received through the antenna 12510 and generates a digital sound signal through frequency conversion and analog-to-digital conversion.
- The received digital sound signal is converted into an analog sound signal through the modulation/demodulation unit 12660 and the sound processor 12650 under the control of the central controller 12710, and the analog sound signal is output through the speaker 12580.
- A signal received from the radio base station 12000 via the antenna 12510 is converted into multiplexed data as a result of the processing of the modulation/demodulation unit 12660.
- The multiplexed data thus output is transmitted to the multiplexer/demultiplexer 12680.
- the multiplexer / demultiplexer 12680 demultiplexes the multiplexed data to separate the encoded video data stream and the encoded audio data stream.
- the encoded video data stream is provided to the video decoder 12690, and the encoded audio data stream is provided to the sound processor 12650.
- the structure of the image decoder 12690 may correspond to the structure of the video decoding apparatus as described above.
- The image decoder 12690 reconstructs video data by decoding the encoded video data using the video decoding method of the present invention described above, and provides the reconstructed video data to the display screen 12520 through the LCD controller 12620.
- Video data of a video file accessed from an Internet website can thus be displayed on the display screen 12520.
- The sound processor 12650 may convert the audio data into an analog sound signal and provide it to the speaker 12580. Accordingly, audio data contained in a video file accessed from an Internet website can also be reproduced through the speaker 12580.
- The mobile phone 12500 or another type of communication terminal may be a transmitting/receiving terminal including both the video encoding apparatus and the video decoding apparatus of the present invention, a transmitting terminal including only the video encoding apparatus of the present invention described above, or a receiving terminal including only the video decoding apparatus of the present invention.
- FIG. 24 illustrates a digital broadcasting system employing a communication system, according to various embodiments.
- the digital broadcasting system according to the embodiment of FIG. 24 may receive a digital broadcast transmitted through a satellite or terrestrial network using the video encoding apparatus and the video decoding apparatus.
- the broadcast station 12890 transmits the video data stream to the communication satellite or the broadcast satellite 12900 through radio waves.
- The broadcast satellite 12900 transmits a broadcast signal, and the broadcast signal is received by a satellite broadcast receiver via the antenna 12860 in each home.
- The encoded video stream may be decoded and reproduced by the TV receiver 12810, the set-top box 12870, or another device.
- The playback device 12230 can read and decode the encoded video stream recorded on a storage medium 12020 such as a disc or a memory card.
- the reconstructed video signal may thus be reproduced in the monitor 12840, for example.
- the video decoding apparatus of the present invention may also be mounted in the set-top box 12870 connected to the antenna 12860 for satellite / terrestrial broadcasting or the cable antenna 12850 for cable TV reception. Output data of the set-top box 12870 may also be reproduced by the TV monitor 12880.
- the video decoding apparatus of the present invention may be mounted on the TV receiver 12810 instead of the set top box 12870.
- An automobile 12920 having an appropriate antenna 12910 may receive a signal transmitted from the satellite 12900 or the radio base station 11700.
- the decoded video may be played on the display screen of the car navigation system 12930 mounted on the car 12920.
- the video signal may be encoded by the video encoding apparatus of the present invention and recorded and stored in a storage medium.
- The video signal may be stored on a DVD disk 12960 by a DVD recorder, or stored on a hard disk by the hard disk recorder 12950.
- The video signal may also be stored on an SD card 12970. If the hard disk recorder 12950 includes the video decoding apparatus of the present invention according to an embodiment, a video signal recorded on the DVD disk 12960, the SD card 12970, or another type of storage medium may be reproduced on the monitor 12880.
- The vehicle navigation system 12930 may not include the camera 12530, the camera interface 12630, and the image encoder 12720 of FIG. 23.
- The computer 12100 and the TV receiver 12810 may not include the camera 12530, the camera interface 12630, and the image encoder 12720 of FIG. 23.
- the user terminal may include the video decoding apparatus as described above with reference to FIGS. 1A through 18.
- the user terminal may include the video encoding apparatus as described above with reference to FIGS. 1A through 18.
- the user terminal may include both the video encoding apparatus and the video decoding apparatus as described above with reference to FIGS. 1A through 18.
- Various embodiments in which the above-described video encoding method, video decoding method, video encoding apparatus, and video decoding apparatus are utilized have been described above with reference to FIGS. 1A to 18. However, embodiments in which the video encoding method and the video decoding method described above with reference to FIGS. 1A to 18 are stored in a storage medium, or in which the video encoding apparatus and the video decoding apparatus are implemented in a device, are not limited to the embodiments described above.
Abstract
Description
| Split Information 0 (encoding of a coding unit of size 2Nx2N at current depth d) | | | | | Split Information 1 |
|---|---|---|---|---|---|
| Prediction Mode | Partition Mode | | Transform Unit Size | | Repeatedly encode coding units of lower depth d+1 |
| Intra / Inter / Skip (2Nx2N only) | Symmetric partition mode | Asymmetric partition mode | Transform unit split information 0 | Transform unit split information 1 | |
| | 2Nx2N, 2NxN, Nx2N, NxN | 2NxnU, 2NxnD, nLx2N, nRx2N | 2Nx2N | NxN (symmetric partition mode), N/2xN/2 (asymmetric partition mode) | |
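The table above summarizes how split information at each depth selects between encoding the current 2Nx2N coding unit (split information 0) and recursively encoding the four coding units of the lower depth d+1 (split information 1). The following Python sketch is purely illustrative and is not part of the patent text; all function names and size parameters are hypothetical.

```python
# Hypothetical sketch of the decision summarized in the table above.
# Split information 0: encode the 2Nx2N coding unit at the current depth d,
# choosing a prediction mode, a partition mode, and a transform-unit split.
# Split information 1: recursively encode the four coding units at depth d+1.

SYMMETRIC_PARTITIONS = {"2Nx2N", "2NxN", "Nx2N", "NxN"}
ASYMMETRIC_PARTITIONS = {"2NxnU", "2NxnD", "nLx2N", "nRx2N"}


def transform_unit_size(n, tu_split_info, partition_mode):
    """TU split information 0 keeps a 2Nx2N transform unit; TU split information 1
    gives NxN for symmetric partition modes and N/2xN/2 for asymmetric ones."""
    if tu_split_info == 0:
        return (2 * n, 2 * n)
    if partition_mode in SYMMETRIC_PARTITIONS:
        return (n, n)
    return (n // 2, n // 2)


def encode_coding_unit(split_info_at, depth, n):
    """split_info_at(depth) returns 0 or 1 for the coding unit at this depth."""
    if split_info_at(depth) == 0:
        # Split information 0: encode the current 2Nx2N coding unit at depth d.
        return {"depth": depth, "cu_size": 2 * n}
    # Split information 1: repeat encoding for the four lower-depth coding units.
    return [encode_coding_unit(split_info_at, depth + 1, n // 2) for _ in range(4)]


# Example: split once at depth 0, then stop at depth 1.
print(encode_coding_unit(lambda d: 1 if d == 0 else 0, depth=0, n=32))
print(transform_unit_size(16, tu_split_info=1, partition_mode="2NxnU"))  # -> (8, 8)
```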
Claims (12)
- A video decoding method comprising: obtaining, from a bitstream, upsampling phase set information indicating whether phases of samples included in a current layer are adjusted; when the phases are adjusted according to the upsampling phase set information, obtaining a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference from the bitstream; and determining a prediction picture of the current layer by upsampling a reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference, wherein phases of luma samples included in the prediction picture are adjusted according to the luma vertical phase difference and the luma horizontal phase difference, phases of chroma samples included in the prediction picture are adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference, and the luma vertical phase difference and the chroma vertical phase difference are determined according to a scanning method of the reference layer.
- The video decoding method of claim 1, wherein the luma vertical phase difference and the chroma vertical phase difference are determined according to the scanning method of the reference layer and an alignment method of the reference layer and the current layer, and the alignment method includes a zero-phase alignment method, which aligns the reference layer and the current layer with respect to the upper-left corners of the reference layer and the current layer, and a symmetric alignment method, which aligns the reference layer and the current layer with respect to the centers of the reference layer and the current layer.
- The video decoding method of claim 1, further comprising: obtaining, from the bitstream, reference layer size information indicating a height and a width of the reference layer, reference layer offset information for defining, from the reference layer, a reference region used for inter-layer prediction, current layer size information indicating a height and a width of the current layer, and current layer offset information for defining, from the current layer, an extended reference region corresponding to the reference region; determining a size of the reference region from the reference layer size information and the reference layer offset information; determining a size of the extended reference region from the current layer size information and the current layer offset information; and determining, according to the size of the reference region and the size of the extended reference region, a scale ratio indicating a ratio of sizes between the reference region and the extended reference region, wherein the determining of the prediction picture comprises determining the prediction picture by upsampling the reference picture according to the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, the reference layer offset information, the current layer offset information, and the scale ratio.
- The video decoding method of claim 1, further comprising: obtaining, from the bitstream, residual data including difference values between sample values included in the current layer and sample values included in a reference picture of the current layer; and reconstructing the current picture using the residual data and the prediction picture.
- A video decoding apparatus comprising: a receiving extractor configured to obtain, from a bitstream, upsampling phase set information indicating whether phases of samples included in a current layer are adjusted and, when the phases are adjusted according to the upsampling phase set information, to obtain a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference from the bitstream; and a decoder configured to determine a prediction picture of the current layer by upsampling a reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference, wherein phases of luma samples included in the prediction picture are adjusted according to the luma vertical phase difference and the luma horizontal phase difference, phases of chroma samples included in the prediction picture are adjusted according to the chroma vertical phase difference and the chroma horizontal phase difference, and the luma vertical phase difference and the chroma vertical phase difference are determined according to a scanning method of the reference layer.
- The video decoding apparatus of claim 5, wherein the luma vertical phase difference and the chroma vertical phase difference are determined according to the scanning method of the reference layer and an alignment method of the reference layer and the current layer, and the alignment method includes a zero-phase alignment method, which aligns the reference layer and the current layer with respect to the upper-left corners of the reference layer and the current layer, and a symmetric alignment method, which aligns the reference layer and the current layer with respect to the centers of the reference layer and the current layer.
- The video decoding apparatus of claim 5, wherein the receiving extractor obtains, from the bitstream, reference layer size information indicating a height and a width of the reference layer, reference layer offset information for defining, from the reference layer, a reference region used for inter-layer prediction, current layer size information indicating a height and a width of the current layer, and current layer offset information for defining, from the current layer, an extended reference region corresponding to the reference region, and the decoder determines a size of the reference region from the reference layer size information and the reference layer offset information, determines a size of the extended reference region from the current layer size information and the current layer offset information, determines, according to the size of the reference region and the size of the extended reference region, a scale ratio indicating a ratio of sizes between the reference region and the extended reference region, and determines the prediction picture by upsampling the reference picture according to the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, the reference layer offset information, the current layer offset information, and the scale ratio.
- The video decoding apparatus of claim 5, wherein the receiving extractor obtains, from the bitstream, residual data including difference values between sample values included in the current layer and sample values included in a reference picture of the current layer, and the decoder reconstructs the current picture using the residual data and the prediction picture.
- A video encoding method comprising: determining scanning methods of a current layer and a reference layer; when it is indicated that the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method, determining a field of the reference layer; determining, based on the scanning methods and the field of the reference layer, a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting phases of luma samples and chroma samples included in a prediction picture of the current layer; determining the prediction picture of the current layer by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference; determining residual data including difference values between sample values of the current layer and sample values of the prediction picture of the current layer; and outputting a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- A video encoding apparatus comprising: an encoder configured to determine scanning methods of a current layer and a reference layer, to determine a field of the reference layer when it is indicated that the current layer is scanned by a progressive scanning method and the reference layer is scanned by an interlaced scanning method, to determine, based on the scanning methods and the field of the reference layer, a luma vertical phase difference, a luma horizontal phase difference, a chroma vertical phase difference, and a chroma horizontal phase difference for correcting phases of luma samples and chroma samples included in a prediction picture of the current layer, to determine the prediction picture of the current layer by upsampling the reference layer based on the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, and the chroma horizontal phase difference, and to determine residual data including difference values between sample values of the current layer and sample values of the prediction picture of the current layer; and an output unit configured to output a bitstream including the luma vertical phase difference, the luma horizontal phase difference, the chroma vertical phase difference, the chroma horizontal phase difference, and the residual data.
- A computer-readable recording medium having recorded thereon a program for executing the video decoding method of claim 1.
- A computer-readable recording medium having recorded thereon a program for executing the video encoding method of claim 9.
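Claims 1, 2, and 9 above describe choosing the luma/chroma vertical and horizontal phase differences from the scanning method of the reference layer, the field of the reference layer, and the alignment method of the two layers. The sketch below illustrates one way such a selection could be organized; it is not taken from the patent, and the numeric phase values and the 1/16-sample phase unit are placeholder assumptions.

```python
# Illustrative sketch only: selecting the four phase differences from the scanning
# method, field parity, and alignment method. All numeric values are placeholders,
# not values stated in the patent.

from dataclasses import dataclass


@dataclass
class PhaseSet:
    luma_v: int    # luma vertical phase difference
    luma_h: int    # luma horizontal phase difference
    chroma_v: int  # chroma vertical phase difference
    chroma_h: int  # chroma horizontal phase difference


def determine_phase_set(ref_scan: str, ref_field: str, alignment: str) -> PhaseSet:
    """ref_scan: 'progressive' or 'interlaced'; ref_field: 'top' or 'bottom';
    alignment: 'zero-phase' (align upper-left corners) or 'symmetric' (align centers)."""
    # Base phases from the alignment method (claims 2 and 6).
    luma_v = 0 if alignment == "zero-phase" else 4     # placeholder values
    chroma_v = 1 if alignment == "zero-phase" else 5   # placeholder values
    # Additional vertical shift when the reference layer is an interlaced field
    # (claims 1 and 9): the shift depends on whether it is the top or bottom field.
    if ref_scan == "interlaced":
        field_shift = 0 if ref_field == "top" else 8   # placeholder values
        luma_v += field_shift
        chroma_v += field_shift
    return PhaseSet(luma_v=luma_v, luma_h=0, chroma_v=chroma_v, chroma_h=0)


print(determine_phase_set("interlaced", "bottom", "zero-phase"))
# -> PhaseSet(luma_v=8, luma_h=0, chroma_v=9, chroma_h=0)
```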
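Claims 3 and 7 above derive a scale ratio from the reference-region size (reference layer size minus the reference layer offsets) and the extended-reference-region size (current layer size minus the current layer offsets), and use it together with the phase differences and offsets to upsample the reference picture. The following sketch illustrates that computation under common fixed-point conventions (Q16 scale factors, 1/16-sample positions); these conventions and all function names are assumptions, not values specified by the patent.

```python
# Illustrative sketch only: region sizes, scale ratio, and a phase-adjusted mapping
# from a current-layer sample position to a reference-layer position.

def region_size(layer_w, layer_h, left, top, right, bottom):
    """Size of a region defined by a layer size and four offsets."""
    return layer_w - left - right, layer_h - top - bottom


def scale_ratio(ref_region, cur_region):
    """Fixed-point (1/65536) ratio of reference-region size to extended-region size."""
    ref_w, ref_h = ref_region
    cur_w, cur_h = cur_region
    scale_x = ((ref_w << 16) + (cur_w >> 1)) // cur_w
    scale_y = ((ref_h << 16) + (cur_h >> 1)) // cur_h
    return scale_x, scale_y


def reference_position(x, y, scale, cur_offset, ref_offset, phase_h, phase_v):
    """Map a current-layer sample (x, y) to a reference-layer position in
    1/16-sample units, applying the horizontal/vertical phase differences."""
    scale_x, scale_y = scale
    cur_left, cur_top = cur_offset
    ref_left, ref_top = ref_offset
    x_ref16 = (((x - cur_left) * scale_x + (1 << 11)) >> 12) - phase_h + (ref_left << 4)
    y_ref16 = (((y - cur_top) * scale_y + (1 << 11)) >> 12) - phase_v + (ref_top << 4)
    return x_ref16, y_ref16


# Example: a 960x540 reference region upsampled to a 1920x1080 extended region.
ref = region_size(960, 540, 0, 0, 0, 0)
cur = region_size(1920, 1080, 0, 0, 0, 0)
print(scale_ratio(ref, cur))                       # (32768, 32768), i.e. 0.5 in Q16
print(reference_position(100, 40, scale_ratio(ref, cur), (0, 0), (0, 0), 0, 8))
```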
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201580025071.XA CN106464890A (zh) | 2014-03-14 | 2015-03-16 | 可伸缩视频编码/解码方法和设备 |
US15/126,005 US20180176588A1 (en) | 2014-03-14 | 2015-03-16 | Scalable video encoding/decoding method and apparatus |
KR1020167025749A KR20160132857A (ko) | 2014-03-14 | 2015-03-16 | 스케일러블 비디오 부호화/복호화 방법 및 장치 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201461953180P | 2014-03-14 | 2014-03-14 | |
US61/953,180 | 2014-03-14 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2015137786A1 true WO2015137786A1 (ko) | 2015-09-17 |
Family
ID=54072134
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2015/002532 WO2015137786A1 (ko) | 2014-03-14 | 2015-03-16 | 스케일러블 비디오 부호화/복호화 방법 및 장치 |
Country Status (4)
Country | Link |
---|---|
US (1) | US20180176588A1 (ko) |
KR (1) | KR20160132857A (ko) |
CN (1) | CN106464890A (ko) |
WO (1) | WO2015137786A1 (ko) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019031703A1 (ko) * | 2017-08-09 | 2019-02-14 | 엘지전자 주식회사 | 영상 코딩 시스템에서 선형 모델에 따른 영상 디코딩 방법 및 장치 |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6653860B2 (ja) * | 2014-05-26 | 2020-02-26 | シャープ株式会社 | 画像復号装置、画像符号化装置、画像復号方法、画像符号化方法及びコンピュータ読み取り可能な記録媒体 |
PL3284255T3 (pl) * | 2015-04-13 | 2023-10-30 | V-Nova International Limited | Kodowanie wielu sygnałów z docelową szybkością transmisji danych sygnału w zależności od informacji o złożoności |
US10368107B2 (en) * | 2016-08-15 | 2019-07-30 | Qualcomm Incorporated | Intra video coding using a decoupled tree structure |
GB2573486B (en) * | 2017-12-06 | 2022-12-21 | V Nova Int Ltd | Processing signal data using an upsampling adjuster |
CN113170131A (zh) * | 2018-10-11 | 2021-07-23 | Lg电子株式会社 | 变换系数编码方法及其装置 |
KR20220116357A (ko) * | 2018-12-21 | 2022-08-22 | 삼성전자주식회사 | 부호화 방법 및 그 장치, 복호화 방법 및 그 장치 |
CN113545044A (zh) | 2019-03-08 | 2021-10-22 | 北京字节跳动网络技术有限公司 | 视频处理中的整形模型 |
KR20220024006A (ko) | 2019-06-22 | 2022-03-03 | 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 | 크로마 잔차 스케일링을 위한 신택스 요소 |
EP3977738A4 (en) | 2019-07-07 | 2022-08-17 | Beijing Bytedance Network Technology Co., Ltd. | SIGNALING OF CHROMA RESIDUAL SCALE |
US11303909B2 (en) * | 2019-09-18 | 2022-04-12 | Qualcomm Incorporated | Scaling ratio and output full resolution picture in video coding |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000175194A (ja) * | 1998-12-01 | 2000-06-23 | Sony Corp | 画像復号装置及び画像復号方法 |
WO2006101682A2 (en) * | 2005-03-18 | 2006-09-28 | Sharp Kabushiki Kaisha | Methods and systems for extended spatial scalability with picture-level adaptation |
US7136417B2 (en) * | 2002-07-15 | 2006-11-14 | Scientific-Atlanta, Inc. | Chroma conversion optimization |
KR20090128504A (ko) * | 2001-11-30 | 2009-12-15 | 소니 가부시끼 가이샤 | 화상 정보 복호 방법 및 장치 |
WO2014039547A1 (en) * | 2012-09-04 | 2014-03-13 | Qualcomm Incorporated | Signaling of down-sampling phase information in scalable video coding |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8054886B2 (en) * | 2007-02-21 | 2011-11-08 | Microsoft Corporation | Signaling and use of chroma sample positioning information |
JP2009015025A (ja) * | 2007-07-05 | 2009-01-22 | Hitachi Ltd | 画像信号処理装置および画像信号処理方法 |
2015
- 2015-03-16 KR KR1020167025749A patent/KR20160132857A/ko not_active Application Discontinuation
- 2015-03-16 CN CN201580025071.XA patent/CN106464890A/zh not_active Withdrawn
- 2015-03-16 WO PCT/KR2015/002532 patent/WO2015137786A1/ko active Application Filing
- 2015-03-16 US US15/126,005 patent/US20180176588A1/en not_active Abandoned
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2000175194A (ja) * | 1998-12-01 | 2000-06-23 | Sony Corp | 画像復号装置及び画像復号方法 |
KR20090128504A (ko) * | 2001-11-30 | 2009-12-15 | 소니 가부시끼 가이샤 | 화상 정보 복호 방법 및 장치 |
US7136417B2 (en) * | 2002-07-15 | 2006-11-14 | Scientific-Atlanta, Inc. | Chroma conversion optimization |
WO2006101682A2 (en) * | 2005-03-18 | 2006-09-28 | Sharp Kabushiki Kaisha | Methods and systems for extended spatial scalability with picture-level adaptation |
WO2014039547A1 (en) * | 2012-09-04 | 2014-03-13 | Qualcomm Incorporated | Signaling of down-sampling phase information in scalable video coding |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019031703A1 (ko) * | 2017-08-09 | 2019-02-14 | 엘지전자 주식회사 | 영상 코딩 시스템에서 선형 모델에 따른 영상 디코딩 방법 및 장치 |
Also Published As
Publication number | Publication date |
---|---|
US20180176588A1 (en) | 2018-06-21 |
CN106464890A (zh) | 2017-02-22 |
KR20160132857A (ko) | 2016-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2015137786A1 (ko) | 스케일러블 비디오 부호화/복호화 방법 및 장치 | |
WO2014051408A1 (ko) | 인터 레이어 예측 오차를 부호화하기 위한 sao 오프셋 보상 방법 및 그 장치 | |
WO2014107066A1 (ko) | 위상차를 고려한 영상 업샘플링을 이용하는 스케일러블 비디오 부호화 방법 및 장치, 스케일러블 비디오 복호화 방법 및 장치 | |
WO2014030920A1 (ko) | 트리 구조의 부호화 단위에 기초한 예측 정보의 인터-레이어 비디오 부호화 방법 및 그 장치, 트리 구조의 부호화 단위에 기초한 예측 정보의 인터-레이어 비디오 복호화 방법 및 그 장치 | |
WO2015137783A1 (ko) | 인터 레이어 비디오의 복호화 및 부호화를 위한 머지 후보 리스트 구성 방법 및 장치 | |
WO2015009068A1 (ko) | 비트 뎁스 및 컬러 포맷의 변환을 동반하는 업샘플링 필터를 이용하는 스케일러블 비디오 부호화 방법 및 장치, 스케일러블 비디오 복호화 방법 및 장치 | |
WO2013187654A1 (ko) | 컬러성분별로 sao 파라미터를 공유하는 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 | |
WO2013115560A1 (ko) | 공간 서브영역별로 비디오를 부호화하는 방법 및 그 장치, 공간 서브영역별로 비디오를 복호화하는 방법 및 그 장치 | |
WO2015152608A4 (ko) | 서브블록 기반 예측을 수행하는 인터 레이어 비디오 복호화 방법 및 그 장치 및 서브블록 기반 예측을 수행하는 인터 레이어 비디오 부호화 방법 및 그 장치 | |
WO2014014251A1 (ko) | Sao 파라미터를 시그널링하는 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 | |
WO2013069958A1 (ko) | 비디오 복호화 과정에서 역양자화 및 역변환의 데이터를 클리핑하는 역변환 방법 및 그 장치 | |
WO2013157817A1 (ko) | 트리 구조의 부호화 단위에 기초한 다시점 비디오 부호화 방법 및 그 장치, 트리 구조의 부호화 단위에 기초한 다시점 비디오 복호화 방법 및 그 장치 | |
WO2014137175A1 (ko) | 선택적인 노이즈제거 필터링을 이용한 스케일러블 비디오 부호화 방법 및 그 장치, 선택적인 노이즈제거 필터링을 이용한 스케일러블 비디오 복호화 방법 및 그 장치 | |
WO2013095047A1 (ko) | 최대 부호화 단위 별로 픽셀 분류에 따른 오프셋 조정을 이용하는 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 | |
WO2015099506A1 (ko) | 서브블록 기반 예측을 수행하는 인터 레이어 비디오 복호화 방법 및 그 장치 및 서브블록 기반 예측을 수행하는 인터 레이어 비디오 부호화 방법 및 그 장치 | |
WO2015002444A1 (ko) | 필터링을 수반한 비디오 부호화 및 복호화 방법 및 그 장치 | |
WO2014109594A1 (ko) | 휘도차를 보상하기 위한 인터 레이어 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 | |
WO2014163458A1 (ko) | 인터 레이어 복호화 및 부호화 방법 및 장치를 위한 인터 예측 후보 결정 방법 | |
WO2016117930A1 (ko) | 인터 레이어 비디오 복호화 방법 및 그 장치 및 인터 레이어 비디오 부호화 방법 및 그 장치 | |
WO2015053601A1 (ko) | 멀티 레이어 비디오 부호화 방법 및 그 장치, 멀티 레이어 비디오 복호화 방법 및 그 장치 | |
WO2015133866A1 (ko) | 서브 블록 기반 예측을 수행하는 인터 레이어 비디오 복호화 방법 및 그 장치 및 서브 블록 기반 예측을 수행하는 인터 레이어 비디오 부호화 방법 및 그 장치 | |
WO2015012622A1 (ko) | 움직임 벡터 결정 방법 및 그 장치 | |
WO2015105385A1 (ko) | 스케일러블 비디오 부호화/복호화 방법 및 장치 | |
WO2015194896A1 (ko) | 휘도차를 보상하기 위한 인터 레이어 비디오 부호화 방법 및 그 장치, 비디오 복호화 방법 및 그 장치 | |
WO2014129872A1 (ko) | 메모리 대역폭 및 연산량을 고려한 스케일러블 비디오 부호화 장치 및 방법, 스케일러블 비디오 복호화 장치 및 방법 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 15761579; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 15126005; Country of ref document: US |
| | ENP | Entry into the national phase | Ref document number: 20167025749; Country of ref document: KR; Kind code of ref document: A |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 15761579; Country of ref document: EP; Kind code of ref document: A1 |