WO2021199783A1

WO2021199783A1 - Image decoding device, image decoding method, and program

Info

Publication number: WO2021199783A1
Application number: PCT/JP2021/006375
Authority: WO
Inventors: 圭河村; 内藤　整
Original assignee: KDDI Corporation
Priority date: 2020-03-30
Filing date: 2021-02-19
Publication date: 2021-10-07
Also published as: JP2021164005A; CN115315958A

Abstract

This image decoding device 200 comprises: a sub-picture layout derivation unit 211 configured to derive layout information of a sub-picture by decoding coded data; a filler slice identification unit 212 configured to identify, by decoding the coded data, whether each of a plurality of slices constituting the sub-picture is a filler slice; and a slice decoding unit 213 configured to reconstruct slice data by decoding the coded data on the basis of the layout information and the identification results of the filler slice identification unit 212.

Description

Image decoding device, image decoding method and program

The present invention relates to an image decoding device, an image decoding method, and a program.

The sub-picture in VVC (Versatile Video Coding), the next-generation video coding method described in Non-Patent Document 1, is a rectangular region of a picture composed of one or more slices, for example as shown in FIG. In addition, the picture is completely covered, without overlap, by sub-pictures, which are a plurality of rectangular regions.

Further, Non-Patent Document 1 discloses a procedure for extracting a bitstream for each sub-picture. With this procedure, it is possible to generate a new bitstream by extracting, from different bitstreams, the sub-pictures corresponding to a desired region and combining the extracted sub-pictures into a single picture.

For example, for a cube-map 360° video, bitstreams in which the sub-pictures covering the viewing region and the remaining sub-pictures have different resolutions can be generated without re-encoding.

However, VVC, the next-generation video coding method, restricts pictures and sub-pictures to rectangular shapes. Consequently, when bitstreams are separated and combined, there is a problem that, depending on how the sub-pictures are arranged, the picture may not be completely covered without overlap.

The present invention has therefore been made in view of the above problem, and an object thereof is to provide an image decoding device, an image decoding method, and a program capable of decoding a picture, even when combining a plurality of bitstreams does not form a rectangle, by means of a sub-picture function that serves as a filler containing no content.

The first feature of the present invention is summarized as an image decoding device configured to decode coded data, the device including: a sub-picture layout derivation unit configured to decode the coded data to derive layout information of a sub-picture; a filler slice identification unit configured to decode the coded data to identify whether each slice constituting the sub-picture is a filler slice; and a slice decoding unit configured to decode the coded data and reconstruct slice data based on the layout information and the identification result of the filler slice identification unit.

The second feature of the present invention is summarized as an image decoding method including: a step of decoding coded data to derive layout information of a sub-picture; a step of decoding the coded data to identify whether each slice constituting the sub-picture is a filler slice; and a step of decoding the coded data and reconstructing slice data based on the layout information and the identification result.

The third feature of the present invention is summarized as a program that causes a computer to function as an image decoding device configured to decode coded data, the image decoding device including: a sub-picture layout derivation unit configured to decode the coded data to derive layout information of a sub-picture; a filler slice identification unit configured to decode the coded data to identify whether each slice constituting the sub-picture is a filler slice; and a slice decoding unit configured to decode the coded data and reconstruct slice data based on the layout information and the identification result of the filler slice identification unit.

According to the present invention, it is possible to provide an image decoding device, an image decoding method, and a program capable of decoding a picture by means of a sub-picture function that serves as a filler containing no content, even when combining a plurality of bitstreams does not form a rectangle.

FIG. 1 is a diagram showing an example of the configuration of the image processing system 1 according to an embodiment.
FIG. 2 is a diagram showing an example of the functional blocks of the image coding device 100 according to an embodiment.
FIG. 3 is a diagram showing an example of the functional blocks of the entropy coding unit 104 of the image coding device 100 according to an embodiment.
FIG. 4 is a diagram showing an example of the syntax used in the image processing system 1 according to an embodiment.
FIG. 5 is a diagram showing an example of the functional blocks of the image decoding device 200 according to an embodiment.
FIG. 6 is a diagram showing an example of the functional blocks of the entropy decoding unit 201 of the image decoding device 200 according to an embodiment.
FIG. 7 is a diagram showing an example of the syntax used in the image processing system 1 according to an embodiment.
FIG. 8 is a diagram showing an example of the configuration of the coded data conversion system 2 according to an embodiment.
FIG. 9 is a diagram for explaining an example, in an embodiment, in which coded data is extracted and combined for a cube-map 360° video composed of six faces.
FIG. 10 is a diagram for explaining the prior art.

Hereinafter, embodiments of the present invention will be described with reference to the drawings. The components in the following embodiments can be replaced with existing components as appropriate, and various variations including combinations with other existing components are possible. Therefore, the description of the following embodiments does not limit the content of the invention described in the claims.

(First Embodiment)
FIG. 1 is a diagram showing an example of a functional block of the image processing system 1 according to the first embodiment of the present invention. The image processing system 1 includes an image coding device 100 that encodes a moving image and generates coded data, and an image decoding device 200 that decodes the coded data generated by the image coding device 100. The above-mentioned coded data is transmitted and received between the image coding device 100 and the image decoding device 200, for example, via a transmission line.

<Image coding device 100>
FIG. 2 is a diagram showing an example of the functional blocks of the image coding device 100. As shown in FIG. 2, the image coding device 100 includes an inter prediction unit 101, an intra prediction unit 102, a transform/quantization unit 103, an entropy coding unit 104, an inverse transform/inverse quantization unit 105, a subtraction unit 106, an addition unit 107, an in-loop filter unit 108, a frame buffer 109, a block division unit 110, and a block integration unit 111.

The block division unit 110 is configured to divide the entire input image into squares of the same size and to output images (divided images) obtained by recursively splitting those squares with a quadtree or the like.
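
As a rough illustration of recursive quadtree division, the following minimal Python sketch splits one square block into leaf blocks. The `should_split` callback is a hypothetical stand-in for the encoder's split decision, and picture-boundary handling as well as the non-quadtree (binary and ternary) splits that VVC also permits are ignored.

```python
from typing import Callable, List, Tuple

Block = Tuple[int, int, int]  # (x, y, size) of a square block, in luma samples


def quadtree_split(x: int, y: int, size: int, min_size: int,
                   should_split: Callable[[int, int, int], bool]) -> List[Block]:
    """Recursively split a square block into four quadrants.

    `should_split` is a hypothetical placeholder for the encoder's decision
    (for example a rate-distortion test); it is not defined by the source.
    """
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]  # leaf: this block is output as a divided image
    half = size // 2
    blocks: List[Block] = []
    # Visit the quadrants in Z order: top-left, top-right, bottom-left, bottom-right.
    for dy in (0, half):
        for dx in (0, half):
            blocks.extend(quadtree_split(x + dx, y + dy, half, min_size, should_split))
    return blocks
```

For example, `quadtree_split(0, 0, 128, 32, lambda x, y, s: s == 128 or (x, y) == (0, 0))` splits a 128 × 128 block into three 64 × 64 blocks plus four 32 × 32 blocks in its top-left quadrant.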

The inter prediction unit 101 is configured to perform inter prediction using the divided image input from the block division unit 110 and the filtered locally decoded image input from the frame buffer 109, and to generate and output an inter prediction image.

The intra prediction unit 102 is configured to perform intra prediction using the divided image input from the block division unit 110 and the pre-filter locally decoded image described later, and to generate and output an intra prediction image.

The transform/quantization unit 103 is configured to apply an orthogonal transform to the residual signal input from the subtraction unit 106, quantize the transform coefficients obtained by the orthogonal transform, and output the resulting quantized level values.

The entropy coding unit 104 is configured to entropy-code the quantized level values, the transform unit size, and the transform size input from the transform/quantization unit 103, and to output them as coded data.

The inverse transform/inverse quantization unit 105 is configured to inverse-quantize the quantized level values input from the transform/quantization unit 103, apply an inverse orthogonal transform to the transform coefficients obtained by the inverse quantization, and output the resulting inverse-orthogonal-transformed residual signal.

The subtraction unit 106 is configured to output a residual signal which is a difference between the divided image input by the block division unit 110 and the intra prediction image or the inter prediction image.

The addition unit 107 is configured to output a divided image obtained by adding the inverse-orthogonal-transformed residual signal input from the inverse transform/inverse quantization unit 105 to the intra prediction image or the inter prediction image.

The block integration unit 111 is configured to output a pre-filter locally decoded image obtained by integrating the divided images input from the addition unit 107.

The in-loop filter unit 108 is configured to apply in-loop filtering, such as deblocking filtering, to the pre-filter locally decoded image input from the block integration unit 111, and to generate and output a post-filter locally decoded image. Here, the pre-filter locally decoded image is a signal obtained by adding the inverse-orthogonal-transformed residual signal to the intra prediction image or the inter prediction image.

The frame buffer 109 accumulates the filtered locally decoded image and supplies it to the inter prediction unit 101 as appropriate.

Hereinafter, the entropy coding unit 104 of the image coding device 100 according to the present embodiment will be described with reference to FIG. FIG. 3 is a diagram showing an example of a part of the functional blocks of the entropy coding unit 104 of the image coding device 100 according to the present embodiment.

The entropy coding unit 104 is configured to derive sub-pictures composed of filler slices. Specifically, as shown in FIG. 3, the entropy coding unit 104 includes a sub-picture layout determination unit 121, a filler slice determination unit 122, and a slice coding unit 123.

The sub-picture layout determination unit 121 is configured to determine the layout of the sub-pictures signaled at the sequence level and output layout information related to the determined layout.

The filler slice determination unit 122 is configured to determine whether each slice constituting each sub-picture is a filler slice and to output the determination result.

The slice coding unit 123 is configured to encode a bit stream in slice units and output it as slice data.

Here, when the slice in question is not a filler slice according to the layout information determined by the sub-picture layout determination unit 121 and the determination result of the filler slice determination unit 122, the slice coding unit 123 encodes the bitstream of that slice in conformance with the slice decoding process described in Non-Patent Document 1 and outputs it as slice data.

On the other hand, when the slice in question is a filler slice according to the layout information determined by the sub-picture layout determination unit 121 and the determination result of the filler slice determination unit 122, the slice coding unit 123 is configured to encode that slice as slice data for a filler slice and output it.

The slice data for a filler slice may be, for example, a bitstream indicating an intra slice in which the CUs are not split below the maximum CU size, the intra prediction mode is INTRA_PLANAR, and there is no residual signal. Alternatively, the slice data for a filler slice may be a bitstream indicating an inter slice in which the CUs are not split below the maximum CU size, the motion is given by merge index 0 in merge mode, and there is no residual signal.
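
The two filler slice payload variants described above can be pictured as lists of coding-unit parameters. This is a minimal Python sketch under stated assumptions: `FillerCu` and `filler_slice_cus` are hypothetical names, the filler region is assumed to be a whole number of maximum-size CUs, and the actual entropy coding of these syntax elements is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FillerCu:
    x: int                       # top-left x of the CU, in luma samples
    y: int                       # top-left y of the CU, in luma samples
    size: int                    # unsplit square CU, so size == max CU size
    intra_mode: Optional[str]    # "INTRA_PLANAR" in the intra-slice variant
    merge_idx: Optional[int]     # 0 in the inter-slice (merge mode) variant
    has_residual: bool = False   # filler CUs carry no residual signal


def filler_slice_cus(width: int, height: int, max_cu: int,
                     intra: bool = True) -> List[FillerCu]:
    """List the CUs of a filler slice in raster order, without any splitting."""
    if width % max_cu or height % max_cu:
        # Simplifying assumption: the filler region is a whole number of CUs.
        raise ValueError("filler region must be a multiple of the max CU size")
    cus: List[FillerCu] = []
    for y in range(0, height, max_cu):
        for x in range(0, width, max_cu):
            if intra:
                cus.append(FillerCu(x, y, max_cu, "INTRA_PLANAR", None))
            else:
                cus.append(FillerCu(x, y, max_cu, None, 0))
    return cus
```

With an assumed maximum CU size of 128, a 256 × 512 filler slice becomes eight unsplit CUs, each signaling only a fixed mode and no residual, which is why such a slice costs very few bits.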

FIG. 4 shows an example of the syntax including the layout information determined by the sub-picture layout determination unit 121 and the determination result of the filler slice determination unit 122. This syntax is encoded by the entropy coding unit 104 (slice coding unit 123).

In FIG. 4, the subpics_present_flag is a flag indicating that the picture is composed of one or more sub-pictures.

In FIG. 7, filler_slice_subpic_present_flag is a flag indicating whether a sub-picture composed of filler slices is present in the sequence. For example, a value of "1" (valid) indicates that a sub-picture composed of filler slices is present in the sequence, and a value of "0" (invalid) indicates that no sub-picture composed of filler slices is present in the sequence.

In FIG. 4, subpic_ctu_top_left_x[i] and subpic_ctu_top_left_y[i] are the horizontal and vertical coordinates, in units of CTUs, of the top-left CTU of the sub-picture.

In FIG. 4, subpic_width_minus_1[i] and subpic_height_minus_1[i] specify the width and the height of the sub-picture in CTUs, each minus 1.

In FIG. 4, subpic_treated_as_pic_flag[i] is a flag indicating whether the sub-picture is treated as a picture in the decoding process, excluding in-loop filtering, and loop_filter_across_subpic_enable_flag[i] is a flag indicating that in-loop filtering may be applied across the sub-picture boundary.

In FIG. 4, filler_slice_subpic_flag[i] is a flag indicating whether the slices constituting the i-th sub-picture are filler slices. For example, a value of "1" indicates that the slices are filler slices, and a value of "0" indicates that they are not.

Here, filler_slice_subpic_flag[i] corresponds to the determination result of the filler slice determination unit 122.

Note that the subpic_ctu_top_left_x [i], subpic_ctu_top_left_y [i], subpic_width_minus_1 [i], and subpic_height_minus_1 [i] in FIG. 4 correspond to the above layout information.
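
The layout information can be turned into sub-picture rectangles and checked against the requirement that sub-pictures cover the picture completely without overlap. The sketch below assumes that the `_minus_1` elements hold the size minus 1 in CTUs and works on a CTU grid; `SubpicLayout` and `covers_exactly` are hypothetical names used only for illustration.

```python
from typing import List, NamedTuple


class SubpicLayout(NamedTuple):
    # Field names mirror the FIG. 4 syntax elements; values are in CTU units.
    ctu_top_left_x: int
    ctu_top_left_y: int
    width_minus_1: int
    height_minus_1: int


def covers_exactly(layouts: List[SubpicLayout], pic_w_ctus: int, pic_h_ctus: int) -> bool:
    """Return True if the sub-pictures tile the picture completely with no overlap."""
    owner = [[-1] * pic_w_ctus for _ in range(pic_h_ctus)]
    for i, s in enumerate(layouts):
        x0, y0 = s.ctu_top_left_x, s.ctu_top_left_y
        x1 = x0 + s.width_minus_1 + 1   # "_minus_1" fields store size - 1
        y1 = y0 + s.height_minus_1 + 1
        if x1 > pic_w_ctus or y1 > pic_h_ctus:
            return False                 # sub-picture extends outside the picture
        for y in range(y0, y1):
            for x in range(x0, x1):
                if owner[y][x] != -1:
                    return False         # overlaps an earlier sub-picture
                owner[y][x] = i
    return all(o != -1 for row in owner for o in row)  # no CTU left uncovered
```

For a picture 4 CTUs wide and 2 CTUs high, two 2 × 2 sub-pictures at x = 0 and x = 2 pass the check, while a single 2 × 2 sub-picture leaves a gap and fails it.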

<Image Decoding Device 200>
FIG. 5 is a block diagram of the image decoding device 200 according to the present embodiment. As shown in FIG. 5, the image decoding device 200 according to the present embodiment includes an entropy decoding unit 201, an inverse transform/inverse quantization unit 202, an inter prediction unit 203, an intra prediction unit 204, an addition unit 205, an in-loop filter unit 206, a frame buffer 207, and a block integration unit 208.

The entropy decoding unit 201 is configured to entropy-decode the coded data and output the quantized level value, the motion compensation method generated by the image coding apparatus 100, and the like.

The inverse transform/inverse quantization unit 202 is configured to inverse-quantize the quantized level values input from the entropy decoding unit 201, apply an inverse orthogonal transform to the result of the inverse quantization, and output the result as a residual signal.

The inter-prediction unit 203 is configured to perform inter-prediction using the filtered locally decoded image input from the frame buffer 207 to generate and output an inter-prediction image.

The intra prediction unit 204 is configured to perform intra prediction using the pre-filter locally decoded image input from the addition unit 205 to generate and output an intra prediction image.

The addition unit 205 is configured to add the residual signal input from the inverse transform/inverse quantization unit 202 to the prediction image (the inter prediction image input from the inter prediction unit 203 or the intra prediction image input from the intra prediction unit 204) and to output the resulting divided image.

Here, the prediction image is whichever of the inter prediction image input from the inter prediction unit 203 and the intra prediction image input from the intra prediction unit 204 was generated by the prediction method obtained through entropy decoding.
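
The selection and addition steps can be sketched on small integer blocks. In this minimal Python sketch the function names are hypothetical, the decoded prediction method is reduced to a boolean, and the clip to the valid sample range is an assumption not stated in the text above.

```python
from typing import List

Block = List[List[int]]


def select_prediction(is_inter: bool, inter_pred: Block, intra_pred: Block) -> Block:
    # The prediction method comes from entropy decoding; here it is a plain bool.
    return inter_pred if is_inter else intra_pred


def add_residual(pred: Block, resid: Block, bit_depth: int = 10) -> Block:
    """Addition unit: prediction + residual, clipped to [0, 2**bit_depth - 1]."""
    max_val = (1 << bit_depth) - 1
    return [[min(max(p + r, 0), max_val) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]
```

For example, at 10 bits a prediction of `[[1020, 5]]` plus a residual of `[[10, -10]]` reconstructs to `[[1023, 0]]` after clipping.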

The block integration unit 208 is configured to output a pre-filter locally decoded image obtained by integrating the divided images input from the addition unit 205.

The in-loop filter unit 206 is configured to apply in-loop filtering, such as deblocking filtering, to the pre-filter locally decoded image input from the block integration unit 208, and to generate and output a post-filter locally decoded image.

The frame buffer 207 is configured to accumulate the filtered locally decoded image input from the in-loop filter unit 206, supply it to the inter prediction unit 203 as appropriate, and output it as a decoded image.

Hereinafter, the entropy decoding unit 201 of the image decoding apparatus 200 according to the present embodiment will be described with reference to FIG.

The entropy decoding unit 201 is configured to derive sub-pictures composed of filler slices. As shown in FIG. 6, the entropy decoding unit 201 includes a sub-picture layout derivation unit 211, a filler slice identification unit 212, and a slice decoding unit 213.

The sub-picture layout derivation unit 211 is configured to decode the syntax signaled at the sequence level and derive the layout information of the sub-picture based on the syntax.

Like the sub-picture layout derivation unit 211, the filler slice identification unit 212 is configured to decode the syntax signaled at the sequence level and, based on that syntax, to identify whether each slice constituting each sub-picture is a filler slice.

Such syntax is included in the coded data and is the same as the syntax shown in FIG.

The slice decoding unit 213 is configured to decode the coded data and reconstruct the slice data based on the layout information derived by the sub-picture layout derivation unit 211 and the identification result of the filler slice identification unit 212.

Specifically, when the slice decoding unit 213 determines, based on the layout information derived by the sub-picture layout derivation unit 211 and the identification result of the filler slice identification unit 212, that the slice in question is not a filler slice, it decodes the slice and reconstructs the slice data in accordance with the slice decoding process described in Non-Patent Document 1.

On the other hand, when the slice decoding unit 213 determines, based on the layout information derived by the sub-picture layout derivation unit 211 and the identification result of the filler slice identification unit 212, that the slice in question is a filler slice, it decodes the slice and outputs it as slice data for a filler slice.
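
The branch performed by the slice decoding unit 213 can be sketched as a simple dispatch. In this hypothetical Python sketch, `CodedSlice` bundles a slice payload with its size and the identification result, and the two decoding paths are passed in as callables; the point illustrated is only that a filler slice never reaches the normal decoding path.

```python
from dataclasses import dataclass
from typing import Callable, List

Plane = List[List[int]]


@dataclass
class CodedSlice:
    payload: bytes   # coded slice data (hypothetical container)
    width: int       # slice size in samples, from the layout information
    height: int
    is_filler: bool  # identification result of the filler slice identification unit


def decode_slices(slices: List[CodedSlice],
                  decode_normal: Callable[[CodedSlice], Plane],
                  decode_filler: Callable[[CodedSlice], Plane]) -> List[Plane]:
    """Route each slice to the normal decoding path or the filler path."""
    return [decode_filler(s) if s.is_filler else decode_normal(s) for s in slices]
```

Because the filler path does not parse the payload, the normal slice decoding process is skipped entirely for filler slices, which is where the reduction in decoding work comes from in this sketch.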

According to the present embodiment, even when combining a plurality of bitstreams does not form a rectangle, the image decoding device 200 can decode the picture as a rectangle by introducing sub-pictures composed of filler slices.

Further, according to the present embodiment, the amount of decoding processing in the image decoding device 200, which can decode each sub-picture, can be reduced.

Further, according to the present embodiment, filler slices can be removed and added at the multiplexing layer when bitstreams are extracted and combined.

(Second Embodiment)
Hereinafter, the image processing system 1 according to the second embodiment of the present invention will be described focusing on the differences from the image processing system 1 according to the first embodiment described above.

When the slice decoding unit 213 according to the second embodiment of the present invention determines, based on the layout information derived by the sub-picture layout derivation unit 211 and the identification result of the filler slice identification unit 212, that a slice is a filler slice, it regards the slice as the coded data described below and reconstructs the slice data by decoding it in accordance with the slice decoding process described in Non-Patent Document 1.

Specifically, it is assumed that the slice type is intra, the CU size is the maximum size, the intra prediction mode is INTRA_PLANAR, and there is no residual signal.

(Third Embodiment)
Hereinafter, the image processing system 1 according to the third embodiment of the present invention will be described with reference to FIG. 7, focusing on the differences from the image processing system 1 according to the first embodiment described above.

In the image coding device 100 according to the third embodiment of the present invention, the sub-picture layout determination unit 121 is configured to determine whether the filler slice function is used in the sequence and to output the determination result.

When it is determined that the filler slice function is not used, the filler slice determination unit 122 is configured not to perform the above determination.

FIG. 7 shows an example of the syntax including the determination result by the sub-picture layout determination unit 121.

In FIG. 7, filler_slice_subpic_present_flag is a flag indicating whether the filler slice function is used in the sequence. filler_slice_subpic_present_flag corresponds to the determination result of the sub-picture layout determination unit 121.

In the image decoding device 200 according to the third embodiment of the present invention, the sub-picture layout derivation unit 211 is configured to derive, based on the above syntax, whether the filler slice function is used in the sequence.

When it is determined that the filler slice function is not used, the filler slice identification unit 212 is configured not to perform the above identification.
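
The conditional signaling of the third embodiment can be illustrated with a toy parser. The field order, the `u(1)`/`ue(v)` descriptors, and the sub-picture count element are assumptions made for this example and are not the actual FIG. 4 or FIG. 7 syntax; only the 0th-order Exp-Golomb decoding of `ue(v)` follows the usual video-coding definition. `BitReader` and `parse_subpic_info` are hypothetical names.

```python
from typing import Dict, List, Tuple


class BitReader:
    """MSB-first reader for u(1) and ue(v) (0th-order Exp-Golomb) elements."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current bit position

    def u1(self) -> int:
        byte = self.data[self.pos >> 3]
        bit = (byte >> (7 - (self.pos & 7))) & 1
        self.pos += 1
        return bit

    def ue(self) -> int:
        leading_zeros = 0
        while self.u1() == 0:
            leading_zeros += 1
        suffix = 0
        for _ in range(leading_zeros):
            suffix = (suffix << 1) | self.u1()
        return (1 << leading_zeros) - 1 + suffix


def parse_subpic_info(r: BitReader) -> Tuple[int, List[Dict[str, int]]]:
    """Parse a hypothetical layout of the sub-picture syntax.

    Returns (filler_slice_subpic_present_flag, per-sub-picture fields).
    """
    if not r.u1():                      # subpics_present_flag
        return 0, []
    present = r.u1()                    # filler_slice_subpic_present_flag
    num_subpics = r.ue() + 1            # hypothetical count element
    subpics: List[Dict[str, int]] = []
    for _ in range(num_subpics):
        sp = {
            "subpic_ctu_top_left_x": r.ue(),
            "subpic_ctu_top_left_y": r.ue(),
            "subpic_width_minus_1": r.ue(),
            "subpic_height_minus_1": r.ue(),
            "subpic_treated_as_pic_flag": r.u1(),
            "loop_filter_across_subpic_enable_flag": r.u1(),
        }
        # Third embodiment: read the per-sub-picture flag only when the
        # sequence-level flag is set; otherwise infer 0 (no filler slices).
        sp["filler_slice_subpic_flag"] = r.u1() if present else 0
        subpics.append(sp)
    return present, subpics
```

For example, the bytes `0xBA 0xC0` encode one sub-picture with filler_slice_subpic_present_flag equal to 0, so no per-sub-picture filler flag is read and it is inferred as 0; with `0xFA 0xD0` the sequence-level flag is 1 and the per-sub-picture flag is read as 1, costing one extra bit per sub-picture.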

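According to this embodiment, the coding efficiency of signaling the presence or absence of filler slices is improved, because the per-sub-picture filler slice flags need not be signaled for sequences that do not use the filler slice function.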

(Fourth Embodiment)
Hereinafter, the image processing system 1 according to the fourth embodiment of the present invention will be described focusing on the differences from the image processing system 1 according to the first embodiment described above.

In the image coding device 100 according to the present embodiment, when the slice in question is a filler slice according to the layout information determined by the sub-picture layout determination unit 121 and the determination result of the filler slice determination unit 122, the slice coding unit 123 is configured to encode it and output it as slice data filled with a specific value.

In the image decoding device 200 according to the present embodiment, when the slice decoding unit 213 determines, based on the layout information derived by the sub-picture layout derivation unit 211 and the identification result of the filler slice identification unit 212, that a slice is a filler slice, it is configured to decode the slice and output it as slice data filled with a specific value.
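
The fill of the fourth embodiment can be sketched as producing a constant sample plane. The source does not say which value is used; the mid-level default `1 << (bit_depth - 1)` below is an assumption (it is also the value VVC substitutes for unavailable intra reference samples), `filler_slice_samples` is a hypothetical name, and only a single component plane is shown.

```python
from typing import List, Optional


def filler_slice_samples(width: int, height: int, bit_depth: int = 10,
                         value: Optional[int] = None) -> List[List[int]]:
    """Reconstruct a filler slice as a plane of one constant sample value."""
    if value is None:
        # Assumed default: the mid-level value, e.g. 512 at 10 bits.
        value = 1 << (bit_depth - 1)
    if not 0 <= value < (1 << bit_depth):
        raise ValueError("fill value does not fit the bit depth")
    return [[value] * width for _ in range(height)]
```

No prediction, residual decoding, or transform is involved, which is what makes this decoding path lightweight.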

According to this embodiment, it is possible to provide a lightweight decoding procedure that reflects the characteristics of filler slices.

(Fifth Embodiment)
Hereinafter, the coded data conversion system 2 according to the fifth embodiment of the present invention will be described with reference to FIG.

The coded data conversion system 2 includes a coded data extraction device 21, which receives coded data of a plurality of moving images and outputs the coded data corresponding to sub-pictures designated from outside the system, and a coded data combining device 22, which receives the output coded data, combines the sub-pictures, and outputs coded data as a picture.

The coded data combining device 22 is configured to output coded data satisfying the requirements that the picture is rectangular, that the picture is completely covered by sub-pictures, and that the sub-pictures do not overlap one another.

Here, when combining the input sub-pictures alone does not satisfy these requirements, the coded data combining device 22 is configured to add sub-pictures composed of an arbitrary number of filler slices so that the requirements are satisfied, and to output the resulting coded data.
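
One way the coded data combining device 22 might decide where filler sub-pictures go is a greedy scan over a CTU grid: every uncovered cell starts a rectangle that is grown to the right and then downward. This is a hypothetical heuristic for illustration (`filler_rects` is not from the source); it yields non-overlapping rectangles that fill every gap, but it does not minimize their number and its arrangement need not match FIG. 9.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height) in CTU units


def filler_rects(pic_w: int, pic_h: int, placed: List[Rect]) -> List[Rect]:
    """Return filler rectangles covering every cell not covered by `placed`."""
    covered = [[False] * pic_w for _ in range(pic_h)]
    for x0, y0, w, h in placed:
        if x0 < 0 or y0 < 0 or x0 + w > pic_w or y0 + h > pic_h:
            raise ValueError("sub-picture outside the picture")
        for y in range(y0, y0 + h):
            for x in range(x0, x0 + w):
                if covered[y][x]:
                    raise ValueError("sub-pictures overlap")
                covered[y][x] = True
    fillers: List[Rect] = []
    for y in range(pic_h):              # raster scan for the next gap
        for x in range(pic_w):
            if covered[y][x]:
                continue
            w = 1                       # grow right while cells are free
            while x + w < pic_w and not covered[y][x + w]:
                w += 1
            h = 1                       # grow down while the whole row span is free
            while y + h < pic_h and not any(covered[y + h][x:x + w]):
                h += 1
            for yy in range(y, y + h):  # mark the new filler as covered
                for xx in range(x, x + w):
                    covered[yy][xx] = True
            fillers.append((x, y, w, h))
    return fillers
```

Each returned rectangle would then be emitted as a sub-picture composed of filler slices, after which the picture satisfies the rectangular, full-coverage, no-overlap requirements.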

Specifically, as shown in FIG. 9, coded data may be extracted and combined for a cube-map 360° video composed of six faces. The labels in FIG. 9 indicate the position and orientation of each face of the cube map: L denotes the left face, R the right face, F the front face, Bot the bottom face, Bac the back face, and Top the top face. Fil denotes a filler slice according to the present invention.

In such a case, if coded data at three resolutions (high, medium, and low) is generated in advance as shown in the left column of FIG. 9, the coded data conversion system 2 can dynamically extract and combine coded data according to the viewing region and the transmission bandwidth, as shown in the right column of FIG. 9.

For example, in FIG. 9 each face is 1024 × 1024 pixels at high resolution, 768 × 768 pixels at medium resolution, and 512 × 512 pixels at low resolution.

Here, as shown in the upper part of the right column of FIG. 9, when the coded data combining device 22 generates a picture composed of high-resolution sub-pictures for the two faces corresponding to the viewing region and low-resolution sub-pictures for the other four faces, the resulting picture is 2048 × 1536 pixels as a whole.

Further, as shown in the lower part of the right column of FIG. 9, when the coded data combining device 22 generates a picture composed of medium-resolution sub-pictures for the two faces corresponding to the viewing region, low-resolution sub-pictures for the other four faces, two filler slices of 256 × 512 pixels, and one filler slice of 512 × 256 pixels, the resulting picture is 2048 × 1280 pixels as a whole.
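
The picture sizes in the two examples can be checked by area alone; this confirms the pixel counts but not the actual arrangement in FIG. 9:

```latex
\begin{aligned}
2 \times 1024^2 + 4 \times 512^2
  &= 2\,097\,152 + 1\,048\,576 = 3\,145\,728 = 2048 \times 1536,\\
2 \times 768^2 + 4 \times 512^2 + 2\,(256 \times 512) + 512 \times 256
  &= 1\,179\,648 + 1\,048\,576 + 393\,216 = 2\,621\,440 = 2048 \times 1280.
\end{aligned}
```

Without the three filler slices, the second combination totals 2,228,224 pixels, which leaves exactly the 393,216-pixel gap that the filler slices occupy.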

Introducing filler slices in this way increases the freedom in choosing sub-picture resolutions when coded data is extracted and combined.

The above-mentioned image coding device 100 and image decoding device 200 may be realized by a program that causes a computer to execute each function (each process).

In each of the above-described embodiments, the present invention has been described taking application to the image coding device 100 and the image decoding device 200 as an example. However, the present invention is not limited thereto and is equally applicable to an image coding system and an image decoding system having the functions of the image coding device 100 and the image decoding device 200.

1 ... Image processing system
100 ... Image coding device
101, 203 ... Inter prediction unit
102, 204 ... Intra prediction unit
103 ... Transform/quantization unit
104 ... Entropy coding unit
105, 202 ... Inverse transform/inverse quantization unit
106 ... Subtraction unit
107, 205 ... Addition unit
108, 206 ... In-loop filter unit
109, 207 ... Frame buffer
110 ... Block division unit
111, 208 ... Block integration unit
121 ... Sub-picture layout determination unit
122 ... Filler slice determination unit
123 ... Slice coding unit
200 ... Image decoding device
201 ... Entropy decoding unit
211 ... Sub-picture layout derivation unit
212 ... Filler slice identification unit
213 ... Slice decoding unit
2 ... Coded data conversion system
21 ... Coded data extraction device
22 ... Coded data combining device

Claims

1. An image decoding device configured to decode coded data, comprising:
a sub-picture layout derivation unit configured to decode the coded data to derive layout information of a sub-picture;
a filler slice identification unit configured to decode the coded data to identify whether each slice constituting the sub-picture is a filler slice; and
a slice decoding unit configured to decode the coded data and reconstruct slice data based on the layout information and the identification result of the filler slice identification unit.
2. The image decoding device according to claim 1, wherein
the sub-picture layout derivation unit decodes a flag indicating whether a sub-picture composed of filler slices is present in the sequence, and
the filler slice identification unit is configured to perform the identification only when the flag is valid.
3. The image decoding device according to claim 1 or 2, wherein the slice decoding unit is configured to, when it is determined that the slice is a filler slice, decode the slice and output it as slice data filled with a specific value.
4. The image decoding device according to claim 1 or 2, wherein the slice decoding unit is configured to, when it is determined that the slice is a filler slice, decode the slice as specific coded data and output the slice data.
5. An image decoding method comprising:
a step of decoding coded data to derive layout information of a sub-picture;
a step of decoding the coded data to identify whether each slice constituting the sub-picture is a filler slice; and
a step of decoding the coded data and reconstructing slice data based on the layout information and the identification result.
6. A program that causes a computer to function as an image decoding device configured to decode coded data, the image decoding device comprising:
a sub-picture layout derivation unit configured to decode the coded data to derive layout information of a sub-picture;
a filler slice identification unit configured to decode the coded data to identify whether each slice constituting the sub-picture is a filler slice; and
a slice decoding unit configured to decode the coded data and reconstruct slice data based on the layout information and the identification result of the filler slice identification unit.