WO2020184673A1

WO2020184673A1 - Image decoding device, image decoding method, image encoding device, and image encoding method

Info

Publication number: WO2020184673A1
Application number: PCT/JP2020/010880
Authority: WO
Inventors: 勇司藤本
Original assignee: ソニー株式会社
Priority date: 2019-03-12
Filing date: 2020-03-12
Publication date: 2020-09-17

Abstract

The present invention pertains to an image decoding device, an image decoding method, an image encoding device, and an image encoding method capable of further enhancing practicality. Identification information that identifies, for the images stored in all Tile group NAL units in an access unit, whether the current frame can become a reference image in the future is stored in a header area of the access unit. The identification information is stored either in an AUD NAL unit, which indicates an access unit delimiter, or in a NEW NAL unit provided separately from the AUD NAL unit, and applies to all the Tile group NAL units in the access unit. This technology can be applied to, for example, an image decoding device and an image encoding device.

Description

Image decoding device, image decoding method, image coding device, and image coding method

The present disclosure relates to an image decoding device, an image decoding method, an image coding device, and an image coding method, and in particular to an image decoding device, an image decoding method, an image coding device, and an image coding method that can be made more practical.

In H.264 / AVC, which is one of the standard specifications of the image coding method, each image (picture) is divided into one or more slices. Then, each slice is classified into one of I slice (Intra Slice), P slice (Predictive Slice) and B slice (Bi-predictive Slice). The I slice is a slice that is independently decoded without referring to another image. A P-slice is a slice that is decoded by referencing a single other image. A B slice is a slice that is decoded by referencing a plurality of other images.

The picture at the beginning of the sequence, consisting of only I slices, is called an IDR (Instantaneous Decoding Refresh) picture. The IDR picture is identified by the value of the NAL (Network Abstraction Layer) unit type. The pictures in the same sequence following the IDR picture do not refer to the pictures before the IDR picture in the decoding order, and are located only after the IDR picture in the presentation order. Therefore, when random access (decoding and playback from the middle of a stream rather than from the beginning) is attempted at some point partway through the video of a coded stream, the video can be decoded appropriately starting from the IDR picture in the vicinity of the specified time.

In the standardization work on HEVC (High Efficiency Video Coding), the next-generation image coding method following H.264 / AVC, it has been proposed that CRA (Clean Random Access) pictures, in addition to IDR pictures, be identified by the NAL unit type value. A CRA picture is a picture consisting of only I slices in the middle of a sequence. A picture following a CRA picture in both the decoding order and the display order refers neither to a picture preceding the CRA picture in the decoding order nor to a picture preceding the CRA picture in the display order. Therefore, when random access to a CRA picture partway through the video (decoding of the video from the CRA picture) is performed, the decoding process of the pictures following the CRA picture in the display order can be performed without failure.

Here, Non-Patent Documents 1 and 2 disclose standards in which identification information is stored in the NAL unit header so that, when implementing TrickPlay playback such as Nx speed playback or reverse playback, it can be determined whether or not the image of the current frame will become a reference image in the future. For example, in the case of H.264 / AVC, the identification information is stored in nal_ref_idx, and in the case of H.265 / HEVC, the identification information is stored in nal_unit_type.
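As an illustration of how a player might read this identification information from existing NAL unit headers, the following is a minimal sketch, not part of the disclosure. It assumes the published header layouts: a 1-byte H.264 / AVC header with a 2-bit reference field (spelled nal_ref_idc in the H.264 specification, written nal_ref_idx in the text above) and a 2-byte H.265 / HEVC header with a 6-bit nal_unit_type. The function names are hypothetical.

```python
def h264_is_reference(first_header_byte: int) -> bool:
    """H.264/AVC header byte: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5).

    A non-zero 2-bit reference field marks a NAL unit that may be referenced.
    """
    nal_ref_idc = (first_header_byte >> 5) & 0x03
    return nal_ref_idc != 0


def hevc_nal_unit_type(header: bytes) -> int:
    """H.265/HEVC: nal_unit_type is the 6 bits following forbidden_zero_bit."""
    return (header[0] >> 1) & 0x3F


def hevc_is_sublayer_non_reference(header: bytes) -> bool:
    """VCL types 0..14 with an even value (TRAIL_N, TSA_N, STSA_N, RADL_N,
    RASL_N and the reserved *_N types) are sub-layer non-reference pictures."""
    nal_type = hevc_nal_unit_type(header)
    return nal_type <= 14 and nal_type % 2 == 0
```

In both cases the check is made per NAL unit, which is why carrying the same information over to every Tile group NAL unit of a multi-tile picture would repeat it once per Tile group.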

By the way, the current VVC (JVET-M1001) does not specify identification information indicating whether the current picture is a reference picture or a non-reference picture. For this reason, it has been difficult to support TrickPlay such as Nx speed playback and reverse playback. For example, in AVC the determination can be made from nal_ref_idx in the NAL unit header, and in HEVC from nal_unit_type; however, if the determination were made in the same way, the same identification information would be stored repeatedly in the NAL unit headers of all Tile groups in the access unit (AU). This not only introduces redundancy but also complicates processing, increases memory cost, and affects coding efficiency. In addition, it becomes difficult to use the transmission band effectively, especially in a narrow band.

In addition, information on whether or not the current picture will become a reference image in the future is processed on a frame-by-frame basis, so there is a concern that storing that information in the NAL unit headers of all Tile groups of a multi-tile image would be redundant. Therefore, it is required to extend the conventional standard to further improve its practicality.

This disclosure has been made in view of such a situation, and is intended to further enhance the practicality.

The image decoding device of the first aspect of the present disclosure includes a decoding unit that decodes an image of a bitstream composed of access units in which at least one NAL unit is arranged. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

In the image decoding method of the first aspect of the present disclosure, an image decoding device that performs image decoding processing decodes an image of a bitstream composed of access units in which at least one NAL unit is arranged. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

In the first aspect of the present disclosure, an image of a bitstream composed of access units in which at least one NAL unit is arranged is decoded. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

The image coding device of the second aspect of the present disclosure includes a coding unit that encodes an image into a bitstream composed of access units in which at least one NAL unit is arranged. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

In the image coding method of the second aspect of the present disclosure, an image coding device that performs image coding processing encodes an image into a bitstream composed of access units in which at least one NAL unit is arranged. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

In the second aspect of the present disclosure, an image is encoded into a bitstream composed of access units in which at least one NAL unit is arranged. In the header area of the access unit, identification information for identifying whether or not the current frame can become a reference image in the future is stored for the images stored in all the Tile group NAL units in the access unit.

A diagram showing the reference documents.
A block diagram showing a configuration example of an embodiment of an image coding device to which the present technology is applied.
A block diagram showing a configuration example of an embodiment of an image decoding device to which the present technology is applied.
A diagram explaining TrickPlay.
A diagram showing a configuration example of the bitstream in the first variation.
A diagram showing a first description example of the syntax of the AUD NAL unit in the first variation.
A diagram showing a second description example of the syntax of the AUD NAL unit in the first variation.
A flowchart explaining the file generation process of the image coding process.
A flowchart explaining the AUD coding process.
A flowchart explaining the AUD coding process.
A flowchart explaining the Tile group coding process.
A flowchart explaining the file decoding process of the image decoding process.
A flowchart explaining the AUD decoding process.
A flowchart explaining the AUD decoding process.
A flowchart explaining the Tile group decoding process.
A diagram showing a configuration example of the bitstream in the second variation.
A diagram showing a description example of the syntax of the NEW NAL unit in the second variation.
A flowchart explaining the file generation process of the image coding process.
A flowchart explaining the NEW coding process.
A flowchart explaining the Tile group coding process.
A flowchart explaining the file decoding process of the image decoding process.
A flowchart explaining the NEW decoding process.
A flowchart explaining the Tile group decoding process.
A block diagram showing a configuration example of an embodiment of a computer to which the present technology is applied.

<Documents that support technical contents and technical terms>
The scope disclosed in the present specification is not limited to the contents of the examples, and the contents of the references REF1 to REF6 shown in FIG. 1, which were known at the time of filing, are also incorporated in the present specification by reference.

That is, the contents described in the references REF1 to REF6 shown in FIG. 1 are also a basis for judging the support requirements. For example, even if the NAL unit structure described in Reference REF4 and the Bytestream Format described in Reference REF5 are not directly defined in the detailed description of the invention, they are within the scope of the present disclosure and shall meet the support requirements of the claims. Similarly, technical terms such as Parsing, Syntax, and Semantics are within the scope of the present disclosure and shall meet the support requirements of the claims even if they are not directly defined in the detailed description of the invention.

<Terms>
In this application, the following terms are defined as follows.

<Block>
Unless otherwise specified, a "block" used in the description as a partial area or a processing unit of an image (picture) (not a block denoting a processing section of a device) indicates an arbitrary partial area in the picture, and its size, shape, characteristics, and the like are not limited. For example, "block" includes any partial area (processing unit) such as a TB (Transform Block), TU (Transform Unit), PB (Prediction Block), PU (Prediction Unit), SCU (Smallest Coding Unit), CU (Coding Unit), LCU (Largest Coding Unit), CTB (Coding Tree Block), CTU (Coding Tree Unit), transform block, subblock, macroblock, tile, or slice.

<Specify block size>
Further, when specifying the size of such a block, not only the block size may be directly specified, but also the block size may be indirectly specified. For example, the block size may be specified using the identification information that identifies the size. Further, for example, the block size may be specified by the ratio or difference with the size of the reference block (for example, LCU or SCU). For example, when transmitting information for specifying a block size as a syntax element or the like, the information for indirectly specifying the size as described above may be used as the information. By doing so, the amount of information of the information can be reduced, and the coding efficiency may be improved. Further, the designation of the block size also includes the designation of the range of the block size (for example, the designation of the range of the allowable block size).

<Unit of information / processing>
The data unit in which various information is set and the data unit targeted by various processes are arbitrary and are not limited to the above-mentioned examples. For example, these pieces of information and processes may each be set for, or may target the data of, a TU (Transform Unit), TB (Transform Block), PU (Prediction Unit), PB (Prediction Block), CU (Coding Unit), LCU (Largest Coding Unit), subblock, block, tile, slice, picture, sequence, or component. Of course, this data unit can be set for each piece of information or process, and the data units of all information and processes need not be unified. The storage location of such information is arbitrary; it may be stored in the header, parameter set, or the like of the above-mentioned data unit, and it may be stored in a plurality of places.

<Control information>
The control information related to the present technology may be transmitted from the coding side to the decoding side. For example, control information (for example, enabled_flag) that controls whether or not the application of the present technology described above is permitted (or prohibited) may be transmitted. Further, for example, control information indicating an object to which the present technology is applied (or an object to which the present technology is not applied) may be transmitted. For example, control information may be transmitted that specifies the block size (upper limit, lower limit, or both), frame, component, layer, or the like to which the present technology is applied (or for which its application is permitted or prohibited).

<Flag>
In the present specification, a "flag" is information for identifying a plurality of states, and includes not only information used for identifying the two states of true (1) or false (0) but also information that can identify three or more states. Therefore, the value that this "flag" can take may be, for example, the two values 1/0, or three or more values. That is, the number of bits constituting this "flag" is arbitrary, and may be 1 bit or a plurality of bits. Further, identification information (including flags) is assumed to cover not only the form in which the identification information itself is included in the bitstream but also the form in which difference information of the identification information with respect to certain reference information is included in the bitstream. Therefore, in the present specification, "flag" and "identification information" include not only that information but also the difference information with respect to the reference information.

<Associate metadata>
Further, various information (metadata, etc.) regarding the coded data (bitstream) may be transmitted or recorded in any form as long as it is associated with the coded data. Here, the term "associate" means, for example, making one piece of data available (linkable) when processing the other. That is, pieces of data associated with each other may be combined into one piece of data or may remain individual pieces of data. For example, the information associated with the coded data (image) may be transmitted on a transmission path different from that of the coded data (image). Further, for example, the information associated with the coded data (image) may be recorded on a recording medium different from that of the coded data (image) (or in another recording area of the same recording medium). Note that this "association" may apply to a part of the data rather than the entire data. For example, an image and information corresponding to the image may be associated with each other in an arbitrary unit such as a plurality of frames, one frame, or a part of a frame.

In the present specification, terms such as "synthesize", "multiplex", "add", "integrate", "include", "store", and "insert" mean combining a plurality of objects into one, for example, combining encoded data and metadata into one piece of data, and each denotes one method of the "associating" described above. Further, in the present specification, coding includes not only the whole process of converting an image into a bitstream but also a part of that process. For example, it includes not only processing that encompasses prediction, orthogonal transform, quantization, arithmetic coding, and the like, but also processing that collectively refers to quantization and arithmetic coding, processing that encompasses prediction, quantization, and arithmetic coding, and so on. Similarly, decoding includes not only the whole process of converting a bitstream into an image but also a part of that process. For example, it includes not only processing that encompasses arithmetic decoding, inverse quantization, inverse orthogonal transform, prediction, and the like, but also processing that encompasses arithmetic decoding and inverse quantization, processing that encompasses arithmetic decoding, inverse quantization, and prediction, and so on.

Hereinafter, specific embodiments to which the present technology is applied will be described in detail with reference to the drawings.

<Image coding device and image decoding device>
An image coding device and an image decoding device to which the present technology is applied will be described with reference to FIGS. 2 and 3.

As shown in FIG. 2, the image coding device 10 includes a coding unit 11, a determination unit 12, a VCL buffer 13, a non-VCL buffer 14, a file generation unit 15, and a control unit 16.

The coding unit 11 is an encoder that operates according to the HEVC method. The coding unit 11 acquires an image sequence to be encoded from a moving image source such as a camera or a television tuner connected to the image coding device 10. Then, the coding unit 11 generates a coded bitstream by executing various processes such as intra prediction, inter-frame prediction, orthogonal transform, quantization, and lossless coding on each image in the acquired image sequence. Slice data corresponding to the substance of the image is generated as a VCL (Video Coding Layer) NAL unit.

On the other hand, parameter sets such as SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and APS (Adaptation Parameter Set) can be generated as non-VCL NAL units. The coding unit 11 outputs a VCL NAL unit, that is, a bit stream of slice data to the file generation unit 15 via the VCL buffer 13. Further, the coding unit 11 outputs the parameter set to the file generation unit 15 via the non-VCL buffer 14.

The determination unit 12 determines the type of each image in the image sequence encoded by the coding unit 11. More specifically, in the present embodiment, the determination unit 12 at least determines whether each image is an IDR picture, a CRA picture, or another picture. Both the IDR picture and the CRA picture are pictures consisting of only I slices.

As described above, the IDR picture is the picture at the beginning of the sequence. The pictures in the same sequence following the IDR picture do not refer to the pictures before the IDR picture in the coding order (decoding order), and are located only after the IDR picture in the display order. The CRA picture is a picture that is located in the middle of the sequence and can be used as a decoding start picture at the time of random access on the decoder side. A picture following a CRA picture in both the coding order (decoding order) and the display order refers neither to a picture preceding the CRA picture in the coding order (decoding order) nor to a picture preceding the CRA picture in the display order. The determination unit 12 outputs the determination result to the coding unit 11 and the file generation unit 15. The coding unit 11 assigns a NAL unit type indicating the type of each image determined by the determination unit 12 to the NAL header of each NAL unit.

By the way, when random access to a CRA picture is performed, only the CRA picture of the random access destination and the picture following the CRA picture in the decoding order are targeted for decoding. However, there may be a picture that follows the CRA picture in the decoding order and precedes the CRA picture in the display order. In the present specification, such a picture is referred to as a preceding picture. As can be understood from the definition of a CRA picture, it is permissible for the preceding picture to reference a picture that precedes the CRA picture in the decoding order.

When random access to the CRA picture is performed, the preceding picture that refers to the picture preceding the CRA picture in the decoding order is not normally decoded. This is because the reference picture of the preceding picture has not been decoded. That is, when random access is performed, whether or not the preceding picture to be decoded can be normally decoded depends on the reference relationship of the preceding picture. Therefore, the determination unit 12 may further determine the preceding picture that is not normally decoded when the random access to each CRA picture is performed, and provide the determination result to the file generation unit 15.
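The determination described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the determination unit 12 itself: the Picture record, its field names, and the function name are hypothetical, and only direct references to pictures preceding the CRA picture in the decoding order are checked, as in the text.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    decode_idx: int                             # position in decoding order
    display_idx: int                            # position in display order
    refs: list = field(default_factory=list)    # decode_idx of referenced pictures


def undecodable_preceding(pictures, cra_decode_idx):
    """Return decode_idx of preceding pictures that cannot be decoded normally
    when decoding starts at the CRA picture with the given decode_idx."""
    cra = next(p for p in pictures if p.decode_idx == cra_decode_idx)
    result = []
    for p in pictures:
        # A preceding picture follows the CRA in decoding order
        # but precedes it in display order.
        is_preceding = (p.decode_idx > cra.decode_idx
                        and p.display_idx < cra.display_idx)
        # It is not normally decoded if it refers to a picture decoded before the CRA.
        if is_preceding and any(r < cra.decode_idx for r in p.refs):
            result.append(p.decode_idx)
    return result
```

For example, with a CRA picture at decoding position 1, a picture at decoding position 2 that is displayed before the CRA picture and refers to decoding position 0 would be reported, while one that refers only to the CRA picture would not.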

The VCL buffer 13 buffers the VCL NAL unit. The slice data of the CRA picture is buffered by the VCL buffer 13.

The non-VCL buffer 14 buffers the non-VCL NAL unit.

The file generation unit 15 generates a moving image file for storing a series of encoded image data according to a file format including a header area and a data area, and outputs the video file to the storage unit 20. In this specification, an example in which the MP4 format is used as the file format will be mainly described. However, the technique according to the present disclosure is not limited to such an example, and is applicable to other types of video file formats having a header area and a data area.

In the MP4 format, data is stored in an object called a box and recorded in object units. Within a file, the boxes form a tree structure and the parent box contains child boxes. The type of each box is identified by a four-letter identifier.
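The box structure can be illustrated with a small reader. This is a sketch that assumes the standard ISO base media file format box header (a 32-bit big-endian size followed by the four-character type, with size 1 signalling a 64-bit size and size 0 meaning "to the end of the enclosing range"); the function name iter_boxes is hypothetical.

```python
import struct
from typing import Iterator, Optional, Tuple


def iter_boxes(data: bytes, offset: int = 0,
               end: Optional[int] = None) -> Iterator[Tuple[str, int, int]]:
    """Yield (box_type, payload_start, payload_end) for each box in data[offset:end]."""
    end = len(data) if end is None else end
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type.
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # The box extends to the end of the enclosing range.
            size = end - offset
        if size < header or offset + size > end:
            raise ValueError("malformed box at offset %d" % offset)
        yield box_type.decode("ascii"), offset + header, offset + size
        offset += size
```

Child boxes of a container such as moov can be listed by calling iter_boxes again with the payload range returned for the parent, which mirrors the tree structure described above.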

More specifically, the file generation unit 15 inserts a bit stream of slice data corresponding to the VCL NAL unit into the data area (for example, mdat box) of the moving image file in the decoding order. Further, the file generation unit 15 inserts one or more parameter sets corresponding to the non-VCL NAL unit into the header area (for example, moov box) of the moving image file. Further, in the present embodiment, the file generation unit 15 inserts CRA information that identifies one or more CRA pictures determined by the determination unit 12 into the header area of the moving image file. Further, the file generation unit 15 may include the preceding picture information for identifying the preceding picture that is not normally decoded when the random access to each CRA picture is performed in the CRA information.

The control unit 16 controls the coding process executed in the image coding device 10. For example, when the instruction to start coding is detected, the control unit 16 causes the coding unit 11 to encode the designated image sequence. Further, the control unit 16 causes the file generation unit 15 to generate a moving image file for storing the image data encoded by the coding unit 11. The control unit 16 may control the generation of the coded stream by using a virtual decoder model called HRD (Hypothetical Reference Decoder) so as not to disrupt the decoder buffer.

As shown in FIG. 3, the image decoding device 30 includes a VCL buffer 31, a non-VCL buffer 32, a parameter memory 33, a decoding unit 34, an output buffer 35, and a control unit 37.

The VCL buffer 31 buffers a bit stream of image data (typically slice data) read from a data area (for example, mdat box) of a moving image file stored in the storage unit 20.

The non-VCL buffer 32 buffers the parameter sets such as SPS, PPS and APS read from the header area (for example, moov box) of the moving image file stored in the storage unit 20, and the header information such as CRA information.

The parameter memory 33 collectively stores the information in the header area of the file acquired via the non-VCL buffer 32. The CRA information that can be recorded in the header area of the moving image file in the various formats described above is held by the parameter memory 33 while the moving image file is opened.

The decoding unit 34 is a decoder that operates according to the HEVC method. The decoding unit 34 decodes the image sequence from the bit stream acquired from the data area of the moving image file via the VCL buffer 31. The decoding unit 34 uses the parameters in the parameter set stored in the parameter memory 33 when decoding the image. The decoding unit 34 rearranges the images in the decoded image sequence in the display order, and outputs the rearranged images to the output buffer 35.

The decoding unit 34 normally accesses the slice data stored in the moving image track in the moving image file in the decoding order, starting from the beginning. However, when the control unit 37 detects a random access instruction, the decoding unit 34 randomly accesses the decoding start picture (in the middle of the moving image track) specified by the control unit 37 and decodes the image sequence from that decoding start picture. The decoding start picture is either an IDR picture or a CRA picture in the moving image track.

The output buffer 35 is a decoding picture buffer (DPB; Decoded Picture Buffer) for buffering the image decoded by the decoding unit 34. The image buffered by the output buffer 35 is output to a display or processor (not shown) at the output timing of the image.

The control unit 37 controls the image decoding process executed in the image decoding device 30. For example, the control unit 37 opens the moving image file stored in the storage unit 20 in response to an instruction from the user, and causes the decoding unit 34 to start decoding the image sequence. Further, when the random access instruction is detected, the control unit 37 uses the CRA information to specify any CRA picture in the image sequence as the decoding start picture. Then, the control unit 37 causes the decoding unit 34 to decode the image sequence from the specified decoding start picture (that is, from the middle of the moving image track).

The control unit 37 typically specifies, as the decoding start picture, the CRA picture located closest to the timing specified in the random access instruction (for example, the timing pointed to by the operated pointer of the seek bar of the moving image playback window).
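Selecting the closest CRA picture can be sketched in a few lines. This is an illustration only: the function name and the representation of the CRA information as a list of presentation times in seconds are assumptions, not the format of the CRA information in the file.

```python
def pick_decoding_start(cra_times, requested_time):
    """Return the CRA presentation time closest to the requested seek time.

    On a tie, the earlier entry in cra_times wins (behaviour of min()).
    """
    if not cra_times:
        raise ValueError("no CRA pictures available for random access")
    return min(cra_times, key=lambda t: abs(t - requested_time))
```

A player that must never start after the requested position could instead restrict the candidates to times at or before requested_time; the text only states that the closest CRA picture is typically used.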

When the CRA information includes the preceding picture information described above, the control unit 37 may cause the output from the output buffer 35 (and the decoding by the decoding unit 34) to be skipped for any preceding picture that is identified using the preceding picture information as not being normally decoded. By using the preceding picture information, it is possible to prevent a damaged image from being displayed on the display or being output to an external processor. In that case, the control unit 37 does not have to determine after the fact whether or not each image has been normally decoded.

Further, the control unit 37 can transmit a command to the control unit 16 of the image coding device 10.

Here, with reference to FIG. 4, TrickPlay in the present embodiment will be described.

For example, the upper part of FIG. 4 shows, as an example of Nx speed playback by a playback player, a case in which decoding of the non-reference pictures of a GOP structure with M = 3 is skipped, as indicated by the white arrows, to play back at 3x speed. When such Nx speed playback is performed, it must be determined, for example, whether the current picture is Ref or Non-Ref in order to decide whether to decode it.

Further, the lower part of FIG. 4 shows, as an example of random access and reverse playback by a playback player, a case of reverse playback in which the decoded images of the Ref pictures of one GOP are stored, as indicated by the white arrows. When such random access and reverse playback are performed, it must be determined, for example, whether the current picture is Ref or Non-Ref in order to decide whether to decode and store it.
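Both TrickPlay cases reduce to filtering pictures by a reference / non-reference flag, which the following sketch illustrates. The picture names, the (name, is_ref) tuple representation, and the function names are hypothetical, and the pictures are listed in display order for simplicity.

```python
def fast_forward(pictures):
    """Nx speed playback: decode only the reference pictures and skip the rest."""
    return [name for name, is_ref in pictures if is_ref]


def reverse_gop(gop):
    """Reverse playback of one GOP: decode and store only the reference
    pictures, then output the stored images in reverse order."""
    stored = [name for name, is_ref in gop if is_ref]
    return stored[::-1]


# One GOP with M = 3: every third picture is a reference picture.
gop = [("I0", True), ("B1", False), ("B2", False),
       ("P3", True), ("B4", False), ("B5", False),
       ("P6", True), ("B7", False), ("B8", False)]
```

With this GOP, fast_forward keeps three of the nine pictures, which corresponds to the 3x speed case, and reverse_gop yields the same three pictures in reverse order.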

<First variation of bitstream>
The first variation of the bitstream will be described with reference to FIGS. 5 to 15.

As shown in FIG. 5, the bit stream is composed of a plurality of access units (AU: AccessUnit), and at least one or more NAL units are arranged in the access units corresponding to one frame each. In addition, there are multiple types of NAL units, such as AUD (Access Unit Delimiter) NAL unit, SPS (Sequence Parameter Set) NAL unit, PPS (Picture Parameter Set) NAL unit, and Tile group NAL unit.

The AUD NAL unit indicates the delimiter of access units, and in general, only one is always placed at the beginning of each access unit. The current AUD already has a syntax that indicates the attributes of all Tile group NALs in the access unit.

The SPS NAL unit stores the sequence parameters required for bitstream playback. The PPS NAL unit stores the picture parameters required for picture playback. The image of each Tile group is stored in a Tile group NAL unit.

And, in the configuration of the bit stream, three cases as shown in the figure are assumed.

For example, in the first case, in all access units, the AUD NAL unit is placed first, the SPS NAL unit second, and the PPS NAL unit third, followed by consecutive Tile group NAL units. That is, in the first case, the SPS NAL unit and the PPS NAL unit are assigned to each individual access unit.

In the second case, in the first access unit, the AUD NAL unit is placed first, the SPS NAL unit second, and the PPS NAL unit third, followed by consecutive Tile group NAL units. Then, in the second and subsequent access units, the AUD NAL unit is placed first and the PPS NAL unit second, followed by consecutive Tile group NAL units. That is, in the second case, the SPS NAL unit is assigned only to the first access unit, and the PPS NAL unit is assigned to each access unit.

Further, in the third case, in the first access unit, the AUD NAL unit is placed first, the SPS NAL unit second, and the PPS NAL unit third, followed by consecutive Tile group NAL units. Then, in the second and subsequent access units, the AUD NAL unit is placed first, followed by consecutive Tile group NAL units. That is, in the third case, the SPS NAL unit and the PPS NAL unit are assigned only to the first access unit.
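The three cases can be summarized by a small function that lists the NAL unit order for a given access unit. This is an illustrative sketch; the function name, the zero-based access unit index, and the string labels are assumptions made for the example.

```python
def access_unit_layout(case: int, au_index: int, num_tile_groups: int) -> list:
    """Return the NAL unit order of access unit au_index (0 = first) for cases 1 to 3."""
    if case not in (1, 2, 3):
        raise ValueError("case must be 1, 2 or 3")
    first = au_index == 0
    nals = ["AUD"]                      # the AUD NAL unit always comes first
    if case == 1 or first:
        nals.append("SPS")              # SPS: every AU in case 1, else first AU only
    if case in (1, 2) or first:
        nals.append("PPS")              # PPS: every AU in cases 1 and 2, else first AU only
    nals += ["TileGroup"] * num_tile_groups
    return nals
```

In every case the AUD NAL unit is present at the head of each access unit, which is what makes it a natural place for information that applies to all Tile group NAL units of that access unit.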

In the first variation of the bitstream, in any of the first to third cases, the configuration differs from the conventional one in that the identification information indicating whether or not the current frame will become a reference image in the future is stored in the AUD NAL unit of the bitstream of the image. For example, a syntax element for identification information indicating whether the access unit (all Tile groups in the access unit) is reference or non-reference is provided in the AUD NAL unit. By placing the identification information in the AUD NAL unit in the header area of the access unit and applying it to all the Tile group NAL units in the access unit, it is possible to avoid the redundancy, the complicated processing, and the adverse effect on coding efficiency that would result from storing the identification information in each Tile group.

Specifically, FIG. 6 shows a first description example of the AUD NAL syntax in the first variation of the bitstream.

For example, as shown in the syntax at the bottom of FIG. 6, the syntax of pic_type has been extended, and identification information indicating whether all Tile groups in the access unit are reference or non-reference has been added. Accordingly, as shown in the upper part of FIG. 6, the definition of pic_type in the AUD NAL unit is changed.

Further, FIG. 7 shows a second description example of the AUD NAL syntax in the first variation of the bitstream.

As shown in the illustrated syntax, ref_pic_flag is added to the AUD NAL unit as 1-bit identification information indicating whether all Tile groups in the access unit are reference or non-reference.
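A minimal sketch of the second description example follows. It assumes that pic_type occupies 3 bits (as in the HEVC access unit delimiter) and is followed directly by the 1-bit ref_pic_flag, packed most significant bit first into a single byte with the remaining bits set to zero. This bit layout and the function names are assumptions for illustration; the actual syntax is the one defined in FIG. 7.

```python
from typing import Tuple


def pack_aud_payload(pic_type: int, ref_pic_flag: int) -> bytes:
    """Pack pic_type (assumed 3 bits) and ref_pic_flag (1 bit) into one byte."""
    if not 0 <= pic_type <= 7:
        raise ValueError("pic_type must fit in 3 bits")
    if ref_pic_flag not in (0, 1):
        raise ValueError("ref_pic_flag must be 0 or 1")
    # Bits 7..5: pic_type, bit 4: ref_pic_flag, bits 3..0: zero padding (assumed).
    return bytes([(pic_type << 5) | (ref_pic_flag << 4)])


def parse_aud_payload(payload: bytes) -> Tuple[int, int]:
    """Recover (pic_type, ref_pic_flag) from the byte produced above."""
    if len(payload) < 1:
        raise ValueError("payload is empty")
    first = payload[0]
    return (first >> 5) & 0x7, (first >> 4) & 0x1
```

Because the flag sits next to pic_type in the AUD NAL unit, a reader can learn whether the whole access unit is reference or non-reference before touching any Tile group NAL unit.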

An example of a file generation process performed in the image coding process executed by the image coding device 10 when the first variation of the bit stream is applied will be described with reference to the flowchart shown in FIG.

For example, when the image sequence is supplied to the image coding device 10, the file generation process is started, and in step S11, the coding unit 11 determines whether or not there is an NAL unit to be generated from the image sequence.

If the coding unit 11 determines in step S11 that there is an NAL unit to be generated from the image sequence, the process proceeds to step S12.

In step S12, the coding unit 11 determines whether or not the NAL unit to be generated from the image sequence is an AUD NAL unit.

If the coding unit 11 determines in step S12 that the NAL unit to be generated from the image sequence is the AUD NAL unit, the process proceeds to step S13. Then, in step S13, the coding unit 11 performs the AUD coding process to generate an AUD NAL unit containing the identification information and supplies it to the file generation unit 15 via the non-VCL buffer 14, after which the process returns to step S11 and the same process is repeated thereafter. On the other hand, if the coding unit 11 determines in step S12 that the NAL unit to be generated from the image sequence is not the AUD NAL unit, the process proceeds to step S14.

In step S14, the coding unit 11 determines whether or not the NAL unit to be generated from the image sequence is the Tile group NAL unit.

If the coding unit 11 determines in step S14 that the NAL unit to be generated from the image sequence is the Tile group NAL unit, the process proceeds to step S15. Then, in step S15, the coding unit 11 performs the Tile group coding process to generate the Tile group NAL unit and supplies it to the file generation unit 15 via the VCL buffer 13, after which the process returns to step S11 and the same process is repeated thereafter. On the other hand, if the coding unit 11 determines in step S14 that the NAL unit to be generated from the image sequence is not the Tile group NAL unit, the process proceeds to step S16.

In step S16, the coding unit 11 performs a coding process for encoding NAL units other than the AUD NAL unit and the Tile group NAL unit, after which the process returns to step S11 and the same process is repeated thereafter.

On the other hand, in step S11, when the coding unit 11 determines that there is no NAL unit to be generated from the image sequence, the file generation process is terminated.
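The dispatch in steps S11 to S16 can be sketched as a simple loop. The data shapes, the function name `generate_file`, and the routing of other NAL units (such as SPS and PPS) through the non-VCL buffer are assumptions for illustration; the actual coding of each unit is reduced to building a small dictionary.

```python
from typing import Dict, Iterable, List, Tuple


def generate_file(planned: Iterable[Tuple[str, object]]) -> Dict[str, List[dict]]:
    """Walk the planned NAL units in order (hypothetical sketch of S11 to S16).

    Each item is (kind, value): for "AUD" the value is ref_pic_flag,
    for every other kind it is an opaque payload.
    """
    vcl_buffer: List[dict] = []
    non_vcl_buffer: List[dict] = []
    file_units: List[dict] = []
    ref_pic_flag = None
    for kind, value in planned:            # S11: repeat while NAL units remain
        if kind == "AUD":                  # S12 -> S13: AUD coding process
            ref_pic_flag = value
            unit = {"type": "AUD", "ref_pic_flag": value}
            non_vcl_buffer.append(unit)
        elif kind == "TileGroup":          # S14 -> S15: Tile group coding process
            mode = "reference" if ref_pic_flag == 1 else "non-reference"
            unit = {"type": "TileGroup", "data": value, "coded_as": mode}
            vcl_buffer.append(unit)
        else:                              # S16: other NAL units
            unit = {"type": kind, "data": value}
            non_vcl_buffer.append(unit)    # assumed to be non-VCL
        file_units.append(unit)            # file generation keeps stream order
    return {"file": file_units, "vcl_buffer": vcl_buffer, "non_vcl_buffer": non_vcl_buffer}
```

The flag set while coding the AUD NAL unit stays in effect for every Tile group NAL unit that follows, until the next AUD NAL unit replaces it.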

FIG. 9 is a flowchart illustrating the AUD coding process performed in step S13 of FIG. Note that this AUD coding process is a process when the syntax is described in the first description example shown in FIG.

In step S21, the encoding unit 11 performs a process of setting the pic_type including the ref_pic_flag information.

In step S22, the coding unit 11 generates an AUD NAL unit by performing a frame delimiter process for dividing a frame (access unit), and then the AUD coding process is terminated.

FIG. 10 is a flowchart illustrating the AUD coding process performed in step S13 of FIG. Note that this AUD coding process is a process when the syntax is described in the second description example shown in FIG. 7.

In step S31, the coding unit 11 performs a process of setting the pic_type.

In step S32, the encoding unit 11 performs a process of setting ref_pic_flag information.

In step S33, the coding unit 11 generates an AUD NAL unit by performing a frame delimiter process for dividing a frame (access unit), and then the AUD coding process is terminated.

FIG. 11 is a flowchart illustrating the Tile group coding process performed in step S15 of FIG. 8.

In step S41, the encoding unit 11 refers to the ref_pic_flag information set for the AUD NAL unit.

In step S42, the encoding unit 11 determines whether or not 1 is set in the ref_pic_flag information.

If the coding unit 11 determines in step S42 that 1 is set in the ref_pic_flag information, the process proceeds to step S43. In step S43, the coding unit 11 encodes the Tile group NAL unit by the coding process when the current frame is a reference frame, and then the Tile group coding process ends.

On the other hand, if the coding unit 11 determines in step S42 that 1 is not set in the ref_pic_flag information (that is, a value other than 1 is set), the process proceeds to step S44. In step S44, the coding unit 11 encodes the Tile group NAL unit by the coding process used when the current frame is a non-reference frame, and then the Tile group coding process ends.

By the file generation process as described above, the image encoding device 10 generates an AUD NAL unit in which the identification information is arranged, and generates a moving image file composed of a bit stream as described with reference to FIG. 5 above. That is, it is possible to encode an image of a bitstream consisting of access units in which at least one or more NAL units are located.

An example of a file decoding process in the image decoding process executed by the image decoding device 30 when the first variation of the bit stream is applied will be described with reference to the flowchart shown in FIG.

For example, when the image decoding device 30 reads out the bit stream stored in the storage unit 20, the process is started, and in step S51, the decoding unit 34 determines whether or not there is an NAL unit to be decoded from the bit stream.

If the decoding unit 34 determines in step S51 that there is an NAL unit to be decoded from the bit stream, the process proceeds to step S52.

In step S52, the decoding unit 34 determines whether or not the NAL unit to be decoded from the bit stream is an AUD NAL unit.

If the decoding unit 34 determines in step S52 that the NAL unit to be decoded from the bit stream is the AUD NAL unit, the process proceeds to step S53. Then, in step S53, the decoding unit 34 performs the AUD decoding process and supplies the identification information acquired by decoding the AUD NAL unit to the parameter memory 33, after which the process returns to step S51 and the same process is repeated thereafter. On the other hand, if the decoding unit 34 determines in step S52 that the NAL unit to be decoded from the bit stream is not the AUD NAL unit, the process proceeds to step S54.

In step S54, the decoding unit 34 determines whether or not the NAL unit to be decoded from the bit stream is the Tile group NAL unit.

If the decoding unit 34 determines in step S54 that the NAL unit to be decoded from the bit stream is the Tile group NAL unit, the process proceeds to step S55. Then, in step S55, the decoding unit 34 performs the Tile group decoding process and supplies the image obtained by decoding the Tile group NAL unit to the output buffer 35, after which the process returns to step S51 and the same process is repeated thereafter. On the other hand, if the decoding unit 34 determines in step S54 that the NAL unit to be decoded from the bit stream is not the Tile group NAL unit, the process proceeds to step S56.

In step S56, the decoding unit 34 performs a decoding process for decoding NAL units other than the AUD NAL unit and the Tile group NAL unit, after which the process returns to step S51 and the same process is repeated thereafter.

On the other hand, in step S51, when the decoding unit 34 determines that there is no NAL unit to be decoded from the bit stream, the file decoding process is terminated.
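The decoding side mirrors the same dispatch, as the following sketch of steps S51 to S56 shows. The dictionary-based NAL units, the function name `decode_file`, and the tuple placed in the output buffer are hypothetical stand-ins; real decoding of the Tile group data is not modelled.

```python
from typing import Dict, Iterable, List, Tuple


def decode_file(nal_units: Iterable[Dict]) -> List[Tuple[object, str]]:
    """Dispatch NAL units in stream order (hypothetical sketch of S51 to S56)."""
    parameter_memory: Dict[str, int] = {}
    output_buffer: List[Tuple[object, str]] = []
    for unit in nal_units:                    # S51: repeat while NAL units remain
        if unit["type"] == "AUD":             # S52 -> S53: AUD decoding process
            parameter_memory["ref_pic_flag"] = unit["ref_pic_flag"]
        elif unit["type"] == "TileGroup":     # S54 -> S55: Tile group decoding process
            # Refer to the flag stored for the access unit and branch on it.
            is_reference = parameter_memory.get("ref_pic_flag") == 1
            mode = "reference" if is_reference else "non-reference"
            output_buffer.append((unit["data"], mode))
        else:                                 # S56: other NAL units (e.g. SPS, PPS)
            continue                          # their decoding is omitted in this sketch
    return output_buffer
```

Each Tile group NAL unit is handled according to the flag most recently read from an AUD NAL unit, so no per-Tile-group flag needs to be parsed.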

FIG. 13 is a flowchart illustrating the AUD decoding process performed in step S53 of FIG. Note that this AUD decoding process is a process when the syntax is described in the first description example shown in FIG.

In step S61, the decoding unit 34 acquires ref_pic_flag information from the pic_type set in the AUD NAL unit.

In step S62, the decoding unit 34 decodes the AUD NAL unit by performing a frame delimiter process for dividing the frame (access unit), and then the AUD decoding process is terminated.

FIG. 14 is a flowchart illustrating the AUD decoding process performed in step S53 of FIG. Note that this AUD decoding process is a process when the syntax is described in the second description example shown in FIG. 7.

In step S71, the decoding unit 34 acquires the pic_type set in the AUD NAL unit.

In step S72, the decoding unit 34 acquires the ref_pic_flag information set in the AUD NAL unit.

In step S73, the decoding unit 34 decodes the AUD NAL unit by performing a frame delimiter process for dividing the frame (access unit), and then the AUD decoding process is terminated.

FIG. 15 is a flowchart illustrating the Tile group decoding process performed in step S55 of FIG.

In step S81, the decoding unit 34 refers to the ref_pic_flag information set for the AUD NAL unit.

In step S82, the decoding unit 34 determines whether or not 1 is set in the ref_pic_flag information.

If the decoding unit 34 determines in step S82 that 1 is set in the ref_pic_flag information, the process proceeds to step S83. In step S83, the decoding unit 34 decodes the Tile group NAL unit by the decoding process when the current frame is the reference frame, and then the Tile group decoding process is terminated.

On the other hand, if the decoding unit 34 determines in step S82 that 1 is not set in the ref_pic_flag information (that is, a value other than 1 is set), the process proceeds to step S84. In step S84, the decoding unit 34 decodes the Tile group NAL unit by the decoding process used when the current frame is a non-reference frame, and then the Tile group decoding process is terminated.

As described above, the image decoding device 30 decodes the image according to the ref_pic_flag information set for the AUD NAL unit. That is, it can decode an image of a bit stream composed of access units in which at least one NAL unit is arranged.

<Second variation of bitstream>
A second variation of the bitstream will be described with reference to FIGS. 16 to 23.

As shown in FIG. 16, the bitstream is composed of a plurality of access units, as described with reference to FIG. 5 above, and at least one NAL unit is arranged in each access unit. Further, as in the first variation of the bitstream (see FIG. 5), the three cases shown in FIG. 16 are assumed for the configuration of the bitstream.

Then, in the second variation of the bitstream, the NEW NAL unit is used as the NAL unit in addition to the AUD NAL unit, SPS NAL unit, PPS NAL unit, and Tile group NAL unit.

For example, in the first case, in every access unit, the AUD NAL unit is placed first, the SPS NAL unit second, the PPS NAL unit third, and the NEW NAL unit fourth, followed by consecutive Tile group NAL units. That is, in the first case, the SPS NAL unit and the PPS NAL unit are given to each individual access unit, and the NEW NAL unit is given in addition.

In the second case, in the first access unit, the AUD NAL unit is placed first, the SPS NAL unit second, the PPS NAL unit third, and the NEW NAL unit fourth, followed by consecutive Tile group NAL units. In the second and subsequent access units, the AUD NAL unit is placed first, the PPS NAL unit second, and the NEW NAL unit third, followed by consecutive Tile group NAL units. That is, in the second case, the SPS NAL unit is given only to the first access unit, the PPS NAL unit is given to each access unit, and the NEW NAL unit is given in addition.

In the third case, in the first access unit, the AUD NAL unit is placed first, the SPS NAL unit second, the PPS NAL unit third, and the NEW NAL unit fourth, followed by consecutive Tile group NAL units. In the second and subsequent access units, the AUD NAL unit is placed first and the NEW NAL unit second, followed by consecutive Tile group NAL units. That is, in the third case, the SPS NAL unit and the PPS NAL unit are given only to the first access unit, and the NEW NAL unit is given to each access unit.

In the second variation of the bitstream, in any of the first to third cases, the identification information indicating whether or not the current frame will become a reference image in the future is stored in the NEW NAL unit of the bitstream of the image; in this respect the configuration differs from the conventional one. That is, the syntax of the identification information indicating reference or non-reference is set in the NEW NAL unit, which is placed in the header area of each access unit, after the AUD NAL unit and ahead of the Tile group NAL units. By arranging the identification information in the NEW NAL unit (the NAL unit for identification) in the header area of the access unit and applying it to all the Tile group NAL units in the access unit in this way, it is possible to avoid the redundancy, the more complicated processing, and the adverse effect on coding efficiency that arise when identification information is stored for each Tile group.

Specifically, FIG. 17 shows a description example of the syntax of the NEW NAL unit in the second variation of the bitstream.

For example, in the NEW NAL syntax, ref_pic_flag in the NEW NAL unit is set as identification information indicating whether all Tile groups in the access unit are reference or non-reference.
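A lookup of the flag for the second variation could look like the following sketch. It assumes that the header area of an access unit is everything before the first Tile group NAL unit and that NAL units are represented as dictionaries with a `type` key; both choices, and the function name `ref_pic_flag_for_access_unit`, are hypothetical.

```python
from typing import Dict, List


def ref_pic_flag_for_access_unit(units: List[Dict]) -> int:
    """Return ref_pic_flag carried by the NEW NAL unit of one access unit."""
    for unit in units:
        if unit["type"] == "TileGroup":
            break                      # header area (assumed) ends at the first Tile group
        if unit["type"] == "NEW":
            return unit["ref_pic_flag"]
    raise ValueError("no NEW NAL unit found before the first Tile group NAL unit")


access_unit = [
    {"type": "AUD"},
    {"type": "PPS"},
    {"type": "NEW", "ref_pic_flag": 1},
    {"type": "TileGroup"},
    {"type": "TileGroup"},
]
flag = ref_pic_flag_for_access_unit(access_unit)  # applies to both Tile groups
```

The same function works for all three cases, because the NEW NAL unit always precedes the Tile group NAL units regardless of whether SPS and PPS NAL units are present.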

An example of a file generation process performed in the image coding process executed by the image coding apparatus 10 when the second variation of the bit stream is applied will be described with reference to the flowchart shown in FIG.

In steps S101 to S103, the same processing as in steps S11 to S13 described with reference to the flowchart of FIG. 8 described above is performed. Then, in step S104, the coding unit 11 determines whether or not the NAL unit to be generated from the image sequence is the NEW NAL unit.

If the coding unit 11 determines in step S104 that the NAL unit to be generated from the image sequence is the NEW NAL unit, the process proceeds to step S105. Then, in step S105, the coding unit 11 performs the NEW coding process to generate a NEW NAL unit containing the identification information and supplies it to the file generation unit 15 via the non-VCL buffer 14, after which the process returns to step S101 and the same process is repeated thereafter. On the other hand, if the coding unit 11 determines in step S104 that the NAL unit to be generated from the image sequence is not the NEW NAL unit, the process proceeds to step S106.

Then, in steps S106 to S108, the same processing as in steps S14 to S16 described with reference to the flowchart of FIG. 8 above is performed, after which the file generation process is terminated.

FIG. 19 is a flowchart illustrating the NEW coding process performed in step S105 of FIG.

In step S111, the coding unit 11 sets ref_pic_flag as identification information to generate a NEW NAL unit, and then the NEW coding process is terminated.

FIG. 20 is a flowchart illustrating the Tile group coding process performed in step S107 of FIG.

In step S121, the encoding unit 11 refers to the ref_pic_flag information set for the NEW NAL unit.

After that, in steps S122 to S124, the same processing as in steps S42 to S44 described with reference to the flowchart of FIG. 11 described above is performed, and then the Tile group coding process is terminated.

By the file generation process as described above, the image encoding device 10 generates a NEW NAL unit in which the identification information is arranged, and generates a moving image file composed of a bit stream as described with reference to FIG. 5 above. That is, it is possible to encode an image of a bitstream consisting of access units in which at least one or more NAL units are located.

An example of a file decoding process in the image decoding process executed by the image decoding device 30 when the second variation of the bit stream is applied will be described with reference to the flowchart shown in FIG.

In steps S131 to S133, the same processing as steps S51 to S53 described with reference to the flowchart of FIG. 12 described above is performed. Then, in step S134, the decoding unit 34 determines whether or not the NAL unit to be decoded from the bit stream is a NEW NAL unit.

If the decoding unit 34 determines in step S134 that the NAL unit to be decoded from the bit stream is the NEW NAL unit, the process proceeds to step S135. Then, in step S135, the decoding unit 34 performs the NEW decoding process and supplies the identification information acquired by decoding the NEW NAL unit to the parameter memory 33, after which the process returns to step S131 and the same process is repeated thereafter. On the other hand, if the decoding unit 34 determines in step S134 that the NAL unit to be decoded from the bit stream is not the NEW NAL unit, the process proceeds to step S136.

Then, in steps S136 to S138, the same processing as in steps S54 to S56 described with reference to the flowchart of FIG. 12 above is performed, after which the file decoding process is terminated.

FIG. 22 is a flowchart illustrating the NEW decoding process performed in step S135 of FIG. 21.

In step S141, the decoding unit 34 acquires the ref_pic_flag information set in the NEW NAL unit. After that, the NEW decoding process is terminated.

FIG. 23 is a flowchart illustrating the Tile group decoding process performed in step S137 of FIG. 21.

In step S151, the decoding unit 34 refers to the ref_pic_flag information set for the NEW NAL unit.

After that, in steps S152 to S154, the same processing as in steps S82 to S84 described with reference to the flowchart of FIG. 15 described above is performed, and then the Tile group decoding process is terminated.

As described above, the image decoding device 30 decodes the image according to the ref_pic_flag information set for the NEW NAL unit. That is, it can decode an image of a bit stream composed of access units in which at least one NAL unit is arranged.

<Computer configuration example>
Next, the series of processes (image decoding method and image coding method) described above can be performed by hardware or software. When a series of processes is performed by software, the programs constituting the software are installed on a general-purpose computer or the like.

FIG. 24 is a block diagram showing a configuration example of an embodiment of a computer on which a program for executing the above-mentioned series of processes is installed.

The program can be recorded in advance on the hard disk 105 or ROM 103 as a recording medium built in the computer.

Alternatively, the program can be stored (recorded) in the removable recording medium 111 driven by the drive 109. Such a removable recording medium 111 can be provided as so-called package software. Here, examples of the removable recording medium 111 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

In addition to installing the program on the computer from the removable recording medium 111 as described above, the program can be downloaded to the computer via a communication network or a broadcasting network and installed on the built-in hard disk 105. That is, for example, the program can be transferred wirelessly from a download site to the computer via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet.

The computer includes a CPU (Central Processing Unit) 102, and an input/output interface 110 is connected to the CPU 102 via a bus 101.

When the user inputs a command via the input/output interface 110 by operating the input unit 107 or the like, the CPU 102 executes the program stored in the ROM (Read Only Memory) 103 accordingly. Alternatively, the CPU 102 loads the program stored in the hard disk 105 into the RAM (Random Access Memory) 104 and executes it.

As a result, the CPU 102 performs processing according to the above-mentioned flowchart or processing performed according to the above-mentioned block diagram configuration. Then, the CPU 102 outputs the processing result from the output unit 106, transmits it from the communication unit 108, or records it on the hard disk 105, if necessary, via the input / output interface 110, for example.

The input unit 107 is composed of a keyboard, a mouse, a microphone, and the like. Further, the output unit 106 is composed of an LCD (Liquid Crystal Display), a speaker, or the like.

Here, in the present specification, the processing performed by the computer according to the program does not necessarily have to be performed in chronological order in the order described as the flowchart. That is, the processing performed by the computer according to the program also includes processing executed in parallel or individually (for example, parallel processing or processing by an object).

Further, the program may be processed by one computer (processor) or may be distributed by a plurality of computers. Further, the program may be transferred to a distant computer and executed.

Furthermore, in the present specification, a system means a set of a plurality of constituent elements (devices, modules (parts), etc.), and it does not matter whether or not all constituent elements are in the same housing. Therefore, a plurality of devices housed in separate housings and connected via a network, and a single device in which a plurality of modules are housed in one housing, are both systems.

Further, for example, the configuration described as one device (or processing unit) may be divided and configured as a plurality of devices (or processing units). Conversely, the configurations described above as a plurality of devices (or processing units) may be collectively configured as one device (or processing unit). Further, of course, a configuration other than the above may be added to the configuration of each device (or each processing unit). Further, if the configuration and operation of the entire system are substantially the same, a part of the configuration of one device (or processing unit) may be included in the configuration of another device (or other processing unit).

Further, for example, this technology can have a cloud computing configuration in which one function is shared by a plurality of devices via a network and jointly processed.

Further, for example, the above-mentioned program can be executed in any device. In that case, the device may have necessary functions (functional blocks, etc.) so that necessary information can be obtained.

Further, for example, each step described in the above flowchart can be executed by one device or can be shared and executed by a plurality of devices. Further, when a plurality of processes are included in one step, the plurality of processes included in the one step can be executed by one device or shared by a plurality of devices. In other words, a plurality of processes included in one step can be executed as processes of a plurality of steps. On the contrary, the processes described as a plurality of steps can be collectively executed as one step.

In the program executed by the computer, the processing of the steps describing the program may be executed in chronological order according to the order described in this specification, or may be executed in parallel or individually at a necessary timing, such as when a call is made. That is, as long as there is no contradiction, the processing of each step may be executed in an order different from the above-mentioned order. Further, the processing of the steps describing this program may be executed in parallel with the processing of another program, or may be executed in combination with the processing of another program.

It should be noted that the plurality of present technologies described in this specification can each be implemented independently as long as there is no contradiction. Of course, any plurality of the present technologies can be used in combination. For example, some or all of the techniques described in any of the embodiments may be combined with some or all of the techniques described in other embodiments. It is also possible to carry out a part or all of any of the above-mentioned techniques in combination with other techniques not described above.

<Example of configuration combination>
The present technology can also have the following configurations.
(1)
An image decoding device including a decoding unit that decodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
(2)
The image decoding device according to (1) above, wherein the identification information is stored in an AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
(3)
The image decoding device according to (1) above, wherein the identification information is stored in an identification NAL unit provided separately from the AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
(4)
An image decoding method in which an image decoding device that performs image decoding processing
decodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
(5)
An image encoding device including a coding unit that encodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
(6)
The image encoding device according to (5) above, wherein the identification information is stored in an AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
(7)
The image encoding device according to (5) above, wherein the identification information is stored in an identification NAL unit provided separately from the AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
(8)
An image coding method in which an image encoding device that performs image coding processing
encodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.

Note that the present embodiment is not limited to the above-described embodiment, and various changes can be made without departing from the gist of the present disclosure. Further, the effects described in the present specification are merely examples and are not limited, and other effects may be obtained.

10 image encoding device, 11 coding unit, 12 judgment unit, 13 VCL buffer, 14 non-VCL buffer, 15 file generation unit, 16 control unit, 20 storage unit, 30 image decoding device, 31 VCL buffer, 32 non-VCL buffer, 33 parameter memory, 34 decoding unit, 35 output buffer, 37 control unit

Claims

An image decoding device comprising a decoding unit that decodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
The image decoding device according to claim 1, wherein the identification information is stored in an AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
The image decoding device according to claim 1, wherein the identification information is stored in an identification NAL unit provided separately from the AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
An image decoding method in which an image decoding device that performs image decoding processing
decodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
An image encoding device comprising a coding unit that encodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.
The image encoding device according to claim 5, wherein the identification information is stored in an AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
The image encoding device according to claim 5, wherein the identification information is stored in an identification NAL unit provided separately from the AUD (Access Unit Delimiter) NAL unit indicating the delimiter of the access unit, and is applied to all the Tile group NAL units in the access unit.
An image coding method in which an image encoding device that performs image coding processing
encodes an image of a bitstream consisting of access units in which at least one NAL (Network Abstraction Layer) unit is arranged,
wherein identification information for identifying whether or not the current frame can become a reference image in the future, for the images stored in all the Tile group NAL units in the access unit, is stored in the header area of the access unit.