CN115174901A - Video decoder, video encoder and related encoding and decoding methods - Google Patents


Info

- Publication number: CN115174901A
- Application number: CN202210623879.XA
- Authority: CN (China)
- Prior art keywords: flag, reference picture, picture list, index, syntax structure
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
- Other languages: Chinese (zh)
- Inventors: Fnu Hendry (弗努·亨德里), Ye-Kui Wang (王业奎)
- Current and original assignee: Huawei Technologies Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
- Application filed by Huawei Technologies Co., Ltd.
- Publication of CN115174901A

Classifications

    • H (Electricity) > H04 (Electric communication technique) > H04N (Pictorial communication, e.g. television) > H04N19/00 (Methods or arrangements for coding, decoding, compressing or decompressing digital video signals), with the following subgroups:
    • H04N19/42: characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/58: motion compensation with long-term prediction, i.e. the reference frame for a current frame not being the temporally closest one
    • H04N19/105: selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/107: selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/159: prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176: the coding unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184: the coding unit being bits, e.g. of the compressed video stream
    • H04N19/20: using video object coding
    • H04N19/46: embedding additional information in the video signal during the compression process
    • H04N19/70: characterised by syntax aspects related to video coding, e.g. related to compression standards

Abstract

The invention provides a method for decoding a coded video bitstream. The method comprises: parsing a flag; parsing a first reference picture list structure; when the flag has a first value, determining that an index of a second reference picture list structure is not present in a slice header of the coded video bitstream and inferring that the index of the second reference picture list structure is the same as the index of the first reference picture list structure; when the flag has a second value, determining that the index of the second reference picture list structure is present in the slice header; generating a reference picture list using the first reference picture list structure or the second reference picture list structure; and performing inter prediction based on the reference picture list to generate a reconstructed block.

Description

Video decoder, video encoder and related encoding and decoding methods
The present application is a divisional application; the original application has application number 201980059745.6 and an original filing date of September 12, 2019. The entire content of the original application is incorporated by reference into the present application.
Cross reference to related applications
This patent application claims the benefit of U.S. provisional patent application No. 62/730,172, filed September 12, 2018 and entitled "Bit Count Reduction for Reference Picture Management in Video Coding", and of U.S. provisional patent application No. 62/848,147, filed May 15, 2019 by Fnu Hendry et al. and entitled "Reduction of Number of Bits for Reference Picture Management Based on Reference Picture Lists in Video Coding", the entire contents of which are incorporated herein by reference.
Technical Field
This disclosure generally describes techniques to improve the signaling efficiency of reference picture management in video coding. More specifically, it describes techniques that improve reference picture list construction and signal reference picture identification directly based on reference picture lists.
Background
Even a relatively short video requires a large amount of data to describe, which can cause difficulties when the data is to be streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Video data is therefore typically compressed before being transmitted over modern telecommunication networks. Because memory resources may be limited, the size of the video may also be an issue when it is stored on a storage device. Video compression devices typically use software and/or hardware on the source side to encode the video data prior to transmission or storage, thereby reducing the amount of data needed to represent digital video images. The compressed data is then received at the destination side by a video decompression device that decodes the video data. With limited network resources and an ever-increasing demand for higher video quality, improved compression and decompression techniques that increase the compression ratio with little impact on image quality are desirable.
Disclosure of Invention
A first aspect relates to a method of decoding a coded video bitstream implemented by a video decoder. The method comprises: parsing a flag from the coded video bitstream; parsing a first reference picture list structure from the coded video bitstream; when the flag has a first value, determining that an index of a second reference picture list structure is not present in a slice header of the coded video bitstream and inferring that the index of the second reference picture list structure is the same as an index of the first reference picture list structure; when the flag has a second value, determining that the index of the second reference picture list structure is present in the slice header; generating a reference picture list using at least one of the first reference picture list structure or the second reference picture list structure; and performing inter prediction based on the reference picture list to generate a reconstructed block.
The method provides techniques that simplify and improve the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. Indeed, the improved video coding process provides a better user experience when sending, receiving and/or viewing video.
In a first implementation form of the method according to the first aspect, the flag is denoted rpl1_idx_present_flag.
In a second implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the flag is included in a Picture Parameter Set (PPS) of the coded video bitstream.
In a third implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the flag is included in a Sequence Parameter Set (SPS) of the coded video bitstream.
In a fourth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the first reference picture list structure is included in a slice header of the coded video bitstream.
In a fifth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the flag is included in a Picture Parameter Set (PPS) of the coded video bitstream, and the first reference picture list structure is included in a slice header of the coded video bitstream.
In a sixth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the first value of the flag is 1.
In a seventh implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, when the flag has the first value of 1, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
In an eighth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, the second value of the flag is 0.
In a ninth implementation form of the method according to the first aspect as such or any of the preceding implementation forms of the first aspect, when the flag has the second value of 0, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are included in the slice header.
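The parsing behavior described by the first aspect and its implementation forms can be sketched as follows. This is a hypothetical illustration, not the actual decoder implementation: the BitReaderStub class and the flat list of syntax values are assumptions standing in for real entropy decoding of the slice header.

```python
class BitReaderStub:
    """Hypothetical stand-in for a slice-header bitstream reader.

    A real decoder would entropy-decode these values from the coded
    video bitstream; here they are simply popped from a list.
    """
    def __init__(self, values):
        self.values = list(values)

    def read_flag(self):
        return self.values.pop(0)

    def read_index(self):
        return self.values.pop(0)


def parse_slice_header_rpl(reader, rpl1_idx_present_flag):
    """Parse (or infer) the reference picture list indices for a slice.

    Follows the first aspect: when the flag has the first value (1),
    ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are absent from the
    slice header and are inferred to equal the list-0 values; when the
    flag has the second value (0), they are parsed explicitly.
    """
    # Signaling for reference picture list 0 is always present.
    sps_flag = [reader.read_flag(), None]
    idx = [reader.read_index(), None]

    if rpl1_idx_present_flag == 1:
        # First value: the list-1 index is not present; infer from list 0.
        sps_flag[1] = sps_flag[0]
        idx[1] = idx[0]
    else:
        # Second value: the list-1 index is present in the slice header.
        sps_flag[1] = reader.read_flag()
        idx[1] = reader.read_index()
    return sps_flag, idx
```

The bit saving comes from the first branch: whenever both lists reference the same structure, two syntax elements per slice are simply never transmitted.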
A second aspect relates to a method implemented by a video encoder for encoding a video bitstream. The method comprises: encoding a flag having a first value in the video bitstream when an index of a second reference picture list structure is not encoded in a slice header of the video bitstream and a video decoder is to infer that the index of the second reference picture list structure is the same as an index of a first reference picture list structure; encoding the flag having a second value in the video bitstream when the index of the second reference picture list structure is encoded in the slice header of the video bitstream; encoding the first reference picture list structure in the video bitstream when the flag is encoded with the first value; encoding the first reference picture list structure and the second reference picture list structure in the video bitstream when the flag is encoded with the second value; and transmitting the video bitstream to the video decoder.
The method provides techniques that simplify the decoding process and increase the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. Indeed, the improved video coding process provides a better user experience when sending, receiving and/or viewing video.
In a first implementation form of the method according to the second aspect, the flag is denoted rpl1_idx_present_flag.
In a second implementation form of the method according to the second aspect as such or any of the preceding implementation forms of the second aspect, the flag is encoded in a Picture Parameter Set (PPS) of the video bitstream.
In a third implementation form of the method according to the second aspect as such or any of the preceding implementation forms of the second aspect, the first reference picture list structure is encoded in a slice header of the video bitstream.
In a fourth implementation form of the method according to the second aspect as such or any of the preceding implementation forms of the second aspect, the first reference picture list structure and the second reference picture list structure are encoded in a slice header of the video bitstream.
In a fifth implementation form of the method according to the second aspect as such or any of the preceding implementation forms of the second aspect, the first value of the flag is 1 and the second value of the flag is 0.
In a sixth implementation form of the method according to the second aspect as such or any of the preceding implementation forms of the second aspect, when the flag has the first value of 1, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
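The encoder-side choice described by the second aspect can be sketched as follows. This is a simplified, hypothetical illustration: BitWriterStub is an assumed stand-in for a real bitstream writer, the indices stand in for the reference picture list structures themselves, and in practice the flag would be carried in a parameter set (e.g., the PPS) rather than written alongside the indices.

```python
class BitWriterStub:
    """Hypothetical stand-in for a bitstream writer (illustration only).

    Records symbols in order instead of entropy-coding them.
    """
    def __init__(self):
        self.symbols = []

    def write_flag(self, value):
        self.symbols.append(("flag", value))

    def write_index(self, value):
        self.symbols.append(("idx", value))


def encode_rpl_signaling(writer, rpl0_idx, rpl1_idx):
    """Encoder-side decision from the second aspect (simplified).

    When the list-1 index equals the list-0 index, the flag is encoded
    with the first value (1) and the list-1 index is omitted, so the
    decoder infers it; otherwise the flag is encoded with the second
    value (0) and both indices are written.
    """
    if rpl1_idx == rpl0_idx:
        writer.write_flag(1)          # first value: list-1 index omitted
        writer.write_index(rpl0_idx)
    else:
        writer.write_flag(0)          # second value: both indices written
        writer.write_index(rpl0_idx)
        writer.write_index(rpl1_idx)
```

When the common case (both lists pointing at the same structure) dominates, this decision saves an index per slice at the cost of a single flag.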
A third aspect relates to a decoding device. The decoding device comprises: a receiver configured to receive a coded video bitstream; a memory coupled to the receiver and storing instructions; and a processor coupled to the memory and configured to execute the instructions stored in the memory to cause the processor to: parse a flag from the coded video bitstream; parse a first reference picture list structure from the coded video bitstream; when the flag has a first value, determine that an index of a second reference picture list structure is not present in a slice header of the coded video bitstream and infer that the index of the second reference picture list structure is the same as an index of the first reference picture list structure; when the flag has a second value, determine that the index of the second reference picture list structure is present in the slice header; generate a reference picture list using at least one of the first reference picture list structure or the second reference picture list structure; and perform inter prediction based on the reference picture list to generate a reconstructed block.
The decoding apparatus provides techniques that simplify the decoding process and increase the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. In effect, the improved video coding process provides a better user experience when sending, receiving, and/or viewing video.
In a first implementation form of the decoding device according to the third aspect, the decoding device comprises a display for displaying an image generated using the reconstructed block.
In a second implementation form of the decoding device according to the third aspect as such or any of the preceding implementation forms of the third aspect, the flag is denoted rpl1_idx_present_flag.
In a third implementation form of the decoding device according to the third aspect as such or any of the preceding implementation forms of the third aspect, the flag is included in a Picture Parameter Set (PPS) of the coded video bitstream.
In a fourth implementation form of the decoding device according to the third aspect as such or any of the preceding implementation forms of the third aspect, the first reference picture list structure is included in a slice header of the coded video bitstream.
In a fifth implementation form of the decoding device according to the third aspect as such or any of the preceding implementation forms of the third aspect, the first value of the flag is 1 (1) and the second value of the flag is 0 (0).
In a sixth implementation form of the decoding device according to the third aspect as such or any of the preceding implementation forms of the third aspect, when the flag has the first value of 1, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
A fourth aspect relates to an encoding device. The encoding device comprises: a processor configured to: encode a flag having a first value in a video bitstream when an index of a second reference picture list structure is not encoded in a slice header of the video bitstream and a video decoder is to infer that the index of the second reference picture list structure is the same as an index of a first reference picture list structure; encode the flag having a second value in the video bitstream when the index of the second reference picture list structure is encoded in the slice header of the video bitstream; encode the first reference picture list structure in the video bitstream when the flag is encoded with the first value; and encode the first reference picture list structure and the second reference picture list structure in the video bitstream when the flag is encoded with the second value; and a transmitter coupled to the processor and configured to transmit the video bitstream to the video decoder.
The encoding apparatus provides a technique that simplifies the decoding process and improves the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. In effect, the improved video coding process provides a better user experience when sending, receiving, and/or viewing video.
In a first implementation form of the encoding device according to the fourth aspect, the flag is denoted rpl1_idx_present_flag.
In a second implementation form of the encoding device according to the fourth aspect as such or any of the preceding implementation forms of the fourth aspect, the flag is encoded in a Picture Parameter Set (PPS) of the video bitstream.
In a third implementation form of the encoding device according to the fourth aspect as such or any of the preceding implementation forms of the fourth aspect, the first reference picture list structure is encoded in a slice header of the video bitstream.
In a fourth implementation form of the encoding device according to the fourth aspect as such or any of the preceding implementation forms of the fourth aspect, the first value of the flag is 1 and the second value of the flag is 0.
In a fifth implementation form of the encoding device according to the fourth aspect as such or any of the preceding implementation forms of the fourth aspect, when the flag has the first value of 1, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
A fifth aspect relates to a coding device. The coding device comprises: a receiver configured to receive a bitstream to be decoded; a transmitter coupled to the receiver and configured to transmit a decoded picture to a display; a memory coupled to at least one of the receiver or the transmitter and configured to store instructions; and a processor coupled to the memory and configured to execute the instructions stored in the memory to perform the method provided by any of the embodiments disclosed herein.
The decoding device provides a technology which simplifies the decoding process and improves the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. Indeed, the improved video coding process provides a better user experience when sending, receiving and/or viewing video.
A seventh aspect relates to a system. The system comprises: an encoder comprising the encoding apparatus provided in any one of the embodiments disclosed herein; a decoder in communication with the encoder, the decoder comprising a decoding device provided in any of the embodiments disclosed herein.
The system provides techniques that simplify the decoding process and improve the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. In effect, the improved video coding process provides a better user experience when sending, receiving, and/or viewing video.
An eighth aspect relates to a coding module. The coding module comprises: a receiving module configured to receive a bitstream to be decoded; a transmitting module coupled to the receiving module and configured to transmit a decoded picture to a display module; a storage module coupled to at least one of the receiving module or the transmitting module and configured to store instructions; and a processing module coupled to the storage module and configured to execute the instructions stored in the storage module to perform the method provided by any of the embodiments disclosed herein.
The decoding module provides a technique that simplifies the decoding process and improves the efficiency of the decoding process. Using a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure, the encoder/decoder (also referred to as "codec") in video coding is improved (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.) over current codecs. Indeed, the improved video coding process provides a better user experience when sending, receiving and/or viewing video.
Drawings
For a more complete understanding of the present invention, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
Fig. 1 is a block diagram of an exemplary coding system that may employ bi-directional prediction techniques.
Fig. 2 is a block diagram of an exemplary video encoder that may implement bi-prediction techniques.
Fig. 3 is a block diagram of an exemplary video decoder that may implement bi-prediction techniques.
Fig. 4 is a diagram of a Reference Picture Set (RPS) with a current picture that has entries in all subsets of the RPS.
FIG. 5 is a diagram of one embodiment of a video bitstream.
Fig. 6 is an embodiment of a method of decoding a coded video bitstream.
Fig. 7 is an embodiment of a method of encoding a video bitstream.
FIG. 8 is a schematic diagram of a video coding apparatus.
FIG. 9 is a schematic diagram of one embodiment of a decode module.
Detailed Description
The following abbreviations are used herein: Decoded Picture Buffer (DPB), Instantaneous Decoding Refresh (IDR), Intra Random Access Point (IRAP), Least Significant Bit (LSB), Most Significant Bit (MSB), Network Abstraction Layer (NAL), Picture Order Count (POC), Raw Byte Sequence Payload (RBSP), Sequence Parameter Set (SPS), and Working Draft (WD).
Fig. 1 is a block diagram of an exemplary coding system 10 that may employ the video coding techniques described herein. As shown in fig. 1, coding system 10 includes a source device 12, where source device 12 provides encoded video data that is later decoded by a destination device 14. In particular, source device 12 may provide video data to destination device 14 via computer-readable medium 16. Source device 12 and destination device 14 may comprise any of a variety of devices, including a desktop computer, a notebook (e.g., laptop) computer, a tablet computer, a set-top box, a telephone handset such as a "smart" handset and a "smart" pad, a television, a camera, a display device, a digital media player, a video game console, a video streaming device, and so forth. In some cases, source device 12 and destination device 14 may be used for wireless communication.
Destination device 14 may receive encoded video data to be decoded via computer-readable medium 16. Computer-readable medium 16 may include any type of medium or device capable of communicating encoded video data from source device 12 to destination device 14. In one example, computer-readable medium 16 may include a communication medium to enable source device 12 to send encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may include any wireless or wired communication medium, such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet network, such as a local area network, a wide area network, or a global network such as the internet. The communication medium may include a router, switch, base station, or any other device operable to facilitate communication from source device 12 to destination device 14.
In some examples, the encoded data may be output from output interface 22 to a storage device. Similarly, encoded data may be accessed from the storage device through an input interface. The storage device may include any of a variety of distributed or locally accessed data storage media, such as a hard disk drive, Blu-ray discs, digital video discs (DVDs), Compact Disc Read-Only Memories (CD-ROMs), flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In another example, the storage device may correspond to a file server or another intermediate storage device that may store the encoded video generated by source device 12. Destination device 14 may access the stored video data from the storage device via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting the encoded video data to destination device 14. Exemplary file servers include a web server (e.g., for a website), a File Transfer Protocol (FTP) server, a Network Attached Storage (NAS) device, or a local disk drive. Destination device 14 may access the encoded video data over any standard data connection, including an internet connection. The standard data connection may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., a Digital Subscriber Line (DSL), a cable modem, etc.), or a combination of both suitable for accessing encoded video data stored on the file server. The transmission of the encoded video data from the storage device may be a streaming transmission, a download transmission, or a combination thereof.
The techniques in this disclosure are not necessarily limited to wireless applications or settings. These techniques may be applied to video coding to support any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, internet streaming video transmissions (e.g., dynamic adaptive streaming over HTTP (DASH)), encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications.
In the example of fig. 1, source device 12 includes a video source 18, a video encoder 20, and an output interface 22. Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In accordance with this disclosure, video encoder 20 in source device 12 and/or video decoder 30 in destination device 14 may be used for video coding using these techniques. In other examples, the source device and the destination device may include other components or apparatuses. For example, source device 12 may receive video data from an external video source (e.g., an external camera). Similarly, destination device 14 may be connected with an external display device instead of including an integrated display device.
The coding system 10 shown in fig. 1 is merely an example. The video coding techniques may be performed by any digital video encoding and/or decoding device. Although the techniques in this disclosure are generally performed by a video coding device, they may also be performed by a combined video encoder/decoder, commonly referred to as a "CODEC". In addition, the techniques of the present invention may also be performed by a video preprocessor. The video encoder and/or decoder may be a Graphics Processing Unit (GPU) or similar device.
Source device 12 and destination device 14 are merely examples of such coding devices, in which source device 12 generates coded video data for transmission to destination device 14. In some examples, source device 12 and destination device 14 may operate substantially symmetrically such that both source device 12 and destination device 14 include video encoding and decoding components. Accordingly, coding system 10 may support one-way or two-way video transmission between video devices 12 and 14, such as for video streaming, video playback, video broadcasting, or video telephony.
Video source 18 in source device 12 may include a video capture device (e.g., a video camera), a video archive that includes previously captured video, and/or a video input interface that receives video from a video content provider. Alternatively, video source 18 may generate computer graphics-based data as the source video, or a combination of live video, archived video, and computer-generated video.
In some cases, when video source 18 is a camera, source device 12 and destination device 14 may constitute a camera phone or video phone. However, as noted above, the techniques described in this disclosure may be applicable to video coding in general, as well as to wireless applications and/or wired applications. In each case, the captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video information may then be output onto computer readable medium 16 through output interface 22.
The computer-readable medium 16 may be a transitory medium such as a wireless broadcast or a wired network transmission, and may also include a storage medium (i.e., a non-transitory storage medium) such as a hard disk, a flash drive, a compact disc, a digital video disc, a Blu-ray disc, or other computer-readable media. In some examples, a network server (not shown) may receive encoded video data from source device 12 and provide the encoded video data to destination device 14 via a network transmission or the like. Similarly, a computing device in a media production facility (e.g., an optical disc stamping facility) may receive encoded video data from source device 12 and generate an optical disc that includes the encoded video data. Thus, in various examples, computer-readable media 16 may be understood to include one or more of various forms of computer-readable media.
The input interface 28 in the destination device 14 receives information from the computer readable medium 16. The information of computer-readable medium 16 may include syntax information defined by video encoder 20. This syntax information is also used by video decoder 30 and includes syntax elements that describe the characteristics and/or processing of blocks and other coded units (e.g., groups of pictures (GOPs)). The display device 32 displays the decoded video data to a user, and may include any of a variety of display devices, such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or other types of display devices.
Video encoder 20 and video decoder 30 may operate in accordance with a video coding standard, such as the High Efficiency Video Coding (HEVC) standard currently being developed, and may comply with the HEVC Test Model (HM). Alternatively, video encoder 20 and video decoder 30 may operate in accordance with other proprietary or industry standards, such as the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.264 standard (also known as Motion Picture Expert Group (MPEG)-4 part 10, Advanced Video Coding (AVC)), H.265/HEVC, extended versions of such standards, and so forth. However, the techniques in this disclosure are not limited to any particular encoding standard. Other examples of video coding standards include MPEG-2 and ITU-T H.263. Although not shown in fig. 1, in some aspects, video encoder 20 and video decoder 30 may be integrated with an audio encoder and an audio decoder, respectively, and may include suitable multiplexer-demultiplexer (MUX-DEMUX) units or other hardware and software to encode both audio and video in a common data stream or separate data streams. The MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol, if applicable, or to other protocols such as the User Datagram Protocol (UDP).
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, software, hardware, firmware, or any combination thereof. When the above-described techniques are implemented in part in software, an apparatus may store the instructions of the software in a suitable non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Both video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, which may be integrated as part of a combined encoder/decoder (CODEC) in the respective device. A device including video encoder 20 and/or video decoder 30 may include an integrated circuit, a microprocessor, and/or a wireless communication device such as a cellular telephone.
Fig. 2 is a block diagram of an exemplary video encoder 20 that may implement video coding techniques. Video encoder 20 may perform intra-coding and inter-coding on video blocks within a video slice (slice). Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. The intra mode (I mode) may be any of several spatially based coding modes. Inter modes, such as unidirectional prediction (P mode) or bi-prediction (B mode), may be any of several temporally based coding modes.
As shown in fig. 2, video encoder 20 receives a current video block within a video frame to be encoded. In the example of fig. 2, video encoder 20 includes a mode selection unit 40, a reference frame memory 64, an adder 50, a transform processing unit 52, a quantization unit 54, and an entropy encoding unit 56. The mode selection unit 40 in turn comprises a motion compensation unit 44, a motion estimation unit 42, an intra-prediction (intra-prediction) unit 46 and a segmentation unit 48. To reconstruct the video block, video encoder 20 also includes an inverse quantization unit 58, an inverse transform unit 60, and an adder 62. A deblocking filter (not shown in fig. 2) may also be included to filter block boundaries to remove blocking artifacts from the reconstructed video. The output of adder 62 is typically filtered by the deblocking filter if desired. In addition to the deblocking filter, other (in-loop or post-loop) filters may be used. Such filters are not shown for simplicity, but may filter the output of adder 50 (as an in-loop filter) if desired.
In the encoding process, video encoder 20 receives a video frame or slice (slice) to be coded. The frame or slice may be divided into a plurality of video blocks. Motion estimation unit 42 and motion compensation unit 44 perform inter-prediction coding on received video blocks with reference to one or more blocks in one or more reference frames to achieve temporal prediction. Intra-prediction unit 46 may also perform intra-prediction coding on the received video block with reference to one or more neighboring blocks located in the same frame or slice as the block to be coded to achieve spatial prediction. Video encoder 20 may perform multiple coding passes, e.g., to select an appropriate coding mode for each block of video data.
Furthermore, partition unit 48 may partition a block of video data into sub-blocks according to an evaluation of a previous partition scheme in a previous coding pass. For example, the partition unit 48 may initially partition a frame or slice into Largest Coding Units (LCUs) and partition each LCU into sub-coding units (sub-CUs) according to a rate-distortion analysis (e.g., rate-distortion optimization). Mode select unit 40 may also generate a quadtree data structure indicating the partitioning of the LCU into sub-CUs. Leaf-node CUs in a quadtree may include one or more Prediction Units (PUs) and one or more Transform Units (TUs).
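The recursive LCU-to-sub-CU partitioning described above can be sketched as follows. This is a minimal illustration, not the encoder's actual partitioning logic: the hypothetical `needs_split` predicate stands in for the rate-distortion analysis that decides whether a block is divided into four sub-blocks.

```python
def quadtree_split(x, y, size, min_size, needs_split):
    """Recursively partition a square block; returns leaf blocks as (x, y, size).

    needs_split(x, y, size) is a stand-in for the encoder's rate-distortion
    decision of whether a block should be divided into four sub-blocks.
    """
    if size <= min_size or not needs_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):          # visit the four quadrants
        for dx in (0, half):
            leaves += quadtree_split(x + dx, y + dy, half, min_size, needs_split)
    return leaves

# Example: split only the 64x64 root LCU; each 32x32 child becomes a leaf CU.
leaves = quadtree_split(0, 0, 64, 8, lambda x, y, s: s == 64)
```

A real encoder would evaluate `needs_split` by comparing the rate-distortion cost of coding the block whole against the summed cost of its four sub-blocks.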
This disclosure uses the term "block" to refer to any of a CU, PU, or TU in the HEVC context, or similar data structures in other standard contexts (e.g., macroblocks and their sub-blocks in h.264/AVC). A CU includes an encoding node, a PU associated with the encoding node, and a TU. The size of the CU corresponds to the size of the coding node and is square. The CU may range in size from 8 × 8 pixels up to a maximum treeblock size of 64 × 64 pixels or more. Each CU may include one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning the CU into one or more PUs. The partition modes may differ depending on whether the CU is coded in skip or direct mode, intra-prediction mode, or inter-prediction mode. The PU may be partitioned into non-square shapes. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. TUs may be square or non-square (e.g., rectangular).
Mode select unit 40 may select one of the intra or inter coding modes based on the error result, etc., provide the resulting intra- or inter-coded block to adder 50 to generate residual block data, and provide it to adder 62 to reconstruct the encoded block for use as a reference frame. Mode select unit 40 also provides syntax elements, such as motion vectors, intra-mode indicators, partition information, and other such syntax information, to entropy encoding unit 56.
Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. The motion estimation performed by motion estimation unit 42 is a process of generating motion vectors, which estimate the motion of video blocks. A motion vector, for example, may represent the displacement of a PU of a video block within the current video frame or picture relative to a prediction block within a reference frame (or other coded unit). A prediction block is a block that closely matches the block to be coded in terms of pixel differences. The pixel difference may be determined by a Sum of Absolute Differences (SAD), a Sum of Squared Differences (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference frame memory 64. For example, video encoder 20 may interpolate values for a quarter-pixel position, an eighth-pixel position, or other fractional-pixel positions of a reference picture. Therefore, the motion estimation unit 42 may perform a motion search with reference to the integer pixel positions and the fractional pixel positions, and output a motion vector with fractional pixel accuracy.
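The pixel-difference metrics named above can be sketched in a few lines. This is an illustrative reference implementation only; the block layout (lists of rows) and the sample values are assumptions, and real encoders use heavily optimized SIMD or hardware implementations.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def ssd(block_a, block_b):
    """Sum of Squared Differences between two equally sized blocks."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

current = [[10, 12], [14, 16]]     # hypothetical 2x2 block to be coded
candidate = [[11, 12], [13, 18]]   # candidate prediction block from a reference frame
```

During a motion search, the candidate block with the smallest metric value would be selected as the prediction block.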
Motion estimation unit 42 calculates the motion vector for a PU of a video block in an inter-coded slice by comparing the location of the PU to the locations of prediction blocks of reference pictures. The reference picture may be selected from a first reference picture list (list 0) or a second reference picture list (list 1), each of which identifies one or more reference pictures stored in the reference frame memory 64. The motion estimation unit 42 sends the calculated motion vector to the entropy encoding unit 56 and the motion compensation unit 44.
The motion compensation performed by motion compensation unit 44 may include retrieving or generating a prediction block based on the motion vector determined by motion estimation unit 42. Additionally, in some examples, motion estimation unit 42 and motion compensation unit 44 may be functionally integrated. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the prediction block pointed to by the motion vector in one of the reference picture lists. Adder 50 forms a residual video block by subtracting the pixel values of the prediction block from the pixel values of the current video block being coded, resulting in pixel difference values, as described below. In general, motion estimation unit 42 performs motion estimation with reference to the luminance component, and motion compensation unit 44 uses a motion vector calculated from the luminance component for the chrominance component and the luminance component. Mode select unit 40 may also generate syntax elements related to the video blocks and the video slices for use by video decoder 30 in decoding the video blocks of the video slices.
The intra prediction unit 46 may intra predict the current block instead of inter prediction performed by the motion estimation unit 42 and the motion compensation unit 44, as described above. In particular, the intra-prediction unit 46 may determine an intra-prediction mode for encoding the current block. In some examples, intra-prediction unit 46 may encode the current block using various intra-prediction modes, e.g., in separate encoding passes, and intra-prediction unit 46 (or mode selection unit 40 in some examples) may select an appropriate intra-prediction mode from the tested modes for use.
For example, the intra prediction unit 46 may calculate rate-distortion values for the various tested intra prediction modes using rate-distortion analysis and select the intra prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines the amount of distortion (or error) between an encoded block and the original unencoded block that was encoded to produce the encoded block, as well as the code rate (i.e., the number of bits) used to produce the encoded block. Intra-prediction unit 46 may calculate ratios from the distortions and rates of the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.
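This kind of rate-distortion selection is commonly modeled as minimizing a Lagrangian cost J = D + λ·R. The sketch below assumes that model; the mode names, distortion/rate values, and λ are hypothetical, not taken from any standard.

```python
def rd_cost(distortion, rate_bits, lmbda):
    """Lagrangian rate-distortion cost: J = D + lambda * R."""
    return distortion + lmbda * rate_bits

def best_mode(candidates, lmbda):
    """Pick the (name, distortion, rate) candidate with the lowest RD cost."""
    return min(candidates, key=lambda c: rd_cost(c[1], c[2], lmbda))

# Hypothetical distortion/rate measurements for three tested intra modes.
modes = [("planar", 1200, 40), ("dc", 1500, 24), ("angular_10", 900, 90)]
winner = best_mode(modes, lmbda=8)   # costs: 1520, 1692, 1620
```

Note how the mode with the lowest distortion ("angular_10") loses once its higher bit cost is weighted in, which is exactly the trade-off rate-distortion analysis captures.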
In addition, the intra prediction unit 46 may be used to code a depth block of a depth image using a Depth Modeling Mode (DMM). The mode selection unit 40 may determine, e.g., using rate-distortion optimization (RDO), whether an available DMM mode produces better coding results than an intra prediction mode and the other DMM modes. Data of the texture image corresponding to the depth image may be stored in the reference frame memory 64. Motion estimation unit 42 and motion compensation unit 44 may also be used to inter-predict depth blocks of the depth image.
After selecting the intra-prediction mode (e.g., the conventional intra-prediction mode or one of the DMM modes) for the block, intra-prediction unit 46 may provide information indicative of the intra-prediction mode selected for the block to entropy encoding unit 56. The entropy encoding unit 56 may encode the information representing the selected intra prediction mode. Video encoder 20 may carry configuration data in the transmitted codestream, which may include multiple intra-prediction mode index tables and multiple modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of coding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table for each coding context.
Video encoder 20 forms a residual video block by subtracting the prediction data from mode select unit 40 from the original video block being coded. Adder 50 is one or more components that perform such subtraction operations.
Transform processing unit 52 applies a Discrete Cosine Transform (DCT), a conceptually similar transform, or the like, to the residual block, thereby generating a video block including residual transform coefficient values. Transform processing unit 52 may perform other transforms that are conceptually similar to DCT. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used.
Transform processing unit 52 applies a transform to the residual block, producing a block of residual transform coefficients. The transform may convert the residual information from the pixel domain to a transform domain (e.g., frequency domain). The transform processing unit 52 may send the resulting transform coefficients to a quantization unit 54. The quantization unit 54 quantizes the transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting the quantization parameter. In some examples, quantization unit 54 may then perform a scan of a matrix including the quantized transform coefficients. Alternatively, the entropy encoding unit 56 may perform scanning.
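A minimal sketch of uniform scalar quantization with a single step size follows. HEVC's actual quantizer uses integer arithmetic derived from a quantization parameter plus scaling matrices; the function names and coefficient values here are illustrative assumptions only.

```python
def quantize(coeffs, qstep):
    """Uniform scalar quantization with sign-aware rounding to the nearest level."""
    def level(c):
        sign = -1 if c < 0 else 1
        return sign * int((abs(c) + qstep // 2) // qstep)
    return [level(c) for c in coeffs]

def dequantize(levels, qstep):
    """Inverse quantization: scale quantized levels back to the transform domain."""
    return [lvl * qstep for lvl in levels]

coeffs = [100, -37, 12, -3, 1, 0]     # hypothetical transform coefficients
levels = quantize(coeffs, qstep=10)   # small coefficients collapse to 0
recon = dequantize(levels, qstep=10)  # lossy: recon only approximates coeffs
```

The example shows both effects described above: the bit depth of the coefficients is reduced, and the trailing small coefficients become zero, which makes the subsequent scan and entropy coding cheaper.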
After quantization, the entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform Context Adaptive Variable Length Coding (CAVLC), Context Adaptive Binary Arithmetic Coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding, or other entropy encoding techniques. In the case of context-based entropy coding, the context may be based on neighboring blocks. After entropy encoding is performed by entropy encoding unit 56, the encoded codestream may be sent to another device (e.g., video decoder 30) or archived for later transmission or retrieval.
Inverse quantization unit 58 and inverse transform unit 60 apply inverse quantization and inverse transform, respectively, to reconstruct a residual block in the pixel domain, e.g., for later use as a reference block. Motion compensation unit 44 may calculate the reference block by adding the residual block to a predicted block of one of the frames in reference frame store 64. The motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for motion estimation. Adder 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reconstructed video block, which is stored in reference frame store 64. The reconstructed video block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-code a block in a subsequent video frame.
Fig. 3 is a block diagram of an exemplary video decoder 30 that may implement video coding techniques. In the example of fig. 3, video decoder 30 includes an entropy decoding unit 70, a motion compensation unit 72, an intra prediction unit 74, an inverse quantization unit 76, an inverse transform unit 78, a reference frame memory 82, and an adder 80. In some examples, video decoder 30 may perform a decoding pass that is generally reciprocal to the encoding pass described with reference to video encoder 20 (fig. 2). Motion compensation unit 72 may generate prediction data from the motion vectors received from entropy decoding unit 70, and intra-prediction unit 74 may generate prediction data from the intra-prediction mode indicator received from entropy decoding unit 70.
In the decoding process, video decoder 30 receives an encoded video stream from video encoder 20, which represents video blocks of an encoded video slice (slice) and associated syntax elements. Entropy decoding unit 70 in video decoder 30 entropy decodes the code stream to generate quantized coefficients, motion vectors, or intra prediction mode indicators and other syntax elements. Entropy decoding unit 70 forwards the motion vectors and other syntax elements to motion compensation unit 72. Video decoder 30 may receive syntax elements at the video slice level and/or the video block level.
When the video slice is coded as an intra-coded (I) slice, intra-prediction unit 74 may generate prediction data for video blocks of the current video slice according to the indicated intra-prediction mode and data from previously decoded blocks in the current frame or picture. When the video slice is coded as an inter-coded (e.g., B, P, or GPB) slice, motion compensation unit 72 generates prediction blocks for the video blocks of the current video slice according to the motion vectors and other syntax elements received from entropy decoding unit 70. The prediction blocks may be generated from one of the reference pictures in one of the reference picture lists. Video decoder 30 may use a default construction technique to construct reference frame list 0 and list 1 from the reference pictures stored in reference frame memory 82.
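A simplified sketch of such a default list construction is shown below, following the common convention that list 0 orders pictures preceding the current picture in output order first (closest first) and list 1 orders following pictures first. The function name is an assumption, and details such as long-term reference ordering are deliberately simplified.

```python
def default_ref_lists(current_poc, short_term_pocs, long_term_pocs):
    """Build simplified default reference picture lists from POC values in the DPB."""
    before = sorted((p for p in short_term_pocs if p < current_poc), reverse=True)
    after = sorted(p for p in short_term_pocs if p > current_poc)
    list0 = before + after + list(long_term_pocs)  # closest preceding pictures first
    list1 = after + before + list(long_term_pocs)  # closest following pictures first
    return list0, list1

# Current picture POC 8; short-term references at POC 2, 4, 10, 12; long-term at POC 0.
list0, list1 = default_ref_lists(8, [2, 4, 10, 12], [0])
```

Because both lists start with the temporally nearest pictures in opposite directions, small reference indices (which cost the fewest bits) point to the most useful references.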
Motion compensation unit 72 determines prediction information for the video blocks of the current video slice by parsing the motion vectors and other syntax elements and uses the prediction information to generate a prediction block for the current video block being decoded. For example, motion compensation unit 72 uses some syntax elements received to determine a prediction mode (e.g., intra-prediction or inter-prediction) for coding video blocks in a video slice, an inter-prediction slice type (e.g., a B-slice, a P-slice, or a GPB slice), construction information for one or more reference picture lists in a slice, a motion vector for each inter-coded video block in a slice, an inter-prediction state for each inter-coded video block in a slice, and other information for decoding video blocks in a current video slice.
The motion compensation unit 72 may also interpolate according to an interpolation filter. Motion compensation unit 72 may calculate interpolated values for sub-integer pixels of the reference block using interpolation filters used by video encoder 20 during encoding of the video block. In this case, the motion compensation unit 72 may determine interpolation filters used by the video encoder 20 according to the received syntax elements and generate prediction blocks using the interpolation filters.
Data of the texture image corresponding to the depth image may be stored in the reference frame memory 82. Motion compensation unit 72 may also be used to inter-predict a depth block of a depth image.
Image and video compression has evolved rapidly, resulting in a variety of coding standards. These video coding standards include International Telecommunication Union (ITU) Telecommunication Standardization Sector (ITU-T) H.261, ISO/IEC Moving Picture Experts Group (MPEG)-1 part 2, ITU-T H.262 or ISO/IEC MPEG-2 part 2, ITU-T H.263, ISO/IEC MPEG-4 part 2, Advanced Video Coding (AVC) (also known as ITU-T H.264 or ISO/IEC MPEG-4 part 10), and High Efficiency Video Coding (HEVC) (also known as ITU-T H.265 or MPEG-H part 2). AVC includes extensions such as Scalable Video Coding (SVC), Multi-view Video Coding (MVC), Multi-view Video Coding plus Depth (MVC+D), and 3D AVC (3D-AVC). HEVC includes extensions such as Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC).
Versatile Video Coding (VVC) is a new video coding standard being developed by the Joint Video Experts Team (JVET) of ITU-T and ISO/IEC. At the time of writing, the latest Working Draft (WD) of VVC is contained in JVET-K1001-v1. The JVET document JVET-K0325-v3 includes an update to the high-level syntax of VVC.
Various techniques are described herein to address certain problems in the VVC standard under development. However, these techniques may also be applied to other video/media codec specifications.
Video compression techniques perform spatial (intra) prediction and/or temporal (inter) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (slice), such as a video image or a portion of a video image, may be partitioned into video blocks. A video block may also be referred to as a tree block (treeblock), a Coding Tree Block (CTB), a Coding Tree Unit (CTU), a Coding Unit (CU), and/or a coding node. Video blocks in an intra-coded (I) slice of one picture are encoded using spatial prediction with respect to reference samples in neighboring blocks within the same picture. Video blocks in inter-coded (P or B) slices of one picture may use spatial prediction with respect to reference samples in neighboring blocks within the same picture, or use temporal prediction with respect to reference samples in other reference pictures. A picture may be referred to as a frame and a reference picture may be referred to as a reference frame.
Spatial prediction or temporal prediction generates a prediction block for a block to be coded. The residual data represents pixel differences between the original block to be coded and the prediction block. An inter-coded block is coded according to a motion vector pointing to a block of reference samples making up the prediction block and residual data representing the difference between the coded block and the prediction block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, producing residual transform coefficients that may then be quantized. The quantized transform coefficients are initially arranged in a two-dimensional array and may be scanned to produce a one-dimensional vector of transform coefficients. Entropy coding can be used to achieve further compression.
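The relationship between the original block, the prediction block, and the residual data can be illustrated directly. This sketch deliberately ignores the transform and quantization steps, so the round trip is lossless; the sample values are hypothetical.

```python
def residual(orig, pred):
    """Residual block: per-pixel difference between original and prediction."""
    return [[o - p for o, p in zip(row_o, row_p)]
            for row_o, row_p in zip(orig, pred)]

def reconstruct(pred, res):
    """Reconstructed block: prediction plus residual."""
    return [[p + r for p, r in zip(row_p, row_r)]
            for row_p, row_r in zip(pred, res)]

orig = [[52, 55], [61, 59]]   # original block to be coded
pred = [[50, 54], [60, 60]]   # spatial or temporal prediction block
res = residual(orig, pred)    # what actually gets transformed and quantized
```

In a real codec the decoder adds a quantized approximation of `res` back to `pred`, so the reconstruction approximates rather than equals the original.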
In video codec specifications, the identification of pictures is needed for a variety of purposes, including for use as reference pictures in inter prediction, for output of pictures from the Decoded Picture Buffer (DPB), for scaling of motion vectors, for weighted prediction, and the like. In AVC and HEVC, pictures may be identified by a Picture Order Count (POC). In AVC and HEVC, pictures in the DPB may be identified as "used for short-term reference", "used for long-term reference", or "unused for reference". Once a picture is identified as "unused for reference", it can no longer be used for prediction, and it can be deleted from the DPB when it is no longer needed for output.
There are two types of reference pictures in AVC: short-term reference pictures and long-term reference pictures. A reference picture may be identified as "unused for reference" when it is no longer needed for prediction reference. The transition between the three states (short-term reference, long-term reference, unused for reference) is controlled by the decoded reference picture identification process. There are two alternative decoded reference picture identification mechanisms: an implicit sliding window process and an explicit Memory Management Control Operation (MMCO) process. The sliding window process identifies the oldest short-term reference picture as "unused for reference" when the number of reference frames reaches a given maximum number (max_num_ref_frames in the SPS). The short-term reference pictures are thus kept in a first-in-first-out manner, so that the most recently decoded short-term pictures remain in the DPB.
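The sliding window process can be sketched as a FIFO over the short-term reference pictures. This is a simplified illustration: pictures are represented only by their POC values, and the function name is an assumption.

```python
from collections import deque

def slide_window(short_term, new_poc, max_num_ref_frames):
    """FIFO sliding-window marking: when the short-term buffer is full,
    the oldest short-term reference picture is marked 'unused for reference'.
    Returns the list of POCs marked unused by this step."""
    unused = []
    if len(short_term) == max_num_ref_frames:
        unused.append(short_term.popleft())  # oldest picture falls out of the window
    short_term.append(new_poc)               # newly decoded picture becomes a reference
    return unused

refs = deque([0, 1, 2])  # short-term references in the DPB, oldest first
dropped = slide_window(refs, 3, max_num_ref_frames=3)
```

Each newly decoded reference picture pushes the oldest one out once the window is full, which is exactly the implicit behavior the SPS parameter max_num_ref_frames controls.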
The explicit MMCO process may include a plurality of MMCO commands. An MMCO command may identify one or more short-term or long-term reference pictures as "unused for reference", all pictures as "unused for reference", or a current reference picture or an existing short-term reference picture as a long-term reference picture and assign a long-term picture index to the long-term reference picture.
In AVC, the reference picture identification operation and the process of outputting and deleting pictures from the DPB are performed after one picture completes decoding.
HEVC introduces a different reference picture management method, called Reference Picture Set (RPS). The RPS concept differs from MMCO/sliding window in AVC most fundamentally in that a complete set of reference pictures, used by the current picture or any subsequent pictures, is provided for each particular slice. Thus, a complete set of all pictures that must be saved in the DPB for use by the current picture or a subsequent picture is indicated. This is different from the AVC scheme, which only indicates relative changes of the DPB. Using the RPS concept, information of pictures in the earlier order of decoding is not required to maintain the correct state of the reference pictures in the DPB.
HEVC changes the order of picture decoding and DPB operations compared to AVC in order to fully exploit the advantages of RPS and improve error resilience. In the picture identification and buffer operation of AVC, outputting and deleting decoded pictures from the DPB are generally performed after the current picture is completely decoded. In HEVC, the RPS is first decoded from the slice header of the current picture, then picture identification and buffer operations are performed, and finally the current picture is decoded.
Each slice header in HEVC must include parameters for indicating the RPS of the picture that includes the slice. The only exception is that no RPS is indicated for IDR slices; instead, the RPS is inferred to be empty. For I slices that do not belong to an IDR picture, an RPS may be provided even though they belong to an I picture, because there may be pictures that follow the I picture in decoding order and use inter prediction from pictures that precede the I picture in decoding order. The number of pictures in the RPS does not exceed the DPB size limit represented by the syntax element sps_max_dec_pic_buffering in the SPS.
Each picture is associated with a POC value representing the output order. The slice header includes a fixed-length codeword, pic_order_cnt_lsb, representing the least significant bits (LSBs) of the overall POC value, also referred to as the POC LSB. The length of the codeword is indicated in the SPS and may be between 4 and 16 bits. The RPS concept uses POC to identify reference pictures. In addition to the POC value of its own picture, each slice header either directly includes a coded representation of the POC value (or LSB) of each picture in the RPS or references a coded representation carried in the SPS.
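The derivation of a full POC value from the signalled POC LSB can be sketched as follows. This is modeled loosely on the HEVC PicOrderCntVal derivation; the function and variable names are assumptions rather than the normative spec variables.

```python
def derive_poc(poc_lsb, prev_poc_lsb, prev_poc_msb, log2_max_poc_lsb):
    """Derive a full POC from the signalled LSB and the previous picture's POC state."""
    max_lsb = 1 << log2_max_poc_lsb
    if poc_lsb < prev_poc_lsb and prev_poc_lsb - poc_lsb >= max_lsb // 2:
        msb = prev_poc_msb + max_lsb   # LSB wrapped around upward
    elif poc_lsb > prev_poc_lsb and poc_lsb - prev_poc_lsb > max_lsb // 2:
        msb = prev_poc_msb - max_lsb   # LSB wrapped around downward
    else:
        msb = prev_poc_msb             # no wrap: same MSB cycle
    return msb + poc_lsb

# With a 4-bit LSB (values 0..15): after a picture with POC 14 (msb 0, lsb 14),
# a signalled lsb of 1 is interpreted as a wrap up to POC 17, while lsb 13 is POC 13.
```

The window logic simply picks, among the possible MSB cycles, the POC value closest to the previous reference picture's POC, which is why the LSB length chosen in the SPS bounds how far apart consecutive POCs may be.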
The RPS of each picture consists of 5 different reference picture lists, also referred to as the 5 RPS subsets. RefPicSetStCurrBefore includes all short-term reference pictures that may be used in inter prediction of the current picture and that precede the current picture in both decoding order and output order. RefPicSetStCurrAfter includes all short-term reference pictures that may be used in inter prediction of the current picture and that precede the current picture in decoding order but follow it in output order. RefPicSetStFoll includes all short-term reference pictures that may be used in inter prediction of one or more pictures following the current picture in decoding order, but are not used in inter prediction of the current picture. RefPicSetLtCurr includes all long-term reference pictures that may be used in inter prediction of the current picture. RefPicSetLtFoll includes all long-term reference pictures that may be used in inter prediction of one or more pictures following the current picture in decoding order, but are not used in inter prediction of the current picture.
The RPS is indicated using up to 3 loops, iterating over different types of reference pictures: short-term reference pictures with a POC value less than that of the current picture, short-term reference pictures with a POC value greater than that of the current picture, and long-term reference pictures. In addition, a flag (used_by_curr_pic_X_flag) is sent for each reference picture, indicating whether the reference picture is referenced by the current picture (included in RefPicSetStCurrBefore, RefPicSetStCurrAfter, or RefPicSetLtCurr) or not (included in RefPicSetStFoll or RefPicSetLtFoll).
Fig. 4 shows an RPS 400 for a current picture B14 that has entries (e.g., pictures) in all subsets 402 of the RPS 400. In the example of fig. 4, there is exactly one picture in each of the 5 subsets 402 (also referred to as RPS subsets). P8 is the picture in the subset 402 called RefPicSetStCurrBefore because it precedes B14 in output order and is used by B14. P12 is the picture in the subset 402 called RefPicSetStCurrAfter because it follows B14 in output order and is used by B14. P13 is the picture in the subset 402 called RefPicSetStFoll because it is a short-term reference picture not used by B14 (but must be kept in the DPB because it will be used by B15). P4 is the picture in the subset 402 called RefPicSetLtCurr because it is a long-term reference picture used by B14. I0 is the picture in the subset 402 called RefPicSetLtFoll because it is a long-term reference picture not used by the current picture (but must be kept in the DPB because it will be used by B15).
The short-term reference pictures of the RPS 400 may be included directly in the slice header. Alternatively, the slice header may include only syntax elements representing an index into a predefined list of RPSs sent in the active SPS. The short-term reference pictures of the RPS 400 may be indicated using either of two different schemes: inter-RPS, described below, or intra-RPS, described here. When intra-RPS is used, num_negative_pics and num_positive_pics are indicated, representing the lengths of two different reference picture lists. The two lists contain the reference pictures with a negative POC difference and a positive POC difference, respectively, relative to the current picture. Each element in both lists is encoded using a variable-length code representing the POC difference relative to the previous element in the list, minus 1. For the first picture in each list, the indicated value is relative to the POC value of the current picture, minus 1.
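The intra-RPS delta coding just described can be sketched as follows (an illustrative decoder for the minus-1-coded POC differences; the helper name and argument layout are assumptions, not the patent's syntax):

```python
def decode_strp_pocs(current_poc, neg_deltas_minus1, pos_deltas_minus1):
    """Recover the absolute POC values of short-term reference pictures
    from their minus-1-coded deltas, each relative to the previous list
    element (the first element is relative to the current picture)."""
    negatives, positives = [], []
    poc = current_poc
    for d in neg_deltas_minus1:      # pictures with negative POC difference
        poc -= d + 1
        negatives.append(poc)
    poc = current_poc
    for d in pos_deltas_minus1:      # pictures with positive POC difference
        poc += d + 1
        positives.append(poc)
    return negatives, positives

# Current POC 14; negative deltas [0, 3] give POCs 13 and 9,
# positive delta [1] gives POC 16.
print(decode_strp_pocs(14, [0, 3], [1]))  # ([13, 9], [16])
```

The minus-1 convention works because two pictures can never share a POC value, so a difference of zero never needs to be representable.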
When encoding the RPSs in a loop in the sequence parameter set, it is possible for one RPS (e.g., RPS 400) to be encoded with reference to another RPS already encoded in the sequence parameter set. This is called inter-RPS. Because all RPSs of the sequence parameter set are located in the same network abstraction layer (NAL) unit, this method causes no error-resilience problem. The inter-RPS syntax exploits the fact that the RPS of the current picture can be predicted from the RPS of a previously decoded picture, because all reference pictures of the current picture must either be reference pictures of that previous picture or be the previously decoded picture itself. It is only necessary to indicate which of these pictures are reference pictures and are used to predict the current picture. Thus, the syntax comprises: an index pointing to the RPS used as a predictor; a delta_POC to be added to the delta_POC of the predictor to obtain the delta POC of the current RPS; and a set of indicators denoting which pictures are reference pictures and whether these pictures are used only for predicting subsequent pictures. In one embodiment, the delta POC refers to the POC difference between a current reference picture and another (e.g., previous) reference picture.
An encoder wishing to use long-term reference pictures must set the SPS syntax element long_term_ref_pics_present_flag to 1. Long-term reference pictures can then be indicated in the slice header by the fixed-length codeword poc_lsb_lt, which represents the least significant bits of the full POC value of each long-term picture. Each poc_lsb_lt is a copy of the codeword pic_order_cnt_lsb indicated for the particular long-term picture. It is also possible to indicate a set of long-term pictures in the SPS as a list of POC LSB values. The POC LSB of a long-term picture can then be indicated in the slice header as an index into this list.
The syntax element delta_poc_msb_cycle_lt_minus1 may also be indicated, to enable calculation of the full POC distance of a long-term reference picture relative to the current picture. The codeword delta_poc_msb_cycle_lt_minus1 is required to be indicated for each long-term reference picture that has the same POC LSB value as any other reference picture in the RPS.
Regarding reference picture identification in HEVC, there are typically multiple pictures present in the DPB before a picture is decoded. Some of these pictures may be used for prediction and are therefore identified as "used for reference". Other pictures are not available for prediction but are waiting for output and are therefore identified as "unused for reference". After the slice header has been parsed, a picture identification process is performed before the slice data is decoded. Pictures that are present in the DPB and identified as "used for reference" but not included in the RPS are identified as "unused for reference". Pictures that are not present in the DPB but are included in the reference picture set are ignored if used_by_curr_pic_X_flag is equal to 0. However, if used_by_curr_pic_X_flag is equal to 1, the reference picture was intended to be used for prediction of the current picture but is missing. In that case an unintentional picture loss is inferred, and the decoder needs to take appropriate action.
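The identification step described above can be sketched as follows (a minimal illustration; the dictionary-based DPB representation is an assumption, not the patent's data structure):

```python
def update_reference_marking(dpb, rps_pocs):
    """Pictures present in the DPB and identified as 'used for reference'
    but not included in the RPS become 'unused for reference'."""
    for pic in dpb:
        if pic["marking"] == "used for reference" and pic["poc"] not in rps_pocs:
            pic["marking"] = "unused for reference"
    return dpb

dpb = [{"poc": 8, "marking": "used for reference"},
       {"poc": 9, "marking": "used for reference"},
       {"poc": 10, "marking": "unused for reference"}]
update_reference_marking(dpb, {8})      # RPS names only POC 8
print([p["marking"] for p in dpb])
# ['used for reference', 'unused for reference', 'unused for reference']
```

A full decoder would additionally check for RPS entries that are absent from the DPB and, when used_by_curr_pic_X_flag is 1 for such an entry, treat the situation as a picture loss.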
After the current picture has been decoded, it is identified as "used for short-term reference".
Reference picture list construction in HEVC is discussed below. In HEVC, the term "inter prediction" is used to denote a prediction derived from data elements (e.g., sample values or motion vectors) of a reference picture other than the currently decoded picture. As in AVC, one picture can be predicted from multiple reference pictures. Reference pictures used for inter prediction are arranged in one or more reference picture lists. The reference index is used to identify which reference pictures in the list need to be used to generate the prediction signal.
P slices use a single reference picture list (list 0) and B slices use two reference picture lists (list 0 and list 1). Similar to AVC, reference picture list construction in HEVC includes reference picture list initialization and reference picture list modification.
In AVC, the initialization process for list 0 is different for P slices (using decoding order) and B slices (using output order). In HEVC, the output order is used for both cases.
In reference picture list initialization, default list 0 and list 1 (if the slice is a B slice) are created from the following 3 RPS subsets: RefPicSetStCurrBefore, RefPicSetStCurrAfter, and RefPicSetLtCurr. First, short-term pictures that precede (follow) the current picture in output order are inserted into list 0 (list 1) in increasing order of their POC distance to the current picture; then, short-term pictures that follow (precede) the current picture in output order are inserted into list 0 (list 1) in increasing order of their POC distance to the current picture; finally, the long-term pictures are inserted at the end. In terms of the RPS, for list 0, the entries in RefPicSetStCurrBefore are inserted into the initial list first, followed by the entries in RefPicSetStCurrAfter. Thereafter, the entries in RefPicSetLtCurr (if present) are appended.
In HEVC, the above process is repeated (reference pictures already added to the reference picture list are added again) when the number of entries in a list is less than the target number of active reference pictures (indicated in the picture parameter set or slice header). The list is truncated when the number of entries is greater than the target number.
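The initialization, repetition, and truncation steps above can be sketched as follows (POC values stand in for pictures; the function name is illustrative):

```python
def init_ref_pic_list0(st_curr_before, st_curr_after, lt_curr, num_active):
    """Build the initial list 0: RefPicSetStCurrBefore, then
    RefPicSetStCurrAfter, then RefPicSetLtCurr, repeating the pattern
    when the list is too short and truncating it when too long."""
    base = list(st_curr_before) + list(st_curr_after) + list(lt_curr)
    out = []
    while base and len(out) < num_active:
        out.extend(base)          # repeat already-added reference pictures
    return out[:num_active]       # truncate to the target number of entries

# StCurrBefore=[8], StCurrAfter=[12], LtCurr=[4], 5 active entries:
print(init_ref_pic_list0([8], [12], [4], 5))  # [8, 12, 4, 8, 12]
```

List 1 would be built the same way with the two short-term subsets inserted in the opposite order.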
After the reference picture list has been initialized, it may be modified so that the reference pictures of the current picture can be arranged in any order (including the case where one particular reference picture appears in multiple positions in the list), according to the reference picture list modification commands. When the flag indicating the presence of a list modification is set to 1, a fixed number of commands (equal to the target number of entries in the reference picture list) is indicated, and each command inserts one entry into the reference picture list. A reference picture is identified in a command by its index into the list of reference pictures of the current picture derived from the RPS indication. This differs from reference picture list modification in H.264/AVC, where a picture is identified either by the picture number (derived from the syntax element frame_num) or by the long-term reference picture index, and where fewer commands may be needed, e.g., to swap the first two entries of the initial list, or to insert one entry at the start of the initial list and shift the others.
A reference picture list may not include any reference picture with a TemporalId greater than that of the current picture. An HEVC bitstream may consist of several temporal sub-layers. Each NAL unit belongs to a particular sub-layer, indicated by TemporalId (equal to temporal_id_plus1 − 1).
Reference picture management is directly based on reference picture lists. JCT-VC document JCTVC-G643 includes a method to manage reference pictures in a DPB directly using 3 reference picture lists (reference picture list 0, reference picture list 1, and idle reference picture list), avoiding the need for the following indication and decoding process: (1) A sliding window and MMCO process and a reference picture list initialization and modification process in AVC, or (2) a reference picture set and a reference picture list initialization and modification process in HEVC.
However, reference picture management based on a reference picture list (RPL) has problems. For example, some RPL-based reference picture management schemes are not optimized with respect to the indication of syntax elements in the codestream. As a result, such RPL-based methods require more bits for indication than other explicit reference picture management methods such as RPS-based methods. There are several reasons why the indication in RPL-based approaches is inefficient.
For example, some syntax elements in the RPL structure are coded using inefficient entropy coding. For instance, syntax elements representing the delta POC values of short-term reference pictures (STRPs) are coded using left-bit-first signed integer zeroth-order exponential Golomb coding (e.g., se(v)), because a delta POC value may be positive or negative. To encode any non-negative integer x using an exponential Golomb code, the first step is to write down x+1 in binary representation. The written bits are then counted, 1 is subtracted from this count, and that many zero bits are written before the bit string. The first few values of the exponential Golomb code are:
Value 0: codeword 1
Value 1: codeword 010
Value 2: codeword 011
Value 3: codeword 00100
Value 4: codeword 00101
Value 5: codeword 00110
Value 6: codeword 00111
Value 7: codeword 0001000
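The two-step procedure above can be written out directly (a minimal sketch of unsigned zeroth-order exponential Golomb encoding):

```python
def exp_golomb_encode(x: int) -> str:
    """Zeroth-order exponential Golomb code for a non-negative integer x:
    write x+1 in binary, then prefix it with (bit-length - 1) zeros."""
    binary = bin(x + 1)[2:]                    # step 1: x+1 in binary
    return "0" * (len(binary) - 1) + binary    # step 2: leading zero prefix

for x in range(5):
    print(x, exp_golomb_encode(x))
# 0 1 / 1 010 / 2 011 / 3 00100 / 4 00101
```

The zero-run prefix tells the decoder how many bits follow, which is what makes the code uniquely decodable without length markers.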
However, efficiency may be higher if the sign values and absolute values of the delta POC are coded separately.
When the RPL indices (e.g., corresponding to reference picture list 0 and reference picture list 1) refer to predefined RPL structures in a parameter set (e.g., the SPS), the indication of the RPL indices is not very efficient, because these schemes always indicate both indices. In many cases, the predefined RPL structures for RPL0 and RPL1 can be arranged such that, when the RPL of a picture references a predefined RPL structure in the parameter set, the index for RPL0 and the index for RPL1 are the same.
Furthermore, the reference picture list structure is written into the coded video bitstream without inter-RPL coding. For example, an RPL structure may be coded without reference to the RPL structure(s) previously indicated in the codestream. In one embodiment, an RPL structure refers to a programmed construct of an index list that includes pointers to candidate reference pictures.
Several aspects are described herein, which can be used alone and/or in combination to address the problems described herein. These several aspects are described separately below.
In one aspect, when delta POC values are coded for STRPs, the sign values and absolute values are coded separately in the video stream. To indicate whether the delta POC values in each RPL structure have the same sign value (e.g., whether all delta POC values in the same RPL structure are positive or all are negative), a flag is indicated in the same parameter set (e.g., SPS or PPS) as the predefined RPL structures. This flag may be referred to as "all_rpl_entries_same_sign_flag".
When all_rpl_entries_same_sign_flag is equal to 1, a flag is indicated in the RPL structure to indicate the sign value of all entries in the RPL structure (e.g., the sign values of the delta POC values of the STRPs). When all_rpl_entries_same_sign_flag is equal to 0, a flag is indicated for each STRP-related entry in each RPL structure to indicate the sign value of that entry. In one embodiment, the absolute value of the delta POC value of an STRP entry in an RPL structure is coded using unsigned integer zeroth-order exponential Golomb coding (e.g., ue(v)).
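The potential saving from coding signs separately can be illustrated as follows (a simplified bit-count comparison using the ue(v)/se(v) definitions above; the minus-1 coding of the absolute value and the exact flag accounting are illustrative assumptions, not the patent's precise syntax):

```python
def ue_bits(n: int) -> int:
    """Bits used by ue(v) for a non-negative value n: 2*len(bin(n+1)) - 1."""
    return 2 * (n + 1).bit_length() - 1

def se_bits(k: int) -> int:
    """se(v) maps a signed k to ue(2|k| - 1) for k > 0, else ue(2|k|)."""
    return ue_bits(2 * abs(k) - 1 if k > 0 else 2 * abs(k))

# Delta POC values of one RPL structure, all with the same (negative) sign:
deltas = [-1, -2, -4, -8]
se_total = sum(se_bits(d) for d in deltas)
# Separate coding: 1 bit for all_rpl_entries_same_sign_flag,
# 1 shared sign bit, then ue(|d| - 1) per entry.
sep_total = 1 + 1 + sum(ue_bits(abs(d) - 1) for d in deltas)
print(se_total, sep_total)  # 24 18
```

Doubling a magnitude inside se(v) lengthens every codeword, so dropping the interleaved sign bit pays off whenever the signs are predictable.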
Alternatively, instead of indicating one flag all_rpl_entries_same_sign_flag, two flags are indicated, one for each RPL structure list (e.g., the RPL structure list corresponding to RPL0 and the RPL structure list corresponding to RPL1).
For encoding, when predefined RPL structures corresponding to lists RPL0 and RPL1 are created in the parameter set, the RPL structures in the lists corresponding to RPL0 and RPL1 may be arranged such that each RPL structure corresponding to list 0 is paired with an RPL structure corresponding to list 1. Thus, when a picture references predefined RPL structures in the parameter set, the index of RPL0 and the index of RPL1 for that picture are the same.
In connection with the above aspect, a flag may indicate whether syntax elements for referring to a predefined RPL structure for RPL1 are present in the slice header. This flag may be referred to as "rpl1_idx_present_flag". The flag may be indicated in the SPS or the PPS, depending on the intended scope or persistence of the flag. In one embodiment, the flag is preferably indicated in the PPS.
Depending on the value of rpl1_idx_present_flag, the following applies. When rpl1_idx_present_flag is equal to 1, whether RPL1 of a slice associated with the parameter set containing this flag refers to a predefined RPL structure, and if so which index, is not indicated in the slice header but is instead inferred from the corresponding syntax elements for RPL0 in the same slice. That is, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not indicated but are inferred (e.g., copied) from the values of ref_pic_list_sps_flag[0] and ref_pic_list_idx[0], respectively. Otherwise, when rpl1_idx_present_flag is equal to 0, ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are present in slice headers associated with the parameter set containing the flag.
Alternatively, depending on the value of rpl1_idx_present_flag, the following applies. When rpl1_idx_present_flag is equal to 1 and RPL1 of a slice associated with the parameter set containing the flag refers to a predefined RPL structure, the index for RPL1 is not indicated in the slice header. Instead, the index for RPL1 is inferred from the syntax element corresponding to RPL0 in the same slice. That is, ref_pic_list_idx[1] is not indicated but is inferred (e.g., copied) from the value of ref_pic_list_idx[0]. Otherwise, when rpl1_idx_present_flag is equal to 0, ref_pic_list_idx[1] is present in slice headers associated with the parameter set containing the flag.
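The first inference rule above, where both ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are inferred from their RPL0 counterparts when rpl1_idx_present_flag is 1, can be sketched as follows (illustrative names; the slice header is modeled as a dictionary):

```python
def parse_rpl1_refs(rpl1_idx_present_flag: int, slice_hdr: dict):
    """Return (ref_pic_list_sps_flag[1], ref_pic_list_idx[1]) for a slice.
    When the flag is 1, the RPL1 syntax elements are absent from the
    slice header and are copied from the RPL0 values instead."""
    if rpl1_idx_present_flag == 1:
        return (slice_hdr["ref_pic_list_sps_flag"][0],
                slice_hdr["ref_pic_list_idx"][0])
    return (slice_hdr["ref_pic_list_sps_flag"][1],
            slice_hdr["ref_pic_list_idx"][1])

hdr = {"ref_pic_list_sps_flag": [1, 0], "ref_pic_list_idx": [3, 7]}
print(parse_rpl1_refs(1, hdr))  # (1, 3): inferred from the RPL0 elements
print(parse_rpl1_refs(0, hdr))  # (0, 7): read from the slice header
```

In an actual bitstream the RPL1 entries would simply not be present when the flag is 1; the dictionary holds both only so the two branches can be demonstrated.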
A flag may be used to indicate whether the RPL structures in the list corresponding to RPL1 have the same content as the RPL structures in the list corresponding to RPL0. This flag may be referred to as "rpl1_copy_from_rpl0_flag". The flag is indicated in the same parameter set that contains the predefined RPL structures, and it must be positioned before the predefined RPL structures are indicated.
Depending on the value of rpl1_copy_from_rpl0_flag, the following applies. When rpl1_copy_from_rpl0_flag is equal to 1, the number of RPL structures corresponding to list 1 is not indicated but is inferred to be the same as the number of RPL structures corresponding to list 0. The RPL structures corresponding to list 1 are not indicated either. Instead, after the decoder has parsed and decoded the predefined RPL structures corresponding to list 0, for each RPL structure corresponding to list 0 an identical copy is created and assigned to the RPL structure corresponding to list 1 with the same index. Otherwise, when rpl1_copy_from_rpl0_flag is equal to 0, both the number of RPL structures corresponding to list 1 and the RPL structures corresponding to list 1 are indicated.
The content of the RPL structure can be predicted from another RPL structure. When the predefined RPL structures in the parameter set are divided into two lists (e.g., list 0 corresponding RPL structure list and list 1 corresponding RPL structure list), only the RPL structures in the list 0 corresponding list may be used as a reference for inter-frame RPL. The RPL structures in the list corresponding to list 0 can only refer to another RPL structure in the same list with an index smaller than that of the RPL structure, while the RPL structures in the list corresponding to list 1 can refer to any RPL structure in the list corresponding to list 0. The RPL structure explicitly indicated in the slice header may refer to any predefined RPL structure in the list to which list 0 corresponds. Alternatively, the RPL structure in the list corresponding to list 0 or list 1 can only refer to another RPL structure whose index in the list corresponding to list 0 is smaller than that of the RPL structure. The RPL structure explicitly indicated in the slice header may refer to any predefined RPL structure in the list to which list 0 corresponds.
In an alternative, when the predefined RPL structures in the parameter set are divided into two lists (e.g., list 0 corresponding RPL structure list and list 1 corresponding RPL structure list), the RPL structure in the list 0 corresponding list can only reference another RPL structure in the list 0 corresponding list with an index smaller than the index of the RPL structure. Similarly, an RPL structure in the list corresponding to list 1 can only refer to another RPL structure in the list corresponding to list 1 with an index smaller than that of the RPL structure. The RPL structure explicitly indicated in the slice header of RPL0 may refer to any predefined RPL structure in the list to which list 0 corresponds, while the RPL structure explicitly indicated in the slice header of RPL1 may refer to any predefined RPL structure in the list to which list 1 corresponds.
In another alternative, when the predefined RPL structures in the parameter set are not divided into two lists (e.g., the list of RPL structures corresponding to list 0 and the list of RPL structures corresponding to list 1 are indicated in one list), the RPL structure can only reference another RPL structure in the list with an index smaller than the index of the RPL structure. The RPL structure explicitly indicated in the slice header may refer to any predefined RPL structure.
For inter-RPL, the index of the reference RPL structure may be coded as the difference between the current RPL structure index and the reference RPL structure index, minus 1, coded using ue(v). In one alternative, the reference RPL index is coded directly using u(v) coding. The number of bits used to represent the index is set to the base-2 logarithm (log2) of the number of RPL structures available for reference in the list. For example, when only the RPL structures in the list corresponding to list 0 can be used as a reference, the number of bits representing the reference RPL index is the base-2 logarithm of the number of RPL structures in the list corresponding to list 0. In another alternative, the index of the reference RPL structure may be coded using either ue(v) or u(v), depending on the inter-RPL mode.
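The fixed-length u(v) alternative can be sketched as follows (assuming the bit count is the base-2 logarithm rounded up, which a fixed-length index needs in order to cover every possible value; the function name is illustrative):

```python
import math

def ref_rpl_index_bits(num_rpl_structures: int) -> int:
    """Number of bits for a u(v)-coded reference RPL index: the base-2
    logarithm (rounded up, and at least one bit) of the number of RPL
    structures available for reference."""
    return max(1, math.ceil(math.log2(num_rpl_structures)))

print(ref_rpl_index_bits(8))   # 3 bits address structures 0..7
print(ref_rpl_index_bits(10))  # 4 bits are needed to cover 10 structures
```

The trade-off against ue(v) is the usual one: u(v) spends the same bits on every index, while ue(v) favors small differences between the current and reference indices.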
To support inter-RPL, each RPL structure is coded using one of the modes described below. Note that the order of the modes below does not necessarily indicate the order of their values. The indication of the mode may be coded as follows. In one embodiment, the indication of the mode may simply be coded using ue(v). In another embodiment, the indication of the mode may be coded using u(v), where the number of bits representing the mode is the base-2 logarithm (log2) of the total number of defined modes.
The first RPL coding mode is the intra coding mode. In this mode, the content of the RPL structure is indicated without reference to another RPL structure, using mechanisms such as the method described in U.S. provisional application No. 62/719,360, entitled "Reference Picture Management in Video Coding", filed on 8/17/2018. Alternatively, an integer value greater than 0, called granularity_val, may be indicated for each RPL structure. The value of granularity_val is used to scale or divide each value representing a POC delta value of an STRP in the RPL structure.
The second RPL coding mode is an inter coding mode exploiting the situation where the delta POC values of the STRPs in the reference RPL structure and those in the current RPL structure differ by the same, consistent amount. To code an RPL using this mode, the following information is indicated in the codestream.
First, the coding mode and a reference index are indicated in the codestream. The reference index is the index of the reference RPL structure and may be coded as described above. Alternatively, the reference index may be coded as the difference between the current RPL structure index and the reference RPL structure index, minus 1, coded as ue(v). An offset is also indicated in the codestream. The offset is the difference between a delta POC value of an STRP in the reference RPL structure and the corresponding delta POC value in the current RPL structure. The value of the offset may be constrained to be positive only (e.g., if the delta POC value of an STRP in the reference RPL structure is less than that in the current RPL structure, this mode cannot be used to code the current RPL structure), negative only, or allowed to be either positive or negative. If the offset is coded as ue(v), it may be indicated using a minus-1 term. For each entry in the reference RPL structure, a flag is also indicated in the codestream to indicate whether the entry is used as an entry in the current RPL structure. When an entry is an STRP entry in the reference RPL structure and is used as an entry in the current RPL structure, it is also an STRP entry in the current RPL structure, and its value in the current RPL structure is the entry value in the reference RPL structure minus the offset (plus 1 if the minus-1 term is used to code the offset). When an entry is a long-term reference picture (LTRP) entry in the reference RPL structure and is used as an entry in the current RPL structure, it is also an LTRP entry in the current RPL structure, and its value is simply copied from the entry in the reference RPL structure.
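The decoding of this second mode can be sketched as follows (entries are modeled as (kind, value) pairs; this representation and the positive-offset example are assumptions for illustration):

```python
def decode_rpl_mode2(ref_rpl, offset, used_flags):
    """Predict the current RPL structure from a reference RPL structure
    whose STRP delta POC values all differ from the current ones by a
    constant offset.

    ref_rpl: list of ('STRP', delta_poc) or ('LTRP', poc_lsb) entries.
    used_flags[i]: 1 if reference entry i is kept in the current RPL."""
    current = []
    for (kind, value), used in zip(ref_rpl, used_flags):
        if not used:
            continue
        if kind == 'STRP':
            current.append(('STRP', value - offset))  # shift the delta POC
        else:
            current.append(('LTRP', value))           # LTRP copied verbatim
    return current

ref = [('STRP', -4), ('STRP', -2), ('LTRP', 0)]
print(decode_rpl_mode2(ref, 2, [1, 0, 1]))  # [('STRP', -6), ('LTRP', 0)]
```

Only the mode, the reference index, the offset, and one flag per reference entry need to be transmitted, instead of the full list of delta POC values.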
The third RPL coding mode exploits the fact that the entries (e.g., STRP and LTRP entries) in the current RPL structure may be a superset of the entries in the reference RPL structure. That is, the first X entries of the current RPL structure (where X is the number of entries in the reference RPL structure) are identical to the entries of the reference RPL structure, followed by zero or more additional entries. To code an RPL using this mode, the following information is indicated in the codestream.
First, the coding mode and a reference index are indicated in the codestream. The reference index is the index of the reference RPL structure and may be coded as described above. The number of additional entries is also indicated in the codestream; it is the difference between the number of entries in the current RPL structure and the number of entries in the reference RPL structure. When an entry is an STRP entry in the reference RPL structure, it is also an STRP entry in the current RPL structure, and its value is simply copied from the entry in the reference RPL structure. When an entry is an LTRP entry in the reference RPL structure, it is also an LTRP entry in the current RPL structure and is likewise simply copied. After all entries in the reference RPL structure have been copied into the current RPL structure, the following information is indicated for each additional entry. If long-term reference pictures are used in the codestream (i.e., indicated by a flag in the same parameter set), a flag is indicated to indicate whether the additional entry is an LTRP entry or an STRP entry. If the entry is an LTRP entry, the POC LSB of the LTRP entry is indicated. Otherwise, the delta POC of the STRP entry is indicated. The delta POC value may be indicated relative to the previous STRP entry or simply relative to the POC of the current picture.
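The third, superset-based mode can be sketched in the same representation (illustrative; the additional entries are shown as already-decoded (kind, value) pairs):

```python
def decode_rpl_mode3(ref_rpl, extra_entries):
    """Current RPL structure = all entries of the reference RPL structure,
    copied in order, followed by the explicitly indicated additional
    entries (each an ('STRP', delta_poc) or ('LTRP', poc_lsb) pair)."""
    return list(ref_rpl) + list(extra_entries)

ref = [('STRP', -2), ('LTRP', 5)]
extra = [('STRP', -6)]                 # one additional short-term entry
print(decode_rpl_mode3(ref, extra))
# [('STRP', -2), ('LTRP', 5), ('STRP', -6)]
```

Only the count of additional entries and the additional entries themselves are transmitted; the shared prefix costs nothing beyond the reference index.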
The fourth RPL coding mode exploits the fact that the entries (e.g., STRP and LTRP entries) in the current RPL structure may be identical to the entries in the reference RPL structure, either directly or after the sign values are flipped. To code an RPL using this mode, the following information is indicated in the codestream.
First, the coding mode and a reference index are indicated. The reference index is the index of the reference RPL structure and may be coded as described above. Optionally, a flag indicating whether the sign values are flipped is also indicated in the codestream.
When the codestream is coded using both forward inter prediction (i.e., inter prediction referencing pictures whose POC values are less than the POC value of the current picture) and backward inter prediction (i.e., inter prediction referencing pictures whose POC values are greater than the POC value of the current picture), the following constraints apply to the reference pictures in an RPL. For each RPL of a picture, all reference pictures in the RPL have the same inter prediction direction; e.g., all are reference pictures for forward inter prediction, or all are reference pictures for backward inter prediction. For a pair of RPLs of a picture, no reference picture may be included in both RPL0 and RPL1, unless RPL0 and RPL1 both include the same reference pictures in the same order. In that case, all reference pictures except the first one in RPL1 (i.e., the reference picture with the smallest index) are deleted from RPL1. Alternatively, all reference pictures except the first one in RPL0 (i.e., the reference picture with the smallest index) are deleted from RPL0.
Video coding techniques are disclosed that use a flag to indicate whether the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure. That is, when the flag has a first value, the index of the second reference picture list structure is not present in the slice header of the coded video bitstream and is inferred to be the same as the index of the first reference picture list structure. On the other hand, when the flag has a second value, the index of the second reference picture list structure exists in the slice header. Using the flag in this manner, an encoder/decoder (also referred to as a "codec") in video coding is improved over current codecs (e.g., using fewer bits, requiring less bandwidth, being more efficient, etc.). In effect, the improved video coding process provides a better user experience when sending, receiving, and/or viewing video.
Fig. 5 is a schematic diagram of an embodiment of a video bitstream 500. Video bitstream 500, as used herein, may also be referred to as a coded video bitstream, a bitstream, or a variant thereof. As shown in fig. 5, the bitstream 500 includes a Sequence Parameter Set (SPS) 510, a Picture Parameter Set (PPS) 512, a slice header 514, and picture data 520.
The SPS 510 includes data common to all pictures in a sequence of pictures (SOP). In contrast, the PPS 512 includes data common to an entire picture. The slice header 514 includes information of the current slice, such as the slice type, the reference pictures to be used, and the like. SPS 510 and PPS 512 may be collectively referred to as parameter sets. SPS 510, PPS 512, and slice header 514 are types of network abstraction layer (NAL) units. Image data 520 includes data related to the image or video being encoded or decoded. The image data 520 may simply be referred to as the payload or data carried in the codestream 500.
In one embodiment, the SPS 510, PPS 512, slice header 514, or other portion of the codestream 500 carries a plurality of reference picture list structures, each of which includes a plurality of reference picture entries. Those skilled in the art will appreciate that the codestream 500 may include other parameters and information in practical applications.
Fig. 6 is one embodiment of a method 600, implemented by a video decoder such as video decoder 30, of decoding a coded video bitstream such as bitstream 500. Method 600 may be performed after a codestream is received, directly or indirectly, from a video encoder such as video encoder 20. The method 600 improves the decoding process (e.g., makes the decoding process more efficient, faster, etc., than conventional decoding processes) because, when the flag is set to a certain value, the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure. That is, unlike in HEVC and AVC, the second reference picture list structure need not be indicated in the coded bitstream in every scenario. Thus, the performance of the codec is actually improved and the user experience is better.
In step 602, a flag is parsed from a coded video bitstream (e.g., bitstream 500). In one embodiment, the flag is denoted rpl1_idx_present_flag. In one embodiment, the flag is included in a PPS (e.g., PPS 512) of the coded video bitstream. In one embodiment, the flag is included in an SPS (e.g., SPS 510) of the coded video bitstream.
In one embodiment, the first value of the flag is one (1). In one embodiment, when the first value of the flag is one (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header. In one embodiment, the second value of the flag is zero (0). In one embodiment, when the second value of the flag is zero (0), ref_pic_list_sps_flag[0] and ref_pic_list_idx[0] are included in the slice header.
In step 604, a first reference picture list structure is parsed from the coded video stream. In one embodiment, the first reference picture list structure is included in a slice header (e.g., slice header 514) of the coded video stream. In one embodiment, the flag and the first reference picture list structure are parsed separately. That is, the flag is parsed first, and then the first reference picture list structure is parsed, or vice versa.
In step 606, when the flag has a first value, it is determined that an index of a second reference picture list structure is not present in a slice header of the coded video bitstream and it is inferred that the index of the second reference picture list structure is the same as the index of the first reference picture list structure. In step 608, it is determined that the index of the second reference picture list structure is present in the slice header when the flag has the second value.
In step 610, a reference picture list is generated using the first reference picture list structure, the second reference picture list structure, or some combination thereof. The reference picture list may identify one or more pictures, such as the pictures shown and described in connection with FIG. 4.
In step 612, inter prediction is performed according to the reference picture list to generate a reconstructed block. In one embodiment, the reconstruction block may be used to generate or produce an image for display to a user on a display or screen of an electronic device (e.g., a smartphone, tablet, laptop, personal computer, etc.).
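The inference in steps 606 and 608 can be illustrated with a short sketch. The helper below is hypothetical (it is not the VVC reference decoder, and the read_flag/read_idx callables merely stand in for bitstream parsing); it follows the slice header semantics given later in this disclosure, in which rpl1_idx_present_flag equal to 0 means that the list-1 syntax elements are absent and are inferred from list 0:

```python
def parse_rpl_indices(read_flag, read_idx, rpl1_idx_present_flag):
    """Return (ref_pic_list_sps_flag, ref_pic_list_idx) for lists 0 and 1.

    read_flag and read_idx stand in for bitstream parsing functions.
    """
    sps_flag = [0, 0]
    idx = [0, 0]
    for i in (0, 1):
        if i == 0 or rpl1_idx_present_flag:
            sps_flag[i] = read_flag()
            idx[i] = read_idx() if sps_flag[i] else 0
        else:
            # List-1 syntax elements are absent: infer them from list 0.
            sps_flag[i] = sps_flag[0]
            idx[i] = idx[0]
    return sps_flag, idx
```

When the flag indicates absence, no bits for the list-1 index are consumed, which is where the savings over always signalling both indices come from.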
Fig. 7 is an embodiment of a method 700, implemented by a video encoder (e.g., video encoder 20), for encoding a video bitstream (e.g., bitstream 500). Method 700 may be performed when a picture (e.g., in a video) is to be encoded into a video bitstream and then transmitted to a video decoder (e.g., video decoder 30). The method 700 improves the encoding process (e.g., makes the encoding process more efficient, faster, etc., relative to conventional encoding processes) because, when the flag is set to a certain value, the index of the second reference picture list structure can be inferred to be the same as the index of the first reference picture list structure. That is, the second reference picture list structure need not be indicated in the coded bitstream in every scenario, as it is in HEVC and AVC. Thus, the performance of the codec is improved in practice and the user experience is better.
In step 702, when the index of the second reference picture list structure is not encoded in the slice header of the video bitstream and the video decoder is to infer that the index of the second reference picture list structure is the same as the index of the first reference picture list structure, a flag having a first value is encoded into the video bitstream. In one embodiment, the flag is denoted rpl1_idx_present_flag. In one embodiment, the flag is encoded in a PPS (e.g., PPS 512) of the coded video bitstream (e.g., bitstream 500). In one embodiment, the first reference picture list structure is included in the slice header (e.g., slice header 514).
In step 704, when the index of the second reference picture list structure is encoded in a slice header of the video bitstream, a flag having a second value is encoded into the video bitstream. In one embodiment, the first reference picture list structure and the second reference picture list structure are encoded in a slice header of the coded video bitstream.
In step 706, when the flag is encoded with the first value as its value, the first reference picture list structure is encoded in the video bitstream. In step 708, when the flag is encoded with the second value as its value, the first reference picture list structure and the second reference picture list structure are encoded in the video bitstream. In one embodiment, the first value of the flag is one (1) and the second value of the flag is zero (0). In one embodiment, when the first value of the flag is one (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
In step 710, the video bitstream is transmitted to the video decoder (e.g., video decoder 30). The video decoder, upon receiving the encoded video bitstream, may decode (e.g., as described above) to generate or produce an image for display to a user on a display or screen of an electronic device (e.g., a smartphone, tablet, laptop, personal computer, etc.).
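For illustration, the encoder-side decision in steps 702 through 708 can be sketched as follows. The function name and return convention are hypothetical; it simply chooses the flag value according to whether the list-1 index coincides with the list-0 index (here the value indicating absence follows the rpl1_idx_present_flag semantics given later in this disclosure):

```python
def choose_rpl1_flag(rpl0_idx, rpl1_idx):
    """Return (flag, indices_to_code): code the list-1 index only if needed."""
    if rpl1_idx == rpl0_idx:
        # The decoder can infer the list-1 index, so omit it.
        return 0, [rpl0_idx]
    return 1, [rpl0_idx, rpl1_idx]
```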
The description of the technology disclosed herein is provided with respect to the latest VVC WD. Additionally, definitions, syntax, and semantics suitable for implementing the techniques disclosed herein are also provided.
First, several definitions are provided. An Intra Random Access Point (IRAP) picture is a coded picture for which the nal_unit_type of each Video Coding Layer (VCL) NAL unit is equal to IRAP_NUT. A long-term reference picture (LTRP) is a picture identified as "used for long-term reference". A non-IRAP picture is a coded picture for which the nal_unit_type of each VCL NAL unit is equal to NON_IRAP_NUT. A reference picture list is a list of reference pictures used for inter prediction of a P slice or a B slice. Two reference picture lists, reference picture list 0 and reference picture list 1, are generated for each slice of a non-IRAP picture. The set of unique pictures referenced by all entries in the two reference picture lists associated with a picture consists of all reference pictures that may be used for inter prediction of the associated picture or of any picture following the associated picture in decoding order. When decoding the slice data of a P slice, only reference picture list 0 is used for inter prediction. When decoding the slice data of a B slice, both reference picture lists are used for inter prediction. When decoding the slice data of an I slice, no reference picture list is used for inter prediction. A short-term reference picture (STRP) is a picture identified as "used for short-term reference".
Some abbreviations are provided below. LTRP as used herein denotes a long-term reference picture and STRP denotes a short-term reference picture.
The following portions of the invention provide syntax and semantics suitable for implementing the techniques disclosed herein.
NAL unit header syntax
[Syntax table reproduced as an image in the original publication.]
Sequence parameter set RBSP syntax
[Syntax table reproduced as an image in the original publication.]
Picture parameter set RBSP syntax
[Syntax table reproduced as an image in the original publication.]
Slice header syntax
[Syntax table reproduced as an image in the original publication.]
Reference picture list structure syntax
[Syntax table reproduced as an image in the original publication.]
NAL unit header semantics
forbidden_zero_bit is equal to 0. nal_unit_type represents the type of the RBSP data structure included in the NAL unit.
TABLE 4-1: NAL unit type code and NAL unit type category
[Table 4-1 reproduced as an image in the original publication.]
nuh_temporal_id_plus1 minus 1 represents the temporal identifier of the NAL unit. The value of nuh_temporal_id_plus1 is not equal to 0. The variable TemporalId is derived as follows: TemporalId = nuh_temporal_id_plus1 - 1. When nal_unit_type is equal to IRAP_NUT, the coded slice belongs to an IRAP picture, and TemporalId is equal to 0. All VCL NAL units of an access unit have the same TemporalId value. The TemporalId value of a coded picture or an access unit is the TemporalId value of the VCL NAL units of that coded picture or access unit. The TemporalId values of non-VCL NAL units are constrained as follows: If nal_unit_type is equal to SPS_NUT, TemporalId is equal to 0, and the TemporalId of the access unit including the NAL unit is equal to 0. Otherwise, if nal_unit_type is equal to EOS_NUT or EOB_NUT, TemporalId is equal to 0. Otherwise, TemporalId is greater than or equal to the TemporalId of the access unit including the NAL unit. When the NAL unit is a non-VCL NAL unit, the TemporalId value is equal to the minimum of the TemporalId values of all access units that include the non-VCL NAL unit. When nal_unit_type is equal to PPS_NUT, TemporalId may be greater than or equal to the TemporalId of the access unit including the NAL unit, since all Picture Parameter Sets (PPSs) may be included at the beginning of the bitstream, where the TemporalId of the first coded picture is equal to 0. When nal_unit_type is equal to PREFIX_SEI_NUT or SUFFIX_SEI_NUT, TemporalId may be greater than or equal to the TemporalId of the access unit including the NAL unit, since an SEI NAL unit may include information that applies to a bitstream subset including access units whose TemporalId values are greater than the TemporalId of the access unit including the SEI NAL unit. nuh_reserved_zero_7bits is equal to '0000000'. Other values of nuh_reserved_zero_7bits may be specified in the future by ITU-T or ISO/IEC.
The decoder ignores (i.e., deletes from the bitstream and discards) NAL units whose nuh_reserved_zero_7bits value is not equal to '0000000'.
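The TemporalId derivation and the IRAP constraint above can be expressed compactly. This is an illustrative sketch, not normative text; the function name and the boolean is_irap parameter are invented for the example:

```python
def temporal_id(nuh_temporal_id_plus1, is_irap=False):
    """Derive TemporalId = nuh_temporal_id_plus1 - 1, enforcing the constraints."""
    if nuh_temporal_id_plus1 == 0:
        raise ValueError("nuh_temporal_id_plus1 must not be 0")
    tid = nuh_temporal_id_plus1 - 1
    if is_irap and tid != 0:
        # Coded slices of an IRAP picture must have TemporalId equal to 0.
        raise ValueError("IRAP picture requires TemporalId == 0")
    return tid
```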
Sequence parameter set RBSP semantics
log2_max_pic_order_cnt_lsb_minus4 represents the value of the variable MaxPicOrderCntLsb used in the decoding process for picture order numbering, as follows: MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4). The value of log2_max_pic_order_cnt_lsb_minus4 ranges from 0 to 12, inclusive. sps_max_dec_pic_buffering_minus1 plus 1 represents the maximum required size of the decoded picture buffer for the coded video sequence (CVS), in units of picture storage buffers. The value of sps_max_dec_pic_buffering_minus1 ranges from 0 to MaxDpbSize - 1, inclusive, where MaxDpbSize is specified elsewhere. long_term_ref_pics_flag equal to 0 indicates that no LTRP is used for inter prediction of any coded picture in the CVS. long_term_ref_pics_flag equal to 1 indicates that LTRPs may be used for inter prediction of one or more coded pictures in the CVS. additional_lt_poc_lsb represents the value of the variable MaxLtPicOrderCntLsb used in the decoding process for reference picture lists, as follows: MaxLtPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4 + additional_lt_poc_lsb). The value of additional_lt_poc_lsb ranges from 0 to 32 - log2_max_pic_order_cnt_lsb_minus4 - 4, inclusive. When additional_lt_poc_lsb is not present, its value is inferred to be equal to 0.
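The two derived variables above can be computed as below (an illustrative sketch; the function names are invented, but the formulas follow the semantics just given):

```python
def max_pic_order_cnt_lsb(log2_max_pic_order_cnt_lsb_minus4):
    # MaxPicOrderCntLsb = 2^(log2_max_pic_order_cnt_lsb_minus4 + 4)
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4)

def max_lt_pic_order_cnt_lsb(log2_max_pic_order_cnt_lsb_minus4,
                             additional_lt_poc_lsb=0):
    # MaxLtPicOrderCntLsb = 2^(log2_... + 4 + additional_lt_poc_lsb)
    return 1 << (log2_max_pic_order_cnt_lsb_minus4 + 4 + additional_lt_poc_lsb)
```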
all_rpl_entries_same_sign_flag equal to 1 indicates that all STRP entries in each ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) have the same sign value, where the sign is either positive or negative. all_rpl_entries_same_sign_flag equal to 0 indicates that the STRP entries in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) may or may not have the same sign value. rpl1_copy_from_rpl0_flag equal to 1 indicates that num_ref_pic_lists_in_sps[1] and ref_pic_list_struct(1, rplsIdx, ltrpFlag) are not present, and the following applies: the value of num_ref_pic_lists_in_sps[1] is set to the value of num_ref_pic_lists_in_sps[0], and the inferred syntax structure ref_pic_list_struct(1, rplsIdx, ltrpFlag) has the same values as ref_pic_list_struct(0, rplsIdx, ltrpFlag). Accordingly, the syntax elements in ref_pic_list_struct(1, rplsIdx, ltrpFlag) are inferred to be equal to the respective syntax elements in ref_pic_list_struct(0, rplsIdx, ltrpFlag).
num_ref_pic_lists_in_sps[i] represents the number of syntax structures ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) with listIdx equal to i included in the SPS. The value of num_ref_pic_lists_in_sps[i] ranges from 0 to 64, inclusive. For each value of listIdx (equal to 0 or 1), the decoder needs to allocate memory for num_ref_pic_lists_in_sps[i] + 1 syntax structures ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag), since one syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) may be indicated directly in the slice header of the current picture.
Picture parameter set RBSP semantics
When i is equal to 0, num_ref_idx_default_active_minus1[i] plus 1 represents the inferred value of the variable NumRefIdxActive[0] for a P slice or a B slice with num_ref_idx_active_override_flag equal to 0; when i is equal to 1, num_ref_idx_default_active_minus1[i] plus 1 represents the inferred value of NumRefIdxActive[1] for a B slice with num_ref_idx_active_override_flag equal to 0. The value of num_ref_idx_default_active_minus1[i] ranges from 0 to 14, inclusive. rpl1_idx_present_flag equal to 0 indicates that ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not present in the slice header. rpl1_idx_present_flag equal to 1 indicates that ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] may be present in the slice header.
Slice header semantics
When the syntax elements slice_pic_parameter_set_id and slice_pic_order_cnt_lsb are present, their values are the same in all slice headers of a coded picture. slice_type indicates the coding type of the slice, as shown in Table 7-3.
Table 7-3: Association between slice_type and the name of slice_type
slice_type	Name of slice_type
0	B (B slice)
1	P (P slice)
2	I (I slice)
When nal_unit_type is equal to IRAP_NUT, i.e., when the picture is an IRAP picture, slice_type is equal to 2. slice_pic_order_cnt_lsb represents the value of the picture order number of the current picture modulo MaxPicOrderCntLsb. The length of the slice_pic_order_cnt_lsb syntax element is (log2_max_pic_order_cnt_lsb_minus4 + 4) bits. The value of slice_pic_order_cnt_lsb ranges from 0 to MaxPicOrderCntLsb - 1, inclusive. When slice_pic_order_cnt_lsb is not present, it is inferred to be equal to 0. ref_pic_list_sps_flag[i] equal to 1 indicates that reference picture list i of the current picture is derived from one of the syntax structures ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) with listIdx equal to i in the active SPS. ref_pic_list_sps_flag[i] equal to 0 indicates that reference picture list i of the current picture is derived from the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) with listIdx equal to i that is directly included in the slice header of the current picture. When num_ref_pic_lists_in_sps[i] is equal to 0, the value of ref_pic_list_sps_flag[i] is equal to 0. When rpl1_idx_present_flag is equal to 0 and ref_pic_list_sps_flag[0] is present, the value of ref_pic_list_sps_flag[1] is inferred to be equal to the value of ref_pic_list_sps_flag[0]. ref_pic_list_idx[i] represents the index, into the list of syntax structures ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) with listIdx equal to i included in the active SPS, of the syntax structure used to derive reference picture list i of the current picture. The syntax element ref_pic_list_idx[i] is represented by Ceil(Log2(num_ref_pic_lists_in_sps[i])) bits.
When ref_pic_list_idx[i] is not present, its value is inferred to be equal to 0. The value of ref_pic_list_idx[i] ranges from 0 to num_ref_pic_lists_in_sps[i] - 1, inclusive. When rpl1_idx_present_flag is equal to 0 and ref_pic_list_sps_flag[0] is present, the value of ref_pic_list_idx[1] is inferred to be equal to the value of ref_pic_list_idx[0]. num_ref_idx_active_override_flag equal to 1 indicates that the syntax element num_ref_idx_active_minus1[0] is present for P and B slices and that the syntax element num_ref_idx_active_minus1[1] is present for B slices. num_ref_idx_active_override_flag equal to 0 indicates that the syntax elements num_ref_idx_active_minus1[0] and num_ref_idx_active_minus1[1] are not present. When num_ref_idx_active_minus1[i] is present, it specifies the value of the variable NumRefIdxActive[i] as follows: NumRefIdxActive[i] = num_ref_idx_active_minus1[i] + 1. The value of num_ref_idx_active_minus1[i] ranges from 0 to 14, inclusive.
The value of NumRefIdxActive[i] - 1 represents the maximum reference index into reference picture list i that may be used to decode the slice. When the value of NumRefIdxActive[i] is equal to 0, no reference index into reference picture list i may be used to decode the slice. For i equal to 0 or 1, if the current slice is a B slice and num_ref_idx_active_override_flag is equal to 0, NumRefIdxActive[i] is inferred to be equal to num_ref_idx_default_active_minus1[i] + 1. If the current slice is a P slice and num_ref_idx_active_override_flag is equal to 0, NumRefIdxActive[0] is inferred to be equal to num_ref_idx_default_active_minus1[0] + 1. If the current slice is a P slice, NumRefIdxActive[1] is inferred to be equal to 0. If the current slice is an I slice, both NumRefIdxActive[0] and NumRefIdxActive[1] are inferred to be equal to 0.
Alternatively, for i equal to 0 or 1, let rplsIdx1 be set to ref_pic_list_sps_flag[i] ? ref_pic_list_idx[i] : num_ref_pic_lists_in_sps[i], and let numRpEntries[i] be set to num_strp_entries[i][rplsIdx1] + num_ltrp_entries[i][rplsIdx1]. When NumRefIdxActive[i] is greater than numRpEntries[i], the value of NumRefIdxActive[i] is set to numRpEntries[i].
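The NumRefIdxActive inference rules above amount to the following sketch. The helper is hypothetical (not the normative pseudocode) and uses string slice types purely for illustration:

```python
def num_ref_idx_active(slice_type, override_flag,
                       num_ref_idx_active_minus1,
                       num_ref_idx_default_active_minus1):
    """Infer NumRefIdxActive[0..1] for an "I", "P", or "B" slice."""
    active = [0, 0]
    if slice_type == "I":
        return active                                  # no list is used
    used_lists = (0, 1) if slice_type == "B" else (0,)  # P uses list 0 only
    for i in used_lists:
        minus1 = (num_ref_idx_active_minus1[i] if override_flag
                  else num_ref_idx_default_active_minus1[i])
        active[i] = minus1 + 1
    return active
```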
Reference picture list structure semantics
The syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) may be present in an SPS or in a slice header. The meaning of this syntax structure depends on whether it is included in a slice header or in an SPS: If the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) is present in a slice header, it specifies reference picture list listIdx of the current picture (the picture containing the slice). Otherwise (it is present in an SPS), it specifies a candidate for reference picture list listIdx, and the term "current picture" in the semantics specified in the remainder of this section refers to: (1) each picture that includes one or more slices containing ref_pic_list_idx[listIdx] equal to an index into the list of syntax structures ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) included in the SPS, and (2) each picture in a CVS whose active SPS is that SPS. rpl_mode[listIdx][rplsIdx] specifies the coding mode of the syntax elements in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). num_strp_entries[listIdx][rplsIdx] specifies the number of STRP entries in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). num_ltrp_entries[listIdx][rplsIdx] specifies the number of LTRP entries in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). When num_ltrp_entries[listIdx][rplsIdx] is not present, its value is inferred to be equal to 0. The variable NumEntriesInList[listIdx][rplsIdx] is derived as follows: NumEntriesInList[listIdx][rplsIdx] = num_strp_entries[listIdx][rplsIdx] + num_ltrp_entries[listIdx][rplsIdx].
The value of NumEntriesInList[listIdx][rplsIdx] ranges from 0 to sps_max_dec_pic_buffering_minus1, inclusive. strp_entries_sign_flag[listIdx][rplsIdx] equal to 1 indicates that the values of all STRP entries in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) are greater than or equal to 0. strp_entries_sign_flag[listIdx][rplsIdx] equal to 0 indicates that the values of all STRP entries in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) are less than 0.
lt_ref_pic_flag[listIdx][rplsIdx][i] equal to 1 indicates that the i-th entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) is an LTRP entry. lt_ref_pic_flag[listIdx][rplsIdx][i] equal to 0 indicates that the i-th entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) is an STRP entry. When lt_ref_pic_flag[listIdx][rplsIdx][i] is not present, its value is inferred to be equal to 0. It is a requirement of bitstream conformance that the sum of lt_ref_pic_flag[listIdx][rplsIdx][i], for all values of i in the range 0 to NumEntriesInList[listIdx][rplsIdx] - 1, inclusive, is equal to num_ltrp_entries[listIdx][rplsIdx]. strp_entry_sign_flag[listIdx][rplsIdx][i] equal to 1 indicates that the value of the i-th entry in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) is greater than or equal to 0. strp_entry_sign_flag[listIdx][rplsIdx][i] equal to 0 indicates that the value of the i-th entry in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) is less than 0. When strp_entry_sign_flag[listIdx][rplsIdx][i] is not present, its value is inferred to be equal to the value of strp_entries_sign_flag[listIdx][rplsIdx].
When the i-th entry is the first STRP entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag), delta_poc_st[listIdx][rplsIdx][i] represents the difference between the picture order number values of the current picture and the picture referred to by the i-th entry; when the i-th entry is an STRP entry, but not the first STRP entry, in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag), delta_poc_st[listIdx][rplsIdx][i] represents the difference between the picture order number values of the picture referred to by the i-th entry and the picture referred to by the previous STRP entry in that syntax structure. The value of delta_poc_st[listIdx][rplsIdx][i] ranges from -2^15 to 2^15 - 1, inclusive. poc_lsb_lt[listIdx][rplsIdx][i] represents the value of the picture order number of the picture referred to by the i-th entry in the syntax structure ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) modulo MaxLtPicOrderCntLsb. The length of the poc_lsb_lt[listIdx][rplsIdx][i] syntax element is Log2(MaxLtPicOrderCntLsb) bits. The array DeltaPocSt[listIdx][rplsIdx] is derived as follows:
[Derivation pseudocode reproduced as an image in the original publication.]
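Since the derivation itself appears only as an image in the publication, one plausible reading of the surrounding semantics (the signalled delta_poc_st magnitude combined with the per-entry sign flag) is sketched below. This is an assumption for illustration, not the normative derivation:

```python
def derive_delta_poc_st(delta_poc_st, strp_entry_sign_flag, lt_ref_pic_flag):
    """Assumed sketch: apply the per-entry sign flag (1 -> non-negative,
    0 -> negative) to each STRP magnitude; LTRP entries carry no DeltaPocSt."""
    delta = []
    for i, is_ltrp in enumerate(lt_ref_pic_flag):
        if is_ltrp:
            delta.append(None)            # LTRP entry: no DeltaPocSt value
        else:
            mag = delta_poc_st[i]
            delta.append(mag if strp_entry_sign_flag[i] else -mag)
    return delta
```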
mode1_ref_rpl_idx_delta_minus1[listIdx][rplsIdx] plus 1 represents the difference between the value of rplsIdx and the index of the reference ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). When rpl_mode[listIdx][rplsIdx] is equal to 1, the variable RefRplIdx is derived as follows:
RefRplIdx=rplsIdx–(mode1_ref_rpl_idx_delta_minus1[listIdx][rplsIdx]+1)
strp_offset_val_minus1[listIdx][rplsIdx] plus 1 represents the value that is subtracted from each STRP entry in the reference ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag) to calculate the value of delta_poc_st[listIdx][rplsIdx][i] of the current ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). ref_entry_used_flag[listIdx][rplsIdx][i] equal to 1 indicates that the i-th entry in ref_pic_list_struct(0, RefRplIdx, ltrpFlag) is used as an entry in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). ref_entry_used_flag[listIdx][rplsIdx][i] equal to 0 indicates that the i-th entry in ref_pic_list_struct(0, RefRplIdx, ltrpFlag) is not used as an entry in ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag).
When rpl_mode[listIdx][rplsIdx] is equal to 1, the following applies for inferring the value of the syntax element lt_ref_pic_flag[listIdx][rplsIdx][i] and, when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 1, for inferring the value of poc_lsb_lt[listIdx][rplsIdx][i], as well as for deriving the variable DeltaPocSt[listIdx][rplsIdx][i] (when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 0) and the variable NumEntriesInList[listIdx][rplsIdx]:
[Derivation pseudocode reproduced as an image in the original publication.]
mode2_ref_rpl_idx[listIdx][rplsIdx] represents the index of the reference ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). The syntax element mode2_ref_rpl_idx[listIdx][rplsIdx] is represented by Ceil(Log2(num_ref_pic_lists_in_sps[0])) bits. When rpl_mode[listIdx][rplsIdx] is equal to 2, the variable RefRplIdx is derived as follows:
RefRplIdx=mode2_ref_rpl_idx[listIdx][rplsIdx]
num_additional_entries[listIdx][rplsIdx] represents the delta between NumEntriesInList[listIdx][rplsIdx] and NumEntriesInList[0][RefRplIdx]. add_lt_ref_pic_flag[listIdx][rplsIdx][i] is used to infer the value of lt_ref_pic_flag[listIdx][rplsIdx][NumEntriesInList[0][RefRplIdx] + i]. When add_lt_ref_pic_flag[listIdx][rplsIdx][i] is not present, its value is inferred to be equal to 0. add_strp_entry_sign_flag[listIdx][rplsIdx][i] is used to infer the value of strp_entry_sign_flag[listIdx][rplsIdx][NumEntriesInList[0][RefRplIdx] + i]. When add_strp_entry_sign_flag[listIdx][rplsIdx][i] is not present, its value is set to the value of strp_entries_sign_flag[listIdx][rplsIdx].
add_delta_poc_st[listIdx][rplsIdx][i] is used to infer the value of delta_poc_st[listIdx][rplsIdx][NumEntriesInList[0][RefRplIdx] + i]. The value of add_delta_poc_st[listIdx][rplsIdx][i] ranges from -2^15 to 2^15 - 1, inclusive. add_poc_lsb_lt[listIdx][rplsIdx][i] is used to infer the value of poc_lsb_lt[listIdx][rplsIdx][NumEntriesInList[0][RefRplIdx] + i]. The length of the add_poc_lsb_lt[listIdx][rplsIdx][i] syntax element is Log2(MaxLtPicOrderCntLsb) bits.
When rpl_mode[listIdx][rplsIdx] is equal to 2, the following applies for inferring the values of the syntax elements strp_entries_sign_flag[listIdx][rplsIdx] and lt_ref_pic_flag[listIdx][rplsIdx][i] and, when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 1, for inferring the value of poc_lsb_lt[listIdx][rplsIdx][i], as well as for deriving the variable DeltaPocSt[listIdx][rplsIdx][i] (when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 0) and the variable NumEntriesInList[listIdx][rplsIdx]:
[Derivation pseudocode reproduced as an image in the original publication.]
mode3_ref_rpl_idx[listIdx][rplsIdx] represents the index of the reference ref_pic_list_struct(listIdx, rplsIdx, ltrpFlag). The syntax element mode3_ref_rpl_idx[listIdx][rplsIdx] is represented by Ceil(Log2(num_ref_pic_lists_in_sps[0])) bits. When rpl_mode[listIdx][rplsIdx] is equal to 3, the variable RefRplIdx is derived as follows:
RefRplIdx=mode3_ref_rpl_idx[listIdx][rplsIdx]
When rpl_mode[listIdx][rplsIdx] is equal to 3, the following applies for inferring the value of the syntax element lt_ref_pic_flag[listIdx][rplsIdx][i] and, when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 1, for inferring the value of poc_lsb_lt[listIdx][rplsIdx][i], as well as for deriving the variable DeltaPocSt[listIdx][rplsIdx][i] (when lt_ref_pic_flag[listIdx][rplsIdx][i] is equal to 0) and the variable NumEntriesInList[listIdx][rplsIdx]:
[Derivation pseudocode reproduced as an image in the original publication.]
the general decoding process is provided below.
The decoding process for the current picture CurrPic operates as follows. The decoding of NAL units is detailed below. The processes below detail the decoding processes that use syntax elements in the slice header layer and above. Variables and functions related to picture order numbering are derived; this is invoked only for the first slice of a picture. At the start of the decoding process for each slice of a non-IRAP picture, the decoding process for reference picture list construction is invoked to derive reference picture list 0 (RefPicList[0]) and reference picture list 1 (RefPicList[1]). The decoding process for reference picture identification is invoked, in which a reference picture may be identified as "unused for reference" or "used for long-term reference"; this is invoked only for the first slice of a picture. The decoding processes for coding tree units, scaling, transforms, in-loop filtering, etc., are invoked. After all slices of the current picture have been decoded, the current decoded picture is identified as "used for short-term reference".
The NAL unit decoding process is provided below.
The inputs to this process are the NAL units of the current picture and their associated non-VCL NAL units. The output of this process is the parsed RBSP syntax structure encapsulated within NAL units. The decoding process for each NAL unit extracts the RBSP syntax structure from the NAL unit and then parses the RBSP syntax structure.
The following provides the slice decoding process.
The decoding process for picture order numbering is as follows.
The output of this process is PicOrderCntVal, the picture order number of the current picture. Picture order numbers are used to identify pictures, to derive motion parameters and motion vector prediction in merge mode, and for decoder conformance checking. Each coded picture is associated with a picture order number variable, denoted PicOrderCntVal. If the current picture is not an IRAP picture, the variables prevPicOrderCntLsb and prevPicOrderCntMsb are derived as follows: Let prevTid0Pic be the previous picture in decoding order with TemporalId equal to 0. The variable prevPicOrderCntLsb is set to slice_pic_order_cnt_lsb of prevTid0Pic. The variable prevPicOrderCntMsb is set to PicOrderCntMsb of prevTid0Pic.
The variable PicOrderCntMsb of the current picture is derived as follows: If the current picture is an IRAP picture, PicOrderCntMsb is set to 0. Otherwise, PicOrderCntMsb is derived as follows:
[Derivation pseudocode reproduced as an image in the original publication.]
PicOrderCntVal was derived as follows: picOrderCntVal = PicOrderCntMsb + slice _ pic _ order _ cnt _ lsb.
Since slice_pic_order_cnt_lsb is inferred to be equal to 0 for IRAP pictures and prevPicOrderCntLsb and prevPicOrderCntMsb are both set equal to 0, PicOrderCntVal is equal to 0 for all IRAP pictures. PicOrderCntVal takes values in the range −2^31 to 2^31 − 1, inclusive. Within one CVS, no two coded pictures have the same PicOrderCntVal value.
At any time during decoding, the values of PicOrderCntVal & (MaxPicOrderCntLsb − 1) of any two reference pictures in the DPB are not the same. The function PicOrderCnt(picX) is specified as follows: PicOrderCnt(picX) = PicOrderCntVal of the picture picX. The function DiffPicOrderCnt(picA, picB) is specified as follows: DiffPicOrderCnt(picA, picB) = PicOrderCnt(picA) − PicOrderCnt(picB). The bitstream does not include data that would cause values of DiffPicOrderCnt(picA, picB) used in the decoding process to fall outside the range −2^15 to 2^15 − 1, inclusive. Assuming that X is the current picture and Y and Z are two other pictures in the same CVS, Y and Z are considered to be in the same output order direction relative to X when DiffPicOrderCnt(X, Y) and DiffPicOrderCnt(X, Z) are both positive or both negative.
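The two helper functions and the output-order-direction rule can be expressed directly. A minimal sketch, with pictures represented simply by their PicOrderCntVal values (an assumption for brevity):

```python
def pic_order_cnt(pic_order_cnt_val):
    # PicOrderCnt(picX): here a picture is represented by its PicOrderCntVal.
    return pic_order_cnt_val

def diff_pic_order_cnt(poc_a, poc_b):
    # DiffPicOrderCnt(picA, picB) = PicOrderCnt(picA) - PicOrderCnt(picB)
    return pic_order_cnt(poc_a) - pic_order_cnt(poc_b)

def same_output_direction(poc_x, poc_y, poc_z):
    # Y and Z are in the same output order direction relative to X when both
    # differences are positive or both are negative.
    dy = diff_pic_order_cnt(poc_x, poc_y)
    dz = diff_pic_order_cnt(poc_x, poc_z)
    return (dy > 0 and dz > 0) or (dy < 0 and dz < 0)
```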
The following provides a decoding process for reference picture list construction.
This process is invoked at the beginning of the decoding process for each slice in a non-IRAP picture. Reference pictures are addressed through reference indices, where a reference index is an index into a reference picture list. When decoding an I slice, no reference picture list is used in decoding the slice data. When decoding a P slice, only reference picture list 0 (i.e., RefPicList[0]) is used in decoding the slice data. When decoding a B slice, both reference picture list 0 and reference picture list 1 (i.e., RefPicList[1]) are used in decoding the slice data. At the start of the decoding process for each slice in a non-IRAP picture, the reference picture lists RefPicList[0] and RefPicList[1] are derived. The two reference picture lists are used in identification of reference pictures and in decoding of the slice data. For an I slice in a non-IRAP picture that is not the first slice of the picture, RefPicList[0] and RefPicList[1] may be derived for the purpose of bitstream conformance checking, but their derivation is not necessary for decoding the current picture or pictures following the current picture in decoding order. For a P slice that is not the first slice of a picture, RefPicList[1] may be derived for the purpose of bitstream conformance checking, but its derivation is not necessary for decoding the current picture or pictures following the current picture in decoding order. The reference picture lists RefPicList[0] and RefPicList[1] are constructed as follows:
[Figure: pseudocode for constructing the reference picture lists RefPicList[0] and RefPicList[1]]
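A simplified sketch of the construction is as follows. The entry layout (a signed POC delta for STRP entries, POC LSBs for LTRP entries), the dict-based DPB, and all names are assumptions for illustration; the actual syntax-driven derivation is more involved.

```python
def build_ref_pic_list(current_poc, entries, dpb, max_lt_poc_lsb):
    """entries: list of (is_ltrp, value) pairs; dpb maps POC -> picture.

    STRP entries carry a POC delta relative to the previous STRP entry
    (or to the current picture for the first one); LTRP entries carry
    the POC LSBs used to locate the long-term reference picture.
    """
    ref_list = []
    poc_base = current_poc
    for is_ltrp, value in entries:
        if not is_ltrp:
            # STRP entry: resolve by full POC value
            poc_base -= value
            ref_list.append(dpb.get(poc_base, "no reference picture"))
        else:
            # LTRP entry: resolve by POC LSBs
            matches = [p for p in dpb if p % max_lt_poc_lsb == value]
            ref_list.append(dpb[matches[0]] if matches else "no reference picture")
    return ref_list
```

An entry whose picture is absent from the DPB yields "no reference picture", matching the handling of such entries described for the constructed lists.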
For each i equal to 0 or 1, the first NumRefIdxActive[i] entries in RefPicList[i] are referred to as the active entries in RefPicList[i], and the remaining entries in RefPicList[i] are referred to as the inactive entries in RefPicList[i]. Each entry RefPicList[i][j], with j in the range 0 to NumEntriesInList[i] − 1, inclusive, is referred to as an STRP entry if lt_ref_pic_flag[i][j] is equal to 0, and as an LTRP entry otherwise. A picture may be referenced by both an entry in RefPicList[0] and an entry in RefPicList[1]. A picture may also be referenced by more than one entry in RefPicList[0] or by more than one entry in RefPicList[1]. The active entries in RefPicList[0] and the active entries in RefPicList[1] collectively refer to all reference pictures that may be used in inter prediction of the current picture and of one or more pictures following the current picture in decoding order. The inactive entries in RefPicList[0] and the inactive entries in RefPicList[1] collectively refer to all reference pictures that are not used in inter prediction of the current picture but may be used in inter prediction of one or more pictures following the current picture in decoding order. One or more entries equal to "no reference picture" may be present in RefPicList[0] or RefPicList[1] because the corresponding pictures are not present in the DPB. Each inactive entry in RefPicList[0] or RefPicList[1] that is equal to "no reference picture" is ignored. An unintentional picture loss is inferred for each active entry in RefPicList[0] or RefPicList[1] that is equal to "no reference picture".
It is a requirement of bitstream conformance that the following constraints apply: for each i equal to 0 or 1, NumEntriesInList[i][RplsIdx[i]] is not less than NumRefIdxActive[i]. The picture referred to by each active entry in RefPicList[0] or RefPicList[1] is present in the DPB and has a TemporalId less than or equal to the TemporalId of the current picture. Optionally, the following constraint may additionally be specified: the entry index of any inactive entry in RefPicList[0] or RefPicList[1] is not used as a reference index for decoding the current picture. Optionally, the following constraint may additionally be specified: an inactive entry in RefPicList[0] or RefPicList[1] does not refer to the same picture as any other entry in RefPicList[0] or RefPicList[1]. An STRP entry in RefPicList[0] or RefPicList[1] of a slice of a picture does not refer to the same picture as an LTRP entry in RefPicList[0] or RefPicList[1] of the same slice or of a different slice of the same picture. There is no LTRP entry in RefPicList[0] or RefPicList[1] for which the difference between the PicOrderCntVal of the current picture and the PicOrderCntVal of the picture referred to by the entry is greater than or equal to 2^24. Let setOfRefPics be the set of unique pictures referred to by all entries in RefPicList[0] and all entries in RefPicList[1]. The number of pictures in setOfRefPics is less than or equal to sps_max_dec_pic_buffering_minus1, and setOfRefPics is the same for all slices of a picture.
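Two of the mandatory constraints above (each active entry must resolve to a picture in the DPB, and that picture's TemporalId must not exceed the current picture's) can be checked as sketched below; the DPB representation and the names are illustrative assumptions.

```python
def active_entries_conform(active_entry_pocs, dpb, current_temporal_id):
    """dpb maps POC -> TemporalId of the stored reference picture."""
    for poc in active_entry_pocs:
        if poc not in dpb:
            return False  # active entry resolves to "no reference picture"
        if dpb[poc] > current_temporal_id:
            return False  # referenced picture lies in a higher temporal sub-layer
    return True
```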
The following provides a decoding process for reference picture identification.
This process is invoked once per picture, after the decoding of the slice header and the decoding process for reference picture list construction for a slice, but before the slice data is decoded. This process may cause one or more reference pictures in the DPB to be identified as "unused for reference" or "used for long-term reference". A decoded picture in the DPB may be identified as "unused for reference", "used for short-term reference", or "used for long-term reference", but only one of these three identification states may apply at any given moment during the decoding process. Assigning one of these identification states to a picture implicitly removes any other of these states, when applicable. When a picture is said to be identified as "used for reference", this refers collectively to the picture being identified as "used for short-term reference" or "used for long-term reference" (but not both). If the current picture is an IRAP picture, all reference pictures (if any) currently in the DPB are identified as "unused for reference". STRPs are identified by their PicOrderCntVal values. LTRPs are identified by the Log2(MaxLtPicOrderCntLsb) LSBs of their PicOrderCntVal values. For each LTRP entry in RefPicList[0] or RefPicList[1], when the referenced picture is an STRP, the picture is identified as "used for long-term reference". Each reference picture in the DPB that is not referred to by any entry in RefPicList[0] or RefPicList[1] is identified as "unused for reference".
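The identification rules of this process can be sketched as follows; the DPB representation (a dict from POC to status string) and the merged entry list are illustrative assumptions, not the patent's formulation.

```python
def mark_reference_pictures(dpb, entries, current_is_irap):
    """dpb: {poc: status}; entries: (poc, is_ltrp_entry) pairs merged from
    RefPicList[0] and RefPicList[1]."""
    if current_is_irap:
        # An IRAP picture identifies every reference picture in the DPB as unused.
        return {poc: "unused for reference" for poc in dpb}
    referenced = {poc for poc, _ in entries}
    for poc, is_ltrp in entries:
        # An LTRP entry referring to an STRP promotes it to long-term status.
        if is_ltrp and dpb.get(poc) == "used for short-term reference":
            dpb[poc] = "used for long-term reference"
    for poc in dpb:
        # Pictures referenced by no entry in either list become unused.
        if poc not in referenced:
            dpb[poc] = "unused for reference"
    return dpb
```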
Fig. 8 is a schematic diagram of a video coding apparatus 800 (e.g., video encoder 20 or video decoder 30) according to an embodiment of the invention. The video coding apparatus 800 is suitable for implementing the disclosed embodiments described herein. The video coding apparatus 800 comprises: an ingress port 810 and a receiving unit (Rx) 820 for receiving data; a processor, logic unit, or central processing unit (CPU) 830 for processing the data; a transmitting unit (Tx) 840 and an egress port 850 for transmitting the data; and a memory 860 for storing the data. The video coding apparatus 800 may further comprise optical-to-electrical (OE) components and electrical-to-optical (EO) components coupled to the ingress port 810, the receiving unit 820, the transmitting unit 840, and the egress port 850, serving as an egress or ingress for optical or electrical signals.
The processor 830 is implemented by hardware and software. The processor 830 may be implemented as one or more CPU chips, one or more cores (e.g., as a multi-core processor), one or more field-programmable gate arrays (FPGAs), one or more application-specific integrated circuits (ASICs), and one or more digital signal processors (DSPs). The processor 830 communicates with the ingress port 810, the receiving unit 820, the transmitting unit 840, the egress port 850, and the memory 860. The processor 830 includes a coding module 870. The coding module 870 implements the disclosed embodiments described above. For example, the coding module 870 performs, processes, prepares, or provides the various coding operations. The inclusion of the coding module 870 therefore provides a substantial improvement to the functionality of the video coding apparatus 800 and effects a transformation of the video coding apparatus 800 to a different state. Alternatively, the coding module 870 is implemented as instructions stored in the memory 860 and executed by the processor 830.
Video coding device 800 may also include an input/output (I/O) device 880 for communicating data with a user. The I/O devices 880 may include output devices such as a display to display video data, speakers to output audio data, and the like. The I/O devices 880 may also include input devices such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces to interact with the output devices described above.
The memory 860 includes one or more disks, one or more tape drives, and one or more solid-state drives, and may serve as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution. The memory 860 may be volatile and/or non-volatile, and may be read-only memory (ROM), random-access memory (RAM), ternary content-addressable memory (TCAM), and/or static random-access memory (SRAM).
FIG. 9 is a diagram of one embodiment of a coding module 900. In this embodiment, the coding module 900 is implemented in a video coding apparatus 902 (e.g., video encoder 20 or video decoder 30). The video coding apparatus 902 includes a receiving module 901. The receiving module 901 is configured to receive a picture for encoding or to receive a code stream for decoding. The video coding apparatus 902 includes a transmitting module 907 coupled to the receiving module 901. The transmitting module 907 is configured to transmit the code stream to a decoder or to transmit a decoded picture to a display module (e.g., one of the I/O devices 880).
Video coding apparatus 902 includes a storage module 903. The storage module 903 is coupled to at least one of the receiving module 901 or the transmitting module 907. The storage module 903 is used to store instructions. Video coding device 902 also includes a processing module 905. The processing module 905 is coupled to the storage module 903. The processing module 905 is used to execute instructions stored in the storage module 903 to perform the methods disclosed herein.
It should also be understood that the steps of the exemplary methods set forth herein do not necessarily need to be performed in the order described, and the order of the steps of these methods should be understood as being merely exemplary. Likewise, methods consistent with various embodiments of the present invention may include additional steps, and certain steps may be omitted or combined.
While several embodiments of the present invention have been provided, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present invention. The present examples are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or described as coupled or directly coupled or communicating with each other may also be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims (33)

1. A method implemented by a video decoder for decoding a coded video stream, the method comprising:
parsing a first flag from the coded video stream;
parsing a second flag and a first index of a first reference picture list syntax structure from the coded video bitstream;
when the first flag has a first value: determining that a third flag and a second index of a second reference picture list syntax structure are not present in a slice header of the coded video stream, and inferring that the values of the third flag and the second index of the second reference picture list syntax structure are the same as the values of the second flag and the first index of the first reference picture list syntax structure, respectively; and generating a reference picture list using the first reference picture list syntax structure from the coded video stream; wherein, when the second flag is equal to 1, the first reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 0 in the active SPS; when the second flag is equal to 0, the first reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 0 included directly in the slice header; and the first index is an index into the list of reference picture list syntax structures with listIdx equal to 0 included in the active SPS;
when the first flag has a second value: determining that the third flag and the second index of the second reference picture list syntax structure are present in the slice header; and generating a reference picture list using the first reference picture list syntax structure and the second reference picture list syntax structure from the coded video stream; wherein, when the third flag is equal to 1, the second reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 1 in the active SPS; when the third flag is equal to 0, the second reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 1 included directly in the slice header; and the second index is an index into the list of reference picture list syntax structures with listIdx equal to 1 included in the active SPS;
performing inter prediction from the reference picture list to generate a reconstructed block.
2. The method of claim 1, wherein the first flag is denoted rpl1_idx_present_flag.
3. The method of claim 1 or claim 2, wherein the first flag is included in a Picture Parameter Set (PPS) of the coded video bitstream.
4. The method of any of claims 1-3, wherein the first flag is included in a Sequence Parameter Set (SPS) of the coded video bitstream.
5. The method of any of claims 1 to 4, wherein the first reference picture list syntax structure is included in a slice header of the coded video bitstream.
6. The method of claim 1, wherein the first flag is included in a Picture Parameter Set (PPS) of the coded video stream, and wherein the first reference picture list syntax structure is included in a slice header of the coded video stream.
7. The method according to any one of claims 1 to 6, wherein the first value of the first flag is 1 (1).
8. The method according to any of claims 1 to 7, wherein, when the first value of the first flag is 1 (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
9. The method of claim 1, wherein the second value of the first flag is 0 (0).
10. The method of claim 9, wherein ref_pic_list_sps_flag[0] and ref_pic_list_idx[0] are included in the slice header when the second value of the first flag is 0 (0).
11. A method implemented by a video encoder for encoding a video stream, the method comprising:
when a third flag and a second index of a second reference picture list syntax structure are not encoded in a slice header of the video code stream, and the values of the third flag and the second index of the second reference picture list syntax structure are inferred to be the same as the values of the second flag and the first index of a first reference picture list syntax structure, respectively, encoding a first flag having a first value in the video code stream; wherein, when the second flag is equal to 1, the first reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 0 in the active SPS; when the second flag is equal to 0, the first reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 0 included directly in the slice header; and the first index is an index into the list of reference picture list syntax structures with listIdx equal to 0 included in the active SPS;
when the third flag and the second index of the second reference picture list syntax structure are encoded in the slice header of the video code stream, encoding a first flag having a second value in the video code stream; wherein, when the third flag is equal to 1, the second reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 1 in the active SPS; when the third flag is equal to 0, the second reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 1 included directly in the slice header; and the second index is an index into the list of reference picture list syntax structures with listIdx equal to 1 included in the active SPS;
when the first flag is encoded with the first value, encoding the second flag and the first index of the first reference picture list syntax structure in the video code stream;
when the first flag is encoded with the second value, encoding the second flag and the first index of the first reference picture list syntax structure, and encoding the third flag and the second index of the second reference picture list syntax structure, in the video code stream;
and sending the video code stream to a video decoder.
12. The method of claim 11, wherein the first flag is denoted rpl1_idx_present_flag.
13. The method of claim 11 or 12, wherein the first flag is encoded in a Picture Parameter Set (PPS) of the coded video stream.
14. The method of any of claims 11 to 13, wherein the first reference picture list syntax structure is encoded in a slice header of the coded video bitstream.
15. The method of any of claims 11 to 14, wherein the first reference picture list syntax structure and the second reference picture list syntax structure are encoded in a slice header of the coded video bitstream.
16. The method according to any one of claims 11 to 15, wherein the first value of the first flag is 1 (1) and the second value of the first flag is 0 (0).
17. The method according to any one of claims 11 to 16, wherein, when the first value of the first flag is 1 (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
18. A decoding device, characterized in that the decoding device comprises:
a receiver for receiving a coded video stream;
a memory coupled with the receiver, the memory storing instructions;
a processor coupled with the memory, the processor to execute the instructions stored in the memory to cause the processor to:
parsing a first flag from the coded video stream;
parsing a second flag and a first index of a first reference picture list syntax structure from the coded video bitstream;
when the first flag has a first value: determine that a third flag and a second index of a second reference picture list syntax structure are not present in a slice header of the coded video stream, and infer that the values of the third flag and the second index of the second reference picture list syntax structure are the same as the values of the second flag and the first index of the first reference picture list syntax structure, respectively; and generate a reference picture list using the first reference picture list syntax structure from the coded video stream; wherein, when the second flag is equal to 1, the first reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 0 in the active SPS; when the second flag is equal to 0, the first reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 0 included directly in the slice header; and the first index is an index into the list of reference picture list syntax structures with listIdx equal to 0 included in the active SPS;
when the first flag has a second value: determine that the third flag and the second index of the second reference picture list syntax structure are present in the slice header; and generate a reference picture list using the first reference picture list syntax structure and the second reference picture list syntax structure from the coded video stream; wherein, when the third flag is equal to 1, the second reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 1 in the active SPS; when the third flag is equal to 0, the second reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 1 included directly in the slice header; and the second index is an index into the list of reference picture list syntax structures with listIdx equal to 1 included in the active SPS;
performing inter prediction from the reference picture list to generate a reconstructed block.
19. The decoding device according to claim 18, wherein the decoding device further comprises a display for displaying an image generated using the reconstructed block.
20. The decoding device according to claim 18 or 19, wherein the first flag is denoted rpl1_idx_present_flag.
21. The decoding device according to any of claims 18 to 20, wherein the first flag is included in a Picture Parameter Set (PPS) of the coded video stream.
22. The decoding device according to any of claims 18 to 21, wherein the first reference picture list syntax structure is included in a slice header of the coded video bitstream.
23. The decoding device according to any one of claims 18 to 22, wherein the first value of the first flag is 1 (1) and the second value of the first flag is 0 (0).
24. The decoding device according to any one of claims 18 to 23, wherein, when the first value of the first flag is 1 (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
25. An encoding apparatus characterized by comprising:
a processor to:
when a third flag and a second index of a second reference picture list syntax structure are not encoded in a slice header of the video code stream, and the values of the third flag and the second index of the second reference picture list syntax structure are inferred to be the same as the values of the second flag and the first index of a first reference picture list syntax structure, respectively, encode a first flag having a first value in the video code stream; wherein, when the second flag is equal to 1, the first reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 0 in the active SPS; when the second flag is equal to 0, the first reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 0 included directly in the slice header; and the first index is an index into the list of reference picture list syntax structures with listIdx equal to 0 included in the active SPS;
when the third flag and the second index of the second reference picture list syntax structure are encoded in the slice header of the video code stream, encode a first flag having a second value in the video code stream; wherein, when the third flag is equal to 1, the second reference picture list syntax structure is one of the reference picture list syntax structures with listIdx equal to 1 in the active SPS; when the third flag is equal to 0, the second reference picture list syntax structure is a reference picture list syntax structure with listIdx equal to 1 included directly in the slice header; and the second index is an index into the list of reference picture list syntax structures with listIdx equal to 1 included in the active SPS;
when the first flag is encoded with the first value, encode the second flag and the first index of the first reference picture list syntax structure in the video code stream;
when the first flag is encoded with the second value, encode the second flag and the first index of the first reference picture list syntax structure, and encode the third flag and the second index of the second reference picture list syntax structure, in the video code stream; and
a transmitter coupled to the processor, the transmitter configured to transmit the video code stream to a video decoder.
26. The encoding device of claim 25, wherein the first flag is denoted rpl1_idx_present_flag.
27. The apparatus of claim 25 or 26, wherein the first flag is encoded in a Picture Parameter Set (PPS) of the coded video bitstream.
28. The encoding device of any of claims 25 to 27, wherein the first reference picture list syntax structure is encoded in a slice header of the coded video bitstream.
29. The encoding device according to any one of claims 25 to 28, wherein the first value of the first flag is 1 (1) and the second value of the first flag is 0 (0).
30. The encoding device according to any one of claims 25 to 29, wherein, when the first value of the first flag is 1 (1), ref_pic_list_sps_flag[1] and ref_pic_list_idx[1] are not included in the slice header.
31. A decoding apparatus, characterized in that the decoding apparatus comprises:
a receiver for receiving a code stream to be decoded;
a transmitter coupled with the receiver, the transmitter to transmit the decoded image to a display;
a memory coupled with at least one of the receiver or the transmitter, the memory to store instructions;
a processor coupled with the memory, the processor to execute the instructions stored in the memory to perform the method of any of claims 1-17.
32. A system, characterized in that the system comprises:
encoder, wherein the encoder comprises an encoding device according to any one of claims 25 to 30;
a decoder in communication with the encoder, wherein the decoder comprises a decoding device according to any one of claims 18 to 24.
33. A coding module, wherein the coding module comprises:
a receiving module for receiving a code stream to be decoded;
a transmitting module coupled with the receiving module, the transmitting module to transmit the decoded image to a display module;
a storage module coupled to at least one of the receiving module or the transmitting module, the storage module to store instructions;
a processing module coupled with the storage module, the processing module to execute the instructions stored in the storage module to perform the method of any of claims 1-17.
CN202210623879.XA 2018-09-12 2019-09-12 Video decoder, video encoder and related encoding and decoding methods Pending CN115174901A (en)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
US201862730172P 2018-09-12 2018-09-12
US62/730,172 2018-09-12
US201962848147P 2019-05-15 2019-05-15
US62/848,147 2019-05-15
CN201980059745.6A CN112690003B (en) 2018-09-12 2019-09-12 Video encoder, video decoder and related encoding and decoding methods
PCT/US2019/050857 WO2020056168A1 (en) 2018-09-12 2019-09-12 Index signaling for reference picture list structures

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980059745.6A Division CN112690003B (en) 2018-09-12 2019-09-12 Video encoder, video decoder and related encoding and decoding methods

Publications (1)

Publication Number Publication Date
CN115174901A true CN115174901A (en) 2022-10-11

Family

ID=69777225

Family Applications (12)

Application Number Title Priority Date Filing Date
CN202210622405.3A Active CN115174899B (en) 2018-09-12 2019-09-12 Video decoder and related decoding method
CN202210623700.0A Active CN115174900B (en) 2018-09-12 2019-09-12 Video encoder and related encoding method
CN202311297797.1A Pending CN117544780A (en) 2018-09-12 2019-09-12 Candidate indication for reference picture list structure
CN202311288505.8A Pending CN117412038A (en) 2018-09-12 2019-09-12 Candidate indication for reference picture list structure
CN202311351824.9A Pending CN117412047A (en) 2018-09-12 2019-09-12 Symbol value and absolute value indicating image sequence number increment
CN202311348434.6A Pending CN117459729A (en) 2018-09-12 2019-09-12 Symbol value and absolute value indicating image sequence number increment
CN202210623879.XA Pending CN115174901A (en) 2018-09-12 2019-09-12 Video decoder, video encoder and related encoding and decoding methods
CN202311351826.8A Pending CN117834880A (en) 2018-09-12 2019-09-12 Symbol value and absolute value indicating image sequence number increment
CN201980059893.8A Active CN113170165B (en) 2018-09-12 2019-09-12 Candidate indication for reference picture list structure
CN201980059831.7A Active CN112840651B (en) 2018-09-12 2019-09-12 Symbol value and absolute value indicating image sequence number increment
CN202311291821.0A Pending CN117544779A (en) 2018-09-12 2019-09-12 Candidate indication for reference picture list structure
CN201980059745.6A Active CN112690003B (en) 2018-09-12 2019-09-12 Video encoder, video decoder and related encoding and decoding methods


Country Status (12)

Country Link
US (3) US20210195178A1 (en)
EP (5) EP3847804B1 (en)
JP (6) JP2022500923A (en)
KR (6) KR102585975B1 (en)
CN (12) CN115174899B (en)
AU (6) AU2019337644B2 (en)
BR (3) BR112021004684A2 (en)
ES (1) ES2935979T3 (en)
FI (1) FI3847804T3 (en)
MX (3) MX2021002952A (en)
PT (1) PT3847804T (en)
WO (3) WO2020056172A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
MX2021001743A (en) 2018-08-17 2021-06-23 Huawei Tech Co Ltd Reference picture management in video coding.
US11375184B2 (en) * 2018-12-10 2022-06-28 Sharp Kabushiki Kaisha Systems and methods for signaling reference pictures in video coding
WO2020180166A1 (en) * 2019-03-07 2020-09-10 Digital Insight Co., Ltd. Image encoding/decoding method and apparatus
US20230353749A1 (en) * 2019-12-06 2023-11-02 Lg Electronics Inc. Method and apparatus for encoding/decoding image on basis of picture header including information relating to co-located picture, and method for transmitting bitstream
CN115699733A (en) 2020-05-21 2023-02-03 字节跳动有限公司 Signaling inter-layer reference pictures in video coding and decoding
US20230089594A1 (en) * 2021-09-17 2023-03-23 Tencent America LLC Joint motion vector difference coding

Family Cites Families (38)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101653001A (en) * 2006-10-13 2010-02-17 汤姆逊许可公司 Reference picture list management syntax for multiple view video coding
WO2008048499A2 (en) * 2006-10-13 2008-04-24 Thomson Licensing Reference picture list management syntax for multiple view video coding
US9357229B2 (en) * 2010-07-28 2016-05-31 Qualcomm Incorporated Coding motion vectors in video coding
US9602813B2 (en) * 2010-08-18 2017-03-21 Sk Telecom Co., Ltd. Image encoding/decoding device and method, and reference picture indexing device and method
WO2012033327A2 (en) 2010-09-08 2012-03-15 LG Electronics Inc. Image-decoding method and apparatus including a method for configuring a reference picture list
US9066102B2 (en) 2010-11-17 2015-06-23 Qualcomm Incorporated Reference picture list construction for generalized P/B frames in video coding
US9497458B2 (en) * 2010-11-26 2016-11-15 Sun Patent Trust Image coding method, image decoding method, image coding apparatus, image decoding apparatus, program, and integrated circuit
US9049455B2 (en) * 2010-12-28 2015-06-02 Panasonic Intellectual Property Corporation Of America Image coding method of coding a current picture with prediction using one or both of a first reference picture list including a first current reference picture for a current block and a second reference picture list including a second current reference picture for the current block
US9008181B2 (en) * 2011-01-24 2015-04-14 Qualcomm Incorporated Single reference picture list utilization for interprediction video coding
KR20120140592A (en) * 2011-06-21 2012-12-31 Electronics and Telecommunications Research Institute Method and apparatus for reducing computational complexity of motion compensation and increasing coding efficiency
EP2727342B1 (en) * 2011-06-30 2016-06-29 Telefonaktiebolaget LM Ericsson (publ) Reference picture signaling
US9674525B2 (en) * 2011-07-28 2017-06-06 Qualcomm Incorporated Multiview video coding
AU2012311021B2 (en) * 2011-09-19 2016-07-14 Sun Patent Trust Image coding method, image decoding method, image coding apparatus, image decoding apparatus, and image coding and decoding apparatus
WO2013042995A2 (en) * 2011-09-22 2013-03-28 LG Electronics Inc. Method and apparatus for signaling image information, and decoding method and apparatus using same
US9131245B2 (en) * 2011-09-23 2015-09-08 Qualcomm Incorporated Reference picture list construction for video coding
US10003817B2 (en) * 2011-11-07 2018-06-19 Microsoft Technology Licensing, Llc Signaling of state information for a decoded picture buffer and reference picture lists
CN104025599B (en) * 2011-11-08 2018-12-14 诺基亚技术有限公司 reference picture processing
ES2629744T3 (en) * 2012-01-17 2017-08-14 Telefonaktiebolaget Lm Ericsson (Publ) Management of reference image lists
CN108235032B (en) * 2012-01-18 2022-01-07 Jvc 建伍株式会社 Moving picture decoding device and moving picture decoding method
US9210430B2 (en) * 2012-01-19 2015-12-08 Sharp Kabushiki Kaisha Reference picture set signaling and restriction on an electronic device
US9143781B2 (en) * 2012-04-03 2015-09-22 Qualcomm Incorporated Weighted prediction parameter coding
ES2850223T3 (en) * 2012-04-16 2021-08-26 Samsung Electronics Co Ltd Procedure and apparatus for determining a reference image set from an image
US9838706B2 (en) * 2012-04-16 2017-12-05 Telefonaktiebolaget Lm Ericsson (Publ) Encoder, decoder and methods thereof for video encoding and decoding
US9736476B2 (en) * 2012-04-27 2017-08-15 Qualcomm Incorporated Full random access from clean random access pictures in video coding
JP5950726B2 (en) * 2012-06-28 2016-07-13 株式会社Nttドコモ Video predictive encoding method, video predictive encoding device, video predictive encoding program, video predictive decoding method, video predictive decoding device, and video predictive decoding program
US9479776B2 (en) * 2012-07-02 2016-10-25 Qualcomm Incorporated Signaling of long-term reference pictures for video coding
US9325990B2 (en) * 2012-07-09 2016-04-26 Qualcomm Incorporated Temporal motion vector prediction in video coding extensions
US9313500B2 (en) * 2012-09-30 2016-04-12 Microsoft Technology Licensing, Llc Conditional signalling of reference picture list modification information
CN105075269B (en) * 2013-04-04 2018-07-03 夏普株式会社 Picture decoding apparatus, picture coding device and computer-readable recording medium
KR101851479B1 (en) * 2014-01-03 2018-04-23 노키아 테크놀로지스 오와이 Parameter set coding
US10575011B2 (en) * 2015-09-24 2020-02-25 Lg Electronics Inc. Inter prediction method and apparatus in image coding system
CN108965871B (en) * 2015-09-29 2023-11-10 华为技术有限公司 Image prediction method and device
EP3453178A1 (en) * 2016-05-06 2019-03-13 VID SCALE, Inc. Systems and methods for motion compensated residual prediction
EP3484159A4 (en) * 2016-07-05 2019-12-04 KT Corporation Method and apparatus for processing video signal
WO2019235325A1 (en) * 2018-06-07 2019-12-12 Canon Inc. Optical system, imaging device comprising same, and imaging system
US11375184B2 (en) * 2018-12-10 2022-06-28 Sharp Kabushiki Kaisha Systems and methods for signaling reference pictures in video coding
US11196988B2 (en) * 2018-12-17 2021-12-07 Apple Inc. Reference picture management and list construction
CN111903134B (en) * 2019-01-02 2023-09-12 Lg电子株式会社 Method and apparatus for processing video signal by using inter prediction

Also Published As

Publication number Publication date
AU2019338463B2 (en) 2023-10-26
KR20240032173A (en) 2024-03-08
AU2019337644A1 (en) 2021-04-15
WO2020056168A1 (en) 2020-03-19
AU2023219935A1 (en) 2023-09-14
JP2023076474A (en) 2023-06-01
CN112840651A (en) 2021-05-25
JP2022500922A (en) 2022-01-04
JP2022500923A (en) 2022-01-04
CN117834880A (en) 2024-04-05
CN112840651B (en) 2023-11-10
CN115174899B (en) 2023-06-06
JP2024026290A (en) 2024-02-28
KR102643058B1 (en) 2024-02-29
KR20210057109A (en) 2021-05-20
EP3847812B1 (en) 2024-02-14
KR20230129599A (en) 2023-09-08
US20210195236A1 (en) 2021-06-24
AU2019338463A1 (en) 2021-04-15
AU2023237082A1 (en) 2023-10-12
CN115174900A (en) 2022-10-11
US20210195235A1 (en) 2021-06-24
JP7270725B2 (en) 2023-05-10
EP4300954A2 (en) 2024-01-03
MX2021002950A (en) 2021-07-21
JP2022500919A (en) 2022-01-04
CN112690003B (en) 2022-06-28
JP2023067904A (en) 2023-05-16
CN113170165A (en) 2021-07-23
EP3847804A4 (en) 2021-10-27
CN113170165B (en) 2023-10-20
CN117412038A (en) 2024-01-16
BR112021004662A2 (en) 2021-06-01
AU2019337644B2 (en) 2023-06-29
WO2020056172A1 (en) 2020-03-19
KR20210055080A (en) 2021-05-14
JP7460817B2 (en) 2024-04-02
EP4300954A3 (en) 2024-01-10
KR20230145226A (en) 2023-10-17
CN115174900B (en) 2023-06-06
EP4164227A1 (en) 2023-04-12
EP3847810A4 (en) 2021-11-03
EP3847810A1 (en) 2021-07-14
EP3847812A4 (en) 2021-11-03
PT3847804T (en) 2023-01-19
KR20210055079A (en) 2021-05-14
AU2019339411B2 (en) 2023-05-25
AU2024200359A1 (en) 2024-02-08
KR102573250B1 (en) 2023-08-30
MX2021002951A (en) 2021-07-15
JP7242839B2 (en) 2023-03-20
BR112021004667A2 (en) 2021-06-01
MX2021002952A (en) 2021-07-21
FI3847804T3 (en) 2023-01-13
CN117544780A (en) 2024-02-09
CN117544779A (en) 2024-02-09
CN117412047A (en) 2024-01-16
CN112690003A (en) 2021-04-20
EP3847804A1 (en) 2021-07-14
EP3847812A1 (en) 2021-07-14
EP3847804B1 (en) 2022-12-07
BR112021004684A2 (en) 2021-06-01
ES2935979T3 (en) 2023-03-13
KR102585975B1 (en) 2023-10-05
US20210195178A1 (en) 2021-06-24
WO2020056164A1 (en) 2020-03-19
CN115174899A (en) 2022-10-11
CN117459729A (en) 2024-01-26
AU2019339411A1 (en) 2021-04-15

Similar Documents

Publication Publication Date Title
CN114584775B (en) Decoding method, decoding device and decoding system for decoded video code stream
CN112690003B (en) Video encoder, video decoder and related encoding and decoding methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination