CN114846805A - Signaling of coded picture buffer level in video coding - Google Patents

Signaling of coded picture buffer level in video coding

Info

Publication number
CN114846805A
CN114846805A
Authority
CN
China
Prior art keywords
video
bitstream
access unit
picture
unit
Prior art date
Legal status
Pending
Application number
CN202080090438.7A
Other languages
Chinese (zh)
Inventor
王业奎 (Ye-Kui Wang)
Current Assignee
ByteDance Inc
Original Assignee
ByteDance Inc
Priority date
Filing date
Publication date
Application filed by ByteDance Inc
Publication of CN114846805A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N 19/59 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N 19/119 Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N 19/122 Selection of transform size, e.g. 8x8 or 2x4x8 DCT; Selection of sub-band transforms of varying structure or type
    • H04N 19/152 Data rate or code amount at the encoder output by measuring the fullness of the transmission buffer
    • H04N 19/172 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being an image region, e.g. an object, the region being a picture, frame or field
    • H04N 19/184 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being bits, e.g. of the compressed video stream
    • H04N 19/64 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding using sub-band based transform, e.g. wavelets, characterised by ordering of coefficients or of bits for transmission
    • H04N 19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Discrete Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Methods, systems, and devices are described for signaling picture buffer levels as part of video encoding or decoding. One example method of video processing includes performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule, and wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, or a sum of numbers of bytes in Network Abstraction Layer (NAL) units for an access unit are based on a sum of the picture sizes of the pictures of that access unit.

Description

Signaling of coded picture buffer level in video coding
Cross Reference to Related Applications
Under the applicable patent law and/or rules pursuant to the Paris Convention, this application claims priority to and the benefit of U.S. Provisional Application No. 62/953,815, filed on December 26, 2019. The entire disclosure of the aforementioned application is incorporated by reference as part of the disclosure of this application for all purposes under the law.
Technical Field
This patent document relates to picture and video coding.
Background
Digital video accounts for the largest bandwidth use on the internet and other digital communication networks. As the number of connected user devices capable of receiving and displaying video increases, it is expected that the bandwidth demand for digital video usage will continue to grow.
Disclosure of Invention
This document discloses techniques that may be used by video encoders and decoders to signal picture buffer levels as part of performing video encoding or decoding.
In one example aspect, a video processing method is disclosed. The method includes performing a conversion between a video and a bitstream of the video according to a rule, wherein the bitstream includes one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rule specifies that a maximum decoded picture buffer size required for decoding the bitstream or the one or more bitstream portions is determined based on a maximum allowed picture size of the one or more coded video pictures corresponding to the bitstream or the one or more bitstream portions.
In another example aspect, another video processing method is disclosed. The method includes performing a conversion between video and a bitstream of the video comprising one or more coding layer video sequences, wherein the bitstream conforms to a rule, and wherein the rule specifies that a maximum buffer size of decoded pictures of the coding layer video sequences is constrained to be less than or equal to a maximum picture size selected from pictures in the coding layer video sequences.
In another example aspect, another video processing method is disclosed. The method includes performing a conversion between a video and a bitstream of the video according to a rule, wherein the bitstream includes one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rule specifies at least one of a maximum allowed picture size, a maximum allowed picture width, or a maximum allowed picture height for the conversion.
In another example aspect, another video processing method is disclosed. The method includes performing a conversion between a video and a bitstream of the video, wherein the bitstream includes one or more output layer sets, at least one of which includes a plurality of video layers, and wherein the bitstream conforms to a rule that specifies a constraint on the overall size of the decoded picture buffer for the at least one output layer set that includes the plurality of video layers.
In another example aspect, another video processing method is disclosed. The method includes performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule, and wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, or a sum of numbers of bytes in Network Abstraction Layer (NAL) units for an access unit are based on a sum of the picture sizes of the pictures of that access unit.
In another example aspect, another video processing method is disclosed. The method includes performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule, and wherein the rule specifies a limit on a maximum number of slices in an access unit.
In another example aspect, a video encoder apparatus is disclosed. The video encoder comprises a processor configured to implement the above-described method.
In another example aspect, a video decoder apparatus is disclosed. The video decoder comprises a processor configured to implement the above-described method.
In another example aspect, a computer-readable medium having code stored thereon is disclosed. The code embodies one of the methods described herein in the form of processor executable code.
These and other features are described in this document.
Drawings
Fig. 1 is a block diagram illustrating an example video processing system in which various techniques disclosed in this document may be implemented.
Fig. 2 is a block diagram of an example hardware platform for video processing.
Fig. 3 illustrates a block diagram of an example video encoding system in which some embodiments of the present disclosure may be implemented.
Fig. 4 illustrates a block diagram of an example encoder in which some embodiments of the present disclosure may be implemented.
Fig. 5 illustrates a block diagram of an example decoder in which some embodiments of the present disclosure may be implemented.
Fig. 6 shows a flow diagram of an example method of video processing.
Fig. 7 shows a flow diagram of an example method of video processing.
Detailed Description
The section headings are used in this document to facilitate understanding and do not limit the applicability of the techniques and embodiments disclosed in each section to only that section. Furthermore, the use of the H.266 term in some descriptions is for ease of understanding only and is not intended to limit the scope of the disclosed technology. Thus, the techniques described herein are also applicable to other video codec protocols and designs.
1. Overview
This document relates to video coding technologies. In particular, it relates to the definition of levels for video codecs that support both single-layer and multi-layer video coding. It may be applied to any video coding standard or non-standard video codec that supports single-layer and multi-layer coding, such as Versatile Video Coding (VVC), which is under development.
2. Abbreviations
APS   Adaptation Parameter Set
AU    Access Unit
AUD   Access Unit Delimiter
AVC   Advanced Video Coding
CLVS  Coded Layer Video Sequence
CPB   Coded Picture Buffer
CRA   Clean Random Access
CTU   Coding Tree Unit
CVS   Coded Video Sequence
DPB   Decoded Picture Buffer
DPS   Decoding Parameter Set
EOB   End Of Bitstream
EOS   End Of Sequence
GDR   Gradual Decoding Refresh
HEVC  High Efficiency Video Coding
IDR   Instantaneous Decoding Refresh
JEM   Joint Exploration Model
MCTS  Motion-Constrained Tile Sets
NAL   Network Abstraction Layer
OLS   Output Layer Set
PH    Picture Header
PPS   Picture Parameter Set
PU    Picture Unit
RBSP  Raw Byte Sequence Payload
SEI   Supplemental Enhancement Information
SPS   Sequence Parameter Set
VCL   Video Coding Layer
VPS   Video Parameter Set
VTM   VVC Test Model
VUI   Video Usability Information
VVC   Versatile Video Coding
3. Preliminary discussion
Video coding standards have evolved primarily through the development of the well-known ITU-T and ISO/IEC standards. The ITU-T produced H.261 and H.263, ISO/IEC produced MPEG-1 and MPEG-4 Visual, and the two organizations jointly produced the H.262/MPEG-2 Video, H.264/MPEG-4 Advanced Video Coding (AVC), and H.265/HEVC standards. Since H.262, video coding standards have been based on a hybrid video coding structure, in which temporal prediction plus transform coding is utilized. To explore future video coding technologies beyond HEVC, VCEG and MPEG jointly founded the Joint Video Exploration Team (JVET) in 2015. Since then, many new methods have been adopted by JVET and put into reference software named the Joint Exploration Model (JEM). JVET meetings are held once a quarter, and the new coding standard targets a 50% bitrate reduction compared to HEVC. The new video coding standard was officially named Versatile Video Coding (VVC) at the April 2018 JVET meeting, and the first version of the VVC Test Model (VTM) was released at that time. With continuous effort contributing to VVC standardization, new coding techniques are adopted into the VVC standard at every JVET meeting. The VVC working draft and the VTM test model are updated after every meeting. The VVC project is now aiming for technical completion (FDIS) at the July 2020 meeting.
3.1 Profiles, tiers and levels
Video coding standards usually specify profiles and levels. Some video coding standards also specify tiers, e.g., HEVC and the VVC standard under development.
Profiles, tiers and levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode the bitstreams. Profiles, tiers and levels may also be used to indicate interoperability points between individual decoder implementations.
Each profile specifies a subset of algorithmic features and limits that shall be supported by all decoders conforming to that profile. Note that encoders are not required to make use of all coding tools or features supported in a profile, whereas decoders conforming to a profile are required to support all of them.
Each level of a tier specifies a set of limits on the values that may be taken by bitstream syntax elements. The same set of tier and level definitions is usually used with all profiles, but individual implementations may support a different tier and, within a tier, a different level for each supported profile. For any given profile, a level of a tier generally corresponds to a particular decoder processing load and memory capability.
The capabilities of video decoders conforming to a video codec specification are specified in terms of the ability to decode bitstreams conforming to the constraints of the profiles, tiers and levels specified in that specification. When expressing the capabilities of a decoder for a specified profile, the tier and level supported for that profile should also be expressed.
3.2 Existing VVC level and tier definitions
In the latest VVC draft text in JVET-P2001-v14, which is publicly available at http://phenix.int-evry.fr/JVET/doc_end_user/documents/16_Geneva/wg11/JVET-P2001-v14.zip, the levels are defined as follows.
A.1.1 General tier and level limits
For purposes of comparing tier capabilities, the tier with general_tier_flag equal to 0 is considered to be a lower tier than the tier with general_tier_flag equal to 1.
For purposes of comparing level capabilities, a particular level of a specific tier is considered to be a lower level than some other level of the same tier when the value of general_level_idc or sublayer_level_idc[ i ] of the particular level is less than that of the other level.
To express the constraints in this appendix, the following are specified:
- Let access unit n be the n-th access unit in decoding order, with the first access unit being access unit 0 (i.e., the 0-th access unit).
- Let picture n be the coded picture or the corresponding decoded picture of access unit n.
When the specified level is not level 8.5, a bitstream conforming to a profile at the specified tier and level shall obey the following constraints for each bitstream conformance test as specified in Annex C:
a) PicSizeInSamplesY shall be less than or equal to MaxLumaPs, where MaxLumaPs is specified in Table A.1.
b) The value of pic_width_in_luma_samples shall be less than or equal to Sqrt( MaxLumaPs * 8 ).
c) The value of pic_height_in_luma_samples shall be less than or equal to Sqrt( MaxLumaPs * 8 ).
d) The value of num_tile_columns_minus1 shall be less than MaxTileCols and the value of num_tile_rows_minus1 shall be less than MaxTileRows, where MaxTileCols and MaxTileRows are specified in Table A.1.
e) For the VCL HRD parameters, CpbSize[ Htid ][ i ] shall be less than or equal to CpbVclFactor * MaxCPB for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where CpbSize[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1, CpbVclFactor is specified in Table A.3, and MaxCPB is specified in Table A.1 in units of CpbVclFactor bits.
f) For the NAL HRD parameters, CpbSize[ Htid ][ i ] shall be less than or equal to CpbNalFactor * MaxCPB for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where CpbSize[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1, CpbNalFactor is specified in Table A.3, and MaxCPB is specified in Table A.1 in units of CpbNalFactor bits.
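Constraints e) and f) only require that at least one of the signaled CPB schedules fits within the level limit. The following is a minimal sketch of how a conformance checker might test this; it is illustrative only, and the factor and MaxCPB arguments stand in for entries of Tables A.3 and A.1, which are not reproduced in this text.

def cpb_size_conforms(cpb_sizes_bits, cpb_factor, max_cpb):
    """Constraints e)/f): CpbSize[Htid][i] <= cpb_factor * MaxCPB must hold
    for at least one i in the range 0..hrd_cpb_cnt_minus1, inclusive."""
    limit = cpb_factor * max_cpb  # level limit, in bits
    return any(size <= limit for size in cpb_sizes_bits)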
Table A.1 specifies the limits for each level of each tier other than level 8.5.
The tier and level to which a bitstream conforms are indicated by the syntax elements general_tier_flag and general_level_idc, and the level to which a sublayer representation conforms is indicated by the syntax element sublayer_level_idc[ i ], as follows:
- If the specified level is not level 8.5, general_tier_flag equal to 0 indicates conformance to the Main tier and general_tier_flag equal to 1 indicates conformance to the High tier, according to the tier constraints specified in Table A.1, and general_tier_flag shall be equal to 0 for levels below level 4 (corresponding to the entries in Table A.1 marked with "-"). Otherwise (the specified level is level 8.5), it is a requirement of bitstream conformance that general_tier_flag shall be equal to 1, the value 0 of general_tier_flag is reserved for future use by ITU-T | ISO/IEC, and decoders shall ignore the value of general_tier_flag.
- general_level_idc and sublayer_level_idc[ i ] shall be set equal to a value of 30 times the level number specified in Table A.1.
TABLE A.1 General tier and level limits
A.1.2 Profile-specific level limits
To express the constraints in this appendix, the following are specified:
- The variable fR is set equal to 1 ÷ 300.
- The variable HbrFactor is defined as follows:
  - If the bitstream is indicated to conform to the Main 10 profile or the Main 4:4:4 10 profile, HbrFactor is set equal to 1.
- The variable BrVclFactor, which represents the VCL bit rate scaling factor, is set equal to CpbVclFactor * HbrFactor.
- The variable BrNalFactor, which represents the NAL bit rate scaling factor, is set equal to CpbNalFactor * HbrFactor.
- The variable MinCr is set equal to MinCrBase * MinCrScaleFactor ÷ HbrFactor.
When the specified level is not level 8.5, the value of max_dec_pic_buffering_minus1[ Htid ] + 1 shall be less than or equal to MaxDpbSize, which is derived as follows:
if( PicSizeInSamplesY <= ( MaxLumaPs >> 2 ) )
    MaxDpbSize = Min( 4 * maxDpbPicBuf, 16 )
else if( PicSizeInSamplesY <= ( MaxLumaPs >> 1 ) )
    MaxDpbSize = Min( 2 * maxDpbPicBuf, 16 )                        (A.1)
else if( PicSizeInSamplesY <= ( ( 3 * MaxLumaPs ) >> 2 ) )
    MaxDpbSize = Min( ( 4 * maxDpbPicBuf ) / 3, 16 )
else
    MaxDpbSize = maxDpbPicBuf
where MaxLumaPs is specified in Table A.1 and maxDpbPicBuf is equal to 8.
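For illustration, the derivation in equation (A.1) can be written directly as a function of the luma picture size and the level-dependent MaxLumaPs. This is a minimal sketch only; the MaxLumaPs value used in the example call is an assumed stand-in for a Table A.1 entry, which is not reproduced in this text.

def max_dpb_size(pic_size_in_samples_y: int, max_luma_ps: int,
                 max_dpb_pic_buf: int = 8) -> int:
    """Derive MaxDpbSize as in equation (A.1)."""
    if pic_size_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb_pic_buf, 16)
    elif pic_size_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb_pic_buf, 16)
    elif pic_size_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb_pic_buf) // 3, 16)
    return max_dpb_pic_buf

# Example with an assumed MaxLumaPs value (hypothetical level entry).
ASSUMED_MAX_LUMA_PS = 2228224
print(max_dpb_size(1920 * 1080, ASSUMED_MAX_LUMA_PS))  # -> 8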
Bitstreams conforming to the Main 10 or Main 4:4:4 10 profile at a specified tier and level shall obey the following constraints for each bitstream conformance test as specified in Annex C:
a) The nominal removal time of access unit n (with n greater than 0) from the CPB, as specified in clause C.2.3, shall satisfy the constraint that AuNominalRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] is greater than or equal to Max( PicSizeInSamplesY ÷ MaxLumaSr, fR ), where PicSizeInSamplesY is the value for picture n − 1 and MaxLumaSr is the value specified in Table A.2 that applies to picture n − 1.
b) The difference between consecutive output times of pictures from the DPB, as specified in clause C.3.3, shall satisfy the constraint that DpbOutputInterval[ n ] is greater than or equal to Max( PicSizeInSamplesY ÷ MaxLumaSr, fR ), where PicSizeInSamplesY is the value for picture n and MaxLumaSr is the value specified in Table A.2 that applies to picture n, provided that picture n is a picture that is output and is not the last picture of the bitstream that is output.
c) The removal time of access unit 0 shall satisfy the constraint that the number of slice segments in picture 0 is less than or equal to Min( Max( 1, MaxSliceSegmentsPerPicture * MaxLumaSr / MaxLumaPs * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) + MaxSliceSegmentsPerPicture * PicSizeInSamplesY / MaxLumaPs ), MaxSliceSegmentsPerPicture ), where PicSizeInSamplesY is the value for picture 0, and MaxSliceSegmentsPerPicture, MaxLumaPs and MaxLumaSr are the values specified in Table A.1 and Table A.2, respectively, that apply to picture 0.
d) The difference between consecutive CPB removal times of access units n and n − 1 (with n greater than 0) shall satisfy the constraint that the number of slice segments in picture n is less than or equal to Min( Max( 1, MaxSliceSegmentsPerPicture * MaxLumaSr / MaxLumaPs * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ), MaxSliceSegmentsPerPicture ), where MaxSliceSegmentsPerPicture, MaxLumaPs and MaxLumaSr are the values specified in Table A.1 and Table A.2, respectively, that apply to picture n.
e) For the VCL HRD parameters, BitRate[ Htid ][ i ] shall be less than or equal to BrVclFactor * MaxBR for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where BitRate[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1 and MaxBR is specified in Table A.2 in units of BrVclFactor bits/second.
f) For the NAL HRD parameters, BitRate[ Htid ][ i ] shall be less than or equal to BrNalFactor * MaxBR for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where BitRate[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1 and MaxBR is specified in Table A.2 in units of BrNalFactor bits/second.
g) The sum of the NumBytesInNalUnit variables for access unit 0 shall be less than or equal to FormatCapabilityFactor * ( Max( PicSizeInSamplesY, fR * MaxLumaSr ) + MaxLumaSr * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) ) ÷ MinCr, where PicSizeInSamplesY is the value for picture 0, and MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, that apply to picture 0.
h) The sum of the NumBytesInNalUnit variables for access unit n (with n greater than 0) shall be less than or equal to FormatCapabilityFactor * MaxLumaSr * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ÷ MinCr, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, that apply to picture n.
i) The removal time of access unit 0 shall satisfy the constraint that the number of tiles in picture 0 is less than or equal to Min( Max( 1, MaxTileCols * MaxTileRows * 120 * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) + MaxTileCols * MaxTileRows * PicSizeInSamplesY / MaxLumaPs ), MaxTileCols * MaxTileRows ), where PicSizeInSamplesY is the value for picture 0, and MaxTileCols and MaxTileRows are the values specified in Table A.1 that apply to picture 0.
j) The difference between consecutive CPB removal times of access units n and n − 1 (with n greater than 0) shall satisfy the constraint that the number of tiles in picture n is less than or equal to Min( Max( 1, MaxTileCols * MaxTileRows * 120 * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ), MaxTileCols * MaxTileRows ), where MaxTileCols and MaxTileRows are the values specified in Table A.1 that apply to picture n.
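As a rough illustration of constraints g) and h), the per-access-unit byte budget can be computed from the CPB removal times and the level parameters. The sketch below is a simplified, non-normative rendering of those two formulas; FormatCapabilityFactor, MaxLumaSr and MinCr are passed in as parameters because Tables A.2 and A.3 are not reproduced in this text.

FR = 1 / 300  # the variable fR defined in clause A.1.2

def max_au_nal_bytes(n, pic_size_in_samples_y, max_luma_sr,
                     format_capability_factor, min_cr,
                     au_cpb_removal_time, au_nominal_removal_time):
    """Upper bound on the sum of NumBytesInNalUnit for access unit n,
    per constraints g) (n == 0) and h) (n > 0) above."""
    if n == 0:
        budget = (max(pic_size_in_samples_y, FR * max_luma_sr)
                  + max_luma_sr * (au_cpb_removal_time[0]
                                   - au_nominal_removal_time[0]))
    else:
        budget = max_luma_sr * (au_cpb_removal_time[n]
                                - au_cpb_removal_time[n - 1])
    return format_capability_factor * budget / min_cr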
TABLE A.2 Tier and level limits for the video profiles
TABLE A.3 Specifications for CpbVclFactor, CpbNalFactor, FormatCapabilityFactor and MinCrScaleFactor
4. Examples of technical problems solved by the disclosed technical solutions
The existing VVC design for the definition of levels has the following problems:
1) The equation for deriving the variable MaxDpbSize, i.e., equation (A.1) above, uses the variable PicSizeInSamplesY, which is the picture size of a particular picture. However, in VVC the picture size may change from picture to picture within a Coded Layer Video Sequence (CLVS). Therefore, the maximum picture size among the pictures within the CLVS should be used to derive MaxDpbSize. Furthermore, since there may be multiple layers, and different layers in the bitstream of an OLS may have different values of the maximum picture size among the pictures within a CLVS, the value of MaxDpbSize needs to be derived for each layer.
2) The constraint related to the DPB size specifies that the value of max_dec_pic_buffering_minus1[ ] + 1 shall be less than or equal to MaxDpbSize. This constraint only works for single-layer bitstreams. For an Output Layer Set (OLS) whose bitstream contains multiple layers, the existing constraint does not apply, because there may be multiple instances of the max_dec_pic_buffering_minus1[ ] syntax element that apply to different layers. Instead, each layer needs to be constrained using the applicable instance of the max_dec_pic_buffering_minus1[ ] syntax element.
3) The definitions of the constraints on picture size, picture width and picture height, i.e., items a, b and c in clause A.1.1 above, use the variable PicSizeInSamplesY and the syntax elements pic_width_in_luma_samples and pic_height_in_luma_samples. Again, since in VVC the picture size may change from picture to picture within a CLVS, the maximum picture size, picture width and picture height values should be used. Furthermore, because there may be multiple layers, and different layers in the bitstream of an OLS may have different maximum picture size, picture width and picture height values among the pictures within a CLVS, the constraints need to be specified separately for each layer or for each SPS referred to by a layer.
4) Furthermore, the overall DPB size for an OLS containing multiple layers needs to be constrained, such that the same DPB size limit for decoders works regardless of the number of layers. Such a constraint is currently missing.
5) The definitions of several constraints for an AU, namely those on the CPB removal time, the nominal CPB removal time, the DPB output time and the sum of the NumBytesInNalUnit variables, i.e., items a, b, c, d, g and i in clause A.1.2 above, use the variable PicSizeInSamplesY, which is the picture size of a particular picture. However, since an AU can contain multiple pictures, the sum of the picture sizes of all pictures in the AU should be used.
6) A limit on the maximum number of slices per AU should be specified and used when specifying these constraints, instead of specifying a limit on the maximum number of slices per picture for each level and using that limit in some of the AU-based constraints on CPB removal times (i.e., items c and d in clause A.1.2 above).
5. Example embodiments and techniques
To solve the above problems and others, methods as summarized below are disclosed as a list of items. These items should be considered as examples to explain the general concepts and should not be interpreted in a narrow way. Furthermore, these items can be applied individually or combined in any manner.
1) To solve the first problem, the equation for deriving the variable MaxDpbSize is updated to use the maximum picture size among the pictures within a CLVS instead of the variable PicSizeInSamplesY, and the value of MaxDpbSize is derived for each layer.
2) To solve the second problem, for each layer, a constraint requires that the value of the applicable instance of the max_dec_pic_buffering_minus1[ ] syntax element plus 1 is less than or equal to the value of MaxDpbSize for that layer, where the applicable instance of the max_dec_pic_buffering_minus1[ ] syntax element is max_dec_pic_buffering_minus1[ i ] in the dpb_parameters() syntax structure that is determined as follows: if the layer is an output layer of the OLS, the dpb_parameters() syntax structure is the one that applies to the layer when the layer is an output layer in the OLS; otherwise, it is the dpb_parameters() syntax structure that applies to the layer when the layer is not an output layer in the OLS.
3) To solve the third problem, the definitions of the constraints on picture size, picture width and picture height use the maximum picture size, picture width and picture height instead of the variable PicSizeInSamplesY and the syntax elements pic_width_in_luma_samples and pic_height_in_luma_samples. Furthermore, the constraints are specified separately for each layer or for each SPS referred to by a layer.
4) To solve the fourth problem, constraints on the overall DPB size of an OLS including multiple layers are specified.
a. In one alternative, the constraint is specified as follows:
Immediately after each AU is decoded, let numDecPics be the number of decoded pictures in the DPB and let picSizeInSamplesY[ i ] be the value of PicSizeInSamplesY for the i-th decoded picture in the DPB, with i in the range of 0 to numDecPics − 1, inclusive. Then the sum of picSizeInSamplesY[ i ] for i ranging from 0 to numDecPics − 1, inclusive, shall be less than or equal to maxDpbPicBuf * MaxLumaPs, where maxDpbPicBuf is equal to 8 and MaxLumaPs is specified in Table A.1.
b. In another alternative, the constraint is specified as follows:
Let numLayers be the number of layers in the OLS, and let maxDecBuff[ i ] and PicSizeMaxInSamplesY[ i ] be the values of max_dec_pic_buffering_minus1[ maxTid ] and PicSizeMaxInSamplesY, respectively, for the i-th layer, with i in the range of 0 to numLayers − 1, inclusive, where PicSizeMaxInSamplesY is equal to pic_width_max_in_luma_samples * pic_height_max_in_luma_samples for the layer, and max_dec_pic_buffering_minus1[ maxTid ] is the value of max_dec_pic_buffering_minus1[ i ] for the greatest value of i in the dpb_parameters() syntax structure that is determined as follows: if the layer is an output layer of the OLS, the dpb_parameters() syntax structure is the one that applies to the layer when the layer is an output layer in the OLS; otherwise, it is the one that applies to the layer when the layer is not an output layer in the OLS. Then the sum of ( maxDecBuff[ i ] + 1 ) * PicSizeMaxInSamplesY[ i ] for i ranging from 0 to numLayers − 1, inclusive, shall be less than or equal to maxDpbPicBuf * MaxLumaPs, where maxDpbPicBuf is equal to 8 and MaxLumaPs is specified in Table A.1 (see the sketch after this list).
5) To solve the fifth problem, the definitions of the several constraints for an AU on the CPB removal time, the nominal CPB removal time, the DPB output time and the sum of the NumBytesInNalUnit variables use the sum of the picture sizes of all pictures in the AU instead of the variable PicSizeInSamplesY.
6) To solve the sixth problem, a limit on the maximum number of slices per AU is specified and used when specifying these constraints, instead of specifying a limit on the maximum number of slices per picture for each level and using that limit when specifying some of the AU constraints on CPB removal times.
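The following is a minimal sketch of the overall-DPB-size check described in item 4 (the alternative based on per-layer values). It assumes the summed quantity is ( maxDecBuff[ i ] + 1 ) * PicSizeMaxInSamplesY[ i ], consistent with the definitions above, since the original formula is given only as an image; the function name and arguments are illustrative only.

def ols_dpb_size_conforms(max_dec_buff, pic_size_max_in_samples_y,
                          max_luma_ps, max_dpb_pic_buf=8):
    """Item 4, alternative b: the per-layer DPB demand, summed over all
    layers of the OLS, must not exceed maxDpbPicBuf * MaxLumaPs."""
    total = sum((buff + 1) * size
                for buff, size in zip(max_dec_buff, pic_size_max_in_samples_y))
    return total <= max_dpb_pic_buf * max_luma_ps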
6. Examples
The following are some example embodiments that may be applied to the VVC specification. The changed text is based on the latest VVC text in JVET-P2001-v14. The most relevant parts that have been added or modified are enclosed in double braces, i.e., {{ }}. Some other changes are editorial in nature and are therefore not highlighted.
6.1 first embodiment
6.1.1 Overview of profiles, tiers and levels
Profiles, tiers and levels specify restrictions on bitstreams and hence limits on the capabilities needed to decode the bitstreams. Profiles, tiers and levels may also be used to indicate interoperability points between individual decoder implementations.
Each profile specifies a subset of algorithmic features and limits that shall be supported by all decoders conforming to that profile.
Note that encoders are not required to make use of any particular subset of features supported in a profile.
Each level of a tier specifies a set of limits on the values that may be taken by the syntax elements of this Specification. The same set of tier and level definitions is used with all profiles, but individual implementations may support a different tier and, within a tier, a different level for each supported profile. For any given profile, a level of a tier generally corresponds to a particular decoder processing load and memory capability.
{{ In this clause, a phrase like "a bitstream conforming to" should be interpreted as "a bitstream of an OLS conforming to". }}
6.1.2 General tier and level limits
For purposes of comparing tier capabilities, the tier with general_tier_flag equal to 0 is considered to be a lower tier than the tier with general_tier_flag equal to 1.
For purposes of comparing level capabilities, a particular level of a specific tier is considered to be a lower level than some other level of the same tier when the value of general_level_idc or sublayer_level_idc[ i ] of the particular level is less than that of the other level.
To express the constraints in this appendix, the following are specified:
- Let AU n be the n-th AU in decoding order, with the first AU being AU 0 (i.e., the 0-th AU).
- For a particular SPS that is referred to, let PicSizeMaxInSamplesY be the value of pic_width_max_in_luma_samples * pic_height_max_in_luma_samples.
When the specified level is not level 8.5, {{ for each layer in the OLS, }} the value of MaxDpbSize is derived as follows:
if( {{ PicSizeMaxInSamplesY }} <= ( MaxLumaPs >> 2 ) )
    MaxDpbSize = Min( 4 * maxDpbPicBuf, 16 )
else if( {{ PicSizeMaxInSamplesY }} <= ( MaxLumaPs >> 1 ) )
    MaxDpbSize = Min( 2 * maxDpbPicBuf, 16 )                        (A.1)
else if( {{ PicSizeMaxInSamplesY }} <= ( ( 3 * MaxLumaPs ) >> 2 ) )
    MaxDpbSize = Min( ( 4 * maxDpbPicBuf ) / 3, 16 )
else
    MaxDpbSize = maxDpbPicBuf
where MaxLumaPs is specified in Table A.1 and maxDpbPicBuf is equal to 8.
When the specified level is not level 8.5, a bitstream conforming to a profile at the specified tier and level shall obey the following constraints for each bitstream conformance test as specified in Annex C:
a) {{ For each referenced SPS, }} PicSizeInSamplesY shall be less than or equal to MaxLumaPs, where MaxLumaPs is specified in Table A.1.
b) {{ For each referenced SPS, }} the value of pic_width{{ _max }}_in_luma_samples shall be less than or equal to Sqrt( MaxLumaPs * 8 ).
c) {{ For each referenced SPS, }} the value of pic_height{{ _max }}_in_luma_samples shall be less than or equal to Sqrt( MaxLumaPs * 8 ).
d) {{ For each referenced PPS, }} the value of NumTileColumns shall be less than MaxTileCols and the value of NumTileRows shall be less than MaxTileRows, where MaxTileCols and MaxTileRows are specified in Table A.1.
e) For the VCL HRD parameters, CpbSize[ Htid ][ i ] shall be less than or equal to CpbVclFactor * MaxCPB for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where CpbSize[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1, CpbVclFactor is specified in Table A.3, and MaxCPB is specified in Table A.1 in units of CpbVclFactor bits.
f) For the NAL HRD parameters, CpbSize[ Htid ][ i ] shall be less than or equal to CpbNalFactor * MaxCPB for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where CpbSize[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1, CpbNalFactor is specified in Table A.3, and MaxCPB is specified in Table A.1 in units of CpbNalFactor bits.
g) {{ For each layer in the OLS, the value of max_dec_pic_buffering_minus1[ maxTid ] + 1 shall be less than or equal to the MaxDpbSize derived for that layer, where max_dec_pic_buffering_minus1[ maxTid ] is the value of max_dec_pic_buffering_minus1[ i ] for the greatest value of i in the dpb_parameters() syntax structure that is determined as follows: if the layer is an output layer of the OLS, the dpb_parameters() syntax structure is the one that applies to the layer when the layer is an output layer in the OLS; otherwise, it is the dpb_parameters() syntax structure that applies to the layer when the layer is not an output layer in the OLS. }}
h) {{ Immediately after each AU is decoded, let numDecPics be the number of decoded pictures in the DPB and let picSizeInSamplesY[ i ] be the value of PicSizeInSamplesY for the i-th decoded picture in the DPB, with i in the range of 0 to numDecPics − 1, inclusive. The sum of picSizeInSamplesY[ i ] for i ranging from 0 to numDecPics − 1, inclusive, shall be less than or equal to maxDpbPicBuf * MaxLumaPs, where maxDpbPicBuf is equal to 8 and MaxLumaPs is specified in Table A.1. }}
Table A.1 specifies the limits of the levels other than level 8.5.
The tier and level to which a bitstream conforms are indicated by the syntax elements general_tier_flag and general_level_idc, and the level to which a sublayer representation conforms is indicated by the syntax element sublayer_level_idc[ i ], as follows:
- If the specified level is not level 8.5, general_tier_flag equal to 0 indicates conformance to the Main tier and general_tier_flag equal to 1 indicates conformance to the High tier, according to the tier constraints specified in Table A.1, and general_tier_flag shall be equal to 0 for levels below level 4 (corresponding to the entries in Table A.1 marked with "-"). Otherwise (the specified level is level 8.5), it is a requirement of bitstream conformance that general_tier_flag shall be equal to 1, the value 0 of general_tier_flag is reserved for future use by ITU-T | ISO/IEC, and decoders shall ignore the value of general_tier_flag.
- general_level_idc and sublayer_level_idc[ i ] shall be set equal to a value of 30 times the level number specified in Table A.1.
TABLE A.1 General tier and level limits
6.1.3 Profile-specific level limits
To express the constraints in this appendix, the following are specified:
- The variable fR is set equal to 1 ÷ 300.
- The variable HbrFactor is defined as follows:
  - If the bitstream is indicated to conform to the Main 10 profile or the Main 4:4:4 10 profile, HbrFactor is set equal to 1.
- The variable BrVclFactor, which represents the VCL bit rate scaling factor, is set equal to CpbVclFactor * HbrFactor.
- The variable BrNalFactor, which represents the NAL bit rate scaling factor, is set equal to CpbNalFactor * HbrFactor.
- The variable MinCr is set equal to MinCrBase * MinCrScaleFactor ÷ HbrFactor.
{{ The variable AuSizeInSamplesY[ n ] is set equal to the sum of picSizeInSamplesY[ i ] for i ranging from 0 to numDecPics − 1, inclusive, where numDecPics is the number of pictures in AU n and picSizeInSamplesY[ i ] is the value of PicSizeInSamplesY for the i-th picture in AU n. }}
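In other words, AuSizeInSamplesY[ n ] is simply the total luma sample count of all pictures in AU n. A trivial sketch, with hypothetical picture sizes for a two-layer AU:

def au_size_in_samples_y(pic_sizes_in_samples_y):
    """AuSizeInSamplesY[n]: sum of PicSizeInSamplesY over the pictures of AU n."""
    return sum(pic_sizes_in_samples_y)

# A hypothetical two-layer AU: one 1920x1080 picture and one 960x540 picture.
print(au_size_in_samples_y([1920 * 1080, 960 * 540]))  # -> 2592000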
Bitstreams conforming to the Main 10 or Main 4:4:4 10 profile at a specified tier and level shall obey the following constraints for each bitstream conformance test as specified in Annex C:
a) The nominal removal time of AU n (with n greater than 0) from the CPB, as specified in clause C.2.3, shall satisfy the constraint that AuNominalRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] is greater than or equal to Max( {{ PicSizeInSamplesY[ n − 1 ] }} ÷ MaxLumaSr, fR ), where MaxLumaSr is the value specified in Table A.2 that applies to AU n − 1.
b) The difference between consecutive output times of pictures of different AUs from the DPB, as specified in clause C.3.3, shall satisfy the constraint that DpbOutputInterval[ n ] is greater than or equal to Max( {{ AuSizeInSamplesY[ n ] }} ÷ MaxLumaSr, fR ), where MaxLumaSr is the value specified in Table A.2 that applies to AU n, provided that AU n is an AU that has a picture that is output and is not the last AU of the bitstream that has a picture that is output.
c) The CPB removal time of AU 0 shall satisfy the constraint that the number of slices in AU 0 is less than or equal to Min( Max( 1, {{ MaxSlicesPerAu }} * MaxLumaSr / MaxLumaPs * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) + {{ MaxSlicesPerAu }} * {{ AuSizeInSamplesY[ 0 ] }} / MaxLumaPs ), {{ MaxSlicesPerAu }} ), where {{ MaxSlicesPerAu }}, MaxLumaPs and MaxLumaSr are the values specified in Table A.1 and Table A.2, respectively, that apply to AU 0.
d) The difference between consecutive CPB removal times of AU n and AU n − 1 (with n greater than 0) shall satisfy the constraint that the number of slices in AU n is less than or equal to Min( Max( 1, {{ MaxSlicesPerAu }} * MaxLumaSr / MaxLumaPs * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ), {{ MaxSlicesPerAu }} ), where {{ MaxSlicesPerAu }}, MaxLumaPs and MaxLumaSr are the values specified in Table A.1 and Table A.2, respectively, that apply to AU n.
e) For the VCL HRD parameters, BitRate[ Htid ][ i ] shall be less than or equal to BrVclFactor * MaxBR for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where BitRate[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1 and MaxBR is specified in Table A.2 in units of BrVclFactor bits/second.
f) For the NAL HRD parameters, BitRate[ Htid ][ i ] shall be less than or equal to BrNalFactor * MaxBR for at least one value of i in the range of 0 to hrd_cpb_cnt_minus1, inclusive, where BitRate[ Htid ][ i ] is specified in clause 7.4.6.3 based on the parameters specified in clause C.1 and MaxBR is specified in Table A.2 in units of BrNalFactor bits/second.
g) The sum of the NumBytesInNalUnit variables for AU 0 shall be less than or equal to FormatCapabilityFactor * ( Max( {{ AuSizeInSamplesY[ 0 ] }}, fR * MaxLumaSr ) + MaxLumaSr * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) ) ÷ MinCr, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, that apply to AU 0.
h) The sum of the NumBytesInNalUnit variables for AU n (with n greater than 0) shall be less than or equal to FormatCapabilityFactor * MaxLumaSr * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ÷ MinCr, where MaxLumaSr and FormatCapabilityFactor are the values specified in Table A.2 and Table A.3, respectively, that apply to AU n.
i) The removal time of AU 0 shall satisfy the constraint that the number of tiles per picture in AU 0 is less than or equal to Min( Max( 1, MaxTileCols * MaxTileRows * 120 * ( AuCpbRemovalTime[ 0 ] − AuNominalRemovalTime[ 0 ] ) + MaxTileCols * MaxTileRows * {{ AuSizeInSamplesY[ 0 ] }} / MaxLumaPs ), MaxTileCols * MaxTileRows ), where MaxTileCols and MaxTileRows are the values specified in Table A.1 that apply to AU 0.
j) The difference between consecutive CPB removal times of AU n and AU n − 1 (with n greater than 0) shall satisfy the constraint that the number of tiles per picture in AU n is less than or equal to Min( Max( 1, MaxTileCols * MaxTileRows * 120 * ( AuCpbRemovalTime[ n ] − AuCpbRemovalTime[ n − 1 ] ) ), MaxTileCols * MaxTileRows ), where MaxTileCols and MaxTileRows are the values specified in Table A.1 that apply to AU n.
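For illustration, the per-AU slice limit in constraints c) and d) above can be evaluated from the CPB removal times. The sketch below is non-normative, and MaxSlicesPerAu, MaxLumaSr and MaxLumaPs are passed in as parameters because the corresponding table entries are not reproduced in this text.

def max_slices_in_au(n, max_slices_per_au, max_luma_sr, max_luma_ps,
                     au_size_in_samples_y, au_cpb_removal_time,
                     au_nominal_removal_time):
    """Upper bound on the number of slices in AU n, per constraints c) and d)."""
    if n == 0:
        bound = (max_slices_per_au * max_luma_sr / max_luma_ps
                 * (au_cpb_removal_time[0] - au_nominal_removal_time[0])
                 + max_slices_per_au * au_size_in_samples_y[0] / max_luma_ps)
    else:
        bound = (max_slices_per_au * max_luma_sr / max_luma_ps
                 * (au_cpb_removal_time[n] - au_cpb_removal_time[n - 1]))
    return min(max(1, bound), max_slices_per_au)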
Fig. 1 is a block diagram of an example video processing system 1000, which video processing system 1000 may perform various techniques in this disclosure. Various embodiments may include some or all of the components of system 1000. The system 1000 can include an input 1002 to receive video content. The video content may be received in a raw or uncompressed format (e.g., 8 or 10 bit multi-component pixel values) as well as in a compressed or encoded format. Input 1002 may represent a network interface, a peripheral bus interface, or a storage interface. Examples of network interfaces include wired interfaces such as ethernet, Passive Optical Networks (PONs), etc., and wireless interfaces such as Wi-Fi or cellular interfaces.
The system 1000 can include a codec component 1004 that can implement various codecs or encoding methods described in this document. The codec component 1004 can reduce the average bit rate of the video from the input 1002 to the output of the codec component 1004 to produce a codec representation of the video. Thus, codec techniques are sometimes referred to as video compression or video transcoding techniques. As represented by component 1006, the output of codec component 1004 can be stored or transmitted via a connected communication. A stored or transmitted bitstream representation (or codec representation) of the video received at input 1002 can be used by component 1008 to generate pixel values or displayable video that is sent to display interface 1010. The process of generating a user viewable video from a bitstream representation is sometimes referred to as video decompression. Furthermore, although certain video processing operations are referred to as "codec" operations or tools, it should be understood that the encoding tool or encoding operation is used at the encoder and the corresponding decoding tool or decoding operation that operates inversely on the encoded results is to be performed by the decoder.
Examples of a peripheral bus interface or display interface may include Universal Serial Bus (USB) or High Definition Multimedia Interface (HDMI) or Displayport, among others. Examples of storage interfaces include SATA (serial advanced technology attachment), PCI, IDE interfaces, and so forth. The techniques described in this document may be implemented in various electronic devices, such as mobile phones, laptop computers, smart phones, or other devices capable of performing digital data processing and/or video display.
Fig. 2 is a block diagram of the video processing apparatus 2000. Apparatus 2000 may be used to implement one or more of the methods described herein. The apparatus 2000 may be embodied in a smartphone, tablet, computer, internet of things (IoT) receiver, or the like. The apparatus 2000 may include one or more processors 2002, one or more memories 2004, and video processing circuitry 2006. The processor 2002 may be configured to implement one or more methods described in this document (e.g., fig. 6-7). The memory(s) 2004 may be used to store data and code for implementing the methods and techniques described herein. Video processing hardware 2006 may be used to implement some of the techniques described in this document in hardware circuits.
Fig. 3 illustrates a block diagram of an exemplary video codec system 100 that may utilize techniques of this disclosure. As shown in fig. 3, the video codec system 100 may include a source device 110 and a destination device 120. Source device 110 generates encoded video data, which may be referred to as a video encoding device. Destination device 120 may decode encoded video data generated by source device 110, which may be referred to as a video decoding device. The source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.
The video source 112 may include sources such as a video capture device, an interface that receives video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of such sources. The video data may include one or more pictures. The video encoder 114 encodes video data from the video source 112 to generate a bitstream. The bitstream may comprise a sequence of bits that form an encoded representation of the video data. The bitstream may include coded pictures and related data. A coded picture is a coded representation of a picture. The related data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator (modem) and/or a transmitter. The encoded video data may be transmitted directly to the destination device 120 over the network 130a via the I/O interface 116. The encoded video data may also be stored on storage media/server 130b for access by destination device 120.
Destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122.
I/O interface 126 may include a receiver and/or a modem. I/O interface 126 may retrieve encoded video data from source device 110 or storage medium/server 130 b. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. The display device 122 may be integrated with the destination device 120 or may be external to the destination device 120, the destination device 120 being configured to interface with an external display device.
The video encoder 114 and the video decoder 124 may operate in accordance with a video compression standard, such as the High Efficiency Video Coding (HEVC) standard, the Versatile Video Coding (VVC) standard, and other current and/or future standards.
Fig. 4 is a block diagram illustrating an example of a video encoder 200, which video encoder 200 may be the video encoder 114 in the system 100 shown in fig. 3.
Video encoder 200 may be configured to perform any or all of the techniques of this disclosure. In the example of fig. 4, the video encoder 200 includes a number of functional components. The techniques described in this disclosure may be shared among various components of video encoder 200. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.
The functional components of the video encoder 200 may include a partition unit 201, a prediction unit 202 (which may include a mode selection unit 203, a motion estimation unit 204, a motion compensation unit 205, and an intra prediction unit 206), a residual generation unit 207, a transform unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy encoding unit 214.
In other examples, video encoder 200 may include more, fewer, or different functional components. In one example, the prediction unit 202 may include an Intra Block Copy (IBC) unit. The IBC unit may perform prediction in IBC mode, where the at least one reference picture is a picture in which the current video block is located.
Furthermore, some components (e.g., motion estimation unit 204 and motion compensation unit 205) may be highly integrated, but are represented separately in the example of fig. 4 for explanatory purposes.
Partition unit 201 may partition a picture into one or more video blocks. The video encoder 200 and the video decoder 300 may support various video block sizes.
Mode selection unit 203 may, for example, select one of a plurality of coding modes (intra or inter) based on the error results and provide the resulting intra or inter coded block to residual generation unit 207 to generate residual block data and to reconstruction unit 212 to reconstruct the coded block for use as a reference picture. In some examples, mode selection unit 203 may select a Combination of Intra and Inter Prediction (CIIP) modes, where the prediction is based on inter prediction signaling and intra prediction signaling. In the case of inter prediction, mode selection unit 203 may also select the precision of the motion vector for the block (e.g., sub-pixel or integer-pixel precision).
To perform inter prediction on the current video block, motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from buffer 213 to the current video block. Motion compensation unit 205 may determine a predictive video block for the current video block based on motion information and decoded samples for pictures from buffer 213 other than the picture associated with the current video block.
Motion estimation unit 204 and motion compensation unit 205 may perform different operations on the current video block, e.g., depending on whether the current video block is in an I-slice, a P-slice, or a B-slice.
In some examples, motion estimation unit 204 may perform uni-directional prediction on the current video block, and motion estimation unit 204 may search for a reference video block for the current video block in a list 0 or list 1 reference picture. Motion estimation unit 204 may then generate a reference index that indicates the reference picture in list 0 or list 1 that contains the reference video block, and a motion vector that indicates a spatial displacement between the current video block and the reference video block. Motion estimation unit 204 may output the reference index, the prediction direction indicator, and the motion vector as motion information for the current video block. Motion compensation unit 205 may generate a prediction video block for the current block based on the reference video block indicated by the motion information for the current video block.
In other examples, motion estimation unit 204 may perform bi-prediction on the current video block; motion estimation unit 204 may search for a reference video block of the current video block in a reference picture in list 0, and may also search for another reference video block of the current video block in a reference picture in list 1. Motion estimation unit 204 may then generate reference indices indicating the reference pictures in list 0 and list 1 that include the reference video blocks, and motion vectors indicating spatial displacements between the reference video blocks and the current video block. Motion estimation unit 204 may output the reference indices and the motion vectors of the current video block as motion information for the current video block. Motion compensation unit 205 may generate a prediction video block for the current video block based on the reference video blocks indicated by the motion information for the current video block.
In some examples, motion estimation unit 204 may output complete motion information for the decoding process of the decoder.
In some examples, motion estimation unit 204 may not output the full set of motion information for the current video block. Instead, motion estimation unit 204 may signal the motion information of the current video block with reference to the motion information of another video block. For example, motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of a neighboring video block.
In one example, motion estimation unit 204 may indicate a value in a syntax structure associated with the current video block that indicates to video decoder 300 that the current video block has the same motion information as another video block.
In another example, motion estimation unit 204 may identify another video block and a Motion Vector Difference (MVD) in a syntax structure associated with the current video block. The motion vector difference indicates a difference between a motion vector of the current video block and a motion vector of the indicated video block. The video decoder 300 may use the indicated motion vector and motion vector difference for the video block to determine the motion vector for the current video block.
As described above, the video encoder 200 may predictively signal the motion vector. Two examples of prediction signaling techniques that may be implemented by video encoder 200 include Advanced Motion Vector Prediction (AMVP) and merge mode signaling.
Intra-prediction unit 206 may perform intra-prediction on the current video block. When intra-prediction unit 206 performs intra-prediction on the current video block, intra-prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture. The prediction data for the current video block may include the predicted video block and various syntax elements.
Residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., indicated by the minus sign) the prediction video block of the current video block from the current video block. The residual data for the current video block may comprise residual video blocks corresponding to different sample components of the samples in the current video block.
In other examples, the current video block may not have residual data for the current video block, e.g., in skip mode, and residual generation unit 207 may not perform the subtraction operation.
Transform processing unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to a residual video block associated with the current video block.
After transform processing unit 208 generates a transform coefficient video block associated with the current video block, quantization unit 209 may quantize the transform coefficient video block associated with the current video block based on one or more Quantization Parameter (QP) values associated with the current video block.
Inverse quantization unit 210 and inverse transform unit 211 may apply inverse quantization and inverse transform, respectively, to the transform coefficient video blocks to reconstruct residual video blocks from the transform coefficient video blocks. Reconstruction unit 212 may add the reconstructed residual video block to corresponding samples from one or more prediction video blocks generated by prediction unit 202 to generate a reconstructed video block associated with the current block for storage in buffer 213.
After reconstruction unit 212 reconstructs the video blocks, loop filtering operations may be performed to reduce video blocking artifacts in the video blocks.
Entropy encoding unit 214 may receive data from other functional components of video encoder 200. When entropy encoding unit 214 receives the data, entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy encoded data and output a bitstream that includes the entropy encoded data.
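For illustration only, the block-level data flow just described can be sketched in Python as below. The function and variable names (encode_block, transform, quantize, and so on) are hypothetical placeholders rather than any standardized API, and a real encoder works on prediction units, transform blocks, and context-coded syntax rather than plain sample lists.

# Hypothetical sketch of the encoder data flow of Fig. 4; not a normative implementation.
def encode_block(current_block, prediction_block, transform, quantize,
                 dequantize, inverse_transform, entropy_encode):
    residual = [c - p for c, p in zip(current_block, prediction_block)]  # residual generation unit 207
    coefficients = transform(residual)                                   # transform processing unit 208
    levels = quantize(coefficients)                                      # quantization unit 209
    bits = entropy_encode(levels)                                        # entropy encoding unit 214
    # Local decoding loop (units 210-212): rebuild the samples that go into buffer 213.
    reconstructed_residual = inverse_transform(dequantize(levels))
    reconstructed_block = [p + r for p, r in zip(prediction_block, reconstructed_residual)]
    return bits, reconstructed_block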
Fig. 5 is a block diagram illustrating an example of a video decoder 300, which video decoder 300 may be the video decoder 124 in the system 100 shown in fig. 3.
Video decoder 300 may be configured to perform any or all of the techniques of this disclosure. In the example of fig. 5, the video decoder 300 includes a number of functional components. The techniques described in this disclosure may be shared among various components of the video decoder 300. In some examples, a processor may be configured to perform any or all of the techniques described in this disclosure.
In the example of fig. 5, the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, a reconstruction unit 306, and a buffer 307. In some examples, video decoder 300 may perform a decoding process that is generally reciprocal to the encoding process described for video encoder 200 (fig. 4).
The entropy decoding unit 301 may retrieve the encoded bitstream. The encoded bitstream may include entropy-coded video data (e.g., encoded blocks of video data). Entropy decoding unit 301 may decode entropy-coded video data, and from the entropy-decoded video data, motion compensation unit 302 may determine motion information, which includes motion vectors, motion vector precision, reference picture list indices, and other motion information. Motion compensation unit 302 may determine this information, for example, by performing AMVP and merge modes.
Motion compensation unit 302 may generate motion-compensated blocks, possibly performing interpolation based on interpolation filters. An identifier for an interpolation filter to be used with sub-pixel precision may be included in the syntax elements.
Motion compensation unit 302 may use interpolation filters as used by video encoder 200 during encoding of video blocks to calculate interpolated values for sub-integer pixels of a reference block. Motion compensation unit 302 may determine interpolation filters used by video encoder 200 according to the received syntax information and use the interpolation filters to generate prediction blocks.
Motion compensation unit 302 may use some syntax information to determine the size of blocks used to encode frames and/or slices of an encoded video sequence, partitioning information that describes how each macroblock of a picture of the encoded video sequence is partitioned, a mode that indicates how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-coded block, and other information to decode the encoded video sequence.
The intra prediction unit 303 may form a prediction block from spatially neighboring blocks using, for example, an intra prediction mode received in the bitstream. The inverse quantization unit 304 inversely quantizes (i.e., dequantizes) the quantized video block coefficients provided in the bitstream and decoded by the entropy decoding unit 301. The inverse transform unit 305 applies an inverse transform.
Reconstruction unit 306 may add the residual block to the corresponding prediction block produced by motion compensation unit 302 or intra prediction unit 303 to form a decoded block. A deblocking filter may also be applied to filter the decoded blocks, if desired, to remove blocking artifacts. The decoded video blocks are then stored in a buffer 307, the buffer 307 providing reference blocks for subsequent motion compensation/intra prediction, and also generating decoded video for presentation on a display device.
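Analogously, the decoder-side reconstruction described above can be sketched as follows; again the names are illustrative placeholders, and the sketch omits in-loop filtering and motion compensation details.

# Hypothetical sketch of the decoder data flow of Fig. 5; not a normative implementation.
def decode_block(bits, prediction_block, entropy_decode, dequantize, inverse_transform):
    levels = entropy_decode(bits)                       # entropy decoding unit 301
    residual = inverse_transform(dequantize(levels))    # inverse quantization unit 304 and inverse transform unit 305
    decoded_block = [p + r for p, r in zip(prediction_block, residual)]  # reconstruction unit 306
    return decoded_block                                # stored in buffer 307 for later prediction and output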
Fig. 6-7 illustrate example methods in which the above-described aspects (e.g., the embodiments shown in fig. 1-5) may be implemented.
Fig. 6 shows a flow diagram of an example method 600 of video processing. The method 600 includes, at operation 610, performing a conversion between a video and a bitstream of the video, the bitstream organized into one or more access units according to a rule that specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, or a sum of a number of bytes in a Network Abstraction Layer (NAL) unit of an access unit are based on a sum of picture sizes of each of a plurality of pictures of the access unit.
Fig. 7 shows a flow diagram of an example method 700 of video processing. The method 700 includes, at operation 710, performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule that specifies a limit on a maximum number of slices in an access unit.
A list of preferred solutions for some embodiments is provided next.
A1. A video processing method, comprising: performing a conversion between a video and a bitstream of the video according to a rule, wherein the bitstream comprises one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rule specifies that a maximum decoded picture buffer size required to decode the bitstream or the one or more bitstream portions is determined based on a maximum allowed picture size of the one or more coded video pictures corresponding to the bitstream or the one or more bitstream portions.
A2. The method of scheme 1, further comprising: avoiding determining the maximum decoded picture buffer size based on a picture size in luma samples, denoted as PicSizeInSamplesY.
A3. The method of scheme 1 or 2, wherein the maximum decoded picture buffer size is denoted as MaxDpbSize, the maximum luma picture size is denoted as MaxLumaPs, the maximum allowed picture size is denoted as PicSizeMaxInSamplesY, the default maximum decoded picture buffer size is denoted as maxDpbPicBuf, and wherein MaxDpbSize is determined as follows:
if( PicSizeMaxInSamplesY <= ( MaxLumaPs >> 2 ) )
    MaxDpbSize = Min( 4 * maxDpbPicBuf, 16 )
else if( PicSizeMaxInSamplesY <= ( MaxLumaPs >> 1 ) )
    MaxDpbSize = Min( 2 * maxDpbPicBuf, 16 )
else if( PicSizeMaxInSamplesY <= ( ( 3 * MaxLumaPs ) >> 2 ) )
    MaxDpbSize = Min( ( 4 * maxDpbPicBuf ) / 3, 16 )
else
    MaxDpbSize = maxDpbPicBuf.
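For illustration, a minimal Python sketch of the MaxDpbSize derivation in scheme A3 is given below; the default value maxDpbPicBuf = 8 is an assumption made for the example and is not fixed by the scheme itself.

def max_dpb_size(pic_size_max_in_samples_y, max_luma_ps, max_dpb_pic_buf=8):
    # Sketch of the derivation in scheme A3; max_dpb_pic_buf = 8 is an assumed default.
    if pic_size_max_in_samples_y <= (max_luma_ps >> 2):
        return min(4 * max_dpb_pic_buf, 16)
    elif pic_size_max_in_samples_y <= (max_luma_ps >> 1):
        return min(2 * max_dpb_pic_buf, 16)
    elif pic_size_max_in_samples_y <= ((3 * max_luma_ps) >> 2):
        return min((4 * max_dpb_pic_buf) // 3, 16)
    else:
        return max_dpb_pic_buf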
A4. The method of any of schemes 1-3, wherein each of the one or more independently decodable bitstream portions is a Coding Layer Video Sequence (CLVS).
A5. The method of any of schemes 1-4, wherein the maximum decoded picture buffer size is derived on a per CLVS basis.
A6. A video processing method, comprising: performing a conversion between a video and a bitstream of the video according to rules, wherein the bitstream comprises one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rules specify at least one of a maximum allowed picture size, a maximum allowed picture width, a maximum allowed picture height for the conversion.
A7. The method of scheme 6, wherein each of the one or more independently decodable portions of the bitstream is a Coding Layer Video Sequence (CLVS).
A8. The method of scheme 7, wherein one or more of the maximum allowed picture size, the maximum allowed picture width, and the maximum allowed picture height are specified on a per CLVS basis.
A9. The method of scheme 7, wherein the maximum allowed picture size, the maximum allowed picture width, and the maximum allowed picture height are specified for each Sequence Parameter Set (SPS) associated with each of the at least one CLVS.
A10. The method of any of schemes 6-9, wherein the rule specifies that the maximum allowed picture size (denoted as PicSizeMaxInSamplesY) for each reference Sequence Parameter Set (SPS) is less than or equal to a maximum luma picture size (denoted as MaxLumaPs).
A11. The method of any of schemes 6-10, wherein the rule specifies that the maximum allowed picture width (denoted as pic_width_max_in_luma_samples) for each reference Sequence Parameter Set (SPS) is less than or equal to Sqrt( 8 * MaxLumaPs ), where MaxLumaPs is a maximum luma picture size.
A12. The method of any of schemes 6-11, wherein the rule specifies that the maximum allowed picture height (denoted as pic_height_max_in_luma_samples) for each reference Sequence Parameter Set (SPS) is less than or equal to Sqrt( 8 * MaxLumaPs ), where MaxLumaPs is a maximum luma picture size.
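A hedged sketch of the per-SPS limits in schemes A10-A12 could be written as a single conformance check; the function name and argument names are illustrative only.

import math

def sps_picture_limits_ok(pic_size_max_in_samples_y,
                          pic_width_max_in_luma_samples,
                          pic_height_max_in_luma_samples,
                          max_luma_ps):
    # Sketch of the limits in schemes A10-A12 for each referenced SPS.
    return (pic_size_max_in_samples_y <= max_luma_ps and
            pic_width_max_in_luma_samples <= math.sqrt(8 * max_luma_ps) and
            pic_height_max_in_luma_samples <= math.sqrt(8 * max_luma_ps))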
A13. A video processing method, comprising:
performing a conversion between video and a bitstream of the video comprising one or more coding layer video sequences, wherein the bitstream conforms to a rule, and wherein the rule specifies that a maximum buffer size of decoded pictures of the coding layer video sequences is constrained to be less than or equal to a maximum picture size selected from pictures in the coding layer video sequences.
A14. The method of scheme 13, wherein the maximum buffer size for the decoded picture is selected based on a maximum buffer size from a Decoded Picture Buffer (DPB) parameter set.
A15. The method of scheme 14, wherein the DPB parameter set corresponds to a DPB parameter set of a codec layer that is an output layer of an Output Layer Set (OLS).
A16. The method of scheme 14, wherein the DPB parameter set corresponds to a DPB parameter set of a codec layer that is not an output layer of an Output Layer Set (OLS).
A17. A video processing method, comprising: performing a conversion between a video and a bitstream of the video, wherein the bitstream comprises one or more output layer sets, wherein at least one output layer set comprises a plurality of video layers, and wherein the bitstream conforms to a rule that specifies a constraint on an overall size of a decoded picture buffer of the at least one output layer set comprising the plurality of video layers.
A18. The method of scheme 17, wherein the decoded picture buffer comprises a plurality of decoded pictures after decoding each access unit, wherein each of the plurality of decoded pictures has a width in luma samples, and wherein the constraint specifies that a sum of the widths of the plurality of decoded pictures is less than or equal to a predetermined value.
A19. The method of scheme 18, wherein the predetermined value is based on an index of a respective codec layer of the plurality of codec layers.
A20. The method of scheme 17, wherein each of the plurality of coding layers is associated with each of a plurality of decoded pictures, wherein each of the plurality of decoded pictures has a maximum width, and wherein the constraint specifies that a sum of the maximum widths of the plurality of decoded pictures is less than or equal to a predetermined value.
A21. The method of scheme 20, wherein the maximum width is a product of a maximum picture width in luma samples and a maximum picture height in luma samples of a corresponding layer of the plurality of codec layers.
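Schemes A18-A21 only state that a sum of per-layer quantities in the decoded picture buffer is bounded by a predetermined value; a trivial sketch of such a cross-layer check, with placeholder names, might look like the following.

def dpb_cross_layer_sum_ok(per_layer_values, predetermined_value):
    # Sketch of schemes A18-A21: the sum of per-layer picture widths (or maximum
    # picture sizes) held in the DPB after decoding an access unit must not exceed
    # a predetermined bound; both inputs are illustrative placeholders.
    return sum(per_layer_values) <= predetermined_value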
A22. The method of any of schemes 1-21, wherein the converting comprises decoding the video from the bitstream.
A23. The method of any of schemes 1-21, wherein the converting comprises encoding the video into the bitstream.
A24. The method of any of schemes 1-21, wherein performing the conversion comprises:
encoding the video into the bitstream; and storing the bitstream in a non-transitory computer readable storage medium.
A25. A video processing apparatus comprising a processor configured to perform the method of any one or more of schemes 1-24.
A26. A non-transitory computer readable storage medium configured to store a bitstream of video generated by the method of any one or more of schemes A1-A24.
A27. A non-transitory computer readable storage medium configured to store instructions that cause a processor to implement the method of any one or more of schemes A1-A24.
A28. A video processing apparatus for storing a bitstream, wherein the video processing apparatus is configured to implement the method of any one or more of schemes A1-A24.
A29. A method of storing a bitstream of video, comprising: generating a bitstream from the video according to the rules; and storing the bitstream in a non-transitory computer readable storage medium, wherein the bitstream comprises one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rules specify that a maximum decoded picture buffer size required to decode the bitstream or the one or more bitstream portions is determined based on a maximum allowed picture size of the one or more coded video pictures corresponding to the bitstream or the one or more bitstream portions.
A30. A method of storing a bitstream of video, comprising: generating a bitstream from the video according to the rules; and storing the bitstream in a non-transitory computer readable storage medium, wherein the bitstream includes one or more bitstream portions that are independently decodable, each bitstream portion corresponding to one or more coded video pictures of the video, and wherein the rules specify at least one of a maximum allowed picture size, a maximum allowed picture width, a maximum allowed picture height for the conversion.
Another list of preferred solutions for some embodiments is provided next.
B1. A video processing method comprising performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule, and wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, or a sum of a number of bytes in a Network Abstraction Layer (NAL) unit of an access unit are based on a sum of picture sizes of each of a plurality of pictures of the access unit.
B2. The method according to scheme B1, wherein the sum of the number of bytes in a NAL unit is denoted as NumBytesInNalUnit.
B3. The method according to scheme B1 or B2, wherein the rule is independent of the current size of the decoded picture in luma samples (denoted as PicSizeInSamplesY).
B4. The method according to scheme B1, wherein the rule specifies that a nominal CPB removal time of an Access Unit (AU) satisfies a constraint:
AuNominalRemovalTime[n]-AuCpbRemovalTime[n-1]
≥Max(AuSizeInSamplesY[n-1]÷MaxLumaSr,fR),
where AuNominalRemovalTime[n] is the nominal CPB removal time of the nth AU, AuCpbRemovalTime[n-1] is the CPB removal time of the (n-1)th AU, AuSizeInSamplesY[n-1] is the size of the (n-1)th AU in luma samples, MaxLumaSr is the maximum luma sample rate (in samples per second), and fR is a variable equal to 1 ÷ 300, where n is an integer greater than 0.
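A minimal Python sketch of the check in scheme B4 might read as below; fR defaults to 1/300 and the argument names simply mirror the variables defined above.

def nominal_removal_time_ok(au_nominal_removal_time_n, au_cpb_removal_time_prev,
                            au_size_in_samples_y_prev, max_luma_sr, f_r=1.0 / 300.0):
    # Sketch of the constraint in scheme B4 for n > 0.
    return (au_nominal_removal_time_n - au_cpb_removal_time_prev >=
            max(au_size_in_samples_y_prev / max_luma_sr, f_r))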
B5. The method according to scheme B1, wherein the rule states that a difference between DPB output times of pictures from different Access Units (AU) of the DPB satisfies a constraint:
DpbOutputInterval[n]≥Max(AuSizeInSamplesY[n-1]÷MaxLumaSr,fR),
where DpbOutputInterval[n] is the DPB output interval of the nth AU, MaxLumaSr is the maximum luma sample rate (in samples per second), AuSizeInSamplesY[n-1] is the size of the (n-1)th AU in luma samples, and fR is a variable equal to 1 ÷ 300, where n is an integer greater than zero.
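Scheme B5 can be sketched the same way; DpbOutputInterval[n] is assumed to already hold the difference between the relevant DPB output times.

def dpb_output_interval_ok(dpb_output_interval_n, au_size_in_samples_y_prev,
                           max_luma_sr, f_r=1.0 / 300.0):
    # Sketch of the constraint in scheme B5 for n > 0.
    return dpb_output_interval_n >= max(au_size_in_samples_y_prev / max_luma_sr, f_r)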
B6. The method according to scheme B1, wherein the rule specifies that the sum of the number of bytes in the NAL unit satisfies the constraint:
NumBytesInNalUnit[0]≤FormatCapabilityFactor
×(Max(AuSizeInSamplesY[0]÷MaxLumaSr,fR×MaxLumaSr)
+MaxLumaSr×(AuCpbRemovalTime[0]-AuNominalRemovalTime[0]))÷MinCr,
where NumBytesInNalUnit[0] is the sum of the number of bytes in the NAL unit of the first Access Unit (AU), AuSizeInSamplesY[0] is the size of the first AU in luma samples, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[0] is the CPB removal time of the first AU, AuNominalRemovalTime[0] is the nominal CPB removal time of the first AU, MinCr is the minimum compression ratio, and fR is a variable equal to 1 ÷ 300.
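The byte-count limit of scheme B6 might be checked as in the sketch below; FormatCapabilityFactor and MinCr are profile/level dependent constants whose values are left open here.

def nal_byte_count_ok(num_bytes_in_nal_unit_0, au_size_in_samples_y_0,
                      au_cpb_removal_time_0, au_nominal_removal_time_0,
                      max_luma_sr, format_capability_factor, min_cr, f_r=1.0 / 300.0):
    # Sketch of the constraint in scheme B6 for the first access unit.
    limit = (format_capability_factor *
             (max(au_size_in_samples_y_0 / max_luma_sr, f_r * max_luma_sr) +
              max_luma_sr * (au_cpb_removal_time_0 - au_nominal_removal_time_0)) / min_cr)
    return num_bytes_in_nal_unit_0 <= limit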
B7. A video processing method comprising performing a conversion between a video and a bitstream of the video, wherein the bitstream is organized into one or more access units according to a rule, and wherein the rule specifies a limit on a maximum number of slices in an access unit.
B8. The method according to scheme B7, wherein the rule further specifies that the maximum number of slices in the access unit (denoted as MaxSlicesPerAu) is based on the level of the access unit.
B9. The method according to scheme B8, wherein the following table is used to determine the maximum number of slices in an access unit:
Level          | 1  | 2  | 2.1 | 3  | 3.1 | 4  | 4.1 | 5   | 5.1 | 5.2 | 6   | 6.1 | 6.2
MaxSlicesPerAu | 16 | 16 | 20  | 30 | 40  | 75 | 75  | 200 | 200 | 200 | 600 | 600 | 600
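The level-to-MaxSlicesPerAu mapping of scheme B9 can be represented as a simple lookup; this sketch merely mirrors the table above and uses the level value as the dictionary key.

MAX_SLICES_PER_AU = {
    # Level: MaxSlicesPerAu, mirroring the table in scheme B9.
    1: 16, 2: 16, 2.1: 20, 3: 30, 3.1: 40,
    4: 75, 4.1: 75, 5: 200, 5.1: 200, 5.2: 200,
    6: 600, 6.1: 600, 6.2: 600,
}

def max_slices_per_au(level):
    return MAX_SLICES_PER_AU[level]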
B10. The method according to scheme B7, wherein the rule further specifies that the constraint on the Coded Picture Buffer (CPB) removal time (denoted as AuCpbRemovalTime) for each access unit is based on the limit on the maximum number of slices in the access unit.
B11. The method according to scheme B10, wherein the rule specifies that the CPB removal time (denoted as AuCpbRemovalTime [0]) for the first access unit satisfies the constraint:
NumSlicesPerAu[0]≤Min(Max(1,MaxSlicesPerAu×MaxLumaSr/MaxLumaPs×(AuCpbRemovalTime[0]-AuNominalRemovalTime[0])+MaxSlicesPerAu×AuSizeInSamplesY[0]/MaxLumaPs),MaxSlicesPerAu),
where MaxSlicesPerAu is the maximum number of slices in an access unit, NumSlicesPerAu[0] is the number of slices in the first Access Unit (AU), MaxLumaPs is the maximum luma picture size, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[0] is the CPB removal time of the first AU, and AuNominalRemovalTime[0] is the nominal CPB removal time of the first AU.
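A sketch of the first-access-unit slice-count check of scheme B11, reusing the variable names defined above, could be:

def slice_count_ok_first_au(num_slices_per_au_0, au_size_in_samples_y_0,
                            au_cpb_removal_time_0, au_nominal_removal_time_0,
                            max_slices_per_au, max_luma_sr, max_luma_ps):
    # Sketch of the constraint in scheme B11 for the first access unit.
    limit = min(max(1, max_slices_per_au * max_luma_sr / max_luma_ps *
                    (au_cpb_removal_time_0 - au_nominal_removal_time_0) +
                    max_slices_per_au * au_size_in_samples_y_0 / max_luma_ps),
                max_slices_per_au)
    return num_slices_per_au_0 <= limit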
B12. The method according to scheme B10, wherein the rule specifies that the difference between consecutive CPB removal times satisfies the constraint:
NumSlicesPerAu[n]≤Min(Max(1,MaxSlicesPerAu×MaxLumaSr/MaxLumaPs×(AuCpbRemovalTime[n]-AuCpbRemovalTime[n-1])),MaxSlicesPerAu)
where MaxSlicesPerAu is the maximum number of slices in an access unit, NumSlicesPerAu[n] is the number of slices in the nth Access Unit (AU), MaxLumaPs is the maximum luma picture size, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[n] is the CPB removal time of the nth AU, and AuCpbRemovalTime[n-1] is the CPB removal time of the (n-1)th AU, where n is an integer greater than zero.
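Likewise, the constraint of scheme B12 on later access units could be sketched as below; n is assumed to be greater than zero.

def slice_count_ok(num_slices_per_au_n, au_cpb_removal_time_n, au_cpb_removal_time_prev,
                   max_slices_per_au, max_luma_sr, max_luma_ps):
    # Sketch of the constraint in scheme B12 for n > 0.
    limit = min(max(1, max_slices_per_au * max_luma_sr / max_luma_ps *
                    (au_cpb_removal_time_n - au_cpb_removal_time_prev)),
                max_slices_per_au)
    return num_slices_per_au_n <= limit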
B13. The method according to scheme B7, wherein the rule is independent of a maximum number of slices per picture.
B14. The method according to any of schemes B1-B13, wherein the converting comprises decoding the video from a bitstream.
B15. The method of any of schemes B1-B13, wherein converting comprises encoding the video into a bitstream.
B16. The method of any of schemes B1-B13, wherein performing the conversion comprises encoding the video into a bitstream; and storing the bitstream in a non-transitory computer readable storage medium.
B17. A video processing apparatus comprising a processor configured to implement the method of any one or more of schemes B1 through B16.
B18. A non-transitory computer readable storage medium configured to store a bitstream of video generated by the method of any one or more of schemes B1-B16.
B19. A non-transitory computer readable storage medium configured to store instructions that cause a processor to implement the method of any one or more of schemes B1-B16.
B20. A video processing apparatus for storing a bitstream, wherein the video processing apparatus is configured to implement the method of any one or more of schemes B1-B16.
B21. A method of storing a bitstream of video, comprising: generating a bitstream from the video according to a rule; and storing the bitstream in a non-transitory computer readable storage medium, wherein the bitstream is organized into one or more access units according to the rule, and wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, or a sum of a number of bytes in a Network Abstraction Layer (NAL) unit of an access unit are based on a sum of picture sizes of each of a plurality of pictures of the access unit.
B22. A method for storing a bitstream of video, comprising: generating a bitstream from the video according to a rule; and storing the bitstream in a non-transitory computer readable storage medium, wherein the bitstream is organized into one or more access units according to the rule, and wherein the rule specifies a limit on a maximum number of slices in an access unit.
A further list of preferred solutions for some embodiments is provided next.
P1. a video processing method comprising performing a conversion between a video unit of video and a codec representation of the video, wherein a maximum picture buffer size used during the conversion is determined from a maximum picture size in a picture in a codec layer of the video unit, wherein the maximum picture buffer size is specific to the codec layer.
P2. the method according to scheme P1, wherein the determination of the maximum picture buffer size is independent of variables defining the picture size in relation to the coding layer.
P3. a video processing method comprising performing a conversion between a video unit of a video and a codec representation of the video, wherein the codec representation complies with a format rule that specifies that a constraint related to a maximum buffer size of a decoded picture applies only to a single layer codec representation.
P4. the method according to scheme P3, wherein the format rules further specify that, in case the codec representation comprises multiple layers of video, different values of the maximum decoder buffer size are applicable for different layers.
P5. A video processing method comprising performing a conversion between a video unit of video and a codec representation of the video, wherein the codec representation complies with a format rule that specifies that, in the case that the codec representation comprises multiple layers, values for a maximum picture size, a picture width and a picture height of a picture are specified separately within each codec layer of the codec representation of the video.
P6. method according to scheme P5, wherein the values are specified at the level of the sequence parameter set.
P7. the method according to any of schemes P1 to P6, wherein performing the conversion comprises encoding the video to generate the codec representation.
P8. the method according to any of schemes P1 to P6, wherein performing the conversion comprises parsing and decoding the codec representation to generate the video.
P9. a video decoding apparatus comprising a processor configured to implement the method of one or more of schemes P1 to P8.
P10. a video coding device comprising a processor configured to implement the method of one or more of schemes P1 to P8.
P11. a computer program product having stored thereon computer code which, when executed by a processor, causes the processor to implement the method of any one of schemes P1 to P8.
In this document, the term "video processing" may refer to video encoding, video decoding, video compression, or video decompression. For example, a video compression algorithm may be applied during conversion from a pixel representation of a video to a corresponding bitstream representation, and vice versa. The bitstream representation (or simply, the bitstream) of the current video block may, for example, correspond to bits that are co-located or interspersed at different locations within the bitstream, as defined by the syntax. For example, a macroblock may be encoded according to transform and encoded error residual values, and bits in the header and other fields in the bitstream may also be used.
Implementations of the subject matter and the functional operations described in this patent document can be implemented as various systems, digital electronic circuitry, or computer software, firmware, or hardware, including the structures disclosed in this specification and their equivalents, or in combinations of one or more of them. Implementations of the subject matter described in this specification can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a tangible and non-transitory computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory apparatus, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term "data processing apparatus" encompasses all devices, apparatus, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such a device. Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, such as internal hard disks or removable disks; magneto-optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
Although this patent document contains many specifics, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this patent document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Moreover, the separation of various system components in the embodiments described in this patent document should not be understood as requiring such separation in all embodiments.
Only some embodiments and examples are described and other embodiments, enhancements and variations can be made based on what is described and illustrated in this patent document.

Claims (22)

1. A video processing method, comprising:
performing a conversion between a video and a bitstream of the video,
wherein the bitstream is organized into one or more access units according to rules, and
wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, and a sum of a number of bytes in a Network Abstraction Layer (NAL) unit for an access unit are based on a sum of picture sizes of each of a plurality of pictures of the access unit.
2. The method of claim 1, wherein the sum of the number of bytes in the NAL unit is denoted as NumBytesInNalUnit.
3. The method according to claim 1 or 2, wherein the rule is independent of a current size of the decoded picture in luma samples (denoted as PicSizeInSamplesY).
4. The method of claim 1, wherein the rule specifies that the nominal CPB removal time of an access unit satisfies a constraint:
AuNominalRemovalTime[n]-AuCpbRemovalTime[n-1]
≥Max(AuSizeInSamplesY[n-1]÷MaxLumaSr,fR),
wherein AuNominalRemovalTime[n] is the nominal CPB removal time for the nth access unit, AuCpbRemovalTime[n-1] is the CPB removal time for the (n-1)th access unit, AuSizeInSamplesY[n-1] is the size of the (n-1)th access unit in luma samples, MaxLumaSr is the maximum luma sample rate (in samples per second), fR is a variable equal to 1 ÷ 300, and wherein n is an integer greater than 0.
5. The method of claim 1, wherein the rule specifies that a difference between DPB output times of pictures from different Access Units (AUs) of the DPB satisfies a constraint of:
DpbOutputInterval[n]≥Max(AuSizeInSamplesY[n-1]÷MaxLumaSr,fR),
where DpbOutputInterval[n] is the DPB output interval of the nth access unit, MaxLumaSr is the maximum luma sample rate (in samples per second), AuSizeInSamplesY[n-1] is the size of the (n-1)th access unit in luma samples, fR is a variable equal to 1 ÷ 300, and where n is an integer greater than 0.
6. The method of claim 1, wherein the rule specifies that a sum of a number of bytes in the NAL unit satisfies a constraint:
NumBytesInNalUnit[0]≤FormatCapabilityFactor×(Max(AuSizeInSamplesY[0]÷MaxLumaSr,fR×MaxLumaSr)+MaxLumaSr×(AuCpbRemovalTime[0]-AuNominalRemovalTime[0]))÷MinCr,
where NumBytesInNalUnit[0] is the sum of the number of bytes in the NAL unit of the first Access Unit (AU), AuSizeInSamplesY[0] is the size of the first access unit in luma samples, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[0] is the CPB removal time of the first access unit, AuNominalRemovalTime[0] is the nominal CPB removal time of the first access unit, MinCr is the minimum compression ratio, and fR is a variable equal to 1 ÷ 300.
7. A video processing method, comprising:
performing a conversion between a video and a bitstream of the video,
wherein the bitstream is organized into one or more access units according to rules, and
wherein the rule specifies a limit on a maximum number of slices in an access unit.
8. The method of claim 7, wherein the rule further specifies that a maximum number of slices (denoted as MaxSlicesPerAu) in the access unit is based on a level of the access unit.
9. The method of claim 8, wherein the maximum number of slices in the access unit is determined using the following table:
Level          | 1  | 2  | 2.1 | 3  | 3.1 | 4  | 4.1 | 5   | 5.1 | 5.2 | 6   | 6.1 | 6.2
MaxSlicesPerAu | 16 | 16 | 20  | 30 | 40  | 75 | 75  | 200 | 200 | 200 | 600 | 600 | 600
10. The method of claim 7, wherein the rule further specifies that a constraint on a Coded Picture Buffer (CPB) removal time (denoted as AuCpbRemovalTime) for each access unit is based on a limit on a maximum number of slices in the access unit.
11. The method of claim 10, wherein the rule specifies that a CPB removal time (denoted as AuCpbRemovalTime [0]) for a first access unit satisfies a constraint:
NumSlicesPerAu[0]≤Min(Max(1,MaxSlicesPerAu×MaxLumaSr/MaxLumaPs×(AuCpbRemovalTime[0]-AuNominalRemovalTime[0])+MaxSlicesPerAu×AuSizeInSamplesY[0]/MaxLumaPs),MaxSlicesPerAu),
wherein MaxSlicesPerAu is the maximum number of slices in the access unit, NumSlicesPerAu[0] is the number of slices in the first Access Unit (AU), MaxLumaPs is the maximum luma picture size, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[0] is the CPB removal time of the first access unit, and AuNominalRemovalTime[0] is the nominal CPB removal time of the first access unit.
12. The method of claim 10, wherein the rule specifies that a difference between consecutive CPB removal times satisfies a constraint:
NumSlicesPerAu[n]≤Min(Max(1,MaxSlicesPerAu×MaxLumaSr/MaxLumaPs×(AuCpbRemovalTime[n]-AuCpbRemovalTime[n-1])),MaxSlicesPerAu)
where MaxSlicesPerAu is the maximum number of slices in the access unit, NumSlicesPerAu[n] is the number of slices of the nth Access Unit (AU), MaxLumaPs is the maximum luma picture size, MaxLumaSr is the maximum luma sample rate (in samples per second), AuCpbRemovalTime[n] is the CPB removal time of the nth access unit, and AuCpbRemovalTime[n-1] is the CPB removal time of the (n-1)th access unit, where n is an integer greater than 0.
13. The method of claim 7, wherein the rule is independent of a maximum number of slices per picture.
14. The method of any of claims 1 to 13, wherein the converting comprises decoding the video from the bitstream.
15. The method of any of claims 1-13, wherein the converting comprises encoding the video into the bitstream.
16. The method of any of claims 1-13, wherein performing the conversion comprises:
encoding the video into the bitstream; and
storing the bitstream in a non-transitory computer-readable storage medium.
17. A video processing apparatus comprising a processor configured to perform the method of any one or more of claims 1-16.
18. A non-transitory computer readable storage medium configured to store a bitstream of video generated by the method of any one or more of claims 1 to 16.
19. A non-transitory computer-readable storage medium configured to store instructions that cause a processor to implement the method of any one or more of claims 1-16.
20. A video processing apparatus storing a bitstream, wherein the video processing apparatus is configured to implement the method of any one or more of claims 1 to 16.
21. A method of storing a bitstream of video, comprising:
generating a bitstream from the video according to the rules; and
storing the bitstream in a non-transitory computer-readable storage medium,
wherein the bitstream is organized into one or more access units according to the rules, and
wherein the rule specifies that one or more constraints on at least one of a Coded Picture Buffer (CPB) removal time, a nominal CPB removal time, a Decoded Picture Buffer (DPB) output time, and a sum of a number of bytes in a Network Abstraction Layer (NAL) unit for an access unit are based on a sum of picture sizes of each of a plurality of pictures of the access unit.
22. A method of storing a bitstream of video, comprising:
generating a bitstream from the video according to the rules; and
storing the bitstream in a non-transitory computer-readable storage medium,
wherein the bitstream is organized into one or more access units according to the rules, and
wherein the rule specifies a limit on a maximum number of slices in an access unit.
CN202080090438.7A 2019-12-26 2020-12-27 Signaling in video coding and decoding to inform coding and decoding picture buffer zone level Pending CN114846805A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201962953815P 2019-12-26 2019-12-26
US62/953,815 2019-12-26
PCT/US2020/067086 WO2021134052A1 (en) 2019-12-26 2020-12-27 Signaling coded picture buffer levels in video coding

Publications (1)

Publication Number Publication Date
CN114846805A true CN114846805A (en) 2022-08-02

Family

ID=76573163

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202080090395.2A Pending CN114846792A (en) 2019-12-26 2020-12-27 Signaling in video coding and decoding to inform decoding picture buffer level
CN202080090438.7A Pending CN114846805A (en) 2019-12-26 2020-12-27 Signaling in video coding and decoding to inform coding and decoding picture buffer zone level

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202080090395.2A Pending CN114846792A (en) 2019-12-26 2020-12-27 Signaling in video coding and decoding to inform decoding picture buffer level

Country Status (2)

Country Link
CN (2) CN114846792A (en)
WO (2) WO2021134051A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130170561A1 (en) * 2011-07-05 2013-07-04 Nokia Corporation Method and apparatus for video coding and decoding
US9712837B2 (en) * 2014-03-17 2017-07-18 Qualcomm Incorporated Level definitions for multi-layer video codecs
US10264286B2 (en) * 2014-06-26 2019-04-16 Qualcomm Incorporated Bitstream conformance constraints in scalable video coding
EP3422724B1 (en) * 2017-06-26 2024-05-01 Nokia Technologies Oy An apparatus, a method and a computer program for omnidirectional video
EP3659040A4 (en) * 2017-07-28 2020-12-02 Dolby Laboratories Licensing Corporation Method and system for providing media content to a client

Also Published As

Publication number Publication date
WO2021134052A1 (en) 2021-07-01
WO2021134051A1 (en) 2021-07-01
CN114846792A (en) 2022-08-02

Similar Documents

Publication Publication Date Title
US11831894B2 (en) Constraints on signaling of video layers in coded bitstreams
CN115136598A (en) Inferring weight values for video components in a bitstream
WO2021134046A1 (en) Decoding parameter sets in video coding
CN115804091A (en) Access unit delimiters and general constraint information in video coding and decoding
US11917172B2 (en) Constraints of slice count in a coded video picture
CN115699731A (en) Reference picture resampling
CN115299063A (en) Transform skip residual coding
CN115843431A (en) Constraints on the number of sub-pictures in a video picture
CN114846805A (en) Signaling in video coding and decoding to inform coding and decoding picture buffer zone level
AU2021283263B2 (en) Signaling of general constrain information
KR102616383B1 (en) Filter parameter signaling in video picture header
KR20230021664A (en) Picture header constraints for multi-layer video coding
KR20220159987A (en) Conformance Window Parameters in Video Coding
WO2024039680A1 (en) Neural-network post-filter purposes with downsampling capabilities

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination