CN114189694B - Parameter sets for video coding - Google Patents

Parameter sets for video coding

Info

Publication number
CN114189694B
CN114189694B (application CN202111266613.6A)
Authority
CN
China
Prior art keywords
hps
parameter set
parameter
type
slice
Prior art date
Legal status
Active
Application number
CN202111266613.6A
Other languages
Chinese (zh)
Other versions
CN114189694A (en)
Inventor
FNU Hendry
Ye-Kui Wang
Jianle Chen
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN114189694A publication Critical patent/CN114189694A/en
Application granted granted Critical
Publication of CN114189694B publication Critical patent/CN114189694B/en


Classifications

    • H04N19/70 Syntax aspects related to video coding, e.g. related to compression standards
    • H04N19/109 Selection of coding mode or of prediction mode among a plurality of temporal predictive coding modes
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H04N19/174 The coded unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
    • H04N19/176 The coded unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/184 The coded unit being bits, e.g. of the compressed video stream
    • H04N19/186 The coded unit being a colour or a chrominance component
    • H04N19/187 The coded unit being a scalable video layer
    • H04N19/31 Hierarchical techniques, e.g. scalability, in the temporal domain
    • H04N19/463 Embedding additional information in the video signal by compressing encoding parameters before transmission
    • H04N19/82 Details of filtering operations specially adapted for video compression, involving filtering within a prediction loop

Abstract

A video coding mechanism is disclosed. The mechanism includes receiving a codestream, the codestream including: a first parameter set including first type coding tool parameters, a second parameter set including second type coding tool parameters, a slice header, and a slice associated with the slice header. The mechanism also includes determining that the slice header contains a first identification of the first parameter set and a second identification of the second parameter set. The mechanism further comprises decoding the slice using the first type of coding tool parameter and the second type of coding tool parameter upon determining that the slice header contains the first identification and the second identification. The mechanism also includes sending the slice for display as part of a decoded video sequence.

Description

Parameter sets for video coding
Cross reference to related applications
This patent application claims the benefit of U.S. provisional patent application No. 62/756,983, entitled "Header Parameter Set For Video Coding", filed on November 7, 2018 by FNU Hendry et al., which is incorporated herein by reference.
Technical Field
The present disclosure relates generally to video coding, and in particular, to efficient indication (signaling) of coding tool parameters for compressing video data in video coding.
Background
Even a relatively short video requires a large amount of data to describe, which can cause difficulties when the data is to be streamed or otherwise transmitted over a communication network with limited bandwidth capacity. Therefore, video data is typically compressed before being transmitted over modern telecommunication networks. Because memory resources may be limited, the size of a video may also be an issue when the video is stored on a storage device. Video compression devices typically use software and/or hardware at the source side to encode the video data prior to transmission or storage, thereby reducing the amount of data required to represent digital video images. The compressed data is then received at the destination side by a video decompression device that decodes the video data. With limited network resources and ever-increasing demand for higher video quality, improved compression and decompression techniques that increase the compression ratio with little impact on image quality are desirable.
Disclosure of Invention
In one embodiment, the invention includes a method implemented in a decoder, the method comprising: a receiver of the decoder receiving a codestream, the codestream including: a first Header Parameter Set (HPS) including first type coding tool parameters, a second HPS including second type coding tool parameters, a slice header, and a slice associated with the slice header; a processor of the decoder determining that the slice header includes a first reference to the first HPS and a second reference to the second HPS; the processor, upon determining that the slice header includes the first reference and the second reference, decoding the slice using the first type coding tool parameters and the second type coding tool parameters; and the processor sending the slice for display as part of a decoded video sequence. The HPS, also called an Adaptation Parameter Set (APS), may be used to describe video data at a smaller granularity than a Picture Parameter Set (PPS) but a larger granularity than a slice header. The disclosed aspects allow a single slice header to reference multiple types of HPSs. By providing a mechanism that allows a single slice header to reference multiple types of HPSs, various coding tool parameters can be signaled at the HPS level. This allows coding tool parameters to change between slices in the same picture/frame without loading the slice headers with additional data. Accordingly, the encoder has more flexibility when performing Rate Distortion Optimization (RDO), because coding tool parameters that differ between slices in the same picture need not be loaded into each slice header. Furthermore, the encoder has access to more encoding options when searching for an optimal coding solution, and hence average coding efficiency is improved. This in turn reduces the memory resource usage and the network resource usage at the encoder and the decoder when storing and transmitting the video data.
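As a rough illustration of the decoding flow described in this embodiment, the following C++ sketch shows a slice header that carries two separate HPS references, each resolved against its own parameter-set table. The structure and field names (alf_hps_id, lmcs_hps_id, AlfParams, LmcsParams) are illustrative assumptions, not syntax defined by any standard.

```cpp
// Illustrative sketch only: field names and types are assumptions. It shows
// one slice header referencing two HPSs of different coding-tool types,
// each looked up independently before the slice is decoded.
#include <cstdint>
#include <map>

struct AlfParams  { /* adaptive loop filter coefficients, etc. */ };
struct LmcsParams { /* luma mapping with chroma scaling model, etc. */ };

struct SliceHeader {
    uint32_t alf_hps_id;   // first reference: HPS carrying ALF parameters
    uint32_t lmcs_hps_id;  // second reference: HPS carrying LMCS parameters
};

struct HpsStore {
    std::map<uint32_t, AlfParams>  alf;   // keyed by HPS ID, one table per type
    std::map<uint32_t, LmcsParams> lmcs;
};

// Decoding a slice uses the coding-tool parameters from both referenced HPSs.
void decodeSlice(const SliceHeader& sh, const HpsStore& store) {
    const AlfParams&  alfParams  = store.alf.at(sh.alf_hps_id);
    const LmcsParams& lmcsParams = store.lmcs.at(sh.lmcs_hps_id);
    (void)alfParams; (void)lmcsParams;
    // ... reconstruct the slice, then apply LMCS mapping and ALF filtering ...
}
```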
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first HPS and the second HPS include an Adaptive Loop Filter (ALF) HPS, a luma mapping with chroma scaling (LMCS) HPS, a scaling list parameter HPS, or a combination thereof.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the codestream further comprises a plurality of HPSs, wherein the plurality of HPSs include the first HPS and the second HPS, and each HPS of the plurality of HPSs is prevented from referencing coding tool parameters from other HPSs of the plurality of HPSs. In some systems, a current HPS may inherit coding parameters from a previous HPS. In theory, this allows the current HPS to include only the differences between the current HPS and the previous HPS. In practice, however, the inheritance chains created by this approach tend to be long, requiring the decoder to retain a large number of old HPSs in a buffer for as long as subsequent HPSs may refer to them. This both creates buffer memory problems and increases the likelihood of decoding errors when an HPS is lost during transmission. Aspects of the present invention address this problem by requiring each HPS to contain all relevant coding tool parameters without reference to other HPSs.
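Because each HPS is self-contained, a decoder implementation can manage its HPS buffer with simple replacement rather than chained lookups. The following sketch illustrates this assumed behavior; the types and the (type, id) keying are hypothetical.

```cpp
// Sketch (assumed behavior): because an HPS never inherits from another HPS,
// the decoder can simply overwrite the stored entry for a given (type, id)
// and never needs to keep a chain of older HPSs alive.
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

enum class HpsType { ALF, LMCS, SCALING_LIST };

struct Hps {
    HpsType type;
    uint32_t id;
    std::vector<uint8_t> payload;  // complete, self-contained coding tool parameters
};

struct HpsBuffer {
    std::map<std::pair<HpsType, uint32_t>, Hps> entries;

    // A newly received HPS replaces any earlier HPS of the same type and ID.
    void onHpsReceived(Hps hps) {
        auto key = std::make_pair(hps.type, hps.id);
        entries[key] = std::move(hps);
    }
};
```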
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first HPS is included in an access unit associated with a temporal identifier (ID), wherein the first HPS includes the temporal ID associated with the access unit including the first HPS. In one example, each HPS may be required to have the same temporal ID as the access unit that contains the HPS.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the slice is part of a picture, wherein the picture is associated with a temporal ID, and the first HPS contains the temporal ID associated with the picture. In some examples, each HPS may be required to have the same temporal ID as the picture associated with the first slice that references the HPS.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the codestream includes a plurality of pictures, each picture containing one or more slices including the slice, wherein the codestream further includes a plurality of HPSs including the first HPS and the second HPS, wherein each of the HPSs and each of the slices is associated with one of a plurality of temporal IDs, and wherein each slice having a first temporal ID is prevented from referencing any HPS having a second temporal ID that is greater than the first temporal ID. In some examples, the pictures and slices of the codestream may each be associated with one of a plurality of temporal IDs (e.g., one of three temporal IDs). Each temporal ID is associated with a corresponding frame rate. Data items associated with a higher frame rate may be ignored when rendering at a lower frame rate. In this example, slices are prevented from referencing an HPS with a larger temporal ID, and hence a higher frame rate. This ensures that a slice does not reference a higher frame rate HPS that would be ignored when rendering the slice at a lower frame rate. This in turn ensures that the HPS is actually available for the slice and is not discarded due to a frame rate mismatch.
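The referencing constraint described above can be expressed as a simple check, sketched below under the assumption that both the slice and the HPS carry an explicit temporal ID.

```cpp
// Sketch of the constraint described above (an assumption about how a
// conformance check could be expressed): a slice with temporal ID
// sliceTemporalId must not reference an HPS whose temporal ID is greater.
#include <cstdint>

bool hpsReferenceAllowed(uint32_t sliceTemporalId, uint32_t hpsTemporalId) {
    // HPSs at a higher temporal sub-layer may be dropped when a lower frame
    // rate is extracted, so referencing them from this slice is disallowed.
    return hpsTemporalId <= sliceTemporalId;
}
```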
In one embodiment, the invention includes a method implemented in an encoder, the method comprising: a processor of the encoder partitioning a plurality of pictures into a plurality of slices; the processor encoding the plurality of slices into a codestream, wherein the slices are encoded by employing at least a first type coding tool based on first type coding tool parameters and a second type coding tool based on second type coding tool parameters; the processor encoding a first HPS and a second HPS into the codestream, wherein the first HPS includes the first type coding tool parameters and the second HPS includes the second type coding tool parameters; the processor encoding into the codestream a first slice header describing the encoding of a first slice of the plurality of slices, wherein the first slice header contains a first reference to the first HPS and a second reference to the second HPS; and a memory coupled to the processor storing the codestream for transmission to a decoder. The HPS, also known as an APS, may be used to describe video data at a smaller granularity than a PPS but a larger granularity than a slice header. The disclosed aspects allow a single slice header to reference multiple types of HPSs. By providing a mechanism that allows a single slice header to reference multiple types of HPSs, various coding tool parameters can be signaled at the HPS level. This allows coding tool parameters to change between slices in the same picture/frame without loading the slice headers with additional data. Accordingly, the encoder has greater flexibility when performing RDO, because coding tool parameters that differ between slices in the same picture need not be loaded into each slice header. Furthermore, the encoder has access to more encoding options when searching for an optimal coding solution, and hence average coding efficiency is improved. This in turn reduces the memory resource usage and the network resource usage at the encoder and the decoder when storing and transmitting the video data.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first HPS and the second HPS comprise ALF HPS, LMCS HPS, scaling list parameters HPS, or a combination thereof.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the method further includes the processor encoding a plurality of HPSs into the codestream, wherein the plurality of HPSs include the first HPS and the second HPS, and each HPS of the plurality of HPSs is prevented from referencing coding tool parameters from other HPSs of the plurality of HPSs. In some systems, a current HPS may inherit coding parameters from a previous HPS. In theory, this allows the current HPS to include only the differences between the current HPS and the previous HPS. In practice, however, the inheritance chains created by this approach tend to be long, requiring the decoder to retain a large number of old HPSs in a buffer for as long as subsequent HPSs may refer to them. This both creates buffer memory problems and increases the likelihood of decoding errors when an HPS is lost during transmission. Aspects of the present invention address this problem by requiring each HPS to contain all relevant coding tool parameters without reference to other HPSs.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first HPS is included in an access unit associated with a temporal ID, wherein the first HPS includes the temporal ID associated with the access unit including the first HPS. In one example, each HPS may be required to have the same temporal ID as the access unit that contains the HPS.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the first slice is partitioned from a first picture, wherein the picture is associated with a temporal ID, and the first HPS contains the temporal ID associated with the picture. In some examples, each HPS may be required to have the same temporal ID as the picture associated with the first slice that references the HPS.
In any of the above aspects, there is provided another implementation of the aspect: the method further includes encoding a plurality of HPSs including the first HPS and the second HPS, wherein each of the plurality of HPSs and each of the slices is associated with one of a plurality of temporal IDs, and wherein each slice having a first temporal ID is prevented from referencing any HPS having a second temporal ID that is greater than the first temporal ID. In some examples, the pictures and slices of the codestream may each be associated with one of a plurality of temporal IDs (e.g., one of three temporal IDs). Each temporal ID is associated with a corresponding frame rate. Data items associated with a higher frame rate may be ignored when presenting a lower frame rate. In this example, slices are prevented from referencing an HPS with a larger temporal ID, and hence a higher frame rate. This ensures that a slice does not reference a higher frame rate HPS that would be ignored when rendering the slice at a lower frame rate. This in turn ensures that the HPS is actually available for the slice and is not discarded due to a frame rate mismatch.
In one embodiment, this disclosure includes a video coding apparatus comprising: a processor, a receiver coupled to the processor, a memory, and a transmitter, wherein the processor, receiver, memory, and transmitter are configured to perform the method of any of the preceding aspects.
In one embodiment, the invention includes a non-transitory computer readable medium comprising a computer program product for use by a video coding apparatus, wherein the computer program product comprises computer executable instructions stored in the non-transitory computer readable medium which, when executed by a processor, cause the video coding apparatus to perform the method of any of the above aspects.
In one embodiment, the invention includes a decoder comprising: a receiving module, configured to receive a code stream, where the code stream includes: a first HPS including a first type of coding tool parameter, a second HPS including a second type of coding tool parameter, a slice header, and a slice associated with the slice header; a determination module to determine that the slice header includes a first reference to the first HPS and a second reference to the second HPS; a decoding module for decoding the slice using the first type of coding tool parameters and the second type of coding tool parameters when it is determined that the slice header includes the first reference and the second reference; and a transmitting module for transmitting the slice for display as part of a decoded video sequence.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the decoder is also for performing the method of any of the above aspects.
In one embodiment, the invention includes an encoder comprising: a partitioning module for partitioning a plurality of pictures into a plurality of slices; an encoding module for: encoding the plurality of slices into a codestream, wherein the slices are encoded by employing at least a first type coding tool based on first type coding tool parameters and a second type coding tool based on second type coding tool parameters; encoding a first HPS and a second HPS into the codestream, wherein the first HPS includes the first type coding tool parameters and the second HPS includes the second type coding tool parameters; and encoding into the codestream a first slice header describing the encoding of a first slice of the plurality of slices, wherein the first slice header contains a first reference to the first HPS and a second reference to the second HPS; and a storage module for storing the codestream for transmission to a decoder.
Optionally, in any of the above aspects, there is provided another implementation of the aspect: the encoder is also for performing the method of any of the above aspects.
For clarity, any of the embodiments described above may be combined with any one or more of the other embodiments described above to create a new embodiment within the scope of the present invention.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
Drawings
For a more complete understanding of the present invention, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
FIG. 1 is a flow diagram of an exemplary method of coding a video signal;
FIG. 2 is a schematic diagram of an exemplary encoding and decoding (codec) system for video coding;
FIG. 3 is a schematic diagram of an exemplary video encoder;
FIG. 4 is a schematic diagram of an exemplary video decoder;
FIG. 5 is a diagram of an exemplary bitstream containing a coded video sequence with a Header Parameter Set (HPS);
FIG. 6 is a schematic diagram of an exemplary mechanism for time scaling;
FIG. 7 is a schematic diagram of an exemplary video coding apparatus;
FIG. 8 is a flow diagram of an exemplary method for encoding a video sequence into a codestream using HPS;
FIG. 9 is a flow diagram of an exemplary method for decoding a video sequence from a codestream using HPS;
FIG. 10 is a schematic diagram of an exemplary system for decoding a video sequence of images in a codestream using HPS.
Detailed Description
It should be understood at the outset that although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or not yet in existence. The present invention should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
The following abbreviations are used herein: Adaptive Loop Filter (ALF), Coding Tree Block (CTB), Coding Tree Unit (CTU), Coding Unit (CU), Coded Video Sequence (CVS), Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH), Joint Video Experts Team (JVET), Motion-Constrained Tile Set (MCTS), Maximum Transmission Unit (MTU), Network Abstraction Layer (NAL), Picture Order Count (POC), Raw Byte Sequence Payload (RBSP), Sample Adaptive Offset (SAO), Sequence Parameter Set (SPS), Versatile Video Coding (VVC), and Working Draft (WD).
Many video compression techniques can be used to reduce the size of video files while minimizing data loss. For example, video compression techniques may include performing spatial (e.g., intra) prediction and/or temporal (e.g., inter) prediction to reduce or remove data redundancy in a video sequence. For block-based video coding, a video slice (e.g., a video picture or a portion of a video picture) may be partitioned into video blocks, which may also be referred to as treeblocks, coding Tree Blocks (CTBs), coding Tree Units (CTUs), coding Units (CUs), and/or coding nodes. Video blocks in an intra-coded (I) slice of an image are coded using spatial prediction with respect to reference pixels in neighboring blocks in the same image. Video blocks in inter-coded uni-predictive (P) or bi-predictive (B) slices of an image may be coded using spatial prediction with respect to reference pixels in neighboring blocks in the same image or using temporal prediction with respect to reference pixels in other reference images. A picture may be referred to as a frame (frame) and/or a picture (picture/image), and a reference picture may be referred to as a reference frame and/or a reference picture. Spatial or temporal prediction produces a prediction block that represents an image block. The residual data represents pixel differences between the original image block and the prediction block. Therefore, an inter-coded block is encoded based on a motion vector pointing to a block of reference pixels constituting a predicted block and residual data representing the difference between the coded block and the predicted block. The intra-coded block is encoded according to the intra-coded mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to the transform domain. This results in residual transform coefficients, which can be quantized. The quantized transform coefficients may initially be arranged in a two-dimensional array. The quantized transform coefficients may be scanned to produce a one-dimensional vector of transform coefficients. Entropy coding may be applied to achieve greater compression. Such video compression techniques will be discussed in more detail below.
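The residual and quantization steps mentioned above can be illustrated with a toy example. The sketch below uses a flat 4x4 block, a fixed quantization step, and no real transform, which are simplifications rather than the behavior of any particular codec.

```cpp
// Toy sketch of the residual/quantization steps described above; the 4x4
// block size, uniform quantization step, and the absence of a real transform
// are simplifications, not the behavior of any specific codec.
#include <array>
#include <cmath>
#include <iostream>

int main() {
    std::array<int, 16> original{}, predicted{}, residual{}, quantized{};
    for (int i = 0; i < 16; ++i) { original[i] = 120 + i; predicted[i] = 118; }

    // Residual = original block minus prediction block.
    for (int i = 0; i < 16; ++i) residual[i] = original[i] - predicted[i];

    // Quantization (a transform would normally be applied first): larger
    // steps give stronger compression but larger reconstruction error.
    const int qStep = 4;
    for (int i = 0; i < 16; ++i)
        quantized[i] = static_cast<int>(std::lround(static_cast<double>(residual[i]) / qStep));

    // Per-sample reconstruction error after inverse quantization.
    for (int i = 0; i < 16; ++i)
        std::cout << (residual[i] - quantized[i] * qStep) << (i == 15 ? '\n' : ' ');
    return 0;
}
```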
To ensure that the encoded video can be accurately decoded, video is encoded and decoded according to corresponding video coding standards. Video coding standards include International Telecommunication Union (ITU) Telecommunication Standardization Sector (ITU-T) H.261, International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Motion Picture Experts Group (MPEG)-1 Part 2, ITU-T H.262 or ISO/IEC MPEG-2 Part 2, ITU-T H.263, ISO/IEC MPEG-4 Part 2, Advanced Video Coding (AVC) (also known as ITU-T H.264 or ISO/IEC MPEG-4 Part 10), and High Efficiency Video Coding (HEVC) (also known as ITU-T H.265 or MPEG-H Part 2). AVC includes Scalable Video Coding (SVC), Multiview Video Coding (MVC), Multiview Video Coding plus Depth (MVC+D), and three-dimensional (3D) AVC (3D-AVC) extensions. HEVC includes Scalable HEVC (SHVC), Multiview HEVC (MV-HEVC), and 3D HEVC (3D-HEVC) extensions. The Joint Video Experts Team (JVET) of ITU-T and ISO/IEC has begun developing a video coding standard referred to as Versatile Video Coding (VVC). VVC is included in a Working Draft (WD), which includes JVET-L1001-v1 and JVET-K1002-v3, providing an algorithm description, an encoder-side description, and reference software for the VVC WD.
To code a video image, the image is first partitioned and the resulting partitions are coded into a codestream. Various picture partitioning schemes are available. For example, an image can be partitioned into regular slices, dependent slices, tiles, and/or according to Wavefront Parallel Processing (WPP). For simplicity, HEVC restricts encoders so that only regular slices, dependent slices, tiles, WPP, and combinations thereof can be used when partitioning a slice into CTB groups for video coding. Such partitioning can be applied to support Maximum Transmission Unit (MTU) size matching, parallel processing, and reduced end-to-end delay. The MTU denotes the maximum amount of data that can be transmitted in a single packet. If a packet payload exceeds the MTU, the payload is split into two packets through a process called fragmentation.
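The fragmentation behavior described above amounts to a ceiling division of the payload size by the MTU. A minimal sketch follows, assuming the common 1500-byte Ethernet MTU purely as an example value and ignoring per-fragment header overhead.

```cpp
// Simple illustration of the fragmentation rule described above: a payload
// larger than the MTU is split into ceil(payload / mtu) packets. Per-packet
// header overhead is ignored for simplicity.
#include <cstdint>
#include <iostream>

uint64_t packetsNeeded(uint64_t payloadBytes, uint64_t mtuBytes) {
    return (payloadBytes + mtuBytes - 1) / mtuBytes;  // ceiling division
}

int main() {
    std::cout << packetsNeeded(6200, 1500) << '\n';  // prints 5
    return 0;
}
```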
Regular slices, also referred to simply as slices, are partitions of an image that can be reconstructed independently of other regular slices in the same image, albeit with some interdependencies due to loop filtering operations. Each regular slice is encapsulated in its own Network Abstraction Layer (NAL) unit for transmission. Further, in-picture prediction (intra pixel prediction, motion information prediction, coding mode prediction) and entropy coding dependencies across slice boundaries may be disabled to support independent reconstruction. Such independent reconstruction supports parallelization. For example, regular slice based parallelization employs minimal inter-processor or inter-core communication. However, because each regular slice is independent, each slice is associated with a separate slice header. The use of regular slices can incur substantial coding overhead due to the bit cost of the slice header for each slice and due to the lack of prediction across slice boundaries. Further, regular slices may be employed to support matching for MTU size requirements. Specifically, because a regular slice is encapsulated in a separate NAL unit and can be independently coded, each regular slice should be smaller than the MTU in MTU schemes to avoid breaking the slice into multiple packets. Thus, the goals of parallelization and of MTU size matching may place contradictory demands on the slice layout in an image.
Dependent slices (non-independent slices) are similar to regular slices, but have shortened slice headers and allow the image to be partitioned at tree block boundaries without breaking in-picture prediction. Accordingly, dependent slices allow a regular slice to be fragmented into multiple NAL units, which reduces end-to-end delay by allowing a part of a regular slice to be sent out before the encoding of the entire regular slice is complete.
Tiles are partitions of an image created by horizontal and vertical boundaries that form columns and rows of tiles. Tiles are decoded in raster scan order (left to right, top to bottom), and CTBs are scanned in raster order within each tile. Accordingly, the CTBs in the first tile are decoded in raster scan order before proceeding to the CTBs in the next tile. Similar to regular slices, tiles break in-picture prediction dependencies as well as entropy decoding dependencies. However, tiles need not be included in individual NAL units, and thus tiles may not be used for MTU size matching. Each tile can be processed by one processor/core, and the inter-processor/inter-core communication employed for in-picture prediction between processing units decoding neighboring tiles may be limited to conveying a shared slice header (when the neighboring tiles are in the same slice) and to loop-filter-related sharing of reconstructed pixels and metadata. When more than one tile is included in a slice, the entry point byte offset of each tile other than the first in the slice may be signaled in the slice header. For each slice and tile, at least one of the following conditions should be fulfilled: (1) all coding tree blocks in a slice belong to the same tile; and (2) all coding tree blocks in a tile belong to the same slice.
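The tile-local CTB scan described above can be illustrated by printing the decoding order index of each CTB position. The uniform tile grid in this sketch is an assumption made for simplicity; real tile grids need not be uniform.

```cpp
// Sketch of the tile-local CTB scan described above: CTBs are visited in
// raster order inside a tile, and tiles are visited in raster order over the
// picture. Uniform tile sizes are assumed for simplicity.
#include <cstdio>

int main() {
    const int picWidthCtbs = 6, picHeightCtbs = 4;   // picture size in CTBs
    const int tileWidthCtbs = 3, tileHeightCtbs = 2; // uniform tile size
    const int tileCols = picWidthCtbs / tileWidthCtbs;

    // For each CTB position, compute its decoding-order index.
    for (int y = 0; y < picHeightCtbs; ++y) {
        for (int x = 0; x < picWidthCtbs; ++x) {
            int tileRow = y / tileHeightCtbs, tileCol = x / tileWidthCtbs;
            int tileIdx = tileRow * tileCols + tileCol;
            int localX = x % tileWidthCtbs, localY = y % tileHeightCtbs;
            int order = tileIdx * (tileWidthCtbs * tileHeightCtbs)
                      + localY * tileWidthCtbs + localX;
            std::printf("%3d", order);
        }
        std::printf("\n");
    }
    return 0;
}
```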
In WPP, the image is partitioned into single rows of CTBs. Entropy decoding and prediction mechanisms may use data from CTBs in other rows. Parallel processing is made possible through parallel decoding of CTB rows. For example, the current row may be decoded in parallel with the preceding row. However, decoding of the current row is delayed by two CTBs relative to the decoding process of the preceding row. This delay ensures that the data related to the CTB above and the CTB above and to the right of the current CTB in the current row is available before the current CTB is decoded. When represented graphically, this approach appears as a wavefront. This staggered start allows parallelization with up to as many processors/cores as the picture contains CTB rows. Because in-picture prediction between neighboring treeblock rows within the picture is permitted, inter-processor/inter-core communication may be required to enable in-picture prediction. WPP partitioning does not take NAL unit sizes into account. Hence, WPP does not support MTU size matching. However, regular slices can be used in conjunction with WPP, with some coding overhead, to achieve MTU size matching as needed.
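The two-CTB lag between consecutive rows gives each CTB an earliest possible decoding step, which is what produces the wavefront shape. A small sketch of that schedule follows; the picture dimensions are arbitrary example values.

```cpp
// Sketch of the wavefront schedule described above: each CTB row starts two
// CTBs behind the row above it, so the earliest step at which a CTB can be
// decoded is its column index plus twice its row index.
#include <cstdio>

int main() {
    const int widthCtbs = 8, heightCtbs = 4;
    for (int row = 0; row < heightCtbs; ++row) {
        for (int col = 0; col < widthCtbs; ++col) {
            int earliestStep = col + 2 * row;  // two-CTB delay per row
            std::printf("%3d", earliestStep);
        }
        std::printf("\n");
    }
    return 0;
}
```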
Tiles define horizontal and vertical boundaries that partition an image into tile columns and rows. The scan order of CTBs is changed to be local within a tile (in the order of a CTB raster scan of the tile) before decoding the top-left CTB of the next tile in tile raster scan order of the picture. Similar to regular slices, tiles break in-picture prediction dependencies as well as entropy decoding dependencies. However, tiles need not be included in individual NAL units; hence tiles may not be used for MTU size matching. Each tile can be processed by one processor/core. The inter-processor/inter-core communication employed for in-picture prediction between processing units decoding neighboring tiles may be limited to conveying a shared slice header when a slice spans more than one tile, and to loop-filter-related sharing of reconstructed pixels and metadata. When more than one tile or WPP segment is included in a slice, the entry point byte offset of each tile or WPP segment other than the first one in the slice may be signaled in the slice header. For simplicity, restrictions may be applied to the use of the four different picture partitioning schemes. For example, a Coded Video Sequence (CVS) may not include both tiles and wavefronts for most of the profiles specified in HEVC. One or both of the following conditions may also be required for each slice and tile: all coding tree blocks in a slice belong to the same tile; and all coding tree blocks in a tile belong to the same slice. Furthermore, a wavefront segment contains exactly one CTB row. Also, when WPP is used, a slice that starts within a CTB row should end in the same CTB row.
VVC may include tile and tile group picture partitioning schemes. The tiles in VVC may be the same as the tiles in HEVC. VVC may employ tile groups instead of slices. A slice is defined to contain a set of CTUs, and a tile group is defined to contain a set of tiles. A coded picture may be composed of one or more slices (or tile groups). Each slice/tile group has a slice header containing syntax elements representing information used to decode the slice. Each slice header may contain information for decoding only that slice. However, some of the information in a slice header may be the same for other slices in the same picture. This is because coding tools may operate at the picture level, in which case the parameters are the same for all slices within a picture. This situation may result in redundant information in the slice headers.
A Header Parameter Set (HPS), also referred to as an APS, may be used to overcome the problems associated with redundant slice header information. The HPS may contain slice-level information shared by multiple slices. The HPS may be generated by the encoder and may contain coding tool parameters used when decoding the corresponding slices at the decoder side. Some systems implement an HPS scheme using HPSs and reference HPSs. In such a scheme, the initial HPS in decoding order contains all relevant coding tool parameters for the corresponding slices. When these parameters change for a subsequent slice, a subsequent HPS includes only the parameters that have changed. The subsequent HPS then refers back to the initial HPS. Hence, the initial HPS serves as a reference HPS for subsequent HPSs. Further HPSs may then reference previous HPSs, and so on.
Referencing HPSs in this manner has several problems. For example, allowing an HPS to reference other HPSs may result in a complex mechanism. In a particular example, the mechanism may result in chains of HPS references. In some cases, this approach may result in long HPS chains, because the number of HPS references used in a codestream containing the video data may not be explicitly limited. To manage such a scheme, the decoder may be required to retain any number of HPSs in the decoded picture buffer for possible later reference. To address this issue, an HPS reset mechanism may be added to break any such extended HPS reference chain, which further increases complexity. Furthermore, this method is prone to errors. For example, if an earlier HPS is lost due to a transmission error, the HPSs that reference it will not contain enough data to decode the corresponding slices. Moreover, this approach may result in a large number of HPSs in the codestream. However, the number of HPS Identifiers (IDs) may be limited in order to avoid coding larger HPS ID values. Thus, HPS IDs may be reused within the codestream. This may lead to ambiguity, for example when an HPS references an HPS ID that is used by multiple reference HPSs. Further, an HPS may be signaled within the coded codestream, which is referred to as in-band signaling. An HPS may also be signaled by an external mechanism, such as in metadata information. Such signaling is referred to as out-of-band signaling. This dual signaling mechanism further increases the complexity of the HPS scheme.
Various mechanisms for reducing the complexity of HPS signaling are disclosed herein. The HPS is referred to as the APS in the latest standard documents. Hence, for clarity of discussion, the following disclosure refers generally to the HPS. However, the terms HPS and APS may be used interchangeably in most respects. The present invention prevents an HPS from referencing other HPSs, which avoids the associated complexity and susceptibility to errors. Because an HPS may not reference another HPS, the loss of a single HPS results only in a local error. Furthermore, the decoder need not retain an HPS in memory once a subsequent HPS replaces it. In a specific example, multiple types of HPSs may be used, where the HPS type indicates the type of coding tool parameters contained in the HPS. Such HPS types may include an Adaptive Loop Filter (ALF) HPS, a luma mapping with chroma scaling (LMCS) HPS, and/or a scaling list parameter HPS. In such an example, when the decoder obtains a current HPS of a first type, previous HPSs of the first type may be discarded, because the current HPS replaces such previous HPSs. To support multiple types of HPSs, a single slice header may reference multiple HPSs in order to reference all coding tool parameters of the corresponding slice. This is in contrast to schemes in which a slice header references a single HPS, which in turn references other HPSs. Hence, allowing a single slice header to reference multiple HPSs removes the need for HPS reference chains. Further, the present invention describes mechanisms that enable the HPS to operate with temporal scaling. In temporal scaling, the codestream allows the decoder and/or user to select from multiple frame rates. To implement this scheme, each picture/frame is assigned a temporal ID. Frames with smaller temporal IDs are displayed at every frame rate, whereas frames with larger temporal IDs are displayed only for higher frame rates and are skipped for lower frame rates. To support such temporal scaling, each HPS is assigned a temporal ID. The HPS may receive the temporal ID of the picture containing the first slice that references the HPS. In other examples, the HPS may receive the temporal ID of the access unit that includes the HPS. An access unit is a set of codestream data containing video data sufficient to decode a corresponding picture. To further support temporal scaling, slices associated with smaller temporal IDs may be prevented from referencing HPSs that contain larger temporal IDs. This ensures that a lower frame rate setting does not cause a slice to reference an HPS that is discarded by temporal scaling, thereby preventing the coding tool parameters from being unavailable when decoding certain slices at lower frame rates.
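The temporal scaling behavior described above can be sketched as a simple sub-stream extraction: anything whose temporal ID exceeds the selected highest temporal ID is dropped. The CodedFrame structure below is a simplification used only for illustration.

```cpp
// Sketch of the temporal scaling idea described above: to play back at a
// lower frame rate, frames (and parameter sets) whose temporal ID exceeds
// the selected highest temporal ID are simply dropped.
#include <cstdint>
#include <vector>

struct CodedFrame { uint32_t temporalId; /* ... coded data ... */ };

std::vector<CodedFrame> extractSubStream(const std::vector<CodedFrame>& frames,
                                         uint32_t highestTemporalId) {
    std::vector<CodedFrame> out;
    for (const CodedFrame& f : frames)
        if (f.temporalId <= highestTemporalId)  // keep only the lower sub-layers
            out.push_back(f);
    return out;
}
```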
Fig. 1 is a flow chart of an exemplary method 100 of operation for coding a video signal. In particular, a video signal is encoded in an encoder. The encoding process reduces the video file size by compressing the video signal using various mechanisms. The smaller file size allows compressed video files to be sent to the user while reducing the associated bandwidth overhead. The decoder then decodes the compressed video file to reconstruct the original video signal and display it to the end user. The decoding process typically mirrors the encoding process so that the decoder can consistently reconstruct the video signal.
In step 101, a video signal is input into an encoder. The video signal may be, for example, an uncompressed video file stored in a memory. In another example, a video file may be captured by a video capture device (e.g., a video camera) and encoded to support a live stream of video. The video file may include an audio component and a video component. The video component comprises a series of image frames that, when viewed in sequence, create a visual effect of motion. These frames include pixels that are represented in terms of light (referred to herein as luminance components (or luminance pixels)) and color (referred to as chrominance components (or color pixels)). In some examples, the frames may also contain depth values to support three-dimensional viewing.
In step 103, the video is partitioned into blocks. Partitioning includes subdividing the pixels in each frame into square and/or rectangular blocks for compression. For example, in High Efficiency Video Coding (HEVC), also known as H.265 and MPEG-H Part 2, a frame may first be divided into Coding Tree Units (CTUs), which are blocks of a predefined size (e.g., 64 pixels by 64 pixels). A CTU includes luminance pixels and chrominance pixels. A coding tree may be used to divide the CTU into blocks and then recursively subdivide the blocks until a configuration is obtained that supports further encoding. For example, the luminance component of a frame may be subdivided until each block contains relatively uniform illumination values. Likewise, the chrominance components of a frame may be subdivided until each block contains relatively uniform color values. Accordingly, the partitioning mechanism varies depending on the content of the video frames.
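The recursive subdivision described above can be sketched as a quadtree split that stops when a block is considered uniform. The uniformity test below is a stand-in assumption, not an actual encoder decision rule.

```cpp
// Toy sketch of a coding tree that keeps splitting a block until its samples
// are "uniform enough"; the uniformity test and the pure quad split are
// simplifications of the real partitioning decisions an encoder makes.
#include <cstdio>

void splitBlock(int x, int y, int size, int minSize, bool (*isUniform)(int, int, int)) {
    if (size <= minSize || isUniform(x, y, size)) {
        std::printf("leaf block at (%d,%d) size %d\n", x, y, size);
        return;
    }
    int half = size / 2;  // quadtree split into four equal sub-blocks
    splitBlock(x,        y,        half, minSize, isUniform);
    splitBlock(x + half, y,        half, minSize, isUniform);
    splitBlock(x,        y + half, half, minSize, isUniform);
    splitBlock(x + half, y + half, half, minSize, isUniform);
}

int main() {
    // Pretend only the top-left quarter of a 64x64 CTU has detail worth splitting.
    auto detail = [](int x, int y, int size) { return !(x < 32 && y < 32) || size <= 8; };
    splitBlock(0, 0, 64, 8, detail);
    return 0;
}
```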
In step 105, various compression mechanisms are employed to compress the image blocks partitioned in step 103. For example, inter prediction and/or intra prediction may be employed. Inter prediction exploits the fact that objects in a common scene tend to appear in successive frames. Accordingly, a block depicting an object in a reference frame need not be repeatedly described in adjacent frames. Specifically, an object (e.g., a table) may remain in a constant position over multiple frames. Hence the table is described once, and adjacent frames can refer back to the reference frame. A pattern matching mechanism may be employed to match objects over multiple frames. Further, a moving object may be represented across multiple frames, for example due to object movement or camera movement. In a particular example, a video may show a car that moves across the screen over multiple frames. Motion vectors can be used to describe such movement. A motion vector is a two-dimensional vector that provides an offset from the coordinates of an object in a frame to the coordinates of the object in a reference frame. As such, inter prediction can encode an image block in a current frame as a set of motion vectors indicating an offset relative to a corresponding block in a reference frame.
Intra prediction encodes blocks in a common frame. Intra prediction exploits the fact that luminance and chrominance components tend to cluster within a frame. For example, a patch of green in a portion of a tree tends to be adjacent to similar patches of green. Intra prediction employs multiple directional prediction modes (e.g., 33 modes in HEVC), a planar mode, and a direct current (DC) mode. The directional modes indicate that the current block is similar/identical to the pixels of a neighboring block in the corresponding direction. The planar mode indicates that a series of blocks along a row/column (e.g., a plane) can be interpolated from neighboring blocks at the edges of the row. The planar mode, in effect, indicates a smooth transition of light/color across a row/column by employing a relatively constant slope in changing values. The DC mode is employed for boundary smoothing and indicates that a block is similar/identical to the average value of the pixels of all the neighboring blocks associated with the angular directions of the directional prediction modes. Accordingly, an intra-prediction block can represent an image block as various relational prediction mode values instead of the actual values. Likewise, an inter-prediction block can represent an image block as motion vector values instead of the actual values. In either case, the prediction block may not exactly represent the image block in some cases. Any differences are stored in a residual block. A transform may be applied to the residual block to further compress the file.
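As a concrete illustration of the DC mode described above, the following sketch fills a prediction block with the rounded average of the neighboring reference pixels. Real codecs add edge handling and weighting that are omitted here.

```cpp
// Sketch of DC intra prediction as described above: every sample of the
// predicted block is set to the average of the reconstructed neighboring
// reference samples. The default value 128 is an assumption for the case of
// no available neighbors.
#include <numeric>
#include <vector>

std::vector<int> dcPredict(const std::vector<int>& aboveRefs,
                           const std::vector<int>& leftRefs,
                           int blockSize) {
    long sum = std::accumulate(aboveRefs.begin(), aboveRefs.end(), 0L)
             + std::accumulate(leftRefs.begin(), leftRefs.end(), 0L);
    int count = static_cast<int>(aboveRefs.size() + leftRefs.size());
    int dc = count ? static_cast<int>((sum + count / 2) / count) : 128;  // rounded mean
    return std::vector<int>(blockSize * blockSize, dc);  // flat prediction block
}
```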
In step 107, various filtering techniques may be applied. In HEVC, the filters are applied according to an in-loop filtering scheme. The block-based prediction described above may result in blocky images at the decoder side. Further, a block-based prediction scheme may encode a block and then reconstruct the encoded block for later use as a reference block. The in-loop filtering scheme iteratively applies a noise suppression filter, a deblocking filter, an adaptive loop filter, and a Sample Adaptive Offset (SAO) filter to the blocks/frames. These filters mitigate such blocking artifacts so that the encoded file can be accurately reconstructed. In addition, these filters mitigate artifacts in the reconstructed reference blocks so that the artifacts are less likely to create additional artifacts in subsequent blocks encoded from the reconstructed reference blocks.
In step 109, the video signal is divided, compressed, and filtered, and the resulting data is encoded in the code stream. The bitstream comprises the above data as well as any indicating data needed to support a proper reconstruction of the video signal at the decoder side. Such data may include, for example, partition data, prediction data, residual blocks, and various flags that provide decoding instructions to the decoder. The codestream may be stored in memory for transmission to a decoder upon request. The codestream may also be broadcast and/or multicast to multiple decoders. Creating a codestream is an iterative process. Thus, steps 101, 103, 105, 107 and 109 may be performed consecutively and/or simultaneously in a plurality of frames and blocks. The order shown in fig. 1 is presented for clarity and ease of description, and is not intended to limit the video coding process to a particular order.
In step 111, the decoder receives the codestream and begins the decoding process. Specifically, the decoder employs an entropy decoding scheme to convert the codestream into corresponding syntax data and video data. In step 111, the decoder uses the syntax data in the codestream to determine the partitions of the frames. The partitioning should match the block partitioning result of step 103. The entropy encoding/decoding employed in step 111 is now described. The encoder makes many choices during the compression process, such as selecting a block partitioning scheme from several possible choices based on the spatial positioning of values in the input image(s). Signaling the exact choices may take a large number of bits. A bit, as used herein, is a binary value that is treated as a variable (e.g., a bit value that may vary depending on context). Entropy coding allows the encoder to discard any options that are clearly not viable for a particular case, leaving a set of allowable options. Each allowable option is then assigned a codeword. The length of the codeword depends on the number of allowable options (e.g., one bit for two options, two bits for three or four options, etc.). The encoder then encodes the codeword for the selected option. This approach reduces the size of the codewords, because a codeword only needs to be as large as needed to uniquely indicate a selection from a small subset of allowable options, as opposed to uniquely indicating a selection from a potentially large set of all possible options. The decoder then decodes the selection by determining the set of allowable options in a manner similar to the encoder. By determining the set of allowable options, the decoder can read the codeword and determine the selection made by the encoder.
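The relationship between the number of allowable options and the codeword length can be illustrated as follows. A fixed-length code is assumed purely for illustration; practical entropy coders such as CABAC do better than ceil(log2(n)).

```cpp
// Sketch of the point made above: the fewer options remain after the decoder
// rules out the impossible ones, the shorter the codeword that has to be sent.
#include <cmath>
#include <cstdio>

int bitsForOptions(int numOptions) {
    return numOptions <= 1 ? 0
                           : static_cast<int>(std::ceil(std::log2(static_cast<double>(numOptions))));
}

int main() {
    std::printf("2 options -> %d bit(s)\n", bitsForOptions(2));    // 1
    std::printf("4 options -> %d bit(s)\n", bitsForOptions(4));    // 2
    std::printf("35 options -> %d bit(s)\n", bitsForOptions(35));  // 6
    return 0;
}
```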
In step 113, the decoder performs block decoding. Specifically, the decoder performs an inverse transform to generate a residual block. The decoder then reconstructs the image block from the partition using the residual block and the corresponding prediction block. The prediction block may include an intra prediction block and an inter prediction block generated at the encoder side in step 105. The reconstructed image block is then positioned into a frame of the reconstructed video signal according to the segmentation data determined in step 111. The syntax of step 113 may also be indicated in the codestream by entropy coding as described above.
In step 115, the frames of the reconstructed video signal are filtered in a manner similar to step 107 at the encoder side. For example, a noise suppression filter, a deblocking filter, an adaptive loop filter, and an SAO filter may be applied to the frames to remove blocking artifacts. Once the frames are filtered, the video signal may be output to a display for viewing by an end user in step 117.
Fig. 2 is a schematic diagram of an exemplary encoding and decoding (codec) system 200 for video coding. In particular, the codec system 200 is capable of implementing the method of operation 100. The codec system 200 broadly describes the components used in the encoder and decoder. The codec system 200 receives a video signal and segments the video signal as described by steps 101 and 103 in the method of operation 100, thereby generating a segmented video signal 201. Then, when acting as an encoder, the codec system 200 compresses the partitioned video signal 201 into a coded bitstream, as described in connection with steps 105, 107 and 109 in method 100. When acting as a decoder, codec system 200 generates an output video signal from the codestream, as described in connection with steps 111, 113, 115, and 117 of method 100. The codec system 200 includes a generic coder control component 211, a transform scaling and quantization component 213, an intra estimation component 215, an intra prediction component 217, a motion compensation component 219, a motion estimation component 221, a scaling and inverse transform component 229, a filter control analysis component 227, an in-loop filter component 225, a decoded picture buffer component 223, a header formatting and Context Adaptive Binary Arithmetic Coding (CABAC) component 231. Such components are coupled as shown. In fig. 2, a black line represents movement of data to be encoded/decoded, and a dotted line represents movement of control data that controls operations of other components. The components of the codec system 200 may all be present in an encoder. The decoder may comprise a subset of the components of the codec system 200. For example, the decoder may include an intra prediction component 217, a motion compensation component 219, a scaling and inverse transform component 229, an in-loop filter component 225, and a decoded picture buffer component 223. These components are now described.
The segmented video signal 201 is a captured video sequence that has been partitioned into blocks of pixels by a coding tree. The coding tree employs various partitioning schemes to subdivide a block of pixels into smaller blocks of pixels. These blocks can then be further subdivided into even smaller blocks. The blocks may be referred to as nodes on the coding tree. Larger parent nodes are divided into smaller child nodes. The number of times a node is subdivided is referred to as the depth of the node/coding tree. In some cases, the divided blocks may be included in a Coding Unit (CU). For example, a CU may be a sub-part of a CTU that includes a luma block, one or more red-difference chroma (Cr) blocks, and one or more blue-difference chroma (Cb) blocks, along with the corresponding syntax instructions of the CU. The partitioning modes may include a Binary Tree (BT), a Triple Tree (TT) and a Quadtree (QT), which partition a node into two, three or four child nodes, respectively, of different shapes depending on the partitioning mode employed. The segmented video signal 201 is sent to the general coder control component 211, the transform scaling and quantization component 213, the intra estimation component 215, the filter control analysis component 227, and the motion estimation component 221 for compression.
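The following is a rough sketch of the BT, TT, and QT splits described above, assuming simple rectangular blocks and hypothetical helper names; real coding trees impose many additional constraints on block sizes and split combinations.

```python
def split_block(x, y, w, h, mode):
    """Return child blocks (x, y, w, h) for one split of a parent node."""
    if mode == "QT":      # quadtree: four equal quadrants
        hw, hh = w // 2, h // 2
        return [(x, y, hw, hh), (x + hw, y, hw, hh),
                (x, y + hh, hw, hh), (x + hw, y + hh, hw, hh)]
    if mode == "BT_H":    # horizontal binary split: two stacked halves
        return [(x, y, w, h // 2), (x, y + h // 2, w, h // 2)]
    if mode == "TT_V":    # vertical ternary split: 1/4, 1/2, 1/4 columns
        q = w // 4
        return [(x, y, q, h), (x + q, y, 2 * q, h), (x + 3 * q, y, q, h)]
    raise ValueError(mode)

# A 64x64 CTU split once by QT, then one child split by a vertical TT.
children = split_block(0, 0, 64, 64, "QT")
grandchildren = split_block(*children[0], "TT_V")
print(children)        # depth 1 nodes of the coding tree
print(grandchildren)   # depth 2 nodes of the coding tree
```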
The general coder control component 211 is operable to make decisions related to coding the images of the video sequence into the codestream based on application constraints. For example, the general coder control component 211 manages the optimization of the code rate/codestream size with respect to reconstruction quality. Such decisions may be made based on storage space/bandwidth availability and image resolution requests. The general coder control component 211 also manages buffer usage based on transmission speed to mitigate buffer underrun and overrun problems. To manage these issues, the general coder control component 211 manages the partitioning, prediction, and filtering performed by the other components. For example, the general coder control component 211 may dynamically increase compression complexity to increase resolution and bandwidth usage, or decrease compression complexity to decrease resolution and bandwidth usage. Thus, the general coder control component 211 controls the other components of the codec system 200 to balance video signal reconstruction quality with rate issues. The general coder control component 211 creates control data that controls the operation of the other components. The control data is also sent to the header formatting and CABAC component 231 for encoding into the codestream, indicating the parameters used for decoding at the decoder side.
The partitioned video signal 201 is also sent to motion estimation component 221 and motion compensation component 219 for inter prediction. A frame or slice of the partitioned video signal 201 may be divided into a plurality of video blocks. Motion estimation component 221 and motion compensation component 219 inter-prediction code the received video block relative to one or more blocks in one or more reference frames to provide temporal prediction. The codec system 200 may perform multiple coding processes to select an appropriate coding mode for each block of video data, etc.
The motion estimation component 221 and the motion compensation component 219 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation performed by motion estimation component 221 is the process of generating motion vectors, wherein these motion vectors are used to estimate the motion of a video block. For example, a motion vector may represent the displacement of a coding object with respect to a prediction block. A prediction block is a block that is found to closely match the block to be coded in terms of pixel differences. A prediction block may also be referred to as a reference block. Such pixel differences may be determined by a sum of absolute differences (SAD), a sum of squared differences (SSD), or other difference metrics. HEVC employs several coding objects, including CTUs, coding tree blocks (CTBs), and CUs. For example, a CTU may be divided into a plurality of CTBs, and the CTBs may then be divided into a plurality of coding blocks (CBs) to be included in CUs. A CU may be encoded as a Prediction Unit (PU) containing prediction data and/or a Transform Unit (TU) containing transform residual data of the CU. The motion estimation component 221 generates motion vectors, PUs, and TUs using rate-distortion analysis as part of a rate-distortion optimization process. For example, the motion estimation component 221 may determine a plurality of reference blocks, a plurality of motion vectors, etc. for the current block/frame and may select the reference block, motion vector, etc. having the best rate-distortion characteristics. The best rate-distortion characteristics balance the quality of the video reconstruction (e.g., the amount of data loss due to compression) with the coding efficiency (e.g., the size of the final encoding).
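The pixel difference metrics mentioned above can be illustrated with a minimal sketch, assuming blocks are plain lists of co-located luma samples.

```python
def sad(block_a, block_b):
    # Sum of absolute differences between co-located samples.
    return sum(abs(a - b) for a, b in zip(block_a, block_b))

def ssd(block_a, block_b):
    # Sum of squared differences; penalizes large errors more heavily.
    return sum((a - b) ** 2 for a, b in zip(block_a, block_b))

current = [52, 55, 61, 59]
candidate = [50, 57, 60, 64]
print(sad(current, candidate))  # 2 + 2 + 1 + 5 = 10
print(ssd(current, candidate))  # 4 + 4 + 1 + 25 = 34
```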
In some examples, the codec system 200 may calculate values for sub-integer pixel positions of reference pictures stored in the decoded picture buffer component 223. For example, the video codec system 200 may interpolate values for quarter-pixel positions, eighth-pixel positions, or other fractional-pixel positions of a reference image. Thus, the motion estimation component 221 can perform a motion search with respect to full pixel positions and fractional pixel positions and output motion vectors with fractional pixel precision. The motion estimation component 221 calculates motion vectors for PUs of video blocks in inter-coded slices by comparing the positions of the PUs to the positions of prediction blocks of a reference picture. The motion estimation component 221 outputs the calculated motion vectors as motion data to the header formatting and CABAC component 231 for encoding, and outputs the motion vectors to the motion compensation component 219.
The motion compensation performed by motion compensation component 219 may involve retrieving or generating a prediction block from the motion vector determined by motion estimation component 221. Also, in some examples, motion estimation component 221 and motion compensation component 219 may be functionally integrated. After receiving the motion vector for the PU of the current video block, motion compensation component 219 may locate the prediction block to which the motion vector points. The pixel values of the prediction block are then subtracted from the pixel values of the current video block being coded, forming pixel difference values, thus forming a residual video block. In general, motion estimation component 221 performs motion estimation with respect to the luma component, and motion compensation component 219 uses the motion vectors calculated from the luma component for the chroma component and the luma component. The prediction block and the residual block are sent to the transform scaling and quantization component 213.
The partitioned video signal 201 is also sent to intra estimation component 215 and intra prediction component 217. Like motion estimation component 221 and motion compensation component 219, intra estimation component 215 and intra prediction component 217 may be highly integrated, but are illustrated separately for conceptual purposes. Intra-estimation component 215 and intra-prediction component 217 intra-predict the current block relative to blocks in the current frame in place of inter-prediction performed between frames by motion estimation component 221 and motion compensation component 219 as described above. In particular, intra-estimation component 215 determines an intra-prediction mode to use for encoding the current block. In some examples, the intra-estimation component 215 selects an appropriate intra-prediction mode from a plurality of tested intra-prediction modes to encode the current block. The selected intra prediction mode is then sent to the header formatting and CABAC component 231 for encoding.
For example, the intra-estimation component 215 performs rate-distortion analysis on various tested intra-prediction modes to calculate rate-distortion values, and selects an intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis typically determines the amount of distortion (or error) between an encoded block and the original, unencoded block that was encoded to produce the encoded block, as well as the code rate (e.g., number of bits) used to produce the encoded block. The intra estimation component 215 calculates ratios based on the distortion and rate of various encoded blocks to determine the intra prediction mode that exhibits the best rate-distortion value for the block. Further, the intra estimation component 215 may be operative to code the depth blocks of the depth map using a Depth Modeling Mode (DMM) according to rate-distortion optimization (RDO).
When implemented at an encoder, the intra prediction component 217 may generate a residual block from the predicted block according to the selected intra prediction mode determined by the intra estimation component 215, or when implemented at a decoder, read the residual block from the code stream. The residual block comprises the value difference between the prediction block and the original block, represented as a matrix. The residual block is then sent to the transform scaling and quantization component 213. Intra estimation component 215 and intra prediction component 217 may operate on the luma component and the chroma component.
The transform scaling and quantization component 213 is used to further compress the residual block. The transform scaling and quantization component 213 applies a transform, such as a Discrete Cosine Transform (DCT), a Discrete Sine Transform (DST), or a conceptually similar transform, to the residual block, thereby generating a video block that includes residual transform coefficient values. Wavelet transforms, integer transforms, sub-band transforms, or other types of transforms may also be used. The transform may transform the residual information from a pixel value domain to a transform domain, such as the frequency domain. The transform scaling and quantization component 213 is also used to scale the transformed residual information according to frequency, etc. Such scaling involves applying a scaling factor to the residual information in order to quantize different frequency information at different granularities, which may affect the final visual quality of the reconstructed video. The transform scaling and quantization component 213 is also used to quantize the transform coefficients to further reduce the code rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The quantization level may be modified by adjusting a quantization parameter. In some examples, the transform scaling and quantization component 213 may then scan a matrix comprising quantized transform coefficients. The quantized transform coefficients are sent to the header formatting and CABAC component 231 for encoding into the bitstream.
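A toy sketch of the quantization behavior described above is shown below, using an illustrative step size derived from a quantization parameter; actual codecs use integer transforms and more elaborate scaling, so the names and formula here are assumptions for illustration only.

```python
def quantize(coeffs, qp):
    # Illustrative step size that doubles every 6 QP units, as in many codecs.
    step = 2 ** (qp / 6.0)
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    step = 2 ** (qp / 6.0)
    return [level * step for level in levels]

transform_coeffs = [103.0, -41.0, 12.0, -3.0, 1.0, 0.5]
levels = quantize(transform_coeffs, qp=24)   # coarse integer representation
print(levels)                                # small integers cost fewer bits
print(dequantize(levels, qp=24))             # lossy reconstruction of the coefficients
```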
The scaling and inverse transform component 229 applies the inverse operations of the transform scaling and quantization component 213 to support motion estimation. Scaling and inverse transform component 229 applies inverse scaling, transformation, and/or quantization to reconstruct the residual block in the pixel domain, e.g., for later use as a reference block, which may become a prediction block for another current block. Motion estimation component 221 and/or motion compensation component 219 may compute a reference block by adding the residual block back to the corresponding prediction block for motion estimation of a subsequent block/frame. Filters are applied to the reconstructed reference block to reduce artifacts generated during scaling, quantization and transformation. Such artifacts may additionally lead to prediction inaccuracies (and other artifacts) when predicting subsequent blocks.
The filter control analysis component 227 and the in-loop filter component 225 apply filters to the residual block and/or reconstructed image block. For example, the transformed residual block of the scaling and inverse transform component 229 may be merged with the corresponding prediction block of the intra prediction component 217 and/or the motion compensation component 219 to reconstruct the original image block. Then, a filter may be applied to the reconstructed image block. In some examples, the filter may instead be applied to the residual block. As with the other components in fig. 2, filter control analysis component 227 and in-loop filter component 225 are highly integrated and can be implemented together, but are described separately for conceptual purposes. The filter applied to reconstruct the reference block is applied to a particular spatial region and includes a number of parameters to adjust the manner in which such filter is applied. The filter control analysis component 227 analyzes the reconstructed reference block to determine where such filters should be applied and sets the corresponding parameters. Such data is sent as filter control data to the header formatting and CABAC component 231 for encoding. The in-loop filter component 225 applies such filters according to the filter control data. The filters may include deblocking filters, noise suppression filters, SAO filters, and adaptive loop filters. Such filters may be applied in the spatial/pixel domain (e.g., for reconstructed pixel blocks) or in the frequency domain, according to examples.
When operating as an encoder, the filtered reconstructed image blocks, residual blocks, and/or prediction blocks are stored in decoded image buffer component 223 for later use in motion estimation, as described above. When operating as a decoder, decoded picture buffer component 223 stores the reconstructed and filtered block and sends it to a display as part of the output video signal. Decoded picture buffer component 223 may be any memory device capable of storing a predicted block, a residual block, and/or a reconstructed image block.
The header formatting and CABAC component 231 receives data from the various components of the codec system 200 and encodes such data into a coded codestream for transmission to a decoder. In particular, the header formatting and CABAC component 231 generates various headers to encode control data (e.g., overall control data and filter control data). Furthermore, prediction data (including intra prediction) and motion data, as well as residual data in the form of quantized transform coefficient data, are encoded into the codestream. The final codestream includes all the information needed by the decoder to reconstruct the original segmented video signal 201. Such information may also include an intra prediction mode index table (also referred to as a codeword mapping table), definitions of coding contexts for various blocks, an indication of the most probable intra prediction mode, an indication of partitioning information, and so forth. Such data may be encoded by employing entropy coding. For example, the information may be encoded using Context Adaptive Variable Length Coding (CAVLC), CABAC, syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding, or other entropy coding techniques. After entropy coding, the coded codestream may be sent to another device (e.g., a video decoder) or archived for subsequent transmission or retrieval.
Fig. 3 is a block diagram of an exemplary video encoder 300. The video encoder 300 may be used to implement the encoding functions of the codec system 200 and/or to implement steps 101, 103, 105, 107 and/or 109 of the method of operation 100. The encoder 300 partitions the input video signal, producing a partitioned video signal 301, wherein the partitioned video signal 301 is substantially similar to the partitioned video signal 201. The partitioned video signal 301 is then compressed by the components of the encoder 300 and encoded into a codestream.
In particular, the partitioned video signal 301 is sent to an intra prediction component 317 for intra prediction. The intra-prediction component 317 may be substantially similar to the intra-estimation component 215 and the intra-prediction component 217. The partitioned video signal 301 is also sent to a motion compensation component 321 for inter prediction from reference blocks in the decoded picture buffer component 323. Motion compensation component 321 may be substantially similar to motion estimation component 221 and motion compensation component 219. The prediction blocks and residual blocks of the intra prediction component 317 and the motion compensation component 321 are sent to a transform and quantization component 313 for transforming and quantizing the residual blocks. The transform and quantization component 313 may be substantially similar to the transform scaling and quantization component 213. The transformed and quantized residual block and corresponding prediction block (along with associated control data) are sent to entropy coding component 331 for coding into the code stream. Entropy coding component 331 may be substantially similar to header formatting and CABAC component 231.
The transformed and quantized residual block and/or the corresponding prediction block is also sent from the transform and quantization component 313 to the inverse transform and quantization component 329 for reconstruction as a reference block for use by the motion compensation component 321. Inverse transform and quantization component 329 may be substantially similar to scaling and inverse transform component 229. According to an example, the in-loop filter in the in-loop filter component 325 is also applied to the residual block and/or the reconstructed reference block. In-loop filter component 325 may be substantially similar to filter control analysis component 227 and in-loop filter component 225. In-loop filter component 325 may include a plurality of filters as depicted by in-loop filter component 225. The filtered block is then stored in the decoded picture buffer component 323 for use by the motion compensation component 321 as a reference block. Decoded picture buffer component 323 can be substantially similar to decoded picture buffer component 223.
Fig. 4 is a block diagram of an example video decoder 400. The video decoder 400 may be used to implement the decoding function of the codec system 200 and/or to implement the steps 111, 113, 115 and/or 117 of the method of operation 100. For example, the decoder 400 receives a code stream from the encoder 300 and generates a reconstructed output video signal from the code stream for display to an end user.
The code stream is received by entropy decoding component 433. Entropy decoding component 433 is used to implement entropy decoding schemes such as CAVLC, CABAC, SBAC, PIPE coding, or other entropy coding techniques. For example, entropy decoding component 433 may employ header information to provide context to interpret other data of the codeword encoded into the code stream. The decoding information includes any information required for decoding the video signal, such as overall control data, filter control data, partition information, motion data, prediction data, and quantized transform coefficients in a residual block. The quantized transform coefficients are sent to an inverse transform and quantization component 429 for reconstruction into a residual block. Inverse transform and quantization component 429 may be similar to inverse transform and quantization component 329.
The reconstructed residual blocks and/or prediction blocks are sent to the intra prediction component 417 to be reconstructed into image blocks according to the intra prediction operation. The intra prediction component 417 may be similar to the intra estimation component 215 and the intra prediction component 217. In particular, the intra prediction component 417 employs a prediction mode to locate a reference block in the frame and applies a residual block to the result to reconstruct an intra-predicted image block. The reconstructed intra-predicted image blocks and/or residual blocks and corresponding inter-prediction data are sent through an in-loop filter component 425 to a decoded picture buffer component 423, where the decoded picture buffer component 423 and in-loop filter component 425 may be substantially similar to the decoded picture buffer component 223 and in-loop filter component 225, respectively. The in-loop filter component 425 filters the reconstructed image blocks, residual blocks, and/or prediction blocks, and such information is stored in the decoded picture buffer component 423. The reconstructed image blocks of the decoded picture buffer component 423 are sent to the motion compensation component 421 for inter prediction. The motion compensation component 421 may be substantially similar to the motion estimation component 221 and/or the motion compensation component 219. Specifically, the motion compensation component 421 generates a prediction block using a motion vector of a reference block and applies a residual block to the result to reconstruct an image block. The resulting reconstructed blocks may also be sent through the in-loop filter component 425 to the decoded picture buffer component 423. The decoded picture buffer component 423 continues to store additional reconstructed image blocks, which can be reconstructed into frames according to the partition information. Such frames may also be placed in a sequence. The sequence is output to a display as a reconstructed output video signal.
Fig. 5 is a schematic diagram illustrating an exemplary codestream 500 containing an encoded video sequence with an HPS 513. For example, the codestream 500 may be generated by the codec system 200 and/or the encoder 300 for decoding by the codec system 200 and/or the decoder 400. In another example, codestream 500 may be generated by an encoder at step 109 of method 100 for use by a decoder at step 111.
The codestream 500 includes a Sequence Parameter Set (SPS) 510, a plurality of Picture Parameter Sets (PPS) 512, a plurality of HPSs 513, a plurality of slice headers 514, and picture data 520. The SPS 510 contains sequence data common to all of the pictures in the video sequence contained in the codestream 500. Such data may include image size, bit depth, coding tool parameters, code rate limitations, and the like. The PPS 512 contains parameters that apply to an entire picture. Thus, each picture in the video sequence may reference a PPS 512. It should be noted that although each picture refers to a PPS 512, in some examples a single PPS 512 may contain data for multiple pictures. For example, a plurality of similar images may be coded according to similar parameters. In this case, a single PPS 512 may contain data for such similar pictures. A PPS 512 may indicate the coding tools available for the slices in the corresponding picture, quantization parameters, offsets, and so on. The slice header 514 contains parameters specific to each slice in a picture. Thus, there may be one slice header 514 for each slice in the video sequence. The slice header 514 may contain slice type information, Picture Order Count (POC), reference picture lists, prediction weights, partition entry points, deblocking parameters, and the like.
The HPS 513 is a syntax structure containing syntax elements that apply to zero or more slices, as determined by zero or more syntax elements found in the slice headers. Thus, the HPS 513 includes coding tool parameters that relate to multiple slices. The HPS 513 may also be referred to as an adaptive parameter set (APS) in some systems. For example, one or more slices 523 may reference the HPS 513. The decoder may then follow these references to obtain the HPS 513, obtain the coding tool parameters from the HPS 513, and decode the corresponding slices using the coding tool parameters. The HPS 513 conceptually occupies a hierarchical position between the PPS 512 and the slice header 514. For example, some data may be associated with multiple slices, but not with the entire image. Such data may not be stored in the PPS 512 because the data does not relate to the entire picture. However, such data would otherwise have to be contained in multiple slice headers 514. The HPS 513 may carry such data to avoid redundant indications across multiple slice headers 514. The HPS 513 coding structure is introduced in VVC, and there is no similar structure in HEVC or previous coding standards. Various implementations of the HPS 513 are discussed below.
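As an illustration only (not the syntax of any standard), the hierarchical position described above can be pictured as each slice resolving its coding tool parameters from picture-level defaults, the referenced HPS, and slice-specific values, in that order; all names below are hypothetical.

```python
def resolve_tool_params(slice_header, hps_store, pps):
    """Gather coding tool parameters for one slice (illustrative only)."""
    params = dict(pps.get("tool_params", {}))           # picture-level defaults
    hps = hps_store.get(slice_header.get("hps_id"))     # shared multi-slice data
    if hps:
        params.update(hps["tool_params"])
    params.update(slice_header.get("tool_params", {}))  # slice-specific values
    return params

pps = {"tool_params": {"deblocking": {"beta_offset": 0}}}
hps_store = {7: {"tool_params": {"alf": {"num_filters": 4}}}}
slice_hdr = {"hps_id": 7, "tool_params": {"qp_delta": 2}}
print(resolve_tool_params(slice_hdr, hps_store, pps))
```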
Image data 520 includes video data encoded according to inter prediction and/or intra prediction and the corresponding transformed and quantized residual data. For example, a video sequence includes a plurality of images 521 coded as image data. An image 521 is a single frame of the video sequence and is therefore typically displayed as a single unit when the video sequence is displayed. However, portions of an image may be displayed to implement certain technologies, such as virtual reality, picture-in-picture, and so forth. The pictures 521 each refer to a PPS 512. The image 521 is divided into slices 523. A slice 523 may be defined as a horizontal portion of the image 521. For example, a slice 523 may comprise a portion of the height of the image 521 and the full width of the image 521. In some systems, the slice 523 is subdivided into tiles 525. In other systems, the slice 523 is replaced by a tile group containing tiles 525. The slices 523 and/or tile groups of tiles 525 refer to the slice header 514 and/or the HPS 513. A tile 525 may comprise a rectangular portion of the image 521 and/or a portion of the image 521 defined by columns and rows. The tiles 525 are further divided into Coding Tree Units (CTUs). The CTUs are further divided into coding blocks according to a coding tree. The coding blocks may then be encoded/decoded according to prediction mechanisms.
The codestream 500 is coded into VCL NAL units 533 and non-VCL NAL units 531. A NAL unit is a coded data unit sized to fit as the payload of a single packet for transmission over a network. A VCL NAL unit 533 is a NAL unit that contains coded video data. For example, each VCL NAL unit 533 may contain one slice 523 and/or one tile group, including the data of the corresponding tiles 525, CTUs, and/or coding blocks. A non-VCL NAL unit 531 is a NAL unit that contains supporting syntax but does not contain coded video data. For example, a non-VCL NAL unit 531 may contain the SPS 510, a PPS 512, an HPS 513, a slice header 514, and so on. Thus, the decoder receives the codestream 500 as discrete VCL NAL units 533 and non-VCL NAL units 531. An access unit 535 is a set of VCL NAL units 533 and/or non-VCL NAL units 531 that contains the data sufficient to code a single picture 521.
In some examples, the HPS 513 may be implemented as follows. The HPS 513 may be available in-band and/or out-of-band, where an in-band indication is included in the codestream 500 and an out-of-band indication is included in supporting metadata. The HPS 513 is included in a NAL unit, such as a non-VCL NAL unit 531, where the HPS 513 is identified by a NAL unit type. The HPS 513 may contain parameters for coding tools such as, but not limited to, ALF, SAO, deblocking, quantization matrices, inter prediction parameters, reference picture set construction related parameters, and/or reference picture list construction related parameters. The HPS 513 may include a type. The type defines which coding tool parameters are contained in the HPS 513. Each HPS 513 may contain parameters for only one coding tool. The different types of HPSs 513 may be referred to together as a group of parameter sets (GPS). A slice 523 may reference the GPS instead of a single HPS 513. The HPS 513 may be made available to the decoder side before being referenced by the corresponding slices 523 and/or tile groups. Different slices 523 of a coded image 521 may reference different HPSs 513. The HPS 513 may be placed at the boundary of any slice 523 in the codestream 500. This makes it possible for the parameters (e.g., ALF parameters) in the HPS 513 to be reused by all slices 523 that follow the HPS 513, including the remaining slices of the current image 521.
The reference from the slice header 514 to the HPS 513 may be optional. For example, a slice 523 may reference the HPS 513 when either of the following is true. First, the corresponding PPS 512 indicates that the HPS 513 may be utilized for the codestream 500. Second, at least one of the coding tools whose parameters are contained in the HPS 513 is enabled for the codestream 500. Each HPS 513 may be associated with an HPS ID. A slice 523 that references an HPS 513 should include the HPS ID of the referenced HPS 513. The HPS ID may be coded using an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first (e.g., ue(v)). The value of the HPS ID may be limited, for example, to the range of 0 to 63. For each coding tool parameter, a flag may be present in the HPS 513 to indicate whether the parameter is present in the HPS 513. When a coding tool is enabled for a slice 523 and a parameter for the coding tool is present in the HPS 513 referenced by the slice 523, the parameter may not be indicated in the corresponding slice header 514. The HPS 513 may be segmented into one or more NAL units, and each segment of the HPS 513 may be parsed and applied independently. A slice 523 may reference a single HPS 513 or multiple HPSs 513. When references to multiple HPSs 513 are allowed, each reference to an HPS 513 may be employed to obtain the parameters of a particular coding tool.
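A small sketch of the 0-th order Exp-Golomb code (ue(v)) mentioned above for the HPS ID, which is constrained here to the range 0 to 63; the helper names are illustrative.

```python
def ue_encode(value: int) -> str:
    # 0-th order Exp-Golomb: n leading zeros followed by the (n+1)-bit
    # binary representation of value + 1, left bit first.
    bits = bin(value + 1)[2:]
    return "0" * (len(bits) - 1) + bits

def ue_decode(bitstring: str) -> int:
    # Assumes the string holds exactly one ue(v) codeword.
    leading_zeros = len(bitstring) - len(bitstring.lstrip("0"))
    return int(bitstring[leading_zeros:], 2) - 1

for hps_id in (0, 1, 5, 63):          # 63 is the largest allowed HPS ID here
    code = ue_encode(hps_id)
    assert ue_decode(code) == hps_id
    print(hps_id, code)
```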
The following implementation allows an HPS 513 to reference other HPSs 513 and inherit coding tool parameters through such references. The HPS 513 may include one or more references to other HPSs 513. In such examples, when an HPS 513 does not reference any other HPS 513, the HPS 513 may be referred to as an intra HPS. When an HPS 513 references another HPS 513, the referencing HPS 513 copies one or more parameters from the referenced HPS 513. An HPS 513 may have multiple references to other HPSs 513, with one referenced HPS 513 for each set of coding tool parameters. In some examples, a linked list of HPSs 513 may be formed as a series of HPSs 513 connected by the reference mechanism. When a coding tool parameter is specified as absent from the HPS 513 (e.g., the value of the corresponding present flag is equal to 0), another flag may be present to indicate whether the parameter can be inferred from a referenced HPS 513. The reference from an HPS 513 to another HPS 513 may also be specified implicitly, such that when the parameters of a coding tool are not present in the HPS 513, these parameters are inferred to be the same as the coding tool parameters of the previous HPS 513.
In some cases, the HPS 513 may no longer be available when random access is invoked at an Instantaneous Decoder Refresh (IDR) and/or Clean Random Access (CRA) image. Thus, two HPS buffers may be employed to store HPSs 513, where the buffers are alternately activated at the beginning of each Intra Random Access Point (IRAP) image. In this case, a received HPS 513 is stored in the active HPS buffer. To improve error resilience, a range of HPS IDs may be specified to indicate the HPS IDs currently in use. When an HPS 513 has an HPS ID that is outside the range of HPS IDs in use, that HPS ID may be used by other HPSs 513. The range of HPS IDs in use may then be updated in a sliding window manner. When this technique is used, an HPS 513 may refer only to other HPSs 513 whose HPS IDs are within the range in use. A limit on the number of active HPSs 513 may be defined to limit the HPS 513 storage requirements in the decoder memory. When the limit is reached and a new HPS 513 is received, the oldest (e.g., earliest received) HPS 513 in the buffer may be removed and the new HPS 513 inserted.
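A minimal sketch, under assumed names, of the bounded HPS buffer behavior described above: a re-used HPS ID replaces the stored HPS, and when the limit is reached the oldest received HPS is evicted.

```python
from collections import OrderedDict

class HpsBuffer:
    def __init__(self, max_size: int):
        self.max_size = max_size
        self.entries = OrderedDict()   # hps_id -> parameters, oldest first

    def insert(self, hps_id, params):
        if hps_id in self.entries:     # re-used ID replaces the stored HPS
            del self.entries[hps_id]
        elif len(self.entries) >= self.max_size:
            self.entries.popitem(last=False)   # drop the oldest received HPS
        self.entries[hps_id] = params

    def reset(self):
        # e.g., at an IRAP picture, per the reset behavior described above
        self.entries.clear()

buf = HpsBuffer(max_size=2)
buf.insert(1, "ALF params v1")
buf.insert(2, "ALF params v2")
buf.insert(3, "ALF params v3")        # evicts HPS 1
print(list(buf.entries))              # [2, 3]
```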
In some cases, the decoder may resolve all HPS 513 references as soon as it receives an HPS 513. For example, when the decoder receives an HPS 513, coding tool parameters that are not available in the current HPS 513 but are available in the referenced HPS 513 may be copied immediately. Another approach to improving error resilience may require an intra HPS 513 to be present within a specified period. When an intra HPS 513 is received, all available HPSs 513 of the same type in the buffer may be discarded. When the parameters of a coding tool are not present in the HPS 513, a flag may be present to indicate whether the coding tool is disabled for the slices 523 that reference the HPS 513. Further, a flag in the NAL unit header (e.g., nal_ref_flag) may specify whether the HPS 513 contained in the NAL unit can be referenced by slices 523 of pictures 521 used as references. For example, when nal_ref_flag of the NAL unit containing the HPS 513 is equal to 0, the HPS 513 can only be referenced by slices 523 of non-reference pictures 521. The flag may also be used to determine which HPSs 513 may be used as references for other HPSs 513. For example, an HPS 513 may not reference other HPSs 513 contained in NAL units with nal_ref_flag equal to 0. The reset period of the HPS buffers may be specified in the SPS 510. The HPS 513 buffers may be reset when an IRAP image appears.
As can be appreciated from the foregoing implementations, allowing an HPS 513 to reference other HPSs 513 can be quite complex. Thus, in the disclosed example, the HPS 513 may be prevented from referencing other HPSs 513. Instead, multiple types of HPS 513 may be employed. The HPS 513 type indicates the type of coding tool parameters contained in the HPS 513. Such HPS 513 types may include an ALF HPS, an LMCS HPS, and/or a scaling list parameter HPS. An ALF HPS is an HPS that contains coding tool parameters for the adaptive loop filtering of the corresponding slices. The LMCS HPS contains coding tool parameters for the LMCS mechanism. LMCS is a filtering technique that reshapes the luminance component based on a mapping to the corresponding chrominance components to reduce rate distortion. The scaling list parameter HPS contains coding tool parameters related to the quantization matrices used by a specified filter. In such an example, when the decoder obtains a current HPS 513 of a first type, a previous HPS 513 of the first type may be discarded because the current HPS 513 replaces such a previous HPS 513. Furthermore, so that multiple types of HPS 513 can be implemented, a single slice header 514 may reference multiple HPSs 513 in order to reference all the coding tool parameters of the corresponding slice 523 and/or tile group. This is in contrast to other schemes that allow the slice header 514 to reference a single HPS 513, which in turn references other HPSs 513. Thus, making multiple HPSs 513 available for reference by a single slice header 514 obviates HPS 513 reference chains. This approach significantly reduces complexity and therefore reduces the processing resource usage of the encoder and decoder. Further, the approach reduces the number of HPSs 513 buffered on the decoder side. For example, only one HPS 513 of each type may be buffered on the decoder side. This reduces memory usage at the decoder. Furthermore, by avoiding HPS 513 reference chains, potential errors are localized and thus mitigated. This is because losing an HPS 513 due to a transmission error may only affect the slices 523 that directly reference that HPS 513. Thus, the disclosed mechanisms yield improvements at both the encoder and decoder sides when the HPS 513 is employed in the codestream 500.
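The per-type behavior described above can be sketched as follows (hypothetical names): only the most recently received HPS of each type is kept, and a slice header carries one reference per enabled coding tool rather than chaining HPSs.

```python
# One buffered HPS per type: a newly received HPS replaces the previous
# HPS of the same type, so no HPS ever needs to reference another HPS.
hps_by_type = {}

def receive_hps(hps_type, hps_id, params):
    hps_by_type[hps_type] = {"id": hps_id, "params": params}

def decode_slice(slice_header):
    # The slice header references one HPS per enabled coding tool.
    tools = {}
    for hps_type, ref_id in slice_header["hps_refs"].items():
        stored = hps_by_type[hps_type]
        assert stored["id"] == ref_id, "referenced HPS must be the buffered one"
        tools[hps_type] = stored["params"]
    return tools

receive_hps("ALF", 3, {"num_filters": 4})
receive_hps("LMCS", 0, {"reshaper_model": "..."})
receive_hps("SCALING_LIST", 1, {"matrices": "..."})
print(decode_slice({"hps_refs": {"ALF": 3, "LMCS": 0, "SCALING_LIST": 1}}))
```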
Fig. 6 is a schematic diagram of an example mechanism 600 for time scaling. For example, a decoder (e.g., codec system 200 and/or decoder 400) may employ mechanism 600 when displaying a decoded codestream (e.g., codestream 500). Further, mechanism 600 may be used as part of step 117 of method 100 when outputting video for display. In addition, an encoder (e.g., codec system 200 and/or encoder 300) may encode data in a codestream, such that mechanism 600 may be employed at the decoder side as well.
The mechanism 600 operates on a plurality of decoded pictures 601, 603, and 605. The images 601, 603, and 605 are part of an ordered video sequence and have been decoded from the codestream by employing the mechanisms described above. The codestream is encoded such that the decoder can display the video sequence at one of a plurality of frame rates, including a first frame rate (FR0) 610, a second frame rate (FR1) 611, and a third frame rate (FR2) 612. The frame rate is a measure of the frequency at which frames/images of a video sequence are displayed, and may be measured in frames per unit of time. By using different frame rates, different decoders can display the same video sequence at different levels of quality to account for differences in decoder capabilities. For example, a decoder with lower hardware capabilities and/or a decoder streaming over a lower-quality network connection may display at FR0 610. In another example, a high-quality decoder with access to a fast network connection may display at FR2 612. In yet another example, a decoder with somewhat constrained capabilities may display at FR1 611, but not at FR2 612. Thus, time scaling (e.g., mechanism 600) is employed so that each decoder can display the video at the highest frame rate possible according to the different decoding-side capabilities and constraints to achieve the best user experience. In most systems, each frame rate is twice the frequency of the previous frame rate. For example, FR0 610, FR1 611, and FR2 612 may be set to 15 Frames Per Second (FPS), 30 FPS, and 60 FPS, respectively.
To achieve time scaling, the encoder encodes the images 601, 603, and 605 into the codestream at the highest frame rate possible (FR2 612 in this case). The encoder also assigns a Temporal Identifier (TID) to each picture 601, 603, and 605. Images 601, 603, and 605 receive TIDs of 0, 1, and 2, respectively. When the decoder displays the decoded video, a frame rate is selected, the TID corresponding to that frame rate is determined, and all frames with a TID equal to or less than the frame rate TID are displayed. Images with a TID greater than the TID of the selected frame rate are ignored. For example, a decoder selecting FR2 612 displays all pictures having a TID of 2 or less, and thus displays pictures 601, 603, and 605. In another example, a decoder selecting FR1 611 displays all pictures with a TID of 1 or less, and thus displays pictures 601 and 603 and ignores picture 605. In another example, a decoder selecting FR0 610 displays all pictures with a TID of 0 or less and thus displays picture 601, ignoring pictures 603 and 605. By employing the mechanism 600, a video sequence can be time scaled by the decoder to a selected frame rate.
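A minimal sketch of the temporal ID filtering in mechanism 600, where only pictures whose TID does not exceed the TID of the selected frame rate are displayed; the data structures are illustrative.

```python
def select_pictures(pictures, target_tid):
    # Keep pictures whose temporal ID is less than or equal to the TID
    # associated with the chosen frame rate; drop the rest.
    return [p for p in pictures if p["tid"] <= target_tid]

pictures = [{"poc": 0, "tid": 0}, {"poc": 1, "tid": 2},
            {"poc": 2, "tid": 1}, {"poc": 3, "tid": 2}]

print(select_pictures(pictures, target_tid=0))  # FR0: TID 0 only (e.g., 15 FPS)
print(select_pictures(pictures, target_tid=1))  # FR1: TID 0 and 1 (e.g., 30 FPS)
print(select_pictures(pictures, target_tid=2))  # FR2: all pictures (e.g., 60 FPS)
```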
The HPS 513 described in fig. 5 may be implemented to support the time scaling of mechanism 600. This may be done by allocating a TID, such as TID 0, TID 1, or TID 2, to each HPS. When time scaling is performed, HPSs with a TID less than or equal to the selected frame rate TID are decoded, and HPSs with a TID greater than the selected frame rate TID are discarded. A TID may be allocated to an HPS according to various embodiments.
Referring to fig. 5, in one example, the HPS 513 may receive the time ID of the image 521 containing the first slice 523 that references the HPS 513. In other examples, the HPS 513 may receive the time ID of the access unit 535 that contains the HPS 513. To further support time scaling, slices 523 associated with a smaller time ID may be prevented from referencing an HPS 513 with a larger time ID. This ensures that lower frame rate settings do not cause slices 523 to reference an HPS 513 that is discarded due to the time scaling of mechanism 600, and thus prevents coding tool parameters from being unavailable when decoding certain slices 523 at lower frame rates, such as FR0 610 and/or FR1 611.
The foregoing mechanism may be implemented as follows. The following aspects may be applied alone and/or in combination. The HPS is also called an Adaptive Parameter Set (APS). The availability of HPSs for the codestream may be all in-band, all out-of-band, and/or some in-band and some out-of-band. When provided out-of-band, the HPS may be present as follows. In an ISO-based media file format, the HPS may be present in a sample entry (e.g., a sample description box). In an ISO-based media file format, the HPS may be present in a time-synchronized track, such as a parameter set track or a timed metadata track.
In one particular implementation, when provided out-of-band, the HPS may be carried as follows. In the ISO-based media file format, when there is no HPS update, the HPS may only be present in the sample entry (e.g., sample description box). An HPS update occurs when an HPS Identifier (ID) is reused while the other HPS parameters differ from a previously sent HPS with the same HPS ID. When there is an HPS update, for example when the HPS contains Adaptive Loop Filter (ALF) parameters, the HPS may be carried, based on the International Organization for Standardization (ISO) media file format, in a time-synchronized track, such as a parameter set track or a timed metadata track. In this way, slices, each containing a complete set of tiles, may be carried in their own file format tracks. Further, the HPS may be carried in a time-synchronized track. Thus, these tracks can each be carried in the form of a Dynamic Adaptive Streaming over Hypertext Transfer Protocol (DASH) representation. For decoding and rendering of a subset of the slice/tile tracks, the client may request, in a segment-by-segment manner, the DASH representations containing the subset of slice/tile tracks and the DASH representation containing the HPS.
In another example, the HPS may be specified as always provided in-band. The HPS may also be carried in a time-synchronized track, such as a parameter set track or a timed metadata track. Thus, the HPS may be carried as described above. Furthermore, in the file format specification for the video codec, the bitstream reconstruction process may construct an output bitstream from a subset of the slice/tile tracks and the time-synchronized track containing the HPS, such that the HPS is part of the output bitstream.
The HPS should be present and/or available to the decoder before the first slice that references the HPS, in decoding order. For example, if the HPS is available in-band, the HPS may precede, in decoding order, the first slice that references the HPS. Otherwise, the HPS decoding time should be equal to or less than the decoding time of the first slice that references the HPS.
The HPS may include the ID of a sequence level parameter set, such as the SPS. When an SPS ID is not present in the HPS, the following constraints may apply. When the first slice that references the HPS is part of an Intra Random Access Point (IRAP) image and the HPS is carried in-band, the HPS may be present in the IRAP access unit. When the first slice that references the HPS is part of an IRAP picture and the HPS is carried out-of-band, the decoding time of the HPS may be the same as the decoding time of the IRAP picture. When the first slice that references the HPS is not part of an IRAP picture and the HPS is carried in-band, the HPS may be present in one of the access units between, inclusive, the IRAP access unit that starts the coded video sequence and the access unit that contains the slice. When the first slice that references the HPS is not part of an IRAP picture and the HPS is carried out-of-band, the decoding time of the HPS may be between, inclusive, the decoding time of the IRAP access unit that starts the coded video sequence and the decoding time of the access unit that contains the slice.
When the SPS ID is present in the HPS, the following may apply. If provided in-band, the HPS may be present at the beginning of the codestream or anywhere in the coded video sequence, as long as the HPS precedes, in decoding order, the first slice that references the HPS. Otherwise, the HPS may be present in the sample entry or in the time-synchronized track, as long as the HPS decoding time is less than the decoding time of the first slice that references the HPS.
For slice references to the HPS, the following may apply. If the SPS ID is present in the HPS, each slice and the HPS to which the slice refers may refer to the same SPS. Otherwise, the slice may not refer to an HPS present in an access unit preceding the IRAP access unit associated with the slice. The slice may also be restricted from referencing an HPS whose decoding time is less than the decoding time of the IRAP access unit associated with the slice.
As an alternative to using a flag to indicate the presence of a coding tool parameter in the HPS, a two-bit indicator (e.g., coded as u(2)) may be used. The semantics of the indicator are defined as follows. One value of the indicator (e.g., a value of 0) indicates that the parameter is not present in the HPS and that there is no reference to another HPS for deriving the parameter. Another value of the indicator (e.g., a value of 1) indicates that the parameter is not present in the HPS and that there is a reference to another HPS for deriving the parameter. Another value of the indicator (e.g., a value of 2) indicates that the parameter is present in the HPS and that there is no reference to another HPS. The remaining value of the indicator may be reserved. In another example, the remaining value of the indicator (e.g., a value of 3) indicates that the parameter is present in the HPS and that there is also a reference to another HPS for deriving the parameter. In this case, the final parameters are derived from the parameters explicitly indicated in the HPS together with the parameters present in the referenced HPS.
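The two-bit indicator semantics above can be summarized with a small lookup; value 3 is treated as reserved here, per the first variant, and the names are illustrative.

```python
def interpret_indicator(idc: int):
    """Map the u(2) indicator to (parameter_present, reference_present)."""
    if idc == 0:
        return (False, False)   # not in this HPS, and no reference HPS either
    if idc == 1:
        return (False, True)    # not in this HPS, derived from a referenced HPS
    if idc == 2:
        return (True, False)    # present in this HPS, no reference used
    raise ValueError("value 3 is reserved in this variant")

for idc in range(3):
    print(idc, interpret_indicator(idc))
```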
When a coding tool is indicated as disabled for the coded video sequence (e.g., an enable flag in the sequence parameter set indicates that the coding tool is disabled), the following constraints may be applied alone or in combination. The flag or indicator of the presence of the coding tool parameter, and the coding tool parameter itself, may not be present in any HPS associated with the coded video sequence. Alternatively, the flag or indicator of the presence of the coding tool parameter may be present, but constrained such that its value indicates that the coding tool parameter is not present and that there is no reference to an HPS used to derive and/or infer the parameter.
When a slice refers to an HPS that may include parameters of a coding tool, the following constraints may be applied separately or in combination. When the coding tool is enabled for the slice and the parameters of the coding tool are available in the HPS, the parameters of the coding tool may not be present in the slice header. This may occur when the coding tool parameters are directly indicated and/or present in the HPS or are available through a referenced HPS. When the coding tool is enabled for the slice and the parameters of the coding tool are not available in the HPS, the parameters of the coding tool may be present in the slice header. This may occur when the coding tool parameters are not directly indicated and/or not present in the HPS and are not available through a referenced HPS. When the coding tool is enabled for the slice and the parameters of the coding tool are available in the HPS, the parameters of the coding tool may nevertheless be present in the slice header. This may occur when the coding tool parameters are directly indicated and/or present in the HPS or are available through a referenced HPS. In this case, the parameters used to invoke the coding tool during decoding of the slice are the parameters present in the slice header.
The following may apply to in-band and out-of-band transmission of the HPS. If the HPS is carried in-band, the HPS may not refer to another HPS carried out-of-band. Otherwise, the HPS may not refer to another HPS carried in-band.
The following constraints may be applied alone or in combination when transmitting the HPS in-band and out-of-band. The HPS carried out-of-band may be carried only on the time synchronization track. The HPS can be carried out-of-band only when there is an HPS update.
As an alternative to coding the HPS ID as an unsigned integer 0-th order Exp-Golomb-coded syntax element with the left bit first (ue(v)), the HPS ID may be coded as u(v). The number of bits used to indicate the HPS ID may be specified in the SPS.
Two HPSs in the same coded video sequence may have the same HPS ID, in which case the following may apply. When HPS A and HPS B have the same HPS ID, HPS B follows HPS A in decoding order, and the SPS contains an ID that references the HPS ID, HPS B replaces HPS A. When HPS A and HPS B have the same HPS ID, the decoding time of HPS B is greater than that of HPS A, and the SPS contains an ID that references the HPS ID, HPS B replaces HPS A. Let HPS A, HPS B, HPS C, and HPS D be HPSs included in the same coded video sequence, either by having the same SPS ID (e.g., when the SPS ID is present in the HPS) or by the association of the access units that contain the HPSs. When the HPS IDs of HPS A and HPS D are the same, and the HPS IDs of HPS A, HPS B, and HPS C are unique, the values of the HPS IDs of HPS A, HPS B, and HPS C may be constrained to monotonically increase. There may be a flag in the SPS to indicate whether an HPS can reference another HPS.
When references between HPSs are not allowed, one slice header may have multiple references to the same HPS or multiple references to different HPSs. In this case, the following may apply. An HPS reference may be present for each coding tool enabled for the slice, and the parameters of the coding tool may be inferred from the HPS. When a slice references an HPS to infer the parameters of a coding tool, the parameters of the coding tool should be present in that HPS.
When a coding tool parameter is not present in the HPS and there is a reference to another HPS for that parameter, the parameter should be present in the referenced HPS. The HPS may not reference another HPS from a different coded video sequence. If the SPS ID is present in the HPS, the SPS ID values of the current HPS and the corresponding referenced HPS should be the same. Otherwise, the HPS may not refer to another HPS present in an access unit that precedes the last IRAP access unit preceding the HPS in decoding order. The HPS may also not refer to another HPS whose decoding time is less than the decoding time of the last IRAP access unit preceding the HPS in decoding order, that is, the IRAP access unit whose decoding time is closest to and less than the decoding time of the HPS.
When one HPS references another HPS, the HPS ID of the referenced HPS should be less than the HPS ID of the referencing HPS. When the SPS ID is present in any HPS, HPS A, HPS B, HPS C, and HPS D may all have the same SPS ID. When HPS B follows HPS A in decoding order, HPS C follows HPS B in decoding order, and HPS D follows HPS C in decoding order, the following constraints may be applied alone or in combination. When HPS C references HPS A, the HPS ID of HPS B may not be the same as the HPS ID of HPS A. When HPS B references HPS A and HPS C has the same HPS ID as HPS A, HPS D may not reference HPS A or HPS B. When HPS B and HPS A have the same HPS ID, there may be no slice referencing HPS A that follows HPS B in decoding order. When HPS B references HPS A and HPS C has the same HPS ID as HPS A, there may be no slice referencing either HPS A or HPS B that follows HPS C in decoding order.
When time scalability is used, the time ID of the HPS can be specified as follows. The time ID of the HPS may be set to be the same as the time ID of the access unit containing the HPS. In one example, the time ID of the HPS may be set to be the same as the time ID of the image containing the first slice that references the HPS.
Slices in an image with time ID (Tid) A may not reference an HPS with Tid B, where Tid B is greater than Tid A. An HPS with Tid A may not reference an HPS with Tid B, where Tid B is greater than Tid A. An HPS with Tid A cannot replace an HPS with Tid B, where Tid A is greater than Tid B.
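A small conformance-style check, under assumed data structures, for the temporal ID constraints listed above.

```python
def check_tid_constraints(slice_tid, hps_tid, referenced_hps_tid=None):
    """Return a list of violated constraints (illustrative only)."""
    problems = []
    if hps_tid > slice_tid:
        problems.append("slice references an HPS with a larger temporal ID")
    if referenced_hps_tid is not None and referenced_hps_tid > hps_tid:
        problems.append("HPS references an HPS with a larger temporal ID")
    return problems

print(check_tid_constraints(slice_tid=1, hps_tid=2))        # one violation
print(check_tid_constraints(slice_tid=2, hps_tid=1,
                            referenced_hps_tid=0))           # conforming
```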
There may be a flag in a sequence level parameter set (e.g., the SPS) to indicate whether slices refer to the HPS. When the value of the flag is set to 1, it may indicate that slices may reference the HPS and that the HPS ID is present in the slice header. When the value of the flag is set to 0, it may indicate that slices do not refer to the HPS and that the HPS ID is not present in the slice header.
The above aspects may be implemented according to the following syntax.
[Syntax table, shown as an image in the original publication.]
hps_present_flag may be set to 1 to indicate that hps_id is present in the slice header. hps_present_flag may be set to 0 to indicate that hps_id is not present in the slice header.
[Syntax table, shown as an image in the original publication.]
header_parameter_set_id identifies the HPS for reference by other syntax elements. The value of header_parameter_set_id may be between 0 and 63, inclusive. hps_seq_parameter_set_id represents the value of sps_seq_parameter_set_id of the active SPS. The value of hps_seq_parameter_set_id may be between 0 and 15, inclusive. alf_parameters_idc[header_parameter_set_id] may be set to 2 to indicate that alf_data() is present in the HPS. alf_parameters_idc[header_parameter_set_id] may be set to 1 to indicate that alf_data() is not present in the HPS, but is inferred to be the same as the alf_data() present in the referenced HPS, which is indicated by alf_ref_hps_id[header_parameter_set_id]. alf_parameters_idc[header_parameter_set_id] may be set to 0 to indicate that neither alf_data() nor alf_ref_hps_id[header_parameter_set_id] is present in the HPS. A value of alf_parameters_idc[header_parameter_set_id] equal to 3 may be reserved. alf_ref_hps_id[header_parameter_set_id] represents the header_parameter_set_id of the referenced HPS from which the value of alf_data() is inferred.
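The semantics above can be read as the parsing logic sketched below. The syntax table itself appears only as an image in the original publication, so the field order, the two-bit coding of alf_parameters_idc, and the reader interface are assumptions for illustration.

```python
class StubReader:
    """Hypothetical bit reader; real parsing would consume actual bits."""
    def __init__(self, values):
        self.values = list(values)
    def read_ue(self):        # value of a ue(v) element
        return self.values.pop(0)
    def read_u(self, n):      # value of a u(n) element
        return self.values.pop(0)
    def read_alf_data(self):  # stand-in for the alf_data() payload
        return {"num_filters": 4}

def parse_hps(reader):
    """Illustrative parse of the HPS fields described above."""
    hps = {"header_parameter_set_id": reader.read_ue(),   # 0..63
           "hps_seq_parameter_set_id": reader.read_ue()}  # references the SPS
    idc = reader.read_u(2)                                 # alf_parameters_idc
    hps["alf_parameters_idc"] = idc
    if idc == 2:
        hps["alf_data"] = reader.read_alf_data()           # parameters carried in place
    elif idc == 1:
        hps["alf_ref_hps_id"] = reader.read_ue()           # inherit from that HPS
    return hps

print(parse_hps(StubReader([7, 0, 2])))  # HPS 7 for SPS 0, carrying alf_data()
```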
The codestream consistency check example may require the following constraints to be applied. When present, the value of alf_ref_hps_id[header_parameter_set_id] may be less than the value of header_parameter_set_id. The values of hps_seq_parameter_set_id in the current HPS and in the HPS indicated by alf_ref_hps_id[header_parameter_set_id] may be the same. The value of alf_parameters_idc[alf_ref_hps_id[header_parameter_set_id]] may be equal to 2.
Let HPS A be the HPS with header_parameter_set_id equal to hpsA, HPS B be the HPS with header_parameter_set_id equal to hpsB, HPS C be the HPS with header_parameter_set_id equal to hpsC, and consider the current HPS. The value of alf_ref_hps_id[header_parameter_set_id] may not be equal to hpsA or hpsB when the following conditions are true. These conditions are that HPS A precedes HPS B in decoding order, HPS B precedes HPS C in decoding order, and HPS C precedes the current HPS in decoding order. These conditions also include that the value of alf_ref_hps_id[hpsB] is equal to hpsA and that the value of hpsC is equal to hpsA.
[Syntax table, shown as an image in the original publication.]
slice_hps_id represents the header_parameter_set_id of the HPS referenced by the slice. When slice_hps_id is not present, alf_parameters_idc[slice_hps_id] is inferred to be equal to 0. The codestream consistency check example may require the following constraints to be applied. Before the slice header is parsed, an HPS with header_parameter_set_id equal to slice_hps_id should be available. When the HPS with header_parameter_set_id equal to slice_hps_id is available in-band, it shall be present in one of the following access units: the IRAP access unit associated with the picture of the current slice, any access unit following that IRAP access unit but preceding the current access unit in decoding order, or the current access unit. Let HPS A be the HPS with header_parameter_set_id equal to hpsA, HPS B be the HPS with header_parameter_set_id equal to hpsB, HPS C be the HPS with header_parameter_set_id equal to hpsC, and consider the current slice. The value of slice_hps_id should not be equal to hpsA or hpsB when the two following sets of conditions are true. The conditions include: HPS A precedes HPS B in decoding order, HPS B precedes HPS C in decoding order, and HPS C precedes the current slice in decoding order. The conditions further include: the value of alf_ref_hps_id[hpsB] is equal to hpsA, and the value of hpsC is equal to hpsA.
Another example implementation is described below.
(Syntax table — embedded figure not reproduced in this text.)
alf_parameters_idc[header_parameter_set_id] may be set to 2 to indicate that alf_data() is present in the HPS. alf_parameters_idc[header_parameter_set_id] may be set to 1 to indicate that alf_data() is not present in the HPS but is inferred to be the same as the alf_data() present in the reference HPS, which is indicated by alf_ref_hps_id[header_parameter_set_id]. alf_parameters_idc[header_parameter_set_id] may be set to 0 to indicate that neither alf_data() nor alf_ref_hps_id[header_parameter_set_id] is present in the HPS. When not present, alf_parameters_idc[header_parameter_set_id] is inferred to be equal to 0. A value of alf_parameters_idc[header_parameter_set_id] equal to 3 may be reserved.
Another example implementation is included below.
A slice header may reference multiple HPSs. A slice of a picture may refer to an HPS carrying ALF parameters (e.g., pic_alf_aps_id_luma[i]), an HPS carrying LMCS parameters (e.g., pic_lmcs_aps_id), and an HPS carrying scaling list parameters (e.g., pic_scaling_list_aps_id). pic_alf_aps_id_luma[i] denotes the adaptation_parameter_set_id of the i-th ALF HPS referred to by the luma component of the slices associated with the PH. slice_alf_aps_id_luma[i] denotes the adaptation_parameter_set_id of the i-th ALF HPS referred to by the luma component of the slice. The TemporalId of an HPS NAL unit with aps_params_type equal to ALF_APS and adaptation_parameter_set_id equal to slice_alf_aps_id_luma[i] should be less than or equal to the TemporalId of the coded slice NAL unit. When slice_alf_enabled_flag is equal to 1 and slice_alf_aps_id_luma[i] is not present, the value of slice_alf_aps_id_luma[i] is inferred to be equal to the value of pic_alf_aps_id_luma[i]. pic_lmcs_aps_id denotes the adaptation_parameter_set_id of the LMCS HPS referred to by the slices associated with the PH. The TemporalId of the HPS NAL unit with aps_params_type equal to LMCS_APS and adaptation_parameter_set_id equal to pic_lmcs_aps_id should be less than or equal to the TemporalId of the picture associated with the PH. pic_scaling_list_aps_id denotes the adaptation_parameter_set_id of the scaling list HPS. The TemporalId of the HPS NAL unit with aps_params_type equal to SCALING_APS and adaptation_parameter_set_id equal to pic_scaling_list_aps_id should be less than or equal to the TemporalId of the picture associated with the PH.
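The TemporalId constraints on the three referenced parameter-set types can be summarized as a single check, sketched below. The hps_tid lookup keyed by (aps_params_type, adaptation_parameter_set_id) is an assumed data structure for illustration only.

```python
from typing import Dict, List, Tuple

def check_referenced_hps_temporal_ids(slice_tid: int,
                                      picture_tid: int,
                                      hps_tid: Dict[Tuple[str, int], int],
                                      slice_alf_aps_id_luma: List[int],
                                      pic_lmcs_aps_id: int,
                                      pic_scaling_list_aps_id: int) -> bool:
    """hps_tid maps (aps_params_type, adaptation_parameter_set_id) to the
    TemporalId of the corresponding HPS NAL unit."""
    # Each referenced ALF HPS must not come from a higher sub-layer than the
    # coded slice NAL unit itself.
    alf_ok = all(hps_tid[("ALF_APS", i)] <= slice_tid
                 for i in slice_alf_aps_id_luma)
    # The LMCS and scaling list HPSs must not come from a higher sub-layer than
    # the picture associated with the PH.
    lmcs_ok = hps_tid[("LMCS_APS", pic_lmcs_aps_id)] <= picture_tid
    scaling_ok = hps_tid[("SCALING_APS", pic_scaling_list_aps_id)] <= picture_tid
    return alf_ok and lmcs_ok and scaling_ok
```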
The TemporalId of an HPS NAL unit should be the same as that of the Access Unit (AU) containing the HPS. The TemporalId of an HPS NAL unit should be less than or equal to the TemporalId of the coded slice NAL units that reference the HPS. In a specific example, the value of TemporalId for non-VCL NAL units is constrained as follows: if nal_unit_type is equal to DPS_NUT, VPS_NUT, or SPS_NUT, the TemporalId should be equal to 0 and the TemporalId of the AU containing the NAL unit should be equal to 0. Otherwise, if nal_unit_type is equal to PH_NUT, the TemporalId should be equal to the TemporalId of the PU containing the NAL unit. Otherwise, if nal_unit_type is equal to EOS_NUT or EOB_NUT, the TemporalId should be equal to 0. Otherwise, if nal_unit_type is equal to AUD_NUT, FD_NUT, PREFIX_SEI_NUT, or SUFFIX_SEI_NUT, the TemporalId should be equal to the TemporalId of the AU containing the NAL unit. Otherwise, when nal_unit_type is equal to PPS_NUT, PREFIX_APS_NUT, or SUFFIX_APS_NUT, the TemporalId should be greater than or equal to the TemporalId of the PU containing the NAL unit. When the NAL unit is a non-VCL NAL unit, the value of TemporalId is equal to the minimum of the TemporalId values of all AUs to which the non-VCL NAL unit applies. When nal_unit_type is equal to PPS_NUT, PREFIX_APS_NUT, or SUFFIX_APS_NUT, the TemporalId may be greater than or equal to the TemporalId of the containing AU, since all PPSs and HPSs may be included at the start of the codestream (e.g., when they are transmitted out-of-band and the receiver places them at the start of the codestream), where the TemporalId of the first decoded picture is equal to 0.
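The per-nal_unit_type rule above maps naturally onto a small validation routine, sketched below under the assumption that the TemporalId of the containing AU and PU are already known; the function name and argument layout are illustrative, not normative.

```python
def non_vcl_temporal_id_ok(nal_unit_type: str, tid: int,
                           au_tid: int, pu_tid: int) -> bool:
    """Check the example TemporalId constraints for a non-VCL NAL unit."""
    if nal_unit_type in ("DPS_NUT", "VPS_NUT", "SPS_NUT"):
        return tid == 0 and au_tid == 0
    if nal_unit_type == "PH_NUT":
        return tid == pu_tid
    if nal_unit_type in ("EOS_NUT", "EOB_NUT"):
        return tid == 0
    if nal_unit_type in ("AUD_NUT", "FD_NUT", "PREFIX_SEI_NUT", "SUFFIX_SEI_NUT"):
        return tid == au_tid
    if nal_unit_type in ("PPS_NUT", "PREFIX_APS_NUT", "SUFFIX_APS_NUT"):
        # These may appear earlier than the pictures that use them (e.g., when
        # delivered out-of-band and placed at the start of the codestream).
        return tid >= pu_tid
    return True  # other non-VCL types are not constrained by this example
```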
Fig. 7 is a schematic diagram of an exemplary video coding device 700. The video coding device 700 is suitable for implementing the disclosed examples/embodiments described herein. The video coding device 700 comprises a downstream port 720, an upstream port 750, and/or a transceiver unit (Tx/Rx) 710 including a transmitter and/or a receiver for transmitting data upstream and/or downstream over a network. The video coding device 700 also includes: a processor 730 including a logic unit and/or a central processing unit (CPU) for processing data; and a memory 732 for storing data. The video coding device 700 may also include electrical, optical-to-electrical (OE), electrical-to-optical (EO), and/or wireless communication components coupled to the upstream port 750 and/or the downstream port 720 for data communication over an electrical, optical, or wireless communication network. The video coding device 700 may also include an input/output (I/O) device 760 for communicating data to and from a user. The I/O device 760 may include output devices such as a display for displaying video data, speakers for outputting audio data, and the like. The I/O device 760 may also include input devices such as a keyboard, mouse, trackball, etc., and/or corresponding interfaces for interacting with such output devices.
The processor 730 is implemented by hardware and software. The processor 730 may be implemented as one or more CPU chips, cores (e.g., a multi-core processor), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), and digital signal processors (DSPs). The processor 730 is in communication with the downstream port 720, Tx/Rx 710, upstream port 750, and memory 732. The processor 730 includes a coding module 714. The coding module 714 implements the disclosed embodiments described above, such as methods 100, 800, and 900, which may employ the codestream 500 and/or the mechanism 600. The coding module 714 may also implement any other methods/mechanisms described herein. In addition, the coding module 714 may implement the codec system 200, the encoder 300, and/or the decoder 400. For example, the coding module 714 may encode/decode images in the codestream and encode/decode parameters related to slices of images in multiple HPSs. Various types of HPSs may include corresponding types of coding tool parameters. A slice header may then reference various types of HPSs to obtain the coding tool parameters for the corresponding slice. The HPSs may also be assigned temporal IDs to operate with a temporal scaling algorithm. Using HPSs allows coding tool parameters used by multiple slices to be aggregated in a single location (e.g., together with additional HPSs when parameters change). Therefore, redundant signaling is removed, coding efficiency is improved, memory resource usage when storing the codestream is reduced, and network resource usage when transmitting the codestream is reduced. Accordingly, the coding module 714 enables the video coding device 700 to provide additional functionality and/or coding efficiency when coding video data. The coding module 714 thereby improves the functionality of the video coding device 700 and addresses problems specific to the field of video coding. In addition, the coding module 714 may transform the video coding device 700 to a different state. Alternatively, the coding module 714 may be implemented as instructions (e.g., a computer program product stored on a non-transitory medium) stored in the memory 732 and executed by the processor 730.
The memory 732 may include one or more types of memory, such as a magnetic disk, a magnetic tape drive, a solid state drive, a read-only memory (ROM), a random access memory (RAM), a flash memory, a ternary content-addressable memory (TCAM), a static random-access memory (SRAM), and so forth. The memory 732 may be used as an overflow data storage device to store programs when such programs are selected for execution, and to store instructions and data that are read during program execution.
Fig. 8 is a flow diagram of an example method 800 for encoding a video sequence into a codestream, such as codestream 500, using HPSs. An encoder (e.g., codec system 200, encoder 300, and/or video coding device 700) may employ method 800 when performing method 100. Method 800 may also encode the codestream so that a decoder (e.g., decoder 400) can support temporal scaling according to mechanism 600.
Method 800 may begin when an encoder receives a video sequence comprising a plurality of images and determines to encode the video sequence into a codestream, for example, based on user input. The video sequence is segmented into picture/image frames for further partitioning prior to encoding. In step 801, the plurality of images is segmented into a plurality of slices including a first slice.
In step 803, the plurality of slices, including the first slice, is encoded into the codestream. A slice may be encoded according to a plurality of coding tool parameters. In some examples, the slice is encoded by at least a first type of coding tool and a second type of coding tool. In particular, the slice is encoded by the first type of coding tool according to first type coding tool parameters. The slice is also encoded by the second type of coding tool based on second type coding tool parameters. Such coding tools may include, for example, an ALF coding tool, an LMCS coding tool, and/or a scaling list parameter coding tool.
In step 805, a plurality of HPSs are encoded into a codestream. The plurality of HPSs may include at least a first HPS and a second HPS. The first HPS comprises first type coding tool parameters and the second HPS comprises second type coding tool parameters, which are used in step 803 for encoding the slice.
In step 807, a first slice header is encoded into the codestream. The first slice header describes the encoding of the first slice of the plurality of slices to support decoding at the decoder side. For example, the first slice header may contain a first reference to the first HPS and a second reference to the second HPS. In this way, the slice header can obtain coding tool parameters relayed from a plurality of HPSs of different types. This allows such coding tool parameters to be omitted from the slice header, which improves the coding efficiency of the codestream by reducing redundant coding tool parameter signaling. In a particular example, the first HPS and the second HPS may include an ALF HPS, an LMCS HPS, a scaling list parameter HPS, or a combination thereof.
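As an illustration of how one slice header can pull parameters from several HPS types at once, a hypothetical resolution step might look like the sketch below. The SliceHeader fields and the lookup keys are assumptions for this sketch, not syntax defined by this disclosure.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class SliceHeader:
    slice_alf_hps_id: int    # first reference: an ALF-type HPS
    slice_lmcs_hps_id: int   # second reference: an LMCS-type HPS

def resolve_tool_parameters(header: SliceHeader,
                            hps_store: Dict[Tuple[str, int], dict]) -> Tuple[dict, dict]:
    """Fetch the coding tool parameters the slice needs from the referenced HPSs."""
    alf_params = hps_store[("ALF", header.slice_alf_hps_id)]
    lmcs_params = hps_store[("LMCS", header.slice_lmcs_hps_id)]
    return alf_params, lmcs_params

# Usage: the slice header carries only two small identifiers; the actual
# parameter payloads live in the HPSs and can be shared by many slices.
store = {("ALF", 3): {"filters": "..."}, ("LMCS", 1): {"mapping": "..."}}
alf, lmcs = resolve_tool_parameters(SliceHeader(3, 1), store)
```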
In order to reduce the complexity of HPS-related coding methods, an HPS may be prevented from referring to coding tool parameters stored in other HPSs. Accordingly, the plurality of HPSs (including the first HPS and the second HPS) encoded in step 805 are prevented from referencing coding tool parameters from other HPSs of the plurality of HPSs. Further, the HPSs may be encoded to support temporal scaling by including a temporal ID in each HPS. In one example, the first HPS is included in an access unit associated with a temporal ID. Further, the first HPS contains the same temporal ID associated with the access unit containing the first HPS. In another example, the first slice is segmented from a first image, and the first image is associated with a temporal ID. In this example, the first HPS contains the temporal ID associated with the image. Further, to support temporal scaling, each HPS of the plurality of HPSs and each of the slices is associated with one of a plurality of temporal IDs. Further, each slice having a first temporal ID is prevented from referencing any HPS having a second temporal ID that is greater than (e.g., associated with a higher frame rate than) the first temporal ID. This ensures that slices associated with a lower frame rate do not reference HPSs associated with a higher frame rate, since such HPSs are discarded when the lower frame rate is employed according to the temporal scaling mechanism.
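A minimal sketch of the temporal-ID referencing restriction described in this step is given below; the argument names are illustrative.

```python
def hps_reference_allowed(slice_temporal_id: int, hps_temporal_id: int) -> bool:
    """A slice may only reference an HPS from the same or a lower sub-layer."""
    return hps_temporal_id <= slice_temporal_id

# A slice in the base sub-layer (temporal ID 0) cannot reference an HPS carried
# in a higher sub-layer (temporal ID 1), because that HPS is dropped when the
# stream is reduced to the lower frame rate.
assert hps_reference_allowed(slice_temporal_id=0, hps_temporal_id=0)
assert not hps_reference_allowed(slice_temporal_id=0, hps_temporal_id=1)
```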
In step 809, the codestream is stored in memory. The codestream may then be sent to a decoder, for example by a transmitter, upon request.
Fig. 9 is a flow diagram of an example method 900 for decoding a video sequence from a codestream (e.g., codestream 500) using HPSs. A decoder (e.g., codec system 200, decoder 400, and/or video coding device 700) may employ method 900 when performing method 100. The results of method 900 may also be used to support temporal scaling at the decoder side according to mechanism 600. Method 900 may be performed in response to receiving a codestream from an encoder, such as encoder 300, and may therefore be performed in response to method 800.
Method 900 may begin when a decoder begins receiving a codestream of coded data representing a video sequence, for example, as a result of method 800. At step 901, the codestream is received at the decoder side. The codestream comprises a plurality of HPSs, including a first HPS and a second HPS. The first HPS is a first type HPS and comprises first type coding tool parameters. The second HPS is a second type HPS and comprises second type coding tool parameters. The codestream also includes a slice header and a slice associated with the slice header.
In step 903, the decoder determines that the slice header contains a first reference to the first HPS and a second reference to the second HPS. In this way, the slice header can obtain coding tool parameters relayed from a plurality of HPSs of different types. This allows such coding tool parameters to be omitted from the slice header, which improves the coding efficiency of the codestream by reducing redundant coding tool parameter signaling. In a particular example, the first HPS and the second HPS may include an ALF HPS, an LMCS HPS, a scaling list parameter HPS, or a combination thereof.
In step 905, the decoder may decode the slice using the first type coding tool parameters and the second type coding tool parameters when it is determined that the slice header includes the first reference and the second reference. In order to reduce the complexity of HPS-related coding methods, an HPS may be prevented from referring to coding tool parameters stored in other HPSs. Accordingly, the plurality of HPSs (including the first HPS and the second HPS) received in the codestream in step 901 are prevented from referencing coding tool parameters from other HPSs of the plurality of HPSs. Further, the HPSs may be coded to support temporal scaling by including a temporal ID in each HPS. In one example, the first HPS is included in an access unit associated with a temporal ID. Further, the first HPS contains the same temporal ID associated with the access unit containing the first HPS. In another example, the first slice is segmented from a first image, and the first image is associated with a temporal ID. In this example, the first HPS contains the temporal ID associated with the image. Further, to support temporal scaling, each HPS of the plurality of HPSs and each of the slices is associated with one of a plurality of temporal IDs. Further, each slice having a first temporal ID is prevented from referencing any HPS having a second temporal ID that is greater than (e.g., associated with a higher frame rate than) the first temporal ID. This ensures that slices associated with a lower frame rate do not reference HPSs associated with a higher frame rate, since such HPSs are discarded when the lower frame rate is employed according to the temporal scaling mechanism.
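Why this restriction matters is easiest to see from a temporal sub-bitstream extraction sketch like the one below; the NAL-unit tuples are an assumed, simplified representation used only for illustration.

```python
from typing import List, Tuple

NalUnit = Tuple[str, int, str]  # (nal_unit_type, temporal_id, payload)

def extract_sub_bitstream(nal_units: List[NalUnit], target_tid: int) -> List[NalUnit]:
    """Keep only NAL units whose TemporalId does not exceed the target.

    Any HPS with a TemporalId above target_tid is discarded here, which is why
    a slice that remains in the extracted stream must never depend on such an HPS.
    """
    return [nu for nu in nal_units if nu[1] <= target_tid]

# An HPS at temporal ID 1 disappears at target 0, so slices at temporal ID 0
# must not reference it.
stream = [("HPS", 0, "alf params v1"), ("SLICE", 0, "slice 0"),
          ("HPS", 1, "alf params v2"), ("SLICE", 1, "slice 1")]
base_layer = extract_sub_bitstream(stream, target_tid=0)
assert ("HPS", 1, "alf params v2") not in base_layer
```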
At step 907, the decoder may transmit the slice for display as part of the decoded video sequence.
Fig. 10 is a schematic diagram of an example system 1000 for decoding a video sequence of images in a codestream (e.g., codestream 500) using HPS. The system 1000 may be implemented by an encoder and a decoder, such as the codec system 200, the encoder 300, the decoder 400, and/or the video coding device 700. Further, the system 1000 may be used when implementing the methods 100, 800, and/or 900. Further, system 1000 can be employed to support time scaling as discussed with respect to mechanism 600.
The system 1000 includes a video encoder 1002. The video encoder 1002 includes a partitioning module 1001 for partitioning a plurality of images into a plurality of slices. The video encoder 1002 further comprises an encoding module 1003 for encoding the plurality of slices into a codestream, wherein the slices are encoded by at least a first type of coding tool based on first type coding tool parameters and a second type of coding tool based on second type coding tool parameters. The encoding module 1003 is further configured to encode a first HPS and a second HPS into the codestream, where the first HPS includes the first type coding tool parameters and the second HPS includes the second type coding tool parameters. The encoding module 1003 is further configured to encode a first slice header into the codestream describing the encoding of a first slice of the plurality of slices, where the first slice header contains a first reference to the first HPS and a second reference to the second HPS. The video encoder 1002 also includes a storage module 1005 for storing the codestream for transmission to the decoder. The video encoder 1002 further comprises a transmitting module 1007 for transmitting the codestream using the first HPS and the second HPS to support decoding of the slices at the decoder side according to the first type of coding tool and the second type of coding tool. The video encoder 1002 may also be used to perform any of the steps of method 800.
The system 1000 also includes a video decoder 1010. The video decoder 1010 includes a receiving module 1011 configured to receive a codestream, where the codestream includes: a first HPS including first type coding tool parameters, a second HPS including second type coding tool parameters, a slice header, and a slice associated with the slice header. The video decoder 1010 further comprises a determining module 1013 for determining that the slice header contains a first reference to the first HPS and a second reference to the second HPS. The video decoder 1010 further includes a decoding module 1015 for decoding the slice using the first type coding tool parameters and the second type coding tool parameters when it is determined that the slice header includes the first reference and the second reference. The video decoder 1010 also includes a transmitting module 1017 for transmitting the slice for display as part of a decoded video sequence. The video decoder 1010 may also be used to perform any of the steps of method 900.
A first component is directly coupled to a second component when there are no intervening components, other than wires, traces, or other media, between the first component and the second component. A first component is indirectly coupled to a second component when there are intervening components, other than wires, traces, or other media, between the first component and the second component. The term "coupled" and variations thereof include both direct coupling and indirect coupling. Unless otherwise indicated, use of the term "about" means a range including ±10% of the number that follows.
It should also be understood that the steps of the exemplary methods set forth herein do not necessarily need to be performed in the order described, and the order of the steps of these methods should be understood as being merely exemplary. Likewise, methods consistent with various embodiments of the present invention may include additional steps, and certain steps may be omitted or combined.
While several embodiments have been provided in the present disclosure, it should be understood that the disclosed systems and methods may be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
Moreover, techniques, systems, subsystems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, components, techniques, or methods without departing from the scope of the present disclosure. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and could be made without departing from the spirit and scope disclosed herein.

Claims (18)

1. A method implemented in a decoder, the method comprising:
receiving a code stream, wherein the code stream comprises: a first parameter set comprising a first type of coding tool parameter, a second parameter set comprising a second type of coding tool parameter, a slice header, and slice data associated with the slice header; wherein the first parameter set is of a different type than the second parameter set, the first type of coding tool parameter included in the first parameter set being of a different type than the second type of coding tool parameter included in the second parameter set;
determining that the slice header includes a first identification of the first parameter set and a second identification of the second parameter set;
when it is determined that the slice header contains the first identifier and the second identifier, decoding the slice data using the first type coding tool parameters and the second type coding tool parameters.
2. The method of claim 1, wherein the first parameter set is of a type of an Adaptive Loop Filter (ALF) parameter set, a Luma Mapping with Chroma Scaling (LMCS) parameter set, or a scaling list parameter set, wherein the second parameter set is of a type of an ALF parameter set, an LMCS parameter set, or a scaling list parameter set, and wherein the first parameter set is of a different type than the second parameter set.
3. The method of any of claims 1-2, wherein the codestream further comprises a plurality of parameter sets, the plurality of parameter sets comprising the first parameter set and the second parameter set, wherein each parameter set of the plurality of parameter sets is blocked from referencing coding tool parameters from other parameter sets of the plurality of parameter sets.
4. The method of claim 1, wherein the first parameter set is included in an access unit associated with a temporal identifier (ID), and wherein the first parameter set includes the temporal ID associated with the access unit including the first parameter set.
5. The method of claim 1, wherein the slice is part of a picture, wherein the picture is associated with a temporal Identifier (ID), and wherein the first set of parameters includes the temporal ID associated with the picture.
6. The method of claim 1, wherein the codestream comprises a plurality of pictures, each picture containing one or more slices, the one or more slices including the slice, wherein the codestream further comprises a plurality of parameter sets, the plurality of parameter sets including the first parameter set and the second parameter set, wherein each of the parameter sets and each of the slices is associated with one of a plurality of temporal IDs, wherein each slice with a first temporal ID is blocked from referencing any parameter set with a second temporal ID, the second temporal ID being greater than the first temporal ID.
7. A method implemented in an encoder, the method comprising:
segmenting the plurality of images into a plurality of slices;
encoding the plurality of slices into a code stream, wherein the slices are encoded by at least a first type of coding tool based on a first type of coding tool parameter and a second type of coding tool based on a second type of coding tool parameter;
encoding a first parameter set and a second parameter set into the codestream, the first parameter set including the first type of coding tool parameters and the second parameter set including the second type of coding tool parameters; wherein the first parameter set is of a different type than the second parameter set, the first type of coding tool parameter included in the first parameter set being of a different type than the second type of coding tool parameter included in the second parameter set;
encoding a slice header into the codestream describing an encoding of a first slice of the plurality of slices, wherein the slice header contains a first identification of the first parameter set and a second identification of the second parameter set; and
sending the code stream to a decoder.
8. The method of claim 7, wherein the first parameter set is of a type of an Adaptive Loop Filter (ALF) parameter set, a Luma Mapping with Chroma Scaling (LMCS) parameter set, or a scaling list parameter set, wherein the second parameter set is of a type of an ALF parameter set, an LMCS parameter set, or a scaling list parameter set, and wherein the first parameter set is of a different type than the second parameter set.
9. The method of any of claims 7 to 8, further comprising encoding a plurality of parameter sets into the codestream, wherein the plurality of parameter sets includes the first parameter set and the second parameter set, wherein each parameter set of the plurality of parameter sets is blocked from referencing coding tool parameters from other parameter sets of the plurality of parameter sets.
10. The method of claim 7, wherein the first parameter set is included in an access unit associated with a temporal identifier (ID), and wherein the first parameter set includes the temporal ID associated with the access unit including the first parameter set.
11. The method of claim 7, wherein the first slice is segmented from a first image, wherein the first image is associated with a temporal ID, and wherein the first parameter set comprises the temporal ID associated with the first image.
12. The method of claim 7, wherein a plurality of parameter sets including the first parameter set and the second parameter set are encoded, wherein each of the plurality of parameter sets and each of the slices is associated with one of a plurality of temporal IDs, wherein each slice with a first temporal ID is blocked from referencing any parameter set with a second temporal ID that is greater than the first temporal ID.
13. A video coding device, comprising:
a processor, a receiver, a memory, and a transmitter coupled to the processor, wherein the processor, the receiver, the memory, and the transmitter are configured to perform the method of any of claims 1-12.
14. A non-transitory computer readable medium comprising a computer program product for use by a video coding apparatus, the computer program product comprising computer executable instructions stored in the non-transitory computer readable medium which, when executed by a processor, cause the video coding apparatus to perform the method of any of claims 1 to 12.
15. A decoder, comprising:
a receiving module, configured to receive a code stream, where the code stream includes: a first parameter set including a first type of coding tool parameter, a second parameter set including a second type of coding tool parameter, a slice header, and slice data relating to the slice header; wherein the first parameter set is of a different type than the second parameter set, the first type of coding tool parameter included in the first parameter set being of a different type than the second type of coding tool parameter included in the second parameter set;
a determining module to determine that the slice header contains a first identification of the first parameter set and a second identification of the second parameter set;
a decoding module, configured to decode the slice data using the first type of coding tool parameter and the second type of coding tool parameter when it is determined that the slice header includes the first identifier and the second identifier.
16. The decoder of claim 15, wherein the decoder is further configured to perform the method of any of claims 2 to 6.
17. An encoder, comprising:
a segmentation module for segmenting the plurality of images into a plurality of slices;
an encoding module to:
encoding the plurality of slices into a code stream, wherein the slices are encoded by at least a first type of coding tool based on a first type of coding tool parameter and a second type of coding tool based on a second type of coding tool parameter;
encoding a first parameter set and a second parameter set into the codestream, the first parameter set including the first type of coding tool parameters and the second parameter set including the second type of coding tool parameters; wherein the first parameter set is of a different type than the second parameter set, the first type of coding tool parameters included in the first parameter set being of a different type than the second type of coding tool parameters included in the second parameter set; and
encoding a slice header into the codestream describing an encoding of a first slice of the plurality of slices, wherein the slice header contains a first identification of the first parameter set and a second identification of the second parameter set.
18. The encoder according to claim 17, characterized in that the encoder is further adapted to perform the method according to any of the claims 8 to 12.
CN202111266613.6A 2018-11-07 2019-11-06 Parameter sets for video coding Active CN114189694B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201862756983P 2018-11-07 2018-11-07
US62/756,983 2018-11-07
PCT/US2019/060113 WO2020097232A1 (en) 2018-11-07 2019-11-06 Header parameter set for video coding
CN201980073465.0A CN113056911A (en) 2018-11-07 2019-11-06 Header parameter set for video coding

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980073465.0A Division CN113056911A (en) 2018-11-07 2019-11-06 Header parameter set for video coding

Publications (2)

Publication Number Publication Date
CN114189694A CN114189694A (en) 2022-03-15
CN114189694B true CN114189694B (en) 2022-11-08

Family

ID=70611094

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201980073465.0A Pending CN113056911A (en) 2018-11-07 2019-11-06 Header parameter set for video coding
CN202111266613.6A Active CN114189694B (en) 2018-11-07 2019-11-06 Parameter sets for video coding

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201980073465.0A Pending CN113056911A (en) 2018-11-07 2019-11-06 Header parameter set for video coding

Country Status (8)

Country Link
US (1) US20210258598A1 (en)
EP (1) EP3864842A4 (en)
JP (2) JP7354241B2 (en)
KR (1) KR20210080533A (en)
CN (2) CN113056911A (en)
BR (1) BR112021008659A2 (en)
MX (1) MX2021005355A (en)
WO (1) WO2020097232A1 (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2020213583B2 (en) 2019-02-01 2023-06-01 Beijing Bytedance Network Technology Co., Ltd. Signaling of in-loop reshaping information using parameter sets
CA3127848A1 (en) * 2019-02-02 2020-08-06 Beijing Bytedance Network Technology Co., Ltd. Buffer management for intra block copy in video coding
EP3915265A4 (en) 2019-03-01 2022-06-22 Beijing Bytedance Network Technology Co., Ltd. Direction-based prediction for intra block copy in video coding
JP7402888B2 (en) * 2019-03-08 2023-12-21 中興通訊股▲ふん▼有限公司 Parameter set signaling in digital video
CN113574889B (en) 2019-03-14 2024-01-12 北京字节跳动网络技术有限公司 Signaling and syntax of loop shaping information
CN113632462B (en) * 2019-03-23 2023-08-22 北京字节跳动网络技术有限公司 Default in-loop shaping parameters
WO2020204413A1 (en) * 2019-04-03 2020-10-08 엘지전자 주식회사 Video or image coding for correcting restoration picture
KR102627834B1 (en) 2019-05-11 2024-01-23 베이징 바이트댄스 네트워크 테크놀로지 컴퍼니, 리미티드 Optional use of coding tools in video processing
CN114026863A (en) * 2019-06-24 2022-02-08 交互数字Vc控股公司 Method and apparatus for signaling decoded data using high level syntax elements
JP7303367B2 (en) * 2019-07-08 2023-07-04 エルジー エレクトロニクス インコーポレイティド Video or image coding based on signaling of scaling list data
MX2022000110A (en) 2019-07-10 2022-02-10 Beijing Bytedance Network Tech Co Ltd Sample identification for intra block copy in video coding.
WO2021018031A1 (en) 2019-07-27 2021-02-04 Beijing Bytedance Network Technology Co., Ltd. Restrictions of usage of tools according to reference picture types
JP2022552511A (en) 2019-10-12 2022-12-16 北京字節跳動網絡技術有限公司 high-level syntax for video coding tools
US11451811B2 (en) 2020-04-05 2022-09-20 Tencent America LLC Method and apparatus for video coding
US11750815B2 (en) 2020-09-17 2023-09-05 Lemon, Inc. Versatile video coding track coding
US20220109856A1 (en) * 2020-10-06 2022-04-07 Samsung Electronics Co., Ltd. Access of essential video coding (evc) slices in a file
US11611752B2 (en) * 2020-10-07 2023-03-21 Lemon Inc. Adaptation parameter set storage in video coding
WO2024079334A1 (en) * 2022-10-13 2024-04-18 Telefonaktiebolaget Lm Ericsson (Publ) Video encoder and video decoder
CN117492702B (en) * 2023-12-29 2024-04-02 成都凯迪飞研科技有限责任公司 Conversion method of data streams at large end and small end

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2051527A1 (en) * 2007-10-15 2009-04-22 Thomson Licensing Enhancement layer residual prediction for bit depth scalability using hierarchical LUTs
US9060174B2 (en) * 2010-12-28 2015-06-16 Fish Dive, Inc. Method and system for selectively breaking prediction in video coding
CN103096054B (en) * 2011-11-04 2015-07-08 华为技术有限公司 Video image filtering processing method and device thereof
WO2013109505A2 (en) * 2012-01-19 2013-07-25 Vid Scale, Inc. Methods, apparatus and systems for signaling video coding adaptation parameters
US9351016B2 (en) * 2012-04-13 2016-05-24 Sharp Kabushiki Kaisha Devices for identifying a leading picture
CN104380749A (en) * 2012-04-16 2015-02-25 诺基亚公司 Method and apparatus for video coding
US9813705B2 (en) * 2012-04-26 2017-11-07 Qualcomm Incorporated Parameter set coding
US20130343465A1 (en) * 2012-06-26 2013-12-26 Qualcomm Incorporated Header parameter sets for video coding
US9716892B2 (en) * 2012-07-02 2017-07-25 Qualcomm Incorporated Video parameter set including session negotiation information
EP3090558A4 (en) * 2014-01-03 2017-08-16 Nokia Technologies OY Parameter set coding
CN107534769B (en) * 2015-04-17 2021-08-03 交互数字麦迪逊专利控股公司 Chroma enhancement filtering for high dynamic range video coding
US11323711B2 (en) * 2019-09-20 2022-05-03 Alibaba Group Holding Limited Method and system for signaling chroma quantization parameter offset
US20220394301A1 (en) * 2019-10-25 2022-12-08 Sharp Kabushiki Kaisha Systems and methods for signaling picture information in video coding

Also Published As

Publication number Publication date
JP7354241B2 (en) 2023-10-02
JP2023156358A (en) 2023-10-24
CN114189694A (en) 2022-03-15
KR20210080533A (en) 2021-06-30
US20210258598A1 (en) 2021-08-19
EP3864842A4 (en) 2021-12-08
JP2022506623A (en) 2022-01-17
WO2020097232A1 (en) 2020-05-14
MX2021005355A (en) 2021-06-30
EP3864842A1 (en) 2021-08-18
CN113056911A (en) 2021-06-29
BR112021008659A2 (en) 2021-08-31

Similar Documents

Publication Publication Date Title
CN114189694B (en) Parameter sets for video coding
CN112703743B (en) Stripes and partitions in video coding
CN113498522A (en) Adaptive parameter set types in video coding
US20230209069A1 (en) Flexible Tiling Improvements in Video Coding
CN114827600A (en) Hybrid NAL unit image constraints in video coding
CN114467118A (en) Picture header indication in video coding
US11729401B2 (en) Tile group signaling in video coding
US11425377B2 (en) Arbitrary and wrap-around tile grouping
CN114503591A (en) OLS supporting spatial and SNR adaptivity
CN114650428B (en) Method, apparatus and medium for video coding stream extraction using identifier indication
CN113330746B (en) Block group allocation for raster scan block groups and rectangular block groups in video coding
KR20210135621A (en) Slice entry point in video coding
CN113243110A (en) Explicit address indication in video coding
US20220159285A1 (en) Alf aps constraints in video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant