CN118285094A - Method, apparatus and medium for video processing
- Publication number
- CN118285094A (application CN202280056040.0A)
- Authority
- CN
- China
- Prior art keywords
- video
- codec
- video block
- time
- budget
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (H—Electricity; H04—Electric communication technique; H04N—Pictorial communication, e.g. television), using adaptive coding, with the following subclasses:
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/146—Data rate or code amount at the encoder output
- H04N19/14—Coding unit complexity, e.g. amount of activity or edge presence estimation
- H04N19/156—Availability of hardware or computational resources, e.g. encoding based on power-saving criteria
- H04N19/172—the coding unit being an image region, the region being a picture, frame or field
- H04N19/174—the coding unit being an image region, the region being a slice, e.g. a line of blocks or a group of blocks
- H04N19/176—the coding unit being an image region, the region being a block, e.g. a macroblock
- H04N19/177—the coding unit being a group of pictures [GOP]
Abstract
A method for video processing is provided. The method (1100) comprises: determining (1102), during a conversion between a target video block of a video and a bitstream of the video, an adjusted codec process for the target video block based at least in part on a budget of codec time for at least one additional video block and an actual codec time for the at least one additional video block, the at least one additional video block being coded prior to the conversion, the codec time representing a duration for which the at least one additional video block is coded, and the budget of the codec time representing a duration pre-allocated for coding the at least one additional video block; and performing (1104) the conversion by using the adjusted codec process.
Description
Technical Field
Embodiments of the present disclosure relate generally to video codec technology and, more particularly, to codec process adjustment.
Background
Today, digital video capabilities are being applied to various aspects of people's lives. Various types of video compression techniques have been proposed for video encoding/decoding, such as MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), the ITU-T H.265 High Efficiency Video Coding (HEVC) standard, and the Versatile Video Coding (VVC) standard. However, the codec efficiency of conventional video codec techniques is typically low, which is undesirable.
Disclosure of Invention
Embodiments of the present disclosure provide solutions for video processing.
In a first aspect, a method for video processing is presented. The method comprises: during a conversion between a target video block of a video and a bitstream of the video, determining an adjusted codec process for the target video block based at least in part on a budget of codec time for at least one additional video block and an actual codec time for the at least one additional video block. The at least one additional video block is coded prior to the conversion. The codec time represents a duration for which the at least one additional video block is coded. The budget of the codec time represents a duration pre-allocated for coding the at least one additional video block. The method further includes performing the conversion by using the adjusted codec process. Compared with conventional solutions, the method can advantageously improve codec effectiveness and codec efficiency.
In a second aspect, an apparatus for processing video data is presented. The apparatus for processing video data includes a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform a method according to the first aspect of the present disclosure.
In a third aspect, a non-transitory computer-readable storage medium storing instructions that cause a processor to perform a method according to the first aspect of the present disclosure is presented.
In a fourth aspect, another non-transitory computer-readable recording medium is presented. The non-transitory computer-readable recording medium stores a bitstream of a video generated by a method performed by a video processing apparatus, wherein the method comprises: determining an adjusted encoding process for a target video block of the video based at least in part on a budget of encoding time for at least one additional video block and an actual encoding time for the at least one additional video block, the at least one additional video block being encoded prior to the conversion, the encoding time representing a duration for which the at least one additional video block is encoded, the budget of the encoding time representing a duration pre-allocated for encoding the at least one additional video block; and generating the bitstream by using the adjusted encoding process.
In a fifth aspect, another method for video processing is presented. The method is for storing a bitstream of a video and comprises: determining an adjusted encoding process for a target video block of the video based at least in part on a budget of encoding time for at least one additional video block and an actual encoding time for the at least one additional video block, the at least one additional video block being encoded prior to the conversion, the encoding time representing a duration for which the at least one additional video block is encoded, the budget of the encoding time representing a duration pre-allocated for encoding the at least one additional video block; generating the bitstream by using the adjusted encoding process; and storing the bitstream in a non-transitory computer-readable recording medium.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
Drawings
The above and other objects, features and advantages of the exemplary embodiments of the present disclosure will become more apparent from the following detailed description with reference to the accompanying drawings. In example embodiments of the present disclosure, like reference numerals generally refer to like components.
FIG. 1 illustrates a block diagram of an example video codec system according to some embodiments of the present disclosure;
fig. 2 illustrates a block diagram of a first example video encoder, according to some embodiments of the present disclosure;
fig. 3 illustrates a block diagram of an example video decoder, according to some embodiments of the present disclosure;
FIG. 4 illustrates a schematic diagram of a relationship of three modules used in codec process adjustment according to some embodiments of the present disclosure;
FIG. 5 illustrates a schematic diagram showing an overall complexity control scheme according to some embodiments of the present disclosure;
FIG. 6 illustrates an example of weighted budget pre-allocation in accordance with some embodiments of the present disclosure;
Figs. 7A and 7B illustrate example graphs of codec time fluctuations for frames in two GOPs according to some embodiments of the present disclosure;
FIG. 8 illustrates example time savings and RD performance of presets traversed in accordance with some embodiments of the present disclosure;
FIG. 9 illustrates an example diagram of an elasticity threshold method according to some embodiments of the present disclosure;
Figs. 10A-10D illustrate example graphs of the time error of each CTU as encoding progresses according to some embodiments of the present disclosure;
FIG. 11 illustrates a flowchart of a method for video processing according to some embodiments of the present disclosure; and
FIG. 12 illustrates a block diagram of a computing device in which various embodiments of the disclosure may be implemented.
The same or similar reference numbers will generally be used throughout the drawings to refer to the same or like elements.
Detailed Description
The principles of the present disclosure will now be described with reference to some embodiments. It should be understood that these embodiments are described merely for the purpose of illustrating and helping those skilled in the art to understand and practice the present disclosure and do not imply any limitation on the scope of the present disclosure. The disclosure described herein may be implemented in various ways, other than as described below.
In the following description and claims, unless defined otherwise, all scientific and technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
References in the present disclosure to "one embodiment," "an example embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an example embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
It will be understood that, although the terms "first" and "second," etc. may be used to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the listed terms.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes," and/or "having," when used herein, specify the presence of stated features, elements, and/or components, but do not preclude the presence or addition of one or more other features, elements, components, and/or groups thereof.
Example Environment
Fig. 1 is a block diagram illustrating an example video codec system 100 that may utilize the techniques of this disclosure. As shown, the video codec system 100 may include a source device 110 and a destination device 120. The source device 110 may also be referred to as a video encoding device and the destination device 120 may also be referred to as a video decoding device. In operation, source device 110 may be configured to generate encoded video data and destination device 120 may be configured to decode the encoded video data generated by source device 110. Source device 110 may include a video source 112, a video encoder 114, and an input/output (I/O) interface 116.
Video source 112 may include a source such as a video capture device. Examples of video capture devices include, but are not limited to, interfaces that receive video data from video content providers, computer graphics systems for generating video data, and/or combinations thereof.
The video data may include one or more pictures. The video encoder 114 encodes the video data from the video source 112 to generate a bitstream. The bitstream may include a sequence of bits that forms an encoded representation of the video data. The bitstream may include encoded pictures and associated data. An encoded picture is an encoded representation of a picture. The associated data may include sequence parameter sets, picture parameter sets, and other syntax structures. The I/O interface 116 may include a modulator/demodulator and/or a transmitter. The encoded video data may be transmitted directly to destination device 120 via the I/O interface 116 over the network 130A. The encoded video data may also be stored on a storage medium/server 130B for access by destination device 120.
Destination device 120 may include an I/O interface 126, a video decoder 124, and a display device 122. The I/O interface 126 may include a receiver and/or a modem. The I/O interface 126 may obtain encoded video data from the source device 110 or the storage medium/server 130B. The video decoder 124 may decode the encoded video data. The display device 122 may display the decoded video data to a user. The display device 122 may be integrated with the destination device 120 or may be external to the destination device 120, the destination device 120 configured to interface with an external display device.
The video encoder 114 and the video decoder 124 may operate in accordance with video compression standards, such as the High Efficiency Video Coding (HEVC) standard, the Versatile Video Coding (VVC) standard, and other existing and/or future standards.
Fig. 2 is a block diagram illustrating an example of a video encoder 200 according to some embodiments of the present disclosure, the video encoder 200 may be an example of the video encoder 114 in the system 100 shown in fig. 1.
Video encoder 200 may be configured to implement any or all of the techniques of this disclosure. In the example of fig. 2, video encoder 200 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of video encoder 200. In some examples, the processor may be configured to perform any or all of the techniques described in this disclosure.
In some embodiments, the video encoder 200 may include a dividing unit 201, a prediction unit 202, a residual generation unit 207, a transform processing unit 208, a quantization unit 209, an inverse quantization unit 210, an inverse transform unit 211, a reconstruction unit 212, a buffer 213, and an entropy encoding unit 214, and the prediction unit 202 may include a mode selection unit 203, a motion estimation unit 204, a motion compensation unit 205, and an intra prediction unit 206.
In other examples, video encoder 200 may include more, fewer, or different functional components. In one example, the prediction unit 202 may include an intra-block copy (IBC) unit. The IBC unit may perform prediction in an IBC mode, wherein the at least one reference picture is a picture in which the current video block is located.
Furthermore, although some components (such as the motion estimation unit 204 and the motion compensation unit 205) may be integrated, these components are shown separately in the example of fig. 2 for purposes of explanation.
The dividing unit 201 may divide a picture into one or more video blocks. The video encoder 200 and the video decoder 300 may support various video block sizes.
The mode selection unit 203 may select one of a plurality of codec modes (intra-coding or inter-coding) based on an error result, for example, and supply the generated intra-frame codec block or inter-frame codec block to the residual generation unit 207 to generate residual block data and to the reconstruction unit 212 to reconstruct the codec block to be used as a reference picture. In some examples, mode selection unit 203 may select a Combination of Intra and Inter Prediction (CIIP) modes, where the prediction is based on an inter prediction signal and an intra prediction signal. In the case of inter prediction, the mode selection unit 203 may also select a resolution (e.g., sub-pixel precision or integer-pixel precision) for the motion vector for the block.
In order to perform inter prediction on the current video block, the motion estimation unit 204 may generate motion information for the current video block by comparing one or more reference frames from the buffer 213 with the current video block. The motion compensation unit 205 may determine a predicted video block for the current video block based on the motion information and decoded samples from the buffer 213 of pictures other than the picture associated with the current video block.
The motion estimation unit 204 and the motion compensation unit 205 may perform different operations on the current video block, e.g., depending on whether the current video block is in an I-slice, a P-slice, or a B-slice. As used herein, an "I-slice" may refer to a portion of a picture composed of macroblocks, all of which are based on macroblocks within the same picture. Further, as used herein, in some aspects, "P-slices" and "B-slices" may refer to portions of a picture composed of macroblocks that do not depend solely on macroblocks in the same picture.
In some examples, motion estimation unit 204 may perform unidirectional prediction on the current video block, and motion estimation unit 204 may search for a reference picture of list 0 or list 1 to find a reference video block for the current video block. The motion estimation unit 204 may then generate a reference index indicating a reference picture in list 0 or list 1 containing the reference video block and a motion vector indicating a spatial displacement between the current video block and the reference video block. The motion estimation unit 204 may output the reference index, the prediction direction indicator, and the motion vector as motion information of the current video block. The motion compensation unit 205 may generate a predicted video block of the current video block based on the reference video block indicated by the motion information of the current video block.
Alternatively, in other examples, motion estimation unit 204 may perform bi-prediction on the current video block. The motion estimation unit 204 may search the reference pictures in list 0 for a reference video block for the current video block and may also search the reference pictures in list 1 for another reference video block for the current video block. The motion estimation unit 204 may then generate a plurality of reference indices indicating a plurality of reference pictures in list 0 and list 1 containing a plurality of reference video blocks and a plurality of motion vectors indicating a plurality of spatial displacements between the plurality of reference video blocks and the current video block. The motion estimation unit 204 may output a plurality of reference indexes and a plurality of motion vectors of the current video block as motion information of the current video block. The motion compensation unit 205 may generate a prediction video block for the current video block based on the plurality of reference video blocks indicated by the motion information of the current video block.
In some examples, motion estimation unit 204 may output a complete set of motion information for use in a decoding process of a decoder. Alternatively, in some embodiments, motion estimation unit 204 may signal motion information of the current video block with reference to motion information of another video block. For example, motion estimation unit 204 may determine that the motion information of the current video block is sufficiently similar to the motion information of neighboring video blocks.
In one example, motion estimation unit 204 may indicate a value to video decoder 300 in a syntax structure associated with the current video block that indicates that the current video block has the same motion information as another video block.
In another example, motion estimation unit 204 may identify another video block and a Motion Vector Difference (MVD) in a syntax structure associated with the current video block. The motion vector difference indicates the difference between the motion vector of the current video block and the indicated video block. The video decoder 300 may determine a motion vector of the current video block using the indicated motion vector of the video block and the motion vector difference.
As discussed above, the video encoder 200 may signal motion vectors in a predictive manner. Two examples of prediction signaling techniques that may be implemented by video encoder 200 include Advanced Motion Vector Prediction (AMVP) and merge mode signaling.
The intra prediction unit 206 may perform intra prediction on the current video block. When intra prediction unit 206 performs intra prediction on a current video block, intra prediction unit 206 may generate prediction data for the current video block based on decoded samples of other video blocks in the same picture. The prediction data for the current video block may include the prediction video block and various syntax elements.
The residual generation unit 207 may generate residual data for the current video block by subtracting (e.g., indicated by a minus sign) the predicted video block(s) of the current video block from the current video block. The residual data of the current video block may include residual video blocks corresponding to different sample portions of samples in the current video block.
In other examples, for example, in the skip mode, there may be no residual data for the current video block, and the residual generation unit 207 may not perform the subtracting operation.
The transform processing unit 208 may generate one or more transform coefficient video blocks for the current video block by applying one or more transforms to the residual video block associated with the current video block.
After the transform processing unit 208 generates the transform coefficient video block associated with the current video block, the quantization unit 209 may quantize the transform coefficient video block associated with the current video block based on one or more Quantization Parameter (QP) values associated with the current video block.
The inverse quantization unit 210 and the inverse transform unit 211 may apply inverse quantization and inverse transform, respectively, to the transform coefficient video blocks to reconstruct residual video blocks from the transform coefficient video blocks. Reconstruction unit 212 may add the reconstructed residual video block to corresponding samples from the one or more prediction video blocks generated by prediction unit 202 to generate a reconstructed video block associated with the current video block for storage in buffer 213.
After the reconstruction unit 212 reconstructs the video block, a loop filtering operation may be performed to reduce blocking artifacts in the video block.
The entropy encoding unit 214 may receive data from other functional components of the video encoder 200. When the entropy encoding unit 214 receives data, the entropy encoding unit 214 may perform one or more entropy encoding operations to generate entropy encoded data and output a bitstream including the entropy encoded data.
Fig. 3 is a block diagram illustrating an example of a video decoder 300 according to some embodiments of the present disclosure, the video decoder 300 may be an example of the video decoder 124 in the system 100 shown in fig. 1.
The video decoder 300 may be configured to perform any or all of the techniques of this disclosure. In the example of fig. 3, video decoder 300 includes a plurality of functional components. The techniques described in this disclosure may be shared among the various components of video decoder 300. In some examples, the processor may be configured to perform any or all of the techniques described in this disclosure.
In the example of fig. 3, the video decoder 300 includes an entropy decoding unit 301, a motion compensation unit 302, an intra prediction unit 303, an inverse quantization unit 304, an inverse transform unit 305, and a reconstruction unit 306 and a buffer 307. In some examples, video decoder 300 may perform a decoding process that is generally opposite to the encoding process described with respect to video encoder 200.
The entropy decoding unit 301 may retrieve an encoded bitstream. The encoded bitstream may include entropy-encoded video data (e.g., encoded blocks of video data). The entropy decoding unit 301 may decode the entropy-encoded video data, and from the entropy-decoded video data, the motion compensation unit 302 may determine motion information including motion vectors, motion vector precision, reference picture list indices, and other motion information. The motion compensation unit 302 may determine such information, for example, by performing AMVP and merge mode. When AMVP is used, several most probable candidates are derived based on data of neighboring PBs and the reference picture. The motion information typically includes horizontal and vertical motion vector displacement values, one or two reference picture indices, and, in the case of prediction regions in B slices, an identification of which reference picture list is associated with each index. As used herein, in some aspects, "merge mode" may refer to deriving the motion information from spatially or temporally neighboring blocks.
The motion compensation unit 302 may generate a motion compensation block, possibly performing interpolation based on an interpolation filter. An identifier for an interpolation filter used with sub-pixel precision may be included in the syntax element.
The motion compensation unit 302 may calculate interpolation values for sub-integer pixels of the reference block using interpolation filters used by the video encoder 200 during encoding of the video block. The motion compensation unit 302 may determine an interpolation filter used by the video encoder 200 according to the received syntax information, and the motion compensation unit 302 may generate a prediction block using the interpolation filter.
Motion compensation unit 302 may use at least part of the syntax information to determine the block sizes used to encode the frame(s) and/or slice(s) of the encoded video sequence, partition information describing how each macroblock of a picture of the encoded video sequence is partitioned, modes indicating how each partition is encoded, one or more reference frames (and reference frame lists) for each inter-coded block, and other information to decode the encoded video sequence. As used herein, in some aspects, a "slice" may refer to a data structure that can be decoded independently of other slices of the same picture in terms of entropy coding, signal prediction, and residual signal reconstruction. A slice may be the entire picture or a region of the picture.
The intra prediction unit 303 may use an intra prediction mode received in the bitstream, for example, to form a prediction block from spatially neighboring blocks. The inverse quantization unit 304 inverse-quantizes (i.e., de-quantizes) the quantized video block coefficients provided in the bitstream and decoded by the entropy decoding unit 301. The inverse transform unit 305 applies an inverse transform.
The reconstruction unit 306 may obtain a decoded block, for example, by adding the residual block to the corresponding prediction block generated by the motion compensation unit 302 or the intra prediction unit 303. If desired, a deblocking filter may also be applied to filter the decoded blocks to remove blocking artifacts. The decoded video blocks are then stored in buffer 307, buffer 307 providing reference blocks for subsequent motion compensation/intra prediction, and buffer 307 also generates decoded video for presentation on a display device.
Some example embodiments of the present disclosure are described in detail below. It should be noted that the section headings are used in this document for ease of understanding and do not limit the embodiments disclosed in a section to that section only. Furthermore, although some embodiments are described with reference to Versatile Video Coding or other specific video codecs, the disclosed techniques are applicable to other video codec technologies as well. Furthermore, although some embodiments describe video encoding steps in detail, it should be understood that corresponding decoding steps that undo the encoding will be implemented by a decoder. Furthermore, the term video processing encompasses video encoding or compression, video decoding or decompression, and video transcoding, in which video pixels are represented from one compression format into another compression format or at a different compression bit rate.
1. Summary of the invention
The present disclosure relates to video coding technologies, and more particularly to coding complexity control in video coding. The ideas may be applied, individually or in various combinations, to any video codec standard, such as Versatile Video Coding (VVC), or to non-standard video codecs.
2. Abbreviations
AI    All Intra
RA    Random Access
AVC   Advanced Video Coding
HEVC  High Efficiency Video Coding
VVC   Versatile Video Coding
VCEG  Video Coding Experts Group
MPEG  Moving Picture Experts Group
QTMT  QuadTree with nested Multi-type Tree
SIMD  Single Instruction Multiple Data
CTU   Coding Tree Unit
VTM   VVC Test Model
QP    Quantization Parameter
RD    Rate Distortion
GOP   Group Of Pictures
TID   Temporal ID
SATD  Sum of Absolute Transformed Differences
CU    Coding Unit
PU    Prediction Unit
QT    QuadTree
BT    Binary Tree
MT    Multi-type Tree
BDBR  Bjøntegaard Delta Bit Rate
TS    Time Saving
3. Background art
Video codec standards have evolved iteratively through the development of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG), working in joint collaboration teams. Roughly every ten years, a new generation video codec standard has emerged that incorporates the state-of-the-art technology of its time, as represented by H.262/MPEG-2, H.264/MPEG-4 Advanced Video Coding (AVC), H.265/High Efficiency Video Coding (HEVC), and H.266/Versatile Video Coding (VVC). The hybrid video codec structure, as in H.262, and the coding tools continually updated during the development of AVC, HEVC, and VVC include block partitioning, intra prediction, inter prediction, transforms, entropy coding, and loop filters. While each new generation of video codec standard achieves a bit rate saving of about 50%, the more complex algorithm designs (e.g., more complex partitioning strategies in the Coding Tree Unit (CTU)) also increase the encoding time several times over. The encoding computational complexity of HEVC is 5 times that of AVC. Although Single Instruction Multiple Data (SIMD) is integrated and enabled in VVC, the VVC encoding time is still, on average, 10.2 times that of HEVC under the Random Access (RA) setting. For the All Intra (AI) setting, the complexity increase is even 31 times. Coding complexity is increasingly becoming a significant obstacle to the widespread deployment and use of video codec standards, particularly under the AI setting. To address this problem, complexity reduction is a common approach.
In general, complexity reduction approaches choose to skip some brute-force searches to achieve encoding time savings. Taking VVC as an example, most current research focuses on the complexity reduction of certain modules: skipping the full search of the quadtree with nested multi-type tree (QTMT) partitioning process, fast intra-mode decision, and fast multiple transform selection. In fact, a series of complexity reduction algorithms have also been discussed during JVET development, some of which are adopted in the VVC Test Model (VTM). However, complexity reduction algorithms may not solve the problem in practice, for two reasons. First, complexity reduction algorithms typically have only a few discrete configurations. They are therefore inflexible and cannot meet the requirements of resource-constrained applications whose target encoding time lies between two configurations. Second, complexity reduction performance is not stable across different video contents and Quantization Parameters (QPs). For the same algorithm, large differences in time saving and Rate-Distortion (RD) loss may occur across different sequences. Thus, complexity control is necessary to fill the gap between such requirements and fast encoding algorithms, so that an arbitrarily selected encoding time becomes achievable for each sequence and each QP.
4. Problem(s)
The existing designs for control of video coding complexity have the following problems:
1. Complexity reduction methods appear frequently, but a reference method for achieving video codec complexity control is lacking, especially for VVC.
2. Existing complexity control mechanisms are tightly coupled with the design of fast algorithms. It is difficult to extend a complexity control method from one video codec standard to another.
3. The complexity control accuracy (measured by the encoding time error) is not satisfactory.
4. A significant RD performance drop occurs when complexity control methods are employed, compared to complexity reduction methods.
5. Description of the invention
To solve the above problems, as well as some other problems not mentioned, a method as described below is disclosed. The embodiments should be considered as examples explaining the general concepts and should not be construed in a narrow sense. Furthermore, the embodiments may be applied alone or in any combination.
A video coding complexity control scheme is designed, in which three main modules are deployed: complexity pre-allocation, a feedback mechanism for achieving one-pass coding complexity (more specifically, coding time) control, and coding strategy decision. Fig. 4 illustrates a schematic diagram 400 showing the relationship of the three modules used in encoding process adjustment according to some embodiments of the present disclosure. The complexity pre-allocation is performed prior to the encoding process. The feedback mechanism then works according to the pre-allocated complexity and the actual coding time. Finally, the complexity pre-allocation module and the feedback module jointly decide the coding strategy for the subsequent video units/frames/GOPs to be encoded.
Policies in the various modules may be modified according to usage scenarios.
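By way of rough illustration only, the following Python sketch shows how the three modules might be wired together in a one-pass control loop. All names, the uniform pre-allocation, and the per-preset-step time saving factor are assumptions made for this sketch, not details of the disclosed embodiments.

```python
# Hypothetical one-pass control loop wiring the three modules together.
# pre_allocate / decide_preset and the 15%-per-preset-step time saving
# are illustrative assumptions, not the disclosed implementation.

def pre_allocate(num_units: int, total_budget: float) -> list[float]:
    # Module 1: complexity pre-allocation (uniform here for brevity; the
    # disclosure uses weighted allocation, see item 1) below).
    return [total_budget / num_units] * num_units

def decide_preset(state: dict) -> int:
    # Module 3: move to a faster preset when behind schedule, to a
    # slower (better-RD) one when ahead. Preset 0 = slowest / best RD.
    behind = state["actual"] > state["target"]
    step = 1 if behind else -1
    return max(0, min(5, state["preset"] + step))

def control_loop(default_times: list[float], total_budget: float) -> None:
    budgets = pre_allocate(len(default_times), total_budget)
    state = {"target": 0.0, "actual": 0.0, "preset": 2}
    for budget, default_time in zip(budgets, default_times):
        preset = decide_preset(state)
        # Stand-in for real encoding: each preset step saves ~15% time.
        actual = default_time * (0.85 ** preset)
        # Module 2: feedback updates the joint budget consumption state.
        state["target"] += budget
        state["actual"] += actual
        state["preset"] = preset
        print(f"preset={preset} budget={budget:.2f} actual={actual:.2f}")

control_loop([1.2, 0.9, 1.5, 1.1], total_budget=4.0)
```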
With respect to complexity pre-allocation
1) An unreasonable coding complexity allocation may make the coding time budget unattainable, causing significant RD loss for some video units. Thus, a weighted coding budget pre-allocation method is devised.
A. In one example, the pre-allocation may consist of three stages, namely a segment stage, a frame stage, and a video unit stage (a sketch follows this list).
I. In one example, at the segment stage, the sequence encoding budget will be allocated to each segment. One segment will contain one or more GOPs. The segment budget may depend on the number of GOPs/frames.
II. In one example, at the frame stage, a GOP budget will be allocated to each frame.
1. In one example, the weight of each frame may depend on the decoded information, e.g., slice/picture type, QP, etc.
2. Alternatively, the weights may be updated on the fly.
III. In one example, at the video unit stage, a frame budget will be allocated to each video unit.
1. In one example, the weight of each video unit may depend on the computed intermediate features, such as gradient/variance/SATD cost, etc.
2. Alternatively, the weights may depend on decoded information, such as spatial/temporal neighboring video units or similar video units that may be historically tracked.
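A minimal sketch of the three-stage weighted pre-allocation described above, assuming proportional weight-based splitting; the concrete weight sources (frame counts, TID-like frame weights, SATD costs) are illustrative stand-ins:

```python
# Hypothetical three-stage weighted pre-allocation (segment -> frame ->
# video unit). The weight sources below are illustrative stand-ins.

def split_by_weight(budget: float, weights: list[float]) -> list[float]:
    total = sum(weights)
    return [budget * w / total for w in weights]

sequence_budget = 120.0  # seconds, illustrative

# Segment stage: segment budget depends on the number of frames/GOPs.
frames_per_segment = [32.0, 32.0, 16.0]
segment_budgets = split_by_weight(sequence_budget, frames_per_segment)

# Frame stage: weights may depend on decoded information such as
# slice/picture type, QP, or TID (lower TID -> larger weight here).
frame_weights = [4.0, 1.0, 2.0, 1.0]
frame_budgets = split_by_weight(segment_budgets[0], frame_weights)

# Video unit stage: weights from computed features, e.g. SATD cost.
ctu_satd_costs = [1800.0, 950.0, 2400.0, 1300.0]
ctu_budgets = split_by_weight(frame_budgets[0], ctu_satd_costs)
print(ctu_budgets)
```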
With respect to feedback mechanism
2) A feedback mechanism is designed that helps to maintain the accuracy characteristics of the method.
A. In one example, video unit level (e.g., one video unit is a CTU) encoding times will be collected and compared with the video unit level budget after the encoding process, which helps update the video unit level encoding budget consumption state, i.e., the joint budget state (see the sketch after this item).
B. Alternatively or in addition, the actual encoding time at the slice/tile/frame/GOP level may be collected to update the joint budget state.
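The following sketch illustrates the bookkeeping implied by item 2): each video unit's measured coding time is compared with its budget and folded into a joint budget state. The dataclass layout and field names are assumptions for illustration:

```python
# Hypothetical joint budget state bookkeeping for the feedback module.
# The dataclass layout and field names are assumptions for illustration.

from dataclasses import dataclass

@dataclass
class JointBudgetState:
    target_consumed: float = 0.0  # sum of pre-allocated budgets so far
    actual_consumed: float = 0.0  # sum of measured coding times so far

    def update(self, unit_budget: float, unit_time: float) -> float:
        self.target_consumed += unit_budget
        self.actual_consumed += unit_time
        # Positive error means encoding runs slower than budgeted.
        return self.actual_consumed - self.target_consumed

state = JointBudgetState()
print(state.update(unit_budget=0.50, unit_time=0.62))  # ~0.12 s behind
print(state.update(unit_budget=0.50, unit_time=0.41))  # ~0.03 s behind overall
```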
3) By jointly considering the coding time budget and the coding time error, feedback for subsequent video units can be given in the form of an acceleration ratio or an encoding strategy adjustment direction.
A. In one example, a model-based encoding time reallocation method is designed, wherein the encoding time deviation is utilized to reallocate the encoding time budget. Then, with the estimated original encoding time, the encoding time budget will be converted into an acceleration ratio. The acceleration ratio may then be used to determine the encoding strategy.
I. In one example, the above-described encoding time deviation (i.e., the time error) may be maintained at the video unit level, and the accumulated error will be evenly or unevenly distributed over the next few video units.
II. Alternatively, the time error may be maintained at the frame/GOP level. The accumulated error will be evenly or unevenly distributed over the next few frames/GOPs.
III. In one example, the estimated original encoding time described above may be based on a time-cost model, where the coding time is estimated from an available cost, such as the SATD cost or the planar cost.
1. In one example, the model may be defined as T = α·(CTUcost)^β, where α and β are two parameters.
2. In one example, the model may be defined as T = α·(CTUcost)^β + γ, where α, β, and γ are three parameters.
B. Alternatively, a state-based encoding strategy adjustment method is designed, wherein the total budget consumption states (including the target budget consumption rate and the actual budget consumption rate) are combined to calculate the current encoding speed, which is then utilized to decide the encoding strategy adjustment direction. A sketch of the model-based variant in item 3)A follows.
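A sketch of the model-based feedback of item 3)A, assuming the power-form time-cost model above; the α/β values, the error-spreading window, and the function names are illustrative assumptions:

```python
# Hypothetical sketch of the model-based feedback of item 3)A. The
# alpha/beta values, the error-spreading window, and the function names
# are illustrative assumptions.

def estimate_original_time(ctu_cost: float, alpha: float, beta: float) -> float:
    # Time-cost model T = alpha * cost**beta (item 3)A.III.1 above).
    return alpha * ctu_cost ** beta

def acceleration_ratio(unit_budget: float, time_error: float,
                       ctu_cost: float, alpha: float, beta: float,
                       n_remaining: int, window: int = 8) -> float:
    # Spread the accumulated time error evenly over the next few video
    # units (item 3)A.I), then convert the adjusted budget into a
    # speed-up ratio against the estimated original (default) time.
    share = time_error / max(1, min(window, n_remaining))
    adjusted_budget = max(unit_budget - share, 1e-6)
    original_time = estimate_original_time(ctu_cost, alpha, beta)
    return original_time / adjusted_budget  # > 1 means speed-up needed

ratio = acceleration_ratio(unit_budget=0.05, time_error=0.10,
                           ctu_cost=1500.0, alpha=2e-4, beta=0.8,
                           n_remaining=40)
print(f"target acceleration ratio: {ratio:.2f}")
```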
With respect to coding strategies
4) For the encoding strategy decision module, one or more factors will be used, where each factor has multiple encoding configurations. The combinations of the factors' configurations have different effects on coding time and RD loss.
A. In one example, the maximum partition depth is the only factor. The ubiquity of the partitioning process makes this factor easy to extend to different video codec standards (e.g., VVC and HEVC) and video codec settings, including intra and inter.
I. In one example, for VVC, the maximum partition depth may be a single quadtree (QT)/multi-type tree (MT)/binary tree (BT) depth.
II. Alternatively, for VVC, the maximum depth may be a QT/MT/BT combination.
B. Alternatively, the minimum partition depth may be considered as a factor.
C. Alternatively, other coding tools (including intra/inter/IBC/palette prediction modes and motion estimation ranges) may also be considered as factors.
5) A Pareto-based approach is then devised to generate the most efficient coding strategies in terms of rate-distortion-complexity from the combinations of factors, i.e., the presets.
A. In one example, all configurations of the factors are combined to evaluate coding performance. One or more configurations with the best performance in terms of time saving and RD loss will be selected as candidate strategies, i.e., presets.
I. In one example, the presets will be arranged in a sequential order, such as in an order of increasing RD loss.
B. Furthermore, a look-up table can be derived from the presets obtained by offline training.
I. In one example, the look-up table may contain all presets and, for each preset, the configuration of each factor, the corresponding coding time saving ratio, and the corresponding RD loss.
6) The coding strategy decision is then made by determining which preset to use for the subsequent video unit, depending on the form of the feedback.
A. In one example, the acceleration ratio is obtained from the feedback module.
I. In one example, a look-up table will be used to find the preset whose acceleration ratio is closest to the target acceleration ratio.
B. Alternatively, the encoding strategy adjustment direction is obtained from the feedback module.
I. In one example, all presets are available. A threshold thr is used to decide whether to select a faster or slower preset.
II. Alternatively, only some of the presets are available. Accordingly, another threshold thr2 is used to adjust the range of presets available in the faster/slower direction.
1. In one example, the available presets may be several consecutive ones. For example, [Preset_min, Preset_max] is designed to constrain the preset range.
2. Alternatively, the available presets may be non-consecutive.
III. For the above thresholds, a fixed threshold may be designed; alternatively, an elastic threshold that adapts to the coding progress may be designed (see the sketch after this list).
1. In one example, the encoding progress may be the ratio of uncoded video units/frames/GOPs, i.e., r_b.
2. In one example, the elastic threshold is designed as thr2 = thr + δ·(1 − r_b), where δ is a parameter in the range (0, 1).
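A sketch of the two decision modes in item 6), under assumed look-up table contents and threshold values; the exact step policy gated by thr2 is an illustrative choice, not mandated by the disclosure:

```python
# Hypothetical preset decision for item 6), with assumed look-up table
# contents and threshold values. Each entry: (preset id, acceleration
# ratio achievable relative to the default configuration).

LOOKUP_TABLE = [(0, 1.00), (1, 1.25), (2, 1.60), (3, 2.10), (4, 2.80), (5, 3.60)]

def preset_from_ratio(target_ratio: float) -> int:
    # Item 6)A: the preset whose ratio is closest to the target ratio.
    return min(LOOKUP_TABLE, key=lambda e: abs(e[1] - target_ratio))[0]

def preset_from_direction(current: int, speed_error: float, r_b: float,
                          thr: float = 0.02, delta: float = 0.5) -> int:
    # Item 6)B: thr decides faster vs. slower; the elastic threshold
    # thr2 = thr + delta * (1 - r_b), with r_b the ratio of uncoded
    # units, gates a larger jump (an illustrative step policy).
    thr2 = thr + delta * (1.0 - r_b)
    if abs(speed_error) <= thr:
        return current                      # on schedule: keep preset
    step = 2 if abs(speed_error) > thr2 else 1
    if speed_error < 0:                     # ahead of schedule: slow down
        step = -step
    return max(0, min(len(LOOKUP_TABLE) - 1, current + step))

print(preset_from_ratio(1.9))                               # -> 3
print(preset_from_direction(2, speed_error=0.10, r_b=0.7))  # -> 3
```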
6. Examples
1) Example 1: the present embodiment describes an overall design example of the complexity control scheme.
Fig. 5 illustrates a schematic diagram 500 showing an overall complexity control scheme according to some embodiments of the present disclosure. Given the sequence target coding time, the overall scheme aims to achieve precise complexity control, as shown in Fig. 5.
First, the target encoding time is decomposed step by step down to each level. Specifically, a time budget is evenly allocated to each group of pictures (GOP). The GOP budget is then unevenly allocated to each frame according to the Temporal ID (TID). The frame budget is then unevenly allocated to each video unit according to the SATD cost of the video unit, based on an 8x8 Hadamard transform. The target encoding time is thus split into budgets for individual video units by the pre-allocation process.
Then, a video unit level coding strategy decision module is designed to meet the video unit budget. In one example, a time-cost (T-C) model is constructed using the planar cost, which is found to have a strong correlation with the video unit coding time. The model is then used to estimate the video unit encoding time under the default encoding search for the video unit. The acceleration ratio of the current video unit can then be derived, based on which a set of parameters (also called a preset) is selected from a series of predefined parameter sets. Alternatively, when the planar cost is not available, for example for intra settings, presets are selected directly for subsequent video units by jointly considering the target budget consumption state and the actual budget consumption state. In particular, a preset with higher or lower coding complexity will be selected.
To achieve accurate encoding time control, a feedback module is designed, in which the encoding time of each video unit is collected, and the total time error at the video unit level is maintained and used to update the joint budget consumption state. The joint budget consumption state is then further used to guide the encoding strategy decision module.
2) Example 2: this embodiment describes an example of how weighted budget allocation is implemented in a pre-allocation module.
The total budget of the sequence is first evenly distributed to each GOP. Fig. 6 illustrates an example weighted budget pre-allocation 600 according to some embodiments of the present disclosure. When the total number of frames is not divisible by the GOP size, a trailing partial GOP, denoted GOP k+1, appears at the end, as shown in Fig. 6. In this case, GOP k+1 will be allocated a smaller coding budget, in proportion to the number of frames in that GOP.
One or more training frames may be required to fit some of the online parameters prior to the actual encoding time control process. Both the presence and the number of training frames are optional. For AI settings, the sequence budget will be allocated equally to each frame, so the number of training frames can be zero. Under RA settings, one or more GOPs may be selected for training. The true encoding time of these training frames should then be subtracted from the sequence budget during the pre-allocation process, as shown in (1). The budget for each remaining GOP is then obtained, e.g., for GOP 2 through GOP k+1.
The GOP budget will then be broken down into frame budgets, as shown in (2). Here, ω_Frame represents the weight of each frame in the GOP.
For the same sequence, the encoding times of frames at the same TID show a fairly fixed relative relationship. Fig. 7A illustrates example encoding time fluctuations 700 of frames in GOPs of the sequence Johnny, according to some embodiments of the present disclosure. Fig. 7B illustrates further example encoding time fluctuations 750 of frames in GOPs of DaylightRoad2. Preliminary experiments (shown in Figs. 7A and 7B) validated this hypothesis: Johnny and DaylightRoad2 were encoded using an RA setting with GOP size 16, two GOPs were encoded, and the frame-level encoding times were collected and plotted. According to the figures, the encoding time consumption of frames with the same TID shows significant stability, whereas frames with different TIDs are separated to some degree.
To obtain the weight for each TID, training frames are used to collect the coding time relationship (i.e., the relative weights) of the TIDs. Considering the similarity of GOPs within the same sequence, the weights derived from the training frames may also be used to estimate the weight of each TID in subsequent GOPs.
Similarly, the frame budget is decomposed into video unit budgets, as shown in (3). Here, ω_CTU represents the weight of each video unit in a frame, and T_b represents the time budget of the video unit.
To determine the weights of all video units in the entire frame prior to encoding, the sum of 8x8 Hadamard costs is selected as the complexity measure, as shown in (4). Notably, the Hadamard cost is also calculated for rectangular video units at picture boundaries, and the video unit time budget is allocated accordingly.
The pre-allocation process regards complexity as a factor affecting the final encoding time; here, complexity is reflected by the TID-related factors and the Hadamard-cost-based factors. With such pre-allocation, frames and video units with higher complexity will be allocated a larger time budget, avoiding unreasonable acceleration ratios for those complex areas.
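The Hadamard-cost-based weighting of (3) and (4) can be sketched as follows. The details are assumptions: the DC (mean) removal before the transform and the restriction to full 8x8 blocks inside a possibly rectangular boundary CTU are illustrative choices, not the disclosed implementation.

```python
import numpy as np

def hadamard8():
    h = np.array([[1.0]])
    for _ in range(3):                       # 1x1 -> 2x2 -> 4x4 -> 8x8
        h = np.block([[h, h], [h, -h]])
    return h

def ctu_hadamard_cost(ctu):
    """Sum of absolute 8x8 Hadamard coefficients over a CTU's luma samples;
    boundary CTUs may be rectangular, so only full 8x8 blocks are used."""
    h, cost = hadamard8(), 0.0
    rows, cols = ctu.shape
    for r in range(0, rows - 7, 8):
        for c in range(0, cols - 7, 8):
            blk = ctu[r:r + 8, c:c + 8]
            cost += np.abs(h @ (blk - blk.mean()) @ h.T).sum()
    return cost

def ctu_budgets(frame_budget, ctus):
    """Frame-to-CTU split, cf. (3) and (4): weights ω_CTU proportional to cost."""
    costs = [ctu_hadamard_cost(c) for c in ctus]
    total = sum(costs)
    return [frame_budget * c / total for c in costs]

rng = np.random.default_rng(0)
ctus = [rng.integers(0, 256, (128, 128)).astype(float),
        rng.integers(0, 256, (128, 64)).astype(float)]  # rectangular boundary CTU
print(ctu_budgets(10.0, ctus))
```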
3) Example 3: this embodiment describes an example of how the predefined presets are generated using the pareto-based method in the coding strategy decision module.
The pareto principle states that for many outcomes, about 80% of the results come from 20% of the causes (the "vital few"). Here, the problem is to obtain a parameter configuration that achieves the target time saving with minimal RD loss. To this end, a rate-distortion-complexity analysis is first performed to identify pareto-efficient coding configurations. Parameters that have a significant impact on RD performance and encoding time are collected and selected, for example, the maximum/minimum CU depth and PU mode for HEVC, and the maximum QT/BT/MT depth for VVC.
The VVC intra setting is taken as an example here. Combinations of maximum QT depth in {0,1,2,3,4}, maximum BT depth in {0,1,2,3,4,5} and maximum MT depth in {0,1,2,3} may be traversed. All sequences from classes C-E were tested to investigate the acceleration characteristics and the RD performance (evaluated by the Bjøntegaard delta bit rate (BDBR), also known as BD-rate). Fig. 8 illustrates example time savings and RD performance 800 of the traversed presets in accordance with some embodiments of the present disclosure. Presets with a BDBR higher than 30% are discarded for better presentation. The boundary of fig. 8 represents the best cost performance; the boundary configurations are extracted as the configurations selected in table 1, i.e., pareto-efficient presets 0-5.
Table 1 selected boundary presets
The maximum QT/BT/MT depth combination is taken as an example here. A pareto efficiency comparison may also be made with the minimum QT/BT/MT depth added, and the motion estimation range may likewise be used as a parameter for comparison. When more parameters are included in the candidate list for comparison, a lower RD loss can be achieved at the same time-saving performance. Since the method does not constrain the specific parameters in the candidate list, it can be generalized to other video codec standards such as AV1, HEVC and H.264 by constructing a candidate parameter list and selecting pareto-efficient parameters through rate-distortion-complexity analysis, as in the sketch below.
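A simple way to extract such a pareto frontier from measured (time saving, BDBR) points is sketched below; the candidate points are fabricated and the preset parameterization (maximum QT/BT/MT depths) is illustrative only.

```python
def pareto_presets(points):
    """Keep the configurations not dominated by any other, i.e., no other
    point achieves both a higher time saving and a lower BDBR loss."""
    frontier = []
    for ts, bdbr, cfg in points:
        dominated = any(ts2 >= ts and bdbr2 <= bdbr and (ts2, bdbr2) != (ts, bdbr)
                        for ts2, bdbr2, _ in points)
        if not dominated:
            frontier.append((ts, bdbr, cfg))
    return sorted(frontier)   # ordered by increasing time saving and RD loss

# (time saving %, BDBR %, max QT/BT/MT depths) - fabricated sample points.
cands = [(0.0, 0.0, (4, 5, 3)), (20.0, 0.4, (4, 3, 2)), (20.0, 1.5, (2, 5, 3)),
         (45.0, 1.8, (3, 2, 1)), (40.0, 3.0, (2, 2, 2)), (70.0, 6.0, (1, 1, 1))]
print(pareto_presets(cands))   # the boundary points, cf. fig. 8 and table 1
```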
4) Example 4: this embodiment describes an example of how the model-based method for coding strategy decisions may be implemented.
An example of how the coding strategy decision may be made is illustrated here, again taking the VVC intra setting as an example. Through the analysis based on the pareto principle, two or more candidate presets can be obtained, where each preset corresponds to a TS ratio. The present embodiment demonstrates the first approach to selecting an appropriate preset for each video unit, i.e., the model-based approach.
The central idea of the model-based approach is to predict, by means of a model, the coding time a video unit would take under the default configuration, so that the reallocated video unit time budget can be converted into an acceleration ratio. To this end, an accurate model is designed to estimate the video unit coding time under the default configuration (i.e., the original video unit time). In order to estimate the encoding time before the real encoding process, the luminance compression time of each video unit is collected. Different features were tried, and the planar cost of a block was found to have a strong correlation with its luminance compression time. A power relationship was observed, and thus a time-planar cost (i.e., T-C) model was constructed as in (5), where α and β are sequence- and QP-related parameters. In one example, the two parameters may be fitted offline. Alternatively, the two parameters may be fitted in an online process, using the first one or more video units to be encoded in the current sequence. Alternatively, the initial values of the two parameters may be fitted offline but updated at certain intervals as the encoding process proceeds. In addition, T_p represents the original video unit time.
T_p = α × PlanarCost^β (5)
The performance of the model-based method depends largely on the accuracy of the model: the video unit time target, obtained by jointly considering the pre-allocation process and the budget consumption feedback, can be converted into an acceleration ratio only when the original video unit time is accurately estimated. In order to maintain the accuracy of the model across different compilation and execution environments, a factor r_cpu is designed to represent the relative computing power:

r_cpu = T_r / T_p (6)
Here, T_r represents the true encoding time collected online, and T_p is the predicted value of the T-C model. During encoding, the ratio is updated using the un-accelerated CTUs and is then used immediately to update α, as in (7):

α̂ = r_cpu · α (7)
This helps to keep the model accurate during encoding. The updated α̂ is then used to predict the original luminance compression time of subsequent video units, as in (8):

T_p = α̂ × PlanarCost^β (8)
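The T-C model of (5) and its online correction of (6)-(8) can be sketched as follows. The multiplicative rescaling of α is an assumption consistent with the surrounding description, and the initial α and β stand in for offline-fitted values.

```python
class TCModel:
    """Time-cost model T_p = alpha * PlanarCost**beta, cf. (5), with the
    online computing-power correction r_cpu of (6)-(8)."""
    def __init__(self, alpha, beta):
        self.alpha, self.beta = alpha, beta

    def predict(self, planar_cost):
        return self.alpha * planar_cost ** self.beta   # original unit time T_p

    def observe(self, planar_cost, real_time):
        # From an un-accelerated CTU: r_cpu = T_r / T_p (6), then rescale alpha (7).
        r_cpu = real_time / self.predict(planar_cost)
        self.alpha *= r_cpu                            # later predictions use (8)

model = TCModel(alpha=2e-4, beta=0.8)                  # hypothetical offline fit
print(model.predict(50_000.0))
model.observe(50_000.0, real_time=1.25)                # measured on this machine
print(model.predict(50_000.0))                         # now reflects relative CPU power
```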
On the other hand, the pre-allocated video unit budget T_b is updated with the time error feedback T_fb, i.e., the time fed back to the current video unit from the previously accumulated time error, as shown in (9):
T_a = T_b + T_fb (9)
The updated video unit time budget is referred to herein as the reallocated CTU time T_a. The target time ratio of the current video unit, r_VideoUnit, is then derived as

r_VideoUnit = T_a / T_p (10)
The preset in table 1 whose time-saving ratio is closest to r_VideoUnit is then employed to achieve the target acceleration ratio, as in the sketch below.
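With (9) and (10), the model-based decision reduces to a nearest-ratio lookup, as sketched below; the preset time ratios are illustrative (the last one echoing the 28.1% average complexity ratio of preset 5 cited in the evaluation).

```python
def choose_preset(t_b, t_fb, t_p, presets):
    """Model-based decision: T_a = T_b + T_fb (9); target ratio
    r = T_a / T_p (10); pick the preset with the nearest time ratio."""
    r = (t_b + t_fb) / t_p
    return min(presets, key=lambda p: abs(p["time_ratio"] - r))

# Hypothetical presets 0-5: fraction of the default encoding time each costs.
presets = [{"id": i, "time_ratio": tr}
           for i, tr in enumerate([1.0, 0.8, 0.6, 0.45, 0.35, 0.281])]
print(choose_preset(t_b=0.5, t_fb=-0.05, t_p=1.0, presets=presets))  # -> preset 3
```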
5) Example 5: this embodiment describes an example of how the state-based method for coding strategy decisions may be implemented.
An example of how the coding strategy decision is made is shown here, again taking the VVC intra setting as an example. After the analysis based on the pareto principle, two or more candidate presets may be obtained, where each preset corresponds to a TS ratio. The present embodiment demonstrates the second approach to selecting an appropriate preset for each video unit, namely the state-based approach.
The central idea is to modify the preset adjustment direction according to the budget consumption state. Specifically, three values, namely the target encoding time, the accumulated budget consumption ratio, and the accumulated actual time consumption ratio, are derived from the joint budget state module. The accumulated actual time consumption ratio is updated after each video unit is encoded, as shown in (11).
Here, r_r represents the actual overall consumption ratio of the accumulated video unit coding time. Similarly, the accumulated budget consumption ratio is updated after each video unit is encoded, as shown in (12).
Here, r_b is the anchor overall consumption ratio of the accumulated video unit coding budget. The anchor is obtained and fixed after the pre-analysis step, so it can serve as a reference to evaluate the encoding speed, as shown in (13):

r_speed = r_r / r_b (13)
Here, r_speed denotes the relative speed. If r_speed is greater than 1, the actual budget consumption is higher than the target; in this case, the following video units should be encoded faster to close the gap. The preset modification step may be defined as in (14):
Here, a higher threshold thr means a lower likelihood of changing the preset, while a lower threshold makes the scheme more sensitive to deviations between the anchor budget consumption and the actual budget consumption. A lower threshold makes complexity control more timely and accurate, but preset switching becomes more frequent; it always reacts urgently to the immediate deviation, which can easily lead to short-sighted decisions. In contrast, a higher threshold takes a longer view, but may come at the expense of complexity control accuracy. A simplified sketch of this decision follows.
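The following minimal sketch implements r_speed = r_r / r_b from (13); the ±1 preset step and the symmetric band around 1 are assumptions, since the body of (14) is not reproduced above.

```python
def preset_step(r_r, r_b, thr):
    """State-based decision, cf. (11)-(14): move to a faster preset (+1) when
    actual consumption runs ahead of the anchor by more than thr, to a
    slower one (-1) when it lags by more than thr, otherwise keep."""
    r_speed = r_r / r_b                     # (13)
    if r_speed > 1.0 + thr:
        return +1                           # over budget: encode faster
    if r_speed < 1.0 - thr:
        return -1                           # under budget: spend time on RD
    return 0

print(preset_step(r_r=0.42, r_b=0.40, thr=0.1))   # within the band -> 0
print(preset_step(r_r=0.50, r_b=0.40, thr=0.1))   # 25% ahead       -> +1
```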
In order to combine the advantages of both, an elastic threshold method is designed. Specifically, after a new preset Preset_New is obtained, it is limited to a preset range, as shown in (15):
Preset_New = Clamp(Preset_New, [Preset_min, Preset_max]) (15)
Here, Preset_min is the minimum allowed preset, which represents the slowest encoding speed, and Preset_max is the maximum allowed preset, which represents the fastest encoding speed. When the encoding time of the previous CTU exceeds the threshold, the preset switches within the preset range. Only when a more significant deviation beyond the larger threshold is detected is the preset range itself adjusted, as shown in (16),
where the larger threshold thr2 is defined as:
thr2 = thr + δ · (1 - r_b) (17)
Here, thr is the smaller threshold, thr2 is the larger one, and δ, in the range (0, 1), controls the interval between the two thresholds. At the start of encoding, thr2 and thr have the largest interval, which means adjustment of the preset range is suppressed. With this suppression, more coding time fluctuation is tolerated, which helps the encoder achieve better RD performance. As the encoding process approaches its end, thr2 gradually approaches thr, and more frequent preset switching is allowed to achieve accurate complexity control.
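The elastic thresholding of (15)-(17) can be sketched as follows. The direction convention (a larger preset index meaning faster encoding, consistent with presets 0-5 of table 1) and the one-step range shift of (16) are assumptions.

```python
def clamp(v, lo, hi):
    return max(lo, min(hi, v))

def elastic_update(preset, step, p_min, p_max, r_speed, thr, delta, r_b):
    """Elastic threshold, cf. (15)-(17): ordinary deviations only move the
    preset inside [p_min, p_max]; the range itself shifts only when the
    deviation exceeds the larger threshold thr2 = thr + delta * (1 - r_b)."""
    preset = clamp(preset + step, p_min, p_max)        # (15)
    thr2 = thr + delta * (1.0 - r_b)                   # (17)
    if r_speed > 1.0 + thr2:                           # strong overrun: faster range
        p_min, p_max = p_min + 1, p_max + 1
    elif r_speed < 1.0 - thr2:                         # strong underrun: slower range
        p_min, p_max = p_min - 1, p_max - 1
    return preset, p_min, p_max

# Early in encoding (r_b small), thr2 is large and the range rarely moves.
print(elastic_update(4, +1, 3, 5, r_speed=1.15, thr=0.1, delta=0.5, r_b=0.1))
# Near the end (r_b close to 1), thr2 approaches thr for precise control.
print(elastic_update(4, +1, 3, 5, r_speed=1.15, thr=0.1, delta=0.5, r_b=0.95))
```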
Fig. 9 illustrates an example diagram 900 of the elastic threshold method according to some embodiments of the present disclosure. Fig. 9 compares the preset switching mechanisms of the fixed threshold method and the elastic threshold method. The arrow covering preset 0 to preset 5 represents the fixed threshold method. Considering that the encoding time of a single video unit is small, a single video unit cannot quickly bring the overall encoding time back to the target region; as a result, the finally selected preset is likely to quickly switch to 0 or 5 and stay there.
The elastic threshold method, on the other hand, first restricts the selected preset to a range, e.g., preset 3 to preset 5. The preset range is adjusted, e.g., to preset 2 through preset 4, only when the overall encoding speed differs considerably from the target. This gives the scheme a chance to absorb encoding speed fluctuations caused by locally encoded content, which avoids frequent preset switching and results in better RD performance.
6) Example 6: this embodiment describes an example of how the feedback mechanism is implemented.
At the end of the encoding process for each video unit, the actual elapsed time T_r of the video unit is available, which can be used to calculate the time error T_e, as shown in (18):
T_e = T_r - T_a (18)
Prior to the encoding process for each video unit, the previously accumulated time error is used to calculate the time error feedback, as shown in (19),
where N_Window represents the number of video units used for error distribution. A smaller N_Window allows the time to converge quickly, but too tight a time constraint may result in higher RD loss. In one example, the anchor N_Window is set to 20; alternatively, it may be any manually specified positive integer. In addition, the number of remaining video units, CTU_left, is also considered for time error mitigation when the encoding process approaches its end:
N_Window = max(1, min(CTU_left, N_Window)) (20)
After N_Window is determined, the time feedback T_fb is obtained and collected by the joint budget state module to update, via (9), the time budget of the video unit to be encoded, which drives the model-based approach to coding strategy decisions (embodiment 4). Alternatively, T_b and T_r are collected into the joint budget state module, which drives the state-based approach to coding strategy decisions (embodiment 5). A sketch of this feedback follows.
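The feedback computation of (18)-(20) can be sketched as below. The sign convention (an accumulated overrun reducing later budgets) and the uniform distribution over N_Window units are assumptions, as the body of (19) is not reproduced above.

```python
def time_feedback(acc_error, ctu_left, n_window=20):
    """Spread the accumulated time error over the next N_Window video units,
    shrinking the window near the end of the picture, cf. (19) and (20)."""
    n = max(1, min(ctu_left, n_window))                # (20)
    return -acc_error / n                              # T_fb fed into (9)

acc_error = 0.0
for t_r, t_a, left in [(1.2, 1.0, 99), (0.9, 1.0, 98), (1.3, 1.0, 2)]:
    acc_error += t_r - t_a                             # T_e of (18), accumulated
    print(time_feedback(acc_error, left))
```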
7) Example 7: this embodiment describes an example of the performance evaluation of the complexity control scheme.
This embodiment evaluates the effectiveness of the proposed complexity control mechanism in terms of both per-sequence complexity control accuracy and overall performance at the target time. The framework is implemented in VTM 10.0. Four QP values {22, 27, 32, 37} are used to compress sequences from classes A1, A2, B, C and E of the JVET standard test set. Class D is excluded because the number of video units in a frame is too small. The AI setting is selected to demonstrate the complexity control performance.
Compared to the original VTM 10.0, the time saving (TS) is calculated as follows:

TS = (T_VTM10.0 - T_proposed) / T_VTM10.0 × 100% (21)
And the time error ratio (TE) is used to calculate the relative deviation of the actual encoding time from the time budget of the sequence, as follows:

TE = |T_actual - T_budget| / T_budget × 100% (22)
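The two metrics can be computed as sketched below; the definitions are assumptions matching the usual TS/TE conventions, since the original equation bodies are not reproduced above.

```python
def time_saving(t_anchor, t_test):
    """TS relative to the default VTM 10.0 encoding time, cf. (21)."""
    return 100.0 * (t_anchor - t_test) / t_anchor

def time_error(t_actual, t_budget):
    """TE: relative deviation of the actual time from the budget, cf. (22)."""
    return 100.0 * abs(t_actual - t_budget) / t_budget

print(time_saving(100.0, 62.0))   # 38.0 (% of the default time saved)
print(time_error(41.2, 40.0))     # 3.0  (% deviation from the budget)
```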
All experiments were performed on a workstation with an Intel(R) Core(TM) i9-9900X CPU @ 3.50GHz, 128GB of RAM, and the Ubuntu 16.04.6 LTS operating system.
First, the main index, i.e., the control accuracy of our complexity control framework, is evaluated by choosing the task of controlling the encoding time of one frame. Prior to testing, one frame of each test sequence is encoded by the default VTM 10.0 encoder, and the luminance compression time is collected. Complexity ratios from 30% to 90% at 10% intervals are then applied to derive the target luminance compression times. The only information received by the redesigned VTM encoder is the specific target time; the encoder automatically analyzes and selects QTMT depths for each video unit to approach the picture-level complexity budget. In the experiments, the video unit was set to a CTU.
Four sequences with different resolutions and three target encoding times (40%, 60% and 80% of the original luminance compression time) were chosen as representatives to show the complexity control effect. Fig. 10A shows an example plot 1000 of the time error of each CTU as encoding proceeds for (a) Campfire. Fig. 10B shows an example plot 1020 of the time error of each CTU as encoding proceeds for (b) Cactus. Fig. 10C shows an example plot 1040 of the time error of each CTU as encoding proceeds for (c) Johnny. Fig. 10D shows an example plot 1060 of the time error of each CTU as encoding proceeds for (d) PartyScene, according to some embodiments of the present disclosure. Figs. 10A-10D present the cumulative T_e divided by the number of encoded CTUs as the CTUs are encoded. According to the figures, the average T_e fluctuates at the beginning for sequences of different sizes and target coding times. However, after about one quarter of the picture, the average T_e converges to near zero. This demonstrates the ability of the framework to precisely control the encoding time to approach the target.
Table 2 codec performance at target complexity reduction ratio
Second, the acceleration characteristics (i.e., time saving and RD loss) and the complexity error are evaluated based on TS, BDBR and TE, respectively. The target luminance compression time is set using the same method as in the previous section. Table 2 shows the average performance over all test sequences at luminance compression times corresponding to 30%-90% of the original time. According to table 2, an average TE of 3.21% is reached, which means the encoding time of the luminance compression process is precisely controlled to its target, from 30% to 90%. Generally, TE is higher when the target time ratio is close to 30%. This is reasonable because the error ratio is calculated with respect to the time budget: a lower picture budget means a higher TE ratio. Furthermore, according to table 1, 28.1% is the average coding complexity ratio when preset 5 is employed; thus, even if preset 5 is applied to all CTUs, there may be some sequences whose complexity ratio cannot reach 30%, which also leads to higher complexity errors.
In terms of the total encoding time, an overall encoding time reduction of 9.96% to 57.78% can be achieved, with BDBR losses from 0.23% to 2.71%. As a complexity control approach, the acceleration performance is already comparable to state-of-the-art complexity reduction algorithms.
Notably, these time savings and this RD performance are achieved by constraining only the maximum QTMT depth. A better acceleration strategy may be employed, e.g., also constraining the minimum depth, and better results can be expected.
7. Further embodiments
Embodiments of the present disclosure relate to codec time estimation and codec time adjustment. As used herein, the term "block" may refer to a Codec Block (CB), a Codec Unit (CU), a Prediction Unit (PU), a Transform Unit (TU), a Prediction Block (PB), or a Transform Block (TB).
Fig. 11 illustrates a flow chart of a method 1100 for video processing according to some embodiments of the present disclosure. The method 1100 may be implemented during a conversion between a target video block of a video and a bitstream of the video. As shown in fig. 11, the method 1100 begins at 1102, where an adjusted codec process for the target video block is determined based at least in part on a budget of a codec time for at least one further video block and an actual codec time of the at least one further video block. The at least one further video block is coded prior to the conversion. The codec time represents the duration for which the at least one further video block is coded, and the budget of the codec time represents the duration pre-allocated for coding the at least one further video block.
In some embodiments, converting may include encoding the target video block into a bitstream. In this case, the codec time may include an encoding time, and the codec process may include an encoding process.
Alternatively, the converting may include decoding the target video block from the bitstream. In this case, the codec time may include a decoding time, and the codec process may include a decoding process.
As used below, the term "target video block" may also be referred to as a "video processing unit". As used below, the adjustment of the codec process may also be referred to as a "complexity reduction process" or "complexity reduction algorithm".
At block 1104, a conversion between the target video block and the bitstream is performed, e.g., the conversion may be performed by using an adjusted codec procedure.
According to embodiments of the present disclosure, it is proposed that video coding complexity can be controlled. For example, a budget of codec time may be pre-allocated for the target video block, and the actual codec time may be taken as feedback. The codec process of subsequent video units is adjusted by using the budget of the codec time and the fed-back actual codec time, and the adjusted codec process can be used to improve prediction effectiveness, thereby improving codec efficiency.
In some embodiments, a budget for a codec time of the at least one further video block may be determined. For example, respective segment budgets of codec time for a plurality of segments of the video may be determined. As another example, respective frame budgets of codec time for a plurality of frames of the video may be determined. As a further example, respective video unit budgets of codec time for a plurality of video units of the video may be determined. The at least one further video block may comprise at least one segment, at least one frame, or at least one video unit. In other words, the pre-allocation of the codec time may include three phases, namely a slice phase, a frame phase, and a video unit phase.
In some embodiments, the slice budget may be determined based on the number of groups of pictures (GOPs) in a slice or the number of frames in the slice. For example, at the slice phase, the sequence encoding budget is allocated to each slice; one slice has one or more GOPs, so the slice budget may depend on the number of GOPs or frames.
In some embodiments, for a frame, a frame budget for the frame is determined by assigning a segment budget for a codec time of the segment to a group of frames based on respective weights of the group of frames in the segment. For example, in the frame phase, a GOP budget is allocated to each frame.
In some embodiments, the respective weights for the set of frames may be determined based on decoded information, which may include, for example, a slice type, a picture type, or a Quantization Parameter (QP).
Alternatively, or in addition, the respective weights of the set of frames may be adjusted during further transitions between at least one further video block and the code stream and during transitions between the target video block and the code stream. In other words, the weight of each frame may be dynamically updated.
In some embodiments, for a video unit, the video unit budget may be determined by assigning a frame budget for a codec time of a frame to a group of video units in the frame based on respective weights of the group of video units. For example, in the video unit phase, a frame budget is allocated to each video unit.
In some embodiments, for a video unit in a set of video units, intermediate features of the video unit may be calculated during a transition between the video unit and the bitstream. Weights for the video units may be determined based on the intermediate features. For example, the intermediate features may include gradients of video units, variances of video units, or Sum of Absolute Transformed Differences (SATD) of video units.
Alternatively, or in addition, in some embodiments, weights for video units in a set of video units may be determined based on the decoded information for the video units in the set of video units. For example, the decoded information may include spatially or temporally adjacent video units, or similar video units that are historically tracked.
In some embodiments, a budget for a codec time of at least one video block may be calculated based on a codec time allocation model. For example, the codec time allocation model may include a time-cost relationship model.
In some embodiments, the budget for the codec time may be calculated based on one of: sum of Absolute Transformed Differences (SATD) cost or planar cost.
In some embodiments, the codec time allocation model may include α × CTUcost^β, where α represents a weight parameter, CTUcost represents the cost of the first video block, and β represents an exponent parameter. Alternatively, the codec time allocation model may comprise α × CTUcost^β + γ, where α represents a weight parameter, CTUcost represents the cost of the first video block, β represents an exponent parameter, and γ represents an offset parameter.
In some embodiments, as in block 1102, a difference between the budget of the codec time and the actual codec time of the at least one further video block may be determined. The codec process of the target video block may be adjusted based on the difference. By using such a feedback mechanism, accurate complexity control can be maintained.
In some embodiments, the at least one video block comprises more than one video block. In determining the difference, an accumulated codec time for the at least one video block may be determined, as well as an accumulated budget of the codec time for the at least one video block. A difference between the accumulated codec time and the accumulated budget may thus be determined. For example, each of the at least one video block may comprise a video unit. Alternatively, each of the at least one video block may comprise one of: a slice, a tile, a frame, or a group of pictures (GOP).
In this way, the video-unit-level codec time (e.g., one video unit being one CTU) is collected and compared with the video-unit-level budget after the codec process, which helps update the video-unit-level coding budget consumption state, i.e., the joint budget state. Likewise, the actual codec time at the slice/tile/frame/GOP level may also be collected to update the joint budget state.
In some embodiments, the direction of adjustment for the target video block may be determined based on the difference. The adjustment direction indicates whether to speed up or slow down the codec process. The codec process for the target video block may be adjusted based on the adjustment direction. Alternatively, or in addition, in some embodiments, the acceleration ratio for the target video block is determined based on the difference. The codec process for the target video block may be accelerated based on the acceleration ratio.
Thus, by jointly considering the codec time budget and the codec time error, feedback can be applied to subsequent video units in the form of an acceleration ratio or a coding strategy adjustment direction.
In some embodiments, individual time differences for a plurality of uncoded video blocks in the video may be determined based on the differences. The plurality of uncoded video blocks includes a target video block. The speed ratio may be determined based on the respective time differences of the target video block.
In some embodiments, the respective time differences may be determined by uniformly or non-uniformly distributing the differences over a plurality of uncoded video blocks. For example, each of the plurality of unencoded video blocks may include a video unit, frame, or group of pictures (GOP). For example, the time offset or time error of the codec may be maintained at the video unit level or the frame/GOP level. The accumulated error will be uniformly or unevenly distributed over the next few video units or next few frames/GOP.
In some embodiments, the target budget consumption rate is determined based on a budget of a codec time of the at least one video block and a total budget of the codec time for the video. The actual budget consumption rate may be determined based on an actual codec time of the at least one video block and a total budget for the codec time of the video. The adjustment direction may be determined based on a ratio difference between the target budget consumption ratio and the actual budget consumption ratio. The adjustment direction indicates whether to speed up or slow down the codec process. The codec process may be adjusted based on the adjustment direction. In other words, the current encoding speed is calculated by combining the total budget consumption state including the target budget consumption rate and the actual budget consumption rate, and then the encoding strategy adjustment direction is modified by using the speed.
In some embodiments, a target configuration may be determined for at least one factor of the codec process, the at least one factor comprising a plurality of configurations, the plurality of configurations having an effect on at least one of: the codec time and rate distortion used for the codec process.
In some embodiments, the at least one factor includes a maximum division depth in a division process for the target video block. For example, the maximum division depth may be the only factor. The division process is used for at least one of: the Versatile Video Coding (VVC) standard, a High Efficiency Video Codec (HEVC) standard, an intra video codec setting, or an inter video codec setting. Taking the VVC standard as an example, the maximum partition depth may include a Quadtree (QT), a multi-type tree (MT), or a Binary Tree (BT) depth.
Alternatively, or in addition, in some embodiments, the at least one factor includes a minimum division depth used in the division process for the target video block. Further, the at least one factor may include at least one of: intra prediction mode, inter prediction mode, intra Block Copy (IBC) prediction mode, palette prediction mode, or motion estimation range.
In some embodiments, the target configuration of at least one factor may be generated by using a pareto-based method, e.g., the most efficient coding strategy in terms of rate-distortion-complexity, i.e., presets, may be generated from a combination of factors using a pareto-based method.
In some embodiments, a set of candidate configurations (also referred to as presets) may be determined from a plurality of configurations of at least one factor. The target configuration may be selected from a set of candidate configurations. For example, respective performance of the plurality of configurations for the at least one factor may be determined based on respective time consumption or respective Rate Distortion (RD) loss of the plurality of configurations. A candidate configuration set may be determined based on respective capabilities of the plurality of configurations. Alternatively, or in addition, a candidate configuration set of at least one factor may be determined from offline training.
In some embodiments, the candidate configuration sets may be ordered in order of increasing Rate Distortion (RD) loss. The target configuration may be selected from the ordered set of candidate configurations. In other words, the presets will be arranged in order of increasing RD penalty.
Alternatively, or in addition, in some embodiments, a look-up table may be determined. The look-up table includes the set of candidate configurations, and each candidate configuration in the set is associated with a respective acceleration ratio and a respective rate-distortion (RD) loss. The target configuration may be selected from the candidate configuration set based on the look-up table. In other words, the look-up table may include all presets; for each preset, the configuration of each factor, the corresponding codec time-saving ratio, and the corresponding RD loss may be included.
In some embodiments, the target speed ratio may be determined based on the difference. The candidate configuration associated with the acceleration ratio closest to the target acceleration ratio may be selected from the lookup table as the target configuration. In other words, the lookup table will be utilized to find the corresponding preset whose acceleration ratio is closest to the target acceleration ratio.
In some embodiments, the adjustment direction may be determined based on the difference. The adjustment direction indicates whether to speed up or slow down the codec process. The target configuration may be determined based on the adjustment direction and the first threshold. The target configuration is associated with an adjustment direction indicated by the determined adjustment direction. In other words, the first threshold may be utilized to decide to select a faster or slower preset.
In some embodiments, the candidate configuration set includes a plurality of configurations. In other words, all presets will be available. Alternatively, in some embodiments, the candidate configuration set includes a subset of the plurality of configurations. That is, several presets will be available.
In some embodiments, the target configuration may be adjusted based on the second threshold, and the adjusted target configuration may be associated with an adjustment direction indicated by the determined adjustment direction. In other words, the second threshold may be utilized to adjust the available presets to a faster or slower range.
In some embodiments, the candidate configuration set includes a plurality of consecutive configurations within a predetermined range. That is, the presets may be several consecutive presets. Alternatively, the candidate configuration set includes a plurality of discontinuous configurations. In other words, the presets may be discontinuous.
In some embodiments, at least one of the first and second thresholds is predefined or empirically fixed. Or at least one of the first and second thresholds may be elastically adjusted. For example, at least one of the first and second thresholds may be adjusted during a further transition between at least one further video block and the code stream and during a transition between the target video block and the code stream. In other words, one of the two thresholds is adjusted according to the codec procedure.
In some embodiments, the uncoded ratio may be determined based on a set of video blocks being encoded and a plurality of video blocks included in the video. At least one of the first threshold and the second threshold is adjusted based on the uncoded ratio and the adjustment parameter. For example, each video block may include a video unit, frame, or group of pictures (GOP).
In some embodiments, the adjustment parameter is greater than zero and less than one. Alternatively or additionally, the first threshold and the second threshold are related. For example, the first and second thresholds may be adjusted by using thr2 = thr + δ · (1 - r_b), where thr represents the first threshold, thr2 represents the second threshold, r_b represents the uncoded ratio, and δ represents the adjustment parameter.
Thus, depending on the manner of feedback, the coding strategy decision is made by determining which preset is used by the following video unit. The selected configuration of the selected presets or factors may have satisfactory performance in terms of codec time and RD loss.
In some embodiments, the bitstream of video may be stored in a non-transitory computer readable recording medium. The code stream of video may be generated by a method performed by a video processing apparatus. According to the method, an adjusted encoding process for a target video block of the video is determined based at least in part on a budget for encoding time of at least one further video block and an actual encoding time for the at least one further video block. At least one further video block is encoded prior to conversion. The encoding time represents the duration that at least one further video block is encoded. The budget of the encoding time represents the duration pre-allocated for encoding the at least one further video block. The code stream may be generated by using an adjusted encoding process.
In some embodiments, the adjusted encoding process for the target video block of video is determined based at least in part on a budget for encoding time of at least one additional video block and an actual encoding time for at least one additional video block. At least one further video block is encoded prior to conversion. The encoding time represents the duration that at least one further video block is encoded. The budget of the encoding time represents the duration pre-allocated for encoding the at least one further video block. The code stream may be generated by using the adjusted encoding process. The code stream may be stored in a non-transitory computer readable recording medium.
Implementations of the present disclosure may be described in terms of the following clauses, the features of which may be combined in any reasonable manner.
Clause 1. A method for video processing, comprising: during a transition between a target video block of video and a bitstream of video, determining an adjusted codec process for the target video block based at least in part on a budget of a codec time for at least one additional video block and an actual codec time for the at least one additional video block, the at least one additional video block being encoded prior to the transition, the codec time representing a duration of the at least one additional video block being encoded, the budget of the codec time representing a duration of the at least one additional video block being pre-allocated for encoding; and performing the conversion by using the adjusted codec procedure.
Clause 2. The method of clause 1, further comprising: a budget for a codec time of at least one further video block is determined.
Clause 3 the method of clause 2, wherein determining the budget for the codec time comprises at least one of: determining respective segment budgets for codec times for a plurality of segments of video; determining respective frame budgets for a codec time for a plurality of frames of the video; or determining respective video unit budgets for a codec time of a plurality of video units of the video, wherein the at least one further video block comprises one of: at least one segment, at least one frame, or at least one video unit.
Clause 4. The method of clause 3, wherein determining the segment budget for the codec time of the segment comprises: the slice budget is determined by allocating a sequence coding budget to a slice based on a number of groups of pictures (GOP) in the slice or a number of frames in the slice.
Clause 5. The method of clause 3, wherein determining the frame budget for the codec time of the frame comprises: a segment budget for a codec time of a segment is allocated to a group of frames based on respective weights of the group of frames in the segment.
Clause 6 the method of clause 5, further comprising: respective weights for a set of frames are determined based on the decoded information.
Clause 7. The method of clause 6, wherein the decoded information comprises at least one of: slice type, picture type, or Quantization Parameter (QP).
Clause 8 the method of clause 6, wherein determining the respective weights for the set of frames comprises: during the further transition between the at least one further video block and the code stream and during the transition between the target video block and the code stream, the respective weights for the set of frames are adjusted.
Clause 9. The method of clause 5, wherein determining the video unit budget for the codec time of the video unit comprises: a frame budget for a codec time of a frame is allocated to a group of video units based on respective weights of the group of video units in the frame.
Clause 10. The method of clause 9, further comprising: calculating, for a video unit of a set of video units, an intermediate feature of the video unit during a transition between the video unit and the bitstream; and determining weights for the video units based on the intermediate features.
Clause 11. The method of clause 10, wherein the intermediate features include at least one of: gradient of video unit, variance of video unit, or Sum of Absolute Transformed Differences (SATD) of video unit.
Clause 12 the method of clause 9, further comprising: for a video unit of a set of video units, weights for the video units of the set of video units are determined based on the decoded information.
Clause 13 the method of clause 12, wherein the decoded information comprises at least one of: spatially or temporally adjacent video units, or similar video units that are historically tracked.
Clause 14. The method of clause 2, wherein determining the budget for the codec time for the at least one video block comprises: a budget for the codec time is calculated based on the codec time allocation model.
Clause 15. The method of clause 14, wherein the codec time allocation model comprises a time-cost relationship model.
Clause 16 the method of clause 14 or clause 15, wherein calculating the budget for the codec time comprises: calculating a budget for the codec time based on one of: sum of Absolute Transformed Differences (SATD) cost or planar cost.
Clause 17 the method of any of clauses 14-16, wherein the codec time allocation model comprises one of: α × CTUcost^β, where α represents a weighting parameter, CTUcost represents a cost of the first video block, and β represents an exponential parameter; or α × CTUcost^β + γ, where α represents a weighting parameter, CTUcost represents a cost of the first video block, β represents an exponential parameter, and γ represents an offset parameter.
Clause 18 the method of any of clauses 1-17, wherein determining the adjusted codec process comprises: determining a difference between a budget of a codec time for at least one further video block and an actual codec time; and adjusting a codec process for the target video block based on the difference.
Clause 19 the method of clause 18, wherein the at least one video block comprises more than one video block, and wherein determining the difference comprises: determining an accumulated codec time for at least one video block; determining a cumulative budget for a codec time of at least one video block; and determining a difference between the accumulated codec time and a budget for the accumulated codec time.
Clause 20 the method of clause 19, wherein each of the at least one video block comprises a video unit.
Clause 21 the method of clause 19, wherein each of the at least one video block comprises one of: a slice, tile, frame, or group of pictures (GOP).
Clause 22 the method of any of clauses 18-21, wherein adjusting the codec process of the target video block comprises: determining an adjustment direction for the target video block based on the difference, the adjustment direction indicating whether to accelerate or slow down the encoding and decoding process; and adjusting a codec process for the target video block based on the adjustment direction.
Clause 23 the method of any of clauses 18-21, wherein adjusting the codec process of the target video block comprises: determining an acceleration ratio for the target video block based on the difference; and accelerating the codec process for the target video block based on the acceleration ratio.
Clause 24 the method of clause 23, wherein determining the acceleration ratio for the target video block comprises: determining respective time differences for a plurality of uncoded video blocks in the video based on the differences, the plurality of uncoded video blocks comprising the target video block; and determining a speed-up ratio based on the respective time differences of the target video block.
Clause 25 the method of clause 24, wherein determining the respective time differences comprises: the respective time differences are determined by uniformly or non-uniformly distributing the differences over the plurality of uncoded video blocks.
Clause 26 the method of clause 24 or clause 25, wherein each of the plurality of unencoded video blocks comprises one of: a video unit, frame, or group of pictures (GOP).
Clause 27. The method of clause 22, wherein adjusting the codec process of the target video block comprises: determining a target budget consumption ratio based on a budget of a codec time of the at least one video block and a total budget of the codec time for the video; determining an actual budget consumption rate based on an actual codec time of the at least one video block and a total budget for the codec time of the video; determining an adjustment direction based on a ratio difference between the target budget consumption rate and the actual budget consumption rate, the adjustment direction indicating whether to accelerate or slow down the codec process; and adjusting the codec process based on the adjustment direction.
Clause 28 the method of clause 22, wherein adjusting the codec process comprises: determining a target configuration for at least one factor of the codec process, the at least one factor comprising a plurality of configurations, the plurality of configurations affecting at least one of: the codec time and rate distortion used for the codec process.
Clause 29. The method of clause 28, wherein the at least one factor comprises a maximum division depth used in the division process for the target video block.
Clause 30 the method of clause 29, wherein the partitioning process is used for at least one of: the Versatile Video Coding (VVC) standard, a High Efficiency Video Codec (HEVC) standard, an intra video codec setting, or an inter video codec setting.
Clause 31 the method of clause 28 or clause 29, wherein for the Versatile Video Coding (VVC) standard, the maximum partitioning depth comprises at least one of: Quadtree (QT), multi-type tree (MT), or Binary Tree (BT).
Clause 32. The method of clause 28, wherein the at least one factor comprises a minimum division depth used in the division process for the target video block.
Clause 33. The method of clause 28, wherein the at least one factor comprises at least one of: intra prediction mode, inter prediction mode, intra Block Copy (IBC) prediction mode, palette prediction mode, or motion estimation range.
Clause 34 the method of clause 28, wherein determining the target configuration for at least one factor of the codec process comprises: a target configuration of at least one factor is generated using a pareto-based method.
Clause 35 the method of clause 28 or clause 34, wherein determining the target configuration for the at least one factor comprises: determining a candidate configuration set from a plurality of configurations of the at least one factor; and selecting a target configuration from the candidate configuration set.
Clause 36 the method of clause 35, wherein determining the candidate configuration set comprises: determining respective performance of the plurality of configurations for the at least one factor based on respective time consumption or respective Rate Distortion (RD) loss of the plurality of configurations; and determining a candidate configuration set based on respective capabilities of the plurality of configurations.
Clause 37 the method of clause 35, wherein determining the candidate configuration set comprises: a candidate configuration set of the at least one factor is determined by offline training.
Clause 38 the method of any of clauses 35-37, wherein selecting the target configuration comprises: sorting the candidate configuration set in increasing rate-distortion (RD) loss order; and selecting a target configuration from the ordered candidate configuration set.
Clause 39 the method of any of clauses 35-37, wherein selecting the target configuration comprises: determining a look-up table comprising a set of candidate configurations, each candidate configuration in the set of candidate configurations being associated with a respective acceleration ratio and a respective rate-distortion (RD) loss; and selecting a target configuration from the candidate configuration set based on the look-up table.
Clause 40 the method of clause 39, wherein selecting the target configuration comprises: determining a target speed-up ratio based on the difference; and selecting, from the lookup table, the candidate configuration associated with the acceleration ratio closest to the target acceleration ratio as the target configuration.
Clause 41 the method of any of clauses 35-37, wherein selecting the target configuration from the set of candidate configurations comprises: determining an adjustment direction based on the difference, the adjustment direction indicating whether to accelerate or slow down the codec process; and determining a target configuration based on the adjustment direction and the first threshold, the target configuration being associated with the adjustment direction indicated by the determined adjustment direction.
Clause 42. The method of clause 41, wherein the candidate configuration set comprises a plurality of configurations.
Clause 43 the method of clause 41, wherein determining the target configuration based on the comparison comprises: the target configuration is adjusted based on the second threshold, the adjusted target configuration being associated with an adjustment direction indicated by the determined adjustment direction.
Clause 44 the method of clause 41, wherein the candidate configuration set comprises a subset of the plurality of configurations.
Clause 45 the method of any of clauses 35-44, wherein the candidate configuration set comprises a plurality of consecutive configurations within a predetermined range.
Clause 46 the method of any of clauses 35-44, wherein the candidate configuration set comprises a plurality of discontinuous configurations.
Clause 47 the method of any of clauses 43-46, wherein at least one of the first threshold and the second threshold is predefined.
Clause 48 the method of any of clauses 43-46, further comprising: at least one of the first threshold and the second threshold is adjusted during a further transition between the at least one further video block and the code stream and during a transition between the target video block and the code stream.
Clause 49 the method of clause 48, wherein adjusting at least one of the first threshold and the second threshold comprises: determining an uncoded ratio based on the set of video blocks being coded and the plurality of video blocks included in the video; and adjusting at least one of the first threshold and the second threshold based on the uncoded ratio and the adjustment parameter.
Clause 50 the method of clause 49, wherein each video chunk comprises one of: a video unit, frame, or group of pictures (GOP).
Clause 51 the method of clause 49 or clause 50, wherein the adjustment parameter is greater than zero and less than one.
Clause 52 the method of any of clauses 49-51, wherein the first threshold is related to the second threshold.
Clause 53 the method of any of clauses 49-51, wherein the first and second thresholds are adjusted using thr2 = thr + δ · (1 - r_b), wherein thr represents the first threshold, thr2 represents the second threshold, r_b represents the uncoded ratio, and δ represents the adjustment parameter.
Clause 54 the method of any of clauses 1-53, wherein converting comprises encoding the target video block into a bitstream.
Clause 55 the method of clause 54, wherein the codec time comprises an encoding time and the codec process comprises an encoding process.
Clause 56 the method of any of clauses 1-53, wherein converting comprises decoding the target video block from the bitstream.
Clause 57 the method of clause 56, wherein the codec time comprises a decode time and the codec process comprises a decode process.
Clause 58 an apparatus for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method according to any of clauses 1-57.
Clause 59. A non-transitory computer readable storage medium storing instructions that cause a processor to perform the method of any of clauses 1-57.
Clause 60, a non-transitory computer readable recording medium storing a bitstream of video generated by a method performed by a video processing device, wherein the method comprises: determining an adjusted encoding process for a target video block of video based at least in part on a budget of encoding time for at least one further video block and an actual encoding time for the at least one further video block, the at least one further video block encoded prior to the conversion, the encoding time representing a duration of time the at least one further video block is encoded, the budget of encoding time representing a duration of time pre-allocated for encoding the at least one further video block; and generating a code stream by using the adjusted encoding process.
Clause 61 a method for storing a bitstream of a video, comprising: determining an adjusted encoding process for a target video block of video based at least in part on a budget of encoding time for at least one further video block and an actual encoding time for the at least one further video block, the at least one further video block encoded prior to the conversion, the encoding time representing a duration of time the at least one further video block is encoded, the budget of encoding time representing a duration of time pre-allocated for encoding the at least one further video block; generating a code stream by using the adjusted encoding process; and storing the code stream in a non-transitory computer readable recording medium.
Example apparatus
FIG. 12 illustrates a block diagram of a computing device 1200 in which various embodiments of the disclosure may be implemented. The computing device 1200 may be implemented as the source device 110 (or video encoder 114 or 200) or the destination device 120 (or video decoder 124 or 300), or may be included in the source device 110 (or video encoder 114 or 200) or the destination device 120 (or video decoder 124 or 300).
It should be understood that the computing device 1200 shown in fig. 12 is for illustration purposes only and is not intended to suggest any limitation as to the scope of use or functionality of the embodiments of the disclosure in any way.
As shown in fig. 12, the computing device 1200 is in the form of a general purpose computing device. The computing device 1200 may include at least one or more processors or processing units 1210, a memory 1220, a storage unit 1230, one or more communication units 1240, one or more input devices 1250, and one or more output devices 1260.
In some embodiments, computing device 1200 may be implemented as any user terminal or server terminal having computing capabilities. The server terminal may be a server provided by a service provider, a large computing device, or the like. The user terminal may be, for example, any type of mobile terminal, fixed terminal, or portable terminal, including a mobile phone, station, unit, device, multimedia computer, multimedia tablet computer, internet node, communicator, desktop computer, laptop computer, notebook computer, netbook computer, personal Communication System (PCS) device, personal navigation device, personal Digital Assistants (PDAs), audio/video player, digital camera/camcorder, positioning device, television receiver, radio broadcast receiver, electronic book device, game device, or any combination thereof, and including the accessories and peripherals of these devices or any combination thereof. It is contemplated that the computing device 1200 may support any type of interface to the user (such as "wearable" circuitry, etc.).
The processing unit 1210 may be a physical processor or a virtual processor, and may implement various processes based on programs stored in the memory 1220. In a multiprocessor system, multiple processing units execute computer-executable instructions in parallel in order to improve the parallel processing capabilities of computing device 1200. The processing unit 1210 may also be referred to as a Central Processing Unit (CPU), microprocessor, controller, or microcontroller.
Computing device 1200 typically includes a variety of computer storage media. Such media can be any medium that is accessible by computing device 1200, including but not limited to volatile and nonvolatile media, or removable and non-removable media. The memory 1220 may be volatile memory (e.g., registers, cache, random Access Memory (RAM)), non-volatile memory (such as read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), or flash memory), or any combination thereof. The storage unit 1230 may be any removable or non-removable media and may include machine-readable media such as memories, flash drives, diskettes or other media that may be used to store information and/or data and that may be accessed in the computing device 1200.
Computing device 1200 may also include additional removable/non-removable storage media, volatile/nonvolatile storage media. Although not shown in fig. 12, a magnetic disk drive for reading from and/or writing to a removable nonvolatile magnetic disk, and an optical disk drive for reading from and/or writing to a removable nonvolatile optical disk may be provided. In this case, each drive may be connected to a bus (not shown) via one or more data medium interfaces.
The communication unit 1240 communicates with another computing device via a communication medium. Additionally, the functionality of components in computing device 1200 may be implemented by a single computing cluster or multiple computing machines that may communicate via a communication connection. Accordingly, the computing device 1200 may operate in a networked environment using logical connections to one or more other servers, networked Personal Computers (PCs), or other general purpose network nodes.
The input device 1250 may be one or more of a variety of input devices such as a mouse, keyboard, trackball, voice input device, and the like. The output device 1260 may be one or more of a variety of output devices, such as a display, speakers, printer, etc. By means of the communication unit 1240, the computing device 1200 may also communicate with one or more external devices (not shown), such as storage devices and display devices, and the computing device 1200 may also communicate with one or more devices that enable a user to interact with the computing device 1200, or any device (e.g., network card, modem, etc.) that enables the computing device 1200 to communicate with one or more other computing devices, if desired. Such communication may occur via an input/output (I/O) interface (not shown).
In some embodiments, instead of being integrated in a single device, some or all components of computing device 1200 may also be arranged in a cloud computing architecture. In the cloud computing architecture, the components may be provided remotely and work together to implement the functionalities described in the present disclosure. In some embodiments, cloud computing provides computing, software, data access, and storage services that do not require end users to know the physical locations or configurations of the systems or hardware providing these services. In various embodiments, cloud computing provides the services via a wide area network (such as the Internet) using suitable protocols. For example, a cloud computing provider provides applications over the wide area network, and these applications can be accessed through a web browser or any other computing component. The software or components of the cloud computing architecture and corresponding data may be stored on a server at a remote position. Computing resources in a cloud computing environment may be merged or distributed at locations of remote data centers. Cloud computing infrastructures may provide the services through a shared data center, even though they are exposed as a single access point for users. Therefore, the cloud computing architecture may be used to provide the components and functionalities described herein from a service provider at a remote location. Alternatively, they may be provided from a conventional server, or installed directly or otherwise on a client device.
In embodiments of the present disclosure, computing device 1200 may be used to implement video encoding/decoding. Memory 1220 may include one or more video codec modules 1225 having one or more program instructions. These modules can be accessed and executed by the processing unit 1210 to perform the functions of the various embodiments described herein.
In an example embodiment performing video encoding, input device 1250 may receive video data as input 1270 to be encoded. The video data may be processed, for example, by video codec module 1225, to generate an encoded bitstream. The encoded bitstream may be provided via output device 1260 as output 1280.
In an example embodiment performing video decoding, input device 1250 may receive an encoded bitstream as input 1270. The encoded bitstream may be processed, for example, by video codec module 1225, to generate decoded video data. The decoded video data may be provided via output device 1260 as output 1280.
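The following minimal sketch, written in Python for illustration only, traces the flow just described: input 1270 passes through video codec module 1225 and emerges as output 1280. The class and method names are hypothetical, since the disclosure defines no programming API, and a real codec would implement full prediction, transform, quantization, and entropy coding stages.

```python
class VideoCodecModule:
    """Stand-in for video codec module 1225 (hypothetical API)."""

    PREFIX = b"BITSTREAM:"

    def encode(self, video_data: bytes) -> bytes:
        # A real encoder would run prediction, transform, quantization,
        # and entropy coding; here the payload is only tagged.
        return self.PREFIX + video_data

    def decode(self, bitstream: bytes) -> bytes:
        assert bitstream.startswith(self.PREFIX)
        return bitstream[len(self.PREFIX):]


module = VideoCodecModule()
encoded = module.encode(b"raw video frames")   # input 1270 -> output 1280
assert module.decode(encoded) == b"raw video frames"
```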
While the present disclosure has been particularly shown and described with reference to its preferred embodiments, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the appended claims. Such variations are intended to be covered by the scope of the present disclosure. As such, the foregoing description of embodiments of the present disclosure is not intended to be limiting.
Claims (61)
1. A video processing method, comprising:
During a conversion between a target video block of a video and a bitstream of the video, determining an adjusted codec process for the target video block based at least in part on a budget of a codec time for at least one further video block and an actual codec time for the at least one further video block, the at least one further video block being encoded prior to the conversion, the codec time representing a duration of the at least one further video block being encoded, the budget of the codec time representing a duration pre-allocated for encoding the at least one further video block; and
The conversion is performed by using the adjusted codec procedure.
2. The method of claim 1, further comprising:
A budget for the codec time of the at least one further video block is determined.
3. The method of claim 2, wherein determining the budget for the codec time comprises at least one of:
Determining respective segment budgets of codec time for a plurality of segments of the video;
Determining respective frame budgets of codec time for a plurality of frames of the video; or
Determining respective video unit budgets of codec time for a plurality of video units of the video,
Wherein the at least one further video block comprises one of: at least one segment, at least one frame, or at least one video unit.
4. The method of claim 3, wherein determining a slice budget for a codec time of a slice comprises:
The segment budget is determined by allocating a sequence coding budget to the segment based on a number of groups of pictures (GOPs) in the segment or a number of frames in the segment.
5. The method of claim 3, wherein determining a frame budget for a codec time of a frame comprises:
a segment budget for a codec time of a segment is allocated to a set of frames in the segment based on respective weights of the set of frames.
6. The method of claim 5, further comprising:
The respective weights for the set of frames are determined based on decoded information.
7. The method of claim 6, wherein the decoded information comprises at least one of:
Slice type, picture type, or Quantization Parameter (QP).
8. The method of claim 6, wherein determining the respective weights for the set of frames comprises:
The respective weights for the set of frames are adjusted during a further conversion between the at least one further video block and the bitstream and during the conversion between the target video block and the bitstream.
9. The method of claim 5, wherein determining a video unit budget for a codec time of a video unit comprises:
A frame budget for a codec time of a frame is allocated to a set of video units in the frame based on respective weights of the set of video units.
10. The method of claim 9, further comprising:
for a video unit of the set of video units,
Calculating an intermediate feature of the video unit during a conversion between the video unit and the bitstream; and
A weight for the video unit is determined based on the intermediate feature.
11. The method of claim 10, wherein the intermediate feature comprises at least one of:
A gradient of the video unit,
A variance of the video unit, or
A Sum of Absolute Transformed Differences (SATD) of the video unit.
12. The method of claim 9, further comprising:
For a video unit of the set of video units, a weight for the video unit is determined based on decoded information.
13. The method of claim 12, wherein the decoded information comprises at least one of:
Spatially or temporally adjacent video units, or
Historically tracked similar video units.
14. The method of claim 2, wherein determining the budget of the codec time for the at least one further video block comprises:
A budget for the codec time is calculated based on a codec time allocation model.
15. The method of claim 14, wherein the codec time allocation model comprises a time-cost relationship model.
16. The method of claim 14 or claim 15, wherein calculating the budget for the codec time comprises:
Calculating the budget of the codec time based on one of: a Sum of Absolute Transformed Differences (SATD) cost or a planar cost.
17. The method of any of claims 14-16, wherein the codec time allocation model comprises one of:
α·CTUcost^β, where α represents a weighting parameter, CTUcost represents a cost of the first video block, and β represents an exponential parameter, or
α·CTUcost^β + γ, where α represents a weighting parameter, CTUcost represents a cost of the first video block, β represents an exponential parameter, and γ represents an offset parameter.
18. The method of any of claims 1-17, wherein determining the adjusted codec procedure comprises:
determining a difference between a budget of the codec time and the actual codec time for the at least one further video block; and
The codec process for the target video block is adjusted based on the difference.
19. The method of claim 18, wherein the at least one further video block comprises more than one video block, and wherein determining the difference comprises:
Determining an accumulated codec time for the at least one further video block;
Determining an accumulated budget of codec time for the at least one further video block; and
A difference between the accumulated codec time and the accumulated budget of codec time is determined.
20. The method of claim 19, wherein each video block of the at least one further video block comprises a video unit.
21. The method of claim 19, wherein each video block of the at least one further video block comprises one of:
A slice, tile, frame, or group of pictures (GOP).
22. The method of any of claims 18-21, wherein adjusting the codec process of the target video block comprises:
determining an adjustment direction for the target video block based on the difference, the adjustment direction indicating whether to accelerate the codec process or slow the codec process; and
The codec process for the target video block is adjusted based on the adjustment direction.
23. The method of any of claims 18-21, wherein adjusting the codec process of the target video block comprises:
determining an acceleration ratio for the target video block based on the difference; and
The codec process for the target video block is accelerated based on the acceleration ratio.
24. The method of claim 23, wherein determining the acceleration ratio for the target video block comprises:
Determining respective time differences for a plurality of uncoded video blocks in the video based on the difference, the plurality of uncoded video blocks comprising the target video block; and
The acceleration ratio is determined based on the respective time difference for the target video block.
25. The method of claim 24, wherein determining the respective time differences comprises:
The respective time differences are determined by uniformly or non-uniformly distributing the difference across the plurality of uncoded video blocks.
26. The method of claim 24 or claim 25, wherein each unencoded video block of the plurality of unencoded video blocks comprises one of:
A video unit, frame, or group of pictures (GOP).
27. The method of claim 22, wherein adjusting the codec process of the target video block comprises:
Determining a target budget consumption rate based on the budget of the codec time for the at least one further video block and a total budget of codec time for the video;
Determining an actual budget consumption rate based on the actual codec time for the at least one further video block and the total budget of codec time for the video;
Determining an adjustment direction based on a rate difference between the target budget consumption rate and the actual budget consumption rate, the adjustment direction indicating whether to accelerate the codec process or slow the codec process; and
The codec process is adjusted based on the adjustment direction.
28. The method of claim 22, wherein adjusting the codec procedure comprises:
Determining a target configuration for at least one factor of the codec process, the at least one factor having a plurality of configurations, the plurality of configurations affecting at least one of: a codec time of the codec process or a rate distortion of the codec process.
29. The method of claim 28, wherein the at least one factor comprises a maximum partition depth used in a partitioning process for the target video block.
30. The method of claim 29, wherein the partitioning process is for at least one of:
The Versatile Video Codec (VVC) standard,
The High Efficiency Video Codec (HEVC) standard,
An intra video codec setting, or
An inter video codec setting.
31. The method of claim 28 or claim 29, wherein the maximum partition depth comprises at least one of the following for the Versatile Video Codec (VVC) standard:
A Quadtree (QT),
A Multi-Type Tree (MTT), or
A Binary Tree (BT).
32. The method of claim 28, wherein the at least one factor comprises a minimum partition depth used in a partitioning process for the target video block.
33. The method of claim 28, wherein the at least one factor comprises at least one of:
An intra-frame prediction mode,
An inter-frame prediction mode,
Intra Block Copy (IBC) prediction mode,
Palette prediction mode, or
Motion estimation range.
34. The method of claim 28, wherein determining a target configuration for the at least one factor of the codec process comprises:
The target configuration for the at least one factor is generated using a Pareto-based method.
35. The method of claim 28 or claim 34, wherein determining the target configuration of the at least one factor comprises:
Determining a candidate configuration set from a plurality of configurations of the at least one factor; and
The target configuration is selected from the candidate configuration set.
36. The method of claim 35, wherein determining the candidate configuration set comprises:
Determining respective performance of the plurality of configurations of the at least one factor based on respective time consumption or respective Rate Distortion (RD) loss of the plurality of configurations; and
The candidate configuration set is determined based on the respective performance of the plurality of configurations.
37. The method of claim 35, wherein determining the candidate configuration set comprises:
the candidate configuration set of the at least one factor is determined by offline training.
38. The method of any of claims 35-37, wherein selecting the target configuration comprises:
Ordering the candidate configuration set in increasing order of Rate Distortion (RD) loss; and
The target configuration is selected from the ordered candidate configuration set.
39. The method of any of claims 35-37, wherein selecting the target configuration comprises:
determining a look-up table comprising the candidate configuration set, each candidate configuration in the candidate configuration set being associated with a respective acceleration ratio and a respective Rate Distortion (RD) loss; and
The target configuration is selected from the candidate configuration set based on the look-up table.
40. The method of claim 39, wherein selecting the target configuration comprises:
determining a target acceleration ratio based on the difference; and
A candidate configuration associated with an acceleration ratio closest to the target acceleration ratio is selected from the look-up table as the target configuration.
41. The method of any of claims 35-37, wherein selecting the target configuration from the set of candidate configurations comprises:
determining an adjustment direction based on the difference, the adjustment direction indicating whether to accelerate the codec process or slow the codec process; and
The target configuration is determined based on the adjustment direction and a first threshold, the target configuration being associated with the determined adjustment direction.
42. The method of claim 41, wherein the candidate set of configurations comprises the plurality of configurations.
43. The method of claim 41, wherein determining the target configuration comprises:
The target configuration is adjusted based on a second threshold, the adjusted target configuration being associated with the determined adjustment direction.
44. The method of claim 41, wherein the candidate set of configurations comprises a subset of the plurality of configurations.
45. The method of any of claims 35-44, wherein the candidate configuration set comprises a plurality of consecutive configurations within a predetermined range.
46. The method of any of claims 35-44, wherein the candidate configuration set comprises a plurality of non-consecutive configurations.
47. The method of any one of claims 43-46, wherein at least one of the first threshold and the second threshold is predefined.
48. The method of any one of claims 43-46, further comprising:
At least one of the first threshold and the second threshold is adjusted during a further conversion between the at least one further video block and the bitstream and during the conversion between the target video block and the bitstream.
49. The method of claim 48, wherein adjusting at least one of the first threshold and the second threshold comprises:
Determining an uncoded ratio based on a set of video blocks that have been coded and a plurality of video blocks included in the video; and
At least one of the first threshold and the second threshold is adjusted based on the uncoded ratio and an adjustment parameter.
50. The method of claim 49, wherein each video block comprises one of:
A video unit,
A frame, or
A group of pictures (GOP).
51. The method of claim 49 or claim 50, wherein the adjustment parameter is greater than zero and less than one.
52. The method of any of claims 49-51, wherein the first threshold and the second threshold are correlated.
53. The method of any one of claims 49-51, wherein the first threshold and the second threshold are adjusted using thr2 = thr + δ·(1 − r_b), where thr represents the first threshold, thr2 represents the second threshold, r_b represents the uncoded ratio, and δ represents the adjustment parameter.
54. The method of any of claims 1-53, wherein the converting comprises encoding the target video block into the bitstream.
55. The method of claim 54, wherein the codec time comprises an encoding time and the codec process comprises an encoding process.
56. The method of any of claims 1-53, wherein the converting comprises decoding the target video block from the bitstream.
57. The method of claim 56, wherein the codec time comprises a decoding time and the codec process comprises a decoding process.
58. An apparatus for processing video data, comprising a processor and a non-transitory memory having instructions thereon, wherein the instructions, when executed by the processor, cause the processor to perform the method of any of claims 1-57.
59. A non-transitory computer readable storage medium storing instructions for causing a processor to perform the method of any one of claims 1-57.
60. A non-transitory computer-readable recording medium storing a bitstream of a video generated by a method performed by a video processing apparatus, wherein the method comprises:
Determining an adjusted encoding process for a target video block of the video based at least in part on a budget of an encoding time for at least one further video block and an actual encoding time for the at least one further video block, the at least one further video block being encoded prior to the target video block, the encoding time representing a duration of the at least one further video block being encoded, the budget of the encoding time representing a duration pre-allocated for encoding the at least one further video block; and
The bitstream is generated by using the adjusted encoding process.
61. A method for storing a bitstream of video, comprising:
Determining an adjusted encoding process for a target video block of the video based at least in part on a budget of an encoding time for at least one further video block and an actual encoding time for the at least one further video block, the at least one further video block being encoded prior to the target video block, the encoding time representing a duration of the at least one further video block being encoded, the budget of the encoding time representing a duration pre-allocated for encoding the at least one further video block;
Generating the bitstream by using the adjusted encoding process; and
The bitstream is stored in a non-transitory computer-readable recording medium.
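The sketches below are editorial illustrations of the claimed techniques, not part of the claims; all names, parameter values, and table contents are assumptions introduced for readability. First, the hierarchical budget allocation of claims 3-12 can be read as proportional splitting: a sequence budget is divided among segments by GOP or frame count (claim 4), a segment budget among frames by weights derived from decoded information such as slice type or QP (claims 5-7), and a frame budget among video units by weights (claim 9).

```python
def allocate(budget: float, weights: list[float]) -> list[float]:
    """Split a codec-time budget across children in proportion to weights."""
    total = sum(weights)
    return [budget * w / total for w in weights]

# Sequence budget (seconds) -> segment budgets by frame count (claim 4),
# then one segment budget -> frame budgets by decoded-information weights
# (claims 5-7). The numbers are invented.
segment_budgets = allocate(60.0, [300, 300, 150])
frame_budgets = allocate(segment_budgets[0], [2.0, 1.0, 1.0, 1.0])
```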
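The codec time allocation model of claims 14-17, read as budget = α·CTUcost^β + γ, maps a block cost such as an SATD or planar cost (claim 16) to a time budget. A sketch with placeholder parameters, since the disclosure publishes no fitted values:

```python
def ctu_budget(cost: float, alpha: float, beta: float, gamma: float = 0.0) -> float:
    """Time budget for one CTU from its cost: alpha * cost**beta + gamma."""
    return alpha * cost ** beta + gamma

print(ctu_budget(cost=5000.0, alpha=0.002, beta=0.9))  # placeholder values
```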
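Claims 18-27 describe a feedback loop: compare the accumulated actual codec time against the accumulated budget (claim 19), derive an adjustment direction (claim 22), and spread the difference over the uncoded blocks (claim 25). A sketch assuming uniform distribution and a simple sign-based direction:

```python
def adjustment(budgets: list[float], actuals: list[float],
               remaining: int) -> tuple[str, float]:
    """Accumulated actual time vs. accumulated budget (claim 19); the
    difference is spread uniformly over the uncoded blocks (claim 25)."""
    diff = sum(actuals) - sum(budgets)            # positive means overrun
    direction = "accelerate" if diff > 0 else "slow"
    return direction, diff / max(remaining, 1)

direction, per_block = adjustment([1.0, 1.0, 1.0], [1.2, 1.1, 1.3], remaining=7)
print(direction, round(per_block, 4))             # accelerate 0.0857
```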
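Claims 35-40 select a target configuration from a candidate set built by offline training (claim 37), ordered by increasing Rate Distortion (RD) loss (claim 38), and tagged with acceleration ratios in a look-up table (claims 39-40). A sketch with an invented four-entry table:

```python
# Each entry: (acceleration ratio, RD loss), sorted by increasing RD loss
# (claim 38); the contents are invented for illustration.
LOOKUP = [(1.0, 0.0), (1.3, 0.4), (1.8, 1.1), (2.5, 2.6)]

def pick_config(target_ratio: float) -> tuple[float, float]:
    """Pick the candidate whose acceleration ratio is closest to the
    target acceleration ratio (claim 40)."""
    return min(LOOKUP, key=lambda entry: abs(entry[0] - target_ratio))

assert pick_config(1.7) == (1.8, 1.1)
```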
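Finally, claims 49-53 adapt the thresholds as coding progresses. Under the reconstruction of claim 53 as thr2 = thr + δ·(1 − r_b), with r_b the uncoded ratio and 0 < δ < 1 (claim 51), the second threshold drifts away from the first as more blocks are coded; reading the garbled operator as a multiplication is an editorial assumption.

```python
def second_threshold(thr: float, delta: float, coded: int, total: int) -> float:
    """thr2 = thr + delta * (1 - r_b), with r_b the uncoded ratio (claim 49)."""
    r_b = (total - coded) / total
    return thr + delta * (1.0 - r_b)

print(second_threshold(thr=0.5, delta=0.3, coded=40, total=100))  # 0.62
```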
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN PCT/CN2021/111635 | 2021-08-09 | |
CN 2021111635 | 2021-08-09 | |
PCT/CN2022/110366 (WO2023016351A1) | 2021-08-09 | 2022-08-04 | Method, apparatus, and medium for video processing
Publications (1)
Publication Number | Publication Date |
---|---|
CN118285094A true CN118285094A (en) | 2024-07-02 |
Family
ID=85200531
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202280056040.0A Pending CN118285094A (en) | 2021-08-09 | 2022-08-04 | Method, apparatus and medium for video processing |
Country Status (3)
Country | Link |
---|---|
US (1) | US20240205415A1 (en) |
CN (1) | CN118285094A (en) |
WO (1) | WO2023016351A1 (en) |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100150168A1 (en) * | 2008-11-17 | 2010-06-17 | Chanchal Chatterjee | Method and apparatus for multiplexing of digital video |
US20100166060A1 (en) * | 2008-12-31 | 2010-07-01 | Texas Instruments Incorporated | Video transcoder rate control |
US9591316B2 (en) * | 2014-03-27 | 2017-03-07 | Intel IP Corporation | Scalable video encoding rate adaptation based on perceived quality |
CN111010569B (en) * | 2018-10-06 | 2023-02-28 | 北京字节跳动网络技术有限公司 | Improvement of temporal gradient calculation in BIO |
- 2022-08-04: WO application PCT/CN2022/110366 filed (published as WO2023016351A1), status: active, Application Filing
- 2022-08-04: CN application CN202280056040.0A filed (published as CN118285094A), status: active, Pending
- 2024-02-07: US application US18/435,980 filed (published as US20240205415A1), status: active, Pending
Also Published As
Publication number | Publication date |
---|---|
US20240205415A1 (en) | 2024-06-20 |
WO2023016351A1 (en) | 2023-02-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||