CN118042157A - Video code stream processing method, electronic device and program product - Google Patents



Publication number
CN118042157A
Authority
CN
China
Prior art keywords
frame
code stream
coding
generating
original
Prior art date
Legal status
Pending
Application number
CN202410275878.XA
Other languages
Chinese (zh)
Inventor
陈智斌
周统和
Current Assignee
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Publication of CN118042157A publication Critical patent/CN118042157A/en


Abstract

The present disclosure provides a video code stream processing method, including: acquiring an initial video code stream; acquiring a plurality of original frames from the initial video code stream; generating a first encoded frame and a second encoded frame corresponding to each of the original frames, wherein the number of code-stream bytes occupied by the second encoded frame is less than or equal to the number of code-stream bytes occupied by the first encoded frame; and generating a target video code stream according to the first encoded frames and the second encoded frames. The present disclosure also provides an electronic device and a program product for performing the method.

Description

Video code stream processing method, electronic device and program product
Technical Field
The present disclosure relates to the field of video encoding, and in particular, to a video bitstream processing method, an electronic device, and a program product.
Background
Video coding is the process of compressing the code stream of an original video signal into a smaller code stream so that video can be transmitted and stored more efficiently, for example by removing redundant information, exploiting spatial and temporal correlation in the video, and applying compression algorithms. The data volume of the video can thus be greatly reduced, making video transmission faster and more stable.
However, in the process of encoding a video code stream, in order to ensure the stability and consistency of the video during transmission and to keep the data volume of the video within a certain range, the code rate of the video code stream needs to be stabilized. In the prior art, the technical means for stabilizing the video code rate often cannot achieve the effect required in actual production.
Disclosure of Invention
One aspect of the present disclosure provides a video code stream processing method, including: acquiring an initial video code stream; acquiring a plurality of original frames from the initial video code stream; generating a first encoded frame and a second encoded frame corresponding to each of the original frames, wherein the number of code-stream bytes occupied by the second encoded frame is less than or equal to the number of code-stream bytes occupied by the first encoded frame; and generating a target video code stream according to the first encoded frames and the second encoded frames.
Optionally, generating a first encoded frame and a second encoded frame corresponding to each of the plurality of original frames includes: generating a first coded frame and a second coded frame corresponding to the first original frame; taking the second original frame as a target original frame, and circularly executing the following steps until all the original frames are traversed: generating a first reconstruction frame according to a second coding frame corresponding to a previous original frame of the target original frame; generating a first coding frame corresponding to the target original frame by taking the first reconstruction frame as a reference frame; generating a second reconstructed frame according to the first coded frame; generating a second coding frame corresponding to the target original frame by taking the second reconstructed frame as a reference frame; and taking the next original frame of the target original frame as the target original frame.
Optionally, generating a second encoded frame corresponding to the target original frame includes: determining the code stream size of a first coding frame corresponding to a target original frame, if the code stream size is smaller than or equal to a budget code stream threshold value, generating a second coding frame in a first coding mode, and if the code stream size is larger than the budget code stream threshold value, generating the second coding frame in a second coding mode, wherein the first coding mode is different from the second coding mode, and the code stream byte number of the second coding frame coded based on the first coding mode is larger than the code stream byte number of the second coding frame coded based on the second coding mode.
Optionally, generating the second encoded frame in the first encoding mode includes: taking the second reconstructed frame as a reference frame, and calculating residual errors of the target original frame and the reference frame; generating a second encoded frame from the residual; generating a second encoded frame in a second encoding mode, comprising: and generating a skip frame serving as a second coding frame by taking the second reconstructed frame as a reference frame.
Optionally, generating a second encoded frame from the residual comprises: subtracting a preset decrement parameter from a quantization parameter corresponding to a first coding frame corresponding to a target original frame to obtain a quantization parameter corresponding to a second coding frame; and generating the second coding frame according to the residual error and the quantization parameter corresponding to the second coding frame.
Optionally, generating a first encoded frame and a second encoded frame corresponding to each of the plurality of original frames includes: generating first coded frames and second coded frames corresponding to the first N original frames to form a reference code stream, wherein N is a positive integer; taking a second coding frame corresponding to the Nth original frame as the last frame in the sliding window, and selecting the sliding window according to the preset window width and the reference code stream; calculating quantization parameters of a target frame according to all the coded frames in the sliding window, wherein the target frame is a first coded frame or a second coded frame to be generated next; and generating a target frame according to the quantization parameter.
Optionally, calculating the quantization parameter of the target frame includes: acquiring respective actual code stream byte numbers of all the coded frames in the sliding window; calculating the code stream budget of the target frame according to the preset code stream budget mean value and the total actual code stream byte number; and calculating the quantization parameter of the target frame according to the code stream budget, the actual code stream byte number of the last encoded frame in the sliding window and the quantization parameter of the last encoded frame in the sliding window.
Optionally, before generating the target frame, the method further includes: determining whether the target frame is a key frame; and under the condition that the target frame is a key frame, increasing the value of the quantization parameter according to the preset increment parameter.
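The sliding-window quantization-parameter calculation and the key-frame adjustment described above can be sketched as follows. This is a minimal Python illustration: the disclosure does not fix concrete formulas, so the derivation of the per-frame budget from the preset budget mean, the ±1 QP adjustment step, and the key-frame increment of 3 are all assumptions for illustration only.

```python
def target_frame_qp(window_bytes, window_qps, budget_mean,
                    is_key_frame=False, key_increment=3):
    """Estimate the QP of the next (target) frame from a sliding window of
    already-encoded frames, per the scheme sketched in the disclosure.

    window_bytes -- actual code-stream byte counts of the frames in the window
    window_qps   -- quantization parameters of those frames
    budget_mean  -- preset per-frame code stream budget mean
    """
    total_actual = sum(window_bytes)              # total actual bytes in window
    n = len(window_bytes)
    # Assumed derivation: budget left for the target frame after accounting
    # for what the window's frames actually consumed.
    frame_budget = budget_mean * (n + 1) - total_actual
    last_bytes, last_qp = window_bytes[-1], window_qps[-1]
    if frame_budget <= 0 or last_bytes > frame_budget:
        qp = last_qp + 1                          # over budget: quantize harder
    elif last_bytes < frame_budget:
        qp = max(1, last_qp - 1)                  # under budget: spend more bits
    else:
        qp = last_qp
    if is_key_frame:
        qp += key_increment                       # key frames cost more bytes
    return qp
```

For example, with window byte counts 1000, 800, 500, window QPs 30, 31, 32 and a budget mean of 700, the last frame exactly meets its budget and the QP is carried over unchanged; marking the target frame as a key frame raises it by the preset increment.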
Another aspect of the present disclosure provides an electronic device, comprising: at least one processor; and a memory coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the video code stream processing method described above.
Another aspect of the present disclosure provides a computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the video code stream processing method described above.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
Fig. 1 schematically illustrates a block diagram of a video encoder according to an embodiment of the present disclosure;
fig. 2 schematically illustrates a block diagram of a rate control process according to an embodiment of the present disclosure;
Fig. 3 schematically illustrates a flowchart of a video bitstream processing method according to an embodiment of the present disclosure;
fig. 4 schematically illustrates a flowchart of generating an encoded frame in a video bitstream processing method according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a correspondence diagram of an original frame and an encoded frame according to an embodiment of the present disclosure;
fig. 6 schematically illustrates a flowchart of generating a second encoded frame in a video bitstream processing method according to an embodiment of the present disclosure;
Fig. 7 schematically illustrates another flowchart of generating an encoded frame in a video bitstream processing method according to an embodiment of the present disclosure;
fig. 8 schematically illustrates a flowchart for calculating quantization parameters of a target frame in a video bitstream processing method according to an embodiment of the present disclosure;
Fig. 9 schematically illustrates a block diagram of generating a target frame through a sliding window in a video bitstream processing method according to an embodiment of the present disclosure;
fig. 10 schematically illustrates a block diagram of a video bitstream processing apparatus according to an embodiment of the present disclosure; and
FIG. 11 schematically illustrates a block diagram of an example electronic device that may be used to implement the methods of embodiments of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Some of the block diagrams and/or flowchart illustrations are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, when executed by the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart.
Thus, the techniques of this disclosure may be implemented in hardware and/or software (including firmware, microcode, etc.). Additionally, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon, the computer program product being usable by or in connection with an instruction execution system. In the context of this disclosure, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a computer-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices such as magnetic tape or hard disk (HDD); optical storage devices such as compact discs (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or a wired/wireless communication link.
Fig. 1 schematically shows a block diagram of a video encoder according to an embodiment of the present disclosure.
As shown in fig. 1, Fn is the current frame to be encoded in the video code stream, and F'n-1 is the reference frame of the current frame, typically the reconstructed frame of the previous frame. From Fn and F'n-1, an inter-frame predicted image P' is obtained through ME (Motion Estimation) and MC (Motion Compensation); alternatively, an intra-frame predicted image P' is obtained from Fn through intra-prediction selection and intra-prediction. The manner of obtaining the predicted image can be selected by setting the switch SW. The residual Dn between the current frame Fn and the predicted image P' is then calculated. Dn undergoes the transform T (for example, discrete cosine transform or wavelet transform) and the quantization Q to obtain the encoded frame X of Fn; X is reordered and entropy encoded and then output to the NAL (Network Abstraction Layer) for subsequent transmission and processing.
After the encoded frame X is obtained, X may further undergo inverse quantization Q⁻¹ and inverse transform T⁻¹ to obtain the prediction residual D'n. A preliminary reconstructed frame uF'n of the current frame is then obtained from D'n and the predicted image P', and uF'n may be output to the intra-prediction selection and/or intra-prediction step to form a feedback loop. After filtering, the reconstructed frame F'n of the current frame is obtained.
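The encode/reconstruct loop of fig. 1 can be sketched numerically. This is a deliberately minimal Python sketch, not the encoder of the disclosure: the transform T is omitted, the predictor is the reference frame itself (no motion search), and frames are plain lists of sample values.

```python
def quantize(vals, qp):
    # Scalar quantization: a larger QP gives coarser steps and fewer bytes.
    return [round(v / qp) for v in vals]

def dequantize(vals, qp):
    # Inverse quantization Q^-1: recover approximate residual values.
    return [v * qp for v in vals]

def encode_frame(frame, reference, qp):
    """Toy version of the fig. 1 loop: residual -> quantize -> reconstruct."""
    # Residual Dn between current frame Fn and the prediction P'
    # (here the prediction is simply the reference frame).
    residual = [f - r for f, r in zip(frame, reference)]
    coded = quantize(residual, qp)          # transform T omitted for brevity
    # Decoder-side path: dequantized residual added back to the prediction
    # gives the (unfiltered) reconstructed frame uF'n.
    recon = [r + d for r, d in zip(reference, dequantize(coded, qp))]
    return coded, recon
```

With QP = 1 the quantization is lossless and the reconstruction matches the input exactly; larger QPs trade reconstruction accuracy for a smaller coded representation.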
Fig. 2 schematically illustrates a block diagram of a rate control process according to an embodiment of the present disclosure.
Because the video stream is transmitted over a channel with limited bandwidth while the encoder produces a variable-rate stream, a video buffer must be placed between the encoder and the channel to smooth the fluctuations of the bit stream generated during encoding and to avoid buffer overflow or underflow; a suitable rate control algorithm is therefore required. As shown in fig. 2, after the video sequence passes through the feedback loop formed by the buffer and the rate control module, a coded code stream whose code rate meets the transmission requirements is generated and sent to the transmission channel.
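The buffer behaviour described above can be modelled as a leaky bucket. The following Python sketch is an assumed toy model, not part of the disclosure: coded frames fill the buffer, the channel drains it at a constant per-frame rate, and the rate control loop must keep the fullness between 0 (underflow) and the capacity (overflow); here the fullness is simply clamped to that range.

```python
def simulate_buffer(frame_bytes, drain_per_frame, capacity):
    """Return the buffer-fullness trace for a sequence of coded frame sizes."""
    fullness, trace = 0, []
    for b in frame_bytes:
        # Frame arrives (+b), channel drains (-drain_per_frame),
        # clamp to [0, capacity] to model underflow/overflow limits.
        fullness = max(0, min(capacity, fullness + b - drain_per_frame))
        trace.append(fullness)
    return trace
```

For instance, frames of 1000, 1500 and 500 bytes drained at 1000 bytes per frame interval leave the buffer at 0, 500 and 0 bytes respectively; a sustained run of oversized frames would push the fullness toward the capacity, which is what the rate control module acts to prevent.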
Fig. 3 schematically illustrates a flowchart of a video bitstream processing method according to an embodiment of the present disclosure.
As shown in fig. 3, the video bitstream processing method of the embodiment of the present disclosure may include operations S310 to S340.
In operation S310, an initial video bitstream is acquired. The initial video code stream refers to a data stream of a digital signal in a transmission process, wherein the initial video code stream comprises a plurality of original frames of video, the original frames comprise all pixel information in the video, and each pixel has complete brightness and color information. The original frame is the basis of video coding, and the video coder reduces the video data volume by compressing and coding the original frame, so that the transmission and storage of the video are realized.
In operation S320, a plurality of original frames are acquired from the initial video code stream. According to the actual coding requirement, a plurality of original frames to be coded are acquired from the initial video code stream, and the acquired original frames form a continuous sequence. In other embodiments, a plurality of original frame sequences are acquired from the initial video code stream in operation S320, each original frame sequence including a plurality of consecutive original frames; in this case, the following operations S330 to S340 are performed on the original frames in each original frame sequence.
In operation S330, a first encoded frame and a second encoded frame corresponding to each of the plurality of original frames are generated, where the number of bytes of the code stream occupied by the second encoded frame is less than or equal to the number of bytes of the code stream occupied by the first encoded frame. As shown in fig. 4-5, for double frame rate encoding, two corresponding encoded frames are generated for each original frame F1-F4, respectively, that is, a first encoded frame and a second encoded frame, where the number of bytes of the code stream occupied by the second encoded frame is less than or equal to the number of bytes of the code stream occupied by the first encoded frame. And under the condition that the current original frame and the previous original frame are different, the number of bytes of the code stream occupied by the second code frame is smaller than that occupied by the first code frame. And under the condition that the current original frame and the previous original frame are the same, the number of bytes of the code stream occupied by the second encoded frame is equal to the number of bytes of the code stream occupied by the first encoded frame.
According to one embodiment of the present disclosure, the quantization parameter used to generate the first encoded frame corresponding to the current original frame is calculated from the quantization parameter of the second encoded frame corresponding to the previous original frame. The quantization parameter used for generating the second encoded frame corresponding to the current original frame is calculated according to the quantization parameter of the first encoded frame corresponding to the current original frame.
It should be noted that, since generating the encoded frames of each subsequent original frame requires reference to the encoded frames generated from the previous original frame, among the acquired original frames only the first original frame has its first encoded frame and second encoded frame generated directly, without referring to any other original frame.
In operation S340, a target video bitstream is generated from the first encoded frame and the second encoded frame. Specifically, after the first encoded frame and the second encoded frame of each original frame are generated, the obtained encoded frames are reordered and entropy encoded to obtain a target video code stream. After the sorting, the previous coded frame of the first coded frame corresponding to the current original frame is the second coded frame corresponding to the previous original frame, and the previous coded frame of the second coded frame corresponding to the current original frame is the first coded frame corresponding to the current original frame.
According to the embodiment of the present disclosure, two encoded frames are generated per original frame, that is, the frame rate of the target code stream is doubled relative to the original code stream. In the target code stream, the code stream size of the second encoded frame is less than or equal to that of the first encoded frame, so that for the same number of frames the code rate of the target code stream is smaller than in the conventional art.
For example, consider an initial code stream with four original frames F1 to F4. In the conventional single-frame-rate technology, the encoded frames generated from F1 to F4 are P1 to P4, occupying 1000, 1500, 1000 and 800 code-stream bytes respectively, so the average number of code-stream bytes of the target code stream generated from the 4 original frames is (1000+1500+1000+800)/4 = 1075 bytes. In the scheme of the embodiment of the disclosure, the first encoded frame is generated in a manner similar to the conventional single-frame-rate process, except that the reconstructed frame of the encoded frame generated from the previous original frame serves as the reference frame; the resulting byte counts therefore differ slightly, but the difference is small, so the first encoded frames are taken to occupy 1000, 1500, 1000 and 800 bytes as in the conventional technology. The second encoded frames generated according to the embodiment are smaller than the first encoded frames; assume their byte counts are 500, 750, 500 and 400 in turn. The average number of code-stream bytes of the target code stream generated from the 4 original frames is then (1000+500+1500+750+1000+500+800+400)/8 = 806.25 bytes.
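The averages in this example can be checked with a few lines of Python; the byte counts are the ones assumed in the text above.

```python
single_rate = [1000, 1500, 1000, 800]    # bytes of P1-P4 (conventional scheme)
second_frames = [500, 750, 500, 400]     # assumed bytes of S1-S4

# Conventional single-frame-rate average.
avg_single = sum(single_rate) / len(single_rate)

# Double-frame-rate stream interleaves each P frame with its S frame.
double_stream = [b for pair in zip(single_rate, second_frames) for b in pair]
avg_double = sum(double_stream) / len(double_stream)

print(avg_single, avg_double)            # prints: 1075.0 806.25
```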
On the one hand, since each original frame corresponds to two encoded frames in the scheme of the embodiment of the disclosure, the decoded image quality is higher than at a single frame rate. Because of the doubled frame rate, at constant bandwidth the transmission rate for the same number of original frames is relatively reduced; to preserve the transmission rate, the scheme keeps the average number of code-stream bytes relatively small, so that more encoded frames are transmitted in the same time and the loss in transmission rate is made up. On the other hand, for a given image quality, the code rate should be kept as small as possible to reduce the burden of data transmission. The target code stream generated by the embodiment of the disclosure has a small overall average number of code-stream bytes, and based on the above analysis, in any selected segment, for example the encoded frames corresponding to every 4 original frames, the average number of code-stream bytes is also small. The lower the average number of code-stream bytes, the smaller the code rate, which avoids transmission congestion caused by unstable code rate conditions (for example, a peak appearing in the code rate of a given segment).
Fig. 4 schematically illustrates a flowchart of generating an encoded frame in a video bitstream processing method according to an embodiment of the present disclosure. Fig. 5 schematically illustrates a correspondence diagram of an original frame and an encoded frame according to an embodiment of the present disclosure.
As shown in fig. 4, operation S330 may include operations S410 to S460.
In operation S410, a first encoded frame and a second encoded frame corresponding to the first original frame are generated. Optionally, referring to fig. 5, the first encoded frame generated from the first original frame is an I frame, that is, an intra-coded frame, also called a key frame. An I frame is a special frame in a video sequence: it is a complete image frame that does not depend on the information of other frames. Since the I frame is independent of other frames, it can be decoded on its own without first decoding other frames. The higher the frequency of occurrence of I frames, the lower the compression ratio of the video, but the better the decoding efficiency and video quality. The second encoded frame S1 corresponding to the first original frame can be generated from the reconstructed frame of the I frame and the quantization parameter of the I frame; because S1 uses the reconstructed frame of the I frame as its reference frame, the residual between the reference frame and the original frame is small, so the number of code-stream bytes occupied by S1 is smaller than that occupied by the I frame.
And taking the second original frame as a target original frame, and circularly executing operations S420-S460 until all original frames are traversed.
Starting from the second original frame, two encoded frames are generated from each subsequent original frame: the quantization parameter of the first encoded frame corresponding to each original frame refers to the quantization parameter of the second encoded frame corresponding to the previous original frame, and the quantization parameter of the second encoded frame refers to that of its own first encoded frame. Referring to fig. 5, for example, the original frames are F1-F4. F1 is the first frame; its first encoded frame (the I frame) and its second encoded frame S1 are generated independently according to the preset quantization parameters QP_I and QP_S1. The first encoded frames corresponding to F2-F4 are P2-P4, generated with quantization parameters QP_P2-QP_P4 respectively, and the second encoded frames are S2-S4, generated with quantization parameters QP_S2-QP_S4 respectively. The solid lines in the figure represent the correspondence between original frames and encoded frames, and the dashed lines represent the reference relationships of the quantization parameters used to generate the encoded frames: QP_P2 is generated with reference to QP_S1, QP_S2 with reference to QP_P2, and so on; QP_P3 refers to QP_S2, QP_S3 to QP_P3, QP_P4 to QP_S3, and QP_S4 to QP_P4.
With continued reference to fig. 4, in operation S420, a first reconstructed frame is generated from the second encoded frame corresponding to the original frame preceding the target original frame. That is, the second encoded frame corresponding to the preceding original frame is parsed and inversely decoded to restore the pixel values of that original frame, yielding the first reconstructed frame.
In operation S430, a first encoded frame corresponding to the target original frame is generated with the first reconstructed frame as a reference frame. Specifically, the first reconstructed frame is used as a reference frame, a residual error between the target original frame and the first reconstructed frame is calculated, and a first coding frame is generated according to the residual error.
In operation S440, a second reconstructed frame is generated from the first encoded frame. The process of generating the second reconstructed frame according to the first encoded frame is similar to the process of generating the first reconstructed frame in operation S420, in which the encoded frame is reversely restored to obtain a reconstructed frame, unlike in operation S420, the generated reconstructed frame in operation S440 is a restoration of the target original frame, and the reconstructed frame in operation S420 is a restoration of a previous original frame of the target original frame.
In operation S450, a second encoded frame corresponding to the target original frame is generated using the second reconstructed frame as the reference frame. The process is similar to the generation of the first encoded frame in operation S430. However, since the reference frame used for generating the second encoded frame is a restoration of the target original frame itself, the difference between the second reconstructed frame and the target original frame is far smaller than the difference between the first reconstructed frame and the target original frame; therefore, the number of code-stream bytes occupied by the generated second encoded frame is far smaller than that of the first encoded frame.
In operation S460, the next original frame of the target original frame is taken as the target original frame.
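Operations S410 to S460 can be sketched as a loop. The following Python skeleton is an illustration only: `encode(frame, reference)` and `reconstruct(coded)` stand in for a real codec's prediction/encoding and decoding paths and are hypothetical placeholders, not functions named in the disclosure.

```python
def encode_double_rate(original_frames, encode, reconstruct):
    """Skeleton of operations S410-S460: each original frame yields a first
    encoded frame (referencing the previous frame's second reconstruction)
    and a second encoded frame (referencing its own reconstruction)."""
    out = []
    first = original_frames[0]
    i_frame = encode(first, None)            # S410: I frame, no reference
    s_frame = encode(first, reconstruct(i_frame))
    out += [i_frame, s_frame]
    prev_second = s_frame
    for target in original_frames[1:]:       # loop S420-S460
        ref1 = reconstruct(prev_second)      # S420: first reconstructed frame
        p = encode(target, ref1)             # S430: first encoded frame
        ref2 = reconstruct(p)                # S440: second reconstructed frame
        s = encode(target, ref2)             # S450: second encoded frame
        out += [p, s]
        prev_second = s                      # S460: advance to next original
    return out
```

Note the interleaving this produces: the frame preceding each first encoded frame in the output is the previous original frame's second encoded frame, matching the ordering described for operation S340.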
Fig. 6 schematically illustrates a flowchart of generating a second encoded frame in a video bitstream processing method according to an embodiment of the present disclosure.
According to one embodiment of the present disclosure, generating the second encoded frame corresponding to the target original frame in operation S450 may include operations S610 to S630.
In operation S610, a code stream size of a first encoded frame corresponding to the target original frame is determined.
If the code stream size is less than or equal to the budget code stream threshold, operation S620 is performed; otherwise, operation S630 is performed.
In operation S620, a second encoded frame is generated in a first encoding mode.
In operation S630, a second encoded frame is generated in a second encoding mode, wherein the first encoding mode is different from the second encoding mode, and the number of bytes of the code stream of the second encoded frame encoded based on the first encoding mode is greater than the number of bytes of the code stream of the second encoded frame encoded based on the second encoding mode.
Specifically, in operation S630, the second encoded frame is generated in the second encoding mode, including generating a skip frame as the second encoded frame with the second reconstructed frame as the reference frame. Each coding block, such as an MB (Macroblock) or a CTB (Coding Tree Block), is coded directly as a Skip block at the largest block size (without sub-block partitioning) based on the reference frame. A frame consisting entirely of Skip blocks carries no residual values, so it occupies very few code-stream bytes, usually only tens of bytes, greatly reducing the code stream occupation.
According to an embodiment of the present disclosure, the budget code stream threshold may be calculated according to a preset code stream budget, and specifically includes:
According to the preset code stream budget, the remaining code stream budget before generating the second encoded frame is calculated. For example, if the code stream budget for every four encoded frames is 4000 and the first three encoded frames occupy 1000, 800 and 500 bytes respectively, the remaining code stream budget for the fourth encoded frame is 4000-1000-800-500 = 1700.
A budget code stream threshold for the second encoded frame is then calculated from the remaining code stream budget and a preset threshold proportion. For example, with a threshold proportion of 50%, the budget code stream threshold is 1700 × 50% = 850.
Whether the second encoded frame is generated in the first encoding mode or the second encoding mode is then determined according to the code stream size of the first encoded frame. For example, if the first encoded frame occupies 500 code stream bytes, which is less than the threshold of 850, the first encoding mode is selected to generate the second encoded frame. In other embodiments, if the number of code stream bytes occupied by the first encoded frame is greater than the budget code stream threshold, the second encoding mode is selected. Because a second encoded frame produced by the first encoding mode occupies more code stream bytes than one produced by the second encoding mode, this selection ensures that the generated second encoded frame is as small as possible.
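The budget calculation and mode selection described above can be sketched as follows (a minimal illustration; the function names, mode labels and list-based bookkeeping are ours, not from the disclosure):

```python
def budget_threshold(group_budget, spent_bytes, ratio=0.5):
    """Remaining budget before the next frame, scaled by the preset
    threshold proportion (0.5 matches the 50% example in the text)."""
    remaining = group_budget - sum(spent_bytes)
    return remaining * ratio

def select_encoding_mode(first_frame_bytes, threshold):
    """S610 to S630: pick the mode used to generate the second encoded frame."""
    return "first_mode" if first_frame_bytes <= threshold else "second_mode"

# Worked example from the text: budget 4000 per four frames, with the
# first three frames having used 1000, 800 and 500 bytes.
threshold = budget_threshold(4000, [1000, 800, 500])  # (4000 - 2300) * 50% = 850
mode = select_encoding_mode(500, threshold)            # 500 <= 850, first mode
```

A first encoded frame larger than the threshold would flip the result to the second (skip-frame) mode.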
Referring to fig. 6, operation S620 may include operations S621 and S622.
In operation S621, a residual of the target original frame and the reference frame is calculated with the second reconstructed frame as the reference frame.
In operation S622, a second encoded frame is generated from the residual. Specifically, a preset decrement parameter is subtracted from the quantization parameter of the first encoded frame corresponding to the target original frame to obtain the quantization parameter of the second encoded frame, and the second encoded frame is generated from the residual and that quantization parameter.
Since the quantization parameter of the second encoded frame is smaller than that used by the first encoded frame, the second encoded frame occupies more code stream bytes than it would if it were generated directly with the first encoded frame's quantization parameter. However, because the reference frame of the second encoded frame is the reconstructed frame corresponding to the target original frame, the residual between the target original frame and the reconstructed frame is very small, and the residual's influence on the code stream far exceeds that of the quantization parameter. The generated second encoded frame is therefore still far smaller than the first encoded frame, while the reduced quantization parameter relatively enhances the image quality of the second encoded frame even as the code stream is reduced.
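The quantization parameter derivation for the first encoding mode might look like this (a hedged sketch; the decrement value in the example and the clamping floor are illustrative assumptions, not values from the disclosure):

```python
def second_frame_qp(first_frame_qp, decrement, min_qp=0):
    """QP of the second encoded frame: the first frame's QP minus a preset
    decrement parameter. A lower QP means finer quantization and better
    image quality. The min_qp floor is an added safeguard, not from the text."""
    return max(min_qp, first_frame_qp - decrement)

qp = second_frame_qp(30, 4)  # 30 - 4 = 26
```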
Fig. 7 schematically illustrates another flowchart of generating an encoded frame in a video bitstream processing method according to an embodiment of the present disclosure.
As shown in fig. 7, operation S330 may include operations S710 to S740.
In operation S710, first encoded frames and second encoded frames corresponding to the first N original frames are generated to form a reference code stream, where N is a positive integer.
In operation S720, the second encoded frame corresponding to the Nth original frame is used as the last frame of a sliding window, and the sliding window is selected according to a preset window width and the reference code stream. The preset window width must not exceed the number of encoded frames already generated; that is, at least as many encoded frames as the window width must be generated in advance.
In operation S730, the quantization parameter of the target frame is calculated according to all the encoded frames in the sliding window, and the target frame is the next first encoded frame or the next second encoded frame to be generated. That is, in the embodiment of the present disclosure, the quantization parameter of the target frame is not calculated from only the quantization parameter of the previous encoded frame, but is calculated from all the encoded frames within the entire sliding window before the target frame.
In operation S740, the target frame is generated according to the quantization parameter. After the target frame is generated, the sliding window may be slid backward by one frame to form a new sliding window, and operations S730 and S740 are repeated until all the original frames have been traversed. For example, if the sliding window contains the generated encoded frames f0 to f7, then after the target frame f8 is generated, the sliding window slides backward by one frame so that it contains f1 to f8; the quantization parameter of the next target frame f9 is calculated in the same manner and f9 is generated from it, and so on until all the original frames have been traversed.
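Operations S710 to S740 can be outlined as a loop over a fixed-width window of previously generated frames (a sketch only: compute_qp and generate are placeholder callbacks, and the dictionary frame representation is an assumption made for illustration):

```python
from collections import deque

def traverse_with_window(num_frames, window_width, compute_qp, generate):
    """S710 to S740 sketch: keep the last window_width generated frames in a
    sliding window; derive each subsequent frame's QP from all frames in
    the window, then slide the window forward by one frame."""
    window = deque(maxlen=window_width)  # maxlen makes the window slide itself
    frames = []
    for i in range(num_frames):
        if len(window) < window_width:
            # Reference code stream: frames generated before the window fills.
            frame = generate(i, None)
        else:
            # QP computed from every encoded frame currently in the window.
            frame = generate(i, compute_qp(list(window)))
        window.append(frame)
        frames.append(frame)
    return frames
```

With window_width = 8, frames f0 to f7 form the reference code stream and f8 is the first frame whose QP is derived from the full window.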
Fig. 8 schematically illustrates a flowchart for calculating quantization parameters of a target frame in a video bitstream processing method according to an embodiment of the present disclosure.
As shown in fig. 8, operation S730 may include operations S810 to S830.
In operation S810, the respective actual number of bytes of the code stream for all the encoded frames within the sliding window is obtained.
In operation S820, the code stream budget of the target frame is calculated according to the preset code stream budget average value and the total actual code stream byte number.
In operation S830, the quantization parameter of the target frame is calculated according to the code stream budget, the actual number of bytes of the code stream for the last encoded frame in the sliding window, and the quantization parameter of the last encoded frame in the sliding window.
Specifically, the adjustment amplitude and adjustment direction of the quantization parameter of the target frame are calculated from the code stream budget and the actual number of code stream bytes of the last encoded frame in the sliding window. For example, the code stream budget is compared with the actual number of code stream bytes of the last encoded frame in the sliding window: if the budget is larger, the quantization parameter of the last encoded frame is decreased and used as the quantization parameter of the target frame; if the budget is smaller, the quantization parameter of the last encoded frame is increased and used as the quantization parameter of the target frame; and if the two are equal, the quantization parameter is left unadjusted.
The magnitude of the increase or decrease can be determined from the relative error between the code stream budget and the actual number of code stream bytes of the last encoded frame in the sliding window, by comparing that relative error with at least one preset relative error threshold. For example, three relative error thresholds may be set: a first relative error threshold, a second relative error threshold and a third relative error threshold, where the first is larger than the second and the second is larger than the third. When the relative error is greater than or equal to the first threshold, the adjustment amplitude is large; when it is greater than or equal to the second threshold and smaller than the first, the adjustment amplitude is medium; when it is greater than or equal to the third threshold and smaller than the second, the adjustment amplitude is small; and when it is smaller than the third threshold, the adjustment amplitude is zero.
According to one embodiment of the present disclosure, the amplitude-adjusted quantization parameter is further adjusted according to a preset maximum quantization parameter threshold and a preset minimum quantization parameter threshold, if the adjusted quantization parameter is greater than the maximum quantization parameter threshold, the maximum quantization parameter threshold is taken as the quantization parameter, and if the adjusted quantization parameter is less than the minimum quantization parameter threshold, the minimum quantization parameter threshold is taken as the quantization parameter.
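Combining the direction rule, the threshold-based amplitude rule and the min/max clamping, one possible sketch is as follows (the relative-error formula abs(budget - last_bytes) / last_bytes is inferred from the worked numbers in this disclosure, and the default parameter values mirror its examples):

```python
def next_qp(prev_qp, budget, last_bytes,
            thresholds=(1.0, 0.5, 0.2), steps=(3, 2, 1),
            min_qp=20, max_qp=30):
    """Adjust the previous frame's QP toward the code stream budget."""
    if last_bytes == budget:
        return prev_qp                      # on budget: no adjustment
    rel_err = abs(budget - last_bytes) / last_bytes
    t_high, t_mid, t_low = thresholds
    if rel_err >= t_high:
        delta = steps[0]                    # large amplitude
    elif rel_err >= t_mid:
        delta = steps[1]                    # medium amplitude
    elif rel_err >= t_low:
        delta = steps[2]                    # small amplitude
    else:
        delta = 0                           # zero amplitude
    # Budget larger than actual bytes: surplus, decrease QP; else increase.
    sign = -1 if budget > last_bytes else 1
    # Clamp to the preset maximum and minimum quantization parameter thresholds.
    return max(min_qp, min(max_qp, prev_qp + sign * delta))
```

For instance, a large surplus drives the QP down by the large step but never below min_qp.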
Fig. 9 schematically illustrates a block diagram of generating a target frame through a sliding window in a video bitstream processing method according to an embodiment of the present disclosure.
The operation steps in fig. 7 to 8 are explained in detail below with reference to fig. 9.
First, 8 encoded frames f0 to f7 are generated. The actual numbers of code stream bytes obtained for f0 to f7 are f0size to f7size in sequence, and the quantization parameter corresponding to f7 is f7QP = 25.
According to the preset window width, a sliding window is selected from f0 to f7, and if the preset window width L is 8, frames contained in the sliding window are f0 to f7. In other embodiments, the preset window width may be smaller than the number of generated encoded frames, e.g., the preset window width is 4, and then the frames included in the sliding window are f 4-f 7. In the following steps, explanation will be made taking a preset window width of 8 as an example.
Preset a first relative error threshold THRESHOLD_HIGH = 1, a second relative error threshold THRESHOLD_MID = 0.5, and a third relative error threshold THRESHOLD_LOW = 0.2. Four adjustment amplitude values are also preset: a large adjustment amplitude QP_DIFF_HIGH = 3, a medium adjustment amplitude QP_DIFF_MID = 2, a small adjustment amplitude QP_DIFF_LOW = 1, and a zero adjustment amplitude QP_DIFF_ZERO = 0.
Let budget be the code stream budget of the target frame, and let bytesDiff be the relative error between budget and the actual number of code stream bytes f7size of f7, i.e. bytesDiff = |budget - f7size| / f7size. The adjustment amplitude deltaQP of the quantization parameter can then be calculated according to the following rules: when bytesDiff >= THRESHOLD_HIGH, deltaQP = QP_DIFF_HIGH; when THRESHOLD_MID <= bytesDiff < THRESHOLD_HIGH, deltaQP = QP_DIFF_MID; when THRESHOLD_LOW <= bytesDiff < THRESHOLD_MID, deltaQP = QP_DIFF_LOW; and when bytesDiff < THRESHOLD_LOW, deltaQP = QP_DIFF_ZERO.
The adjustment direction qpSign of the quantization parameter can be calculated according to the following rules: qpSign = -1 when budget > f7size; qpSign = 1 when budget < f7size; and qpSign = 0 when budget = f7size.
Preset maximum quantization parameter threshold maxQP =30, minimum quantization parameter threshold minqp=20.
The quantization parameter f8QP of the target frame f8 can then be calculated by the following formula: f8QP = f7QP + qpSign × deltaQP.
Assume the preset target code rate is bps = 1000000 bits per second and the frame rate is fps = 25; the preset code stream budget average is then bytesPerFrame = bps / (fps × 8) = 5000 bytes per frame, where the factor of 8 converts bits to bytes.
Case where the code stream is neither overrun nor in surplus: assume the actual numbers of code stream bytes of f0 to f7 are each 5000, so the actual total for f0 to f7 is 40000 and the code stream budget of the 9th frame, i.e. the target frame f8, is budget = 5000 × 9 - 40000 = 5000. According to the foregoing rules, bytesDiff = 0, deltaQP = 0 and qpSign = 0, so the quantization parameter of f8 is f8QP = f7QP + 0 × 0 = 25.
Code stream overrun case: assume the actual numbers of code stream bytes of f0 to f7 total 44000, with the last frame f7 occupying 6000 bytes, so budget = 5000 × 9 - 44000 = 1000. According to the foregoing rules, bytesDiff = 0.83, deltaQP = 2 and qpSign = 1, so f8QP = f7QP + 1 × 2 = 27.
Code stream surplus case: assume the actual numbers of code stream bytes of f0 to f7 total 36000, with the last frame f7 occupying 4000 bytes, so budget = 5000 × 9 - 36000 = 9000. According to the foregoing rules, bytesDiff = 1.25, deltaQP = 3 and qpSign = -1, so f8QP = f7QP - 1 × 3 = 22.
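The three cases can be reproduced end to end with the example parameters (a sketch; it assumes, consistently with the numbers above, that the relative error is taken against f7's actual byte count and that the totals 44000 and 36000 arise from four 5000-byte frames plus four 6000-byte or 4000-byte frames):

```python
BPS, FPS = 1_000_000, 25
BYTES_PER_FRAME = BPS // (FPS * 8)   # 8 bits per byte -> 5000 bytes per frame

def f8_qp(f7_qp, sizes_f0_to_f7):
    """Budget for the 9th frame, relative error vs. f7's actual size,
    then f8QP = f7QP + qpSign * deltaQP, clamped to [minQP=20, maxQP=30]."""
    budget = BYTES_PER_FRAME * 9 - sum(sizes_f0_to_f7)
    f7_size = sizes_f0_to_f7[-1]
    if budget == f7_size:
        return f7_qp
    rel_err = abs(budget - f7_size) / f7_size
    if rel_err >= 1.0:
        delta = 3
    elif rel_err >= 0.5:
        delta = 2
    elif rel_err >= 0.2:
        delta = 1
    else:
        delta = 0
    sign = -1 if budget > f7_size else 1
    return max(20, min(30, f7_qp + sign * delta))

balanced = f8_qp(25, [5000] * 8)               # neither overrun nor surplus: 25
overrun = f8_qp(25, [5000] * 4 + [6000] * 4)   # total 44000, budget 1000: 27
surplus = f8_qp(25, [5000] * 4 + [4000] * 4)   # total 36000, budget 9000: 22
```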
It can be seen that when the code stream is neither overrun nor in surplus, the quantization parameter of the target frame is essentially unadjusted; when the code stream is overrun, the quantization parameter of the target frame is increased to reduce the number of code stream bytes of the target frame, lowering the code rate and steadying the code stream overhead; and when the code stream is in surplus, the quantization parameter of the target frame is decreased to increase the number of code stream bytes of the target frame, raising the code rate and improving the image quality of the encoded frame within an appropriate range. Compared with a scheme that does not use a sliding window, the average code rate differs little, but the standard deviation, an index of code stream stability, is reduced, making rate control more stable.
A comparison of 100 consecutive frames of data between the sliding window scheme and a scheme without the sliding window is summarized in table 1. The data in table 1 were produced by encoding the test sequence blue_sky_1080p25.yuv at 1080p resolution with target code rate bps = 1 Mbps, frame rate fps = 25, minQP = 10, maxQP = 50, and window width L = 8.
TABLE 1 continuous 100 frame data comparison
From the data in table 1, the average number of code stream bytes is 5018 without the sliding window scheme and 5088.04 with the sliding window scheme, while the standard deviation of the number of code stream bytes is 1611.04 without the sliding window scheme and 1096.17 with it. Although the averages differ little, the standard deviation is reduced by 31.9%.
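The reported reduction can be checked directly from the stated figures (simple arithmetic only):

```python
# Figures reported for 100 consecutive frames, without and with the sliding window.
mean_no_window, std_no_window = 5018.0, 1611.04
mean_window, std_window = 5088.04, 1096.17

# Relative reduction of the code stream byte-count standard deviation,
# roughly the 31.9% quoted in the text.
std_reduction = (std_no_window - std_window) / std_no_window
```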
According to one embodiment of the present disclosure, the method further includes, before generating the target frame: determining whether the target frame is a key frame; and, when the target frame is a key frame, increasing the value of the quantization parameter according to a preset increment parameter. Since a key frame is an important frame in the video code stream that carries important image information, its encoded frame typically occupies far more code stream bytes than other frames; the quantization parameter used to generate the key frame is therefore increased separately to keep the key frame's code rate contribution within the code stream budget.
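A minimal sketch of the key-frame adjustment (the increment value and the max-QP clamp used in the example are illustrative assumptions, not values from the disclosure):

```python
def frame_qp(base_qp, is_key_frame, key_frame_increment=2, max_qp=30):
    """Raise the QP by a preset increment for key frames so the much larger
    key frame does not exceed the code stream budget; clamp to max_qp."""
    qp = base_qp + key_frame_increment if is_key_frame else base_qp
    return min(qp, max_qp)

print(frame_qp(25, True), frame_qp(25, False))  # 27 25
```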
Fig. 10 schematically shows a block diagram of a video bitstream processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 10, the video bitstream processing apparatus 1000 disposed at a client includes a first acquisition module 1010, a second acquisition module 1020, a first generation module 1030, and a second generation module 1040.
The first acquisition module 1010 may perform, for example, operation S310 for acquiring an initial video bitstream.
The second acquisition module 1020 may perform, for example, operation S320 for acquiring a plurality of original frames from the initial video bitstream.
The first generating module 1030 may perform, for example, operation S330, configured to generate a first encoded frame and a second encoded frame corresponding to each of the plurality of original frames, where the number of bytes of the code stream occupied by the second encoded frame is less than or equal to the number of bytes of the code stream occupied by the first encoded frame.
The second generation module 1040 may perform, for example, operation S340 for generating a target video bitstream from the first encoded frame and the second encoded frame.
For example, any two or more of the first acquisition module 1010, the second acquisition module 1020, the first generation module 1030 and the second generation module 1040 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the first acquisition module 1010, the second acquisition module 1020, the first generation module 1030 and the second generation module 1040 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on chip, a system on substrate, a system in package or an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging circuitry, or in any one of, or a suitable combination of, the three implementations of software, hardware and firmware. Alternatively, at least one of the first acquisition module 1010, the second acquisition module 1020, the first generation module 1030 and the second generation module 1040 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
FIG. 11 schematically illustrates a block diagram of an example electronic device that may be used to implement the methods of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the apparatus 1100 includes a computing unit 1101 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data required for the operation of the device 1100 can also be stored. The computing unit 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106 such as a keyboard, a mouse, etc.; an output unit 1107 such as various types of displays, speakers, and the like; a storage unit 1108, such as a magnetic disk, optical disk, etc.; and a communication unit 1109 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 1109 allows the device 1100 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 1101 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller or microcontroller. The computing unit 1101 performs the methods and processes described above, such as the video code stream processing method. For example, in some embodiments, the video code stream processing method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1108. In some embodiments, some or all of the computer program may be loaded and/or installed onto the device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the video code stream processing method described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the video code stream processing method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. A video bitstream processing method, comprising:
Acquiring an initial video code stream;
acquiring a plurality of original frames from the initial video code stream;
generating a first coding frame and a second coding frame which correspond to the original frames respectively, wherein the number of the bytes of the code stream occupied by the second coding frame is smaller than or equal to the number of the bytes of the code stream occupied by the first coding frame;
and generating a target video code stream according to the first coding frame and the second coding frame.
2. The method of claim 1, wherein the generating the first encoded frame and the second encoded frame, respectively, of the plurality of original frames comprises:
Generating a first coded frame and a second coded frame corresponding to the first original frame;
and taking a second original frame as a target original frame, and circularly executing the following steps until all the original frames are traversed:
Generating a first reconstructed frame according to a second coded frame corresponding to the original frame before the target original frame;
generating a first coding frame corresponding to the target original frame by taking the first reconstruction frame as a reference frame;
generating a second reconstructed frame according to the first coded frame;
generating a second coding frame corresponding to the target original frame by taking the second reconstructed frame as a reference frame;
and taking the next original frame of the target original frame as a target original frame.
3. The method of claim 2, wherein generating a second encoded frame corresponding to the target original frame comprises:
Determining the code stream size of a first coding frame corresponding to the target original frame, generating the second coding frame in a first coding mode if the code stream size is smaller than or equal to a budget code stream threshold value, generating the second coding frame in a second coding mode if the code stream size is larger than the budget code stream threshold value,
The first coding mode is different from the second coding mode, and the number of bytes of the code stream of the second coding frame coded based on the first coding mode is larger than the number of bytes of the code stream of the second coding frame coded based on the second coding mode.
4. A method according to claim 3, wherein the generating the second encoded frame in the first encoding mode comprises:
calculating residual errors of the target original frame and the reference frame by taking the second reconstructed frame as the reference frame;
Generating the second encoded frame from the residual;
The generating the second encoded frame in a second encoding mode includes:
and generating a skip frame serving as the second coding frame by taking the second reconstructed frame as a reference frame.
5. The method of claim 4, the generating the second encoded frame from the residual, comprising:
subtracting a preset decrement parameter from a quantization parameter corresponding to a first coding frame corresponding to the target original frame to obtain a quantization parameter corresponding to the second coding frame;
And generating the second coding frame according to the residual error and the quantization parameter corresponding to the second coding frame.
6. The method of claim 1, wherein the generating the first encoded frame and the second encoded frame, respectively, of the plurality of original frames comprises:
generating first coded frames and second coded frames corresponding to the first N original frames to form a reference code stream, wherein N is a positive integer;
taking a second coding frame corresponding to the Nth original frame as the last frame in the sliding window, and selecting the sliding window according to the preset window width and the reference code stream;
Calculating quantization parameters of a target frame according to all the encoding frames in the sliding window, wherein the target frame is a first encoding frame or a second encoding frame to be generated next;
And generating the target frame according to the quantization parameter.
7. The method of claim 6, wherein the calculating quantization parameters of the target frame comprises:
acquiring respective actual code stream byte numbers of all the coded frames in the sliding window;
Calculating the code stream budget of the target frame according to a preset code stream budget mean value and the total actual code stream byte number;
And calculating the quantization parameter of the target frame according to the code stream budget, the actual code stream byte number of the last coding frame in the sliding window and the quantization parameter of the last coding frame in the sliding window.
8. The method of claim 6, further comprising, prior to the generating the target frame:
Determining whether the target frame is a key frame;
and under the condition that the target frame is a key frame, increasing the numerical value of the quantization parameter according to a preset increment parameter.
9. An electronic device, comprising:
At least one processor; and
A memory coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
10. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 8.
CN202410275878.XA 2024-03-11 Video code stream processing method, electronic device and program product Pending CN118042157A (en)

Publications (1)

Publication Number Publication Date
CN118042157A true CN118042157A (en) 2024-05-14

Similar Documents

Publication Publication Date Title
US11936884B2 (en) Coded-block-flag coding and derivation
US10321138B2 (en) Adaptive video processing of an interactive environment
CN106416251B (en) Scalable video coding rate adaptation based on perceptual quality
US9071841B2 (en) Video transcoding with dynamically modifiable spatial resolution
JP6701391B2 (en) Digital frame encoding/decoding by downsampling/upsampling with improved information
US10397574B2 (en) Video coding quantization parameter determination suitable for video conferencing
JP2019501554A (en) Real-time video encoder rate control using dynamic resolution switching
US9635374B2 (en) Systems and methods for coding video data using switchable encoders and decoders
WO2022088631A1 (en) Image encoding method, image decoding method, and related apparatuses
US20180184089A1 (en) Target bit allocation for video coding
US20210058623A1 (en) Methods and systems for encoding pictures associated with video data
JP2015023579A (en) Streaming distribution system
JP2019512970A (en) Apparatus and method for adaptive computation of quantization parameters in display stream compression
US20240040127A1 (en) Video encoding method and apparatus and electronic device
CN113132728B (en) Coding method and coder
CN118042157A (en) Video code stream processing method, electronic device and program product
Minallah et al. Performance analysis of H.265/HEVC (high-efficiency video coding) with reference to other codecs
CN113422957B (en) Quantization method, apparatus, device and storage medium in video coding
KR20040069445A (en) Apparatus and method with low memory bandwidth for video data compression
KR100584422B1 (en) Method and device for compressing image data
RU2587412C2 (en) Video rate control based on transform-coefficients histogram
CN104052999A (en) Method for executing rate control in parallel encoding system and parallel encoding system
CN115661273B (en) Motion vector prediction method, motion vector prediction device, electronic equipment and storage medium
WO2024082971A1 (en) Video processing method and related device
US11736730B2 (en) Systems, methods, and apparatuses for video processing

Legal Events

Date Code Title Description
PB01 Publication