CN116389768A - Video encoding method and apparatus, electronic device, and computer-readable storage medium - Google Patents


Info

Publication number
CN116389768A
CN116389768A
Authority
CN
China
Prior art keywords
frame
determining
trending
backward
current video
Prior art date
Legal status
Pending
Application number
CN202310325793.3A
Other languages
Chinese (zh)
Inventor
梁俊辉
苏文艺
叶天晓
Current Assignee
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd filed Critical Shanghai Bilibili Technology Co Ltd
Priority to CN202310325793.3A
Publication of CN116389768A
Legal status: Pending

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/573 - Motion compensation with multiple frame prediction using two or more reference frames in a given prediction direction
    • H04N19/119 - Adaptive subdivision aspects, e.g. subdivision of a picture into rectangular or non-rectangular coding blocks
    • H04N19/147 - Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/159 - Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 - Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides a video encoding method and apparatus, an electronic device, and a computer-readable storage medium. The method comprises the following steps: acquiring a video to be encoded, wherein the video to be encoded comprises one or more video frames; determining, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame; determining a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames; pruning the plurality of reference frames according to the trending reference direction; and encoding the video to be encoded based on the pruning result.

Description

Video encoding method and apparatus, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a video encoding method and apparatus, an electronic device, a computer readable storage medium, and a computer program product.
Background
With the development of internet technology, video platforms have grown rapidly, and more and more users share and watch videos. As the number of users increases, so do the bandwidth costs that platforms incur for video transmission. To improve the storage and transmission efficiency of video, video data is generally compressed by encoding the video images. Because of the strong correlation between consecutive frames in a video, the video can be encoded with intra-frame prediction and inter-frame prediction techniques, thereby compressing the video data.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, the problems mentioned in this section should not be considered as having been recognized in any prior art unless otherwise indicated.
Disclosure of Invention
The present disclosure provides a video encoding method and apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided a video encoding method including: acquiring a video to be encoded, wherein the video to be encoded comprises one or more video frames; determining, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame; determining a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames; pruning the plurality of reference frames according to the trending reference direction; and encoding the video to be encoded based on the pruning result.
According to another aspect of the present disclosure, there is also provided a video encoding apparatus including: an acquisition module configured to acquire a video to be encoded, the video to be encoded comprising one or more video frames; a first determination module configured to determine, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame; a second determination module configured to determine a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames; a pruning module configured to prune the plurality of reference frames according to the trending reference direction; and an encoding module configured to encode the video to be encoded based on the pruning result.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and at least one memory communicatively coupled to the at least one processor, wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the video encoding method described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described above.
According to one or more embodiments of the present disclosure, whether to prune the reference frame list of each video frame is determined before the video to be encoded is encoded, and, where pruning is warranted, the reference frame list is pruned based on the trending reference direction of each video frame. The number of reference frame combinations to be traversed can thereby be reduced, the large amount of computation otherwise required to determine an optimal reference frame from a plurality of reference frames is avoided, and the encoding speed is thus increased.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The accompanying drawings illustrate exemplary embodiments and, together with the description, serve to explain exemplary implementations of the embodiments. The illustrated embodiments are for exemplary purposes only and do not limit the scope of the claims. Throughout the drawings, identical reference numerals designate similar, but not necessarily identical, elements.
Fig. 1 illustrates a flow chart of a video encoding method according to some embodiments of the present disclosure;
FIG. 2 illustrates a flow chart for determining a trending reference direction for a current video frame in accordance with some embodiments of the present disclosure;
FIG. 3 illustrates a flow chart for determining a trending reference direction for a current video frame in accordance with further embodiments of the present disclosure;
fig. 4 shows a block diagram of a video encoding apparatus according to an embodiment of the present disclosure;
fig. 5 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, the use of the terms "first," "second," and the like to describe various elements is not intended to limit the positional relationship, timing relationship, or importance relationship of the elements, unless otherwise indicated, and such terms are merely used to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, they may also refer to different instances based on the description of the context.
The terminology used in the description of the various illustrated examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, the elements may be one or more if the number of the elements is not specifically limited. Furthermore, the term "and/or" as used in this disclosure encompasses any and all possible combinations of the listed items.
With the development of internet technology, video platforms have grown rapidly, and more and more users share and watch videos. As the number of users increases, so do the bandwidth costs that platforms incur for video transmission. To improve the storage and transmission efficiency of video, video data is generally compressed by encoding the video images. Because of the strong correlation between consecutive frames in a video, the video can be encoded with intra-frame prediction and inter-frame prediction techniques, thereby compressing the video data.
The number of reference frames in the reference frame list used when a video frame is encoded with inter-frame prediction is specified by the video standard (e.g., the AV1 standard specifies that each video frame may have 7 reference frames), and the reference frames are divided into forward reference frames and backward reference frames depending on whether they precede or follow the current frame in play order. Typically, at most 2 of these reference frames are used as the final target reference frames. The inventors have found that the final target reference frames are typically determined based on rate-distortion theory: all possible reference frame combinations are traversed, the rate-distortion cost of each combination is calculated, and the combination with the smallest rate-distortion cost is selected as the final target reference frame combination for the current frame. However, calculating the rate-distortion cost of every combination often requires a large amount of computation and can therefore reduce the encoding speed; in particular, when the reference frame list contains many reference frames, the impact of this computation on the encoding speed is especially pronounced.
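The exhaustive baseline described above can be sketched as follows; `rd_cost` is a hypothetical stand-in for the per-combination rate-distortion computation, which the disclosure does not fix:

```python
from itertools import combinations

def best_reference_combo(ref_frames, rd_cost):
    """Exhaustively evaluate every 1- and 2-frame reference combination and
    keep the one with the smallest rate-distortion cost (the baseline
    behaviour this disclosure aims to speed up)."""
    best_combo, best_cost = None, float("inf")
    for r in (1, 2):  # at most 2 reference frames end up as targets
        for combo in combinations(ref_frames, r):
            cost = rd_cost(combo)
            if cost < best_cost:
                best_combo, best_cost = combo, cost
    return best_combo, best_cost

# With the 7 reference frames of AV1, C(7,1) + C(7,2) = 28 combinations
# must be traversed per frame.
n_combos = sum(1 for r in (1, 2) for _ in combinations(range(7), r))
print(n_combos)  # 28
```

Pruning the list before this search shrinks the combination space directly, which is where the speed-up claimed below comes from.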
In view of this, embodiments of the present disclosure provide a video encoding method that determines the trending reference direction of each video frame by pre-analysis before the main encoding of the video to be encoded, and prunes the reference frame list based on that trending reference direction. By combining pre-analysis with main encoding, the reference frame combinations that must be traversed during main encoding can be reduced, which reduces the amount of computation required to determine the optimal reference frame from the plurality of reference frames and thus speeds up encoding. Meanwhile, because the pre-analysis and the main encoding are two encoding passes over the same video frames, their results are correlated to a certain degree; therefore, when the video to be encoded is encoded based on the pruning result, the encoding quality and performance can still be ensured.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a flow chart of a video encoding method 100 according to some embodiments of the present disclosure. As shown in fig. 1, the method 100 may include: step S110, acquiring a video to be encoded, wherein the video to be encoded comprises one or more video frames; step S120, determining, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame; step S130, in response to determining that the plurality of reference frames are to be pruned based on the attributes of the plurality of reference frames, determining a trending reference direction of the current video frame, the trending reference direction indicating an association between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames; step S140, pruning the plurality of reference frames according to the trending reference direction; and step S150, encoding the video to be encoded based on the pruning result.
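The five steps of method 100 can be sketched as follows; every callable parameter is a hypothetical stand-in for the corresponding step, not an API defined by the disclosure:

```python
def encode_video(frames, get_refs, should_prune, trending_direction, prune, encode_frame):
    """One possible reading of steps S110-S150 of method 100."""
    encoded = []
    for frame in frames:                           # S110: frames of the video to be encoded
        refs = get_refs(frame)                     # S120: reference frames of the current frame
        if should_prune(refs):                     # S130: decision based on reference-frame attributes
            direction = trending_direction(frame)  # S130: forward / backward / indeterminate
            refs = prune(refs, direction)          # S140: prune along the trending direction
        encoded.append(encode_frame(frame, refs))  # S150: encode using the pruning result
    return encoded

# Toy usage with stand-in callables: keep only forward references when the
# trending direction is forward.
out = encode_video(
    ["f0", "f1"],
    get_refs=lambda f: ["fwd1", "fwd2", "bwd1"],
    should_prune=lambda refs: len(refs) > 2,
    trending_direction=lambda f: "forward",
    prune=lambda refs, d: [r for r in refs if r.startswith("fwd")] if d == "forward" else refs,
    encode_frame=lambda f, refs: (f, tuple(refs)),
)
print(out)  # [('f0', ('fwd1', 'fwd2')), ('f1', ('fwd1', 'fwd2'))]
```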
The method determines the trending reference direction of each video frame by pre-analysis before the main encoding of the video to be encoded and prunes the reference frame list based on that trending reference direction. By combining pre-analysis with main encoding, the reference frame combinations that must be traversed during main encoding can be reduced, thereby reducing the amount of computation required to determine the optimal reference frame from a plurality of reference frames and speeding up encoding. Meanwhile, because the pre-analysis and the main encoding are two encoding passes over the same video frames, their results are correlated to a certain degree; therefore, when the video to be encoded is encoded based on the pruning result, the encoding quality and performance can still be ensured.
In step S110, the stored or cached video to be encoded may be read from an appropriate storage device (local and/or remote). Alternatively, the video to be encoded may be received from another external device via a wired or wireless communication link. The video to be encoded may refer to any complete video file. For example, it may be a video file recorded by the user, a clip the user extracted from other video files, or a video file the user authored from multiple video files; the scope of the claimed subject matter is not limited in this respect.
In step S120, a reference frame corresponding to the current video frame refers to an image referenced by a coding block during video encoding. The number of reference frames in the reference frame list may vary with the selected video coding standard. For example, the reference frame list under the AV1 video coding standard includes 7 reference frames, the reference frame list under the H.264 video coding standard includes 16 reference frames, and so on.
Fig. 2 illustrates a flow chart of determining a trending reference direction for a current video frame in accordance with some embodiments of the present disclosure. As shown in fig. 2, in step S130, in response to determining that the plurality of reference frames are to be pruned based on the attributes of the plurality of reference frames, determining the trending reference direction of the current video frame may include: step S232, dividing the current video frame to obtain a plurality of coding blocks with preset sizes; step S234, for each coding block in the plurality of coding blocks, calculating forward inter-frame prediction rate-distortion cost, backward inter-frame prediction rate-distortion cost and intra-frame prediction rate-distortion cost of the coding block; and step S236, determining a trend reference direction of the current video frame based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, the intra-frame prediction rate-distortion cost and the total number of the plurality of coding blocks.
The rate-distortion cost can be used to characterize the cost penalty of intra-frame and inter-frame prediction when encoding each video frame: the higher the rate-distortion cost, the lower the encoding efficiency. Conversely, the lower the rate-distortion cost, the higher the encoding efficiency. Therefore, determining the trending reference direction of the current video frame based on the rate-distortion cost of each encoded block offers a degree of accuracy and reliability.
In step S232, a plurality of fixed-size coding blocks may be obtained by downsampling the current video frame, for example, dividing the current video frame into a plurality of coding blocks each covering an 8×8 or 16×16 pixel region. A rate-distortion cost is then calculated for each of the divided coding blocks.
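Step S232 can be sketched as a simple grid division; how frames whose dimensions are not multiples of the block size are handled is an assumption here, since the disclosure does not specify it:

```python
def split_into_blocks(width, height, block=16):
    """Divide a frame into fixed-size coding blocks (step S232) and return
    the top-left corner of each block. Partial blocks at the right/bottom
    edges are kept as-is, which is an assumption."""
    return [(x, y)
            for y in range(0, height, block)
            for x in range(0, width, block)]

# A 64x32 frame divided into 16x16 blocks yields (64/16) * (32/16) = 8 blocks.
print(len(split_into_blocks(64, 32)))  # 8
```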
According to some embodiments of the present disclosure, in step S234, the forward inter-frame prediction rate-distortion cost may represent the cost penalty of inter-frame prediction between the current video frame (encoded block) and the forward reference frames in the reference frame list, and the backward inter-frame prediction rate-distortion cost may represent the cost penalty of inter-frame prediction between the current video frame (encoded block) and the backward reference frames in the reference frame list. Specifically, the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, and the intra-frame prediction rate-distortion cost may be calculated by:
SATDCOST = SATD + λ * R (1)
SATD is the sum of absolute transformed differences (Sum of Absolute Transformed Differences) of the residual between the predicted coding block and the original coding block after a Hadamard transform during intra or inter prediction. λ is the Lagrangian multiplier in rate-distortion optimization theory and is derived from the quantization parameter QP (Quantization Parameter). R is the number of bits required to encode the inter motion vector or intra prediction direction and the residual.
Based on the quantization parameter QP, the Lagrangian multiplier λ may be calculated by:
λ = c * Qstep² (2)
where Qstep denotes a quantization step obtained from a quantization parameter QP lookup table, and c is a constant determined by the encoder, e.g., c may be 0.5.
The Hadamard transform of the residual of the prediction encoded block from the original encoded block can be calculated by:
T = H_M * Resi * H_M (3)
where Resi is the residual matrix, H_M is the M-order Hadamard matrix, and M may take values of 4, 8, 16, 32, etc. T is the result matrix after the Hadamard transform; the SATD is obtained by taking the absolute value of each entry of T and summing.
The residual of a predictive coded block from the original coded block can be calculated by:
Resi(i,j) = Pred(i,j) - Src(i,j) (4)
where i and j denote the row and column coordinates of a pixel in the image, Pred(i,j) denotes the predicted pixel value at row i, column j, Src(i,j) denotes the original pixel value at row i, column j, and Resi(i,j) denotes the residual between the predicted pixel value and the original pixel value at that position. All residual values of a coding block form the residual matrix Resi.
It should be appreciated that the specific formulas (1)-(4) above for calculating the inter-frame and intra-frame prediction rate-distortion costs are shown for illustrative purposes only, and the costs may be calculated using any other suitable method; the scope of the claimed subject matter is not limited in this respect.
It should also be understood that the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, and the intra-frame prediction rate-distortion cost can all be calculated with the above equations; only the corresponding residual and the corresponding required number of bits need to be obtained in each case.
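Formulas (1)-(4) can be sketched in pure Python as follows. This is a small illustrative implementation under the stated definitions; real encoders compute SATD with optimized routines, and the bit count R comes from the entropy coder rather than being passed in:

```python
def hadamard(m):
    """M-order Hadamard matrix via Sylvester's construction (M a power of 2)."""
    h = [[1]]
    while len(h) < m:
        h = ([row + row for row in h] +
             [row + [-v for v in row] for row in h])
    return h

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def satd_cost(pred, src, qstep, bits, c=0.5):
    """Formulas (1)-(4): Resi = Pred - Src, T = H_M * Resi * H_M,
    SATD = sum of |T|, lambda = c * Qstep^2, cost = SATD + lambda * R."""
    m = len(src)
    resi = [[pred[i][j] - src[i][j] for j in range(m)] for i in range(m)]  # (4)
    h = hadamard(m)
    t = matmul(matmul(h, resi), h)                                         # (3)
    satd = sum(abs(v) for row in t for v in row)
    lam = c * qstep ** 2                                                   # (2)
    return satd + lam * bits                                               # (1)

# Identical prediction and source give zero SATD; the cost is then lambda * R.
print(satd_cost([[1, 2], [3, 4]], [[1, 2], [3, 4]], qstep=2, bits=3))  # 6.0
```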
According to some embodiments of the present disclosure, calculating, for each of the plurality of encoded blocks, a forward inter-prediction rate-distortion cost, a backward inter-prediction rate-distortion cost, and an intra-prediction rate-distortion cost for the encoded block may include: calculating the inter-frame prediction rate distortion cost between the coding block and the corresponding coding block in the forward adjacent reference frame of the current video frame as the forward inter-frame prediction rate distortion cost; and calculating an inter-frame prediction rate-distortion cost between the coded block and a corresponding coded block in a backward adjacent reference frame of the current video frame as a backward inter-frame prediction rate-distortion cost.
Specifically, the inter-frame prediction rate-distortion cost between the current video frame and the corresponding encoded block of the forward or backward adjacent reference frame can be calculated with formulas (1)-(4) above.
Because strong correlation exists between consecutive frames of a video, and the correlation between each video frame and its adjacent video frames (the forward adjacent reference frame and the backward adjacent reference frame) is likely the closest, the current video frame and its two adjacent reference frames (forward and backward) are selected for inter-frame prediction and the corresponding inter-frame prediction rate-distortion costs are calculated, which helps obtain a more accurate trending reference direction for the current video frame later.
In accordance with some embodiments of the present disclosure, determining the trending reference direction for the current video frame based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, the intra-frame prediction rate-distortion cost, and the total number of the plurality of encoded blocks may include, in step S236: determining whether the encoded block is a forward predictive encoded block or a backward predictive encoded block based on the forward inter-frame predicted rate-distortion cost, the backward inter-frame predicted rate-distortion cost, and the intra-frame predicted rate-distortion cost; determining the number of forward predictive coding blocks and backward predictive coding blocks in the plurality of coding blocks; and determining a trending reference direction of the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks.
In some embodiments, the trending reference direction of the current video frame may be forward, backward, or indeterminate, and in the initial state the trending reference direction of the current video frame may be set to indeterminate. Specifically, determining whether an encoded block is a forward predictive encoded block or a backward predictive encoded block based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, and the intra-frame prediction rate-distortion cost may include: in response to determining that the forward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost, determining that the encoded block is a forward predictive encoded block; and in response to determining that the backward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost, determining that the encoded block is a backward predictive encoded block.
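The classification rule can be sketched as follows; note that a block may qualify as both a forward and a backward predictive encoded block, or as neither:

```python
def classify_block(fwd_cost, bwd_cost, intra_cost):
    """Return (is_forward, is_backward): a block counts as forward- and/or
    backward-predictive when the corresponding inter-frame prediction cost
    beats the intra-frame prediction cost."""
    return fwd_cost < intra_cost, bwd_cost < intra_cost

print(classify_block(1, 2, 3))  # (True, True)
```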
For each video frame, either inter-frame coding or intra-frame coding may be selected during the actual encoding process, where selecting inter-frame coding requires determining the optimal target reference frame or target reference frame combination from the reference frame list. In the pre-analysis that determines the trending reference direction of the current video frame, comparing the inter-frame prediction rate-distortion cost with the intra-frame prediction rate-distortion cost provides a preliminary prediction of whether inter-frame or intra-frame coding will be selected in the actual encoding process. Meanwhile, when the preliminary prediction indicates inter-frame coding, whether each coding block is a forward predictive coding block or a backward predictive coding block can be determined more accurately. In addition, compared with the actual encoding process, the amount of computation involved in calculating the rate-distortion costs during pre-analysis is relatively small, which helps speed up encoding.
It should be noted that the pre-analysis described above is not used directly as the basis for selecting inter-frame or intra-frame coding in the actual encoding process; it is only used to determine the trending reference direction of each encoded block at a low computational cost. In other words, even if the inter-frame prediction rate-distortion cost of an encoded block is determined to be greater than the intra-frame prediction rate-distortion cost (whether the forward inter-frame prediction rate-distortion cost exceeds the intra-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost exceeds it, or both), this does not mean that the video frame containing that encoded block will be intra-coded in the actual encoding process.
It should be appreciated that by calculating the forward inter-prediction rate-distortion cost, the backward inter-prediction rate-distortion cost, and the intra-prediction rate-distortion cost for each encoded block, there may be four cases:
1) The forward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost, and the backward inter-frame prediction rate-distortion cost is also less than the intra-frame prediction rate-distortion cost. In this case, it may be determined that the encoded block is both a forward predictive encoded block and a backward predictive encoded block, and the number of both forward predictive encoded blocks and backward predictive encoded blocks is increased by 1.
2) The forward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost, and the backward inter-frame prediction rate-distortion cost is greater than or equal to the intra-frame prediction rate-distortion cost. In this case, it may be determined that the encoded block is a forward predictive encoded block, and the number of forward predictive encoded blocks is increased by 1, while the number of backward predictive encoded blocks remains unchanged.
3) The forward inter-frame prediction rate-distortion cost of the encoded block is greater than or equal to the intra-frame prediction rate-distortion cost, and the backward inter-frame prediction rate-distortion cost is less than the intra-frame prediction rate-distortion cost. In this case, it may be determined that the encoded block is a backward predictive encoded block, and the number of backward predictive encoded blocks is increased by 1, while the number of forward predictive encoded blocks remains unchanged.
4) The forward inter-frame prediction rate-distortion cost of the encoded block is greater than or equal to the intra-frame prediction rate-distortion cost, and the backward inter-frame prediction rate-distortion cost is also greater than or equal to the intra-frame prediction rate-distortion cost. In this case, it may be determined that the encoded block is neither a forward predictive encoded block nor a backward predictive encoded block, and the number of forward predictive encoded blocks and the number of backward predictive encoded blocks are both kept unchanged.
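The classification and counting described in the four cases above can be sketched in Python as follows. This is an illustrative sketch only; the function name `count_predictive_blocks` and the per-block cost-tuple layout are assumptions made for exposition, not part of the disclosed encoder.

```python
def count_predictive_blocks(block_costs):
    """Count forward and backward predictive encoded blocks for one frame.

    block_costs: a list of (fwd_inter_cost, bwd_inter_cost, intra_cost)
    tuples, one per encoded block. A block may count toward both
    directions (case 1), exactly one direction (cases 2 and 3), or
    neither direction (case 4).
    """
    num_forward = 0
    num_backward = 0
    for fwd_cost, bwd_cost, intra_cost in block_costs:
        if fwd_cost < intra_cost:   # forward predictive encoded block
            num_forward += 1
        if bwd_cost < intra_cost:   # backward predictive encoded block
            num_backward += 1
    return num_forward, num_backward
```

Note that cases 1 and 4 need no special handling: a block whose two inter-frame costs both beat the intra-frame cost increments both counters, while a block that beats neither increments nothing.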
Thus, the sum of the number of forward predictive encoded blocks and the number of backward predictive encoded blocks may exceed the total number of encoded blocks, because a single block may be counted in both categories. For example, in the case where 16 fixed-size encoded blocks are obtained by downsampling the current video frame, there may be 9 forward predictive encoded blocks and 9 backward predictive encoded blocks, in which case at least two encoded blocks are simultaneously forward predictive encoded blocks and backward predictive encoded blocks.
On this basis, according to some embodiments of the present disclosure, determining the trending reference direction of the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks may include: in response to determining that the number of forward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks, determining a trending reference direction of the current video frame as forward; and determining the trending reference direction of the current video frame as backward in response to determining that the number of forward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks and that the number of backward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks.
The first threshold may be the same as or different from the second threshold. For example, the first threshold and the second threshold may both be set to 0.5. As another example, the first threshold may be set to 0.6 and the second threshold to 0.5. It should be appreciated that the first threshold and the second threshold may be any suitable value between 0 and 1 and are not limited to the specific values in the particular embodiments described above, as claimed subject matter is not limited in this respect.
According to some embodiments of the present disclosure, determining the trending reference direction of the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks may further comprise: determining the trending reference direction of the current video frame to be uncertain in response to determining that neither of the following conditions holds: the number of forward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks; and the number of forward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks.
Continuing the example in which the 16 encoded blocks include 9 forward predictive encoded blocks and 9 backward predictive encoded blocks, and the first threshold and the second threshold are both 0.5: the number of forward predictive encoded blocks, 9, is greater than the total number of encoded blocks multiplied by the first threshold (16 × 0.5 = 8), but the number of backward predictive encoded blocks, 9, is not less than the total number multiplied by the second threshold (16 × 0.5 = 8), so the trending reference direction of the current video frame is determined to be uncertain.
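The threshold comparison described above can be sketched as follows. This is an illustrative sketch only; the function name, the string encoding of directions, and the default thresholds of 0.5 are assumptions taken from the example, not fixed by the disclosure.

```python
def trending_direction(num_fwd, num_bwd, total, t1=0.5, t2=0.5):
    """Classify a frame's trending reference direction from block counts.

    t1 is the first threshold and t2 the second threshold described
    above; returns "forward", "backward", or "uncertain".
    """
    if num_fwd > t1 * total and num_bwd < t2 * total:
        return "forward"
    if num_fwd < t2 * total and num_bwd > t1 * total:
        return "backward"
    return "uncertain"
```

With the worked example above, `trending_direction(9, 9, 16)` yields `"uncertain"`: the forward condition fails on its second clause because 9 is not less than 8.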
Fig. 3 illustrates a flow chart for determining a trending reference direction for a current video frame in accordance with further embodiments of the present disclosure. As shown in fig. 3, in step S130, in response to determining that the plurality of reference frames are to be pruned based on the attributes of the plurality of reference frames, determining the trending reference direction of the current video frame may include: steps S332-S336 that are the same as or similar to steps S232-S236 described with respect to fig. 2; and step S338, updating the trending reference direction of the current video frame to obtain an updated trending reference direction.
It should be appreciated that the operations, features and advantages described above with respect to steps S232-S236 apply equally to steps S332-S336. For brevity, these operations, features and advantages are not described in detail herein.
After the trending reference direction of the current video frame is determined through the above steps, the accuracy and reliability of the determined trending reference direction can be further improved by updating it.
In accordance with some embodiments of the present disclosure, updating the trending reference direction of the current video frame to obtain an updated trending reference direction in step S338 may include: responsive to determining that the trending reference direction of the current video frame is forward, determining a trending reference direction of a forward neighboring reference frame of the current video frame; and in response to determining that the trending reference direction of the forward neighboring reference frame is backward, updating the trending reference direction of the current video frame to be uncertain.
Specifically, when the trending reference direction of the current video frame is determined to be forward, a closer association may exist between the forward neighboring reference frame and the current video frame. At this time, by further determining the trending reference direction of the forward neighboring reference frame, errors introduced in the process of calculating the trending reference direction of the current video frame can be reduced, thereby improving the accuracy of the pre-analysis result.
Optionally, if the trending reference direction of the forward neighboring reference frame is determined to be uncertain, the trending reference direction of the current video frame may not be updated, i.e., the trending reference direction of the current video frame is maintained as forward.
In accordance with some embodiments of the present disclosure, updating the trending reference direction of the current video frame to obtain an updated trending reference direction in step S338 may include: in response to determining that the trending reference direction of the current video frame is backward, determining a trending reference direction of a backward neighboring reference frame of the current video frame; and in response to determining that the trending reference direction of the backward neighboring reference frame is forward, updating the trending reference direction of the current video frame to be uncertain.
Similarly, when the trending reference direction of the current video frame is determined to be backward, a closer association may exist between the backward neighboring reference frame and the current video frame. At this time, by further determining the trending reference direction of the backward neighboring reference frame, errors introduced in the process of calculating the trending reference direction of the current video frame can be reduced, thereby improving the accuracy of the pre-analysis result.
Optionally, if the trending reference direction of the backward neighboring reference frame is determined to be uncertain, the trending reference direction of the current video frame may not be updated, i.e., the trending reference direction of the current video frame is maintained to be backward.
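The neighbor-based update rules described above can be sketched as follows. The function name and the string encoding of directions are illustrative assumptions; the logic follows the text: a forward trend contradicted by a backward-trending forward neighbor (and symmetrically for backward) is downgraded to uncertain, and an uncertain neighbor leaves the frame's direction unchanged.

```python
def update_trending_direction(frame_dir, fwd_neighbor_dir, bwd_neighbor_dir):
    """Refine a frame's trending reference direction using its neighbors.

    frame_dir: "forward", "backward", or "uncertain" for the current frame.
    fwd_neighbor_dir / bwd_neighbor_dir: trending directions of the
    forward and backward neighboring reference frames, determined by the
    same per-frame procedure.
    """
    if frame_dir == "forward" and fwd_neighbor_dir == "backward":
        return "uncertain"   # forward neighbor contradicts the forward trend
    if frame_dir == "backward" and bwd_neighbor_dir == "forward":
        return "uncertain"   # backward neighbor contradicts the backward trend
    return frame_dir         # otherwise the direction is kept as-is
```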
It should be appreciated that determining the trending reference direction of either the forward neighboring reference frame or the backward neighboring reference frame may be accomplished using the same or similar steps as determining the trending reference direction of the current video frame.
In accordance with some embodiments of the present disclosure, determining to prune the plurality of reference frames based on the attributes of the plurality of reference frames in step S130 may include: determining a number of the plurality of reference frames; determining whether a plurality of reference frames includes both a forward reference frame and a backward reference frame; and pruning the plurality of reference frames in response to determining that the number of the plurality of reference frames is greater than 2 and that both the forward reference frame and the backward reference frame are included in the plurality of reference frames.
As described above, the number of reference frames in the reference frame list may depend on the current video coding standard, and there may be cases where the reference frames in the reference frame list are all forward reference frames or all backward reference frames. When the number of reference frames in the reference frame list is not more than 2, or the reference frames in the list are all forward reference frames or all backward reference frames, pruning the reference frame list is unnecessary. By determining whether to prune the reference frames in the reference frame list before determining the trending reference direction of the current video frame, unnecessary computation, such as that consumed in computing rate-distortion costs, can be avoided and memory space saved.
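The pruning decision described above can be sketched as follows. As an illustrative assumption not stated in the disclosure, reference frames are represented here by their picture order counts (POCs), with POCs below the current frame's POC treated as forward references and POCs above it as backward references.

```python
def should_prune(reference_pocs, current_poc):
    """Decide whether the reference frame list is worth pruning at all.

    reference_pocs: picture order counts of the candidate reference
    frames. Pruning is performed only when there are more than two
    references and both directions are represented in the list.
    """
    if len(reference_pocs) <= 2:
        return False
    has_forward = any(poc < current_poc for poc in reference_pocs)
    has_backward = any(poc > current_poc for poc in reference_pocs)
    return has_forward and has_backward
```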
According to some embodiments of the present disclosure, pruning the plurality of reference frames according to the trending reference direction in step S140 includes: pruning all backward reference frames in the plurality of reference frames in response to determining that the trending reference direction is forward; pruning all forward reference frames in the plurality of reference frames in response to determining that the trending reference direction is backward; and in response to determining that the trending reference direction is uncertain, pruning the plurality of reference frames is not performed.
By pruning the reference frames in the reference frame list, reference frame combinations required to be traversed when determining the optimal reference frame from a plurality of reference frames can be reduced, a large amount of computation required is effectively reduced, and thus the encoding speed is increased.
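The direction-based pruning described above can be sketched as follows. As an illustrative assumption not stated in the disclosure, reference frames are represented by picture order counts (POCs): POCs below the current frame's POC are forward references, and POCs above it are backward references.

```python
def prune_references(reference_pocs, current_poc, direction):
    """Prune the reference frame list according to the trending direction."""
    if direction == "forward":    # drop all backward reference frames
        return [poc for poc in reference_pocs if poc < current_poc]
    if direction == "backward":   # drop all forward reference frames
        return [poc for poc in reference_pocs if poc > current_poc]
    return list(reference_pocs)   # "uncertain": keep the full list
```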
According to some embodiments of the present disclosure, in step S150, the pruning result represents the reference frames remaining after pruning the reference frame list. Encoding the video to be encoded based on the pruning result may include: traversing the reference frame combinations remaining after pruning; calculating the rate-distortion cost corresponding to each reference frame combination; selecting the reference frame combination with the minimum rate-distortion cost as the optimal target reference frame corresponding to the current video frame; and encoding the video to be encoded based on the target reference frame.
Therefore, in the process of determining the optimal target reference frame, the number of reference frame combinations to be traversed can be reduced, thereby accelerating encoding. Meanwhile, the pre-analysis process that produces the pruning result and the main encoding process are two passes over the same video frame, and their results are often correlated, so the position of the target reference frame reflected by the pruning result has a certain accuracy, and better encoding performance can be ensured.
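The traversal-and-selection procedure of step S150 can be sketched as follows. The function name, the `rd_cost` callable, and the limit on combination size are illustrative assumptions; in a real encoder the cost of each combination would come from motion estimation and mode decision rather than a simple callback.

```python
import itertools


def select_target_reference(remaining_refs, rd_cost, max_refs=2):
    """Pick the reference frame combination with minimum rate-distortion cost.

    remaining_refs: reference frames left in the list after pruning.
    rd_cost: callable mapping a combination (tuple of references) to its
    rate-distortion cost. Combinations of size 1..max_refs are traversed;
    max_refs=2 models single- and bi-prediction as an illustration.
    """
    best_combo, best_cost = None, float("inf")
    for size in range(1, max_refs + 1):
        for combo in itertools.combinations(remaining_refs, size):
            cost = rd_cost(combo)
            if cost < best_cost:
                best_combo, best_cost = combo, cost
    return best_combo, best_cost
```

Because pruning shrinks `remaining_refs`, the number of combinations visited by the two nested loops drops combinatorially, which is the source of the speed-up described above.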
Fig. 4 shows a block diagram of a video encoding apparatus 400 according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus 400 may include: an acquisition module 410 configured to acquire a video to be encoded, the video to be encoded comprising one or more video frames; a first determining module 420 configured to determine, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame; a second determining module 430 configured to determine a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association relationship between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames; a pruning module 440 configured to prune the plurality of reference frames according to the trending reference direction; and an encoding module 450 configured to encode the video to be encoded based on the pruning result.
By determining whether to prune the reference frame list of each video frame before encoding the video to be encoded, and pruning the reference frame list based on the trending reference direction of each video frame if it is determined to prune the reference frame list, reference frame combinations that require traversal can be reduced, avoiding a large amount of computation that is required when determining an optimal reference frame from among a plurality of reference frames, thereby speeding up encoding.
According to some embodiments of the present disclosure, the second determining module 430 may include: a dividing module configured to divide a current video frame to obtain a plurality of encoded blocks of a preset size; a calculation rate-distortion cost module configured to calculate, for each of a plurality of encoded blocks, a forward inter-frame prediction rate-distortion cost, a backward inter-frame prediction rate-distortion cost, and an intra-frame prediction rate-distortion cost for the encoded block; and a determine trending reference direction module configured to determine a trending reference direction for the current video frame based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, the intra-frame prediction rate-distortion cost, and a total number of the plurality of encoded blocks.
According to some embodiments of the present disclosure, the calculating rate distortion cost module may include: a module configured to calculate an inter-prediction rate-distortion cost between the encoded block and a corresponding encoded block in a forward neighboring reference frame of the current video frame as a forward inter-prediction rate-distortion cost; and means for calculating an inter-prediction rate-distortion cost between the encoded block and a corresponding encoded block in a backward adjacent reference frame of the current video frame as a backward inter-prediction rate-distortion cost.
According to some embodiments of the present disclosure, determining the trending reference direction module may include: a forward and backward prediction coding block determining module configured to determine whether the coding block is a forward prediction coding block or a backward prediction coding block based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, and the intra-frame prediction rate-distortion cost; a forward and backward encoding block number determining module configured to determine the number of forward predictive encoding blocks and backward predictive encoding blocks among the plurality of encoding blocks; and a determine trending reference direction sub-module configured to determine a trending reference direction for the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks.
According to some embodiments of the present disclosure, the forward and backward predictive coding block determining module may include: a module configured to determine the encoded block as a forward predictive encoded block in response to determining that the forward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost; and a module configured to determine the encoded block as a backward predictive encoded block in response to determining that the backward inter-frame prediction rate-distortion cost of the encoded block is less than the intra-frame prediction rate-distortion cost.
According to some embodiments of the present disclosure, the determine trending reference direction sub-module may include: a module configured to determine the trending reference direction of the current video frame to be forward in response to determining that the number of forward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks; and a module configured to determine the trending reference direction of the current video frame to be backward in response to determining that the number of forward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks and that the number of backward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks.
According to some embodiments of the present disclosure, the determine trending reference direction sub-module may further include: a module configured to determine the trending reference direction of the current video frame to be uncertain in response to determining that neither of the following conditions holds: the number of forward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks; and the number of forward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks.
According to some embodiments of the present disclosure, the second determining module 430 may further include: and the updating module is configured to update the trend reference direction of the current video frame to obtain an updated trend reference direction.
According to some embodiments of the present disclosure, the update module may include: a module configured to determine a trending reference direction of a forward neighboring reference frame of the current video frame in response to determining the trending reference direction of the current video frame as forward; and means for updating the trending reference direction of the current video frame to be uncertain in response to determining the trending reference direction of the forward neighboring reference frame to be backward.
According to some embodiments of the present disclosure, the update module may include: a module configured to determine a trending reference direction of a backward neighboring reference frame of the current video frame in response to determining the trending reference direction of the current video frame as backward; and means for updating the trending reference direction of the current video frame to be uncertain in response to determining the trending reference direction of the backward neighboring reference frame to be forward.
According to some embodiments of the present disclosure, the determination to prune the plurality of reference frames based on the attributes of the plurality of reference frames may be implemented by: a module configured to determine a number of the plurality of reference frames; a module configured to determine whether the plurality of reference frames includes both a forward reference frame and a backward reference frame; and a module configured to determine to prune the plurality of reference frames in response to determining that the number of the plurality of reference frames is greater than 2 and that both a forward reference frame and a backward reference frame are included in the plurality of reference frames.
According to some embodiments of the present disclosure, pruning module 440 may include: a module configured to prune all backward reference frames of the plurality of reference frames in response to determining the trending reference direction as forward; a module configured to prune all forward reference frames of the plurality of reference frames in response to determining that the trending reference direction is backward; and means for not pruning the plurality of reference frames in response to determining that the trending reference direction is uncertain.
It should be appreciated that the various modules 410-450 of the apparatus 400 shown in fig. 4 may correspond to the various steps S110-S150 in the method 100 described with reference to fig. 1. Thus, the operations, features and advantages described above with respect to method 100 apply equally to apparatus 400 and the modules comprised thereby. For brevity, certain operations, features and advantages are not described in detail herein.
It should also be appreciated that various techniques may be described herein in the general context of software, hardware elements, or program modules. The various modules described above with respect to fig. 4 may be implemented in hardware or in hardware in combination with software and/or firmware. For example, the modules may be implemented as computer program code/instructions configured to be executed in one or more processors and stored in a computer-readable storage medium. Alternatively, these modules may be implemented as hardware logic/circuitry. For example, in some embodiments, one or more of the acquisition module 410, the first determination module 420, the second determination module 430, the pruning module 440, and the encoding module 450 may be implemented together in a System on Chip (SoC). The SoC may include an integrated circuit chip including one or more components of a processor (e.g., a Central Processing Unit (CPU), microcontroller, microprocessor, Digital Signal Processor (DSP), etc.), memory, one or more communication interfaces, and/or other circuitry, and may optionally execute received program code and/or include embedded firmware to perform functions.
According to another aspect of the present disclosure, there is also provided an electronic apparatus including: at least one processor; and at least one memory communicatively coupled to the at least one processor; wherein the at least one memory stores a computer program which, when executed by the at least one processor, implements the video encoding method described above.
According to another aspect of the present disclosure, there is also provided a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the video encoding method described above.
Referring to fig. 5, a block diagram of an electronic device 500, which may be a server of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. Electronic devices may be different types of computer devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the electronic device 500 may include at least one processor 510, a working memory 520, an input unit 540, a display unit 550, a speaker 560, a storage unit 570, a communication unit 580, and other output units 590 capable of communicating with each other through a system bus 530.
Processor 510 may be a single processing unit or multiple processing units, all of which may include a single or multiple computing units or multiple cores. Processor 510 may be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuitries, and/or any devices that manipulate signals based on operational instructions. The processor 510 may be configured to obtain and execute computer readable instructions stored in the working memory 520, the storage unit 570, or other computer readable medium, such as program code of the operating system 520a, program code of the application programs 520b, etc.
The working memory 520 and the storage unit 570 are examples of computer-readable storage media for storing instructions that are executed by the processor 510 to implement the various functions described previously. The working memory 520 may include both volatile memory and nonvolatile memory (e.g., RAM, ROM, etc.). In addition, the storage unit 570 may include hard disk drives, solid state drives, removable media (including external and removable drives), memory cards, flash memory, floppy disks, optical disks (e.g., CDs, DVDs), storage arrays, network-attached storage, storage area networks, and the like. The working memory 520 and the storage unit 570 may both be referred to herein collectively as memory or computer-readable storage media, and may be non-transitory media capable of storing computer-readable, processor-executable program instructions as computer program code that may be executed by the processor 510 as a particular machine configured to implement the operations and functions described in the examples herein.
The input unit 540 may be any type of device capable of inputting information to the electronic device 500. The input unit 540 may receive input numeric or character information and generate key signal inputs related to user settings and/or function controls of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output units may be any type of device capable of presenting information and may include, but are not limited to, the display unit 550, the speaker 560, and the other output units 590, which may include, but are not limited to, video/audio output terminals, vibrators, and/or printers. The communication unit 580 allows the electronic device 500 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, modems, network cards, infrared communication devices, wireless communication transceivers and/or chipsets, such as Bluetooth™ devices, 802.11 devices, Wi-Fi devices, WiMAX devices, cellular communication devices, and/or the like.
The application 520b in the working memory 520 may be loaded to perform the various methods and steps described above, such as steps S110-S150 in fig. 1, steps S232-S236 in fig. 2, and steps S332-S338 in fig. 3. For example, in some embodiments, the various methods described above may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 570. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 500 via the storage unit 570 and/or the communication unit 580. One or more steps of the method 100 described above may be performed when the computer program is loaded and executed by the processor 510. Alternatively, in other embodiments, the processor 510 may be configured to perform the method 100 in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects can be achieved; no limitation is imposed herein.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the foregoing methods, systems, and apparatus are merely exemplary embodiments or examples, and that the scope of the present invention is not limited by these embodiments or examples but is defined only by the granted claims and their equivalents. Various elements of the embodiments or examples may be omitted or replaced with equivalent elements thereof. Furthermore, the steps may be performed in a different order than described in the present disclosure. Further, various elements of the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (16)

1. A video encoding method, comprising:
acquiring a video to be encoded, wherein the video to be encoded comprises one or more video frames;
determining, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame;
determining a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association relationship between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames;
pruning the plurality of reference frames according to the trending reference direction; and
encoding the video to be encoded based on a result of the pruning.
2. The video encoding method of claim 1, wherein determining the trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames comprises:
dividing the current video frame to obtain a plurality of coding blocks with preset sizes;
for each of the plurality of encoded blocks, calculating a forward inter-frame predicted rate-distortion cost, a backward inter-frame predicted rate-distortion cost, and an intra-frame predicted rate-distortion cost for the encoded block; and
determining the trending reference direction of the current video frame based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, the intra-frame prediction rate-distortion cost, and a total number of the plurality of encoded blocks.
3. The method of claim 2, wherein for each of the plurality of encoded blocks, calculating a forward inter-prediction rate-distortion cost, a backward inter-prediction rate-distortion cost, and an intra-prediction rate-distortion cost for the encoded block comprises:
calculating an inter-frame prediction rate-distortion cost between the encoded block and a corresponding encoded block in a forward adjacent reference frame of the current video frame as the forward inter-frame prediction rate-distortion cost; and
calculating an inter-frame prediction rate-distortion cost between the encoded block and a corresponding encoded block in a backward adjacent reference frame of the current video frame as the backward inter-frame prediction rate-distortion cost.
4. The method of claim 3, wherein determining the trending reference direction for the current video frame based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, the intra-frame prediction rate-distortion cost, and a total number of the plurality of encoded blocks comprises:
determining whether the encoded block is a forward predictive encoded block or a backward predictive encoded block based on the forward inter-frame predicted rate-distortion cost, the backward inter-frame predicted rate-distortion cost, and the intra-frame predicted rate-distortion cost;
determining the number of forward predictive coding blocks and backward predictive coding blocks in the plurality of coding blocks; and
determining the trending reference direction of the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks.
5. The method of claim 4, wherein determining whether the encoded block is a forward predictive encoded block or a backward predictive encoded block based on the forward inter-frame prediction rate-distortion cost, the backward inter-frame prediction rate-distortion cost, and the intra-frame prediction rate-distortion cost comprises:
in response to determining that the forward inter-prediction rate-distortion cost of the encoded block is less than the intra-prediction rate-distortion cost, determining that the encoded block is a forward predictive encoded block; and
in response to determining that the backward inter-prediction rate-distortion cost of the encoded block is less than the intra-prediction rate-distortion cost, determining that the encoded block is a backward predictive encoded block.
6. The method of claim 4 or 5, wherein determining the trending reference direction for the current video frame based on the number of forward predictive coding blocks, the number of backward predictive coding blocks, and the total number of the plurality of coding blocks comprises:
in response to determining that the number of forward predictive coding blocks is greater than a first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than a second threshold multiple of the total number of the plurality of coding blocks, determining that the trending reference direction of the current video frame is forward; and
in response to determining that the number of forward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks, determining that the trending reference direction of the current video frame is backward.
7. The method of claim 6, wherein determining the trending reference direction for the current video frame based on the number of forward predictive encoded blocks, the number of backward predictive encoded blocks, and the total number of the plurality of encoded blocks further comprises:
in response to determining that neither of the following conditions is satisfied, determining that the trending reference direction of the current video frame is uncertain:
the number of forward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks; or
the number of forward predictive coding blocks is less than the second threshold multiple of the total number of the plurality of coding blocks and the number of backward predictive coding blocks is greater than the first threshold multiple of the total number of the plurality of coding blocks.
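The classification and threshold tests recited in claims 4-7 can be sketched as follows. This is an illustrative reading only: the helper names are hypothetical, and the example threshold multiples `t1` and `t2` are assumptions, since the claims leave their concrete values open.

```python
def classify_block(fwd_cost, bwd_cost, intra_cost):
    # Claim 5: forward-predictive if the forward inter-frame prediction
    # RD cost beats the intra RD cost; backward-predictive analogously.
    return fwd_cost < intra_cost, bwd_cost < intra_cost

def trending_direction(block_costs, t1=0.5, t2=0.1):
    # Claims 4, 6 and 7: count forward/backward predictive blocks and
    # compare the counts against threshold multiples of the total block
    # count. t1 and t2 are assumed example values.
    total = len(block_costs)
    n_fwd = n_bwd = 0
    for fwd, bwd, intra in block_costs:
        is_fwd, is_bwd = classify_block(fwd, bwd, intra)
        n_fwd += is_fwd
        n_bwd += is_bwd
    if n_fwd > t1 * total and n_bwd < t2 * total:
        return "forward"
    if n_fwd < t2 * total and n_bwd > t1 * total:
        return "backward"
    return "uncertain"  # claim 7: neither condition holds
```

Note that, per claim 5, a block can satisfy both tests at once, which is one way a frame ends up with many blocks in both counts and therefore an uncertain direction.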
8. The method of any of claims 2-7, wherein determining the trending reference direction for the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames further comprises:
updating the trending reference direction of the current video frame to obtain an updated trending reference direction.
9. The method of claim 8, wherein updating the trending reference direction of the current video frame to obtain an updated trending reference direction comprises:
in response to determining that the trending reference direction of the current video frame is forward, determining a trending reference direction of a forward neighboring reference frame of the current video frame; and
in response to determining that the trending reference direction of the forward neighboring reference frame is backward, updating the trending reference direction of the current video frame to be uncertain.
10. The method of claim 8 or 9, wherein updating the trending reference direction of the current video frame to obtain an updated trending reference direction comprises:
in response to determining that the trending reference direction of the current video frame is backward, determining a trending reference direction of a backward neighboring reference frame of the current video frame; and
in response to determining that the trending reference direction of the backward neighboring reference frame is forward, updating the trending reference direction of the current video frame to be uncertain.
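The update rule of claims 9 and 10 amounts to a consistency check between a frame and its neighboring reference frames; a minimal sketch follows (the function signature and argument names are hypothetical):

```python
def update_trending_direction(direction, forward_neighbor_dir=None,
                              backward_neighbor_dir=None):
    # Claims 9-10: a forward-trending frame whose forward neighboring
    # reference frame trends backward (or the mirror case) is downgraded
    # to "uncertain", since the two directions contradict each other.
    if direction == "forward" and forward_neighbor_dir == "backward":
        return "uncertain"
    if direction == "backward" and backward_neighbor_dir == "forward":
        return "uncertain"
    return direction
```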
11. The method of any of claims 1-10, wherein determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames comprises:
determining a number of the plurality of reference frames;
determining whether the plurality of reference frames includes both a forward reference frame and a backward reference frame; and
determining that the plurality of reference frames are to be pruned in response to determining that the number of the plurality of reference frames is greater than 2 and that both a forward reference frame and a backward reference frame are included in the plurality of reference frames.
12. The method of claim 11, wherein pruning the plurality of reference frames according to the trending reference direction comprises:
pruning all backward reference frames of the plurality of reference frames in response to determining that the trending reference direction is forward;
pruning all forward reference frames of the plurality of reference frames in response to determining that the trending reference direction is backward; and
in response to determining that the trending reference direction is uncertain, refraining from pruning the plurality of reference frames.
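Claims 11 and 12 together first gate and then apply the pruning. The sketch below assumes a minimal reference-frame record ordered by picture order count; the record layout is an assumption for illustration, not part of the claims.

```python
from dataclasses import dataclass

@dataclass
class ReferenceFrame:
    poc: int  # picture order count (display order)

def prune_references(refs, current_poc, trending):
    # Claim 11: prune only when more than two references exist and both
    # directions are represented. Claim 12: keep only the trending side;
    # an "uncertain" direction leaves the list untouched.
    fwd = [r for r in refs if r.poc < current_poc]
    bwd = [r for r in refs if r.poc > current_poc]
    if len(refs) <= 2 or not fwd or not bwd:
        return refs
    if trending == "forward":
        return fwd
    if trending == "backward":
        return bwd
    return refs  # uncertain: no pruning
```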
13. A video encoding apparatus, comprising:
an acquisition module configured to acquire a video to be encoded, the video to be encoded comprising one or more video frames;
a first determination module configured to determine, for a current video frame of the one or more video frames, a plurality of reference frames corresponding to the current video frame;
a second determining module configured to determine a trending reference direction of the current video frame in response to determining that the plurality of reference frames are to be pruned based on attributes of the plurality of reference frames, the trending reference direction indicating an association relationship between a target reference frame and the current video frame, the target reference frame being included in the plurality of reference frames;
a pruning module configured to prune the plurality of reference frames according to the trending reference direction; and
an encoding module configured to encode the video to be encoded based on a result of the pruning.
14. An electronic device, comprising:
at least one processor; and
at least one memory communicatively coupled to the at least one processor,
wherein the at least one memory stores a computer program that, when executed by the at least one processor, implements the method of any of claims 1-12.
15. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method of any one of claims 1-12.
16. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any of claims 1-12.
CN202310325793.3A 2023-03-29 2023-03-29 Video encoding method and apparatus, electronic device, and computer-readable storage medium Pending CN116389768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310325793.3A CN116389768A (en) 2023-03-29 2023-03-29 Video encoding method and apparatus, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310325793.3A CN116389768A (en) 2023-03-29 2023-03-29 Video encoding method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN116389768A 2023-07-04

Family

ID=86974432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310325793.3A Pending CN116389768A (en) 2023-03-29 2023-03-29 Video encoding method and apparatus, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN116389768A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117596392A (en) * 2023-09-28 2024-02-23 书行科技(北京)有限公司 Coding information determining method of coding block and related product


Similar Documents

Publication Publication Date Title
CN110248189B (en) Video quality prediction method, device, medium and electronic equipment
JP2004531950A (en) Distortion quantizer model for video coding
EP3282701A1 (en) Prediction mode selection method, apparatus and device
CN111787322B (en) Video coding method and device, electronic equipment and computer readable storage medium
WO2023045420A1 (en) Image processing method and apparatus, electronic device, and storage medium
US20220060716A1 (en) Method and apparatus for determining video bitrate, computer device, and storage medium
CN116389768A (en) Video encoding method and apparatus, electronic device, and computer-readable storage medium
WO2018161845A1 (en) Method for video coding code rate allocation and coding unit code rate allocation, and computer equipment
CN113596442A (en) Video processing method and device, electronic equipment and storage medium
WO2024012263A1 (en) Video coding processing method, apparatus and device, and storage medium
CN112449182A (en) Video encoding method, device, equipment and storage medium
WO2018014301A1 (en) Video coding method and device
CN110740324B (en) Coding control method and related device
WO2020186763A1 (en) Image component prediction method, encoder, decoder and storage medium
CN103517074A (en) Image encoding apparatus and control method thereof
JP4842899B2 (en) Moving picture coding apparatus, moving picture coding method, and program
CN112243129B (en) Video data processing method and device, computer equipment and storage medium
RU2587412C2 (en) Video rate control based on transform-coefficients histogram
US11272222B2 (en) Bit rate control method and video processing device
CN115442617A (en) Video processing method and device based on video coding
JP2024506130A (en) Data processing methods, devices and computer programs.
CN111510715B (en) Video processing method, system, computer device and storage medium
CN113099241A (en) Reference frame list updating method, device, equipment and storage medium
CN113542737A (en) Encoding mode determining method and device, electronic equipment and storage medium
CN112738529A (en) Inter-frame prediction method, device, equipment, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination