CN114666579A - Video coding method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114666579A
Authority
CN
China
Prior art keywords
coding unit
prediction mode
target coding
prediction
target
Prior art date
Legal status
Pending
Application number
CN202210287419.4A
Other languages
Chinese (zh)
Inventor
邵宇超
郭磊
陈宇聪
黄跃
闻兴
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202210287419.4A
Publication of CN114666579A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103 Selection of coding mode or of prediction mode
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/124 Quantisation
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure relates to a video encoding method, apparatus, electronic device, and storage medium. The video encoding method includes: acquiring feature information of a target coding unit, feature information of a parent coding unit of the target coding unit, and feature information of adjacent coding units in a video frame to be encoded; determining a prediction mode of the target coding unit based on the feature information of the target coding unit, the parent coding unit, and the adjacent coding units; and encoding the target coding unit based on the determined prediction mode. The video encoding method, apparatus, electronic device, and storage medium avoid the heavy resource consumption and long running time of traversing all prediction modes: the prediction mode of the unit to be encoded can be determined directly, which reduces the computational complexity of mode selection for inter-coded frames and improves encoding efficiency.

Description

Video coding method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video encoding technologies, and in particular, to a video encoding method and apparatus, an electronic device, and a storage medium.
Background
In video coding, each video frame image is generally encoded as one of three frame types: I-frames, P-frames, and B-frames. An I-frame is an intra-coded frame, a P-frame is a forward-reference frame, and a B-frame is a bidirectional-reference frame. An I-frame is an independent frame that carries all of its own information; it can be decoded without referring to other frames and is coded by intra-frame prediction. P-frames and B-frames are both inter-coded frames, which may be encoded using either intra prediction or inter prediction.
In existing encoding methods for inter-coded frames, all preset prediction modes, including intra prediction modes and inter prediction modes, must be traversed, and the prediction mode with the smallest rate-distortion cost is then selected as the final prediction mode for encoding.
However, as coding technology develops, prediction modes are continuously added and refined to improve encoder performance. As a result, the mode-traversal process in existing encoding methods occupies a large amount of computing resources and is time-consuming, which hinders improvements in coding efficiency.
Disclosure of Invention
The present disclosure provides a video encoding method, an apparatus, an electronic device, and a storage medium, so as to at least solve the problem in the related art that traversing prediction modes occupies a large amount of resources and is time-consuming. The technical solution of the present disclosure is as follows:
according to a first aspect of embodiments of the present disclosure, there is provided a video encoding method, the video encoding method including: acquiring feature information of a target coding unit, feature information of a parent coding unit of the target coding unit and feature information of adjacent coding units in a video frame to be coded; determining a prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units; encoding the target coding unit based on the determined prediction mode, wherein the target coding unit is divided from the parent coding unit, the adjacent coding unit is a coding unit adjacent to the target coding unit in the video frame to be encoded, and the prediction mode is intra-frame prediction or inter-frame prediction.
Optionally, the determining the prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units comprises: inputting the feature information of the target coding unit, the feature information of the parent coding unit and the feature information of the adjacent coding units into a preset neural network model to obtain estimated prediction mode related information of the target coding unit; determining a prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit.
Optionally, the prediction mode related information is a first probability that the best candidate prediction mode of the target coding unit is intra prediction and/or a second probability that the best candidate prediction mode of the target coding unit is inter prediction, where the best candidate prediction mode refers to the prediction mode with the smallest rate-distortion cost among the predetermined candidate prediction modes.
Optionally, the step of determining the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit comprises: comparing one of the first probability and the second probability with a preset probability threshold; determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the probability threshold; determining a prediction mode of the target coding unit by calculating a rate-distortion cost for the target coding unit traversing all candidate prediction modes of the intra prediction and all candidate prediction modes of the inter prediction when the comparison result indicates that the one is equal to the probability threshold.
Optionally, the step of determining the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit comprises: comparing one of the first and second probabilities to preset first and second probability thresholds, wherein the first probability threshold is less than the second probability threshold; determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the second probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the first probability threshold; determining a prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of the intra-prediction and all candidate prediction modes of the inter-prediction for the target coding unit when a comparison result indicates that the one is less than or equal to the second probability threshold and greater than or equal to the first probability threshold.
Optionally, the feature information of the target coding unit comprises at least one of: a first feature related to a partitioning of the target coding unit, a second feature related to a prediction result of the target coding unit in a predetermined first candidate prediction mode, a third feature related to a pixel within the target coding unit, a fourth feature related to the video frame, and a fifth feature related to a prediction result of the target coding unit in a predetermined second candidate prediction mode.
Optionally, the first feature includes a size of the target coding unit and a partition depth of the target coding unit, the second feature includes statistics and quantization parameters of a prediction residual of the target coding unit in the first candidate prediction mode, the third feature includes texture information of the target coding unit, the fourth feature includes a coding level of the video frame, and the fifth feature includes a rate-distortion cost of the target coding unit in the second candidate prediction mode.
Optionally, the feature information of the parent coding unit includes at least one of: features related to division of the parent coding unit, features related to an optimal candidate prediction mode of the parent coding unit, inter-frame features related to the parent coding unit of the video frame, and a rate-distortion cost of the parent coding unit in a third candidate prediction mode, wherein the third candidate prediction mode is all candidate prediction modes in a prediction mode corresponding to an optimal candidate prediction mode of a previous coding unit of the parent coding unit, the parent coding unit is divided from the previous coding unit, and the feature information of the neighboring coding unit includes: features related to partitioning of the neighboring coding units, features related to an optimal candidate prediction mode of the neighboring coding units, and inter features of the video frame related to the neighboring coding units, wherein the optimal candidate prediction mode refers to a candidate prediction mode having a smallest rate-distortion cost among a plurality of predetermined candidate prediction modes.
Optionally, the feature related to the division of the parent coding unit includes a size of the parent coding unit, the feature related to the best candidate prediction mode of the parent coding unit includes a prediction mode of the best candidate prediction mode of the parent coding unit, the inter-frame feature related to the parent coding unit of the video frame includes a motion vector of the parent coding unit, the feature related to the division of the neighboring coding unit includes a size of the neighboring coding unit, the feature related to the best candidate prediction mode of the neighboring coding unit includes a prediction mode of the best candidate prediction mode of the neighboring coding unit, and the inter-frame feature related to the neighboring coding unit of the video frame includes a motion vector of the neighboring coding unit.
According to a second aspect of the embodiments of the present disclosure, there is provided a video encoding device, comprising: an acquisition unit configured to acquire feature information of a target coding unit, feature information of a parent coding unit of the target coding unit, and feature information of an adjacent coding unit in a video frame to be coded; a determination unit configured to determine a prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units; a coding unit configured to code the target coding unit based on a determined prediction mode, wherein the target coding unit is divided from the parent coding unit, the adjacent coding unit is a coding unit adjacent to the target coding unit in the video frame to be coded, and the prediction mode is intra-prediction or inter-prediction.
Optionally, the determining unit is further configured to: inputting the feature information of the target coding unit, the feature information of the parent coding unit and the feature information of the adjacent coding units into a preset neural network model to obtain estimated prediction mode related information of the target coding unit; determining a prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit.
Optionally, the prediction mode related information is a first probability that the best candidate prediction mode of the target coding unit is intra prediction and/or a second probability that the best candidate prediction mode of the target coding unit is inter prediction, where the best candidate prediction mode refers to the prediction mode with the smallest rate-distortion cost among the predetermined candidate prediction modes.
Optionally, the determining unit is further configured to: comparing one of the first probability and the second probability with a preset probability threshold; determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the probability threshold; determining a prediction mode of the target coding unit by calculating a rate-distortion cost for the target coding unit traversing all candidate prediction modes of the intra prediction and all candidate prediction modes of the inter prediction when the comparison result indicates that the one is equal to the probability threshold.
Optionally, the determining unit is further configured to: comparing one of the first and second probabilities to preset first and second probability thresholds, wherein the first probability threshold is less than the second probability threshold; determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the second probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the first probability threshold; determining a prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of the intra-prediction and all candidate prediction modes of the inter-prediction for the target coding unit when a comparison result indicates that the one is less than or equal to the second probability threshold and greater than or equal to the first probability threshold.
Optionally, the feature information of the target coding unit comprises at least one of: the first feature related to the division of the target coding unit, the second feature related to the prediction result of the target coding unit in a predetermined first candidate prediction mode, the third feature related to the pixel in the target coding unit, the fourth feature related to the video frame, and the fifth feature related to the prediction result of the target coding unit in a predetermined second candidate prediction mode.
Optionally, the first feature includes a size of the target coding unit and a partition depth of the target coding unit, the second feature includes statistics and quantization parameters of a prediction residual of the target coding unit in the first candidate prediction mode, the third feature includes texture information of the target coding unit, the fourth feature includes a coding level of the video frame, and the fifth feature includes a rate-distortion cost of the target coding unit in the second candidate prediction mode.
Optionally, the feature information of the parent coding unit comprises at least one of: features related to division of the parent coding unit, features related to an optimal candidate prediction mode of the parent coding unit, inter-frame features related to the parent coding unit of the video frame, and a rate-distortion cost of the parent coding unit in a third candidate prediction mode, wherein the third candidate prediction mode is all candidate prediction modes in a prediction mode corresponding to an optimal candidate prediction mode of a previous coding unit of the parent coding unit, the parent coding unit is divided from the previous coding unit, and the feature information of the neighboring coding unit includes: features related to partitioning of the neighboring coding units, features related to an optimal candidate prediction mode of the neighboring coding units, and inter features of the video frame related to the neighboring coding units, wherein the optimal candidate prediction mode refers to a candidate prediction mode having a smallest rate-distortion cost among a plurality of predetermined candidate prediction modes.
Optionally, the feature related to partitioning of the parent coding unit includes a size of the parent coding unit, the feature related to best candidate prediction mode of the parent coding unit includes a prediction mode of best candidate prediction mode of the parent coding unit, the inter-frame feature related to the parent coding unit of the video frame includes a motion vector of the parent coding unit, the feature related to partitioning of the neighboring coding unit includes a size of the neighboring coding unit, the feature related to best candidate prediction mode of the neighboring coding unit includes a prediction mode of best candidate prediction mode of the neighboring coding unit, and the inter-frame feature related to the neighboring coding unit of the video frame includes a motion vector of the neighboring coding unit.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device including: a processor; and a memory for storing instructions executable by the processor, wherein the processor-executable instructions, when executed by the processor, cause the processor to perform the video encoding method according to the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform a video encoding method according to the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising computer instructions which, when executed by a processor, implement a video encoding method according to the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
the prediction mode of the unit to be encoded may be determined directly, allowing all candidate prediction modes in one of inter prediction and intra prediction to be skipped, which reduces the computational complexity of mode selection for inter-coded frames and improves encoding efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flow chart illustrating a method of video encoding according to an example embodiment.
Fig. 2 is a diagram illustrating coding unit partitioning in a video coding method according to an example embodiment.
Fig. 3 is a diagram illustrating an encoding order in a video encoding method according to an example embodiment.
Fig. 4 is a schematic diagram of a neural network illustrating a video encoding method according to an example embodiment.
Fig. 5 is a block diagram illustrating a video encoding apparatus according to an example embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Herein, the expression "at least one of the items" covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of A and B" covers three parallel cases: (1) including A; (2) including B; (3) including A and B. Likewise, "performing at least one of step one and step two" covers three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
It should be further noted that the video encoding method and apparatus according to the exemplary embodiments of the present disclosure may be applied to devices for computing services, including but not limited to personal computers, tablet computers, smart phones, and the like.
In video coding, a block-based hybrid coding method may be adopted. Specifically, each frame image of a video may be divided into a plurality of coding blocks, and intra-frame or inter-frame prediction is performed in units of coding blocks. The prediction residual is then transformed and quantized, and finally entropy coding is performed on information such as the prediction mode information and the quantized prediction residual to obtain the coded stream.
In general, to adapt to diverse video contents and characteristics, most video codecs select an intra prediction mode as the encoding mode of a block to be encoded in an intra-coded frame. For an inter-coded frame, however, the block to be encoded must be traversed over the prediction modes, so that the prediction mode with the smallest rate-distortion cost can be selected from a plurality of preset prediction modes through a Rate-Distortion Optimization (RDO) decision as the encoding mode of the block to be encoded.
In such an approach, the traversal process occupies substantial computational resources and is extremely time-consuming; in some cases, the computation time spent on prediction mode selection in inter-coded frames can exceed 20% of the total encoding time of the current frame. With prediction modes becoming complex and diverse, such a traversal process is difficult to apply in practice.
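To make the cost of this exhaustive traversal concrete, the following is a minimal sketch of the conventional RDO mode decision described above; the mode lists and the evaluate() helper are illustrative assumptions, not an actual encoder API.

```python
# Illustrative sketch of the conventional exhaustive RDO mode decision.
# Mode names and the evaluate() callback are assumptions for illustration.

def rd_cost(distortion: float, bits: float, lam: float) -> float:
    """Rate-distortion cost = D + lambda * R."""
    return distortion + lam * bits

def exhaustive_mode_decision(block, intra_modes, inter_modes, evaluate, lam):
    """Traverse every candidate mode (intra and inter) and keep the one
    with the smallest rate-distortion cost. This full traversal is what
    the disclosed method seeks to avoid."""
    best_mode, best_cost = None, float("inf")
    for mode in list(intra_modes) + list(inter_modes):
        distortion, bits = evaluate(block, mode)  # hypothetical per-mode evaluation
        cost = rd_cost(distortion, bits, lam)
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost
```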
To address the resource consumption and long running time of traversing prediction modes, one approach computes motion features and detail features of the block to be encoded and selects only the inter prediction modes that satisfy preset conditions for the RDO decision. This effectively skips some unnecessary candidate modes among the many inter prediction modes, reduces the computational complexity and workload of the encoder, and is suitable for practical applications. However, this method considers only inter prediction modes and ignores the case in which blocks of an inter-coded frame use an intra prediction mode. In addition, conventional motion-feature and detail-feature calculations do not fit the characteristics of different blocks well, so it is difficult to accurately prune unsuitable candidates from the many inter prediction modes.
Another approach uses a multilayer convolutional neural network to obtain the prediction mode of the current block to be encoded, thereby reducing the traversal of candidate intra prediction modes and the computational complexity of intra prediction. However, this method considers only intra prediction modes and not the inter prediction scenario, and the relatively complex convolutional neural network itself introduces additional computational complexity.
In view of the above-described problems, a video encoding method, a video encoding apparatus, an electronic device, a computer-readable storage medium, and a computer program product according to exemplary embodiments of the present disclosure will be provided below with reference to the accompanying drawings.
Fig. 1 is a flow chart illustrating a video encoding method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps:
In step S10, the feature information of the target coding unit, the feature information of the parent coding unit of the target coding unit, and the feature information of the adjacent coding units in the video frame to be encoded may be acquired.
To facilitate understanding of this step, the division process of coding units is first described. In a video encoding process, a video frame image may first be captured from the video to be encoded, and the video frame to be encoded may be divided into a plurality of coding units. Fig. 2 illustrates an example of coding unit division. As shown in Fig. 2, the video frame to be encoded may first be initially divided at a predetermined size; the initial division may be an equal division, that is, each resulting coding unit has the same shape and size. Each coding unit obtained from the initial division may then be depth-divided based on a predetermined rule or model, producing secondary divisions, tertiary divisions, and so on. For example, each initially divided coding unit may have a size of 64 × 64 pixels, and a division depth may be determined from the unit's pixel characteristics based on the predetermined rule or model. When the division depth is 2, an initially divided coding unit is divided once more, yielding four coding units of 32 × 32 pixels. When the division depth is 3, the initially divided coding unit is divided twice more: it is first divided into four 32 × 32 coding units, and each of those is divided into four 16 × 16 coding units.
Here, an appropriate division depth may be determined based on the characteristics of the pixels in the initial coding unit. The division depth balances prediction accuracy against computation: too fine a division improves accuracy but increases the amount of calculation, while too coarse a division reduces the calculation but lowers accuracy; the determined division depth is a value that balances the two. The predetermined rule or model for determining the division depth may be arbitrary, for example, an existing method for partitioning video coding blocks; the present disclosure does not particularly limit it.
The above describes the division process of the coding units; after the division depth is determined, the finally divided coding units are obtained. Note that the shape of a finally divided coding unit may be arbitrary, for example, a square or a rectangle, and different coding units may have different division depths and therefore different sizes; as shown in Fig. 2, adjacent coding units may differ in size. In Fig. 2, the number on each coding unit represents its division depth.
In step S10, the target coding unit may be a coding unit finally divided from the video frame, such as coding units A, B, C, and D in Fig. 2, each with a determined division depth. The target coding unit is divided from its parent coding unit; that is, the division depth of the parent coding unit is one layer less than that of the target coding unit. Continuing the example above, if the target coding unit is 16 × 16 pixels with a division depth of 3, its parent coding unit is 32 × 32 pixels with a division depth of 2, and the parent coding unit is divided from an initial coding unit of 64 × 64 pixels.
Taking Fig. 2 as an example, the coding unit A1 (64 × 64 pixels) obtained by initially dividing the video frame is divided a second time into four coding units (each 32 × 32 pixels), including coding units B2 and A2. Coding unit B2 is further divided a third time into four coding units (each 16 × 16 pixels), including coding units B and C; therefore, the parent coding unit of coding units B and C is the coding unit B2 obtained from the second division. Coding unit A2 is not divided a third time; that is, coding unit A in the final division grid is the coding unit A2 obtained from the second division, with a pixel size of 32 × 32, and the parent coding unit of coding unit A is the coding unit A1.
The adjacent coding units are coding units adjacent to the target coding unit in the video frame. Because the finally divided coding units may differ in division depth and size, the target coding unit may differ in size from its adjacent coding units, so on any of the four sides of the target coding unit there may be one or more adjacent coding units. As shown in Fig. 2, taking coding unit A as the target coding unit, the coding units adjacent to the left side of coding unit A are the two coding units B and C, while the coding unit adjacent to the upper side of coding unit A is the single coding unit D.
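The quad-tree structure just described can be sketched in a few lines of code. The following is a minimal, illustrative model of the recursive splitting and of finding the (possibly multiple) left-side neighbors of a target unit, as with units B and C next to A in Fig. 2; the class and the should_split() rule are assumptions for illustration.

```python
# A minimal quad-tree sketch of the coding-unit partitioning described above
# (e.g. 64x64 units split recursively into four equal children).

from dataclasses import dataclass, field

@dataclass
class CodingUnit:
    x: int          # top-left position in the frame, in pixels
    y: int
    size: int       # width == height, e.g. 64, 32, 16
    depth: int      # 1 for the initial unit, +1 per split
    parent: "CodingUnit | None" = None
    children: list = field(default_factory=list)

def split(cu: CodingUnit, should_split) -> list:
    """Recursively split a unit into four equal children while the
    predetermined rule says to; return the leaf (target) units."""
    if not should_split(cu):
        return [cu]
    half = cu.size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            child = CodingUnit(cu.x + dx, cu.y + dy, half, cu.depth + 1, parent=cu)
            cu.children.append(child)
            leaves.extend(split(child, should_split))
    return leaves

def left_neighbors(target: CodingUnit, leaves: list) -> list:
    """Leaf units touching the target's left edge; there may be several
    when neighbor sizes differ."""
    return [u for u in leaves
            if u.x + u.size == target.x
            and u.y < target.y + target.size and target.y < u.y + u.size]
```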
In step S20, the prediction mode of the target coding unit may be determined based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units.
In this step, since the target coding unit is divided from its parent coding unit and is correlated in pixel characteristics with its adjacent coding units, the prediction mode of the target coding unit may be determined based on the feature information of the target coding unit, its parent coding unit, and its adjacent coding units, so that the target coding unit can be encoded with a specific mode within the determined prediction mode.
Here, the prediction mode may be intra prediction or inter prediction. The intra prediction mode and the inter prediction mode may each include a plurality of candidate prediction modes, and these candidate prediction modes may be any coding prediction modes, for example, the Merge mode of existing inter prediction.
As an example, in step S20, the feature information of the target coding unit, the parent coding unit, and the adjacent coding units may be input into a preset neural network model to obtain estimated prediction mode related information of the target coding unit, and the prediction mode of the target coding unit may then be determined based on the estimated prediction mode related information.
Specifically, the prediction mode related information of the target coding unit estimated by the neural network may indicate a first probability that the best candidate prediction mode of the target coding unit is intra prediction and/or a second probability that the best candidate prediction mode of the target coding unit is inter prediction. Here, the best candidate prediction mode refers to the prediction mode with the smallest rate-distortion cost among the predetermined candidate prediction modes, which may include inter prediction modes and intra prediction modes.
Here, the neural network model may be a neural network trained in advance to predict the prediction mode of the target coding unit. For example, training samples may be obtained, each including a sample coding unit and a corresponding coding label indicating whether the best candidate prediction mode of the sample coding unit is an inter prediction mode or an intra prediction mode. The sample coding units may be input into an initial neural network model, which outputs for each sample coding unit the probability of the inter prediction mode and/or the intra prediction mode; the probabilities output by the model may then be compared with the pre-labeled coding labels to train the neural network model. Through this training, the neural network can better fit and predict the probabilities of the intra prediction mode and the inter prediction mode, and mode selection can be accelerated based on these probabilities.
According to an exemplary embodiment of the present disclosure, a lightweight neural network may be employed as the neural network model described above. For example, as shown in Fig. 4, a fully connected lightweight neural network of only 3 layers may be employed, by which it can be predicted whether the target coding unit should skip the traversal of the entire class of intra prediction or inter prediction candidate modes.
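A sketch of such a 3-layer fully connected network is given below in PyTorch. The input width and hidden sizes are assumptions for illustration (the patent does not specify them); the single sigmoid output corresponds to the one-output-node variant discussed below, interpreted here as the probability that intra prediction is the best class. A two-output (intra/inter) variant would simply end in two nodes with a softmax.

```python
# Illustrative sketch of the lightweight 3-layer fully connected network
# of Fig. 4. Layer widths are assumed, not specified by the source.

import torch
import torch.nn as nn

class ModePredictor(nn.Module):
    def __init__(self, num_features: int = 20, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, hidden),  # layer 1
            nn.ReLU(),
            nn.Linear(hidden, hidden),        # layer 2
            nn.ReLU(),
            nn.Linear(hidden, 1),             # layer 3: single output node
            nn.Sigmoid(),                     # probability of intra prediction
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.net(features)
```

Such a model would typically be trained with binary cross-entropy against labels derived from the RDO-optimal mode of each sample coding unit, consistent with the training procedure described above.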
Here, in the case of determining the probability of intra prediction and/or inter prediction, the prediction mode may be determined by:
in the case where only the probability of intra prediction or inter prediction is calculated, for example, only one output node of the neural network model is provided, and the prediction mode may be determined by setting a probability threshold in advance.
In one example, the determining the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit may include: comparing one of the first probability and the second probability with a preset probability threshold; determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the probability threshold; when the comparison result indicates that the one is equal to the probability threshold, determining the prediction mode of the target coding unit by calculating rate-distortion costs for the target coding unit through all candidate prediction modes of intra prediction and all candidate prediction modes of inter prediction.
Specifically, a probability threshold T may be set, for example, T may be 0.5, and when the probability of intra prediction (or inter prediction) is greater than the probability threshold, intra prediction (or inter prediction) may be selected as a prediction mode, and traversal of all inter prediction (or intra prediction) modes is skipped; when the probability of intra prediction (or inter prediction) is less than the probability threshold, inter prediction (or intra prediction) may be selected as the prediction mode, skipping the traversal of all intra prediction (or inter prediction) modes. When the probability of intra prediction (or inter prediction) is equal to the probability threshold, no selection may be made and the process of traversing all candidate prediction modes may still be performed.
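The following is a minimal sketch of this single-threshold decision; the threshold value T = 0.5 is the example from the text, and the string return values are illustrative.

```python
# Single-threshold decision sketch: which class of candidate modes to traverse.

def decide_single_threshold(p_intra: float, T: float = 0.5) -> str:
    if p_intra > T:
        return "intra"       # skip all inter candidate modes
    if p_intra < T:
        return "inter"       # skip all intra candidate modes
    return "traverse_all"    # p_intra == T: fall back to full RDO traversal
```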
In another example, the determining of the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit may include: comparing one of the first probability and the second probability with a preset first probability threshold and a second probability threshold, wherein the first probability threshold is smaller than the second probability threshold; when the comparison result indicates that the one is greater than a second probability threshold, determining a prediction mode corresponding to the one as a prediction mode of the target coding unit; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is smaller than the first probability threshold; when the comparison result indicates that the one is less than or equal to the second probability threshold and greater than or equal to the first probability threshold, determining the prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of intra prediction and all candidate prediction modes of inter prediction for the target coding unit.
Specifically, a first probability threshold T1 and a second probability threshold T2 (T1 < T2) may be set to determine whether the traversal of the entire intra or inter class of prediction modes should be skipped; for example, T1 may be 0.2 and T2 may be 0.8. When the probability of intra prediction (or inter prediction) is greater than the second probability threshold T2, the traversal of all inter prediction (or intra prediction) modes is skipped; when the probability of intra prediction (or inter prediction) is less than the first probability threshold T1, the traversal of all intra prediction (or inter prediction) modes is skipped; when the probability of intra prediction (or inter prediction) is less than or equal to T2 and greater than or equal to T1, no selection is made and all candidate prediction modes are still traversed. In this manner, a confidence interval is set for the predicted mode probability to ensure the accuracy of prediction mode selection.
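A sketch of the two-threshold (confidence-interval) variant, using the example values T1 = 0.2 and T2 = 0.8 from the text:

```python
# Dual-threshold decision sketch: skip a class only when confident.

def decide_dual_threshold(p_intra: float, T1: float = 0.2, T2: float = 0.8) -> str:
    if p_intra > T2:
        return "intra"       # confident: skip all inter candidate modes
    if p_intra < T1:
        return "inter"       # confident: skip all intra candidate modes
    return "traverse_all"    # T1 <= p_intra <= T2: full RDO traversal
```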
When the probabilities of both intra prediction and inter prediction are calculated, for example with two output nodes in the neural network model, the prediction mode may, in one example, be determined by presetting a probability threshold for each of the two probabilities; in another example, the two probabilities may be compared directly, the mode with the greater probability selected, and the traversal of the less probable mode skipped.
Thus, in the video encoding method according to the present disclosure, the prediction mode may be selected based on the lightweight neural network, allowing an entire class of candidate prediction modes to be skipped, for example, all candidate prediction modes under intra prediction or under inter prediction.
However, in the video encoding method according to the present disclosure, the manner of determining the prediction mode based on the feature information of the target coding unit, the parent coding unit, and the adjacent coding units is not limited to using a neural network; the probability may also be calculated by an existing statistical method.
In step S30, the target coding unit may be encoded based on the determined prediction mode.
According to an exemplary embodiment of the present disclosure, because information of the target coding unit, the parent coding unit, and the adjacent coding units is taken into consideration, the prediction mode of the target coding unit can be determined in step S20. Therefore, in step S30, prediction modes other than the determined prediction mode may be skipped. For example, if the prediction mode of the target coding unit is determined to be intra prediction, all candidate prediction modes of inter prediction may be skipped, and only the candidate prediction modes of intra prediction are traversed, for example by comparing the rate-distortion costs of the respective intra candidate prediction modes.
Note that the present disclosure is concerned with determining the prediction mode (intra or inter); the specific candidate prediction modes included in each prediction mode are not particularly limited.
As described above, because the feature information of the target coding unit, the parent coding unit, and the adjacent coding units is taken into consideration, the prediction mode of the target coding unit can be determined. Here, the feature information of these coding units refers to information that characterizes the corresponding coding unit, so that a prediction mode suitable for the target coding unit can be determined by statistics and analysis of the three. In this way, the mode selection of a unit to be encoded (e.g., a target coding unit) in an inter-coded frame can be predicted and accelerated: the probabilities of the unit selecting an intra prediction mode and an inter prediction mode are predicted directly, and the traversal of the entire intra or inter class of candidate prediction modes is skipped, saving computational complexity in the mode selection of inter-coded frames.
Specific examples of the feature information of the target encoding unit, the parent encoding unit, and the adjacent encoding units will be given below.
The characteristic information of the target coding unit may include at least one of: the first feature related to the division of the target coding unit, the second feature related to the prediction result of the target coding unit in the predetermined first candidate prediction mode, the third feature related to the pixel in the target coding unit, the fourth feature related to the video frame, and the fifth feature related to the prediction result of the target coding unit in the predetermined second candidate prediction mode.
Specifically, the first feature related to the partitioning of the target coding unit may be obtained from the partitioning information used to divide the video frame; for example, the first feature may include the size and the partition depth of the target coding unit. Taking coding unit B of Fig. 2 as the target coding unit, the size may be 16 × 16 pixels and the partition depth may be 3.
The second feature related to the prediction result of the target coding unit in the predetermined first candidate prediction mode may include a quantization parameter and a statistic of a prediction residual of the target coding unit in the predetermined first candidate prediction mode.
Here, the predetermined first candidate prediction mode may be any prediction mode, intra or inter. The target coding unit may be prediction-encoded in the predetermined first candidate prediction mode by an existing encoder to obtain a prediction result, for example a Quantization Parameter (QP) and a prediction residual, and a related statistic, such as the mean of the prediction residual, may be extracted from the prediction residual.
Here, the quantization parameter reflects how much spatial detail of the coding unit is compressed. The quantization parameter is inversely related to the bit rate: the smaller the quantization parameter, the more detail of the coding unit is preserved and the higher the bit rate; the larger the quantization parameter, the less detail is preserved, the lower the bit rate, and the greater the image distortion and loss of quality. The quantization parameter is determined by the video's resolution, input frame rate, and bit rate, and reflects the quality of the encoder.
Furthermore, the first candidate prediction mode described herein may be one or more, and the present disclosure does not specifically limit the specific type, form, and number of the first candidate prediction modes, nor does the present disclosure specifically limit the manner in which the prediction result of the target coding unit in the first candidate prediction mode is obtained.
The third feature related to the pixels within the target coding unit may include, for example, texture information of the target coding unit, where the texture information may include a mean and a variance of pixel values of the target coding unit and gradient values of the pixel values of the target coding unit in horizontal/vertical directions (i.e., width/height directions of the target coding unit).
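The texture features just listed are straightforward to compute; the following is an illustrative NumPy sketch of the mean, variance, and horizontal/vertical gradients of a unit's pixel values. The exact statistics an encoder extracts may differ; the gradient here is an assumed mean absolute first difference.

```python
# Sketch of the third (pixel-level) feature: texture statistics of the
# target coding unit's luma samples.

import numpy as np

def texture_features(cu_pixels: np.ndarray) -> dict:
    """cu_pixels: 2-D array of luma samples for the target coding unit."""
    px = cu_pixels.astype(np.float64)
    grad_h = np.abs(np.diff(px, axis=1)).mean()  # horizontal (width) direction
    grad_v = np.abs(np.diff(px, axis=0)).mean()  # vertical (height) direction
    return {
        "mean": px.mean(),
        "variance": px.var(),
        "grad_horizontal": grad_h,
        "grad_vertical": grad_v,
    }
```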
The fourth feature related to the video frame may, for example, include the encoding level of the video frame; the encoding level of each video frame may be determined when video frame images are captured from the video to be encoded. Specifically, all I-frames and P-frames have encoding level 0, and the encoding level of a B-frame may be determined according to the encoding order determined when capturing video frame images from the video to be encoded.
Fig. 3 shows an example of an encoding order determined when capturing video frame images from a video to be encoded. As shown in Fig. 3, the arrows indicate the reference frames. For example, the 8th frame is a B-frame that references the earlier 0th frame and the 16th frame; since the 0th frame is an I-frame and the 16th frame is a P-frame, the 8th frame is a level-0 B-frame. The 4th frame is also a B-frame, referencing the 0th frame and the 8th frame; since the 0th frame is an I-frame and the 8th frame is a level-0 B-frame, the 4th frame is a level-1 B-frame. Similarly, the 2nd frame references the 4th frame, a level-1 B-frame, so the 2nd frame is a level-2 B-frame.
Here, it should be noted that the process of determining the encoding order when capturing video frame images from a video to be encoded may be implemented by any existing encoding method, and the present disclosure does not particularly limit this, as long as the encoding level of the video frame where the target encoding unit is located can be determined based on the known encoding order.
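The level rule implied by the Fig. 3 example can be stated compactly: I- and P-frames are level 0, and a B-frame's level is one more than the highest level of any B-frame it references (0 if it references only I/P frames). A sketch, with the frame objects and field names being illustrative assumptions:

```python
# Sketch of the coding-level rule inferred from the Fig. 3 example.
# frame.frame_type is "I", "P", or "B"; frame.references lists the
# frames a B-frame predicts from (assumed non-empty for B-frames).

def coding_level(frame) -> int:
    if frame.frame_type in ("I", "P"):
        return 0
    return max(
        (coding_level(ref) + 1 if ref.frame_type == "B" else 0)
        for ref in frame.references
    )
```

Checking against the text: the 8th frame references only I/P, so its level is 0; the 4th frame references the level-0 B-frame 8, giving level 1; the 2nd frame references the level-1 B-frame 4, giving level 2.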
The fifth feature related to the prediction result of the target coding unit in the predetermined second candidate prediction mode may include, for example, a rate-distortion cost of the target coding unit in the predetermined second candidate prediction mode.
Here, the predetermined second candidate prediction mode may be any prediction mode, which may be an intra prediction mode or an inter prediction mode, and may be the same as or different from the first candidate prediction mode described above. The target coding unit may be subjected to coding prediction according to a predetermined second candidate prediction mode based on an existing encoder, resulting in a rate-distortion cost.
As an example, the rate-distortion cost may be determined by the following expression:

rate-distortion cost = D + λ × R

where D represents the coding distortion, R represents the bit rate after encoding, and λ is a preset coefficient.
In addition, the number of the second candidate prediction modes described herein may be one or more, and the present disclosure does not specifically limit the specific type, form and number of the second candidate prediction modes, nor does the present disclosure specifically limit the manner for obtaining the rate-distortion cost of the target coding unit in the second candidate prediction mode.
The characteristic information of the parent-level encoding unit may include at least one of: features related to partitioning of the parent coding unit, features related to a best candidate prediction mode of the parent coding unit, inter-frame features of the video frame related to the parent coding unit, a rate-distortion cost of the parent coding unit in the third candidate prediction mode.
In particular, the characteristics related to the partitioning of the parent coding unit may be derived based on partitioning information that partitions the video frame, which may include, for example, the size of the parent coding unit.
The feature related to the best candidate prediction mode of the parent coding unit may include a prediction mode of the best candidate prediction mode of the parent coding unit. Here, the best candidate prediction mode refers to a candidate prediction mode having the smallest rate-distortion cost among a plurality of predetermined candidate prediction modes. The plurality of predetermined candidate prediction modes described herein may include any prediction mode, which may include an intra prediction mode and/or an inter prediction mode.
For the parent-level encoding unit, encoding prediction may be performed according to a plurality of predetermined candidate prediction modes based on an existing encoder, and rate distortion costs may be obtained respectively to determine the optimal candidate prediction mode, so that whether the prediction mode of the optimal candidate prediction mode is intra-prediction or inter-prediction may be determined.
The inter-frame feature of the video frame related to the parent coding unit includes a Motion Vector (MV) value of the parent coding unit. The motion vector value of the parent coding unit refers to the motion vector between the parent coding unit and its matching coding unit in a reference frame of the video frame in which the parent coding unit is located; specifically, a certain correspondence exists between the coding units of the video frame and those of the reference frame. The calculation of the motion vector may be implemented by any existing encoding method, and the present disclosure does not particularly limit this.
The third candidate prediction mode may comprise all candidate prediction modes belonging to the prediction mode (intra or inter) corresponding to the best candidate prediction mode of the previous-stage coding unit of the parent coding unit, where the parent coding unit is divided from the previous-stage coding unit. Taking Fig. 2 as an example, the target coding unit is coding unit B, the parent coding unit is coding unit B2, and the previous-stage coding unit of the parent coding unit is coding unit A1.
Specifically, the best candidate prediction mode of the previous-stage coding unit of the parent coding unit may be determined by an existing encoder, so that the prediction mode corresponding to that best candidate prediction mode, that is, whether it is intra prediction or inter prediction, can be determined. The parent coding unit may then be traversal-encoded over all candidate prediction modes within that prediction mode, obtaining the rate-distortion cost of each. For example, if the prediction mode corresponding to the best candidate prediction mode of coding unit A1 is determined to be intra prediction, the rate-distortion costs of the parent coding unit over all intra candidate prediction modes may be calculated as feature information of the parent coding unit.
The characteristic information of the neighboring coding units may include: features related to partitioning of neighboring coding units, features related to best candidate prediction modes of neighboring coding units, and inter-frame features of a video frame related to neighboring coding units.
In particular, the features related to the partitioning of the neighboring coding units may be derived based on partitioning information that partitions the video frame, which may include, for example, the size of the neighboring coding units.
The feature related to the best candidate prediction mode of the neighboring coding unit may include a prediction mode of the best candidate prediction mode of the neighboring coding unit. Here, the best candidate prediction mode refers to a candidate prediction mode having the smallest rate-distortion cost among a plurality of predetermined candidate prediction modes. The plurality of predetermined candidate prediction modes described herein may include any prediction mode, which may include an intra prediction mode and/or an inter prediction mode.
For the neighboring coding unit, coding prediction may likewise be performed in a plurality of predetermined candidate prediction modes based on an existing encoder, and the rate-distortion costs obtained respectively to determine the best candidate prediction mode, so that whether the prediction mode of that best candidate prediction mode is intra prediction or inter prediction can be determined.
The inter-frame feature of the video frame related to the neighboring coding unit may include a motion vector of the neighboring coding unit. Its meaning is analogous to that of the motion vector of the parent coding unit: it refers to the motion vector between the neighboring coding unit and its matching coding unit in a reference frame of the video frame in which the neighboring coding unit is located. Specifically, a certain correspondence exists between the coding units of that video frame and those of the reference frame, so the best-matching coding unit between the two frames can be found by an existing inter-frame coding method. For example, the coding unit in the reference frame that most closely matches the neighboring coding unit can be found, and the motion vector can then be calculated from the positions of the neighboring coding unit and the matching coding unit within their respective frames. In addition, the motion vector may be calculated by any existing encoding method, and the present disclosure is not particularly limited in this respect.
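As a rough illustration of the matching idea, a full-search block-matching sketch is given below. The SAD criterion and the search range are assumptions; the disclosure explicitly allows any existing encoding method here:

```python
# Sketch: full-search block matching to obtain a motion vector for one coding unit.
# SAD (sum of absolute differences) is the assumed matching criterion; real encoders
# typically use faster search patterns and additional cost terms.
import numpy as np

def motion_vector(cur_frame, ref_frame, x, y, size, search=8):
    block = cur_frame[y:y + size, x:x + size].astype(np.int64)
    best_mv, best_sad = (0, 0), None
    h, w = ref_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if rx < 0 or ry < 0 or rx + size > w or ry + size > h:
                continue  # candidate block would fall outside the reference frame
            cand = ref_frame[ry:ry + size, rx:rx + size].astype(np.int64)
            sad = int(np.abs(block - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv  # displacement of the best-matching block in the reference frame
```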
The characteristic information of the target coding unit, the parent coding unit, and the neighboring coding unit has been described above by way of example; however, the characteristic information of each is not limited to the specific items described.
In determining the prediction mode using the lightweight neural network, in one case, the feature information of the target coding unit is the size of the target coding unit, the partition depth of the target coding unit, the statistical value of the prediction residual of the target coding unit in the predetermined first candidate prediction mode and the quantization parameter, the texture information of the target coding unit, and the coding level of the video frame; the feature information of the parent coding unit is the size of the parent coding unit, the prediction mode of the best candidate prediction mode of the parent coding unit, the motion vector of the parent coding unit, and the rate-distortion cost of the parent coding unit in the third candidate prediction mode; and the feature information of the neighboring coding unit is the size of the neighboring coding unit, the prediction mode of the best candidate prediction mode of the neighboring coding unit, and the motion vector of the neighboring coding unit. In this case, the accuracy of the neural network's prediction result can be improved while its computation speed is preserved, so that the prediction mode can be determined more efficiently.
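To make the shape of this input concrete, a minimal sketch of assembling the feature vector and scoring it with a small network follows; the feature ordering, the two-layer architecture, and all dimensions are assumptions, since the disclosure does not fix the network's internal structure:

```python
# Minimal sketch: assemble the described features into one vector and score it with
# a small two-layer network. Feature ordering, architecture, and dimensions are
# illustrative assumptions only.
import numpy as np

def feature_vector(target, parent, neighbor):
    # Each argument is assumed to be a dict carrying the features named in the text;
    # mv entries are (dx, dy) pairs and third_rd_costs is a fixed-length list.
    v = [target["size"], target["depth"], target["residual_stat"], target["qp"],
         target["texture"], target["coding_level"],
         parent["size"], parent["best_is_intra"], *parent["mv"],
         *parent["third_rd_costs"],
         neighbor["size"], neighbor["best_is_intra"], *neighbor["mv"]]
    return np.asarray(v, dtype=np.float32)

def lightweight_net(x, w1, b1, w2, b2):
    # One hidden ReLU layer followed by a sigmoid output: the probability that the
    # best candidate prediction mode of the target coding unit is intra prediction.
    h = np.maximum(0.0, x @ w1 + b1)   # w1: (len(x), H), b1: (H,)
    z = h @ w2 + b2                    # w2: (H,), b2: scalar
    return float(1.0 / (1.0 + np.exp(-z)))
```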
In another case, on the basis of the above, the feature information of the target coding unit may further include the rate-distortion cost of the target coding unit in the predetermined second candidate prediction mode, which can improve the accuracy of the neural network model and further reduce the coding loss incurred for the same coding time over the entire video coding process.
In order to reduce the computational complexity of the encoder and meet the real-time requirements of practical encoding applications, fast selection of the prediction mode is one of the main ways to reduce encoding time. To this end, as described above, the video encoding method according to the exemplary embodiments of the present disclosure addresses the choice between intra prediction and inter prediction for a unit to be encoded in an inter-coded frame: the intra or inter prediction mode of the target coding unit can be selected, and the candidate prediction modes that need not be traversed can be pruned in advance. As a result, using only CPU computing resources and at the cost of a small coding loss, the computational complexity of mode selection for inter-coded frames can be greatly reduced, improving encoding speed and efficiency.
Fig. 5 is a block diagram illustrating a video encoding apparatus according to an example embodiment. Referring to fig. 5, the video encoding apparatus includes an acquisition unit 100, a determination unit 200, and an encoding unit 300.
The acquisition unit 100 may be configured to acquire feature information of a target coding unit, feature information of a parent coding unit of the target coding unit, and feature information of an adjacent coding unit in a video frame to be encoded.
The determination unit 200 may be configured to determine the prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units.
The encoding unit 300 may be configured to encode the target coding unit based on the determined prediction mode.
Here, the target coding unit is divided from the parent coding unit, the adjacent coding unit is a coding unit adjacent to the target coding unit in the video frame to be encoded, and the prediction mode is intra prediction or inter prediction.
As an example, the determining unit 200 may be further configured to: input the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units into a preset neural network model to obtain estimated prediction mode related information of the target coding unit; and determine the prediction mode of the target coding unit based on the estimated prediction mode related information.
As an example, the prediction mode related information may be a first probability that the best candidate prediction mode of the target coding unit is intra prediction and/or a second probability that the best candidate prediction mode of the target coding unit is inter prediction, where the best candidate prediction mode refers to the prediction mode with the smallest rate-distortion cost among the predetermined candidate prediction modes.
As an example, the determining unit 200 may be further configured to: comparing one of the first probability and the second probability with a preset probability threshold; when the comparison result indicates that the one is greater than the probability threshold, determining a prediction mode corresponding to the one as a prediction mode of the target coding unit; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the probability threshold; when the comparison result indicates that the one is equal to the probability threshold, determining the prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of intra prediction and all candidate prediction modes of inter prediction for the target coding unit.
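A minimal sketch of this single-threshold rule, assuming "the one" probability is the intra probability and that the hypothetical `full_rdo_decision` callback stands in for the exhaustive traversal of all intra and inter candidate modes:

```python
# Sketch: single-threshold decision on the network's intra probability.
# full_rdo_decision() is a hypothetical fallback that traverses all candidate
# prediction modes and returns "intra" or "inter" by smallest RD cost.

def decide_single_threshold(p_intra, threshold, full_rdo_decision):
    if p_intra > threshold:
        return "intra"
    if p_intra < threshold:
        return "inter"
    # Exactly at the threshold: fall back to exhaustive RD-cost traversal.
    return full_rdo_decision()
```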
As an example, the determining unit 200 may be further configured to: comparing one of the first probability and the second probability with a preset first probability threshold and a second probability threshold, wherein the first probability threshold is smaller than the second probability threshold; when the comparison result indicates that the one is greater than a second probability threshold, determining a prediction mode corresponding to the one as a prediction mode of the target coding unit; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is smaller than the first probability threshold; when the comparison result indicates that the one is less than or equal to the second probability threshold and greater than or equal to the first probability threshold, determining the prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of intra prediction and all candidate prediction modes of inter prediction for the target coding unit.
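The dual-threshold variant keeps an uncertainty band inside which the exhaustive traversal is retained; a sketch under the same assumptions as above:

```python
# Sketch: dual-threshold decision. Probabilities inside [low, high] are treated as
# uncertain, so the exhaustive RD traversal decides; outside the band the network's
# decision is trusted directly, skipping one whole family of candidate modes.

def decide_dual_threshold(p_intra, low, high, full_rdo_decision):
    assert low < high
    if p_intra > high:
        return "intra"            # confidently intra: skip inter candidates
    if p_intra < low:
        return "inter"            # confidently inter: skip intra candidates
    return full_rdo_decision()    # uncertain band: traverse all candidate modes
```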
As an example, the characteristic information of the target coding unit may include at least one of: the first feature related to the division of the target coding unit, the second feature related to the prediction result of the target coding unit in the predetermined first candidate prediction mode, the third feature related to the pixel in the target coding unit, the fourth feature related to the video frame, and the fifth feature related to the prediction result of the target coding unit in the predetermined second candidate prediction mode.
In this example, the first feature may include a size of the target coding unit and a partition depth of the target coding unit, the second feature may include statistics of a prediction residual and a quantization parameter of the target coding unit in the first candidate prediction mode, the third feature may include texture information of the target coding unit, the fourth feature may include a coding level of the video frame, and the fifth feature includes a rate-distortion cost of the target coding unit in the second candidate prediction mode.
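As one plausible reading of "texture information", two simple measures are sketched below; the disclosure does not specify the concrete measure, so both are purely illustrative:

```python
# Sketch: simple texture measures for a coding unit. Pixel variance and gradient
# energy are common proxies; the disclosure does not fix the definition.
import numpy as np

def texture_variance(block):
    return float(np.var(block.astype(np.float64)))

def texture_gradient_energy(block):
    gy, gx = np.gradient(block.astype(np.float64))
    return float(np.mean(gx ** 2 + gy ** 2))
```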
As an example, the characteristic information of the parent coding unit may include at least one of: features related to the partitioning of the parent coding unit, features related to the best candidate prediction mode of the parent coding unit, inter-frame features of the video frame related to the parent coding unit, and the rate-distortion cost of the parent coding unit in a third candidate prediction mode, wherein the third candidate prediction mode is all candidate prediction modes under the prediction mode corresponding to the best candidate prediction mode of the previous-stage coding unit of the parent coding unit, the parent coding unit being divided from that previous-stage coding unit.
The characteristic information of the neighboring coding units may include: features related to the partitioning of the neighboring coding units, features related to the best candidate prediction mode of the neighboring coding units, and inter-frame features of the video frame related to the neighboring coding units, wherein the best candidate prediction mode refers to the candidate prediction mode with the smallest rate-distortion cost among a plurality of predetermined candidate prediction modes.
In this example, the features related to the partitioning of the parent coding unit may include a size of the parent coding unit, the features related to the best candidate prediction mode of the parent coding unit may include a prediction mode of the best candidate prediction mode of the parent coding unit, and the inter-frame features related to the parent coding unit of the video frame may include a motion vector of the parent coding unit.
The feature related to the partitioning of the neighboring coding units may include the size of the neighboring coding units, the feature related to the best candidate prediction mode of the neighboring coding units may include the prediction mode of that best candidate prediction mode, and the inter-frame feature of the video frame related to the neighboring coding units may include the motion vector of the neighboring coding units.
With regard to the apparatus in the above-described embodiment, the specific manner in which each unit performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment. As shown in fig. 6, the electronic device 10 includes a processor 101 and a memory 102 for storing processor-executable instructions. Here, the processor-executable instructions, when executed by the processor, cause the processor to perform the video encoding method as described in the above exemplary embodiments.
By way of example, the electronic device 10 need not be a single device, but can be any collection of devices or circuits capable of executing the above instructions (or instruction sets), either individually or in combination. The electronic device 10 may also be part of an integrated control system or system manager, or may be configured as an electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 10, the processor 101 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a dedicated processor system, a microcontroller, or a microprocessor. By way of example and not limitation, processor 101 may also include an analog processor, a digital processor, a microprocessor, a multi-core processor, a processor array, a network processor, or the like.
The processor 101 may execute instructions or code stored in the memory 102, wherein the memory 102 may also store data. The instructions and data may also be transmitted or received over a network via the network interface device, which may employ any known transmission protocol.
Memory 102 may be integrated with processor 101, e.g., with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 102 may comprise a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 102 and the processor 101 may be operatively coupled or may communicate with each other, such as through an I/O port, a network connection, etc., so that the processor 101 can read files stored in the memory 102.
In addition, the electronic device 10 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 10 may be connected to each other via a bus and/or a network.
In an exemplary embodiment, there may also be provided a computer-readable storage medium in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform the video encoding method described in the above exemplary embodiments. The computer-readable storage medium may be, for example, a memory including instructions; optionally, it may be: read-only memory (ROM), random-access memory (RAM), programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or other optical disk storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card, or an eXtreme Digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage, optical data storage, or any other device configured to store a computer program and any associated data, data files, and data structures in a non-transitory manner and to provide them to a processor or computer so that the processor or computer can execute the computer program. The computer program in the computer-readable storage medium described above can run in an environment deployed on computer equipment such as a client, a host, a proxy device, or a server. Further, in one example, the computer program and any associated data, data files, and data structures may be distributed across networked computer systems so that they are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
In an exemplary embodiment, a computer program product may also be provided, which comprises computer instructions that, when executed by a processor, implement the video encoding method as described in the above exemplary embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A video encoding method, the video encoding method comprising:
acquiring feature information of a target coding unit, feature information of a parent coding unit of the target coding unit and feature information of an adjacent coding unit in a video frame to be coded;
determining a prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units;
encoding the target coding unit based on the determined prediction mode,
wherein the target coding unit is divided from the parent coding unit, the adjacent coding unit is a coding unit adjacent to the target coding unit in the video frame to be encoded, and the prediction mode is intra-prediction or inter-prediction.
2. The video coding method of claim 1, wherein the determining the prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units comprises:
inputting the feature information of the target coding unit, the feature information of the parent coding unit and the feature information of the adjacent coding units into a preset neural network model to obtain estimated prediction mode related information of the target coding unit;
determining a prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit.
3. The video coding method of claim 2, wherein the prediction mode related information is a first probability that the best candidate prediction mode of the target coding unit is intra prediction and/or a second probability that the best candidate prediction mode of the target coding unit is inter prediction, and the best candidate prediction mode refers to a prediction mode with a smallest rate-distortion cost among predetermined candidate prediction modes.
4. The video coding method of claim 3, wherein determining the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit comprises:
comparing one of the first probability and the second probability with a preset probability threshold;
determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the probability threshold; determining a prediction mode of the target coding unit by calculating a rate-distortion cost for the target coding unit traversing all candidate prediction modes of the intra prediction and all candidate prediction modes of the inter prediction when the comparison result indicates that the one is equal to the probability threshold.
5. The video coding method of claim 3, wherein determining the prediction mode of the target coding unit based on the estimated prediction mode related information of the target coding unit comprises:
comparing one of the first and second probabilities to preset first and second probability thresholds, wherein the first probability threshold is less than the second probability threshold;
determining a prediction mode corresponding to the one as a prediction mode of the target coding unit when the comparison result indicates that the one is greater than the second probability threshold; determining a prediction mode corresponding to the other of the first probability and the second probability as a prediction mode of the target coding unit when the comparison result indicates that the one is less than the first probability threshold; determining a prediction mode of the target coding unit by calculating rate-distortion costs for traversing all candidate prediction modes of the intra-prediction and all candidate prediction modes of the inter-prediction for the target coding unit when a comparison result indicates that the one is less than or equal to the second probability threshold and greater than or equal to the first probability threshold.
6. The video coding method of claim 1, wherein the feature information of the target coding unit comprises at least one of: a first feature related to a partitioning of the target coding unit, a second feature related to a prediction result of the target coding unit in a predetermined first candidate prediction mode, a third feature related to a pixel within the target coding unit, a fourth feature related to the video frame, and a fifth feature related to a prediction result of the target coding unit in a predetermined second candidate prediction mode.
7. A video encoding apparatus, characterized in that the video encoding apparatus comprises:
an acquisition unit configured to acquire feature information of a target coding unit, feature information of a parent coding unit of the target coding unit, and feature information of an adjacent coding unit in a video frame to be coded;
a determination unit configured to determine a prediction mode of the target coding unit based on the feature information of the target coding unit, the feature information of the parent coding unit, and the feature information of the neighboring coding units;
a coding unit configured to code the target coding unit based on the determined prediction mode,
wherein the target coding unit is divided from the parent coding unit, the adjacent coding unit is a coding unit adjacent to the target coding unit in the video frame to be encoded, and the prediction mode is intra-prediction or inter-prediction.
8. An electronic device, characterized in that the electronic device comprises:
a processor;
a memory for storing the processor-executable instructions,
wherein the processor-executable instructions, when executed by the processor, cause the processor to perform the video encoding method of any of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video encoding method of any of claims 1-6.
10. A computer program product comprising computer instructions, characterized in that said computer instructions, when executed by a processor, implement the video encoding method according to any one of claims 1 to 6.
CN202210287419.4A 2022-03-22 2022-03-22 Video coding method and device, electronic equipment and storage medium Pending CN114666579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210287419.4A CN114666579A (en) 2022-03-22 2022-03-22 Video coding method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210287419.4A CN114666579A (en) 2022-03-22 2022-03-22 Video coding method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114666579A true CN114666579A (en) 2022-06-24

Family

ID=82031794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210287419.4A Pending CN114666579A (en) 2022-03-22 2022-03-22 Video coding method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114666579A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117440156A (en) * 2023-09-22 2024-01-23 书行科技(北京)有限公司 Video coding method, video publishing method and related products


Similar Documents

Publication Publication Date Title
US10841583B2 (en) Coding unit depth determining method and apparatus
KR101155767B1 (en) Selecting encoding types and predictive modes for encoding video data
JP4777889B2 (en) Mode decision for intermediate prediction in video coding
US11070803B2 (en) Method and apparatus for determining coding cost of coding unit and computer-readable storage medium
CN101710991A (en) Fast intra mode prediction for a video encoder
CN104135629A (en) Encoding an image
WO2016180129A1 (en) Prediction mode selection method, apparatus and device
CN114173120A (en) Video coding block division method and video coding block division prediction model training method
WO2021159785A1 (en) Image processing method and apparatus, terminal, and computer-readable storage medium
CN115484464A (en) Video coding method and device
CN114666579A (en) Video coding method and device, electronic equipment and storage medium
KR20190013908A (en) Interframe predictive coding method and apparatus
CN112087624A (en) Coding management method based on high-efficiency video coding
CN105791863B (en) 3D-HEVC depth map intra-frame predictive encoding method based on layer
CN115278235B (en) Video coding method and device, electronic equipment and storage medium
CN111918059B (en) Hardware-friendly regression tree-based intra-frame prediction mode decision method and device
KR101671759B1 (en) Method for executing intra prediction using bottom-up pruning method appliing to high efficiency video coding and apparatus therefor
Lin et al. Coding unit partition prediction technique for fast video encoding in HEVC
CN113542737A (en) Encoding mode determining method and device, electronic equipment and storage medium
CN113259669B (en) Encoding method, encoding device, electronic device and computer readable storage medium
AU2021103378A4 (en) A self-adaptive n-depth context tree weighting method
US11889055B2 (en) Methods and systems for combined lossless and lossy coding
Angel et al. Complexity Reduction in Intra Prediction of HEVC Using a Modified Convolutional Neural Network Model Incorporating Depth Map and RGB Texture
WO2022047144A1 (en) Methods and systems for combined lossless and lossy coding
KR101600714B1 (en) Fast intra-mode decision method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination