CN117857813A - Video processing method, device, electronic equipment and medium - Google Patents


Info

Publication number
CN117857813A
Authority
CN
China
Prior art keywords
motion vector
search
candidate motion
target
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311862661.0A
Other languages
Chinese (zh)
Inventor
简云瑞
薛毅
周超
Current Assignee
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202311862661.0A
Publication of CN117857813A

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure provides a video processing method, apparatus, electronic device, and medium, and belongs to the field of computer technology. The method comprises the following steps: for any coding block of a target image in a target video, determining a bi-directional initial motion vector of the coding block; taking the position pointed to by the bi-directional initial motion vector in a target reference image as a search starting point and performing a motion search based on a plurality of search combinations to determine a plurality of candidate motion vector combinations; determining a target motion vector combination from the plurality of candidate motion vector combinations; and encoding the index of the bi-directional initial motion vector and the index of the search combination of the target motion vector combination to obtain encoded data. By adjusting the plurality of search magnitudes and the plurality of search directions used in the motion search of the bi-directional initial motion vector, the technical scheme can handle irregular motion of objects in image frames without increasing bit-rate consumption, improving the efficiency of video compression encoding.

Description

Video processing method, device, electronic equipment and medium
Technical Field
The disclosure relates to the field of computer technology, and in particular to a video processing method, apparatus, electronic device, and medium.
Background
Inter-frame prediction techniques are widely used in the compression encoding of video such as video conferencing, video telephony, and high-definition television. Inter-frame prediction compresses images by exploiting the correlation between the image frames of a video to remove temporal redundancy. The merge technique with motion vector difference is a commonly used inter-frame prediction technique; it usually requires determining an optimal motion vector for a coding block at the encoding end, then encoding the related information of the optimal motion vector and sending it to the decoding end. However, bi-directional prediction with this technique typically covers only scenes in which objects in the image frames move regularly. How to perform effective inter-frame prediction when an object in an image frame moves irregularly is therefore a problem to be solved.
Disclosure of Invention
The present disclosure provides a video processing method, apparatus, electronic device, and medium capable of handling irregular motion of an object in an image frame during bi-directional prediction, without increasing bit-rate consumption, by adjusting the plurality of search magnitudes and the plurality of search directions used when a motion search is performed from the bi-directional initial motion vector. The technical scheme of the present disclosure is as follows:
According to an aspect of the embodiments of the present disclosure, there is provided a video processing method, the method including:
for any coding block of a target image in a target video, determining a bidirectional initial motion vector of the coding block, wherein the target image is any video frame in the target video, the coding block is used for compressing and storing partial information of the video frame, and the bidirectional initial motion vector is a candidate motion vector for bidirectional prediction of the coding block;
performing a motion search based on a plurality of search combinations, with the position pointed to by the bi-directional initial motion vector in a target reference image as a search starting point, and determining a plurality of candidate motion vector combinations, wherein the target reference image includes a previous frame and a subsequent frame of the target image in the video, the plurality of search combinations are composed of a plurality of search magnitudes and a plurality of search directions, the search magnitudes are used to indicate the offset of the bi-directional initial motion vector, and the candidate motion vector combinations include a forward candidate motion vector and a backward candidate motion vector;
determining a target motion vector combination from the plurality of candidate motion vector combinations, wherein the target motion vector combination is a candidate motion vector combination in which the difference between a target reference block and the coding block reaches a difference threshold, and the target reference block is a coding unit corresponding to the coding block in the target reference image;
and encoding the index of the bi-directional initial motion vector and the index of the search combination of the target motion vector combination to obtain encoded data, wherein the encoded data is used to determine a prediction block of the coding block in the target image.
According to another aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including:
a first determining unit configured to determine, for any one of encoded blocks of a target image in a target video, a bi-directional initial motion vector of the encoded block, the target image being any one of video frames in the target video, the encoded block being configured to store partial information of the video frames in a compressed manner, the bi-directional initial motion vector being a candidate motion vector for bi-directional prediction of the encoded block;
a second determining unit configured to perform a motion search based on a plurality of search combinations, with the position pointed to by the bi-directional initial motion vector in a target reference image as a search starting point, and determine a plurality of candidate motion vector combinations, the target reference image including a previous frame and a subsequent frame of the target image in the video, the plurality of search combinations being composed of a plurality of search magnitudes and a plurality of search directions, the search magnitudes indicating the offset of the bi-directional initial motion vector, and the candidate motion vector combinations including a forward candidate motion vector and a backward candidate motion vector;
A third determining unit configured to determine a target motion vector combination from the plurality of candidate motion vector combinations, the target motion vector combination being a candidate motion vector combination in which a difference between a target reference block and the coding block reaches a difference threshold, the target reference block being a coding unit corresponding to the coding block in the target reference image;
and the coding unit is configured to code the index of the bidirectional initial motion vector and the index of the search combination of the target motion vector combination to obtain coding data, wherein the coding data is used for determining a prediction block of the coding block in the target image.
In some embodiments, the first determining unit is configured to obtain, for any coding block of a target image in a target video, a candidate motion information list of the coding block, where the candidate motion information list includes candidate motion vectors, prediction directions, and reference images; traverse the bi-directionally predicted candidate motion vectors in the candidate motion information list in order from front to back; determine a bi-directionally predicted candidate motion vector as a bi-directional initial motion vector; and stop the traversal of bi-directionally predicted candidate motion vectors when the number of bi-directional initial motion vectors reaches a number threshold.
In some embodiments, the first determining unit is further configured to traverse the uni-directionally predicted candidate motion vectors based on the order from front to back in the candidate motion information list if the bi-directional initial motion vector does not reach the number threshold after completing the traversing of bi-directionally predicted candidate motion vectors; determining a candidate motion vector of unidirectional prediction as a unidirectional initial motion vector; and stopping traversing the unidirectional predicted candidate motion vector if the sum of the bidirectional initial motion vector and the unidirectional initial motion vector reaches the number threshold.
In some embodiments, the second determining unit is configured to determine a first number of search steps and a second number of search directions, with the position pointed to by the bi-directional initial motion vector in the target reference image as the search starting point; determine a first number of search magnitudes based on the first number of search steps, the first number of search magnitudes being the products of the first number of search steps and a first number of step coefficients; perform a motion search in the second number of search directions based on the plurality of search combinations constructed from the first number of search magnitudes and the second number of search directions, and determine a plurality of offset motion vector combinations, each including a forward offset vector for the forward offset of the bi-directional initial motion vector and a backward offset vector for the backward offset; and add each of the plurality of offset motion vector combinations to the bi-directional initial motion vector to obtain the plurality of candidate motion vector combinations.
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels; the second number of search directions includes an up direction, a down direction, a left direction, and a right direction; and the plurality of search combinations are used to indicate that the bi-directional initial motion vector performs same-direction search, reverse search, and one-direction search in the forward and backward directions respectively, where same-direction search refers to a search mode in which the forward offset vector and the backward offset vector are the same, reverse search refers to a search mode in which the forward offset vector and the backward offset vector are opposite, and one-direction search refers to a search mode in which only a forward offset or only a backward offset is performed.
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels; the second number of search directions includes n directions, the angle between adjacent directions being 360/n degrees, where n is a positive integer; and the plurality of search combinations are used to indicate that the bi-directional initial motion vector performs reverse search in the forward and backward directions, where reverse search refers to a search mode in which the forward offset vector and the backward offset vector are opposite.
In some embodiments, the third determining unit includes:
a first determination subunit configured to determine a plurality of absolute error sums corresponding to the plurality of candidate motion vector combinations, the absolute error sums indicating the difference between the coding block and a prediction block of the coding block;
a second determination subunit configured to determine a third number of candidate motion vector combinations based on the plurality of absolute error sums in order from small to large;
a third determination subunit configured to determine the target motion vector combination based on rate-distortion optimization among the third number of candidate motion vector combinations.
In some embodiments, the first determining subunit is configured to determine, for any candidate motion vector combination, a reference block of the encoded block based on the candidate motion vector combination and the encoded block, the reference block being a start point of the candidate motion vector combination, the encoded block being an end point of the candidate motion vector combination; performing motion compensation on edge pixels of the coding block based on the edge pixels of the reference block to obtain edge prediction pixels, wherein the edge pixels comprise left side pixels and upper side pixels; an absolute error sum of edge pixels of the encoded block and the edge prediction pixels is determined.
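The edge-pixel ("template") cost just described can be sketched as follows, with the motion-compensation step elided for brevity: the left-column and top-row pixels of the coding block are compared against the corresponding edge pixels of a reference block that is assumed to be already motion-compensated. The use of numpy and the name `edge_template_sad` are assumptions of this sketch, not the patent's implementation.

```python
import numpy as np

def edge_template_sad(coding_block, reference_block):
    """Sum of absolute differences over the left column and top row
    (the "template" edge pixels) of two equal-sized blocks."""
    left = np.abs(coding_block[:, 0].astype(int)
                  - reference_block[:, 0].astype(int)).sum()
    # Top row excluding the corner pixel, which the left column already covers.
    top = np.abs(coding_block[0, 1:].astype(int)
                 - reference_block[0, 1:].astype(int)).sum()
    return int(left + top)

a = np.zeros((4, 4), dtype=np.uint8)
b = np.ones((4, 4), dtype=np.uint8)
print(edge_template_sad(a, b))  # 4 left pixels + 3 top pixels = 7
```

Casting to a signed integer type before subtracting avoids unsigned-wraparound artifacts on 8-bit pixel data.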
In some embodiments, the first determining subunit is configured to perform motion compensation on the coding block for any candidate motion vector combination to obtain a prediction block of the coding block, where the prediction block includes a forward prediction block and a backward prediction block; an absolute error sum of the prediction block and the coding block is determined.
In some embodiments, the apparatus further comprises:
the search unit is configured to, when the bi-directional prediction weight of the bi-directional initial motion vector is not equal to a preset weight, perform a motion search on the bi-directional initial motion vector through a decoder-side motion vector refinement technique and determine an offset bi-directional initial motion vector, where the bi-directional prediction weight is a weight stored in the candidate motion information list of the coding block, the residual of the offset bi-directional initial motion vector is smaller than the residual of the bi-directional initial motion vector, and the residual is used to indicate the difference between the reference block and the prediction block.
According to another aspect of the embodiments of the present disclosure, there is provided an electronic device including:
one or more processors;
a memory for storing the processor-executable program code;
Wherein the processor is configured to execute the program code to implement the video processing method described above.
According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, cause the electronic device to perform the above-described video processing method.
According to another aspect of the disclosed embodiments, there is provided a computer program product comprising a computer program/instruction which, when executed by a processor, implements the video processing method described above.
The embodiments of the present disclosure provide a video processing method that can handle irregular motion of an object in an image frame during bi-directional prediction, without increasing bit-rate consumption, by adjusting the plurality of search magnitudes and the plurality of search directions used when a motion search is performed from the bi-directional initial motion vector. Compared with the conventional approach, this scheme improves the efficiency of video compression encoding.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.
Fig. 1 is a schematic diagram illustrating an implementation environment of a video processing method according to an exemplary embodiment.
Fig. 2 is a flow chart illustrating a video processing method according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating another video processing method according to an exemplary embodiment.
Fig. 4 is a schematic diagram illustrating a motion situation of a coding block according to an exemplary embodiment.
Fig. 5 is a schematic diagram illustrating an expansion direction according to an exemplary embodiment.
FIG. 6 is a schematic diagram illustrating a template matching according to an exemplary embodiment.
Fig. 7 is a block diagram of a video processing apparatus according to an exemplary embodiment.
Fig. 8 is a block diagram of another video processing apparatus according to an exemplary embodiment.
Fig. 9 is a block diagram of an electronic device, according to an example embodiment.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
It should be noted that, the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals related to the present disclosure are all authorized by the user or are fully authorized by the parties, and the collection, use, and processing of relevant data is required to comply with relevant laws and regulations and standards of relevant countries and regions. For example, the target video referred to in this disclosure is acquired with sufficient authorization.
Fig. 1 is a schematic diagram illustrating an implementation environment of a video processing method according to an exemplary embodiment. Referring to fig. 1, the implementation environment specifically includes: a terminal 101 and a server 102. The terminal 101 may be connected to the server 102 through a wireless network or a wired network.
The terminal 101 may be at least one of a smartphone, a smart watch, a desktop computer, a laptop computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, and a portable laptop computer. The terminal 101 may run an application that obtains the target motion vector combination of any coding block, encodes the related information, and sends the encoded information to the decoding end. The application is associated with the server 102, which provides background services to the terminal 101.
The terminal 101 broadly refers to one of a plurality of terminals; this embodiment is illustrated with the terminal 101 only. Those skilled in the art will recognize that the number of terminals may be greater or smaller. For example, there may be several terminals, or tens or hundreds, or more; the number and device types of terminals are not limited in the embodiments of the present disclosure.
The server 102 is at least one of a single server, a plurality of servers, a cloud computing platform, and a virtualization center; the number of servers may be greater or smaller, which is not limited by the embodiments of the present disclosure. Of course, the server 102 may also include other functional servers to provide more comprehensive and diverse services. In some embodiments, the server 102 takes on the primary computing work and the terminal 101 the secondary computing work; alternatively, the server 102 takes on the secondary computing work and the terminal 101 the primary computing work; alternatively, the server 102 and the terminal 101 cooperate using a distributed computing architecture. The server 102 may be connected to the terminal 101 and other terminals through a wireless or wired network.
Fig. 2 is a flowchart illustrating a video processing method according to an exemplary embodiment. As shown in fig. 2, the method is performed by an electronic device and comprises the following steps:
in step S201, for any coding block of a target image in a target video, a bi-directional initial motion vector of the coding block is determined, the target image is any video frame in the target video, the coding block is used for compressing and storing partial information of the video frame, and the bi-directional initial motion vector is a candidate motion vector for bi-directionally predicting the coding block.
In the embodiment of the present disclosure, the target video is composed of consecutive images; that is, video frames are the basic constituent units of a video. The video frame rate, the number of images recorded per second, is typically used as a measure of video smoothness. The embodiments of the present disclosure do not limit the video frame rate of the target video. The target image is composed of a plurality of coding blocks; during video compression, each coding block is mapped into another value domain by a mathematical transformation and then encoded. The embodiments of the present disclosure do not limit the number, division, or encoding mode of the coding blocks.
In step S202, a motion search is performed based on a plurality of search combinations using a position pointed by the bi-directional initial motion vector in the target reference image as a search start point, and a plurality of candidate motion vector combinations are determined, wherein the target reference image includes a frame preceding and a frame following the target image in the video, the plurality of search combinations are composed of a plurality of search magnitudes and a plurality of search directions, the search magnitudes are used to indicate an offset of the bi-directional initial motion vector, and the candidate motion vector combinations include a forward candidate motion vector and a backward candidate motion vector.
In the embodiments of the present disclosure, the target reference image refers to the forward reference frame and the backward reference frame of the current video frame, that is, the previous frame and the next frame of the target image in the video. The bi-directional initial motion vector is a candidate motion vector for bi-directional prediction; therefore, in the forward reference frame and the backward reference frame, the motion search uses the position pointed to by the bi-directional initial motion vector as the search starting point. Around the search starting point in each reference frame, the bi-directional initial motion vector is offset in a plurality of directions and by a plurality of step sizes, yielding a plurality of forward candidate motion vectors and a plurality of backward candidate motion vectors. The plurality of directions and step sizes form the plurality of search combinations, and the forward and backward candidate motion vectors form the plurality of candidate motion vector combinations.
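As an illustration of the search just described, the following sketch enumerates candidate motion vector combinations around a pair of initial vectors. It is a minimal sketch under assumptions: the step values mirror the embodiments later in this document, only the mirrored ("reverse") offset pattern is shown, and all names (`STEPS`, `candidate_combinations`, ...) are hypothetical rather than taken from the patent.

```python
STEPS = [0.125, 0.25, 0.5, 1, 2, 4, 8, 16]       # search magnitudes in pixels
DIRECTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0)]  # up, down, left, right

def candidate_combinations(mv_fwd, mv_bwd):
    """Enumerate (forward, backward) candidate vectors for a mirrored
    ("reverse") search: the backward offset negates the forward offset."""
    candidates = []
    for step in STEPS:
        for dx, dy in DIRECTIONS:
            ox, oy = dx * step, dy * step
            fwd = (mv_fwd[0] + ox, mv_fwd[1] + oy)
            bwd = (mv_bwd[0] - ox, mv_bwd[1] - oy)
            candidates.append((fwd, bwd))
    return candidates

pairs = candidate_combinations((3, 0), (-3, 0))
print(len(pairs))  # 8 magnitudes x 4 directions = 32 combinations
```

Only the index of the chosen (magnitude, direction) pair needs to be signaled, which is why the search can be widened without increasing bit-rate consumption per coded vector.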
In step S203, a target motion vector combination is determined from a plurality of candidate motion vector combinations, where the target motion vector combination is a candidate motion vector combination in which a difference between a target reference block and a coding block reaches a difference threshold, and the target reference block is a coding unit corresponding to the coding block in the target reference image.
In an embodiment of the present disclosure, the terminal determines a target motion vector combination from the plurality of candidate motion vector combinations. The target motion vector combination is usually the optimal motion vector combination, that is, the candidate motion vector combination for which the difference between the indicated target reference block and the coding block is minimal.
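As a sketch of this selection step: compute a cost, here a sum of absolute differences (SAD), between the coding block and each candidate's reference block, and pick the smallest. The later embodiments additionally shortlist candidates by SAD and then apply rate-distortion optimization; this sketch keeps only the argmin, and all names are illustrative assumptions.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences (SAD) between two equal-sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def select_target(coding_block, reference_blocks):
    """Index of the reference block whose SAD against the coding block is smallest."""
    costs = [sad(coding_block, ref) for ref in reference_blocks]
    return costs.index(min(costs))

block = np.full((4, 4), 10, dtype=np.uint8)
refs = [np.full((4, 4), 13, dtype=np.uint8),   # SAD = 4*4*3 = 48
        np.full((4, 4), 11, dtype=np.uint8)]   # SAD = 4*4*1 = 16
print(select_target(block, refs))  # 1
```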
In step S204, the index of the bi-directional initial motion vector and the index of the search combination of the target motion vector combination are encoded, resulting in encoded data, which is used to determine the prediction block of the encoded block in the target image.
In the embodiment of the present disclosure, the terminal encodes the index of the bi-directional initial motion vector and the index of the search combination of the target motion vector combination to obtain encoded data, so that the encoded data can subsequently be sent to the decoding end. The decoding end obtains the prediction block of the target reference block based on this information and uses it as the prediction block of the coding block.
The embodiments of the present disclosure provide a video processing method that can handle irregular motion of an object in an image frame during bi-directional prediction, without increasing bit-rate consumption, by adjusting the plurality of search magnitudes and the plurality of search directions used when a motion search is performed from the bi-directional initial motion vector. Compared with the conventional approach, this scheme improves the efficiency of video compression encoding.
In some embodiments, for any encoded block of a target image in a target video, determining a bi-directional initial motion vector for the encoded block comprises:
for any coding block of a target image in a target video, acquiring a candidate motion information list of the coding block, wherein the candidate motion information list comprises a candidate motion vector, a prediction direction and a reference image;
traversing the bi-directionally predicted candidate motion vectors in the candidate motion information list in order from front to back;
determining a bi-directional predicted candidate motion vector as a bi-directional initial motion vector;
and stopping the traversal of bi-directionally predicted candidate motion vectors when the number of bi-directional initial motion vectors reaches a number threshold.
In the embodiment of the present disclosure, by traversing the bi-directionally predicted candidate motion vectors first, greater gains can be obtained when motion search and selection are performed on the bi-directional initial motion vectors during video compression encoding, which further improves encoding efficiency.
In some embodiments, the method further comprises:
in the case that the number of bi-directional initial motion vectors does not reach the number threshold after the traversal of bi-directionally predicted candidate motion vectors is completed, traversing the uni-directionally predicted candidate motion vectors in the candidate motion information list in order from front to back;
Determining a candidate motion vector of unidirectional prediction as a unidirectional initial motion vector;
and stopping the traversal of uni-directionally predicted candidate motion vectors when the total number of bi-directional initial motion vectors and uni-directional initial motion vectors reaches the number threshold.
In the embodiment of the present disclosure, by traversing the uni-directionally predicted candidate motion vectors only after the bi-directionally predicted candidate motion vectors have been traversed, the bi-directional prediction mode, which yields greater gains, is used preferentially in the video compression encoding process, which helps improve encoding efficiency.
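The two-pass traversal described in the embodiments above (bi-directionally predicted candidates first, front to back, then uni-directionally predicted ones until a number threshold is filled) can be sketched as follows. The list format of `(motion_vector, is_bidirectional)` tuples and the name `pick_initial_vectors` are illustrative assumptions, not the patent's data structures.

```python
def pick_initial_vectors(candidates, quota):
    """Take bi-directionally predicted candidates first, in list order;
    top up with uni-directionally predicted ones if the quota is unmet."""
    chosen = [mv for mv, bi in candidates if bi][:quota]
    if len(chosen) < quota:
        chosen += [mv for mv, bi in candidates if not bi][:quota - len(chosen)]
    return chosen

cand_list = [("mv0", False), ("mv1", True), ("mv2", True), ("mv3", False)]
print(pick_initial_vectors(cand_list, 3))  # ['mv1', 'mv2', 'mv0']
```

With a quota of 3, the two bi-directional candidates are taken first and the front-most uni-directional candidate fills the remainder.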
In some embodiments, motion searching based on a plurality of search combinations using a location at which a bi-directional initial motion vector points in a target reference image as a search start point, determining a plurality of candidate motion vector combinations includes:
determining a first number of search step sizes and a second number of search directions by taking the position pointed by the bidirectional initial motion vector in the target reference image as a search starting point;
determining a first number of search magnitudes based on the first number of search steps, the first number of search magnitudes being a product value of the first number of search steps and a first number of step coefficients;
performing motion search in a second number of search directions based on a plurality of search combinations constructed of the first number of search magnitudes and the second number of search directions, determining a plurality of offset motion vector combinations including a forward offset vector for forward offset of the bi-directional initial motion vector and a backward offset vector for backward offset;
and adding each of the plurality of offset motion vector combinations to the bi-directional initial motion vector to obtain the plurality of candidate motion vector combinations.
In the embodiment of the disclosure, a plurality of candidate motion vector combinations are obtained by performing motion search, so that the optimal motion vector combination can be conveniently selected from the candidate motion vector combinations to realize compression coding of the video frame. Meanwhile, based on adjustment of a plurality of search magnitudes and a plurality of search directions, efficient video compression encoding can be achieved when an object in a video performs irregular motion, and encoding efficiency is improved.
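A minimal numeric sketch of the magnitude construction in the steps above: each search magnitude is the product of a search step and a step coefficient. The step and coefficient values here are invented for illustration; the patent does not prescribe them.

```python
# Hypothetical values: search steps (in pixels) scaled by per-step
# coefficients to give the actual search magnitudes.
steps = [0.25, 0.5, 1, 2]      # illustrative search steps
coeffs = [1, 1, 2, 2]          # illustrative step coefficients
magnitudes = [s * c for s, c in zip(steps, coeffs)]
print(magnitudes)  # [0.25, 0.5, 2, 4]
```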
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels;
the second number of search directions includes an up direction, a down direction, a left direction, and a right direction;
the plurality of search combinations indicate that the bidirectional initial motion vector performs same-direction, reverse, and unidirectional searches in the forward and backward directions, where a same-direction search is a search mode in which the forward offset vector and the backward offset vector are the same, a reverse search is a search mode in which the forward offset vector and the backward offset vector are opposite, and a unidirectional search is a search mode in which only a forward offset or only a backward offset is performed.
In the embodiment of the disclosure, the motion search is performed based on the plurality of search combinations formed from the plurality of search magnitudes and search directions, so that the resulting candidate motion vector combinations better describe the case in which the object represented by the coding block moves discontinuously, making the search more comprehensive and helping improve encoding efficiency.
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixel, 4 pixel, 8 pixel, 16 pixel;
the second number of search directions includes n directions, the angle between adjacent directions being 360/n degrees, where n is a positive integer;
the plurality of search combinations indicate that the bidirectional initial motion vector performs a reverse search in the forward and backward directions, the reverse search being a search mode in which the forward offset vector and the backward offset vector are opposite.
In the embodiment of the disclosure, the motion search is performed based on the plurality of search combinations formed from the plurality of search magnitudes and search directions, so that the resulting candidate motion vector combinations better describe the case in which the object represented by the coding block performs non-horizontal, non-vertical motion, making the search more comprehensive and helping improve encoding efficiency.
In some embodiments, determining a target motion vector combination from a plurality of candidate motion vector combinations comprises:
determining a plurality of absolute error sums corresponding to the plurality of candidate motion vector combinations, wherein each absolute error sum indicates the difference between the coding block and a prediction block of the coding block;
determining a third number of candidate motion vector combinations based on the plurality of absolute error sums in order from small to large;
in a third number of candidate motion vector combinations, a target motion vector combination is determined based on rate-distortion optimization.
In the embodiment of the disclosure, the candidate motion vector combinations are first screened preliminarily by the sum of absolute errors, and the target motion vector combination is then selected by rate-distortion optimization, which helps determine the target motion vector combination more accurately and efficiently.
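The two-stage selection described above (SAD prescreening followed by rate-distortion optimization) can be sketched as follows; the cost functions and the toy numbers are hypothetical placeholders, not values from the patent:

```python
# Illustrative two-stage selection: prescreen candidate combinations by sum of
# absolute differences (SAD), then pick the final one by a rate-distortion cost.

def select_target(candidates, sad_of, rd_cost_of, third_number=2):
    # Stage 1: keep the third_number candidates with the smallest SAD
    # (i.e. taken in order from small to large).
    shortlist = sorted(candidates, key=sad_of)[:third_number]
    # Stage 2: among the shortlist, minimize the rate-distortion cost.
    return min(shortlist, key=rd_cost_of)

# Toy example: "b" has the best RD cost overall, but its SAD is too large to
# survive the prescreening, so "a" is chosen from the shortlist ["a", "c"].
sads = {"a": 10, "b": 30, "c": 12}
rd = {"a": 5, "b": 1, "c": 9}
best = select_target(["a", "b", "c"], sads.get, rd.get, third_number=2)
```

The prescreening keeps the expensive rate-distortion evaluation limited to a small shortlist, which is the efficiency gain the paragraph above refers to.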
In some embodiments, determining a plurality of absolute error sums corresponding to a plurality of candidate motion vector combinations comprises:
for any candidate motion vector combination, determining a reference block of the coding block based on the candidate motion vector combination and the coding block, wherein the reference block is a starting point of the candidate motion vector combination, and the coding block is an ending point of the candidate motion vector combination;
performing motion compensation on edge pixels of the coding block based on edge pixels of the reference block to obtain edge prediction pixels, wherein the edge pixels comprise left side pixels and upper side pixels;
The absolute error sum of the edge pixels and the edge prediction pixels of the encoded block is determined.
In the embodiment of the disclosure, the template matching algorithm is used for preliminary screening of the candidate motion vector combinations, so that combinations with better gains can be obtained efficiently under strict complexity requirements, making it easier to obtain the target motion vector combination from them.
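A minimal sketch of the template-matching cost described above, computed over the left-side and upper-side edge pixels; the pixel arrays and function names are hypothetical examples:

```python
# Hypothetical template-matching sketch: the SAD between the coding block's
# edge pixels (left column and top row) and the edge prediction pixels
# derived from the reference block.

def sad(a, b):
    """Sum of absolute differences between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

def template_sad(left_pixels, top_pixels, pred_left, pred_top):
    """Total SAD over the left and top templates of the coding block."""
    return sad(left_pixels, pred_left) + sad(top_pixels, pred_top)

cost = template_sad([100, 102], [98, 99], [101, 100], [97, 100])
# |100-101| + |102-100| + |98-97| + |99-100| = 1 + 2 + 1 + 1 = 5
```

Because only the already-reconstructed edge pixels are compared, this cost is cheap to evaluate, which matches the strict-complexity use case in the paragraph above.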
In some embodiments, determining a plurality of absolute error sums corresponding to a plurality of candidate motion vector combinations comprises:
for any candidate motion vector combination, performing motion compensation on the coding block to obtain a prediction block of the coding block, wherein the prediction block comprises a forward prediction block and a backward prediction block;
the absolute error sum of the prediction block and the coding block is determined.
In the embodiment of the disclosure, the bilateral matching algorithm is used for preliminary screening of the candidate motion vector combinations, so that combinations with better gains can be obtained more accurately when complexity requirements are less strict, making it easier to obtain the target motion vector combination.
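A minimal bilateral-matching sketch under the assumption that the bidirectional prediction block is the average of the forward and backward prediction blocks (an assumption for illustration; the patent text only says the prediction block comprises both):

```python
# Hypothetical bilateral-matching sketch: average the forward and backward
# prediction blocks and take the SAD against the coding block as the cost.

def bilateral_sad(forward_pred, backward_pred, coding_block):
    # Assumed combination rule: simple average of the two prediction blocks.
    pred = [(f + b) / 2 for f, b in zip(forward_pred, backward_pred)]
    return sum(abs(p - c) for p, c in zip(pred, coding_block))

cost = bilateral_sad([100, 104], [102, 100], [101, 101])
# averaged prediction is [101, 102]; SAD = |101-101| + |102-101| = 1
```

Unlike template matching, this cost uses the full block pixels, which is why it can rank candidates more accurately at somewhat higher cost.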
In some embodiments, the method further comprises:
and when the bidirectional prediction weight of the bidirectional initial motion vector is not equal to the preset weight, performing a motion search on the bidirectional initial motion vector using the decoder-side motion vector refinement technique to determine an offset bidirectional initial motion vector, where the bidirectional prediction weight is a weight stored in the candidate motion information list of the coding block, the residual of the offset bidirectional initial motion vector is smaller than that of the bidirectional initial motion vector, and the residual indicates the difference between the reference block and the prediction block.
In the embodiment of the disclosure, the motion search is performed on the bidirectional initial motion vector by using a decoding end motion vector refinement technology, so that a more accurate bidirectional initial motion vector can be obtained, and the subsequent obtaining of a target motion vector combination with smaller difference between a corresponding prediction block and a corresponding coding block is facilitated.
Fig. 2 is a flowchart of a video processing method according to the present disclosure; the video processing scheme provided by the present disclosure is further described below. Fig. 3 is a flowchart illustrating another video processing method according to an exemplary embodiment. Referring to fig. 3, the method is performed by an electronic device and comprises the following steps:
in step S301, for any coding block of a target image in a target video, a bi-directional initial motion vector of the coding block is determined, the target image is any video frame in the target video, the coding block is used for compressing and storing partial information of the video frame, and the bi-directional initial motion vector is a candidate motion vector for bi-directionally predicting the coding block.
In the embodiment of the present disclosure, the target video is composed of continuous video frames, and the embodiment of the present disclosure does not limit the video content, the video frame rate, and the like of the target video. The target image is composed of a plurality of coding blocks, and each coding block is mapped into another value domain after mathematical transformation in the process of video compression and then is subjected to coding processing. The embodiment of the disclosure does not limit the number, division, encoding mode and the like of the encoding blocks.
In some embodiments, bi-directionally predicted candidate motion vectors are preferentially selected from the candidate motion information list of the coding block. Correspondingly, for any coding block of a target image in a target video, the terminal acquires the candidate motion information list of the coding block, which includes candidate motion vectors, prediction directions, and reference images. The terminal traverses the bi-directionally predicted candidate motion vectors in the list from front to back, determining each bi-directionally predicted candidate motion vector as a bidirectional initial motion vector, and stops traversing once the number of bidirectional initial motion vectors reaches a number threshold. Traversing the bi-directionally predicted candidate motion vectors first yields greater gains from bidirectional prediction during video compression coding and improves encoding efficiency.
The candidate motion information list provides a plurality of candidate motion vectors for the coding block. At present, merge mode with motion vector difference (MMVD) is commonly used for inter-frame compression coding, and this method generally selects the first two candidate motion vectors in the candidate motion information list as initial motion vectors. Its performance derives from the improved motion accuracy of the initial motion vectors. The method in the embodiment of the disclosure improves the accuracy of the candidate motion vectors during bidirectional prediction without increasing code rate consumption, and therefore preferentially selects bi-directionally predicted candidate motion vectors from the candidate motion information list when determining the initial motion vectors.
In some embodiments, when the bi-directionally predicted candidate motion vectors do not meet the number requirement, uni-directionally predicted candidate motion vectors are selected as a supplement. Correspondingly, after traversing the bi-directionally predicted candidate motion vectors, if the bidirectional initial motion vectors have not reached the number threshold, the terminal traverses the uni-directionally predicted candidate motion vectors in the candidate motion information list from front to back, determines each uni-directionally predicted candidate motion vector as a unidirectional initial motion vector, and stops traversing once the total number of bidirectional and unidirectional initial motion vectors reaches the number threshold. Traversing the uni-directionally predicted candidate motion information last ensures that the more beneficial bidirectional prediction mode is used preferentially during video compression coding, which helps improve encoding efficiency.
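The traversal order described above can be sketched as follows; the list structure and field names are hypothetical, chosen only to illustrate the bi-first, uni-fallback ordering:

```python
# Hypothetical sketch: scan the candidate motion information list front to
# back, taking bi-predicted candidates first, then fall back to uni-predicted
# candidates until the number threshold is reached.

def pick_initial_mvs(candidate_list, threshold=2):
    # First pass: bi-directionally predicted candidates, in list order,
    # stopping at the threshold (the slice mimics "stop traversing").
    initial = [c for c in candidate_list if c["direction"] == "bi"][:threshold]
    if len(initial) < threshold:
        # Second pass: fill the remainder with uni-predicted candidates.
        need = threshold - len(initial)
        initial += [c for c in candidate_list if c["direction"] == "uni"][:need]
    return initial

cand_list = [
    {"mv": (1, 0), "direction": "uni"},
    {"mv": (0, 2), "direction": "bi"},
    {"mv": (3, 1), "direction": "bi"},
]
chosen = pick_initial_mvs(cand_list, threshold=2)
# both bi-predicted candidates are chosen, skipping the earlier uni candidate
```

Note how the uni-predicted candidate at the head of the list is skipped as long as enough bi-predicted candidates exist, which is the key difference from taking the first two entries by default.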
According to the prediction direction, initial motion vectors are divided into bidirectional initial motion vectors and unidirectional initial motion vectors. A bidirectional initial motion vector is a candidate motion vector used for bidirectional prediction of the current coding block; bidirectional prediction references both the forward reference frame (L0 reference) and the backward reference frame (L1 reference). A unidirectional initial motion vector is a candidate motion vector used for unidirectional prediction of the current coding block; unidirectional prediction references only the forward reference frame or only the backward reference frame.
For example, a commonly used inter-frame prediction technique determines candidate motion vectors in the order "spatial (at most 4 candidates) -> temporal (at most 1 candidate) -> HMVP -> pairwise -> zeroMV" and constructs a candidate motion information list of length 6. In the disclosed embodiment, when selecting the initial motion vectors, the first two candidate motion vectors are no longer taken by default. Instead, the candidate motion information list is traversed first, and the bi-directionally predicted candidate motion vectors among the spatial -> temporal -> HMVP -> pairwise candidates are determined as initial motion vectors first. If the number of initial motion vectors is still insufficient after this traversal, uni-directionally predicted candidate motion vectors are traversed from front to back in the candidate motion information list and used as initial motion vectors. Through this process, bi-directionally predicted candidate motion vectors are selected as much as possible, yielding greater gains and further improving encoding efficiency.
In some embodiments, correspondingly, when the bidirectional prediction weight of the bidirectional initial motion vector is not equal to the preset weight, the terminal performs a motion search on the bidirectional initial motion vector using the decoder-side motion vector refinement technique and determines an offset bidirectional initial motion vector, where the bidirectional prediction weight is a weight stored in the candidate motion information list of the coding block, the residual of the offset bidirectional initial motion vector is smaller than that of the bidirectional initial motion vector, and the residual indicates the difference between the reference block and the prediction block. Performing a motion search on the bidirectional initial motion vector with the decoder-side motion vector refinement technique yields a more accurate bidirectional initial motion vector.
For example, decoder-side motion vector refinement (DMVR) is typically applied only to conventional merge candidates in implementations of Versatile Video Coding (VVC). In the embodiment of the present disclosure, however, after the two initial motion vectors are determined, if a bidirectional initial motion vector exists and its BCW (bidirectional prediction weight) index is not equal to BCW_DEFAULT (the preset weight), the bidirectional initial motion vector is adjusted using the decoder-side motion vector refinement technique. Both the corresponding forward and backward candidate motion vectors are then offset, and the adjusted bidirectional initial motion vector is reused for bidirectional prediction.
In step S302, a first number of search step sizes and a second number of search directions are determined, with the position pointed to by the bidirectional initial motion vector in the target reference image as the search start point, the target reference image comprising the previous frame and the subsequent frame of the target image in the video.
In the embodiment of the present disclosure, the search step size refers to the pixel distance by which the bidirectional initial motion vector is offset, and the search direction refers to the direction in which it is offset. Because the video frames in the embodiments of the present disclosure are compression encoded by inter-frame prediction, bidirectional prediction encodes a video frame with reference to the forward reference frame and the backward reference frame; at the decoding end, the current frame is therefore restored and reconstructed from the forward reference frame, the backward reference frame, and the image difference data between them and the current frame. The image difference data is composed of a plurality of motion vectors.
In step S303, a first number of search magnitudes is determined based on the first number of search steps, the first number of search magnitudes being a product value of the first number of search steps and a first number of step coefficients.
In the embodiment of the disclosure, the terminal multiplies the search step length by the step length coefficient to obtain a search amplitude, and the search amplitude is used for indicating the offset of the bidirectional initial motion vector.
In step S304, a motion search is performed in the second number of search directions based on a plurality of search combinations constructed from the first number of search magnitudes and the second number of search directions, and a plurality of offset motion vector combinations are determined, each including a forward offset vector for offsetting the bidirectional initial motion vector forward and a backward offset vector for offsetting it backward.
In the embodiment of the disclosure, the terminal performs motion search in the second number of search directions based on a plurality of search combinations constructed by the first number of search magnitudes and the second number of search directions, and determines a plurality of offset motion vector combinations, so that a plurality of candidate motion vector combinations can be determined. Wherein the first number and the second number are different in value in different motion scenarios.
In step S305, a plurality of offset motion vector combinations are added to the bi-directional initial motion vector, respectively, to obtain a plurality of candidate motion vector combinations, each including a forward candidate motion vector and a backward candidate motion vector.
In the embodiment of the disclosure, the terminal respectively adds the plurality of offset motion vector combinations and the bidirectional initial motion vector to obtain a plurality of candidate motion vector combinations, so that the target motion vector combination can be conveniently determined from the candidate motion vector combinations.
It should be noted that the current bidirectional MMVD adjusts MV0offset and MV1offset on the assumption that motion in the video is temporally continuous in a fixed direction. Referring to fig. 4, fig. 4 is a schematic diagram illustrating the motion of a coding block according to an exemplary embodiment. Here, bidirectional MMVD refers to bidirectional prediction using merge mode with motion vector difference; MV0offset refers to the forward offset motion vector, and MV1offset refers to the backward offset motion vector.
Fig. 4 (a) is a schematic diagram for the case poc0 < poc and poc1 > poc, where poc is the picture order count of the current frame, poc0 that of the forward reference frame, and poc1 that of the backward reference frame. Since the current frame lies between the forward and backward reference frames in time, the motion of the coding block is strongly symmetric across the two reference frames, and the bidirectional MMVD adjusts MV1offset (the backward offset motion vector) according to the relative poc relationship. Fig. 4 (b) and fig. 4 (c) are schematic diagrams for cases in which the motion of the current coding block, the forward reference block, and the backward reference block is discontinuous. In fig. 4 (b), the best matching block of the coding block in the forward reference frame is shifted upward relative to the current coding block, while the best matching block in the backward reference frame is essentially not shifted; in fig. 4 (c), the best matching blocks in both the forward and backward reference frames are shifted upward relative to the current coding block.
That is, when the motion of the current coding block, the forward reference block, and the backward reference block is discontinuous, the current bidirectional MMVD strategy of adjusting MV1offset cannot find a suitable optimal matching block for the coding block. Moreover, the bidirectional MMVD performs motion search only in the horizontal and vertical directions, whereas in actual video the target object may also move irregularly in other directions. Therefore, the determination of MV0offset and MV1offset in the bidirectional MMVD needs to be adjusted as follows.
By making different adjustments to the search step, search direction, and search combinations, two implementations can be obtained. The first implementation mode is suitable for a scene that the coding block represents the object to do discontinuous motion, and the second implementation mode is suitable for a scene that the coding block represents the object to do non-horizontal and vertical motion.
In some embodiments, in a scene in which the object represented by the coding block moves discontinuously, the first implementation is used, and the search step size, search direction, and search combinations are adjusted as follows. Correspondingly, the first number of search step sizes includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels; the second number of search directions includes the up, down, left, and right directions; and the plurality of search combinations indicate that the bidirectional initial motion vector performs same-direction, reverse, and unidirectional searches in the forward and backward directions, where a same-direction search is a search mode in which the forward offset vector and the backward offset vector are the same, a reverse search is a search mode in which they are opposite, and a unidirectional search is a search mode in which only a forward offset or only a backward offset is performed. Performing the motion search based on these search combinations allows the resulting candidate motion vector combinations to better describe the discontinuous motion of the object represented by the coding block, making the search more comprehensive and helping improve encoding efficiency.
In some embodiments, in a scene in which the object represented by the coding block performs non-horizontal, non-vertical motion, the second implementation is used, and the search step size, search direction, and search combinations are adjusted as follows. Correspondingly, the first number of search step sizes includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels; the second number of search directions includes n directions, the angle between adjacent directions being 360/n degrees, where n is a positive integer; and the plurality of search combinations indicate that the bidirectional initial motion vector performs a reverse search in the forward and backward directions, the reverse search being a search mode in which the forward offset vector and the backward offset vector are opposite. Performing the motion search based on these search combinations allows the resulting candidate motion vector combinations to better describe the non-horizontal, non-vertical motion of the object represented by the coding block, making the search more comprehensive and helping improve encoding efficiency.
It should be noted that the two implementations are mutually exclusive, so either the first or the second implementation must be agreed in advance in both the encoder and the decoder. In general, the first implementation is adopted if the video contains many discontinuous motion scenes, and the second implementation is adopted if motion in the video is mostly continuous but a large amount of it occurs in directions other than horizontal and vertical.
In a scene in which the object represented by the coding block moves discontinuously, whether to invert MV1offset is no longer determined by the poc relationship among the current frame, the forward reference frame, and the backward reference frame. Instead, the first implementation adjusts the search step size, search direction, and search combinations, and MV0offset and MV1offset are re-determined separately in the horizontal x-axis and vertical y-axis directions. For example, the specific implementation is as follows, steps (1)-(4):
(1) A search step size (offset) is determined. Wherein, the 8 offset steps are {1/4,1/2,1,2,4,8,16,32};
(2) From the search step size and the step coefficient (m), a search magnitude (delta) is determined, where delta = offset * m.
To achieve adaptability, the search magnitude is adjusted by adjusting m, where m may be any nonzero real number. The value of m can be set according to the characteristics of the video content: for a sequence with simpler content, m can generally be set to a real number greater than 1; for a sequence with complex content, m can generally be set so that 0 < m ≤ 1, enabling a finer motion search around the initial motion vector and avoiding information loss. In addition, the value of m is preset at both the encoding end and the decoding end and does not need to be written into the bitstream.
(3) The motion vector difference (MVD, Motion Vector Difference) is determined. For an offset in the x-axis direction, MVD = {delta, 0}; for an offset in the y-axis direction, MVD = {0, delta}.
(4) The 8 offset motion vector combinations are determined; see Table 1 below.

Table 1: Offset motion vector combinations in discontinuous motion scenes

| Candidate   | MV0offset | MV1offset |
|-------------|-----------|-----------|
| Candidate 0 | MVD       | MVD       |
| Candidate 1 | MVD       | MVD*(-1)  |
| Candidate 2 | MVD*(-1)  | MVD       |
| Candidate 3 | MVD*(-1)  | MVD*(-1)  |
| Candidate 4 | MVD       | zeroMV    |
| Candidate 5 | zeroMV    | MVD       |
| Candidate 6 | MVD*(-1)  | zeroMV    |
| Candidate 7 | zeroMV    | MVD*(-1)  |
The 8 offset motion vector combinations cover MV0offset and MV1offset being offset in the same direction, offset in opposite directions, only MV0offset being offset, and only MV1offset being offset; that is, the same-direction search, the reverse search, and the unidirectional search are all represented. The zero vector (zeroMV) introduced here is zeroMV = {0, 0}. These combinations effectively cover more motion cases of the coding block; for example, candidate 4 and candidate 0 better represent the cases (b) and (c) shown in fig. 4, respectively.
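The eight combinations of Table 1 can be enumerated mechanically from a single MVD; the function below is an illustrative sketch (names assumed), with motion vectors represented as (x, y) tuples:

```python
# Sketch of the eight offset combinations in Table 1 (first implementation):
# same-direction, reverse, and one-sided offsets of a motion vector
# difference MVD, with zeroMV = {0, 0}.

def table1_combinations(mvd):
    zero = (0, 0)
    neg = (-mvd[0], -mvd[1])          # MVD*(-1)
    return [
        (mvd, mvd),    # candidate 0: same-direction search
        (mvd, neg),    # candidate 1: reverse search
        (neg, mvd),    # candidate 2: reverse search
        (neg, neg),    # candidate 3: same-direction search
        (mvd, zero),   # candidate 4: forward-only offset
        (zero, mvd),   # candidate 5: backward-only offset
        (neg, zero),   # candidate 6: forward-only offset
        (zero, neg),   # candidate 7: backward-only offset
    ]

combos = table1_combinations((2, 0))  # e.g. delta = 2 along the x axis
# combos[4] == ((2, 0), (0, 0)): only the forward vector is offset
```

Each tuple is a (MV0offset, MV1offset) pair; repeating this for every search magnitude and for the x and y axes yields the full candidate set of the first implementation.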
In a scene in which the object represented by the coding block performs non-horizontal, non-vertical motion, whether to invert MV1offset is likewise no longer determined by the poc relationship among the current frame, the forward reference frame, and the backward reference frame; the second implementation adjusts the search step size, search direction, and search combinations. The search directions are expanded to n directions, with an angle of 360/n degrees between adjacent directions, and MV0offset and MV1offset are re-determined. It should be noted that the angles between the n directions need not be equal, which is not limited by the embodiments of the present disclosure.
For example, the implementation steps are as follows steps (1) - (4):
(1) A search step size (offset) is determined. Wherein, the 8 offset steps are {1/4,1/2,1,2,4,8,16,32};
(2) From the search step size and the step coefficient (m), a search magnitude (delta) is determined, where delta = offset * m. The meaning and value of m are as described above.
(3) Motion Vector Differences (MVDs) are determined.
Taking n = 16 as an example, as shown in fig. 5, fig. 5 is a schematic diagram illustrating the expanded directions according to an exemplary embodiment. For a direction at angle α, the horizontal component of the MVD is delta*cos(α) and the vertical component is -delta*sin(α). It should be noted that the horizontal and vertical components of the MVD may be extended, for example to (a*cos(α), -b*sin(α)), which is not limited by the embodiments of the present disclosure.
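The angular MVD computation in step (3) can be sketched as follows for n equally spaced directions; the function name and parameterization are illustrative assumptions:

```python
# Sketch of the angular MVD for n equally spaced directions: for a direction
# at angle alpha, the horizontal component is delta*cos(alpha) and the
# vertical component is -delta*sin(alpha), as in step (3) above.
import math

def angular_mvd(k, n, delta):
    """MVD for the k-th of n directions spaced 360/n degrees apart."""
    alpha = 2 * math.pi * k / n
    return (delta * math.cos(alpha), -delta * math.sin(alpha))

# With n = 16 the directions are 22.5 degrees apart.
mvd0 = angular_mvd(0, 16, 4.0)   # direction 0: pure horizontal offset
mvd4 = angular_mvd(4, 16, 4.0)   # 90 degrees: pure (negative) vertical offset
```

In the second implementation, MV0offset is one of these angular MVDs and MV1offset is obtained by negating it, as Table 2 below shows.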
(4) The 16 offset motion vector combinations are determined; see Table 2 below.

Table 2: Offset motion vector combinations in non-horizontal, non-vertical motion scenes

| Direction    | MV0offset                         | MV1offset      |
|--------------|-----------------------------------|----------------|
| Direction 0  | {delta, 0}                        | MV0offset*(-1) |
| Direction 1  | {-delta, 0}                       | MV0offset*(-1) |
| Direction 2  | {0, delta}                        | MV0offset*(-1) |
| Direction 3  | {0, -delta}                       | MV0offset*(-1) |
| Direction 4  | {delta*cos(2α), -delta*sin(2α)}   | MV0offset*(-1) |
| Direction 5  | {delta*cos(10α), -delta*sin(10α)} | MV0offset*(-1) |
| Direction 6  | {delta*cos(14α), -delta*sin(14α)} | MV0offset*(-1) |
| Direction 7  | {delta*cos(6α), -delta*sin(6α)}   | MV0offset*(-1) |
| Direction 8  | {delta*cos(3α), -delta*sin(3α)}   | MV0offset*(-1) |
| Direction 9  | {delta*cos(α), -delta*sin(α)}     | MV0offset*(-1) |
| Direction 10 | {delta*cos(15α), -delta*sin(15α)} | MV0offset*(-1) |
| Direction 11 | {delta*cos(13α), -delta*sin(13α)} | MV0offset*(-1) |
| Direction 12 | {delta*cos(11α), -delta*sin(11α)} | MV0offset*(-1) |
| Direction 13 | {delta*cos(9α), -delta*sin(9α)}   | MV0offset*(-1) |
| Direction 14 | {delta*cos(7α), -delta*sin(7α)}   | MV0offset*(-1) |
| Direction 15 | {delta*cos(5α), -delta*sin(5α)}   | MV0offset*(-1) |
In the second implementation, deflection by a given angle is achieved by modifying the horizontal and vertical component values of the MV offset; in the first implementation, the component values are not modified, and instead the MV offset as a whole takes a positive or negative value, achieving horizontal and vertical deflection. The second implementation also retains the bidirectional MMVD's way of adjusting the forward and backward candidate motion vectors, that is, MV1offset is obtained directly by inverting MV0offset.
In addition, in the first implementation, for a given search magnitude, a motion search is performed in the x and y directions to obtain a plurality of candidate motion vector combinations, and template matching or bilateral matching is then used to determine the two best candidate motion vector combinations in the x and y directions respectively; syntax elements must subsequently be designed to signal the x or y direction and to select the 0th or 1st candidate. In the second implementation, for a given search magnitude, a motion search is performed in n directions at fixed angle values to obtain a plurality of candidate motion vector combinations, and template matching or bilateral matching is then used to determine the four best candidate motion vector combinations; a syntax element is subsequently required to signal which of the four candidate motion vector combinations is the target motion vector combination.
In step S306, a plurality of absolute error sums corresponding to the plurality of candidate motion vector combinations are determined, the absolute error sums indicating a difference between the encoded block and a predicted block of the encoded block.
In the embodiment of the disclosure, the terminal may determine a plurality of absolute error sums corresponding to a plurality of candidate motion vector combinations by using a mode based on a template matching algorithm or a mode based on a bilateral matching algorithm.
In some embodiments, a template matching algorithm may be used to determine the absolute error sums of the plurality of candidate motion vector combinations. Correspondingly, for any candidate motion vector combination, the terminal determines a reference block of the coding block based on the candidate motion vector combination and the coding block, the reference block being the starting point of the candidate motion vector combination and the coding block being its ending point. The terminal performs motion compensation on edge pixels of the coding block based on edge pixels of the reference block to obtain edge prediction pixels, the edge pixels comprising left-side pixels and upper-side pixels. The terminal then determines the sum of absolute errors between the edge pixels of the coding block and the edge prediction pixels. Using a template matching algorithm to pre-screen the plurality of candidate motion vectors makes it possible to obtain the more beneficial candidate motion vector combinations efficiently under strict complexity constraints, from which the target motion vector combination can then be conveniently obtained.
In some embodiments, a bilateral matching algorithm may be used to determine the absolute error sums of the plurality of candidate motion vector combinations. Correspondingly, for any candidate motion vector combination, the terminal performs motion compensation on the coding block to obtain a prediction block of the coding block, the prediction block comprising a forward prediction block and a backward prediction block. The terminal then determines the absolute error sum between the prediction block and the coding block. Using the bilateral matching algorithm to pre-screen the plurality of candidate motion vectors makes it possible to obtain the more beneficial candidate motion vector combinations more accurately when complexity constraints are relaxed, from which the target motion vector combination can then be conveniently obtained.
It should be noted that the template-matching-based approach has lower complexity than the bilateral-matching-based approach, but its matching accuracy is lower, so its benefit is also lower. Therefore, the template-matching-based approach can be adopted for scenes with strict complexity requirements, and the bilateral-matching-based approach for scenes where complexity is less of a concern.
When the template-matching-based approach is used to pre-screen a plurality of candidate motion vector combinations, the specific implementation steps (1)-(3) are as follows:
(1) Referring to fig. 6, fig. 6 is a schematic diagram illustrating template matching according to an exemplary embodiment. A reference block of the coding block is determined from the coding block and the candidate motion vector combination, and the left-side and upper-side pixels of the coding block and of the reference block are then determined respectively. The left-side and upper-side pixels of the coding block and of the reference block correspond one to one.
(2) Motion compensation is performed on the left-side and upper-side pixels of the coding block using the left-side and upper-side pixels of the reference block, obtaining predicted pixels for the left-side and upper-side pixels.
When the candidate motion vector combination points to a sub-pixel position, either of the following two processing methods can be used to obtain the predicted pixels of the left-side and upper-side pixels. The first method interpolates the predicted pixels using the same interpolation method as the encoder; the second method rounds to the nearest whole-pixel position, in which case only the pixel at that whole-pixel position needs to be fetched and no interpolation is required.
The two methods each have advantages and disadvantages: the first obtains more accurate predicted pixels, but the interpolation raises complexity; the second performs no interpolation and so has low complexity, but the rounding may affect the accuracy of the predicted pixels to some extent. Therefore, the second method is adopted if the application scene has strict complexity requirements, and the first if complexity is not a concern.
(3) The sum of absolute differences (SAD) over all corresponding positions between the predicted pixels and the left-side and upper-side pixels of the coding block is calculated, i.e. the residual obtained by motion-compensating the coding block under this candidate motion vector combination.
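The template SAD described above can be sketched as follows, using the whole-pixel (rounded) variant of the second method so that no interpolation is needed. The frame layout, coordinates, and names are illustrative assumptions, not the codec's normative definition.

```python
def template_sad(cur, ref, bx, by, rx, ry, w, h):
    """Sketch: SAD between the left-column and top-row templates of the w x h
    coding block at (bx, by) in frame `cur` and of the reference block at the
    integer-rounded position (rx, ry) in frame `ref`. Frames are 2-D lists of
    pixel values; bx, by, rx, ry must be >= 1 so the templates exist."""
    sad = 0
    for i in range(h):                                   # left-column template
        sad += abs(cur[by + i][bx - 1] - ref[ry + i][rx - 1])
    for j in range(w):                                   # top-row template
        sad += abs(cur[by - 1][bx + j] - ref[ry - 1][rx + j])
    return sad

# toy 4x4 frames: current frame all 10s, reference frame all 12s
cur = [[10] * 4 for _ in range(4)]
ref = [[12] * 4 for _ in range(4)]
sad = template_sad(cur, ref, bx=1, by=1, rx=1, ry=1, w=2, h=2)  # 2 per pixel over 4 template pixels
```

Each of the four template pixels differs by 2 in the toy frames, so `sad` evaluates to 8; a lower SAD indicates a better-matching candidate motion vector combination.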
In step S307, a third number of candidate motion vector combinations is determined based on the plurality of absolute error sums in order from small to large.
In the embodiment of the disclosure, the terminal sorts the plurality of absolute error sums in ascending order and then determines candidate motion vector combinations from them. The third number differs between different motion scenarios.
It should be noted that, in the above example, 2×2×8×8=256 candidate motion vector combinations can be determined in total in the discontinuous motion scene, based on two initial motion vectors, two search directions, eight search steps, and eight offset motion vector combinations; in the non-horizontal-vertical motion scene, 2×16×8=256 candidate motion vector combinations can be determined in total, based on two initial motion vectors, sixteen search directions, and eight search steps. The encoding end determines an optimal value from the plurality of search steps and the plurality of search directions; since the search step and search direction are thereby fixed for the decoding end, the decoding end only needs to parse the search step and search direction from the code stream to obtain the optimal candidate motion vector.
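The stated totals can be verified directly; note that the discontinuous-scene product needs all four factors (2×2×8×8) to reach 256. The factor grouping below is our reading of the text, not code from the codec.

```python
# Plain arithmetic check of the candidate counts stated above.
discontinuous = 2 * 2 * 8 * 8    # 2 initial MVs, 2 axes, 8 steps, 8 offset combinations
non_horiz_vert = 2 * 16 * 8      # 2 initial MVs, 16 directions, 8 steps
```

Both products equal 256, matching the counts given for the two motion scenarios.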
When the template matching method is used to pre-screen the plurality of candidate motion vector combinations, different numbers of combinations are retained after pre-screening in the following two scenarios. In the discontinuous motion scene, the absolute error sum of each of the 256 candidate motion vector combinations generated in the above example is determined in turn using the template matching algorithm, and the combinations are sorted in ascending order of absolute error sum; it should be noted that a stable sorting method is required. Finally, the two candidates with the smallest absolute error sums are selected as the candidate motion vector combinations for the second-stage screening. In the non-horizontal-vertical motion scene, the procedure is the same except that the four candidates with the smallest absolute error sums are selected as the candidate motion vector combinations for the second-stage screening.
When bilateral matching is used to pre-screen the plurality of candidate motion vector combinations, different numbers of combinations likewise need to be retained in the two scenarios. In the discontinuous motion scene, if the BCW of the prediction block is not equal to bcw_default, the pixels of the forward prediction block and the backward prediction block are first multiplied by their corresponding weights, after which the absolute error sums are calculated; a stable sorting algorithm is again used, and the two candidates with the smallest absolute error sums are selected as the candidate motion vector combinations for the second-stage screening. In the non-horizontal-vertical motion scene, the same weighting and sorting are applied, and the four candidates with the smallest absolute error sums are selected as the candidate motion vector combinations for the second-stage screening.
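A minimal sketch of the BCW-weighted cost used in bilateral pre-screening: the forward and backward predictions are combined with their weights before the SAD against the coding block is taken. The weight convention (w0 + w1 = 2**shift, default 4/4) and all names are assumptions made for illustration, not the codec's normative definition.

```python
def bilateral_sad(cur_block, fwd_pred, bwd_pred, w0=4, w1=4, shift=3):
    """Sketch: BCW-style weighted bi-prediction followed by SAD against the
    coding block. When BCW != bcw_default, w0 and w1 differ; blocks are flat
    lists of pixel values."""
    sad = 0
    for c, f, b in zip(cur_block, fwd_pred, bwd_pred):
        pred = (w0 * f + w1 * b + (1 << (shift - 1))) >> shift  # rounded weighted average
        sad += abs(c - pred)
    return sad
```

With equal weights the prediction reduces to a rounded average of the forward and backward blocks, which is the bcw_default case.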
In step S308, among the third number of candidate motion vector combinations, a target motion vector combination is determined based on the rate-distortion optimization.
In an embodiment of the present disclosure, the terminal determines the target motion vector combination from the above candidate motion vector combinations based on rate-distortion optimization. Rate-distortion optimization is a technique for improving the quality of compressed video: it trades off the distortion against the amount of data required for video coding, i.e. it minimizes the loss of video quality for a given rate.
For example, for the two or four candidate motion vector combinations produced by the pre-screening, the optimal candidate motion vector combination is selected by rate-distortion optimization and identified by a. In the discontinuous motion scene, a=0 indicates that the 0th candidate motion vector combination is optimal and a=1 that the 1st is optimal. Similarly, in the non-horizontal-vertical scene, a=0/1/2/3 indicates that the 0th/1st/2nd/3rd candidate motion vector combination is optimal. It should be noted that the embodiment of the present disclosure does not limit the identification method adopted, as long as different candidate motion vector combinations can be distinguished.
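The second-stage selection can be sketched as a standard rate-distortion minimization, J = D + lambda * R, over the pre-screened candidates. The cost model, lambda value, and numbers below are illustrative, not taken from the disclosure.

```python
def rd_select(candidates, lmbda):
    """Sketch: each candidate carries a distortion D and a rate R (bits);
    the index of the candidate minimizing J = D + lmbda * R is returned.
    That index is the identifier `a` written to the bitstream."""
    costs = [d + lmbda * r for d, r in candidates]
    return min(range(len(candidates)), key=costs.__getitem__)

# e.g. four pre-screened candidates in the non-horizontal-vertical scene,
# all costing 2 bits to signal, so the smallest distortion wins
a = rd_select([(100, 2), (90, 2), (95, 2), (120, 2)], lmbda=10.0)
```

Here `a` comes out as 1, i.e. the 1st candidate motion vector combination is selected.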
It should be noted that, in the current bidirectional MMVD method, three syntax elements, namely an initial motion vector index (mmvd_cand_flag), a search step index (mmvd_distance_idx), and a search direction index (mmvd_direction_idx), need to be written into the code stream for each coding block.
The initial motion vector index requires only 1 bit to represent its information. Details are shown in Table 3 below:
Table 3: Initial motion vector index syntax element meaning
Value  Meaning
0      Select the 0th candidate motion information
1      Select the 1st candidate motion information
The search step index is encoded as a truncated unary code, requiring 1-7 bits depending on the selected step. Specific meanings are shown in Table 4 below:
Table 4: Search step index syntax element meaning
Value    Meaning
0        Select the 0th step
10       Select the 1st step
110      Select the 2nd step
1110     Select the 3rd step
11110    Select the 4th step
111110   Select the 5th step
1111110  Select the 6th step
1111111  Select the 7th step
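The binarization in Table 4 is a standard truncated unary code: idx ones followed by a terminating zero, except that the last index drops the terminator. A minimal illustration (not the codec's actual entropy coder) follows.

```python
def encode_step_tu(idx, max_idx=7):
    """Truncated unary code for the search step index, matching Table 4."""
    return "1" * idx + ("" if idx == max_idx else "0")

def decode_step_tu(bits, max_idx=7):
    """Count leading ones up to max_idx, consuming the terminating zero when
    present. Returns (index, bits_consumed)."""
    idx = 0
    while idx < max_idx and bits[idx] == "1":
        idx += 1
    return idx, idx + (0 if idx == max_idx else 1)
```

Step 0 costs a single bit while steps 6 and 7 both cost seven bits, which is why the index requires 1-7 bits depending on the selected step.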
The search direction index requires 2 bits to represent its information. Details are shown in Table 5 below:
Table 5: Search direction index syntax element meaning
Value  Meaning
00     Select the positive x-axis direction
01     Select the negative x-axis direction
10     Select the positive y-axis direction
11     Select the negative y-axis direction
It should be noted that, in the discontinuous motion scene, only the x-axis or y-axis needs to be indicated, not the sign. Thus, in the disclosed embodiment, the specific meaning of the search direction index syntax element is given in Table 6: bit 1 of the binary value indicates the x-axis or y-axis, and bit 0 indicates the index of the selected candidate. The initial motion vector index and the search step index still follow Tables 3 and 4.
Table 6: Search direction index syntax element meaning in the discontinuous motion scene
Value  Meaning
00     Select the x-axis and the 0th candidate
01     Select the x-axis and the 1st candidate
10     Select the y-axis and the 0th candidate
11     Select the y-axis and the 1st candidate
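The Table 6 layout, with the axis in bit 1 and the candidate index in bit 0, can be sketched as follows (function names are illustrative):

```python
def pack_direction_idx(axis_is_y: bool, candidate: int) -> int:
    """Pack axis (0 = x, 1 = y) into bit 1 and the 0th/1st candidate choice
    into bit 0, matching Table 6."""
    return (int(axis_is_y) << 1) | (candidate & 1)

def unpack_direction_idx(value: int):
    """Inverse of pack_direction_idx: returns (axis_is_y, candidate)."""
    return bool(value >> 1), value & 1
```

For example, value 10 (binary) unpacks to the y-axis and the 0th candidate, as in the table.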
In the non-horizontal-vertical motion scene, the direction does not need to be expressed at all; only which of the first 4 candidates is selected needs to be expressed. Thus, in the embodiments of the present disclosure, the specific meaning of the search direction index syntax element is given in Table 7. The initial motion vector index and the search step index still follow Tables 3 and 4.
Table 7: Search direction index syntax element meaning in the non-horizontal-vertical motion scene
Value  Meaning
00     Select the 0th candidate
01     Select the 1st candidate
10     Select the 2nd candidate
11     Select the 3rd candidate
It should be noted that, compared with the current bidirectional MMVD method, the method in the embodiment of the present disclosure can effectively improve coding performance. For example, the optimal gains in the discontinuous motion scene are BD-SSIM (-0.70%), BD-PSNR (-0.82%), and BD-VMAF (-1.10%); the optimal gains in the non-horizontal-vertical motion scene are BD-SSIM (-0.40%) and BD-PSNR (-0.35%). That is, the video rate is reduced on each metric. Compared with the current bidirectional MMVD method, the embodiment of the present disclosure adds more candidate motion vectors capable of effectively expressing the motion information of the coding block, and screens out the most effective candidate motion vectors using a template-matching-based or bilateral-matching-based approach, improving accuracy and coding efficiency without additional performance loss.
The embodiment of the disclosure provides a video processing method, which can process the irregular motion of an object in an image frame in the bidirectional prediction process on the premise of not increasing code rate consumption by adjusting a plurality of search amplitudes and a plurality of search directions used when a bidirectional initial motion vector performs motion search. Compared with the traditional mode, the scheme improves the efficiency of video compression coding.
Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.
Fig. 7 is a block diagram of a video processing apparatus according to an exemplary embodiment. As shown in fig. 7, the apparatus includes: a first determination unit 701, a second determination unit 702, a third determination unit 703, and an encoding unit 704.
A first determining unit 701 configured to determine, for any one of encoded blocks of a target image in a target video, a bi-directional initial motion vector of the encoded block, the target image being any one video frame in the target video, the encoded block being used for compressing and storing partial information of the video frame, the bi-directional initial motion vector being a candidate motion vector for bi-directionally predicting the encoded block;
a second determining unit 702 configured to perform, with the position pointed to by the bi-directional initial motion vector in the target reference image as the search start point, a motion search based on a plurality of search combinations to determine a plurality of candidate motion vector combinations, the target reference image comprising a previous frame and a subsequent frame of the target image in the video, the plurality of search combinations being composed of a plurality of search magnitudes and a plurality of search directions, a search magnitude indicating an offset of the bi-directional initial motion vector, and a candidate motion vector combination comprising a forward candidate motion vector and a backward candidate motion vector;
A third determining unit 703 configured to determine a target motion vector combination from a plurality of candidate motion vector combinations, the target motion vector combination being a candidate motion vector combination in which a difference between a target reference block and a coding block reaches a difference threshold, the target reference block being a coding unit corresponding to the coding block in the target reference image;
and an encoding unit 704 configured to encode the index of the bi-directional initial motion vector and the index of the search combination of the target motion vector combination to obtain encoded data, wherein the encoded data is used for determining a prediction block of the encoded block in the target image.
In some embodiments, the first determining unit 701 is configured to obtain, for any one of the encoded blocks of the target image in the target video, a candidate motion information list of the encoded blocks, where the candidate motion information list includes a candidate motion vector, a prediction direction, and a reference image; traversing bi-directionally predicted candidate motion vectors in the candidate motion information list based on the forward-backward order; determining a bi-directional predicted candidate motion vector as a bi-directional initial motion vector; and stopping traversing the bi-directional predicted candidate motion vector when the bi-directional initial motion vector reaches the quantity threshold.
In some embodiments, the first determining unit 701 is further configured to traverse the candidate motion vectors of the unidirectional prediction based on the order from front to back in the candidate motion information list, in the case that the bi-directional initial motion vector does not reach the number threshold after the traversing of the candidate motion vectors of the bi-prediction is completed; determining a candidate motion vector of unidirectional prediction as a unidirectional initial motion vector; and stopping traversing the candidate motion vector of the unidirectional prediction when the sum of the bidirectional initial motion vector and the unidirectional initial motion vector reaches a quantity threshold.
In some embodiments, the second determining unit 702 is configured to determine the first number of search steps and the second number of search directions with a position pointed by the bi-directional initial motion vector in the target reference image as a search start point; determining a first number of search magnitudes based on the first number of search steps, the first number of search magnitudes being a product value of the first number of search steps and a first number of step coefficients; performing motion search in a second number of search directions based on a plurality of search combinations constructed of the first number of search magnitudes and the second number of search directions, determining a plurality of offset motion vector combinations including a forward offset vector for forward offset of the bi-directional initial motion vector and a backward offset vector for backward offset; and adding the plurality of offset motion vector combinations with the bidirectional initial motion vector respectively to obtain a plurality of candidate motion vector combinations.
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixel, 4 pixel, 8 pixel, 16 pixel; the second number of search directions includes an up direction, a down direction, a left direction, and a right direction; the plurality of search combinations are used for indicating that the bidirectional initial motion vector performs the search combinations of the same-direction search, the reverse search and the unidirectional search in the forward direction and the backward direction respectively, wherein the same-direction search refers to the search mode that the forward direction offset vector and the backward direction offset vector are the same, the reverse search refers to the search mode that the forward direction offset vector and the backward direction offset vector are opposite, and the unidirectional search refers to the search mode that only forward direction offset or only backward direction offset is performed.
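As an illustrative reading of the same-direction, reverse, and unidirectional search modes described above, one forward offset vector expands into the following bidirectional offset pairs. The names are invented here, and how these pairs map onto the eight offset motion vector combinations counted elsewhere in the disclosure is not spelled out.

```python
def offset_combinations(mv0_offset):
    """Sketch: from one forward offset vector, form the bidirectional offset
    pairs: same-direction (forward and backward offsets equal), reverse
    (backward offset inverted), and the two unidirectional searches."""
    x, y = mv0_offset
    return {
        "same":     ((x, y), (x, y)),     # same-direction search
        "reverse":  ((x, y), (-x, -y)),   # reverse search
        "fwd_only": ((x, y), (0, 0)),     # forward-only unidirectional search
        "bwd_only": ((0, 0), (x, y)),     # backward-only unidirectional search
    }

combos = offset_combinations((2, 0))
```

Each pair would then be added to the bi-directional initial motion vector to produce a candidate motion vector combination.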
In some embodiments, the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixel, 4 pixel, 8 pixel, 16 pixel; the second number of search directions includes n directions, an angle between the n directions being 360/n degrees, where n is a positive integer; the plurality of search combinations are used to indicate a search combination in which the bi-directional initial motion vector performs a reverse search in the forward and backward directions, the reverse search being a search in which the forward offset vector is opposite to the backward offset vector.
In some embodiments, fig. 8 is a block diagram of another video processing apparatus shown according to an exemplary embodiment, and as shown in fig. 8, a third determining unit 703 includes:
a first determining subunit 801 configured to determine a plurality of absolute error sums corresponding to a plurality of candidate motion vector combinations, the absolute error sums being used to indicate a difference between the encoded block and a predicted block of the encoded block;
a second determining subunit 802 configured to determine a third number of candidate motion vector combinations based on the plurality of absolute error sums in order from small to large;
the third determining subunit 803 is configured to determine, among the third number of candidate motion vector combinations, a target motion vector combination based on the rate-distortion optimization.
In some embodiments, the first determining subunit 801 is configured to determine, for any candidate motion vector combination, a reference block of the encoded block based on the candidate motion vector combination and the encoded block, the reference block being a start point of the candidate motion vector combination, the encoded block being an end point of the candidate motion vector combination; performing motion compensation on edge pixels of the coding block based on edge pixels of the reference block to obtain edge prediction pixels, wherein the edge pixels comprise left side pixels and upper side pixels; the absolute error sum of the edge pixels and the edge prediction pixels of the encoded block is determined.
In some embodiments, the first determining subunit 801 is configured to perform motion compensation on the encoded block for any candidate motion vector combination to obtain a predicted block of the encoded block, where the predicted block includes a forward predicted block and a backward predicted block; the absolute error sum of the prediction block and the coding block is determined.
In some embodiments, the apparatus further comprises:
the searching unit 804 is configured to, in a case where the bi-directional prediction weight of the bi-directional initial motion vector is not equal to a preset weight, perform a motion search on the bi-directional initial motion vector through a decoder-side motion vector refinement technique and determine the offset bi-directional initial motion vector, the bi-directional prediction weight being a weight stored in the candidate motion information list of the coding block, the residual of the offset bi-directional initial motion vector being smaller than that of the bi-directional initial motion vector, and the residual indicating the difference between the reference block and the prediction block.
The embodiment of the disclosure provides a video processing device, which can process the irregular motion of an object in an image frame in the bidirectional prediction process on the premise of not increasing code rate consumption by adjusting a plurality of search amplitudes and a plurality of search directions used when a bidirectional initial motion vector performs motion search. Compared with the traditional mode, the scheme improves the efficiency of video compression coding.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
It should be noted that, in the video processing apparatus provided in the above embodiment, only the division of the functional units is illustrated, and in practical application, the above functional allocation may be performed by different functional units according to needs, that is, the internal structure of the electronic device is divided into different functional units, so as to perform all or part of the functions described above. In addition, the video processing apparatus and the video processing method embodiment provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the method embodiment and are not described herein again.
With respect to the video processing apparatus in the above-described embodiment, the specific manner in which the respective modules perform operations has been described in detail in the embodiment regarding the method, and will not be described in detail here.
Fig. 9 is a block diagram of an electronic device, according to an example embodiment. Generally, the electronic device 900 includes: a processor 901 and a memory 902.
Processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), or a PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor, the main processor being a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit), and the coprocessor being a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit) for rendering and drawing content required to be displayed by the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one program code for execution by processor 901 to implement the video processing methods provided by the method embodiments in the present disclosure.
In some embodiments, the electronic device 900 may further optionally include: a peripheral interface 903, and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 903 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 904, a display 905, a camera assembly 906, audio circuitry 907, and a power source 908.
The peripheral interface 903 may be used to connect at least one peripheral device associated with an I/O (Input/Output) to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 904 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 904 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 includes: antenna systems, RF transceivers, one or more amplifiers, tuners, oscillators, digital signal processors, codec chipsets, subscriber identity module cards, and so forth. The radio frequency circuit 904 may communicate with other electronic devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: metropolitan area networks, various generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity ) networks. In some embodiments, the radio frequency circuitry 904 may also include NFC (Near Field Communication, short range wireless communication) related circuitry, which is not limited by the present disclosure.
The display 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 905 is a touch display, the display 905 also has the ability to capture touch signals at or above the surface of the display 905. The touch signal may be input as a control signal to the processor 901 for processing. At this time, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display 905 may be one, providing a front panel of the electronic device 900; in other embodiments, the display 905 may be at least two, respectively disposed on different surfaces of the electronic device 900 or in a folded design; in still other embodiments, the display 905 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 900. Even more, the display 905 may be arranged in an irregular pattern other than rectangular, i.e., a shaped screen. The display 905 may be made of LCD (Liquid Crystal Display ), OLED (Organic Light-Emitting Diode) or other materials.
The camera assembly 906 is used to capture images or video. Optionally, the camera assembly 906 includes a front camera and a rear camera. In general, the front camera is disposed on the front panel of the electronic device and the rear camera on its rear surface. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth camera, panoramic shooting and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash; a dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash and can be used for light compensation at different color temperatures.
The audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electric signals, and input them to the processor 901 for processing, or to the radio frequency circuit 904 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones, disposed at different locations of the electronic device 900. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electric signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker; a piezoelectric ceramic speaker can convert electric signals not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 907 may also include a headphone jack.
The power supply 908 is used to power the various components in the electronic device 900. The power supply 908 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 908 includes a rechargeable battery, the battery may support wired or wireless charging, and may also support fast-charge technology.
Those skilled in the art will appreciate that the structure shown in fig. 9 is not limiting of the electronic device 900 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.
In an exemplary embodiment, a computer readable storage medium is also provided, such as the memory 902, comprising instructions executable by the processor 901 of the electronic device 900 to perform the video processing method described above. Alternatively, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program product is also provided, comprising a computer program which, when executed by a processor, implements the video processing method described above.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following the general principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A method of video processing, the method comprising:
for any coding block of a target image in a target video, determining a bidirectional initial motion vector of the coding block, wherein the target image is any video frame in the target video, the coding block is used for compressing and storing partial information of the video frame, and the bidirectional initial motion vector is a candidate motion vector for bidirectional prediction of the coding block;
the method comprises the steps of taking the position pointed by the bidirectional initial motion vector in a target reference image as a searching starting point, carrying out motion searching based on a plurality of searching combinations, and determining a plurality of candidate motion vector combinations, wherein the target reference image comprises a previous frame and a subsequent frame of the target image in the video, the plurality of searching combinations are composed of a plurality of searching amplitudes and a plurality of searching directions, the searching amplitudes are used for indicating the offset of the bidirectional initial motion vector, and the candidate motion vector combinations comprise a forward candidate motion vector and a backward candidate motion vector;
determining a target motion vector combination from the plurality of candidate motion vector combinations, wherein the target motion vector combination is a candidate motion vector combination in which the difference between a target reference block and the coding block reaches a difference threshold, and the target reference block is a coding unit corresponding to the coding block in the target reference image;
and coding the index of the bidirectional initial motion vector and the index of the search combination of the target motion vector combination to obtain coded data, wherein the coded data is used for determining a prediction block of the coding block in the target image.
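The four steps of claim 1 can be sketched as an encoder-side loop that, for each coding block, tries every initial motion vector against every search combination and emits the two indices the claim says are encoded. All names here (`encode_block`, `pick_best`, the dictionary keys) are hypothetical illustrations, not the patented implementation:

```python
# Hypothetical sketch of the encoding flow in claim 1.
# pick_best is a caller-supplied cost function (e.g. SAD or rate-distortion).

def encode_block(block, init_mvs, search_combinations, pick_best):
    """For one coding block: evaluate each (initial MV, search combination)
    pair and return the indices of the winning pair, which is what the
    claim says gets encoded into the bitstream."""
    best = None  # (cost, init_index, combo_index)
    for init_idx, init_mv in enumerate(init_mvs):
        for combo_idx, combo in enumerate(search_combinations):
            cost = pick_best(block, init_mv, combo)
            if best is None or cost < best[0]:
                best = (cost, init_idx, combo_idx)
    _, init_idx, combo_idx = best
    return {"init_mv_index": init_idx, "search_combo_index": combo_idx}
```

Because only two small indices are signaled rather than full motion vector differences, the decoder can repeat the same deterministic search to reconstruct the refined vectors, which is the code-rate saving the abstract describes.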
2. The method according to claim 1, wherein the determining, for any one of the encoded blocks of the target image in the target video, the bi-directional initial motion vector of the encoded block comprises:
for any coding block of a target image in a target video, acquiring a candidate motion information list of the coding block, wherein the candidate motion information list comprises a candidate motion vector, a prediction direction and a reference image;
traversing bi-directionally predicted candidate motion vectors in the candidate motion information list based on the forward-backward order;
determining a bi-directional predicted candidate motion vector as the bi-directional initial motion vector;
and stopping traversing the bidirectionally predicted candidate motion vectors in a case that the number of bidirectional initial motion vectors reaches a quantity threshold.
3. The video processing method according to claim 2, characterized in that the method further comprises:
traversing the unidirectional predicted candidate motion vectors based on the sequence from front to back in the candidate motion information list under the condition that the bidirectional initial motion vector does not reach the quantity threshold value after traversing the bidirectional predicted candidate motion vectors;
determining a candidate motion vector of unidirectional prediction as a unidirectional initial motion vector;
and stopping traversing the unidirectional predicted candidate motion vector if the sum of the bidirectional initial motion vector and the unidirectional initial motion vector reaches the number threshold.
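Claims 2 and 3 together describe a two-pass fill of the initial-motion-vector list: bidirectionally predicted candidates are taken first, in list order, and unidirectionally predicted candidates are appended only if the quantity threshold has not yet been reached. A minimal sketch under that reading (the function name and dictionary layout are assumptions for illustration):

```python
def select_initial_mvs(candidate_list, threshold):
    """candidate_list: ordered entries, each with a 'direction' of 'bi' or
    'uni'.  First pass takes bi-predicted candidates (claim 2); if the
    threshold is not reached, a second pass takes uni-predicted candidates
    until the combined count reaches the threshold (claim 3)."""
    selected = [c for c in candidate_list if c["direction"] == "bi"][:threshold]
    if len(selected) < threshold:
        need = threshold - len(selected)
        selected += [c for c in candidate_list if c["direction"] == "uni"][:need]
    return selected
```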
4. The method according to claim 1, wherein the determining a plurality of candidate motion vector combinations by performing a motion search based on a plurality of search combinations with a position pointed by the bi-directional initial motion vector in the target reference image as a search start point includes:
determining a first number of search steps and a second number of search directions by taking the position pointed by the bidirectional initial motion vector in the target reference image as a search starting point;
determining a first number of search magnitudes based on the first number of search steps, the first number of search magnitudes being a product value of the first number of search steps and a first number of step coefficients;
performing a motion search in the second number of search directions based on the plurality of search combinations constructed by the first number of search magnitudes and the second number of search directions, determining a plurality of offset motion vector combinations including a forward offset vector for forward offset of the bi-directional initial motion vector and a backward offset vector for backward offset;
and adding the plurality of offset motion vector combinations with the bidirectional initial motion vector respectively to obtain the plurality of candidate motion vector combinations.
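Claim 4 builds candidates from magnitudes (step size times a coefficient) crossed with directions, derives a forward/backward offset pair from each, and adds the pair to the bidirectional initial motion vector. A sketch of that construction, with all names and the pluggable `offset_rule` being illustrative assumptions:

```python
def build_candidates(init_fwd, init_bwd, steps, coeff, directions, offset_rule):
    """Claim 4 sketch: for each (step, direction) pair, scale the direction
    by the search magnitude (step * coeff), derive a (forward, backward)
    offset pair via offset_rule, and add it to the bidirectional initial
    motion vector to form one candidate motion vector combination."""
    candidates = []
    for step in steps:
        mag = step * coeff
        for dx, dy in directions:
            f_off, b_off = offset_rule((dx * mag, dy * mag))
            fwd = (init_fwd[0] + f_off[0], init_fwd[1] + f_off[1])
            bwd = (init_bwd[0] + b_off[0], init_bwd[1] + b_off[1])
            candidates.append((fwd, bwd))
    return candidates
```

For example, a mirrored rule `lambda off: (off, (-off[0], -off[1]))` gives the "reverse search" of claims 5 and 6, where the backward offset is the negation of the forward offset.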
5. The video processing method according to claim 4, wherein:
the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels;
the second number of search directions includes an up direction, a down direction, a left direction, and a right direction;
the plurality of search combinations are used for indicating the bidirectional initial motion vector to respectively perform the same-direction search, reverse search and one-direction search in the forward direction and the backward direction, wherein the same-direction search refers to a search mode in which the forward direction offset vector and the backward direction offset vector are the same, the reverse search refers to a search mode in which the forward direction offset vector and the backward direction offset vector are opposite, and the one-direction search refers to a search mode in which only forward direction offset or only backward direction offset is performed.
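The three search modes of claim 5 map one offset vector to a forward/backward offset pair in different ways. A small enumeration makes the distinction concrete (the mode names used as keys are assumptions, not terms from the claim):

```python
def offset_pairs(offset):
    """Claim 5 sketch: for one offset vector, produce the (forward, backward)
    offset pairs for the three search modes: same-direction (both offsets
    identical), reverse (opposite offsets), and one-direction (only the
    forward, or only the backward, vector is shifted)."""
    ox, oy = offset
    return {
        "same":          ((ox, oy), (ox, oy)),
        "reverse":       ((ox, oy), (-ox, -oy)),
        "forward_only":  ((ox, oy), (0, 0)),
        "backward_only": ((0, 0), (ox, oy)),
    }
```

The same-direction and one-direction modes are what let the scheme cover irregular motion (e.g. acceleration, or an object visible in only one reference frame) that a purely mirrored search would miss.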
6. The video processing method according to claim 4, wherein:
the first number of search steps includes 1/8 pixel, 1/4 pixel, 1/2 pixel, 1 pixel, 2 pixels, 4 pixels, 8 pixels, and 16 pixels;
the second number of search directions includes n directions, an angle between the n directions being 360/n degrees, where n is a positive integer;
the plurality of search combinations are used for indicating the bidirectional initial motion vector to perform reverse search in the forward direction and the backward direction, and the reverse search refers to a search mode that the forward direction offset vector and the backward direction offset vector are opposite.
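Claim 6's n equally spaced directions, separated by 360/n degrees, are just unit vectors around the circle. A minimal sketch (function name assumed):

```python
import math

def n_directions(n):
    """Claim 6 sketch: n unit direction vectors with 360/n degrees between
    adjacent directions."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]
```

With n = 4 this reduces to the up/down/left/right directions of claim 5; larger n trades more searched directions (and encoder work) for a finer angular resolution of the motion search.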
7. The method of video processing according to claim 1, wherein said determining a target motion vector combination from said plurality of candidate motion vector combinations comprises:
determining a plurality of absolute error sums corresponding to the plurality of candidate motion vector combinations, wherein the absolute error sums are used for indicating a difference between the coding block and a prediction block of the coding block;
determining a third number of candidate motion vector combinations based on the plurality of absolute error sums in order from small to large;
in the third number of candidate motion vector combinations, the target motion vector combination is determined based on rate-distortion optimization.
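Claim 7 is a two-stage selection: a cheap sum-of-absolute-differences (SAD) pass shortlists the k best candidate combinations, and a more expensive rate-distortion pass picks the winner from the shortlist. A sketch assuming caller-supplied `sad` and `rd_cost` functions (both names hypothetical):

```python
def shortlist_then_rdo(candidates, sad, rd_cost, k):
    """Claim 7 sketch: rank all candidate motion vector combinations by SAD
    (ascending), keep the k smallest, then choose the final target
    combination among those by rate-distortion cost."""
    shortlist = sorted(candidates, key=sad)[:k]
    return min(shortlist, key=rd_cost)
```

Running the costly rate-distortion evaluation on only k candidates instead of all of them is the usual encoder-side speed/quality trade-off.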
8. The method of video processing according to claim 7, wherein said determining a plurality of absolute error sums corresponding to said plurality of candidate motion vector combinations comprises:
for any candidate motion vector combination, determining a reference block of the coding block based on the candidate motion vector combination and the coding block, wherein the reference block is a starting point of the candidate motion vector combination, and the coding block is an ending point of the candidate motion vector combination;
performing motion compensation on edge pixels of the coding block based on the edge pixels of the reference block to obtain edge prediction pixels, wherein the edge pixels comprise left side pixels and upper side pixels;
an absolute error sum of edge pixels of the encoded block and the edge prediction pixels is determined.
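Claim 8 computes the SAD over only the block's edge pixels (left column and top row), a template-style cost that avoids compensating the whole block. A sketch under that reading; the `template` width parameter and 2-D list representation are assumptions:

```python
def edge_sad(block, ref, template=1):
    """Claim 8 sketch: sum of absolute differences restricted to the edge
    pixels (left-side and upper-side pixels) of the coding block versus the
    co-located pixels of the reference block.  block and ref are 2-D lists
    of equal size; template is the assumed edge width in pixels."""
    total = 0
    for y, row in enumerate(block):
        for x, v in enumerate(row):
            if x < template or y < template:  # left column or top row
                total += abs(v - ref[y][x])
    return total
```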
9. The method of video processing according to claim 7, wherein said determining a plurality of absolute error sums corresponding to said plurality of candidate motion vector combinations comprises:
for any candidate motion vector combination, performing motion compensation on the coding block to obtain a prediction block of the coding block, wherein the prediction block comprises a forward prediction block and a backward prediction block;
an absolute error sum of the prediction block and the coding block is determined.
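Claim 9 instead compensates the full block: the forward and backward prediction blocks are combined into one prediction, and the SAD against the coding block is taken over every pixel. A sketch assuming a simple average of the two predictions and flat pixel lists (both assumptions; the claim does not fix the combining weights):

```python
def bi_prediction_sad(block, fwd_pred, bwd_pred):
    """Claim 9 sketch: average the forward and backward prediction blocks
    into one bi-prediction, then return the sum of absolute differences
    against the coding block.  All three arguments are equal-length flat
    lists of pixel values."""
    pred = [(f + b) / 2 for f, b in zip(fwd_pred, bwd_pred)]
    return sum(abs(o - p) for o, p in zip(block, pred))
```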
10. The video processing method of claim 1, wherein the method further comprises:
and in a case that the bidirectional prediction weight of the bidirectional initial motion vector is not equal to a preset weight, performing a motion search on the bidirectional initial motion vector through a decoder-side motion vector refinement technique to determine an offset bidirectional initial motion vector, wherein the bidirectional prediction weight is a weight stored in the candidate motion information list of the coding block, and a residual error of the offset bidirectional initial motion vector is smaller than that of the bidirectional initial motion vector, the residual error being used to indicate the difference between the reference block and the prediction block.
11. A video processing apparatus, the apparatus comprising:
a first determining unit configured to determine, for any one of encoded blocks of a target image in a target video, a bi-directional initial motion vector of the encoded block, the target image being any one of video frames in the target video, the encoded block being configured to store partial information of the video frames in a compressed manner, the bi-directional initial motion vector being a candidate motion vector for bi-directional prediction of the encoded block;
a second determining unit configured to perform a motion search based on a plurality of search combinations, with the position pointed by the bidirectional initial motion vector in a target reference image as a search starting point, and determine a plurality of candidate motion vector combinations, wherein the target reference image comprises a previous frame and a subsequent frame of the target image in the video, the plurality of search combinations are composed of a plurality of search magnitudes and a plurality of search directions, the search magnitudes are used for indicating the offset of the bidirectional initial motion vector, and the candidate motion vector combinations comprise a forward candidate motion vector and a backward candidate motion vector;
a third determining unit configured to determine a target motion vector combination from the plurality of candidate motion vector combinations, the target motion vector combination being a candidate motion vector combination in which a difference between a target reference block and the coding block reaches a difference threshold, the target reference block being a coding unit corresponding to the coding block in the target reference image;
and a coding unit configured to code the index of the bidirectional initial motion vector and the index of the search combination of the target motion vector combination to obtain coded data, wherein the coded data is used for determining a prediction block of the coding block in the target image.
12. An electronic device, the electronic device comprising:
one or more processors;
a memory for storing the processor-executable program code;
wherein the processor is configured to execute the program code to implement the video processing method of any of claims 1 to 10.
13. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video processing method of any one of claims 1 to 10.
CN202311862661.0A 2023-12-29 2023-12-29 Video processing method, device, electronic equipment and medium Pending CN117857813A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311862661.0A CN117857813A (en) 2023-12-29 2023-12-29 Video processing method, device, electronic equipment and medium


Publications (1)

Publication Number Publication Date
CN117857813A 2024-04-09

Family

ID=90541473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311862661.0A Pending CN117857813A (en) 2023-12-29 2023-12-29 Video processing method, device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN117857813A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination