CN115297333B - Inter-frame prediction method and device of video data, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115297333B
CN115297333B (application CN202211194847.9A)
Authority
CN
China
Prior art keywords
target, current coding, coding block, block, data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211194847.9A
Other languages
Chinese (zh)
Other versions
CN115297333A (en)
Inventor
简云瑞
黄跃
闻兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202211194847.9A priority Critical patent/CN115297333B/en
Publication of CN115297333A publication Critical patent/CN115297333A/en
Application granted granted Critical
Publication of CN115297333B publication Critical patent/CN115297333B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock

Abstract

The disclosure relates to an inter-frame prediction method and apparatus for video data, an electronic device, and a storage medium. The method includes the following steps: when the inter-frame prediction mode of a coding block of video data is the inter-frame fusion prediction mode, acquiring motion information of a plurality of spatial domain candidate blocks of the coding block, and determining the target number of target spatial domain candidate blocks whose motion information satisfies a preset similarity condition; and when the target number is greater than or equal to a preset threshold, determining target motion information of the coding block according to the motion information of the target spatial domain candidate blocks and predicting the coding block. With this method and apparatus, when a plurality of spatial domain candidate blocks of the coding block are highly similar, the motion information of the coding block can be determined from the motion information of the spatial domain candidate blocks and the coding block can be predicted without encoding an index value or writing an index value into the bitstream, which reduces the bit-rate consumption of the coding block, effectively improves the coding performance of the inter-frame fusion prediction mode, and improves coding efficiency.

Description

Inter-frame prediction method and device of video data, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video coding technologies, and in particular, to a method and an apparatus for inter-frame prediction of video data, an electronic device, and a storage medium.
Background
With the development of the field of video coding and decoding, various video coding standards have emerged, such as the High Efficiency Video Coding (HEVC) standard and the Versatile Video Coding (VVC) standard.
In the related art, under both the HEVC standard and the VVC standard, video frame data may be predicted using an inter-frame fusion prediction mode, in which the coding information of already-coded blocks around a target coding block is generally multiplexed to predict the target coding block. Specifically, the current coding block needs to determine multiple already-coded candidate blocks at adjacent spatial-domain, temporal-domain, and other positions, and determine, by a rate-distortion optimization method, the candidate block with the best coding performance among those candidate blocks; the current coding block can then write the index value of the best-performing candidate block into the bitstream and perform prediction based on that index value in the bitstream.
However, a prediction mode that determines the single candidate block with the best coding performance and writes its index value into the bitstream yields low coding efficiency for the coding block.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device and a storage medium for inter-frame prediction of video data, so as to at least solve the problem of low coding efficiency of a coding block in the related art. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided an inter prediction method of video data, including:
under the condition that an inter-frame prediction mode of a current coding block of video data is an inter-frame fusion prediction mode, acquiring motion information of a plurality of spatial domain candidate blocks of the current coding block;
determining the target number of target space domain candidate blocks of which the motion information meets a preset similar condition based on the motion information of each space domain candidate block;
and under the condition that the target number is greater than or equal to a preset threshold value, determining the target motion information of the current coding block according to the motion information of the target spatial domain candidate block, and predicting the current coding block based on the target motion information of the current coding block.
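As a rough illustration, the three steps of the first aspect can be sketched in Python. This is a minimal sketch, not the disclosed implementation: the function names, the data shape (a list of (prediction_data, motion_vector) pairs), and grouping via a counter are all hypothetical, and the rule for the second and fourth quadrants is a symmetric assumption, since the text below spells out only the first- and third-quadrant cases.

```python
from collections import Counter

def quadrant(mv, target_value=0):
    # Quadrant rule from the disclosure; only the first- and third-quadrant
    # cases are stated in the text, so the second/fourth branch below is a
    # symmetric assumption.
    x, y = mv
    if x > target_value and y > target_value:
        return 1
    if x <= target_value and y <= target_value:
        return 3
    return 2 if x <= target_value else 4

def derive_target_motion_info(candidates, threshold):
    # `candidates`: list of (prediction_data, motion_vector) pairs, one per
    # spatial domain candidate block; motion_vector is an (x, y) tuple.
    # Returns (prediction_data, mean_motion_vector) for the current coding
    # block, or None when the merge-index fallback must be used.
    groups = Counter((pred, quadrant(mv)) for pred, mv in candidates)
    target_key, target_number = groups.most_common(1)[0]
    if target_number < threshold:
        return None  # fall back to rate-distortion optimization + merge index
    # Reuse the shared prediction data; mean-process the motion vectors.
    mvs = [mv for pred, mv in candidates if (pred, quadrant(mv)) == target_key]
    mean_mv = (sum(x for x, _ in mvs) / len(mvs),
               sum(y for _, y in mvs) / len(mvs))
    return target_key[0], mean_mv
```

When the function returns None, the encoder would instead take the rate-distortion-optimized merge-index path described further below.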
In one embodiment, the motion information includes prediction data and motion vector data, and the prediction data is used in inter prediction of video data;
the determining the target number of the target spatial candidate blocks of which the motion information meets the preset similar condition based on the motion information of each spatial candidate block comprises the following steps:
for each spatial domain candidate block in a plurality of spatial domain candidate blocks of the current coding block, determining a target quadrant corresponding to motion vector data of the spatial domain candidate block based on a preset corresponding relation between the motion vector data and the quadrant;
determining a target space domain candidate block with the same prediction data and the same target quadrant from a plurality of space domain candidate blocks of the current coding block;
determining a target number of the target spatial candidate blocks.
In one embodiment, the motion information includes prediction data and motion vector data, and the prediction data is used in inter prediction of video data;
the determining the target motion information of the current coding block according to the motion information of the target spatial domain candidate block comprises:
taking the prediction data of the target spatial domain candidate block as the prediction data of the current coding block;
carrying out mean processing on the motion vector data of a plurality of target space domain candidate blocks, and determining the mean as the motion vector data of the current coding block;
and obtaining target motion information of the current coding block based on the prediction data of the current coding block and the motion vector data of the current coding block.
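The "mean processing" step above can be illustrated with a small hedged sketch (the helper name is hypothetical): the motion vector data of the target spatial domain candidate blocks are averaged component-wise to obtain the motion vector data of the current coding block.

```python
def mean_motion_vector(mvs):
    # Component-wise mean of the (x, y) motion vectors of the target
    # spatial domain candidate blocks.
    n = len(mvs)
    return (sum(x for x, _ in mvs) / n, sum(y for _, y in mvs) / n)

# e.g. three target candidate blocks sharing the same prediction data:
assert mean_motion_vector([(4, 2), (6, 2), (5, 5)]) == (5.0, 3.0)
```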
In one embodiment, the predicting the current coding block based on the target motion information of the current coding block includes:
determining a prediction reference video frame corresponding to the current coding block based on the target motion information of the current coding block, and determining a prediction reference transformation block and pixel data of the prediction reference transformation block in a plurality of reference transformation blocks contained in the prediction reference video frame, wherein the prediction reference transformation block is a data block used for predicting the pixel data of the current coding block;
and predicting the pixel data of the current coding block based on the pixel data of the prediction reference transformation block to obtain the predicted pixel data of the current coding block.
In one embodiment, the motion vector data includes a value corresponding to a first direction vector and a value corresponding to a second direction vector;
the determining a target quadrant corresponding to the motion vector data of the spatial candidate block based on the corresponding relationship between the preset motion vector data and the quadrant includes:
when the value corresponding to the first direction vector of the spatial domain candidate block is greater than a target value and the value corresponding to the second direction vector of the spatial domain candidate block is greater than the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is the first quadrant and determining the first quadrant as the target quadrant;
in one embodiment, the method further comprises:
and under the condition that the numerical value corresponding to the first direction vector of the spatial domain candidate block is less than or equal to the target value and the numerical value corresponding to the second direction vector of the spatial domain candidate block is less than or equal to the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is a third quadrant and determining that the third quadrant is a target quadrant.
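Only the first- and third-quadrant rules appear in this excerpt. A hedged sketch of the correspondence, with the second and fourth quadrants filled in as a symmetric assumption, might look like:

```python
def target_quadrant(mv, target_value=0):
    # Map motion vector data to a quadrant. The first- and third-quadrant
    # rules follow the text; the second/fourth cases are an assumption.
    x, y = mv  # first-direction and second-direction values
    if x > target_value and y > target_value:
        return 1
    if x <= target_value and y <= target_value:
        return 3
    return 2 if x <= target_value else 4

assert target_quadrant((3, 5)) == 1
assert target_quadrant((-2, -7)) == 3
```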
In one embodiment, the method further comprises:
under the condition that the target number is smaller than the preset threshold value, determining a target candidate block in a plurality of candidate blocks of the current coding block according to a preset rate distortion optimization strategy;
determining target index values of the target candidate blocks in a plurality of candidate blocks of the current coding block, coding the target index values, and performing inter-frame prediction on the current coding block based on the coded target index values.
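The fallback branch, selecting a target candidate block by rate-distortion optimization and coding its index, can be sketched as follows; the cost callback and the names are hypothetical stand-ins for a real rate-distortion evaluation.

```python
def select_merge_index(candidate_list, rd_cost):
    # Pick the candidate with the lowest rate-distortion cost; the returned
    # index (the merge index value) is what would be encoded into the
    # bitstream. `rd_cost` is a hypothetical cost callback.
    costs = [rd_cost(c) for c in candidate_list]
    return min(range(len(candidate_list)), key=costs.__getitem__)

# Toy cost table for three candidates:
assert select_merge_index(["A0", "A1", "B0"], {"A0": 9.1, "A1": 7.4, "B0": 8.2}.get) == 1
```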
According to a second aspect of the embodiments of the present disclosure, there is provided an inter prediction apparatus for video data, including:
an acquisition unit configured to perform acquisition of motion information of a plurality of spatial domain candidate blocks of a current coding block of video data in a case where an inter-frame prediction mode of the current coding block is an inter-frame fusion prediction mode;
a first determination unit configured to perform determination of a target number of target spatial candidate blocks whose motion information satisfies a preset similarity condition, based on motion information of each of the spatial candidate blocks;
and the second determining unit is configured to determine the target motion information of the current coding block according to the motion information of the target spatial domain candidate block under the condition that the target number is greater than or equal to a preset threshold value, and predict the current coding block based on the target motion information of the current coding block.
In one embodiment, the motion information includes prediction data and motion vector data, and the prediction data is used in inter prediction of video data;
the first determination unit includes:
a first determining subunit, configured to perform, for each of a plurality of spatial candidate blocks of the current coding block, determining, based on a preset correspondence relationship between motion vector data and quadrants, a target quadrant corresponding to the motion vector data of the spatial candidate block;
a second determining subunit, configured to perform determining, among the plurality of spatial candidate blocks of the current coding block, a target spatial candidate block whose prediction data are the same and whose target quadrant is the same;
the second determining subunit being further configured to determine the target number of the target spatial candidate blocks.
In one embodiment, the motion information includes prediction data and motion vector data, and the prediction data is used in inter prediction of video data;
the second determination unit includes:
a third determining subunit configured to perform, as prediction data of the current coding block, prediction data of a target spatial candidate block;
the mean subunit is configured to perform mean processing on the motion vector data of the target spatial domain candidate blocks, and determine that the mean is the motion vector data of the current coding block;
the second determination unit being further configured to obtain target motion information of the current coding block based on the prediction data of the current coding block and the motion vector data of the current coding block.
In one embodiment, the second determining unit further includes:
a fourth determining subunit configured to perform determining, based on the target motion information of the current coding block, a prediction reference video frame corresponding to the current coding block, and determining, among a plurality of reference transform blocks included in the prediction reference video frame, a prediction reference transform block and pixel data of the prediction reference transform block, where the prediction reference transform block is a data block used for predicting pixel data of the current coding block;
a prediction sub-unit configured to perform prediction of pixel data of the current coding block based on pixel data of the prediction reference transform block, resulting in predicted pixel data of the current coding block.
In one embodiment, the motion vector data includes a value corresponding to a first direction vector and a value corresponding to a second direction vector;
the first determining subunit is specifically configured to:
when the value corresponding to the first direction vector of the spatial domain candidate block is greater than a target value and the value corresponding to the second direction vector of the spatial domain candidate block is greater than the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is the first quadrant and determining the first quadrant as the target quadrant;
and under the condition that the numerical value corresponding to the first direction vector of the spatial domain candidate block is less than or equal to the target value and the numerical value corresponding to the second direction vector of the spatial domain candidate block is less than or equal to the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is a third quadrant and determining that the third quadrant is a target quadrant.
In one embodiment, the apparatus further comprises:
a third determining unit, configured to perform, in a case that the target number is smaller than the preset threshold, determining a target candidate block in accordance with a preset rate-distortion optimization strategy among a plurality of candidate blocks of the current coding block;
a fourth determining unit configured to perform determining target index values of the target candidate blocks in a plurality of candidate blocks of the current coding block, encoding the target index values, and inter-predicting the current coding block based on the encoded target index values.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of inter-prediction of video data according to any of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the inter prediction method of video data according to any one of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the inter prediction method of video data according to any one of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
when the inter-frame prediction mode of a current coding block of video data is the inter-frame fusion prediction mode, motion information of a plurality of spatial domain candidate blocks of the coding block is acquired; the target number of target spatial domain candidate blocks whose motion information satisfies a preset similarity condition is determined based on the motion information of each spatial domain candidate block; and when the target number is greater than or equal to a preset threshold, the target motion information of the coding block is determined according to the motion information of the target spatial domain candidate blocks, and the coding block is predicted based on that target motion information. With this method and apparatus, when a plurality of spatial domain candidate blocks of the coding block are highly similar, the motion information of the coding block of the video data can be determined from the motion information of the spatial domain candidate blocks, and the coding block can be predicted without encoding an index value or writing an index value into the bitstream, which can reduce the bit-rate consumption of the coding block, effectively improve the coding performance of the inter-frame fusion prediction mode, and improve coding efficiency.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
Fig. 1 is a flowchart illustrating a method of inter-prediction of video data according to an exemplary embodiment.
FIG. 2 is a diagram illustrating the location of spatial candidates for a current coding block in accordance with an exemplary embodiment.
Fig. 3 is a flowchart illustrating a target number determining step in a method of inter-prediction of video data according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating a step of determining target motion information in a method of inter-prediction of video data according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating a step of determining pixel data in a method for inter-prediction of video data according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating a prediction step in a method of inter-prediction of video data according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an inter prediction apparatus of video data according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
It should also be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are both information and data that are authorized by the user or sufficiently authorized by various parties.
Video coding techniques include intra/inter prediction, transform, quantization, entropy coding, and the like. In the HEVC and VVC video coding standards, prediction may include intra prediction and inter prediction, where inter prediction includes a normal inter-frame prediction mode and an inter-frame fusion prediction mode (merge mode), and coding information may include motion information and other information. In the normal inter-frame prediction mode, the current coding block needs to write information such as its motion information and reference frame index into the bitstream; in the inter-frame fusion prediction mode, the current coding block may multiplex the coding information of already-coded blocks around it.
Specifically, multiple candidate blocks may be constructed at positions adjacent to the current coding block in the spatial dimension, multiple candidate blocks may be constructed at positions adjacent to the current coding block in the temporal dimension, and the candidate blocks from the multiple dimensions may constitute a candidate list. The candidate list may be ordered, and in one example its indices may be 0, 1, ..., P-1, where P may be the maximum candidate list length specified by the HEVC and VVC coding standards.
Based on this, the current coding block may determine the target candidate block with the best coding performance in the candidate list by a rate-distortion optimization method, determine the index value of the target candidate block in the candidate list, and encode that index value as a merge index value written into the bitstream, where the merge index value indicates which candidate block's coding information the current coding block multiplexes. However, the correlation among the multiplexed coding information is not considered in the related art, which results in low coding performance and low coding efficiency.
Accordingly, the present disclosure proposes an inter prediction method of video data that can improve coding efficiency.
Fig. 1 is a flowchart illustrating an inter prediction method of video data according to an exemplary embodiment, and as shown in fig. 1, the inter prediction method of video data may be applied to an electronic device, including the following steps.
In step S110, when the inter prediction mode of the current coding block of the video data is the inter-frame fusion prediction mode, motion information of a plurality of spatial candidate blocks of the current coding block is acquired.
The video data may be video frame data, which may be the video data in audio/video data, and the video frame data may include a plurality of coding blocks. The current coding block is the video frame data block being coded at the current moment, and the size information of coding blocks in the video frame data can be determined according to the video data coding standard corresponding to the current coding block. The spatial domain candidate blocks are video frame data blocks at a plurality of positions around the current coding block in the spatial dimension; it should be understood that the current coding block also has candidate video frame data blocks at a plurality of positions around it in the temporal dimension.
In implementation, the electronic device may determine the inter-frame prediction mode of the current coding block in advance, and when the inter-frame prediction mode of the current coding block is the inter-frame fusion prediction mode, the electronic device may determine, according to a candidate block determination policy corresponding to a preset video data coding standard, the number of spatial domain candidate blocks corresponding to the current coding block and the position information of each spatial domain candidate block. On this basis, the electronic device may respectively obtain the motion information of each spatial domain candidate block, where the content of the motion information of the current coding block and of its spatial domain candidate blocks differs under different video data coding standards.
Alternatively, fig. 2 may be a schematic diagram of relative positions of a plurality of spatial candidate blocks (which may be respectively referred to as A0, A1, B0, B1, B2) of a current coding block and the current coding block (which may be referred to as a).
In step S120, a target number of target spatial candidate blocks whose motion information satisfies a preset similarity condition is determined based on the motion information of each spatial candidate block.
Specifically, the electronic device may obtain motion information of all spatial candidate blocks of the current coding block, compare the motion information of each spatial candidate block, and determine, among the plurality of spatial candidate blocks of the current coding block, a spatial candidate block whose motion information satisfies a preset similar condition as a target spatial candidate block. In this way, the electronic device may count the number of the target spatial candidate blocks to obtain the target number. The predetermined similarity condition may be that the motion information of the spatial candidate blocks is the same.
In step S130, when the target number is greater than or equal to the preset threshold, the target motion information of the current coding block is determined according to the motion information of the target spatial domain candidate block, and the current coding block is predicted based on the target motion information of the current coding block.
The preset threshold may be determined according to an actual application scenario, or may be determined according to the number of the spatial candidate blocks corresponding to the current coding block, for example, the electronic device may perform weighting calculation according to the number of the spatial candidate blocks and a preset weight, so as to obtain the preset threshold. In one example, where the number of spatial candidate blocks is 5, the electronic device may determine the preset weight to be eighty percent, such that the calculated preset threshold may be 4.
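The weighting in the example above amounts to the following small computation; the helper name and the rounding choice are assumptions, since the text only gives the 5-candidate, eighty-percent case.

```python
def preset_threshold(num_candidates, weight=0.8):
    # Weight the spatial candidate count to obtain the preset threshold.
    return round(num_candidates * weight)

assert preset_threshold(5) == 4  # the example from the text
```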
In an implementation, the electronic device may compare the target number of the plurality of target spatial candidate blocks with a preset threshold to obtain a comparison result. And under the condition that the target number is greater than or equal to the preset threshold value as a comparison result, a larger number of spatial candidate blocks with consistent motion information exist in the plurality of spatial candidate blocks corresponding to the current coding block, so that the probability that the motion information of the current coding block is consistent with the motion information of the spatial candidate blocks with consistent motion information can be determined to be higher. In this way, the electronic device may determine target motion information for the current coding block based on the motion information for each spatial candidate block. After determining the target motion information of the current coding block, the electronic device may predict the pixel data of the current coding block based on the target motion information to obtain predicted pixel data of the current coding block.
In the above inter-frame prediction method of video data, when the inter-frame prediction mode of a current coding block of the video data is the inter-frame fusion prediction mode, motion information of a plurality of spatial domain candidate blocks of the current coding block is obtained; the target number of target spatial domain candidate blocks whose motion information satisfies a preset similarity condition is determined based on the motion information of each spatial domain candidate block; and when the target number is greater than or equal to a preset threshold, the target motion information of the current coding block is determined according to the motion information of the target spatial domain candidate blocks, and the current coding block is predicted based on that target motion information. With this method, when a plurality of spatial domain candidate blocks of the current coding block are highly similar, the motion information of the current coding block corresponding to the video frame data is determined according to the motion information of the spatial domain candidate blocks and the current coding block is predicted, with no need to encode an index value or write an index value into the bitstream, which reduces the bit-rate consumption of the coding block, effectively improves the coding performance of the inter-frame fusion prediction mode, and improves coding efficiency.
In an exemplary embodiment, the motion information includes prediction data as well as motion vector data.
The prediction data is used for predicting video data; the types of information it contains differ across video coding standards. The motion vector data includes a value corresponding to a first direction vector and a value corresponding to a second direction vector. In one example, the motion vector data of the current coding block may be (x, y), meaning a displacement of x units along the positive first direction and y units along the positive second direction. For example, the first direction may be horizontal with rightward as positive, and the second direction may be vertical with upward as positive.
In one example, if the current coding block uses the inter-frame fusion prediction mode under the HEVC coding standard, the prediction data may include an inter prediction direction (ID) and a reference index (RI). In another example, if the current coding block uses the inter-frame fusion prediction mode under the VVC coding standard, the prediction data may include an inter prediction direction (ID), a reference index (RI), a bi-directional weight prediction index (bi-prediction with CU-level weight index, BI), a pixel interpolation index (HF), and an adaptive motion vector precision index (AI).
Accordingly, as shown in fig. 3, in step S120, determining the target number of the target spatial candidate blocks whose motion information satisfies the preset similarity condition based on the motion information of each spatial candidate block may specifically be implemented by:
in step S121, for each spatial candidate block of the plurality of spatial candidate blocks of the current coding block, a target quadrant corresponding to the motion vector data of the spatial candidate block is determined based on a preset correspondence between the motion vector data and the quadrant.
In an implementation, for each spatial candidate block of the plurality of spatial candidate blocks of the current coding block, the electronic device may determine, according to a preset correspondence between motion vector data and a quadrant, a quadrant corresponding to the motion vector data of the spatial candidate block.
In step S122, a target spatial candidate block having the same prediction data and the same target quadrant is determined among the plurality of spatial candidate blocks of the current coding block.
In an implementation, among the plurality of spatial candidate blocks of the current coding block, the electronic device may screen out the spatial candidate blocks whose prediction data are the same and whose target quadrants are consistent, and use those spatial candidate blocks as the target spatial candidate blocks.
in step S123, a target number of target spatial candidate blocks is determined.
In an implementation, the electronic device may count the number of target spatial candidate blocks to obtain a target number of the target spatial candidate blocks.
In one example, the electronic device may partition the plurality of spatial candidate blocks of the current coding block by target quadrant, placing spatial candidate blocks with the same target quadrant into the same spatial candidate block set. By examining the target quadrants of the spatial candidate blocks of the current coding block, the electronic device obtains at least one spatial candidate block set, where the spatial candidate blocks in each set share the same target quadrant and the target quadrants differ between sets.
Based on the above scheme, for each spatial candidate block set, the electronic device may determine whether the prediction data of the spatial candidate blocks it contains are the same, and divide the spatial candidate blocks with the same prediction data into the same subset. In this way the electronic device obtains at least one subset, counts the number of spatial candidate blocks in each subset, and identifies the subset with the largest count. The spatial candidate blocks in that subset are used as the target spatial candidate blocks, and the largest count is used as the target number.
In another example, the electronic device may partition the plurality of spatial candidate blocks of the current coding block based on prediction data, placing spatial candidate blocks with the same prediction data into the same spatial candidate block set. By examining the prediction data of the spatial candidate blocks of the current coding block, the electronic device obtains at least one spatial candidate block set, where the spatial candidate blocks in each set share the same prediction data and the prediction data differ between sets.
In this way, for each spatial candidate block set, the electronic device may determine whether the target quadrants of the spatial candidate blocks it contains are the same, and divide the spatial candidate blocks with the same target quadrant into the same subset. The electronic device thus obtains at least one subset, counts the number of spatial candidate blocks in each subset, and identifies the subset with the largest count. The spatial candidate blocks in that subset are used as the target spatial candidate blocks, and the largest count is used as the target number.
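Both grouping orders above arrive at the same result. A minimal sketch in Python, assuming each candidate is represented as a dict with hypothetical keys 'pred' (its prediction data as a hashable tuple) and 'quadrant' (its target quadrant):

```python
from collections import Counter

def count_target_candidates(candidates):
    # Group spatial candidates by (prediction data, target quadrant); the
    # largest group gives the target spatial candidate blocks and its size
    # gives the target number compared against the preset threshold.
    groups = Counter((c['pred'], c['quadrant']) for c in candidates)
    if not groups:
        return [], 0
    key, target_number = groups.most_common(1)[0]
    target_blocks = [c for c in candidates
                     if (c['pred'], c['quadrant']) == key]
    return target_blocks, target_number
```

Grouping by the combined (prediction data, quadrant) key makes the order of the two judgments irrelevant, which is why the two examples in the text are equivalent.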
In this embodiment, whether the current coding block meets the preset judgment condition can be checked, improving the accuracy of judging the similarity of the plurality of spatial candidate blocks of the current coding block.
In an exemplary embodiment, the motion information includes prediction data and motion vector data as in the above embodiments. Correspondingly, as shown in fig. 4, in step S130, determining the target motion information of the current coding block from the motion information of the target spatial candidate blocks may be implemented by the following steps:
in step S1311, the prediction data of the target spatial candidate block is used as the prediction data of the current coding block.
In an implementation, since the prediction data of every target spatial candidate block is the same, the electronic device may select any one target spatial candidate block and use its prediction data as the prediction data of the current coding block.
In one example, if the current coding block is in inter-frame fusion prediction mode under the HEVC coding standard, the electronic device may use the ID value and RI value of the target spatial candidate block as the ID value and RI value of the current coding block.
In step S1312, the mean of the motion vector data of the target spatial candidate blocks is calculated, and the mean is determined as the motion vector data of the current coding block.
In an implementation, the electronic device may obtain the motion vector data of each target spatial candidate block that satisfies the preset similarity condition, compute the mean of that motion vector data, and use the resulting average motion vector data as the motion vector data of the current coding block.
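A minimal sketch of the averaging step, assuming each motion vector is an (x, y) pair; real codecs store MVs in fixed-point sub-pel units and would round the result, which this sketch omits:

```python
def mean_motion_vector(mvs):
    # Component-wise mean over the target candidates' motion vectors;
    # the result is used as the current coding block's motion vector.
    n = len(mvs)
    return (sum(x for x, _ in mvs) / n,
            sum(y for _, y in mvs) / n)
```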
In step 1313, target motion information of the current coding block is obtained based on the prediction data of the current coding block and the motion vector data of the current coding block.
In an implementation, since the motion information includes prediction data and motion vector data, the electronic device may use the prediction data of the current coding block together with its motion vector data as the target motion information of the current coding block.
In this embodiment, the motion information of the current coding block can be determined conveniently and accurately without consuming extra coding resources.
In an exemplary embodiment, as shown in fig. 5, in step S130, predicting the current coding block based on the target motion information of the current coding block may specifically be implemented by the following steps:
in step S1321, a prediction reference video frame corresponding to the current coding block is determined based on the target motion information of the current coding block, and a prediction reference transform block and pixel data of the prediction reference transform block are determined among a plurality of reference transform blocks included in the prediction reference video frame.
Wherein the prediction reference video frame comprises a plurality of transform blocks, and the electronic device can determine the prediction reference transform block from the plurality of transform blocks contained in the prediction reference video frame; the prediction reference transform block is a data block to be referred to when the current coding block performs inter prediction, and the target motion information may include prediction data, which may include a reference index value and an inter prediction direction, and motion vector data.
In an implementation, the electronic device may determine a predicted reference video frame among a plurality of reference video frames corresponding to a current coding block based on a reference index value and an inter-prediction direction. In one example, the electronic device may obtain a reference video frame sequence corresponding to the current coding block, and may determine a predicted reference video frame corresponding to the current coding block in the reference video frame sequence based on the reference index value and the inter prediction direction.
The prediction reference video frame may be divided into a plurality of pixel blocks in advance, so after determining the prediction reference video frame, the electronic device may obtain the plurality of pixel blocks it contains, that is, a plurality of reference transform blocks. Based on this, the electronic device may determine an initial reference transform block in the prediction reference video frame from the position of the current coding block within its own video frame. The electronic device can then perform motion compensation, i.e., displace from the initial reference transform block position within the prediction reference video frame according to the motion vector data, obtain the prediction reference transform block, and acquire its pixel data.
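The displacement step can be sketched as integer-pel motion compensation over a reference frame stored as nested lists of luma samples (a simplifying assumption; sub-pel MVs would additionally need interpolation filtering):

```python
def fetch_reference_block(ref_frame, x, y, w, h, mv):
    # Displace the co-located block position (x, y) by the motion vector
    # and copy a w*h patch from the reference frame, clamping coordinates
    # to the frame borders as codecs do for out-of-frame references.
    frame_h, frame_w = len(ref_frame), len(ref_frame[0])
    mv_x, mv_y = mv
    block = []
    for row in range(h):
        src_y = min(max(y + mv_y + row, 0), frame_h - 1)
        block.append([ref_frame[src_y][min(max(x + mv_x + col, 0), frame_w - 1)]
                      for col in range(w)])
    return block
```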
In one example, the prediction data may further include a bi-directional weight prediction index, a pixel interpolation index, and an adaptive motion vector precision index. The electronic device may determine the prediction reference video frame from the plurality of reference video frames corresponding to the current coding block based on the reference index value and the inter prediction direction; the prediction reference video frame is again divided into a plurality of pixel blocks, i.e., reference transform blocks. The electronic device may determine an initial reference transform block from the position of the current coding block within its own video frame, and then perform motion compensation, displacing within the prediction reference video frame from the initial reference transform block according to the motion vector data, the bi-directional weight prediction index, the pixel interpolation index, and the adaptive motion vector precision index, to obtain the prediction reference transform block and its pixel data.
In step S1322, the pixel data of the current coding block is predicted based on the pixel data of the prediction reference transform block, resulting in predicted pixel data of the current coding block.
In implementation, the electronic device may obtain the pixel data of the prediction reference transform block to predict the pixel data of the current coding block, so as to obtain the predicted pixel data of the current coding block.
In this embodiment, redundant information in the video frame data can be effectively reduced, and the efficiency and accuracy of motion estimation and motion compensation are improved.
In an exemplary embodiment, the motion vector data includes a value corresponding to the first direction vector and a value corresponding to the second direction vector.
Correspondingly, in step S121, a target quadrant corresponding to the motion vector data of the spatial candidate block is determined based on the preset correspondence between the motion vector data and the quadrant, which may specifically be implemented by the following steps:
and under the condition that the value corresponding to the first direction vector of the spatial candidate block is greater than the target value and the value corresponding to the second direction vector of the spatial candidate block is greater than the target value, determining that the quadrant corresponding to the motion vector data of the spatial candidate block is a first quadrant and determining that the first quadrant is a target quadrant.
The target value may be determined according to the requirements of the actual application scenario, and may be zero, for example.
In an implementation, the electronic device may compare the value corresponding to the first direction vector and the value corresponding to the second direction vector of the spatial candidate block with the target value. If both values are greater than the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the first quadrant, and use the first quadrant as the target quadrant.
In this embodiment, the quadrant corresponding to the motion information of the spatial candidate block can be efficiently determined.
In an exemplary embodiment, the inter prediction method of video data further includes:
and under the condition that the numerical value corresponding to the first direction vector of the spatial candidate block is less than or equal to the target value and the numerical value corresponding to the second direction vector of the spatial candidate block is less than or equal to the target value, determining that the quadrant corresponding to the motion vector data of the spatial candidate block is a third quadrant and determining that the third quadrant is a target quadrant.
In an implementation, the electronic device may compare the value corresponding to the first direction vector and the value corresponding to the second direction vector of the spatial candidate block with the target value. If the value corresponding to the first direction vector is less than or equal to the target value and the value corresponding to the second direction vector is also less than or equal to the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the third quadrant, and use the third quadrant as the target quadrant.
In one example, if the value corresponding to the first direction vector of the spatial candidate block is less than or equal to the target value and the value corresponding to the second direction vector is greater than the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the second quadrant, and use the second quadrant as the target quadrant.
Similarly, the electronic device may compare a value corresponding to the first direction vector and a value corresponding to the second direction vector of the spatial candidate block with the target value, and if the value corresponding to the first direction vector of the spatial candidate block is greater than the target value and the value corresponding to the second direction vector of the spatial candidate block is less than or equal to the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the fourth quadrant.
Optionally, the process of determining the target quadrant may further include: if the value corresponding to the first direction vector is equal to the target value and the value corresponding to the second direction vector is greater than the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the first direction of the first coordinate axis; if the value corresponding to the first direction vector is equal to the target value and the value corresponding to the second direction vector is smaller than the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the second direction of the first coordinate axis; if the value corresponding to the first direction vector is greater than the target value and the value corresponding to the second direction vector is equal to the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the first direction of the second coordinate axis; if the value corresponding to the first direction vector is smaller than the target value and the value corresponding to the second direction vector is equal to the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the second direction of the second coordinate axis; if the value corresponding to the first direction vector is equal to the target value and the value corresponding to the second direction vector is equal to the target value, the electronic device may determine that the quadrant corresponding to the motion vector data of the spatial candidate block is the origin.
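The four-quadrant rules above can be condensed into a small helper. This sketch implements only the main four-way classification from the text; the optional axis/origin refinement would add separate labels for components exactly equal to the target value:

```python
def mv_quadrant(mv, target=0):
    # Quadrant label from the comparisons in the text:
    #   Q1 if both components exceed the target value,
    #   Q2 if only the second does, Q3 if neither does,
    #   Q4 if only the first does.
    x, y = mv
    if x > target:
        return 1 if y > target else 4
    return 2 if y > target else 3
```

With the default target value of zero, ties (components equal to zero) fall into the "less than or equal" branches, matching the rules for the second, third, and fourth quadrants.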
In this embodiment, the quadrant corresponding to the motion information of the spatial candidate block can be efficiently determined.
In an exemplary embodiment, as shown in fig. 6, the method for inter-prediction of video data further includes:
in step S610, under the condition that the target number is smaller than the preset threshold, determining a target candidate block in a plurality of candidate blocks of the current coding block according to a preset rate-distortion optimization strategy;
the plurality of candidate blocks of the current coding block may include a plurality of candidate blocks respectively determined based on a plurality of dimensions, for example, a plurality of spatial candidate blocks determined in a position dimension, a plurality of temporal candidate blocks determined in a time dimension, and the like.
In an implementation, the electronic device may determine a candidate block list corresponding to the current coding block from a plurality of spatial candidate blocks determined in a location dimension and from a plurality of temporal candidate blocks determined in a time dimension. In this way, the electronic device may traverse a plurality of candidate blocks included in the candidate block list, calculate rate distortion optimization costs respectively corresponding to the candidate blocks based on a preset rate distortion optimization strategy, and determine, as a target candidate block, a candidate block with the optimal coding performance from among the candidate blocks included in the candidate block list based on the calculated rate distortion optimization costs.
In step S620, target index values of the target candidate blocks in multiple candidate blocks of the current coding block are determined, the target index values are encoded, and inter prediction is performed on the current coding block based on the encoded target index values.
In an implementation, the electronic device may determine an index value of a target candidate block in a candidate block list corresponding to a current encoding block as a target index value (which may be referred to as a merge index value) of the current encoding block, so that the electronic device may perform an encoding operation on the target index value, so that the target index value is written into a code stream of the current encoding block, that is, the merge index value is transferred in the code stream. In this way, the electronic device can predict the pixel data of the current coding block based on the encoded target index value to obtain the predicted pixel data of the current coding block.
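The fallback selection can be sketched as follows, with the rate-distortion cost supplied by the caller since the text does not specify how the cost itself is computed:

```python
def select_merge_index(merge_list, rd_cost):
    # Evaluate every candidate in the merge candidate list with a
    # caller-supplied rate-distortion cost function and return the index
    # of the cheapest one; that index is the merge index value encoded
    # into the bitstream of the current coding block.
    costs = [rd_cost(c) for c in merge_list]
    return min(range(len(costs)), key=costs.__getitem__)
```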
In this embodiment, a pre-judgment may be performed on the current coding block, and the target index value is coded only when the target number is smaller than the preset threshold, which improves the compatibility of the inter-frame prediction method for video data provided by the present disclosure.
Hereinafter, a specific implementation of the inter prediction method for video data provided in the present disclosure is described in detail with reference to an embodiment:
in the inter-frame prediction method for video data provided by the present disclosure, a plurality of spatial candidate blocks of the current coding block may be determined according to the candidate-block determination policy of the video coding standard that applies to the current coding block. For example, 5 spatial candidate blocks may be determined, whose positions relative to the current coding block may be as shown in fig. 2. When the prediction mode of the current coding block is the inter-frame fusion prediction mode (merge mode), the electronic device may judge the similarity of the motion information of the 5 spatial candidate blocks: if the spatial candidate blocks of the current coding block satisfy the preset judgment condition, no merge index value needs to be encoded and transmitted; if they do not, a target candidate block is determined among the plurality of candidate blocks of the current coding block, a merge index value is determined from the target candidate block, and that merge index value is encoded and transmitted.
The specific judgment condition is related to the video data coding standard corresponding to the current coding block.
In one example, if the video coding standard for the current coding block is the VVC standard, the plurality of spatial candidate blocks of the current coding block may be denoted A0, A1, B0, B1, and B2. If, among the spatial candidate blocks at the five positions A0, A1, B0, B1, and B2, the RI, BI, ID, AI, and HF values of m (m >= N) candidate blocks are all equal and their MVs lie in the same quadrant, it may be determined that the spatial candidate blocks of the current coding block satisfy the preset judgment condition. The electronic device may then set the RI, BI, ID, AI, and HF values of the current coding block equal to those of the m candidate blocks, calculate the average of the MV values of the m candidate blocks, and use the average as the MV value of the current coding block. Here N is a preset threshold determined from the number of spatial candidate blocks of the current coding block; for example, when the current coding block has 5 spatial candidate blocks, the preset threshold may be 4.
In one example, if the video coding standard for the current coding block is the HEVC standard, the plurality of spatial candidate blocks of the current coding block may be denoted A0, A1, B0, B1, and B2. If, among the spatial candidate blocks at the five positions A0, A1, B0, B1, and B2, the RI and ID values of m (m >= N) candidate blocks are all equal and their MVs lie in the same quadrant, it may be determined that the spatial candidate blocks of the current coding block satisfy the preset judgment condition. The electronic device may then set the RI and ID values of the current coding block equal to those of the m candidate blocks, calculate the average of the MV values of the m candidate blocks, and use the average as the MV value of the current coding block. Here N is a preset threshold determined from the number of spatial candidate blocks of the current coding block; for example, when the current coding block has 5 spatial candidate blocks, the preset threshold may be 4.
In a possible implementation, if the inter prediction mode of the current coding block is the merge mode, the inter-frame fusion prediction mode flag value is 1 (which may be recorded as merge_flag = 1). When the electronic device determines that merge_flag equals 1, it examines the current coding block (which may be recorded as the current CU block) and judges whether its spatial candidate blocks satisfy the preset judgment condition. If the electronic device determines that the motion information of m spatial candidate blocks satisfies the preset judgment condition, it may set a target flag value of 1 (which may be recorded as implicit_flag = 1); the electronic device can then determine the target motion information of the current coding block by the method of the above embodiments, and no merge index value needs to be determined or encoded. If the electronic device determines that the motion information of the m spatial candidate blocks does not satisfy the preset judgment condition, it may set a target flag value of 0 (which may be recorded as implicit_flag = 0); the electronic device then determines a target candidate block among the plurality of candidate blocks of the current coding block according to a preset rate-distortion optimization strategy, determines the target index value of the target candidate block among those candidate blocks, encodes the target index value, and performs inter prediction on the current coding block based on the encoded target index value.
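The two-branch flow can be sketched end to end. The field names ('pred', 'mv'), the flag name, and the return shape are illustrative assumptions, not taken from any codec specification:

```python
from collections import Counter

def decide_merge_signalling(candidates, threshold=4):
    # If at least `threshold` spatial candidates agree in prediction data
    # and MV quadrant, set implicit_flag = 1 and derive the current block's
    # MV by averaging that group (no merge index is coded); otherwise set
    # implicit_flag = 0 and fall back to RD-based merge-index selection.
    def quadrant(mv):
        x, y = mv
        return (x > 0, y > 0)
    groups = Counter((c['pred'], quadrant(c['mv'])) for c in candidates)
    if groups and max(groups.values()) >= threshold:
        key = max(groups, key=groups.get)
        mvs = [c['mv'] for c in candidates
               if (c['pred'], quadrant(c['mv'])) == key]
        avg = (sum(x for x, _ in mvs) / len(mvs),
               sum(y for _, y in mvs) / len(mvs))
        return {'implicit_flag': 1, 'pred': key[0], 'mv': avg}
    return {'implicit_flag': 0}  # a merge index must be selected and coded
```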
The inter-frame prediction method of the video data provided by the disclosure has a wide application range, can reduce the code rate consumption of the coding information (such as motion information) of the coding block with high spatial domain similarity in various coding standards, and improves the coding performance.
It should be understood that although the various steps in the flowcharts of fig. 1-6 are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least some of the steps in fig. 1-6 may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, which are not necessarily performed in sequence, but may be performed in turn or alternately with other steps or at least some of the other steps.
Fig. 7 is an apparatus block diagram illustrating an inter prediction apparatus of video data according to an exemplary embodiment. Referring to fig. 7, the inter prediction apparatus 700 of video data includes an acquisition unit 702, a first determination unit 704, and a second determination unit 706.
An obtaining unit 702 configured to perform, in a case where an inter prediction mode of a current coding block of video data is an inter fusion prediction mode, obtaining motion information of a plurality of spatial candidate blocks of the current coding block;
a first determining unit 704 configured to perform determining a target number of target spatial candidate blocks whose motion information satisfies a preset similarity condition, based on motion information of each spatial candidate block;
and a second determining unit 706 configured to determine target motion information of the current coding block according to the motion information of the target spatial domain candidate block and predict the current coding block based on the target motion information of the current coding block, if the target number is greater than or equal to a preset threshold.
In an exemplary embodiment, the motion information includes prediction data and motion vector data, the prediction data being data used in an inter prediction process for the video data;
a first determination unit 704, comprising:
a first determining subunit, configured to perform, for each spatial candidate block of a plurality of spatial candidate blocks of a current coding block, determining a target quadrant corresponding to motion vector data of the spatial candidate block based on a preset correspondence relationship between the motion vector data and the quadrant;
a second determining subunit, configured to perform determining, among the plurality of spatial candidate blocks of the current coding block, target spatial candidate blocks whose prediction data are the same and whose target quadrants are the same, and determining a target number of the target spatial candidate blocks.
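By way of illustration only, the logic of the first determination unit can be sketched as follows. The candidate representation (a pair of prediction data and a motion vector), the similarity key, and the function names are assumptions made for the sketch, not part of the disclosed implementation:

```python
from collections import Counter

def target_candidates(candidates, key):
    """candidates: list of (prediction_data, motion_vector) pairs.
    key: maps a candidate to its similarity signature, e.g. a pair of
    (prediction data, quadrant of the motion vector) as described above.
    Returns the largest group of mutually similar candidates together
    with its size (the 'target number')."""
    # Count how many candidates share each similarity signature.
    groups = Counter(key(c) for c in candidates)
    best_sig, target_number = groups.most_common(1)[0]
    # The target spatial candidate blocks are those in the largest group.
    target = [c for c in candidates if key(c) == best_sig]
    return target, target_number
```

The target number returned here is what is compared against the preset threshold to decide between the fused prediction path and the fallback path.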
In an exemplary embodiment, the motion information includes prediction data and motion vector data, the prediction data being data used in an inter prediction process for the video data;
the second determining unit 706 includes:
a third determining subunit configured to perform taking the prediction data of the target spatial candidate blocks as the prediction data of the current coding block;
a mean subunit configured to perform mean processing on the motion vector data of the target spatial domain candidate blocks, determining the mean as the motion vector data of the current coding block, and obtaining target motion information of the current coding block based on the prediction data of the current coding block and the motion vector data of the current coding block.
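A minimal sketch of the mean subunit's derivation of the target motion information, assuming candidates are (prediction_data, (mvx, mvy)) pairs whose prediction data already agree (the function name and data layout are illustrative assumptions):

```python
def derive_target_motion_info(target):
    """target: non-empty list of (prediction_data, (mvx, mvy)) pairs that
    share the same prediction data. Averages the motion vectors and pairs
    the mean with the shared prediction data."""
    prediction_data = target[0][0]  # shared by all target candidates
    n = len(target)
    # Component-wise mean of the candidate motion vectors.
    mvx = sum(mv[0] for _, mv in target) / n
    mvy = sum(mv[1] for _, mv in target) / n
    return prediction_data, (mvx, mvy)
```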
In an exemplary embodiment, the second determining unit 706 further includes:
a fourth determining subunit configured to perform determining a prediction reference video frame corresponding to the current coding block based on the target motion information of the current coding block, and determining, among a plurality of reference transform blocks included in the prediction reference video frame, a prediction reference transform block and pixel data of the prediction reference transform block, the prediction reference transform block being a data block used for predicting the pixel data of the current coding block;
and a prediction sub-unit configured to perform prediction of pixel data of the current coding block based on pixel data of the prediction reference transform block, resulting in predicted pixel data of the current coding block.
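The lookup of the prediction reference transform block can be sketched as a simple integer-precision motion-compensated copy from the reference frame. This is a heavily simplified stand-in (real codecs use sub-pixel interpolation and boundary padding); the frame layout and names are assumptions:

```python
def motion_compensate(ref_frame, block_x, block_y, size, mv):
    """Copy a size x size block of pixel data from the reference frame at
    the position displaced by the (integer) motion vector, analogous to
    fetching the prediction reference transform block described above.
    ref_frame: 2-D list of pixel values, indexed [row][column]."""
    mvx, mvy = mv
    sx, sy = block_x + mvx, block_y + mvy
    # Slice out the displaced block; assumes it lies inside the frame.
    return [row[sx:sx + size] for row in ref_frame[sy:sy + size]]
```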
In an exemplary embodiment, the motion vector data includes a value corresponding to the first direction vector and a value corresponding to the second direction vector;
the first determining subunit is specifically configured to:
under the condition that the numerical value corresponding to the first direction vector of the spatial candidate block is larger than the target value and the numerical value corresponding to the second direction vector of the spatial candidate block is larger than the target value, determining that the quadrant corresponding to the motion vector data of the spatial candidate block is a first quadrant and determining that the first quadrant is a target quadrant;
and under the condition that the numerical value corresponding to the first direction vector of the spatial candidate block is less than or equal to the target value and the numerical value corresponding to the second direction vector of the spatial candidate block is less than or equal to the target value, determining that the quadrant corresponding to the motion vector data of the spatial candidate block is a third quadrant and determining that the third quadrant is a target quadrant.
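The two quadrant rules above can be written as a small classifier. Note that the text only specifies the first and third quadrants; the handling of mixed-sign vectors here (returning None) is an assumption of this sketch, as is the default target value of zero:

```python
def quadrant(mv, target_value=0):
    """Quadrant rule as described: both components greater than the
    target value -> quadrant 1; both less than or equal -> quadrant 3.
    Mixed-sign vectors are not covered by the described rule, so None
    is returned for them here."""
    x, y = mv
    if x > target_value and y > target_value:
        return 1
    if x <= target_value and y <= target_value:
        return 3
    return None
```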
In an exemplary embodiment, the apparatus 700 for inter-frame prediction of video data further includes:
a third determining unit configured to perform, in a case where the target number is smaller than the preset threshold, determining a target candidate block among a plurality of candidate blocks of the current coding block according to a preset rate-distortion optimization strategy;
and a fourth determining unit configured to perform determining a target index value of the target candidate block among the plurality of candidate blocks of the current coding block, encoding the target index value, and performing inter-frame prediction on the current coding block based on the encoded target index value.
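The fallback path can be sketched as picking the candidate with the lowest rate-distortion cost and signaling its index in the candidate list. The cost function is a stand-in for the encoder's actual RDO metric, which the text does not specify:

```python
def rd_fallback(candidates, rd_cost):
    """When too few candidates agree, select the candidate minimizing a
    rate-distortion cost and return it with its index in the candidate
    list, which would then be entropy-coded into the bitstream."""
    costs = [rd_cost(c) for c in candidates]
    target_index = costs.index(min(costs))
    return candidates[target_index], target_index
```

A key point of the disclosure is that this index signaling is only needed on the fallback path; when enough candidates agree, the decoder can derive the fused motion information the same way the encoder did, saving the index bits.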
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 8 is a block diagram illustrating an electronic device 800 for inter-prediction of video data in accordance with an example embodiment. For example, the electronic device 800 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
Referring to fig. 8, electronic device 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile storage devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, optical disk, or graphene memory.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device 800 is in an operation mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive an external audio signal when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing status assessments of various aspects of the electronic device 800. For example, the sensor assembly 814 may detect the open/closed state of the electronic device 800 and the relative positioning of components, such as the display and keypad of the electronic device 800. The sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a computer-readable storage medium comprising instructions, such as the memory 804 comprising instructions, executable by the processor 820 of the electronic device 800 to perform the above-described method is also provided. For example, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which includes instructions executable by the processor 820 of the electronic device 800 to perform the above-described method.
It should be noted that the descriptions of the above-mentioned apparatus, the electronic device, the computer-readable storage medium, the computer program product, and the like according to the method embodiments may also include other embodiments, and specific implementations may refer to the descriptions of the related method embodiments, which are not described in detail herein.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for inter-prediction of video data, comprising:
under the condition that an inter-frame prediction mode of a current coding block of video data is an inter-frame fusion prediction mode, acquiring motion information of a plurality of spatial domain candidate blocks of the current coding block;
determining the target number of target spatial domain candidate blocks whose motion information satisfies a preset similarity condition based on the motion information of each spatial domain candidate block, wherein the motion information comprises prediction data and motion vector data, and the prediction data is data used in the inter-frame prediction process of the video data;
under the condition that the target number is larger than or equal to a preset threshold value, determining target motion information of the current coding block according to motion information of the target space domain candidate block, and predicting the current coding block based on the target motion information of the current coding block;
the determining the target number of the target spatial candidate blocks of which the motion information meets the preset similar condition based on the motion information of each spatial candidate block comprises the following steps:
for each spatial domain candidate block in a plurality of spatial domain candidate blocks of the current coding block, determining a target quadrant corresponding to motion vector data of the spatial domain candidate block based on a preset corresponding relation between the motion vector data and the quadrant;
determining a target space domain candidate block with the same prediction data and the same target quadrant from a plurality of space domain candidate blocks of the current coding block;
determining a target number of the target spatial candidate blocks.
2. The method of claim 1, wherein the determining the target motion information of the current coding block according to the motion information of the target spatial candidate block comprises:
taking the prediction data of the target space domain candidate block as the prediction data of the current coding block;
calculating the mean value of the motion vector data of the target space domain candidate blocks, and determining the mean value as the motion vector data of the current coding block;
and obtaining target motion information of the current coding block based on the prediction data of the current coding block and the motion vector data of the current coding block.
3. The method of claim 1 or 2, wherein the predicting the current coding block based on the target motion information of the current coding block comprises:
determining a prediction reference video frame corresponding to the current coding block based on the target motion information of the current coding block, and determining a prediction reference transformation block and pixel data of the prediction reference transformation block from a plurality of reference transformation blocks contained in the prediction reference video frame, wherein the prediction reference transformation block is a data block used for predicting the pixel data of the current coding block;
and predicting the pixel data of the current coding block based on the pixel data of the prediction reference transformation block to obtain the predicted pixel data of the current coding block.
4. The method of claim 1, wherein the motion vector data comprises a value corresponding to a first direction vector and a value corresponding to a second direction vector;
the determining a target quadrant corresponding to the motion vector data of the spatial candidate block based on the corresponding relationship between the preset motion vector data and the quadrant includes:
and under the condition that the numerical value corresponding to the first direction vector of the spatial domain candidate block is larger than the target value and the numerical value corresponding to the second direction vector of the spatial domain candidate block is larger than the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is a first quadrant and determining that the first quadrant is a target quadrant.
5. The method of inter-prediction of video data according to claim 4, further comprising:
and under the condition that the numerical value corresponding to the first direction vector of the spatial domain candidate block is less than or equal to the target value and the numerical value corresponding to the second direction vector of the spatial domain candidate block is less than or equal to the target value, determining that the quadrant corresponding to the motion vector data of the spatial domain candidate block is a third quadrant and determining that the third quadrant is a target quadrant.
6. The method of claim 1 or 2, wherein the method further comprises:
under the condition that the target number is smaller than the preset threshold value, determining a target candidate block in a plurality of candidate blocks of the current coding block according to a preset rate-distortion optimization strategy;
determining target index values of the target candidate blocks in a plurality of candidate blocks of the current coding block, coding the target index values, and performing inter-frame prediction on the current coding block based on the coded target index values.
7. An apparatus for inter-prediction of video data, comprising:
an acquisition unit configured to perform acquisition of motion information of a plurality of spatial domain candidate blocks of a current coding block of video data in a case where an inter-frame prediction mode of the current coding block is an inter-frame fusion prediction mode;
a first determination unit configured to perform determination of a target number of target spatial candidate blocks whose motion information satisfies a preset similarity condition based on motion information of each of the spatial candidate blocks, the motion information including prediction data and motion vector data, the prediction data being data used in inter prediction of video data;
a second determining unit, configured to determine target motion information of the current coding block according to motion information of the target spatial domain candidate block when the target number is greater than or equal to a preset threshold, and predict the current coding block based on the target motion information of the current coding block;
the first determination unit includes:
a first determining subunit, configured to perform, for each of a plurality of spatial candidate blocks of the current coding block, determining, based on a preset correspondence relationship between motion vector data and quadrants, a target quadrant corresponding to the motion vector data of the spatial candidate block;
a second determining subunit, configured to perform determining, among the plurality of spatial candidate blocks of the current coding block, a target spatial candidate block whose prediction data are the same and whose target quadrant is the same;
determining a target number of the target spatial candidate blocks.
8. The apparatus of claim 7, wherein the second determining unit comprises:
a third determining subunit configured to perform, as prediction data of the current coding block, prediction data of a target spatial candidate block;
the mean subunit is configured to perform mean processing on the motion vector data of the target spatial domain candidate blocks, and determine that the mean is the motion vector data of the current coding block; and obtaining target motion information of the current coding block based on the prediction data of the current coding block and the motion vector data of the current coding block.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the method of inter-prediction of video data according to any one of claims 1 to 6.
10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of inter-prediction of video data of any of claims 1-6.
CN202211194847.9A 2022-09-29 2022-09-29 Inter-frame prediction method and device of video data, electronic equipment and storage medium Active CN115297333B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211194847.9A CN115297333B (en) 2022-09-29 2022-09-29 Inter-frame prediction method and device of video data, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN115297333A CN115297333A (en) 2022-11-04
CN115297333B true CN115297333B (en) 2023-03-24

Family

ID=83834304


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116456100B (en) * 2023-06-16 2023-08-29 深流微智能科技(深圳)有限公司 Inter-frame coding tree unit division method and device

Citations (1)

Publication number Priority date Publication date Assignee Title
CN113507609A (en) * 2021-09-09 2021-10-15 康达洲际医疗器械有限公司 Interframe image parallel coding method based on time-space domain prediction

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
CN100473168C (en) * 2005-06-21 2009-03-25 中国科学院计算技术研究所 Motion vector space prediction method for video coding
US11736712B2 (en) * 2016-05-28 2023-08-22 Industry Academy Cooperation Foundation Of Sejong University Method and apparatus for encoding or decoding video signal
TWI731358B (en) * 2018-06-29 2021-06-21 大陸商北京字節跳動網絡技術有限公司 Improved tmvp derivation
SG11202012701XA (en) * 2018-07-02 2021-01-28 Huawei Tech Co Ltd Motion vector prediction method and apparatus, encoder, and decoder
US20220038734A1 (en) * 2018-09-20 2022-02-03 Lg Electronics Inc. Method and device for processing image signal
JP6933235B2 (en) * 2018-12-13 2021-09-08 株式会社Jvcケンウッド Image decoding device, image decoding method, and image decoding program
CN110213588B (en) * 2019-06-25 2021-07-13 浙江大华技术股份有限公司 Spatial domain candidate motion information acquisition method and device, coder-decoder and storage device
CN112203095B (en) * 2020-12-04 2021-03-09 腾讯科技(深圳)有限公司 Video motion estimation method, device, equipment and computer readable storage medium




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant