CN110839155A - Method and device for motion estimation, electronic equipment and computer-readable storage medium - Google Patents


Info

Publication number
CN110839155A
CN110839155A (application number CN201810940267.7A)
Authority
CN
China
Prior art keywords
reference frame
candidate
predicted
matching block
candidate reference
Prior art date
Legal status
Granted
Application number
CN201810940267.7A
Other languages
Chinese (zh)
Other versions
CN110839155B (en)
Inventor
范娟婷
樊鸿飞
Current Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Original Assignee
Beijing Kingsoft Cloud Network Technology Co Ltd
Beijing Kingsoft Cloud Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Kingsoft Cloud Network Technology Co Ltd and Beijing Kingsoft Cloud Technology Co Ltd
Priority: CN201810940267.7A (granted as CN110839155B)
Priority: PCT/CN2019/100236 (published as WO2020034921A1)
Publication of CN110839155A
Application granted; publication of CN110839155B
Current legal status: Active

Classifications

    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals (CPC section H: Electricity; class H04: Electric communication technique; subclass H04N: Pictorial communication, e.g. television), in particular:
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/172: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a picture, frame or field
    • H04N19/176: Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/182: Adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/567: Motion estimation based on rate distortion criteria
    • H04N19/587: Predictive coding involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the invention provide a motion estimation method and device, an electronic device, and a computer-readable storage medium. A reference frame to be predicted is obtained from the candidate reference frame set corresponding to a target prediction unit, the reference frame to be predicted being a candidate reference frame in the set that satisfies a first preset condition. A pixel search is performed on the reference frame to be predicted according to the pixel search rule corresponding to that frame to obtain candidate matching blocks, and the candidate matching block with the smallest rate-distortion cost is determined to be the best matching block of the target prediction unit. With this processing, a pixel search is performed only on the candidate reference frames that satisfy the first preset condition; it is not necessary to traverse all candidate reference frames and perform a pixel search on each one, so the coding efficiency of the video is improved.

Description

Method and device for motion estimation, electronic equipment and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for motion estimation, an electronic device, and a computer-readable storage medium.
Background
With the rapid development of computer network technology, transmitted video must be encoded to reduce the bandwidth and storage space it occupies. When encoding a video, each video frame may be divided into a plurality of image blocks, also referred to as coding blocks. To encode a coding block, the block first needs to be predicted, for which it may be divided into a plurality of prediction units. Predicting a coding block may include intra-frame prediction and inter-frame prediction, where inter-frame prediction searches a reference frame for an image block similar to the prediction unit, called a matching block. Inter-frame prediction may include motion estimation and motion compensation: motion estimation is the process of searching each candidate reference frame in the candidate reference frame set for the matching block with the smallest rate-distortion cost; that matching block is the best matching block of the prediction unit, and the reference frame containing it is the best reference frame.
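To make the ranking criterion concrete, the following minimal Python sketch (not part of the patent) computes a rate-distortion cost for one candidate block; the SAD distortion measure and the Lagrangian form J = D + lambda * R are standard assumptions, not details taken from this document:

```python
import numpy as np

def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    """Sum of absolute differences: a common distortion measure D."""
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())

def rd_cost(distortion: int, motion_bits: int, lam: float) -> float:
    """Lagrangian rate-distortion cost J = D + lambda * R, where R is the
    number of bits needed to signal the motion information."""
    return distortion + lam * motion_bits
```

The matching block that minimizes J over all searched positions, and over all searched reference frames, is the best matching block.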
In the related art, the candidate reference frames are processed in the order in which they are arranged in the candidate reference frame set: for each candidate reference frame, the matching block with the smallest rate-distortion cost in that frame (which may be referred to as the preferred matching block) is obtained according to a preset pixel search rule. Then, among the preferred matching blocks of all candidate reference frames, the one with the smallest rate-distortion cost is determined to be the best matching block of the prediction unit.
Therefore, in the related art, all candidate reference frames must be traversed and a pixel search performed on each of them to determine the best matching block of the prediction unit, which makes motion estimation highly complex and in turn reduces the coding efficiency of the video.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, an electronic device, and a computer-readable storage medium for motion estimation, which can improve video encoding efficiency. The specific technical solutions are as follows:
In a first aspect, to achieve the above object, an embodiment of the present invention discloses a method of motion estimation, including: acquiring a reference frame to be predicted from a candidate reference frame set corresponding to a target prediction unit, where the reference frame to be predicted is a candidate reference frame in the set that satisfies a first preset condition; performing a pixel search on the reference frame to be predicted according to the pixel search rule corresponding to that frame to obtain candidate matching blocks; and determining the candidate matching block with the smallest rate-distortion cost to be the best matching block of the target prediction unit.
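The claimed flow of the first aspect can be sketched as follows. This is an illustrative outline only; `meets_first_condition` and `pixel_search` are hypothetical callables standing in for the condition test and the per-frame pixel search:

```python
def motion_estimate(candidate_frames, meets_first_condition, pixel_search):
    """Search only the candidate reference frames that satisfy the first
    preset condition, then keep the candidate matching block with the
    smallest rate-distortion cost as the best matching block."""
    frames_to_predict = [f for f in candidate_frames if meets_first_condition(f)]
    candidate_blocks = [pixel_search(f) for f in frames_to_predict]
    return min(candidate_blocks, key=lambda block: block["rd_cost"])
```

Note that the second frame below is never searched, which is exactly where the claimed saving over full traversal comes from.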
Optionally, the first preset condition includes at least one of: the number of reference frames in the alternative reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set; the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, and the candidate reference frame is in the alternative reference frame set; the candidate reference frame is in the alternative reference frame set; the candidate reference frame is in the alternative reference frame set and is not a designated reference frame in the candidate reference frame set. The reference frames included in the alternative reference frame set are the reference frames in which the minimum-rate-distortion-cost matching blocks obtained by inter-frame prediction of image blocks are located, the image blocks being image blocks that satisfy a preset adjacency condition with the target prediction unit.
Optionally, the method further includes: skipping motion estimation for a candidate reference frame if the candidate reference frame satisfies a second preset condition, where the second preset condition includes at least one of: the number of reference frames in the alternative reference frame set is greater than the first preset threshold, and the candidate reference frame is not in the alternative reference frame set; the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, the candidate reference frame is not a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the alternative reference frame set. The reference frames included in the alternative reference frame set are the reference frames in which the minimum-rate-distortion-cost matching blocks obtained by inter-frame prediction of image blocks are located, the image blocks being image blocks that satisfy a preset adjacency condition with the target prediction unit.
Optionally, performing the pixel search on the reference frame to be predicted according to its corresponding pixel search rule to obtain a candidate matching block includes at least one of: if the reference frame to be predicted is not in the alternative reference frame set, performing an integer-pixel search on the reference frame to be predicted to obtain a matching block in that frame as a candidate matching block; and, if the reference frame to be predicted is in the alternative reference frame set, performing both an integer-pixel search and a sub-pixel search on the reference frame to be predicted to obtain a matching block in that frame as a candidate matching block.
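The two optional search rules above can be expressed as a small dispatcher; `integer_search` and `subpel_refine` are hypothetical helpers, not functions defined by the patent:

```python
def search_reference_frame(frame, in_alternative_set, integer_search, subpel_refine):
    """Frames outside the alternative reference frame set get only an
    integer-pixel search; frames inside it additionally get sub-pixel
    refinement, yielding a more precise candidate matching block."""
    block = integer_search(frame)
    if in_alternative_set:
        block = subpel_refine(frame, block)
    return block
```

The design intent, as described in the text, is to spend the more expensive sub-pixel search only on frames that neighbouring blocks have already found useful.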
Optionally, when there are multiple reference frames to be predicted, performing the pixel search on a first reference frame to be predicted among them includes: performing a pixel search on the first reference frame to be predicted according to its corresponding pixel search rule to obtain a first candidate matching block. After the first candidate matching block is obtained, the method further includes: when the first candidate matching block is determined to be the matching block with the smallest rate-distortion cost among the candidate matching blocks obtained so far, and its rate-distortion cost is smaller than a second preset threshold, updating the pixel search rule of each reference frame after the first reference frame to be predicted, in the arrangement order of the candidate reference frame set, to integer-pixel search.
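The optional rule above amounts to downgrading the search precision of the remaining frames once a sufficiently good block has been found. A minimal sketch, in which the rule names and dictionary layout are illustrative assumptions:

```python
def update_search_rules(rules, frame_order, current_index,
                        is_current_best, best_cost, threshold):
    """After searching frame_order[current_index]: if its candidate block is
    the current best and its rate-distortion cost is below the second preset
    threshold, later frames only need an integer-pixel search."""
    if is_current_best and best_cost < threshold:
        for frame in frame_order[current_index + 1:]:
            rules[frame] = "integer"
    return rules
```

Only frames after the current one in the arrangement order are downgraded; frames already searched keep the rule they were searched with.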
In a second aspect, to achieve the above object, an embodiment of the present invention further discloses an apparatus for motion estimation, including: an acquisition module configured to acquire a reference frame to be predicted from the candidate reference frame set corresponding to a target prediction unit, where the reference frame to be predicted is a candidate reference frame in the set that satisfies a first preset condition; a first processing module configured to perform a pixel search on the reference frame to be predicted according to the pixel search rule corresponding to that frame to obtain candidate matching blocks; and a determining module configured to determine the candidate matching block with the smallest rate-distortion cost to be the best matching block of the target prediction unit.
Optionally, the first preset condition includes at least one of: the number of reference frames in the alternative reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set; the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, and the candidate reference frame is in the alternative reference frame set; the candidate reference frame is in the alternative reference frame set; the candidate reference frame is in the alternative reference frame set and is not a designated reference frame in the candidate reference frame set. The reference frames included in the alternative reference frame set are the reference frames in which the minimum-rate-distortion-cost matching blocks obtained by inter-frame prediction of image blocks are located, the image blocks being image blocks that satisfy a preset adjacency condition with the target prediction unit.
Optionally, the apparatus further includes a second processing module configured to skip motion estimation for a candidate reference frame if the candidate reference frame satisfies a second preset condition, where the second preset condition includes at least one of: the number of reference frames in the alternative reference frame set is greater than the first preset threshold, and the candidate reference frame is not in the alternative reference frame set; the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, the candidate reference frame is not a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the alternative reference frame set. The reference frames included in the alternative reference frame set are the reference frames in which the minimum-rate-distortion-cost matching blocks obtained by inter-frame prediction of image blocks are located, the image blocks being image blocks that satisfy a preset adjacency condition with the target prediction unit.
Optionally, the first processing module is specifically configured to: if the reference frame to be predicted is not in the alternative reference frame set, perform an integer-pixel search on the reference frame to be predicted to obtain a matching block in that frame as a candidate matching block; and/or, if the reference frame to be predicted is in the alternative reference frame set, perform both an integer-pixel search and a sub-pixel search on the reference frame to be predicted to obtain a matching block in that frame as a candidate matching block.
Optionally, when there are multiple reference frames to be predicted, for a first reference frame to be predicted among them, the first processing module is configured to perform a pixel search on the first reference frame to be predicted according to its corresponding pixel search rule to obtain a first candidate matching block. The apparatus further includes a third processing module configured to, when the first candidate matching block is determined to be the matching block with the smallest rate-distortion cost among the candidate matching blocks obtained so far and its rate-distortion cost is smaller than a second preset threshold, update the pixel search rule of each reference frame after the first reference frame to be predicted, in the arrangement order of the candidate reference frame set, to integer-pixel search.
In a third aspect, to achieve the above object, an embodiment of the present invention further discloses an electronic device, where the electronic device includes a memory and a processor; the memory is used for storing a computer program; the processor is configured to implement the method steps of motion estimation according to the first aspect when executing a program stored in the memory.
In a fourth aspect, to achieve the above object, an embodiment of the present invention further discloses a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the method steps of motion estimation as described in the first aspect above are implemented.
In a fifth aspect, to achieve the above object, an embodiment of the present invention further discloses a computer program product containing instructions, which when run on a computer, causes the computer to perform the method steps of motion estimation described in the first aspect.
With the motion estimation method and device, electronic device, and computer-readable storage medium of the embodiments, a reference frame to be predicted that satisfies the first preset condition is obtained from the candidate reference frame set corresponding to a target prediction unit, a pixel search is performed on it according to its corresponding pixel search rule to obtain candidate matching blocks, and the candidate matching block with the smallest rate-distortion cost is determined to be the best matching block of the target prediction unit. With this processing, a pixel search is performed only on the candidate reference frames that satisfy the first preset condition. Compared with the related art, in which all candidate reference frames must be traversed and a pixel search performed on each, this reduces the complexity of video coding, saves coding time, and improves the coding efficiency of the video.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the present invention; other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
FIG. 1 is a flow chart of a method of motion estimation in the prior art;
FIG. 2 is a diagram illustrating obtaining a matching block of a prediction unit according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method of motion estimation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of dividing a coding block according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of coding blocks at different levels according to an embodiment of the present invention;
FIG. 6 is a diagram of spatial neighboring blocks according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the searching process of a pixel search according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating a sub-pixel search according to an embodiment of the present invention;
FIG. 9 is a flowchart of an example of a method of motion estimation according to an embodiment of the present invention;
FIG. 10 is a block diagram of an apparatus for motion estimation according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments herein without creative effort fall within the protection scope of the present invention.
First, for comparison with the method of motion estimation provided by the embodiments of the present invention, the prior-art method of motion estimation is described.
Referring to fig. 1, fig. 1 is a flow chart of a method for motion estimation in the prior art.
In the prior art, a candidate reference frame set corresponding to a target prediction unit is first acquired (S101).
Then, according to a preset precision requirement, a pixel search is performed on each candidate reference frame in the candidate reference frame set to obtain a matching block in that candidate reference frame (S102). FIG. 2 exemplarily shows a schematic diagram of obtaining the matching block of a prediction unit according to an embodiment of the present invention. Pn, Pn-1, Pn-2, Pn-3 and Pn-4 denote the sequence numbers of the 5 video frames in FIG. 2: Pn is the video frame currently to be encoded, the image block in Pn is the prediction unit, and Pn-1, Pn-2, Pn-3 and Pn-4 are candidate reference frames. As shown in FIG. 2, the best matching block of the prediction unit of the current video frame Pn is searched for in the candidate reference frames Pn-1, Pn-2, Pn-3 and Pn-4.
The matching block with the smallest rate-distortion cost among the matching blocks of the candidate reference frames is determined to be the best matching block (S103). That is, in the prior art, a pixel search must be performed on every candidate reference frame in the candidate reference frame set to obtain a matching block in each frame, and the matching block with the smallest rate-distortion cost is then selected from those matching blocks as the best matching block of the target prediction unit.
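To make the cost of this per-frame search concrete, the following is a minimal exhaustive integer-pixel search over one reference frame, assuming SAD as the distortion measure; the window size and array layout are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def full_search(cur_block, ref_frame, center, radius):
    """Exhaustively test every integer displacement in a
    (2*radius+1) x (2*radius+1) window around `center` and return the
    (sad, (dy, dx)) pair with the smallest SAD."""
    h, w = cur_block.shape
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = center[0] + dy, center[1] + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block would fall outside the frame
            cand = ref_frame[y:y + h, x:x + w]
            cost = int(np.abs(cand.astype(np.int64) - cur_block.astype(np.int64)).sum())
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The prior art repeats such a search for every candidate reference frame, which is exactly the traversal the embodiments below avoid.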
The inventor found that the prior art ignores the correlation of image content and performs a pixel search on all candidate reference frames, which involves redundant operations, makes motion estimation highly complex, and in turn reduces the coding efficiency of the video.
In view of the above, the present invention provides a method of motion estimation that can be applied to an electronic device for encoding video. Building on the prior art, the electronic device may acquire, from the candidate reference frame set corresponding to a target prediction unit, a reference frame to be predicted, i.e., a candidate reference frame in the set that satisfies a first preset condition; perform a pixel search on the reference frame to be predicted according to its corresponding pixel search rule to obtain candidate matching blocks; and determine the candidate matching block with the smallest rate-distortion cost to be the best matching block of the target prediction unit. The electronic device performs a pixel search only on the candidate reference frames that satisfy the first preset condition, rather than traversing all candidate reference frames and searching each one, so the coding efficiency of the video can be improved.
The present invention will be described in detail with reference to specific examples.
Fig. 3 is a flowchart of a method for motion estimation according to an embodiment of the present invention, where the method may include the following steps:
s301: and acquiring a reference frame to be predicted in the candidate reference frame set corresponding to the target prediction unit.
The reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set. The first preset condition may be preset empirically by a technician.
Optionally, the first preset condition includes at least one of:
First, the number of reference frames in the alternative reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set.
Second, the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, and the candidate reference frame is in the alternative reference frame set.
Third, the candidate reference frame is in the alternative reference frame set.
Fourth, the candidate reference frame is in the alternative reference frame set, and the candidate reference frame is not a designated reference frame in the candidate reference frame set.
Here, the reference frames included in the alternative reference frame set are the reference frames in which the minimum-rate-distortion-cost matching blocks obtained by inter-frame prediction of image blocks are located, the image blocks being image blocks that satisfy a preset adjacency condition with the target prediction unit. It can be seen that as long as a candidate reference frame is in the alternative reference frame set, the electronic device may use it as a reference frame to be predicted; the reference frames to be predicted thus include image frames that have already been referred to in inter-frame prediction, which further improves their accuracy.
In addition, the reference frames in the alternative reference frame set are all reference frames of image blocks that satisfy the preset neighboring condition with the target prediction unit, so these image blocks have a certain correlation in image content with the target prediction unit. In other words, the method exploits the correlation of image content: the candidate reference frames are screened according to the reference frames of neighboring image blocks, which further improves the coding performance of the video.
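As a minimal illustration of how such an alternative reference frame set could be assembled, the sketch below collects the best reference frame of each neighboring block; the dict-based block model and all names are assumptions for illustration, not from the patent.

```python
def build_alternative_set(neighbor_blocks):
    """Collect the best inter-prediction reference frame of each neighbor.

    Each neighbor is modeled as a dict whose 'best_ref' key holds the index
    of the reference frame containing its minimum rate-distortion-cost
    matching block; None marks a block that was not inter-predicted.
    """
    alternative = []
    for block in neighbor_blocks:
        ref = block.get("best_ref")
        if ref is not None and ref not in alternative:
            alternative.append(ref)  # keep first-seen order, drop duplicates
    return alternative

neighbors = [
    {"best_ref": 0},     # e.g. the upper-layer coding block
    {"best_ref": 2},     # e.g. a lower-layer coding block
    {"best_ref": 0},     # e.g. a spatial neighbor referencing frame 0 again
    {"best_ref": None},  # an intra-coded neighbor contributes nothing
]
print(build_alternative_set(neighbors))  # → [0, 2]
```

Whether duplicates are dropped or the set is ordered by rate-distortion cost is an encoder design choice the patent text leaves open.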
The image blocks satisfying the preset neighbor condition with the target prediction unit may include: the upper layer coding block of the coding block where the target prediction unit is located, the lower layer coding block of the coding block where the target prediction unit is located, and the spatial adjacent block of the target prediction unit.
The target prediction unit is obtained by partitioning the coding block where it is located. Specifically, referring to fig. 4, fig. 4 is a schematic diagram of partitioning a coding block according to an embodiment of the present invention. The left side of fig. 4 is a 2N × 2N coding block (N is an integer greater than 1), which can be partitioned according to the eight partition modes shown on the right to obtain the corresponding prediction units.
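Assuming the eight partitionings follow the HEVC-style inter prediction-unit modes (an assumption; the figure itself is not reproduced here), the prediction-unit rectangles each mode produces can be sketched as:

```python
def partition_modes(two_n):
    """(width, height) of each prediction unit, per assumed partition mode."""
    n = two_n // 2
    q = two_n // 4  # quarter size used by the asymmetric modes
    return {
        "2Nx2N": [(two_n, two_n)],
        "2NxN":  [(two_n, n)] * 2,
        "Nx2N":  [(n, two_n)] * 2,
        "NxN":   [(n, n)] * 4,
        "2NxnU": [(two_n, q), (two_n, two_n - q)],
        "2NxnD": [(two_n, two_n - q), (two_n, q)],
        "nLx2N": [(q, two_n), (two_n - q, two_n)],
        "nRx2N": [(two_n - q, two_n), (q, two_n)],
    }

modes = partition_modes(64)
# every mode tiles the full 64 x 64 coding block exactly
assert all(sum(w * h for w, h in pus) == 64 * 64 for pus in modes.values())
```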
Referring to fig. 5, fig. 5 is a schematic diagram of coding blocks at different levels according to an embodiment of the present invention. When a video frame is encoded, it is first divided into Coding Tree Units (CTUs) of equal size, which then serve as the basic units for encoding. The size of a coding tree unit is generally 64 × 64, and during encoding a coding tree unit can be further divided into coding blocks of different sizes. In fig. 5, a 64 × 64 block represents a coding tree unit, 64 pixels wide and 64 pixels high, obtained by dividing a video frame. A 64 × 64 coding tree unit can be coded as a single 64 × 64 coding block, or divided into 4 equal-sized 32 × 32 coding blocks, each coded separately. According to the rate-distortion criterion, for each 64 × 64 coding block, the rate-distortion cost of the whole 64 × 64 coding block is compared with the sum of the rate-distortion costs of the 4 32 × 32 coding blocks, and the partitioning with the smaller rate-distortion cost is selected. Each 32 × 32 coding block can in turn be divided into 4 equal-sized 16 × 16 coding blocks, that is, for each 32 × 32 coding block, its rate-distortion cost is compared with the sum of the rate-distortion costs of the 4 16 × 16 coding blocks, and the partitioning with the smaller cost is selected. Similarly, each 16 × 16 coding block may be further divided into 4 equal-sized 8 × 8 coding blocks, and whether a 16 × 16 coding block needs further division is determined by comparing its rate-distortion cost with the sum of the rate-distortion costs of the 4 8 × 8 coding blocks.
It should be noted that the end condition of the division may be a default or may be user-defined; for example, the division may be set to end once 8 × 8 coding blocks are reached, but the end condition is not limited thereto.
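The rate-distortion split decision described above is naturally recursive; the sketch below mirrors it with a hypothetical stand-in cost function (a real encoder would measure the rate-distortion cost by actually coding each block):

```python
def decide_split(x, y, size, rd_cost, min_size=8):
    """Return (best_cost, tree) for the block at (x, y).

    tree is the block size if the block is kept whole, or a list of four
    child trees if splitting has the smaller rate-distortion cost.
    """
    whole = rd_cost(x, y, size)
    if size <= min_size:               # end condition, e.g. 8x8 blocks
        return whole, size
    half = size // 2
    children = [decide_split(x + dx, y + dy, half, rd_cost, min_size)
                for dy in (0, half) for dx in (0, half)]
    split_cost = sum(c for c, _ in children)
    if split_cost < whole:             # splitting wins the RD comparison
        return split_cost, [t for _, t in children]
    return whole, size

# toy cost model: blocks up to 32x32 are cheap in the top-left corner only
cost = lambda x, y, s: s if (x, y) == (0, 0) and s <= 32 else s * s
best, tree = decide_split(0, 0, 64, cost)
# the cheap top-left child makes splitting the 64x64 block once worthwhile
```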
Specifically, if the CTU is divided into 32 × 32 blocks, the four 32 × 32 coding blocks indicated in fig. 5 are obtained; in this case, the four 32 × 32 coding blocks are referred to as the lower-layer coding blocks of the 64 × 64 coding block in the figure, and correspondingly, the 64 × 64 coding block is referred to as the upper-layer coding block of the four 32 × 32 coding blocks. By analogy, each of the four 16 × 16 coding blocks obtained by dividing a 32 × 32 coding block is a lower-layer coding block of that 32 × 32 coding block, and the 32 × 32 coding block is the upper-layer coding block of those four 16 × 16 coding blocks; each of the four 8 × 8 coding blocks obtained by dividing a 16 × 16 coding block is a lower-layer coding block of that 16 × 16 coding block, and the 16 × 16 coding block is the upper-layer coding block of those four 8 × 8 coding blocks. The relation between an upper-layer coding block and its lower-layer coding blocks is also called an adjacent-layer relation; it exists only between two adjacent layers, and no upper/lower-layer relation is considered when another layer lies between the two. As can be seen from the above, a coding block of a given size may have only lower-layer coding blocks (for example, a 64 × 64 coding block has only lower-layer coding blocks), only an upper-layer coding block (for example, an 8 × 8 coding block has only an upper-layer coding block), or both (for example, 32 × 32 and 16 × 16 coding blocks have both upper-layer and lower-layer coding blocks).
When a video frame is actually coded, the coding tree units in the video frame are divided to obtain coding blocks, proceeding from upper-layer coding blocks to lower-layer coding blocks, from lower-layer coding blocks to upper-layer coding blocks, or starting from a middle layer; the coding blocks are then predicted to obtain prediction units.
Referring to fig. 6, fig. 6 is a schematic diagram of spatial neighboring blocks according to an embodiment of the present invention. Where C is the current prediction unit, and the prediction unit a0, the prediction unit a1, the prediction unit B0, the prediction unit B1, and the prediction unit B2 are prediction units in the same video frame as the current prediction unit C and in neighboring positions to the current prediction unit C. The prediction unit A0, the prediction unit A1, the prediction unit B0, the prediction unit B1, and the prediction unit B2 are referred to as spatial neighboring blocks of the current prediction unit C.
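Assuming the five neighbors follow the standard HEVC/AMVP layout (the figure is consistent with that layout but gives no coordinates, so this is an assumption), their positions relative to a prediction unit at (x, y) with size w × h would be:

```python
def spatial_neighbors(x, y, w, h):
    """Top-left sample of each spatial neighbor of a PU at (x, y), size w x h."""
    return {
        "A0": (x - 1, y + h),      # below the bottom-left corner
        "A1": (x - 1, y + h - 1),  # left of the bottom-left corner
        "B0": (x + w, y - 1),      # above the top-right corner
        "B1": (x + w - 1, y - 1),  # above the right end of the top edge
        "B2": (x - 1, y - 1),      # above the top-left corner
    }

print(spatial_neighbors(64, 64, 16, 16))
```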
The first preset threshold may be set empirically by a technician and is typically greater than 1. When the first preset condition involves the first preset threshold, the smaller the threshold, the smaller the probability of meeting the first preset condition and the smaller the number of reference frames to be predicted; coding efficiency is correspondingly higher, but coding performance is reduced. The first preset threshold may therefore be determined by weighing coding efficiency against coding performance.
The designated reference frame in the candidate reference frame set is typically a reference frame located after a preset position in the candidate reference frame set. The preset position may be set empirically by a technician. For example, if the preset position is the position of the second candidate reference frame in the candidate reference frame set, the designated reference frames are the third through the last candidate reference frames; if the preset position is the position of the fourth candidate reference frame, the designated reference frames are the fifth through the last. The later the preset position, the greater the probability of meeting the first preset condition and the greater the number of reference frames to be predicted; coding efficiency is correspondingly lower, but coding performance improves, so the preset position may likewise be determined by weighing coding efficiency against coding performance.
In an implementation, for each candidate reference frame, the electronic device may determine whether the candidate reference frame satisfies any one of the four conditions described above. If it does, the electronic device may determine that the candidate reference frame satisfies the first preset condition, that is, the electronic device may determine the candidate reference frame to be a reference frame to be predicted for subsequent processing.
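A compact sketch of that per-candidate test; the variant numbers match the four alternatives listed above, while the function name, set arguments, and example values are illustrative only:

```python
def meets_first_condition(ref, alternative_set, designated, threshold, variant):
    """True if `ref` qualifies as a reference frame to be predicted."""
    in_alt = ref in alternative_set
    small = len(alternative_set) <= threshold
    if variant == 1:   # small alternative set, frame not designated
        return small and ref not in designated
    if variant == 2:   # small alternative set, frame present in it
        return small and in_alt
    if variant == 3:   # frame present in the alternative set
        return in_alt
    if variant == 4:   # present in the alternative set and not designated
        return in_alt and ref not in designated
    raise ValueError("variant must be 1..4")

alt = {0, 2}
designated = {3, 4}  # e.g. every candidate after the preset position
assert meets_first_condition(2, alt, designated, threshold=3, variant=3)
assert not meets_first_condition(3, alt, designated, threshold=3, variant=1)
```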
S302: and carrying out pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block.
The pixel search rule may be set by a technician according to service requirements, for example, performing only an integer pixel search on the reference frame to be predicted, or performing a sub-pixel search on the basis of the integer pixel search. The sub-pixel search may include a one-half pixel search, a one-quarter pixel search, and a one-eighth pixel search.
Fig. 7 is a schematic diagram of the search process of the pixel search according to an embodiment of the present invention. The search range is (2d+1+M) × (2d+1+N), where the filled block is the prediction unit, the blank block is a matching block of the prediction unit, M is the width of the prediction unit, N is the height of the prediction unit, and d is the size of the search window. With the upper-left corner of the prediction unit at coordinates (k, l) and the upper-left corner of the matching block at (k+u, l+v), the motion vector (u, v) is obtained.
Referring to fig. 8, fig. 8 is a schematic diagram of the sub-pixel search according to an embodiment of the present invention. Solid dots represent integer pixel points, and the solid dot at the center is the best matching point found by the integer pixel search. Before the sub-pixel search, sub-pixel points are interpolated: the hollow dots are half-pixel points. Centered on the central solid dot, a full search (an exhaustive search of the search area, that is, traversing every pixel point in the search range) is performed over the eight surrounding half-pixel points, and the half-pixel point with the minimum rate-distortion cost is selected as the best matching point (here, the half-pixel point at the upper-right corner). If quarter-pixel search is supported, quarter-pixel points (solid triangles) are first interpolated around the current best matching point; centered on it, a full search is performed over the eight surrounding quarter-pixel points, and the quarter-pixel point with the minimum rate-distortion cost is selected as the best matching point. Here, the quarter-pixel point at the lower-right corner, the hollow triangle in the figure, is the best matching point obtained by the sub-pixel search, that is, it determines the candidate matching block in the current reference frame.
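To make the integer stage of this search concrete, the toy full search below uses SAD (sum of absolute differences) as a stand-in for the rate-distortion cost, which the patent does not prescribe; the sub-pixel refinement of fig. 8 would repeat the same loop over interpolated samples around the winner.

```python
def full_search(cur, ref, k, l, m, n, d):
    """Best motion vector (u, v) for the m x n block of `cur` at (k, l)."""
    def sad(u, v):
        # sum of absolute differences between the current block and the
        # candidate matching block displaced by (u, v) in the reference
        return sum(abs(cur[l + y][k + x] - ref[l + v + y][k + u + x])
                   for y in range(n) for x in range(m))
    return min(((u, v) for v in range(-d, d + 1) for u in range(-d, d + 1)),
               key=lambda uv: sad(*uv))

# reference frame with a bright 2x2 patch; the current frame shows the
# same patch shifted by (1, 1), so the motion vector points back by (-1, -1)
ref = [[0] * 8 for _ in range(8)]
ref[3][3] = ref[3][4] = ref[4][3] = ref[4][4] = 9
cur = [[0] * 8 for _ in range(8)]
cur[4][4] = cur[4][5] = cur[5][4] = cur[5][5] = 9
print(full_search(cur, ref, 4, 4, 2, 2, 2))  # → (-1, -1)
```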
In implementation, the electronic device may perform a pixel search on the reference frame to be predicted according to the pixel search rule corresponding to it, obtaining a matching block corresponding to the target prediction unit in the reference frame to be predicted, which serves as a candidate matching block. Since there are usually multiple reference frames to be predicted, there are usually multiple corresponding candidate matching blocks.
S303: and determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
In implementation, the electronic device may select, from the obtained candidate matching blocks, a candidate matching block with the smallest rate-distortion cost as the best matching block of the target prediction unit.
As can be seen from the above, in this embodiment the pixel search is performed only on the candidate reference frames meeting the first preset condition, rather than traversing all candidate reference frames and performing a pixel search on each one; this reduces the complexity of video encoding and improves encoding efficiency.
Optionally, the method may further include the following step: skipping motion estimation for a candidate reference frame when the candidate reference frame satisfies a second preset condition.
The second preset condition includes at least one of:
First, the number of reference frames in the alternative reference frame set is greater than the first preset threshold, and the candidate reference frame is not in the alternative reference frame set.
Second, the number of reference frames in the alternative reference frame set is less than or equal to the first preset threshold, the candidate reference frame is a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the alternative reference frame set.
With respect to the alternative reference frame set and the first preset threshold, reference may be made to the detailed description in the above embodiments.
In an implementation, for each candidate reference frame in the candidate reference frame set corresponding to the target prediction unit, the electronic device may determine whether the candidate reference frame satisfies either of the two conditions. If it does, the electronic device may determine that the candidate reference frame satisfies the second preset condition, that is, the electronic device may decide to skip motion estimation for that candidate reference frame, and further determine the reference frames to be predicted for which motion estimation is required.
It should be noted that the electronic device may determine the reference frame to be predicted only according to the first preset condition; the electronic device may also determine the reference frame to be predicted only according to the second preset condition. In addition, the electronic device may further determine the reference frame to be predicted by combining the first preset condition and the second preset condition.
For example, the electronic device may determine candidate reference frames that do not skip motion estimation from the candidate reference frame set according to a second preset condition, and then, the electronic device may determine candidate reference frames that satisfy the first preset condition from the determined candidate reference frames that do not skip motion estimation as reference frames to be predicted.
Or, the electronic device may determine, according to a first preset condition, a candidate reference frame that needs to be motion-estimated from the candidate reference frame set, and then, the electronic device may determine, as a reference frame to be predicted, a candidate reference frame that does not satisfy a second preset condition from the determined candidate reference frames that need to be motion-estimated.
The execution sequence of each step in the method for determining the reference frame to be predicted according to the first preset condition and the second preset condition is not limited in the embodiment of the present invention.
It should be noted that, for each candidate reference frame in the candidate reference frame set corresponding to the target prediction unit, the electronic device may determine whether the candidate reference frame is in the alternative reference frame set. When the electronic device determines that the candidate reference frame is in the alternative reference frame set, the electronic device may determine that the candidate reference frame does not satisfy the preset skipping condition.
When the electronic device determines that the candidate reference frame is not in the alternative reference frame set, the electronic device may determine whether the candidate reference frame satisfies the preset skipping condition in the following ways.
In a first way, the electronic device obtains the first number, that is, the number of reference frames in the alternative reference frame set; if the first number is greater than the first preset threshold, the electronic device determines that the candidate reference frame satisfies the preset skipping condition.
In a second way, if the first number is less than or equal to the first preset threshold, the electronic device may further determine whether the candidate reference frame is a designated reference frame in the candidate reference frame set. If it is a designated reference frame, the electronic device determines that the candidate reference frame satisfies the preset skipping condition; if it is not, the electronic device determines that the candidate reference frame does not satisfy the preset skipping condition.
Based on the above processing, the electronic device may determine the candidate reference frames that do not satisfy the preset skipping condition and then perform the pixel search only on those candidate reference frames, improving the encoding efficiency of the video. It should be noted that the preset skipping condition may be a condition under which no pixel search or motion estimation is performed on the candidate reference frame.
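The two ways just described, together with the membership shortcut, can be sketched as a single predicate (all names illustrative):

```python
def should_skip(ref, alternative_set, designated, threshold):
    """True if motion estimation for `ref` is skipped entirely."""
    if ref in alternative_set:
        return False               # always worth a motion estimation
    if len(alternative_set) > threshold:
        return True                # first way: large set, frame absent
    return ref in designated       # second way: absent and designated

alt = {0, 1}
assert should_skip(5, alt, designated={4, 5}, threshold=3)      # designated
assert not should_skip(1, alt, designated={4, 5}, threshold=3)  # in the set
```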
Optionally, the electronic device may determine the pixel search rule of a reference frame to be predicted according to whether the reference frame to be predicted belongs to the alternative reference frame set, so as to further improve the encoding efficiency of the video.
Specifically, step S302 may include the following processing: when the reference frame to be predicted is not in the alternative reference frame set, performing an integer pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted, which serves as a candidate matching block.
In an implementation, for each reference frame to be predicted, the electronic device may determine whether the reference frame to be predicted is in the alternative reference frame set. When it is not, the electronic device performs only an integer pixel search on the reference frame to be predicted, without a sub-pixel search, and the matching block obtained by the integer pixel search serves as the candidate matching block.
When the reference frame to be predicted is in the alternative reference frame set, an integer pixel search and a sub-pixel search are performed on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block.
In implementation, when the electronic device determines that the reference frame to be predicted is in the alternative reference frame set, the electronic device performs an integer pixel search and a sub-pixel search on the reference frame to be predicted, and the matching block obtained by the sub-pixel search serves as the candidate matching block.
It can be seen that, if the reference frame to be predicted is not in the alternative reference frame set, the electronic device performs only an integer pixel search on it, without a sub-pixel search; this reduces the complexity of motion estimation and can further improve the coding efficiency of the video.
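A minimal sketch of this rule dispatch; the string labels are illustrative:

```python
def search_rule(ref, alternative_set):
    """Pixel search rule for a reference frame to be predicted."""
    # frames outside the alternative set get the cheaper integer-only search
    return "integer+subpel" if ref in alternative_set else "integer"

assert search_rule(2, {0, 2}) == "integer+subpel"
assert search_rule(5, {0, 2}) == "integer"
```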
Optionally, the electronic device may further skip the sub-pixel search for some of the candidate reference frames to further improve video coding efficiency. Specifically, when there are multiple reference frames to be predicted, step S302 may include the following processing for a first to-be-predicted reference frame among them: performing a pixel search on the first to-be-predicted reference frame according to the pixel search rule corresponding to it, to obtain a first candidate matching block.
Accordingly, after the first candidate matching block is obtained, the method may further include the following processing: when the first candidate matching block is determined to be the matching block with the minimum rate-distortion cost among the candidate matching blocks obtained so far, and the rate-distortion cost of the first candidate matching block is smaller than a second preset threshold, updating the pixel search rule of each reference frame after the first to-be-predicted reference frame, in the arrangement order of the candidate reference frame set, to integer pixel search.
The arrangement order of the reference frames to be predicted follows the arrangement order of the corresponding candidate reference frames in the candidate reference frame set. The second preset threshold may be set empirically by a technician. Specifically, the second preset threshold may be denoted Cost, a constant related to information such as the quantization parameter; generally speaking, the larger the quantization parameter, the larger Cost.
In the arrangement order, the first to-be-predicted reference frame may be any one of the multiple reference frames to be predicted; for example, it may be the first reference frame or the second reference frame, which is not limited in this embodiment of the application.
In implementation, the electronic device may perform pixel search on the first to-be-predicted reference frame according to a pixel search rule corresponding to the first to-be-predicted reference frame to obtain a matching block (i.e., a first candidate matching block) in the first to-be-predicted reference frame, and then may determine whether the first candidate matching block is a matching block with a minimum rate distortion cost among currently obtained candidate matching blocks.
The following example illustrates how the electronic device determines whether the first candidate matching block is the matching block with the smallest rate-distortion cost among the candidate matching blocks obtained so far.
For example, suppose candidate reference frame P1 is the first candidate reference frame in the candidate reference frame set, candidate reference frame P2 is the second, and candidate reference frame P3 is the third, and that none of P1, P2, and P3 satisfies the preset skipping condition. Let the matching block in candidate reference frame P1 be Z1, the matching block in P2 be Z2, and the matching block in P3 be Z3. After the electronic device performs the pixel search on candidate reference frame P1, the only matching block obtained so far is Z1, so Z1 is the matching block with the minimum rate-distortion cost among the matching blocks obtained so far. After the pixel search on candidate reference frame P2, the matching blocks obtained so far are Z1 and Z2, and the electronic device needs to judge whether the rate-distortion cost of Z2 is less than that of Z1. After the pixel search on candidate reference frame P3, the matching blocks obtained so far are Z1, Z2, and Z3, and the electronic device needs to judge whether Z3 is the matching block with the lowest rate-distortion cost among Z1, Z2, and Z3.
When the electronic device determines that the first candidate matching block is the matching block with the smallest rate-distortion cost among the candidate matching blocks obtained so far, the electronic device may further determine whether the rate-distortion cost of the first candidate matching block is smaller than the second preset threshold. If it is, the electronic device may determine the arrangement order of the first to-be-predicted reference frame in the candidate reference frame set and update the pixel search rule of each reference frame after it to integer pixel search. That is, when the electronic device performs motion estimation on a reference frame to be predicted that comes after the first to-be-predicted reference frame (which may be referred to as a second to-be-predicted reference frame), the electronic device performs only an integer pixel search without a sub-pixel search, thereby improving the coding efficiency of the video.
It should be noted that, after the electronic device has determined that the first candidate matching block is the matching block with the minimum rate-distortion cost among the candidate matching blocks obtained so far and that its rate-distortion cost is smaller than the second preset threshold, when the electronic device performs the integer pixel search on a second to-be-predicted reference frame to obtain a matching block in it (which may be referred to as a second candidate matching block), it is no longer necessary to determine whether the second candidate matching block is the matching block with the minimum rate-distortion cost among the candidate matching blocks obtained so far, nor whether its rate-distortion cost is smaller than the second preset threshold.
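The early-termination behavior of this passage might be sketched as follows, with hypothetical frame names and cost numbers standing in for the results of real pixel searches:

```python
def run_motion_estimation(frames, costs, threshold):
    """frames: ordered reference frames; costs: cost each search would yield."""
    rules = {f: "integer+subpel" for f in frames}
    best_cost = float("inf")
    demoted = False
    for i, frame in enumerate(frames):
        cost = costs[frame]                # stand-in for the search itself
        if not demoted and cost < best_cost and cost < threshold:
            for later in frames[i + 1:]:
                rules[later] = "integer"   # skip their sub-pixel stage
            demoted = True                 # later frames need no more checks
        best_cost = min(best_cost, cost)
    return rules

rules = run_motion_estimation(["P1", "P2", "P3"],
                              {"P1": 90, "P2": 40, "P3": 10}, threshold=50)
print(rules)  # P2 triggers the update, so P3 is searched at integer precision
```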
Referring to fig. 9, fig. 9 is a flowchart of an example of a method for motion estimation according to an embodiment of the present invention, where the method may include the following steps:
s901: and acquiring an optimal reference frame used for inter-frame prediction in the image blocks meeting the preset adjacent condition with the target prediction unit to obtain an alternative reference frame set.
S902: and judging whether the candidate reference frame is in the candidate reference frame set or not for each candidate reference frame in the candidate reference frame set corresponding to the target prediction unit, if so, executing S903, and if not, executing S904.
S903: and determining that the candidate reference frame does not satisfy the preset skipping condition.
S904: and judging whether the first number of the alternative reference frames contained in the alternative reference frame set is greater than a first preset threshold value. If the first number is greater than the first preset threshold, S905 is performed, and if the first number is not greater than the first preset threshold, S906 is performed.
S905: and determining that the candidate reference frame meets a preset skipping condition.
S906: and judging whether the candidate reference frame is a designated reference frame in the candidate reference frame set, if the candidate reference frame is not the designated reference frame, executing S903, and if the candidate reference frame is the designated reference frame, executing S905.
S907: and determining the candidate reference frame which does not meet the preset skipping condition as the reference frame to be predicted.
S908: and judging whether the reference frame to be predicted is in the alternative reference frame set, if the reference frame to be predicted is not in the alternative reference frame set, executing S909, and if the reference frame to be predicted is in the alternative reference frame set, executing S910.
S909: and carrying out whole pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block.
S910: and carrying out whole pixel search and sub-pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block.
S911: and taking the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
The embodiment of the present invention also provides an optional motion estimation method, including the following steps:
acquiring the best reference frames used for inter-frame prediction by image blocks satisfying the preset neighboring condition with the target prediction unit, to obtain an alternative reference frame set;
judging, for each candidate reference frame in the candidate reference frame set corresponding to the target prediction unit, whether condition 1 is satisfied; if so, skipping motion estimation for the candidate reference frame and moving directly to judging the next candidate reference frame. If condition 1 is not satisfied, judging whether condition 2 is satisfied; if so, skipping motion estimation for the candidate reference frame and moving directly to judging the next candidate reference frame. If condition 2 is not satisfied, judging whether condition 3 is satisfied; if so, skipping the sub-pixel search and performing only the integer pixel search when performing motion estimation on the current candidate reference frame. If condition 3 is not satisfied, completing all motion search steps. Here, condition 1 is that the number of reference frames in the alternative reference frame set is greater than a threshold T (T > 1, equivalent to the first preset threshold above) and the current candidate reference frame is not in the alternative reference frame set; condition 2 is that the current candidate reference frame is the Nth or a later candidate reference frame (corresponding to the designated reference frame above, where N is greater than 2) and the current candidate reference frame is not in the alternative reference frame set; condition 3 is that the current candidate reference frame is not in the alternative reference frame set.
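The three conditions of this optional method reduce to a small classifier; T, N, and the three actions follow the text above, while the function shape and example values are illustrative:

```python
def classify(index, ref, alternative_set, T, N):
    """Action for a candidate; index is its 1-based position in the candidate set."""
    in_alt = ref in alternative_set
    if len(alternative_set) > T and not in_alt:
        return "skip"            # condition 1
    if index >= N and not in_alt:
        return "skip"            # condition 2: designated candidate, absent
    if not in_alt:
        return "integer-only"    # condition 3: skip the sub-pixel search
    return "full-search"         # all motion search steps are completed

alt = {0, 2}
assert classify(1, 0, alt, T=2, N=3) == "full-search"
assert classify(4, 7, alt, T=2, N=3) == "skip"
assert classify(2, 7, alt, T=2, N=3) == "integer-only"
```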
Based on the motion estimation method of the above embodiments, for each image sequence classification shown in Table (1), corresponding image sequences were selected to test encoding performance; Table (1) gives the comparison between encoding with the motion estimation method of the embodiment of the present invention and encoding with the prior art.
Table (1)
[Table (1) is rendered as images in the original publication; per the description below, it lists, for each image sequence classification, the resolution, the image sequence, the Y/U/V/YUV BD-rate savings, and the coding acceleration Δfps.]
Here the resolution column distinguishes image sequences of different resolutions, and the image sequence column distinguishes image sequences with different video content.
For each image sequence classification, a different number of image sequences was selected for testing; the results in the table are, for each classification, the averages of the comparison results over all image sequences encoded with the motion estimation method of the embodiment of the present invention and with the prior art. The Y (BD-rate), U (BD-rate), V (BD-rate), and YUV (BD-rate) columns represent the bit-rate saving at equal Y, U, V, and combined YUV quality, respectively (a negative value indicates a saving, a positive value an increase). Y represents luma (Luminance or Luma), that is, the gray-level value; U and V represent chroma (Chroma), which describes the color and saturation of the image and specifies the color of a pixel. Δfps represents the coding acceleration, as shown in formula (1).
Δfps = (fps_proposed - fps_anchor) / fps_anchor × 100%    (1)
Where Δfps represents the coding acceleration, fps_anchor represents the frame rate (fps) of the image sequence encoded using the original encoder, and fps_proposed represents the frame rate (fps) of the image sequence encoded by the same encoder using the motion estimation method of this embodiment. A positive Δfps indicates acceleration and a negative value indicates deceleration.
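Since the image of formula (1) does not survive this text extraction, a plausible reconstruction consistent with the surrounding description is Δfps = (fps_proposed - fps_anchor) / fps_anchor × 100%. A minimal sketch computing it (the function name is hypothetical):

```python
def coding_speedup(fps_anchor, fps_proposed):
    """Percentage coding acceleration: positive means the proposed
    method encodes faster than the original encoder, negative slower."""
    return (fps_proposed - fps_anchor) / fps_anchor * 100.0
```

For example, an encoder that rises from 20 fps to 25 fps with the proposed method shows a Δfps of +25%.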
As can be seen from the data in Table (1), for each classified image sequence, motion estimation using the method of the embodiment of the present invention yields a significant saving in coding time, bringing an average gain of about 2.02%.
As can be seen from the above, the motion estimation method of the embodiment of the present invention obtains the reference frames to be predicted that meet the first preset condition in the candidate reference frame set corresponding to the target prediction unit, performs pixel search on each reference frame to be predicted according to its corresponding pixel search rule to obtain candidate matching blocks, and determines the matching block with the minimum rate distortion cost among the candidate matching blocks as the best matching block of the target prediction unit. With this processing, pixel search is performed only on the candidate reference frames meeting the first preset condition; compared with the prior-art approach of traversing all candidate reference frames and performing pixel search on each of them, the complexity of video coding is reduced, coding time is saved, and the coding efficiency of the video can be improved.
Corresponding to the embodiment of the method in fig. 3, referring to fig. 10, fig. 10 is a block diagram of an apparatus for motion estimation according to an embodiment of the present invention, where the apparatus includes:
an obtaining module 1001, configured to obtain a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
the first processing module 1002 is configured to perform pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted, so as to obtain a candidate matching block;
a determining module 1003, configured to determine a matching block with the smallest rate distortion cost among the candidate matching blocks as a best matching block of the target prediction unit.
Optionally, the first preset condition includes at least one of:
the number of reference frames in the candidate reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set;
the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, and the candidate reference frames are in the candidate reference frame set;
the candidate reference frame is in the set of alternative reference frames;
the candidate reference frame is in the set of alternative reference frames and the candidate reference frame is not a designated reference frame in the set of candidate reference frames;
the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
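The clauses of the first preset condition above can be sketched as a single predicate (a hypothetical illustration, not the patent's implementation; the translation blurs the candidate set and the alternative set, so this sketch assumes the frame count and membership both refer to the alternative reference frame set, and it omits the last clause, which describes how that set is built rather than a per-frame test):

```python
def meets_first_condition(frame, alternative_set, T, designated):
    """True when a candidate reference frame satisfies at least one clause
    of the first preset condition and so undergoes pixel search."""
    few = len(alternative_set) <= T       # set is at or below the first threshold
    in_alt = frame in alternative_set     # frame confirmed by neighbouring blocks
    return ((few and frame != designated)              # clause 1
            or (few and in_alt)                        # clause 2
            or in_alt                                  # clause 3 (subsumes 2 and 4)
            or (in_alt and frame != designated))       # clause 4
```

Note that as written, clause 3 already covers clauses 2 and 4; the separate clauses matter only if the sets in each clause differ in the original text.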
Optionally, the apparatus further comprises:
a second processing module, configured to skip motion estimation on the candidate reference frame if the candidate reference frame meets a second preset condition;
wherein the second preset condition comprises at least one of: the number of reference frames in the alternative reference frame set is larger than a first preset threshold value, and the candidate reference frame is not in the alternative reference frame set; the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, the candidate reference frame is not a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the candidate reference frame set; the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
Optionally, the first processing module 1002 is specifically configured to, when the reference frame to be predicted is not in the candidate reference frame set, perform integer pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted, where the matching block is used as a candidate matching block;
and/or, under the condition that the reference frame to be predicted is in the candidate reference frame set, performing integer pixel search and sub-pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block.
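The two branches above amount to choosing a per-frame search rule; a minimal sketch (hypothetical names, assuming the alternative set is as described earlier):

```python
def pixel_search_rule(frame, alternative_set):
    """Full search (integer plus sub-pixel) for frames confirmed by
    neighbouring blocks; integer-pixel search only for the rest."""
    if frame in alternative_set:
        return ["integer", "sub-pixel"]
    return ["integer"]
```

The design intuition is that a frame in which neighbouring blocks already found their best matches is worth the extra sub-pixel refinement, while other frames get only the cheap integer-pixel pass.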
Optionally, when the number of the reference frames to be predicted is multiple, for a first reference frame to be predicted in the multiple reference frames to be predicted, the first processing module 1002 is specifically configured to perform pixel search on the first reference frame to be predicted according to a pixel search rule corresponding to the first reference frame to be predicted, so as to obtain a first candidate matching block;
the device further comprises: and a third processing module, configured to update a pixel search rule of each reference frame after the first to-be-predicted reference frame to integer pixel search according to an arrangement order of the first to-be-predicted reference frame in the candidate reference frame set when it is determined that the first candidate matching block is a matching block with a smallest rate distortion cost among the currently obtained candidate matching blocks, and the rate distortion cost of the first candidate matching block is smaller than a second preset threshold.
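The early-termination behaviour of the third processing module can be sketched as follows (a hypothetical illustration under the stated conditions; names and the rule representation are assumptions):

```python
def downgrade_later_rules(rules, frames, first_frame, rd_cost, best_cost, T2):
    """If the first predicted frame's candidate matching block is the current
    minimum-rate-distortion-cost block and its cost is below the second
    preset threshold T2, restrict every reference frame after it (in the
    candidate set's order) to integer-pixel search only.

    rules  : dict mapping each frame to its pixel search rule (list of steps)
    frames : candidate reference frames in their arrangement order
    """
    if rd_cost <= best_cost and rd_cost < T2:
        pos = frames.index(first_frame)
        for later in frames[pos + 1:]:
            rules[later] = ["integer"]
    return rules
```

In other words, once an early frame already yields a sufficiently cheap match, the remaining frames are unlikely to improve on it enough to justify sub-pixel refinement, so their searches are cut down.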
As can be seen from the above, the motion estimation apparatus of the embodiment of the present invention obtains the reference frames to be predicted that meet the first preset condition in the candidate reference frame set corresponding to the target prediction unit, performs pixel search on each reference frame to be predicted according to its corresponding pixel search rule to obtain candidate matching blocks, and determines the matching block with the minimum rate distortion cost among the candidate matching blocks as the best matching block of the target prediction unit. With this processing, pixel search is performed only on the candidate reference frames meeting the first preset condition; compared with the prior-art approach of traversing all candidate reference frames and performing pixel search on each of them, the complexity of video coding is reduced, coding time is saved, and the coding efficiency of the video can be improved.
It should be noted that the above-mentioned apparatus may be located in a device, such as a terminal, a server, etc., but is not limited thereto.
An embodiment of the present invention further provides an electronic device, as shown in fig. 11, including a memory 1101 and a processor 1102;
a memory 1101 for storing a computer program;
the processor 1102, when executing the program stored in the memory 1101, is configured to implement the method for motion estimation according to the embodiment of the present invention.
Specifically, the motion estimation method includes:
acquiring a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
performing pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block;
and determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
It should be noted that other implementation manners of the motion estimation method are partially the same as those of the foregoing method embodiments, and are not described herein again.
The electronic device may be provided with a communication interface for realizing communication between the electronic device and another device.
The processor 1102, the communication interface, and the memory 1101 are configured to communicate with each other through a communication bus, where the communication bus may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus. The communication bus may be divided into an address bus, a data bus, a control bus, etc.
The Memory 1101 may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor 1102 may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component.
When performing motion estimation, the electronic device provided by the embodiment of the invention performs pixel search only on the candidate reference frames meeting the first preset condition, thereby reducing the complexity of video coding, saving coding time, and improving the coding efficiency of the video.
Embodiments of the present invention also provide a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to perform the method for motion estimation provided by the embodiments of the present invention.
Specifically, the motion estimation method includes:
acquiring a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
performing pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block;
and determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
It should be noted that other implementation manners of the motion estimation method are partially the same as those of the foregoing method embodiments, and are not described herein again.
By executing the instructions stored in the computer-readable storage medium provided by the embodiment of the invention, pixel search is performed only on the candidate reference frames meeting the first preset condition during motion estimation, so that the complexity of video coding is reduced, coding time is saved, and the coding efficiency of the video can be improved.
Embodiments of the present invention also provide a computer program product containing instructions, which when run on a computer, cause the computer to perform the method for motion estimation provided by embodiments of the present invention.
Specifically, the motion estimation method includes:
acquiring a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
performing pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block;
and determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
It should be noted that other implementation manners of the motion estimation method are partially the same as those of the foregoing method embodiments, and are not described herein again.
By running the computer program product provided by the embodiment of the invention, pixel search is performed only on the candidate reference frames meeting the first preset condition during motion estimation, so that the complexity of video coding is reduced, coding time is saved, and the coding efficiency of the video can be improved.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (12)

1. A method of motion estimation, the method comprising:
acquiring a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
performing pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block;
and determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
2. The method of claim 1, wherein the first preset condition comprises at least one of:
the number of reference frames in the candidate reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set;
the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, and the candidate reference frames are in the candidate reference frame set;
the candidate reference frame is in the set of alternative reference frames;
the candidate reference frame is in the set of alternative reference frames and the candidate reference frame is not a designated reference frame in the set of candidate reference frames;
the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
3. The method of claim 1, further comprising:
skipping motion estimation on the candidate reference frame under the condition that the candidate reference frame meets a second preset condition;
wherein the second preset condition comprises at least one of: the number of reference frames in the alternative reference frame set is larger than a first preset threshold value, and the candidate reference frame is not in the alternative reference frame set; the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, the candidate reference frame is not a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the candidate reference frame set; the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
4. The method according to claim 2, wherein the performing a pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block comprises at least one of:
under the condition that the reference frame to be predicted is not in the alternative reference frame set, performing whole pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted, wherein the matching block is used as a candidate matching block;
and under the condition that the reference frame to be predicted is in the alternative reference frame set, performing whole pixel search and sub-pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted, wherein the matching block is used as a candidate matching block.
5. The method according to claim 1, wherein in a case that the reference frame to be predicted is multiple, performing pixel search on a first reference frame to be predicted in the multiple reference frames to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block, includes:
performing pixel search on the first reference frame to be predicted according to a pixel search rule corresponding to the first reference frame to be predicted to obtain a first candidate matching block;
after the pixel search is performed on the first to-be-predicted reference frame according to the pixel search rule corresponding to the first to-be-predicted reference frame to obtain a first candidate matching block, the method further includes:
and updating the pixel search rule of each reference frame behind the first to-be-predicted reference frame into integer pixel search according to the arrangement sequence of the first to-be-predicted reference frame in the candidate reference frame set when the first candidate matching block is determined to be the matching block with the minimum rate distortion cost in the currently obtained candidate matching blocks and the rate distortion cost of the first candidate matching block is smaller than a second preset threshold.
6. An apparatus for motion estimation, the apparatus comprising:
the device comprises an acquisition module, a prediction module and a prediction module, wherein the acquisition module is used for acquiring a reference frame to be predicted in a candidate reference frame set corresponding to a target prediction unit; the reference frame to be predicted is a candidate reference frame meeting a first preset condition in the candidate reference frame set;
the first processing module is used for carrying out pixel search on the reference frame to be predicted according to a pixel search rule corresponding to the reference frame to be predicted to obtain a candidate matching block;
and the determining module is used for determining the matching block with the minimum rate distortion cost in the candidate matching blocks as the best matching block of the target prediction unit.
7. The apparatus of claim 6, wherein the first preset condition comprises at least one of:
the number of reference frames in the candidate reference frame set is less than or equal to a first preset threshold, and the candidate reference frame is not a designated reference frame in the candidate reference frame set;
the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, and the candidate reference frames are in the candidate reference frame set;
the candidate reference frame is in the set of alternative reference frames;
the candidate reference frame is in the set of alternative reference frames and the candidate reference frame is not a designated reference frame in the set of candidate reference frames;
the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
8. The apparatus of claim 6, further comprising:
a second processing module, configured to skip motion estimation on the candidate reference frame if the candidate reference frame meets a second preset condition;
wherein the second preset condition comprises at least one of: the number of reference frames in the alternative reference frame set is larger than a first preset threshold value, and the candidate reference frame is not in the alternative reference frame set; the number of reference frames in the candidate reference frame set is less than or equal to the first preset threshold, the candidate reference frame is not a designated reference frame in the candidate reference frame set, and the candidate reference frame is not in the candidate reference frame set; the reference frame included in the candidate reference frame set is a reference frame where a matching block with the minimum rate distortion cost in matching blocks obtained by inter-frame prediction of image blocks is located, and the image block is an image block meeting a preset adjacent condition with the target prediction unit.
9. The apparatus according to claim 7, wherein the first processing module is configured to, when the reference frame to be predicted is not in the candidate reference frame set, perform integer pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block; and/or, under the condition that the reference frame to be predicted is in the candidate reference frame set, performing integer pixel search and sub-pixel search on the reference frame to be predicted to obtain a matching block in the reference frame to be predicted as a candidate matching block.
10. The apparatus according to claim 6, wherein, in the case that there are a plurality of reference frames to be predicted, for a first reference frame to be predicted in the plurality of reference frames to be predicted, the first processing module is configured to perform pixel search on the first reference frame to be predicted according to a pixel search rule corresponding to the first reference frame to be predicted, so as to obtain a first candidate matching block;
the device further comprises: and a third processing module, configured to update a pixel search rule of each reference frame after the first to-be-predicted reference frame to integer pixel search according to an arrangement order of the first to-be-predicted reference frame in the candidate reference frame set when it is determined that the first candidate matching block is a matching block with a smallest rate distortion cost among the currently obtained candidate matching blocks, and the rate distortion cost of the first candidate matching block is smaller than a second preset threshold.
11. An electronic device comprising a memory and a processor;
a memory for storing a computer program;
a processor for implementing the method steps of any one of claims 1 to 5 when executing a program stored in the memory.
12. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-5.
CN201810940267.7A 2018-08-17 2018-08-17 Method and device for motion estimation, electronic equipment and computer-readable storage medium Active CN110839155B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810940267.7A CN110839155B (en) 2018-08-17 2018-08-17 Method and device for motion estimation, electronic equipment and computer-readable storage medium
PCT/CN2019/100236 WO2020034921A1 (en) 2018-08-17 2019-08-12 Motion estimation method, device, electronic apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810940267.7A CN110839155B (en) 2018-08-17 2018-08-17 Method and device for motion estimation, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110839155A true CN110839155A (en) 2020-02-25
CN110839155B CN110839155B (en) 2021-12-03

Family

ID=69524706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810940267.7A Active CN110839155B (en) 2018-08-17 2018-08-17 Method and device for motion estimation, electronic equipment and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN110839155B (en)
WO (1) WO2020034921A1 (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111263151A (en) * 2020-04-26 2020-06-09 腾讯科技(深圳)有限公司 Video encoding method, video encoding device, electronic device, and computer-readable storage medium
CN111405282A (en) * 2020-04-21 2020-07-10 广州市百果园信息技术有限公司 Video coding method, device, equipment and storage medium based on long-term reference frame
CN111462170A (en) * 2020-03-30 2020-07-28 Oppo广东移动通信有限公司 Motion estimation method, motion estimation device, storage medium, and electronic apparatus
CN111479115A (en) * 2020-04-14 2020-07-31 腾讯科技(深圳)有限公司 Video image processing method and device and computer readable storage medium
CN111510727A (en) * 2020-04-14 2020-08-07 腾讯科技(深圳)有限公司 Motion estimation method and device
CN112770118A (en) * 2020-12-31 2021-05-07 展讯通信(天津)有限公司 Video frame image motion estimation method and related equipment
WO2022267667A1 (en) * 2021-06-24 2022-12-29 Zhejiang Dahua Technology Co., Ltd. Systems and methods for inter frame prediction of a video
CN116074533A (en) * 2023-04-06 2023-05-05 湖南国科微电子股份有限公司 Motion vector prediction method, system, electronic device and storage medium
WO2023142663A1 (en) * 2022-01-28 2023-08-03 腾讯科技(深圳)有限公司 Motion estimation method and apparatus in encoding process, device, storage medium, and program product

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112261413B (en) * 2020-10-22 2023-10-31 北京奇艺世纪科技有限公司 Video encoding method, encoding device, electronic device, and storage medium
CN112565753B (en) * 2020-12-06 2022-08-16 浙江大华技术股份有限公司 Method and apparatus for determining motion vector difference, storage medium, and electronic apparatus
CN117615129B (en) * 2024-01-23 2024-04-26 腾讯科技(深圳)有限公司 Inter-frame prediction method, inter-frame prediction device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100747544B1 (en) * 2006-03-31 2007-08-08 엘지전자 주식회사 Motion estimation method and apparatus
CN101621694A (en) * 2009-07-29 2010-01-06 深圳市九洲电器有限公司 Motion estimation method, motion estimation system and display terminal
CN102387360A (en) * 2010-09-02 2012-03-21 乐金电子(中国)研究开发中心有限公司 Video coding-decoding inter-frame image prediction method and video coder-decoder
CN104602019A (en) * 2014-12-31 2015-05-06 乐视网信息技术(北京)股份有限公司 Video coding method and device
US20180184106A1 (en) * 2016-12-28 2018-06-28 Novatek Microelectronics Corp. Motion Estimation Method and Motion Estimator

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100571401C (en) * 2007-01-09 2009-12-16 华为技术有限公司 Determine the method and the device thereof of reference frame
CN102843561B (en) * 2011-06-21 2017-03-08 乐金电子(中国)研究开发中心有限公司 Inter frame image predicts reference frame number decoding method and the codec of encoding and decoding
CN103501437B (en) * 2013-09-29 2016-06-22 北京航空航天大学 A kind of based on fractal and H.264 method for compressing high spectrum image
CN106034236B (en) * 2015-03-19 2019-07-19 阿里巴巴集团控股有限公司 A kind of selection method, device and the encoder of HEVC coding optimal reference frame
CN106888024B (en) * 2017-01-06 2020-09-08 南京邮电大学 Distributed video compressed sensing reconstruction method based on bidirectional optimal matching

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11716438B2 (en) 2020-03-30 2023-08-01 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Method for motion estimation, non-transitory computer-readable storage medium, and electronic device
CN111462170A (en) * 2020-03-30 2020-07-28 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Motion estimation method, motion estimation device, storage medium, and electronic apparatus
CN111462170B (en) * 2020-03-30 2023-08-25 Guangdong Oppo Mobile Telecommunications Corp., Ltd. Motion estimation method, motion estimation device, storage medium and electronic equipment
CN111479115A (en) * 2020-04-14 2020-07-31 Tencent Technology (Shenzhen) Co., Ltd. Video image processing method and device, and computer-readable storage medium
CN111510727A (en) * 2020-04-14 2020-08-07 Tencent Technology (Shenzhen) Co., Ltd. Motion estimation method and device
CN111510727B (en) * 2020-04-14 2022-07-15 Tencent Technology (Shenzhen) Co., Ltd. Motion estimation method and device
CN111405282A (en) * 2020-04-21 2020-07-10 Guangzhou Baiguoyuan Information Technology Co., Ltd. Video coding method, device, equipment and storage medium based on long-term reference frame
CN111263151B (en) * 2020-04-26 2020-08-25 Tencent Technology (Shenzhen) Co., Ltd. Video encoding method, video encoding device, electronic device, and computer-readable storage medium
CN111263151A (en) * 2020-04-26 2020-06-09 Tencent Technology (Shenzhen) Co., Ltd. Video encoding method, video encoding device, electronic device, and computer-readable storage medium
CN112770118A (en) * 2020-12-31 2021-05-07 Spreadtrum Communications (Tianjin) Co., Ltd. Video frame image motion estimation method and related equipment
WO2022267667A1 (en) * 2021-06-24 2022-12-29 Zhejiang Dahua Technology Co., Ltd. Systems and methods for inter frame prediction of a video
WO2023142663A1 (en) * 2022-01-28 2023-08-03 Tencent Technology (Shenzhen) Co., Ltd. Motion estimation method and apparatus in encoding process, device, storage medium, and program product
CN116074533A (en) * 2023-04-06 2023-05-05 Hunan Goke Microelectronics Co., Ltd. Motion vector prediction method, system, electronic device and storage medium
CN116074533B (en) * 2023-04-06 2023-08-22 Hunan Goke Microelectronics Co., Ltd. Motion vector prediction method, system, electronic device and storage medium

Also Published As

Publication number Publication date
WO2020034921A1 (en) 2020-02-20
CN110839155B (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN110839155B (en) Method and device for motion estimation, electronic equipment and computer-readable storage medium
US20220286706A1 (en) Method and device for obtaining motion vector of video image
TWI603611B (en) Motion picture encoding apparatus, motion picture encoding method, and recording medium for moving picture encoding program
CN109660800B (en) Motion estimation method, motion estimation device, electronic equipment and computer-readable storage medium
JP6980911B2 (en) Limited memory access window for improved motion vector
CN110446044B (en) Linear model prediction method, device, encoder and storage device
WO2022104498A1 (en) Intra-frame prediction method, encoder, decoder and computer storage medium
CN101389025A (en) Motion refinement engine for use in video encoding in accordance with a plurality of sub-pixel resolutions and methods for use therewith
US10742989B2 (en) Variable frame rate encoding method and device based on a still area or a motion area
US20240015300A1 (en) Image encoding/decoding method and device
US11381809B2 (en) Intra prediction encoding/decoding method and apparatus for chrominance components
US20230362401A1 (en) Image encoding/decoding method and apparatus
US11438577B2 (en) Image encoding/decoding method and device
US11558608B2 (en) On split prediction
WO2021253373A1 (en) Probabilistic geometric partitioning in video coding
CN112055208A (en) Video coding method, video coding equipment and storage device
CN113992914B (en) Inter-frame prediction method and device, equipment and storage medium
CN109660806B (en) Encoding method and device and electronic equipment
CN111988612A (en) Video coding processing method and device and electronic equipment
KR20220066166A (en) Current block prediction method and prediction apparatus, device, and storage medium
WO2024108391A1 (en) Video encoding method and apparatus, video decoding method and apparatus, and devices, system and storage medium
CN113242427B (en) Rapid method and device based on adaptive motion vector precision in VVC
WO2024077553A1 (en) Video encoding method and apparatus, video decoding method and apparatus, device, system, and storage medium
WO2024104503A1 (en) Image coding and decoding
JP2017183844A (en) Image compression device, image compression method, and image compression program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant