CN112770118B - Video frame image motion estimation method and related equipment - Google Patents


Info

Publication number
CN112770118B
CN112770118B (application CN202011639504.XA)
Authority
CN
China
Prior art keywords
image block
image
central
candidate
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011639504.XA
Other languages
Chinese (zh)
Other versions
CN112770118A (en)
Inventor
索士尧
罗小伟
郭春磊
李�荣
Current Assignee
Spreadtrum Communications Tianjin Co Ltd
Original Assignee
Spreadtrum Communications Tianjin Co Ltd
Priority date
Filing date
Publication date
Application filed by Spreadtrum Communications Tianjin Co Ltd filed Critical Spreadtrum Communications Tianjin Co Ltd
Priority to CN202211292904.7A priority Critical patent/CN115633178A/en
Priority to CN202011639504.XA priority patent/CN112770118B/en
Publication of CN112770118A publication Critical patent/CN112770118A/en
Application granted granted Critical
Publication of CN112770118B publication Critical patent/CN112770118B/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
    • H04N19/176Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

An embodiment of the invention discloses a motion estimation method for video frame images and related equipment. The method comprises: acquiring a first frame image in a target video and dividing it into a plurality of image blocks; obtaining a first central image block among the plurality of image blocks; performing a coarse search over a plurality of first candidate image blocks in a first candidate search set using the first central image block, and determining a second central image block according to the coarse search result; determining a plurality of second candidate image blocks in a second candidate search set according to the second central image block; performing a precise search over the plurality of second candidate image blocks using the first central image block, and determining a motion estimation result of the first central image block according to the precise search result; and determining the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image. The embodiment of the invention improves the overall efficiency and accuracy of motion estimation.

Description

Video frame image motion estimation method and related equipment
Technical Field
The invention relates to the technical field of video frame interpolation, in particular to a video frame image motion estimation method and related equipment.
Background
In the motion estimation stage of video frame interpolation, a video frame image is divided into non-overlapping image blocks; for each image block in the current frame, the matching block most similar to it is found within a certain search range in a reference frame, and the motion vector is then calculated.
A variety of motion estimation algorithms exist. The optical flow method independently calculates a motion vector for each pixel, obtaining an optical flow field for motion estimation. The pixel-recursive method obtains a motion vector by recursively updating a prediction value for each pixel. Block-matching motion estimation algorithms search the image blocks of adjacent frames by the physical distance between pixels and find the best result according to a matching rule. The simplest block-matching algorithm is the full search method, which tests matching blocks in turn within a given search range, e.g. the entire image. The three-dimensional recursive search (3-Dimensional Recursive Search, 3DRS) method is a block-matching algorithm in which the current block inherits the motion vectors of neighbouring blocks; the cost between the current block and the block corresponding to each candidate vector is calculated according to a matching criterion, and the most similar block is found by comparing the costs.
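As a concrete illustration of the full search method described above, the following is a minimal Python/NumPy sketch (not taken from the patent; the block size, search radius, and the SAD matching criterion are illustrative assumptions) that exhaustively tests every offset within a radius and returns the best motion vector:

```python
import numpy as np

def sad(block_a, block_b):
    # Sum of absolute differences: one common block-matching cost (an assumed criterion).
    return np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum()

def full_search(cur, ref, bx, by, bs, radius):
    """Exhaustively search `ref` within +/- `radius` pixels of the bs x bs block
    at (bx, by) in `cur`; return the motion vector (dx, dy) of the best match."""
    block = cur[by:by + bs, bx:bx + bs]
    best, best_cost = (0, 0), float("inf")
    h, w = ref.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= w - bs and 0 <= y <= h - bs:  # stay inside the frame
                cost = sad(block, ref[y:y + bs, x:x + bs])
                if cost < best_cost:
                    best_cost, best = cost, (dx, dy)
    return best
```

This makes the trade-off noted below concrete: the cost grows with the square of the search radius, which is why the full search is accurate but computationally expensive.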
However, motion estimation of video frame images with the 3DRS algorithm suffers from problems such as a small search range, no consideration of the overall motion of the video, and the absence of a precise search; combining the 3DRS algorithm with other existing algorithms for motion estimation brings problems such as complex calculation and long motion estimation time.
Disclosure of Invention
Embodiments of the invention provide a motion estimation method for video frame images and related equipment, which enlarge the search range of motion estimation without requiring a large amount of calculation, improving the overall efficiency and accuracy of motion estimation.
In a first aspect, an embodiment of the present invention provides a method for motion estimation of a video frame image, the method comprising: acquiring a first frame image in a target video and dividing the first frame image into a plurality of image blocks; obtaining a first central image block among the plurality of image blocks; performing a coarse search over a plurality of first candidate image blocks in a first candidate search set using the first central image block, and determining a second central image block according to the coarse search result, wherein the first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image; determining a plurality of second candidate image blocks in a second candidate search set according to the second central image block, wherein the second candidate image blocks are determined according to step-length distances between the second central image block and other image blocks, the other image blocks being located in the same frame image as the second central image block; performing a precise search over the plurality of second candidate image blocks using the first central image block, and determining a motion estimation result of the first central image block according to the precise search result; and determining the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image.
It can be seen that, in the embodiment of the present application, when motion estimation is performed on a video frame image, every image block of the frame is estimated, and for each image block two searches are performed: a coarse search and a precise search. The first candidate search set for the coarse search is determined from the image blocks of the frame images adjacent to the frame currently undergoing motion estimation, and the second candidate search set for the precise search consists of the image blocks within a preset step length of the coarse search result. The two searches expand the search range and thereby improve the search result. In addition, the computation needed to obtain the search sets is simple, which improves the efficiency of the search process.
In a second aspect, an embodiment of the present invention provides a motion estimation apparatus, including: an acquisition module for acquiring a first frame image in a target video; and a processing module for dividing the first frame image into a plurality of image blocks and obtaining a first central image block among them. The processing module is further configured to perform a coarse search over a plurality of first candidate image blocks in a first candidate search set using the first central image block and determine a second central image block according to the coarse search result, where the first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image; to determine a plurality of second candidate image blocks in a second candidate search set according to the second central image block, where the second candidate image blocks are determined according to step-length distances between the second central image block and other image blocks located in the same frame image; to perform a precise search over the plurality of second candidate image blocks using the first central image block and determine a motion estimation result of the first central image block according to the precise search result; and to determine the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image.
In a third aspect, an embodiment of the present invention provides a motion estimation apparatus, including: a processor and a memory;
the processor is connected to a memory, wherein the memory is used for storing program codes, and the processor is used for calling the program codes to execute the motion estimation method of the video frame image according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a chip system, including: a processor coupled to a memory for storing a program or instructions which, when executed by the processor, cause the system-on-chip to implement the method of the first aspect or any of the possible implementations of the first aspect.
In a fifth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, perform the method for motion estimation of video frame images according to the first aspect.
In a sixth aspect, embodiments of the present application provide a computer program product, which, when read and executed by a computer, causes the computer to execute the method in the first aspect or any one of the possible implementation manners of the first aspect.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a full search matching method according to an embodiment of the present application;
FIG. 2A is a flowchart illustrating a method for motion estimation of video frame images according to an embodiment of the present disclosure;
fig. 2B is a schematic diagram of dividing an image block of a first frame image according to an embodiment of the present disclosure;
fig. 2C is a schematic diagram of a process of obtaining a plurality of first candidate image blocks according to an embodiment of the present application;
fig. 2D is a flowchart of motion vector clustering provided in the embodiment of the present application;
fig. 2E is a schematic diagram illustrating a process of determining a plurality of second candidate image blocks according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a motion estimation apparatus according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a motion estimation device according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
It should be understood that the terms "first," "second," and the like in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by the person skilled in the art that the described embodiments of the invention can be combined with other embodiments.
First, terms of art that may be referred to in the embodiments of the present application will be described.
Motion Estimation (ME): the process of dividing a frame image into a plurality of non-overlapping image blocks (the displacements of all pixels within a block are considered identical), finding for each image block the most similar image block, i.e. the matching block, within a given search range of a reference frame according to a certain matching criterion, and obtaining the relative offset between the spatial positions of the two image blocks.
Motion Vector (MV): the relative offset between the matching block and the current block (the image block used for the matching search).
Motion vector field: the motion vectors of all image blocks in one frame image form a motion vector field.
Motion compensation: reconstructing an intermediate frame that did not originally exist from the original frames and the motion information.
Video frame interpolation: in a video sequence, generating a new frame through operations such as motion estimation and motion compensation, improving the temporal resolution of the video.
Interpolated frame: a new frame generated between two frames of a video sequence by video frame interpolation.
The process of interpolating frames of a video sequence based on motion estimation and motion compensation uses the information of the preceding and following adjacent frames to estimate the motion of the interpolated frame relative to them; the quality of the interpolated frame depends on the accuracy of the motion estimation. To obtain a good-quality interpolated frame, a motion estimation algorithm that can obtain the true motion vector field must be selected. Existing motion estimation algorithms such as optical flow have high computational complexity and are not easy to implement. The full search method is the simplest block-matching algorithm: it tests matching blocks in sequence within a given search range, such as the whole image. Specifically, as shown in fig. 1, for an image block in the current frame, the matching block is searched within a search range in the reference frame and the motion vector is then obtained; this search is highly accurate but its amount of calculation is too large. Searching for matching blocks within a restricted search range reduces the amount of computation, but the candidates for a given image block may then not represent the true motion of an object well.
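The interpolation step itself can be illustrated with a minimal sketch. Given a per-block motion vector field, an intermediate frame can be synthesized by shifting each block half-way along its vector in the previous frame and blending it with the co-located block of the next frame. This is a simplified, assumed compensation scheme for illustration only; the patent does not specify its compensation details here.

```python
import numpy as np

def interpolate_frame(prev, nxt, mvs, bs):
    """Synthesize a frame halfway between `prev` and `nxt` (grayscale arrays).
    `mvs[i][j]` is the (dx, dy) vector of block (i, j) from `prev` to `nxt`.
    Each output block blends the half-way-shifted block of `prev` with the
    co-located block of `nxt` -- a simplified bidirectional compensation."""
    h, w = prev.shape
    mid = np.zeros((h, w), dtype=np.float64)
    for i in range(0, h, bs):
        for j in range(0, w, bs):
            dx, dy = mvs[i // bs][j // bs]
            # half-way source position in the previous frame, clamped to the image
            y = min(max(i + dy // 2, 0), h - bs)
            x = min(max(j + dx // 2, 0), w - bs)
            mid[i:i + bs, j:j + bs] = (0.5 * prev[y:y + bs, x:x + bs]
                                       + 0.5 * nxt[i:i + bs, j:j + bs])
    return mid
```

With an all-zero vector field this reduces to a plain average of the two frames, which shows why the quality of the interpolated frame depends directly on the accuracy of the estimated vectors.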
Based on the above description, please refer to fig. 2A, fig. 2A is a flowchart of a video frame image motion estimation method according to an embodiment of the present application, and as shown in fig. 2A, the method includes the following steps:
101. acquiring a first frame image in a target video, and dividing the first frame image into a plurality of image blocks;
102. obtaining a first central image block of the plurality of image blocks;
103. performing a coarse search over a plurality of first candidate image blocks in a first candidate search set using the first central image block, and determining a second central image block according to the coarse search result, wherein the first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image;
104. determining a plurality of second candidate image blocks in a second candidate search set according to the second central image block, wherein the second candidate image blocks are determined according to step-length distances between the second central image block and other image blocks, the other image blocks being located in the same frame image as the second central image block;
105. performing a precise search over the plurality of second candidate image blocks using the first central image block, and determining a motion estimation result of the first central image block according to the precise search result;
106. determining the motion estimation result of the first frame image according to the motion estimation results of the image blocks in the first frame image.
In the embodiment of the present application, the target video is a video that needs frame interpolation. A first frame image in the target video is obtained; the first frame image is a frame whose motion direction is unknown, and video frame interpolation is attempted after it. That is, assuming a second frame image is the next frame adjacent to the first frame image, in order to determine how to interpolate between the first and second frame images, motion estimation needs to be performed on the first frame image; specifically, the motion vector corresponding to the motion from the first frame image to the second frame image is obtained.
Referring to fig. 2B, a schematic diagram of dividing a first frame image into a plurality of image blocks: as shown in (a) of fig. 2B, the first frame image may be divided into image blocks of the same size and shape; or, as shown in (b) of fig. 2B, into image blocks of different sizes according to pixel values. For example, smaller image blocks (a finer division) are used for areas of the frame with larger pixel values (darker colors), and larger image blocks (a coarser division) for areas with smaller pixel values (lighter colors). This division rests on the assumption that darker areas generally contain more detail and therefore require finer matching.
In the embodiment of the present application, dividing the first frame image into image blocks of the same size and shape is taken as the example: the first frame image is divided into a plurality of rectangles of the same preset size, each rectangle corresponding to one image block. Any one of the image blocks may be selected as the first central image block for motion estimation. As shown in (a) of fig. 2B, image block C is the first central image block selected in the embodiment of the present application.
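The equal-size division of steps 101-102 can be sketched as follows (a minimal illustration assuming a grayscale frame whose dimensions are exact multiples of the block size; function and variable names are the author's own, not from the patent):

```python
import numpy as np

def divide_into_blocks(frame, bs):
    """Split a frame into a grid of non-overlapping bs x bs image blocks,
    as in the equal-size division of Fig. 2B(a). The frame dimensions are
    assumed to be exact multiples of bs."""
    h, w = frame.shape
    return [[frame[i:i + bs, j:j + bs] for j in range(0, w, bs)]
            for i in range(0, h, bs)]

frame = np.arange(64).reshape(8, 8)
blocks = divide_into_blocks(frame, 4)   # a 2 x 2 grid of 4 x 4 blocks
```

Any entry of `blocks` may then be selected as the first central image block, e.g. `blocks[0][0]` for the top-left block.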
Then, the first central image block is search-matched against the image blocks in a reference frame, the matching block with the highest matching degree is determined, the motion vector between the first central image block and the matching block is calculated, and the motion estimation of the first central image block is completed according to that motion vector. The reference frame may be a forward or a backward reference frame: the forward reference frame is the image frame corresponding to the current frame (the frame currently undergoing motion estimation) at the preceding time instant, while the backward reference frame is the image frame to which the current frame may move at the next time instant. Matching the current frame against the forward reference frame performs motion estimation for the preceding time instant, and matching against the backward reference frame performs motion estimation for the next time instant. The embodiment of the present application takes motion estimation by search matching against a backward reference frame as the example.
In the embodiment of the present application, using the first central image block for search matching against image blocks in the reference frame (the next frame adjacent to the first frame image) specifically includes: performing a coarse search over a plurality of first candidate image blocks in a first candidate search set using the first central image block, and determining a second central image block according to the coarse search result, where the first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image; and performing a precise search over a plurality of second candidate image blocks in a second candidate search set using the first central image block, and determining the motion estimation result of the first frame image according to the precise search result, where the second candidate image blocks are determined according to step-length distances between the second central image block and other image blocks located in the same frame image.
As can be seen from the above description, search matching the first central image block against the image blocks in the reference frame requires two searches: the first is a coarse search, used to determine the second central image block, and the second is a precise search, used to finally determine the motion estimation result. The coarse search actually matches the first central image block against the plurality of first candidate image blocks in the first candidate search set and then determines the second central image block from the matching result. The second central image block may be the first candidate image block that matches the first central image block best, or an image block obtained by adjusting that matching block. The first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image; for example, a first candidate image block may be the image block corresponding, in an adjacent frame image, to a neighbour of the first central image block, where the adjacent frame image may be the previous or the next frame of the first frame image, and "corresponding image block" denotes the image block covering the same pixel content in different frame images at different time instants. Alternatively, a first candidate image block may be the image block on the next frame to which the first central image block is predicted to correspond according to the motion vector of an image block of the previous frame, and so on.
After the second central image block is determined from the coarse search result, the plurality of second candidate image blocks in the second candidate search set are determined from it. The second central image block is an image block on the next frame image adjacent to the first frame image (which may be named the second frame image); that is, after the coarse search it is determined that the first central image block may be located at the position of the second central image block when the first frame image is transformed into the second frame image at the next time instant. However, the second central image block may not be the image block of the second frame image with the highest matching degree to the first central image block. Therefore, a plurality of second candidate image blocks on the second frame image are further obtained, the matching block with the highest matching degree to the first central image block is selected from among them, and the position where the first central image block will finally be located when the first frame image is transformed into the second frame image is determined.
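The coarse-then-precise structure just described can be sketched in a few lines (a hedged illustration: SAD is an assumed cost, candidate positions are given as (x, y) block coordinates, and the refinement neighbourhood is a simple square of radius `step`; the patent's exact step-length rule is not reproduced here):

```python
import numpy as np

def sad(a, b):
    # Sum of absolute differences as the matching cost (an assumed criterion).
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def two_stage_search(cur_block, ref, candidates, bs, step):
    """Coarse search over `candidates` (a list of (x, y) block positions in the
    reference frame), then a precise search over every position within `step`
    pixels of the coarse winner. Returns the best (x, y) position found."""
    h, w = ref.shape
    # coarse search: pick the candidate position with the lowest cost
    cx, cy = min(candidates,
                 key=lambda p: sad(cur_block, ref[p[1]:p[1] + bs, p[0]:p[0] + bs]))
    # precise search: refine within the step-length neighbourhood of the winner
    best, best_cost = (cx, cy), sad(cur_block, ref[cy:cy + bs, cx:cx + bs])
    for dy in range(-step, step + 1):
        for dx in range(-step, step + 1):
            x, y = cx + dx, cy + dy
            if 0 <= x <= w - bs and 0 <= y <= h - bs:
                cost = sad(cur_block, ref[y:y + bs, x:x + bs])
                if cost < best_cost:
                    best_cost, best = cost, (x, y)
    return best
```

The coarse pass evaluates only a handful of candidates, and the precise pass touches only a small neighbourhood, which is why the combined search stays cheap while still covering a wide range.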
Each image block in the first frame image can be selected in turn as the first central image block, motion estimation is performed by the above method, and finally the motion estimation result of the first frame image is determined from the motion vectors of all the image blocks, thus determining where to perform video frame interpolation.
As can be seen, in the embodiment of the present application, a first frame image in the target video is obtained, a first central image block is obtained from it, and a coarse search and a precise search are performed on the first central image block. The image blocks used for matching during the coarse search are determined according to the image blocks in frame images adjacent to the first frame image; the image blocks used for matching during the precise search are determined according to the second central image block found by the coarse search and lie within a preset step-length range of it. Finally, the image block with the highest matching degree to the first central image block is taken as the final matching block, and the motion estimation result of the first central image block is determined. The two searches enlarge the search range and improve the search result; moreover, the computation needed to obtain the candidate image blocks is simple, which improves the efficiency of the search process.
Optionally, the method further comprises determining the plurality of first candidate image blocks in the first candidate search set. Referring to fig. 2C, a schematic diagram of the process of obtaining the plurality of first candidate image blocks: as shown in (b) of fig. 2C, f_n is the first frame image, i.e. the current frame, in which image block C is the obtained first central image block. In the current frame f_n, the first image block S1 to the left of image block C and the second image block S2 above it have already completed their motion estimation search and are spatially closer to image block C than other blocks, so the prediction vectors of S1 and S2 are selected as the spatial prediction vectors. That is, as shown in (c) of fig. 2C, according to the predicted motion vectors corresponding to S1 and S2 respectively, the third image block S1' and the fourth image block S2' in the second frame image f_(n+1) are calculated and taken as two of the first candidate image blocks. The second frame image f_(n+1) is the next frame image adjacent to f_n.
Further, as shown in (a) of fig. 2C, f_(n-1) is the reference frame of the current frame f_n, called the third frame image. In f_(n-1), the fifth image block T1 adjacent to the right of the corresponding image block C' and the sixth image block T2 adjacent below it are selected as the temporal prediction vectors, where C' is the image block on the previous frame image corresponding to image block C in f_n. Assuming, by the continuity of object motion in the image, that image block C has the same motion vectors as image blocks T1 and T2, the image blocks corresponding to T1 and T2 in f_n are T1' and T2', which are also the right-adjacent and bottom-adjacent image blocks of image block C. T1' and T2' keep the motion vectors that T1 and T2 had on the previous frame image, so the corresponding seventh image block T1'' and eighth image block T2'' on f_(n+1) can be obtained; these are taken as two further first candidate image blocks.
In addition, the image block Zero at the same position in f_(n+1) as image block C is taken as the zero-point image block. "Same position" means that the coordinate position of image block Zero in f_(n+1) is the same as that of image block C in f_n. The zero image block is also a first candidate image block.
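The assembly of the first candidate search set described above (two spatial candidates, two temporal candidates, and the zero candidate) can be sketched as follows. This is a hedged illustration: positions are (x, y) block coordinates, candidates are obtained by adding each prediction vector to the current block's position, and duplicates are merged; the function name and vector arithmetic are the author's assumptions, not the patent's notation.

```python
def build_first_candidates(pos, mv_s1, mv_s2, mv_t1, mv_t2):
    """Assemble coarse-search candidate positions in f_(n+1) for the block at
    `pos` = (x, y) in f_n: two spatial candidates from the already-estimated
    neighbours S1/S2, two temporal candidates inheriting T1/T2's previous-frame
    vectors, and the zero (co-located) candidate."""
    x, y = pos
    candidates = {
        (x + mv_s1[0], y + mv_s1[1]),   # S1'  -- spatial prediction from S1
        (x + mv_s2[0], y + mv_s2[1]),   # S2'  -- spatial prediction from S2
        (x + mv_t1[0], y + mv_t1[1]),   # T1'' -- temporal prediction from T1
        (x + mv_t2[0], y + mv_t2[1]),   # T2'' -- temporal prediction from T2
        (x, y),                          # Zero -- same coordinate position
    }
    return sorted(candidates)
```

Using a set here mirrors the fact that several predictors may land on the same block, in which case it only needs to be matched once.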
For the area with complex motion, some image blocks of the space-time neighborhood can be added to improve the frame interpolation quality, so that some global motion vectors of the time domain can be added to supplement the first candidate search set.
The global motion vector is introduced because, for an area with complex motion, a search using only the first candidate image blocks determined from the image blocks adjacent to the first central image block may fail to capture the required motion. Adding the global motion vector compensates for this, and the global motion vector corresponding to the previous frame image is used in the coarse search of the first central image block.
The global motion vector is the motion vector of a class that contains a relatively large number of members after the motion vectors of all the image blocks in a whole frame image are classified by some method. To find the motion vectors shared by most image blocks, the motion vectors in the previous frame image may be clustered.
Referring to fig. 2D specifically, fig. 2D is a motion vector clustering flowchart provided in an embodiment of the present application, and as shown in fig. 2D, a global motion vector obtaining process of a third frame image (a previous frame image of a first frame image) includes the following steps:
201. initialize the class center to the zero vector, set the number of classes k to 1, set the number of selected classes x to 0, and set a distance threshold D;
202. judge whether the number x of selected classes is less than 4;
203. if yes, calculate the distance d between the motion vector of the image block in the current row and the center vector of each existing class, and compare d with the threshold D;
204. if the distance d is less than or equal to D, assign the motion vector of the image block to that existing class; after the new motion vector is added, the number of motion vectors in the class is incremented, denoted count++;
205. if the distance d is greater than D, assign the motion vector to a new class and increment the number of classes, denoted k++; judge whether the total number of classes k is less than K; if yes, execute step 206; if not, stop clustering;
207. after a new motion vector is added to a class, recalculate the mean of the motion vectors in the current class, update it as the new class center vector mv_c, and execute step 202;
208. after clustering is finished, obtain the target classes whose motion-vector count exceeds 1/8 of the number of image blocks in each row of the frame image and whose count ranks in the first four;
209. obtain the target center vector mv_c of each target class, and determine the global image blocks on f_n+1 according to the target center vectors and the first central image block.
As can be seen from the above description, the motion vectors of the image blocks of the third frame image are clustered into a number of classes, where the total number of classes cannot exceed K. In this embodiment of the present application, K may be set to 16 at maximum; in some cases, for example if the third frame image has a small area, K may take a value smaller than 16. The target classes are those whose motion-vector count exceeds 1/8 of the number of image blocks in each row of the third frame image and ranks in the first four, where the number of image blocks per row is the number of image blocks into which the third frame image is divided horizontally. If four target classes are obtained, the center vectors of the 4 target classes are obtained, 4 global image blocks corresponding to the first central image block on the second frame image are calculated from the 4 center vectors, and these global image blocks are also image blocks in the first candidate search set.
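The clustering of steps 201-209 can be sketched as a greedy one-pass procedure. This is a hypothetical simplification (L1 distance, running-mean class centers, and silently dropping vectors once K classes exist), not the exact flow of fig. 2D:

```python
def cluster_motion_vectors(mvs, K=16, D=4.0):
    """Greedy one-pass clustering of per-block motion vectors.
    Returns a list of [center, count] entries, one per class."""
    classes = []  # each entry: [center (x, y), count]

    def dist(a, b):
        # L1 distance between two motion vectors (an assumption; the patent
        # only speaks of "a distance").
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    for mv in mvs:
        if not classes:
            classes.append([mv, 1])
            continue
        best = min(classes, key=lambda cl: dist(mv, cl[0]))
        if dist(mv, best[0]) <= D:
            # Fold mv into the class: count++ and update the running mean mv_c.
            (cx, cy), n = best
            best[0] = ((cx * n + mv[0]) / (n + 1), (cy * n + mv[1]) / (n + 1))
            best[1] = n + 1
        elif len(classes) < K:
            classes.append([mv, 1])  # new class, k++
    return classes

def global_motion_vectors(classes, blocks_per_row, top=4):
    """Target classes: count exceeds 1/8 of the blocks per row; keep the
    top-`top` classes by count and return their center vectors."""
    qualified = [cl for cl in classes if cl[1] > blocks_per_row / 8]
    qualified.sort(key=lambda cl: cl[1], reverse=True)
    return [cl[0] for cl in qualified[:top]]
```

Each returned center vector, added to the position of the first central image block, would yield one global image block on the second frame image.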
As can be seen from the above description, the first candidate search set includes a total of 9 first candidate image blocks. After all the first candidate image blocks in the first candidate search set are obtained, a matching operation (i.e. the coarse search) is performed between the first central image block and each of the first candidate image blocks, and the image block with the highest matching degree with the first central image block is taken as the second central image block. Image matching algorithms include gray-scale-based matching algorithms and feature-based matching algorithms; the gray-scale-based methods include the mean absolute difference (MAD) algorithm, the sum of absolute differences (SAD) algorithm, the sum of squared differences (SSD) algorithm, and so on. In the embodiment of the present application, the SAD algorithm is adopted: the matching degree between the first central image block and a first candidate image block is determined by the sum of the absolute values of the differences between corresponding pixel values in the two blocks, where a larger SAD value indicates a lower matching degree. This process has low computational complexity and ensures high coarse-search efficiency. Finally, the image block with the highest matching degree with the first central image block among the first candidate image blocks is obtained as the second central image block.
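A minimal sketch of SAD matching as used in the coarse search, with blocks represented as nested lists of pixel values (the function names are illustrative, not from the patent):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks.
    A larger SAD means a lower matching degree."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_match(center_block, candidate_blocks):
    """Return the candidate block with the lowest SAD, i.e. the highest
    matching degree with the central block."""
    return min(candidate_blocks, key=lambda cand: sad(center_block, cand))
```

SAD needs only additions and absolute values, which is why it keeps the coarse search cheap compared with squared-error measures.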
In an alternative case, in order to weight the continuously obtained actual data together with the original predicted data so that the prediction result is closer to the actual situation, when calculating the matching degree between a first candidate image block and the first central image block, the sum of the absolute values of the pixel-value differences (the first sum of absolute errors) is first obtained by the SAD algorithm, and a smoothing term smoothness1 is then added; that is, the first sum of absolute errors and the smoothing term are summed to obtain the first estimation value, where the smoothing term is:
smoothness1 = Σ|mv_c1 − mv_neighbor|  (1)
where mv_c1 represents the motion vector of the first candidate image block and mv_neighbor represents the motion vectors of the 8 image blocks adjacent to the first central image block on the first frame image, among which the motion vectors of S1 and S2 are spatial prediction vectors, the motion vectors of T1 and T2 are temporal prediction vectors, and the motion vector of Zero is the zero vector.
Finally, the matching degree between the first central image block and the first candidate image block is determined according to the first estimation value: the larger the first estimation value, the lower the matching degree.
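Formula (1) can be folded into the coarse-search cost as sketched below; motion vectors are (x, y) pairs, the vector difference is taken with the L1 norm here, and the function names are assumptions rather than the patent's:

```python
def smoothness1(candidate_mv, neighbor_mvs):
    """Formula (1): sum over the 8 neighbors of |mv_c1 - mv_neighbor|,
    using the L1 norm of the vector difference."""
    return sum(abs(candidate_mv[0] - nx) + abs(candidate_mv[1] - ny)
               for nx, ny in neighbor_mvs)

def first_estimate(first_abs_error_sum, candidate_mv, neighbor_mvs):
    """First estimation value: the first sum of absolute errors (SAD) plus the
    smoothing term. The larger the value, the lower the matching degree."""
    return first_abs_error_sum + smoothness1(candidate_mv, neighbor_mvs)
```

The smoothing term penalizes candidates whose motion vector deviates strongly from the neighborhood, pulling the prediction toward locally consistent motion.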
As can be seen, in the embodiment of the present application, when the first central image block is used to perform the coarse search over the plurality of first candidate image blocks in the corresponding first candidate search set, the following are all considered: the image blocks obtained by combining the temporal prediction vectors with the image blocks adjacent to the first central image block, the image blocks obtained by combining the spatial prediction vectors with the image blocks adjacent to the first central image block, the image block at the same position as the first central image block on the next frame image, and the global image blocks determined according to the global motion vectors of the previous frame image. This process fully considers the candidate image blocks that the neighbors of the first central image block may correspond to under various conditions, and also takes the global motion vectors of the previous frame image into account, which improves the representativeness and comprehensiveness of the obtained first candidate search set and thus the reliability of the coarse search result.
After the coarse search is completed, the precise search is further performed. According to the foregoing process, the plurality of second candidate image blocks in the second candidate search set used in the precise search are determined by the step distances between the second central image block and other image blocks. The second central image block and the other image blocks are located on the second frame image; the step distance represents the straight-line distance between image blocks, and the step distance between adjacent image blocks is 1.
Specifically, referring to fig. 2E, fig. 2E is a schematic diagram of the process of determining the plurality of second candidate image blocks, where the second central image block is image block C0. As shown in (a) of fig. 2E, the image blocks whose step distance from the second central image block is a first distance, which may be 1, are obtained as the first step-length image blocks; these are the image blocks marked 1 in the figure, i.e. the 8 image blocks adjacent to image block C0. A matching operation is then performed between each of these 8 image blocks and image block C0 (the image matching algorithms described above may likewise be used) to obtain the first step-length image block with the highest matching degree with image block C0, which becomes the third central image block 1-C0. The image blocks whose step distance from image block 1-C0 is 1 are then obtained as the second step-length image blocks, i.e. the image blocks marked 1' in the figure. In practice the second step-length image blocks overlap the first step-length image blocks; the overlapping portion is counted as first step-length image blocks and is not recorded again as second step-length image blocks.
Then, as shown in (b) of fig. 2E, the image blocks whose step distance from the second central image block is a second distance, which may be 3, are obtained as the third step-length image blocks, specifically the 8 image blocks marked 3 in the figure. A matching operation is performed between each of these 8 image blocks and image block C0 to obtain the third step-length image block with the highest matching degree with image block C0, which becomes the fourth central image block 3-C0. The image blocks whose step distance from image block 3-C0 is 3 are then obtained as the fourth step-length image blocks, i.e. the image blocks marked 3' in the figure. Similarly, the fourth step-length image blocks include image blocks overlapping the third step-length image blocks; the overlapping portion is counted as third step-length image blocks and is not recorded again as fourth step-length image blocks.
The second central image block and the first, second, third, and fourth step-length image blocks obtained by the above method constitute the plurality of second candidate image blocks in the second candidate search set. The first central image block is then used to perform the precise search over these second candidate image blocks; that is, the first central image block is matched against each image block among the plurality of second candidate image blocks. The matching method used may be, for example, the gray-scale-based or feature-based matching algorithms described above, and in particular the SAD algorithm may be used to improve matching efficiency. Finally, the image block with the highest matching degree with the first central image block among the plurality of second candidate image blocks is obtained; this image block is called the final matching block.
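The growth of the second candidate set around C0 can be sketched as follows, reading "step distance" as the Chebyshev distance so that each ring holds 8 blocks; this is an illustrative reconstruction under that assumption, and `match_degree` stands in for whatever matching score is used (higher is better):

```python
def ring(center, step):
    """The 8 image blocks whose step distance from `center` equals `step`."""
    r, c = center
    return [(r + dr, c + dc)
            for dr in (-step, 0, step)
            for dc in (-step, 0, step)
            if (dr, dc) != (0, 0)]

def fine_candidates(c0, match_degree):
    """Second candidate set: rings of step 1 and step 3 around C0, plus a ring
    of the same step around the best-matching block of each ring (1-C0 / 3-C0),
    with overlapping blocks recorded only once."""
    seen = {c0}
    out = [c0]
    for step in (1, 3):
        first_ring = ring(c0, step)               # blocks marked 1 / 3
        for b in first_ring:
            if b not in seen:
                seen.add(b)
                out.append(b)
        best = max(first_ring, key=match_degree)  # 1-C0 / 3-C0
        for b in ring(best, step):                # blocks marked 1' / 3'
            if b not in seen:                     # overlap not re-recorded
                seen.add(b)
                out.append(b)
    return out
```

The deduplication mirrors the rule that a block counted as a first (or third) step-length image block is not recorded again as a second (or fourth) step-length image block.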
Optionally, when calculating the matching degree between a second candidate image block and the first central image block, after the sum of the absolute values of the pixel-value differences (the second sum of absolute errors) is obtained by the SAD algorithm, a smoothing term smoothness2 and a distance difference distance may further be added; that is, the second sum of absolute errors, the smoothing term and the distance difference are summed to obtain the second estimation value, where the corresponding formulas are:
smoothness2 = Σ|mv_c2 − mv_neighbor|  (2)
distance = max(|x|, |y|)  (3)
where mv_c2 represents the motion vector of the second candidate image block and mv_neighbor represents the motion vectors of the 8 image blocks adjacent to the first central image block on the first frame image; the motion vector of the second candidate image block may also be a temporal prediction vector or a spatial prediction vector. The distance difference (the first distance difference) is the maximum of the absolute values of the x- and y-direction offsets of the motion vector between the first central image block and the second candidate image block.
Finally, the matching degree between the first central image block and the second candidate image block is determined according to the second estimation value: the larger the second estimation value, the lower the matching degree.
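A sketch of the second estimation value combining formulas (2) and (3); `offset` stands for the (x, y) motion-vector offset between the first central block and the candidate, the L1 norm is again assumed for the smoothing term, and all names are illustrative:

```python
def second_estimate(second_abs_error_sum, candidate_mv, neighbor_mvs, offset):
    """Second estimation value for the precise search: the second sum of
    absolute errors (SAD) plus the smoothing term of formula (2) plus the
    distance difference of formula (3). The larger the value, the lower the
    matching degree."""
    smoothness2 = sum(abs(candidate_mv[0] - nx) + abs(candidate_mv[1] - ny)
                      for nx, ny in neighbor_mvs)   # formula (2)
    distance = max(abs(offset[0]), abs(offset[1]))  # formula (3)
    return second_abs_error_sum + smoothness2 + distance
```

The added distance term biases the precise search toward candidates close to the coarse-search result, discouraging implausibly long motion vectors.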
It can be seen that, in the embodiment of the present application, when the first central image block is used to perform the precise search over the plurality of second candidate image blocks in the corresponding second candidate search set, the step-length image blocks are obtained according to the step distance for the first distance and the second distance respectively, and the precise search result is then obtained from the matching results between the first central image block and these second candidate image blocks.
Based on the description of the above-mentioned embodiment of the motion estimation method for video frame images, the embodiment of the present invention further discloses a motion estimation apparatus, referring to fig. 3, fig. 3 is a schematic structural diagram of a motion estimation apparatus provided in the embodiment of the present invention, where the motion estimation apparatus 300 includes:
an obtaining module 301, configured to obtain a first frame image in a target video;
the processing module 302 is configured to divide the first frame image into a plurality of image blocks, and obtain a first central image block of the plurality of image blocks;
the processing module 302 is further configured to perform a coarse search for a plurality of first candidate image blocks in the first candidate search set by using the first center image block, and determine a second center image block according to the coarse search result, where the first candidate image block is determined according to image blocks in adjacent frame images of the first frame image;
the processing module 302 is further configured to determine, according to the second central image block, a plurality of second candidate image blocks in the second candidate search set, where the second candidate image block is determined according to step length distances between the second central image block and other image blocks, and the other image blocks and the second central image block are located in the same frame of image;
the processing module 302 is further configured to perform an accurate search for a plurality of second candidate image blocks by using the first central image block, and determine a motion estimation result of the first central image block according to an accurate search result; and determining the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image.
It can be seen that, in the embodiment of the present application, when motion estimation is performed on the image blocks in a video frame image, motion estimation is performed on every image block of the frame image, and the search for each image block is carried out twice: a coarse search and a precise search. The first candidate search set for the coarse search is determined from the image blocks of the frame images adjacent to the frame currently undergoing motion estimation, and the second candidate search set for the precise search is built from the image blocks within a preset step distance of the coarse search result. The two searches expand the search range and improve the search result; in addition, the computation for obtaining the search sets is simple, which improves the efficiency of the search process.
Optionally, the processing module 302 is further configured to determine a plurality of first candidate image blocks in the first candidate search set, and specifically to:
acquiring a first motion vector of a first image block and a second motion vector of a second image block, determining a third image block corresponding to the first image block when a first frame image moves to a second frame image according to the first image block and the first motion vector, and determining a fourth image block corresponding to the second image block when the first frame image moves to the second frame image according to the second image block and the second motion vector, wherein the first image block is a left adjacent image block of a first central image block, the second image block is an upper adjacent image block of the first central image block, and the second frame image is a next frame image adjacent to the first frame image;
acquiring a third motion vector of a fifth image block moving from a position on a third frame image to a current position and a fourth motion vector of a sixth image block moving from the position on the third frame image to the current position, determining a seventh image block corresponding to the fifth image block when the first frame image moves to the second frame image according to the fifth image block and the third motion vector, and determining an eighth image block corresponding to the sixth image block when the first frame image moves to the second frame image according to the sixth image block and the fourth motion vector, wherein the fifth image block is a right adjacent image block of a first central image block, the sixth image block is a lower adjacent image block of the first central image block, and the third frame image is a previous frame image adjacent to the first frame image;
acquiring a zero-point image block corresponding to the first central image block in the second frame image, wherein the coordinate position of the zero-point image block in the second frame image is the same as the coordinate position of the first central image block in the first frame image;
acquiring a global motion vector of a third frame image, wherein the global motion vector is obtained by clustering the motion vectors corresponding to the plurality of image blocks into which the third frame image is divided;
acquiring a corresponding image block of the first central image block on the second frame image according to the global motion vector as a global image block;
and the third image block, the fourth image block, the seventh image block, the eighth image block, the zero-point image block and the global image block form the plurality of first candidate image blocks in the first candidate search set.

Optionally, the rough search result is a matching result between the first central image block and the plurality of first candidate image blocks, and determining the second central image block according to the rough search result includes:
and determining the image block with the highest matching degree with the first central image block in the first candidate image blocks as a second central image block.
Optionally, the processing module is further configured to determine matching degrees between the first center image block and the plurality of first candidate image blocks, and specifically configured to:
calculating a first sum of absolute errors between the first central image block and the first candidate image block, wherein the first sum of absolute errors is the sum of the absolute values of the differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the first candidate image block;
summing the first absolute error sum and a first smooth term to obtain a first estimated value, wherein the first smooth term is determined according to the difference sum of the motion vector of the first candidate image block and the motion vector of the adjacent image block of the first central image block;
and determining the matching degree between the first central image block and the first candidate image block according to the first estimated value.
Optionally, the processing module 302 is further configured to determine a plurality of second candidate image blocks in the second candidate search set, specifically to:
acquiring image blocks whose step distance from the second central image block is a first distance as first step-length image blocks;

determining a third central image block according to the matching result between the second central image block and the first step-length image blocks;

acquiring image blocks whose step distance from the third central image block is the first distance as second step-length image blocks;

acquiring image blocks whose step distance from the second central image block is a second distance as third step-length image blocks, wherein the second distance is greater than the first distance;

determining a fourth central image block according to the matching result between the second central image block and the third step-length image blocks;

acquiring image blocks whose step distance from the fourth central image block is the second distance as fourth step-length image blocks;
the second center image block, the first step length image block, the second step length image block, the third step length image block and the fourth step length image block form a plurality of second candidate image blocks in a second candidate search set.
Optionally, the accurate search result is a matching result between the first central image block and the plurality of second candidate image blocks, and determining the motion estimation result of the first central image block according to the accurate search result includes:
determining a final matching block according to the accurate searching result, wherein the final matching block is the image block with the highest matching degree with the first central image block in the plurality of second candidate image blocks;
and calculating to obtain a final motion vector between the first central image block and the final matching block as a motion estimation result of the first central image block.
Optionally, the processing module 302 is further configured to determine matching degrees of the first central image block and the plurality of second candidate image blocks, and specifically is configured to:
calculating to obtain a second absolute error sum of the first central image block and the second candidate image block, wherein the second absolute error sum is the sum of absolute values of differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the second candidate image block;
summing the second sum of absolute errors, a second smoothing term and a first distance difference to obtain a second estimated value, wherein the second smoothing term is determined according to the sum of the differences between the motion vector of the second candidate image block and the motion vectors of the image blocks adjacent to the first central image block, and the first distance difference is determined according to the maximum of the absolute values of the x- and y-direction offsets of the motion vector between the first central image block and the second candidate image block;
and determining the matching degree between the first central image block and the second candidate image block according to the second estimated value.
It is to be noted that, for a specific functional implementation of the motion estimation apparatus, reference may be made to the description of the motion estimation method, and details are not described herein again. The units or modules in the motion estimation apparatus may be respectively or completely combined into one or several other units or modules to form the motion estimation apparatus, or some unit(s) or module(s) may be further split into multiple functionally smaller units or modules to form the motion estimation apparatus, which may achieve the same operation without affecting the achievement of the technical effect of the embodiments of the present invention. The above units or modules are divided based on logic functions, and in practical applications, the functions of one unit (or module) may also be implemented by a plurality of units (or modules), or the functions of a plurality of units (or modules) may be implemented by one unit (or module).
Each device and product described in the above embodiments includes modules/units, which may be software modules/units or hardware modules/units, or partly software modules/units and partly hardware modules/units. For example, for each device or product applied to or integrated in a chip, the modules/units included in it may all be implemented in hardware such as circuits, or at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip, with the remaining modules/units implemented in hardware such as circuits. For each device or product applied to or integrated in a chip module, the modules/units included in it may all be implemented in hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components of the chip module; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated inside the chip module, with the remaining modules/units implemented in hardware such as circuits. For each device or product applied to or integrated in a terminal, the modules/units included in it may all be implemented in hardware such as circuits, and different modules/units may be located in the same component (such as a chip or a circuit module) or in different components within the terminal; alternatively, at least some of the modules/units may be implemented as a software program running on a processor integrated inside the terminal, with the remaining modules/units implemented in hardware such as circuits.
Based on the description of the method embodiment and the device embodiment, the embodiment of the invention also provides a motion estimation device. Fig. 4 is a schematic structural diagram of a motion estimation device according to an embodiment of the present invention. As shown in fig. 4, the motion estimation apparatus 300 described above may be applied to the motion estimation device 400, and the motion estimation device 400 may include: the processor 401, the network interface 404 and the memory 405, and the motion estimation apparatus 400 may further include: a user interface 403, and at least one communication bus 402. Wherein a communication bus 402 is used to enable connective communication between these components. The user interface 403 may include a Display (Display) and a Keyboard (Keyboard), and the selectable user interface 403 may also include a standard wired interface and a standard wireless interface. The network interface 404 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface). The memory 405 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 405 may alternatively be at least one storage device located remotely from the aforementioned processor 401. As shown in fig. 4, the memory 405, which is a type of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a device control application program.
In the motion estimation device 400 shown in fig. 4, the network interface 404 may provide a network communication function; and the user interface 403 is primarily an interface for providing input to a user; and processor 401 may be used to invoke a device control application stored in memory 405 to implement the steps of the above-described method for motion estimation of video frame images.
It should be understood that the motion estimation device 400 described in the embodiment of the present invention can perform the motion estimation method for video frame images described above and can also implement the functions of the motion estimation apparatus described above, which are not described herein again. In addition, the beneficial effects of the same method are not described in detail.
Furthermore, it is to be noted here that: an embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores the computer program executed by the aforementioned video processing apparatus, and the computer program includes program instructions, and when the processor executes the program instructions, the processor can execute the description of the video processing method, and therefore, details are not repeated here. In addition, the beneficial effects of the same method are not described in detail. For technical details not disclosed in the embodiments of the computer storage medium to which the present invention relates, reference is made to the description of the method embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above disclosure is only the preferred embodiment of the present invention and certainly cannot be used to limit the scope of rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope of the present invention.

Claims (14)

1. A method for motion estimation of a video frame image, the method comprising:
acquiring a first frame image in a target video, and dividing the first frame image into a plurality of image blocks;
obtaining a first central image block of the plurality of image blocks;
determining a plurality of first candidate image blocks in the first candidate search set, specifically including:
acquiring a first motion vector of a first image block and a second motion vector of a second image block, determining a third image block corresponding to the first image block when the first frame image moves to the second frame image according to the first image block and the first motion vector, determining a fourth image block corresponding to the second image block when the first frame image moves to the second frame image according to the second image block and the second motion vector, wherein the first image block is a left adjacent image block of a first central image block, the second image block is an upper adjacent image block of the first central image block, and the second frame image is a next frame image adjacent to the first frame image;
acquiring a third motion vector of a fifth image block moving from a position on a third frame image to a current position and a fourth motion vector of a sixth image block moving from the position on the third frame image to the current position, determining a seventh image block corresponding to the fifth image block when the first frame image moves to a second frame image according to the fifth image block and the third motion vector, determining an eighth image block corresponding to the sixth image block when the first frame image moves to the second frame image according to the sixth image block and the fourth motion vector, wherein the fifth image block is a right adjacent image block of the first central image block, the sixth image block is a lower adjacent image block of the first central image block, and the third frame image is a previous frame image adjacent to the first frame image;
acquiring a zero-point image block corresponding to the first center image block in the second frame image, wherein the coordinate position of the zero-point image block in the second frame image is the same as the coordinate position of the first center image block in the first frame image;
acquiring a global motion vector of the third frame of image, wherein the global motion vector is obtained according to motion vector clusters corresponding to a plurality of image blocks divided by the third frame of image;
acquiring a corresponding image block of the first central image block on the second frame image according to the global motion vector to be used as a global image block;
the third image block, the fourth image block, the seventh image block, the eighth image block, the zero image block and the global image block form a plurality of first candidate image blocks in the first candidate search set;
performing a coarse search over the plurality of first candidate image blocks in the first candidate search set by using the first central image block, and determining a second central image block according to a coarse search result, wherein the first candidate image blocks are determined according to image blocks in frame images adjacent to the first frame image;
determining a plurality of second candidate image blocks in a second candidate search set according to the second central image block, wherein the second candidate image blocks are determined according to step length distances between the second central image block and other image blocks, and the other image blocks and the second central image block are located in the same frame of image;
performing an accurate search over the plurality of second candidate image blocks by using the first central image block, and determining a motion estimation result of the first central image block according to an accurate search result;
and determining the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image.
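For illustration only, the two-stage search of claim 1 (a coarse search over a small candidate set of predictor positions, then a stepped fine search around the coarse winner) can be sketched in Python as follows. The block size, the SAD matching cost, the step lengths, and the frame representation as nested lists are all assumptions for the sketch, not part of the claim.

```python
# Hypothetical sketch of the two-stage block search in claim 1:
# a coarse search over a small candidate set (spatial/temporal
# predictors, the zero-point block, the global block), followed
# by a stepped fine search around the coarse winner.

BLOCK = 4  # assumed block size in pixels

def sad(frame_a, frame_b, pos_a, pos_b):
    """Sum of absolute pixel differences between two blocks."""
    (ya, xa), (yb, xb) = pos_a, pos_b
    return sum(
        abs(frame_a[ya + dy][xa + dx] - frame_b[yb + dy][xb + dx])
        for dy in range(BLOCK) for dx in range(BLOCK)
    )

def coarse_search(cur, ref, center, candidates):
    """Pick the candidate position in ref with the lowest SAD."""
    return min(candidates, key=lambda c: sad(cur, ref, center, c))

def fine_search(cur, ref, center, start, steps=(2, 1)):
    """Walk around the coarse winner with shrinking step lengths."""
    best = start
    h, w = len(ref) - BLOCK, len(ref[0]) - BLOCK
    for s in steps:
        moved = True
        while moved:
            moved = False
            for dy, dx in ((-s, 0), (s, 0), (0, -s), (0, s)):
                cand = (best[0] + dy, best[1] + dx)
                if (0 <= cand[0] <= h and 0 <= cand[1] <= w
                        and sad(cur, ref, center, cand)
                        < sad(cur, ref, center, best)):
                    best, moved = cand, True
    # The motion estimation result for this block is the final vector.
    return (best[0] - center[0], best[1] - center[1])
```

For example, if the content of a block at position (2, 2) in the current frame reappears at (3, 4) in the next frame, and (3, 4) is among the coarse candidates (say, via the global motion vector), the coarse search lands on it and the fine search confirms the motion vector (1, 2).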
2. The method of claim 1, wherein the coarse search result is a matching result between the first central image block and the plurality of first candidate image blocks, and wherein the determining a second central image block according to the coarse search result comprises:
and determining the image block with the highest matching degree with the first central image block in the first candidate image blocks as the second central image block.
3. The method according to claim 2, further comprising determining a degree of matching between the first center image block and the plurality of first candidate image blocks, specifically comprising:
calculating a first absolute error sum of the first central image block and the first candidate image block, wherein the first absolute error sum is the sum of absolute values of differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the first candidate image block;
summing the first absolute error sum and a first smooth term to obtain a first estimated value, wherein the first smooth term is determined according to the difference sum of the motion vector of the first candidate image block and the motion vector of the adjacent image block of the first central image block;
and determining the matching degree between the first central image block and the first candidate image block according to the first estimated value.
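One way to read the first estimated value of claim 3 is sketched below in Python. The claim only requires summing the absolute error sum with the smooth term; the L1 form of the motion-vector differences and the weighting factor `LAMBDA` are assumptions of this sketch.

```python
# Hypothetical sketch of the first estimated value in claim 3:
# SAD plus a smoothness term summing the differences between the
# candidate's motion vector and the motion vectors of the central
# block's neighbouring image blocks. LAMBDA is an assumed weight.

LAMBDA = 1.0  # assumed smooth-term weight

def first_estimate(sad_value, candidate_mv, neighbour_mvs):
    smooth = sum(
        abs(candidate_mv[0] - mv[0]) + abs(candidate_mv[1] - mv[1])
        for mv in neighbour_mvs
    )
    return sad_value + LAMBDA * smooth

def best_candidate(candidates):
    """candidates: list of (sad, mv, neighbour_mvs) tuples.
    The lowest estimate corresponds to the highest matching degree."""
    return min(range(len(candidates)),
               key=lambda i: first_estimate(*candidates[i]))
```

The smooth term biases the coarse search toward candidates whose motion agrees with the neighbourhood, so a slightly worse SAD can still win if its vector is consistent with the surrounding field.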
4. The method according to any one of claims 1 to 3, wherein the determining a plurality of second candidate image blocks in a second candidate search set according to the second center image block specifically includes:
acquiring an image block with a first distance from the step length of the second central image block as a first step length image block;
determining a third central image block according to the matching result of the second central image block and the first step length image block;
acquiring an image block with the step length distance from the third central image block being the first distance as a second step length image block;
acquiring an image block with a step length distance from the second central image block equal to a second distance as a third step length image block, wherein the second distance is greater than the first distance;
determining a fourth central image block according to the matching result of the second central image block and the third step length image block;
acquiring an image block with the step length distance from the fourth central image block being the second distance as a fourth step length image block;
the second center image block, the first step length image block, the second step length image block, the third step length image block and the fourth step length image block form a plurality of second candidate image blocks in the second candidate search set.
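The construction of the second candidate set in claim 4 can be sketched as follows; the short and long step values `D1`/`D2`, the 4-direction step pattern, and the cost function used to pick the intermediate centres are assumptions of the sketch, not fixed by the claim.

```python
# Hypothetical sketch of the second candidate set in claim 4.
# Around the coarse winner (the "second central image block"),
# positions at a short step D1 refine a third centre, and positions
# at a longer step D2 (> D1) refine a fourth centre; the union of
# all visited positions forms the fine-search candidate set.

D1, D2 = 1, 2  # assumed step-length distances, with D2 > D1

def ring(center, dist):
    """Positions at the given step distance in the 4 directions."""
    y, x = center
    return [(y - dist, x), (y + dist, x), (y, x - dist), (y, x + dist)]

def second_candidate_set(second_center, cost):
    """cost(pos) -> matching cost; lower means a better match."""
    first_step = ring(second_center, D1)
    third_center = min([second_center] + first_step, key=cost)
    second_step = ring(third_center, D1)
    third_step = ring(second_center, D2)
    fourth_center = min([second_center] + third_step, key=cost)
    fourth_step = ring(fourth_center, D2)
    return ([second_center] + first_step + second_step
            + third_step + fourth_step)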
5. The method of claim 4, wherein the accurate search result is a matching result of the first central image block and the plurality of second candidate image blocks, and wherein determining the motion estimation result of the first central image block according to the accurate search result comprises:
determining a final matching block according to the accurate search result, wherein the final matching block is the image block with the highest matching degree with the first central image block in the plurality of second candidate image blocks;
and calculating to obtain a final motion vector between the first central image block and the final matching block as a motion estimation result of the first central image block.
6. The method according to claim 5, further comprising determining a degree of matching of the first center image block with the plurality of second candidate image blocks, specifically comprising:
calculating to obtain a second absolute error sum of the first central image block and the second candidate image block, wherein the second absolute error sum is the sum of absolute values of differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the second candidate image block;
summing the second absolute error sum, a second smooth term and a first distance difference to obtain a second estimated value, wherein the second smooth term is determined according to the sum of differences between the motion vector of the second candidate image block and the motion vectors of adjacent image blocks of the first central image block, and the first distance difference is determined according to the maximum of the absolute values of the x-direction and y-direction offsets of the motion vector between the first central image block and the second candidate image block;
and determining the matching degree between the first central image block and the second candidate image block according to the second estimated value.
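The second estimated value of claim 6 extends the coarse-search cost with a distance term; a minimal sketch, in which the weight `MU` and the unweighted sum of the other two terms are assumptions:

```python
# Hypothetical sketch of the second estimated value in claim 6:
# SAD plus a smoothness term plus a distance difference taken as
# the maximum of |x| and |y| of the motion vector between the
# first central block and the candidate. MU is an assumed weight.

MU = 1.0  # assumed distance-term weight

def second_estimate(sad_value, candidate_mv, neighbour_mvs, mv_to_candidate):
    smooth = sum(
        abs(candidate_mv[0] - mv[0]) + abs(candidate_mv[1] - mv[1])
        for mv in neighbour_mvs
    )
    dist = max(abs(mv_to_candidate[0]), abs(mv_to_candidate[1]))
    return sad_value + smooth + MU * dist
```

The distance difference penalizes long vectors, so among fine-search candidates with similar residual error the search prefers the one closest to the first central image block.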
7. A motion estimation apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring a first frame image in a target video;
the processing module is used for dividing the first frame image into a plurality of image blocks to obtain a first central image block in the plurality of image blocks;
the processing module is further configured to determine a plurality of first candidate image blocks in the first candidate search set, and is specifically configured to:
acquiring a first motion vector of a first image block and a second motion vector of a second image block, determining a third image block corresponding to the first image block when the first frame image moves to the second frame image according to the first image block and the first motion vector, determining a fourth image block corresponding to the second image block when the first frame image moves to the second frame image according to the second image block and the second motion vector, wherein the first image block is a left adjacent image block of a first central image block, the second image block is an upper adjacent image block of the first central image block, and the second frame image is a next frame image adjacent to the first frame image;
acquiring a third motion vector of a fifth image block moving from a position on a third frame image to a current position and a fourth motion vector of a sixth image block moving from the position on the third frame image to the current position, determining a seventh image block corresponding to the fifth image block when the first frame image moves to a second frame image according to the fifth image block and the third motion vector, determining an eighth image block corresponding to the sixth image block when the first frame image moves to the second frame image according to the sixth image block and the fourth motion vector, wherein the fifth image block is a right adjacent image block of the first central image block, the sixth image block is a lower adjacent image block of the first central image block, and the third frame image is a previous frame image adjacent to the first frame image;
acquiring a zero-point image block corresponding to the first center image block in the second frame image, wherein the coordinate position of the zero-point image block in the second frame image is the same as the coordinate position of the first center image block in the first frame image;
acquiring a global motion vector of the third frame of image, wherein the global motion vector is obtained according to motion vector clusters corresponding to a plurality of image blocks divided by the third frame of image;
acquiring a corresponding image block of the first central image block on the second frame image according to the global motion vector to be used as a global image block;
the third image block, the fourth image block, the seventh image block, the eighth image block, the zero image block and the global image block form a plurality of first candidate image blocks in the first candidate search set;
the processing module is further configured to perform a coarse search on a plurality of first candidate image blocks in the first candidate search set by using the first center image block, and determine a second center image block according to a coarse search result, where the first candidate image block is determined according to image blocks in adjacent frame images of the first frame image;
the processing module is further configured to determine, according to the second central image block, a plurality of second candidate image blocks in a second candidate search set, where the second candidate image block is determined according to step length distances between the second central image block and other image blocks, and the other image blocks and the second central image block are located in the same frame of image;
the processing module is further configured to perform an accurate search for the plurality of second candidate image blocks by using the first central image block, and determine a motion estimation result of the first central image block according to an accurate search result; and determining the motion estimation result of the first frame image according to the motion estimation results of the plurality of image blocks of the first frame image.
8. The apparatus of claim 7, wherein the coarse search result is a matching result between the first center image block and the first candidate image blocks, and wherein determining a second center image block according to the coarse search result comprises:
and determining the image block with the highest matching degree with the first central image block in the first candidate image blocks as the second central image block.
9. The apparatus according to claim 8, wherein the processing module is further configured to determine a degree of matching between the first center image block and the plurality of first candidate image blocks, and is specifically configured to:
calculating a first absolute error sum of the first central image block and the first candidate image block, wherein the first absolute error sum is the sum of absolute values of differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the first candidate image block;
summing the first absolute error sum and a first smooth term to obtain a first estimated value, wherein the first smooth term is determined according to the difference sum of the motion vector of the first candidate image block and the motion vector of the adjacent image block of the first central image block;
and determining the matching degree between the first central image block and the first candidate image block according to the first estimated value.
10. The apparatus according to any of claims 7 to 9, wherein the processing module is further configured to determine a plurality of second candidate image blocks in the second candidate search set, and specifically configured to:
acquiring an image block with a first distance from the step length of the second central image block as a first step length image block;
determining a third central image block according to the matching result of the second central image block and the first step length image block;
acquiring an image block with the step length distance from the third central image block being the first distance as a second step length image block;
acquiring an image block with a second distance from the step length of the second central image block as a third step length image block, wherein the second distance is greater than the first distance;
determining a fourth central image block according to the matching result of the second central image block and the third step length image block;
acquiring an image block with the step length distance from the fourth central image block being the second distance as a fourth step length image block;
the second center image block, the first step length image block, the second step length image block, the third step length image block and the fourth step length image block form a plurality of second candidate image blocks in the second candidate search set.
11. The apparatus of claim 10, wherein the accurate search result is a matching result of the first central image block and the plurality of second candidate image blocks, and wherein the determining the motion estimation result of the first central image block according to the accurate search result comprises:
determining a final matching block according to the accurate search result, wherein the final matching block is the image block with the highest matching degree with the first central image block in the plurality of second candidate image blocks;
and calculating to obtain a final motion vector between the first central image block and the final matching block as a motion estimation result of the first central image block.
12. The apparatus according to claim 11, wherein the processing module is further configured to determine matching degrees of the first center image block and the plurality of second candidate image blocks, and is specifically configured to:
calculating to obtain a second absolute error sum of the first central image block and the second candidate image block, wherein the second absolute error sum is the sum of absolute values of differences of pixel values between a plurality of pixel points in the first central image block and a plurality of pixel points to be matched in the second candidate image block;
summing the second absolute error sum, a second smooth term and a first distance difference to obtain a second estimated value, wherein the second smooth term is determined according to the sum of differences between the motion vector of the second candidate image block and the motion vectors of adjacent image blocks of the first central image block, and the first distance difference is determined according to the maximum of the absolute values of the x-direction and y-direction offsets of the motion vector between the first central image block and the second candidate image block;
and determining the matching degree between the first central image block and the second candidate image block according to the second estimated value.
13. A motion estimation device, characterized by comprising: a processor and a memory;
the processor is coupled to the memory, the memory is configured to store program code, and the processor is configured to invoke the program code to perform the video frame image motion estimation method according to any one of claims 1 to 6.
14. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the video frame image motion estimation method according to any one of claims 1 to 6.
CN202011639504.XA 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment Active CN112770118B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211292904.7A CN115633178A (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment
CN202011639504.XA CN112770118B (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011639504.XA CN112770118B (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211292904.7A Division CN115633178A (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment

Publications (2)

Publication Number Publication Date
CN112770118A CN112770118A (en) 2021-05-07
CN112770118B true CN112770118B (en) 2022-09-13

Family

ID=75698291

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202011639504.XA Active CN112770118B (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment
CN202211292904.7A Pending CN115633178A (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211292904.7A Pending CN115633178A (en) 2020-12-31 2020-12-31 Video frame image motion estimation method and related equipment

Country Status (1)

Country Link
CN (2) CN112770118B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763451B (en) * 2021-09-23 2024-01-02 重庆邮电大学 Hierarchical search method for binocular vision depth measurement of intelligent vehicle

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US8660182B2 (en) * 2003-06-09 2014-02-25 Nvidia Corporation MPEG motion estimation based on dual start points
JP2012253482A (en) * 2011-06-01 2012-12-20 Sony Corp Image processing device, image processing method, recording medium, and program
US9681150B2 (en) * 2014-06-13 2017-06-13 Texas Instruments Incorporated Optical flow determination using pyramidal block matching
CN106572354B (en) * 2015-10-09 2021-10-15 腾讯科技(北京)有限公司 Image block-based search matching method and system and video processing equipment
CN110839155B (en) * 2018-08-17 2021-12-03 北京金山云网络技术有限公司 Method and device for motion estimation, electronic equipment and computer-readable storage medium
CN111598919B (en) * 2020-04-22 2023-06-30 Oppo广东移动通信有限公司 Motion estimation method, motion estimation device, storage medium and electronic equipment
CN111968151B (en) * 2020-07-03 2022-04-05 北京博雅慧视智能技术研究院有限公司 Motion estimation fine search method and device
CN112001942B (en) * 2020-07-03 2021-12-03 北京博雅慧视智能技术研究院有限公司 Motion estimation coarse search method and device

Also Published As

Publication number Publication date
CN115633178A (en) 2023-01-20
CN112770118A (en) 2021-05-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant