CN106210449B - Multi-information fusion frame rate up-conversion motion estimation method and system - Google Patents

Multi-information fusion frame rate up-conversion motion estimation method and system

Info

Publication number
CN106210449B
CN106210449B (application CN201610657029.6A)
Authority
CN
China
Prior art keywords
motion vector
vector
motion
feature
optical flow
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610657029.6A
Other languages
Chinese (zh)
Other versions
CN106210449A (en)
Inventor
张小云
鲁国
包文博
高志勇
陈立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201610657029.6A
Publication of CN106210449A
Application granted
Publication of CN106210449B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 5/00: Details of television systems
    • H04N 5/14: Picture signal circuitry for video frequency region
    • H04N 5/144: Movement detection
    • H04N 5/145: Movement estimation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Television Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-information fusion frame rate up-conversion motion estimation method and system. The method comprises: reading two consecutive frames, down-sampling each of them, performing optical-flow-based motion estimation on the down-sampled images, performing block-matching motion estimation on the original images, extracting SIFT (scale-invariant feature transform) features from the two frames, computing the feature vectors and performing feature matching to obtain feature-matching motion vectors, and then fusing the block-matching, feature-matching and optical-flow motion vectors and propagating the fused motion vectors. Compared with a traditional block-matching motion estimation algorithm, the method improves accuracy markedly; compared with a general optical flow estimation algorithm, it better preserves the motion vectors of small objects and at motion boundaries.

Description

Multi-information fusion frame rate up-conversion motion estimation method and system
Technical Field
The invention relates to the field of video frame rate up-conversion, in particular to a method and a system for multi-information fusion frame rate up-conversion motion estimation.
Background
The video frame rate is a key parameter of moving video and determines how smooth moving objects appear to the viewer. In low frame rate video, quality fluctuations between adjacent frames easily produce a flickering impression, and in scenes with a common motion direction, such as sports programs, motion judder and a choppy impression are easily observed. High frame rate video greatly improves these visual experiences, brings better overall visual quality and smoothness, and has become a development direction and an important requirement of future video applications.
As an important video post-processing technology, frame rate up-conversion (FRUC) has attracted a large amount of research in video and image processing and in digital television. The mainstream practical FRUC methods exploit the correlation between adjacent frames of a video sequence and interpolate along the motion trajectories of objects; they are therefore called motion-compensated frame rate up-conversion (MC-FRUC). Existing research centers on the two key problems of motion estimation and motion-compensated interpolation, and a series of results has been obtained for improving the accuracy of the motion vector field and the reliability of the interpolated image. According to how the motion vectors are generated, common motion estimation methods fall into five classes: block matching, feature matching, region based, gradient based and transform-domain based. Block matching is simple, efficient and easy to implement in hardware, and is therefore widely used in practice. However, its vector field is poorly smoothed, and on monotonous backgrounds in particular the block-matching motion vectors are noisy. Moreover, most block-matching methods have difficulty estimating large motion vectors accurately, so in practice block-matching methods tend to fail at motion boundaries and on dynamic backgrounds, which degrades the visual quality of the interpolated video. Researchers have proposed many remedies, including multi-resolution motion estimation and the indirect or direct application of smoothness constraints, yet the quality of the interpolated image still leaves room for improvement.
Optical flow methods, in particular Horn-Schunck-style variational optical flow, have developed further in recent years. Among them, the method of "High Accuracy Optical Flow Estimation Based on a Theory for Warping", published by Brox et al. at ECCV, is widely used in practice. It employs a layered (coarse-to-fine) architecture and smooths the vector field by global optimization. Experiments show that the vector field produced by the optical flow method is clearly better than block matching in both accuracy and smoothness, so it can be applied to motion estimation for frame rate up-conversion. The conventional optical flow method nevertheless has drawbacks. First, its layered architecture accelerates the convergence of the motion vectors, but the motion of small objects is easily lost at the coarsest level; this is especially true when the size of a small object is comparable to the magnitude of its motion. In addition, because of the global smoothness constraint, the optical flow method estimates large motion vectors inaccurately and tends to produce erroneous clustering near large object motions.
Feature extraction and feature matching are widely used in image processing; SIFT and SURF features are the most common. SIFT features are local image features that are invariant to rotation, scale and brightness changes and remain stable to a certain degree under viewpoint change, affine transformation and noise. They are also highly distinctive and rich in information, which makes them suitable for fast and accurate matching in large feature databases, and they are numerous: even a few objects can yield a large number of SIFT feature vectors. The SIFT pipeline mainly comprises scale-space extremum detection, keypoint localization, orientation assignment and keypoint description. By computing and matching the keypoints of two consecutive frames, the motion vectors of the corresponding keypoints between the two frames can be obtained. Thanks to these properties, the motion vectors obtained by feature matching estimate complex motion and large motion vectors well. However, SIFT features are relatively sparse and the locations of the keypoints are unpredictable; in addition, because the matching is point-wise, feature matching yields few motion vectors in regions of simple texture, and those it does yield are error-prone.
A prior art search found Chinese patent application CN104915966A (application No. CN201510233587.5), which discloses a frame rate up-conversion motion estimation method and system based on Kalman filtering. That method comprises: first, setting the parameters and initial state of a Kalman filter model so that the model is consistent with the actual system; then obtaining a motion vector observation by performing unidirectional motion estimation and mapping the result to the interpolated frame; and finally updating the observed vector with a time-varying-gain Kalman filter to obtain a more accurate motion vector.
However, that frame rate up-conversion motion estimation method is not accurate enough when estimating the motion vectors of small objects and of large objects, and it is not robust to complex motion either.
Disclosure of Invention
In view of the defects of the prior art, the object of the invention is to provide a motion estimation method and system for video frame rate up-conversion that produce a high-quality motion vector field by fusing optical flow motion vectors, feature matching motion vectors and block matching motion vectors and applying motion vector post-processing, thereby improving the visual quality of the interpolated frames.
According to a first aspect of the present invention, there is provided a method for multi-information-fused frame rate up-conversion motion estimation, the method comprising the steps of:
the method comprises the following steps: reading two continuous frames of images from a high-definition video stream;
step two: down-sampling the image data of the first step;
step three: performing optical-flow-based motion estimation on the two down-sampled frames from step two to obtain the optical flow motion vectors;
step four: performing block-matching motion estimation on the two original frames from step one to obtain the block matching motion vectors;
step five: extracting SIFT features from the two frames of step one, computing the feature vector of each feature, and performing feature matching to obtain the corresponding feature motion vectors;
step six: fusing the block matching motion vector obtained in the fourth step, the characteristic motion vector obtained in the fifth step and the optical flow method motion vector obtained in the third step to obtain a fused motion vector field;
step seven: correcting the fused motion vector field obtained in the sixth step in a reliable motion vector transmission mode;
step eight: and returning to the step one, and reading the next two frames of images.
Preferably, in step two, the image is down-sampled by keeping one point out of every 4 points in both the horizontal and vertical directions.
Preferably, in step three, the optical flow method adopts a classical variational optical flow method.
Preferably, in step four, the block matching algorithm uses a 3-dimensional recursive search.
Preferably, in step six, the feature motion vectors, the block matching motion vectors and the optical flow motion vectors are fused as follows:
s1: acquiring the optical flow motion vector field vo obtained by the optical flow method, the feature motion vector field vf obtained by feature matching and the block matching motion vector field vb, and creating a fused vector field vm;
s2: scanning the fused motion vector field vm position by position, from left to right and from top to bottom;
s3: judging whether the feature matching motion vector field vf contains a feature motion vector at the current position; if not, go to S4, otherwise go to S6;
s4: calculating the pixel value difference SADB of the motion vector of the block matching motion vector field vb at the current position and the pixel value difference SADO of the motion vector of the optical flow motion vector field vo at the current position;
s5: selecting the motion vector corresponding to the smaller of SADB and SADO as the motion vector of the fused vector field vm at the current position; go to S8;
s6: calculating the pixel value difference SADO of the motion vector of the optical flow motion vector field vo at the current position, the pixel value difference SADF of the feature motion vector of vf at the current position, and the pixel value difference SADB of the motion vector of the block matching motion vector field vb at the current position;
s7: selecting the motion vector corresponding to the smallest of SADB, SADO and SADF as the motion vector of the fused vector field vm at the current position; go to S8;
s8: if the whole fused motion vector field vm has been scanned, exit and output vm; otherwise return to S2.
More preferably, in S4 and S6, the pixel value difference of a motion vector is calculated as follows:
acquiring the current-position pixel F(x) and the reference-frame pixel RF(x + v), where x is the current coordinate position, v is the motion vector at the current position, and RF is the reference frame;
SAD = |F(x) - RF(x + v)|, where SAD is the pixel value difference sought.
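By way of illustration only (not part of the claimed method), a minimal Python/numpy sketch of this pixel value difference follows; the function name, the rounding of the motion vector to integer pixel positions and the infinite cost for out-of-frame displacements are assumptions of the sketch, and F and RF are assumed to be numpy image arrays.

```python
import numpy as np

def pixel_value_difference(F, RF, x, v):
    """SAD = |F(x) - RF(x + v)| for one position x = (px, py) and a candidate
    motion vector v = (vx, vy).  Displacements that leave the reference frame
    RF are given an infinite cost (an assumption of this sketch)."""
    px, py = x
    qx, qy = px + int(round(v[0])), py + int(round(v[1]))
    H, W = RF.shape[:2]
    if not (0 <= qx < W and 0 <= qy < H):
        return np.inf
    diff = F[py, px].astype(np.float32) - RF[qy, qx].astype(np.float32)
    return float(np.abs(diff).sum())   # sums over channels for colour images
```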
Preferably, in step seven, the motion vectors are transferred as follows:
taking as input the fused vector field vm obtained in step six from the optical flow motion vector field vo, the feature motion vector field vf and the block matching motion vector field vb;
s701: scanning the fused vector field vm from left to right and from top to bottom;
s702: for the current position (px, py), where px is the horizontal coordinate, py is the vertical coordinate and vm(px, py) is the vector at the current position, obtaining in turn the left vector vm(px-1, py), the upper vector vm(px, py-1) and the upper-left vector vm(px-1, py-1);
s703: calculating the pixel value differences between the previous and the following frame for the four vectors of S702, and selecting the vector with the minimum pixel value difference to update the vector of the fused vector field vm at the current position;
s704: if the whole frame has not been scanned, return to S701; otherwise proceed to S705;
s705: scanning the fused vector field vm from right to left and from bottom to top;
s706: for the current position (px, py), where px is the horizontal coordinate, py is the vertical coordinate and vm(px, py) is the vector at the current position, obtaining in turn the right vector vm(px+1, py), the lower vector vm(px, py+1) and the lower-right vector vm(px+1, py+1);
s707: calculating the pixel value differences between the previous and the following frame for the four vectors of S706, and selecting the vector with the minimum pixel value difference to update the vector of vm at the current position;
s708: if the whole frame has not been scanned, return to S705; otherwise exit.
According to a second aspect of the present invention, there is provided a multi-information fused frame rate up-conversion motion estimation system, comprising:
an image input module: reading two continuous frames of images from a high-definition video stream;
a down-sampling module: down-sampling image data input by an image input module;
an optical flow method motion vector obtaining module: carrying out motion estimation based on an optical flow method on the two down-sampled frames output by the down-sampling module to obtain an optical flow method motion vector;
a block matching motion vector obtaining module: carrying out motion estimation based on block matching on the image data of the front frame and the back frame input by the image input module to obtain a block matching motion vector;
a feature motion vector obtaining module: SIFT feature extraction is carried out on the front frame image data and the rear frame image data input by the image input module, the feature vector of each feature is calculated, and feature matching is carried out to obtain the corresponding feature motion vector;
a motion vector field fusion module: fusing the block matching motion vector, the characteristic motion vector and the optical flow method motion vector obtained by each module to obtain a fused motion vector field;
a vector transfer module: and correcting the fused motion vector field obtained by the motion vector field fusion module by adopting a reliable motion vector transmission mode.
Compared with the prior art, the invention has the following beneficial effects:
the invention adopts a fusion of multiple sources of information, considering in particular that feature matching represents large motion vectors and complex motion well while block matching estimates small objects better; its motion estimation performance is therefore better than that of the application published as CN104915966A.
Furthermore:
1. the optical flow method ensures that the resulting optical flow motion vector field is essentially smooth;
2. block-matching motion estimation has a clear effect on preserving the accuracy of small-object motion vectors;
3. the feature motion vectors obtained by feature matching estimate large and complex motion vectors well, which noticeably improves frame rate up-conversion performance in complex scenes;
4. the fusion step fully exploits the strengths of each motion estimation method: the optimal motion vector at the current position is selected by comparing the pixel value differences of the candidate motion vectors at that position, which improves the reliability of the whole vector field;
5. motion vector transfer selects the optimal vector at the current position from the surrounding vectors, so correct vectors are propagated and the smoothness of the whole motion vector field is also preserved;
6. motion vector transfer is performed twice, in different scanning directions, so correct vectors are propagated thoroughly and omissions are avoided.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of a method of one embodiment of the present invention;
FIG. 2 is a diagram illustrating motion vector transfer according to an embodiment of the present invention;
fig. 3 is a block diagram of a system structure according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that persons skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
As shown in fig. 1, a flowchart of an embodiment of a method for motion estimation by frame rate up-conversion with multi-information fusion according to the present invention is shown, and specifically, the embodiment includes the following steps:
1 Read two consecutive frames of a video sequence and store them as f1 and f2;
2 image downsampling
For high-definition video (resolution 1920x1080), the frames are down-sampled horizontally and vertically to reduce the subsequent computational complexity: one point is kept out of every four in the horizontal direction and one out of every four in the vertical direction. The down-sampled images f1_s and f2_s obtained in this way each have a resolution of 480x270. For other resolutions, a suitable down-sampling factor may be chosen by weighing motion estimation performance against computational capacity.
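For illustration, a minimal sketch of this point-sampling step, assuming the frames are numpy arrays in (height, width[, channels]) layout:

```python
def downsample_by_4(frame):
    """Keep one pixel out of every four horizontally and vertically,
    so a 1920x1080 frame becomes 480x270 (point sampling, no filtering)."""
    return frame[::4, ::4]

# f1_s, f2_s = downsample_by_4(f1), downsample_by_4(f2)
```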
3 Perform motion estimation with the optical flow method to obtain the corresponding optical flow vector field.
The optical flow motion estimation can be implemented with the prior art, for example the method of the paper "High Accuracy Optical Flow Estimation Based on a Theory for Warping" published by Brox et al. at ECCV. The inputs are the down-sampled images f1_s and f2_s, and the result is the forward motion vector field vo pointing from f1_s to f2_s.
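The following sketch only illustrates where this step sits in the pipeline. It substitutes OpenCV's Farneback dense optical flow for the Brox et al. variational method used in the embodiment (a different algorithm, chosen here purely because it is readily available); the BGR input assumption and the parameter values are illustrative.

```python
import cv2

def forward_flow(f1_s, f2_s):
    """Dense forward motion field vo pointing from f1_s to f2_s on the
    down-sampled frames; returns an H x W x 2 array of (vx, vy) per pixel."""
    g1 = cv2.cvtColor(f1_s, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(f2_s, cv2.COLOR_BGR2GRAY)
    # args: prev, next, flow, pyr_scale, levels, winsize, iterations,
    #       poly_n, poly_sigma, flags
    return cv2.calcOpticalFlowFarneback(g1, g2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
```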
and 4, carrying out block matching motion estimation to obtain a block matching motion vector field.
The block matching method adopts a three-dimensional recursive search method, and specifically comprises the following steps:
4.1 The inputs are the two consecutive frames f1 and f2;
4.2 Divide the whole frame into blocks of size NxN, where N is chosen to be consistent with the down-sampling factor of the image; in this embodiment N = 4;
4.3 Traverse each NxN block starting from the top-left corner, scanning from left to right and from top to bottom;
4.4 For the block at position (x, y), the candidate vectors come from vectors that have already been computed: the motion vector vleft of the left block (x-1, y), the motion vector vabove of the upper block (x, y-1), and the zero vector v0. The SAD value of each candidate is computed: for a candidate vt with horizontal component vtx and vertical component vty, SAD = |f1(x, y) - f2(x + vtx, y + vty)|, where f1(x, y) and f2(x + vtx, y + vty) denote the corresponding blocks of size NxN. The candidate with the minimum SAD is selected and denoted vc;
4.5 For the resulting vector vc with components vcx and vcy, the refined candidates are (vcx-1, vcy), (vcx+1, vcy), (vcx, vcy-1) and (vcx, vcy+1). Their SADs are computed in the same way, and the candidate with the minimum SAD becomes the final block matching motion vector;
4.6 scanning the whole frame image to obtain the block matching motion vector field vb.
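A simplified Python/numpy sketch of steps 4.1-4.6 follows, for illustration only. It assumes grayscale numpy frames whose dimensions are multiples of N; keeping vc itself among the refinement candidates (so refinement can never worsen the match) is a small addition beyond the literal text.

```python
import numpy as np

def block_sad(f1, f2, bx, by, v, N):
    """SAD between block (bx, by) of f1 and the block of f2 shifted by v."""
    y, x = by * N, bx * N
    vx, vy = int(v[0]), int(v[1])
    ry, rx = y + vy, x + vx
    if ry < 0 or rx < 0 or ry + N > f2.shape[0] or rx + N > f2.shape[1]:
        return np.inf
    cur = f1[y:y + N, x:x + N].astype(np.float32)
    ref = f2[ry:ry + N, rx:rx + N].astype(np.float32)
    return float(np.abs(cur - ref).sum())

def block_matching_3drs(f1, f2, N=4):
    """Simplified 3-D recursive search: spatial candidates (zero, left, above)
    followed by a +/-1 pixel refinement around the best candidate."""
    bh, bw = f1.shape[0] // N, f1.shape[1] // N
    vb = np.zeros((bh, bw, 2), dtype=np.int32)       # (vx, vy) per block
    for by in range(bh):
        for bx in range(bw):
            cands = [(0, 0)]
            if bx > 0:
                cands.append(tuple(vb[by, bx - 1]))  # vector of the left block
            if by > 0:
                cands.append(tuple(vb[by - 1, bx]))  # vector of the upper block
            vc = min(cands, key=lambda v: block_sad(f1, f2, bx, by, v, N))
            refine = [vc, (vc[0] - 1, vc[1]), (vc[0] + 1, vc[1]),
                      (vc[0], vc[1] - 1), (vc[0], vc[1] + 1)]
            vb[by, bx] = min(refine, key=lambda v: block_sad(f1, f2, bx, by, v, N))
    return vb
```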
5, feature extraction, feature matching and feature motion vector field calculation.
The adopted features are SIFT features, specifically:
5.1 The inputs are the two consecutive frames f1 and f2;
5.2 Extract feature points on f1 and f2, denoted k1 and k2;
5.3 Match k1 against k2 to obtain the corresponding feature matching motion vector field v;
5.4 Down-sample the vector field v in the same way as the images to obtain the vector field vf, so that its size matches the block matching and optical flow motion vector fields.
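For illustration, a sketch of steps 5.1-5.4 using OpenCV's SIFT implementation (available in OpenCV 4.4+). The Lowe ratio test, the NaN marking of positions without a feature, and keeping the vector magnitudes in original-resolution pixels are assumptions of the sketch; the text only states that the field is down-sampled.

```python
import cv2
import numpy as np

def sift_motion_field(f1, f2, shape_ds, scale=4, ratio=0.75):
    """Sparse feature motion field vf on the down-sampled grid shape_ds
    (rows, cols); positions without a matched keypoint are left as NaN."""
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(cv2.cvtColor(f1, cv2.COLOR_BGR2GRAY), None)
    k2, d2 = sift.detectAndCompute(cv2.cvtColor(f2, cv2.COLOR_BGR2GRAY), None)
    vf = np.full(shape_ds + (2,), np.nan, dtype=np.float32)
    if d1 is None or d2 is None:
        return vf
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    for pair in matcher.knnMatch(d1, d2, k=2):
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio * n.distance:            # Lowe ratio test
            x1, y1 = k1[m.queryIdx].pt
            x2, y2 = k2[m.trainIdx].pt
            px, py = int(x1 / scale), int(y1 / scale)  # down-sampled position
            if 0 <= py < shape_ds[0] and 0 <= px < shape_ds[1]:
                vf[py, px] = (x2 - x1, y2 - y1)        # motion in full-res pixels
    return vf
```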
6 Fuse the feature motion vectors, the block matching motion vectors and the optical flow motion vectors. Specifically:
6.1 reading an optical flow vector field vo, a block matching vector field vb and a feature matching vector field vf with the size of 480x270, and establishing a fused vector field vm with the same size;
6.2 scanning block by block from left to right and from top to bottom in the scanning sequence;
6.3 Let the current scanning position be p. Since the feature matching motion vectors are sparse, first determine whether the feature matching vector field vf contains a motion vector at the current position. If not, execute step 6.4; otherwise execute step 6.5;
6.4 Compute the absolute value errors of the vectors of the block matching motion vector field vb and the optical flow motion vector field vo at the current position. The minimum sum of absolute differences is used as the decision criterion in this embodiment, but those skilled in the art will readily understand that other criteria, such as the minimum mean square error, may be chosen. For a motion vector v, the absolute value error is sad = |f1(p) - f2(p + v)|. The vector with the minimum absolute value error is the optimal motion vector at the current position and is stored at the corresponding position in vm;
6.5 Compute the absolute value errors of the vectors of the block matching motion vector field vb, the optical flow motion vector field vo and the feature matching field vf at the current position, with the absolute value error defined as in 6.4. The motion vector with the minimum absolute value error is the optimal one and is stored at the corresponding position in vm;
6.6 scanning all positions to obtain a fused motion vector field vm.
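For illustration, a minimal Python/numpy sketch of the fusion scan 6.1-6.6, under the same assumptions as the earlier sketches: vo, vb and vf are 270x480x2 fields on the down-sampled grid (vf marked with NaN where no feature exists), the pixel value difference is evaluated on the down-sampled frames, and the units of the candidate vectors are assumed to be consistent across the three fields.

```python
import numpy as np

def fuse_fields(vo, vb, vf, f1_s, f2_s):
    """Per-position fusion: the candidate motion vector with the smallest
    absolute pixel value difference between f1_s(p) and f2_s(p + v) wins."""
    def sad(px, py, v):
        qx, qy = px + int(round(v[0])), py + int(round(v[1]))
        H, W = f1_s.shape[:2]
        if not (0 <= qx < W and 0 <= qy < H):
            return np.inf
        d = f1_s[py, px].astype(np.float32) - f2_s[qy, qx].astype(np.float32)
        return float(np.abs(d).sum())

    H, W = vo.shape[:2]
    vm = np.zeros_like(vo)
    for py in range(H):
        for px in range(W):
            cands = [vo[py, px], vb[py, px]]
            if not np.isnan(vf[py, px, 0]):            # a feature vector exists here
                cands.append(vf[py, px])
            vm[py, px] = min(cands, key=lambda v: sad(px, py, v))
    return vm
```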
7 motion vector transfer.
The purpose of motion vector transfer is to propagate correct motion vectors, improve the accuracy of the motion vector field and ensure the smoothness of the whole vector field. The transfer is performed in two passes with different scanning directions, as shown in fig. 2.
7.1 scanning the fusion vector field vm from left to right and from top to bottom in sequence;
7.2 For the current position p, with horizontal and vertical coordinates px and py, the corresponding motion vector is v0; the vector v1 is at the left position (px-1, py), v2 at the upper position (px, py-1) and v3 at the upper-left position (px-1, py-1). v1, v2 and v3 have already been processed in the current scanning order;
7.3 Compute the pixel value differences corresponding to v0, v1, v2 and v3; absolute value errors or other similar criteria may be used. For a vector v with horizontal displacement vx and vertical displacement vy, the corresponding positions on the original-resolution images are computed, and, to increase robustness, the pixel value difference is evaluated over blocks of the previous and the following frame along the motion trajectory. With down-sampled coordinates px and py and an expansion margin N = 2, the horizontal coordinate range of the corresponding block at the original resolution is h = [4*px - N, 4*(px + 1) + N - 1] and the vertical range is v = [4*py - N, 4*(py + 1) + N - 1]. In this embodiment the pixel value difference is the absolute value error sad = |f1(h, v) - f2(h + vx, v + vy)|. The vector with the lowest sad among v0, v1, v2 and v3 is selected as the current optimal motion vector and the fused vector field vm is updated;
7.4 scanning all the positions of the fused vector field vm;
7.5 scanning the updated motion vector field vm from bottom to top and from right to left in sequence;
7.6 For the current position p, with horizontal and vertical coordinates px and py, the corresponding motion vector is v0; the vector v1 is at the right position (px+1, py), v2 at the lower position (px, py+1) and v3 at the lower-right position (px+1, py+1). v1, v2 and v3 have already been processed in the current scanning order;
7.7 As in 7.3, compute the pixel value differences corresponding to v0, v1, v2 and v3, and select the motion vector with the minimum pixel value difference as the optimal motion vector of the fused vector field vm at the current position;
7.8 Continue until all positions of the fused vector field vm have been processed.
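A sketch of the two-pass transfer 7.1-7.8 follows, for illustration. It assumes (as the coordinate ranges in 7.3 suggest) that the fused vectors are expressed in original-resolution pixels, that f1 and f2 are the full-resolution frames as numpy arrays, and that positions whose expanded block leaves the frame simply keep their current vector.

```python
import numpy as np

def block_cost(f1, f2, px, py, v, scale=4, N=2):
    """Absolute pixel value difference, at original resolution, between the
    expanded block around down-sampled position (px, py) in f1 and the block
    displaced by v in f2 (step 7.3)."""
    vx, vy = int(round(v[0])), int(round(v[1]))
    x0, x1 = scale * px - N, scale * (px + 1) + N       # horizontal range h
    y0, y1 = scale * py - N, scale * (py + 1) + N       # vertical range
    H, W = f1.shape[:2]
    if (x0 < 0 or y0 < 0 or x1 > W or y1 > H or
            x0 + vx < 0 or y0 + vy < 0 or x1 + vx > W or y1 + vy > H):
        return np.inf
    cur = f1[y0:y1, x0:x1].astype(np.float32)
    ref = f2[y0 + vy:y1 + vy, x0 + vx:x1 + vx].astype(np.float32)
    return float(np.abs(cur - ref).sum())

def propagate(vm, f1, f2, scale=4, N=2):
    """Two-pass motion vector transfer: a forward pass (left / top / top-left
    neighbours) followed by a backward pass (right / bottom / bottom-right);
    at each position the candidate with the lowest block cost is kept."""
    H, W = vm.shape[:2]
    passes = (
        (range(H), range(W), [(0, 0), (-1, 0), (0, -1), (-1, -1)]),        # forward
        (range(H - 1, -1, -1), range(W - 1, -1, -1),
         [(0, 0), (1, 0), (0, 1), (1, 1)]),                                # backward
    )
    for ys, xs, offs in passes:
        for py in ys:
            for px in xs:
                cands = [vm[py + dy, px + dx] for dx, dy in offs
                         if 0 <= px + dx < W and 0 <= py + dy < H]
                vm[py, px] = min(
                    cands, key=lambda v: block_cost(f1, f2, px, py, v, scale, N))
    return vm
```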
As shown in fig. 3, based on the above method description, there is provided a multi-information fusion frame rate up-conversion motion estimation system implementing the above method, including:
an image input module: reading two continuous frames of images from a high-definition video stream;
a down-sampling module: down-sampling image data input by an image input module;
an optical flow method motion vector obtaining module: carrying out motion estimation based on an optical flow method on the two down-sampled frames output by the down-sampling module to obtain an optical flow method motion vector;
a block matching motion vector obtaining module: carrying out motion estimation based on block matching on the image data of the front frame and the back frame input by the image input module to obtain a block matching motion vector;
a feature motion vector obtaining module: SIFT feature extraction is carried out on the front frame image data and the rear frame image data input by the image input module, the feature vector of each feature is calculated, and feature matching is carried out to obtain the corresponding feature motion vector;
a motion vector field fusion module: fusing the block matching motion vector, the characteristic motion vector and the optical flow method motion vector obtained by each module to obtain a fused motion vector field;
a vector transfer module: and correcting the fused motion vector field obtained by the motion vector field fusion module by adopting a reliable motion vector transmission mode.
The specific implementation techniques of the above modules are the same as those in the steps of the multi-information fusion frame rate up-conversion motion estimation method, and are not described herein again.
The invention adopts a multi-information fusion mode to carry out motion estimation of frame rate up-conversion, and utilizes the advantages of various information, thereby improving the accuracy of a motion vector field. The method has remarkable effects on ensuring the smoothness of the motion vector field and improving the estimation of the motion vector of the small object and the motion estimation of a complex scene.
While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims (5)

1. A multi-information fusion frame rate up-conversion motion estimation method is characterized by comprising the following steps:
the method comprises the following steps: reading two continuous frames of images from a high-definition video stream;
step two: down-sampling the image data of the first step;
step three: performing motion estimation based on an optical flow method on the continuous two-frame data in the step two to obtain an optical flow method motion vector;
step four: carrying out motion estimation based on block matching on the continuous two frames of image data in the step one to obtain a block matching motion vector;
step five: SIFT feature extraction is carried out on the continuous two frames of image data in the step one, the feature vector of each feature is calculated, and feature matching is carried out to obtain a corresponding feature motion vector;
step six: fusing the block matching motion vector obtained in the fourth step, the characteristic motion vector obtained in the fifth step and the optical flow method motion vector obtained in the third step to obtain a fused motion vector field;
step seven: correcting the fused motion vector field obtained in the sixth step in a motion vector transmission mode;
step eight: returning to the step one, reading the next two frames of images;
in the sixth step, the feature motion vector, the block matching motion vector and the optical flow method motion vector are fused, and the steps are as follows:
6.1 reading an optical flow method motion vector field vo, a block matching motion vector field vb and a feature matching vector field vf of the same size, and establishing a fused vector field vm of the same size;
6.2 scanning block by block from left to right and from top to bottom in the scanning sequence;
6.3, judging whether a corresponding motion vector exists in the feature matching vector field vf at the current position, if not, executing the step 6.4, otherwise, executing the step 6.5;
6.4, calculating the absolute value errors of the vectors of the block matching motion vector and the optical flow method motion vector at the current position; assuming the motion vector is v, the absolute value error is sad = |f(p) - f(p + v)|; the vector with the minimum absolute value error is the optimal motion vector at the current position and is stored at the corresponding position in vm;
6.5 calculating absolute value errors corresponding to the vectors of the block matching motion vector, the optical flow method motion vector and the feature matching vector at the current position, wherein the absolute value errors are the same as the definition in 6.4, and the motion vector corresponding to the minimum absolute value error is an optimal motion vector and is stored in the corresponding position in vm;
6.6 scanning all the positions to obtain the final fused motion vector field vm.
2. The method of claim 1, wherein in step two, the image is down-sampled by taking one point out of every 4 points horizontally and vertically.
3. The method of claim 1, wherein in step four, the block matching algorithm uses a 3-dimensional recursive search.
4. The method according to any one of claims 1 to 3, wherein in step seven, the motion vectors are transferred by the following steps:
s701: scanning the fused vector field vm from left to right and from top to bottom;
s702: for the current position (px, py), where px is the horizontal coordinate, py is the vertical coordinate and vm(px, py) is the vector at the current position, obtaining in turn the left vector vm(px-1, py), the upper vector vm(px, py-1) and the upper-left vector vm(px-1, py-1);
s703: calculating the pixel value differences between the previous and the following frame for the four vectors of S702, and selecting the vector with the minimum pixel value difference to update the vector of the fused vector field vm at the current position;
s704: if the whole frame image has not been scanned, return to S701; otherwise proceed to S705;
s705: scanning the fused vector field vm from right to left and from bottom to top;
s706: for the current position (px, py), where px is the horizontal coordinate, py is the vertical coordinate and vm(px, py) is the vector at the current position, obtaining in turn the right vector vm(px+1, py), the lower vector vm(px, py+1) and the lower-right vector vm(px+1, py+1);
s707: calculating the pixel value differences between the previous and the following frame for the four vectors of S706, and selecting the vector with the minimum pixel value difference to update the vector of vm at the current position;
s708: if the whole frame image has not been scanned, return to S705; otherwise exit.
5. A multi-information fusion frame rate up-conversion motion estimation system implementing the method of any one of claims 1 to 4, characterized by comprising:
an image input module: reading two continuous frames of images from a high-definition video stream;
a down-sampling module: down-sampling image data input by an image input module;
an optical flow method motion vector obtaining module: carrying out motion estimation based on an optical flow method on the two down-sampled frames output by the down-sampling module to obtain an optical flow method motion vector;
a block matching motion vector obtaining module: carrying out motion estimation based on block matching on the image data of the front frame and the back frame input by the image input module to obtain a block matching motion vector;
a feature motion vector obtaining module: SIFT feature extraction is carried out on the front frame image data and the rear frame image data input by the image input module, the feature vector of each feature is calculated, and feature matching is carried out to obtain the corresponding feature motion vector;
a motion vector field fusion module: fusing the block matching motion vector, the characteristic motion vector and the optical flow method motion vector obtained by each module to obtain a fused motion vector field;
a vector transfer module: and correcting the fused motion vector field obtained by the motion vector field fusion module by adopting a reliable motion vector transmission mode.
CN201610657029.6A 2016-08-11 2016-08-11 Multi-information fusion frame rate up-conversion motion estimation method and system Active CN106210449B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610657029.6A CN106210449B (en) 2016-08-11 2016-08-11 Multi-information fusion frame rate up-conversion motion estimation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610657029.6A CN106210449B (en) 2016-08-11 2016-08-11 Multi-information fusion frame rate up-conversion motion estimation method and system

Publications (2)

Publication Number Publication Date
CN106210449A CN106210449A (en) 2016-12-07
CN106210449B true CN106210449B (en) 2020-01-07

Family

ID=57514586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610657029.6A Active CN106210449B (en) 2016-08-11 2016-08-11 Multi-information fusion frame rate up-conversion motion estimation method and system

Country Status (1)

Country Link
CN (1) CN106210449B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108574844B (en) * 2017-03-13 2021-09-28 信阳师范学院 Multi-strategy video frame rate improving method for space-time significant perception
CN107316275A (en) * 2017-06-08 2017-11-03 宁波永新光学股份有限公司 A kind of large scale Microscopic Image Mosaicing algorithm of light stream auxiliary
CN110337810B (en) * 2018-04-02 2022-01-14 北京大学 Method and apparatus for video processing
CN110555805B (en) * 2018-05-31 2022-05-31 杭州海康威视数字技术股份有限公司 Image processing method, device, equipment and storage medium
CN108833920B (en) * 2018-06-04 2022-02-11 四川大学 DVC side information fusion method based on optical flow and block matching
CN111277863B (en) * 2018-12-05 2022-06-14 阿里巴巴集团控股有限公司 Optical flow frame interpolation method and device
CN111741304A (en) * 2019-03-25 2020-10-02 四川大学 Method for combining frame rate up-conversion and HEVC (high efficiency video coding) based on motion vector refinement
CN110446107B (en) * 2019-08-15 2020-06-23 电子科技大学 Video frame rate up-conversion method suitable for scaling motion and brightness change
CN111405316A (en) * 2020-03-12 2020-07-10 北京奇艺世纪科技有限公司 Frame insertion method, electronic device and readable storage medium
CN112511859B (en) * 2020-11-12 2023-03-24 Oppo广东移动通信有限公司 Video processing method, device and storage medium
CN112954454B (en) * 2021-02-08 2023-09-05 北京奇艺世纪科技有限公司 Video frame generation method and device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325108A (en) * 2013-05-27 2013-09-25 浙江大学 Method for designing monocular vision odometer with light stream method and feature point matching method integrated
CN105023278A (en) * 2015-07-01 2015-11-04 中国矿业大学 Movable target tracking method and system based on optical flow approach
CN105488812A (en) * 2015-11-24 2016-04-13 江南大学 Motion-feature-fused space-time significance detection method
CN105590327A (en) * 2014-10-24 2016-05-18 华为技术有限公司 Motion estimation method and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325108A (en) * 2013-05-27 2013-09-25 浙江大学 Method for designing monocular vision odometer with light stream method and feature point matching method integrated
CN105590327A (en) * 2014-10-24 2016-05-18 华为技术有限公司 Motion estimation method and apparatus
CN105023278A (en) * 2015-07-01 2015-11-04 中国矿业大学 Movable target tracking method and system based on optical flow approach
CN105488812A (en) * 2015-11-24 2016-04-13 江南大学 Motion-feature-fused space-time significance detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on optical flow target tracking technology based on scale-invariant features; Wu Yin, Li Liangfu, et al.; Computer Engineering and Applications; 2013-05-10; pp. 157-161 *

Also Published As

Publication number Publication date
CN106210449A (en) 2016-12-07

Similar Documents

Publication Publication Date Title
CN106210449B (en) Multi-information fusion frame rate up-conversion motion estimation method and system
CN109387204B (en) Mobile robot synchronous positioning and composition method facing indoor dynamic environment
JP7177062B2 (en) Depth Prediction from Image Data Using Statistical Model
Giachetti Matching techniques to compute image motion
CN107481279B (en) Monocular video depth map calculation method
Seyid et al. FPGA-based hardware implementation of real-time optical flow calculation
KR102265315B1 (en) Method and apparatus for estimating image optical flow
JP2003533800A (en) Motion estimator to reduce halo in MC upconversion
CN106408596B (en) Sectional perspective matching process based on edge
WO2018053952A1 (en) Video image depth extraction method based on scene sample library
Liu et al. A hybrid approach for near-range video stabilization
CN110516639B (en) Real-time figure three-dimensional position calculation method based on video stream natural scene
KR100987412B1 (en) Multi-Frame Combined Video Object Matting System and Method Thereof
CN107194948B (en) Video significance detection method based on integrated prediction and time-space domain propagation
WO2019157922A1 (en) Image processing method and device and ar apparatus
CN108986150A (en) A kind of image light stream estimation method and system based on non-rigid dense matching
Cho et al. Temporal incoherence-free video retargeting using foreground aware extrapolation
CN102724530B (en) Three-dimensional method for plane videos based on feedback control
Xiao et al. Progressive motion boosting for video frame interpolation
CN110009683B (en) Real-time on-plane object detection method based on MaskRCNN
Patras et al. Probabilistic confidence measures for block matching motion estimation
CN103618904B (en) Motion estimation method and device based on pixels
CN111292357B (en) Video inter-frame rapid motion estimation method based on correlation filtering
CN110827324B (en) Video target tracking method
CN110059651B (en) Real-time tracking and registering method for camera

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant