Disclosure of Invention
To solve the problems in the prior art, an embodiment of the present invention provides a motion prediction method based on shape matching, including:
generating a shape description of the template image;
generating a decision value description of the target image;
and scanning the decision value description of the target image with the shape description of the template image, and selecting the direction having the maximum sum in the shape matching results as the target motion prediction range.
In one embodiment of the invention, generating the shape description of the template image comprises:
generating a Hessian matrix for each pixel point of the template image;
generating a decision value for each pixel based on the Hessian matrix of that pixel of the template image;
and sampling and marking the decision values of the template image at a certain step length, the sampling points forming the shape description of the whole template image.
In one embodiment of the invention, the template image is Gaussian filtered before the Hessian matrices are generated.
In one embodiment of the present invention, the decision value of each pixel is the determinant of the Hessian matrix, an eigenvalue of the Hessian matrix, or the discriminant of the Hessian matrix.
In one embodiment of the present invention, sampling and marking the decision values of the template image at a certain step length, and forming the shape description of the whole template image from the sampling points, comprises:
storing the information of a sampling point when the absolute value of the sampling point is greater than a threshold value, wherein a set of adjacent points with the same sign forms the shape description.
In one embodiment of the present invention, the Hessian decision values of the target image are calculated in the same manner as the Hessian decision values of the template image.
In one embodiment of the present invention, scanning the decision value description of the target image using the shape description of the template image includes:
determining an initial prediction center point in the decision value description of the target image;
and calculating, at the four positions offset by T in four directions from that center point, the convolution sum of the shape description of the template image with the decision values of the target image, wherein the direction with the maximum convolution sum is the predicted motion range.
In one embodiment of the present invention, scanning the decision value description of the target image using the shape description of the template image further comprises:
taking the position of the maximum convolution sum determined by the previous prediction as the current prediction center point, and calculating the convolution sum of the shape description of the template image with the decision values of the target image at the four positions offset by T in four directions from the current prediction center point;
judging whether the maximum convolution sum of the current calculation is larger than the maximum convolution sum of the previous prediction;
if the maximum convolution sum of the current calculation is larger than the maximum convolution sum of the previous prediction, treating the current prediction as the previous prediction and repeating the above steps: taking the position of the maximum convolution sum determined by the previous prediction as the current prediction center point, calculating the convolution sums at the four positions offset by T in four directions, and judging whether the new maximum convolution sum is larger than that of the previous prediction;
and when the maximum convolution sum of the current calculation is smaller than the maximum convolution sum of the previous prediction, the position of the previous prediction's maximum convolution sum is the final position of the image prediction.
In one embodiment of the invention, the offset T is less than 100 pixels.
The motion prediction method based on shape matching can realize large-range prediction at low computational cost.
Detailed Description
In the following description, the invention is described with reference to various embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details, or with other alternative and/or additional methods, materials, or components. In other instances, well-known structures, materials, or operations are not shown or described in detail to avoid obscuring aspects of embodiments of the invention. Similarly, for purposes of explanation, specific numbers, materials and configurations are set forth in order to provide a thorough understanding of the embodiments of the invention. However, the invention may be practiced without specific details. Further, it should be understood that the embodiments shown in the figures are illustrative representations and are not necessarily drawn to scale.
Reference in the specification to "one embodiment" or "the embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
The invention provides a brand-new method for predicting a high-frequency image over a large range, realizing motion prediction through shape matching. A typical tracking algorithm uses the Hessian matrix to select maxima or minima as feature points for predictive tracking. In the present method, by contrast, the Hessian matrix determinant is sampled at a certain step length to generate a shape, the shape is then matched with a shape matching method, and finally the direction of movement with the maximum value is selected as the prediction range, so that large-range prediction is realized at low computational cost.
The invention processes images whose content, such as text, is mainly carried by high frequencies, using the Hessian matrix to achieve the effect of removing high frequencies, and then performs shape matching and prediction. This realizes large-range prediction of high-frequency images and remedies the weaknesses of existing matching and tracking methods with respect to high-frequency interference.
The high frequency information may include text. It should be noted that, in the present invention, the frequency of an image is an index of the intensity of gray-scale change in the image, i.e. the gradient of the gray scale in plane space; a high-frequency image is thus an image characterized by sharp gray-scale changes, large gray-scale gradients in plane space, and/or sharp edges. For example, depending on the application scene, the high-frequency information may be text, a high-speed moving scene, a high-speed moving object, or the like.
Fig. 1 illustrates a flowchart of an image motion prediction method according to an embodiment of the present invention.
First, in step 110, a shape description of the template image is generated. In an embodiment of the present invention, the shape description of the template image may be generated by a Hessian matrix.
The Hessian matrix is a square matrix formed by the second-order partial derivatives of a multivariate function and describes the local curvature of the function. For each pixel point (x, y) of an image I, the Hessian matrix is as follows:

    H(x, y) = | Ixx(x, y)  Ixy(x, y) |
              | Ixy(x, y)  Iyy(x, y) |

where Ixx, Iyy and Ixy are the second-order partial derivatives of the image intensity at the pixel.
Before constructing the Hessian matrix, Gaussian filtering can be carried out on the image to remove pixel mutations caused by noise. With L = G(σ) * I denoting the image convolved with a Gaussian of scale σ, the filtered Hessian matrix can be expressed as follows:

    H(x, σ) = | Lxx(x, σ)  Lxy(x, σ) |
              | Lxy(x, σ)  Lyy(x, σ) |
A decision value is generated for each pixel based on the Hessian matrix of that pixel of the template image; the decision value can be the determinant of the Hessian matrix, an eigenvalue of the Hessian matrix, the discriminant of the Hessian matrix, or the like.
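For illustration only, the decision-value computation described above can be sketched in Python as follows. The function name, the choice of the determinant as the decision value, the Gaussian scale `sigma`, and the use of `np.gradient` for the second derivatives are illustrative assumptions, not part of the claimed method.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hessian_decision_values(image, sigma=1.5):
    """Per-pixel Hessian determinant of a grayscale image (a sketch)."""
    # Gaussian filtering first removes pixel mutations caused by noise.
    smoothed = gaussian_filter(image.astype(float), sigma)
    # Second-order partial derivatives via repeated finite differences.
    iy, ix = np.gradient(smoothed)
    iyy, iyx = np.gradient(iy)
    ixy, ixx = np.gradient(ix)
    # Decision value: determinant of the 2x2 Hessian at each pixel.
    return ixx * iyy - ixy * iyx
```

A uniform (flat) image yields a zero determinant everywhere, while edges and corners of high-frequency content such as text produce strong positive or negative responses.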
The decision values of the template image are then sampled and marked at a certain step length, and the sampling points form the shape description of the whole template image.
When the absolute value of a sampling point of the template image is greater than the threshold value, the point's value and coordinate information are stored. Because the values are divided into positive and negative, the shape description is formed by sets of adjacent points with the same sign, i.e. it is formed according to light and dark information. FIG. 2 illustrates an exemplary diagram of a shape description of a template image according to one embodiment of the invention. As shown in fig. 2, the template map 210 has the same size as the original template image; when the absolute value of a sampling point is greater than the threshold, the information of that sampling point is stored, and a set of adjacent points with the same sign forms the shape description 220. The values of the remaining points in the template map 210 are discarded; in other words, they may be regarded as 0.
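The sampling and thresholding step above can be sketched as follows; the step length, threshold value, and the return format of signed point sets are illustrative assumptions.

```python
import numpy as np

def shape_description(decision_values, step=4, threshold=0.01):
    """Sample decision values on a step-length grid, keeping only
    points whose absolute value exceeds the threshold (a sketch).

    Returns lists of (row, col, value) triples for the positive- and
    negative-signed sample points; all remaining points are treated
    as zero, as described in the text.
    """
    positive, negative = [], []
    h, w = decision_values.shape
    for r in range(0, h, step):
        for c in range(0, w, step):
            v = decision_values[r, c]
            if abs(v) > threshold:  # store only strong samples
                (positive if v > 0 else negative).append((r, c, v))
    return positive, negative
```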
In step 120, a Hessian decision value is generated for each pixel of the target image, and these per-pixel decision values constitute the decision value description of the target image. In an embodiment of the invention, the Hessian decision values of the target image are calculated in the same manner as those of the template image, and may likewise be the determinant of the Hessian matrix of each pixel of the target image, an eigenvalue of the Hessian matrix, the discriminant of the Hessian matrix, or the like.
In step 130, the shape description of the template image is used to scan the decision value description of the target image, and the direction with the maximum sum in the shape matching results is selected as the target motion prediction range.
In an embodiment of the invention, to simplify the scanning calculation, a certain point in the decision value description of the target image is taken as a center point; the center point of the initial prediction may be the center of the decision value description of the target image. The center of the template image is placed in turn at the four positions offset by T in four directions from that point, the convolution sum of the template image's shape description with the target image's decision values is calculated in each of the four directions, the direction with the maximum sum is taken as the predicted motion range, and the next prediction is carried out from the current prediction range. If prediction continues, the current prediction becomes the previous prediction; when the maximum sum of the current calculation is smaller than the sum of the previous calculation, the previous position is the final position of the image prediction for that frame.
The convolution calculation can be regarded as a weighted summation process: each pixel in an image region is multiplied by the corresponding element of the convolution kernel (i.e., the weight matrix), and the sum of all the products is used as the new value of the pixel at the center of the region.
Take the convolution of a 3 × 3 pixel region R with a convolution kernel G:

        | R1 R2 R3 |          | G1 G2 G3 |
    R = | R4 R5 R6 |      G = | G4 G5 G6 |
        | R7 R8 R9 |          | G7 G8 G9 |

convolution sum = R1G1 + R2G2 + R3G3 + R4G4 + R5G5 + R6G6 + R7G7 + R8G8 + R9G9.
In the convolution sum calculation between the shape description of the template image and the decision values of the target image, the shape description of the template image can be regarded as a sparse convolution kernel: only the products of the valued sampling points in the shape description and the decision values at the corresponding coordinate positions of the target image are summed.
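A minimal sketch of this sparse convolution sum, assuming the sample points are stored as (row, column, value) triples relative to the template's top-left corner; the function and argument names are illustrative assumptions.

```python
def sparse_conv_sum(target, samples, center, templ_shape):
    """Convolution sum of a sparse shape description with the target
    image's decision values, with the template centered at `center`.

    Only the stored sampling points contribute; all other template
    positions are treated as zero, as described in the text.
    """
    th, tw = templ_shape
    top, left = center[0] - th // 2, center[1] - tw // 2
    total = 0.0
    for r, c, v in samples:
        tr, tc = top + r, left + c
        # Skip sample points that fall outside the target image.
        if 0 <= tr < len(target) and 0 <= tc < len(target[0]):
            total += v * target[tr][tc]
    return total
```

Because only the stored sample points are visited, the cost per position is proportional to the number of sample points rather than to the full template area.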
In the process of predicting one frame of image, the offset T is small (for example, less than 100 pixels), so the maximum sum generally cannot be reached in a single prediction; the maximum sum for the frame is therefore obtained through multiple iterative predictions.
In other embodiments of the invention, different scanning strategies may be employed. For example, the initial point of the scanning calculation may be determined first, followed by a pixel-by-pixel scan. Alternatively, a larger scan offset may be used first to locate a coarse range, after which the scan range is gradually reduced.
Fig. 3 shows a schematic diagram of a process for object motion prediction according to an embodiment of the invention. In fig. 3, point P1 in the decision value description of the target image is the center point of the initial prediction, and P2 is the initial prediction range and the next prediction center point. Part A of fig. 3 shows the overlapping portion of the target image's decision value description and the template image. Part B of fig. 3 shows a schematic diagram of the offset of the first prediction step. The regions marked by dashed boxes are all template images (including their shape descriptions, not marked in the figure); the marks x1, x2, x3 and x4 denote the four directions centered at x0 with offset T, and are the center points of the corresponding template images. Part C of fig. 3 shows a schematic diagram of the offset of the second prediction step. The marks x5, x6, x7 and x8 denote the four directions centered at x3 with offset T, and are the center points of the corresponding template images.
As shown in fig. 3, with x0 of the target image as the center point, if the sum in the direction of x3 is the largest among the four directions x1, x2, x3 and x4, the direction of x3 is the predicted range of this calculation, and the next prediction continues by taking the position of x3 as the center and calculating the convolution sums in the four directions x5, x6, x7 and x8 with offset T. This calculation process is repeated until the maximum convolution sum of a calculation is smaller than the sum of the previous calculation, at which point the previous position is the predicted final position of the image for that frame.
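The iterative four-direction search described above and illustrated in fig. 3 can be sketched as follows; `score` stands in for the convolution sum evaluated at a given center point, and the function names and the handling of ties in the stopping rule are illustrative assumptions.

```python
def predict_motion(score, start, offset):
    """Hill-climbing search sketched from the text: from the current
    center, score the four positions offset by T in four directions;
    move to the best one while the score keeps increasing, and return
    the previous center once it stops improving.
    """
    center = start
    best = score(center)
    while True:
        r, c = center
        candidates = [(r - offset, c), (r + offset, c),
                      (r, c - offset), (r, c + offset)]
        nxt = max(candidates, key=score)
        nxt_score = score(nxt)
        if nxt_score <= best:  # no improvement: previous center is final
            return center
        center, best = nxt, nxt_score
```

With a unimodal score surface, the search climbs toward the peak in steps of T, evaluating only four positions per iteration instead of scanning the whole range.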
While various embodiments of the present invention have been described above, it should be understood that they have been presented by way of example only, and not limitation. It will be apparent to persons skilled in the relevant art that various combinations, modifications, and changes can be made thereto without departing from the spirit and scope of the invention. Thus, the breadth and scope of the present invention disclosed herein should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.