CN114943891A - Prediction frame matching method based on feature descriptors - Google Patents

Prediction frame matching method based on feature descriptors Download PDF

Info

Publication number
CN114943891A
CN114943891A CN202210417188.4A CN202210417188A CN114943891A CN 114943891 A CN114943891 A CN 114943891A CN 202210417188 A CN202210417188 A CN 202210417188A CN 114943891 A CN114943891 A CN 114943891A
Authority
CN
China
Prior art keywords
gradient
matching
descriptor
prediction
prediction frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210417188.4A
Other languages
Chinese (zh)
Inventor
邵巍
李帅
肖扬
王光泽
姚文龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao University of Science and Technology filed Critical Qingdao University of Science and Technology
Priority to CN202210417188.4A priority Critical patent/CN114943891A/en
Publication of CN114943891A publication Critical patent/CN114943891A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep-learning prediction frame matching method based on feature descriptors, relating to target-object recognition and matching in the field of image recognition. The method determines the support region of each prediction frame produced by a deep-learning recognition network, computes the gradients of that region to construct a scale-invariant feature descriptor, and uses the resulting feature vector to match the recognition results. Simulation results show that the method is robust to translation, rotation and scale transformations, which is of significance for the development of fields such as target tracking and visual navigation.

Description

Prediction frame matching method based on feature descriptors
Technical Field
The invention belongs to the technology of target-object recognition and matching in the field of image recognition, relates to image recognition and artificial-intelligence-based image feature matching, and particularly relates to a prediction frame matching method based on feature descriptors.
Background
With the development of artificial intelligence, deep learning networks are widely applied to image recognition, and artificial-intelligence technology based on deep learning is one of the development directions of future image recognition. Mining deep image features with a feature extraction network effectively improves the detection rate of target objects; however, a recognition network alone has difficulty completing the matching of target objects, and its robustness is poor under large-scale viewpoint changes.
In view of this, how to design an algorithm capable of matching the recognition results is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
The invention addresses the technical problem that existing recognition networks have difficulty completing target matching; it combines an artificial-intelligence recognition network with a feature-descriptor matching algorithm and provides a prediction frame matching algorithm based on feature descriptors.
The deep-learning prediction frame matching algorithm based on feature descriptors provided by the invention comprises selection of the prediction Frame Support Range (FSR), generation of the feature descriptors, and matching of the feature descriptors.

First, the image is down-sampled and Gaussian-blurred to construct a scale-space pyramid. The purpose of the scale-space pyramid is to simulate the multi-scale character of the image data; the prediction frame is described at every level of the pyramid, and the minimum circumscribed circle of the prediction frame is selected as the prediction frame support area, so that the pixel information contained in the sampling area does not change after the image is rotated. The prediction frame described in this way is therefore guaranteed to have scale invariance.

Then the main direction of the descriptor is determined from the pixel gradients of the support region. Taking the main direction as the starting point, the support region is divided evenly into eight sector regions (F_1, F_2, ..., F_8), each sector being a sub-region of the FSR. Each sub-region contributes one component vector: the pixels of the sub-region are accumulated, as gray-gradient projection values, into the eight directions from 0° to 315° of the image coordinates, giving 8 × 8 feature components, which are normalized with respect to the main direction. A Gaussian weighting is applied to the gradient of each pixel to reduce the influence of noise at peripheral points on the feature values, and a 64-dimensional vector descriptor is formed. The 64-dimensional vector is then brightness-normalized to reduce the impact caused by illumination variation.

Finally, target matching is completed by comparing the prediction-box descriptors of the two point sets, with descriptor similarity measured by the Euclidean distance. The prediction frames of the previous frame image are traversed in turn, and for each of them the corresponding prediction frame in the next frame image is sought. A Euclidean-distance threshold between descriptors is then set, and whether the prediction frames extracted from the two images are similar is judged against this threshold.
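By way of illustration only, a minimal Python sketch of the support-region selection described above follows; the (x1, y1, x2, y2) box format and the helper name support_region are assumptions made for the example and are not part of the disclosure.

```python
import numpy as np

def support_region(box):
    """Frame Support Range (FSR): the minimum circumscribed circle of a
    prediction box, i.e. the circle whose diameter is the box diagonal and
    whose center is the intersection of the diagonals."""
    x1, y1, x2, y2 = box                                  # assumed corner format
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0             # center of the box
    radius = np.hypot(x2 - x1, y2 - y1) / 2.0             # half of the diagonal
    return cx, cy, radius

# Example: a 40 x 30 prediction box
print(support_region((10, 20, 50, 50)))                   # (30.0, 35.0, 25.0)
```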
Compared with the prior art, the invention has the advantages and positive effects that:
the invention provides a prediction frame matching method based on a feature descriptor, and provides a solution to the problems that the matching of a target object is difficult to complete by an identification network and the robustness is not high under the large-scale visual angle transformation.
The method provided by the invention is invariant to scaling, translation, rotation and illumination changes of the image, and is independent of the size of the prediction frame.
Drawings
Various aspects of the invention will become more apparent to the reader upon reading the detailed description with reference to the accompanying drawings, in which:
fig. 1 shows an overall flow chart of the algorithm of the present invention.
Fig. 2 is a schematic view of the support area.
Fig. 3 shows the relationship between the number of support region partitions and the correct matching rate.
Fig. 4 is a schematic diagram of support area division.
FIG. 5 is a diagram illustrating the weight of the support region of the prediction box.
Fig. 6 is a feature descriptor gradient histogram.
Fig. 7 shows the matching result under scale transformation.
Fig. 8 shows the matching result under rotation transformation.
Fig. 9 shows the matching result under translation transformation.
Detailed Description
Fig. 1 shows a general flow chart of the algorithm of the present invention. The embodiments of the present invention will be described in further detail with reference to the drawings.
Referring to fig. 1, the support area is determined first: the image is down-sampled and Gaussian-blurred to construct a scale-space pyramid. The purpose of the scale-space pyramid is to simulate the multi-scale character of the image data and to describe the prediction frame at every level of the pyramid, so that the described prediction frame has scale invariance. As shown in fig. 2, the minimum circumscribed circle of the prediction box is taken as the prediction box support area: when the corresponding region undergoes a slight change of proportion, each fan-shaped sub-region retains more of its original pixels than a rectangular one would, while the area of the region stays the same. In the feature descriptor generation process, the information inside the prediction frame is first blurred to reduce the influence of noise on matching; to avoid destroying local information, Gaussian filtering is adopted instead of mean filtering. The Gaussian function of the two-dimensional normal distribution, which is also used later when weighting the pixels, is:
$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right) \qquad (1)$$
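As a sketch of how the scale-space pyramid and the Gaussian weighting of formula (1) might be realised (OpenCV/NumPy, with an assumed sigma value of 1.6; not part of the original disclosure):

```python
import cv2
import numpy as np

def build_scale_pyramid(img, levels=4, sigma=1.6):
    """Scale-space pyramid: Gaussian blur followed by 2x down-sampling per level."""
    pyramid = [cv2.GaussianBlur(img, (0, 0), sigma)]
    for _ in range(1, levels):
        prev = pyramid[-1]
        down = cv2.resize(prev, (prev.shape[1] // 2, prev.shape[0] // 2),
                          interpolation=cv2.INTER_AREA)
        pyramid.append(cv2.GaussianBlur(down, (0, 0), sigma))
    return pyramid

def gaussian_weight(dx, dy, sigma):
    """Two-dimensional Gaussian of formula (1), evaluated at an offset (dx, dy)
    from the support-region center; used later to weight pixel gradients."""
    return np.exp(-(dx ** 2 + dy ** 2) / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
```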
On each level of the pyramid, the gradient and direction distribution of the pixels inside the prediction frame support area are collected, and each prediction box is assigned a reference direction from the support-region pixel gradients. To achieve complete rotational invariance, when the gradient and direction of a point are calculated, an adaptive adjustment is made according to the position of that point. For a bivariate function f(x, y), the horizontal gradient g_h(x_0, y_0) and the vertical gradient g_v(x_0, y_0) at a point (x_0, y_0) can be expressed as:
$$g_h(x_0, y_0) = f(x_0 + 1,\, y_0) - f(x_0 - 1,\, y_0) \qquad (2)$$
$$g_v(x_0, y_0) = f(x_0,\, y_0 + 1) - f(x_0,\, y_0 - 1) \qquad (3)$$
At a pixel point (x_0, y_0) of the image, the gradient magnitude m(x, y) and direction α(x, y) are:
$$m(x, y) = \sqrt{g_h(x_0, y_0)^{2} + g_v(x_0, y_0)^{2}} \qquad (4)$$
$$\alpha(x, y) = \arctan\!\left(\frac{g_v(x_0, y_0)}{g_h(x_0, y_0)}\right) \qquad (5)$$
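A minimal NumPy sketch of formulas (2)-(5), computing per-pixel gradients, magnitudes and directions for a grayscale array (an assumption for the example; border pixels are simply left at zero):

```python
import numpy as np

def pixel_gradients(img):
    """Central-difference gradients g_h, g_v and the resulting magnitude
    m(x, y) and direction alpha(x, y) in degrees, in [0, 360)."""
    img = img.astype(np.float64)
    gh = np.zeros_like(img)
    gv = np.zeros_like(img)
    gh[:, 1:-1] = img[:, 2:] - img[:, :-2]      # f(x0+1, y0) - f(x0-1, y0)
    gv[1:-1, :] = img[2:, :] - img[:-2, :]      # f(x0, y0+1) - f(x0, y0-1)
    magnitude = np.hypot(gh, gv)                # formula (4)
    direction = np.degrees(np.arctan2(gv, gh)) % 360.0   # formula (5), full circle
    return magnitude, direction
```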
After the gradient calculation is completed for all pixel points, the gradient magnitudes are summed over the divided gradient direction intervals to construct a gradient histogram. If the division of gradient directions is too fine, the computation for constructing the descriptor increases and the matching becomes more sensitive to noise; conversely, if the division is too coarse, image detail features are lost and the matching effect deteriorates. Fig. 3 shows the relationship between the number of direction intervals and the matching accuracy. When the gradient directions are divided into one interval every 45 degrees, i.e. eight intervals in total, a good matching effect is obtained at a small computational cost. Fig. 4 is a schematic diagram of the support area division in the case of eight intervals.
For the ith directional interval, the gradient magnitude summation can be expressed as:
$$M_i = \sum_{\alpha(x, y) \in \Theta_i} w(x, y)\, m(x, y) \qquad (6)$$
where the summation runs over the support-region pixels whose gradient direction falls in the i-th interval Θ_i, and w(x, y) is a weight coefficient that governs the contribution of the pixel gradient magnitude to interval i; its calculation is given in formula (7). When the image rotates, the information at the edge of the sensing area changes accordingly, whereas pixels close to the central area remain relatively stable. A corresponding weight coefficient is therefore assigned to each pixel, according to its position, when the gradient magnitudes are summed.
$$w(x, y) = \frac{G(x - x_c,\, y - y_c)}{\sum_{(u, v) \in \mathrm{FSR}} G(u - x_c,\, v - y_c)} \qquad (7)$$

where (x_c, y_c) is the center of the support region.
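The weighted eight-bin histogram of formulas (6) and (7) might be sketched as follows; this is a simplified version in which the Gaussian weights are evaluated per pixel over the circular support region rather than per 4 × 4 grid cell, and the choice of sigma as half the radius is an assumption.

```python
import numpy as np

def gradient_histogram(magnitude, direction, cx, cy, radius, sigma=None):
    """Sum of Gaussian-weighted gradient magnitudes M_i in eight 45-degree
    direction intervals, restricted to the circular support region (FSR)."""
    if sigma is None:
        sigma = radius / 2.0                              # assumed value
    h, w = magnitude.shape
    ys, xs = np.mgrid[0:h, 0:w]
    dist2 = (xs - cx) ** 2 + (ys - cy) ** 2
    inside = dist2 <= radius ** 2                         # pixels of the FSR
    weight = np.exp(-dist2 / (2.0 * sigma ** 2))
    weight /= weight[inside].sum()                        # normalized Gaussian weights
    bins = (direction // 45).astype(int) % 8              # eight direction intervals
    hist = np.zeros(8)
    for i in range(8):
        sel = inside & (bins == i)
        hist[i] = np.sum(weight[sel] * magnitude[sel])    # M_i = sum of w * m
    return hist
```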
For a rectangular sensing region, the region is divided into a 4 × 4 grid, and the weight coefficient of each cell is obtained from the Gaussian function shown in formula (1); the result is shown in fig. 5. Further, combining the circumscribed-circle approximation of the sensing region from step one, the grid weight coefficients are adjusted according to the coverage relation between the circle and the cells of the rectangular grid; the circumscribed-circle support region is shown in fig. 2. The sum of the gradient magnitudes in each direction interval is then calculated with these grid weight coefficients, and the gradient histogram is constructed, in which the horizontal axis is the eight divided gradient direction intervals and the vertical axis is the sum of the gradient magnitudes in the corresponding interval. The direction interval with the largest sum of gradient magnitudes is selected as the main direction interval, and the descriptor is constructed by traversing all intervals counterclockwise starting from that interval. For the histogram shown in fig. 6, in which the k-th bin is the main direction bin, the corresponding descriptor can be expressed as:
$$D = (M_k,\, M_{k+1},\, \ldots,\, M_8,\, M_1,\, \ldots,\, M_{k-1}) \qquad (8)$$

When the image scale changes, the total number of pixels changes, and when the image brightness changes, the pixel values change as a whole, but the proportions between the sums of gradient magnitudes in the different direction intervals remain relatively stable. The descriptor D is therefore normalized. Let the maximum value in D be M_max and the minimum value be M_min; the normalized descriptor D' can then be expressed as:

$$D' = \frac{D - M_{\min}}{M_{\max} - M_{\min}} \qquad (9)$$
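A sketch of the descriptor assembly of formulas (8) and (9): the histogram is cyclically shifted so that the main-direction bin comes first (the counterclockwise traversal of the intervals), then min-max normalized. It follows the eight-bin histogram of the detailed description and is not part of the original disclosure.

```python
import numpy as np

def descriptor_from_histogram(hist):
    """Start the descriptor at the main-direction bin (largest sum), traverse
    the remaining bins cyclically, then apply min-max normalization."""
    k = int(np.argmax(hist))                    # main direction interval k
    d = np.roll(hist, -k)                       # D = (M_k, ..., M_8, M_1, ..., M_{k-1})
    d_min, d_max = d.min(), d.max()
    if d_max > d_min:
        d = (d - d_min) / (d_max - d_min)       # D' = (D - M_min) / (M_max - M_min)
    return d
```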
Finally, the prediction frames in the two images are matched. A set of prediction box descriptors is established from the feature descriptors, and target matching is completed by comparing the descriptors of the two point sets, with descriptor similarity measured by the Euclidean distance. The prediction frames of the previous frame image are traversed in turn, and for each of them the corresponding prediction frame in the next frame image is sought. Because the vector magnitudes of different descriptors differ, it is difficult to set a Euclidean-distance threshold of general validity, so the matching quality is measured by a relative distance instead: for a descriptor D_i^A of the previous image, let its best match in the next image be D_j^B and its second-best match be D_j'^B; the match is judged to hold when

$$\frac{d(D_i^A,\, D_j^B)}{d(D_i^A,\, D_{j'}^B)} < T \qquad (10)$$

where d(·, ·) denotes the Euclidean distance between descriptors and T is the threshold.
Whether the prediction frames extracted from the two images are similar is thus judged against this threshold, and the nearest neighbor ratio criterion suppresses mismatches. The matching results are shown in figures 7-9. Under translation the correct matching rate is 83.05% with a mismatching rate of 1.69%; under rotation the correct matching rate is 81.36% with a mismatching rate of 6.78%; under scale change the correct matching rate is 89.25% with a mismatching rate of 2.94%.
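A minimal sketch of the matching stage, using the Euclidean distance and the nearest neighbor ratio test described above (the ratio threshold of 0.8 is an assumed value, not taken from the disclosure):

```python
import numpy as np

def match_descriptors(descs_prev, descs_next, ratio_thresh=0.8):
    """For each descriptor of the previous frame, find its nearest neighbor in
    the next frame and keep the match only when the best distance is clearly
    smaller than the second-best one (nearest neighbor ratio test)."""
    matches = []
    for i, d in enumerate(descs_prev):
        dists = np.array([np.linalg.norm(d - e) for e in descs_next])
        if dists.size < 2:
            continue
        order = np.argsort(dists)
        best, second = dists[order[0]], dists[order[1]]
        if second > 0 and best / second < ratio_thresh:
            matches.append((i, int(order[0])))
    return matches
```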
In conclusion, the method performs prediction frame matching on the image identified by the deep learning network, and the final result shows that the prediction frame matching method based on the feature descriptor realizes prediction frame matching and has important significance for target identification, target tracking, visual navigation and the like.
The above description concerns only preferred embodiments of the present invention and is not intended to limit the invention to these forms; any person skilled in the art may make modifications or changes to obtain equivalent embodiments, or apply the invention in other fields, without departing from its technical scope.

Claims (4)

1. A deep learning prediction box matching method based on feature descriptors is characterized by comprising the following steps:
step A, selecting a prediction frame support area;
taking the circular area whose diameter is the diagonal of the prediction frame and whose center is the intersection point of the diagonals as the prediction Frame Support Range (FSR), wherein the FSR contains all pixel information inside the prediction frame;
step B, generating a feature descriptor for the extracted prediction frame, wherein the step B comprises the following steps;
b1, constructing a frame descriptor within the support region;
and C, performing prediction frame matching on the generated feature descriptors.
2. The support area determination method according to claim 1, characterized in that:
and A1, taking the minimum circumscribed circle of the prediction box as the prediction box support area, wherein, when the corresponding region rotates, each fan-shaped sub-region retains a larger proportion of its initial pixels than a rectangular region would, while the area of the region stays the same;
a2, adopting Gaussian weighting to reduce the influence of the peripheral sub-regions, so that the descriptor has scaling invariance; under the action of the Gaussian pyramid the descriptor also has scale invariance;
3. A descriptor construction method according to claim 2, characterized in that:
b11, a method for constructing the descriptor in a circular support area using sector-area division; when the descriptor is generated from the information in the prediction frame, blurring is first applied to reduce the influence of noise on matching, and Gaussian filtering is adopted instead of mean filtering so as to avoid destroying local information; the Gaussian function of the two-dimensional normal distribution, used later in weighting the pixels, is:
$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right)$$
collecting, on each level of the pyramid, the gradient and direction distribution of the pixels in the prediction frame support area, and taking the direction with the highest accumulated support-region pixel gradient as the main direction of the prediction box; to achieve complete rotational invariance, when the gradient and direction of a point are calculated, an adaptive adjustment is made according to the position of that point; for a bivariate function f(x, y), the horizontal gradient g_h(x_0, y_0) and the vertical gradient g_v(x_0, y_0) at a point (x_0, y_0) can be expressed as:
$$g_h(x_0, y_0) = f(x_0 + 1,\, y_0) - f(x_0 - 1,\, y_0)$$
$$g_v(x_0, y_0) = f(x_0,\, y_0 + 1) - f(x_0,\, y_0 - 1)$$
at a pixel point (x_0, y_0) of the image, the gradient magnitude m(x, y) and direction α(x, y) are:
$$m(x, y) = \sqrt{g_h(x_0, y_0)^{2} + g_v(x_0, y_0)^{2}}$$
$$\alpha(x, y) = \arctan\!\left(\frac{g_v(x_0, y_0)}{g_h(x_0, y_0)}\right)$$
respectively calculating the horizontal gradient g_h and the vertical gradient g_v of each pixel point, and then performing gradient vector synthesis according to the formulas in B11 to obtain the gradient direction and magnitude of the pixel point.
And B12, after the gradient calculation is completed for all pixel points, the gradient magnitudes are summed over the divided gradient direction intervals to construct a gradient histogram; if the division of gradient directions is too fine, the computation for constructing the descriptor increases and the matching becomes more sensitive to noise, whereas if the division is too coarse, image detail features are lost and the matching effect deteriorates; experiments show that dividing the gradient directions into one interval every 45 degrees, i.e. eight intervals in total, gives a good matching effect at a small computational cost.
For the ith directional interval, the gradient magnitude summation can be expressed as:
$$M_i = \sum_{\alpha(x, y) \in \Theta_i} w(x, y)\, m(x, y)$$
wherein w(x, y) is a weight coefficient that governs the contribution of the pixel gradient magnitude to interval i and is obtained by normalizing a Gaussian function; when the image is rotated, the weight coefficient causes the gradient magnitudes of the central area to dominate the sum, so that the descriptor remains stable.
$$w(x, y) = \frac{G(x - x_c,\, y - y_c)}{\sum_{(u, v) \in \mathrm{FSR}} G(u - x_c,\, v - y_c)}$$
4. The matching method according to claim 2, characterized in that, in step C, matching according to the descriptors comprises the following steps:
c1, establishing a prediction box descriptor set through the feature descriptors;
the similarity measure of the descriptor with 128 dimensions, C2, is represented by the euclidean distance. And circularly acquiring the prediction frames from the previous frame image, and matching the corresponding prediction frame in the next frame aiming at each prediction frame in the previous frame image. Secondly, setting Euclidean distance threshold values among descriptors, considering the difference of vector lengths among different descriptors, setting the threshold values with generality is difficult, and here, the matching effect is measured by using relative distance. For descriptor
Figure FDA0003605290670000023
When matching, if the best matching item is
Figure FDA0003605290670000024
Then judgeThe matching conditions are as follows:
Figure FDA0003605290670000025
CN202210417188.4A 2022-04-20 2022-04-20 Prediction frame matching method based on feature descriptors Pending CN114943891A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210417188.4A CN114943891A (en) 2022-04-20 2022-04-20 Prediction frame matching method based on feature descriptors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210417188.4A CN114943891A (en) 2022-04-20 2022-04-20 Prediction frame matching method based on feature descriptors

Publications (1)

Publication Number Publication Date
CN114943891A true CN114943891A (en) 2022-08-26

Family

ID=82906316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210417188.4A Pending CN114943891A (en) 2022-04-20 2022-04-20 Prediction frame matching method based on feature descriptors

Country Status (1)

Country Link
CN (1) CN114943891A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116229318A (en) * 2023-02-24 2023-06-06 云贵亮 Information analysis system based on branch data
CN116229318B (en) * 2023-02-24 2023-09-22 湖北联投咨询管理有限公司 Information analysis system based on branch data

Similar Documents

Publication Publication Date Title
CN113592845A (en) Defect detection method and device for battery coating and storage medium
CN110334762B (en) Feature matching method based on quad tree combined with ORB and SIFT
CN107945111B (en) Image stitching method based on SURF (speeded up robust features) feature extraction and CS-LBP (local binary Pattern) descriptor
CN111145228B (en) Heterologous image registration method based on fusion of local contour points and shape features
Ma et al. Region-of-interest detection via superpixel-to-pixel saliency analysis for remote sensing image
CN108304883A (en) Based on the SAR image matching process for improving SIFT
CN112150520A (en) Image registration method based on feature points
CN110309808B (en) Self-adaptive smoke root node detection method in large-scale space
CN111369605B (en) Infrared and visible light image registration method and system based on edge features
Musci et al. Assessment of binary coding techniques for texture characterization in remote sensing imagery
CN113392856B (en) Image forgery detection device and method
Yang et al. Robust semantic template matching using a superpixel region binary descriptor
CN108550165A (en) A kind of image matching method based on local invariant feature
CN110929598B (en) Unmanned aerial vehicle-mounted SAR image matching method based on contour features
Li et al. Deformable dictionary learning for SAR image change detection
CN111199245A (en) Rape pest identification method
del-Blanco et al. Robust people indoor localization with omnidirectional cameras using a grid of spatial-aware classifiers
CN103413312A (en) Video target tracking method based on neighborhood components analysis and scale space theory
CN114359591A (en) Self-adaptive image matching algorithm with edge features fused
CN112907580A (en) Image feature extraction and matching algorithm applied to comprehensive point-line features in weak texture scene
CN114943891A (en) Prediction frame matching method based on feature descriptors
CN111105436B (en) Target tracking method, computer device and storage medium
CN104268550A (en) Feature extraction method and device
CN116129191B (en) Multi-target intelligent identification and fine classification method based on remote sensing AI
CN112101283A (en) Intelligent identification method and system for traffic signs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination