CN111160304B - Local frame difference and multi-frame fusion ground moving target detection and tracking method - Google Patents

Local frame difference and multi-frame fusion ground moving target detection and tracking method

Info

Publication number
CN111160304B
CN111160304B (application CN201911420447.3A)
Authority
CN
China
Prior art keywords
frame image
image
frame
ith
region
Prior art date
Legal status: Active
Application number
CN201911420447.3A
Other languages
Chinese (zh)
Other versions
CN111160304A (en)
Inventor
张天序
徐海
许磊
苏轩
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201911420447.3A
Publication of CN111160304A
Application granted
Publication of CN111160304B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/20: Analysis of motion
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection


Abstract

The invention discloses a ground moving target detection and tracking method based on local frame difference and multi-frame fusion, belonging to the technical field of ground moving target detection and comprising the following steps: FAST corner points are extracted in the first frame image, triangle-matching template regions are screened out using the corner points, and registration between frame images is completed by triangle matching. Targets are detected from the third frame image onward: FAST corner points are extracted in each image, candidate regions are constructed centered on the corner points, the structural similarity between each candidate region and the corresponding region in the image two frames earlier is calculated, background regions whose structural similarity is too high are filtered out, the regions of interest of the next frame image and the target detection regions of the current frame image are screened out, and moving target detection is performed within these local regions. During frame-by-frame processing, multi-frame fusion is adopted to improve the accuracy of target capture, and a combination of correlation-filter tracking and small-target re-detection is adopted to track the target stably. The invention solves the technical problem that, in moving target detection from a moving platform, moving targets cannot be quickly detected and stably tracked in large-field-of-view images.

Description

Local frame difference and multi-frame fusion ground moving target detection and tracking method
Technical Field
The invention relates to the technical field of ground moving target detection, in particular to a ground moving target detection and tracking method with local frame difference and multi-frame fusion.
Background
Multi-target capture and tracking of ground moving targets in visible-light images from a moving platform often fails to capture targets effectively because the ground background environment is complex and changeable, high-resolution images cover a very large area, and the number of targets is not unique. In addition, in the detection stage the image area in which targets must be searched is extremely large and the ground battlefield environment is complex and changeable; moreover, in the initial detection stage the target is usually far away and its texture information is insufficient, so effective feature points cannot be extracted for target detection. Target detection is therefore extremely difficult.
Traditional moving target detection methods suffer from poor applicability, high computational complexity and other problems. For example, accurate registration based on the global image, i.e., performing SIFT or SURF registration and then frame differencing over the whole image, requires a large amount of computation and a long processing time; moreover, because illumination changes and image noise may exist between adjacent frames, global frame differencing may produce a large number of false-alarm targets that cannot be eliminated within a limited number of frames of image processing. In addition, moving target detection methods such as the optical flow method and background modeling are computationally expensive and cannot meet the real-time requirements of the battlefield at all. In current ground-vehicle moving target detection technology, because the platform itself moves, moving target information cannot be extracted by directly differencing the images, the target may not be unique, and stable moving targets cannot be detected in the global image.
Therefore, it is necessary to develop a new ground moving target detection method that can quickly and stably detect and track ground moving targets in large-field-of-view images, so as to meet the real-time requirements of the battlefield.
Disclosure of Invention
In view of the above defects or improvement requirements of the prior art, the present invention provides a ground moving target detection and tracking method based on local frame difference and multi-frame fusion, which aims to detect and track ground moving targets stably, thereby solving the technical problem that, in ground-oriented moving target detection technology, moving targets cannot be quickly detected and stably tracked in large-field-of-view images.
To achieve the above object, according to one aspect of the present invention, there is provided a local frame difference and multi-frame fused ground moving target detecting and tracking method, which continuously acquires a plurality of frames of images for a fixed field of view, the method comprising the following steps:
(1) dividing the 1st frame image into grid regions of size M × N, extracting FAST corner points over the whole image, and determining the grid regions containing corner points as the regions of interest of the 3rd frame image;
(2) counting the number of corner points in each region of interest of the 3rd frame image determined in the 1st frame image, selecting the three regions of interest with the largest numbers of corner points as the three template regions for triangle matching, and performing triangle matching between the 1st frame image and the 2nd frame image to obtain the displacement relationship between the 1st frame image and the 2nd frame image;
updating the three template regions into the 2nd frame image according to the displacement relationship between the 1st frame image and the 2nd frame image, setting i = 3, and proceeding to step (3);
(3) performing triangle matching between the ith frame image and the (i-1)th frame image based on the three template regions in the (i-1)th frame image to obtain the displacement relationship between the ith frame image and the (i-1)th frame image, and further the displacement relationship between the ith frame image and the (i-2)th frame image;
updating the template regions in the ith frame image according to the displacement relationship between the ith frame image and the (i-1)th frame image;
(4) if i = 3, rectifying the regions of interest of the 3rd frame image according to the displacement relationship between the 1st frame image and the 2nd frame image and the displacement relationship between the 2nd frame image and the 3rd frame image; if i ≥ 4, rectifying the regions of interest of the ith frame image according to the displacement relationship between the ith frame image and the (i-1)th frame image;
(5) extracting FAST corner points within the regions of interest of the rectified ith frame image, and constructing a square candidate region of set side length centered on each FAST corner point; obtaining, according to the displacement relationship between the ith frame image and the (i-2)th frame image, the corresponding region in the (i-2)th frame image for every candidate region in the ith frame image, and calculating the structural similarity between each candidate region in the ith frame image and its corresponding region in the (i-2)th frame image;
(6) screening out the candidate regions of the ith frame image whose structural similarity is smaller than a preset first threshold and merging them into several larger regions, which serve as the regions of interest of the next frame image; screening out the candidate regions whose structural similarity is smaller than a preset second threshold and merging them into several larger regions, which serve as the target detection regions of the current frame; candidate regions whose structural similarity is greater than or equal to the preset first threshold are directly judged to be background regions and filtered out;
(7) performing local image frame differencing on all target detection regions of the ith frame image obtained in step (6), and extracting the circumscribed rectangles of the moving targets in the ith frame image as the local image frame difference result;
(8) fusing the local image frame difference result of the ith frame image with the suspected target result of the (i-1)th frame image to obtain the suspected target result of the ith frame image;
(9) setting i = i + 1 and repeating steps (3)-(8) until a preset maximum number of detection frames is reached, thereby determining all moving targets in the field of view;
(10) tracking the moving target stably using FDSST; if the scale of the moving target is smaller than a set size, re-detecting the moving target in a local region using multi-stage filtering, so that the target can be tracked stably.
Preferably, in step (3), the three template regions in the (i-1)th frame image are represented as:
A_j(i-1) = (x_j(i-1), y_j(i-1), M, N), j = 1, 2, 3
where A_j(i-1) represents the jth template region of the (i-1)th frame image, and x_j(i-1), y_j(i-1) represent the x-axis and y-axis coordinates of the top-left vertex of the template region, respectively.
Preferably, the triangle matching method is:
keeping the positional relationship of the three template regions of the (i-1)th frame image unchanged, performing template matching within a set range in the ith frame image, and taking the horizontal and vertical displacements at which the absolute value of the matching error is minimum as the displacement relationship between the ith frame image and the (i-1)th frame image.
Preferably, in step (3), the displacement relationship between the ith frame image and the (i-1)th frame image is calculated as:
(Δx(i-1), Δy(i-1)) = argmin over (Δx, Δy) of Σ_{j=1,2,3} Σ_{(x,y)∈A_j(i-1)} | P_i(x+Δx, y+Δy) - P_{i-1}(x, y) |
where P_i(x, y) represents the gray value of the ith frame image at coordinate (x, y), and Δx(i-1), Δy(i-1) are the displacements of the ith frame image relative to the (i-1)th frame image in the horizontal and vertical directions, respectively.
Preferably, the update formula of the template regions in the ith frame image is:
x_j(i) = x_j(i-1) + Δx(i-1), y_j(i) = y_j(i-1) + Δy(i-1)
where x_j(i) is the x-axis coordinate of the jth template region in the ith frame image, y_j(i) is the y-axis coordinate of the jth template region in the ith frame image, x_j(i-1) and y_j(i-1) are the x-axis and y-axis coordinates of the jth template region in the (i-1)th frame image, and Δx(i-1), Δy(i-1) are the displacements of the ith frame image relative to the (i-1)th frame image in the horizontal and vertical directions, respectively.
Preferably, in step (4), the formula for rectifying the regions of interest of the 3rd frame image is:
x_n'(3) = x_n(3) + Δx(1) + Δx(2), y_n'(3) = y_n(3) + Δy(1) + Δy(2)
where x_n'(3) and y_n'(3) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the rectified 3rd frame image, x_n(3) and y_n(3) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the 3rd frame image before rectification, Δx(1) and Δy(1) are the horizontal and vertical displacements of the 2nd frame image relative to the 1st frame image, and Δx(2) and Δy(2) are the horizontal and vertical displacements of the 3rd frame image relative to the 2nd frame image;
the formula for rectifying the regions of interest of the fourth and subsequent frame images is:
x_n'(i) = x_n(i) + Δx(i-1), y_n'(i) = y_n(i) + Δy(i-1)
where x_n'(i) and y_n'(i) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the rectified ith frame image, x_n(i) and y_n(i) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the ith frame image before rectification, and Δx(i-1), Δy(i-1) are the horizontal and vertical displacements of the ith frame image relative to the (i-1)th frame image.
Preferably, in step (5), the set of square candidate regions in the ith frame image is
CA(i) = {(x_c(i), y_c(i), w_c(i), h_c(i))}
where x_c(i), y_c(i), w_c(i), h_c(i) denote, in order, the abscissa and ordinate of the top-left vertex of the cth candidate region in the ith frame image and the width and height of the cth candidate region;
the set of corresponding regions in the (i-2)th frame image is:
CA''(i) = {(x_c(i) - Δx(i-2), y_c(i) - Δy(i-2), w_c(i), h_c(i))}
where Δx(i-2) is the horizontal displacement of the ith frame image relative to the (i-2)th frame image, and Δy(i-2) is the vertical displacement of the ith frame image relative to the (i-2)th frame image.
The structural similarity between a candidate region of the ith frame image and the corresponding region image in the (i-2)th frame image is calculated as:
SSIM(X, Y) = [(2·u_X·u_Y + C_1)(2·σ_XY + C_2)] / [(u_X² + u_Y² + C_1)(σ_X² + σ_Y² + C_2)]
where X and Y are the candidate region image of the ith frame image and the corresponding region image of the (i-2)th frame image, respectively; u_X and u_Y represent the means of images X and Y; σ_X and σ_Y represent the standard deviations of images X and Y; σ_X² and σ_Y² represent the variances of images X and Y; σ_XY represents the covariance of images X and Y; and C_1 and C_2 are constants used to maintain stability.
Preferably, step (8) is specifically:
if no suspected target exists in the i-1 th frame image, directly taking the local image frame difference result of the i-th frame image as the suspected target result of the i-th frame image;
if the suspected target exists in the image of the frame i-1, calculating the intersection ratio of all suspected target external rectangles in the image of the frame i-1 and all local image frame difference results external rectangles of the image of the frame i, determining that the two external rectangles with the largest intersection ratio represent the same target, and if the largest intersection ratio is larger than or equal to a first set value, directly merging the two external rectangles to obtain a new external rectangle as the suspected target result of the image of the frame i; and if the maximum intersection ratio is smaller than a first set value, respectively calculating the scores of the two circumscribed rectangles by using a convolutional neural network, and taking the circumscribed rectangle with high score as a suspected target result of the ith frame of image.
Preferably, the convolutional neural network comprises two convolutional layers, one pooling layer, and three fully connected layers, wherein,
the input rectangular image is resized to a 12 × 12 grayscale image;
the first layer is a convolutional layer with single-channel input and 4-channel output, a 3 × 3 convolution kernel, stride 1, 'valid' padding and a ReLU activation function, so the output data dimension is 10 × 10 × 4;
the second layer is a pooling layer with pooling scale 2, so the output becomes 5 × 5 × 4;
the third layer is a convolutional layer with 4-channel input and 8-channel output, a 3 × 3 convolution kernel, stride 1, 'valid' padding and a ReLU activation function, so the output data dimension becomes 3 × 3 × 8;
the fourth layer stretches the data output by the third (convolutional) layer into a one-dimensional vector of length 72;
the fifth layer is a fully connected layer with input vector length 72 and output length 36;
the sixth layer is a fully connected layer with input vector length 36 and output length 10;
the last layer is a fully connected layer with input vector length 10 that finally outputs a floating-point number between 0 and 1 representing the final score of the input picture: the closer the score is to 1, the more likely the input represents a moving target; the closer it is to 0, the less likely.
Preferably, in step (10), the multi-stage filtering method is specifically:
let F denote the amplitude spectrum of the original image and L_p denote the amplitude spectrum of the low-pass filter transfer function; the result of passing the image through the low-pass filter is denoted F × L_p, and subtracting it from the original F gives F - F × L_p = F(1 - L_p);
taking the absolute value of F - F × L_p = F(1 - L_p) then allows bright and dark small targets to be re-detected simultaneously.
The method extracts FAST corner points in the first frame image, screens out triangle-matching template regions using the corner points, and completes the registration of frame images by triangle matching. Targets are detected from the third frame image onward: FAST corner points are extracted in each image, candidate regions are constructed centered on the corner points, the structural similarity between each candidate region and the corresponding region in the image two frames earlier is calculated, background regions whose structural similarity is too high are filtered out, the regions of interest of the next frame image and the target detection regions of the current frame image are screened out, and moving target detection is performed within these local regions. During frame-by-frame processing, multi-frame fusion is adopted to improve the accuracy of target capture, and a combination of correlation-filter tracking and small-target re-detection is adopted to track the target stably.
In general, compared with the prior art, the above technical solution contemplated by the present invention can achieve the following beneficial effects:
(1) The invention does not adopt registration methods such as SIFT and SURF, which require complex feature extraction and feature matching. Instead, considering that the displacement between adjacent frame images in a video sequence is very small (in particular, after image stabilization in the aiming stage the displacement can be kept to single-digit pixels), rapid registration of adjacent frame images is achieved by triangle matching, which yields the displacement relationship between the two frame images. The triangle matching method selects three different regions, located at different positions of the image, as templates; this avoids matching errors caused by a single template accidentally matching a local pattern, and therefore has strong robustness.
(2) The method constructs candidate regions from the detected FAST corner points, calculates the structural similarity of each candidate region with its corresponding region two frames earlier, takes the regions whose structural similarity is smaller than the first threshold as the regions of interest of the next frame, and takes the regions whose structural similarity is smaller than the second threshold as the target detection regions of the current frame. As the number of processed frames accumulates, more and more background regions are filtered out, the regions of interest and target detection regions become fewer, fewer corner points are extracted, and the candidate regions are reduced further. Therefore, as the sequence images are processed frame by frame, the image area the algorithm needs to compute over becomes smaller and smaller and the running speed becomes higher and higher, until the target is finally locked.
(3) The invention does not perform frame difference calculation in the global scope, but performs local image frame difference in a local area with obvious motion information. This is because the moving object is often in a local area of the image, and the background can be considered as stationary, so that it is not necessary to perform frame differencing on a global scale. And the global frame difference may introduce too many false alarms to be filtered out completely in a short time.
(4) The invention adopts the multi-frame fusion method to determine the stable ground moving target, because the ground background is very complicated, even if the method of local frame difference is adopted, the moving target can not be stably divided, and no other false alarms exist, therefore, the stable ground moving target in the image sequence can be effectively extracted and the unstable false alarms can be removed by adopting the multi-frame fusion method.
(5) The method comprises a small target rechecking process in the tracking stage, so that the situation that the tracking is seriously deviated due to the fact that the target size is too small is avoided. The small target re-detection adopts a multi-stage filtering method, bright and dark small target detection is carried out near the target, the moving small target in the background is re-extracted and segmented, and the intersection and parallel ratio of the tracking result and the true value result in the sequence image can be effectively improved.
Drawings
FIG. 1 is a flow chart of a method for detecting and tracking a ground moving target by combining local frame differences and multiple frames according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a triangular matching area provided by an embodiment of the present invention;
FIG. 3 is a flow chart of triangle matching provided by an embodiment of the present invention;
FIG. 4 is a diagram illustrating the results of triangle matching according to an embodiment of the present invention;
FIG. 5 is a flow chart of a local image frame difference method provided by an embodiment of the invention;
FIG. 6 is a schematic diagram of a convolutional neural network structure provided in an embodiment of the present invention;
FIG. 7 is a graph of activation functions used by neurons in a convolutional neural network provided by an embodiment of the present invention;
FIG. 8 is a comparison graph of the intersection ratio between the small target review and non-review in the tracking process provided by the embodiment of the invention;
fig. 9 is a graph comparing the central euclidean distance results of the small target review and non-review in the tracking process according to the embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The invention provides a ground moving target detection and tracking method based on local frame difference and multi-frame fusion. It solves the technical problems that, in ground-oriented vehicle moving target detection technology, the motion of the platform prevents moving target information from being extracted by directly differencing the images, the target may not be unique, and stable moving targets cannot be detected in the global image.
As shown in fig. 1, the present invention provides a local frame difference and multi-frame fused ground moving target detection and tracking method, which continuously collects a plurality of frames of images for a certain fixed field of view, and comprises the following steps:
(1) Dividing the 1st frame image into square grid regions of size M × N (50 × 50 by default), extracting FAST (Features from Accelerated Segment Test) corner points over the whole image, and determining the grid regions containing corner points as the regions of interest of the 3rd frame image (the 2nd frame image is not processed);
Region of interest: a region of a frame image that may contain a target. During frame-by-frame processing, FAST corner points are extracted only within the regions of interest. The region of interest of the 1st frame image is the whole image, the 2nd frame image is not processed, the regions of interest of the 3rd frame image are determined by screening within the regions of interest of the 1st frame image, and for subsequent images the regions of interest of the ith frame image are determined by screening within the regions of interest of the (i-1)th frame image.
The set of regions of interest of the ith frame image is denoted as:
IA(i) = {(x_n(i), y_n(i), w_n(i), h_n(i)), 1 ≤ n ≤ I}
where IA(i) represents the set of regions of interest of the ith frame image; x_n(i), y_n(i), w_n(i), h_n(i) are the abscissa and ordinate of the top-left vertex of the nth region of interest and its width and height; and I represents the number of regions of interest of the ith frame image.
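The following Python sketch illustrates how steps (1)-(2) could be realized with OpenCV and NumPy; the 50 × 50 grid size comes from the description above, while the function names and the rest of the code are an assumed minimal implementation rather than the patented code.

import cv2
import numpy as np

def grid_regions_of_interest(frame_gray, cell=50):
    """Split the frame into cell x cell grid regions and keep those that
    contain at least one FAST corner point (step (1) of the method)."""
    fast = cv2.FastFeatureDetector_create()          # default FAST detector
    keypoints = fast.detect(frame_gray, None)
    corners = np.array([kp.pt for kp in keypoints])  # (x, y) corner positions

    rois = []            # list of (x, y, w, h) tuples: top-left corner + size
    counts = []          # FAST corner count per kept grid cell
    h, w = frame_gray.shape
    for y0 in range(0, h, cell):
        for x0 in range(0, w, cell):
            if corners.size == 0:
                continue
            in_cell = ((corners[:, 0] >= x0) & (corners[:, 0] < x0 + cell) &
                       (corners[:, 1] >= y0) & (corners[:, 1] < y0 + cell))
            n = int(in_cell.sum())
            if n > 0:
                rois.append((x0, y0, cell, cell))
                counts.append(n)
    return rois, counts

# The three grid cells with the most corners become the triangle-matching
# templates (step (2)); the remaining cells form the ROI set IA(3) of frame 3.
# rois, counts = grid_regions_of_interest(frame1)
# template_idx = np.argsort(counts)[-3:]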
(2) Counting the number of FAST corner points in every grid region of the 1st frame image (that is, in each region of interest of the 3rd frame image determined in the 1st frame image), selecting the three grid regions with the largest numbers of FAST corner points as the template regions for triangle matching, and performing triangle matching between the 1st frame image and the 2nd frame image to obtain the displacement relationship between the 1st frame image and the 2nd frame image;
In this step, the three template regions of the 1st frame image are represented as
A_j(1) = (x_j(1), y_j(1), M, N), j = 1, 2, 3
where A_j(1) represents the jth template region of the 1st frame image, and x_j(1), y_j(1) represent the x-axis and y-axis coordinates of its top-left vertex, respectively. An example of three template regions is shown in FIG. 2.
Fig. 3 shows a triangle matching process, and fig. 4 shows a schematic diagram of a triangle matching result in this embodiment. In this step, the method for triangle matching between the 1 st frame image and the 2 nd frame image comprises: keeping the position relation of the three template areas of the 1 st frame image unchanged, performing template matching in a set range in the 2 nd frame image, and taking the displacement in the horizontal and vertical directions when the absolute value of the position error is minimum as the displacement relation of the 2 nd frame image and the 1 st frame image.
Updating three template areas in the 2 nd frame image according to the displacement relation between the 1 st frame image and the 2 nd frame image, taking i as 3, and entering the step (3);
(3) performing triangular matching on the ith frame image and the ith-1 frame image based on three template areas in the ith-1 frame image to obtain a displacement relation between the ith frame image and the ith-1 frame image, and further obtain a displacement relation between the ith frame image and the ith-2 frame image;
In this step, the three template regions in the (i-1)th frame image are represented as:
A_j(i-1) = (x_j(i-1), y_j(i-1), M, N), j = 1, 2, 3
where A_j(i-1) represents the jth template region of the (i-1)th frame image, x_j(i-1), y_j(i-1) represent the x-axis and y-axis coordinates of the top-left vertex of the template region, and M and N default to 50.
In this step, triangle matching between the ith frame image and the (i-1)th frame image is performed as follows: keeping the positional relationship of the three template regions of the (i-1)th frame image unchanged, template matching is performed within a set range in the ith frame image (by default within ±3 pixels in the horizontal and vertical directions of the image), the matching degree is evaluated by the absolute value of the gray-level error, and the displacement with the minimum absolute error is taken as the best matching result; the resulting (Δx(i-1), Δy(i-1)) is the displacement relationship between the ith frame image and the (i-1)th frame image:
(Δx(i-1), Δy(i-1)) = argmin over (Δx, Δy) of Σ_{j=1,2,3} Σ_{(x,y)∈A_j(i-1)} | P_i(x+Δx, y+Δy) - P_{i-1}(x, y) |
where P_i(x, y) represents the gray value of the ith frame image at coordinate (x, y), and Δx(i-1), Δy(i-1) are the displacements of the ith frame image relative to the (i-1)th frame image in the horizontal and vertical directions, respectively.
The alternate-frame displacement relationship between the ith frame image and the (i-2)th frame image is:
Δx = Δx(i-1) + Δx(i-2), Δy = Δy(i-1) + Δy(i-2)
where (Δx, Δy) represents the displacement relationship between the ith frame image and the (i-2)th frame image, and (Δx(i-2), Δy(i-2)) here is the adjacent-frame displacement between the (i-1)th and (i-2)th frame images obtained when the previous frame was processed.
The template regions in the ith frame image are updated according to the displacement relationship between the ith frame image and the (i-1)th frame image; the update formula for the jth template in the ith frame image is:
x_j(i) = x_j(i-1) + Δx(i-1), y_j(i) = y_j(i-1) + Δy(i-1)
where x_j(i) is the x-axis coordinate of the jth template region in the ith frame image, y_j(i) is the y-axis coordinate of the jth template region in the ith frame image, and x_j(i-1), y_j(i-1) are the x-axis and y-axis coordinates of the jth template region in the (i-1)th frame image.
In this step, registration methods that require a large amount of complex computation, such as SIFT registration and edge registration, are not adopted; instead, the simplest template matching is used directly to achieve coarse registration of adjacent frame images, which greatly reduces the complexity of the algorithm and the load on the platform. After the displacement relationship of adjacent frames is obtained, the displacement relationship of alternate-frame images can be obtained through the chain relationship between frames.
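The coarse registration described above can be sketched as follows; this is a minimal NumPy illustration that assumes grayscale frames, uses the ±3-pixel default search range, and uses the sum of absolute gray-level differences over the three templates as the matching error, so the exact error measure and search strategy may differ from the patent's.

import numpy as np

def triangle_match(prev_frame, cur_frame, templates, search=3):
    """Estimate (dx, dy) of cur_frame relative to prev_frame by shifting the
    three template regions rigidly and minimizing the summed absolute
    gray-level error (templates: list of (x, y, w, h) in prev_frame)."""
    best_err, best_shift = np.inf, (0, 0)
    H, W = cur_frame.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            err = 0.0
            for (x, y, w, h) in templates:
                xs, ys = x + dx, y + dy
                if xs < 0 or ys < 0 or xs + w > W or ys + h > H:
                    err = np.inf            # shifted window leaves the image
                    break
                ref = prev_frame[y:y + h, x:x + w].astype(np.float32)
                cur = cur_frame[ys:ys + h, xs:xs + w].astype(np.float32)
                err += np.abs(cur - ref).sum()
            if err < best_err:
                best_err, best_shift = err, (dx, dy)
    return best_shift                        # (dx(i-1), dy(i-1))

# The templates are then moved by the estimated shift for the next frame:
# templates = [(x + dx, y + dy, w, h) for (x, y, w, h) in templates]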
(4) If the interested area exists in the ith frame of image, correcting the interested area of the ith frame of image; specifically, if i is 3, correcting the region of interest of the 3 rd frame image according to the displacement relationship between the 1 st frame image and the 2 nd frame image and the displacement relationship between the 2 nd frame image and the 3 rd frame image; and if i is larger than or equal to 4, correcting the interested area of the ith frame image according to the displacement relation between the ith frame image and the (i-1) th frame image.
More specifically, the formula for rectifying the regions of interest of the 3rd frame image is:
x_n'(3) = x_n(3) + Δx(1) + Δx(2), y_n'(3) = y_n(3) + Δy(1) + Δy(2)
where x_n'(3) and y_n'(3) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the rectified 3rd frame image, x_n(3) and y_n(3) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the 3rd frame image before rectification, Δx(1) and Δy(1) are the horizontal and vertical displacements of the 2nd frame image relative to the 1st frame image, and Δx(2) and Δy(2) are the horizontal and vertical displacements of the 3rd frame image relative to the 2nd frame image;
The formula for rectifying the regions of interest of the fourth and subsequent frame images is:
x_n'(i) = x_n(i) + Δx(i-1), y_n'(i) = y_n(i) + Δy(i-1)
where x_n'(i) and y_n'(i) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the rectified ith frame image, x_n(i) and y_n(i) are the x-axis and y-axis coordinates of the top-left vertex of the nth region of interest of the ith frame image before rectification, and Δx(i-1), Δy(i-1) are the horizontal and vertical displacements of the ith frame image relative to the (i-1)th frame image.
And if no interesting area exists in the ith frame image, defining the full image as the interesting area.
(5) Extracting FAST corner points within the regions of interest of the rectified ith frame image, and constructing a square candidate region with set side length R (19 by default) centered on each FAST corner point; the set of these candidate regions is:
CA(i) = {(x_c(i), y_c(i), w_c(i), h_c(i))}
where CA(i) represents the candidate region set of the ith frame image, and x_c(i), y_c(i), w_c(i), h_c(i) represent the abscissa and ordinate of the top-left vertex of the cth candidate region in the ith frame image and its width and height.
According to the displacement relationship between the ith frame image and the (i-2)th frame image, the corresponding region in the (i-2)th frame image is obtained for every candidate region in the ith frame image; the set of corresponding regions in the (i-2)th frame image is:
CA''(i) = {(x_c(i) - Δx(i-2), y_c(i) - Δy(i-2), w_c(i), h_c(i))}
where Δx(i-2) is the horizontal displacement of the ith frame image relative to the (i-2)th frame image, and Δy(i-2) is the vertical displacement of the ith frame image relative to the (i-2)th frame image.
Then, the structural similarity between each candidate region image of the ith frame image and the corresponding region image in the (i-2)th frame image is calculated as:
SSIM(X, Y) = [(2·u_X·u_Y + C_1)(2·σ_XY + C_2)] / [(u_X² + u_Y² + C_1)(σ_X² + σ_Y² + C_2)]
where X and Y are the candidate region image of the ith frame image and the corresponding region image of the (i-2)th frame image, respectively; u_X and u_Y represent the means of images X and Y; σ_X and σ_Y represent the standard deviations of images X and Y; σ_X² and σ_Y² represent the variances of images X and Y; and σ_XY represents the covariance of images X and Y. C_1 and C_2 are constants used to maintain stability when the denominator approaches 0, usually taken as C_1 = (K_1·L)² and C_2 = (K_2·L)², with K_1 = 0.01 and L = 255.
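A minimal NumPy sketch of this single-window structural similarity computation between a candidate region X of the ith frame and its corresponding region Y of the (i-2)th frame is given below; K_1 = 0.01 and L = 255 follow the description, while K_2 = 0.03 is an assumed value because the description does not specify it.

import numpy as np

def ssim_region(X, Y, K1=0.01, K2=0.03, L=255.0):
    """Structural similarity of two equally sized grayscale patches,
    computed over the whole patch (no sliding window)."""
    X = X.astype(np.float64)
    Y = Y.astype(np.float64)
    C1, C2 = (K1 * L) ** 2, (K2 * L) ** 2        # K2 is an assumed value
    ux, uy = X.mean(), Y.mean()
    vx, vy = X.var(), Y.var()                    # variances sigma_X^2, sigma_Y^2
    cov = ((X - ux) * (Y - uy)).mean()           # covariance sigma_XY
    return ((2 * ux * uy + C1) * (2 * cov + C2)) / \
           ((ux ** 2 + uy ** 2 + C1) * (vx + vy + C2))

# Candidate regions with ssim_region(...) >= 0.88 are treated as background,
# < 0.88 become next-frame ROIs, and < 0.7 become current-frame detection regions.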
(6) All candidate regions are screened according to the structural similarity results; the specific screening method is as follows:
(6-1) if the structural similarity result is greater than or equal to a first threshold (0.88 by default), the candidate region is directly considered to belong to the background and is filtered out;
(6-2) if the structural similarity result is smaller than the first threshold, the candidate region is retained as a region of interest; all such small R × R regions of interest are merged into several large regions of interest, which serve as the regions of interest of the next frame image;
(6-3) if the structural similarity result is smaller than a second threshold (0.7 by default), the region is additionally retained as a moving target detection region of the ith frame image; all such small R × R moving target detection regions are finally merged into several large moving target detection regions, which serve as the moving target detection regions of the current frame, and the set of moving target detection regions of the current frame is defined as:
MA(i) = {(x_m(i), y_m(i), w_m(i), h_m(i))}
where MA(i) is the set of all moving target detection regions in the ith frame image, and x_m(i), y_m(i), w_m(i), h_m(i) represent, in order, the abscissa and ordinate of the top-left vertex of the mth moving target detection region in the ith frame image and its width and height; the set of corresponding regions in the (i-2)th frame image is:
MA''(i) = {(x_m(i) - Δx(i-2), y_m(i) - Δy(i-2), w_m(i), h_m(i))}
and the set of corresponding regions in the (i-1)th frame image is:
MA'(i) = {(x_m(i) - Δx(i-1), y_m(i) - Δy(i-1), w_m(i), h_m(i))}
in this step, based on the extracted FAST corner, the structural similarity of the corresponding regions of the frame-separated images is used to screen the background and the region of interest (the region of interest simultaneously includes the moving target detection region), instead of using image segmentation and edge segmentation in the conventional image processing.
(7) Performing local image frame difference on all moving target detection areas of the ith frame of image obtained in the step (6), and obtaining a local image frame difference result by adopting a three-frame difference method, wherein as shown in fig. 5, the local image frame difference method specifically comprises the following steps:
(7-1) subtracting the image of the corresponding region in the i-1 frame image from the image of the moving target detection region in the i frame image, taking the absolute value of the obtained difference value to obtain a frame difference image, and performing binary segmentation by using a maximum inter-class variance method to obtain a final binary image, namely a local frame difference result 1;
(7-2) subtracting the image of the corresponding region in the (i-2)th frame image from the image of the moving target detection region in the ith frame image, taking the absolute value of the obtained difference to obtain a frame difference image, and performing binary segmentation using the maximum between-class variance (Otsu) method to obtain the final binary image, namely local frame difference result 2;
(7-3) performing an intersection operation on local frame difference result 1 and local frame difference result 2, and applying morphological filtering to the resulting binary image; the images contained in the circumscribed rectangles of the bright regions of this binary image constitute the final local frame difference result of the ith frame image.
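A sketch of the local three-frame difference over one moving target detection region is given below; OpenCV is assumed for Otsu thresholding and morphology, the patches from frames i-1 and i-2 are taken from the corresponding regions MA'(i) and MA''(i) defined above, and the morphological opening is one assumed choice of morphological filtering.

import cv2

def local_three_frame_difference(patch_i, patch_im1, patch_im2):
    """Three-frame difference inside one detection region: |I_i - I_{i-1}| and
    |I_i - I_{i-2}| are Otsu-thresholded, intersected and cleaned by a
    morphological opening; returns the bounding boxes of the remaining blobs."""
    d1 = cv2.absdiff(patch_i, patch_im1)
    d2 = cv2.absdiff(patch_i, patch_im2)
    _, b1 = cv2.threshold(d1, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    _, b2 = cv2.threshold(d2, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    both = cv2.bitwise_and(b1, b2)                          # intersection of the two results
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    both = cv2.morphologyEx(both, cv2.MORPH_OPEN, kernel)   # morphological filtering
    contours, _ = cv2.findContours(both, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]          # circumscribed rectangles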
(8) Multi-frame image fusion: fusing the final local image frame difference result of the ith frame image with the obtained suspected target result in the (i-1) th frame image to obtain the suspected target result of the ith frame image, updating the stability times and stability scores of all the suspected targets, and deleting the corresponding suspected target when the stability score of the suspected target is less than 0; if the frame difference result cannot be fused with the existing suspected target, the local frame difference result is directly added as the suspected target of the ith frame image, the stability frequency is initialized to be 0, and the stability score is 1.
Since the 2nd frame image yields no suspected targets, when the 3rd frame image is processed all of its frame difference results are taken as suspected targets, and processing then proceeds to the 4th frame image.
The stability times represent the number of times a suspected target has been continuously judged to be a moving target since the third frame; each time the suspected target is again judged to be a moving target, the stability times are incremented by 1. If the suspected target is judged to be a non-moving target and the current stability times are greater than 0, the stability times are reset directly to 0; otherwise they are decremented by 1 each time.
the stability score represents the score of the possibility that the suspected target is the moving target, the larger the stability score is, the more the suspected target is likely to be the moving target, and the increase and decrease of the stability score are linearly related to the stability times of the current suspected target.
In the step, a false alarm is filtered by adopting a multi-frame information fusion method, and a stable moving target is extracted.
The specific steps of multi-frame image fusion are as follows:
(8-1) if no suspected target exists in the i-1 th frame image, directly adding a current frame difference result as the suspected target of the i-th frame image, initializing the stability times of all the suspected targets to be 0, and obtaining a stability score of 1;
if the suspected target exists in the image of the (i-1) th frame, firstly, completing position correction of the suspected target through a triangular matching result (delta x (i-1) and delta y (i-1) are respectively subtracted from horizontal and vertical coordinates of all circumscribed rectangles), then carrying out template matching in a local area (within +/-6 pixels in the horizontal direction and the vertical direction by default) of the image of the (i) th frame, selecting the best matching position as the final position of the suspected target, and entering (8-2);
(8-2) calculating the intersection ratio of all the circumscribed rectangles of the suspected targets in the image of the (i-1) th frame and all the circumscribed rectangles of the local frame difference results of the image of the (i) th frame,
(8-3) the suspected target with the largest (non-zero) intersection ratio and the local frame difference result (hereinafter referred to as the "suspected target" and the "local frame difference result") are determined to represent the same target, that is, the suspected target is considered to be continuously and stably present in the ith frame image, and the stability number is added by 1. At this time, if the cross-over ratio value is greater than a first set value, for example, 0.5, directly merging the suspected target with the circumscribed rectangle of the local frame difference result to obtain a new circumscribed rectangle as the suspected target result of the ith frame image, and increasing the stability score by 3 times of the current stability times; if the intersection ratio is smaller than the first set value, for example 0.5, the neural network module is used for respectively calculating the probability scores of the suspected target and the local frame difference result belonging to the vehicle, if the score of the local frame difference result is higher, the suspected target is replaced by the local frame difference result, the stability score is increased by 1 time of the stability times, otherwise, the stability score of the suspected target is only increased by 1 time of the stability times. And the newly obtained suspected target is taken as the suspected target of the ith frame of image.
(8-4) the suspected target in the (i-1)th frame image and the local frame difference result of the ith frame image fused in step (8-3) are marked and do not participate in the next round of fusion for the current frame, and then step (8-3) is repeated until one of the following three conditions is met: the maximum intersection ratio is 0, all suspected targets in the (i-1)th frame image have participated in the fusion process, or all local frame difference results of the ith frame image have participated in the fusion process;
(8-5) if a suspected target in the (i-1)th frame image has not been fused, its stability times and stability score are updated: if the stability times are greater than 0, they are reset directly to 0, otherwise they are decremented by 1; the stability score is increased by 1 times the stability times (so the stability score decreases if the stability times are negative). Local frame difference results of the ith frame image that did not participate in fusion are directly added as suspected targets of the ith frame image, with the stability times of these new suspected targets initialized to 0 and the stability score to 1.
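The bookkeeping of steps (8-1)-(8-5) can be illustrated with a small intersection-over-union helper and a suspected-target record; the score increments follow the description above, while the data structure and function names are assumed for illustration and are not taken from the patent.

from dataclasses import dataclass

@dataclass
class SuspectedTarget:
    box: tuple               # (x, y, w, h) circumscribed rectangle
    stable_count: int = 0    # "stability times"
    stable_score: float = 1.0

def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def fuse_matched(target, det_box, iou_value, cnn_score_target, cnn_score_det):
    """Update one suspected target matched to a frame-difference box (step 8-3)."""
    target.stable_count += 1
    if iou_value >= 0.5:                 # first set value (0.5 is the example given)
        x = min(target.box[0], det_box[0])
        y = min(target.box[1], det_box[1])
        w = max(target.box[0] + target.box[2], det_box[0] + det_box[2]) - x
        h = max(target.box[1] + target.box[3], det_box[1] + det_box[3]) - y
        target.box = (x, y, w, h)        # merged circumscribed rectangle
        target.stable_score += 3 * target.stable_count
    else:                                # low overlap: keep the higher-scoring box
        if cnn_score_det > cnn_score_target:
            target.box = det_box
        target.stable_score += 1 * target.stable_count
    return target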
As shown in fig. 6, the convolutional neural network includes two convolutional layers, one pooling layer, and 3 fully-connected layers, which are as follows:
The input rectangular image is first resized to a 12 × 12 grayscale image and then fed into the convolutional neural network. The first layer is a convolutional layer with single-channel input and 4-channel output, a 3 × 3 convolution kernel, stride 1, 'valid' padding and a ReLU activation function (the activation function is shown in FIG. 7); the output data dimension is 10 × 10 × 4;
the second layer is a pooling layer with pooling scale 2, so the output becomes 5 × 5 × 4;
the third layer is a convolutional layer with 4-channel input and 8-channel output, a 3 × 3 convolution kernel, stride 1, 'valid' padding and a ReLU activation function, so the output data dimension becomes 3 × 3 × 8;
the fourth layer stretches the data output by the third (convolutional) layer into a one-dimensional vector of length 72;
the fifth layer is a fully connected layer with input vector length 72 and output length 36;
the sixth layer is a fully connected layer with input vector length 36 and output length 10;
the last layer is a fully connected layer with input vector length 10 that finally outputs a floating-point number between 0 and 1 representing the final score of the input picture: the closer the score is to 1, the more likely the input represents a vehicle; the closer it is to 0, the less likely.
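A PyTorch sketch of the scoring network described above is given below; the layer sizes follow the description, while ReLU on the hidden fully connected layers and the sigmoid on the output are assumptions, since the text only fixes the output range.

import torch.nn as nn

class TargetScoreNet(nn.Module):
    """Small CNN that scores a 12x12 grayscale patch in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, stride=1),   # 'valid' padding -> 10x10x4
            nn.ReLU(),
            nn.MaxPool2d(2),                            # -> 5x5x4
            nn.Conv2d(4, 8, kernel_size=3, stride=1),   # -> 3x3x8
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                               # -> vector of length 72
            nn.Linear(72, 36),
            nn.ReLU(),                                  # assumed hidden activation
            nn.Linear(36, 10),
            nn.ReLU(),                                  # assumed hidden activation
            nn.Linear(10, 1),
            nn.Sigmoid(),                               # assumed; gives a score in (0, 1)
        )

    def forward(self, x):                               # x: (N, 1, 12, 12) float tensor
        return self.classifier(self.features(x))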
(9) Taking i as i +1, repeatedly executing the steps (3) - (8) until the maximum detection frame number is reached (default is 10), determining all moving targets in the image, sorting all moving targets according to stability scores, and enabling an operator to select one target to enter a tracking stage; an operator can carry out multistage filtering in a local area of the moving target, and a target circumscribed rectangle which needs to be tracked finally is segmented.
In the method, the moving target is not obtained simply by applying a frame difference method to the whole image. Instead, the regions of interest and the candidate moving target regions that may contain targets are obtained by the screening in step (6), and in the next frame image the corner points are extracted and candidate regions are constructed only within the existing regions of interest, so the regions of interest and the moving target detection regions are screened iteratively. As the background is continuously filtered out, the regions of interest and moving target detection regions keep shrinking, which increases the running speed of the algorithm.
after calculating to obtain a moving target candidate region of the current frame, obtaining corresponding regions of the moving target candidate region of the ith frame image in the (i-1) th frame image and the (i-2) th frame image by utilizing an image displacement relation obtained by triangular matching, and extracting a moving target in a local region by adopting a three-frame difference method. Since a large number of background regions have been screened and filtered, many false alarms can be effectively filtered out during the frame difference process.
(10) After the tracking target is determined, the FDSST (Fast Discriminative Scale Space Tracking) method is adopted to track the target stably; if the target scale is too small (for example, smaller than 16 × 16), multi-stage filtering is used to re-detect the target in a local region so that the target can be tracked stably. The specific multi-stage filtering method is as follows:
let F denote the amplitude spectrum of the original image and L_p denote the amplitude spectrum of the low-pass filter transfer function; the result of passing the image through the low-pass filter can be denoted F × L_p, and subtracting it from the original F gives F - F × L_p = F(1 - L_p), a result in which the background is suppressed while the target and the noise are highlighted.
In actual processing, F - F × L_p = F(1 - L_p) produces both positive and negative peaks; taking the absolute value at this point turns both bright targets and dark targets into bright features, and by using these bright features the re-detection of bright and dark small targets can be achieved simultaneously.
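The high-pass step F(1 - L_p) followed by the absolute value can be sketched as follows; the Gaussian form of the low-pass transfer function and its cutoff are assumptions, since the description only requires a low-pass filter.

import numpy as np

def highpass_abs_response(patch, sigma=0.15):
    """Compute |inverse FFT of F(u,v) * (1 - L_p(u,v))| for a local image patch,
    where L_p is an assumed Gaussian low-pass transfer function. Bright and dark
    small targets both become bright peaks in the returned response."""
    rows, cols = patch.shape
    F = np.fft.fftshift(np.fft.fft2(patch.astype(np.float64)))
    u = np.fft.fftshift(np.fft.fftfreq(rows))[:, None]    # normalized frequencies
    v = np.fft.fftshift(np.fft.fftfreq(cols))[None, :]
    L_p = np.exp(-(u ** 2 + v ** 2) / (2 * sigma ** 2))   # assumed Gaussian low-pass
    highpass = F * (1.0 - L_p)                            # F - F * L_p
    response = np.abs(np.fft.ifft2(np.fft.ifftshift(highpass)))
    return response      # threshold / segment this response to re-detect the small target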
In the step, in the tracking stage, the target is not simply tracked by a tracker, but small target rechecking is carried out according to the target scale, and when the target scale is too small, the small target rechecking is carried out in a local area by using multistage filtering, so that the tracker can stably track the target.
As shown in fig. 8, as a comparison graph of the intersection ratio between the small target review and the non-review in the tracking process provided by the embodiment of the present invention, it can be found that the intersection ratio between the tracking result and the true value is obviously better than that of the tracking algorithm without the small target review by adding the tracking algorithm for the small target review; as shown in fig. 9, a comparison graph of central euclidean distance results of the small target re-inspection and the non-re-inspection in the tracking process provided by the embodiment of the present invention is added with a tracking algorithm of the small target re-inspection, and the tracking result obtained in the tracking stage has a smaller central euclidean distance from the target true value. Therefore, a tracking algorithm for re-inspection of the small target is added, and a better tracking effect is achieved.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A ground moving target detection and tracking method based on local frame difference and multi-frame fusion, characterized in that images are continuously acquired over a fixed field of view and the method comprises the following steps:
(1) dividing the 1st frame image into grid areas of size M × N, extracting FAST corner points over the whole image, and taking the grid areas that contain corner points as the regions of interest of the 3rd frame image;
(2) counting, in the 1st frame image, the number of corner points in each region of interest of the 3rd frame image, selecting the three regions of interest with the largest numbers of corner points as the three template regions for triangle matching, and performing triangle matching between the 1st frame image and the 2nd frame image to obtain the displacement relation between the 1st frame image and the 2nd frame image;
updating the three template regions in the 2nd frame image according to the displacement relation between the 1st frame image and the 2nd frame image, setting i = 3, and proceeding to step (3);
(3) performing triangle matching between the ith frame image and the (i-1)th frame image based on the three template regions in the (i-1)th frame image, so as to obtain the displacement relation between the ith frame image and the (i-1)th frame image and, further, the displacement relation between the ith frame image and the (i-2)th frame image;
updating the template regions in the ith frame image according to the displacement relation between the ith frame image and the (i-1)th frame image;
(4) if i = 3, correcting the regions of interest of the 3rd frame image according to the displacement relation between the 1st frame image and the 2nd frame image and the displacement relation between the 2nd frame image and the 3rd frame image; if i ≥ 4, correcting the regions of interest of the ith frame image according to the displacement relation between the ith frame image and the (i-1)th frame image;
(5) extracting FAST corner points in the corrected regions of interest of the ith frame image, and constructing a square candidate region of a set side length centred on each FAST corner point; according to the displacement relation between the ith frame image and the (i-2)th frame image, locating the corresponding regions in the (i-2)th frame image of all candidate regions in the ith frame image, and computing the structural similarity between each candidate region in the ith frame image and its corresponding region in the (i-2)th frame image;
(6) selecting the candidate regions of the ith frame image whose structural similarity is smaller than a preset first threshold, merging them into a number of larger regions, and taking these as the regions of interest of the next frame image; selecting the candidate regions whose structural similarity is smaller than a preset second threshold, merging them into a number of larger regions, and taking these as the target detection regions of the current frame; the candidate regions whose structural similarity is greater than or equal to the preset first threshold are judged to be background regions and filtered out directly;
(7) performing a local image frame difference on all target detection regions of the ith frame image obtained in step (6), and extracting the circumscribed rectangles of the moving targets in the ith frame image as the local image frame difference result;
(8) fusing the local image frame difference result of the ith frame image with the suspected target result of the (i-1)th frame image to obtain the suspected target result of the ith frame image;
(9) setting i = i + 1 and repeating steps (3)-(8) until a preset maximum number of detection frames is reached, thereby determining all moving targets in the field of view;
(10) tracking the moving target stably with FDSST, and, if the scale of the moving target is smaller than a set size, re-detecting the moving target in a local region with multi-stage filtering so that the target can be tracked stably.
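By way of illustration only, a Python/OpenCV sketch of the grid division and FAST corner counting in steps (1) and (2) is given below; the cell size M = N = 32 and the detector threshold are assumed values not specified by the claim.

```python
import cv2
import numpy as np

def select_rois_and_templates(frame, M=32, N=32):
    """Grid cells of size M x N that contain FAST corners become regions of
    interest; the three cells with the most corners become the triangle-
    matching template regions."""
    fast = cv2.FastFeatureDetector_create(threshold=20)
    keypoints = fast.detect(frame, None)

    h, w = frame.shape[:2]
    counts = np.zeros((h // N, w // M), dtype=int)   # corners per grid cell
    for kp in keypoints:
        cx = min(int(kp.pt[0]) // M, counts.shape[1] - 1)
        cy = min(int(kp.pt[1]) // N, counts.shape[0] - 1)
        counts[cy, cx] += 1

    rois = [(cx * M, cy * N, M, N)                   # (x, y, w, h) of non-empty cells
            for cy, cx in zip(*np.nonzero(counts))]
    top3 = [np.unravel_index(k, counts.shape)
            for k in np.argsort(counts.ravel())[-3:]]
    templates = [(cx * M, cy * N, M, N) for cy, cx in top3]
    return rois, templates
```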
2. The method as claimed in claim 1, wherein in step (3), the three template regions in the (i-1)th frame image are represented as:
A_j(i-1) = (x_j(i-1), y_j(i-1), M, N),  j = 1, 2, 3
wherein A_j(i-1) represents the jth template region of the (i-1)th frame image, and x_j(i-1), y_j(i-1) represent the x-axis and y-axis coordinates, respectively, of the top-left vertex of that template region.
3. The method of claim 1, wherein the triangle matching is performed as follows:
keeping the positional relation of the three template regions of the (i-1)th frame image unchanged, performing template matching within a set range of the ith frame image, and taking the horizontal and vertical displacements at which the absolute matching error is minimum as the displacement relation between the ith frame image and the (i-1)th frame image.
4. The method according to claim 1, wherein in the step (3), the displacement relationship between the ith frame image and the (i-1) th frame image is calculated by the formula:
(Δx(i-1), Δy(i-1)) = argmin_(Δx, Δy) Σ_(j=1..3) Σ_((x,y)∈A_j(i-1)) | P_i(x + Δx, y + Δy) - P_(i-1)(x, y) |
wherein P_i(x, y) denotes the gray value of the ith frame image at coordinate (x, y), and Δx(i-1), Δy(i-1) are the displacements of the ith frame image relative to the (i-1)th frame image in the horizontal and vertical directions, respectively.
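A brute-force Python sketch of the displacement search described in claims 3 and 4 is given below; the search radius and the exact form of the error term are assumptions consistent with the definitions above rather than a definitive implementation.

```python
import numpy as np

def estimate_displacement(frame_i, frame_prev, templates, radius=20):
    """Return (dx, dy) minimising the summed absolute error of the three
    template regions between frame i-1 and frame i."""
    best, best_err = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            err = 0.0
            for (x, y, w, h) in templates:          # regions defined in frame i-1
                ref = frame_prev[y:y + h, x:x + w].astype(np.float32)
                if x + dx < 0 or y + dy < 0:        # shifted window left the image
                    err = np.inf
                    break
                cur = frame_i[y + dy:y + dy + h, x + dx:x + dx + w].astype(np.float32)
                if cur.shape != ref.shape:          # left the image on the far side
                    err = np.inf
                    break
                err += np.abs(cur - ref).sum()
            if err < best_err:
                best, best_err = (dx, dy), err
    return best                                      # displacement of frame i w.r.t. i-1
```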
5. The method of claim 1, wherein the updating formula of the template region in the ith frame image is:
x_j(i) = x_j(i-1) + Δx(i-1),  y_j(i) = y_j(i-1) + Δy(i-1),  j = 1, 2, 3
wherein x_j(i) and y_j(i) are the x-axis and y-axis coordinates of the jth template region in the ith frame image, x_j(i-1) and y_j(i-1) are the x-axis and y-axis coordinates of the jth template region in the (i-1)th frame image, and Δx(i-1) and Δy(i-1) are the displacements of the ith frame image relative to the (i-1)th frame image in the horizontal and vertical directions, respectively.
6. The method as claimed in claim 1, wherein in step (4), the formula for correcting the region of interest of the 3rd frame image is:
x_n'(3) = x_n(3) + Δx(1) + Δx(2),  y_n'(3) = y_n(3) + Δy(1) + Δy(2)
wherein x_n'(3) and y_n'(3) are the x-axis and y-axis coordinates of the top-left vertex of the corrected region of interest of the 3rd frame image, x_n(3) and y_n(3) are the x-axis and y-axis coordinates of the top-left vertex of the region of interest of the 3rd frame image before correction, Δx(1) and Δy(1) are the horizontal and vertical displacements of the 2nd frame image relative to the 1st frame image, and Δx(2) and Δy(2) are the horizontal and vertical displacements of the 3rd frame image relative to the 2nd frame image;
the formula for correcting the regions of interest of the 4th and subsequent frame images is:
x_n'(i) = x_n(i) + Δx(i-1),  y_n'(i) = y_n(i) + Δy(i-1)
wherein x_n'(i) and y_n'(i) are the x-axis and y-axis coordinates of the top-left vertex of the corrected region of interest of the ith frame image, x_n(i) and y_n(i) are the x-axis and y-axis coordinates of the top-left vertex of the region of interest of the ith frame image before correction, Δx(i-1) is the displacement of the ith frame image relative to the (i-1)th frame image in the horizontal direction, and Δy(i-1) is the displacement of the ith frame image relative to the (i-1)th frame image in the vertical direction.
7. The method of claim 1, wherein in step (5), the set of square candidate regions in the ith frame image is
CA(i) = {(x_c(i), y_c(i), w_c(i), h_c(i))}
wherein x_c(i), y_c(i), w_c(i), h_c(i) denote, in order, the abscissa and ordinate of the top-left vertex of the cth candidate region in the ith frame image and the width and height of the cth candidate region;
the set of corresponding regions in the (i-2)th frame image is:
CA''(i) = {(x_c(i) - Δx(i-2), y_c(i) - Δy(i-2), w_c(i), h_c(i))}
wherein Δx(i-2) is the displacement of the ith frame image relative to the (i-2)th frame image in the horizontal direction, and Δy(i-2) is the displacement of the ith frame image relative to the (i-2)th frame image in the vertical direction;
the structural similarity between a candidate region of the ith frame image and the corresponding region of the (i-2)th frame image is calculated as:
SSIM(X, Y) = ((2·u_X·u_Y + C_1)(2·σ_XY + C_2)) / ((u_X^2 + u_Y^2 + C_1)(σ_X^2 + σ_Y^2 + C_2))
wherein X and Y are the image of the candidate region in the ith frame image and the image of the corresponding region in the (i-2)th frame image, respectively; u_X and u_Y denote the means of images X and Y, σ_X and σ_Y denote their standard deviations, σ_X^2 and σ_Y^2 denote their variances, σ_XY denotes the covariance of images X and Y, and C_1 and C_2 are constants used to maintain stability.
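The structural similarity of claim 7 can be sketched in NumPy as below; the constant values C_1 and C_2 are the commonly used defaults for 8-bit images and are an assumption here.

```python
import numpy as np

def ssim(X, Y, C1=6.5025, C2=58.5225):
    """Structural similarity between a candidate region X (frame i) and its
    corresponding region Y (frame i-2)."""
    X = X.astype(np.float64)
    Y = Y.astype(np.float64)
    ux, uy = X.mean(), Y.mean()
    var_x, var_y = X.var(), Y.var()
    cov_xy = ((X - ux) * (Y - uy)).mean()
    return ((2 * ux * uy + C1) * (2 * cov_xy + C2)) / \
           ((ux ** 2 + uy ** 2 + C1) * (var_x + var_y + C2))
```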
8. The method according to claim 1, characterized in that step (8) is embodied as:
if no suspected target exists in the (i-1)th frame image, the local image frame difference result of the ith frame image is taken directly as the suspected target result of the ith frame image;
if suspected targets exist in the (i-1)th frame image, the intersection-over-union between the circumscribed rectangles of all suspected targets in the (i-1)th frame image and the circumscribed rectangles of all local image frame difference results of the ith frame image is calculated, and the two circumscribed rectangles with the largest intersection-over-union are taken to represent the same target; if the largest intersection-over-union is greater than or equal to a first set value, the two circumscribed rectangles are merged directly and the resulting circumscribed rectangle is taken as the suspected target result of the ith frame image; if the largest intersection-over-union is smaller than the first set value, the scores of the two circumscribed rectangles are calculated with a convolutional neural network, and the circumscribed rectangle with the higher score is taken as the suspected target result of the ith frame image.
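A Python sketch of the fusion rule in claim 8 is given below; score_net, crop and the 0.5 threshold stand in for the convolutional neural network, the patch extraction and the first set value, and are illustrative assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def fuse(prev_box, cur_box, crop, score_net, t=0.5):
    """Merge the matched boxes when their IoU is high enough, otherwise keep
    the box that the scoring network rates higher."""
    if iou(prev_box, cur_box) >= t:
        x1 = min(prev_box[0], cur_box[0])
        y1 = min(prev_box[1], cur_box[1])
        x2 = max(prev_box[0] + prev_box[2], cur_box[0] + cur_box[2])
        y2 = max(prev_box[1] + prev_box[3], cur_box[1] + cur_box[3])
        return (x1, y1, x2 - x1, y2 - y1)
    s_prev = score_net(crop(prev_box))   # crop() extracts a 12x12 grey patch
    s_cur = score_net(crop(cur_box))
    return prev_box if s_prev > s_cur else cur_box
```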
9. The method of claim 8, wherein the convolutional neural network comprises two convolutional layers, one pooling layer, and three fully-connected layers, wherein,
the input rectangular image is a grayscale image of size 12 × 12;
the first layer is a convolutional layer with single-channel input and 4-channel output, a 3 × 3 convolution kernel, stride 1, valid padding and a ReLU activation function; the output dimension is 10 × 10 × 4;
the second layer is a pooling layer with a pooling scale of 2; the output becomes 5 × 5 × 4;
the third layer is a convolutional layer with 4-channel input and 8-channel output, a 3 × 3 convolution kernel, stride 1, valid padding and a ReLU activation function; the output dimension becomes 3 × 3 × 8;
the fourth layer flattens the output of the third layer into a one-dimensional vector of length 72;
the fifth layer is a fully-connected layer with input vector length 72 and output length 36;
the sixth layer is a fully-connected layer with input vector length 36 and output length 10;
the last layer is a fully-connected layer with input vector length 10; it outputs a floating-point number between 0 and 1 representing the final score of the input picture, where a score closer to 1 indicates a higher probability of a moving target and a score closer to 0 indicates a lower probability.
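A PyTorch sketch matching the layer sizes listed in claim 9 (12 × 12 grayscale input, two 3 × 3 valid convolutions, one pooling layer, three fully-connected layers) is given below; the sigmoid on the output is an assumption used to keep the score between 0 and 1, since the claim only states that the output lies in that range.

```python
import torch
import torch.nn as nn

class ScoreNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3),   # 12x12x1 -> 10x10x4 (valid padding)
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 5x5x4
            nn.Conv2d(4, 8, kernel_size=3),   # -> 3x3x8
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                     # -> 72
            nn.Linear(72, 36),
            nn.Linear(36, 10),
            nn.Linear(10, 1),
            nn.Sigmoid(),                     # score in (0, 1)
        )

    def forward(self, x):                     # x: (batch, 1, 12, 12)
        return self.classifier(self.features(x))

# example: score = ScoreNet()(torch.rand(1, 1, 12, 12))
```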
10. The method according to claim 1, wherein in step (10), the multi-stage filtering method is specifically:
assuming that F represents the amplitude spectrum of the original image and L_p represents the amplitude spectrum of the low-pass filter transfer function, the result of passing image F through the low-pass filter is denoted F·L_p, which is subtracted from the original F to obtain F - F·L_p = F(1 - L_p);
absolute-value processing is then applied to F(1 - L_p), so that re-detection of small bright and dark targets is achieved simultaneously.
CN201911420447.3A 2019-12-31 2019-12-31 Local frame difference and multi-frame fusion ground moving target detection and tracking method Active CN111160304B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911420447.3A CN111160304B (en) 2019-12-31 2019-12-31 Local frame difference and multi-frame fusion ground moving target detection and tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911420447.3A CN111160304B (en) 2019-12-31 2019-12-31 Local frame difference and multi-frame fusion ground moving target detection and tracking method

Publications (2)

Publication Number Publication Date
CN111160304A CN111160304A (en) 2020-05-15
CN111160304B true CN111160304B (en) 2022-03-29

Family

ID=70560330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911420447.3A Active CN111160304B (en) 2019-12-31 2019-12-31 Local frame difference and multi-frame fusion ground moving target detection and tracking method

Country Status (1)

Country Link
CN (1) CN111160304B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744137A (en) * 2020-05-27 2021-12-03 合肥君正科技有限公司 Spiral matrix type frame difference smoothing method
CN115187667B (en) * 2022-09-08 2022-12-20 中国科学院合肥物质科学研究院 Cognitive understanding-based large scene accurate positioning method and system
CN115631478A (en) * 2022-12-02 2023-01-20 广汽埃安新能源汽车股份有限公司 Road image detection method, device, equipment and computer readable medium


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013065103A (en) * 2011-09-15 2013-04-11 Toshiba Corp Tracking apparatus, tracking method and program
CN102903119A (en) * 2012-05-22 2013-01-30 北京国铁华晨通信信息技术有限公司 Target tracking method and target tracking device
CN104134222A (en) * 2014-07-09 2014-11-05 郑州大学 Traffic flow monitoring image detecting and tracking system and method based on multi-feature fusion
CN106981073A (en) * 2017-03-31 2017-07-25 中南大学 A kind of ground moving object method for real time tracking and system based on unmanned plane

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dynamic Human Tracking for Mobile Robot; Li Su et al.; 2018 37th Chinese Control Conference (CCC); 20181008 *
Research on DSP-based Infrared Moving Target Detection Technology; Liu Yun; China Master's Theses Full-text Database (Information Science and Technology); 20160831 *

Also Published As

Publication number Publication date
CN111160304A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111160304B (en) Local frame difference and multi-frame fusion ground moving target detection and tracking method
CN109284670B (en) Pedestrian detection method and device based on multi-scale attention mechanism
CN111460926B (en) Video pedestrian detection method fusing multi-target tracking clues
CN110287826B (en) Video target detection method based on attention mechanism
CN111080529A (en) Unmanned aerial vehicle aerial image splicing method for enhancing robustness
CN109977997B (en) Image target detection and segmentation method based on convolutional neural network rapid robustness
US20080205769A1 (en) Apparatus, method and program product for matching with a template
CN110348383B (en) Road center line and double line extraction method based on convolutional neural network regression
CN111738055B (en) Multi-category text detection system and bill form detection method based on same
CN111340701A (en) Circuit board image splicing method for screening matching points based on clustering method
CN116543188B (en) Machine vision matching method and system based on gray level matching
CN113435240A (en) End-to-end table detection and structure identification method and system
CN115331245B (en) Table structure identification method based on image instance segmentation
CN111553425A (en) Template matching LSP algorithm, medium and equipment for visual positioning
CN112215925A (en) Self-adaptive follow-up tracking multi-camera video splicing method for coal mining machine
CN113052170A (en) Small target license plate recognition method under unconstrained scene
Tarchoun et al. Hand-Crafted Features vs Deep Learning for Pedestrian Detection in Moving Camera.
CN110516731B (en) Visual odometer feature point detection method and system based on deep learning
CN110246165B (en) Method and system for improving registration speed of visible light image and SAR image
CN115641367A (en) Infrared and visible light image registration method based on multi-stage feature matching
CN114862913A (en) Machine vision target positioning method based on artificial intelligence network
CN107248175B (en) TLD target tracking method based on circular projection matching algorithm
CN112101283A (en) Intelligent identification method and system for traffic signs
CN111783834A (en) Heterogeneous image matching method based on joint graph spectrum feature analysis
CN111339824A (en) Road surface sprinkled object detection method based on machine vision

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant