Moving target tracking method and system thereof
Technical Field
The invention relates to a video monitoring technology, in particular to a moving target tracking method and a moving target tracking system in an intelligent video monitoring system.
Background
With increasing crime levels and threats, security has become a common concern in the world. Video surveillance is one of the methods to solve this problem. Besides public safety, video monitoring can also effectively solve other problems, such as adjustment of traffic flow and people flow in crowded cities. Large monitoring systems have been widely used for many years in major locations such as airports, banks, highways or city centres.
Because traditional video monitoring generally relies on manual observation, it suffers from drawbacks such as operator fatigue, oversights, slow reaction speed and high labor cost. Therefore, in recent years, digital, standardized, intelligent, IP-networked video monitoring technology has gradually become a subject of research.
Conventional intelligent video surveillance techniques all include a moving object tracking technique. The purpose of moving object tracking is to determine the position of the same object in different scene images on the basis of correctly detecting the moving object.
To achieve tracking, methods based on motion analysis, such as inter-frame difference methods and optical flow segmentation methods, may be used. The inter-frame difference method subtracts adjacent frame images, thresholds and segments the resulting difference image, and then extracts the moving target. Its disadvantage is that it can detect only whether a target moves in the scene according to inter-frame pixel intensity changes; the inter-frame correlation of the moving-target signal and the inter-frame correlation of the noise are both weak and difficult to distinguish. Optical flow segmentation detects moving objects by the velocity difference between the object and the background. Its disadvantages are that it cannot effectively handle background occlusion and uncovering or the aperture problem caused by target motion, its computation load is large, and it requires special hardware support.
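For illustration, the inter-frame difference method described above amounts to a subtraction followed by a threshold; a minimal sketch (the function name and the threshold value of 25 are assumptions, not from the source):

```python
import numpy as np

def frame_difference_mask(prev_frame, curr_frame, threshold=25):
    """Subtract adjacent frames, then threshold the result image to
    segment pixels whose intensity changed between frames."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```

As the text notes, such a mask reflects only pixel intensity change, so noise and true motion are hard to separate without further processing.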
To achieve tracking, image matching methods such as region matching and model matching may be used. Region matching superimposes a block of the reference image onto every possible position of the real-time image and computes an image similarity measure at each position; the position with the maximum similarity is taken as the target position. Its drawbacks are a large computational load and difficulty meeting real-time requirements. Model matching matches objects in the scene image against a template; its drawbacks are complex computation and analysis, low operation speed, complicated model updating and poor real-time performance.
In summary, there is an urgent need to provide a simpler, more effective and more real-time moving target tracking scheme.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a moving target tracking method and system thereof that can obtain correct foreground images and reduce target detection errors, and that can further perform operations such as prediction, matching and updating on the detection results so as to filter out false moving targets and track moving targets accurately.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the invention provides a moving target tracking method, which comprises the following steps:
detecting a target, and segmenting a moving target area in a video scene from a background;
predicting a target, and estimating the next frame motion of the target;
matching targets, tracking the matched stable targets, and filtering false targets;
and updating the target, and updating the template of the stable target in the current frame.
According to the invention, the detection of the target comprises the following steps:
acquiring a video, acquiring video content to obtain a scene image, and establishing a background model;
preprocessing the image, and eliminating the influence of the scene image on the background model;
marking a region, performing foreground segmentation on the scene image according to the background model, and marking a connected region;
maintaining the state, judging the current state of the detection target module, performing corresponding processing, and performing abnormal detection if necessary;
enhancing the area, and removing false areas of shadow, highlight and leaf swing by using shadow detection, highlight detection and tree filtering;
splitting and merging the regions, merging and splitting the regions by using the constraint provided by the background model and the prior knowledge of the human and vehicle models so as to solve the problems of over-segmentation of the target and mutual shielding of the target.
Wherein the pre-processing the image comprises: filtering processing and global motion compensation; wherein,
the filtering process includes: carrying out noise filtering processing and image smoothing processing on the image;
the global motion compensation is to compensate the image global motion caused by slight swing of the camera, and in the global motion compensation, a motion model comprises translation, rotation and zooming.
The area brightness difference IDS is calculated, over a search range of plus and minus 5 pixels around the rectangular region where the foreground is located, by the following conventional formula, yielding the image translations Δx and Δy used in global motion compensation:
$$IDS = \sum_{x=s_x}^{m} \sum_{y=s_y}^{n} \left( I_{(x,y)}(t) - I_{(x,y)}(t-1) \right)$$
where s_x denotes the x coordinate of the region starting point, s_y denotes the y coordinate of the region starting point, I_(x,y)(t) denotes the gray level of the current frame image, and I_(x,y)(t-1) denotes the gray level of the previous frame image. The position variations of the other four regions are calculated in the same way, and finally the averages Δx and Δy are obtained; the image is translated by Δx and Δy to obtain the compensated image.
Wherein the marking region comprises the steps of:
foreground segmentation, namely segmenting a scene image based on a background model to obtain a binary image of a foreground;
morphological processing, namely processing the binary image by using a mathematical morphology method to remove false regions with small areas and fill regions with large areas; and
and connected-region marking, namely marking different regions in the same scene by a connected-domain method to distinguish different target regions.
Wherein the maintenance state includes state determination and anomaly detection.
The state judgment is to judge the current state of the detection target module and carry out corresponding processing; when the scene stability time exceeds a threshold value 1, the system enters a working state from an initialization state; when the scene change time exceeds the threshold value 2, the system enters an initialization state from an operating state. The threshold value 1 is preferably between 0.5 and 2 seconds, and the threshold value 2 is preferably between 5 and 20 seconds.
Abnormal detection is executed when the video signal is seriously disturbed or the camera is deliberately blocked; the judgment uses the edge matching value between two backgrounds and the shortest time for successful background initialization: if the edge matching value between the current frame background and the background model is less than threshold 3, or the shortest time for successful background initialization exceeds threshold 4, an abnormality is determined. Threshold 3 is preferably between 30 and 50, and threshold 4 is preferably between 6 and 20 seconds.
Wherein the enhancement region comprises: shadow detection, highlight detection, tree filtering.
Shadow detection computes the mean of the pixel values within each connected region, takes the mean as a threshold to judge the shadow region within that region, and then filters the shadow out: a pixel whose value is smaller than the threshold is judged to be shadow.
Highlight detection detects whether the image is in a highlight state; if so, brightness compensation is performed so that the mean pixel value of the image becomes 128.
Tree filtering detects the swinging leaves and their shadows in the image and filters the swinging leaves out of the foreground image.
The detection of swinging leaves is decided according to one of the following two characteristics: (1) motion trajectory tracking: when the proportion of trajectory points in which the region corresponding to the target is actually moving is smaller than threshold 5, the target is considered a swinging leaf; (2) centroid motion amplitude: when the displacement change of the target centroid between adjacent trajectory points exceeds threshold 6 times the target width, the target is considered a swinging leaf. Threshold 5 is preferably between 5% and 15%; threshold 6 is preferably between 1.5 and 2.5.
The swinging-leaf shadow is detected as follows: count the number of points with pixel value 1 in the region before and after a dilation operation, compute the ratio of the two counts, and if the ratio is less than threshold 7, consider the region a swinging-leaf-shadow region. Threshold 7 is preferably between 40% and 60%.
Splitting and merging regions builds on the processing of the enhance-region step and judges whether two adjacent regions belong to the same target region: if they do, the two regions are merged; otherwise they are split. Two regions are adjacent when the distance between their edges is smaller than threshold 8, preferably between 3 and 7 pixels.
According to the invention, the target is predicted by calculating the average speed of the target movement according to the accumulated displacement of the target movement and the corresponding accumulated time, and predicting the next displacement of the target according to the speed; wherein,
the relationship among the accumulated displacement, the accumulated time and the average movement speed is as follows:
v=s/t
wherein s is the displacement of a target mass center after stably moving for multiple frames, t is the time required by the target moving for multiple frames, and v is the average speed of the target stably moving;
the next displacement predicted from the average velocity v is:
s′=v·Δt
where Δt is the prediction interval and s′ is the displacement of the target centroid after stable movement for the duration Δt.
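The prediction step above amounts to the two one-line formulas v = s/t and s′ = v·Δt; a minimal sketch:

```python
def average_velocity(s, t):
    """v = s / t: average speed from the accumulated displacement s of the
    target centroid over the corresponding accumulated time t."""
    return s / t

def predict_displacement(v, dt):
    """s' = v * dt: displacement predicted over the next interval dt."""
    return v * dt
```

For example, a centroid that moved 30 pixels over 1.5 seconds has v = 20 pixels/s, so over a 0.04 s frame interval the predicted displacement is 0.8 pixels.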
According to the invention, the matching target comprises: tracking the matched stable target and filtering out false target; wherein,
tracking the matched stable target judges whether a detection region matches a tracked target; the match is decided according to the matching coefficient D between the detection region and the target, given by the following formula:
D = Da·A_Da + Db·A_Db + Dc·A_Dc
where Da is an area matching coefficient, Db is a histogram matching coefficient, Dc is a distance matching coefficient, and A_Da, A_Db and A_Dc are the weight coefficients corresponding to Da, Db and Dc respectively. When the matching coefficient D between the detection region and the target is larger than threshold 9, the detection region is judged to match the target. Threshold 9 is preferably between 0.7 and 0.8.
The area matching coefficient Da: when the area of the intersection of the detection region and the target is larger than threshold 10 times the area of the target, the detection region is considered to satisfy the area match and Da is taken as 1; otherwise Da is taken as 0. Threshold 10 is preferably between 40% and 60%.
The histogram matching coefficient Db: when the histogram of the intersection of the detection region and the target is larger than threshold 11 of the target's histogram, the detection region is considered to satisfy the histogram match and Db is taken as 1; otherwise Db is taken as 0. Threshold 11 is preferably between 40% and 60%.
A distance matching coefficient Dc that is considered in accordance with two cases of whether the detection region is moving or stationary; if the number of foreground points in the difference image of the detection area in the current frame image and the previous frame image is greater than the threshold value 12 of the number of background points, the detection area is considered to be moving, otherwise, the detection area is considered to be static.
When the detection area is moving, the distance between the center of the target and the center of the detection area in the current frame image is calculated; if the distance is smaller than threshold 13 times the diagonal length of the rectangular frame containing the target, the distance match is considered satisfied and Dc is taken as 1; otherwise Dc is taken as 0.
When the detection area is static, calculating the distance between the center of the detection area in the previous frame image and the center of the detection area in the current frame image, if the distance is less than a threshold value 14, determining that the distance matching is satisfied, and taking Dc as 1; otherwise Dc is taken as 0.
Wherein, the threshold value 12 is preferably between 65% and 75%. The threshold value 13 is preferably between 1.5 and 2. The threshold value 14 is preferably between 8 and 12 pixels.
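The weighted matching decision described above can be sketched as follows; the specific weight values for A_Da, A_Db and A_Dc below are illustrative assumptions (the source constrains only threshold 9, to 0.7-0.8):

```python
def match_coefficient(da, db, dc, a_da=0.4, a_db=0.3, a_dc=0.3):
    """D = Da*A_Da + Db*A_Db + Dc*A_Dc, where da, db, dc are the binary
    area/histogram/distance coefficients. Weight values are assumed."""
    return da * a_da + db * a_db + dc * a_dc

def is_match(da, db, dc, threshold=0.75):
    """Detection region matches the target when D exceeds threshold 9
    (preferably 0.7-0.8; 0.75 chosen here as an example)."""
    return match_coefficient(da, db, dc) > threshold
```

With these weights, all three sub-matches give D = 1.0 (a match), while an area match alone gives only D = 0.4 (no match).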
Filtering out false targets analyzes the motion trajectories of the targets to remove false target regions; the trajectory analysis uses target trajectory information to evaluate the smoothness of area change and the stationarity of centroid change.
The smoothness of area change is computed from the set of areas {area_1, area_2, ..., area_n} along the target's trajectory points, where n denotes the number of trajectory points; the mean area is counted:
$$\overline{area} = \frac{1}{n} \sum_{i=1}^{n} area_i$$
and the area variance is counted:

$$area_{sd} = \frac{1}{n} \sum_{i=1}^{n} \left( area_i - \overline{area} \right)^2$$
When the area variance area_sd exceeds 0.5 times the mean area, the area change is considered not smooth, and the target region is filtered out;
the method is characterized in that the stationarity of the change of the centroid points is calculated according to the fact that frequent sudden changes can not be generated in the direction of the movement of a normal target, the ratio of the direction change in the adjacent track points is calculated, if the ratio exceeds a threshold value 15, the change of the centroid points is considered to be unstable, and the target area is filtered. Wherein, the threshold value 15 is preferably between 40% and 60%.
According to another aspect of the present invention, there is also provided a moving object tracking system including:
the detection target module is used for segmenting a moving target area in the video scene image from a background;
a predicted target module, configured to estimate a position of the moving target in a next scene image;
the matching target module is used for tracking the matched stable target and filtering out a false target; and
and the target updating module is used for updating the template of the stable target in the current frame.
Wherein the detection target module comprises:
the video acquisition module is used for acquiring video content to obtain a scene image and establishing a background model;
the image preprocessing module is used for eliminating the influence of the scene image on the background model;
the marking region module is used for carrying out foreground segmentation on the scene image according to the background model and marking a connected region;
the maintenance state module is used for judging the current state of the detection target module, performing corresponding processing and performing abnormal detection when necessary;
the enhancement region module is used for removing false regions of shadow, highlight and leaf swing by using shadow detection, highlight detection and tree filtering; and
and the splitting and combining region module is used for combining and splitting the regions by using the constraint provided by the background model and the prior knowledge of the human and vehicle models so as to solve the problems of target over-segmentation and target mutual occlusion.
The matching target module comprises: the tracking matching stable target module is used for judging whether the detection area is matched with the tracking target or not; and a false object filtering module for filtering the false region.
The greatest advantages of the method are that it achieves accurate tracking of multiple targets against a complex background and solves problems such as occlusion and leaf swinging, while remaining simple to operate and highly practical.
The invention has the advantages that the invention can accurately detect the moving objects in the scene image, including people and vehicles, and can ignore the influence of interference factors such as image shake, swinging trees, brightness change, shadow, rain, snow and the like.
The invention can also be used in an intelligent video monitoring system to realize the functions of target classification identification, moving target warning, moving target tracking, PTZ tracking, automatic close-up shooting, target behavior detection, flow detection, congestion detection, carry-over detection, stolen object detection, smoke detection, flame detection and the like.
Drawings
FIG. 1 is a schematic structural diagram of a moving object tracking method according to the present invention;
FIG. 2 is a schematic diagram illustrating a process of detecting a target in the moving target tracking method according to the present invention;
FIG. 3 is a schematic flow chart of a labeling area in the moving object tracking method according to the present invention;
FIG. 4 is a schematic flow chart of a matching target in the moving target tracking method according to the present invention;
FIG. 5 is a schematic diagram of a moving object tracking system according to the present invention;
FIG. 6 is a schematic diagram of a target detection module in the moving target tracking system according to the present invention;
fig. 7 is a schematic structural diagram of a matching target module in the moving target tracking system of the present invention.
Detailed Description
Fig. 1 is a schematic structural diagram of a moving object tracking method in the present invention, and as shown in fig. 1, the moving object tracking method includes:
detecting a target 10, and segmenting a moving target area in a video scene from a background;
a predicted target 20 that estimates a next frame motion of the target;
matching the target 30, tracking the matched stable target, and filtering out false target;
the target 40 is updated, and the template of the stable target in the current frame is updated.
First, the first step, detecting the target, is carried out: the moving target region in the video scene is segmented from the background. Fig. 2 is a schematic diagram of the detect-target step of the present invention. As shown in Fig. 2, detecting the target 10 includes — acquiring the video 11: acquiring video content to obtain a scene image and establishing a background model; preprocessing the image 12: eliminating the influence of the scene image on the background model; marking the region 13: performing foreground segmentation on the scene image according to the background model and marking connected regions; maintaining the state 14: judging the current state of the detect-target module, performing corresponding processing, and performing abnormality detection when necessary; enhancing the region 15: removing the false regions of shadow, highlight and leaf swinging using shadow detection, highlight detection and tree filtering; and splitting and merging regions 16: merging and splitting regions using constraints provided by the background model and prior knowledge of human and vehicle models, so as to solve target over-segmentation and mutual occlusion of targets.
First, acquiring the video 11 is performed by a video capture device, which may be a visible-spectrum, near-infrared or infrared camera. Near-infrared and infrared cameras allow application in low light without additional illumination. The background model is initially created from the first frame of the scene image and is subsequently updated in the maintenance state 14.
Preprocessing the image 12 includes filtering and global motion compensation. Filtering refers to conventional processing of the image, such as noise filtering and smoothing, to remove noise points. The filtering may be implemented, for example, as in the following documents: "image denoising hybrid filtering method [J]. chinese image graphics press, 2005, 10 (3)" and "adaptive center weighted improved mean filtering algorithm [J]. hua university press (natural science edition), 1999, 39 (9)".
Global motion compensation refers to compensating for global image motion caused by slight camera shake. In global motion compensation, the motion model basically reflects the various motions of the camera, including translation, rotation and zooming. The global motion compensation method here is motion compensation based on region-block matching: four region blocks are drawn in the image, each between 32 and 64 pixels in length and width, and each block is required to cover a relatively fixed background, such as a building or other stationary background.
The conventional procedure of global motion compensation is as follows: assuming that the rectangular region where the foreground is located has size m × n, the area brightness difference IDS within plus and minus 5 pixels around the region is calculated by the formula:
$$IDS = \sum_{x=s_x}^{m} \sum_{y=s_y}^{n} \left( I_{(x,y)}(t) - I_{(x,y)}(t-1) \right)$$
where s_x denotes the x coordinate of the region starting point, s_y denotes the y coordinate of the region starting point, I_(x,y)(t) denotes the gray level of the current frame image, and I_(x,y)(t-1) denotes the gray level of the previous frame image.
The position of the region corresponding to the minimum brightness difference is thus found, and the position variations Δx and Δy of the region are calculated. The position variations of the other four regions are calculated in the same way, and finally the averages Δx and Δy are obtained. The image is translated by Δx and Δy to obtain the compensated image.
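The block-matching search just described can be sketched as follows; using the absolute difference as the brightness-difference measure and an exhaustive ±5-pixel search are my assumptions, and the function and parameter names are not from the source:

```python
import numpy as np

def find_block_shift(prev, curr, sx, sy, m, n, search=5):
    """Search offsets (dx, dy) within +/- `search` pixels for the shift
    that minimizes the summed absolute brightness difference between the
    block [sy:n, sx:m] of the previous frame and the shifted block of
    the current frame; returns the minimizing (dx, dy)."""
    ref = prev[sy:n, sx:m].astype(np.int32)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, y1 = sy + dy, n + dy
            x0, x1 = sx + dx, m + dx
            if y0 < 0 or x0 < 0 or y1 > curr.shape[0] or x1 > curr.shape[1]:
                continue  # shifted block would leave the image
            ids = np.abs(curr[y0:y1, x0:x1].astype(np.int32) - ref).sum()
            if best is None or ids < best[0]:
                best = (ids, dx, dy)
    return best[1], best[2]
```

Averaging the (dx, dy) found for each of the region blocks then gives the Δx and Δy used to translate the image.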
Fig. 3 is a schematic flow chart of the mark-region step 13 of the present invention. As shown in Fig. 3, the flow of marking the region 13 is: foreground segmentation 131, morphological processing 132, and connected-region marking 133.
The foreground segmentation 131 segments the scene image based on the background model to obtain a binary image of the foreground. Specifically, the corresponding pixel values of the scene image and the background model are subtracted; if the result is greater than a set threshold, the pixel is marked "1" to represent a foreground point; otherwise it is marked "0" to represent a background point, thereby obtaining the binary image of the foreground.
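A minimal sketch of this foreground segmentation (the threshold value of 30 is an assumption):

```python
import numpy as np

def segment_foreground(scene, background, threshold=30):
    """Subtract the background model from the scene image pixel-wise;
    mark '1' (foreground) where the result exceeds the set threshold,
    '0' (background) otherwise, yielding a binary foreground image."""
    diff = np.abs(scene.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)
```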
The morphological processing 132 processes the binary image using mathematical morphology, namely erosion followed by dilation, to remove false regions of small area and fill in the large-area regions. A 3 × 3 template is selected for the erosion and a 3 × 3 template for the dilation.
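A sketch of the 3 × 3 erosion-then-dilation in plain NumPy (zero padding at the border is an implementation assumption):

```python
import numpy as np

def erode3(img):
    """3x3 erosion on a binary image: a pixel stays 1 only if its whole
    3x3 neighbourhood is 1 (zero padding at the border)."""
    p = np.pad(img, 1, constant_values=0)
    out = np.ones_like(img)
    for dy in range(3):
        for dx in range(3):
            out &= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate3(img):
    """3x3 dilation: a pixel becomes 1 if any 3x3 neighbour is 1."""
    p = np.pad(img, 1, constant_values=0)
    out = np.zeros_like(img)
    for dy in range(3):
        for dx in range(3):
            out |= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def morphological_open(img):
    """Erode then dilate, removing small false regions while keeping
    the body of larger regions."""
    return dilate3(erode3(img))
```

An isolated noise pixel is erased by the erosion and never recovered, while a solid block survives the round trip.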
The connected-region marking 133 marks different regions in the same scene using a connected-domain method to distinguish different target regions. It may be implemented by the four-connected-domain or eight-connected-domain method, as follows: first, scan the image obtained from the morphological processing 132 line by line, find the first point of an unmarked region, and mark it; check that point's eight-/four-connected neighbours, mark those that satisfy the connectivity requirement and are not yet marked, and record the newly marked points as "region growing" seed points. During the subsequent marking, repeatedly take a seed out of the recorded seed-point array and perform the same operation, until the seed array is empty and one connected region has been marked. Then mark the next unmarked region, until all connected regions of the image obtained from the morphological processing 132 are marked.
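The seed-growing connected-domain marking just described can be sketched as follows (a direct rendering of the scan-and-grow procedure; names are mine):

```python
import numpy as np

def label_regions(binary, connectivity=8):
    """Label connected foreground regions: scan line by line for an
    unlabelled foreground pixel, then grow from recorded seed points
    until the seed array is empty; repeat for the next region.
    Returns the label image and the number of regions found."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=np.int32)
    if connectivity == 8:
        nbrs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                (0, 1), (1, -1), (1, 0), (1, 1)]
    else:
        nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    current = 0
    for y in range(h):
        for x in range(w):
            if binary[y, x] and labels[y, x] == 0:
                current += 1
                labels[y, x] = current
                seeds = [(y, x)]
                while seeds:
                    cy, cx = seeds.pop()
                    for dy, dx in nbrs:
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            seeds.append((ny, nx))
    return labels, current
```

Note that diagonal neighbours join under eight-connectivity but remain separate regions under four-connectivity.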
In the mark-region step 13, regions do not correspond one-to-one with targets. Due to occlusion, one region may contain multiple people or vehicles; when the foreground resembles the background, one target may be over-segmented into multiple regions; under the influence of illumination, a region may contain shadow and highlight areas; and uninteresting motion, such as leaf swinging and water ripples, may produce false foreground regions. These problems are inherent in the background-model approach and are solved in the subsequent steps.
The maintenance state 14 in fig. 2 includes: status determination and anomaly detection.
The state judgment means that the current state of the detection target module is judged and corresponding processing is performed. The current state of the detection target module is mainly determined according to the scene stable time and the scene change time. When the scene stability time exceeds a threshold value 1, the system enters a working state from an initialization state; when the scene change time exceeds the threshold value 2, the system enters an initialization state from an operating state.
The threshold value 1 is preferably between 0.5 and 2 seconds. The threshold value 2 is preferably between 5 and 20 seconds.
In the working state, the next operation continues to be executed and the background model remains unchanged. In the initialization state, the background model is re-established, and anomaly detection is performed if necessary. While the background model is being re-established, region detection can be realized by the inter-frame difference method, i.e., by taking the absolute value of the difference of two frame images.
Anomaly detection is performed when necessary, including cases of severe video-signal interference or the camera being deliberately blocked. The judgment uses the edge matching value between two backgrounds and the shortest time for successful background initialization: if the edge matching value between the current frame background and the background model is less than threshold 3, or the shortest time for successful background initialization exceeds threshold 4, an abnormal phenomenon is determined.
The threshold value 3 is preferably between 30 and 50. The threshold 4 is preferably between 6 and 20 seconds.
The enhance-region step 15 in Fig. 2 uses shadow detection, highlight detection and tree filtering to remove the false regions caused by shadows, highlights and leaf swinging; it comprises: shadow detection, highlight detection and tree filtering.
Shadow detection detects shadow regions in the foreground image, including the shadows of people and vehicles, and filters out the detected shadow regions. It computes the mean of the pixel values within each connected region, takes that mean as a threshold for judging shadow within the region, and then filters the shadow out. The shadow decision rule is: a pixel whose value is smaller than the threshold is judged to be shadow.
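A minimal sketch of this mean-threshold shadow rule, operating on a flat list of a connected region's pixel values (names are mine):

```python
import numpy as np

def remove_shadow_pixels(region_pixels):
    """Use the mean pixel value of a connected region as the threshold;
    pixels below the mean are judged shadow and filtered out."""
    p = np.asarray(region_pixels, dtype=float)
    return p[p >= p.mean()]
```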
Highlight detection detects whether the image is in a highlight state (meaning pixel values in the image are generally too high); if so, brightness compensation is performed. Brightness compensation is achieved by luminance equalization such that the mean pixel value of the image is 128.
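A sketch of the brightness compensation, assuming a simple global shift of pixel values (the source does not specify the equalization method; clipping at 0/255 can perturb the mean for saturated images):

```python
import numpy as np

def compensate_brightness(img, target_mean=128):
    """Shift all pixel values by a constant so that the image mean
    becomes target_mean, clipping to the valid 8-bit range."""
    shift = target_mean - img.mean()
    return np.clip(img.astype(float) + shift, 0, 255).astype(np.uint8)
```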
Tree filtering is used to detect the leaves of the wiggling in the image and their shadows and to filter them out of the foreground image.
The detection of swinging leaves is decided according to one of the following two characteristics: (1) motion trajectory tracking: when the proportion of trajectory points in which the region corresponding to the target is actually moving is smaller than threshold 5, the target is considered a swinging leaf; for example, if the target has 10 trajectory points and the corresponding region is moving in only one of them, the target is regarded as a swinging leaf and filtered out. (2) Centroid motion amplitude: if the centroid motion of a target changes abruptly, the target is considered a swinging leaf; that is, when the displacement change of the target centroid between adjacent trajectory points exceeds threshold 6 times the target width, the target is considered a swinging leaf and filtered out.
The threshold value 5 is preferably between 5% and 15%; the threshold 6 is preferably between 1.5 and 2.5.
Detection of the shadow of swinging leaves is based on the density of points in a region. The method is: count the number of foreground points in the region before and after a dilation operation (i.e., the number of pixels with value 1 before and after dilation), and compute their ratio. If the ratio is smaller than threshold 7, the region is considered a swinging-leaf shadow region and is filtered out.
The threshold value 7 is preferably between 40% and 60%.
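A sketch of this dilation-ratio test, using a hypothetical 4-neighbour binary dilation on a 0/1 grid (the patent does not specify the structuring element; threshold 7 defaults to 50%, the midpoint of the preferred range):

```python
def dilate(grid):
    # 4-neighbour binary dilation of a 2-D 0/1 grid.
    h, w = len(grid), len(grid[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if grid[y][x]:
                for dy, dx in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        out[ny][nx] = 1
    return out

def is_leaf_shadow(grid, threshold7=0.5):
    # Sparse, scattered regions grow a lot under dilation, so the
    # before/after point ratio is small; such regions are judged to
    # be swinging-leaf shadow and filtered out.
    before = sum(map(sum, grid))
    after = sum(map(sum, dilate(grid)))
    return before / after < threshold7
```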
The split and merge region step 16 in fig. 2 merges and splits regions using constraints provided by the background model and a priori knowledge of human and vehicle models, to solve the problems of target over-segmentation and mutual occlusion of targets. Based on the result of the enhance region step 15, it determines whether two adjacent regions belong to the same target region or to different target regions. If the two regions belong to the same target region, they are merged; otherwise they are split. Two regions are adjacent when their edge distance is smaller than threshold 8; regions of the same target carry consistent region index marks, while regions of different targets carry inconsistent marks.
The threshold value 8 is preferably between 3 and 7 pixels.
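One way to realize the adjacency test, taking the edge distance as the gap between axis-aligned bounding boxes (an assumption; the patent does not define the distance precisely, and 5 px is an illustrative value of threshold 8):

```python
def bbox_gap(a, b):
    # Gap between two bounding boxes given as (x1, y1, x2, y2);
    # zero when the boxes touch or overlap.
    dx = max(0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[3], b[3]))
    return max(dx, dy)

def should_merge(a, b, threshold8=5):
    # Regions closer than threshold 8 (preferably 3-7 pixels) are
    # treated as adjacent candidates for merging into one target.
    return bbox_gap(a, b) < threshold8
```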
The second step of the invention is predict target 20: calculate the average velocity of the target's motion from the accumulated displacement of the target and the corresponding accumulated time, and predict the target's next displacement from that velocity. The accumulated displacement is the sum of the displacements of the target's motion, and the accumulated time is the sum of the corresponding times. The relationship among the accumulated displacement, the accumulated time, and the average velocity is:
v=s/t
where s is the displacement of the target centroid over multiple frames of stable motion, t is the time taken for those frames, and v is the average speed of the stably moving target. The average speed is calculated from this formula.
The next displacement predicted from the average velocity v is:
s′=v·Δt
where Δt is the prediction interval and s′ is the displacement of the target centroid after moving stably for time Δt. The next displacement is predicted from this formula.
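The two formulas above can be combined into a small helper (names illustrative):

```python
def predict_displacement(displacements, times, dt):
    # Accumulated displacement s and accumulated time t are the sums
    # over the frames of stable motion.
    s = sum(displacements)
    t = sum(times)
    # Average speed v = s / t, then predicted displacement s' = v * dt.
    v = s / t
    return v * dt
```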
The third step of the invention is match target 30, which tracks matched stable targets and filters out false targets. Fig. 4 is a schematic flow chart of matching targets in the invention. As shown in fig. 4, match target 30 includes: track matched stable targets 301 and filter out false targets 302.
Tracking a matched stable target 301 determines whether a detection region matches a tracked target. The matching condition is based on the matching coefficient D between the detection region and the target, computed as:
D = Da*A_Da + Db*A_Db + Dc*A_Dc
where Da is the area matching coefficient, Db is the histogram matching coefficient, and Dc is the distance matching coefficient, and A_Da, A_Db, A_Dc are the weight coefficients corresponding to Da, Db, and Dc respectively. When the matching coefficient D between the detection region and the target is larger than threshold 9, the detection region is judged to match the target. The threshold 9 is preferably between 0.7 and 0.8.
The values of A_Da, A_Db, A_Dc all lie between 0 and 1, and the three sum to 1. Their preferred values are 0.2, 0.3, and 0.5, respectively.
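The weighted matching decision can be sketched as follows, using the preferred weights 0.2, 0.3, 0.5 and an illustrative threshold 9 of 0.75 (the midpoint of the preferred range):

```python
def match_coefficient(da, db, dc, weights=(0.2, 0.3, 0.5)):
    # D = Da*A_Da + Db*A_Db + Dc*A_Dc, with the preferred weights
    # (which sum to 1) as defaults; da, db, dc are each 0 or 1.
    return da * weights[0] + db * weights[1] + dc * weights[2]

def is_match(da, db, dc, threshold9=0.75):
    # The detection region matches the target when D exceeds
    # threshold 9 (preferably between 0.7 and 0.8).
    return match_coefficient(da, db, dc) > threshold9
```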
1) Area matching coefficient Da. When the area of the intersection of the detection region and the target is larger than threshold 10 times the area of the target, the detection region is considered to satisfy area matching and Da is 1; otherwise Da is 0.
The threshold value 10 is preferably between 40% and 60%.
2) Histogram matching coefficient Db. When the histogram of the intersection of the detection region and the target exceeds threshold 11 of the target's histogram, the detection region is considered to satisfy histogram matching and Db is 1; otherwise Db is 0.
The threshold 11 is preferably between 40% and 60%.
3) Distance matching coefficient Dc. Dc is evaluated in two cases, depending on whether the detection region is moving or stationary. If the number of foreground points in the difference image between the detection region in the current frame and in the previous frame is greater than threshold 12 of the number of background points, the detection region is considered to be moving; otherwise it is considered stationary. When the detection region is moving, compute the distance between the center of the detection region in the current frame and the predicted center of the target in the current frame; if this distance is smaller than threshold 13 times the diagonal length of the target's bounding rectangle, distance matching is satisfied and Dc is 1; otherwise Dc is 0. When the detection region is stationary, compute the distance between the center of the detection region in the previous frame and its center in the current frame; if this distance is smaller than threshold 14, distance matching is satisfied and Dc is 1; otherwise Dc is 0.
The threshold 12 is preferably between 65% and 75%. The threshold value 13 is preferably between 1.5 and 2. The threshold value 14 is preferably between 8 and 12 pixels.
Filtering out false targets 302 removes false target regions through trajectory analysis of the target's motion. The trajectory analysis uses the target's track information (including area information and centroid information) to measure the smoothness of the area change and the stationarity of the centroid change.
The method for measuring the smoothness of the area change is as follows: collect the area set {area_1, area_2, ..., area_n} over the target's track points, where n is the number of track points, and compute the area mean

mean(area) = (1/n) Σ_{i=1}^{n} area_i

and the area variance

area_sd = (1/n) Σ_{i=1}^{n} (area_i − mean(area))²

When area_sd / mean(area) > 0.5, the area change is considered not smooth and the target region is filtered out.
The stationarity of the centroid change is measured as follows: since a normal target does not undergo frequent abrupt changes in its direction of motion, the ratio of direction changes between adjacent track points is counted; if this ratio exceeds threshold 15, the centroid change is considered unstable and the target region is filtered out.
The threshold 15 is preferably between 40% and 60%.
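Both trajectory checks can be sketched together in Python (the centroid check is simplified to a single axis, and the function names and reversal-counting detail are assumptions, not from the patent):

```python
def area_smooth(areas):
    # Mean and (population) variance of the region areas over the
    # n track points, per the formulas in the text.
    n = len(areas)
    mean = sum(areas) / n
    area_sd = sum((a - mean) ** 2 for a in areas) / n
    # The target survives when area_sd / mean(area) <= 0.5.
    return area_sd / mean <= 0.5

def centroid_stable(xs, threshold15=0.5):
    # 1-D sketch along one axis: count sign reversals between
    # successive displacements of the centroid.
    steps = [b - a for a, b in zip(xs, xs[1:])]
    reversals = sum(1 for s, t in zip(steps, steps[1:]) if s * t < 0)
    # Unstable (and filtered out) when the reversal ratio exceeds
    # threshold 15 (preferably 40%-60%).
    return reversals / max(1, len(steps) - 1) <= threshold15
```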
The last step is update target 40, which updates the model of the tracked target in real time using the stable targets obtained from match target 30.
The invention also provides a moving target tracking system. Fig. 5 is a schematic structural diagram of the moving target tracking system of the invention. As shown in fig. 5, the moving target tracking system includes a detect target module 71, a predict target module 72, a match target module 73, and an update target module 74. The detect target module 71 is configured to segment moving target regions in the video scene image from the background; the predict target module 72 is configured to estimate the position of a moving target in the next frame of the scene image; the match target module 73 is configured to track matched stable targets and filter out false targets; and the update target module 74 is configured to update the templates of the stable targets in the current frame.
Fig. 6 is a schematic structural diagram of a target detection module in the moving target tracking system of the present invention. As shown in FIG. 6, the detection object module 71 includes an acquire video module 711, a pre-process image module 712, a mark region module 713, a maintenance status module 714, an enhanced region module 715, and a split and merge region module 716. The acquiring video module 711 is configured to acquire video content to obtain a scene image and establish a background model; a pre-processing image module 712, configured to eliminate an influence of the scene image on the background model; a marking region module 713, configured to perform foreground segmentation on the scene image according to the background model and mark a connected region; a maintenance state module 714, configured to determine a current state of the detection target module, perform corresponding processing, and perform anomaly detection if necessary; an enhanced region module 715, configured to remove false regions of shadows, highlights, and leaf flapping using shadow detection, highlight detection, and tree filtering; a split and merge region module 716, configured to merge and split the regions using the constraints provided by the background model and the a priori knowledge of the human and vehicle models to solve the problems of object over-segmentation and mutual occlusion.
Fig. 7 is a schematic structural diagram of a matching target module in the moving target tracking system of the present invention. As shown in fig. 7, the match target module 73 includes a stable target module 731 that tracks matches and a false target module 732 that filters out false targets. The track-matching stable target module 731 is configured to determine whether the detected region matches the tracked target, and the filter false target module 732 is configured to filter the false region.
The greatest advantage of the method is that it achieves accurate tracking of multiple targets against a complex background and solves problems such as occlusion and swinging leaves, while remaining simple to operate and highly practical.
A further advantage is that the invention can accurately detect moving objects in the scene image, including people and vehicles, while ignoring interference factors such as image jitter, swinging trees, brightness changes, shadows, rain, and snow.
The invention can also be used in an intelligent video monitoring system to realize the functions of target classification identification, moving target warning, moving target tracking, PTZ tracking, automatic close-up shooting, target behavior detection, flow detection, congestion detection, carry-over detection, stolen object detection, smoke detection, flame detection and the like.
While the foregoing describes preferred embodiments of the invention, they are not intended to limit its scope; they are provided to assist those skilled in the art in practicing the invention. Modifications and improvements may readily occur to those skilled in the art without departing from the spirit and scope of the invention, which is defined only by the appended claims, including all alternatives and equivalents falling within their scope.