WO2019006633A1

WO2019006633A1 - Fuzzy logic based video multi-target tracking method and device

Info

Publication number: WO2019006633A1
Application number: PCT/CN2017/091575
Authority: WO
Inventors: 李良群; 湛西羊; 罗升; 刘宗香; 谢维信
Original assignee: 深圳大学
Priority date: 2017-07-04
Filing date: 2017-07-04
Publication date: 2019-01-10

Abstract

The present invention discloses a fuzzy logic based video multi-target tracking method and device, the method comprising: performing online target motion detection on a current video frame, using a possible moving object obtained by detection as an observation result; creating a data association between the observation result and a prediction result of a target, wherein the prediction result is obtained by at least using a target trajectory in a previous video frame for prediction; performing trajectory management on unassociated prediction results and observation results, wherein the trajectory management comprises using an unassociated prediction result to obtain a terminating trajectory segment and using an unassociated observation result to obtain a new trajectory segment; and creating a trajectory association between the terminating trajectory segment and the new trajectory segment. The method of the present invention can be used to effectively increase the number of correct associations between multiple targets and observation results, thus significantly reducing the number of variations in target tags during tracking of multiple target objects, resulting in improved robustness and accuracy.

Description

Video multi-target tracking method and device based on fuzzy logic

[Technical Field]

The present invention relates to the field of target tracking, and in particular to a video multi-target tracking method and apparatus based on fuzzy logic.

【Background technique】

Video multi-target tracking technology is an important research branch in the field of computer vision. It is related to many frontier disciplines, such as image processing, pattern recognition, artificial intelligence, automatic control and computer integration. In intelligent video surveillance, human-computer interaction, robot vision Navigation, virtual reality, medical diagnosis, traffic control and surveillance have very important practical value.

However, for a video target in a complex background environment, there are still many difficulties in developing a robust and efficient multi-target tracking algorithm, such as mutual occlusion of targets, number of targets, and false observations. These situations have strong arbitrariness and uncertainty in actual pedestrian target tracking, and traditional probabilistic methods are not well modeled.

[Summary of the Invention]

The invention provides a video multi-target tracking method and device based on fuzzy logic, which can effectively improve the correct association between multi-objectives and observations, and accurately correct multiple targets under the conditions of apparent similarity, frequent interaction, occlusion and background interference. Tracking, while significantly reducing the number of target tag changes in multi-target tracking, has strong robustness and accuracy.

In order to solve the above technical problem, a technical solution adopted by the present invention is to provide a video multi-target tracking method based on fuzzy logic, which comprises: performing online target motion detection on a current video frame, and detecting a possible moving object as an observation result; Data correlation between the observation result and the prediction result of the target, wherein the prediction result is obtained by predicting at least the trajectory of the target of the previous video frame; and the prediction result and the observation on the unassociated Performing trajectory management, comprising: acquiring a termination trajectory segment by using the prediction result not associated with the prediction result, and acquiring a new trajectory segment by using the observation result that is not associated, the termination trajectory segment and the new The track segment is tracked.

In order to solve the above technical problem, a technical solution adopted by the present invention is to provide a device for video multi-target tracking based on fuzzy logic, comprising: a processor, configured to perform a current video frame acquired from the camera Online target motion detection, detecting the obtained possible moving object as an observation result; performing data association on the observation result and the prediction result of the target, wherein the prediction result is obtained by predicting at least the trajectory of the target of the previous video frame Performing trajectory management on the prediction result that is not associated with the observation result, including acquiring the termination trajectory segment by using the prediction result that is not associated with the prediction result, and acquiring the observation result by using the unrelated association New track a segment, performing trajectory association on the terminating trajectory segment and the new trajectory segment.

The invention has the beneficial effects of providing a video multi-target tracking method and device based on fuzzy logic, which performs data association by the observation result in the current video frame and the prediction result of the target, and the observation result and the prediction result on the unassociated Trajectory management can effectively improve the correct correlation between multi-objectives and observations, and accurately track multiple targets under the conditions of apparent similarity, frequent interaction, occlusion and background interference, which has strong robustness and accuracy.

[Description of the Drawings]

1 is a schematic flow chart of a first embodiment of a video multi-target tracking method based on fuzzy logic;

2 is a schematic flow chart of a second embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention;

3 is a schematic diagram of occlusion between prediction results of different targets of the present invention;

4 is a schematic flow chart of a third embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention;

FIG. 5 is a schematic flowchart of an embodiment of step S233 in FIG. 4;

6 is a schematic flow chart of a fourth embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention;

7 is a schematic flow chart of an embodiment of step S23b in FIG. 6;

8 is a schematic structural diagram of multi-feature clue fusion according to the present invention;

Figure 9 is a fuzzy input variable f _k (i, j) of the present invention,

Schematic diagram of the membership function;

10 is a schematic diagram of a membership function of the output fuzzy variable α _M of the present invention;

11 is a schematic flowchart diagram of a fifth embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention;

12 is a schematic flow chart of an embodiment of step S31 in FIG. 11;

13 is a motion similarity metric of a terminating trajectory segment and a new trajectory segment in the occlusion case of the present invention;

14 is a schematic flow chart of an embodiment of step S33 in FIG. 11;

15 is a schematic diagram of a position of the present invention for acquiring a loss prediction point;

16 is a schematic structural diagram of a first embodiment of a video multi-target tracking apparatus based on fuzzy logic according to the present invention;

17 is a schematic structural diagram of a second embodiment of a video multi-target tracking apparatus based on fuzzy logic according to the present invention.

【Detailed ways】

As shown in FIG. 1 , a schematic flowchart of a first embodiment of a video multi-target tracking method based on fuzzy logic, the method includes the following steps:

S1: Perform online target motion detection on the current video frame, and detect possible motion objects as Observations.

The online target motion detection can use motion detection algorithms such as frame difference method, optical flow method, background subtraction method, and mixed Gaussian background model. The invention mainly adopts a mixed Gaussian background model to perform motion detection on the current video frame to find pixels belonging to the foreground of the motion, supplemented by median filtering and simple morphological processing, and finally obtain possible moving objects in the current video frame. Observed object. An observation object is an image block in the current video frame. Generally, the shape of the observation object is a rectangle.

The mixed Gaussian background model is used to detect the moving target, and the set of detection targets is z={z ₁ ,...,z _r }. Since the predicted result of the detected target does not have the identity ID, the observation result cannot be judged. The corresponding situation of the prediction result of the previous frame target. To this end, the detection result z={z ₁ ,..., z _r } must also be used as the current observation information to make further correlation judgment between the prediction result of the target and the observation result.

S2, data association between the observation result and the prediction result of the target.

Since most of the targets in video multi-target tracking are non-rigid, their motions have certain randomness, and in actual complex scenes, there are often lighting changes, target occlusion, similar object interference, etc., which may cause targets. Tracking uncertainty. Targets include reliable targets for stable tracking and temporary targets for unstable tracking. The target state in this step, that is, whether each target is marked as a reliable target or a temporary target, is determined by the trajectory management of the previous video frame. The temporary target includes a new target established by the observation that the previous video frame is a candidate result that is not associated and is not a successful match, and a target whose consecutively associated successful number of frames is less than or equal to the first frame number threshold and has not been deleted. A reliable target includes a target whose number of consecutively successful frames is greater than the first frame number threshold and has not been deleted. The prediction result of the target is obtained by predicting at least the trajectory of the target of the previous video frame.

S3, performing trajectory management on the unrelated prediction results and observation results, including acquiring the trajectory segments by using the unrelated prediction results and acquiring new trajectory segments by using the unrelated observations, and terminating the trajectory segments and new The track segment is tracked.

Specifically, the data association method in step S2 can deal with the data association problem of high-frequency occlusion occurring in a short period of time and multi-target tracking under a large number of false observation conditions, but in the case of long-term occlusion and missed detection, some The target state is not updated for a long time, and the target motion trajectory is difficult to maintain, and the target trajectory is broken, that is, the same target has multiple motion trajectories. At the same time, when the new target enters the scene, the corresponding new target trajectory needs to be initialized, and if the target leaves the scene, the corresponding target trajectory is also deleted.

In this application, by using the unrelated prediction results and the unrelated observations to obtain the termination trajectory segments and the new trajectory segments, the fuzzy membership degree is established by introducing the feature similarity measure of the target trajectory, and the fuzzy synthesis function is used to calculate the trajectory. The comprehensive similarity between the segments is then used to achieve the correlation of the same target trajectory with the maximum comprehensive similarity and threshold discriminant principle, and predictively fill the missing points between the trajectory segments of the same target, and finally obtain a complete continuous target trajectory.

In the above embodiment, the data association is performed by the observation result in the current video frame and the prediction result of the target, and the trajectory management is performed on the uncorrelated observation result and the prediction result, thereby effectively improving the multi-view. The correct correlation between the target and the observation, accurate tracking of multiple targets under the conditions of apparent similarity, frequent interaction, occlusion and background interference, with strong robustness and accuracy

Referring to FIG. 2, FIG. 2 is a schematic flowchart diagram of a second embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention, and a second embodiment of the present invention is a further extension of step S2 in the first embodiment. The method includes the following steps:

S21. Calculate an occlusion degree between prediction results of different targets in the current video frame.

The prediction result of the target in this step is obtained by predicting at least the trajectory of the target of the previous video frame. First, an occlusion calculation is performed between the prediction results of all the targets in the current frame video to determine whether occlusion occurs between the prediction results of all the targets in the current frame video.

Please refer to FIG. 3 for further reference. FIG. 3 is a schematic diagram of occlusion between prediction results of different targets of the present invention. In the current video frame, the tracking icon shapes of the prediction result A and the prediction result B are all rectangular, and there is overlap between the two, and the parameter of the prediction result A is expressed as: [x, y, w, d], wherein x, y represents the coordinates of the rectangular frame, w represents the width of the rectangular frame, d represents the height of the rectangular frame, and the parameter of the prediction result B is expressed as: [x', y', w', h'], where x', y' represents the coordinates of the rectangular frame, w' represents the width of the rectangular frame, h' represents the height of the rectangular frame, and the shaded portion between the predicted result A and the predicted result B is expressed as: [x _o , y _o , w _o , h _o ] And its overlapping parts are expressed as:

x _o =max(x,x')

y _o =max(y,y')

w _o =min(x+w,x'+w')-x _o

h _o =min(y+h,y'+h')-y _o (14)

From this, it can be seen that the area of the overlap between the prediction result A and the prediction result B is expressed as w _o *h _o . If the above w _o , h _o does not satisfy w _o >0 or h _o >0, the two tracking rectangles do not form an overlapping rectangle, that is, the overlapping rectangle area is 0.

Assuming that the prediction result A and the prediction result B are occluded as shown in FIG. 2, and the shaded portion between the two tracking rectangles represents the occlusion area, the occlusion degree between the two is defined as:

Where s(·) represents the area of the area, and the occlusion degree satisfies 0 ≤ ω (A, B) ≤ 1. When ω(A, B) is greater than 0, it means that occlusion occurs between the prediction result A and the prediction result B. And further, the longitudinal image coordinate values y _A and y _B at the bottom of the two tracking rectangles respectively representing the prediction result A and the prediction result B, if y _A > y _B , the prediction result B is blocked by the prediction result A, and vice versa. Then, the prediction result A is occluded by the prediction result B.

S22: Determine whether occlusion occurs between each prediction result and other prediction results according to the occlusion degree.

In this step, the ambiguity determination is performed on the prediction results of all the targets in the current video frame scene, and the overlap ratio ω _ij of the tracking rectangle between the different target prediction results of the current video frame is calculated according to the formula (15). The occlusion degree between the prediction results is determined, and it is judged whether the occlusion degree of each prediction result and other prediction results is smaller than the first occlusion determination threshold τ _over . The first occlusion determination threshold τ _over satisfies τ _over ∈ [0, 1]. If ω _{ij is} smaller than the first occlusion determination threshold τ _over , occlusion is considered to occur between the prediction results. If τ _over is equal to 0, it indicates that no occlusion occurs between the prediction results.

S23: if there is no occlusion between the prediction result and any other prediction results, the first data correlation is performed on the prediction result and the observation result; if occlusion occurs between the prediction result and other prediction results, the prediction result and the observation result are performed The second data association.

After occlusion determination is performed on the prediction result of all the targets in the current video frame, if no occlusion occurs between the prediction result and any other prediction results, the first data is associated with the observation result in the current video frame. An occlusion occurs between the predicted result and other predicted results, and a second data association is performed. The first data association is different from the first data association, and the second data association is more complex than the first data association.

In the above embodiment, firstly, by determining whether occlusion occurs between prediction results of all targets in the current video frame, occlusion and non-occlusion are respectively performed between the prediction results of the target, and data association between the prediction result and the observation result is performed. It can accurately track multiple targets under the conditions of apparent similarity, frequent interaction, occlusion and background interference, and has strong robustness and accuracy.

Referring to FIG. 4, FIG. 4 is a third embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention, which is a further extension of S23 in the second embodiment of the video multi-target tracking method based on the fuzzy logic of the present invention. The invention discloses a video multi-target tracking method based on fuzzy logic. The same steps of the second embodiment are not described herein again. This embodiment includes:

Referring to FIG. 4, step S23 further includes the following sub-steps:

S231. Calculate a second similarity measure between the observation result and the prediction result.

The second similarity measure is used to measure the predicted result and the distance between the observed result. The second similarity measure includes: a spatial distance feature similarity measure and an appearance feature similarity measure. Generally, the position of the target between adjacent frame images does not change greatly. Therefore, the spatial distance feature is one of the features that can more effectively match the observation and prediction results of the target. In a specific embodiment, the spatial distance feature similarity measure f _D (·) between the observation d and the prediction result o is defined as:

Where ||·|| ₂ is a two-norm, (x _o , y _o ) is the central coordinate of the prediction result o in the current video frame, and (x _d , y _d ) is the observation result d in the current video frame. Center coordinates, h _o is the height of the prediction result o in the current video frame,

Is the variance constant.

Further, the appearance feature similarity measure f _S (·) between the observation result d and the prediction result o is defined as:

Where h _d is the height of the observation d in the current video frame,

Is the variance constant.

S232. Calculate an associative cost matrix between the observation result and the prediction result by using the first similarity measure.

The multiplicative fusion is used to fuse the spatial distance feature similarity measure and the appearance feature similarity measure to obtain the correlation between the observation result and the prediction result, and is defined as:

s _ij =f _D (o,d)×f _s (o,d) (3)

The correlation cost matrix between the observation result and the prediction result is obtained according to the degree of association, and is defined as:

S=[s _ij ] _n×l (4)

Where i=1, 2,...n, j=1, 2,...,l.

S233, using the greedy algorithm to optimize the correlation cost matrix to find the associated observations and prediction results.

The greedy algorithm is used to achieve the correct correlation between the prediction result and the observation result, so as to obtain the correlation between the prediction result and the observation result. Referring to FIG. 5, the step S233 further includes the following sub-steps:

S2331, find the maximum value among all the elements in the associated cost matrix S that are not marked.

Find the maximum value Spq=max([Sij]n*l) of all elements in the associated cost matrix S that are not marked, where p=1, 2, 3...n, q=1, 2, 3...l, and mark all elements of the pth row and the qth column where the maximum value s _pq is located in the associated cost matrix S.

S2332: Determine whether the maximum value is the maximum value in the row and column, and meet the greater than the first threshold.

It is determined whether the maximum value s _pq is the maximum value in the row and the column in which it is located, that is, whether: s _pq ≥{s _pj } _j=1,2,...l , s _pq ≥{s _iq } _{i=1, 2,...,n} . It is further determined whether the maximum value s _pq is greater than the first threshold λ ₁ , that is, whether the correlation probability of the prediction result p and the observation result q is greater than the first threshold λ ₁ , and the first threshold satisfies λ ₁ ∈ [0.6, 0.9].

S2333, if it is greater, the observation result is correctly correlated with the prediction result.

When the maximum value s _pq satisfies the above-described determination condition, it is considered that the prediction result p is correctly associated with the observation result q and the correlation pair is recorded in the set of the associated prediction result and the observation result. The above steps are performed cyclically until all or all of the columns in the associated cost matrix S are marked.

In the foregoing implementation manner, by determining that no occlusion occurs between the prediction results of the target in the current video frame, the spatial distance feature similarity measure and the appearance feature similarity measure between the observation result and the prediction result are fused to obtain an associated cost matrix of the two. The optimization solution can find the correct correlation observations and prediction results.

Referring to FIG. 6, FIG. 6 is a fourth embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention, which is a further expansion of S23 in a second embodiment of a video multi-target tracking method based on fuzzy logic according to the present invention. exhibition.

In the case where there is occlusion between the prediction results of different targets in the video frame, the fusion of the two features cannot be completed due to the simple multiplicative fusion strategy, and the correlation between the prediction result and the observation result cannot be completed. A logical multi-feature clue weighted fusion strategy.

Step S23 further includes the following sub-steps:

S23a: Calculate a third similarity measure between the observation and the prediction.

In the current video frame, because the color feature has better ability to resist the deformation of the target, but it lacks the description of the spatial structure of the prediction result, and is sensitive to illumination, and the edge feature can well describe the edge of the human body. The variation and the small amount of offset are insensitive, so the color and edge features have complementary characteristics, so the present invention uses these two kinds of information fusion to establish the appearance characteristics of the prediction result. In the present invention, the distance between the observation result and the prediction result is measured by a third similarity measure, and the third similarity measure includes an appearance feature similarity measure, a geometric feature similarity measure, and a motion feature similarity measure. And spatial distance feature similarity measures.

Wherein, the appearance feature similarity measure f _A (·) between the observation result d and the prediction result o is defined as:

Where ρ(·) is a Bhattacharyya coefficient, H _c (·) is a color histogram feature weighted by the background of the current video frame image, and H _g (·) is a block gradient direction histogram feature,

Is a variance constant,

Is the variance constant.

The motion feature similarity measure f _M (·) between the observation d and the prediction result o is defined as:

Where (x' _o , y' _o ) is the central coordinate of the prediction result o at the previous moment, and (x _o , y _o ) is the central coordinate of the prediction result o,

The projection of the speed of the prediction result o on the coordinate axis for the previous moment,

Is a variance constant;

The spatial distance feature similarity measure f _D (·) between the observation d and the prediction result o is defined as:

Where ||·|| ₂ is a two-norm, (x _o , y _o ) is the central coordinate of the prediction result o, (x _d , y _d ) is the central coordinate of the observation d, and h _o is the prediction result o height,

Is the variance constant.

The appearance feature similarity measure f _S (·) between the observation d and the prediction result o is defined as:

Where h _d is the height of the observation d

Is the variance constant.

The target model and the candidate model corresponding to the appearance feature similarity measure and the geometric feature similarity measure are respectively defined as: the target model:

Candidate model:

To measure the similarity between the target model and the candidate model, the present invention is described using a Bhattacharyya coefficient, which is defined as:

The motion model of the predicted result of the target is described by the coordinates and velocity of its centroid. In video multi-target tracking, since the interval between adjacent two video sequences is very short, the motion of the video target motion is not very large. In most cases, it can be assumed that the motion of each target is uniform motion, so it can be The tracking rectangle (x, y, w, h) of the prediction result of each target establishes a motion state parameter model based on position, size, and velocity. Then define the state variable X _k of the Kalman filter as:

X _k =[x,v _x ,y,v _y ] ^T (17)

Wherein, x and y respectively represent the horizontal and vertical coordinates of the centroid of the tracking rectangle of the observation result of the kth frame, and respectively represent the velocity of the centroid of the tracking rectangle of the observation result of the kth frame in the x-axis and the y-axis direction.

S23b: The fuzzy inference system model is used to calculate the weight value of each feature similarity measure in the third similarity measure.

The fuzzy inference system in the present invention mainly includes four basic elements: fuzzification of input variables, establishment of fuzzy rule base, fuzzy inference engine, and defuzzifier (fuzzy innovation precise output). In this embodiment, the input of the fuzzy inference system is defined by the similarity measure of each feature, and the adaptive weighting coefficient of each feature is obtained by inference.

Referring to FIG. 7, the step S23b further includes the following substeps:

S23b1, calculating the input variables of the fuzzy inference system.

Referring to FIG. 8, FIG. 8 is a schematic structural diagram of multi-feature clue fusion according to the present invention. The motion feature similarity measure is used as the first fuzzy input variable, and the similarity measure of the remaining three features is the second fuzzy input variable, and the calculation of the similarity measure mean of the other three features can be defined as:

will

As the first and second fuzzy input variables of the fuzzy logic system, respectively. Where e _i is a measure of similarity of feature i,

The fusion coefficient of the feature k at time t-1, the f _M (i, j) motion feature similarity measure,

The weighted mean of the remaining three feature similarity measures.

S23b2, determining the input function of the fuzzy inference system and the membership function of the output variable.

In general, the accuracy of the output variable is affected by the number of fuzzy sets. The more fuzzy sets, the more accurate the output, but the more fuzzy sets, the more computational complexity of the algorithm, so usually the number of fuzzy sets is selected by experience. .

Please refer to FIG. 9. FIG. 9 is a fuzzy input variable f _k (i, j) according to the present invention.

Schematic diagram of the membership function.

Input variable f _k (i,j) and

The fuzzification is performed by using five language fuzzy sets {ZE, SP, MP, LP, VP}, and the membership functions are μ _{0, ZE} (i, j), μ _{0, SP} (i, j), μ _{0, respectively. MP} (i, j), μ _{0, LP} (i, j), and μ _{0, VP} (i, j) indicate that the five fuzzy sets represent zero, positive, medium, positive, and very large, respectively. .

Please refer to FIG. 10. FIG. 10 is a schematic diagram of the membership function of the output fuzzy variable α _M according to the present invention. For the output fuzzy variable α _M contains five fuzzy sets: {ZE, SP, MP, LP, VP, EP}, EP represents the maximum fuzzy set, and its membership function is μ _{1, ZE} (i, j), μ _{1, SP} (i, j), μ _{1, MP} (i, j), μ _{1, LP} (i, j), μ _{1, VP} (i, j), and μ _{1, EP} (i, j).

S23b3 obtains the weight value of each feature similarity measure in the third similarity measure by using the inference rule of the fuzzy inference system.

According to the input function of the input variable and the output variable defined in step S23b2, the fuzzy inference rule can be as follows:

Rule 1: If f _M (i,j) is ZE and f _M (i,j) is ZE, then α _M is EP

Rule 2: If f _M (i,j) is ZE and f _M (i,j) is SP, then α _M is VP

Rule 3: If f _M (i,j) is ZE and f _M (i,j) is MP, then α _M is LP

The detailed fuzzy rules are shown in Table 1:

In a specific embodiment of the present invention, rule 1 is taken as an example to give a detailed reasoning process:

a) According to rule 1, the fuzzy set corresponding to the fuzzy input variable f _M (i, j) is ZE, and the corresponding fuzzy membership degree can be obtained by using the value of f _M (i, j) according to the fuzzy membership function shown in FIG. 9 . value

The same method can be used to find fuzzy input variables

Corresponding fuzzy membership value

b) Calculate the applicability of Rule 1 using the following formula:

Among them, ∧ means take small.

c) According to rule 1, the corresponding fuzzy output is EP, then the output of rule 1 can be calculated by:

In the same way, you can calculate the fuzzy output variables of all rules. According to Table 1, M=25 in the present application. Thus, the total fuzzy output is:

Among them, ∨ means take big. Since equation (20) yields a fuzzified output, the following method can be used to obtain the defuzzified output:

Wherein, the fuzzy rule m corresponds to the centroid of the output fuzzy set. For the same reason, the fuzzy inference system is constructed for different features, and the weight value coefficients α _A , α _S and α _{D of the} geometric shape feature, the motion direction feature and the spatial distance can be obtained respectively.

S23c, performing multiple feature clue fusion on the weight value and the third similarity measure to obtain an associative cost matrix between the observation result and the prediction result.

Then normalize the weight value coefficients of all features to obtain the fusion coefficients of the features at the current time:

By judging the credibility of each feature, adaptive assigning different weights to different features, which solves the tracking problem in complex background and mutual occlusion. According to equation (21), the correlation cost matrix between the observation result and the prediction result is obtained, which is defined as:

S=[s _ij ] _n×l (24)

Where {α _k } _{k∈{A, M, S, D}} is the fusion coefficient of each feature similarity measure and satisfies

f _k (i,j) _{k∈{A,M,S,D}} are observations

And forecast results

A measure of similarity between each feature.

S23d, using the greedy algorithm to optimize the correlation cost matrix to find the associated observations and prediction results.

The greedy algorithm is used to achieve the correct correlation between the prediction result and the observation result, so that the correlation between the prediction result and the observation result further includes:

1) Find the maximum of all elements in the associated cost matrix s _ij that are not marked.

Find the maximum value Spq=max([Sij]n*l) of all elements in the associated cost matrix s _ij that are not marked, where p=1, 2, 3...n, q=1 , 2, 3, ..., and mark all elements of the pth row and the qth column where the maximum value s _pq is located in the associated cost matrix S.

2) Determine whether the maximum value is the maximum value in the rank and column, and satisfy the greater than the second threshold.

It is determined whether the maximum value s _pq is the maximum value in the row and the column in which it is located, that is, whether: s _pq ≥{s _pj } _j=1,2,...l , s _pq ≥{s _iq } _{i=1, 2,...,r} . It is further determined whether the maximum value s _pq is greater than the second threshold λ ₂ , that is, whether the correlation probability of the prediction result p and the observation result q is greater than the second threshold λ ₂ , and the second threshold satisfies λ ₂ ∈ [0.6, 0.9].

3) If greater than, the observations are correctly correlated with the predictions.

When the maximum value s _pq satisfies the above-described determination condition, it is considered that the prediction result p is correctly associated with the observation result q and the correlation pair is recorded in the set of the associated prediction result and the observation result. Further, if there are still unmarked rows and columns in the associated cost matrix Sij, the above step 1) is continued.

In the above implementation manner, by determining that occlusion occurs between the prediction results of the target in the current video frame, calculating a third feature similarity measure between the prediction result and the observation result, introducing a fuzzy inference system, and using the fuzzy logic based method, according to the current tracking The environment adaptively assigns different weight values to different feature information, obtains the weighted sum of the multi-attribute features, constitutes the associated cost matrix of the prediction result of the frame target and the observation result, and then uses the greedy algorithm to optimize the solution allocation, which can effectively improve the multi-objective The correct association with the observations.

Referring to FIG. 11 , FIG. 11 is a schematic flowchart of a fifth embodiment of a video multi-target tracking method based on the fuzzy logic, which is a further extension of step S3 in the first embodiment of the video multi-target fuzzy data association method of the present invention. Embodiments further include:

S31. Establish, by using the first similarity measure, a fuzzy association cost matrix between the terminating trajectory segment and the new trajectory segment.

The fuzzy logic data association method can deal with the data association problem of high-frequency occlusion and multi-target tracking under a large number of false observations in a short period of time. However, in the case of long-term occlusion and missed detection, some target states are not long. To the update, the target motion trajectory is difficult to maintain, and there will be a situation where the target trajectory is broken, that is, the same target has multiple motion trajectories. At the same time, when the new target enters the scene, the corresponding new target trajectory needs to be initialized, and if the target leaves the scene, the corresponding target trajectory is also deleted.

Referring to FIG. 12, step S31 further includes the following sub-steps:

S311, establishing a similarity vector between the ending track segment and the new track segment.

The termination of the target's prediction result is for the target that is left in the scene, or is a stationary target, which needs to be removed from the current target tracking sequence. If the estimated position of the target is at the edge of the video scene (the edge of the scene is set to τ _border = 5), then it can be determined that the target leaves the video scene, and the target is deleted from the current target tracking sequence. If the estimated position of the target is not at the edge of the video scene and the target consecutive x frames are not associated with any observations, then the target can be inferred to be stationary or occluded, and the target is removed from the current target tracking sequence.

If there are unrelated observations in the scene area, it can be confirmed whether a new target appears by judging whether the observation can be associated with the target. In a complex environment, due to various factors such as background interference and target deformation, the target detector will inevitably produce some false observations under the condition of maintaining high detection rate, which will not be associated with any existing targets. These false observations may also be incorrectly initialized to new targets. In general, within a few consecutive frames (within the time sliding window) the target will have overlapping areas and the same geometric dimensions, so in order to be able to accurately determine whether the unrelated observations are derived from new targets, the present application is The new target initialization module uses the observations in the continuous T _init frame to determine whether there is an overlap of the rectangular frame areas and the same size, and the area overlap ratio of the rectangular frame between the observation results is defined as:

The dimensional similarity between the observations is:

among them,

The observation values at time t and time t+1 are respectively, and area(□) indicates the area of the observation result.

Representation of observations

versus

The overlap area, h is the height value of the observation rectangle.

Where τ _ω and τ _r represent the overlap rate threshold and the size similarity threshold, respectively. The area overlap ratio and the size similarity of the observations in the continuous init frame are greater than the set threshold, that is, when the init is greater than or equal to T _init , it is converted into a valid track, that is, a new track segment is started. And add it to the target tracking sequence. Therefore, the method can effectively eliminate false observations generated by the target detector, thereby reducing the erroneous target trajectory start.

Wherein, since the target termination trajectory may be a trajectory segment or a complete target trajectory, in order to confirm the integrity of the target trajectory, the last position of the termination trajectory is used to determine whether the trajectory is disconnected or leaves the scene in the scene. If the last position of the end track is within the scene, its trajectory is the end track segment. Meanwhile, when the start frame of the target track segment is the current time, it indicates that the new track segment is a new observed temporary track.

In a specific embodiment of the invention, the set of termination trajectory segments is defined as:

The set of new track fragments is defined as:

Where n _a and n _b respectively represent the number of the end track segment set and the new track segment set.

The first similarity measure includes an appearance similarity measure, a shape similarity measure, and a motion similarity measure, wherein the appearance similarity measure is defined as:

Where ρ(·) is expressed as a Bhattacharyya coefficient, and H _c (·) is a background weighted color histogram feature.

For the variance constant, H _g (·) represents the direction gradient histogram feature,

Is a variance constant;

The shape similarity measure is defined as:

Where h _i represents the height of the end track segment T _i in the image, and h _j represents the height of the new track segment T _j in the image,

Is a variance constant;

The motion similarity measure is defined as:

Where G(□) represents a Gaussian distribution, ∑ is the variance of the Gaussian distribution, and Δt is the first observed frame interval of the last trajectory segment T _i and the new trajectory segment T _{j is} observed.

v _i is the end position and velocity of the terminating track segment T _i , respectively.

v _j is the starting position and velocity of the new track segment.

Figure 13 is a motion similarity measure for the end track segment and the new track segment in the occlusion case. It is assumed that the error between the position of the predicted result and the position of the actual observation satisfies the Gaussian distribution, that is, the smaller the distance between the predicted position of the terminated trajectory segment and the actual position of the new trajectory segment, the similarity between the two trajectory segments The bigger (for example

versus

The closer the distance is,

The value is larger).

Further, according to equations (1), (2), and (3), a similarity vector between two pieces of track segments can be calculated, which is defined as:

Where Λ _k (T _i , T _j ) ∈ [0, 1] ³ , τ _gap is the associated time interval threshold,

a time frame indicating that the trajectory segment T _{i is} broken,

Indicates the time frame at which the new track segment T _j starts.

S312. Calculate the degree of matching between the end track segment and the new track segment by using the similarity vector.

In order to obtain the similarity between any new trajectory segment and the ending trajectory segment, the present application uses a fuzzy comprehensive model based fuzzy model to measure the degree of matching between the ending trajectory segment and the new trajectory segment, which is defined as:

Where ∧ indicates that the matching degree takes a minimum value, and ∨ indicates that the matching degree takes a maximum value.

S313, calculating a fuzzy comprehensive similarity between the ending trajectory segment and the new trajectory segment according to the matching degree degree.

The fuzzy comprehensive similarity between the k-terminating trajectory segment T _i and the new orbital segment T _j is defined as:

S314. Establish an associative cost matrix of the terminating trajectory segment and the new trajectory segment according to the fuzzy comprehensive similarity.

The associated cost matrix between the terminating trajectory segment and the new trajectory segment is defined as:

The prerequisites for the two track segments to be associated are:

1) The time has continuity, that is, there is no overlapping area in the corresponding time frame interval, ie

2) The time interval between the two track segments should be within the associated time interval threshold, ie

In the process of target tracking, if the target's prediction result is occlusion, target detection error, and missed detection, etc., the target's motion trajectory is disconnected, then the time between the new trajectory and the original termination trajectory after disconnection The interval is relatively short. If the time interval between the two track segments is relatively long, then it can be considered that they are not from the same target. In the present application, the reasonable correlation time interval threshold τ _gap can be set to associate the trajectories that may be associated within a relatively small range, which can improve the time efficiency of the algorithm. Excludes some trace segments that cannot be successfully associated.

S32, using the maximum fuzzy comprehensive similarity and the threshold discriminant principle to implement the trajectory association between the ending trajectory segment and the new trajectory segment.

According to the fuzzy correlation cost matrix U, due to the complexity of the target tracking environment, in order to give the similarity judgment between the ending trajectory segment T _i and the new trajectory segment T _j in the trajectory segment association, it is necessary to use the fuzzy operator to deblur The maximum comprehensive similarity is expressed as:

in case

μ _ij* ≥ ε (29)

Then the terminating trajectory segment T _{i is} associated with the new trajectory segment T _j* , and the new trajectory segment T _{j* is} no longer associated with other terminating trajectory segments T _i , otherwise it is not associated with the trajectory segment, where ε is a threshold parameter, and 0 ≤ ε ≤ 1.

S33, filling the trailing track segment on the association and the missing track segment between the new track segments.

Due to occlusion, target detection error and missed detection between the target's prediction results, the target's motion trajectory is disconnected. The above correlation method can be used to associate two broken trajectories together, but the two trajectory segments There is often a lack of detection point information for lost frames. Therefore, the item The target does not form a complete continuous trajectory, and it also needs to predict the filling of the gap between them.

Referring to Figure 14, step S33 includes the following sub-steps:

S331: Perform bidirectional prediction on the missing track segment between the associated ending track segment and the new track segment to obtain position information of the predicted point.

15 is a schematic view of the position acquisition loss prediction point, T _f in front of the two tracks off a track segment, the track segment is terminated, T _b is the back of a track segment, i.e. the new track segment. The position of the two-way continuous predicted target within the disconnection time interval is determined by the end position, the new starting position, and the speed information of the two trajectories where the same target is broken. The process of acquiring the position information of the predicted point is as shown in FIG. p _f represents a specific location when using the track segment T _f for the target where forward prediction, p _b represents a specific location when using the track segment T _b of the reverse prediction target, t _f T _f represents a forward prediction for the current The number of frames, t _b represents the current number of frames when T _{b is} used for backward prediction. The process of obtaining the predicted position information is as follows:

1) Initialization:

2) If t _f <t _b , the forward prediction target from P _f is in the specific position in the next frame:

p _f =p _f +v _f ,t _f =t _f +1 (30)

The specific position of the reverse prediction target from the previous frame in P _b :

p _b =p _b -v _b ,t _b =t _b -1 (31)

Repeat step 2) until t _f ≥ t _b , and finally obtain the position information of the missing points between the two track segments.

S332. Obtain rectangular frame information of the predicted point.

In order to be able to evaluate the multi-target tracking accuracy of the tracking algorithm, it is also necessary to obtain the width and height of the rectangular frame of the predicted point target. In this application, the averaging method is used to obtain the width and height of the rectangular frame of the predicted point, which is:

Where h _k and w _k represent the height and width of the rectangular frame of the detection point in the kth frame.

The height and width of the rectangle representing the end of the track segment _Tf ,

The height and width of the rectangular box representing the head of the track segment T _b .

S333: Fill the missing track segment according to the position information of the predicted point and the rectangular frame information.

By using the above predicted point filling method to predict and fill the missing points between the track segments, a complete continuous motion track of the target can be obtained.

In the practical application of the present invention, the prediction result and the observation result of the already associated target are filtered and predicted by a filter to obtain an actual track point and a prediction result in the current video frame of the target, Among them, the filter used in the present application may include, but is not limited to, a Kalman filter. Further, extrapolation prediction is performed on the prediction result without the associated target, and the prediction result is obtained, so as to accurately track the multi-target. And the prediction result of the target is used for data association in the next frame of the video frame.

In the above embodiment, the missing points between the broken trajectories of the same target are predicted and filled, and a complete continuous target trajectory is formed, which can effectively solve the problems of smoothing and predicting the target trajectory, terminating the target trajectory, and starting the new target trajectory.

As shown in FIG. 16, FIG. 16 is a schematic structural diagram of a first embodiment of a video multi-target tracking apparatus based on fuzzy logic, including:

The detecting module 11 is configured to perform online target motion detection on the current video frame, and detect the obtained possible moving object as an observation result.

The association module 12 is configured to perform data association between the observation result and the prediction result of the target, wherein the prediction result is obtained by predicting at least the trajectory of the target of the previous video frame.

The trajectory management module 13 is configured to perform trajectory management on the unrelated prediction results and the observation results, including acquiring the ending trajectory segments by using the unrelated prediction results, and acquiring the new trajectory segments by using the unrelated observation results, Terminate the track segment and the new track segment for track association.

As shown in FIG. 17, FIG. 17 is a schematic structural diagram of a second embodiment of a video multi-target tracking apparatus based on fuzzy logic according to the present invention, including: a processor 110 and a camera 120.

The camera 120 can be a local camera, the processor 110 is connected to the camera 120 through a bus; the camera 120 can also be a remote camera, and the processor 110 is connected to the camera 120 via a local area network or the Internet.

The processor 110 controls the operation of the fuzzy logic based video multi-target tracking device, which may also be referred to as a CPU (Central Processing Unit). Processor 110 may be an integrated circuit chip with signal processing capabilities. The processor 110 can also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), an off-the-shelf programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, and discrete hardware components. . The general purpose processor may be a microprocessor or the processor or any conventional processor or the like.

The video multi-target tracking device based on the fuzzy logic may further include a memory (not shown) for storing instructions and data necessary for the operation of the processor 110, and may also store video data captured by the transmitter 120.

The processor 110 is configured to perform online target motion detection on the current video frame acquired from the camera 120, and detect the obtained possible moving object as an observation result; perform data association on the observation result and the target prediction result, wherein the prediction result is at least the previous one. The trajectory of the target of the video frame is predicted; the trajectory management is performed on the unconstrained prediction result and the observation result, including the use of the unrelated prediction result to obtain the termination trajectory fragment and the use of the unrelated observation result to acquire the new trajectory The trajectory segment, the trajectory association of the ending trajectory segment and the new trajectory segment.

The function of each part included in the video multi-target tracking device based on the fuzzy logic of the present invention can refer to the present The description of the video multi-target tracking method based on the fuzzy logic in the corresponding embodiments is not described herein.

In summary, those skilled in the art can easily understand that the present invention provides a video multi-target tracking method and apparatus based on fuzzy logic, which performs data association by observing results in a current video frame and prediction results of a target, and is not associated. The observation and prediction results on the trajectory management can effectively improve the correct correlation between multi-objectives and observations, and accurately track multiple targets under the conditions of apparent similarity, frequent interaction, occlusion and background interference. Robustness and accuracy.

The above is only the embodiment of the present invention, and is not intended to limit the scope of the invention, and the equivalent structure or equivalent process transformations made by the description of the invention and the drawings are directly or indirectly applied to other related technologies. The fields are all included in the scope of patent protection of the present invention.

Claims

A video multi-target tracking method based on fuzzy logic, characterized in that the method comprises:

Performing on-line target motion detection on the current video frame, and detecting the possible moving object as an observation result;

Performing data association on the observation result and the prediction result of the target, wherein the prediction result is obtained by predicting at least the trajectory of the target of the previous video frame;

Performing trajectory management on the prediction result and the observation result that are not associated, including acquiring the termination trajectory segment by using the prediction result that is not associated with the prediction result, and acquiring the new trajectory by using the observation result that is not associated a trajectory segment, the trajectory association of the terminating trajectory segment and the new trajectory segment.
The method according to claim 1, wherein the data association between the observation result and the prediction result of the target comprises:

Calculating an occlusion degree between prediction results of different targets in the current video frame;

Determining whether occlusion occurs between each of the prediction results and the other prediction results according to the occlusion degree;

If no occlusion occurs between the prediction result and any other of the prediction results, performing a first data association on the prediction result and the observation result; if the prediction result occurs between the prediction result and the other prediction result And occluding, the second data association is performed on the prediction result and the observation result, wherein the first data association and the second data association are different.
The method according to claim 2, wherein if no occlusion occurs between the prediction result and any other of the prediction results, performing a first data association between the prediction result and the observation result comprises:

Computing a second similarity measure between the observation result and the prediction result, the second similarity measure including a spatial distance feature similarity measure and an appearance feature similarity measure;

Calculating an associated cost matrix between the observation result and the prediction result by using the second similarity measure;

The correlation cost matrix is optimized by using a greedy algorithm to find the associated observation result and the prediction result.
The method according to claim 2, wherein said spatial distance feature similarity measure f D (·) between the observation result d and the prediction result o is defined as:

Where ||·|| 2 is a two-norm, (x o , y o ) is the central coordinate of the prediction result o, and (x d , y d ) is the central coordinate of the observation d, h o is The height of the prediction result o,
Is a variance constant;

The appearance feature similarity measure f S (·) between the observation d and the prediction result o is defined as:

Where h d is the height of the observation d
Is a variance constant;

Calculating the associated cost matrix between the observation result and the prediction result by using the second similarity measure includes:

The spatial distance feature similarity measure and the appearance feature similarity measure are merged by multiplicative fusion to obtain the degree of association between the observation result and the predicted result, which is defined as:

s ij =f D (o,d)×f s (o,d) (3)

Obtaining an associative cost matrix between the observation result and the prediction result according to the degree of association, which is defined as:

S=[s ij ] n×l (4)

Where i=1,2,...n,j=1,2,...,l;

The greedy algorithm is used to optimize the correlation cost matrix to find related observation results and prediction results including:

Finding a maximum of all elements in the associated cost matrix S that are not marked;

Determining whether the maximum value is a maximum value in a row and column, and is greater than a first threshold;

If greater, the observation is correctly associated with the prediction.
The method according to claim 2, wherein if an occlusion occurs between the prediction result and the other of the prediction results, performing a second data association between the prediction result and the observation result comprises:

Calculating a third similarity measure between the observation result and the prediction result, the third similarity measure including an appearance feature similarity measure, a geometric feature similarity measure, a motion feature similarity measure, and a spatial distance feature similarity Sex measure

Calculating a weight value of each feature similarity measure in the third similarity measure by using a fuzzy inference system model;

Performing multi-feature clue fusion on the weight value and the third similarity measure to obtain an associative cost matrix between the observation result and the prediction result;

The correlation cost matrix is optimized by using a greedy algorithm to find the associated observation result and the prediction result.
The method according to claim 5, characterized in that the appearance feature similarity measure f A (·) between the observation result d and the prediction result o is defined as:

Where ρ(·) is a Barthel's coefficient, H c (·) is a color histogram feature weighted by the background of the current video frame image, and H g (·) is a histogram of the block gradient direction histogram.
Is a variance constant,
Is a variance constant;

The motion feature similarity measure f M (·) between the observation d and the prediction result o is defined as:

Where (x' o , y' o ) is the central coordinate of the prediction result o at the previous moment, and (x o , y o ) is the central coordinate of the prediction result o,
The projection of the speed of the prediction result o on the coordinate axis for the previous moment,
Is a variance constant;

The spatial distance feature similarity measure f D (·) between the observation d and the prediction result o is defined as:

Where ||·|| 2 is a two-norm, (x o , y o ) is the central coordinate of the prediction result o, and (x d , y d ) is the central coordinate of the observation d, h o is The height of the prediction result o,
Is a variance constant;

The geometric feature similarity measure f S (·) between the observation d and the predicted o is defined as:

Where h d is the height of the observation d
Is the variance constant.
The method according to claim 1, wherein the trajectory association of the terminating trajectory segment and the new trajectory segment comprises:

Establishing a fuzzy association cost matrix between the termination trajectory segment and the new trajectory segment by using a first similarity metric;

Implementing a trajectory association between the termination trajectory segment and the new trajectory segment by using a maximum fuzzy comprehensive similarity and a threshold discriminant principle;

Filling the trailing trajectory segment on the association and the missing trajectory segment between the new trajectory segments.
The method according to claim 7, wherein the establishing a fuzzy association cost matrix between the termination trajectory segment and the new trajectory segment by using the first similarity metric comprises:

Establishing a similarity vector between the termination trajectory segment and the new trajectory segment;

Calculating a degree of matching between the terminating trajectory segment and the new trajectory segment by using the similarity vector;

Calculating a fuzzy comprehensive between the end track segment and the new track segment according to the matching degree Similarity

Establishing an associated cost matrix of the terminating trajectory segment and the new trajectory segment according to the fuzzy comprehensive similarity.
The method of claim 8 wherein the set of termination trajectory segments is defined as:
The set of new track segments is defined as:
Where n a and n b respectively represent the set of the end track segment and the number of the new track segment set;

The first similarity measure includes an appearance similarity measure, a shape similarity measure, and a motion similarity measure;

The appearance similarity measure is defined as:

Where ρ(·) is expressed as a Bhattacharyya coefficient, and H c (·) is a background weighted color histogram feature.
For the variance constant, H g (·) represents the direction gradient histogram feature,
Is a variance constant;

The shape similarity measure is defined as:

Where h i represents the height of the terminating trajectory segment T i in the image, and h j represents the height of the new trajectory segment T j in the image,
Is a variance constant;

The motion similarity measure is defined as:

among them,
Representing a Gaussian distribution, where ∑ is the variance of the Gaussian distribution, Δt is the first observed frame interval of the new trajectory segment T j that the ending trajectory segment T i finally observed,
v i is the termination position and velocity of the terminating trajectory segment T i , respectively.
v j is the starting position and speed of the new track segment, respectively;

The similarity vector is defined as:

Where Λ k (T i , T j ) ∈ [0, 1] 3 , τ gap is the associated time interval threshold,
a time frame indicating that the terminating trajectory segment T i is disconnected,
a time frame indicating the start of the new track segment T j ;

The matching degree is defined as:

Where ∧ indicates that the matching degree takes a minimum value, and ∨ indicates that the matching degree takes a maximum value;

The fuzzy comprehensive similarity is defined as:

The associated cost matrix is defined as:
The method according to claim 2, wherein the missing track segment between the terminating track segment and the new track segment on the padding association comprises:

Performing bidirectional prediction on the missing trajectory segment between the termination trajectory segment and the new trajectory segment on the association to obtain location information of the prediction point;

Obtaining rectangular frame information of the predicted point;

And filling the missing track segment according to the position information of the predicted point and the rectangular frame information.
An apparatus for video multi-target tracking based on fuzzy logic, comprising: a processor and a camera, wherein the processor is connected to the camera;

The processor is configured to perform online target motion detection on the current video frame acquired from the camera, and detect the obtained possible motion object as an observation result; perform data association on the observation result and the target prediction result, where the prediction The result is obtained by predicting at least the trajectory of the target of the previous video frame; performing trajectory management on the unpredicted prediction result and the observation result, including obtaining by using the prediction result that is not associated with the prediction result Terminating the trajectory segment and acquiring a new trajectory segment by using the observation result that is not associated, and performing trajectory association on the terminating trajectory segment and the new trajectory segment.