CN114663977A

CN114663977A - Long-time span video image pedestrian monitoring accurate tracking method

Info

Publication number: CN114663977A
Application number: CN202210296634.0A
Authority: CN
Inventors: 梁徽耀; 傅万乐; 傅小强; 陈诚
Original assignee: Longgang Tianyu Information Technology Co ltd
Current assignee: Longgang Tianyu Information Technology Co ltd
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2022-06-24

Abstract

This application adopts a structured dynamic background self-adaptive model to realize a dynamic learning process for the pedestrian detection and tracking problem. It overcomes the irregular target motion, background change, occlusion, and disappearance that commonly arise during pedestrian detection and tracking, limits the number of support vectors, and improves the algorithm's utilization of computing resources. In the pedestrian detection method, a scale search transformation target tracking algorithm is adopted, which solves the problem of possible enlargement and reduction of the target and improves tracking accuracy. In the pedestrian tracking method, the dynamic learning detection module is replaced by the structured dynamic background self-adaptive model; the improved algorithm solves occlusion, disappearance and similar problems, performs well, is extended to multi-scale estimation, and solves the drift problem. In the target feature extraction method, HOG extended features are used to extract target features, which further improves the accuracy of the tracking algorithm. The method has important uses and considerable application value in many fields.

Description

Long-time span video image pedestrian monitoring accurate tracking method

Technical Field

The application relates to a pedestrian monitoring and accurate tracking method based on video images, in particular to one based on long-time span video images, and belongs to the technical field of video pedestrian monitoring and tracking.

Background

Pedestrian target detection and tracking are important functions of intelligent video monitoring and intelligent traffic, and have many applications in the field of computer vision, such as robotics, monitoring, automation and the like. In general video tracking, pedestrian tracking can accomplish the task of estimating the trajectory of a pedestrian object throughout a sequence under the condition that only the initial position of the pedestrian object is known. The prior art tracking method learns the appearance of an object model and then evaluates a new object state framework based on the appearance model.

Pedestrian tracking can be applied to many scenes such as traffic monitoring, human-computer interaction, motion analysis, video monitoring and the like. The difficulty of pedestrian tracking arises mainly from changes in the target point, occlusion, deformation, and appearance changes due to rapid movement. Accurate estimation is difficult because the size of the moving target changes, for example when it moves along the camera axis, and because the appearance of the target changes. Other factors such as occlusion, rapid movement, and changes in illumination also present challenges for accurate tracking.

Intelligent pedestrian detection is very important for protecting drivers and pedestrians from traffic accidents, and pedestrian detection and tracking strongly influence pedestrian counting performance. Pedestrian detection and tracking automatically detects the target of interest in an image sequence and does not judge the future sequence of the target. At present, pedestrian detection and tracking technology is widely applied in places with high security requirements, such as banks, transportation, supermarkets, warehouses, comprehensive public security management and the like. In recent years, virtual reality technology has become increasingly mature, which requires a computer to acquire human actions and voices, to capture human actions at any time, and to recognize and even understand them accurately.

Pedestrian detection is widely applied in the fields of intelligent control, intelligent traffic driving assistance, advanced human-computer interaction and the like. However, pedestrian detection is a practical but difficult problem. First, pedestrians are non-rigid objects with diverse bodies and motions; secondly, pedestrians wearing different clothing may overlap one another; furthermore, the background environment is very complex and changes frequently in practice.

The prior art contains much research and many applications related to pedestrian detection. Firstly, a cascade detector consisting of several strong classifiers is trained using Haar features and pedestrian motion information, and this method achieves good results on the MIT pedestrian dataset; secondly, the histogram of oriented gradients is used as a feature descriptor with an SVM classifier, obtaining excellent results on the INRIA pedestrian database; thirdly, the target is partitioned and a part-based deformable model is proposed, combining multiple features and motion information for pedestrian detection, together with a pedestrian detection method based on multiple sparse dictionaries; and fourthly, a deformable model is adopted so that people, even heavily overlapping pedestrians, can be detected with good effect. In addition to these hand-designed features, neural networks are also applied in pedestrian detection.

In recent decades, pedestrian tracking technology has also attracted much attention: a large number of excellent tracking algorithms continue to emerge, tracking frameworks for various complex application scenarios appear one after another, and application demand in business, society, civil life and other fields grows day by day. Owing to this wide application prospect, the research and application of pedestrian detection and tracking are receiving unprecedented attention.

Pedestrian detection operates on video or still images and separates the human body regions in the images through a combination of algorithms. Technologies such as target detection, tracking, and behavior understanding form a chain in which each subsequent operation can work only if the preceding one is executed effectively. As the preliminary stage for target tracking, behavior understanding and similar operations, effective detection of the human body is the key to the whole pipeline: every subsequent processing step builds on the detection result, and the detection rate directly affects subsequent processing such as moving human body classification and tracking. Pedestrian detection algorithms for static backgrounds include the consecutive inter-frame difference method, the background difference method, the optical flow method and others. Algorithms for dynamic backgrounds include the block matching method, the optical flow estimation algorithm, the image matching method, the global motion estimation algorithm and the like.

In summary, the pedestrian monitoring and accurately tracking method based on video images in the prior art has obvious defects, and the main defects and design difficulties thereof include:

first, although research on pedestrian detection and tracking is maturing, a reliable, highly robust, and fully intelligent pedestrian detection and tracking system remains an unsolved challenge because of the complexity of the environments in which pedestrians appear. The main factors currently degrading target detection are: changes in the gray value of the pedestrian target reduce detection precision; noise inevitably affects the detection result; and interfering targets cause false alarms for moving targets, along with target overlap, camera shake and the like. The main difficulties of target tracking are: multi-target pedestrian tracking is made difficult by the variety of target types; template-based algorithms face the problems of initial template selection and self-adaptive template updating; and the prior art cannot adequately solve motion region detection, positioning accuracy, target conflict, occlusion, target position prediction and similar problems;

secondly, the prior art has no pedestrian moving-target tracking algorithm that can be applied widely across complex environments, and the following problems remain unsolved: firstly, motion segmentation: during tracking the target object appears in scenes that vary with weather, illumination, occlusion between objects, camera motion and the like, and establishing a model that stays constant under these changes is essential; secondly, target position or target deformation: when the appearance of the target object changes, prior-art template methods perform poorly at recognizing the highly variable target; thirdly, cluttered scenes hinder detection and localization of the target object; fourthly, occlusion: occlusion is unavoidable in complex scenes, and particularly in multi-target tracking different targets easily occlude one another along their respective trajectories, so a more robust model is needed to handle target matching during occlusion; fifthly, processing efficiency: real-time performance is a practical requirement of tracking, tracking efficiency matters all the more when algorithm precision is insufficient, and as video images grow ever larger, the processing speed of the algorithm must be improved to achieve real-time processing;

third, pedestrian detection is a practical but difficult problem. First, pedestrians are non-rigid objects whose bodies and motions are diverse and complex; secondly, pedestrians wearing different clothing may visibly overlap; in addition, the background environment is very complex and changes frequently in practice. In prior-art pedestrian detection, firstly, a strategy must be designed for generating and labeling samples, and it is currently unclear how to do this effectively; secondly, the labeled samples output by the classifier cannot be accurately tied to the estimate of the target: the best classification result does not imply the most accurate estimate of the moving target's position, and the assumption that the maximum classifier response coincides with the best target position prediction does not hold, because the two objectives are not fully matched during learning; thirdly, HOG in the prior art has neither rotation invariance nor scale invariance and is very sensitive to noise, so Gaussian smoothing is required. These defects mean that the prior art cannot accomplish accurate pedestrian monitoring and tracking in long-time span video images;

fourthly, in long-duration real-time pedestrian tracking, unpredictable conditions such as changes in pedestrian posture, changes in background reference objects, illumination changes, occlusion, and disappearance are always present; they distort the target model and the background model, so the pedestrian tracking module and the self-adaptive detection module must update their data to adapt. In prior-art tracking algorithms, the detection algorithm is used only during initialization to locate and model the pedestrian; afterwards the data are updated mainly by the tracking algorithm and the detection module does not intervene, so when the target is occluded or disappears, the original pedestrian tracking model cannot accurately estimate the pedestrian's motion and tracking easily fails. The self-adaptive detection module requires a parameter model trained in advance, with training samples covering various shapes, postures, and illumination conditions to ensure the robustness of the algorithm. Although adding a self-adaptive detection module to the tracking framework improves its adaptability to various changes, the prior art lacks a mechanism for adjusting the tracking module and the self-adaptive detection module in real time so as to achieve a better tracking result in long-duration real-time tracking.

Disclosure of Invention

Aiming at the pedestrian detection and tracking problem, the application adopts a structured dynamic background self-adaptive model to realize the dynamic learning process in the algorithm, overcomes the irregular target motion, background change, occlusion, disappearance and similar problems common in pedestrian detection and tracking, limits the number of support vectors in the model algorithm, and improves the algorithm's utilization of computing resources: firstly, in the pedestrian detection method, an improved HOG (histogram of oriented gradients) extended feature extraction method and a scale search transformation target tracking algorithm are adopted, which solves the problem of possible enlargement and reduction of the target and improves tracking accuracy; secondly, in the pedestrian tracking method, the dynamic learning detection module is replaced by the structured dynamic background self-adaptive model; the improved algorithm solves occlusion, disappearance and similar problems, performs well, and solves the high-dimensional, few-sample problem, and the structured dynamic background self-adaptive tracking algorithm is extended to multi-scale estimation, solving the drift problem; thirdly, in the target feature extraction method, HOG extended features are used to extract target features, which retain good invariance under geometric changes of the target and optical changes of the external environment and thereby improve the accuracy of the tracking algorithm. The method has important uses and considerable application value in many fields.

In order to achieve the technical effects, the technical scheme adopted by the application is as follows:

A long-time span video image pedestrian monitoring accurate tracking method, characterized in that: firstly, pedestrian tracking is regarded as a structured dynamic background self-adaptive target detection problem, a closed-form solution is provided, the model is updated rapidly in the primal form, and the update step size of the model is calculated; secondly, a nonlinear kernel is adopted while the linear property of the structured dynamic background self-adaptive model is retained, by pairing the nonlinear kernel with an explicit feature map; thirdly, the tracker is extended to multi-scale estimation, improving performance when tracking through large scale changes;

the method specifically comprises: a long-time span pedestrian monitoring accurate tracking framework, scale search transformation, an improved HOG extended feature extraction method, structured dynamic background self-adaptive kernel function setting, and a structured dynamic background self-adaptive optimization algorithm;

firstly, on a pedestrian detection method, an improved HOG extended feature extraction method is adopted: for the change of the size of the target pedestrian relative to the video picture, a scale search transformation target tracking algorithm is adopted, the problem of target amplification and reduction is solved, and the position and the size of the target are accurately detected;

secondly, on the basis of a long-time span pedestrian monitoring accurate tracking frame, a dynamic learning detection module is replaced by a structured dynamic background self-adaptive model to track pedestrians, the problems of shielding and disappearance are solved, the problems of high dimension and nonlinearity of few samples are solved through an online structured dynamic self-adaptive learning method, a classifier is trained by acquiring data in real time according to the tracking result of the previous frame, the structured dynamic background self-adaptive tracking algorithm is expanded to multi-scale estimation, and the drift problem is solved;

and thirdly, on the pedestrian feature extraction method, an HOG algorithm is improved, target feature extraction is carried out by adopting an HOG expansion feature, the HOG expansion feature represents the local gradient direction and gradient intensity distribution feature of the image, the distribution of the edge direction can well represent the outline of the pedestrian target under the condition that the specific position of the edge is unknown, and the pedestrian target still has good invariance when the physical geometric change of the target and the optical change of the external environment occur.

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the long-time span pedestrian monitoring accurate tracking framework comprises three modules: pedestrian tracking, dynamic learning, and self-adaptive detection. The pedestrian tracking module estimates the motion of the target between consecutive frames and also detects tracking failure: when the pedestrian in the video disappears or is occluded, the target cannot be tracked accurately, and the self-adaptive detector is then notified to detect again so that the target can be tracked again. The self-adaptive detection module cascades a random forest self-adaptive detector, a variance detector, and a nearest neighbor detector to detect the target accurately. Through online dynamic learning, the pedestrian tracking and self-adaptive detection processes are updated in real time and correct and complement each other, coping with the various unexpected situations in scenes where pedestrians appear.
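The cascade of detectors described above can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the stage order (cheapest test first), the thresholds `min_var` and `nn_threshold`, the relative-similarity rule, and the `forest_posterior` callable standing in for the random forest stage are all assumptions made for illustration.

```python
import numpy as np

def ncc_similarity(a, b):
    """Normalised cross-correlation mapped to [0, 1] (1 = identical pattern)."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return 0.5 * (float(np.mean(a * b)) + 1.0)

def cascade_detect(patch, min_var, forest_posterior, positives, negatives,
                   nn_threshold=0.6):
    """Accept a patch only if it passes all three stages in turn."""
    patch = np.asarray(patch, dtype=float)
    # Stage 1 (variance detector): reject flat, texture-free patches.
    if np.var(patch) < min_var:
        return False
    # Stage 2 (hypothetical random forest stand-in): reject low posteriors.
    if forest_posterior(patch) < 0.5:
        return False
    # Stage 3 (nearest neighbour): compare with stored templates.
    s_pos = max(ncc_similarity(patch, p) for p in positives)
    s_neg = max(ncc_similarity(patch, n) for n in negatives)
    return s_pos / (s_pos + s_neg + 1e-12) >= nn_threshold
```

A patch rejected by an early stage never reaches the later, more expensive stages, which is the usual motivation for a cascade.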

Further, in the long-time span video image pedestrian monitoring accurate tracking method, a structured dynamic background self-adaptive tracking model is set, providing a mechanism for adjusting the tracking module and the self-adaptive detection module in real time: the tracking result of the previous frame is used in real time and target features are extracted iteratively to update the tracker; at the same time, the positive and negative sample data generated by self-adaptive detection are used to evaluate the outputs of the detector and the tracker in real time and correct them promptly, preventing errors from accumulating in subsequent tracking.

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the scale search transformation is as follows: a scale search target tracking filter is used together with the structured dynamic background self-adaptive model to realize the scale search transformation. The scale search target tracking filter estimates the scale of the target in the current frame: the center of the target box in the previous frame is obtained and recorded as the center point, the scale is recalculated around it, and the resulting value is passed to the next frame to complete the scale change.
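The scale search step can be sketched as follows. This is a minimal illustration under stated assumptions, not the filter of the application: the geometric scale ladder, the parameters `n_scales` and `step`, and the caller-supplied `score_fn` (standing in for the response of the scale filter) are hypothetical.

```python
import numpy as np

def scale_candidates(base_w, base_h, n_scales=5, step=1.05):
    """Return (scale, width, height) triples centred on scale 1.0."""
    exponents = np.arange(n_scales) - (n_scales - 1) / 2
    return [(float(step ** e), base_w * float(step ** e), base_h * float(step ** e))
            for e in exponents]

def search_scale(score_fn, center, base_w, base_h, n_scales=5, step=1.05):
    """Score boxes that share the previous centre and return the best one."""
    cx, cy = center
    best = None
    for s, w, h in scale_candidates(base_w, base_h, n_scales, step):
        box = (cx - w / 2, cy - h / 2, w, h)  # (left, top, width, height)
        score = score_fn(box)
        if best is None or score > best[0]:
            best = (score, s, box)
    return best[1], best[2]
```

The returned scale would then be carried into the next frame as the new base size, matching the description of passing the recalculated scale forward.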

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the improved HOG extended feature extraction method computes HOG extended features and selects effective features from them. In extending the HOG features, f(u, v) denotes the pixel intensity at coordinates (u, v):

formula 1

f_u and f_v form the image gradient along the u and v directions respectively, and the gradient magnitude m(u, v) and orientation θ(u, v) at coordinates (u, v) are calculated as:

formula 2

Here, the parameters are: firstly, the cell size C: a cell contains image pixel information and is determined by a designated upper-left position (x, y), a width w, and a height h; secondly, the block size R_pq: a block is a rectangle formed by a group of cells, within which the histogram is normalized to reduce the adverse influence of illumination; its position and size are defined by (x', y') and (w', h') respectively, blocks of each size are used independently for normalization, and p and q are the numbers of cells horizontally and vertically; thirdly, the number of intervals l and the number of grids B;

Given the number of grids B, the cell size C, the block size R_pq, and the number of intervals l, with indices denoting the cell at coordinates (u, v), the block size, and the interval respectively, the extended HOG feature is calculated by:

formula 3

Formula 4

Here the central pixel of the block is the pixel located at the center of the cell, and these features are enumerated as a feature vector x;

the method covers all expansions of the above parameters and selects the effective features from the high-dimensional feature set to locate the pedestrian, achieving a high-precision self-adaptive pedestrian detection rate.
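Because the formula images are not reproduced in this text, the gradient and histogram steps can only be illustrated under assumptions. The NumPy sketch below uses central differences for f_u and f_v, the usual magnitude and unsigned-orientation definitions, and L2 block normalization; the function names, the bin count `n_bins`, and the convention that u indexes columns and v indexes rows are illustrative choices, not taken from the application.

```python
import numpy as np

def image_gradients(f):
    """Central-difference gradients with magnitude m and orientation theta."""
    f = np.asarray(f, dtype=float)
    f_u = np.zeros_like(f)
    f_v = np.zeros_like(f)
    f_u[:, 1:-1] = f[:, 2:] - f[:, :-2]  # difference along u (columns)
    f_v[1:-1, :] = f[2:, :] - f[:-2, :]  # difference along v (rows)
    m = np.hypot(f_u, f_v)
    theta = np.mod(np.arctan2(f_v, f_u), np.pi)  # unsigned angle in [0, pi)
    return m, theta

def cell_histogram(m, theta, x, y, w, h, n_bins=9):
    """Magnitude-weighted orientation histogram of the cell at (x, y), size w x h."""
    mag = m[y:y + h, x:x + w].ravel()
    ang = theta[y:y + h, x:x + w].ravel()
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins, weights=mag, minlength=n_bins)

def normalize_block(cell_hists, eps=1e-6):
    """L2-normalise the concatenated histograms of the cells in one block."""
    v = np.concatenate(cell_hists)
    return v / np.sqrt(np.sum(v ** 2) + eps ** 2)
```

Varying the cell size, block size, and bin count over several settings and concatenating the normalized blocks would give a high-dimensional vector of the kind the text calls the feature vector x.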

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the structured dynamic background self-adaptive kernel function is set as follows. After the image is preprocessed, it is analyzed, and several training samples around the target are extracted and labeled with their true values to initialize the model; the model is then trained on these samples to obtain the optimal decision surface and the required model parameters. When the next frame arrives, a search is made within a certain range around the target position estimated in the previous frame to obtain several candidate samples; these are classified by the structured dynamic background self-adaptive model trained on the previous frame, and the final estimated target position is determined by a weighted sum of the samples classified as target. After the target position is determined, the classifier and the model are updated. This cycle repeats until all image frames are processed, giving the estimated target position in every frame and thereby completing the target tracking task in the video sequence;
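The position estimate by weighted summation can be sketched as below. The weighting rule is an assumption: the text does not say how the weights are chosen, so this sketch weights each candidate classified as target by its (positive) classifier score.

```python
import numpy as np

def estimate_position(candidates, scores):
    """Weighted mean of candidate centres whose classifier score is positive.

    candidates: sequence of (x, y) centres sampled around the previous position.
    scores: classifier outputs, where a positive value means "target".
    Returns None when no candidate is classified as target.
    """
    candidates = np.asarray(candidates, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = scores > 0
    if not keep.any():
        return None
    weights = scores[keep] / scores[keep].sum()
    x, y = weights @ candidates[keep]
    return float(x), float(y)
```

Returning `None` gives the caller a natural hook for the failure case, for example handing control to the self-adaptive detector for re-detection.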

Under the linearly separable condition, the structured dynamic background self-adaptive model searches for an optimal classification surface that minimizes the structural risk. When the data are not linearly separable, a kernel function is applied so that the data can be processed in a high-dimensional space. The training samples of the classifier never appear individually but only in pairs in the form of inner products, written < x, y >, and the decision function of the structured dynamic background self-adaptive model is as follows:

formula 5

The output of the prediction function f forms a sequence (x, y), where y is the change value of the target, L is the objective function, α is the Lagrange multiplier, w is the normal vector, b is the constraint threshold, g is the decision coefficient, and α_i (i = 1, 2, …, n) is the optimal solution of the following quadratic optimization problem:

formula 6

Formula 7

C is a constant that balances maximizing the margin against minimizing the training error. The feature x is transformed nonlinearly and mapped to a high-dimensional space; assuming the mapping function of the high-dimensional space is z = Φ(x), the decision function of the new feature in the high-dimensional space becomes:

formula 8

The corresponding secondary optimization problem is as follows:

formula 9

Formula 10

The structured dynamic background self-adaptive model meets the following conditions:

formula 11

Comparing the functions before and after the mapping shows that, however the features change, the structured dynamic background self-adaptive model only needs the inner product < x_i, x > of the original features to be replaced by the inner product < Φ(x_i), Φ(x) > of the features in the high-dimensional space. For this inner product of the new features, a mapping relation K is assumed to satisfy:

formula 12

The decision function in the high dimensional space can be written as:

formula 13

Comparing the decision functions f in formulas 5 and 13 shows that, no matter how many dimensions the target feature is mapped to, the kernel function value K(x_i, x) computed in the original space equals the inner product < Φ(x_i), Φ(x) > in the high-dimensional space. The decision function of the features in the high-dimensional space can therefore be obtained and the data classified without performing the high-dimensional mapping at all, which avoids the curse of dimensionality; moreover, computing K(x_i, x) costs little more than computing the original-space inner product < x_i, x > directly. The necessary and sufficient condition for the existence of a kernel function is: K(x_i, x) is an arbitrary symmetric function, and for any Φ ≠ 0 with ∫ Φ²(x) dx < ∞ the following holds:

formula 14

A kernel function satisfying this condition is set to carry out the inner product operation of the training samples in the high-dimensional space, avoiding the marked increase in computational complexity and the curse of dimensionality that computing Φ would cause.
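The equivalence between a kernel value in the original space and an inner product in the mapped space can be checked numerically on a small case. The sketch below uses the degree-2 homogeneous polynomial kernel purely as a familiar example (the application does not name its kernel), together with a finite-sample check of the kernel condition: the Gram matrix of a valid kernel is symmetric positive semidefinite.

```python
import numpy as np

def poly2_kernel(x, z):
    """K(x, z) = <x, z>^2, evaluated entirely in the original space."""
    return float(np.dot(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit map Phi for 2-D inputs with <Phi(x), Phi(z)> = <x, z>^2."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

def is_positive_semidefinite(K, tol=1e-9):
    """Finite-sample check: a valid kernel gives a symmetric PSD Gram matrix."""
    K = np.asarray(K, dtype=float)
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)
```

For 2-D inputs the explicit map already needs three coordinates, and the count grows quickly with input dimension and polynomial degree, whereas the kernel evaluation stays a single inner product followed by a square.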

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the structured dynamic background self-adaptive optimization algorithm is as follows: to obtain the optimal classification decision surface, the structured dynamic background self-adaptive model computes the maximum functional margin and the geometric margin;

when solving for the maximum functional margin, the decision surface is derived to satisfy:

formula 15

The output of the prediction function f constitutes a sequence (x, y), where y is the change value of the target, L is the objective function, α is the Lagrange multiplier, w is the normal vector, b is the constraint threshold, and g is the decision coefficient. Solving for the maximum of 1/‖w‖ is equivalent to solving for the minimum of ½‖w‖², so the original problem is equivalent to:

formula 16

The original problem is thereby converted into a convex optimization problem, specifically a linear quadratic optimization problem, which is solved in its Lagrange dual form. After Lagrange multipliers are added to the constraints, the problem becomes:

formula 17

Using the lagrange function to obtain:

formula 18

Formula 19

Substituting the above equations into the decision function gives:

formula 20

where K(x_i, x) is a kernel function, and the original quadratic optimization problem is converted into:

formula 21

C is a constant that balances maximizing the margin against minimizing the training error. The meaning of the different values of α is as follows: when α_i = 0, the sample is classified correctly and lies inside the boundary; when 0 < α_i < C, the sample is a support vector lying exactly on the boundary; when α_i = C, the sample lies between the two parallel boundaries;
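The three cases for α_i can be expressed as a small helper. This is only an illustration: the function name, the labels it returns, and the numerical tolerance `tol` are assumptions, with the tolerance added because multipliers computed in floating point rarely equal 0 or C exactly.

```python
def multiplier_role(alpha, C, tol=1e-8):
    """Describe a training sample by the value of its multiplier alpha."""
    if alpha <= tol:
        return "non-support"      # classified correctly, inside the boundary
    if alpha >= C - tol:
        return "bound"            # lies between the two parallel boundaries
    return "margin-support"       # support vector exactly on the boundary
```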

When updating an α_i that does not satisfy the above conditions, in order to satisfy the condition Σ α_i y_i = 0, a constraint relation is set:

formula 22

Here α_j^new is the new value of α_j and α_j^old is its old value. Eliminating α_i yields a single-variable convex quadratic optimization problem in α_j; then, taking into account the constraint 0 < α_j < C, the analytic expression of α_j^new is obtained as follows:

formula 23

H and L are the upper and lower bounds. If the solution of α_j^new is viewed as a linear programming model and bounded above and below, the analytic expressions of H and L are as follows:

formula 24

Formula 25

For α_i^new:

formula 26

Once these are established, the optimal decision surface is obtained; the two main steps are:

the first step: select a pair α_i and α_j to be updated, determined by searching so that the resulting objective function value is closest to the global optimum;

the second step: optimize the objective function with respect to α_i and α_j while keeping α_k (k ≠ i, j) unchanged;

The search for α_i and α_j proceeds mainly as follows: all multipliers are scanned, and a multiplier violating the KKT condition is taken as the first update object α_j; then, among all multipliers that do not violate the KKT condition, the one maximizing |B_i - B_j| is selected as α_i, where B_i denotes the difference between the predicted value and the true value;

The optimization of α_i and α_j: suppose that in some iteration α₁ and α₂ need to be updated; the objective function then becomes:

formula 27

In this process, updating α₁ and α₂ comprises:

step 1: the upper and lower bounds H and L are first calculated by formulas 24 and 25;

step 2: the second derivative of the objective function is calculated as:

formula 28

step 3: α₁ and α₂ are updated according to formula 26;

In each iteration, only α_i and α_j are selected as the pair to update while the other components remain unchanged. After the updated values of α_i and α_j are computed, they are substituted into the objective function to solve for the other components, giving a new objective function. Within the structured dynamic background self-adaptive model algorithm, this procedure for solving the optimal decision surface serves as the method for determining the decision plane.
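The pairwise update described above resembles the standard sequential minimal optimization (SMO) step for a kernel SVM dual. Since formulas 22 to 28 are not reproduced in this text, the sketch below follows the textbook SMO update rather than the applicant's exact expressions; the argument names are illustrative, with E1 and E2 playing the role of B_i (predicted value minus true value) and K11, K12, K22 the kernel values between the two selected samples.

```python
def smo_pair_update(a1, a2, y1, y2, E1, E2, K11, K12, K22, C):
    """One textbook SMO step: update (a1, a2) with all other multipliers fixed."""
    # Bounds L and H keep 0 <= a2 <= C while a1*y1 + a2*y2 stays constant.
    if y1 != y2:
        L, H = max(0.0, a2 - a1), min(C, C + a2 - a1)
    else:
        L, H = max(0.0, a1 + a2 - C), min(C, a1 + a2)
    # Curvature (second derivative) of the objective along the constraint line.
    eta = K11 + K22 - 2.0 * K12
    if L >= H or eta <= 0:
        return a1, a2  # no progress possible for this pair
    # Unconstrained optimum for a2, then clip to [L, H].
    a2_new = min(H, max(L, a2 + y2 * (E1 - E2) / eta))
    # Recover a1 from the equality constraint.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new
```

After each step the quantity a1·y1 + a2·y2 is unchanged, so the equality constraint on the multipliers continues to hold while the objective improves along that line.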

Further, in the long-time span video image pedestrian monitoring accurate tracking method, the improved HOG + structured dynamic background self-adaptive pedestrian tracking method is as follows: the structured dynamic background self-adaptive tracking model has the following characteristics: firstly, the structured dynamic background self-adaptive classifier has a closed form, so that a nonlinear classifier is trained and evaluated more quickly; secondly, the structured dynamic background self-adaptive tracker uses high-dimensional linear features to better represent the target; thirdly, multi-scale estimation improves performance when tracking through large scale changes.

Compared with the prior art, the innovation points and advantages of the application are as follows:

first, aiming at the pedestrian detection and tracking problem, the application adopts a structured dynamic background self-adaptive model to realize the dynamic learning process in the algorithm, overcomes the irregular target motion, background change, occlusion, disappearance and similar problems common in pedestrian detection and tracking, limits the number of support vectors in the structured dynamic background self-adaptive model algorithm, and improves the algorithm's utilization of computing resources: firstly, in the pedestrian detection method, an improved HOG extended feature extraction method is adopted, and, to detect the position and size of the target accurately, a scale search transformation target tracking algorithm is applied to changes in the size of the target pedestrian relative to the video frame, which solves the problem of possible enlargement and reduction of the target and improves tracking accuracy; secondly, on the basis of the long-time span pedestrian monitoring accurate tracking framework, the dynamic learning detection module is replaced by the structured dynamic background self-adaptive model; the improved algorithm solves occlusion, disappearance and similar problems and performs well on benchmark test datasets, the online structured dynamic self-adaptive learning method solves the high-dimensional, few-sample problem, and the structured dynamic background self-adaptive tracking algorithm is extended to multi-scale estimation, solving the drift problem; thirdly, in the target feature extraction method, the HOG algorithm is improved and HOG extended features are used to extract target features, which retain good invariance under geometric changes of the target and optical changes of the external environment and thereby improve the accuracy of the tracking algorithm;

Secondly, a scale search transformation is added to handle changes in the relative size of the target in the video. A scale search and target estimation method is constructed on a one-dimensional independent correlation filter; it is highly portable, works together with the structured dynamic background adaptive algorithm, and tracks within the long-time span pedestrian monitoring strategy framework. The scale filter first samples the target at a series of different scales, then learns dynamically online, and finally obtains an accurate scale and passes the scale parameter into the calculation for the next frame. Compared with a standard method, the appearance change of the target, including its scale change, is learned directly and dynamically, which greatly improves pedestrian recognition and monitoring precision and also reduces the computational cost of the algorithm;

Thirdly, a structured dynamic background adaptive model is provided on the basis of the long-time span pedestrian monitoring accurate tracking framework. The traditional linear SVM is replaced by a structured dynamic background adaptive model that can handle nonlinear cases, giving higher classification precision and higher speed, and the idea of tracking under a dynamic background is added. The idea of classifying simple static pictures is introduced into detection-based tracking, and the long-time span tracking framework is then applied to dynamic pedestrian tracking. On the basis of the online dynamic learning algorithm, the dynamic learning module is replaced by the structured dynamic background adaptive model, which improves the adaptability of the algorithm to background change and target motion, handles target occlusion and similar problems well, and improves the accuracy and real-time performance of pedestrian monitoring and tracking, with a wide application range and strong robustness and reliability;

Fourthly, the proposed structured dynamic background adaptive kernel function setting and optimization algorithm has clear advantages over TLD and correlation filter methods. The TLD tracking method offers good real-time performance and good framework portability, but tracking failure occurs easily when the pedestrian is highly similar to the background, and the TLD algorithm loses the tracked target shortly after the pedestrian is occluded; the structured dynamic background adaptive model performs notably better on both points. Experimental comparison on data sets shows that correlation filter algorithms lose their tracking ability when the target is occluded or disappears from the lens, whereas the structured dynamic background adaptive model retains good tracking ability, is robust to occlusion, light and shadow changes, noise and the like, and gives better stability and effect in long-time span tracking. It has an important function and great application value in many fields.

Drawings

Fig. 1 is a structure diagram of the long-time span pedestrian monitoring accurate tracking framework.

Fig. 2 is a schematic diagram of the method for calculating the HOG extended features and selecting the valid features.

Fig. 3 is a graph of the tracking accuracy of different features on different data sets.

Fig. 4 is a graph of the tracking success rate of different features on different data sets.

Fig. 5 is an example diagram of the training process and the classification process in the structured dynamic background adaptive algorithm.

Fig. 6 is a constraint relation diagram of the structured dynamic background adaptive linear programming model.

Fig. 7 is a tracking effect diagram of the structured dynamic background adaptive model with scale search transformation and of QSEV.

Fig. 8 is a graph comparing the pedestrian tracking accuracy of several algorithms on different data sets.

Fig. 9 is a graph comparing the pedestrian tracking success rate of several algorithms on different data sets.

Detailed description of the invention

The following further describes the technical solution of the long-time span video image pedestrian monitoring accurate tracking method provided in the present application with reference to the accompanying drawings, so that those skilled in the art can better understand the present application and can implement the present application.

Pedestrian detection is widely applied in fields such as intelligent control, intelligent traffic driving assistance and advanced human-computer interaction. However, pedestrian detection remains a challenging problem of practical significance. First, pedestrians are non-rigid objects whose body shapes and motions are diverse and complex; secondly, pedestrians wearing different clothing may visibly overlap one another. In addition, the background environment is very complex and changes frequently in practice. Pedestrian detection has therefore become a hot spot for intelligent applications.

The detection-based tracking method mainly consists of a trained classifier that distinguishes pedestrians from the surrounding complex background. Typically, given the initial pedestrian location, a sliding window is used; the conventional approach generates a set of training samples with binary labels and then updates the classifier. These algorithms divide the adaptive module of the tracker into two parts: the generation and labeling of samples, and the updating of the classifier.

However, in extensive use this division causes a series of problems. First, a strategy must be designed for generating and labeling samples, yet it is not clear how to do this effectively; whether a sample is labeled positive or negative is generally decided by a predefined measure such as the distance from the sample to the predicted target position. Secondly, the goal of the classifier is to correctly predict the binary label of a sample, while the goal of the tracker is to accurately predict the target position. The labels output by the classifier do not map precisely onto the target estimate, that is, the optimal classification result does not imply the most accurate estimate of the moving target's position. Since the two goals are not aligned during learning, the assumption that the maximum classifier response coincides with the optimal target position prediction does not hold.

The application addresses these shortcomings of prior-art pedestrian detection and tracking, mainly in the following respects:

1. In the pedestrian detection method, an improved HOG extended feature extraction method is adopted. To detect the position and size of the target accurately, a scale search transformation target tracking algorithm handles changes in the size of the target pedestrian relative to the video picture, which solves the possible enlargement and reduction of the target and improves tracking accuracy;

2. In the pedestrian tracking method, starting from the traditional HOG + SVM detection algorithm and based on the long-time span pedestrian monitoring accurate tracking framework provided by the application, the dynamic learning detection module is replaced by a structured dynamic background adaptive model to track pedestrians, so that the improved algorithm can handle occlusion, disappearance and similar problems. The structured dynamic background adaptive model shows good performance on a benchmark test data set, the online structured dynamic adaptive learning method addresses the problem of high dimensionality with few samples, and finally the structured dynamic background adaptive tracking algorithm is extended to multi-scale estimation to address the drift problem;

3. In the target feature extraction method, the HOG algorithm is improved and HOG extended features are used for target feature extraction. Because the HOG extended features operate on image gradients, they retain good invariance under geometric changes of the target and optical changes of the external environment; extracting the target pedestrian features with the improved HOG extended features better improves the tracking accuracy of the tracking algorithm.

First, traditional HOG + SVM pedestrian detection

Pedestrian detection is a difficult problem: pedestrian targets are affected by changing real-world environments and appear in a variety of poses and clothing against varying backgrounds and lighting. To address this, prior-art HOG + SVM pedestrian detection adopts a linear SVM as the classifier, which ensures speed and simplicity. The HOG algorithm evaluates local histograms of normalized image gradient directions. The appearance and shape of a local target can be characterized by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions, so the descriptor is sensitive to local features. A specific flow chart of the HOG + SVM algorithm is shown in Fig. 1.
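As an illustration of the local gradient-direction histogram described above, the following is a minimal sketch of a single-cell histogram in the standard HOG style. It is not the extended feature of the application; the function name `cell_orientation_histogram`, the choice of 9 unsigned bins and the L2 normalization are assumptions made for the example.

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one grayscale cell.

    Illustrative only: bin count and L2 normalization are assumed defaults.
    """
    cell = np.asarray(cell, dtype=float)
    # np.gradient returns the derivative along rows first, then along columns.
    g_row, g_col = np.gradient(cell)
    magnitude = np.hypot(g_col, g_row)
    # Unsigned orientation folded into [0, 180) degrees.
    orientation = np.degrees(np.arctan2(g_row, g_col)) % 180.0
    bin_width = 180.0 / n_bins
    # Guard against a rounding result of exactly 180 degrees.
    bin_index = np.minimum((orientation // bin_width).astype(int), n_bins - 1)
    # Each pixel votes for its bin, weighted by its gradient magnitude.
    hist = np.bincount(bin_index.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# A horizontal intensity ramp: every gradient points along the column axis,
# so all votes fall into the first bin.
ramp = np.tile(np.arange(8, dtype=float), (8, 1))
hist = cell_orientation_histogram(ramp)
```

Because only gradient directions are accumulated, adding a constant to every pixel of the cell leaves the histogram unchanged, which is one source of the robustness to lighting mentioned above.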

However, HOG has its own drawbacks. First, detection is difficult under occlusion or disappearance. Secondly, HOG itself has no rotation invariance, because no dominant gradient direction is selected and the gradient direction histogram is not rotated; rotation invariance is instead obtained by training on samples in different rotation directions. Thirdly, HOG itself has no scale invariance. Finally, HOG is very sensitive to noise, so Gaussian smoothing is also required.

Second, improved HOG + structured dynamic background self-adaptive pedestrian tracking method

The method is based on a structured dynamic background adaptive tracking model. First, pedestrian tracking is treated as a structured dynamic background adaptive target detection problem: the model is updated rapidly in its primal form, and a closed-form solution is obtained by analyzing the relation between the dual variables and the primal variables, from which the update step size of the model is calculated. Secondly, a nonlinear kernel is adopted while keeping the linear property of the structured dynamic background adaptive model, by pairing the nonlinear kernel with an explicit feature map. Thirdly, to overcome the drift problem caused by large scale variations, the tracker is extended with multi-scale estimation. Experiments use image sequences from a benchmark data set, and the results show that structured dynamic background adaptive tracking achieves state-of-the-art performance.

The main contributions include: first, the structured dynamic background adaptive classifier can train and evaluate a nonlinear classifier more quickly in closed form; secondly, the structured dynamic background adaptive tracker adopts high-dimensional linear features to better represent the target; thirdly, multi-scale estimation further improves tracking performance under large scale changes.

(I) Long-time span pedestrian monitoring accurate tracking framework

The long-time span pedestrian monitoring accurate tracking method consists of three modules: pedestrian tracking, dynamic learning and adaptive detection. The framework is shown in Fig. 1. The pedestrian tracking module estimates the motion of the target between consecutive frames and also detects tracking failure: when the pedestrian in the video disappears or is occluded and the target can no longer be tracked accurately, the adaptive detector is notified to detect again so that the target can be tracked again. The adaptive detection module cascades a random forest adaptive detector, a variance detector and a nearest neighbor detector to detect the target accurately. The pedestrian tracking and adaptive detection processes are updated in real time through online dynamic learning and correct and supplement each other, so as to cope with the various unexpected situations in scenes where pedestrians appear.

In long-time real-time pedestrian tracking, unpredictable conditions are always present, such as changes in the pedestrian's posture, changes in background reference objects, changes in illumination, occlusion and disappearance. These distort the target model and the background model, so the pedestrian tracking module and the adaptive detection module must update their data to adapt. In prior-art tracking algorithms, a detection algorithm is used only during initialization to locate and model the pedestrian; in the subsequent tracking the data are updated mainly by the tracking algorithm and the detection module does not intervene. When the target is occluded or disappears, the original pedestrian tracking model cannot accurately estimate the pedestrian's motion, and tracking failure occurs easily. The adaptive detection module handles this situation, but it requires a parameter model trained in advance, and the training samples should cover various shapes, postures and illumination conditions to ensure the robustness of the algorithm. Adding the adaptive detection module to the tracking framework improves its adaptability to various changes, but to achieve a better tracking effect in long-time real-time tracking, a mechanism that adjusts the tracking module and the adaptive detection module in real time must be designed.

(II) Scale search transformation

The application adopts a scale search target tracking filter that works together with the structured dynamic background adaptive model to realize the scale search transformation. The scale search target tracking filter estimates the scale of the target in the current frame: it takes the center position of the target box in the previous frame as the center point, recalculates the scale, and passes the value to the next frame to complete the scale change.
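The scale search step can be sketched as follows. This is a simplified illustration that scores a set of candidate box sizes around a fixed center and keeps the best one; it stands in for the learned scale filter and is not the filter of the application. The names `scale_factors`, `search_scale` and `toy_score`, and the defaults of 33 scales with a step of 1.02, are assumptions made for the example.

```python
import numpy as np

def scale_factors(n_scales=33, step=1.02):
    """Geometric series of scale factors centered on 1.0 (assumed defaults)."""
    exponents = np.arange(n_scales) - (n_scales - 1) / 2.0
    return step ** exponents

def search_scale(score_fn, center, base_size, n_scales=33, step=1.02):
    """Score candidate sizes around a fixed center and return the best one.

    score_fn(center, size) is a hypothetical callable that returns a higher
    value for a better match; in a real tracker this role is played by the
    response of the learned scale filter.
    """
    factors = scale_factors(n_scales, step)
    sizes = [(base_size[0] * f, base_size[1] * f) for f in factors]
    scores = [score_fn(center, size) for size in sizes]
    best = int(np.argmax(scores))
    return sizes[best], float(factors[best])

def toy_score(center, size, true_width=48.0):
    """Toy score: candidates whose width is closer to 48 pixels score higher."""
    return -abs(size[0] - true_width)

# The previous frame gave a 40 x 80 box; the target has grown to about 48 wide.
best_size, best_factor = search_scale(toy_score, center=(100, 60),
                                      base_size=(40.0, 80.0))
```

The selected size would then serve as the base size for the next frame, which corresponds to passing the recalculated scale forward as described above.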

(III) improved HOG extended feature extraction method

The method of computing the HOG extended features and selecting valid features from them is illustrated in Fig. 2. With f(u, v) denoting the pixel intensity at coordinates (u, v):

formula 1

formula 2

The extended features are parameterized by the number of grids B (B = 1, 3, 9), the cell size C (w × h in pixels = 5 × 5, 5 × 10, 10 × 5) and the block size R_pq (p × q = 1 × 1, 2 × 2, 3 × 3); l is the number of intervals. With the cell size taken at coordinates (u, v) and the indices denoting the size of the rectangular block and the interval respectively, the HOG extended feature is calculated as follows:

formula 3

Formula 4

The pixel at the center of the block is located at the center of the cell, and the features are enumerated as a feature vector x. If the parameters are applied to a 30 × 60 (pixel) image, the number of features is 154362 = {(30-5+1) × (60-5+1) + (30-5+1) × (60-10+1) + (30-10+1) × (60-5+1)} × 3 × (1+3+9);

The application includes all of these parameter extensions (Fig. 2) and selects effective features from the high-dimensional feature set to locate the pedestrian. Comparison of the experimental results (Fig. 3) shows the feasibility of the proposed improvement, which achieves a high-precision adaptive pedestrian detection rate.
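The feature count quoted above for a 30 × 60 image can be checked with a few lines of arithmetic. The sketch below assumes the reading suggested by the expression itself: one feature position per valid cell placement for each of the three cell sizes, multiplied by the three block sizes and by the sum of the three values of B. The function name `extended_hog_feature_count` is hypothetical.

```python
def extended_hog_feature_count(width, height, cell_sizes, n_block_sizes, b_values):
    """Count extended HOG features for an image of the given size.

    For each cell size (cw, ch) there are (width - cw + 1) * (height - ch + 1)
    valid placements; the total is multiplied by the number of block sizes and
    by the sum of the B values.
    """
    placements = sum((width - cw + 1) * (height - ch + 1)
                     for cw, ch in cell_sizes)
    return placements * n_block_sizes * sum(b_values)

count = extended_hog_feature_count(
    width=30, height=60,
    cell_sizes=[(5, 5), (5, 10), (10, 5)],
    n_block_sizes=3,
    b_values=(1, 3, 9),
)
print(count)  # 154362
```

The three placement counts are 1456, 1326 and 1176, which sum to 3958; multiplying by 3 × 13 gives 154362, in agreement with the figure stated in the text.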

Comparing the data in Fig. 3 and Fig. 4 shows that the improved HOG extended feature extraction has a significant advantage over the other two algorithms. The Human2 sequence is used for indoor pedestrian detection; its accuracy and success rate are both quite high, which indicates that the algorithm performs very well.

(IV) structured dynamic background adaptive kernel function setup

In the target tracking framework of the application, the structured dynamic background adaptive model is the most important part. It accurately classifies the labeled samples, judges which regions in the samples are likely target regions and which are background regions, and then determines the final estimated target position according to the likelihood of the candidate target regions. Once the estimated target position is given, the features of the estimated target are fed into the structured dynamic background adaptive model to update the parameters and the support vectors of the model. Fig. 5 gives an example of the training process and the classification process in the algorithm.

After the image has been preprocessed, it is analyzed: several training samples are extracted around the target and labeled with their true values to initialize the model, and the model is then trained on these samples to obtain the optimal interface and the required model parameters. When the next frame arrives, a search is made within a certain range around the target position estimated in the previous frame to obtain several candidate samples, which are classified with the structured dynamic background adaptive model trained on the previous frame. The final estimated target position is determined by a weighted sum of the samples classified as target. After the target position is determined, the classifier and the model are updated. This cycle repeats until all image frames are processed, giving the estimated position of the target in every frame and thereby accomplishing the target tracking task in the video sequence.
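The weighted-sum step in the loop above can be illustrated with a small sketch. The source does not specify the weights, so two assumptions are made here: a candidate counts as "target" when its classifier score is positive, and the weights are the positive scores normalized to sum to one. The function name `estimate_position` is hypothetical.

```python
import numpy as np

def estimate_position(candidates, scores):
    """Weighted mean of the candidate centers that are classified as target.

    Assumptions for illustration: score > 0 means "target", and each target
    candidate is weighted by its score. Returns None if no candidate is
    classified as target, which a tracker could treat as a lost target.
    """
    candidates = np.asarray(candidates, dtype=float)
    scores = np.asarray(scores, dtype=float)
    is_target = scores > 0
    if not is_target.any():
        return None
    weights = scores[is_target] / scores[is_target].sum()
    return weights @ candidates[is_target]

# Three candidate centers; the third is classified as background.
candidates = [(10.0, 20.0), (12.0, 20.0), (30.0, 5.0)]
scores = [1.0, 3.0, -2.0]
position = estimate_position(candidates, scores)  # weights 0.25 and 0.75
```

Returning `None` when nothing is classified as target gives the surrounding loop a natural hook for handing control to the detection module.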

formula 5

formula 6

Formula 7

formula 8

The corresponding quadratic optimization problem is as follows:

formula 9

Formula 10

formula 11

formula 12

The decision function in the high dimensional space can be written as:

formula 13

Comparing the f functions in formula 5 and formula 13 shows that, no matter how many dimensions the target features are mapped to, it is enough to know the kernel function value K(x_i, x) in the original space that corresponds to the inner product in the high-dimensional space. The decision function of the features in the high-dimensional space can then be obtained and the data classified. The mapping into the high-dimensional space is avoided entirely in this process, which avoids the curse of dimensionality, and computing the kernel function K(x_i, x) costs about the same as directly computing the inner product <x_i, x> in the original space. For an arbitrary symmetric function K(x_i, x), the necessary and sufficient condition for it to be a kernel function is that, for any Φ ≠ 0 with ∫Φ²(x)dx < ∞, the following holds:

formula 14

Therefore, choosing a kernel function that satisfies this condition completes the inner product operation of the training samples in the high-dimensional space while avoiding the significant increase in computational complexity and the curse of dimensionality that computing Φ explicitly would cause.
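The point that a kernel value equals an inner product in a higher-dimensional space can be checked numerically. The sketch below uses the homogeneous degree-2 polynomial kernel on 2-D inputs, chosen only because its explicit feature map is short to write down; it is not stated in the source to be the kernel used by the application. The names `poly2_kernel` and `poly2_feature_map` are hypothetical.

```python
import numpy as np

def poly2_kernel(x, z):
    """K(x, z) = (x . z)^2, evaluated in the original 2-D space."""
    return float(np.dot(x, z)) ** 2

def poly2_feature_map(x):
    """Explicit map into 3-D space whose inner product reproduces K."""
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2.0) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

# Both routes give the same value; the kernel never forms the 3-D vectors.
kernel_value = poly2_kernel(x, z)
explicit_value = float(np.dot(poly2_feature_map(x), poly2_feature_map(z)))

# A valid kernel yields a positive semidefinite Gram matrix on any sample set.
samples = np.array([[1.0, 2.0], [3.0, -1.0], [0.5, 0.5], [-2.0, 1.0]])
gram = np.array([[poly2_kernel(a, b) for b in samples] for a in samples])
min_eigenvalue = float(np.linalg.eigvalsh(gram).min())
```

For these inputs both `kernel_value` and `explicit_value` equal 1, and the smallest Gram eigenvalue is zero up to rounding, which is the finite-sample counterpart of the integral condition above.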

(V) optimization algorithm for adaptive structured dynamic background

The structured dynamic background adaptive model calculates the maximum functional margin and geometric margin in order to obtain the optimal classification decision surface.

When solving for the maximum functional margin, the conditions satisfied by the decision surface are derived as:

formula 15

The output of the prediction function f forms a sequence (x, y), where y is the change value of the target, L is the objective function, α is the Lagrange multiplier, w is the normal vector, b is the constraint threshold and g is the decision coefficient. Solving for the maximum of 1/‖w‖ is equivalent to solving for the minimum of ½‖w‖², so the original problem is equivalent to:

formula 16

formula 17

Using the Lagrange function gives:

formula 18

Formula 19

Substituting the above equations into the decision function gives:

formula 20

where K(x_i, x) is a kernel function. The original quadratic optimization problem is then converted into:

formula 21

C is a constant that balances maximizing the margin against minimizing the training error. The meaning of the different values of α is as follows: when α_i = 0, the sample is classified correctly and lies inside the boundary of its class; when 0 < α_i < C, the sample is a support vector and lies exactly on the boundary; when α_i = C, the sample lies between the two parallel boundaries and belongs to neither class.

Setting a constraint relation:

formula 22

Formula 22 relates the new value of each multiplier being updated to its old value. Eliminating one multiplier leaves a constraint on the other, bounded above by C, from which its analytical expression is obtained as follows:

formula 23

H and L are the upper and lower boundaries. If the solution process of the multiplier is regarded as a linear programming model, Fig. 6 shows its constraint relationship, which extends to two boundaries, an upper boundary and a lower boundary, so that the analytic expressions of H and L are as follows:

formula 24

Formula 25

For the multiplier being updated, this gives:

formula 26

To establish the optimal interface, the main steps are as follows:

Step 1: searching for α_i and α_j. All multipliers are scanned, and the first one found to violate the KKT condition is taken as α_j; then, among all multipliers that do not violate the KKT condition, the one that maximizes |B_i - B_j| is chosen as α_i, where B_i is the difference between the predicted value and the true value;

Step 2: optimizing α_i and α_j. Suppose that in some iteration α₁ and α₂ need to be updated; the objective function then becomes:

formula 27

In this process, α₁ and α₂ are updated as:

formula 28

Step 3: updating α₁ and α₂ according to formula 26.

In each iteration, only α_i and α_j are selected as the pair to update while the other components are kept unchanged. After the updated values of α_i and α_j are computed, they are substituted into the objective function to solve for the other components, giving a new objective function. Compared with other optimization algorithms, each iteration requires little computation and the method is easier to implement; moreover, the kernel matrix does not need to be stored during convergence, there are no complicated matrix operations, and convergence is fast. For these reasons this algorithm is used in the structured dynamic background adaptive model to solve for the optimal interface, that is, the decision surface.
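The pairwise update described above matches the general shape of sequential minimal optimization (SMO). Since formulas 22 to 28 are not reproduced in this text, the sketch below uses the standard textbook form of the SMO pair update as a stand-in; it should not be read as the exact formulas of the application. The function name `smo_pair_update` and all argument names are hypothetical; `e1` and `e2` are the prediction errors (predicted value minus true label), corresponding to the quantities called B_i in the text.

```python
def smo_pair_update(a1, a2, y1, y2, e1, e2, k11, k12, k22, c):
    """One analytic update of a pair of multipliers (standard SMO form).

    a1, a2: current multipliers; y1, y2: labels in {-1, +1};
    e1, e2: prediction errors; k11, k12, k22: kernel values; c: upper bound C.
    """
    # Lower and upper bounds (L and H) that keep both multipliers in [0, C]
    # while preserving y1 * a1 + y2 * a2.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = k11 + k22 - 2.0 * k12  # curvature along the constraint line
    if eta <= 0.0 or low >= high:
        return a1, a2  # degenerate pair: left unchanged in this sketch
    # Unconstrained optimum for the second multiplier, then clip to [L, H].
    a2_new = a2 + y2 * (e1 - e2) / eta
    a2_new = min(high, max(low, a2_new))
    # The first multiplier follows from the linear equality constraint.
    a1_new = a1 + y1 * y2 * (a2 - a2_new)
    return a1_new, a2_new

a1_new, a2_new = smo_pair_update(a1=0.2, a2=0.5, y1=1, y2=-1,
                                 e1=0.3, e2=-0.4,
                                 k11=2.0, k12=0.5, k22=1.0, c=1.0)
```

In this example the unconstrained optimum for the second multiplier is 0.15, below the lower bound of 0.3, so it is clipped to 0.3 and the first multiplier moves to 0; the weighted sum y1·a1 + y2·a2 stays at -0.3 before and after the step.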

Third, analysis of experimental results

This section first presents the pedestrian adaptive detection results based on the structured dynamic background adaptive model and qualitatively shows the tracking effect. It then computes quantitative indexes such as tracking accuracy and success rate for several methods, including the prior art. Fig. 7 shows the tracking effect on a data set of the structured dynamic background adaptive model with scale search transformation and of the improved HOG-based tracking algorithm (QSEV).

The experimental results show that, after the scale search transformation is added, performance during normal walking is excellent: the scale box expands or shrinks flexibly according to the relative size of the pedestrian in the lens, so a good effect is achieved when there is no occlusion or disappearance. In the improved HOG + structured dynamic background adaptive algorithm (QSEV), adaptive detection is performed on the whole image, which keeps the algorithm effective under occlusion, disappearance and similar conditions. The pedestrian is still tracked when the first obstacle is encountered; when the pedestrian disappears from the lens, the pedestrian is adaptively detected again without a large drop in accuracy; and when the second obstacle is encountered, the tracking success rate remains high. The application is designed to keep working even under serious occlusion, which is a substantial improvement over algorithms such as scale target tracking, where tracking fails directly on the first occlusion.

Fig. 8 and Fig. 9 compare the TLD algorithm and the DSST algorithm with the improved HOG + structured dynamic background adaptive algorithm (QSEV) and quantitatively analyze the superiority of the latter. Two indexes, accuracy and success rate, are selected in the quantitative analysis stage to evaluate the tracking effect and robustness of the algorithms.

Precision evaluates how accurately the tracking algorithm follows the target. It is defined as the percentage of frames in which the Euclidean distance between the true center position of the tracked target and the position determined by the tracking algorithm lies within a set threshold. The tracking precision is computed as the threshold is varied gradually from 0 to 50, and the result is plotted as a curve, which avoids the random error in the evaluation that would result from choosing only a single threshold. The commonly used center position error index is not chosen here. It is defined as the average Euclidean distance between the target center position and the center of the tracked position; when the tracker loses the target, the tracking position output by the algorithm is random, and the resulting Euclidean distance errors greatly distort the index.
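The precision curve defined above can be computed as in the following sketch. The function name `precision_curve`, the integer threshold grid from 0 to 50, the use of "less than or equal" for "within the threshold", and the assumption that distances are measured in pixels are choices made for the example; the sample coordinates are invented for illustration and are not experimental data.

```python
import numpy as np

def precision_curve(predicted_centers, true_centers, thresholds=None):
    """Fraction of frames whose center error is within each threshold."""
    if thresholds is None:
        thresholds = np.arange(0, 51)  # assumed grid: 0, 1, ..., 50
    predicted = np.asarray(predicted_centers, dtype=float)
    truth = np.asarray(true_centers, dtype=float)
    # Per-frame Euclidean distance between tracked and true centers.
    errors = np.linalg.norm(predicted - truth, axis=1)
    return np.array([(errors <= t).mean() for t in thresholds])

# Four illustrative frames: exact hit, 5 px off, 10 px off, target lost.
truth = [(50, 50), (52, 50), (54, 51), (56, 52)]
predicted = [(50, 50), (55, 54), (54, 61), (120, 90)]
curve = precision_curve(predicted, truth)
```

Here the lost frame caps the curve at 0.75 over the whole threshold range, whereas an average center error would be dominated by that single frame; this is the reason given above for preferring the curve.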

The success rate is an evaluation standard built on the overlap ratio between the target bounding box and the tracked bounding box. Let the position marked by the tracking algorithm be y_t and the true position of the target be y_a; the overlap ratio is then S = |y_t ∩ y_a| / |y_t ∪ y_a|, where |·| denotes the area of a region, here measured by the number of pixels. The success rate plot is defined as the ratio of successfully tracked frames as the overlap threshold is varied from 0 to 1, and the success rate at S = 0.5 is used as the reference value for comparison with other algorithms.
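The overlap ratio and the success rate at a given threshold can be sketched as follows. Boxes are assumed to be axis-aligned and given as (x, y, width, height), areas are computed geometrically rather than by counting pixels, and a frame is counted as successful when its overlap is strictly greater than the threshold; these conventions and the names `overlap_ratio` and `success_rate` are assumptions for the example.

```python
def overlap_ratio(box_t, box_a):
    """S = |y_t ∩ y_a| / |y_t ∪ y_a| for boxes given as (x, y, width, height)."""
    xt, yt, wt, ht = box_t
    xa, ya, wa, ha = box_a
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(xt + wt, xa + wa) - max(xt, xa))
    inter_h = max(0.0, min(yt + ht, ya + ha) - max(yt, ya))
    inter = inter_w * inter_h
    union = wt * ht + wa * ha - inter
    return inter / union if union > 0 else 0.0

def success_rate(tracked_boxes, true_boxes, threshold=0.5):
    """Fraction of frames whose overlap ratio exceeds the threshold."""
    overlaps = [overlap_ratio(t, a) for t, a in zip(tracked_boxes, true_boxes)]
    return sum(s > threshold for s in overlaps) / len(overlaps)

# Three illustrative frames: perfect overlap, half-shifted box, complete miss.
true_boxes = [(0, 0, 10, 10), (0, 0, 10, 10), (0, 0, 10, 10)]
tracked_boxes = [(0, 0, 10, 10), (5, 0, 10, 10), (20, 20, 10, 10)]
rate = success_rate(tracked_boxes, true_boxes)
```

The half-shifted box has an overlap of 50 / 150 = 1/3, which falls below the 0.5 reference threshold, so only one of the three frames counts as successfully tracked.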

The data in Fig. 8 and Fig. 9 show that the improved HOG + structured dynamic background adaptive algorithm (QSEV) of the application has the best tracking performance.

Claims

1. A long-time span video image pedestrian monitoring accurate tracking method, characterized in that: first, pedestrian tracking is treated as a structured dynamic background adaptive target detection problem, a closed-form solution is provided, the model is updated rapidly in its primal form, and the update step size of the model is calculated; secondly, a nonlinear kernel is adopted while keeping the linear property of the structured dynamic background adaptive model, by pairing the nonlinear kernel with an explicit feature map; thirdly, the tracker is extended with multi-scale estimation to improve tracking performance under large scale changes;

first, in the pedestrian detection method, an improved HOG extended feature extraction method is adopted: for changes in the size of the target pedestrian relative to the video picture, a scale search transformation target tracking algorithm is adopted, which solves the enlargement and reduction of the target and accurately detects the position and size of the target;

2. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that the long-time span pedestrian monitoring accurate tracking framework comprises three modules: pedestrian tracking, dynamic learning and adaptive detection; the pedestrian tracking module estimates the motion of the target between consecutive frames and also detects tracking failure: when the pedestrian in the video disappears or is occluded and the target cannot be tracked accurately, the adaptive detector is notified to detect again so that the target can be tracked again; the adaptive detection module cascades a random forest adaptive detector, a variance detector and a nearest neighbor detector to detect the target accurately; the pedestrian tracking and adaptive detection processes are updated in real time through online dynamic learning and correct and supplement each other, so as to cope with the various unexpected situations in scenes where pedestrians appear.

3. The long-time span video image pedestrian monitoring accurate tracking method according to claim 2, characterized in that a structured dynamic background adaptive tracking model is provided together with a mechanism for adjusting the tracking module and the adaptive detection module in real time: the tracking result of the previous frame is used in real time and the target features are extracted iteratively to update the tracker; meanwhile, the processing results of the detector and the tracker are evaluated in real time with the positive and negative sample data generated by adaptive detection and are corrected in time, which prevents errors from accumulating in the subsequent tracking process.

4. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that, for the scale search transformation: a scale search target tracking filter works together with the structured dynamic background adaptive model to realize the scale search transformation; the scale search target tracking filter estimates the scale of the target in the current frame, takes the center position of the target box in the previous frame as the center point, recalculates the scale, and passes the value to the next frame to complete the scale change.

5. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that the improved HOG extended feature extraction method comprises: calculating the HOG extended features and selecting valid features from them, with f(u, v) denoting the pixel intensity at coordinates (u, v):

formula 1

f_u and f_v denote the image gradient components in the u and v directions; the gradient magnitude m(u, v) and orientation θ(u, v) at coordinates (u, v) are calculated as in formula 2:

formula 2
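
Formulas 1 and 2 are not reproduced in this text. As a hedged illustration of how f_u, f_v, m(u, v) and θ(u, v) are commonly obtained in HOG-style features, the sketch below assumes central differences for the gradient components, the Euclidean norm for the magnitude and the two-argument arctangent for the orientation. The function name `gradient_at` and the `img[v][u]` row-major indexing are assumptions, not taken from the application.

```python
import math
from typing import List, Tuple


def gradient_at(img: List[List[float]], u: int, v: int) -> Tuple[float, float]:
    """Gradient magnitude and orientation (radians) at interior pixel (u, v).

    Assumes img[v][u] indexing (row v, column u) and central differences.
    """
    f_u = img[v][u + 1] - img[v][u - 1]   # horizontal component
    f_v = img[v + 1][u] - img[v - 1][u]   # vertical component
    magnitude = math.hypot(f_u, f_v)      # sqrt(f_u^2 + f_v^2)
    orientation = math.atan2(f_v, f_u)    # angle of the gradient vector
    return magnitude, orientation
```

In a full HOG pipeline these per-pixel values would be accumulated into orientation histograms over cells and blocks.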

Given the number of grids B, the cell size C and the block size R, and with the indices p, q and l denoting, respectively, the cell at coordinates (u, v), the rectangular block and the interval, the extended HOG extension feature is calculated as:

formula 3

Formula 4

6. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that the structured dynamic background adaptive kernel function is set as follows: after the image has been preprocessed, it is analyzed, and a number of training samples are extracted around the target and labeled with their true values so as to initialize the model; the model is then trained on these samples to obtain the optimal interface and the required model parameters; when the next frame arrives, a search is carried out within a certain range around the target position estimated in the previous frame to obtain a number of candidate samples, which are classified by the structured dynamic background adaptive model trained on the preceding frames; the final estimated target position is determined by a weighted sum of the samples classified as target; after the target position has been determined, the classifier is updated again and the model is refreshed; this cycle continues until all image frames have been processed, yielding the estimated target position in every frame and thereby accomplishing the target tracking task in the video sequence;

formula 5

The output of the prediction function f forms a sequence (x, y), where y is the change value of the target, L is the objective function, α is the Lagrange multiplier, w is the normal vector, b is the constraint threshold, g is the decision coefficient, and α_i (i = 1, 2, ..., n) is the optimal solution of the following quadratic optimization problem:

formula 6

Formula 7

formula 8

The corresponding quadratic optimization problem is as follows:

formula 9

Formula 10

formula 11

For the inner product of the new features, assume that a kernel function K satisfies:

formula 12

The decision function in the high dimensional space can be written as:

formula 13

from which we have:

formula 14

A kernel function satisfying these conditions is set so that the inner product of the training samples in the high-dimensional space can be computed directly, which avoids the marked increase in computational complexity and the curse of dimensionality that computing φ explicitly would cause.
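
The kernel idea in claim 6 can be illustrated with a short sketch: the decision value is computed from kernel evaluations between the support vectors and the query point, so the mapping φ is never formed explicitly. Formulas 12 to 14 are not reproduced in this text, so the Gaussian (RBF) kernel, the parameter `gamma` and the function names below are assumptions chosen for illustration rather than the kernel actually claimed.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """Gaussian kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def decision_value(x: Vector,
                   support_vectors: Sequence[Vector],
                   labels: Sequence[int],
                   alphas: Sequence[float],
                   b: float,
                   kernel: Callable[[Vector, Vector], float] = rbf_kernel) -> float:
    """Standard kernel SVM decision value: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + b
```

The sign of `decision_value` would give the predicted class; only samples with a nonzero multiplier contribute to the sum.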

7. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that the structured dynamic background adaptive optimization algorithm is as follows: to obtain the optimal decision surface, the structured dynamic background adaptive model computes the maximum functional margin and geometric margin:

formula 15

Then the original problem is equivalent to:

formula 16

formula 17

Using the Lagrangian function, we obtain:

formula 18

Formula 19

Substituting the above equations into the decision function gives:

formula 20

formula 21

C is a constant that balances maximizing the margin against minimizing the training error. The meaning of α_i for different values is as follows: when α_i = 0, the sample is classified correctly and lies inside the boundary of its class; when 0 < α_i < C, the sample is a support vector lying exactly on the boundary; when α_i = C, the sample lies between the two parallel boundaries and belongs to neither class;

Setting a constraint relation:

formula 22

Here the constraint relates the new value of each multiplier to its old value; eliminating one multiplier gives an expression in the other multiplier alone, bounded above by C, whose analytical formula is as follows:

formula 23

H and L are the upper and lower bounds, respectively, given by:

formula 24

Formula 25

For this, we have:

formula 26

Once this is established, the optimal interface is obtained; the main steps are as follows:

Searching for α_i and α_j, which proceeds mainly as follows: all multipliers are scanned, and a multiplier that violates the KKT condition is taken as the first update object, denoted α_j; then, among all multipliers that do not violate the KKT condition, the one that maximizes |B_i − B_j| is chosen as α_i, where B_i denotes the difference between the predicted value and the true value;

formula 27

In this process, α₁ and α₂ are updated as follows:

formula 28

Step 3: update α₁ and α₂ according to formula 26.
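
The pairwise multiplier update outlined in claim 7 resembles the well-known sequential minimal optimization (SMO) procedure. Because formulas 22 to 28 are not reproduced in this text, the sketch below follows the standard textbook SMO step rather than the exact expressions of the application: it computes the bounds L and H, moves the second multiplier along the constraint, clips it to [L, H], and adjusts the first multiplier so that the linear constraint is preserved. The function name and parameter names are hypothetical; `err1` and `err2` play the role of B_i and B_j (predicted value minus true value).

```python
from typing import Tuple


def smo_pair_update(a1: float, a2: float, y1: int, y2: int,
                    err1: float, err2: float,
                    k11: float, k22: float, k12: float,
                    c: float) -> Tuple[float, float]:
    """One standard SMO update of a multiplier pair; returns (a1_new, a2_new)."""
    # Bounds that keep both multipliers inside [0, c] on the constraint line.
    if y1 != y2:
        low, high = max(0.0, a2 - a1), min(c, c + a2 - a1)
    else:
        low, high = max(0.0, a1 + a2 - c), min(c, a1 + a2)
    eta = k11 + k22 - 2.0 * k12  # curvature along the constraint line
    if eta <= 0.0 or low >= high:
        return a1, a2  # no progress possible for this pair
    a2_unclipped = a2 + y2 * (err1 - err2) / eta
    a2_new = min(high, max(low, a2_unclipped))  # clip to [low, high]
    a1_new = a1 + y1 * y2 * (a2 - a2_new)       # keeps y1*a1 + y2*a2 constant
    return a1_new, a2_new
```

Repeating this step over pairs chosen by the KKT-based heuristic described above is how SMO-style solvers approach the optimum of the quadratic problem.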

8. The long-time span video image pedestrian monitoring accurate tracking method according to claim 1, characterized in that the improved HOG + structured dynamic background adaptive pedestrian tracking method is based on the structured dynamic background adaptive tracking model, whose characteristics include: first, the structured dynamic background adaptive classifier has a closed form, so that a nonlinear classifier can be trained and evaluated more quickly; second, the structured dynamic background adaptive tracker uses high-dimensional linear features to better represent the target; third, multi-scale estimation improves performance when tracking through large scale changes.