CN109949341B - Pedestrian target tracking method based on human skeleton structural features - Google Patents

Pedestrian target tracking method based on human skeleton structural features

Info

Publication number
CN109949341B
Authority
CN
China
Prior art keywords
target
pedestrian
tracking
image
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910176928.8A
Other languages
Chinese (zh)
Other versions
CN109949341A (en)
Inventor
钟震宇
马敬奇
雷欢
杨慧莉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Intelligent Manufacturing of Guangdong Academy of Sciences
Original Assignee
Guangdong Institute of Intelligent Manufacturing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Institute of Intelligent Manufacturing filed Critical Guangdong Institute of Intelligent Manufacturing
Priority to CN201910176928.8A priority Critical patent/CN109949341B/en
Publication of CN109949341A publication Critical patent/CN109949341A/en
Application granted granted Critical
Publication of CN109949341B publication Critical patent/CN109949341B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

A pedestrian target tracking method based on human body skeleton structural features comprises the following steps: recording a target motion video; performing equalization preprocessing on the video images; extracting skeleton coordinate information of pedestrians; manually selecting the tracked target pedestrian; establishing a target initial feature template image; and tracking the target pedestrian. If the target pedestrian is tracked normally, tracking continues; if the target pedestrian is lost, the next step is executed: a target structural feature image is extracted and the target feature template image is updated from the video images before the target pedestrian was lost and the target pedestrian's skeleton coordinate information, structural feature images of all pedestrians are then extracted and matched one by one with the target feature template image, the position of the target pedestrian is relocated according to the matching result, and tracking of the target then resumes. The invention achieves stable and continuous tracking of the target pedestrian in complex multi-person scenes, and solves the problems of traditional tracking algorithms that the tracked target is easily lost and hard to recover automatically under conditions such as multi-person overlap and occlusion.

Description

Pedestrian target tracking method based on human skeleton structural features
Technical Field
The invention relates to the field of computer vision, in particular to a pedestrian target tracking method based on human skeleton structural characteristics.
Background
With the continuous development of artificial intelligence technology, target tracking has become one of the important research subjects in the field of computer vision. It is widely applied in human-computer interaction, military, medical and traffic fields and is of profound significance for social development. Target tracking is the process of locating a moving target and detecting its state and position using target feature information or background/frame differencing. Existing target tracking is performed on input video or surveillance footage, and the tracking purpose is achieved by using a tracking algorithm to locate the target and predict its motion trajectory. With continuing research on tracking algorithms, from the traditional algorithms used before 2010 such as Meanshift, Kalman filtering and particle filtering to the currently used TLD, Struck, correlation filtering and deep-learning-based trackers, the accuracy and real-time performance of target tracking have been continuously improved, and the difficulty of detecting and accurately tracking targets in complex environments is gradually being overcome. However, existing tracking techniques are insufficiently stable in complex environments with occlusion and view-angle changes and are prone to target drift and loss, which makes the target hard to relocate and results in poor tracking continuity. Therefore, research on target tracking in complex environments with occlusion and view-angle changes has important practical significance: it can alleviate the difficulty of recovering a target lost due to occlusion, update the target feature template in time, and enhance the stability of the algorithm.
Disclosure of Invention
In order to solve the technical problem, the invention provides a pedestrian target tracking method based on human skeleton structural features.
In order to solve the technical problems, the invention adopts the following technical scheme:
a pedestrian target tracking method based on human skeleton structural features comprises the following steps:
s1, recording a target motion video by using a camera, and carrying out equalization preprocessing on a target motion video image;
s2, extracting skeleton coordinate information of pedestrians through human body skeleton detection aiming at a target motion video image, manually selecting and tracking a target pedestrian, establishing a target initial characteristic template image according to the skeleton coordinate information of the target pedestrian, and tracking the target pedestrian;
s3, judging whether the target pedestrian is in a tracking loss state, if the target pedestrian is normal, keeping the current state to continue tracking, if the target pedestrian is lost, executing the next step, and turning to the step S4;
s4, extracting a target structural feature image and updating a target feature template image according to the video image before the target pedestrian is lost and the skeleton coordinate information of the target pedestrian, then extracting the structural feature images of all pedestrians according to the video image at the current moment and the skeleton coordinate information of the pedestrian, matching the structural feature images of all pedestrians and the target feature template image one by one at the current moment, and repositioning the position of the target pedestrian according to the matching result;
and S5, continuing to track the target after the target pedestrian relocates.
The image preprocessing is to adopt a CLAHE algorithm to equalize the image.
The human skeleton detection process is as follows: human skeleton key point information is extracted with the OpenPose method; it comprises 18 key points of the human skeleton, P_i = {(x_i, y_i) | i = 0, 1, ..., 17}, and the human skeleton key points include left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, chest neck, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot.
And the target pedestrian tracking process adopts a KCF, TLD, CSK or Struck algorithm for tracking.
The structural feature image is a clothing feature image of each body-part region extracted according to the human skeleton structure; the feature images are denoted R_n (n = 1, 2, 3, 4, ...), where n is the number of structural feature images, and they include left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, chest neck, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot images.
The target initial characteristic template image is a characteristic template image of each part region of a target pedestrian extracted according to a human body skeleton, and is established and stored in a target normal state before the target pedestrian is tracked.
Whether the target pedestrian is in a lost state is judged using the tracking confidence as the criterion: if the tracking confidence of the current tracking process is smaller than the tracking-confidence judgment threshold, the target is judged to be lost, and the target feature template image is updated and the target pedestrian repositioned; if the current tracking confidence is greater than the judgment threshold, the current state is kept and tracking continues.
Let the tracking confidence be q, q ∈ [0, 1], where 0 indicates the lowest reliability of the target pedestrian during tracking and 1 the highest. The lowest tracking confidence value at which the tracked target is in a lost state is taken as the judgment threshold and denoted Q_0, and the highest tracking confidence value of the normal tracking process is denoted Q_1. The specific judgment method is as follows:
S3.1, tracking the target and calculating the tracking confidence value q;
S3.2, when the target-pedestrian tracking confidence q_N of the Nth frame image is greater than the judgment threshold Q_0, the target feature template image does not need to be updated and the current state is kept for continued tracking; when q_N is less than Q_0, the target feature template image needs to be updated and the next step is executed;
S3.3, judging whether the target tracking confidence q_(N-i) of the (N-i)th frame image (i = 1, 2, ...) is greater than Q_1; if it is greater than Q_1, the target feature template image is updated on the basis of the target structural feature image in the (N-i)th frame image; if it is smaller than Q_1, the next step is executed, turning to step S3.4;
S3.4, searching backward until the target tracking confidence q_(N-k) of the (N-k)th frame image (k = 1, 2, ...) is greater than Q_1, and updating the target feature template image on the basis of the target structural feature image in the (N-k)th frame image;
the expression is as follows:
M_(N+1) = G_(N-k), if q_N < Q_0 (target lost; template updated from frame N-k)
M_(N+1) = M_N, if q_N ≥ Q_0 (tracking normal; template unchanged)

where M_(N+1) denotes the target feature template image at frame N+1, G_(N-k) denotes the target feature template image updated according to the target state at frame N-k, M_N denotes the target feature template image at frame N, and q_(N-k) denotes the tracking confidence at frame N-k.
The pedestrian-to-target matching process is as follows: the similarity between each structural feature image of a pedestrian and the corresponding target feature template image is calculated by image matching, a comprehensive similarity evaluation model is constructed to compute the comprehensive similarity between the pedestrian features and the target feature template, and a pedestrian is considered a match if the comprehensive similarity is greater than a set threshold, otherwise not a match;
the n structured feature images are matched one by one with the feature template images to obtain n similarity values d_i (i = 1, 2, ..., n);
Establishing a comprehensive similarity evaluation model of the pedestrian characteristic image and the target characteristic template image, wherein the formula is as follows:
f(d) = d_1·α_1 + d_2·α_2 + … + d_i·α_i + … + d_n·α_n

where α_i (i = 1, 2, ..., n) denotes the weight of each term, with α_i ∈ (0, 1); since the closer the value of f(d) is to 1, the higher the similarity, the target is determined according to the value of f(d), and the pedestrian with the highest value is the target pedestrian.
When the target structural feature image is extracted, the method specifically comprises the following steps: the region position of the target structural feature image R_i in the captured video image is denoted R_i(x, y, w, h), where i = 0, 1, 2, ...; the region reference center point is G(x, y), the width is set to w and the height to h, and with the reference center point as the base point and the set width and height as the scale range, the image sub-region is cropped from the captured video image to obtain the target structural feature image.
The invention has the following beneficial effects:
(1) The CLAHE algorithm is used to perform equalization preprocessing on the captured video images, which enhances image quality and improves the applicability of the method in outdoor environments with variable illumination. Pedestrian skeleton key point information is extracted through human skeleton detection; it accurately describes the human skeleton structure and the relations between joints, improving the ability and precision of extracting local human body detail information.
(2) Human body structural features are extracted from the pedestrian skeleton key point information and can accurately represent the clothing and body-shape features of each body part, enriching the human feature details used during tracking and improving tracking precision and scene applicability. Meanwhile, based on the human body structural features, an update strategy for the tracking-target feature matching template is designed: the best feature template of the target from before conditions such as occlusion or multi-person overlap (i.e., conditions under which the target is easily lost) is found automatically, and this template is used for pedestrian matching, identification and relocation, so that stable and continuous tracking of the target pedestrian is achieved in complex multi-person scenes, solving the problems that the tracked target is easily lost and hard to recover automatically under complex conditions such as occlusion and multi-person overlap.
(3) In the human body structural feature matching process, a comprehensive similarity evaluation model of the pedestrian features and the target feature template is established, structural features of all parts of pedestrians can be effectively fused through weight adjustment, comprehensive similarity evaluation of the tracked target and the detected pedestrians is achieved, and accuracy of target identification and positioning in the tracking process can be improved.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the method of the present invention;
FIG. 3 is a schematic diagram of the CLAHE principle of the present invention;
FIG. 4 is the key point information of the human skeleton in the present invention;
FIG. 5 is a graph of the change in tracking confidence of the present invention;
FIG. 6 is a diagram of the effect of the invention before and after the object is occluded;
FIG. 7 is a diagram of the pedestrian target tracking effect based on the human skeleton structural features of the present invention.
Detailed Description
So that the manner in which the features and advantages of the invention, as well as the objects and functions attained by the method can be understood in detail, a more particular description of the invention briefly summarized above may be had by reference to the embodiments thereof which are illustrated in the appended drawings.
As shown in fig. 1-7, the present invention discloses a pedestrian target tracking method based on human skeleton structural features, comprising the following steps:
And S1, recording the target motion video with a camera and performing equalization preprocessing on the target motion video images with the CLAHE algorithm to enhance image quality; this improves the applicability of the method in outdoor environments with variable illumination. Pedestrian skeleton key point information is then extracted through human skeleton detection; it accurately describes the human skeleton structure and the relations between joints and improves the ability and precision of extracting local human body detail information.
S2, extracting skeleton coordinate information of the pedestrian through human body skeleton detection aiming at the target motion video image, manually selecting and tracking the target pedestrian, establishing a target initial characteristic template image according to the skeleton coordinate information of the target pedestrian, and tracking the target pedestrian.
And S3, judging whether the target pedestrian is in a tracking loss state, if the target pedestrian is normal, keeping the current state to continue tracking, if the target pedestrian is lost, executing the next step, and turning to the step S4.
And S4, extracting a target structural feature image according to the video image before the target pedestrian is lost and the skeleton coordinate information of the target pedestrian and updating a target feature template image, then extracting the structural feature images of all pedestrians according to the video image at the current moment and the skeleton coordinate information of the pedestrian, matching the structural feature images of all pedestrians at the current moment with the target feature template image one by one, and repositioning the position of the target pedestrian according to the matching result.
And S5, continuing to track the target after the target pedestrian relocates.
For the image equalization by using the CLAHE algorithm in step S1, the specific steps are:
S1.1, extracting the Y component channel in YUV and dividing the channel image into a number of sub-blocks of the same size.
S1.2, performing histogram clipping on each divided sub-block and calculating the average number of pixels of the sub-block, as shown in the formula,
N_ave = (N_x × N_y) / N_xy

where N_ave denotes the average number of pixels per gray level, N_x the number of pixels in the x direction, N_y the number of pixels in the y direction, and N_xy the number of gray levels of the sub-block region.
S1.3, calculating a contrast limited value,
C=NcNave
where Nc is the set pixel clipping coefficient.
S1.4, by a clipping factor NcThe subblock area is processed for the standard, the number of processed pixels S is calculated, and the pixels are evenly distributed, that is, the evenly distributed pixels Nv are as shown in the following formula,
Figure GDA0002736559520000061
and S1.5, finding out the pixel values which exceed the set pixel range after the clipping by using the following formula, and redistributing the pixel values.
L = L_G / S

where L is the redistribution step (allocated pixel length) and L_G is the length of the image gray-scale range.
S1.6, carrying out histogram equalization on the subblocks subjected to the step;
and S1.7, solving the effect generated between the sub-blocks by using a linear difference algorithm.
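As a practical illustration of steps S1.1 to S1.7, the sketch below applies CLAHE to the Y channel of a frame using OpenCV's built-in implementation instead of re-implementing the clipping and redistribution steps by hand; the clip limit and tile size shown are illustrative values and are not parameters specified by the patent.

```python
import cv2

def clahe_equalize(frame_bgr, clip_limit=2.0, tile_grid=(8, 8)):
    """Equalize the luminance (Y) channel of a BGR frame with CLAHE."""
    yuv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YUV)        # S1.1: work on the Y component
    clahe = cv2.createCLAHE(clipLimit=clip_limit,            # S1.2-S1.5: per-tile histogram
                            tileGridSize=tile_grid)          # clipping and redistribution
    yuv[:, :, 0] = clahe.apply(yuv[:, :, 0])                 # S1.6: per-tile equalization
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR)              # back to BGR for later processing
```

OpenCV performs the inter-tile linear interpolation of step S1.7 internally, so no extra blending step is needed.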
The human skeleton detection process in step S2 is as follows: human skeleton key point information is extracted with the OpenPose method; it comprises 18 key points of the human skeleton, P_i = {(x_i, y_i) | i = 0, 1, ..., 17}, and the key points mainly comprise 18 nodes: left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, chest neck, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot.
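For reference, the sketch below lists the 18-keypoint ordering of OpenPose's COCO model, which is consistent with how the indices are used later in this description (P_2/P_5 for the shoulders, P_8/P_11 for the hips, P_3/P_6 for the elbows and P_9/P_12 for the knees); the exact index assignment, and the correspondence of the patent's "mouth" and "chest neck" to COCO's "nose" and "neck", are assumptions made here for illustration.

```python
# Assumed OpenPose COCO 18-keypoint index order (the patent lists joint names only).
COCO_KEYPOINTS = [
    "nose", "neck",
    "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist",
    "right_hip", "right_knee", "right_ankle",
    "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear",
]

def keypoints_to_dict(points):
    """Map a list of 18 (x, y) coordinates P_0..P_17 to named key points."""
    return dict(zip(COCO_KEYPOINTS, points))
```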
Further, in step S2, the target initial structural feature template image is created by extracting structured features on the basis of the 18 human skeleton key points P_i = {(x_i, y_i) | i = 0, 1, ..., 17}, as shown in fig. 4, and calculating the range of each structured feature region. Let the structured feature regions be denoted R_i(x, y, w, h), where i = 0, 1, ..., 5 in turn represents the chest, crotch, left arm, right arm, left leg and right leg; the region center is G(x, y), the width is w and the height is h, and with the center as the base point and the width and height as the scale, the structured feature region is cropped. The specific calculation process is as follows:
The chest structured feature region R_0(x, y, w, h) can be obtained from the coordinate information of key points P_2, P_5, P_8 and P_11; the specific calculation formulas are:
[Formulas provided as images in the original: x, y, w and h of the chest region, together with two auxiliary quantities, are computed from the coordinates of key points P_2, P_5, P_8 and P_11.]
Similarly, the crotch structured feature region R_1(x, y, w, h) is obtained from the coordinate information of key points P_8, P_9, P_11 and P_12, with calculation formulas analogous to those of the chest region.
The left arm structured feature region R_2(x, y, w, h) can be obtained from the coordinate information of key points P_2 and P_3; the specific calculation formulas are:
[Formulas provided as images in the original: x, y, w and h of the left arm region are computed from the coordinates of key points P_2 and P_3.]
Similarly, the right arm structured feature region R_3(x, y, w, h) is obtained from the coordinates of P_5 and P_6, the left leg region R_4(x, y, w, h) from P_8 and P_9, and the right leg region R_5(x, y, w, h) from P_11 and P_12, with calculation formulas analogous to those of the left arm region.
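The exact region formulas referenced above are provided as images in the original and are not reproduced here, so the following sketch is only a plausible construction under stated assumptions: each region's reference center G(x, y) is taken as the mean of its key points, and the width and height are taken from the horizontal and vertical spread of those points with an assumed padding factor; none of these specific choices come from the patent.

```python
import numpy as np

def region_from_keypoints(points, pad=1.3):
    """Illustrative R(x, y, w, h) for one body-part region.

    `points` is a list of (x, y) skeleton key points, e.g. P_2, P_5, P_8, P_11
    for the chest region. The padding factor is an assumption, not a value
    taken from the patent.
    """
    pts = np.asarray(points, dtype=float)
    cx, cy = pts.mean(axis=0)                               # reference center G(x, y)
    w = max(float(np.ptp(pts[:, 0])), 1.0) * pad            # width from horizontal spread
    h = max(float(np.ptp(pts[:, 1])), 1.0) * pad            # height from vertical spread
    return cx, cy, w, h

def crop_region(image, region):
    """Crop the image sub-region around the reference center point."""
    cx, cy, w, h = region
    x0, y0 = int(cx - w / 2), int(cy - h / 2)
    x1, y1 = int(cx + w / 2), int(cy + h / 2)
    H, W = image.shape[:2]
    return image[max(y0, 0):min(y1, H), max(x0, 0):min(x1, W)]
```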
In addition, in the embodiment of the present invention, the tracking algorithm used in step S2 is illustrated by taking the KCF tracking algorithm as an example; the specific process is as follows:
S2.1, training samples are acquired using a circulant matrix, as shown below:

X = C(x) = [ x_1 x_2 ... x_n ; x_n x_1 ... x_(n-1) ; ... ; x_2 x_3 ... x_1 ]

where C(x) is the circulant matrix obtained by cyclically shifting the base sample x; that is, image samples around the target are collected through the circulant matrix, yielding a large number of training samples.
S2.2, performing discrete Fourier transform on the circulant matrix, namely diagonalizing the X Fourier of the matrix, as shown in the following formula:
Figure GDA0002736559520000078
wherein F is a constant matrix independent of x, is a discrete Fourier matrix,
Figure GDA0002736559520000081
discrete Fourier transform, F, representing xHRepresenting the conjugate transpose of F.
S2.3, forming a large number of positive and negative samples after the basic samples are subjected to cyclic displacement, and thus training and classifying the samples by utilizing a ridge regression to establish a filtering model, namely training the samples by a least square method and searching a classifier f (z) ═ wTz function, where z represents the target image input sample and w represents the weight, such that sample xiAnd its regression value yiThe squared error of (d) is minimized and the minimum value of w is solved as shown in the following equation:
min_w Σ_i ( f(x_i) − y_i )² + λ‖w‖²

where λ is the regularization parameter that prevents overfitting.
Wherein w ═ XTX+λI)-1XTy
Since fourier transform is needed, it can be simplified as follows:
w=(XHX+λI)-1XHy
where XH is the conjugate transpose of X.
And S2.4, calculating the tracking confidence coefficient of the KCF, and judging whether the target structure characteristic template image needs to be updated or not according to the tracking confidence coefficient.
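The sketch below illustrates the Fourier-domain ridge regression of steps S2.1 to S2.4 for the simplest case of a single-channel, linear-kernel correlation filter; the multi-channel HOG features, Gaussian kernel and cosine window used in full KCF implementations are omitted, and the peak of the response map is taken as the tracking confidence q, as in step S3.1 below. For a ready-made tracker, OpenCV's contrib module also provides a KCF implementation (cv2.TrackerKCF_create in many builds).

```python
import numpy as np

def train_linear_kcf(x, y, lam=1e-4):
    """Ridge regression in the Fourier domain (single channel, linear kernel).

    x : 2-D template patch; y : desired Gaussian-shaped response of the same size.
    Returns the dual coefficients alpha_hat and the template spectrum x_hat.
    """
    x_hat = np.fft.fft2(x)
    k_xx_hat = np.conj(x_hat) * x_hat / x.size       # linear-kernel auto-correlation
    alpha_hat = np.fft.fft2(y) / (k_xx_hat + lam)    # alpha_hat = y_hat / (k_xx_hat + lambda)
    return alpha_hat, x_hat

def respond(alpha_hat, x_hat, z):
    """Response map f(z) = IFFT(k_xz_hat * alpha_hat); its peak is the confidence q."""
    z_hat = np.fft.fft2(z)
    k_xz_hat = np.conj(x_hat) * z_hat / z.size       # linear cross-correlation kernel
    response = np.real(np.fft.ifft2(k_xz_hat * alpha_hat))
    return response, float(response.max())
```

A typical use under these assumptions is to train on the cropped target patch with a centered 2-D Gaussian as y, then call respond() on the search patch of the next frame and take the location of the response peak as the new target position.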
In addition, the tracking confidence of the KCF tracking algorithm is used for the judgment; as shown in fig. 5, whether the target pedestrian is in a lost state is judged using the tracking confidence as the criterion: if the tracking confidence of the current tracking process is smaller than the tracking-confidence judgment threshold, the target is judged to be lost, and the target feature template image needs to be updated and the target pedestrian repositioned; if the current tracking confidence is greater than the judgment threshold, the current state is kept and tracking continues.
Let the tracking confidence be q, q ∈ [0, 1], where 0 indicates the lowest reliability of the target pedestrian during tracking and 1 the highest. The lowest tracking confidence value at which the tracked target is in a lost state is taken as the judgment threshold and denoted Q_0, and the highest tracking confidence value of the normal tracking process is denoted Q_1. Q_0 is obtained by running tracking tests under pedestrian occlusion, multi-person overlap, view-angle change and similar conditions and averaging the tracking confidence over many instances in which the target is lost; Q_1 is obtained by running tracking tests on pedestrians in normal posture and averaging the tracking confidence over many instances in which the target is not lost.
The specific judgment method is as follows:
S3.1, the target is tracked and the tracking confidence value q is calculated. According to the KCF algorithm, the filter response between the base sample and the candidate sample is computed and the correlation tracking confidence q is derived from it, as follows:

f̂(z) = k̂_xz ⊙ α̂,   q = max f(z)

where q is the tracking confidence (the peak of the response map), f(z) is the classifier established by ridge regression, x is the base sample, z is the candidate (training) sample, K_xz is the kernel correlation between the base sample and the candidate sample, α is the coefficient vector, f̂(z), k̂_xz and α̂ denote the discrete Fourier transforms of f(z), K_xz and α, respectively, and ⊙ denotes element-wise multiplication.
S3.2, when the target pedestrian in the Nth frame image tracks the confidence qNGreater than the tracking confidence judgment threshold Q0When the target characteristic template image is not needed to be updated, the current state is kept for continuous tracking; when the N frame image target tracking confidence qNLess than tracking confidence judgment threshold Q0When the target characteristic template image needs to be updated, the next step is continuously executed;
and S3.3, judging the target tracking confidence q of the image of the Nth-i (i is 1, 2)N-iWhether or not it is greater than Q1(ii) a If greater than Q1Updating the target feature template image based on the target structural feature image in the N-i frame image, if the target structural feature image is smaller than Q1Executing the next step, and turning to step S3.4;
s3.4, until the N-k (k 1, 2..) frame image target tracking confidence qN-kGreater than Q1Updating a target feature template image by taking the target structural feature image in the N-k frame image as a basis;
the expression is as follows:
M_(N+1) = G_(N-k), if q_N < Q_0 (target lost; template updated from frame N-k)
M_(N+1) = M_N, if q_N ≥ Q_0 (tracking normal; template unchanged)

where M_(N+1) denotes the target feature template image at frame N+1, G_(N-k) denotes the target feature template image updated according to the target state at frame N-k, M_N denotes the target feature template image at frame N, and q_(N-k) denotes the tracking confidence at frame N-k.
When the target is occluded, loss easily occurs, and the feature template at that moment no longer reflects the current target state, so the feature template must be updated before the target can be matched again. The threshold Q_0 is the condition for deciding whether to update the feature template, and the threshold Q_1 is the average tracking confidence of the target in the normal state. Because the target may still be occluded in the consecutive frames immediately before frame N, while the normal state at frame N-k is close to the target state at frame N+1, as shown in fig. 6, the image of frame N-k is selected as the basis for updating the feature template, so that the target can be matched in time at frame N+1.
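The decision logic of steps S3.2 to S3.4 and the piecewise update expression above can be written compactly as below; the per-frame storage of templates and confidences, and the fallback when no earlier high-confidence frame exists, are assumptions introduced here for illustration.

```python
def select_template(templates, confidences, N, Q0, Q1):
    """Return the feature template M_(N+1) to use at frame N+1.

    templates   : dict frame_index -> structured feature template of that frame
    confidences : dict frame_index -> tracking confidence q of that frame
    Implements M_(N+1) = M_N if q_N >= Q0, otherwise G_(N-k), where N-k is the
    most recent earlier frame whose confidence exceeds Q1.
    """
    if confidences[N] >= Q0:                 # tracking normal: keep current template
        return templates[N]
    k = 1
    while N - k in confidences and confidences[N - k] <= Q1:
        k += 1                               # step back until a normal-state frame is found
    if N - k in templates:
        return templates[N - k]              # G_(N-k): template rebuilt from frame N-k
    return templates[N]                      # fallback (case not specified by the patent)
```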
Further, in step S4, the specific steps of relocation after target loss are as follows:
(1) The target structural features of frame N-k before the loss are extracted according to the human skeleton key point information; the target structural features are taken as the clothing features of the chest, crotch, left arm, right arm, left leg and right leg regions;
(2) updating a target feature template according to the structural features extracted in the step (1);
(3) Each pedestrian feature template (containing six structural features) in the (N+1)th frame after the target is lost is compared one by one with the updated feature template by the histogram correlation matching method; the comparison formula is as follows:
d(H_1, H_2) = Σ_I (H_1(I) − H̄_1)(H_2(I) − H̄_2) / sqrt( Σ_I (H_1(I) − H̄_1)² · Σ_I (H_2(I) − H̄_2)² )

where H_1 denotes the structured feature template histogram of the target at frame N-k, i.e., the updated feature template histogram; H_2 denotes the structured feature histogram of the pedestrian to be matched; H̄_1 and H̄_2 denote the mean values of H_1 and H_2, respectively; and I denotes the histogram bin (image pixel value).
(4) Since comparing each pedestrian with the feature template produces six similarity values, a comprehensive similarity evaluation is needed: a comprehensive similarity evaluation model of the pedestrian features and the target feature template is established as shown below. From the nature of the histogram correlation matching method, the closer the value of f(d) is to 1, the higher the similarity, so the target can be relocated according to the highest of the f(d) values produced by the pedestrians.
f(d) = d_1·α_1 + d_2·α_2 + … + d_i·α_i + … + d_6·α_6

where d_i (i = 1, 2, ..., 6) denotes the similarity value of each structured feature map, and α_i (i = 1, 2, ..., 6) denotes the corresponding weight, with α_i ∈ (0, 1).
(5) According to the comprehensive similarity results, the pedestrian with the highest similarity is selected to relocate the tracked target.
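Steps (3) to (5) can be illustrated with OpenCV's histogram correlation comparison: for each detected pedestrian, the six per-region histogram correlations d_i are combined into the weighted score f(d), and the pedestrian with the highest score is chosen. The use of 256-bin grayscale histograms and equal weights α_i = 1/6 is an assumption for illustration, not something fixed by the patent.

```python
import cv2
import numpy as np

def region_hist(patch_bgr):
    """256-bin grayscale histogram of one structured feature region (assumed setup)."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256])
    return cv2.normalize(hist, hist)

def comprehensive_similarity(template_patches, pedestrian_patches, weights=None):
    """f(d) = sum_i alpha_i * d_i over the six structured regions."""
    n = len(template_patches)
    weights = weights or [1.0 / n] * n
    score = 0.0
    for tpl, ped, a in zip(template_patches, pedestrian_patches, weights):
        d = cv2.compareHist(region_hist(tpl), region_hist(ped), cv2.HISTCMP_CORREL)
        score += a * d                                # weighted histogram correlation d_i
    return score

def relocate(template_patches, pedestrians):
    """Pick the detected pedestrian whose six regions best match the updated template."""
    scores = [comprehensive_similarity(template_patches, p) for p in pedestrians]
    best = int(np.argmax(scores))
    return best, scores[best]
```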
According to the invention, human body structural features are extracted from pedestrian skeleton key point information, accurately representing the clothing and body-shape features of each body part, and an update strategy for the tracking-target feature matching template is designed, thereby improving the accuracy of target pedestrian identification and positioning, achieving stable and continuous tracking of the target pedestrian in complex multi-person scenes, and solving the problems that the tracked target is easily lost and hard to recover automatically under complex conditions such as occlusion and multi-person overlap; pedestrian target tracking performance and stability are thus improved.
Although the present invention has been described in detail with reference to the embodiments, it will be apparent to those skilled in the art that modifications, equivalents, improvements, and the like can be made in the technical solutions of the foregoing embodiments or in some of the technical features of the foregoing embodiments, but those modifications, equivalents, improvements, and the like are all within the spirit and principle of the present invention.

Claims (10)

1. A pedestrian target tracking method based on human skeleton structural features comprises the following steps:
s1, recording a target motion video by using a camera, and carrying out equalization preprocessing on a target motion video image;
s2, extracting skeleton coordinate information of pedestrians through human body skeleton detection aiming at a target motion video image, manually selecting and tracking a target pedestrian, establishing a target initial characteristic template image according to the skeleton coordinate information of the target pedestrian, and tracking the target pedestrian;
s3, judging whether the target pedestrian is in a tracking loss state, if the target pedestrian is normal, keeping the current state to continue tracking, if the target pedestrian is lost, executing the next step, and turning to the step S4;
s4, extracting a target structural feature image and updating a target feature template image according to the video image before the target pedestrian is lost and the skeleton coordinate information of the target pedestrian, then extracting the structural feature images of all pedestrians according to the video image at the current moment and the skeleton coordinate information of the pedestrian, matching the structural feature images of all pedestrians and the target feature template image one by one at the current moment, and repositioning the position of the target pedestrian according to the matching result;
and S5, continuing to track the target after the target pedestrian relocates.
2. The pedestrian target tracking method based on the human skeleton structural features of claim 1, wherein the image preprocessing is to equalize the image by using a CLAHE algorithm.
3. The pedestrian target tracking method based on the human skeleton structural features of claim 1, wherein the human skeleton detection process is as follows: human skeleton key point information is extracted with the OpenPose method; it comprises 18 key points of the human skeleton, P_i = {(x_i, y_i) | i = 0, 1, ..., 17}, and the human skeleton key points include left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, chest neck, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot.
4. The pedestrian target tracking method based on the human skeletal structural features of claim 1, wherein the target pedestrian tracking process adopts KCF, TLD, CSK or Struck algorithm for tracking.
5. The pedestrian target tracking method based on human skeleton structural features of claim 1, wherein the structural feature image is a clothing feature image of each region of the human body extracted according to the human skeleton structure; the feature images are denoted R_n (n = 1, 2, 3, 4, ...), where n is the number of structural feature images, and they include left eye, right eye, left ear, right ear, mouth, left shoulder, right shoulder, chest neck, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot images.
6. The pedestrian target tracking method based on the human skeleton structural features of claim 1, wherein the target initial feature template image is a feature image of each region of the target pedestrian extracted according to the human skeleton, and is a feature template image in a target normal state established and stored before the target pedestrian tracking is performed.
7. The pedestrian target tracking method based on the human skeleton structural features of claim 1, wherein whether the target pedestrian is in a lost state is judged using the tracking confidence as the criterion: if the tracking confidence of the current tracking process is smaller than the tracking-confidence judgment threshold, the target is judged to be lost, and the target feature template image needs to be updated and the target pedestrian repositioned; if the current tracking confidence is greater than the judgment threshold, the current state is kept and tracking continues.
8. The pedestrian target tracking method based on the human skeleton structural characteristics as claimed in claim 6, wherein the tracking confidence is set as q, q ∈ [0, 1], where 0 indicates the lowest reliability of the target pedestrian during tracking and 1 the highest; the lowest tracking confidence value at which the tracked target is in a lost state is taken as the judgment threshold and denoted Q_0, and the highest tracking confidence value of the normal tracking process is denoted Q_1; the specific judgment method is as follows:
S3.1, tracking the target and calculating the tracking confidence value q;
S3.2, when the target-pedestrian tracking confidence q_N of the Nth frame image is greater than the judgment threshold Q_0, the target feature template image does not need to be updated and the current state is kept for continued tracking; when q_N is less than Q_0, the target feature template image needs to be updated and the next step is executed;
S3.3, judging whether the target tracking confidence q_(N-i) of the (N-i)th frame image (i = 1, 2, ...) is greater than Q_1; if it is greater than Q_1, the target feature template image is updated on the basis of the target structural feature image in the (N-i)th frame image; if it is smaller than Q_1, the next step is executed, turning to step S3.4;
S3.4, searching backward until the target tracking confidence q_(N-k) of the (N-k)th frame image (k = 1, 2, ...) is greater than Q_1, and updating the target feature template image on the basis of the target structural feature image in the (N-k)th frame image;
the expression is as follows:
M_(N+1) = G_(N-k), if q_N < Q_0 (target lost; template updated from frame N-k)
M_(N+1) = M_N, if q_N ≥ Q_0 (tracking normal; template unchanged)

where M_(N+1) denotes the target feature template image at frame N+1, G_(N-k) denotes the target feature template image updated according to the target state at frame N-k, M_N denotes the target feature template image at frame N, and q_(N-k) denotes the tracking confidence at frame N-k.
9. The pedestrian target tracking method based on the human skeleton structural features of claim 1, wherein the pedestrian-to-target matching process is as follows: the similarity between each structural feature image of a pedestrian and the corresponding target feature template image is calculated by image matching, a comprehensive similarity evaluation model is constructed to compute the comprehensive similarity between the pedestrian features and the target feature template, and a pedestrian is considered a match if the comprehensive similarity is greater than a set threshold, otherwise not a match;
the n structured feature images are matched one by one with the feature template images to obtain n similarity values d_i (i = 1, 2, ..., n);
Establishing a comprehensive similarity evaluation model of the pedestrian characteristic image and the target characteristic template image, wherein the formula is as follows:
f(d) = d_1·α_1 + d_2·α_2 + … + d_i·α_i + … + d_n·α_n

where α_i (i = 1, 2, ..., n) denotes the weight of each term, with α_i ∈ (0, 1); the closer the value of f(d) is to 1, the higher the similarity, so the target is determined according to the value of f(d), and the pedestrian with the highest value is the target pedestrian.
10. The pedestrian target tracking method based on the human skeleton structural feature of claim 1, wherein when the target structural feature image is extracted, the method specifically comprises the following steps: the region position of the target structural feature image R_i in the captured video image is denoted R_i(x, y, w, h), where i = 0, 1, 2, ...; the region reference center point G(x, y) is calculated according to the skeleton key points of the region where the target structural feature image is located, the width is set to w and the height to h, and with the reference center point as the base point and the set width and height as the scale range, the image sub-region is cropped from the captured video image to obtain the target structural feature image.
CN201910176928.8A 2019-03-08 2019-03-08 Pedestrian target tracking method based on human skeleton structural features Active CN109949341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910176928.8A CN109949341B (en) 2019-03-08 2019-03-08 Pedestrian target tracking method based on human skeleton structural features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910176928.8A CN109949341B (en) 2019-03-08 2019-03-08 Pedestrian target tracking method based on human skeleton structural features

Publications (2)

Publication Number Publication Date
CN109949341A CN109949341A (en) 2019-06-28
CN109949341B true CN109949341B (en) 2020-12-22

Family

ID=67009399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910176928.8A Active CN109949341B (en) 2019-03-08 2019-03-08 Pedestrian target tracking method based on human skeleton structural features

Country Status (1)

Country Link
CN (1) CN109949341B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472613B (en) * 2019-08-22 2022-05-10 海信集团有限公司 Object behavior identification method and device
CN110532948B (en) * 2019-08-29 2023-05-30 南京泛在地理信息产业研究院有限公司 High-precision pedestrian track extraction method based on video
CN111062238B (en) * 2019-10-12 2023-07-07 杭州安脉盛智能技术有限公司 Escalator flow monitoring method and system based on human skeleton information and multi-target tracking
CN110766724B (en) * 2019-10-31 2023-01-24 北京市商汤科技开发有限公司 Target tracking network training and tracking method and device, electronic equipment and medium
CN111178201A (en) * 2019-12-20 2020-05-19 南京理工大学 Human body sectional type tracking method based on OpenPose posture detection
CN111325135B (en) * 2020-02-17 2022-11-29 天津中科智能识别产业技术研究院有限公司 Novel online real-time pedestrian tracking method based on deep learning feature template matching
CN111524162B (en) * 2020-04-15 2022-04-01 上海摩象网络科技有限公司 Method and device for retrieving tracking target and handheld camera
CN111508001A (en) * 2020-04-15 2020-08-07 上海摩象网络科技有限公司 Method and device for retrieving tracking target and handheld camera
CN112084867A (en) * 2020-08-10 2020-12-15 国信智能系统(广东)有限公司 Pedestrian positioning and tracking method based on human body skeleton point distance
CN111679695B (en) * 2020-08-11 2020-11-10 中航金城无人系统有限公司 Unmanned aerial vehicle cruising and tracking system and method based on deep learning technology
CN112507953B (en) * 2020-12-21 2022-10-14 重庆紫光华山智安科技有限公司 Target searching and tracking method, device and equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789125A (en) * 2010-01-26 2010-07-28 北京航空航天大学 Method for tracking human skeleton motion in unmarked monocular video
CN102737386A (en) * 2012-05-15 2012-10-17 北京硅盾安全技术有限公司 Moving target anti-fusion shielding tracking algorithm
CN102945554A (en) * 2012-10-25 2013-02-27 西安电子科技大学 Target tracking method based on learning and speeded-up robust features (SURFs)
CN104091348A (en) * 2014-05-19 2014-10-08 南京工程学院 Multi-target tracking method integrating obvious characteristics and block division templates
KR20170098388A (en) * 2016-02-19 2017-08-30 대전대학교 산학협력단 The apparatus and method for correcting error be caused by overlap of object in spatial augmented reality
CN107507222A (en) * 2016-06-13 2017-12-22 浙江工业大学 A kind of anti-particle filter method for tracking target based on integration histogram blocked
CN108269269A (en) * 2016-12-30 2018-07-10 纳恩博(北京)科技有限公司 Method for tracking target and device
CN108492315A (en) * 2018-02-09 2018-09-04 湖南华诺星空电子技术有限公司 A kind of dynamic human face tracking

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8295546B2 (en) * 2009-01-30 2012-10-23 Microsoft Corporation Pose tracking pipeline
US20110150271A1 (en) * 2009-12-18 2011-06-23 Microsoft Corporation Motion detection using depth images
US20110208685A1 (en) * 2010-02-25 2011-08-25 Hariraam Varun Ganapathi Motion Capture Using Intelligent Part Identification
CN103886322A (en) * 2012-12-20 2014-06-25 中山大学深圳研究院 Video target tracking method based on SVM and Mean-Shift
CN104809736B (en) * 2015-05-12 2017-09-12 河海大学常州校区 Medical Slice Images closure bone contours computational methods based on priori
CN105549614B (en) * 2015-12-17 2018-06-05 北京猎鹰无人机科技有限公司 Unmanned plane target tracking
US10510174B2 (en) * 2017-05-08 2019-12-17 Microsoft Technology Licensing, Llc Creating a mixed-reality video based upon tracked skeletal features
CN108537156B (en) * 2018-03-30 2021-12-21 广州幻境科技有限公司 Anti-shielding hand key node tracking method
CN109064487B (en) * 2018-07-02 2021-08-06 中北大学 Human body posture comparison method based on Kinect skeleton node position tracking

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101789125A (en) * 2010-01-26 2010-07-28 北京航空航天大学 Method for tracking human skeleton motion in unmarked monocular video
CN102737386A (en) * 2012-05-15 2012-10-17 北京硅盾安全技术有限公司 Moving target anti-fusion shielding tracking algorithm
CN102945554A (en) * 2012-10-25 2013-02-27 西安电子科技大学 Target tracking method based on learning and speeded-up robust features (SURFs)
CN104091348A (en) * 2014-05-19 2014-10-08 南京工程学院 Multi-target tracking method integrating obvious characteristics and block division templates
KR20170098388A (en) * 2016-02-19 2017-08-30 대전대학교 산학협력단 The apparatus and method for correcting error be caused by overlap of object in spatial augmented reality
CN107507222A (en) * 2016-06-13 2017-12-22 浙江工业大学 A kind of anti-particle filter method for tracking target based on integration histogram blocked
CN108269269A (en) * 2016-12-30 2018-07-10 纳恩博(北京)科技有限公司 Method for tracking target and device
CN108492315A (en) * 2018-02-09 2018-09-04 湖南华诺星空电子技术有限公司 A kind of dynamic human face tracking

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Struck: Structured Output Tracking with Kernels; Sam Hare et al.; IEEE Transactions on Pattern Analysis and Machine Intelligence; 2015-12-17; Vol. 38, No. 10; pp. 2096-2109 *
Research on human action recognition technology based on Kinect; 刁俊方; China Master's Theses Full-text Database, Information Science and Technology; 2016-06-15; pp. I138-1333 *
Improved KCF tracking algorithm using outlier detection and relocation; 刘延飞 et al.; Computer Engineering and Applications; 2018-04-28; Vol. 54, No. 20; pp. 166-171 *

Also Published As

Publication number Publication date
CN109949341A (en) 2019-06-28


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 510000 13 building, 100 martyrs Road, Yuexiu District, Guangzhou, Guangdong.

Patentee after: Institute of intelligent manufacturing, Guangdong Academy of Sciences

Address before: 510000 13 building, 100 martyrs Road, Yuexiu District, Guangzhou, Guangdong.

Patentee before: GUANGDONG INSTITUTE OF INTELLIGENT MANUFACTURING