CN114627424A - Gait recognition method and system based on visual angle transformation - Google Patents

Gait recognition method and system based on visual angle transformation

Info

Publication number
CN114627424A
CN114627424A CN202210305445.5A CN202210305445A
Authority
CN
China
Prior art keywords
gait
image
pedestrian
visual angle
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210305445.5A
Other languages
Chinese (zh)
Inventor
卫星
周芳
陈柏霖
杨烨
王明珠
陈逸康
何煦
李宝璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202210305445.5A priority Critical patent/CN114627424A/en
Publication of CN114627424A publication Critical patent/CN114627424A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a gait recognition method and a gait recognition system based on visual angle transformation. Firstly, monitoring videos acquired by a plurality of pedestrian monitoring devices are acquired and processed to obtain a pedestrian gait data set, which is divided into a training set and a testing set. A GaitGAN network is used to train a visual angle conversion model that generates an image at a specific visual angle from any visual angle, together with discriminators that judge the correctness of the generated visual angle image; the test set is then input into the visual angle conversion model to obtain a gait energy map set under the target visual angle. The images generated by the visual angle conversion model are acquired and preprocessed into pixel images, which are fed into a reference gait feature extraction model to obtain feature vectors and pedestrian prediction vectors for calculating the total loss, and the model parameters are optimized with a gradient descent algorithm to obtain a trained gait feature extraction model. The invention realizes the automation of pedestrian tracking and, by adopting a generative adversarial network, converts the pedestrian gait energy map to the 90-degree visual angle at which the gait features are most obvious, so that the gait recognition accuracy is higher.

Description

Gait recognition method and system based on visual angle transformation
Technical Field
The invention relates to the field of computer vision and deep learning, in particular to a gait recognition method and system based on visual angle transformation.
Background
The gait recognition technology is a new biological characteristic recognition technology, which aims to perform identity recognition and analysis, as well as detection of physiological and pathological characteristics, through the walking characteristics of a person. Gait recognition has wide application prospects in fields such as access control systems, pedestrian monitoring and public security, and has been widely applied in the field of computer vision in recent years. Pedestrian detection and tracking by gait recognition has the following advantages: 1) compared with traditional biological characteristic identification technologies (such as fingerprint identification, palm print identification and the like), gait identification is non-contact and is suitable for identity authentication at long distances; 2) the gait characteristics of pedestrians are not easy to hide or disguise, because physiological conditions such as the bones, the center of gravity and the individual coordination ability of each person are different.
Although gait recognition techniques have been proposed for some time, a unified technical framework has not yet been formed. Compared with other biometric authentication technologies (such as face recognition, iris recognition, fingerprint recognition and the like), gait recognition is still immature, which is mainly reflected in the lack of a well-known effective database, effective algorithms and a high recognition rate.
The current gait recognition technology mainly involves the following techniques: background separation, target tracking, machine learning, machine vision and the like. However, some of these techniques are not yet mature, which brings certain difficulties to gait recognition. In addition, many difficulties are also faced in practical application, mainly because pedestrians are influenced by the external environment and by their own factors during walking, for example the combined influence of multiple factors such as different walking pavements, different resolutions, different viewing angles, different clothes and different carried objects, so that the recognition rate in a complex real environment combining several influencing factors is still low. Among the above-mentioned influencing factors, the change of view angle is one of the most important factors affecting the performance of a gait recognition system.
In conclusion, the gait recognition technology suffers from the problem that the recognition accuracy is hindered by various factors such as different view angles, different clothes, different carried objects and the like.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, the present invention aims to provide a gait recognition method and system based on view angle transformation, so as to address the problem in the prior art that the recognition accuracy is hindered by various factors such as different view angles, different clothes and different carried objects.
To achieve the above and other related objects, the present invention provides a gait recognition method based on perspective transformation, including:
step one S100: the method comprises the steps of data acquisition and processing, wherein monitoring videos acquired by a plurality of pedestrian monitoring devices are acquired and processed to obtain a pedestrian gait data set, and the pedestrian gait data set is divided into a gait training set and a gait testing set;
step two S200: a visual angle conversion step, namely training a visual angle conversion model for generating a specific visual angle image through any visual angle by using a GaitGAN network, training a discriminator for discriminating the correctness of the generated visual angle image, and then inputting the data of the gait test set into the visual angle conversion model to obtain a gait energy map set under a specific target visual angle; and
step three S300: and a gait recognition step, namely acquiring an image generated by the GaitGAN network, preprocessing the image to obtain a pixel image, inputting the pixel image into a reference gait feature extraction model to obtain a feature vector and a pedestrian prediction vector, calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
In a preferred embodiment of the present invention, the step S100 includes:
step S101: acquiring the monitoring video, and performing video frame extraction processing on the monitoring video to obtain a frame image;
step S102: screening the frame images, and preprocessing the screened frame images to obtain pedestrian images; and
step S103: and processing the pedestrian image to obtain a gait energy map, wherein all the gait energy maps form the pedestrian gait data set and are divided into the gait training set and the gait testing set.
In a preferred embodiment of the present invention, the step two S200 includes:
step S201: marking a view angle label on the gait energy map in the gait training set obtained in the step one S100;
step S202: setting the target view angle β to 90 degrees, taking the gait energy map I_α obtained in step S201 and the gait energy map I_β' generated from the gait energy map I_α by the initial generator G as input, and training a true/false discriminator D_R to distinguish a real image from a generated image;
step S203: taking the gait energy map I_α and the gait energy map I_β' as input, and training an identity discriminator D_A to judge whether the real image and the generated image are the same person;
step S204: inputting the gait energy map I_α and a target view angle indicator v_β into a generator G, which is trained to generate a gait energy map I_β with the target view angle β;
Step S205: inputting all gait energy maps of the gait training set obtained in the step one S100 into a perspective conversion model, and repeating the steps S202 to S204 until the discrimination probabilities of the true and false discriminators and the identity discriminator tend to be 0.5 and stable; and
step S206: generating a new gait energy image set by using the gait energy map in the gait training set obtained in the first step S100 by using the perspective transformation model, and using the new gait energy image set as the training set of the reference gait feature extraction model in the third step S300; and generating a new gait energy image set by using the gait energy image in the gait test set obtained in the first step S100 by using the perspective transformation model, and using the new gait energy image set as the test set of the reference gait feature extraction model in the third step S300.
In a preferred embodiment of the present invention, after repeating the steps S202 to S204 until the discrimination probabilities of the true and false discriminators and the identity discriminator tend to be 0.5 and stable, the generator G obtained in the step S205 is used as the view angle conversion model.
In a preferred embodiment of the present invention, the real image in step S202 is the gait energy map I_α obtained from step S201, and the generated image is the gait energy map I_β' generated from the gait energy map I_α by the initial generator G.
In a preferred embodiment of the present invention, the step three S300 includes:
step 301: acquiring the gait energy images of the training set obtained after the step S205 as images for training to form a training batch, and processing the acquired images of the training batch to obtain a plurality of pixel images;
step 302: sending the pixel image into a convolutional neural network for feature extraction to obtain a feature vector, then sending the feature vector into a full connection layer, and obtaining a pedestrian prediction vector with dimensionality equal to the pedestrian category number through a softmax function; and
step 303: and calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing the parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
In a preferred embodiment of the present invention, the total loss in step S303 is the algebraic sum of the ternary loss and the ID loss, and the parameters of the reference gait feature extraction model include a weight parameter w_i and a bias parameter b_i.
In a preferred embodiment of the invention, the ternary loss is calculated from the obtained feature vectors (the number of groups m is 16):

L_T = \sum_{i=1}^{m} \max\big( d(a_i, p_i) - d(a_i, n_i) + \mathrm{margin},\ 0 \big)

wherein a_i represents the feature vector of the target picture, p_i represents the feature vector of a positive sample picture (belonging to the same category, i.e. the same person, as the target picture), and n_i represents the feature vector of a negative sample picture (not the same person as the target picture); all three are 1000 × 1 dimensional, and a_i, p_i, n_i form a triple for loss calculation. margin is a parameter, set here to 0.3, m represents the number of triples extracted from the training batch, and d(a_i, n_i) is the Euclidean distance, calculated as (where z is the feature vector dimension, here 1000):

d(a_i, n_i) = \sqrt{ \sum_{k=1}^{z} (a_{i,k} - n_{i,k})^2 }

The activation function calculation formula for converting the full connection layer output into an N-dimensional vector is:

\mathrm{softmax}(v)_i = \frac{ e^{v_i} }{ \sum_{j=1}^{N} e^{v_j} }

wherein N is the number of pedestrian categories, v is the output vector of the full connection layer, v_j is the j-th value in v, and i represents the pedestrian category currently being calculated; the result calculated by the activation function lies between 0 and 1, and the softmax values of all categories sum to 1.
In a preferred embodiment of the present invention, the ID loss is calculated from the resulting pedestrian prediction vector:
L_{ID} = - \frac{1}{N K} \sum_{n=1}^{N K} \log p_{n, y_n}

wherein p_{n,i} is the predicted probability value of the i-th pedestrian category in the n-th pedestrian prediction vector v'_n, y_n is the real ID value of the n-th picture, N is the number of pedestrian categories, and K is the number of pictures selected for each pedestrian category in a training batch.
In a preferred embodiment of the present invention, the total loss is calculated according to the feature vector and the pedestrian prediction vector by the following formula:
L_{total} = L_T + L_{ID}
In a preferred embodiment of the present invention, the step 303 of optimizing the reference gait feature extraction model parameters by using a gradient descent algorithm is as follows:
For all data in a training batch, the total loss L_total is calculated as one training step;
For each weight w_i and bias b_i in the reference gait feature extraction model (w_{i+1} and b_{i+1} are the updated parameters), the following formulas are executed to update the parameters:

w_{i+1} = w_i - \eta \frac{\partial L_{total}}{\partial w_i}

b_{i+1} = b_i - \eta \frac{\partial L_{total}}{\partial b_i}

where η is the learning rate.
the invention further provides a gait recognition system based on visual angle transformation, which comprises:
the data acquisition and processing device, which acquires monitoring videos from a plurality of pedestrian monitoring devices, processes the monitoring videos to obtain a pedestrian gait data set, and divides the pedestrian gait data set into a gait training set and a gait testing set;
the visual angle conversion device trains a visual angle conversion model for generating a specific visual angle image through any visual angle by using a GaitGAN network, trains a discriminator for discriminating the correctness of the generated visual angle image, and then inputs the data of the gait test set into the visual angle conversion model to obtain a gait energy map set under a specific target visual angle; and
and the gait recognition device is used for acquiring an image generated by the GaitGAN network, preprocessing the image to obtain a pixel image, inputting the pixel image into a reference gait feature extraction model to obtain a feature vector and a pedestrian prediction vector, calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
The gait recognition method and system based on visual angle conversion provided by the invention realize the automation of pedestrian tracking, so that the process of multidirectional tracking of a target pedestrian no longer depends on manual work; the videos acquired by cameras at multiple visual angles are processed into gait energy maps and input into the pre-trained visual angle conversion model and the reference gait feature extraction model, so that the multidirectional, multi-visual-angle tracking result of the target pedestrian can be obtained. This process uses gait recognition technology to realize a high degree of automation of pedestrian tracking, and a generative adversarial network is adopted to convert the pedestrian gait energy map to the 90-degree visual angle at which gait features are most obvious, so that the gait recognition accuracy is higher.
Drawings
FIG. 1 is a schematic flow chart of a gait recognition method based on visual angle transformation according to the invention;
FIG. 2 is a schematic diagram of a pedestrian gait data acquisition and processing process in step S100 according to the invention;
fig. 3 is a schematic structural diagram of the generator G in step two S200 according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of the true/false discriminator in step S200 according to an embodiment of the invention;
fig. 5 is a schematic structural diagram of the identity identifier in step S200 according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of gait energy images of a new training set and test set generated by generator G in step two S200 of the present invention;
FIG. 7 is a schematic flow chart of step three S300 according to the present invention;
fig. 8 is a schematic diagram of the reference gait feature extraction model in step three S300 according to the present invention;
fig. 9 is a schematic structural diagram of a ResNet50 network used in step three S300 according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. It is also to be understood that the terminology used in the examples is for the purpose of describing particular embodiments only, and is not intended to limit the scope of the present invention. Test methods in which specific conditions are not specified in the following examples are generally carried out under conventional conditions or under conditions recommended by the respective manufacturers.
Please refer to fig. 1 to 7 for a technical solution of the present invention. It should be understood that the structures, ratios, sizes, and the like shown in the drawings and described in the specification are only used for matching with the disclosure of the specification, so as to be understood and read by those skilled in the art, and are not used to limit the conditions of the present invention, so that the present invention has no technical significance. In addition, the terms such as "upper", "lower", "left", "right", "middle" and "one" used in the present specification are used for clarity of description only, and are not used to limit the scope of the present invention, and the relative relationship between the terms may be changed or adjusted without substantial change in the technical content.
Fig. 1 is a schematic flow chart of a gait recognition method based on perspective transformation according to the present invention. In this embodiment, the gait recognition method based on perspective transformation of the present invention includes:
step one S100: the method comprises the steps of data acquisition and processing, wherein monitoring videos acquired by a plurality of pedestrian monitoring devices are acquired and processed to obtain a pedestrian gait data set, and the pedestrian gait data set is divided into a gait training set and a gait testing set;
step two S200: a visual angle conversion step, namely training a visual angle conversion model for generating a specific visual angle image through any visual angle by using a GaitGAN network, training a discriminator for discriminating the correctness of the generated visual angle image, and then inputting the data of the gait test set into the visual angle conversion model to obtain a gait energy map set under a specific target visual angle; and
step three S300: and a gait recognition step (simultaneously refer to fig. 8), acquiring an image generated by the GaitGAN network, preprocessing the image to obtain a pixel image, inputting the pixel image into a reference gait feature extraction model to obtain a feature vector and a pedestrian prediction vector, calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
For the GaitGAN network, see the paper entitled "GaitGAN: Invariant Gait Feature Extraction Using Generative Adversarial Networks" by Shiqi Yu, Haifeng Chen, Edel B. Garcia Reyes and Norman Poh, published at the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
In an embodiment of the present invention (see fig. 2), the step S100 of data acquisition and processing specifically includes:
step S101: and acquiring the monitoring video, and performing video frame extraction processing on the monitoring video to obtain a frame image. Fig. 2 (a) shows an example of one extracted frame image;
step S102: screening frame images, and preprocessing the screened frame images to obtain pedestrian images, as shown in (b) and (c) of fig. 2, wherein (c) shows a pedestrian image in one gait cycle; and
step S103: the pedestrian images are processed to obtain gait energy maps (the map (d) in fig. 2 is a gait energy map synthesized by the pedestrian images in one gait cycle), and all the gait energy maps form a pedestrian gait data set and are divided into a training set and a testing set. The gait energy profile of the training set and the test set can be seen as an example in fig. 2 (e). All the gait energy images in the training set and the test set obtained in the step are gait energy images corresponding to the original observation visual angle, and the original observation visual angle refers to the original visual angle of the monitoring video shot by the monitoring equipment. The graph (e) in fig. 2 includes gait energy graphs corresponding to 11 original observation perspectives.
In an embodiment of the present invention, for example, a high-precision camera may be used to acquire several pieces of pedestrian walking video stream data from 11 viewing angles (from 0 ° to 180 ° and at an interval of 18 °), and perform video frame extraction processing on the several pieces of pedestrian walking video stream data to acquire a plurality of frame images (a).
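As a simple illustration of this frame-extraction step, the sketch below reads one surveillance video with OpenCV and saves every step-th frame; the file paths and the sampling interval are illustrative assumptions rather than values prescribed by this embodiment.

    import cv2

    def extract_frames(video_path, out_dir, step=5):
        """Save every `step`-th frame of a surveillance video as an image file."""
        cap = cv2.VideoCapture(video_path)
        idx, saved = 0, 0
        while True:
            ok, frame = cap.read()
            if not ok:                      # end of the video stream
                break
            if idx % step == 0:
                cv2.imwrite(f"{out_dir}/frame_{saved:06d}.png", frame)
                saved += 1
            idx += 1
        cap.release()
        return saved

    # Example call (hypothetical paths):
    # extract_frames("camera_090deg/person_001.avi", "frames/person_001", step=5)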
The plurality of frame images are then pre-processed: firstly, frame images which do not contain the detection object, namely the pedestrian, are removed; the remaining screened frame images are then subjected to noise removal, gray level conversion, binarization and other operations, so as to improve their definition and contrast and to enhance the image information, yielding the processed frame images. The processed frame images are then cropped so that their size meets the preset specification, and the cropped frame images are checked manually to remove abnormal data. Frame images regarded as abnormal data are, for example: extracted video key frames whose image quality is extremely poor and hard to distinguish, or frames whose main subject is the surrounding environment rather than the current detection object (namely the pedestrian), and the like.
The frame image of the pedestrian is cropped using half of the human body width as the positioning reference and scaled to the same size (256 × 256); the cropped frame images are combined into pedestrian images (b), and the pedestrian images (b) within one gait cycle are synthesized into a gait energy map (d). A pedestrian identity label is marked for each gait energy map. The data set consisting of all the gait energy maps is then divided into the gait training set and the gait testing set and stored.
Finally, through step S100, a data set containing the pedestrian images of a plurality of target pedestrians photographed at the 11 viewing angles is acquired and processed to obtain the pedestrian gait data set. The pedestrian gait data set consists of the gait energy maps of each pedestrian in the monitoring videos at the 11 viewing angles.
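The silhouette preprocessing and gait-energy-map synthesis described above can be sketched as follows. It assumes the frames of one gait cycle are already available as grayscale images; the Otsu thresholding and bounding-box cropping used here are illustrative choices standing in for the binarization and cropping operations of this embodiment.

    import cv2
    import numpy as np

    def to_silhouette(frame_gray):
        """Binarize one grayscale frame and crop the pedestrian to a 256x256 silhouette."""
        _, binary = cv2.threshold(frame_gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        ys, xs = np.nonzero(binary)
        if len(xs) == 0:                    # no pedestrian found in this frame
            return None
        crop = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
        return cv2.resize(crop, (256, 256), interpolation=cv2.INTER_NEAREST)

    def gait_energy_image(silhouettes):
        """Average the aligned binary silhouettes of one gait cycle into a gait energy map."""
        stack = np.stack([s.astype(np.float32) / 255.0 for s in silhouettes], axis=0)
        return (stack.mean(axis=0) * 255.0).astype(np.uint8)   # 256x256 gait energy map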
In an embodiment of the present invention, the step two S200 of converting the viewing angle specifically includes:
step S201: marking a view angle label on the gait energy map in the training set obtained in the step one S100;
step S202: the target view angle β is set to 90 degrees; the gait energy map I_α at the original observation angle and the gait energy map I_β' generated from I_α by the initial generator G are taken as input, and the true/false discriminator D_R is trained to distinguish a real image (namely a gait energy map in the training set obtained in step one S100) from a generated image (namely a gait energy map output by the generator G');
step S203: I_α and I_β' are taken as input, and the identity discriminator D_A is trained to judge whether the real image (namely the gait energy map in the training set obtained in step one S100) and the generated image (namely the gait energy map output by the generator G') are the same person;
step S204: I_α and the target view angle indicator v_β are input into the generator G, which is trained to generate a gait energy map I_β with the target view angle β;
step S205: all gait energy maps of the training set obtained in step one S100 are input into the view angle conversion model, and steps S202 to S204 are repeated until the discrimination probabilities of the true/false discriminator D_R and the identity discriminator D_A tend to 0.5 and become stable;
after steps S202 to S204 have been repeated until the discrimination probabilities of the true/false discriminator D_R and the identity discriminator D_A tend to 0.5 and become stable, the generator G obtained in step S205 is used as the view angle conversion model;
step S206: the gait energy maps in the training set obtained in step one S100 are converted by the view angle conversion model (namely the generator G) to obtain a generated gait energy image set under the target view angle β, which is taken as the training set of the reference gait feature extraction model in step three S300; the gait energy maps in the test set obtained in step one S100 are likewise converted by the view angle conversion model (namely the generator G), and the generated gait energy image set is taken as the test set of the reference gait feature extraction model in step three S300.
FIG. 3 is a schematic structural diagram of the generator G obtained in step S205, FIG. 4 is a schematic structural diagram of the true/false discriminator D_R in step S202, and FIG. 5 is a schematic structural diagram of the identity discriminator D_A in step S203. It should be understood that figs. 3-5 show only structural examples of the generator G, the true/false discriminator D_R and the identity discriminator D_A; in the present invention, the generator G, the true/false discriminator D_R and the identity discriminator D_A may also adopt other suitable structures.
The following is a description of a specific implementation process of step two S200 in an embodiment of the present invention:
step S201: in this embodiment, for example, for each gait energy image I_θ in the gait training set obtained in step one S100, a one-hot view angle indicator v_θ = (θ0, θ18, …, θ180) is constructed in vector form according to its shooting angle θ, following the division of the shooting view angle (namely the original observation view angle) from 0° to 180° at 18° intervals. For each picture, if the picture is obtained by post-processing video shot by monitoring equipment at a 0° view angle, the value of θ0 is set to 1 and the remaining values are set to 0;
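As an illustration of the indicator just described, the sketch below builds the one-hot vector v_θ for the 11 shooting angles (0° to 180° at 18° intervals) and expands each of its elements into an image-sized plane, as is done before feeding the generator; the tensor sizes and layout are assumptions, not values fixed by the patent.

    import torch

    ANGLES = list(range(0, 181, 18))           # 0, 18, ..., 180  ->  11 view classes

    def view_indicator(theta):
        """One-hot indicator v_theta for a shooting angle theta given in degrees."""
        v = torch.zeros(len(ANGLES))
        v[ANGLES.index(theta)] = 1.0
        return v

    def expand_indicator(v, height=64, width=64):
        """Copy each element of v into an HxW plane, giving an 11xHxW conditioning map."""
        return v.view(-1, 1, 1).expand(len(v), height, width).clone()

    # gei: 1xHxW gait energy map tensor; the generator input can then be
    # torch.cat([gei, expand_indicator(view_indicator(90), *gei.shape[-2:])], dim=0)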
step S202: gait energy maps I_α at n original observation angles are randomly extracted from the gait training set obtained in step one S100 as real samples (namely real images); the target view angle β is set to 90°, i.e., the value of θ90 in the target view angle indicator v_β is set to 1 and the remaining values are set to 0; for each gait energy map among the n real samples, the value of each element of v_β is copied and expanded into a two-dimensional matrix with the same size as the two-dimensional matrix corresponding to the input image, and the two are input together into the generator G' (the "input image" refers to the real image, i.e., a gait energy map in the training set obtained in step one S100), generating n generated samples (namely generated images, gait energy maps I_β') under the target view angle β in one-to-one correspondence with the real samples; the n generated samples are taken as false samples and, together with the real samples, are input as images x into the true/false discriminator D_R; binary cross entropy is taken as the loss function L_{D_R} of the true/false discriminator D_R, calculated as:

L_{D_R} = -\big[ t \log D_R(x) + (1 - t) \log(1 - D_R(x)) \big] - \big[ s \log D_R^{\beta}(x) + (1 - s) \log(1 - D_R^{\beta}(x)) \big]

wherein D_R(x) represents the probability that x is a real image: when D_R(x) is greater than 0.5, the true/false discriminator D_R judges that the input image x is a real sample and t is taken as 1; when D_R(x) is less than 0.5, the true/false discriminator D_R judges that the input image x is a false sample and t is taken as 0. D_R^β(x) represents the probability that x is at the 90° view angle: when D_R^β(x) is greater than 0.5, the true/false discriminator D_R judges that the input image x is at the 90° view angle and s is taken as 1; when D_R^β(x) is less than 0.5, the true/false discriminator D_R judges that the input image x is at another view angle and s is taken as 0. The true/false discriminator D_R is updated and optimized by using the back propagation algorithm to minimize L_{D_R}.
Step S203: using the n real samples selected in step S202 as a source gait energy map set { x }iN noise samples obtained in step S202 are used as a generated gait energy map set { x }jExtracting n images from the training set to form an irrelevant image set { x }k},xkMust have an and xiA different identity tag;
from { x, respectivelyiAnd { x }jSelecting an image to be input to an identity discriminator D togetherAIn, the identity discriminator DALoss function
Figure BDA0003564719200000121
Is defined as follows:
Figure BDA0003564719200000122
wherein D isA(x1,x2) Denotes x1,x2Is the probability of two gait energy maps of the same person. Updating the optimized identity discriminator D using a back propagation algorithmATo minimize
Figure BDA0003564719200000123
Step S204: set source gait energy map { xiImage I in (1)αAnd a target view indicator vβInput to a generator G, and combined with a true-false discriminator DRAnd an identity discriminator DADetermining an objective function of the generator G as
Figure BDA0003564719200000124
Updating the parameters of the optimized generator G to maximize L using a back propagation algorithmG(x)。
Step S205: each gait energy map of the training set (obtained in step S100) is combined with vθThe values of each element are copied and expanded to obtain a two-dimensional matrix with the same size as the two-dimensional matrix corresponding to the input image, the two-dimensional matrix is input into the model (the input image refers to a real image and is a gait energy image in the training set obtained in the step S100), and the steps 202-204 are repeated until the discrimination probabilities (D) of a true discriminator, a false discriminator and an identity discriminator are reachedR(x)、DA(x1,x2) Tends to 0.5 and stabilizes, at which point the ability to generate the model is considered to be sufficiently strong and the training is stopped.
The steps S202 to S204 are repeated until a true and false discriminator DRIdentity discriminator DAAfter the discrimination probability of (2) tends to 0.5 and stabilizes, the generator G obtained in step S205 is used as a view conversion model;
step S206: for each gait energy map in the gait training set obtained in step one S100, the value of each element of v_θ is copied and expanded to obtain a first two-dimensional matrix with the same size as the second two-dimensional matrix corresponding to the input image (namely the real image, a gait energy map in the training set obtained in step one S100); the first and second two-dimensional matrices are input together into the generator G for view angle conversion, and the image set generated by the generator G is taken as the training set of the reference gait feature extraction model in step three S300;
similarly, for each gait energy map in the gait test set obtained in step one S100, the value of each element of v_θ is copied and expanded to obtain a third two-dimensional matrix with the same size as the fourth two-dimensional matrix corresponding to the input image (namely the real image, a gait energy map in the test set obtained in step one S100); the third and fourth two-dimensional matrices are input together into the generator G, and the image set generated by the generator G is taken as the test set of the reference gait feature extraction model in step three S300.
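To make the adversarial objectives of steps S202 to S204 concrete, the following is a minimal PyTorch sketch of the three loss terms: the true/false discriminator loss L_{D_R}, the identity discriminator loss L_{D_A}, and the generator objective (implemented here as minimizing the equivalent binary cross entropy rather than maximizing L_G directly). The two-output form of D_R, the probability-valued outputs, and the label conventions t and s are assumptions carried over from the reconstructed formulas above; the discriminator and generator networks themselves are not shown.

    import torch
    import torch.nn.functional as F

    bce = F.binary_cross_entropy   # all discriminator outputs assumed to be probabilities in [0, 1]

    def d_r_loss(d_r, x, t, s):
        """L_{D_R}: BCE over D_R's real/fake output (labels t) and 90-degree-view output (labels s)."""
        p_real, p_view = d_r(x)                  # assumed: D_R returns two probabilities per image
        return bce(p_real, t) + bce(p_view, s)

    def d_a_loss(d_a, x1, x2, t):
        """L_{D_A}: BCE on the probability that x1 and x2 show the same person (labels t)."""
        p_same = d_a(x1, x2)
        return bce(p_same, t)

    def g_loss(d_r, d_a, i_alpha, i_beta_fake):
        """Generator objective: D_R should call the generated map real and at 90 degrees,
        and D_A should accept (I_alpha, G(I_alpha, v_beta)) as the same person."""
        p_fake, p_view = d_r(i_beta_fake)
        p_id = d_a(i_alpha, i_beta_fake)
        ones = torch.ones_like(p_fake)
        return bce(p_fake, ones) + bce(p_view, ones) + bce(p_id, torch.ones_like(p_id))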
Fig. 6 shows gait energy images of the new training set and test set generated by generator G in step two 200.
Step three S300: namely a gait recognition step, comprising: acquiring images generated by the GaitGAN network (i.e. gait energy images of the new training set and the new test set obtained in step S206, as illustrated in fig. 8), preprocessing the images to obtain pixel images, sending the pixel images to a convolutional neural network (i.e. a feature extraction network in a reference gait feature extraction model) to perform feature extraction to obtain feature vectors, then sending the feature vectors to a full-link layer, obtaining vectors with dimensions equal to pedestrian categories through a softmax function, wherein the vectors are pedestrian prediction vectors, calculating total loss according to the feature vectors and the pedestrian prediction vectors, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
Specifically, step three S300 of the present invention includes the following steps (see fig. 7 at the same time):
step 301: acquiring images (images generated by a GaitGAN network, namely gait energy images acquired after S205) used for training to form a training batch, and processing the acquired images of the training batch to acquire a plurality of pixel images;
step 302: sending the pixel image into a convolutional neural network for feature extraction to obtain a feature vector, then sending the feature vector into a full connection layer, and obtaining a pedestrian prediction vector with dimensionality equal to the pedestrian category number through a softmax function;
step 303: and calculating the total loss according to the feature vector and the pedestrian prediction vector, and optimizing the parameters of the model by using a gradient descent algorithm to finally obtain a trained gait feature extraction model.
The following is a description of a specific implementation process of step three S300 in an embodiment of the present invention:
step S301: in the GaitGAN network of the view angle conversion step (step two S200), from the gait energy maps of the various pedestrians at the 90-degree view angle generated after step S205, the gait energy maps of N × K pedestrian pictures are extracted as a training batch (N is the number of pedestrian categories, and K pictures are randomly extracted for each pedestrian category, where K is 16); each image is then resized to 224 × 224 pixels, and its original pixel values are decoded into 32-bit floating-point values in [0, 1].
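A hedged sketch of assembling such a training batch (K = 16 images for each sampled identity, resized to 224 × 224 and scaled to 32-bit floats in [0, 1]); the dictionary layout images_by_id and the sampling helper are illustrative assumptions.

    import random
    import cv2
    import numpy as np

    def make_batch(images_by_id, num_ids, k=16, size=224):
        """Sample k gait energy maps for each of num_ids identities and preprocess them."""
        batch, labels = [], []
        for pid in random.sample(sorted(images_by_id), num_ids):
            for path in random.sample(images_by_id[pid], k):
                img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)            # generated 90-degree GEI
                img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0
                batch.append(img)
                labels.append(pid)
        return np.stack(batch), np.array(labels)

    # images_by_id: assumed dict mapping identity -> list of file paths of its generated GEIs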
S302: the gait energy images of a training batch are all input into a convolutional neural network (namely the feature extraction network in the reference gait feature extraction model) to extract multi-dimensional features (such as the pedestrian's contour features, step length, the proportions of the body parts, and the like); each image produces a feature vector, which is then input into a full connection layer whose output dimensionality equals the number N of pedestrian categories in the training set, and a pedestrian prediction vector with dimensionality equal to the number of categories N is generated after a softmax activation function.
In an embodiment of the present invention, the generation process of the pedestrian prediction vector is described taking one pedestrian picture (an image processed through step S301) as an example:
after features are extracted by the convolutional neural network, they are input into a full connection layer whose output dimensionality equals the number N of pedestrian categories in the training set; the full connection layer outputs a column vector v of dimensionality 1 × N (this column vector v is the output vector mentioned in the softmax formula); after v passes through the softmax function, a column vector v' is obtained in which every element lies between 0 and 1, and v' is taken as the pedestrian prediction vector obtained after the picture is processed by the reference gait feature extraction model.
In an embodiment of the present invention, the structure of the reference gait feature extraction model may be as shown in fig. 8. The convolutional network is the feature extraction network that forms part of the reference gait feature extraction model, and ResNet50 is a specific example of the convolutional neural network, i.e., of the feature extraction network, in the reference gait feature extraction model. The structure of the ResNet50 network may take the form shown in fig. 9.
For the ResNet50 network, see the paper entitled "Deep Residual Learning for Image Recognition" by Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun, published at the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
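A minimal PyTorch sketch of the reference gait feature extraction model of fig. 8, using torchvision's ResNet50 as an illustrative stand-in for the backbone of fig. 9: its 1000-dimensional output serves as the feature vector, and a full connection layer over the N pedestrian categories followed by softmax produces the pedestrian prediction vector.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet50

    class GaitFeatureExtractor(nn.Module):
        """ResNet50 backbone -> 1000-d feature vector -> full connection layer -> prediction."""
        def __init__(self, num_ids):
            super().__init__()
            self.backbone = resnet50(weights=None)       # final fc gives 1000 values, matching the 1000-d feature vector in the text (torchvision >= 0.13)
            self.classifier = nn.Linear(1000, num_ids)   # full connection layer over the N pedestrian categories

        def forward(self, x):
            # x: (B, 3, 224, 224); single-channel gait energy maps can be repeated over 3 channels
            feat = self.backbone(x)                      # feature vectors a_i / p_i / n_i
            logits = self.classifier(feat)               # output vector v of the full connection layer
            pred = torch.softmax(logits, dim=1)          # pedestrian prediction vector v'
            return feat, logits, pred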
S303: the total loss is calculated according to the feature vectors and the pedestrian prediction vectors, the parameters of the reference gait feature extraction model (including the weight parameters w_i and the bias parameters b_i) are optimized by using a gradient descent algorithm, and a trained gait feature extraction model is finally obtained. This specifically comprises the following steps:
The ternary loss is calculated from the obtained feature vectors (the number of groups m is 16):

L_T = \sum_{i=1}^{m} \max\big( d(a_i, p_i) - d(a_i, n_i) + \mathrm{margin},\ 0 \big)

wherein a_i represents the feature vector of the target picture, p_i represents the feature vector of a positive sample picture (belonging to the same category, i.e. the same person, as the target picture), and n_i represents the feature vector of a negative sample picture (not the same person as the target picture); all three are 1000 × 1 dimensional, and a_i, p_i, n_i form a triple used to calculate the loss. The combinations of a_i, p_i, n_i in the triples are arranged by calling the torch.nn.TripletMarginLoss method in the mature, open-source PyTorch deep learning framework. margin is a parameter, set here to 0.3, m represents the number of triples extracted from the training batch, and d(a_i, n_i) is the Euclidean distance (the length of the straight line between two points in space), calculated as (where z is the feature vector dimension, here 1000):

d(a_i, n_i) = \sqrt{ \sum_{k=1}^{z} (a_{i,k} - n_{i,k})^2 }

The activation function (softmax) calculation formula for converting the full connection layer output into an N-dimensional vector is:

\mathrm{softmax}(v)_i = \frac{ e^{v_i} }{ \sum_{j=1}^{N} e^{v_j} }

wherein N is the number of pedestrian categories, v is the output vector of the full connection layer, v_j is the j-th value in v, and i represents the pedestrian category currently being calculated; the result calculated by the activation function lies between 0 and 1, and the softmax values of all categories sum to 1.
For the ternary loss, see the paper entitled "FaceNet: A Unified Embedding for Face Recognition and Clustering" by Florian Schroff, Dmitry Kalenichenko and James Philbin, published at the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
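The ternary loss above can be evaluated directly with the torch.nn.TripletMarginLoss method mentioned in the text; a short sketch with margin = 0.3 and Euclidean distance (p = 2), assuming the anchor, positive and negative feature vectors have already been grouped into the m = 16 triples. Note that TripletMarginLoss averages over the triples by default; reduction='sum' reproduces the summed form of the formula.

    import torch
    import torch.nn as nn

    triplet_loss = nn.TripletMarginLoss(margin=0.3, p=2)   # Euclidean distance, margin = 0.3

    # a, p, n: (m, 1000) anchor, positive and negative feature vectors (random stand-in values)
    a, p, n = torch.randn(16, 1000), torch.randn(16, 1000), torch.randn(16, 1000)
    l_t = triplet_loss(a, p, n)                             # mean over the m triples by default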
Calculating ID loss according to the obtained pedestrian prediction vector:
L_{ID} = - \frac{1}{N K} \sum_{n=1}^{N K} \log p_{n, y_n}

wherein p_{n,i} is the predicted probability value of the i-th pedestrian category in the n-th pedestrian prediction vector v'_n, y_n is the real ID value of the n-th picture, N is the number of pedestrian categories, and K is the number of pictures selected for each pedestrian category in a training batch.
With regard to the ID loss, see the paper entitled "A Discriminatively Learned CNN Embedding for Person Re-identification" by Zhedong Zheng, Liang Zheng and Yi Yang, published in ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 14, no. 1, article 13, 2018.
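The ID loss is the standard cross-entropy over the N pedestrian categories and can be computed directly from the full connection layer outputs; a sketch, with the batch size N·K and the identity labels assumed to be integer indices.

    import torch
    import torch.nn.functional as F

    # logits: (N*K, N) full connection layer outputs; labels: (N*K,) true identity indices
    logits = torch.randn(64, 4)                 # e.g. N = 4 categories, K = 16 pictures each
    labels = torch.randint(0, 4, (64,))
    l_id = F.cross_entropy(logits, labels)      # equals the mean over the batch of -log softmax(v)[y]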
According to the feature vector and the pedestrian prediction vector, the total loss formula is calculated as follows:
L_{total} = L_T + L_{ID}
inputting the total loss of a convolutional neural network (namely, the network corresponding to the reference gait feature extraction model) after a training batch into the algebraic addition of the ternary loss and the ID loss, and then utilizing a gradient descent algorithm to extract the weight parameters (w) of the network and the full-connection layeri) And updating to complete the optimization of the reference gait feature extraction model once. A total of three such optimizations are performed in one iteration. The initial learning rate was set to 0.00035, which was reduced by 0.1 at 40 and 70 th iterations, respectively. A total of 120 iterations were performed. And after 120 iterations, finishing the training of the reference gait feature extraction model.
The principle of the gradient descent algorithm is as follows:
For all data in a training batch, the total loss L_total is calculated as one training step;
For each weight parameter w_i and bias parameter b_i in the model (w_{i+1} and b_{i+1} are the updated parameters), the following formulas are executed to update the parameters:

w_{i+1} = w_i - \eta \frac{\partial L_{total}}{\partial w_i}

b_{i+1} = b_i - \eta \frac{\partial L_{total}}{\partial b_i}

where η is the learning rate.
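Putting the pieces together, the sketch below realizes the update rule and schedule described above with plain stochastic gradient descent (learning rate 0.00035, decayed by 0.1 at the 40th and 70th iterations, 120 iterations in total). It reuses GaitFeatureExtractor from the earlier sketch; loader, make_triplets and the identity count of 74 are assumed placeholders for batch loading, triple mining and the data set at hand.

    import torch
    import torch.nn.functional as F

    model = GaitFeatureExtractor(num_ids=74)                        # 74 identities is hypothetical
    optimizer = torch.optim.SGD(model.parameters(), lr=0.00035)     # w_{i+1} = w_i - lr * dL/dw_i
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[40, 70], gamma=0.1)
    criterion_t = torch.nn.TripletMarginLoss(margin=0.3, p=2)

    for epoch in range(120):                                        # 120 iterations in total
        for images, labels in loader:                               # loader: assumed iterable of N*K batches
            feats, logits, _ = model(images)
            a, p, n = make_triplets(feats, labels)                  # assumed helper forming the m triples
            loss = criterion_t(a, p, n) + F.cross_entropy(logits, labels)   # L_total = L_T + L_ID
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        scheduler.step()                                            # lr multiplied by 0.1 after epochs 40 and 70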
the invention provides a gait recognition method and a gait recognition system based on visual angle conversion, which can realize the automation of pedestrian tracking, so that the process of multidirectional tracking of a target pedestrian is free from dependence on manpower, and the multidirectional and multi-visual-angle tracking result of the target pedestrian can be obtained only by processing videos acquired by cameras at multiple visual angles into a gait energy diagram and inputting the gait energy diagram into a pre-trained gait feature extraction model. The process utilizes a gait recognition technology to realize high automation of pedestrian tracking, and a generated confrontation network is adopted to convert a pedestrian gait energy map into a 90-degree visual angle with most obvious gait characteristics, so that the gait recognition accuracy is higher.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (11)

1. A gait recognition method based on visual angle transformation comprises the following steps:
step one S100: the method comprises the steps of data acquisition and processing, wherein monitoring videos acquired by a plurality of pedestrian monitoring devices are acquired and processed to obtain a pedestrian gait data set, and the pedestrian gait data set is divided into a gait training set and a gait testing set;
step two S200: a visual angle conversion step, namely training a visual angle conversion model for generating a specific visual angle image through any visual angle by using a GaitGAN network, training a discriminator for discriminating the correctness of the generated visual angle image, and then inputting the data of the gait test set into the visual angle conversion model to obtain a gait energy map set under a specific target visual angle; and
step three S300: and a gait recognition step, namely acquiring an image generated by the GaitGAN network, preprocessing the image to obtain a pixel image, inputting the pixel image into a reference gait feature extraction model to obtain a feature vector and a pedestrian prediction vector, calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
2. The gait recognition method according to claim 1, characterized in that: the step S100 includes:
step S101: acquiring the monitoring video, and performing video frame extraction processing on the monitoring video to obtain a frame image;
step S102: screening the frame images, and preprocessing the screened frame images to obtain pedestrian images; and
step S103: and processing the pedestrian image to obtain a gait energy map, wherein all the gait energy maps form the pedestrian gait data set and are divided into the gait training set and the gait testing set.
3. The gait recognition method according to claim 2, characterized in that: the second step S200 includes:
step S201: marking a view angle label on the gait energy map in the gait training set obtained in the step one S100;
step S202: setting the target view angle β to 90 degrees, taking the gait energy map I_α obtained in step S201 and the gait energy map I_β' generated from the gait energy map I_α by the initial generator G as input, and training a true/false discriminator D_R to distinguish a real image from a generated image;
step S203: taking the gait energy map I_α and the gait energy map I_β' as input, and training an identity discriminator D_A to judge whether the real image and the generated image are the same person;
step S204: inputting the gait energy map I_α and a target view angle indicator v_β into a generator G, which is trained to generate a gait energy map I_β with the target view angle β;
Step S205: inputting all gait energy maps of the gait training set obtained in the step one S100 into a perspective conversion model, and repeating the steps S202 to S204 until the discrimination probabilities of the true and false discriminators and the identity discriminator tend to be 0.5 and stable; and
step S206: generating a new gait energy image set by using the perspective conversion model for the gait energy images in the gait training set obtained in the first step S100, and using the new gait energy image set as the training set of the reference gait feature extraction model in the third step S300; and generating a new gait energy image set by using the gait energy image in the gait test set obtained in the first step S100 by using the perspective transformation model, and using the new gait energy image set as the test set of the reference gait feature extraction model in the third step S300.
4. The gait recognition method according to claim 3, characterized in that: after repeating the steps S202 to S204 until the discrimination probabilities of the true and false discriminators and the identity discriminator approach 0.5 and stabilize, the generator G obtained in the step S205 is used as the view angle conversion model.
5. The gait recognition method according to claim 3, characterized in that: the real image in step S202 is the gait energy map I_α obtained from step S201, and the generated image is the gait energy map I_β' generated from the gait energy map I_α by the initial generator G.
6. The gait recognition method according to claim 3, characterized in that: the third step S300 includes:
step 301: acquiring the gait energy images of the training set obtained after the step S205 as images for training to form a training batch, and processing the acquired images of the training batch to obtain a plurality of pixel images;
step 302: sending the pixel image into a convolutional neural network of the reference gait feature extraction model for feature extraction to obtain a feature vector, inputting the feature vector into a full connection layer of the reference gait feature extraction model, and obtaining the pedestrian prediction vector with dimensionality equal to the number of pedestrian categories through a softmax function; and
step 303: and calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining the trained gait feature extraction model.
7. The gait recognition method according to claim 6, characterized in that: in step S303, the total loss is the algebraic sum of the ternary loss and the ID loss, and the parameters of the reference gait feature extraction model include a weight parameter w_i and a bias parameter b_i.
8. The gait recognition method according to claim 7, characterized in that: the ternary loss is calculated from the obtained feature vectors (taking 16 groups as the group number m):

L_T = \sum_{i=1}^{m} \max\big( d(a_i, p_i) - d(a_i, n_i) + \mathrm{margin},\ 0 \big)

wherein a_i represents the feature vector of the target picture, p_i represents the feature vector of a positive sample picture (belonging to the same category, i.e. the same person, as the target picture), and n_i represents the feature vector of a negative sample picture (not the same person as the target picture); all three are 1000 × 1 dimensional, and a_i, p_i, n_i form a triple for loss calculation; margin is a parameter, set here to 0.3, m represents the number of triples extracted from the training batch, and d(a_i, n_i) is the Euclidean distance, calculated as (where z is the feature vector dimension, here 1000):

d(a_i, n_i) = \sqrt{ \sum_{k=1}^{z} (a_{i,k} - n_{i,k})^2 }

The activation function calculation formula for converting the full connection layer output into an N-dimensional vector is:

\mathrm{softmax}(v)_i = \frac{ e^{v_i} }{ \sum_{j=1}^{N} e^{v_j} }

wherein N is the number of pedestrian categories, v is the output vector of the full connection layer, v_j is the j-th value in v, and i represents the pedestrian category currently being calculated; the result calculated by the activation function lies between 0 and 1, and the softmax values of all categories sum to 1.
9. The gait recognition method according to claim 7, characterized in that: the ID loss is calculated from the obtained pedestrian prediction vectors:

L_{ID} = - \frac{1}{N K} \sum_{n=1}^{N K} \log p_{n, y_n}

wherein p_{n,i} is the predicted probability value of the i-th pedestrian category in the n-th pedestrian prediction vector v'_n, y_n is the real ID value of the n-th picture, N is the number of pedestrian categories, and K is the number of pictures selected for each pedestrian category in a training batch.
10. The gait recognition method according to claim 7, characterized in that: the step 303 of optimizing the parameters of the reference gait feature extraction model by using the gradient descent algorithm is as follows:
For all data in a training batch, the total loss L_total is calculated as one training step;
For each weight w_i and bias b_i in the reference gait feature extraction model (w_{i+1} and b_{i+1} are the updated parameters), the following formulas are executed to update the parameters:

w_{i+1} = w_i - \eta \frac{\partial L_{total}}{\partial w_i}

b_{i+1} = b_i - \eta \frac{\partial L_{total}}{\partial b_i}

where η is the learning rate.
11. a gait recognition system based on perspective transformation, comprising:
the data acquisition and processing device, which acquires monitoring videos from a plurality of pedestrian monitoring devices, processes the monitoring videos to obtain a pedestrian gait data set, and divides the pedestrian gait data set into a gait training set and a gait testing set;
the visual angle conversion device trains a visual angle conversion model for generating a specific visual angle image through any visual angle by using a GaitGAN network, trains a discriminator for discriminating the correctness of the generated visual angle image, and then inputs the data of the gait test set into the visual angle conversion model to obtain a gait energy map set under a specific target visual angle; and
and the gait recognition device is used for acquiring an image generated by the GaitGAN network, preprocessing the image to obtain a pixel image, inputting the pixel image into a reference gait feature extraction model to obtain a feature vector and a pedestrian prediction vector, calculating total loss according to the feature vector and the pedestrian prediction vector, optimizing parameters of the reference gait feature extraction model by using a gradient descent algorithm, and finally obtaining a trained gait feature extraction model.
CN202210305445.5A 2022-03-25 2022-03-25 Gait recognition method and system based on visual angle transformation Pending CN114627424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210305445.5A CN114627424A (en) 2022-03-25 2022-03-25 Gait recognition method and system based on visual angle transformation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210305445.5A CN114627424A (en) 2022-03-25 2022-03-25 Gait recognition method and system based on visual angle transformation

Publications (1)

Publication Number Publication Date
CN114627424A true CN114627424A (en) 2022-06-14

Family

ID=81903934

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210305445.5A Pending CN114627424A (en) 2022-03-25 2022-03-25 Gait recognition method and system based on visual angle transformation

Country Status (1)

Country Link
CN (1) CN114627424A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117253283A (en) * 2023-08-09 2023-12-19 三峡大学 Wheelchair following method based on fusion of image information and electromagnetic positioning information data
CN117893450A (en) * 2024-03-15 2024-04-16 西南石油大学 Digital pathological image enhancement method, device and equipment
CN117893450B (en) * 2024-03-15 2024-05-24 西南石油大学 Digital pathological image enhancement method, device and equipment

Similar Documents

Publication Publication Date Title
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
CN108108657B (en) Method for correcting locality sensitive Hash vehicle retrieval based on multitask deep learning
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
CN107633226B (en) Human body motion tracking feature processing method
CN113221641B (en) Video pedestrian re-identification method based on generation of antagonism network and attention mechanism
CN105138998B (en) Pedestrian based on the adaptive sub-space learning algorithm in visual angle recognition methods and system again
CN109255289B (en) Cross-aging face recognition method based on unified generation model
CN111709311A (en) Pedestrian re-identification method based on multi-scale convolution feature fusion
CN111709313B (en) Pedestrian re-identification method based on local and channel combination characteristics
CN109684913A (en) A kind of video human face mask method and system based on community discovery cluster
CN110852152B (en) Deep hash pedestrian re-identification method based on data enhancement
CN105654122B (en) Based on the matched spatial pyramid object identification method of kernel function
CN112580480B (en) Hyperspectral remote sensing image classification method and device
CN112084895B (en) Pedestrian re-identification method based on deep learning
CN113095158A (en) Handwriting generation method and device based on countermeasure generation network
Pratama et al. Face recognition for presence system by using residual networks-50 architecture
CN110826534B (en) Face key point detection method and system based on local principal component analysis
CN115311502A (en) Remote sensing image small sample scene classification method based on multi-scale double-flow architecture
CN114547365A (en) Image retrieval method and device
CN112613474B (en) Pedestrian re-identification method and device
CN114627424A (en) Gait recognition method and system based on visual angle transformation
CN112329662B (en) Multi-view saliency estimation method based on unsupervised learning
CN116543419B (en) Hotel health personnel wearing detection method and system based on embedded platform
CN105844299B (en) A kind of image classification method based on bag of words

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination