CN112434655A - Gait recognition method based on adaptive confidence graph convolutional network - Google Patents
Gait recognition method based on adaptive confidence graph convolutional network
- Publication number: CN112434655A (application CN202011432129.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
- G06F18/253 — Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/40 — Arrangements for image or video recognition or understanding; extraction of image or video features
Abstract
The invention discloses a gait recognition method based on an adaptive confidence graph convolutional network, comprising the following steps: 1, acquiring a data set containing human body posture information; 2, building an adaptive confidence graph convolutional network model; 3, training the built model offline; and 4, using the trained model for prediction, thereby achieving gait recognition. The adaptive confidence graph convolutional network can extract gait features from posture data while reducing the influence of noise in that data, which improves the accuracy of gait recognition and mitigates the sensitivity of existing appearance-based gait recognition algorithms to appearance changes.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a gait recognition method based on an adaptive confidence graph convolutional network.
Background
Identity recognition is an important subject in the field of computer vision and plays an important role in video surveillance and public security. Gait recognition aims to determine a person's identity from the way they walk; compared with other identity recognition approaches, it has the advantages of being contactless and effective at long range. Gait recognition algorithms fall into two main categories: appearance-based algorithms and model-based algorithms.
Appearance-based algorithms typically take body contour images as input, which retain gait information while discarding appearance details. Early algorithms constructed a gait template, fusing an image sequence into a single template image before recognition. Recent deep-learning algorithms use convolutional neural networks to extract gait features from gait templates or gait sequences and then perform recognition on those features. Appearance-based algorithms are widely used, but they are easily disturbed by external factors such as changes in a person's appearance (different clothing, carried items, and the like), viewing angle, and walking speed.
Model-based algorithms model gait using prior knowledge of the structure and motion of the human body, and are therefore more robust to disturbances such as appearance and viewing angle. However, early algorithms could only model gait under restricted conditions, and were narrower in applicability and lower in accuracy than appearance-based algorithms. In recent years, deep learning has attracted wide attention in computer vision and has been applied to various vision tasks; in particular, deep-learning-based pose estimation algorithms can acquire human body posture information more accurately, providing a new direction for model-based gait recognition. However, because of the limited performance of pose estimation algorithms themselves and interference from factors such as occlusion, background clutter, and low image resolution, the poses obtained this way often contain noise. Existing pose-based gait algorithms, on the one hand, do not treat the noise in pose data specifically, which hurts recognition accuracy; on the other hand, they do not exploit the relations among pose key points, so the extracted gait features are not comprehensive enough.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a gait recognition method based on an adaptive confidence graph convolutional network. The aim is to use this network to extract an effective gait representation from noisy human pose key point data while reducing the influence of the noise, and at the same time to exploit the relations among pose key points, so as to improve gait recognition accuracy and mitigate the sensitivity of existing appearance-based gait recognition algorithms to appearance changes.
To achieve this purpose, the invention adopts the following technical scheme:
The invention provides a gait recognition method based on an adaptive confidence graph convolutional network, characterized by comprising the following steps:
step 1, extracting human body pose key points from an original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each, and normalizing them to obtain a standardized pose skeleton data set X;
step 2, building an adaptive confidence graph convolutional network model, which comprises: one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module;
the input module is implemented as a batch normalization layer;
the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
step 2.1, organizing the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
step 2.2, feeding the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix;
step 2.3, feeding the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors; after the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$;
step 2.4, feeding the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features from the sequence with a temporal convolution operator; after the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules;
step 2.5, averaging the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps, then averaging the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f;
step 2.6, feeding the gait feature f into the output module to obtain the prediction result y;
step 3, computing a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, updating the weights of the adaptive confidence graph convolutional network model with SGD, and ending training when the loss value stabilizes, obtaining the optimal adaptive confidence graph convolutional network model;
and step 4, extracting gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, computing the Euclidean distances between the query feature and all gallery features, and sorting them from smallest to largest distance to obtain the retrieval result.
The gait recognition method based on the adaptive confidence graph convolutional network is further characterized in that step 2.3 comprises:
step 2.3.1, denoting the input of each adaptive confidence graph convolution layer as $F^s_{in}$; passing $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, computing the matrix product of $F^s_1$ and $F^s_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A;
step 2.3.2, expanding the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generating an all-zero matrix of the same size to serve as the adaptive matrix B;
step 2.3.3, adding the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then computing the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; passing E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
step 2.3.4, multiplying the input $F^s_{in}$ by the confidence vector for weighting, then passing it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
step 2.3.5, adding the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer, then passing it through the normalization layer, ReLU activation layer, and Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module;
step 2.3.6, if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, passing $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$ and adding the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, adding $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$;
step 2.3.7, repeating steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$.
Step 2.4 comprises:
step 2.4.1, denoting the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; passing $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, computing the matrix product of $F^{st}_1$ and $F^{st}_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A';
step 2.4.2, adding the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; passing the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
Step 2.4.3, output of the adaptive graph convolution layerSequentially passing through a batch normalization layer, a ReLU activation layer and a Dropout layer to obtain a deep space gait characteristic sequence Fst′;
step 2.4.4, passing the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
step 2.4.5, if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, passing $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$ and adding the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, adding $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
step 2.4.6, repeating steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with current mainstream gait recognition methods, the method of the invention has lower computational cost, higher efficiency, and better robustness to external interference factors such as appearance changes, making it better suited to practical applications of gait recognition.
2. The invention extracts gait features from human body posture information through the designed adaptive confidence graph convolutional network. Compared with a conventional convolutional network, this network handles pose key point data better and exploits the relations among pose key points to extract richer gait features, thereby improving recognition accuracy.
3. The invention constructs an adaptive confidence weighting mechanism for the graph convolutional network, using the confidences produced by the pose extractor to weight each pose key point during convolution. This reduces the adverse influence of noisy points and makes fuller use of accurate pose information, ultimately improving recognition accuracy.
4. By constructing the adaptive confidence graph convolutional network and applying an attention mechanism, the invention increases the diversity of gait features while preserving their accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the method of the invention;
FIG. 2 is a block diagram of the adaptive confidence graph convolution layer of the invention;
FIG. 3 is a block diagram of the adaptive confidence module of the invention;
FIG. 4 is a block diagram of the spatio-temporal feature fusion module of the invention;
FIG. 5 is a block diagram of the adaptive confidence graph convolutional network of the invention.
Detailed Description
In this embodiment, the gait recognition method based on an adaptive confidence graph convolutional network extracts effective gait features from noisy pose data mainly by means of a graph convolutional network and an attention mechanism. As shown in FIG. 1, the method proceeds as follows:
Step 1: extract human body pose key points from the original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each; normalize them to obtain the standardized pose skeleton data set $X = \{X^n \mid n = 1, 2, \dots, N\}$, where $X^n = \{(\hat{P}^n_t, c^n_t) \mid t = 1, 2, \dots, T\}$ denotes the n-th standardized pose skeleton sequence, $\hat{P}^n_t$ denotes the pose skeleton of the t-th frame in that sequence, and $c^n_t$ denotes the confidences of the pose key points in $\hat{P}^n_t$. In this embodiment, the CASIA-B gait data set is used for training and testing; it contains walking videos of 124 subjects in different walking states and from different viewing angles. The data of the first 62 subjects is used as the training set and that of the remaining 62 subjects as the test set. The pose estimation algorithm is AlphaPose. Because the distance between a pedestrian and the camera changes continuously while walking, the captured pose skeletons differ in size, and all pose skeleton data must be standardized to prevent scale from affecting the gait features. Let the original pose skeleton be $P = \{v_i \mid i = 1, 2, \dots, M\}$, where $v_i$ denotes the i-th pose key point and M denotes the number of key points in one skeleton, and denote the standardized skeleton by $\hat{P} = \{\hat{v}_i \mid i = 1, 2, \dots, M\}$. The normalization can be expressed as formula (1):

$\hat{v}_i = \dfrac{v_i}{\mathrm{dist}(v_{neck}, v_{hip})}$ (1)

In formula (1), $v_{neck}$ and $v_{hip}$ denote the coordinates of the neck and hip key points respectively, and dist denotes the Euclidean distance function. Because the neck and hip key points are estimated accurately, normalizing by the distance between them yields pose data of relatively consistent scale. During training the video sequence length is T = 30: a complete gait cycle spans about 25 frames, so 30 frames guarantee that each clip contains a full cycle; a video shorter than 30 frames is repeated until it exceeds 30 frames and the first 30 frames are taken. The uniform clip length allows batch processing to speed up training.
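This preprocessing can be sketched in a few lines of Python; the keypoint indices `NECK` and `HIP` are illustrative assumptions and should match the pose extractor's keypoint layout:

```python
import numpy as np

NECK, HIP = 1, 8   # assumed indices; adjust to the pose extractor's keypoint layout

def normalize_skeleton(pose):
    """Scale one frame of pose keypoints, shape (M, 2), per formula (1):
    divide by the neck-hip distance so that skeletons captured at
    different camera distances share a consistent scale."""
    scale = np.linalg.norm(pose[NECK] - pose[HIP])
    return pose / max(scale, 1e-6)   # guard against degenerate frames

def pad_to_length(seq, length=30):
    """Repeat a short sequence of frames until it covers `length` frames,
    then crop, mirroring the 30-frame training clips described above."""
    reps = int(np.ceil(length / len(seq)))
    return np.concatenate([seq] * reps, axis=0)[:length]
```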
Step 2: build the adaptive confidence graph convolutional network model, which comprises one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module, as shown in FIG. 5. In this embodiment α = 3 and β = 7, with which the network achieves its best recognition accuracy;
the input module is implemented as a batch normalization layer;
as shown in FIG. 3, the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
as shown in FIG. 4, the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
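How the four module types compose can be shown in a minimal PyTorch sketch, assuming the channel widths and strides given later in this embodiment and 17 keypoints (the AlphaPose/COCO layout); `AdaptiveConfidenceModule` and `STFusionModule` are placeholders elaborated in the sketches after steps 2.3.7 and 2.4.6:

```python
import torch
import torch.nn as nn

class ACGCN(nn.Module):
    """Skeleton of the adaptive confidence graph convolutional network.

    Tensors follow the (batch, channels, frames, keypoints) convention;
    the 2 input channels are the (x, y) keypoint coordinates.
    """
    def __init__(self, num_classes, in_channels=2, num_keypoints=17):
        super().__init__()
        self.input_bn = nn.BatchNorm1d(in_channels * num_keypoints)   # input module
        # alpha = 3 adaptive confidence modules, each outputting 64 channels
        self.confidence_blocks = nn.ModuleList([
            AdaptiveConfidenceModule(in_channels if i == 0 else 64, 64)
            for i in range(3)
        ])
        # beta = 7 spatio-temporal fusion modules; stride 2 in the 2nd and 5th
        cfg = [(64, 64, 1), (64, 128, 2), (128, 128, 1), (128, 128, 1),
               (128, 256, 2), (256, 256, 1), (256, 256, 1)]
        self.fusion_blocks = nn.ModuleList(
            [STFusionModule(ci, co, stride=s) for ci, co, s in cfg])
        self.fc = nn.Linear(256, num_classes)   # output module (softmax applied in the loss)

    def forward(self, x, conf):
        # x: (N, 2, T, M) normalized keypoints; conf: (N, T, M) confidences
        n, c, t, m = x.shape
        x = self.input_bn(x.permute(0, 1, 3, 2).reshape(n, c * m, t))
        x = x.reshape(n, c, m, t).permute(0, 1, 3, 2)
        for block in self.confidence_blocks:
            x = block(x, conf)                  # step 2.3
        for block in self.fusion_blocks:
            x = block(x)                        # step 2.4
        f = x.mean(dim=2).mean(dim=2)           # step 2.5: average over time, then space
        return f, self.fc(f)                    # gait feature f and prediction y
```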
Step 2.1: organize the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
Step 2.2: feed the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix. The input module normalizes the data to zero mean and unit standard deviation, which helps reduce the distribution gap between the training and test sets and thereby improves accuracy;
Step 2.3: feed the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors. After the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$. This embodiment uses 3 adaptive confidence modules whose output feature maps have 64, 64, and 64 channels respectively, as shown in FIG. 5;
Step 2.3.1: the structure of the adaptive confidence graph convolution layer inside the adaptive confidence module is shown in FIG. 2. Denote the input of each such layer as $F^s_{in}$; pass $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, compute their matrix product, and normalize the result with a softmax function to obtain the adaptive matrix A. The key points of a pose skeleton differ in importance: in gait recognition, leg key points clearly carry more gait information than head key points, so each key point must be weighted. The adjacency matrix E' only indicates whether two key points are connected, not the weight of each key point, so an additional weight matrix is needed. Computing the correlations among all key points through an attention mechanism yields the matrix A, which adaptively updates the weight of each key point;
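A minimal sketch of this step, assuming 1×1 convolutions as the two embedding operators and an illustrative embedding width; the (batch, channels, frames, keypoints) tensor layout follows the skeleton sketch above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adaptive_matrix(x, conv1, conv2):
    """Compute the data-dependent weight matrix A = softmax(F1 . F2).

    x: (batch, channels, frames, keypoints); conv1 and conv2 are 1x1
    Conv2d embeddings. Returns A of shape (batch, keypoints, keypoints),
    with each row normalized by softmax.
    """
    n, _, t, m = x.shape
    f1 = conv1(x).permute(0, 3, 1, 2).reshape(n, m, -1)  # (N, M, C'*T)
    f2 = conv2(x).reshape(n, -1, m)                      # (N, C'*T, M)
    return F.softmax(torch.matmul(f1, f2), dim=-1)
```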
Step 2.3.2: expand the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generate an all-zero matrix of the same size to serve as the adaptive matrix B. Graph convolution is a process in which the key points exchange information, but some key points are noisy and carry inaccurate information, so noisy key points must be suppressed during this exchange. After the confidence vector is expanded column-wise into the confidence matrix C, each key point can be weighted by its confidence during graph convolution; this suppresses the propagation of noisy key points to other key points without preventing a noisy key point from receiving information from correct ones. The adaptive matrix B further supplements the adjacency matrix E': E' defines connections only according to the natural connections of the human body, but implicit connections also exist between key points. For example, when a person walks, the arm and the leg on the same side move in opposite directions, indicating an implicit connection between them. The network updates the adaptive matrix B automatically during training to extend the connections and weights between key points.
Step 2.3.3: add the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then compute the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; pass E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
Step 2.3.4: multiply the input $F^s_{in}$ by the confidence vector for weighting, then pass it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
Step 2.3.5: add the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer; then pass it through a normalization layer, a ReLU activation layer, and a Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module. The Dropout layers are used to avoid overfitting; in this embodiment, the probability of all Dropout layers is 0.5;
Step 2.3.6: if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, pass $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$, then add the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, add $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$. Adding the input to the intermediate result improves network performance through residual learning;
Step 2.3.7: repeat steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$;
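Steps 2.3.1–2.3.6 can be sketched as one module, reusing the `adaptive_matrix` helper above. This is one plausible reading rather than the exact implementation: the graph convolution operator is read as adjacency-weighted aggregation followed by a 1×1 convolution, and the confidence matrix is built from time-averaged confidences as a simplification:

```python
class AdaptiveConfidenceModule(nn.Module):
    """One alpha-layer block: adaptive branch + confidence branch + residual."""
    def __init__(self, c_in, c_out, adjacency=None, num_keypoints=17,
                 inter=16, dropout=0.5):
        super().__init__()
        if adjacency is None:        # fall back to self-loops if no skeleton graph is given
            adjacency = torch.eye(num_keypoints)
        self.register_buffer('E', adjacency)                 # normalized adjacency E'
        self.conv1 = nn.Conv2d(c_in, inter, 1)               # embeddings for matrix A
        self.conv2 = nn.Conv2d(c_in, inter, 1)
        self.B = nn.Parameter(torch.zeros_like(adjacency))   # step 2.3.2: learned, zero-initialized
        self.adp_out = nn.Conv2d(c_in, c_out, 1)             # adaptive-branch graph conv
        self.conf_out = nn.Conv2d(c_in, c_out, 1)            # confidence-branch conv
        self.bn = nn.BatchNorm2d(c_out)
        self.drop = nn.Dropout(dropout)
        self.residual = (nn.Identity() if c_in == c_out
                         else nn.Conv2d(c_in, c_out, 1))     # step 2.3.6: size matching

    def forward(self, x, conf):
        # x: (N, C, T, M); conf: (N, T, M) keypoint confidences
        A = adaptive_matrix(x, self.conv1, self.conv2)       # step 2.3.1: (N, M, M)
        c = conf.mean(dim=1)                                 # time-averaged confidences (simplification)
        C = c.unsqueeze(-1).expand(-1, -1, x.size(3))        # step 2.3.2: column-wise expansion, C[i, j] = c[i]
        E2 = (self.E + A + self.B) * C                       # step 2.3.3: Hadamard product with C
        f_adp = self.adp_out(torch.einsum('nctm,nmk->nctk', x, E2))
        f_conf = self.conf_out(x * conf.unsqueeze(1))        # step 2.3.4: confidence weighting
        mid = self.drop(torch.relu(self.bn(f_adp + f_conf))) # step 2.3.5
        return mid + self.residual(x)                        # step 2.3.6: residual connection
```

Because C weights the rows (message sources) of the combined adjacency, a noisy key point's outgoing messages are suppressed while it still receives information from correct key points, matching the rationale in step 2.3.2.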
Step 2.4: feed the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features with a temporal convolution operator. After the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules. The main difference between the spatio-temporal feature fusion module and the adaptive confidence module is that the fusion module does not weight by confidence and only uses the attention mechanism to extend the adjacency matrix, for two reasons. First, after the α adaptive confidence modules, every key point has exchanged information: an originally noisy key point has received correct information and its own noise has been suppressed, so it can be treated as a correct key point. Second, confidence weighting requires a one-to-one correspondence between confidences and pose key points, but the fusion module changes the length of the pose sequence during temporal convolution, breaking that correspondence. This embodiment uses 7 spatio-temporal feature fusion modules whose output feature maps have 64, 128, 128, 128, 256, 256, and 256 channels respectively, as shown in FIG. 5; the convolution layers in the 2nd and 5th fusion modules have stride 2, and those in the remaining fusion modules have stride 1;
Step 2.4.1: denote the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; pass $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, compute their matrix product, and normalize the result with a softmax function to obtain the adaptive matrix A';
Step 2.4.2: add the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; pass the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
Step 2.4.3: pass the adaptive graph convolution layer output $F^{st}_{adp}$ sequentially through a batch normalization layer, a ReLU activation layer, and a Dropout layer to obtain the deep spatial gait feature sequence $F^{st'}$;
Step 2.4.4: pass the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
Step 2.4.5: if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, pass $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$, then add the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, add $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
Step 2.4.6: repeat steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$;
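A corresponding sketch of one spatio-temporal feature fusion module (steps 2.4.1–2.4.5), continuing the imports and `adaptive_matrix` helper above; the 9-frame temporal kernel is an assumption borrowed from common skeleton-based models and is not specified in the patent:

```python
class STFusionModule(nn.Module):
    """One beta-layer block: adaptive graph conv + temporal conv + residual."""
    def __init__(self, c_in, c_out, stride=1, adjacency=None,
                 num_keypoints=17, inter=16, dropout=0.5, t_kernel=9):
        super().__init__()
        if adjacency is None:
            adjacency = torch.eye(num_keypoints)
        self.register_buffer('E', adjacency)             # normalized adjacency E'
        self.conv1 = nn.Conv2d(c_in, inter, 1)           # embeddings for matrix A'
        self.conv2 = nn.Conv2d(c_in, inter, 1)
        self.gcn = nn.Conv2d(c_in, c_out, 1)             # graph convolution operator
        self.bn1 = nn.BatchNorm2d(c_out)
        self.drop = nn.Dropout(dropout)
        pad = (t_kernel - 1) // 2                        # keep T unchanged when stride is 1
        self.tcn = nn.Conv2d(c_out, c_out, (t_kernel, 1), (stride, 1), (pad, 0))
        self.bn2 = nn.BatchNorm2d(c_out)
        self.residual = (nn.Identity() if c_in == c_out and stride == 1 else
                         nn.Conv2d(c_in, c_out, 1, (stride, 1)))  # step 2.4.5: size matching

    def forward(self, x):
        A = adaptive_matrix(x, self.conv1, self.conv2)   # step 2.4.1: attention matrix A'
        E2 = self.E + A                                  # step 2.4.2: E'' = E' + A', no confidence weighting
        f_adp = self.gcn(torch.einsum('nctm,nmk->nctk', x, E2))
        deep = self.drop(torch.relu(self.bn1(f_adp)))    # step 2.4.3: deep spatial features
        mid = torch.relu(self.bn2(self.tcn(deep)))       # step 2.4.4: temporal convolution
        return mid + self.residual(x)                    # step 2.4.5: residual connection
```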
Step 2.5: average the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps; then average the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f. Fusing the sequence features by averaging means that sequences of different lengths yield gait features of the same size after passing through the network, which is convenient for testing;
Step 2.6: feed the gait feature f into the output module to obtain the prediction result y;
Step 3: compute a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, update the weights of the adaptive confidence graph convolutional network model with SGD, and end training when the loss value stabilizes, obtaining the optimal model. In this embodiment, 12 subjects are randomly selected from the training set at each iteration and 2 gait sequences are randomly selected from each subject's data, so the batch size is 24 sequences per iteration. The margin of the triplet loss is set to 1.5 and its weight to 0.25. Training uses a warm-up strategy, which helps alleviate overfitting and improves accuracy; concretely, the initial learning rate is set to 0.0001 and increased linearly to 0.1 over the first 1000 iterations, the learning rate is multiplied by 0.1 at iterations 5000, 7000, and 9000, and training ends after 10000 iterations;
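A hedged sketch of this training step, using the `ACGCN` skeleton above: the margin (1.5), triplet weight (0.25), and learning-rate schedule follow the embodiment, while the SGD momentum, the triplet index tensors, and the sampler wiring are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = ACGCN(num_classes=62)    # first 62 CASIA-B subjects form the training set
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)  # momentum assumed
triplet = nn.TripletMarginLoss(margin=1.5)
ce = nn.CrossEntropyLoss()

def warmup_lr(it):
    """Warm-up schedule: linear rise to 0.1 over 1000 iterations,
    then x0.1 decays at iterations 5000, 7000 and 9000."""
    if it < 1000:
        return 0.0001 + (0.1 - 0.0001) * it / 1000
    return 0.1 * 0.1 ** sum(it >= step for step in (5000, 7000, 9000))

def train_step(it, x, conf, labels, anchors, positives, negatives):
    """One iteration on a batch of 12 subjects x 2 sequences; the index
    tensors select anchor/positive/negative triplets within the batch."""
    for group in optimizer.param_groups:
        group['lr'] = warmup_lr(it)
    f, y = model(x, conf)
    loss = ce(y, labels) + 0.25 * triplet(f[anchors], f[positives], f[negatives])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```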
Step 4: extract gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, compute the Euclidean distances between the query feature and all gallery features, and sort them from smallest to largest distance to obtain the retrieval result. During testing, all frames of each sequence are fed into the model to extract the gait feature, so as to make full use of the information in the data.
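The retrieval itself reduces to a nearest-neighbour ranking by Euclidean distance, for example:

```python
import torch

def retrieve(query_feat, gallery_feats):
    """Rank gallery sequences by Euclidean distance to the query feature.

    query_feat: (D,) gait feature of the sequence to retrieve;
    gallery_feats: (G, D) features of all gallery sequences.
    Returns gallery indices sorted from nearest to farthest.
    """
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.argsort(dists)   # ascending: smallest distance ranks first
```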
Claims (3)
1. A gait recognition method based on an adaptive confidence graph convolutional network, characterized by comprising the following steps:
step 1, extracting human body pose key points from an original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each, and normalizing them to obtain a standardized pose skeleton data set $X = \{X^n \mid n = 1, 2, \dots, N\}$, where $X^n = \{(\hat{P}^n_t, c^n_t) \mid t = 1, 2, \dots, T\}$ denotes the n-th standardized pose skeleton sequence, $\hat{P}^n_t$ denotes the pose skeleton of the t-th frame in the n-th sequence, and $c^n_t$ denotes the confidences of the pose key points in $\hat{P}^n_t$;
step 2, building an adaptive confidence graph convolutional network model, which comprises: one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module;
the input module is implemented as a batch normalization layer;
the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
step 2.1, organizing the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
step 2.2, feeding the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix;
step 2.3, feeding the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors; after the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$;
step 2.4, feeding the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features from the sequence with a temporal convolution operator; after the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules;
step 2.5, averaging the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps, then averaging the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f;
step 2.6, feeding the gait feature f into the output module to obtain the prediction result y;
step 3, computing a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, updating the weights of the adaptive confidence graph convolutional network model with SGD, and ending training when the loss value stabilizes, obtaining the optimal adaptive confidence graph convolutional network model;
and step 4, extracting gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, computing the Euclidean distances between the query feature and all gallery features, and sorting them from smallest to largest distance to obtain the retrieval result.
2. The gait recognition method based on the adaptive confidence graph convolutional network according to claim 1, characterized in that step 2.3 comprises:
step 2.3.1, denoting the input of each adaptive confidence graph convolution layer as $F^s_{in}$; passing $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, computing the matrix product of $F^s_1$ and $F^s_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A;
step 2.3.2, expanding the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generating an all-zero matrix of the same size to serve as the adaptive matrix B;
step 2.3.3, adding the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then computing the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; passing E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
step 2.3.4, multiplying the input $F^s_{in}$ by the confidence vector for weighting, then passing it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
step 2.3.5, adding the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer, then passing it through the normalization layer, ReLU activation layer, and Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module;
step 2.3.6, if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, passing $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$ and adding the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, adding $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$;
step 2.3.7, repeating steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$.
3. The gait recognition method based on the adaptive confidence graph convolutional network according to claim 1, characterized in that step 2.4 comprises:
step 2.4.1, denoting the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; passing $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, computing the matrix product of $F^{st}_1$ and $F^{st}_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A';
step 2.4.2, adding the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; passing the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
step 2.4.3, passing the adaptive graph convolution layer output $F^{st}_{adp}$ sequentially through a batch normalization layer, a ReLU activation layer, and a Dropout layer to obtain the deep spatial gait feature sequence $F^{st'}$;
step 2.4.4, passing the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
step 2.4.5, if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, passing $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$ and adding the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, adding $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
step 2.4.6, repeating steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011432129.1A | 2020-12-07 | 2020-12-07 | Gait recognition method based on adaptive confidence graph convolutional network (granted as CN112434655B)
Publications (2)
Publication Number | Publication Date |
---|---|
CN112434655A true CN112434655A (en) | 2021-03-02 |
CN112434655B CN112434655B (en) | 2022-11-08 |
Family
- Family ID: 74690963

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011432129.1A | Gait recognition method based on adaptive confidence graph convolutional network (CN112434655B, active) | 2020-12-07 | 2020-12-07

Country Status (1)

Country | Link
---|---
CN | CN112434655B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019220128A1 (en) * | 2018-05-18 | 2019-11-21 | Benevolentai Technology Limited | Graph neutral networks with attention |
CN109492581A (en) * | 2018-11-09 | 2019-03-19 | 中国石油大学(华东) | A kind of human motion recognition method based on TP-STG frame |
WO2020173226A1 (en) * | 2019-02-28 | 2020-09-03 | 华中科技大学 | Spatial-temporal behavior detection method |
CN110837778A (en) * | 2019-10-12 | 2020-02-25 | 南京信息工程大学 | Traffic police command gesture recognition method based on skeleton joint point sequence |
CN111160294A (en) * | 2019-12-31 | 2020-05-15 | 西安理工大学 | Gait recognition method based on graph convolution network |
CN111310668A (en) * | 2020-02-18 | 2020-06-19 | 大连海事大学 | Gait recognition method based on skeleton information |
CN111652124A (en) * | 2020-06-02 | 2020-09-11 | 电子科技大学 | Construction method of human behavior recognition model based on graph convolution network |
CN111814719A (en) * | 2020-07-17 | 2020-10-23 | 江南大学 | Skeleton behavior identification method based on 3D space-time diagram convolution |
Non-Patent Citations (6)
Title |
---|
FANJIA LI ET AL.: "Multi-Stream and Enhanced Spatial-Temporal Graph Convolution Network for Skeleton-Based Action Recognition", 《IEEE ACCESS》 *
HUAYU LI ET AL.: "DG-FPN: Learning Dynamic Feature Fusion Based on Graph Convolution Network For Object Detection", 《2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 *
XUESONG GAO ET AL.: "3D Skeleton-Based Video Action Recognition by Graph Convolution Network", 《2019 IEEE INTERNATIONAL CONFERENCE ON SMART INTERNET OF THINGS (SMARTIOT)》 *
YAN S. ET AL.: "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition", 《PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 *
WANG QIANGYU: "Research on Dynamic Gesture Recognition Technology Based on Deep Neural Networks", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
WANG XIN ET AL.: "Gait Recognition Algorithm Based on a Two-Layer Convolutional Neural Network", 《Journal of Anhui University (Natural Science Edition)》 *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113159068A (en) * | 2021-04-13 | 2021-07-23 | 天津大学 | RGB-D significance target detection method based on deep learning |
WO2022227275A1 (en) * | 2021-04-27 | 2022-11-03 | 浙江工商大学 | Deep learning-based end-to-end multi-modal gait recognition method |
CN113177464A (en) * | 2021-04-27 | 2021-07-27 | 浙江工商大学 | End-to-end multi-modal gait recognition method based on deep learning |
CN113177464B (en) * | 2021-04-27 | 2023-12-01 | 浙江工商大学 | End-to-end multi-mode gait recognition method based on deep learning |
CN113673560B (en) * | 2021-07-15 | 2023-06-09 | 华南理工大学 | Human behavior recognition method based on multi-flow three-dimensional self-adaptive graph convolution |
CN113673560A (en) * | 2021-07-15 | 2021-11-19 | 华南理工大学 | Human behavior identification method based on multi-stream three-dimensional adaptive graph convolution |
CN113657169A (en) * | 2021-07-19 | 2021-11-16 | 浙江大华技术股份有限公司 | Gait recognition method, device, system and computer readable storage medium |
CN113538581A (en) * | 2021-07-19 | 2021-10-22 | 之江实验室 | 3D attitude estimation method based on graph attention space-time convolution |
CN113538581B (en) * | 2021-07-19 | 2024-03-12 | 之江实验室 | 3D attitude estimation method based on graph attention space-time convolution |
CN113496216B (en) * | 2021-08-31 | 2023-05-05 | 四川大学华西医院 | Multi-angle falling high-risk identification method and system based on skeleton key points |
CN113496216A (en) * | 2021-08-31 | 2021-10-12 | 四川大学华西医院 | Multi-angle falling high-risk identification method and system based on skeleton key points |
CN114224326A (en) * | 2021-11-18 | 2022-03-25 | 北京精密机电控制设备研究所 | Wearable gait phase and action recognition device and method |
CN114224326B (en) * | 2021-11-18 | 2024-05-03 | 北京精密机电控制设备研究所 | Wearable gait phase and motion recognition device and method |
CN114012742B (en) * | 2022-01-05 | 2022-03-29 | 北京动思创新科技有限公司 | Control system of hip joint power assisting device |
CN114012742A (en) * | 2022-01-05 | 2022-02-08 | 北京动思创新科技有限公司 | Control system of hip joint power assisting device |
CN115909418A (en) * | 2023-03-01 | 2023-04-04 | 科大讯飞股份有限公司 | Human body direction determining method, human body direction determining device, screen control method, device and related equipment |
CN117690583A (en) * | 2024-02-01 | 2024-03-12 | 吉林大学 | Internet of things-based rehabilitation and nursing interactive management system and method |
CN117690583B (en) * | 2024-02-01 | 2024-04-09 | 吉林大学 | Internet of things-based rehabilitation and nursing interactive management system and method |
Also Published As
Publication number | Publication date |
---|---|
CN112434655B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112434655B (en) | Gait recognition method based on adaptive confidence graph convolutional network | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN108416266B (en) | Method for rapidly identifying video behaviors by extracting moving object through optical flow | |
CN106056628B (en) | Method for tracking target and system based on depth convolutional neural networks Fusion Features | |
CN107529650B (en) | Closed loop detection method and device and computer equipment | |
CN109815826B (en) | Method and device for generating face attribute model | |
US11182644B2 (en) | Method and apparatus for pose planar constraining on the basis of planar feature extraction | |
CN111652124A (en) | Construction method of human behavior recognition model based on graph convolution network | |
US20170083751A1 (en) | Method for estimating locations of facial landmarks in an image of a face using globally aligned regression | |
CN108960059A (en) | A kind of video actions recognition methods and device | |
CN111814719A (en) | Skeleton behavior identification method based on 3D space-time diagram convolution | |
US20230134967A1 (en) | Method for recognizing activities using separate spatial and temporal attention weights | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN107833239B (en) | Optimization matching target tracking method based on weighting model constraint | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN116246338B (en) | Behavior recognition method based on graph convolution and transducer composite neural network | |
CN114694261A (en) | Video three-dimensional human body posture estimation method and system based on multi-level supervision graph convolution | |
CN109800635A (en) | A kind of limited local facial critical point detection and tracking based on optical flow method | |
CN111833400B (en) | Camera pose positioning method | |
CN108062559A (en) | A kind of image classification method based on multiple receptive field, system and device | |
CN114118303B (en) | Face key point detection method and device based on prior constraint | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN109522865A (en) | A kind of characteristic weighing fusion face identification method based on deep neural network | |
CN110135435B (en) | Saliency detection method and device based on breadth learning system | |
CN111027350A (en) | Improved PCA algorithm based on human face three-dimensional reconstruction |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |