CN113011396A - Gait recognition method based on deep learning cascade feature fusion - Google Patents

Gait recognition method based on deep learning cascade feature fusion

Info

Publication number
CN113011396A
Authority
CN
China
Prior art keywords
global
local
feature
network
fusion
Prior art date
Legal status
Granted
Application number
CN202110460610.XA
Other languages
Chinese (zh)
Other versions
CN113011396B (en)
Inventor
罗俊
李华洋
王慧燕
Current Assignee
Zhejiang Gongshang University
Third Research Institute of the Ministry of Public Security
Original Assignee
Zhejiang Gongshang University
Third Research Institute of the Ministry of Public Security
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University, Third Research Institute of the Ministry of Public Security filed Critical Zhejiang Gongshang University
Priority to CN202110460610.XA
Publication of CN113011396A
Application granted
Publication of CN113011396B
Legal status: Active

Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T7/13: Edge detection
    • G06T7/181: Segmentation; edge detection involving edge growing or edge linking
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T2207/10016: Video; image sequence
    • G06T2207/20081: Training; learning
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gait recognition method based on deep-learning cascade feature fusion. The method first reads video frames, extracts local features through a local feature extraction branch while extracting overall contour features through a global feature extraction branch, and then fuses the two types of features for subsequent recognition. The GaitSet network serves as the local feature extraction branch, slicing the image to extract local features; an improved GaitSet network serves as the global feature extraction branch, so that the network attends to the overall contour features of the image; a purpose-designed feature fusion branch fuses the two types of features, giving the overall framework a more complete feature representation. The invention has good generality and can be applied to other gait recognition models.

Description

Gait recognition method based on deep learning cascade feature fusion
Technical Field
The invention belongs to the field of video image processing and gait recognition in computer vision, and relates to a gait recognition method based on deep-learning cascade feature fusion.
Background
Gait recognition is an emerging biometric technology that verifies identity from a pedestrian's walking posture. Compared with technologies such as face, fingerprint, and iris recognition, gait recognition research started relatively late; however, because it is contactless, works at long range, and is difficult to disguise, recognition can be completed without the active cooperation of the subject. Gait recognition can therefore be widely applied in fields such as smart cities and intelligent transportation, and in scenarios such as searching for criminal suspects.
In recent years, with the wide application of deep neural networks, gait recognition has developed rapidly. Existing deep-learning gait recognition methods fall roughly into two categories, template-based and sequence-based, both of which recognize pedestrians from gait contours. However, most methods use only global contour features or only local contour features, potentially losing useful information; even in methods that use both global and local contour features, the two feature extraction branches share weights, so the network cannot learn the features unique to each.
Disclosure of Invention
To address these defects of the prior art, the invention provides a gait recognition method based on deep-learning cascade feature fusion, which fuses improved global and local feature extraction models, improves gait recognition accuracy, and can be widely applied to other gait recognition networks.
The technical scheme adopted by the invention to solve the technical problem is as follows:
Step 1, a pedestrian gait image sequence or video is accessed and input into a local feature extraction branch based on the GaitSet network, which slices the image to extract the local feature F_Local;
Step 2, the pedestrian gait image sequence or video is also input into a global feature extraction branch improved from the GaitSet network; this branch completely retains the overall contour feature information of the image to obtain the global feature F_Global;
Step 3, the extracted local and global features are fused through a feature fusion branch;
Step 4, during network training, a triplet loss is applied to the fused features for subsequent back-propagation and updating of the network parameters;
Step 5, during forward inference with the trained network, the Euclidean distance between the fused features of the probe and gallery sets is computed, and rank-1 recognition accuracy is computed from these distances.
The technical scheme provided by the invention has the following beneficial effects: the GaitSet network is used as a local feature extraction branch that slices the image to extract local features; an improved GaitSet network retains the complete contour information of the image to extract global features; and a purpose-designed feature fusion branch learns context information for each of the two feature types and fuses them. This enhances the global features in GaitSet, makes the final feature representation extracted by the network more complete, and improves recognition accuracy. The invention achieves high-accuracy gait recognition from image sequence or video input alone, without other auxiliary equipment.
Drawings
To show the network structure and the training and forward inference processes of the embodiment more clearly, the drawings used in the embodiment are briefly described below.
FIG. 1 shows the network structure of the gait recognition method with deep-learning cascade feature fusion according to the present invention;
FIG. 2 is a flow chart of training according to the present invention;
FIG. 3 is a flow chart of forward inference according to the present invention.
Detailed Description
To describe the present invention more specifically, its technical solution is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a general gait recognition method based on deep-learning cascade feature fusion. The network framework, shown in FIG. 1, mainly comprises three branches: a global gait feature extraction branch, a local gait feature extraction branch, and a feature fusion branch. FIG. 1 also shows the structures of the residual network unit Res and the attention model Att.
The network training process is shown in FIG. 2. Briefly: first, video frames are read and local gait features are extracted; second, global gait features are extracted from the same video frames in parallel; third, the two types of features are enhanced and fused; fourth, the loss on the fused features is computed; fifth, back-propagation updates the network parameters.
The network forward inference process is shown in FIG. 3. Briefly: first, video frames are read and local gait features are extracted; second, global gait features are extracted from the same video frames in parallel; third, the two types of features are enhanced and fused; fourth, the Euclidean distance between the fused features of the probe and gallery sets is computed, and rank-1 recognition accuracy is computed from these distances.
Embodiment:
A gait recognition method based on deep-learning cascade feature fusion comprises the following steps:
Step 1, a gait sequence or video is input and features are extracted through the GaitSet local feature branch; during feature mapping, the feature map is sliced to obtain the local features of each slice.
Step 2, global gait features are extracted through the global feature extraction branch improved from GaitSet, specifically:
The gait sequence or video is input into the global feature extraction branch to extract global gait features. This branch is based on the GaitSet gait recognition network. The horizontal pyramid mapping (HPM) module in GaitSet splits the feature map into horizontal blocks and then processes each block separately; the method instead applies a fused global average pooling (GAP) and global max pooling (GMP) operation directly to the features, without any blocking, so that the network attends to the overall gait contour. When local details of the pedestrian contour change, more robust features can therefore be extracted.
Step 3, the local gait features and global gait features are fused through a multi-cascade attention-enhanced feature fusion module, i.e. the feature fusion branch, specifically:
as shown in FIG. 1, the feature fusion module firstly amplifies the receptive field of the output neuron by cascading two residual error network (Res) branches to further enhance the semantic expression of the global features, wherein the output F of one branchGlobal_Res1Adding (Add) the feature map with the input of another branch, and obtaining the output after the fusion and passing through a Res residual error network, wherein the output is represented as FGlobal_Res2. The basic units of Res are a 3 x 3 max pooling, a 1 x 1 convolution, a BN layer, a ReLu activation function, and an upsampling using bilinear differences, see fig. 1. The global feature enhancement branch is also added with a 1 x 1 convolution branch for retaining complete overall gait contour information, and the output F of the convolution branchGlobal_1Directly with FGlobal_Res1And FGlobal_Res2Weighted sum enhanced global feature FG
The local feature enhancement network likewise enhances semantic expression by cascading two residual network (Res) branches: the output F_Local_Res1 of the first branch is added to the input of the second branch, and the fused map passes through one Res unit to give F_Local_Res2. F_Local_Res1 and F_Local_Res2 are summed (weighted sum) to obtain F_Local_Res, from which a 3 × 3 convolution further extracts features to give F_Local_Res3. F_Local_Res3 is then fed into an attention model to obtain F_Att. The basic attention unit Att consists of a weighted sum of global max pooling and global average pooling, a 1 × 1 convolution, a BN layer, and a Sigmoid activation; it enriches the context information of the local features and can capture finer spatial-resolution information. Finally, F_Local_Res3, F_Att, and F_G are summed (weighted sum) to obtain the fused feature.
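Continuing the sketch above and reusing its Res unit, the attention unit Att and the final fusion could look as follows. Treating the Sigmoid output as a gate on the input, and using equal fusion weights, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class Att(nn.Module):
    """Attention unit as described: sum of GMP and GAP, a 1x1 convolution,
    BN, and a Sigmoid; the resulting map gates the input (an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.gmp = nn.AdaptiveMaxPool2d(1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.bn(self.conv(self.gap(x) + self.gmp(x))))
        return x * a                              # channel-attention gate

class FusionBranch(nn.Module):
    """Local enhancement and the final fusion with the enhanced global F_G;
    Res is the unit defined in the previous sketch."""
    def __init__(self, channels: int):
        super().__init__()
        self.res1 = Res(channels)
        self.res2 = Res(channels)
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.att = Att(channels)

    def forward(self, f_local: torch.Tensor, f_g: torch.Tensor) -> torch.Tensor:
        f_res1 = self.res1(f_local)               # F_Local_Res1
        f_res2 = self.res2(f_local + f_res1)      # F_Local_Res2
        f_res3 = self.conv3x3(f_res1 + f_res2)    # F_Local_Res3
        f_att = self.att(f_res3)                  # F_Att
        return f_res3 + f_att + f_g               # fused feature
```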
The fusion module as a whole enriches feature propagation through the two cascaded residual networks and a self-attention mechanism, generating rich context information, which is vital for gait recognition on video sequences.
Step 4, during network training, the triplet loss is computed on the fused features, followed by back-propagation to update the network parameters.
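A sketch of such a triplet loss on the fused features is given below; the batch-all mining strategy and the margin value are assumptions, since the description only names the loss.

```python
import torch
import torch.nn.functional as F

def batch_all_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                           margin: float = 0.2) -> torch.Tensor:
    """Mean hinge loss over all valid (anchor, positive, negative) triplets
    formed inside the batch, using Euclidean distances."""
    dist = torch.cdist(feats, feats)                   # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye                             # valid (anchor, positive)
    # hinge[a, p, n] = max(0, d(a, p) - d(a, n) + margin)
    hinge = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin)
    valid = pos_mask.unsqueeze(2) & (~same).unsqueeze(1)
    losses = hinge[valid]
    active = losses[losses > 0]
    return active.mean() if active.numel() > 0 else losses.sum()
```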
Step 5, during forward inference with the trained network, the Euclidean distance between the fused features of the probe and gallery sets is computed; candidates are sorted by distance and rank-1 recognition accuracy is computed, the nearest gallery sequence being taken as coming from the same subject.
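The rank-1 computation amounts to a nearest-neighbour check in feature space; a minimal sketch (one fused feature vector per sequence, feature extraction assumed done):

```python
import torch

def rank1_accuracy(probe_feats: torch.Tensor, probe_labels: torch.Tensor,
                   gallery_feats: torch.Tensor,
                   gallery_labels: torch.Tensor) -> float:
    """Fraction of probes whose Euclidean-nearest gallery feature
    belongs to the same identity."""
    dist = torch.cdist(probe_feats, gallery_feats)  # (P, G) distance matrix
    nearest = dist.argmin(dim=1)                    # closest gallery sample
    hits = gallery_labels[nearest] == probe_labels  # same identity?
    return hits.float().mean().item()
```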

Claims (4)

1. A gait recognition method based on deep-learning cascade feature fusion, characterized by comprising the following steps:
step 1, accessing a pedestrian gait image sequence or video, inputting it into a local feature extraction branch based on the GaitSet network, and slicing the image to extract the local feature F_Local;
step 2, inputting the pedestrian gait image sequence or video into a global feature extraction branch improved from the GaitSet network, wherein the branch completely retains the overall contour feature information of the image to obtain the global feature F_Global;
step 3, performing feature fusion on the extracted local features and global features through the feature fusion branch;
step 4, during network training, applying a triplet loss to the fused features for subsequent back-propagation and updating of the network parameters;
step 5, during forward inference with the trained network, computing the Euclidean distance between the fused features of the probe and gallery sets, and computing rank-1 recognition accuracy from the distances.
2. The gait recognition method based on deep learning cascade feature fusion of claim 1,
wherein the global feature extraction branch in step 2 is improved from the GaitSet network by replacing the horizontal pyramid mapping module of the GaitSet network with a fused operation of global average pooling and global max pooling applied directly to the features.
3. The gait recognition method based on deep learning cascade feature fusion of claim 2,
wherein after the fused global average pooling and global max pooling operation, the features are mapped through a fully connected layer.
4. The gait recognition method based on deep learning cascade feature fusion according to any one of claims 1 to 3,
wherein the feature fusion branch first enhances the outputs of the local feature extraction branch and the global feature extraction branch, and then fuses the enhanced features;
wherein the global feature F_Global passes through a first residual network Res to obtain a first global residual feature F_Global_Res1; F_Global_Res1 and F_Global are added and fused and then pass through a second residual network Res to obtain a second global residual feature F_Global_Res2; F_Global passes through a 1 × 1 convolution branch to obtain the global feature F_Global_1; F_Global_1, F_Global_Res1, and F_Global_Res2 are then added and fused to obtain the enhanced global feature F_G;
wherein the local feature F_Local passes through a third residual network Res to obtain a first local residual feature F_Local_Res1; F_Local_Res1 and F_Local are added and fused and then pass through a fourth residual network Res to obtain a second local residual feature F_Local_Res2; F_Local_Res1 and F_Local_Res2 are added and fused to obtain the local residual feature F_Local_Res; a 3 × 3 convolution network further extracts features from F_Local_Res to obtain a third local residual feature F_Local_Res3; F_Local_Res3 is input into an attention model to obtain the local attention feature F_Att; finally, F_Local_Res3, F_Att, and the enhanced global feature F_G are added to obtain the fused feature.
CN202110460610.XA (filed 2021-04-27, priority 2021-04-27): Gait recognition method based on deep learning cascade feature fusion. Status: Active; granted as CN113011396B.

Priority Applications (1)

CN202110460610.XA (granted as CN113011396B): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Applications Claiming Priority (1)

CN202110460610.XA (granted as CN113011396B): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Publications (2)

CN113011396A: published 2021-06-22
CN113011396B: published 2024-02-09 (grant)

Family

ID=76380709

Family Applications (1)

CN202110460610.XA (Active): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Country Status (1)

CN: CN113011396B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
WO2019018063A1 *: 2017-07-19, 2019-01-24, Microsoft Technology Licensing, LLC, Fine-grained image recognition
CN111539320A *: 2020-04-22, 2020-08-14, Shandong University (山东大学), Multi-view gait recognition method and system based on mutual learning network strategy

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
CN113538400A *: 2021-07-29, 2021-10-22, Yanshan University (燕山大学), Cross-modal crowd counting method and system
CN113947814A *: 2021-10-28, 2022-01-18, Shandong University (山东大学), Cross-view gait recognition method based on spatio-temporal information enhancement and multi-scale saliency feature extraction
CN113947814B *: 2021-10-28, 2024-05-28, Shandong University (山东大学), Cross-view gait recognition method based on spatio-temporal information enhancement and multi-scale saliency feature extraction

Also Published As

CN113011396B: published 2024-02-09

Similar Documents

Publication Publication Date Title
CN111563508B (en) Semantic segmentation method based on spatial information fusion
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN111507311B (en) Video character recognition method based on multi-mode feature fusion depth network
CN113011396B (en) Gait recognition method based on deep learning cascade feature fusion
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN112766378A (en) Cross-domain small sample image classification model method focusing on fine-grained identification
CN113449671A (en) Multi-scale and multi-feature fusion pedestrian re-identification method and device
CN113935435A (en) Multi-modal emotion recognition method based on space-time feature fusion
CN114579707B (en) Aspect-level emotion analysis method based on BERT neural network and multi-semantic learning
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN116206327A (en) Image classification method based on online knowledge distillation
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
Liu et al. Deeply coupled convolution–transformer with spatial–temporal complementary learning for video-based person re-identification
CN114821050A (en) Named image segmentation method based on transformer
CN110826534A (en) Face key point detection method and system based on local principal component analysis
CN115952360B (en) Domain self-adaptive cross-domain recommendation method and system based on user and article commonality modeling
CN116883746A (en) Graph node classification method based on partition pooling hypergraph neural network
CN115982652A (en) Cross-modal emotion analysis method based on attention network
CN114495163A (en) Pedestrian re-identification generation learning method based on category activation mapping
CN112529098B (en) Dense multi-scale target detection system and method
CN115204171A (en) Document-level event extraction method and system based on hypergraph neural network
CN114911930A (en) Global and local complementary bidirectional attention video question-answering method and system
CN110222716B (en) Image classification method based on full-resolution depth convolution neural network
CN111325068B (en) Video description method and device based on convolutional neural network

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant