CN116563937A - Multi-granularity feature learning gait recognition method based on key frames - Google Patents

Multi-granularity feature learning gait recognition method based on key frames

Info

Publication number
CN116563937A
CN116563937A
Authority
CN
China
Prior art keywords
gait
feature
features
sequence
key frame
Prior art date
Legal status
Pending
Application number
CN202310106799.1A
Other languages
Chinese (zh)
Inventor
付利华
吴会贤
张梓通
邢旻与
董光建
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202310106799.1A priority Critical patent/CN116563937A/en
Publication of CN116563937A publication Critical patent/CN116563937A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-granularity feature learning gait recognition method based on key frames. Gait contour maps that strongly influence the recognition result are selected from the gait sequence to form a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into recognition features, so that the extracted recognition features are more discriminative; temporal information is emphasized during feature extraction, and global features are fused with local features. The invention optimizes the recognition features with a cross-entropy loss and a triplet loss to improve recognition accuracy, and solves the problems of undifferentiated feature extraction, insufficient local feature extraction, and insufficient temporal feature extraction in previous sequence-based gait recognition techniques. The method is suitable for pedestrian gait recognition tasks, offers high recognition accuracy and good algorithmic robustness, and has wide application in the field of gait recognition.

Description

Multi-granularity feature learning gait recognition method based on key frames
Technical Field
The invention belongs to the field of biometric recognition and performs identity recognition based on the posture changes of people walking normally; in particular, it relates to a multi-granularity feature learning gait recognition method based on key frames.
Background
Gait recognition is an emerging human biometric recognition task that aims to identify people by recognizing the posture changes of their normal walking. Compared with other biometric technologies such as face, fingerprint and iris recognition, gait can be recognized at long range without contact and is difficult to hide or disguise, giving it clear advantages. With the wide deployment of video surveillance, gait recognition has great potential and can play an important role in many fields.
Currently, most existing gait recognition techniques are implemented with deep learning, following two main strategies: template-based gait recognition and sequence-based gait recognition. Template-based methods compress the gait silhouette images of the same pedestrian into a single image or gait template and then perform recognition on it. Such methods are fast and have few network parameters, but the compression loses part of the information, so recognition accuracy is limited.
Therefore, sequence-based gait recognition methods are more commonly used. They treat multiple gait silhouettes of the same walking pedestrian as a gait sequence and feed the sequence into a recognition model for feature extraction and recognition. Although existing sequence-based methods have made great progress in both accuracy and running speed, they still need improvement with respect to undifferentiated feature extraction, local feature extraction, and temporal feature extraction during recognition. Accordingly, a new gait recognition method is required to solve the above problems.
Disclosure of Invention
The invention aims to solve the following problem: although existing sequence-based gait recognition methods have advanced greatly in both accuracy and running speed, they still need improvement with respect to undifferentiated feature extraction, local feature extraction, and temporal feature extraction during recognition.
To solve this problem, the invention provides a multi-granularity feature learning gait recognition method based on key frames. The method extracts key frames from the gait contour maps in a gait sequence, selecting the contour maps that strongly influence the recognition result and assembling them into a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into gait recognition features, making the extracted features more discriminative; the method attends to temporal information during feature extraction and fuses global features with local features, further improving discriminability. The method comprises the following steps:
1) Selecting the gait contour maps in a gait sequence that strongly influence the gait recognition result to form a key frame sequence; the processing branch that takes the key frame sequence as input is called the key frame branch, and the processing branch that takes the original gait sequence as input is called the main branch, ensuring that the contour maps that strongly influence the recognition result receive sufficient attention;
2) Performing identical feature extraction operations on the key frame branch and the main branch. In the feature extraction stage, a spatial feature extraction operation is applied to each frame of the gait contour maps in the sequence, and temporal features are extracted from the spatial features of adjacent frames; multi-granularity feature learning is then performed: a feature extraction operation on the whole feature map yields global features, while the feature map is simultaneously partitioned into blocks and a feature extraction operation on the partitioned feature maps yields local features; the global and local features are then fused into multi-granularity features. A temporal feature pooling operation is applied to the multi-granularity features, and the pooled features of the two branches are fused to obtain the gait recognition features;
3) Performing gait feature matching on the gait recognition features. During matching, a generalized-mean pooling operation adaptively aggregates the spatial features, a fully-connected layer then adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the gait recognition task is completed according to the feature similarity.
Further, extracting the key frames in the gait sequence in step 1) specifically comprises:
1.1) All gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (1)-(2):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors from below the pedestrian's head down to the H-th row, corresponding to all portions of the pedestrian's gait contour below the head.
1.2) A threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (3)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters;
1.3) If the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame. For each gait sequence, every gait contour map in the sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence. The processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent identical feature extraction operations.
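As an illustration only, the key frame selection of step 1) can be sketched as follows in Python. The scoring function standing in for the probability k is an assumption, since equations (1)-(2) are not reproduced in this text; only the threshold δ of equation (3) is taken directly from the disclosure.

```python
def keyframe_threshold(view_angle_deg, alpha=0.5, beta=0.1):
    # Equation (3): delta = (De / 18) % 6 * beta + alpha
    return (view_angle_deg / 18) % 6 * beta + alpha

def select_keyframes(silhouettes, view_angle_deg, score_fn):
    """Keep the frames whose key-frame probability k exceeds the threshold delta.

    silhouettes: sequence of H x W binary gait contour maps (one per frame).
    score_fn:    stand-in for the patent's key-frame probability k (its
                 equations (1)-(2) are not reproduced here), mapping one
                 contour map to a scalar score.
    """
    delta = keyframe_threshold(view_angle_deg)
    return [frame for frame in silhouettes if score_fn(frame) > delta]
```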
Further, the feature extraction process in step 2) specifically comprises:
The key frame branch and the main branch are identical in the feature extraction stage; the main branch is used to illustrate the feature extraction process.
2.1) Spatio-temporal feature extraction is performed on the gait sequence input to this branch. Let X_in denote the gait sequence input to the current branch; the spatio-temporal feature extraction process can be expressed as:
X_ST = Te(Sp(X_in))   (4)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
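A minimal PyTorch sketch of equation (4) follows; the kernel and stride sizes are those stated in the text, while the channel widths and the (N, C, T, H, W) input layout are assumptions.

```python
import torch.nn as nn

class SpatioTemporalExtractor(nn.Module):
    """Sketch of equation (4): X_ST = Te(Sp(X_in))."""
    def __init__(self, in_ch=1, mid_ch=32, out_ch=64):
        super().__init__()
        # Sp: spatial feature extraction, 3x3x3 kernel and stride per the text
        self.sp = nn.Conv3d(in_ch, mid_ch, kernel_size=3, stride=3)
        # Te: temporal feature extraction across adjacent frames, 3x1x1 kernel/stride
        self.te = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1), stride=(3, 1, 1))

    def forward(self, x_in):           # x_in: (N, C, T, H, W) gait sequence
        return self.te(self.sp(x_in))  # X_ST
```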
2.2) A multi-granularity feature learning operation is performed on the feature map X_ST obtained in 2.1); the multi-granularity feature Y_MF obtained after learning can be expressed as:
Y_MF = Y_G + Y_L   (5)
where Y_G is the global feature and Y_L is the local feature.
The global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (6)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information.
The local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (7)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively.
Taking Y_L^3 as an example, the extraction process is as follows. First, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (8):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (8)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information.
The extraction processes of Y_L^4 and Y_L^6 are similar to that of Y_L^3; the only difference is the number of blocks in the partition.
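The multi-granularity learning of equations (5)-(8) can be sketched in PyTorch as below; the channel width is an assumption, and each block receives its own convolution layer, as the embodiment later describes.

```python
import torch
import torch.nn as nn

class MultiGranularityFeature(nn.Module):
    """Sketch of equations (5)-(8): Y_MF = Y_G + Y_L, where Y_L sums local
    features from horizontal partitions into 3, 4 and 6 blocks."""
    def __init__(self, ch=64, parts=(3, 4, 6)):
        super().__init__()
        self.parts = parts
        self.global_conv = nn.Conv3d(ch, ch, kernel_size=3, padding=1)  # f_3x3x3, eq. (6)
        # an independent convolution per block, for each partition scheme
        self.local_convs = nn.ModuleList(
            nn.ModuleList(nn.Conv3d(ch, ch, kernel_size=3, padding=1)
                          for _ in range(n))
            for n in parts)

    def forward(self, x_st):                      # x_st: (N, C, T, H, W)
        y_g = self.global_conv(x_st)              # global feature Y_G
        y_l = torch.zeros_like(y_g)
        for n, convs in zip(self.parts, self.local_convs):
            blocks = torch.chunk(x_st, n, dim=3)  # horizontal (height-wise) split
            # per-block convolutions, spliced back along the height axis, eq. (8)
            y_l = y_l + torch.cat([c(b) for c, b in zip(convs, blocks)], dim=3)
        return y_g + y_l                          # Y_MF = Y_G + Y_L, eq. (5)
```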
2.3) Temporal feature pooling is applied to the feature map Y_MF obtained in 2.2); the pooling process can be expressed as:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (9)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer; α, β and γ are parameters with α + β + γ = 1.
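A sketch of the temporal feature pooling of equation (9), under the text's labeling of F_Avg as a median-pooling layer; the (N, C, T, H, W) layout with time on axis 2 is an assumption.

```python
def temporal_feature_pooling(y_mf, alpha=0.5, beta=0.25, gamma=0.25):
    """Sketch of equation (9): weighted max / median / mean pooling over the
    time axis of an (N, C, T, H, W) tensor; alpha + beta + gamma must equal 1."""
    y_max = y_mf.max(dim=2).values
    y_med = y_mf.median(dim=2).values
    y_mean = y_mf.mean(dim=2)
    return alpha * y_max + beta * y_med + gamma * y_mean
```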
2.4) According to step 2.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (10):
Y_M = Y_T^key ⊕ Y_T^main   (10)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
Step 3) performs gait feature matching on the gait recognition features, specifically:
3.1) First, a generalized-mean pooling operation is used to improve the self-learning ability of the model so that it can adaptively integrate spatial information. The generalized-mean pooled feature Y_GeM is computed as shown in formula (11):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (11)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
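Generalized-mean pooling as in formula (11) can be sketched as follows; the (N, C, H, W) input layout, the clamping constant, and the initial value of p are assumptions.

```python
import torch
import torch.nn as nn

class GeMPool2d(nn.Module):
    """Sketch of generalized-mean pooling, formula (11): (avg(x^p))^(1/p),
    with p a learnable parameter; p = 1 reduces to average pooling."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):                      # x: (N, C, H, W), assumed >= 0
        x = x.clamp(min=self.eps).pow(self.p)  # element-wise x^p
        return x.mean(dim=(-2, -1)).pow(1.0 / self.p)  # (N, C)
```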
3.2) After generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (12):
Y_out = f_1×1(Y_GeM)   (12)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel.
3.3) Similarity is computed between the gait recognition features of different samples; the Euclidean similarity is adopted, and the similarity S is computed as shown in formula (13):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space.
The features with the closest similarity are matched as gait recognition features belonging to the same pedestrian.
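The nearest-neighbour matching of step 3.3) can be sketched as below, assuming the similarity of formula (13) is monotone in the Euclidean distance between L2-normalized features.

```python
import torch
import torch.nn.functional as F

def match_gallery(probe, gallery):
    """L2-normalize the recognition features, compute pairwise Euclidean
    distances, and match each probe to its nearest gallery entry
    (closest similarity = smallest normalized distance)."""
    p = F.normalize(probe, dim=1)    # (Np, D)
    g = F.normalize(gallery, dim=1)  # (Ng, D)
    dists = torch.cdist(p, g)        # (Np, Ng) Euclidean distances
    return dists.argmin(dim=1)       # index of best gallery match per probe
```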
3.4) The model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model. The final loss L_com can be expressed as:
L_com = L_cse + L_tri   (14)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively. The model is trained with L_com.
The cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (15)
where x is the gait recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x).
The triplet loss L_tri can be expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (16)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
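The joint training objective of formulas (14)-(16) can be sketched with PyTorch's built-in losses; note that nn.TripletMarginLoss implements the standard hinge form [D(a,p) − D(a,n) + m]_+, used here as a stand-in for formula (16), and the margin value and classifier head producing per-identity logits are assumptions.

```python
import torch.nn as nn

# Stand-ins for equations (14)-(16): L_com = L_cse + L_tri.
cross_entropy = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.2)  # margin m is a placeholder value

def combined_loss(logits, labels, anchor, positive, negative):
    l_cse = cross_entropy(logits, labels)        # eq. (15)
    l_tri = triplet(anchor, positive, negative)  # eq. (16), standard hinge form
    return l_cse + l_tri                         # eq. (14)
```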
The invention provides a multi-granularity feature learning gait recognition method based on key frames. The method extracts key frames from the gait contour maps in a gait sequence, selecting the contour maps that strongly influence the recognition result and assembling them into a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into recognition features, making the extracted features more discriminative; temporal information is attended to during feature extraction, and global features are fused with local features. The method optimizes the recognition features with a cross-entropy loss and a triplet loss to improve recognition accuracy, and solves the problems of undifferentiated feature extraction, insufficient local feature extraction, and insufficient temporal feature extraction in previous sequence-based gait recognition techniques. The method is suitable for pedestrian gait recognition tasks, with high recognition accuracy and good algorithmic robustness.
The advantages of the invention are as follows: first, a key frame extraction method is proposed to solve the problem of undifferentiated feature extraction; second, temporal information in the gait sequence is attended to during feature extraction, with temporal feature pooling used to process multi-frame information and fully exploit the temporal information contained in the sequence; finally, a multi-granularity feature learning scheme is proposed to fully learn global and local information, and the features are optimized with the triplet loss and the cross-entropy loss, improving the recognition accuracy of the gait recognition method.
Drawings
FIG. 1 is a flow chart of a key frame based multi-granularity feature learning gait recognition method of the present invention.
Fig. 2 is a block diagram of a key frame-based multi-granularity feature learning gait recognition system of the present invention.
Detailed Description
The invention provides a multi-granularity feature learning gait recognition method based on key frames. The processing branch that takes the key frame sequence as input is called the key frame branch, and the processing branch that takes the original gait sequence as input is called the main branch; the two branches perform identical feature extraction operations. In the feature extraction stage, a spatial feature extraction operation is applied to each frame of the gait contour maps in the sequence, and temporal features are extracted from the spatial features of adjacent frames; multi-granularity feature learning is then performed: the whole feature map is processed to obtain global features while it is simultaneously partitioned to obtain local features, and the global and local features are fused to obtain the gait recognition features, on which gait feature matching is then performed. During matching, a generalized-mean pooling operation adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the features with the closest similarity are matched as the gait features of the same pedestrian. The model uses a cross-entropy loss and a triplet loss to optimize the recognition features. As shown in FIG. 1, the invention comprises the following steps:
1) Acquire a gait sequence comprising the gait contour maps of a pedestrian's continuous walking process; the gait sequence is taken as a set of matrices and used as input.
2) The gait contour maps in a gait sequence do not all influence the recognition result equally: contour maps capturing the continuous motion of a pedestrian can contain overlapping body parts. An adaptive key frame extraction module is therefore designed to identify the gait contour maps in the pedestrian gait sequence that carry more gait information and assemble them into a key frame sequence, so as to obtain more discriminative pedestrian gait recognition features.
2.1) All gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (17)-(18):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors from below the pedestrian's head down to the H-th row, corresponding to all portions of the pedestrian's gait contour below the head.
2.2) A threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (19)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters, set to α = 0.5 and β = 0.1;
2.3) If the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame. For each gait sequence, every gait contour map in the sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence. The processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent feature extraction operations.
3) For the recognition task, both global and local information in the image must be attended to: global information makes the features discriminative, while local information captures the details of the image. For the gait recognition task in particular, walking is a continuous motion process, so the temporal information contained in the gait sequence must also be attended to when extracting gait recognition features. After key frame selection, the two branches separately perform the subsequent feature extraction operations.
The key frame branch and the main branch are identical in the feature extraction stage; the main branch is used to illustrate the operations of the feature extraction process.
3.1) Spatio-temporal feature extraction is performed on the gait sequence input to this branch. Let X_in denote the gait sequence input to the current branch; the spatio-temporal feature extraction process can be expressed as:
X_ST = Te(Sp(X_in))   (20)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
3.2) A multi-granularity feature learning operation is performed on the feature map X_ST obtained in 3.1); the multi-granularity feature Y_MF obtained after learning is:
Y_MF = Y_G + Y_L   (21)
where Y_G is the global feature and Y_L is the local feature.
The global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (22)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information.
The local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (23)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively.
Taking Y_L^3 as an example, the extraction process is as follows. First, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (24):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (24)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information.
The extraction processes of Y_L^4 and Y_L^6 are similar to that of Y_L^3; the only difference is the number of blocks in the partition.
3.3) Temporal feature pooling is applied to the feature map Y_MF obtained in 3.2); the pooling process can be expressed as:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (25)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer, with α = 0.5, β = 0.25 and γ = 0.25.
3.4) According to step 3.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (26):
Y_M = Y_T^key ⊕ Y_T^main   (26)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
4) Gait feature matching is performed on the gait recognition features. During matching, generalized-mean pooling adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the features with the closest similarity are matched as the gait features of the same pedestrian.
4.1) A generalized-mean pooling operation is used to improve the self-learning ability of the model so that it can adaptively integrate spatial information. The generalized-mean pooled feature Y_GeM is computed as shown in formula (27):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (27)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
4.2) After generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (28):
Y_out = f_1×1(Y_GeM)   (28)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel.
4.3) Similarity is computed between the gait recognition features obtained from different samples; the model adopts the Euclidean similarity, and the similarity S is computed as shown in formula (29):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space.
The features with the closest similarity are matched as gait features belonging to the same pedestrian.
4.4) The model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model. The final loss L_com can be expressed as:
L_com = L_cse + L_tri   (30)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively.
The cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (31)
where x is the recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x).
The triplet loss L_tri can be expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (32)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
The invention has wide application in the field of gait recognition technology, for example pedestrian recognition in public places, long-distance pedestrian recognition, and public security. The invention is described in detail below with reference to the accompanying drawings.
(1) In an embodiment of the invention, a gait sequence is acquired and taken as a set of matrices as input. First, the key-frame probability value k is calculated for each frame of the gait contour maps in the pedestrian gait sequence. The hyperparameters are set to α = 0.5 and β = 0.1, the threshold δ is calculated from the input gait sequence information, and k is compared with δ; if k > δ, the current frame is judged to be a key frame, and all key frames form the key frame sequence.
(2) The key frame sequence is taken as the input of the key frame branch and the original gait sequence as the input of the main branch, and the same feature extraction operations are performed on the gait sequences in the two branches.
(2.1) A spatio-temporal feature extraction operation is performed on the gait sequence: spatial features are first extracted with a 3D convolution with a 3×3×3 kernel, and temporal features are then extracted with a 3D convolution with a 3×1×1 kernel;
(2.2) Multi-granularity feature learning is performed on the feature map obtained after spatio-temporal feature extraction: global information is extracted with a 3D convolution layer with a 3×3×3 kernel; meanwhile, the input feature map is horizontally partitioned into 3, 4 and 6 blocks, and each block feature is convolved with an independent convolution layer to extract the local information it contains; the local features obtained from the blocks are added to give the total local feature; the feature map containing local information is then fused with the feature map containing global information by splicing in the spatial dimension.
(2.3) To compress the feature information along the time dimension of the feature map, temporal feature pooling is applied to the feature map obtained by multi-granularity feature learning, using max pooling, average pooling and median pooling, with the parameters set to α = 0.5, β = 0.25 and γ = 0.25.
(2.4) After temporal feature pooling, the features of the key frame branch and of the main branch are spliced along the channel dimension to obtain the gait recognition features.
(3) Gait feature matching is performed on the gait recognition features. During matching, generalized-mean pooling adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and similarity is computed between the adjusted gait recognition features of different samples. The model is optimized using the loss function.
(3.1) To improve the self-learning ability of the model, the gait recognition features are processed by generalized-mean pooling so that spatial information is adaptively integrated, making the recognition features more discriminative;
(3.2) After generalized-mean pooling, the gait recognition features are adjusted with a 2D fully-connected layer, changing their dimensionality;
(3.3) The similarity between the gait recognition features of different samples is computed using the Euclidean similarity;
(3.4) The model is trained with the cross-entropy loss and the triplet loss, taking their sum as the final loss of the model, so that the final recognition features are more discriminative.
The method was implemented in Python 3.9 on an Intel Core i9-10980 under the 64-bit Ubuntu 20.04 operating system.
The invention provides a multi-granularity feature learning gait recognition method based on key frames that is suitable for gait recognition tasks, with high recognition accuracy and good algorithmic robustness. Experiments show that the method recognizes gait quickly and effectively.

Claims (5)

1. A multi-granularity feature learning gait recognition method based on key frames, characterized by comprising the following steps:
1) selecting the gait contour maps in a gait sequence that strongly influence the gait recognition result to form a key frame sequence, the processing branch taking the key frame sequence as input being called the key frame branch and the processing branch taking the original gait sequence as input being called the main branch, so as to ensure that the contour maps that strongly influence the recognition result receive sufficient attention;
2) performing the same feature extraction operations on the key frame branch and the main branch respectively; in the feature extraction stage, applying a spatial feature extraction operation to each frame of the gait contour maps in the sequence and extracting temporal features from the spatial features of adjacent frames; then performing multi-granularity feature learning: extracting features from the whole feature map to obtain global features while simultaneously partitioning the feature map into blocks and extracting local features from each block feature, and finally fusing the global and local features to obtain multi-granularity features; applying a temporal feature pooling operation to the multi-granularity features and fusing the pooled features of the key frame branch and the main branch to obtain the gait recognition features;
3) performing gait feature matching based on the gait recognition features; during matching, adaptively aggregating the spatial features with a generalized-mean pooling operation, adjusting them with a fully-connected layer, and finally computing the similarity between the adjusted gait recognition features of different samples, matching the features with the closest similarity as the gait features of the same pedestrian.
2. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that the key frames in the gait sequence are extracted in step 1), specifically:
1.1) all gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (1)-(2):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors below the pedestrian's head, corresponding to all portions of the pedestrian's gait contour below the head;
1.2) a threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (3)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters;
1.3) if the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame; every gait contour map in each gait sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence; the processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent feature extraction operations.
3. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that the feature extraction process in step 2) specifically comprises:
2.1) spatio-temporal feature extraction is performed on the gait sequence input to this branch; letting X_in denote the gait sequence input to the current branch, the spatio-temporal feature extraction process is expressed as:
X_ST = Te(Sp(X_in))   (4)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
2.2) a multi-granularity feature learning operation is performed on the feature map X_ST obtained in 2.1); the multi-granularity feature Y_MF obtained after learning is:
Y_MF = Y_G + Y_L   (5)
where Y_G is the global feature and Y_L is the local feature;
the global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (6)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information;
the local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (7)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively;
taking Y_L^3 as an example, the extraction process is as follows: first, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (8):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (8)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information;
and->Extraction process and->The extraction process is similar, and the only difference is that the number of the blocks is different when the blocks are divided;
2.3) temporal feature pooling is applied to the feature map Y_MF obtained in 2.2); the pooling process is:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (9)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer; α, β and γ are parameters with α + β + γ = 1;
2.4) according to step 2.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (10):
Y_M = Y_T^key ⊕ Y_T^main   (10)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
4. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that step 3) comprises a gait feature matching process:
3.1) a generalized-mean pooling operation is used; the generalized-mean pooled feature Y_GeM is computed as shown in formula (11):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (11)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
3.2) after generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (12):
Y_out = f_1×1(Y_GeM)   (12)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel;
3.3) similarity is computed between the gait recognition features obtained from different samples using the Euclidean similarity; the similarity S is computed as shown in formula (13):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space;
the features with the closest similarity are matched as gait features belonging to the same pedestrian;
3.4) the model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model; the final loss L_com is expressed as:
L_com = L_cse + L_tri   (14)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively; the model is trained with L_com;
the cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (15)
where x is the recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x);
the triplet loss L_tri is expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (16)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
5. The method according to claim 2, 3 or 4, characterized in that the given gait sequence is obtained by processing videos captured by cameras at a plurality of angles, the gait sequence comprises gait contour maps of the same pedestrian at different angles, and the pedestrian labels in the training set do not intersect with those in the test set.
CN202310106799.1A 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames Pending CN116563937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106799.1A CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106799.1A CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Publications (1)

Publication Number Publication Date
CN116563937A true CN116563937A (en) 2023-08-08

Family

ID=87493571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106799.1A Pending CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Country Status (1)

Country Link
CN (1) CN116563937A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830311A (en) * 2024-03-05 2024-04-05 中山大学 Medical image picture segmentation and key frame identification method, system, equipment and medium
CN117830311B (en) * 2024-03-05 2024-05-28 中山大学 Medical image picture segmentation and key frame identification method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination