CN116563937A - Multi-granularity feature learning gait recognition method based on key frames - Google Patents

Multi-granularity feature learning gait recognition method based on key frames

Info

Publication number
CN116563937A
CN116563937A
Authority
CN
China
Prior art keywords
gait
feature
features
sequence
key frame
Prior art date
Legal status
Pending
Application number
CN202310106799.1A
Other languages
Chinese (zh)
Inventor
付利华
吴会贤
张梓通
邢旻与
董光建
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202310106799.1A priority Critical patent/CN116563937A/en
Publication of CN116563937A publication Critical patent/CN116563937A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • G06V40/25Recognition of walking or running movements, e.g. gait recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-granularity feature learning gait recognition method based on key frames. Gait contour maps that strongly influence the recognition result are selected from the gait sequence to form a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into recognition features, so that the extracted recognition features are more discriminative; temporal information is emphasized during feature extraction, and global features are fused with local features. The invention optimizes the recognition features with a cross-entropy loss and a triplet loss to improve recognition accuracy, and solves the problems of undifferentiated feature extraction, insufficient local feature extraction, and insufficient temporal feature extraction in previous sequence-based gait recognition techniques. The method is suitable for pedestrian gait recognition tasks, offers high recognition accuracy and good algorithmic robustness, and has wide application in the field of gait recognition.

Description

Multi-granularity feature learning gait recognition method based on key frames
Technical Field
The invention belongs to the field of biometric recognition and performs identity recognition based on the posture changes of people walking normally; in particular, it relates to a multi-granularity feature learning gait recognition method based on key frames.
Background
Gait recognition is an emerging human biometric recognition task that aims to identify people by recognizing the posture changes of their normal walking. Compared with other biometric technologies such as face, fingerprint and iris recognition, gait can be recognized at long range without contact and is difficult to hide or disguise, giving it clear advantages. With the wide deployment of video surveillance, gait recognition has great potential and can play an important role in many fields.
Currently, most existing gait recognition techniques are implemented with deep learning, following two main strategies: template-based gait recognition and sequence-based gait recognition. Template-based methods compress the gait silhouette images of the same pedestrian into a single image or gait template and then perform recognition on it. Such methods are fast and have few network parameters, but the compression loses part of the information, so recognition accuracy is limited.
Therefore, sequence-based gait recognition methods are more commonly used. They treat multiple gait silhouettes of the same walking pedestrian as a gait sequence and feed the sequence into a recognition model for feature extraction and recognition. Although existing sequence-based methods have made great progress in both accuracy and running speed, they still need improvement with respect to undifferentiated feature extraction, local feature extraction, and temporal feature extraction during recognition. Accordingly, a new gait recognition method is required to solve the above problems.
Disclosure of Invention
The invention aims to solve the following problem: although existing sequence-based gait recognition methods have advanced greatly in both accuracy and running speed, they still need improvement with respect to undifferentiated feature extraction, local feature extraction, and temporal feature extraction during recognition.
To solve this problem, the invention provides a multi-granularity feature learning gait recognition method based on key frames. The method extracts key frames from the gait contour maps in a gait sequence, selecting the contour maps that strongly influence the recognition result and assembling them into a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into gait recognition features, making the extracted features more discriminative; the method attends to temporal information during feature extraction and fuses global features with local features, further improving discriminability. The method comprises the following steps:
1) Selecting the gait contour maps in a gait sequence that strongly influence the gait recognition result to form a key frame sequence; the processing branch that takes the key frame sequence as input is called the key frame branch, and the processing branch that takes the original gait sequence as input is called the main branch, ensuring that the contour maps that strongly influence the recognition result receive sufficient attention;
2) Performing identical feature extraction operations on the key frame branch and the main branch. In the feature extraction stage, a spatial feature extraction operation is applied to each frame of the gait contour maps in the sequence, and temporal features are extracted from the spatial features of adjacent frames; multi-granularity feature learning is then performed: a feature extraction operation on the whole feature map yields global features, while the feature map is simultaneously partitioned into blocks and a feature extraction operation on the partitioned feature maps yields local features; the global and local features are then fused into multi-granularity features. A temporal feature pooling operation is applied to the multi-granularity features, and the pooled features of the two branches are fused to obtain the gait recognition features;
3) Performing gait feature matching on the gait recognition features. During matching, a generalized-mean pooling operation adaptively aggregates the spatial features, a fully-connected layer then adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the gait recognition task is completed according to the feature similarity.
Further, extracting the key frames in the gait sequence in step 1) specifically comprises:
1.1) All gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (1)-(2):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors from below the pedestrian's head down to the H-th row, corresponding to all portions of the pedestrian's gait contour below the head.
1.2) A threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (3)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters;
1.3) If the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame. For each gait sequence, every gait contour map in the sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence. The processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent identical feature extraction operations.
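As an illustration only, the key frame selection of step 1) can be sketched as follows in Python. The scoring function standing in for the probability k is an assumption, since equations (1)-(2) are not reproduced in this text; only the threshold δ of equation (3) is taken directly from the disclosure.

```python
def keyframe_threshold(view_angle_deg, alpha=0.5, beta=0.1):
    # Equation (3): delta = (De / 18) % 6 * beta + alpha
    return (view_angle_deg / 18) % 6 * beta + alpha

def select_keyframes(silhouettes, view_angle_deg, score_fn):
    """Keep the frames whose key-frame probability k exceeds the threshold delta.

    silhouettes: sequence of H x W binary gait contour maps (one per frame).
    score_fn:    stand-in for the patent's key-frame probability k (its
                 equations (1)-(2) are not reproduced here), mapping one
                 contour map to a scalar score.
    """
    delta = keyframe_threshold(view_angle_deg)
    return [frame for frame in silhouettes if score_fn(frame) > delta]
```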
Further, the feature extraction process in step 2) specifically comprises:
The key frame branch and the main branch are identical in the feature extraction stage; the main branch is used to illustrate the feature extraction process.
2.1) Spatio-temporal feature extraction is performed on the gait sequence input to this branch. Let X_in denote the gait sequence input to the current branch; the spatio-temporal feature extraction process can be expressed as:
X_ST = Te(Sp(X_in))   (4)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
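A minimal PyTorch sketch of equation (4) follows; the kernel and stride sizes are those stated in the text, while the channel widths and the (N, C, T, H, W) input layout are assumptions.

```python
import torch.nn as nn

class SpatioTemporalExtractor(nn.Module):
    """Sketch of equation (4): X_ST = Te(Sp(X_in))."""
    def __init__(self, in_ch=1, mid_ch=32, out_ch=64):
        super().__init__()
        # Sp: spatial feature extraction, 3x3x3 kernel and stride per the text
        self.sp = nn.Conv3d(in_ch, mid_ch, kernel_size=3, stride=3)
        # Te: temporal feature extraction across adjacent frames, 3x1x1 kernel/stride
        self.te = nn.Conv3d(mid_ch, out_ch, kernel_size=(3, 1, 1), stride=(3, 1, 1))

    def forward(self, x_in):           # x_in: (N, C, T, H, W) gait sequence
        return self.te(self.sp(x_in))  # X_ST
```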
2.2) A multi-granularity feature learning operation is performed on the feature map X_ST obtained in 2.1); the multi-granularity feature Y_MF obtained after learning can be expressed as:
Y_MF = Y_G + Y_L   (5)
where Y_G is the global feature and Y_L is the local feature.
The global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (6)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information.
The local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (7)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively.
Taking Y_L^3 as an example, the extraction process is as follows. First, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (8):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (8)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information.
The extraction processes of Y_L^4 and Y_L^6 are similar to that of Y_L^3; the only difference is the number of blocks in the partition.
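The multi-granularity learning of equations (5)-(8) can be sketched in PyTorch as below; the channel width is an assumption, and each block receives its own convolution layer, as the embodiment later describes.

```python
import torch
import torch.nn as nn

class MultiGranularityFeature(nn.Module):
    """Sketch of equations (5)-(8): Y_MF = Y_G + Y_L, where Y_L sums local
    features from horizontal partitions into 3, 4 and 6 blocks."""
    def __init__(self, ch=64, parts=(3, 4, 6)):
        super().__init__()
        self.parts = parts
        self.global_conv = nn.Conv3d(ch, ch, kernel_size=3, padding=1)  # f_3x3x3, eq. (6)
        # an independent convolution per block, for each partition scheme
        self.local_convs = nn.ModuleList(
            nn.ModuleList(nn.Conv3d(ch, ch, kernel_size=3, padding=1)
                          for _ in range(n))
            for n in parts)

    def forward(self, x_st):                      # x_st: (N, C, T, H, W)
        y_g = self.global_conv(x_st)              # global feature Y_G
        y_l = torch.zeros_like(y_g)
        for n, convs in zip(self.parts, self.local_convs):
            blocks = torch.chunk(x_st, n, dim=3)  # horizontal (height-wise) split
            # per-block convolutions, spliced back along the height axis, eq. (8)
            y_l = y_l + torch.cat([c(b) for c, b in zip(convs, blocks)], dim=3)
        return y_g + y_l                          # Y_MF = Y_G + Y_L, eq. (5)
```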
2.3) Temporal feature pooling is applied to the feature map Y_MF obtained in 2.2); the pooling process can be expressed as:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (9)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer; α, β and γ are parameters with α + β + γ = 1.
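A sketch of the temporal feature pooling of equation (9), under the text's labeling of F_Avg as a median-pooling layer; the (N, C, T, H, W) layout with time on axis 2 is an assumption.

```python
def temporal_feature_pooling(y_mf, alpha=0.5, beta=0.25, gamma=0.25):
    """Sketch of equation (9): weighted max / median / mean pooling over the
    time axis of an (N, C, T, H, W) tensor; alpha + beta + gamma must equal 1."""
    y_max = y_mf.max(dim=2).values
    y_med = y_mf.median(dim=2).values
    y_mean = y_mf.mean(dim=2)
    return alpha * y_max + beta * y_med + gamma * y_mean
```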
2.4) According to step 2.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (10):
Y_M = Y_T^key ⊕ Y_T^main   (10)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
Step 3) performs gait feature matching on the gait recognition features, specifically:
3.1) First, a generalized-mean pooling operation is used to improve the self-learning ability of the model so that it can adaptively integrate spatial information. The generalized-mean pooled feature Y_GeM is computed as shown in formula (11):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (11)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
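Generalized-mean pooling as in formula (11) can be sketched as follows; the (N, C, H, W) input layout, the clamping constant, and the initial value of p are assumptions.

```python
import torch
import torch.nn as nn

class GeMPool2d(nn.Module):
    """Sketch of generalized-mean pooling, formula (11): (avg(x^p))^(1/p),
    with p a learnable parameter; p = 1 reduces to average pooling."""
    def __init__(self, p=3.0, eps=1e-6):
        super().__init__()
        self.p = nn.Parameter(torch.tensor(p))
        self.eps = eps

    def forward(self, x):                      # x: (N, C, H, W), assumed >= 0
        x = x.clamp(min=self.eps).pow(self.p)  # element-wise x^p
        return x.mean(dim=(-2, -1)).pow(1.0 / self.p)  # (N, C)
```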
3.2) After generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (12):
Y_out = f_1×1(Y_GeM)   (12)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel.
3.3) Similarity is computed between the gait recognition features of different samples; the Euclidean similarity is adopted, and the similarity S is computed as shown in formula (13):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space.
The features with the closest similarity are matched as gait recognition features belonging to the same pedestrian.
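The nearest-neighbour matching of step 3.3) can be sketched as below, assuming the similarity of formula (13) is monotone in the Euclidean distance between L2-normalized features.

```python
import torch
import torch.nn.functional as F

def match_gallery(probe, gallery):
    """L2-normalize the recognition features, compute pairwise Euclidean
    distances, and match each probe to its nearest gallery entry
    (closest similarity = smallest normalized distance)."""
    p = F.normalize(probe, dim=1)    # (Np, D)
    g = F.normalize(gallery, dim=1)  # (Ng, D)
    dists = torch.cdist(p, g)        # (Np, Ng) Euclidean distances
    return dists.argmin(dim=1)       # index of best gallery match per probe
```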
3.4) The model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model. The final loss L_com can be expressed as:
L_com = L_cse + L_tri   (14)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively. The model is trained with L_com.
The cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (15)
where x is the gait recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x).
The triplet loss L_tri can be expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (16)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
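The joint training objective of formulas (14)-(16) can be sketched with PyTorch's built-in losses; note that nn.TripletMarginLoss implements the standard hinge form [D(a,p) − D(a,n) + m]_+, used here as a stand-in for formula (16), and the margin value and classifier head producing per-identity logits are assumptions.

```python
import torch.nn as nn

# Stand-ins for equations (14)-(16): L_com = L_cse + L_tri.
cross_entropy = nn.CrossEntropyLoss()
triplet = nn.TripletMarginLoss(margin=0.2)  # margin m is a placeholder value

def combined_loss(logits, labels, anchor, positive, negative):
    l_cse = cross_entropy(logits, labels)        # eq. (15)
    l_tri = triplet(anchor, positive, negative)  # eq. (16), standard hinge form
    return l_cse + l_tri                         # eq. (14)
```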
The invention provides a multi-granularity feature learning gait recognition method based on key frames. The method extracts key frames from the gait contour maps in a gait sequence, selecting the contour maps that strongly influence the recognition result and assembling them into a key frame sequence. Gait features are extracted separately from the key frame sequence and the original gait sequence and finally fused into recognition features, making the extracted features more discriminative; temporal information is attended to during feature extraction, and global features are fused with local features. The method optimizes the recognition features with a cross-entropy loss and a triplet loss to improve recognition accuracy, and solves the problems of undifferentiated feature extraction, insufficient local feature extraction, and insufficient temporal feature extraction in previous sequence-based gait recognition techniques. The method is suitable for pedestrian gait recognition tasks, with high recognition accuracy and good algorithmic robustness.
The advantages of the invention are as follows: first, a key frame extraction method is proposed to solve the problem of undifferentiated feature extraction; second, temporal information in the gait sequence is attended to during feature extraction, with temporal feature pooling used to process multi-frame information and fully exploit the temporal information contained in the sequence; finally, a multi-granularity feature learning scheme is proposed to fully learn global and local information, and the features are optimized with the triplet loss and the cross-entropy loss, improving the recognition accuracy of the gait recognition method.
Drawings
FIG. 1 is a flow chart of a key frame based multi-granularity feature learning gait recognition method of the present invention.
Fig. 2 is a block diagram of a key frame-based multi-granularity feature learning gait recognition system of the present invention.
Detailed Description
The invention provides a multi-granularity feature learning gait recognition method based on key frames. The processing branch that takes the key frame sequence as input is called the key frame branch, and the processing branch that takes the original gait sequence as input is called the main branch; the two branches perform identical feature extraction operations. In the feature extraction stage, a spatial feature extraction operation is applied to each frame of the gait contour maps in the sequence, and temporal features are extracted from the spatial features of adjacent frames; multi-granularity feature learning is then performed: the whole feature map is processed to obtain global features while it is simultaneously partitioned to obtain local features, and the global and local features are fused to obtain the gait recognition features, on which gait feature matching is then performed. During matching, a generalized-mean pooling operation adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the features with the closest similarity are matched as the gait features of the same pedestrian. The model uses a cross-entropy loss and a triplet loss to optimize the recognition features. As shown in FIG. 1, the invention comprises the following steps:
1) Acquire a gait sequence comprising the gait contour maps of a pedestrian's continuous walking process; the gait sequence is taken as a set of matrices and used as input.
2) The gait contour maps in a gait sequence do not all influence the recognition result equally: contour maps capturing the continuous motion of a pedestrian can contain overlapping body parts. An adaptive key frame extraction module is therefore designed to identify the gait contour maps in the pedestrian gait sequence that carry more gait information and assemble them into a key frame sequence, so as to obtain more discriminative pedestrian gait recognition features.
2.1) All gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (17)-(18):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors from below the pedestrian's head down to the H-th row, corresponding to all portions of the pedestrian's gait contour below the head.
2.2) A threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (19)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters, set to α = 0.5 and β = 0.1;
2.3) If the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame. For each gait sequence, every gait contour map in the sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence. The processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent feature extraction operations.
3) For the recognition task, both global and local information in the image must be attended to: global information makes the features discriminative, while local information captures the details of the image. For the gait recognition task in particular, walking is a continuous motion process, so the temporal information contained in the gait sequence must also be attended to when extracting gait recognition features. After key frame selection, the two branches separately perform the subsequent feature extraction operations.
The key frame branch and the main branch are identical in the feature extraction stage; the main branch is used to illustrate the operations of the feature extraction process.
3.1) Spatio-temporal feature extraction is performed on the gait sequence input to this branch. Let X_in denote the gait sequence input to the current branch; the spatio-temporal feature extraction process can be expressed as:
X_ST = Te(Sp(X_in))   (20)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
3.2) A multi-granularity feature learning operation is performed on the feature map X_ST obtained in 3.1); the multi-granularity feature Y_MF obtained after learning is:
Y_MF = Y_G + Y_L   (21)
where Y_G is the global feature and Y_L is the local feature.
The global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (22)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information.
The local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (23)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively.
Taking Y_L^3 as an example, the extraction process is as follows. First, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (24):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (24)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information.
The extraction processes of Y_L^4 and Y_L^6 are similar to that of Y_L^3; the only difference is the number of blocks in the partition.
3.3) Temporal feature pooling is applied to the feature map Y_MF obtained in 3.2); the pooling process can be expressed as:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (25)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer, with α = 0.5, β = 0.25 and γ = 0.25.
3.4) According to step 3.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (26):
Y_M = Y_T^key ⊕ Y_T^main   (26)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
4) Gait feature matching is performed on the gait recognition features. During matching, generalized-mean pooling adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and finally similarity is computed between the adjusted gait recognition features of different samples; the features with the closest similarity are matched as the gait features of the same pedestrian.
4.1) A generalized-mean pooling operation is used to improve the self-learning ability of the model so that it can adaptively integrate spatial information. The generalized-mean pooled feature Y_GeM is computed as shown in formula (27):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (27)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
4.2) After generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (28):
Y_out = f_1×1(Y_GeM)   (28)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel.
4.3) Similarity is computed between the gait recognition features obtained from different samples; the model adopts the Euclidean similarity, and the similarity S is computed as shown in formula (29):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space.
The features with the closest similarity are matched as gait features belonging to the same pedestrian.
4.4) The model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model. The final loss L_com can be expressed as:
L_com = L_cse + L_tri   (30)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively.
The cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (31)
where x is the recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x).
The triplet loss L_tri can be expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (32)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
The invention has wide application in the field of gait recognition technology, for example pedestrian recognition in public places, long-distance pedestrian recognition, and public security. The invention is described in detail below with reference to the accompanying drawings.
(1) In an embodiment of the invention, a gait sequence is acquired and taken as a set of matrices as input. First, the key-frame probability value k is calculated for each frame of the gait contour maps in the pedestrian gait sequence. The hyperparameters are set to α = 0.5 and β = 0.1, the threshold δ is calculated from the input gait sequence information, and k is compared with δ; if k > δ, the current frame is judged to be a key frame, and all key frames form the key frame sequence.
(2) The key frame sequence is taken as the input of the key frame branch and the original gait sequence as the input of the main branch, and the same feature extraction operations are performed on the gait sequences in the two branches.
(2.1) A spatio-temporal feature extraction operation is performed on the gait sequence: spatial features are first extracted with a 3D convolution with a 3×3×3 kernel, and temporal features are then extracted with a 3D convolution with a 3×1×1 kernel;
(2.2) Multi-granularity feature learning is performed on the feature map obtained after spatio-temporal feature extraction: global information is extracted with a 3D convolution layer with a 3×3×3 kernel; meanwhile, the input feature map is horizontally partitioned into 3, 4 and 6 blocks, and each block feature is convolved with an independent convolution layer to extract the local information it contains; the local features obtained from the blocks are added to give the total local feature; the feature map containing local information is then fused with the feature map containing global information by splicing in the spatial dimension.
(2.3) To compress the feature information along the time dimension of the feature map, temporal feature pooling is applied to the feature map obtained by multi-granularity feature learning, using max pooling, average pooling and median pooling, with the parameters set to α = 0.5, β = 0.25 and γ = 0.25.
(2.4) After temporal feature pooling, the features of the key frame branch and of the main branch are spliced along the channel dimension to obtain the gait recognition features.
(3) Gait feature matching is performed on the gait recognition features. During matching, generalized-mean pooling adaptively aggregates the spatial features, a fully-connected layer adjusts the features, and similarity is computed between the adjusted gait recognition features of different samples. The model is optimized using the loss function.
(3.1) To improve the self-learning ability of the model, the gait recognition features are processed by generalized-mean pooling so that spatial information is adaptively integrated, making the recognition features more discriminative;
(3.2) After generalized-mean pooling, the gait recognition features are adjusted with a 2D fully-connected layer, changing their dimensionality;
(3.3) The similarity between the gait recognition features of different samples is computed using the Euclidean similarity;
(3.4) The model is trained with the cross-entropy loss and the triplet loss, taking their sum as the final loss of the model, so that the final recognition features are more discriminative.
The method was implemented in Python 3.9 on an Intel Core i9-10980 under the 64-bit Ubuntu 20.04 operating system.
The invention provides a multi-granularity feature learning gait recognition method based on key frames that is suitable for gait recognition tasks, with high recognition accuracy and good algorithmic robustness. Experiments show that the method recognizes gait quickly and effectively.

Claims (5)

1. A multi-granularity feature learning gait recognition method based on key frames, characterized by comprising the following steps:
1) selecting the gait contour maps in a gait sequence that strongly influence the gait recognition result to form a key frame sequence, the processing branch taking the key frame sequence as input being called the key frame branch and the processing branch taking the original gait sequence as input being called the main branch, so as to ensure that the contour maps that strongly influence the recognition result receive sufficient attention;
2) performing the same feature extraction operations on the key frame branch and the main branch respectively; in the feature extraction stage, applying a spatial feature extraction operation to each frame of the gait contour maps in the sequence and extracting temporal features from the spatial features of adjacent frames; then performing multi-granularity feature learning: extracting features from the whole feature map to obtain global features while simultaneously partitioning the feature map into blocks and extracting local features from each block feature, and finally fusing the global and local features to obtain multi-granularity features; applying a temporal feature pooling operation to the multi-granularity features and fusing the pooled features of the key frame branch and the main branch to obtain the gait recognition features;
3) performing gait feature matching based on the gait recognition features; during matching, adaptively aggregating the spatial features with a generalized-mean pooling operation, adjusting them with a fully-connected layer, and finally computing the similarity between the adjusted gait recognition features of different samples, matching the features with the closest similarity as the gait features of the same pedestrian.
2. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that the key frames in the gait sequence are extracted in step 1), specifically:
1.1) all gait contour maps in the pedestrian gait sequence input to the model are regarded as a set of matrices, and the probability value k that each frame of the gait sequence is a key frame is calculated; the key-frame probability value k is computed by equations (1)-(2):
where X^(i,j) is the j-th gait contour map in the gait sequence of the pedestrian labeled i; x_h^(i,j) is the row vector of the h-th row of X^(i,j), h ∈ [1, H], with H the height of the gait contour map; x_(h,w)^(i,j) is the w-th element of the row vector x_h^(i,j); and X_low^(i,j) is the submatrix of X^(i,j) consisting of the row vectors below the pedestrian's head, corresponding to all portions of the pedestrian's gait contour below the head;
1.2) a threshold δ is set; δ is related to the angle between the pedestrian's walking direction and the sampling lens, and is expressed as:
δ = (De/18) % 6 × β + α   (3)
where De is the angle between the pedestrian's walking direction and the sampling lens when the current gait contour map is sampled, and α and β are hyperparameters;
1.3) if the key-frame probability value k of a gait contour map in the gait sequence satisfies k > δ, the contour map is judged to be a key frame; every gait contour map in each gait sequence is judged in this way, and the contour maps satisfying the condition are selected to form the key frame sequence; the processing branch taking the key frame sequence as input is called the key frame branch, the processing branch taking the original gait sequence as input is called the main branch, and the two branches separately perform the subsequent feature extraction operations.
3. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that the feature extraction process in step 2) specifically comprises:
2.1) spatio-temporal feature extraction is performed on the gait sequence input to this branch; letting X_in denote the gait sequence input to the current branch, the spatio-temporal feature extraction process is expressed as:
X_ST = Te(Sp(X_in))   (4)
where Sp(·) is the spatial feature extraction operation, a 3D convolution with a 3×3×3 convolution kernel and a 3×3×3 step size, and Te(·) is the temporal feature extraction operation, a 3D convolution with a 3×1×1 convolution kernel and a 3×1×1 step size;
2.2) a multi-granularity feature learning operation is performed on the feature map X_ST obtained in 2.1); the multi-granularity feature Y_MF obtained after learning is:
Y_MF = Y_G + Y_L   (5)
where Y_G is the global feature and Y_L is the local feature;
the global feature Y_G is expressed as:
Y_G = f_3×3×3(X_ST)   (6)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, yielding a feature map Y_G containing global information;
the local feature Y_L is expressed as:
Y_L = Y_L^3 + Y_L^4 + Y_L^6   (7)
where Y_L^3, Y_L^4 and Y_L^6 are local features of different granularities obtained by partitioning the feature map X_ST with 3 partitioning modes, which horizontally and evenly divide the feature map into 3, 4 and 6 blocks, respectively;
taking Y_L^3 as an example, the extraction process is as follows: first, the feature map X_ST is horizontally divided into 3 blocks; then features are extracted from each block separately; finally, a cat operation is applied to the extracted features to obtain Y_L^3, as shown in formula (8):
Y_L^3 = cat(f_3×3×3(X_ST^3,1), f_3×3×3(X_ST^3,2), f_3×3×3(X_ST^3,3))   (8)
where f_3×3×3(·) denotes a 3D convolution layer with a 3×3×3 convolution kernel, X_ST^3,i is the i-th block of the input feature map X_ST after it has been horizontally and evenly divided into 3 blocks, and cat denotes the matrix concatenation operation, which horizontally splices the resulting feature maps into a feature map containing local information;
and->Extraction process and->The extraction process is similar, and the only difference is that the number of the blocks is different when the blocks are divided;
2.3) temporal feature pooling is applied to the feature map Y_MF obtained in 2.2); the pooling process is:
Y_T = αF_Max(Y_MF) + βF_Avg(Y_MF) + γF_Mean(Y_MF)   (9)
where F_Max(·) denotes the max-pooling layer, F_Avg(·) the median-pooling layer, and F_Mean(·) the mean-pooling layer; α, β and γ are parameters with α + β + γ = 1;
2.4) according to step 2.3), the feature Y_T^key of the key frame branch and the feature Y_T^main of the main branch are obtained respectively; the two features are fused to obtain the gait recognition feature Y_M, as shown in formula (10):
Y_M = Y_T^key ⊕ Y_T^main   (10)
where ⊕ is the feature fusion operation, implemented as a matrix cat operation along the channel dimension.
4. The key-frame-based multi-granularity feature learning gait recognition method according to claim 1, characterized in that step 3) comprises a gait feature matching process:
3.1) a generalized-mean pooling operation is used; the generalized-mean pooled feature Y_GeM is computed as shown in formula (11):
Y_GeM = (F_Avg(Y_M^p))^(1/p)   (11)
where F_Avg(·) is the average-pooling layer and p is an adaptive parameter learned through the network; when p = 1, the generalized-mean pooling layer is equivalent to average pooling;
3.2) after generalized-mean pooling, the obtained feature Y_GeM is fed into a 2D fully-connected layer for adjustment, yielding Y_out, as shown in formula (12):
Y_out = f_1×1(Y_GeM)   (12)
where f_1×1(·) denotes a 2D convolution layer with a 1×1 convolution kernel;
3.3) similarity is computed between the gait recognition features obtained from different samples using the Euclidean similarity; the similarity S is computed as shown in formula (13):
where f_i and f_j are the feature vectors obtained from different input samples, and the similarity is based on the Euclidean distance between the normalized feature vectors f_i and f_j in the feature space;
the features with the closest similarity are matched as gait features belonging to the same pedestrian;
3.4) the model is trained simultaneously with a cross-entropy loss and a triplet loss, taking their sum as the final loss of the model; the final loss L_com is expressed as:
L_com = L_cse + L_tri   (14)
where L_cse and L_tri are the cross-entropy loss and the triplet loss, respectively; the model is trained with L_com;
the cross-entropy loss L_cse is expressed as:
L_cse = −Σ_x p(x) log q(x)   (15)
where x is the recognition feature output by the model, p(·) is the probability that the current feature belongs to the target label, and q(·) is the probability that it does not, i.e., q(x) = 1 − p(x);
the triplet loss L_tri is expressed as:
L_tri = [D(F(i), F(k)) − D(F(i), F(j)) + m]_+   (16)
where i and j are samples from the same pedestrian label, k is a sample from a label different from that of i and j, F(·) is the feature extraction operation of the model, D(d_1, d_2) is the Euclidean distance between d_1 and d_2, m is the margin of the triplet loss, and the operation [γ]_+ equals max(γ, 0).
5. The method according to claim 2, 3 or 4, characterized in that the given gait sequence is obtained by processing videos captured by cameras at a plurality of angles, the gait sequence comprises gait contour maps of the same pedestrian at different angles, and the pedestrian labels in the training set do not intersect with those in the test set.
CN202310106799.1A 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames Pending CN116563937A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310106799.1A CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310106799.1A CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Publications (1)

Publication Number Publication Date
CN116563937A true CN116563937A (en) 2023-08-08

Family

ID=87493571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310106799.1A Pending CN116563937A (en) 2023-02-13 2023-02-13 Multi-granularity feature learning gait recognition method based on key frames

Country Status (1)

Country Link
CN (1) CN116563937A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117830311A (en) * 2024-03-05 2024-04-05 中山大学 Medical image picture segmentation and key frame identification method, system, equipment and medium
CN117830311B (en) * 2024-03-05 2024-05-28 中山大学 Medical image picture segmentation and key frame identification method, system, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination