CN113011396A - Gait recognition method based on deep learning cascade feature fusion - Google Patents

Gait recognition method based on deep learning cascade feature fusion

Info

Publication number
CN113011396A
Authority
CN
China
Prior art keywords
global
local
feature
network
fusion
Prior art date
Legal status
Granted
Application number
CN202110460610.XA
Other languages
Chinese (zh)
Other versions
CN113011396B (en)
Inventor
罗俊
李华洋
王慧燕
Current Assignee
Zhejiang Gongshang University
Third Research Institute of the Ministry of Public Security
Original Assignee
Zhejiang Gongshang University
Third Research Institute of the Ministry of Public Security
Priority date
Filing date
Publication date
Application filed by Zhejiang Gongshang University, Third Research Institute of the Ministry of Public Security filed Critical Zhejiang Gongshang University
Priority to CN202110460610.XA
Publication of CN113011396A
Application granted
Publication of CN113011396B
Legal status: Active

Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G06T7/13: Edge detection
    • G06T7/181: Segmentation; edge detection involving edge growing or edge linking
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V40/103: Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T2207/10016: Video; image sequence
    • G06T2207/20081: Training; learning
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a gait recognition method based on deep-learning cascade feature fusion. The method first reads video frames, extracts local features through a local feature extraction branch while extracting overall contour features through a global feature extraction branch, and then fuses the two types of features for subsequent recognition. The GaitSet network serves as the local feature extraction branch, slicing the image to extract local features; an improved GaitSet network serves as the global feature extraction branch, so that the network attends to the overall contour features of the image; a purpose-designed feature fusion branch fuses the two types of features, giving the overall framework a more complete feature representation. The invention has good generality and can be applied to other gait recognition models.

Description

Gait recognition method based on deep learning cascade feature fusion
Technical Field
The invention belongs to the field of video image processing and gait recognition in computer vision, and relates to a gait recognition method based on deep-learning cascade feature fusion.
Background
Gait recognition is an emerging biometric technology that verifies identity from a pedestrian's walking posture. Compared with technologies such as face, fingerprint, and iris recognition, gait recognition research started relatively late; however, because it is contactless, works at long range, and is difficult to disguise, recognition can be completed without the active cooperation of the subject. Gait recognition can therefore be widely applied in fields such as smart cities and intelligent transportation, and in scenarios such as searching for criminal suspects.
In recent years, with the wide application of deep neural networks, gait recognition has developed rapidly. Existing deep-learning gait recognition methods fall roughly into two categories, template-based and sequence-based, both of which recognize pedestrians from gait contours. However, most methods use only global contour features or only local contour features, potentially losing useful information; even in methods that use both global and local contour features, the two feature extraction branches share weights, so the network cannot learn the features unique to each.
Disclosure of Invention
To address these defects of the prior art, the invention provides a gait recognition method based on deep-learning cascade feature fusion, which fuses improved global and local feature extraction models, improves gait recognition accuracy, and can be widely applied to other gait recognition networks.
The technical scheme adopted by the invention to solve the technical problem is as follows:
Step 1, a pedestrian gait image sequence or video is accessed and input into a local feature extraction branch based on the GaitSet network, which slices the image to extract the local feature F_Local;
Step 2, the pedestrian gait image sequence or video is also input into a global feature extraction branch improved from the GaitSet network; this branch completely retains the overall contour feature information of the image to obtain the global feature F_Global;
Step 3, the extracted local and global features are fused through a feature fusion branch;
Step 4, during network training, a triplet loss is applied to the fused features for subsequent back-propagation and updating of the network parameters;
Step 5, during forward inference with the trained network, the Euclidean distance between the fused features of the probe and gallery sets is computed, and rank-1 recognition accuracy is computed from these distances.
The technical scheme provided by the invention has the following beneficial effects: the GaitSet network is used as a local feature extraction branch that slices the image to extract local features; an improved GaitSet network retains the complete contour information of the image to extract global features; and a purpose-designed feature fusion branch learns context information for each of the two feature types and fuses them. This enhances the global features in GaitSet, makes the final feature representation extracted by the network more complete, and improves recognition accuracy. The invention achieves high-accuracy gait recognition from image sequence or video input alone, without other auxiliary equipment.
Drawings
To show the network structure and the training and forward inference processes of the embodiment more clearly, the drawings used in the embodiment are briefly described below.
FIG. 1 shows the network structure of the gait recognition method with deep-learning cascade feature fusion according to the present invention;
FIG. 2 is a flow chart of training according to the present invention;
FIG. 3 is a flow chart of forward inference according to the present invention.
Detailed Description
To describe the present invention more specifically, its technical solution is described in detail below with reference to the accompanying drawings and specific embodiments.
The invention provides a general gait recognition method based on deep-learning cascade feature fusion. The network framework, shown in FIG. 1, mainly comprises three branches: a global gait feature extraction branch, a local gait feature extraction branch, and a feature fusion branch. FIG. 1 also shows the structures of the residual network unit Res and the attention model Att.
The network training process is shown in FIG. 2. Briefly: first, video frames are read and local gait features are extracted; second, global gait features are extracted from the same video frames in parallel; third, the two types of features are enhanced and fused; fourth, the loss on the fused features is computed; fifth, back-propagation updates the network parameters.
The network forward inference process is shown in FIG. 3. Briefly: first, video frames are read and local gait features are extracted; second, global gait features are extracted from the same video frames in parallel; third, the two types of features are enhanced and fused; fourth, the Euclidean distance between the fused features of the probe and gallery sets is computed, and rank-1 recognition accuracy is computed from these distances.
Embodiment:
A gait recognition method based on deep-learning cascade feature fusion comprises the following steps:
Step 1, a gait sequence or video is input and features are extracted through the GaitSet local feature branch; during feature mapping, the feature map is sliced to obtain the local features of each slice.
Step 2, global gait features are extracted through the global feature extraction branch improved from GaitSet, specifically:
The gait sequence or video is input into the global feature extraction branch to extract global gait features. This branch is based on the GaitSet gait recognition network. The horizontal pyramid mapping (HPM) module in GaitSet splits the feature map into horizontal blocks and then processes each block separately; the method instead applies a fused global average pooling (GAP) and global max pooling (GMP) operation directly to the features, without any blocking, so that the network attends to the overall gait contour. When local details of the pedestrian contour change, more robust features can therefore be extracted.
Step 3, the local gait features and global gait features are fused through a multi-cascade attention-enhanced feature fusion module, i.e. the feature fusion branch, specifically:
as shown in FIG. 1, the feature fusion module firstly amplifies the receptive field of the output neuron by cascading two residual error network (Res) branches to further enhance the semantic expression of the global features, wherein the output F of one branchGlobal_Res1Adding (Add) the feature map with the input of another branch, and obtaining the output after the fusion and passing through a Res residual error network, wherein the output is represented as FGlobal_Res2. The basic units of Res are a 3 x 3 max pooling, a 1 x 1 convolution, a BN layer, a ReLu activation function, and an upsampling using bilinear differences, see fig. 1. The global feature enhancement branch is also added with a 1 x 1 convolution branch for retaining complete overall gait contour information, and the output F of the convolution branchGlobal_1Directly with FGlobal_Res1And FGlobal_Res2Weighted sum enhanced global feature FG
The local feature enhancement network likewise enhances semantic expression by cascading two residual network (Res) branches: the output F_Local_Res1 of the first branch is added to the input of the second branch, and the fused map passes through one Res unit to give F_Local_Res2. F_Local_Res1 and F_Local_Res2 are summed (weighted sum) to obtain F_Local_Res, from which a 3 × 3 convolution further extracts features to give F_Local_Res3. F_Local_Res3 is then fed into an attention model to obtain F_Att. The basic attention unit Att consists of a weighted sum of global max pooling and global average pooling, a 1 × 1 convolution, a BN layer, and a Sigmoid activation; it enriches the context information of the local features and can capture finer spatial-resolution information. Finally, F_Local_Res3, F_Att, and F_G are summed (weighted sum) to obtain the fused feature.
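Continuing the sketch above and reusing its Res unit, the attention unit Att and the final fusion could look as follows. Treating the Sigmoid output as a gate on the input, and using equal fusion weights, are assumptions of this sketch.

```python
import torch
import torch.nn as nn

class Att(nn.Module):
    """Attention unit as described: sum of GMP and GAP, a 1x1 convolution,
    BN, and a Sigmoid; the resulting map gates the input (an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.gmp = nn.AdaptiveMaxPool2d(1)
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a = torch.sigmoid(self.bn(self.conv(self.gap(x) + self.gmp(x))))
        return x * a                              # channel-attention gate

class FusionBranch(nn.Module):
    """Local enhancement and the final fusion with the enhanced global F_G;
    Res is the unit defined in the previous sketch."""
    def __init__(self, channels: int):
        super().__init__()
        self.res1 = Res(channels)
        self.res2 = Res(channels)
        self.conv3x3 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.att = Att(channels)

    def forward(self, f_local: torch.Tensor, f_g: torch.Tensor) -> torch.Tensor:
        f_res1 = self.res1(f_local)               # F_Local_Res1
        f_res2 = self.res2(f_local + f_res1)      # F_Local_Res2
        f_res3 = self.conv3x3(f_res1 + f_res2)    # F_Local_Res3
        f_att = self.att(f_res3)                  # F_Att
        return f_res3 + f_att + f_g               # fused feature
```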
The fusion module as a whole enriches feature propagation through the two cascaded residual networks and a self-attention mechanism, generating rich context information, which is vital for gait recognition on video sequences.
Step 4, during network training, the triplet loss is computed on the fused features, followed by back-propagation to update the network parameters.
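A sketch of such a triplet loss on the fused features is given below; the batch-all mining strategy and the margin value are assumptions, since the description only names the loss.

```python
import torch
import torch.nn.functional as F

def batch_all_triplet_loss(feats: torch.Tensor, labels: torch.Tensor,
                           margin: float = 0.2) -> torch.Tensor:
    """Mean hinge loss over all valid (anchor, positive, negative) triplets
    formed inside the batch, using Euclidean distances."""
    dist = torch.cdist(feats, feats)                   # (N, N) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # identity mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)
    pos_mask = same & ~eye                             # valid (anchor, positive)
    # hinge[a, p, n] = max(0, d(a, p) - d(a, n) + margin)
    hinge = F.relu(dist.unsqueeze(2) - dist.unsqueeze(1) + margin)
    valid = pos_mask.unsqueeze(2) & (~same).unsqueeze(1)
    losses = hinge[valid]
    active = losses[losses > 0]
    return active.mean() if active.numel() > 0 else losses.sum()
```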
Step 5, during forward inference with the trained network, the Euclidean distance between the fused features of the probe and gallery sets is computed; candidates are sorted by distance and rank-1 recognition accuracy is computed, the nearest gallery sequence being taken as coming from the same subject.
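The rank-1 computation amounts to a nearest-neighbour check in feature space; a minimal sketch (one fused feature vector per sequence, feature extraction assumed done):

```python
import torch

def rank1_accuracy(probe_feats: torch.Tensor, probe_labels: torch.Tensor,
                   gallery_feats: torch.Tensor,
                   gallery_labels: torch.Tensor) -> float:
    """Fraction of probes whose Euclidean-nearest gallery feature
    belongs to the same identity."""
    dist = torch.cdist(probe_feats, gallery_feats)  # (P, G) distance matrix
    nearest = dist.argmin(dim=1)                    # closest gallery sample
    hits = gallery_labels[nearest] == probe_labels  # same identity?
    return hits.float().mean().item()
```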

Claims (4)

1. A gait recognition method based on deep-learning cascade feature fusion, characterized by comprising the following steps:
step 1, accessing a pedestrian gait image sequence or video, inputting it into a local feature extraction branch based on the GaitSet network, and slicing the image to extract the local feature F_Local;
step 2, inputting the pedestrian gait image sequence or video into a global feature extraction branch improved from the GaitSet network, wherein the branch completely retains the overall contour feature information of the image to obtain the global feature F_Global;
step 3, performing feature fusion on the extracted local features and global features through the feature fusion branch;
step 4, during network training, applying a triplet loss to the fused features for subsequent back-propagation and updating of the network parameters;
step 5, during forward inference with the trained network, computing the Euclidean distance between the fused features of the probe and gallery sets, and computing rank-1 recognition accuracy from the distances.
2. The gait recognition method based on deep learning cascade feature fusion of claim 1,
wherein the global feature extraction branch in step 2 is improved from the GaitSet network by replacing the horizontal pyramid mapping module of the GaitSet network with a fused operation of global average pooling and global max pooling applied directly to the features.
3. The gait recognition method based on deep learning cascade feature fusion of claim 2,
wherein after the fused global average pooling and global max pooling operation, the features are mapped through a fully connected layer.
4. The gait recognition method based on deep learning cascade feature fusion according to any one of claims 1 to 3,
wherein the feature fusion branch first enhances the outputs of the local feature extraction branch and the global feature extraction branch, and then fuses the enhanced features;
wherein the global feature F_Global passes through a first residual network Res to obtain a first global residual feature F_Global_Res1; F_Global_Res1 and F_Global are added and fused and then pass through a second residual network Res to obtain a second global residual feature F_Global_Res2; F_Global passes through a 1 × 1 convolution branch to obtain the global feature F_Global_1; F_Global_1, F_Global_Res1, and F_Global_Res2 are then added and fused to obtain the enhanced global feature F_G;
wherein the local feature F_Local passes through a third residual network Res to obtain a first local residual feature F_Local_Res1; F_Local_Res1 and F_Local are added and fused and then pass through a fourth residual network Res to obtain a second local residual feature F_Local_Res2; F_Local_Res1 and F_Local_Res2 are added and fused to obtain the local residual feature F_Local_Res; a 3 × 3 convolution network further extracts features from F_Local_Res to obtain a third local residual feature F_Local_Res3; F_Local_Res3 is input into an attention model to obtain the local attention feature F_Att; finally, F_Local_Res3, F_Att, and the enhanced global feature F_G are added to obtain the fused feature.
CN202110460610.XA (filed 2021-04-27, priority 2021-04-27): Gait recognition method based on deep learning cascade feature fusion. Status: Active; granted as CN113011396B.

Priority Applications (1)

CN202110460610.XA (granted as CN113011396B): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Applications Claiming Priority (1)

CN202110460610.XA (granted as CN113011396B): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Publications (2)

CN113011396A: published 2021-06-22
CN113011396B: published 2024-02-09 (grant)

Family

ID=76380709

Family Applications (1)

CN202110460610.XA (Active): priority 2021-04-27, filed 2021-04-27, Gait recognition method based on deep learning cascade feature fusion

Country Status (1)

CN: CN113011396B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
WO2019018063A1 *: 2017-07-19, 2019-01-24, Microsoft Technology Licensing, LLC, Fine-grained image recognition
CN111539320A *: 2020-04-22, 2020-08-14, Shandong University (山东大学), Multi-view gait recognition method and system based on mutual learning network strategy

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number, priority date, publication date, assignee, title:
CN113538400A *: 2021-07-29, 2021-10-22, Yanshan University (燕山大学), Cross-modal crowd counting method and system
CN113947814A *: 2021-10-28, 2022-01-18, Shandong University (山东大学), Cross-view gait recognition method based on spatio-temporal information enhancement and multi-scale saliency feature extraction
CN113947814B *: 2021-10-28, 2024-05-28, Shandong University (山东大学), Cross-view gait recognition method based on spatio-temporal information enhancement and multi-scale saliency feature extraction

Also Published As

CN113011396B: published 2024-02-09

Similar Documents

Publication Publication Date Title
CN111563508B (en) Semantic segmentation method based on spatial information fusion
CN108875807B (en) Image description method based on multiple attention and multiple scales
CN111507311B (en) Video character recognition method based on multi-mode feature fusion depth network
CN113011396B (en) Gait recognition method based on deep learning cascade feature fusion
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN112766378A (en) Cross-domain small sample image classification model method focusing on fine-grained identification
CN113449671A (en) Multi-scale and multi-feature fusion pedestrian re-identification method and device
CN113935435A (en) Multi-modal emotion recognition method based on space-time feature fusion
CN114579707B (en) Aspect-level emotion analysis method based on BERT neural network and multi-semantic learning
CN112084895A (en) Pedestrian re-identification method based on deep learning
CN116206327A (en) Image classification method based on online knowledge distillation
CN115797808A (en) Unmanned aerial vehicle inspection defect image identification method, system, device and medium
Liu et al. Deeply coupled convolution–transformer with spatial–temporal complementary learning for video-based person re-identification
CN114821050A (en) Named image segmentation method based on transformer
CN110826534A (en) Face key point detection method and system based on local principal component analysis
CN115952360B (en) Domain self-adaptive cross-domain recommendation method and system based on user and article commonality modeling
CN116883746A (en) Graph node classification method based on partition pooling hypergraph neural network
CN115982652A (en) Cross-modal emotion analysis method based on attention network
CN114495163A (en) Pedestrian re-identification generation learning method based on category activation mapping
CN112529098B (en) Dense multi-scale target detection system and method
CN115204171A (en) Document-level event extraction method and system based on hypergraph neural network
CN114911930A (en) Global and local complementary bidirectional attention video question-answering method and system
CN110222716B (en) Image classification method based on full-resolution depth convolution neural network
CN111325068B (en) Video description method and device based on convolutional neural network

Legal Events

Code: Description
PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant