CN112434655A - Gait recognition method based on adaptive confidence graph convolutional network - Google Patents
Gait recognition method based on adaptive confidence graph convolutional network
- Publication number: CN112434655A (application CN202011432129.1A)
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V40/20 — Recognition of biometric, human-related or animal-related patterns in image or video data; movements or behaviour, e.g. gesture recognition
- G06F18/253 — Pattern recognition; analysing; fusion techniques of extracted features
- G06N3/045 — Computing arrangements based on biological models; neural networks; architecture; combinations of networks
- G06N3/08 — Computing arrangements based on biological models; neural networks; learning methods
- G06V10/40 — Arrangements for image or video recognition or understanding; extraction of image or video features
Abstract
The invention discloses a gait recognition method based on an adaptive confidence graph convolutional network, comprising the following steps: 1, acquiring a data set containing human body posture information; 2, building an adaptive confidence graph convolutional network model; 3, training the built model offline; and 4, using the trained model for prediction, thereby achieving gait recognition. The adaptive confidence graph convolutional network can extract gait features from posture data while reducing the influence of noise in that data, which improves the accuracy of gait recognition and mitigates the sensitivity of existing appearance-based gait recognition algorithms to appearance changes.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a gait recognition method based on an adaptive confidence graph convolutional network.
Background
Identity recognition is an important subject in the field of computer vision and plays an important role in video surveillance and public security. Gait recognition aims to determine a person's identity from the way they walk; compared with other identity recognition approaches, it has the advantages of being contactless and effective at long range. Gait recognition algorithms fall into two main categories: appearance-based algorithms and model-based algorithms.
Appearance-based algorithms typically take body contour images as input, which retain gait information while discarding appearance details. Early algorithms constructed a gait template, fusing an image sequence into a single template image before recognition. Recent deep-learning algorithms use convolutional neural networks to extract gait features from gait templates or gait sequences and then perform recognition on those features. Appearance-based algorithms are widely used, but they are easily disturbed by external factors such as changes in a person's appearance (different clothing, carried items, and the like), viewing angle, and walking speed.
Model-based algorithms model gait using prior knowledge of the structure and motion of the human body, and are therefore more robust to disturbances such as appearance and viewing angle. However, early algorithms could only model gait under restricted conditions, and were narrower in applicability and lower in accuracy than appearance-based algorithms. In recent years, deep learning has attracted wide attention in computer vision and has been applied to various vision tasks; in particular, deep-learning-based pose estimation algorithms can acquire human body posture information more accurately, providing a new direction for model-based gait recognition. However, because of the limited performance of pose estimation algorithms themselves and interference from factors such as occlusion, background clutter, and low image resolution, the poses obtained this way often contain noise. Existing pose-based gait algorithms, on the one hand, do not treat the noise in pose data specifically, which hurts recognition accuracy; on the other hand, they do not exploit the relations among pose key points, so the extracted gait features are not comprehensive enough.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a gait recognition method based on an adaptive confidence graph convolutional network. The aim is to use this network to extract an effective gait representation from noisy human pose key point data while reducing the influence of the noise, and at the same time to exploit the relations among pose key points, so as to improve gait recognition accuracy and mitigate the sensitivity of existing appearance-based gait recognition algorithms to appearance changes.
To achieve this purpose, the invention adopts the following technical scheme:
The invention provides a gait recognition method based on an adaptive confidence graph convolutional network, characterized by comprising the following steps:
step 1, extracting human body pose key points from an original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each, and normalizing them to obtain a standardized pose skeleton data set X;
step 2, building an adaptive confidence graph convolutional network model, which comprises: one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module;
the input module is implemented as a batch normalization layer;
the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
step 2.1, organizing the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
step 2.2, feeding the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix;
step 2.3, feeding the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors; after the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$;
step 2.4, feeding the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features from the sequence with a temporal convolution operator; after the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules;
step 2.5, averaging the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps, then averaging the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f;
step 2.6, feeding the gait feature f into the output module to obtain the prediction result y;
step 3, computing a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, updating the weights of the adaptive confidence graph convolutional network model with SGD, and ending training when the loss value stabilizes, obtaining the optimal adaptive confidence graph convolutional network model;
and step 4, extracting gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, computing the Euclidean distances between the query feature and all gallery features, and sorting them from smallest to largest distance to obtain the retrieval result.
The gait recognition method based on the adaptive confidence graph convolutional network is further characterized in that step 2.3 comprises:
step 2.3.1, denoting the input of each adaptive confidence graph convolution layer as $F^s_{in}$; passing $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, computing the matrix product of $F^s_1$ and $F^s_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A;
step 2.3.2, expanding the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generating an all-zero matrix of the same size to serve as the adaptive matrix B;
step 2.3.3, adding the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then computing the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; passing E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
step 2.3.4, multiplying the input $F^s_{in}$ by the confidence vector for weighting, then passing it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
step 2.3.5, adding the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer, then passing it through the normalization layer, ReLU activation layer, and Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module;
step 2.3.6, if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, passing $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$ and adding the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, adding $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$;
step 2.3.7, repeating steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$.
Step 2.4 comprises:
step 2.4.1, denoting the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; passing $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, computing the matrix product of $F^{st}_1$ and $F^{st}_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A';
step 2.4.2, adding the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; passing the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
Step 2.4.3, output of the adaptive graph convolution layerSequentially passing through a batch normalization layer, a ReLU activation layer and a Dropout layer to obtain a deep space gait characteristic sequence Fst′;
step 2.4.4, passing the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
step 2.4.5, if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, passing $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$ and adding the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, adding $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
step 2.4.6, repeating steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$.
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with current mainstream gait recognition methods, the method of the invention has lower computational cost, higher efficiency, and better robustness to external interference factors such as appearance changes, making it better suited to practical applications of gait recognition.
2. The invention extracts gait features from human body posture information through the designed adaptive confidence graph convolutional network. Compared with a conventional convolutional network, this network handles pose key point data better and exploits the relations among pose key points to extract richer gait features, thereby improving recognition accuracy.
3. The invention constructs an adaptive confidence weighting mechanism for the graph convolutional network, using the confidences produced by the pose extractor to weight each pose key point during convolution. This reduces the adverse influence of noisy points and makes fuller use of accurate pose information, ultimately improving recognition accuracy.
4. By constructing the adaptive confidence graph convolutional network and applying an attention mechanism, the invention increases the diversity of gait features while preserving their accuracy.
Drawings
FIG. 1 is a schematic flow diagram of the method of the invention;
FIG. 2 is a block diagram of the adaptive confidence graph convolution layer of the invention;
FIG. 3 is a block diagram of the adaptive confidence module of the invention;
FIG. 4 is a block diagram of the spatio-temporal feature fusion module of the invention;
FIG. 5 is a block diagram of the adaptive confidence graph convolutional network of the invention.
Detailed Description
In this embodiment, the gait recognition method based on an adaptive confidence graph convolutional network extracts effective gait features from noisy pose data mainly by means of a graph convolutional network and an attention mechanism. As shown in FIG. 1, the method proceeds as follows:
Step 1: extract human body pose key points from the original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each; normalize them to obtain the standardized pose skeleton data set $X = \{X^n \mid n = 1, 2, \dots, N\}$, where $X^n = \{(\hat{P}^n_t, c^n_t) \mid t = 1, 2, \dots, T\}$ denotes the n-th standardized pose skeleton sequence, $\hat{P}^n_t$ denotes the pose skeleton of the t-th frame in that sequence, and $c^n_t$ denotes the confidences of the pose key points in $\hat{P}^n_t$. In this embodiment, the CASIA-B gait data set is used for training and testing; it contains walking videos of 124 subjects in different walking states and from different viewing angles. The data of the first 62 subjects is used as the training set and that of the remaining 62 subjects as the test set. The pose estimation algorithm is AlphaPose. Because the distance between a pedestrian and the camera changes continuously while walking, the captured pose skeletons differ in size, and all pose skeleton data must be standardized to prevent scale from affecting the gait features. Let the original pose skeleton be $P = \{v_i \mid i = 1, 2, \dots, M\}$, where $v_i$ denotes the i-th pose key point and M denotes the number of key points in one skeleton, and denote the standardized skeleton by $\hat{P} = \{\hat{v}_i \mid i = 1, 2, \dots, M\}$. The normalization can be expressed as formula (1):

$\hat{v}_i = \dfrac{v_i}{\mathrm{dist}(v_{neck}, v_{hip})}$ (1)

In formula (1), $v_{neck}$ and $v_{hip}$ denote the coordinates of the neck and hip key points respectively, and dist denotes the Euclidean distance function. Because the neck and hip key points are estimated accurately, normalizing by the distance between them yields pose data of relatively consistent scale. During training the video sequence length is T = 30: a complete gait cycle spans about 25 frames, so 30 frames guarantee that each clip contains a full cycle; a video shorter than 30 frames is repeated until it exceeds 30 frames and the first 30 frames are taken. The uniform clip length allows batch processing to speed up training.
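This preprocessing can be sketched in a few lines of Python; the keypoint indices `NECK` and `HIP` are illustrative assumptions and should match the pose extractor's keypoint layout:

```python
import numpy as np

NECK, HIP = 1, 8   # assumed indices; adjust to the pose extractor's keypoint layout

def normalize_skeleton(pose):
    """Scale one frame of pose keypoints, shape (M, 2), per formula (1):
    divide by the neck-hip distance so that skeletons captured at
    different camera distances share a consistent scale."""
    scale = np.linalg.norm(pose[NECK] - pose[HIP])
    return pose / max(scale, 1e-6)   # guard against degenerate frames

def pad_to_length(seq, length=30):
    """Repeat a short sequence of frames until it covers `length` frames,
    then crop, mirroring the 30-frame training clips described above."""
    reps = int(np.ceil(length / len(seq)))
    return np.concatenate([seq] * reps, axis=0)[:length]
```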
Step 2: build the adaptive confidence graph convolutional network model, which comprises one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module, as shown in FIG. 5. In this embodiment α = 3 and β = 7, with which the network achieves its best recognition accuracy;
the input module is implemented as a batch normalization layer;
as shown in FIG. 3, the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
as shown in FIG. 4, the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
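How the four module types compose can be shown in a minimal PyTorch sketch, assuming the channel widths and strides given later in this embodiment and 17 keypoints (the AlphaPose/COCO layout); `AdaptiveConfidenceModule` and `STFusionModule` are placeholders elaborated in the sketches after steps 2.3.7 and 2.4.6:

```python
import torch
import torch.nn as nn

class ACGCN(nn.Module):
    """Skeleton of the adaptive confidence graph convolutional network.

    Tensors follow the (batch, channels, frames, keypoints) convention;
    the 2 input channels are the (x, y) keypoint coordinates.
    """
    def __init__(self, num_classes, in_channels=2, num_keypoints=17):
        super().__init__()
        self.input_bn = nn.BatchNorm1d(in_channels * num_keypoints)   # input module
        # alpha = 3 adaptive confidence modules, each outputting 64 channels
        self.confidence_blocks = nn.ModuleList([
            AdaptiveConfidenceModule(in_channels if i == 0 else 64, 64)
            for i in range(3)
        ])
        # beta = 7 spatio-temporal fusion modules; stride 2 in the 2nd and 5th
        cfg = [(64, 64, 1), (64, 128, 2), (128, 128, 1), (128, 128, 1),
               (128, 256, 2), (256, 256, 1), (256, 256, 1)]
        self.fusion_blocks = nn.ModuleList(
            [STFusionModule(ci, co, stride=s) for ci, co, s in cfg])
        self.fc = nn.Linear(256, num_classes)   # output module (softmax applied in the loss)

    def forward(self, x, conf):
        # x: (N, 2, T, M) normalized keypoints; conf: (N, T, M) confidences
        n, c, t, m = x.shape
        x = self.input_bn(x.permute(0, 1, 3, 2).reshape(n, c * m, t))
        x = x.reshape(n, c, m, t).permute(0, 1, 3, 2)
        for block in self.confidence_blocks:
            x = block(x, conf)                  # step 2.3
        for block in self.fusion_blocks:
            x = block(x)                        # step 2.4
        f = x.mean(dim=2).mean(dim=2)           # step 2.5: average over time, then space
        return f, self.fc(f)                    # gait feature f and prediction y
```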
Step 2.1: organize the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
Step 2.2: feed the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix. The input module normalizes the data to zero mean and unit standard deviation, which helps reduce the distribution gap between the training and test sets and thereby improves accuracy;
Step 2.3: feed the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors. After the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$. This embodiment uses 3 adaptive confidence modules whose output feature maps have 64, 64, and 64 channels respectively, as shown in FIG. 5;
Step 2.3.1: the structure of the adaptive confidence graph convolution layer inside the adaptive confidence module is shown in FIG. 2. Denote the input of each such layer as $F^s_{in}$; pass $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, compute their matrix product, and normalize the result with a softmax function to obtain the adaptive matrix A. The key points of a pose skeleton differ in importance: in gait recognition, leg key points clearly carry more gait information than head key points, so each key point must be weighted. The adjacency matrix E' only indicates whether two key points are connected, not the weight of each key point, so an additional weight matrix is needed. Computing the correlations among all key points through an attention mechanism yields the matrix A, which adaptively updates the weight of each key point;
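A minimal sketch of this step, assuming 1×1 convolutions as the two embedding operators and an illustrative embedding width; the (batch, channels, frames, keypoints) tensor layout follows the skeleton sketch above:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adaptive_matrix(x, conv1, conv2):
    """Compute the data-dependent weight matrix A = softmax(F1 . F2).

    x: (batch, channels, frames, keypoints); conv1 and conv2 are 1x1
    Conv2d embeddings. Returns A of shape (batch, keypoints, keypoints),
    with each row normalized by softmax.
    """
    n, _, t, m = x.shape
    f1 = conv1(x).permute(0, 3, 1, 2).reshape(n, m, -1)  # (N, M, C'*T)
    f2 = conv2(x).reshape(n, -1, m)                      # (N, C'*T, M)
    return F.softmax(torch.matmul(f1, f2), dim=-1)
```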
Step 2.3.2: expand the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generate an all-zero matrix of the same size to serve as the adaptive matrix B. Graph convolution is a process in which the key points exchange information, but some key points are noisy and carry inaccurate information, so noisy key points must be suppressed during this exchange. After the confidence vector is expanded column-wise into the confidence matrix C, each key point can be weighted by its confidence during graph convolution; this suppresses the propagation of noisy key points to other key points without preventing a noisy key point from receiving information from correct ones. The adaptive matrix B further supplements the adjacency matrix E': E' defines connections only according to the natural connections of the human body, but implicit connections also exist between key points. For example, when a person walks, the arm and the leg on the same side move in opposite directions, indicating an implicit connection between them. The network updates the adaptive matrix B automatically during training to extend the connections and weights between key points.
Step 2.3.3: add the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then compute the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; pass E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
Step 2.3.4: multiply the input $F^s_{in}$ by the confidence vector for weighting, then pass it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
Step 2.3.5: add the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer; then pass it through a normalization layer, a ReLU activation layer, and a Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module. The Dropout layers are used to avoid overfitting; in this embodiment, the probability of all Dropout layers is 0.5;
Step 2.3.6: if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, pass $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$, then add the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, add $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$. Adding the input to the intermediate result improves network performance through residual learning;
Step 2.3.7: repeat steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$;
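Steps 2.3.1–2.3.6 can be sketched as one module, reusing the `adaptive_matrix` helper above. This is one plausible reading rather than the exact implementation: the graph convolution operator is read as adjacency-weighted aggregation followed by a 1×1 convolution, and the confidence matrix is built from time-averaged confidences as a simplification:

```python
class AdaptiveConfidenceModule(nn.Module):
    """One alpha-layer block: adaptive branch + confidence branch + residual."""
    def __init__(self, c_in, c_out, adjacency=None, num_keypoints=17,
                 inter=16, dropout=0.5):
        super().__init__()
        if adjacency is None:        # fall back to self-loops if no skeleton graph is given
            adjacency = torch.eye(num_keypoints)
        self.register_buffer('E', adjacency)                 # normalized adjacency E'
        self.conv1 = nn.Conv2d(c_in, inter, 1)               # embeddings for matrix A
        self.conv2 = nn.Conv2d(c_in, inter, 1)
        self.B = nn.Parameter(torch.zeros_like(adjacency))   # step 2.3.2: learned, zero-initialized
        self.adp_out = nn.Conv2d(c_in, c_out, 1)             # adaptive-branch graph conv
        self.conf_out = nn.Conv2d(c_in, c_out, 1)            # confidence-branch conv
        self.bn = nn.BatchNorm2d(c_out)
        self.drop = nn.Dropout(dropout)
        self.residual = (nn.Identity() if c_in == c_out
                         else nn.Conv2d(c_in, c_out, 1))     # step 2.3.6: size matching

    def forward(self, x, conf):
        # x: (N, C, T, M); conf: (N, T, M) keypoint confidences
        A = adaptive_matrix(x, self.conv1, self.conv2)       # step 2.3.1: (N, M, M)
        c = conf.mean(dim=1)                                 # time-averaged confidences (simplification)
        C = c.unsqueeze(-1).expand(-1, -1, x.size(3))        # step 2.3.2: column-wise expansion, C[i, j] = c[i]
        E2 = (self.E + A + self.B) * C                       # step 2.3.3: Hadamard product with C
        f_adp = self.adp_out(torch.einsum('nctm,nmk->nctk', x, E2))
        f_conf = self.conf_out(x * conf.unsqueeze(1))        # step 2.3.4: confidence weighting
        mid = self.drop(torch.relu(self.bn(f_adp + f_conf))) # step 2.3.5
        return mid + self.residual(x)                        # step 2.3.6: residual connection
```

Because C weights the rows (message sources) of the combined adjacency, a noisy key point's outgoing messages are suppressed while it still receives information from correct key points, matching the rationale in step 2.3.2.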
Step 2.4: feed the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features with a temporal convolution operator. After the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules. The main difference between the spatio-temporal feature fusion module and the adaptive confidence module is that the fusion module does not weight by confidence and only uses the attention mechanism to extend the adjacency matrix, for two reasons. First, after the α adaptive confidence modules, every key point has exchanged information: an originally noisy key point has received correct information and its own noise has been suppressed, so it can be treated as a correct key point. Second, confidence weighting requires a one-to-one correspondence between confidences and pose key points, but the fusion module changes the length of the pose sequence during temporal convolution, breaking that correspondence. This embodiment uses 7 spatio-temporal feature fusion modules whose output feature maps have 64, 128, 128, 128, 256, 256, and 256 channels respectively, as shown in FIG. 5; the convolution layers in the 2nd and 5th fusion modules have stride 2, and those in the remaining fusion modules have stride 1;
Step 2.4.1: denote the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; pass $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, compute their matrix product, and normalize the result with a softmax function to obtain the adaptive matrix A';
Step 2.4.2: add the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; pass the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
Step 2.4.3: pass the adaptive graph convolution layer output $F^{st}_{adp}$ sequentially through a batch normalization layer, a ReLU activation layer, and a Dropout layer to obtain the deep spatial gait feature sequence $F^{st'}$;
Step 2.4.4: pass the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
Step 2.4.5: if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, pass $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$, then add the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, add $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
Step 2.4.6: repeat steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$;
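A corresponding sketch of one spatio-temporal feature fusion module (steps 2.4.1–2.4.5), continuing the imports and `adaptive_matrix` helper above; the 9-frame temporal kernel is an assumption borrowed from common skeleton-based models and is not specified in the patent:

```python
class STFusionModule(nn.Module):
    """One beta-layer block: adaptive graph conv + temporal conv + residual."""
    def __init__(self, c_in, c_out, stride=1, adjacency=None,
                 num_keypoints=17, inter=16, dropout=0.5, t_kernel=9):
        super().__init__()
        if adjacency is None:
            adjacency = torch.eye(num_keypoints)
        self.register_buffer('E', adjacency)             # normalized adjacency E'
        self.conv1 = nn.Conv2d(c_in, inter, 1)           # embeddings for matrix A'
        self.conv2 = nn.Conv2d(c_in, inter, 1)
        self.gcn = nn.Conv2d(c_in, c_out, 1)             # graph convolution operator
        self.bn1 = nn.BatchNorm2d(c_out)
        self.drop = nn.Dropout(dropout)
        pad = (t_kernel - 1) // 2                        # keep T unchanged when stride is 1
        self.tcn = nn.Conv2d(c_out, c_out, (t_kernel, 1), (stride, 1), (pad, 0))
        self.bn2 = nn.BatchNorm2d(c_out)
        self.residual = (nn.Identity() if c_in == c_out and stride == 1 else
                         nn.Conv2d(c_in, c_out, 1, (stride, 1)))  # step 2.4.5: size matching

    def forward(self, x):
        A = adaptive_matrix(x, self.conv1, self.conv2)   # step 2.4.1: attention matrix A'
        E2 = self.E + A                                  # step 2.4.2: E'' = E' + A', no confidence weighting
        f_adp = self.gcn(torch.einsum('nctm,nmk->nctk', x, E2))
        deep = self.drop(torch.relu(self.bn1(f_adp)))    # step 2.4.3: deep spatial features
        mid = torch.relu(self.bn2(self.tcn(deep)))       # step 2.4.4: temporal convolution
        return mid + self.residual(x)                    # step 2.4.5: residual connection
```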
Step 2.5: average the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps; then average the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f. Fusing the sequence features by averaging means that sequences of different lengths yield gait features of the same size after passing through the network, which is convenient for testing;
Step 2.6: feed the gait feature f into the output module to obtain the prediction result y;
Step 3: compute a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, update the weights of the adaptive confidence graph convolutional network model with SGD, and end training when the loss value stabilizes, obtaining the optimal model. In this embodiment, 12 subjects are randomly selected from the training set at each iteration and 2 gait sequences are randomly selected from each subject's data, so the batch size is 24 sequences per iteration. The margin of the triplet loss is set to 1.5 and its weight to 0.25. Training uses a warm-up strategy, which helps alleviate overfitting and improves accuracy; concretely, the initial learning rate is set to 0.0001 and increased linearly to 0.1 over the first 1000 iterations, the learning rate is multiplied by 0.1 at iterations 5000, 7000, and 9000, and training ends after 10000 iterations;
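A hedged sketch of this training step, using the `ACGCN` skeleton above: the margin (1.5), triplet weight (0.25), and learning-rate schedule follow the embodiment, while the SGD momentum, the triplet index tensors, and the sampler wiring are illustrative assumptions:

```python
import torch
import torch.nn as nn

model = ACGCN(num_classes=62)    # first 62 CASIA-B subjects form the training set
optimizer = torch.optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)  # momentum assumed
triplet = nn.TripletMarginLoss(margin=1.5)
ce = nn.CrossEntropyLoss()

def warmup_lr(it):
    """Warm-up schedule: linear rise to 0.1 over 1000 iterations,
    then x0.1 decays at iterations 5000, 7000 and 9000."""
    if it < 1000:
        return 0.0001 + (0.1 - 0.0001) * it / 1000
    return 0.1 * 0.1 ** sum(it >= step for step in (5000, 7000, 9000))

def train_step(it, x, conf, labels, anchors, positives, negatives):
    """One iteration on a batch of 12 subjects x 2 sequences; the index
    tensors select anchor/positive/negative triplets within the batch."""
    for group in optimizer.param_groups:
        group['lr'] = warmup_lr(it)
    f, y = model(x, conf)
    loss = ce(y, labels) + 0.25 * triplet(f[anchors], f[positives], f[negatives])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```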
Step 4: extract gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, compute the Euclidean distances between the query feature and all gallery features, and sort them from smallest to largest distance to obtain the retrieval result. During testing, all frames of each sequence are fed into the model to extract the gait feature, so as to make full use of the information in the data.
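The retrieval itself reduces to a nearest-neighbour ranking by Euclidean distance, for example:

```python
import torch

def retrieve(query_feat, gallery_feats):
    """Rank gallery sequences by Euclidean distance to the query feature.

    query_feat: (D,) gait feature of the sequence to retrieve;
    gallery_feats: (G, D) features of all gallery sequences.
    Returns gallery indices sorted from nearest to farthest.
    """
    dists = torch.cdist(query_feat.unsqueeze(0), gallery_feats).squeeze(0)
    return torch.argsort(dists)   # ascending: smallest distance ranks first
```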
Claims (3)
1. A gait recognition method based on an adaptive confidence graph convolutional network, characterized by comprising the following steps:
step 1, extracting human body pose key points from an original video data set using a pose estimation algorithm, obtaining N gait sequences of T frames each, and normalizing them to obtain a standardized pose skeleton data set $X = \{X^n \mid n = 1, 2, \dots, N\}$, where $X^n = \{(\hat{P}^n_t, c^n_t) \mid t = 1, 2, \dots, T\}$ denotes the n-th standardized pose skeleton sequence, $\hat{P}^n_t$ denotes the pose skeleton of the t-th frame in the n-th sequence, and $c^n_t$ denotes the confidences of the pose key points in $\hat{P}^n_t$;
step 2, building an adaptive confidence graph convolutional network model, which comprises: one input module, α adaptive confidence modules, β spatio-temporal feature fusion modules, and one output module;
the input module is implemented as a batch normalization layer;
the adaptive confidence module consists, in order, of an adaptive confidence graph convolution layer, a normalization layer, a ReLU activation layer, and a Dropout layer;
the spatio-temporal feature fusion module consists, in order, of an adaptive graph convolution layer, a normalization layer, a ReLU activation layer, a Dropout layer, a convolution layer, a normalization layer, and a ReLU activation layer;
the output module consists of a pooling layer and a fully connected layer with a softmax function;
step 2.1, organizing the pose data in the standardized pose skeleton data set X according to the structure of the human body: each pose key point is regarded as a vertex in the vertex set V of a graph, the natural connections between pose key points are regarded as the edge set E, and the graph data G = (V, E) is formed;
step 2.2, feeding the graph data G into the input module for data normalization, obtaining normalized graph data G' = (V', E'), where V' denotes the normalized vertex set and E' denotes the regularized adjacency matrix;
step 2.3, feeding the normalized graph data G' and the confidence vectors from the standardized pose skeleton data set X sequentially through the α adaptive confidence modules, which extract features from G' with a preset graph convolution operator and weight the extracted features with the confidence vectors; after the α adaptive confidence modules, a spatial gait feature sequence $F^s = \{F^s_t \mid t = 1, \dots, T\}$ is obtained, where $F^s_t$ denotes the feature map of the t-th frame in $F^s$;
step 2.4, feeding the spatial gait feature sequence $F^s$ sequentially through the β spatio-temporal feature fusion modules, which further extract deep spatial gait features from $F^s$ with a graph convolution operator and extract spatio-temporal gait features from the sequence with a temporal convolution operator; after the β spatio-temporal feature fusion modules, a spatio-temporal gait feature sequence $F^{st} = \{F^{st}_t \mid t = 1, \dots, T'\}$ is obtained, where $F^{st}_t$ denotes the feature map of the t-th frame in $F^{st}$ and T' denotes the number of frames after all fusion modules;
step 2.5, averaging the spatio-temporal gait feature sequence $F^{st}$ over the temporal dimension to fuse the T' frame feature maps, then averaging the fused feature map over the spatial dimension to fuse the features of all pose key points, obtaining the final gait feature f;
step 2.6, feeding the gait feature f into the output module to obtain the prediction result y;
step 3, computing a triplet loss from the gait feature f and a cross-entropy loss from the prediction result y, updating the weights of the adaptive confidence graph convolutional network model with SGD, and ending training when the loss value stabilizes, obtaining the optimal adaptive confidence graph convolutional network model;
and step 4, extracting gait features from the sequence to be retrieved and from all sequences in the gallery with the optimal adaptive confidence graph convolutional network model, computing the Euclidean distances between the query feature and all gallery features, and sorting them from smallest to largest distance to obtain the retrieval result.
2. The gait recognition method based on the adaptive confidence graph convolutional network according to claim 1, characterized in that step 2.3 comprises:
step 2.3.1, denoting the input of each adaptive confidence graph convolution layer as $F^s_{in}$; passing $F^s_{in}$ through two convolution operators to obtain feature maps $F^s_1$ and $F^s_2$ respectively, computing the matrix product of $F^s_1$ and $F^s_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A;
step 2.3.2, expanding the confidence vector column-wise into a matrix of the same size as the adjacency matrix E', denoted the confidence matrix C; generating an all-zero matrix of the same size to serve as the adaptive matrix B;
step 2.3.3, adding the adjacency matrix E', the adaptive matrix A, and the adaptive matrix B, then computing the Hadamard product with the confidence matrix C to obtain the final adaptive confidence adjacency matrix E''; passing E'' together with the input $F^s_{in}$ through a graph convolution operator to obtain the output $F^s_{adp}$ of the adaptive branch in the adaptive confidence graph convolution layer;
step 2.3.4, multiplying the input $F^s_{in}$ by the confidence vector for weighting, then passing it through a convolution operator to obtain the output $F^s_{conf}$ of the confidence branch in the adaptive confidence graph convolution layer;
step 2.3.5, adding the adaptive branch output $F^s_{adp}$ and the confidence branch output $F^s_{conf}$ to obtain the output of the adaptive confidence graph convolution layer, then passing it through the normalization layer, ReLU activation layer, and Dropout layer to obtain the intermediate result $F^s_{mid}$ of the adaptive confidence module;
step 2.3.6, if the size of the input $F^s_{in}$ does not equal that of the intermediate result $F^s_{mid}$, passing $F^s_{in}$ through a convolution layer so that it matches the size of $F^s_{mid}$ and adding the two to obtain the output $F^s_{out}$ of the adaptive confidence module; if the sizes are equal, adding $F^s_{in}$ and $F^s_{mid}$ directly to obtain $F^s_{out}$;
step 2.3.7, repeating steps 2.3.1 to 2.3.6 until all adaptive confidence modules have been traversed, thereby obtaining the spatial gait feature sequence $F^s$.
3. The gait recognition method based on the adaptive confidence graph convolutional network according to claim 1, characterized in that step 2.4 comprises:
step 2.4.1, denoting the input of each spatio-temporal feature fusion module as $F^{st}_{in}$; passing $F^{st}_{in}$ through two convolution operators to obtain feature maps $F^{st}_1$ and $F^{st}_2$ respectively, computing the matrix product of $F^{st}_1$ and $F^{st}_2$, and normalizing the result with a softmax function to obtain the adaptive matrix A';
step 2.4.2, adding the adjacency matrix E' and the adaptive matrix A' to obtain the adaptive adjacency matrix E''; passing the input $F^{st}_{in}$ together with E'' through a graph convolution operator to obtain the output $F^{st}_{adp}$ of the adaptive graph convolution layer in the spatio-temporal feature fusion module;
step 2.4.3, passing the adaptive graph convolution layer output $F^{st}_{adp}$ sequentially through a batch normalization layer, a ReLU activation layer, and a Dropout layer to obtain the deep spatial gait feature sequence $F^{st'}$;
step 2.4.4, passing the deep spatial gait feature sequence $F^{st'}$ sequentially through a convolution layer, a batch normalization layer, and a ReLU activation layer to obtain the intermediate result $F^{st}_{mid}$ of the spatio-temporal feature fusion module;
step 2.4.5, if the size of the input $F^{st}_{in}$ does not equal that of the intermediate result $F^{st}_{mid}$, passing $F^{st}_{in}$ through a convolution layer so that it matches the size of $F^{st}_{mid}$ and adding the two to obtain the output $F^{st}_{out}$ of the spatio-temporal feature fusion module; if the sizes are equal, adding $F^{st}_{in}$ and $F^{st}_{mid}$ directly to obtain $F^{st}_{out}$;
step 2.4.6, repeating steps 2.4.1 to 2.4.5 until all spatio-temporal feature fusion modules have been traversed, thereby obtaining the spatio-temporal gait feature sequence $F^{st}$.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011432129.1A | 2020-12-07 | 2020-12-07 | Gait recognition method based on adaptive confidence graph convolutional network (granted as CN112434655B)
Publications (2)
Publication Number | Publication Date |
---|---|
CN112434655A true CN112434655A (en) | 2021-03-02 |
CN112434655B CN112434655B (en) | 2022-11-08 |
Family
- Family ID: 74690963

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011432129.1A | Gait recognition method based on adaptive confidence graph convolutional network (CN112434655B, active) | 2020-12-07 | 2020-12-07

Country Status (1)

Country | Link
---|---
CN | CN112434655B (en) |
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019220128A1 (en) * | 2018-05-18 | 2019-11-21 | Benevolentai Technology Limited | Graph neutral networks with attention |
CN109492581A (en) * | 2018-11-09 | 2019-03-19 | 中国石油大学(华东) | A kind of human motion recognition method based on TP-STG frame |
WO2020173226A1 (en) * | 2019-02-28 | 2020-09-03 | 华中科技大学 | Spatial-temporal behavior detection method |
CN110837778A (en) * | 2019-10-12 | 2020-02-25 | 南京信息工程大学 | Traffic police command gesture recognition method based on skeleton joint point sequence |
CN111160294A (en) * | 2019-12-31 | 2020-05-15 | 西安理工大学 | Gait recognition method based on graph convolution network |
CN111310668A (en) * | 2020-02-18 | 2020-06-19 | 大连海事大学 | Gait recognition method based on skeleton information |
CN111652124A (en) * | 2020-06-02 | 2020-09-11 | 电子科技大学 | Construction method of human behavior recognition model based on graph convolution network |
CN111814719A (en) * | 2020-07-17 | 2020-10-23 | 江南大学 | Skeleton behavior identification method based on 3D space-time diagram convolution |
Non-Patent Citations (6)
Title |
---|
FANJIA LI ET AL.: "Multi-Stream and Enhanced Spatial-Temporal Graph Convolution Network for Skeleton-Based Action Recognition", 《IEEE ACCESS》 *
HUAYU LI ET AL.: "DG-FPN: Learning Dynamic Feature Fusion Based on Graph Convolution Network For Object Detection", 《2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 *
XUESONG GAO ET AL.: "3D Skeleton-Based Video Action Recognition by Graph Convolution Network", 《2019 IEEE INTERNATIONAL CONFERENCE ON SMART INTERNET OF THINGS (SMARTIOT)》 *
YAN S. ET AL.: "Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition", 《PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE》 *
WANG QIANGYU: "Research on Dynamic Gesture Recognition Technology Based on Deep Neural Networks", 《China Doctoral Dissertations Full-text Database, Information Science and Technology》 *
WANG XIN ET AL.: "Gait Recognition Algorithm Based on a Two-Layer Convolutional Neural Network", 《Journal of Anhui University (Natural Science Edition)》 *
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113159068A (en) * | 2021-04-13 | 2021-07-23 | 天津大学 | RGB-D significance target detection method based on deep learning |
WO2022227275A1 (en) * | 2021-04-27 | 2022-11-03 | 浙江工商大学 | Deep learning-based end-to-end multi-modal gait recognition method |
CN113177464A (en) * | 2021-04-27 | 2021-07-27 | 浙江工商大学 | End-to-end multi-modal gait recognition method based on deep learning |
CN113177464B (en) * | 2021-04-27 | 2023-12-01 | 浙江工商大学 | End-to-end multi-mode gait recognition method based on deep learning |
CN113673560B (en) * | 2021-07-15 | 2023-06-09 | 华南理工大学 | Human behavior recognition method based on multi-flow three-dimensional self-adaptive graph convolution |
CN113673560A (en) * | 2021-07-15 | 2021-11-19 | 华南理工大学 | Human behavior identification method based on multi-stream three-dimensional adaptive graph convolution |
CN113657169A (en) * | 2021-07-19 | 2021-11-16 | 浙江大华技术股份有限公司 | Gait recognition method, device, system and computer readable storage medium |
CN113538581A (en) * | 2021-07-19 | 2021-10-22 | 之江实验室 | 3D attitude estimation method based on graph attention space-time convolution |
CN113538581B (en) * | 2021-07-19 | 2024-03-12 | 之江实验室 | 3D attitude estimation method based on graph attention space-time convolution |
CN113496216B (en) * | 2021-08-31 | 2023-05-05 | 四川大学华西医院 | Multi-angle falling high-risk identification method and system based on skeleton key points |
CN113496216A (en) * | 2021-08-31 | 2021-10-12 | 四川大学华西医院 | Multi-angle falling high-risk identification method and system based on skeleton key points |
CN114224326A (en) * | 2021-11-18 | 2022-03-25 | 北京精密机电控制设备研究所 | Wearable gait phase and action recognition device and method |
CN114224326B (en) * | 2021-11-18 | 2024-05-03 | 北京精密机电控制设备研究所 | Wearable gait phase and motion recognition device and method |
CN114012742B (en) * | 2022-01-05 | 2022-03-29 | 北京动思创新科技有限公司 | Control system of hip joint power assisting device |
CN114012742A (en) * | 2022-01-05 | 2022-02-08 | 北京动思创新科技有限公司 | Control system of hip joint power assisting device |
CN115909418A (en) * | 2023-03-01 | 2023-04-04 | 科大讯飞股份有限公司 | Human body direction determining method, human body direction determining device, screen control method, device and related equipment |
CN117690583A (en) * | 2024-02-01 | 2024-03-12 | 吉林大学 | Internet of things-based rehabilitation and nursing interactive management system and method |
CN117690583B (en) * | 2024-02-01 | 2024-04-09 | 吉林大学 | Internet of things-based rehabilitation and nursing interactive management system and method |
Also Published As
Publication number | Publication date |
---|---|
CN112434655B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112434655B (en) | Gait recognition method based on adaptive confidence graph convolutional network | |
CN112800903B (en) | Dynamic expression recognition method and system based on space-time diagram convolutional neural network | |
CN108416266B (en) | Method for rapidly identifying video behaviors by extracting moving object through optical flow | |
CN106056628B (en) | Method for tracking target and system based on depth convolutional neural networks Fusion Features | |
CN107529650B (en) | Closed loop detection method and device and computer equipment | |
CN109815826B (en) | Method and device for generating face attribute model | |
US11182644B2 (en) | Method and apparatus for pose planar constraining on the basis of planar feature extraction | |
CN111652124A (en) | Construction method of human behavior recognition model based on graph convolution network | |
US20170083751A1 (en) | Method for estimating locations of facial landmarks in an image of a face using globally aligned regression | |
CN108960059A (en) | A kind of video actions recognition methods and device | |
CN111814719A (en) | Skeleton behavior identification method based on 3D space-time diagram convolution | |
US20230134967A1 (en) | Method for recognizing activities using separate spatial and temporal attention weights | |
CN110378208B (en) | Behavior identification method based on deep residual error network | |
CN107833239B (en) | Optimization matching target tracking method based on weighting model constraint | |
CN113095254B (en) | Method and system for positioning key points of human body part | |
CN116246338B (en) | Behavior recognition method based on graph convolution and transducer composite neural network | |
CN114694261A (en) | Video three-dimensional human body posture estimation method and system based on multi-level supervision graph convolution | |
CN109800635A (en) | A kind of limited local facial critical point detection and tracking based on optical flow method | |
CN111833400B (en) | Camera pose positioning method | |
CN108062559A (en) | A kind of image classification method based on multiple receptive field, system and device | |
CN114118303B (en) | Face key point detection method and device based on prior constraint | |
CN112906520A (en) | Gesture coding-based action recognition method and device | |
CN109522865A (en) | A kind of characteristic weighing fusion face identification method based on deep neural network | |
CN110135435B (en) | Saliency detection method and device based on breadth learning system | |
CN111027350A (en) | Improved PCA algorithm based on human face three-dimensional reconstruction |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |