CN111310668A - Gait recognition method based on skeleton information - Google Patents

Gait recognition method based on skeleton information

Info

Publication number: CN111310668A (granted as CN111310668B)
Application number: CN202010100136.5A
Authority: CN (China)
Legal status: Granted; active
Inventors: 刘晓凯, 尤昭阳, 毕胜, 刘祥
Applicant and current assignee: Dalian Maritime University
Other languages: Chinese (zh)
Application filed by Dalian Maritime University

Classifications

    • G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus false rejection rate
    • G06N3/045 — Combinations of networks
    • G06N3/047 — Probabilistic or stochastic networks
    • G06N3/084 — Backpropagation, e.g. using gradient descent

Abstract

The invention provides a gait recognition method based on skeleton information, comprising the following steps: acquiring a gait video sequence; performing pose estimation on the gait video sequence with OpenPose to obtain a gait key point sequence; constructing a spatio-temporal skeleton sequence; inputting the adjacency matrix and the gait key point sequence into a multi-scale space-time graph convolution network for training; and, after training, testing with the trained model, extracting gait features and performing feature matching. The method represents gait by human body key points, introduces a graph convolutional neural network suited to the graph structure, improves the connection scheme and the partitioning strategy, adopts a Siamese mechanism with a combination of cross-entropy loss and contrastive loss, and fuses the shallow, middle-layer and deep features of the network, improving the robustness of gait recognition to a certain extent.

Description

Gait recognition method based on skeleton information
Technical Field
The invention relates to the technical field of pattern recognition, in particular to a gait recognition method based on skeleton information.
Background
Gait recognition is an emerging biometric recognition technology that aims to identify people by their walking posture in a video sequence. Compared with other biometric technologies, it is non-contact, works at long range, and is difficult to disguise, giving it clear advantages in security and intelligent surveillance; nevertheless, gait recognition applications still face problems in real, complex environments.
In recent years, gait recognition technology has drawn increasing attention from academic and research institutions at home and abroad. The prior art falls mainly into the following two categories:
1. Model-based methods. The body is divided into several parts, or the body joint points are obtained, and the joint points or partial motion trajectories of the body movement are fitted. These methods rely mainly on the static characteristics of each body part and the motion trajectories of the joint points for modeling, and include two-dimensional and three-dimensional models.
2. Model-free methods. These acquire gait features such as the human walking shape and characteristic parameters, without reconstructing a model of the walking gait. There are roughly three representations: the gait energy image, the silhouette sequence, and the human body key point sequence.
In recent years, with the great progress of deep learning across the fields of computer vision, many gait recognition methods based on deep learning have emerged. For example, a gait sequence can be described by a gait energy image and a deep convolutional neural network trained as a matching model to match a person's identity; however, when the walking view angle of the human body varies over a wide range, the extracted multi-view gait features are not sufficiently representative, and robustness to clothing, carried objects and the like is low. Alternatively, pose information can be used for gait feature representation and matching: a pose estimation algorithm obtains human body key point coordinates from the gait video sequence, a convolutional neural network and a long short-term memory network are trained on the gait key point sequence, and handcrafted features are additionally introduced for gait recognition. However, existing gait recognition technology still has the following defects:
1. Gait recognition via a gait silhouette or gait energy image places high demands on silhouette quality and background; illumination conditions and complex backgrounds have a large influence and often lead to incomplete silhouette extraction.
2. Under the influence of covariates such as clothing and carried objects, the covariates cannot be separated from the human body, and recognition accuracy decreases.
Disclosure of Invention
In view of the above technical problem, a gait recognition method based on skeleton information is provided. The method represents gait by human body key points, introduces a graph convolutional neural network suited to the graph structure, improves the connection scheme and the partitioning strategy, adopts a Siamese mechanism with a combination of cross-entropy loss and contrastive loss, and fuses the shallow, middle-layer and deep features of the network, improving the robustness of gait recognition to a certain extent.
The technical means adopted by the invention are as follows:
a gait recognition method based on skeleton information comprises the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence with OpenPose to obtain a gait key point sequence;
S3, constructing a spatio-temporal skeleton sequence;
S4, inputting the adjacency matrix and the gait key point sequence into a multi-scale space-time graph convolution network for training;
S5, after training, testing with the trained model, extracting gait features and performing feature matching.
Further, the step S3 is specifically:
S31, connecting the gait key point sequence spatially according to the natural connections of the human body; meanwhile, introducing symmetry to connect symmetric joint points (only the symmetric key points of the legs, since arm symmetry is lost when the subject carries an object); and connecting the same key points from frame to frame in time;
S32, defining a sampling function: the neighborhood set of a node v_ti is defined as
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; d(v_tj, v_ti) denotes the shortest-path distance between the two nodes; and D = 1 is usually taken. The sampling function is accordingly defined as p(v_ti, v_tj) = v_tj.
S33, selecting a partitioning strategy: the neighborhood set is partitioned into four subsets. The node itself forms the first subset; among the asymmetric neighbors, those closer to the center of gravity than the node itself form the second subset, and those farther from the center of gravity form the third subset; the symmetric node forms the fourth subset. That is,

l_ti(v_tj) = 0, if v_tj = v_ti;
l_ti(v_tj) = 1, if r_j < r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 2, if r_j ≥ r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 3, if v_tj is the symmetric node of v_ti,

wherein r_j denotes the distance from node v_tj to the skeleton's center of gravity.
S34, defining a weight function: the neighborhood set is divided into four subsets, each with a numeric label, and a mapping function l_ti maps each neighboring node to the label of its subset:
l_ti : B(v_ti) → {0, …, K−1}, K = 4.
The weight function is then defined as w(v_ti, v_tj) = w'(l_ti(v_tj)).
S35, extending the spatial graph convolution to the spatio-temporal domain, the spatio-temporal neighborhood set is defined as:
B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; K bounds the spatial distance; and Γ controls the temporal extent included in the neighborhood, i.e., the temporal convolution kernel size.
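The spatial graph of S31 can be sketched as a small adjacency-matrix construction. This is a minimal illustration rather than the patent's implementation: the 18-joint indexing and edge list are assumed to follow the common OpenPose/COCO 18-keypoint layout, which the patent does not enumerate.

```python
import numpy as np

NUM_JOINTS = 18

# Natural human-body connections, assuming the common OpenPose/COCO-18
# indexing (0 nose, 1 neck, 2-4 right arm, 5-7 left arm, 8-10 right leg,
# 11-13 left leg, 14-17 face).
NATURAL_EDGES = [
    (0, 1), (1, 2), (2, 3), (3, 4),        # head-neck and right arm
    (1, 5), (5, 6), (6, 7),                # left arm
    (1, 8), (8, 9), (9, 10),               # right leg
    (1, 11), (11, 12), (12, 13),           # left leg
    (0, 14), (14, 16), (0, 15), (15, 17),  # eyes and ears
]

# S31 adds symmetric connections for the legs only, since arm symmetry
# is lost when the subject carries an object.
SYMMETRIC_LEG_EDGES = [(8, 11), (9, 12), (10, 13)]

def build_adjacency():
    """Return the 18x18 spatial adjacency matrix with self-loops."""
    A = np.eye(NUM_JOINTS)
    for i, j in NATURAL_EDGES + SYMMETRIC_LEG_EDGES:
        A[i, j] = A[j, i] = 1.0
    return A

A = build_adjacency()
```

In time, the same key point is simply connected across consecutive frames, which the temporal convolution of S35 realizes implicitly.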
Further, the process of training the multi-scale space-time graph convolutional neural network specifically includes:
S41, after a sample is selected, randomly choosing one sample with the same ID as the selected sample as a positive sample, and randomly choosing one sample with a different ID as a negative sample;
S42, inputting the selected sample into branch 1 and, with a Siamese mechanism, inputting the positive sample and the negative sample in turn into branch 2 within one iteration, branch 1 and branch 2 sharing parameters;
S43, classifying the selected sample's features in branch 1 with SoftMax and a cross-entropy loss function;
S44, comparing the features of the selected sample with those of the positive sample and with those of the negative sample using a contrastive loss function, where pairs from the same ID are labeled 1 and pairs from different IDs are labeled 0;
S45, adding the two losses to give the total loss:
Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)],
and back-propagating to update the network, where Lid is the cross-entropy loss and Lc is the contrastive loss.
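The combined objective of S43-S45 can be sketched as follows. The contrastive loss is written in its standard margin form; the margin value is an assumption, since the patent does not state one.

```python
import numpy as np

def cross_entropy(logits, label):
    """SoftMax cross-entropy Lid for one sample (S43)."""
    z = logits - logits.max()                # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def contrastive_loss(f1, f2, y, margin=1.0):
    """Contrastive loss Lc (S44): y=1 for a same-ID pair, y=0 otherwise.
    The margin of 1.0 is an assumption, not taken from the patent."""
    d = np.linalg.norm(f1 - f2)
    return y * d**2 + (1 - y) * max(margin - d, 0.0)**2

def total_loss(logits, label, f_sample, f_pos, f_neg):
    """Loss = Lid + 0.5*[Lc(sample,pos,1) + Lc(sample,neg,0)] (S45)."""
    return (cross_entropy(logits, label)
            + 0.5 * (contrastive_loss(f_sample, f_pos, 1)
                     + contrastive_loss(f_sample, f_neg, 0)))
```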
Further, the step S4 further includes the following setting procedure when training the multi-scale space-time graph convolutional neural network:
Step 1, input a gait sequence of dimension [3,100,18]: 3 means the input key point features have 3 channels, namely the X and Y coordinates and the confidence C; 100 means the time dimension has 100 frames; and 18 means each frame has 18 key points;
Step 2, the first three layers output 64 channels with convolution kernel size (9,3), where 9 is the temporal kernel size and 3 the spatial kernel size; the output dimension is [64,100,18];
Step 3, the middle three layers output 128 channels with kernel size (9,3) and output dimension [128,50,18]; in the fourth layer, the temporal convolution stride is 2;
Step 4, the last three layers output 256 channels with kernel size (9,3) and output dimension [256,25,18]; in the seventh layer, the temporal convolution stride is 2;
Step 5, perform global average pooling, after which the feature dimension becomes 256;
Step 6, apply dimension exchange to the features [64,100,18] output by the first layer, then average pooling, giving an 18-dimensional feature;
Step 7, apply dimension exchange to the features [128,50,18] output by the fifth layer, then average pooling, giving an 18-dimensional feature;
Step 8, represent the gait features by fusing shallow and deep features: concatenate the 18-dimensional first-layer feature, the 18-dimensional fifth-layer feature and the 256-dimensional last-layer feature to obtain a 292-dimensional feature;
Step 9, classify the 292-dimensional feature with a SoftMax classifier.
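The shape bookkeeping of steps 1-8 can be checked with a short sketch (assuming, as is usual, that temporal padding keeps the frame count fixed except at the stride-2 layers):

```python
def layer_shape(shape, out_channels, t_stride=1):
    """Shape [C, T, V] after one spatio-temporal graph conv layer."""
    c, t, v = shape
    return [out_channels, t // t_stride, v]

shape = [3, 100, 18]                          # step 1: input sequence
for _ in range(3):                            # layers 1-3 (step 2)
    shape = layer_shape(shape, 64)
shape = layer_shape(shape, 128, t_stride=2)   # layer 4 (step 3)
for _ in range(2):                            # layers 5-6
    shape = layer_shape(shape, 128)
shape = layer_shape(shape, 256, t_stride=2)   # layer 7 (step 4)
for _ in range(2):                            # layers 8-9
    shape = layer_shape(shape, 256)

deep_dim = shape[0]                 # 256 after global average pooling
fused_dim = 18 + 18 + deep_dim      # step 8: 18 + 18 + 256 = 292
```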
The invention also provides a testing method of the gait recognition method based on the skeleton information, which comprises the following steps:
Step I: input the gait key point sequence to be tested;
Step II: extract the gait features with the trained network and apply two-norm normalization to the features;
Step III: perform steps I and II on the samples in the sample library, so that the gait sequence of the pedestrian to be retrieved and the gait sequences of the pedestrians in the retrieval library are each represented by a feature vector;
Step IV: for the gait sequence of the pedestrian to be retrieved, compute the distance between its features and those of every pedestrian gait sequence in the retrieval library;
Step V: sort the samples in the retrieval library by the computed distance in ascending order; the nearer a sample ranks to the front, the more likely it shares the ID of the pedestrian to be retrieved.
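Steps II-V amount to normalized nearest-neighbor retrieval, which can be sketched as follows (function names are illustrative, not from the patent):

```python
import numpy as np

def l2_normalize(f):
    """Two-norm normalization of a feature vector (step II)."""
    return f / np.linalg.norm(f)

def rank_gallery(probe, gallery):
    """Steps IV-V: indices of the retrieval library sorted by Euclidean
    distance to the probe feature, smallest (most similar) first."""
    probe = l2_normalize(probe)
    dists = [np.linalg.norm(probe - l2_normalize(g)) for g in gallery]
    return np.argsort(dists)
```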
Compared with the prior art, the invention has the following advantages:
1. Against the influence of covariates such as backpacks and clothing, the method represents gait by a key point sequence, overcoming the low robustness, under covariate conditions, of gait features represented by silhouette and energy images, and improving recognition accuracy under covariate influence.
2. Exploiting the symmetric character of gait, symmetry is introduced when constructing the spatio-temporal skeleton sequence: the symmetric leg joint information is added to the adjacency matrix, which strengthens the association of related nodes, reduces the noise caused by inaccurate joint estimation, and improves recognition accuracy.
3. Because a deep convolutional neural network extracts high-level features that represent only high-level semantic information and cannot describe static characteristics, a multi-scale scheme fusing shallow, middle-layer and deep features is adopted, enriching the representation of gait features and improving recognition accuracy.
For the above reasons, the present invention can be widely applied to the fields of pattern recognition and the like.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a specific configuration of the multi-scale space-time graph convolutional neural network according to the present invention.
FIG. 3 is a schematic diagram of the testing method of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As shown in fig. 1, the present invention provides a gait recognition method based on skeleton information, which comprises the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence with OpenPose to obtain a gait key point sequence;
S3, constructing a spatio-temporal skeleton sequence;
further, as a preferred embodiment of the present invention, the step S3 specifically includes:
S31, connecting the gait key point sequence spatially according to the natural connections of the human body; because a person's walking posture is symmetric, symmetry is introduced to connect symmetric joint points (only the symmetric key points of the legs, since arm symmetry is lost when the subject carries an object); and the same key points are connected from frame to frame in time;
S32, defining a sampling function: the neighborhood set of a node v_ti is defined as
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; d(v_tj, v_ti) denotes the shortest-path distance between the two nodes; and D = 1 is usually taken. The sampling function is accordingly defined as p(v_ti, v_tj) = v_tj.
S33, selecting a partitioning strategy: the neighborhood set is partitioned into four subsets. The node itself forms the first subset; among the asymmetric neighbors, those closer to the center of gravity than the node itself form the second subset, and those farther from the center of gravity form the third subset; the symmetric node forms the fourth subset. That is,

l_ti(v_tj) = 0, if v_tj = v_ti;
l_ti(v_tj) = 1, if r_j < r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 2, if r_j ≥ r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 3, if v_tj is the symmetric node of v_ti,

wherein r_j denotes the distance from node v_tj to the skeleton's center of gravity.
S34, defining a weight function: the neighborhood set is divided into four subsets, each with a numeric label, and a mapping function l_ti maps each neighboring node to the label of its subset:
l_ti : B(v_ti) → {0, …, K−1}, K = 4.
The weight function is then defined as w(v_ti, v_tj) = w'(l_ti(v_tj)).
S35, extending the spatial graph convolution to the spatio-temporal domain, the spatio-temporal neighborhood set is defined as:
B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; K bounds the spatial distance; and Γ controls the temporal extent included in the neighborhood, i.e., the temporal convolution kernel size.
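The four-subset labeling of S33 can be sketched per neighbor. The leg symmetric-pair table uses the assumed OpenPose/COCO-18 indexing, and the center of gravity is taken as the mean of the joint coordinates; both are illustrative assumptions.

```python
import numpy as np

# Symmetric leg pairs under the assumed COCO-18 indexing
# (8/11 hips, 9/12 knees, 10/13 ankles).
LEG_PAIRS = {8: 11, 11: 8, 9: 12, 12: 9, 10: 13, 13: 10}

def partition_label(root, nbr, coords):
    """Map neighbor `nbr` of node `root` to its subset label 0-3 (S33).

    coords is an (18, 2) array of joint coordinates for one frame.
    """
    if nbr == root:
        return 0                      # first subset: the node itself
    if LEG_PAIRS.get(root) == nbr:
        return 3                      # fourth subset: symmetric node
    center = coords.mean(axis=0)      # center of gravity of the skeleton
    d_root = np.linalg.norm(coords[root] - center)
    d_nbr = np.linalg.norm(coords[nbr] - center)
    return 1 if d_nbr < d_root else 2  # closer / farther than the root
```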
S4, inputting the adjacency matrix and the gait key point sequence into a multi-scale space-time graph convolution network for training;
further, as a preferred embodiment of the present invention, as shown in fig. 2, before training the multi-scale space-time graph convolutional neural network in step S4, the following setting process is further included:
Step 1, input a gait sequence of dimension [3,100,18]: 3 means the input key point features have 3 channels, namely the X and Y coordinates and the confidence C; 100 means the time dimension has 100 frames; and 18 means each frame has 18 key points;
Step 2, the first three layers output 64 channels with convolution kernel size (9,3), where 9 is the temporal kernel size and 3 the spatial kernel size; the output dimension is [64,100,18];
Step 3, the middle three layers output 128 channels with kernel size (9,3) and output dimension [128,50,18]; in the fourth layer, the temporal convolution stride is 2;
Step 4, the last three layers output 256 channels with kernel size (9,3) and output dimension [256,25,18]; in the seventh layer, the temporal convolution stride is 2;
Step 5, perform global average pooling, after which the feature dimension becomes 256;
Step 6, apply dimension exchange to the features [64,100,18] output by the first layer, then average pooling, giving an 18-dimensional feature;
Step 7, apply dimension exchange to the features [128,50,18] output by the fifth layer, then average pooling, giving an 18-dimensional feature;
Step 8, because the deep convolutional neural network extracts high-level features that represent only high-level semantic information and cannot describe static characteristics, represent the gait features by fusing shallow and deep features: concatenate the 18-dimensional first-layer feature, the 18-dimensional fifth-layer feature and the 256-dimensional last-layer feature to obtain a 292-dimensional feature;
Step 9, classify the 292-dimensional feature with a SoftMax classifier.
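The multi-scale fusion of steps 5-8 can be illustrated with random tensors standing in for the real layer activations; the pooling axes are an assumption consistent with the stated output sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
shallow = rng.standard_normal((64, 100, 18))   # first-layer output
middle = rng.standard_normal((128, 50, 18))    # fifth-layer output
deep = rng.standard_normal((256, 25, 18))      # last-layer output

def joint_pool(x):
    """Dimension exchange + average pooling: [C,T,V] -> V-dim (18) feature."""
    return x.mean(axis=(0, 1))

def global_pool(x):
    """Global average pooling: [C,T,V] -> C-dim (256) feature."""
    return x.mean(axis=(1, 2))

# Step 8: concatenate shallow, middle and deep features -> 292 dims.
feature = np.concatenate([joint_pool(shallow),
                          joint_pool(middle),
                          global_pool(deep)])
```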
In this embodiment, the CASIA-B data set is used, where NM denotes the normal walking condition, BG the carrying condition, and CL wearing a coat, as shown in the following table:
[Table: CASIA-B experimental settings; reproduced only as an image in the source.]
the process of training the multi-scale space-time graph convolutional neural network specifically comprises the following steps:
In the training stage, the aim is to train the network to extract features that characterize the pedestrian; the network is therefore trained by classification. The specific steps are as follows:
S41, after a sample is selected, randomly choosing one sample with the same ID as the selected sample as a positive sample, and randomly choosing one sample with a different ID as a negative sample;
S42, inputting the selected sample into branch 1 and, with a Siamese mechanism, inputting the positive sample and the negative sample in turn into branch 2 within one iteration, branch 1 and branch 2 sharing parameters;
S43, classifying the selected sample's features in branch 1 with SoftMax and a cross-entropy loss function;
S44, comparing the features of the selected sample with those of the positive sample and with those of the negative sample using a contrastive loss function, where pairs from the same ID are labeled 1 and pairs from different IDs are labeled 0;
S45, adding the two losses to give the total loss:
Loss = Lid + 0.5 * [Lc(sample, pos, 1) + Lc(sample, neg, 0)],
and back-propagating to update the network, where Lid is the cross-entropy loss and Lc is the contrastive loss.
S5, after training, testing with the trained model, extracting gait features and performing feature matching.
As shown in fig. 3, the invention also provides a testing method of the gait recognition method based on the skeleton information, which comprises the following steps:
Step I: input the gait key point sequence to be tested;
Step II: extract the gait features with the trained network and apply two-norm normalization to the features;
Step III: perform steps I and II on the samples in the sample library, so that the gait sequence of the pedestrian to be retrieved and the gait sequences of the pedestrians in the retrieval library are each represented by a feature vector;
Step IV: for the gait sequence of the pedestrian to be retrieved, compute the distance between its features and those of every pedestrian gait sequence in the retrieval library;
Step V: sort the samples in the retrieval library by the computed distance in ascending order; the nearer a sample ranks to the front, the more likely it shares the ID of the pedestrian to be retrieved.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (5)

1. A gait recognition method based on skeleton information is characterized by comprising the following steps:
S1, acquiring a gait video sequence;
S2, performing pose estimation on the gait video sequence with OpenPose to obtain a gait key point sequence;
S3, constructing a spatio-temporal skeleton sequence;
S4, inputting the adjacency matrix and the gait key point sequence into a multi-scale space-time graph convolution network for training;
S5, after training, testing with the trained model, extracting gait features and performing feature matching.
2. The method for recognizing gait based on skeleton information according to claim 1, characterized in that the step S3 is specifically:
S31, connecting the gait key point sequence spatially according to the natural connections of the human body; meanwhile, introducing symmetry to connect symmetric joint points (only the symmetric key points of the legs, since arm symmetry is lost when the subject carries an object); and connecting the same key points from frame to frame in time;
S32, defining a sampling function: the neighborhood set of a node v_ti is defined as
B(v_ti) = { v_tj | d(v_tj, v_ti) ≤ D },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; d(v_tj, v_ti) denotes the shortest-path distance between the two nodes; and D = 1 is usually taken. The sampling function is accordingly defined as p(v_ti, v_tj) = v_tj.
S33, selecting a partitioning strategy: the neighborhood set is partitioned into four subsets. The node itself forms the first subset; among the asymmetric neighbors, those closer to the center of gravity than the node itself form the second subset, and those farther from the center of gravity form the third subset; the symmetric node forms the fourth subset. That is,

l_ti(v_tj) = 0, if v_tj = v_ti;
l_ti(v_tj) = 1, if r_j < r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 2, if r_j ≥ r_i and v_tj is not the symmetric node of v_ti;
l_ti(v_tj) = 3, if v_tj is the symmetric node of v_ti,

wherein r_j denotes the distance from node v_tj to the skeleton's center of gravity.
S34, defining a weight function: the neighborhood set is divided into four subsets, each with a numeric label, and a mapping function l_ti maps each neighboring node to the label of its subset:
l_ti : B(v_ti) → {0, …, K−1}, K = 4.
The weight function is then defined as w(v_ti, v_tj) = w'(l_ti(v_tj)).
S35, extending the spatial graph convolution to the spatio-temporal domain, the spatio-temporal neighborhood set is defined as:
B(v_ti) = { v_qj | d(v_tj, v_ti) ≤ K, |q − t| ≤ Γ },
wherein B(v_ti) is the neighborhood set of node v_ti; v denotes a node; K bounds the spatial distance; and Γ controls the temporal extent included in the neighborhood, i.e., the temporal convolution kernel size.
3. The method for gait recognition based on the skeleton information as claimed in claim 1, wherein the process of training the multi-scale space-time graph convolutional neural network is as follows:
S41, after a sample is selected, randomly choosing one sample with the same ID as the selected sample as a positive sample, and randomly choosing one sample with a different ID as a negative sample;
S42, inputting the selected sample into branch 1 and, with a Siamese mechanism, inputting the positive sample and the negative sample in turn into branch 2 within one iteration, branch 1 and branch 2 sharing parameters;
S43, classifying the selected sample's features in branch 1 with SoftMax and a cross-entropy loss function;
S44, comparing the features of the selected sample with those of the positive sample and with those of the negative sample using a contrastive loss function, where pairs from the same ID are labeled 1 and pairs from different IDs are labeled 0;
S45, adding the two losses, the total loss being:
Loss = L_id + 0.5 * [L_c(sample, pos, 1) + L_c(sample, neg, 0)],
and performing back propagation to update the network, where L_id is the cross-entropy loss and L_c is the contrastive loss.
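The total loss of S45 can be sketched in NumPy as follows. The margin value and the exact form of the contrastive loss are assumptions (the claim fixes only the labels and the 0.5 weighting); a standard margin-based contrastive loss is used here for illustration.

```python
import numpy as np

def cross_entropy(logits, label):
    """SoftMax + cross-entropy on one sample (L_id of S43)."""
    z = logits - logits.max()                      # for numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def contrastive(f1, f2, y, margin=1.0):
    """Contrastive loss L_c (S44): pull same-ID pairs (y=1) together,
    push different-ID pairs (y=0) at least `margin` apart.
    The margin is an assumed hyperparameter, not given in the claim."""
    d = np.linalg.norm(f1 - f2)
    return y * d ** 2 + (1 - y) * max(margin - d, 0.0) ** 2

def total_loss(logits, label, f_sample, f_pos, f_neg):
    # Loss = L_id + 0.5 * [L_c(sample, pos, 1) + L_c(sample, neg, 0)]
    return cross_entropy(logits, label) + 0.5 * (
        contrastive(f_sample, f_pos, 1) + contrastive(f_sample, f_neg, 0))
```

With identical sample/positive features and a negative feature beyond the margin, both contrastive terms vanish and the total loss reduces to the cross-entropy term alone.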
4. The gait recognition method based on skeleton information according to claim 3, wherein training the multi-scale spatio-temporal graph convolutional neural network in step S4 further includes the following configuration:
step 1, inputting a gait sequence of dimensions [3,100,18], where 3 means the input key-point features have 3 channels (the X and Y coordinates and the confidence C), 100 means the time dimension has 100 frames, and 18 means each frame has 18 key points;
step 2, the first three layers output 64 channels with convolution kernel size (9,3), where 9 is the temporal kernel size and 3 the spatial kernel size, giving an output dimension of [64,100,18];
step 3, the middle three layers output 128 channels with convolution kernel size (9,3); the temporal convolution stride of the fourth layer is 2, giving an output dimension of [128,50,18];
step 4, the last three layers output 256 channels with convolution kernel size (9,3); the temporal convolution stride of the seventh layer is 2, giving an output dimension of [256,25,18];
step 5, performing global average pooling, after which the feature becomes 256-dimensional;
step 6, exchanging the dimensions of the [64,100,18] features output by the first layer and then average-pooling them to obtain an 18-dimensional feature;
step 7, exchanging the dimensions of the [128,50,18] features output by the fifth layer and then average-pooling them to obtain an 18-dimensional feature;
step 8, representing the gait by fusing shallow, middle and deep features: concatenating the 18-dimensional feature of the first layer, the 18-dimensional feature of the fifth layer and the 256-dimensional feature of the last layer to obtain a 292-dimensional feature;
step 9, classifying the 292-dimensional feature with a SoftMax classifier.
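As a sanity check on the dimension bookkeeping in steps 1–8, the tensor shape after each of the nine layers can be traced in a few lines. This is only shape arithmetic under the strides and channel counts stated above, not an implementation of the network itself:

```python
def stgcn_shape_trace():
    """Trace (channels, frames, joints) through the 9-layer network:
    layers 1-3 output 64 channels, 4-6 output 128, 7-9 output 256;
    layers 4 and 7 halve the time dimension (temporal stride 2)."""
    C, T, V = 3, 100, 18          # input: 3 channels, 100 frames, 18 joints
    trace = [(C, T, V)]
    for layer in range(1, 10):
        C = 64 if layer <= 3 else 128 if layer <= 6 else 256
        if layer in (4, 7):
            T //= 2
        trace.append((C, T, V))
    return trace

trace = stgcn_shape_trace()
print(trace[3], trace[6], trace[9])   # (64, 100, 18) (128, 50, 18) (256, 25, 18)
print(18 + 18 + 256)                  # fused feature dimension: 292
```

The final concatenation in step 8 is consistent: 18 (first layer) + 18 (fifth layer) + 256 (global pooling) = 292.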
5. A testing method for the gait recognition method based on skeleton information, characterized by comprising the following steps:
step I: inputting the gait key-point sequence to be tested;
step II: extracting gait features with the trained network and applying L2-norm normalization to the features;
step III: applying steps I and II to the samples in the sample library, so that the gait sequence of the pedestrian to be retrieved and the gait sequences in the retrieval library are all represented by feature vectors;
step IV: for a given pedestrian gait sequence to be retrieved, calculating the distance between its features and the features of every pedestrian gait sequence in the retrieval library;
step V: sorting the samples in the retrieval library by the calculated distance in ascending order; the nearer a sample ranks to the front, the more likely its ID matches that of the pedestrian to be retrieved.
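Steps II–V amount to L2-normalizing the feature vectors and ranking the retrieval library by distance to the query. A minimal NumPy sketch, with placeholder feature values and Euclidean distance assumed as the metric (the claim does not fix a particular distance):

```python
import numpy as np

def l2_normalize(x):
    """Two-norm normalization of step II (row-wise for 2-D input)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def rank_gallery(query_feat, gallery_feats):
    """Steps IV-V: compute the distance from the query to every gallery
    feature and sort the gallery in ascending order of distance."""
    q = l2_normalize(query_feat)
    g = l2_normalize(gallery_feats)
    dists = np.linalg.norm(g - q, axis=1)
    order = np.argsort(dists)            # smallest distance ranks first
    return order, dists[order]

# toy query and three-sample gallery (placeholder 2-D features)
order, dists = rank_gallery(np.array([1.0, 0.0]),
                            np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
print(order)   # [0 2 1] - gallery sample 0 is the closest match
```

The first entries of `order` are the gallery IDs most likely to match the pedestrian to be retrieved.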
CN202010100136.5A 2020-02-18 2020-02-18 Gait recognition method based on skeleton information Active CN111310668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010100136.5A CN111310668B (en) 2020-02-18 2020-02-18 Gait recognition method based on skeleton information


Publications (2)

Publication Number Publication Date
CN111310668A true CN111310668A (en) 2020-06-19
CN111310668B CN111310668B (en) 2023-06-23

Family

ID=71147331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010100136.5A Active CN111310668B (en) 2020-02-18 2020-02-18 Gait recognition method based on skeleton information

Country Status (1)

Country Link
CN (1) CN111310668B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090232353A1 (en) * 2006-11-10 2009-09-17 University Of Maryland Method and system for markerless motion capture using multiple cameras
CN110096950A (en) * 2019-03-20 2019-08-06 西北大学 A kind of multiple features fusion Activity recognition method based on key frame


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
ZHENSHEN QU et al.: "Design of Real-Time Measurement System with Vision/IMU for Close-Range Semi-Physical Rendezvous and Docking Simulation", Proceedings of 2016 IEEE Chinese Guidance, Navigation and Control Conference (IEEE CGNCC2016) *
刘晓凯: "Research on Person Re-identification Methods in Intelligent Surveillance Systems", China Doctoral Dissertations Full-text Database, Information Science and Technology *
宋宪 et al.: "Research on Human Motion State Recognition Based on Spatio-Temporal Graph Convolutional Networks", New Materials, New Technologies and New Processes in Competitive Sports: R&D and Application (II) *
李瑞: "Action Recognition and Hand Pose Estimation in Images and Depth Maps", China Doctoral Dissertations Full-text Database, Information Science and Technology *
董安 et al.: "Skeleton-Based Action Recognition Using Graph Convolution", Graphics and Image *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860291A (en) * 2020-07-16 2020-10-30 上海交通大学 Multi-mode pedestrian identity recognition method and system based on pedestrian appearance and gait information
CN112101176A (en) * 2020-09-09 2020-12-18 元神科技(杭州)有限公司 User identity recognition method and system combining user gait information
CN112101176B (en) * 2020-09-09 2024-04-05 元神科技(杭州)有限公司 User identity recognition method and system combining user gait information
CN112434655A (en) * 2020-12-07 2021-03-02 安徽大学 Gait recognition method based on adaptive confidence map convolution network
CN112434655B (en) * 2020-12-07 2022-11-08 安徽大学 Gait recognition method based on adaptive confidence map convolution network
CN112633222A (en) * 2020-12-30 2021-04-09 民航成都电子技术有限责任公司 Gait recognition method, device, equipment and medium based on confrontation network
CN112633222B (en) * 2020-12-30 2023-04-28 民航成都电子技术有限责任公司 Gait recognition method, device, equipment and medium based on countermeasure network
CN112906599A (en) * 2021-03-04 2021-06-04 杭州海康威视数字技术股份有限公司 Gait-based personnel identity identification method and device and electronic equipment
CN112926522A (en) * 2021-03-30 2021-06-08 广东省科学院智能制造研究所 Behavior identification method based on skeleton attitude and space-time diagram convolutional network
CN112926522B (en) * 2021-03-30 2023-11-24 广东省科学院智能制造研究所 Behavior recognition method based on skeleton gesture and space-time diagram convolution network
CN113191230A (en) * 2021-04-20 2021-07-30 内蒙古工业大学 Gait recognition method based on gait space-time characteristic decomposition
CN114052726A (en) * 2021-11-25 2022-02-18 湖南中科助英智能科技研究院有限公司 Thermal infrared human body gait recognition method and device in dark environment


Similar Documents

Publication Publication Date Title
CN111310668B (en) Gait recognition method based on skeleton information
CN109919031B (en) Human behavior recognition method based on deep neural network
Hasani et al. Spatio-temporal facial expression recognition using convolutional neural networks and conditional random fields
CN108288051B (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN111428658B (en) Gait recognition method based on modal fusion
CN112307995B (en) Semi-supervised pedestrian re-identification method based on feature decoupling learning
CN110163258A (en) A kind of zero sample learning method and system reassigning mechanism based on semantic attribute attention
Yao et al. Robust CNN-based gait verification and identification using skeleton gait energy image
CN106909938B (en) Visual angle independence behavior identification method based on deep learning network
CN109815826A (en) The generation method and device of face character model
CN112101150A (en) Multi-feature fusion pedestrian re-identification method based on orientation constraint
CN113221625B (en) Method for re-identifying pedestrians by utilizing local features of deep learning
Li et al. JointsGait: A model-based gait recognition method based on gait graph convolutional networks and joints relationship pyramid mapping
Zhang et al. DAAL: Deep activation-based attribute learning for action recognition in depth videos
CN113158861B (en) Motion analysis method based on prototype comparison learning
CN114299542A (en) Video pedestrian re-identification method based on multi-scale feature fusion
CN112949740A (en) Small sample image classification method based on multilevel measurement
CN109670423A (en) A kind of image identification system based on deep learning, method and medium
CN111985332A (en) Gait recognition method for improving loss function based on deep learning
CN111582154A (en) Pedestrian re-identification method based on multitask skeleton posture division component
Tian et al. Self-regulation feature network for person reidentification
CN113869105A (en) Human behavior recognition method
CN116343267B (en) Human body advanced semantic clothing changing pedestrian re-identification method and device of clothing shielding network
CN113486751A (en) Pedestrian feature extraction method based on graph volume and edge weight attention
Leng Co-metric learning for person re-identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant