CN112131970A - Identity recognition method based on multi-channel space-time network and joint optimization loss - Google Patents

Info

Publication number
CN112131970A
CN112131970A (application CN202010926230.6A)
Authority
CN
China
Prior art keywords
gait
network
loss
sequence
joint optimization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010926230.6A
Other languages
Chinese (zh)
Inventor
蒋敏兰 (Jiang Minlan)
吴颖 (Wu Ying)
陈昊然 (Chen Haoran)
Current Assignee
Zhejiang Normal University CJNU
Original Assignee
Zhejiang Normal University CJNU
Priority date
Filing date
Publication date
Application filed by Zhejiang Normal University CJNU filed Critical Zhejiang Normal University CJNU
Priority to CN202010926230.6A
Publication of CN112131970A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/25 — Recognition of walking or running movements, e.g. gait recognition
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/214 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/25 — Fusion techniques
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 — Combinations of networks
    • G06N 3/08 — Learning methods

Abstract

The invention provides an identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss. The method comprises a multi-channel spatio-temporal network system and a joint optimization loss system, where the joint optimization loss system combines an improved ternary (triplet) loss function with a label-smoothing-regularized cross entropy loss function; the latter is the cross entropy loss used when training a conventional classification network, with label smoothing regularization incorporated into its calculation. The method addresses the low cross-view accuracy of gait recognition based on traditional image methods and the heavy computation and long running time of model-based gait recognition methods, and thereby supports real-time identity recognition.

Description

Identity recognition method based on multi-channel space-time network and joint optimization loss
Technical Field
The invention relates to the technical field of identity recognition methods, in particular to an identity recognition method based on a multi-channel space-time network and joint optimization loss.
Background
In recent years, artificial intelligence technology has matured and been gradually deployed, and more and more industries have entered a stage of intelligent technological innovation. Identity authentication has evolved from traditional username/password, IC card and dynamic password schemes into present-day biometric recognition. Biometric identification completes the identification of an individual from each person's unique physiological or behavioral characteristics by combining computer and sensor technology, and is currently one of the identity authentication technologies with the highest safety factor. Gait recognition, an emerging biometric technology, has likewise attracted growing study in recent years. The identified person does not need to touch a sensor, the requirements on the acquisition direction and image quality of the gait images are modest, and identity authentication can be completed at long range and from any angle. Although current gait recognition technology has not yet reached commercial maturity, its unique advantages and broad application prospects attract more and more researchers. Reports on building smart cities emphasize the security of citizen digital identity authentication and network identity recognition, so the development of gait recognition can complement other biometric technologies well and offers a new approach to identity authentication in modern intelligent security construction.
The CASIA-B database is a large-scale gait database released by the Institute of Automation, Chinese Academy of Sciences. It contains 124 subjects (93 men and 31 women), each recorded from 11 viewing angles (0°, 18°, 36°, 54°, 72°, 90°, 108°, 126°, 144°, 162° and 180°) and, at each angle, in 3 walking conditions (normal walking, wearing a coat and carrying a bag).
Gait recognition based on gait skeleton sequences is more robust to viewing-angle changes, carried objects and similar scenarios, but the refined skeleton sequence discards many effective features and reduces the differences between individuals; its strengths and weaknesses are exactly complementary to those of methods based on gait contour sequences. Combining the advantages of the two kinds of gait sequence, the invention provides a gait recognition method using a multi-channel spatio-temporal network and joint optimization loss, which accelerates network training convergence while ensuring effective feature-similarity learning, and improves identity recognition accuracy under viewing-angle changes, carried objects and similar scenarios.
Disclosure of Invention
In order to solve one or more technical problems in the prior art, the invention provides an identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss. It addresses the low cross-view accuracy of gait recognition based on traditional image methods and the heavy computation and long running time of model-based gait recognition methods, and thereby supports real-time identity recognition.
In order to solve the above-mentioned existing technical problem, the invention adopts the following scheme:
An identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss comprises a multi-channel spatio-temporal network system and a joint optimization loss system. The joint optimization loss system comprises an improved ternary loss function and a label-smoothing-regularized cross entropy loss function, the latter being the cross entropy loss used when training a conventional classification network, with label smoothing regularization incorporated into its calculation. The method comprises the following steps:
step one, preprocessing the gait sequence: a gait image preprocessing algorithm converts the gait images in the CASIA-B gait database into a gait contour sequence and a skeleton sequence of consistent size with aligned centers;
step two, inputting the gait skeleton sequence and contour sequence obtained by preprocessing together into the multi-channel spatio-temporal network system, so as to fully extract the spatio-temporal features of the gait sequences;
step three, establishing a gait identity recognition model in combination with a triplet network;
step four, jointly supervising network training by combining the improved ternary loss and the optimized cross entropy loss.
Furthermore, the multi-channel spatio-temporal network system adopts a multi-channel shallow convolutional neural network connected in series with a long short-term memory neural network as the backbone for feature extraction, and the one-to-one corresponding gait skeleton and contour sequences within a period are used directly as the network input, so as to fully mine the spatio-temporal information of the gait sequences.
Further, the improved ternary loss improves the way positive and negative samples are selected during ternary loss training, adding a stronger constraint on their selection.
Further, the ternary loss value is calculated by computing the spatial Euclidean distances between all samples in each batch during training, and using the negative sample closest to the original sample and the positive sample farthest from it; the calculation formula is as follows:
$$L_{th}=\sum_{i=1}^{p}\sum_{j=1}^{k}\Big[\max_{x\in A}\big\|f(a_{i,j})-f(x)\big\|_2-\min_{y\in B}\big\|f(a_{i,j})-f(y)\big\|_2+m\Big]_+$$
where p classes of original samples are input per batch and k frames of different gait sequences are selected from each class to form p × k gait sequences; L_th is the final ternary loss value, a denotes an original (anchor) sample, A is the set of positive samples farthest from the original sample, and B is the set of negative samples closest to the original sample.
Further, a calculation formula of the label smoothing regularization method in the label smoothing regularization cross entropy loss function is as follows:
$$q'_i=(1-\lambda)\,q_i+\frac{\lambda}{n}$$
where λ is the weight of the smoothed label, with value range λ ∈ [0,1], and n is the number of label classes.
Further, the expression of the cross entropy loss after incorporating LSR is:
$$L_{LSR\text{-}ce}=-\sum_{i=1}^{n}q'_i\log p_i=-(1-\lambda)\log p_y-\frac{\lambda}{n}\sum_{i=1}^{n}\log p_i$$
Further, the improved ternary loss function and the label-smoothing-regularized cross entropy loss function jointly supervise network training; the fused loss function of the joint optimization loss system is expressed as:
L_total = k × L_LSR-ce + L_th
where L_LSR-ce is the cross entropy loss function incorporating LSR, L_th is the improved ternary loss function, and k is the weight coefficient for fusing the two loss functions.
Further, an attention mechanism is included that enables the multi-channel spatio-temporal network system to capture key frames and emphasize extraction of their gait features, so as to increase the accuracy and robustness of the network model.
Furthermore, the attention mechanism comprises weights for the gait sequence: for the frames of each gait sequence, the scores output by the corresponding long short-term memory neural network are normalized to obtain the weight of each frame; the calculation formula is as follows:
$$Q_j=\frac{\exp(c_j)}{\sum_{i=1}^{Z}\exp(c_i)}$$
where Q_j denotes the weight coefficient of the j-th frame of the gait sequence, and c_j denotes the score output by the long short-term memory neural network for the fused feature of the j-th frame.
Further, from the obtained weight coefficients Q_j, the spatio-temporal feature based on the attention mechanism is calculated as follows:
$$F=\sum_{j=1}^{Z}Q_j\,S_j$$
where F denotes the spatio-temporal feature obtained by the attention mechanism, Q_j is the weight coefficient of the j-th frame sequence obtained by the attention mechanism, and S_j is the fused feature of the j-th frame of the gait skeleton and contour sequences.
Compared with the prior art, the invention has the beneficial effects that:
compared with the gait outline, the existing gait skeleton has stronger robustness under the scenes of visual angle change, carrying objects and the like, but the skeleton sequence after thinning loses a large number of effective characteristics, reduces the difference among different individuals, and has the advantages and the disadvantages which are just complementary with the gait outline. The method comprises the steps of firstly improving the selection mode of positive and negative samples in the process of ternary loss training through a combined optimization loss system, thereby enhancing the generalization performance and robustness of a metric learning network, meanwhile aiming at the problem that the traditional classification network has low accuracy of network classification and identification due to the fact that a cross entropy loss function cannot effectively use the label position of a negative sample in the training process, integrating label smoothing and regularization in the calculation of cross entropy loss for the purpose of improving the accuracy of network classification, jointly supervising network training through the two improved loss functions, improving the accuracy of classification and identification while ensuring effective characteristic distance metric learning, overcoming the problem that the network is not easy to converge, effectively solving the problems of low cross-view-angle accuracy of gait identification based on the traditional image method, complex calculation and long time consumption of the model-based gait identification method, and the like, and providing method guarantee for the real-time identity identification technology, meanwhile, the modern biological behavior characteristic recognition technology is improved, the safety of identity recognition under complex scenes such as carrying objects, wearing and the like is ensured, and meanwhile, the gait identity recognition accuracy is improved.
Drawings
FIG. 1 is a schematic diagram of an identity recognition framework based on a multi-channel spatio-temporal network and joint optimization loss;
FIG. 2 is a schematic diagram of a multi-channel spatio-temporal network system;
FIG. 3 is a schematic diagram of the joint optimization loss scheme of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and the detailed description, and it should be noted that any combination of the embodiments or technical features described below can be used to form a new embodiment without conflict.
As shown in figs. 1 to 3, an identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss comprises a multi-channel spatio-temporal network system and a joint optimization loss system. The joint optimization loss system comprises two parts, an improved ternary loss function and a label-smoothing-regularized cross entropy loss function, the latter being the cross entropy loss used when training a conventional classification network, with label smoothing regularization incorporated into its calculation. The method comprises the following steps:
step one, preprocessing the gait sequence: a gait image preprocessing algorithm converts the gait images in the CASIA-B gait database into a gait contour sequence and a skeleton sequence of consistent size with aligned centers;
step two, inputting the gait skeleton sequence and contour sequence obtained by preprocessing together into the multi-channel spatio-temporal network system, so as to fully extract the spatio-temporal features of the gait sequences;
step three, establishing a gait identity recognition model in combination with a triplet network;
step four, jointly supervising network training by combining the improved ternary loss and the optimized cross entropy loss.
When preprocessing the gait sequences, a gait video is first processed into individual gait image frames by the gait image preprocessing algorithm; a gait skeleton image and a contour image are obtained by combining a pose estimation method with a moving-target extraction method; and these are further processed, by interpolation-based upscaling and an image-centroid alignment and centering algorithm, into gait sequences of consistent size with aligned centers, providing experimental samples for the subsequent feature metric learning and classification networks.
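The centroid-alignment step described above can be sketched in a few lines. The following is an illustrative pure-Python sketch, not the patent's actual implementation: the canvas size, grid encoding (0 background / 1 silhouette) and function names are assumptions.

```python
# Illustrative sketch of centroid alignment for a binary gait silhouette:
# shift the silhouette so its centre of mass sits at the centre of a
# fixed-size canvas, as in the preprocessing step described above.

def centroid(mask):
    """Centre of mass (row, col) of a binary mask given as a list of lists."""
    total = rsum = csum = 0
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v:
                total += 1
                rsum += r
                csum += c
    return rsum / total, csum / total

def center_align(mask, height, width):
    """Paste the silhouette onto a height x width canvas, centroid-centred."""
    cr, cc = centroid(mask)
    dr = round(height / 2 - cr)
    dc = round(width / 2 - cc)
    canvas = [[0] * width for _ in range(height)]
    for r, row in enumerate(mask):
        for c, v in enumerate(row):
            if v and 0 <= r + dr < height and 0 <= c + dc < width:
                canvas[r + dr][c + dc] = v
    return canvas

# A 2x2 blob in the top-left corner of a 3x3 mask, centred on a 6x6 canvas.
blob = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
aligned = center_align(blob, 6, 6)
```

In a real pipeline this would run after interpolation-based upscaling, so that every frame of the contour and skeleton sequences shares the same size and centre.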
The multi-channel spatio-temporal network system adopts a multi-channel shallow convolutional neural network connected in series with a long short-term memory neural network as the backbone for feature extraction, and the one-to-one corresponding gait skeleton and contour sequences within a period are used directly as the network input, so as to fully mine the spatio-temporal information of the gait sequences.
Compared with existing methods, the outstanding differences and contributions of the invented method are:
the novel multi-channel space-time network system is characterized in that a multi-channel shallow convolutional network is connected with a long-time memory network in series to serve as a backbone of a multi-channel space-time network, so that gait sequence space-time information is fully extracted, gait features with higher discriminative power are improved for subsequent gait identity recognition, and the specific structure is shown in figure 2.
The framework of the gait identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss is shown in fig. 1, and the method comprises: (1) the gait skeleton sequence and the gait contour sequence are used together as input to the multi-channel spatio-temporal network system, combined with an attention mechanism to fully extract the spatio-temporal features of the gait sequences, providing gait features with higher discriminative power for the subsequent feature metric learning and classification networks; (2) a joint optimization loss strategy is proposed: to address the slow convergence and low generalization of the triplet network during training, the selection of positive and negative samples in ternary loss training is improved, enhancing the generalization and robustness of the metric learning network; meanwhile, a Label Smoothing Regularization (LSR) method is fused to optimize the cross entropy loss function, because the cross entropy loss of a conventional classification network cannot effectively use the label positions of negative samples during training; incorporating label smoothing regularization into the calculation of the cross entropy loss improves classification accuracy, and the two optimized loss functions jointly supervise network training, improving classification and recognition accuracy while ensuring effective feature distance metric learning and solving the difficulty of network convergence; (3) a large number of comparison experiments on the CASIA-B database further verify the effectiveness of the invention.
The ternary loss function is further improved, mainly in the way positive and negative samples are selected during training, adding a stronger constraint on their selection: a gait sequence from a different class with a small viewing-angle difference is selected as the negative sample, and a gait sequence from the same class with a large viewing-angle difference is selected as the positive sample. Selecting suitable positive and negative samples effectively prevents the Euclidean distance between positive and negative samples from being too large, further enhancing the generalization and robustness of the metric learning network.
The improvement is as follows: the improved ternary loss targets the way positive and negative samples are selected during ternary loss training, adding a stronger constraint on their selection. The ternary loss function is one of the commonly used approaches to feature distance metric learning; its aim is to pull the features extracted from samples of the same class closer together and push the features of different classes farther apart, thereby improving fine-grained classification precision. Ternary loss training requires an original (anchor) sample together with a corresponding positive sample and negative sample, and the traditional random selection tends to satisfy the constraint condition even before the loss function is solved, which easily degrades the network's generalization performance. Therefore, the selection of positive and negative samples during ternary loss training is improved to enhance the generalization and robustness of the metric learning network. Meanwhile, because the cross entropy loss function of a conventional classification network cannot effectively use the label positions of negative samples during training, classification accuracy at test time is low; a fused label smoothing regularization method is therefore adopted to optimize the calculation of the cross entropy loss and improve the classification precision of the test network. The two improved loss functions jointly supervise network training, improving classification and recognition accuracy while ensuring effective feature distance metric learning and solving the difficulty of network convergence.
The ternary loss value is calculated by computing the spatial Euclidean distances between all samples in each batch during training, and using the negative sample closest to the original sample (the high-similarity negative sample) and the positive sample farthest from it (the low-similarity positive sample) to compute the final ternary loss, so that the network converges quickly while avoiding oscillation. For example, original samples of p classes are input in each batch, and k frames of different gait sequences are selected from each class to form p × k gait sequences. The final ternary loss can be expressed as:
$$L_{th}=\sum_{i=1}^{p}\sum_{j=1}^{k}\Big[\max_{x\in A}\big\|f(a_{i,j})-f(x)\big\|_2-\min_{y\in B}\big\|f(a_{i,j})-f(y)\big\|_2+m\Big]_+$$
where p classes of original samples are input per batch and k frames of different gait sequences are selected from each class to form p × k gait sequences; L_th is the final ternary loss value, a denotes an original (anchor) sample, A is the set of positive samples farthest from the original sample, and B is the set of negative samples closest to the original sample.
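The selection rule described above — pair each anchor with its farthest positive and its closest negative within the batch — can be sketched as follows. This is a hedged illustration, not the patent's code: the embedding values and the margin value are assumptions introduced for the example.

```python
# Sketch of the improved ternary (triplet) loss selection rule: within a
# batch, each anchor uses the positive sample farthest from it and the
# negative sample closest to it. Embeddings and margin are illustrative.
import math

def dist(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Mean over anchors of max(0, farthest_positive - closest_negative + margin)."""
    losses = []
    for i, (e, y) in enumerate(zip(embeddings, labels)):
        pos = [dist(e, embeddings[j]) for j, t in enumerate(labels) if t == y and j != i]
        neg = [dist(e, embeddings[j]) for j, t in enumerate(labels) if t != y]
        if pos and neg:
            losses.append(max(0.0, max(pos) - min(neg) + margin))
    return sum(losses) / len(losses)

# Toy batch: p = 2 classes, k = 2 sequences each, well separated, so loss is 0.
emb = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
lab = [0, 0, 1, 1]
loss = batch_hard_triplet_loss(emb, lab)
```

When the classes overlap, the farthest-positive/closest-negative pairing yields a positive loss, which is what drives the stronger constraint during training.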
The label-smoothing-regularized cross entropy loss function is further improved. Because the training set contains many sample classes, the number of negative labels is large, and the cross entropy loss computed from the classification probabilities of the traditional Softmax uses a one-hot label scheme that ignores the label positions of negative samples. As a result, the network fits the training-set classification well, but test accuracy drops because the label positions of negative samples cannot be used effectively. Therefore, a Label Smoothing Regularization (LSR) method is added to the calculation of the cross entropy loss function, bringing the result of the Softmax activation closer to the correct output and improving the classification and recognition accuracy of the test network. The calculation formula of the label smoothing regularization method in the label-smoothing-regularized cross entropy loss function is as follows:
$$q'_i=(1-\lambda)\,q_i+\frac{\lambda}{n}$$
where λ is the weight of the smoothed label, with value range λ ∈ [0,1], and n is the number of label classes. The expression of the cross entropy loss after incorporating LSR is:
$$L_{LSR\text{-}ce}=-\sum_{i=1}^{n}q'_i\log p_i=-(1-\lambda)\log p_y-\frac{\lambda}{n}\sum_{i=1}^{n}\log p_i$$
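A minimal sketch of the label-smoothed cross entropy, assuming the standard LSR formulation in which the one-hot target q is replaced by q' = (1 − λ)·q + λ/n before the cross entropy with the Softmax output is computed. The logits below are illustrative values, not from the patent.

```python
# Sketch of cross entropy with Label Smoothing Regularization (LSR):
# smooth the one-hot target, then take cross entropy with the softmax output.
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def lsr_cross_entropy(logits, true_idx, lam=0.1):
    """Cross entropy against the label-smoothed target distribution."""
    p = softmax(logits)
    n = len(logits)
    loss = 0.0
    for i, pi in enumerate(p):
        q = (1.0 - lam) * (1.0 if i == true_idx else 0.0) + lam / n
        loss -= q * math.log(pi)
    return loss

hard = lsr_cross_entropy([2.0, 0.5, 0.1], 0, lam=0.0)   # plain cross entropy
smooth = lsr_cross_entropy([2.0, 0.5, 0.1], 0, lam=0.1)
```

With λ = 0 this reduces to the ordinary cross entropy; with λ > 0 some target mass moves to the negative label positions, which is the effect the passage above describes.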
The joint optimization loss is then constructed: following the design and analysis above, the improved ternary loss function and the label-smoothing-regularized cross entropy loss function jointly supervise network training, and the fused loss function of the joint optimization loss system is expressed as:
L_total = k × L_LSR-ce + L_th
where L_LSR-ce is the cross entropy loss function incorporating LSR, L_th is the improved ternary loss function, and k is the weight coefficient for fusing the two loss functions. As a hyper-parameter of the network, k takes values in [0,1] and can be adjusted according to the training situation of the network.
This joint optimization loss strategy effectively controls the convergence of the model, so that the network approaches the optimization curve, achieves effective similarity measurement, and improves the accuracy of fine-grained individual classification.
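The fusion itself is a one-line weighted sum. The sketch below reuses illustrative loss values; the setting k = 0.5 is an assumption for the example, chosen within the stated [0,1] range, not a value from the patent.

```python
# Sketch of the fused joint-optimization loss: L_total = k * L_LSR-ce + L_th,
# with k a hyper-parameter in [0, 1] adjusted to the training situation.
def joint_loss(lsr_ce_loss, ternary_loss, k=0.5):
    assert 0.0 <= k <= 1.0, "k is a hyper-parameter in [0, 1]"
    return k * lsr_ce_loss + ternary_loss

# Illustrative loss values for one training step.
total = joint_loss(0.43, 0.80, k=0.5)  # approximately 1.015
```

In training, both component losses would be computed on the same batch and the summed scalar back-propagated through the shared backbone.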
Human gait is in essence a time-series problem: the frames of a gait sequence within a period follow a strict order, and each frame of the gait contour and skeleton diagram represents the motion posture at a particular moment of the gait cycle. Therefore, on the basis of extracting the shape features of the images, whether the temporal information of the gait sequences can be sufficiently mined is the key to learning feature distance similarity and improving identity recognition accuracy. Compared with a convolutional neural network, a recurrent neural network is better at feature learning on sequence data, so a multi-channel spatio-temporal network combining a convolutional neural network with a long short-term memory network is proposed, with an attention mechanism fused in to fully extract the spatio-temporal information of the gait sequences; its specific structure is shown in fig. 2.
In the multi-channel spatio-temporal feature extraction network shown in fig. 2, the one-to-one corresponding contour sequence and skeleton sequence are used together as input; because the gait contour sequence consists of binary images while the skeleton sequence consists of RGB images, the two kinds of gait sequence are processed to the same size before input. They are then fed into the two channels of the convolutional network to extract the spatial features of the gait sequences, and the features are fused by tensor concatenation. The long short-term memory layer is a two-layer network with 256 hidden nodes; the fused feature maps output by the convolutional network are fed in temporal order as the input of this later network, and the LSTM module decodes them to extract temporal features.
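As a rough illustration of how data flows through such a backbone, the following sketch traces dimensions through one hypothetical configuration. The frame count, input size and kernel sizes are all assumptions introduced for the example; only the two-channel concatenation fusion and the 256-unit LSTM come from the description above.

```python
# Hedged shape-flow sketch of the two-channel backbone: each channel's shallow
# CNN maps a frame to a spatial feature vector, the two channels are fused by
# concatenation, and the per-frame fused vectors form the LSTM input sequence.

def conv_out(size, kernel, stride=1, pad=0):
    """Output spatial size of a conv/pool layer (standard formula)."""
    return (size + 2 * pad - kernel) // stride + 1

frames, h = 30, 64                       # hypothetical: 30 frames of 64x64 input
feat = conv_out(conv_out(h, 5), 2, 2)    # e.g. 5x5 conv then 2x2 max-pool, per channel
contour_dim = skeleton_dim = feat * feat # flattened per-channel spatial feature
fused_dim = contour_dim + skeleton_dim   # tensor-concatenation fusion
lstm_input_shape = (frames, fused_dim)   # sequence fed to the 2-layer, 256-unit LSTM
```

The point of the sketch is the ordering: per-frame spatial extraction in both channels, concatenation, then temporal decoding over the frame axis.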
On this basis, an attention mechanism (Attention) is also introduced. It overcomes the limitation that traditional networks for continuous image sequences can only extract features from a limited number of input frames, thereby making maximum use of the associations between different frames within a gait cycle and weakening the influence of irrelevant frames on the network output. Through the multi-channel spatio-temporal network, key frames can be captured and their gait features emphasized to increase the accuracy and robustness of the network model. If each gait sequence has Z frames in total, the long short-term memory neural network outputs Z scores. After normalization, the weight of each frame of the sequence is obtained; the calculation is given by the following formula:
Q_j = exp(c_j) / Σ_{i=1}^{Z} exp(c_i)
where Q_j is the weight coefficient of the j-th frame of the gait sequence and c_j is the output value of the long short-term memory neural network for the fused feature of the j-th frame. With the weight coefficients obtained, the attention-based spatio-temporal feature can then be calculated as:
F = Σ_{j=1}^{Z} Q_j f_j

where F denotes the spatio-temporal feature obtained based on the attention mechanism, Q_j is the weight coefficient of the j-th frame obtained from the attention mechanism, and f_j is the fused feature of the j-th skeleton and contour frames.
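As a concrete illustration of the two formulas above, the attention pooling can be sketched in a few lines of NumPy. The names `attention_fuse`, `c`, `f` and the toy sizes are ours, not from the patent; the exponential normalization is the usual softmax reading of the weight formula.

```python
import numpy as np

def attention_fuse(c, f):
    """Attention pooling over a gait cycle of Z frames.

    c : (Z,)   LSTM output score c_j for each frame
    f : (Z, D) fused skeleton+contour feature of each frame
    Returns the spatio-temporal feature F = sum_j Q_j * f_j.
    """
    e = np.exp(c - c.max())      # softmax normalization of the scores
    Q = e / e.sum()              # weight coefficients Q_j, summing to 1
    return Q @ f                 # attention-weighted sum of frame features

# Toy example: Z = 4 frames, D = 3 feature dimensions.
c = np.array([0.1, 2.0, 0.3, -1.0])
f = np.ones((4, 3))              # identical frame features
F = attention_fuse(c, f)         # weights sum to 1, so F stays all ones
```

Because the weights sum to one, identical frame features pass through unchanged; in practice `f` would hold the fused convolutional features decoded by the LSTM.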
The above embodiments are only preferred embodiments of the present invention and do not limit its protection scope; any insubstantial changes and substitutions made on the basis of the present invention by those skilled in the art fall within the protection scope of the present invention.

Claims (10)

1. An identity recognition method based on a multi-channel spatio-temporal network and joint optimization loss, characterized in that: the method comprises a multi-channel spatio-temporal network system and a joint optimization loss system, wherein the joint optimization loss system comprises an improved ternary (triplet) loss function and a label-smoothing-regularized cross-entropy loss function; the latter is the cross-entropy loss function used in training a conventional classification network, with label smoothing regularization incorporated into the cross-entropy calculation; the method is realized by the following steps:
step one, gait sequence preprocessing: gait images in the CASIA-B gait database are preprocessed by a gait image preprocessing algorithm into a gait contour sequence and a skeleton sequence of consistent size and aligned centers;
step two, the gait skeleton sequence and contour sequence obtained by the preprocessing are input together into the multi-channel spatio-temporal network system so as to fully extract the spatio-temporal features of the gait sequences;
step three, a gait identification model is established in combination with a triplet network;
and step four, network training is jointly supervised by combining the improved ternary loss with the optimized cross-entropy loss.
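A minimal sketch of the contour (silhouette) half of step one. The patent does not disclose the exact preprocessing algorithm; bounding-box cropping, center alignment and resizing to a fixed square is the common normalization for CASIA-B silhouettes, and the function name and 64-pixel output size here are our assumptions.

```python
import numpy as np

def preprocess_silhouette(img, out_size=64):
    """Center-align a binary gait contour and resize it to a square.

    Crop to the bounding box of the subject, pad to a square with the
    contour centered, then resize by nearest-neighbour sampling.
    """
    ys, xs = np.nonzero(img)
    crop = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    h, w = crop.shape
    side = max(h, w)
    square = np.zeros((side, side), dtype=img.dtype)
    top, left = (side - h) // 2, (side - w) // 2
    square[top:top + h, left:left + w] = crop
    # Nearest-neighbour resize to out_size x out_size.
    idx = (np.arange(out_size) * side / out_size).astype(int)
    return square[np.ix_(idx, idx)]

# Toy example: a 10x6 blob inside a 32x32 frame.
frame = np.zeros((32, 32), dtype=np.uint8)
frame[10:20, 5:11] = 1
sil = preprocess_silhouette(frame)
```

The skeleton frames, being RGB, would be resized to the same output size so that both channels receive inputs of identical dimensions.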
2. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 1, characterized in that: the multi-channel spatio-temporal network system adopts, as the backbone for feature extraction, a multi-channel shallow convolutional neural network connected in series with a long short-term memory neural network, and the one-to-one corresponding gait skeleton and contour sequences within a cycle are used directly as the input of the network so as to fully mine the spatio-temporal information between the gait sequences.
3. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 1, characterized in that: the improved ternary loss improves the way positive and negative samples are selected during ternary-loss training, imposing a stronger constraint on the selection of positive and negative samples.
4. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 3, characterized in that: the ternary loss value is calculated by computing, in each batch during training, the Euclidean distances in feature space between all samples, and evaluating the loss with the negative sample closest to the anchor (original) sample and the positive sample farthest from it; the calculation formula is:
L_th = Σ_{i=1}^{p} Σ_{a=1}^{k} [ max_{x⁺ ∈ A} D(a, x⁺) − min_{x⁻ ∈ B} D(a, x⁻) + α ]_+

where D(·,·) denotes Euclidean distance in feature space, [·]_+ = max(·, 0), and α is the margin.
Each batch inputs original samples of p classes, and k different gait sequences are selected from the samples of each class, giving p × k gait sequences per batch; L_th is the final ternary loss value, a denotes the anchor (original) sample, A is the set of positive samples farthest from the anchor, and B is the set of negative samples closest to the anchor.
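For illustration, the batch-hard selection of claim 4 — farthest positive and closest negative per anchor — can be sketched in NumPy. The function name and the margin value 0.2 are ours, not from the patent.

```python
import numpy as np

def batch_hard_triplet_loss(feats, labels, margin=0.2):
    """Batch-hard ternary (triplet) loss over one batch.

    feats  : (N, D) embedding of each gait sequence in the batch
    labels : (N,)   class id of each sequence
    For every anchor, take the FARTHEST positive and the CLOSEST
    negative by Euclidean distance, then apply a hinge with a margin.
    """
    # Pairwise Euclidean distance matrix.
    diff = feats[:, None, :] - feats[None, :, :]
    dist = np.sqrt((diff ** 2).sum(-1) + 1e-12)
    same = labels[:, None] == labels[None, :]
    losses = []
    for a in range(len(feats)):
        pos = dist[a][same[a]]    # includes d(a, a) = 0, harmless for max
        neg = dist[a][~same[a]]
        losses.append(max(0.0, pos.max() - neg.min() + margin))
    return float(np.sum(losses))

# Toy batch: p = 2 classes, k = 2 sequences each, well separated.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [5.1, 0.0]])
labels = np.array([0, 0, 1, 1])
loss = batch_hard_triplet_loss(feats, labels)   # separated classes give 0
```

When all embeddings collapse to one point, every anchor contributes exactly the margin, which is what pushes the network to separate the classes.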
5. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 4, characterized in that: the label smoothing regularization (LSR) method in the label-smoothing-regularized cross-entropy loss function is calculated as:
q_i = (1 − λ) δ_{i,y} + λ / n
where λ is the weight of the smoothed label, with value range λ ∈ [0, 1]; n is the number of label classes; δ_{i,y} equals 1 when i is the true class y and 0 otherwise; and q_i is the smoothed target probability of class i.
6. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 5, characterized in that: the expression after incorporating LSR into the cross-entropy loss is:
L_{LSR-ce} = − Σ_{i=1}^{n} [ (1 − λ) δ_{i,y} + λ / n ] log p_i

where p_i is the probability the network predicts for class i.
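A sketch of the LSR cross-entropy of claims 5-6 in NumPy. The function name and the value lam=0.1 are ours; the patent only requires λ ∈ [0, 1].

```python
import numpy as np

def lsr_cross_entropy(logits, y, lam=0.1):
    """Cross-entropy with label smoothing regularization.

    The one-hot target is replaced by q_i = (1 - lam)*[i == y] + lam/n
    before the usual cross-entropy is taken.
    """
    n = logits.shape[0]
    # Softmax with max-subtraction for numerical stability.
    e = np.exp(logits - logits.max())
    p = e / e.sum()
    # Smoothed target distribution q.
    q = np.full(n, lam / n)
    q[y] += 1.0 - lam
    return float(-(q * np.log(p)).sum())

# With lam = 0 this reduces to the ordinary cross-entropy -log p_y;
# smoothing spreads a little mass over the wrong classes and so
# penalizes over-confident predictions.
logits = np.array([2.0, 0.5, -1.0])
plain = lsr_cross_entropy(logits, y=0, lam=0.0)
smoothed = lsr_cross_entropy(logits, y=0, lam=0.1)
```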
7. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in claim 6, characterized in that: the improved ternary loss function and the label-smoothing-regularized cross-entropy loss function are combined to jointly supervise network training; the loss function of the fused joint optimization loss system is expressed as: L_total = k × L_{LSR-ce} + L_th
where L_{LSR-ce} is the cross-entropy loss function incorporating LSR, L_th is the improved ternary loss function, and k is the weight coefficient for fusing the two loss functions.
8. The identity recognition method based on the multi-channel spatio-temporal network and the joint optimization loss as claimed in any one of claims 1 to 7, characterized in that: the system further comprises an attention mechanism by which the multi-channel spatio-temporal network system can capture key frames and focus on extracting their gait features, thereby improving the accuracy and robustness of the network model.
9. The identity recognition method based on the multi-channel spatio-temporal network and joint optimization loss as claimed in claim 8, characterized in that: the attention mechanism comprises a weight for each frame of the gait sequence, obtained by normalizing, over the total number of frames in each gait sequence, the scores output by the corresponding long short-term memory neural network; the calculation formula is:
Q_j = exp(c_j) / Σ_{i=1}^{Z} exp(c_i)
where Q_j is the weight coefficient of the j-th frame of the gait sequence, c_j is the output value of the long short-term memory neural network for the fused feature of the j-th frame, and Z is the total number of frames in the sequence.
10. The identity recognition method based on the multi-channel spatio-temporal network and joint optimization loss as claimed in claim 9, characterized in that: the attention-based spatio-temporal feature is further calculated from the obtained weight coefficients Q_j according to the formula:
F = Σ_{j=1}^{Z} Q_j f_j

where F denotes the spatio-temporal feature obtained based on the attention mechanism, Q_j is the weight coefficient of the j-th frame obtained from the attention mechanism, and f_j is the fused feature of the j-th skeleton and contour frames.
CN202010926230.6A 2020-09-07 2020-09-07 Identity recognition method based on multi-channel space-time network and joint optimization loss Pending CN112131970A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010926230.6A CN112131970A (en) 2020-09-07 2020-09-07 Identity recognition method based on multi-channel space-time network and joint optimization loss


Publications (1)

Publication Number Publication Date
CN112131970A true CN112131970A (en) 2020-12-25

Family

ID=73848229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010926230.6A Pending CN112131970A (en) 2020-09-07 2020-09-07 Identity recognition method based on multi-channel space-time network and joint optimization loss

Country Status (1)

Country Link
CN (1) CN112131970A (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016065534A1 (en) * 2014-10-28 2016-05-06 中国科学院自动化研究所 Deep learning-based gait recognition method
CN106778527A (en) * 2016-11-28 2017-05-31 中通服公众信息产业股份有限公司 A kind of improved neutral net pedestrian recognition methods again based on triple losses
US10025950B1 (en) * 2017-09-17 2018-07-17 Everalbum, Inc Systems and methods for image recognition
CN108960184A (en) * 2018-07-20 2018-12-07 天津师范大学 A kind of recognition methods again of the pedestrian based on heterogeneous components deep neural network
CN110059616A (en) * 2019-04-17 2019-07-26 南京邮电大学 Pedestrian's weight identification model optimization method based on fusion loss function
CN110321862A (en) * 2019-07-09 2019-10-11 天津师范大学 A kind of pedestrian's recognition methods again based on the loss of compact ternary
WO2020122985A1 (en) * 2018-12-10 2020-06-18 Interactive-Al, Llc Neural modulation codes for multilingual and style dependent speech and language processing
CN111428658A (en) * 2020-03-27 2020-07-17 大连海事大学 Gait recognition method based on modal fusion


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNYI WU et al.: "Improving Person Re-Identification Performance Using Body Mask Via Cross-Learning Strategy", 2019 IEEE Visual Communications and Image Processing (VCIP), pages 1-4 *
WU, Ying: "Research on Gait Recognition Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology), no. 1, pages 138-2108 *
ZHANG, Tao et al.: "An Improved Person Re-identification Algorithm Based on Global Features", Laser & Optoelectronics Progress, vol. 57, no. 24, pages 324-330 *
XIONG, Wei et al.: "Research on Person Re-identification Algorithm Based on Global Feature Concatenation", Application Research of Computers, vol. 38, no. 1, pages 316-320 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2023523502A (en) * 2021-04-07 2023-06-06 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Model training methods, pedestrian re-identification methods, devices and electronics
JP7403673B2 (en) 2021-04-07 2023-12-22 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Model training methods, pedestrian re-identification methods, devices and electronic equipment
CN112906673A (en) * 2021-04-09 2021-06-04 河北工业大学 Lower limb movement intention prediction method based on attention mechanism
CN113222775A (en) * 2021-05-28 2021-08-06 北京理工大学 User identity correlation method integrating multi-mode information and weight tensor
CN114511848A (en) * 2021-12-30 2022-05-17 广西慧云信息技术有限公司 Grape phenological period identification method and system based on improved label smoothing algorithm
CN114511848B (en) * 2021-12-30 2024-05-14 广西慧云信息技术有限公司 Grape waiting period identification method and system based on improved label smoothing algorithm
CN114882593A (en) * 2022-05-18 2022-08-09 厦门市美亚柏科信息股份有限公司 Robust space-time mixed gait feature learning method and system
CN114879849A (en) * 2022-06-07 2022-08-09 吉林大学 Multi-channel air pen gesture recognition method
CN115297441A (en) * 2022-09-30 2022-11-04 上海世脉信息科技有限公司 Method for calculating robustness of individual space-time activity in big data environment
CN115297441B (en) * 2022-09-30 2023-01-17 上海世脉信息科技有限公司 Method for calculating robustness of individual space-time activity in big data environment
CN115841681A (en) * 2022-11-01 2023-03-24 南通大学 Pedestrian re-identification anti-attack method based on channel attention
CN115687934A (en) * 2022-12-30 2023-02-03 智慧眼科技股份有限公司 Intention recognition method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20201225