CN114120447A - Behavior recognition method and system based on prototype comparison learning and storage medium - Google Patents

Behavior recognition method and system based on prototype comparison learning and storage medium Download PDF

Info

Publication number
CN114120447A
Authority
CN
China
Prior art keywords
prototype
vector
vectors
behavior recognition
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111413784.7A
Other languages
Chinese (zh)
Inventor
高浩元
张一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Original Assignee
Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Nanjing Artificial Intelligence Innovation Research Institute, Institute of Automation of Chinese Academy of Science filed Critical Zhongke Nanjing Artificial Intelligence Innovation Research Institute
Priority to CN202111413784.7A priority Critical patent/CN114120447A/en
Publication of CN114120447A publication Critical patent/CN114120447A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior recognition method and system based on prototype comparison learning, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: sampling the skeletal key point data set to generate a sample set; transforming the samples with at least two random data transformation methods to obtain two sets of enhanced samples; inputting the two sets of enhanced samples into an encoder network respectively to obtain two sets of characterization vectors; searching the prototype vector set for the prototype vector with the highest similarity to each characterization vector in one of the sets, and labeling it to generate a corresponding similar vector set; constructing a prototype contrast loss function; and performing back propagation with the prototype contrast loss function, training the encoder network and all prototype vectors simultaneously. The prototype comparison learning behavior recognition method used by the invention can be trained on skeleton point sequence samples to obtain a model that effectively produces behavior characterization vectors, without requiring behavior class labels and without being based on a self-encoder.

Description

Behavior recognition method and system based on prototype comparison learning and storage medium
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a behavior recognition method and system based on prototype comparison learning and a storage medium.
Background
In the behavior recognition task, constrained by data volume and algorithms, behavior recognition models based on RGB images are often disturbed by viewing-angle changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeletal point data can solve this problem well.
In skeletal point data, the human body is represented by the coordinates of several predefined key joint points in the camera coordinate system. These can be conveniently obtained by a depth camera (e.g., Kinect) or various pose estimation algorithms (e.g., OpenPose). Here the key joint points defined by the Kinect depth camera are used: the human body is represented by the three-dimensional coordinates of 25 key joint points. Since a behavior usually takes the form of a video, a behavior of length T frames can be represented by a T × 25 × 3 tensor.
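As an illustrative sketch of the T × 25 × 3 representation described above (the frame count and variable names are assumptions for illustration, not part of the patent):

```python
import numpy as np

T = 64                              # number of frames (hypothetical example value)
behavior = np.zeros((T, 25, 3))     # (frames, joints, xyz coordinates per joint)

# Each frame behavior[t] holds the 3-D coordinates of the 25 key joints.
print(behavior.shape)
```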
In skeleton-point-based behavior recognition, how the model learns a behavior characterization vector for a skeleton point sequence is the core problem. At present, most methods train the model under the supervised learning paradigm: training a behavior recognition model requires a large number of skeleton point data samples annotated with behavior class labels, and obtaining these behavior class labels is difficult and costly. Meanwhile, the large number of skeleton point sequence samples without behavior class labels cannot be effectively utilized during training. Some methods propose to learn behavior characterization vectors with unsupervised learning models built on a self-encoder; however, the sample reconstruction task in the self-encoder not only has a complex structure and low learning efficiency, but the learned characterization vectors also often contain redundant information, so the effect on the behavior recognition task is poor.
Disclosure of Invention
To solve the technical problems in the background art, the invention provides a behavior recognition method and system based on prototype comparison learning, and a storage medium.
The invention adopts the following technical scheme: a behavior recognition method based on prototype comparison learning comprises the following steps:
acquiring joint points in bones and corresponding coordinate information to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors, query vectors and key vectors in the similar vector set;
using said prototype contrast loss function L_pc for back propagation, training the encoder network and all prototype vectors simultaneously.
In a further embodiment, the two sets of enhanced samples are defined as {x_i^q} and {x_i^k}.

The query vector is obtained as follows:

q_i = f_θ(x_i^q)

where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the parameter of the neural network, i.e. the object to be trained.

The key vector is obtained as follows:

k_i = f_θ(x_i^k)

where k_i denotes the key vector, and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows.

Define a prototype vector set consisting of K prototype vectors:

W = {w_1, w_2, ..., w_K}

Similarity calculation:

sim(a, b) = (a · b) / (‖a‖ ‖b‖)

i.e. the cosine similarity of vector a and vector b.

Obtain the prototype vector with the highest similarity:

s_i = argmax_k sim(k_i, w_k)

where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
In a further embodiment, the prototype contrast loss function is constructed as follows:

L_p = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, w_{s_i}) / τ) / Σ_{k=1}^{K} exp(sim(q_i, w_k) / τ) ]

L_c = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, k_i) / τ) / Σ_{j=1}^{B} exp(sim(q_i, k_j) / τ) ]

L_pc = L_p + L_c

where τ is a temperature parameter whose value lies in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent:

θ ← θ − α ∂L_pc/∂θ

w_k ← w_k − α ∂L_pc/∂w_k,  k = 1, ..., K

where α is the learning rate of the gradient descent method and is set according to the training situation.
In a further embodiment, the data transformation method comprises: linear mapping clipping method.
In a further embodiment, the data transformation method comprises: coordinate axis clipping method.
A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The invention has the following beneficial effects: the prototype comparison learning behavior recognition method used by the invention can be trained on skeleton point sequence samples to obtain a model that effectively produces behavior characterization vectors, without requiring behavior class labels and without being based on a self-encoder. Experiments show that the method achieves a significantly better characterization learning effect while reducing the amount of computation.
Drawings
FIG. 1 is a flow chart of the prototype comparison learning behavior recognition of the present invention.
Fig. 2 is a prior art supervised behavior characterization learning diagram.
Fig. 3 is a prior art unsupervised characterization learning diagram based on an auto-encoder.
Fig. 4 is a diagram of prototype comparative behavior learning according to the present invention.
FIG. 5 is a graph of the prototype comparative behavior learning of the present invention versus the accuracy of existing methods.
Detailed Description
The invention is further described with reference to the drawings and the specific embodiments in the following description.
As shown in fig. 2, in the conventional supervised learning method, after the characterization vector of a sample is obtained by an encoder network, the characterization vector is input into a linear fully-connected layer, the probability that the sample belongs to each category is obtained by a Softmax activation function, and back propagation with the following cross-entropy loss function then trains the encoder network and the linear fully-connected layer:

L_ce = −(1/B) Σ_{i=1}^{B} y_i log p_i

where B is the batch size, y_i is the sample label, and p_i is the predicted probability output by the Softmax activation function. Since sample labels must be used in the loss function, unlabeled data cannot be utilized for characterization learning in this process.

Unsupervised behavior characterization learning based on a self-encoder is shown in fig. 3. It uses an encoder network to obtain the characterization vector of a sample, and inputs the characterization vector into a decoder network to regenerate the skeleton point sequence. The encoder network and the decoder network are trained by back propagation with a squared-error loss function:

L_mse = (1/B) Σ_{i=1}^{B} ‖x_i − x̂_i‖²

where B is the batch size, x_i is the original sample, and x̂_i is the sample generated by the decoder. After the parameters of the encoder network are obtained by training on unlabeled data, they are fine-tuned on labeled data under the supervised learning paradigm. Although this method allows training on unlabeled skeleton point sequence samples, it requires adding a decoder network during training, which makes training inefficient and the learned characterization less suited to classification tasks such as behavior recognition.
Therefore, the prototype comparison behavior learning of fig. 4 is proposed. Compared with methods based on a self-encoder, this method directly back-propagates the loss function to the encoder, and the construction of the loss function makes implicit category information easier to learn, so the learned characterization performs better on the behavior recognition task; meanwhile, the decoder's sequence-reconstruction process is removed, so training is more efficient. As in the self-encoder-based unsupervised behavior characterization learning method, after the parameters of the encoder network are obtained by training on unlabeled data, they are fine-tuned on labeled data under the supervised learning paradigm.
Example 1
This embodiment proposes the prototype comparison behavior learning shown in fig. 1: instead of using the encoder + decoder structure of a self-encoder, sample-level comparison learning and class-level comparison learning are combined to directly train the encoder end to end. Specifically, the training process of the method is as follows:
step one, obtaining joint points in bones and corresponding coordinate information, and generating a sample set X ═ Xi};xiIs the sample numbered i.
Step two, adopting at least two random data transformation methods to carry out prototype sample xiTwo sets of enhanced samples are obtained by conversion
Figure BDA0003375237900000043
Step three: input the two sets of enhanced samples into the encoder network respectively to obtain query vectors {q_i} and key vectors {k_i}.
Step four: search the prototype vector set for the prototype vector with the highest similarity to each key vector, and label it to generate the corresponding similar vector set W_s.
Step five: construct the prototype contrast loss function L_pc based on the prototype vectors w_{s_i} in the similar vector set, the query vectors q_i, and the key vectors k_i.
Step six: use the prototype contrast loss function L_pc for back propagation, training the encoder network and all prototype vectors simultaneously.
In a further embodiment, the query vector is obtained as follows:

q_i = f_θ(x_i^q)

where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the parameter of the neural network, i.e. the object to be trained.

The key vector is obtained as follows:

k_i = f_θ(x_i^k)

where k_i denotes the key vector, and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows.

Define a prototype vector set consisting of K prototype vectors:

W = {w_1, w_2, ..., w_K}

Similarity calculation:

sim(a, b) = (a · b) / (‖a‖ ‖b‖)

i.e. the cosine similarity of vector a and vector b.

Obtain the prototype vector with the highest similarity:

s_i = argmax_k sim(k_i, w_k)

where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
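The similarity calculation and prototype assignment above can be sketched as follows; the vector dimension, the number of prototypes, and the random values are illustrative assumptions:

```python
import numpy as np

def sim(a, b):
    # cosine similarity: sim(a, b) = (a . b) / (|a| |b|)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
K, D = 5, 16                        # K prototypes of dimension D (assumed sizes)
W = rng.normal(size=(K, D))         # prototype vector set {w_1, ..., w_K}
k_i = rng.normal(size=D)            # key vector of the sample with index i

# s_i: index of the prototype with the highest cosine similarity to k_i
s_i = int(np.argmax([sim(k_i, w) for w in W]))
```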
In a further embodiment, the prototype contrast loss function is constructed as follows:

L_p = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, w_{s_i}) / τ) / Σ_{k=1}^{K} exp(sim(q_i, w_k) / τ) ]

L_c = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, k_i) / τ) / Σ_{j=1}^{B} exp(sim(q_i, k_j) / τ) ]

L_pc = L_p + L_c

where τ is a temperature parameter whose value lies in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
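A minimal NumPy sketch of the loss construction above, assuming cosine-similarity logits scaled by the temperature τ; all array sizes, seeds, and names are illustrative, and the assignment of s_i is simplified:

```python
import numpy as np

def info_nce(logits, target):
    # -log softmax(logits)[target], computed in a numerically stable way
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

def prototype_contrast_loss(Q, Kv, W, s, tau=0.2):
    # L_pc = L_p + L_c for one batch of B query/key vectors and K prototypes
    norm = lambda M: M / np.linalg.norm(M, axis=-1, keepdims=True)
    Qn, Kn, Wn = norm(Q), norm(Kv), norm(W)
    B = Q.shape[0]
    # L_c: sample-level term; each query's positive is its own key
    L_c = np.mean([info_nce(Qn[i] @ Kn.T / tau, i) for i in range(B)])
    # L_p: prototype-level term; the positive is the assigned prototype s[i]
    L_p = np.mean([info_nce(Qn[i] @ Wn.T / tau, s[i]) for i in range(B)])
    return L_p + L_c

rng = np.random.default_rng(1)
Q = rng.normal(size=(8, 16))                 # query vectors q_i
Kv = rng.normal(size=(8, 16))                # key vectors k_i
W = rng.normal(size=(4, 16))                 # prototype vectors w_k
s = np.array([int(np.argmax(W @ Kv[i])) for i in range(8)])  # assigned s_i
loss = prototype_contrast_loss(Q, Kv, W, s)
```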
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent:

θ ← θ − α ∂L_pc/∂θ

w_k ← w_k − α ∂L_pc/∂w_k,  k = 1, ..., K

where α is the learning rate of the gradient descent method and is set according to the training situation.
For the random data transformation in step two, we select the superposition of the following two strategies:

1. Linear mapping clipping method: each joint point in the whole skeleton is a three-dimensional coordinate in three-dimensional space, and each joint point is transformed as follows. Let the three-dimensional vector corresponding to the coordinate be x = (x_1, x_2, x_3)^T, and let the new three-dimensional coordinate obtained after the transformation be y; the transformation is

y = S x

S = [ 1     s_12  s_13
      s_21  1     s_23
      s_31  s_32  1   ]

where all elements of S other than the diagonal 1s are clipping elements obtained by random sampling from [−1, 1].

2. Coordinate-axis clipping: for the three-dimensional coordinates of each joint point in the whole skeleton, one of the three dimensions is randomly selected and set to 0.
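The superposition of the two transformation strategies above can be sketched as follows; the array sizes, seed, and function names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_map_clip(skel):
    # y = S x: S has a unit diagonal, off-diagonal entries sampled from [-1, 1]
    S = rng.uniform(-1.0, 1.0, size=(3, 3))
    np.fill_diagonal(S, 1.0)
    return skel @ S.T               # applies S to every joint coordinate

def axis_clip(skel):
    # zero out one randomly chosen coordinate axis for all joints
    out = skel.copy()
    out[..., rng.integers(3)] = 0.0
    return out

x = rng.normal(size=(16, 25, 3))    # toy skeleton sequence (T=16 frames)
x_q = linear_map_clip(axis_clip(x)) # first augmented view
x_k = linear_map_clip(axis_clip(x)) # second, independently augmented view
```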
Based on the above description, in the whole characterization learning process, end-to-end training can be performed on the encoder network without requiring label information of the samples or constructing a sample reconstruction task by using the decoder network. The behavior characterization vector learned by the method has the following characteristics:
(1) the characterization vectors can learn specific transformation invariance;
(2) samples with similar behavioral semantic information will be as close as possible in the characterization space;
(3) all samples will be distributed as uniformly as possible over the characterization space;
(4) the clusters of vectors formed in the characterization space based on behavioral semantic information will be distributed as evenly as possible over the characterization space.
Experiments show that the accuracy obtained by the behavior recognition method of the invention is significantly higher than that of other learning methods in the prior art, as shown in fig. 5. As can be seen from fig. 5, the Top-1 accuracy of the method of the invention is significantly improved over existing unsupervised behavior characterization learning methods, and its performance can even exceed that of many methods performing supervised characterization learning on labeled data sets. In the figure, S: Hand-Crafted and S: DNN-Based represent several classical supervised learning methods in skeleton-point-based behavior recognition, and SS: Depth Image and SS: Skeleton represent several of the best-performing unsupervised behavior characterization learning methods based on self-encoders. "Ours" represents the proposed prototype comparison behavior learning method.
In another embodiment, a computer system is also disclosed, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program: sampling the skeletal key point data set to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors, query vectors and key vectors in the similar vector set;
using said prototype contrast loss function L_pc for back propagation, training the encoder network and all prototype vectors simultaneously.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.

Claims (9)

1. A behavior recognition method based on prototype comparison learning is characterized by comprising the following steps:
sampling in the skeletal key point data set to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to each key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors, query vectors and key vectors in the similar vector set;
using said prototype contrast loss function L_pc for back propagation, training the encoder network and all prototype vectors simultaneously.
2. The behavior recognition method according to claim 1, wherein the two sets of enhanced samples are defined as {x_i^q} and {x_i^k};

the query vector is obtained as follows:

q_i = f_θ(x_i^q)

where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the parameter of the neural network, i.e. the object to be trained;

the key vector is obtained as follows:

k_i = f_θ(x_i^k)

where k_i denotes the key vector; i denotes the index of the sample.
3. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the similar vector set is obtained as follows:

defining a prototype vector set consisting of K prototype vectors:

W = {w_1, w_2, ..., w_K}

similarity calculation:

sim(a, b) = (a · b) / (‖a‖ ‖b‖)

i.e. calculating the cosine similarity of vector a and vector b;

obtaining the prototype vector with the highest similarity:

s_i = argmax_k sim(k_i, w_k)

where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
4. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the prototype contrast loss function is constructed as follows:

L_p = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, w_{s_i}) / τ) / Σ_{k=1}^{K} exp(sim(q_i, w_k) / τ) ]

L_c = −(1/B) Σ_{i=1}^{B} log [ exp(sim(q_i, k_i) / τ) / Σ_{j=1}^{B} exp(sim(q_i, k_j) / τ) ]

L_pc = L_p + L_c

where τ is a temperature parameter whose value lies in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
5. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the encoder network and all prototype vectors are trained by gradient descent:

θ ← θ − α ∂L_pc/∂θ

w_k ← w_k − α ∂L_pc/∂w_k,  k = 1, ..., K

where α is the learning rate of the gradient descent method and is set according to the training situation.
6. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: linear mapping clipping method.
7. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: coordinate axis clipping method.
8. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented by the processor when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111413784.7A 2021-11-25 2021-11-25 Behavior recognition method and system based on prototype comparison learning and storage medium Pending CN114120447A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111413784.7A CN114120447A (en) 2021-11-25 2021-11-25 Behavior recognition method and system based on prototype comparison learning and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111413784.7A CN114120447A (en) 2021-11-25 2021-11-25 Behavior recognition method and system based on prototype comparison learning and storage medium

Publications (1)

Publication Number Publication Date
CN114120447A (en) 2022-03-01

Family

ID=80373072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111413784.7A Pending CN114120447A (en) 2021-11-25 2021-11-25 Behavior recognition method and system based on prototype comparison learning and storage medium

Country Status (1)

Country Link
CN (1) CN114120447A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114386079A (en) * 2022-03-23 2022-04-22 清华大学 Encrypted traffic classification method and device based on contrast learning
CN114386079B (en) * 2022-03-23 2022-12-06 清华大学 Encrypted traffic classification method and device based on contrast learning

Similar Documents

Publication Publication Date Title
Zhang et al. Improved deep hashing with soft pairwise similarity for multi-label image retrieval
CN111310707B (en) Bone-based graph annotation meaning network action recognition method and system
CN111079532B (en) Video content description method based on text self-encoder
Ye et al. Attentive linear transformation for image captioning
Li et al. 2-D stochastic configuration networks for image data analytics
Yan et al. Multibranch attention networks for action recognition in still images
CN109948475B (en) Human body action recognition method based on skeleton features and deep learning
CN109783666B (en) Image scene graph generation method based on iterative refinement
Taylor et al. Learning invariance through imitation
CN109447096B (en) Glance path prediction method and device based on machine learning
Li et al. Multiple VLAD encoding of CNNs for image classification
CN109960732B (en) Deep discrete hash cross-modal retrieval method and system based on robust supervision
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
Yuan et al. Compositional scene representation learning via reconstruction: A survey
CN113763385A (en) Video object segmentation method, device, equipment and medium
CN112115744B (en) Point cloud data processing method and device, computer storage medium and electronic equipment
Zhao et al. Deeply supervised active learning for finger bones segmentation
CN114120447A (en) Behavior recognition method and system based on prototype comparison learning and storage medium
Li et al. Image decomposition with multilabel context: Algorithms and applications
Robert The Role of Deep Learning in Computer Vision
Cong et al. Gradient-Semantic Compensation for Incremental Semantic Segmentation
CN114764865A (en) Data classification model training method, data classification method and device
CN116977714A (en) Image classification method, apparatus, device, storage medium, and program product
CN116485962A (en) Animation generation method and system based on contrast learning
WO2023168818A1 (en) Method and apparatus for determining similarity between video and text, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination