CN114120447A - Behavior recognition method and system based on prototype comparison learning and storage medium - Google Patents
- Publication number
- CN114120447A (application CN202111413784.7A)
- Authority
- CN
- China
- Prior art keywords
- prototype
- vector
- vectors
- behavior recognition
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a behavior recognition method and system based on prototype contrastive learning, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: sampling the skeletal key point data set to generate a sample set; transforming each sample with at least two random data transformation methods to obtain two groups of enhanced samples; inputting the two groups of enhanced samples into an encoder network respectively to obtain two groups of characterization vectors; for each characterization vector in one of the groups, searching the prototype vector set for the prototype vector with the highest similarity and labeling it, generating a corresponding similar vector set; constructing a prototype contrast loss function; and performing back propagation with the prototype contrast loss function, training the encoder network and all prototype vectors simultaneously. The method can be trained on skeleton point sequence samples to obtain a model that effectively extracts behavior characterization vectors, without requiring behavior class labels and without relying on an auto-encoder.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a behavior recognition method and system based on prototype contrastive learning, and a storage medium.
Background
In behavior recognition tasks, constrained by data volume and algorithms, models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeletal point data can address this problem well.
In skeletal point data, the human body is represented by the coordinates of several predefined key joint points in the camera coordinate system. Such data can be conveniently obtained with a depth camera (e.g., Kinect) or various pose estimation algorithms (e.g., OpenPose). Here the key joint points defined by the Kinect depth camera are used: the human body is described by the three-dimensional coordinates of 25 key joint points. Since a behavior is usually recorded as video, a behavior of length T frames can be represented by a tensor of shape T × 25 × 3.
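As a concrete illustration (not part of the patent text), the T × 25 × 3 representation can be sketched with NumPy, using a zero-filled placeholder array in place of real Kinect captures:

```python
import numpy as np

# A behavior clip of T frames, each frame holding the 3D (x, y, z)
# coordinates of the 25 Kinect-defined key joint points.
T = 64
skeleton_clip = np.zeros((T, 25, 3), dtype=np.float32)  # placeholder data

# Frame 0, joint 0 is a single 3D coordinate.
print(skeleton_clip.shape)        # -> (64, 25, 3)
print(skeleton_clip[0, 0].shape)  # -> (3,)
```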
In skeleton-point-based behavior recognition, learning a behavior characterization vector for a skeleton point sequence is the core problem. Most current methods train the model under a supervised learning paradigm: training a behavior recognition model requires a large number of skeleton point data samples annotated with behavior class labels, and obtaining such labels is difficult and costly. Meanwhile, the large number of skeleton point sequence samples without behavior class labels cannot be used effectively during training. Some methods learn behavior characterization vectors with unsupervised models built on an auto-encoder; however, the sample reconstruction task in an auto-encoder is structurally complex and inefficient to learn, and the learned characterization vectors often contain redundant information, so they perform poorly on the behavior recognition task.
Disclosure of Invention
The invention provides a behavior recognition method and system based on prototype comparison learning and a storage medium for solving the technical problems in the background art.
The invention adopts the following technical scheme: a behavior recognition method based on prototype comparison learning comprises the following steps:
acquiring joint points in bones and corresponding coordinate information to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
The query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
the key vector is obtained as k_i = f_θ(x̃_i^(k)), where k_i denotes the key vector and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., the cosine similarity of vector a and vector b.
Obtaining the prototype vector with the highest similarity: s_i = argmax_j sim(k_i, w_j), where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
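A minimal sketch of the similarity computation and prototype lookup (illustrative code, not from the patent; `prototypes` stands in for the learned prototype vector set):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # sim(a, b) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_prototype(key: np.ndarray, prototypes: np.ndarray) -> int:
    # s_i = argmax_j sim(k_i, w_j): index of the most similar prototype
    sims = [cosine_similarity(key, w) for w in prototypes]
    return int(np.argmax(sims))

key = np.array([1.0, 0.0])
prototypes = np.array([[0.0, 1.0],   # orthogonal to the key vector
                       [1.0, 0.1]])  # nearly parallel to the key vector
print(nearest_prototype(key, prototypes))  # -> 1
```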
In a further embodiment, the prototype contrast loss function is constructed as follows:
L_pc = L_p + L_c
where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
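The detailed forms of L_p and L_c do not survive this translation; only the symbols τ, B, j and k are described. A standard prototype-contrastive (InfoNCE-style) instantiation consistent with those symbols — an assumption for illustration, not the patent's verbatim formulas — would be:

```latex
L_p = -\frac{1}{B}\sum_{i=1}^{B}
      \log\frac{\exp\!\big(\mathrm{sim}(q_i, w_{s_i})/\tau\big)}
               {\sum_{k}\exp\!\big(\mathrm{sim}(q_i, w_k)/\tau\big)},
\qquad
L_c = -\frac{1}{B}\sum_{i=1}^{B}
      \log\frac{\exp\!\big(\mathrm{sim}(q_i, k_i)/\tau\big)}
               {\sum_{j=1}^{B}\exp\!\big(\mathrm{sim}(q_i, k_j)/\tau\big)}
```

Here L_p would pull each query vector toward its matched prototype (class level) and L_c toward the key vector of the same sample (sample level), matching the "sample-level plus class-level" combination described in Example 1.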
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
In a further embodiment, the data transformation method comprises: linear mapping clipping method.
In a further embodiment, the data transformation method comprises: coordinate axis clipping method.
A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The invention has the following beneficial effects: the prototype contrastive learning behavior recognition method can be trained on skeleton point sequence samples to obtain a model that effectively extracts behavior characterization vectors, without requiring behavior class labels and without relying on an auto-encoder. Experiments show that the method achieves a markedly better characterization learning effect while reducing the amount of computation.
Drawings
FIG. 1 is a flow chart of prototype contrastive learning behavior recognition according to the present invention.
Fig. 2 is a prior art supervised behavior characterization learning diagram.
Fig. 3 is a prior art unsupervised characterization learning diagram based on an auto-encoder.
Fig. 4 is a diagram of prototype contrastive behavior learning according to the present invention.
FIG. 5 is a graph comparing the accuracy of the prototype contrastive behavior learning of the present invention with existing methods.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
As shown in fig. 2, in the conventional supervised learning method, the characterization vector of a sample is first obtained with an encoder network, then fed into a linear fully-connected layer, and the probability of the sample belonging to each class is obtained with a Softmax activation function; back propagation is then performed with the following cross entropy loss function, training the encoder network and the fully-connected layer:

L_ce = −(1/B) Σ_{i=1..B} log p_{i, y_i}

where B is the batch size, y_i is the label of sample i, and p_{i, y_i} is the predicted probability output by the Softmax activation function for that label. Since sample labels must appear in the loss function, unlabeled data cannot be used for characterization learning in this process.

Unsupervised behavior characterization learning based on an auto-encoder is shown in fig. 3. It uses an encoder network to obtain the characterization vector of a sample and feeds that vector into a decoder network to regenerate the skeleton point sequence. The encoder and decoder networks are trained by back propagation with a squared error loss function of essentially the following form:

L_ae = (1/B) Σ_{i=1..B} ‖x_i − x̂_i‖²

where B is the batch size, x_i is the original sample, and x̂_i is the sample generated by the decoder. After the encoder network parameters are obtained by training on unlabeled data, they are fine-tuned on labeled data under a supervised learning paradigm. Although this method allows training on unlabeled skeleton point sequence samples, it requires an additional decoder network during training, which makes training inefficient, and the learned characterizations are poorly targeted at classification tasks such as behavior recognition.
Therefore, the prototype contrastive behavior learning of fig. 4 is proposed. Compared with auto-encoder-based methods, it back-propagates the loss function directly to the encoder, and the construction of the loss function makes implicit category information easier to learn, so the learned characterizations perform better on the behavior recognition task; meanwhile, removing the decoder's sequence reconstruction makes training more efficient. As with the auto-encoder-based unsupervised method, after the encoder network parameters are obtained by training on unlabeled data, they are fine-tuned on labeled data under a supervised learning paradigm.
Example 1
This embodiment proposes the prototype contrastive behavior learning shown in fig. 1. Instead of the encoder + decoder auto-encoder structure, it combines sample-level contrastive learning and class-level contrastive learning to train the encoder directly end to end. Specifically, the training process is as follows:
step one, obtaining joint points in bones and corresponding coordinate information, and generating a sample set X ═ Xi};xiIs the sample numbered i.
Step two, transform each sample x_i with at least two random data transformation methods to obtain two enhanced samples x̃_i^(q) and x̃_i^(k).
Step three, input the two groups of enhanced samples into the encoder network respectively to obtain the query vectors q_i and the key vectors k_i.
Step four, search the prototype vector set for the prototype vector with the highest similarity to each key vector, and label it, generating the corresponding similar vector set W_s.
Step five, construct the prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors.
Step six, perform back propagation with the prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
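Under stated assumptions, steps one through six can be sketched end to end in PyTorch. Everything here is illustrative: the encoder is a stand-in linear layer over flattened clips, `random_axis_mask` implements only the coordinate-axis transformation, and the loss uses an InfoNCE-style form, since the translation only preserves L_pc = L_p + L_c.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def random_axis_mask(x: torch.Tensor) -> torch.Tensor:
    # Step two (one random transformation): zero one coordinate axis.
    x = x.clone().view(x.shape[0], -1, 3)
    x[..., torch.randint(0, 3, (1,)).item()] = 0.0
    return x.view(x.shape[0], -1)

def train_step(encoder, prototypes, batch, optimizer, tau=0.1):
    # Step three: encode two random views into query / key vectors.
    q = F.normalize(encoder(random_axis_mask(batch)), dim=1)
    with torch.no_grad():
        k = F.normalize(encoder(random_axis_mask(batch)), dim=1)
    w = F.normalize(prototypes, dim=1)
    # Step four: label each key vector with its most similar prototype.
    s = (k @ w.t()).argmax(dim=1)
    # Step five: contrastive loss pulling q_i toward prototype w_{s_i}
    # (illustrative InfoNCE form, an assumption rather than the exact L_pc).
    loss = F.cross_entropy(q @ w.t() / tau, s)
    # Step six: back-propagate, updating encoder and prototypes jointly.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: clips of T=8 frames x 25 joints x 3 coords, flattened;
# 10 prototype vectors in a 32-dimensional characterization space.
encoder = nn.Linear(8 * 25 * 3, 32)
prototypes = nn.Parameter(torch.randn(10, 32))
opt = torch.optim.SGD(list(encoder.parameters()) + [prototypes], lr=0.01)
batch = torch.randn(4, 8 * 25 * 3)  # step one: a batch of 4 samples
loss_value = train_step(encoder, prototypes, batch, opt)
```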
In a further embodiment, the query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
the key vector is obtained as k_i = f_θ(x̃_i^(k)), where k_i denotes the key vector and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., the cosine similarity of vector a and vector b.
Obtaining the prototype vector with the highest similarity: s_i = argmax_j sim(k_i, w_j), where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
In a further embodiment, the prototype contrast loss function is constructed as L_pc = L_p + L_c, where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
As the random data transformation in step two, we select a superposition of the following two strategies:
1. Linear mapping clipping method: each joint point in the whole skeleton is a three-dimensional coordinate in three-dimensional space. Each joint point is transformed as follows: let x be the three-dimensional vector corresponding to the coordinate and y the new three-dimensional coordinate after transformation; the transformation is y = Sx, where S is a 3 × 3 matrix with ones on the diagonal and off-diagonal (shear) elements obtained by random sampling from [−1, 1].
2. Coordinate axis clipping method: for the three-dimensional coordinates of each joint point in the whole skeleton, one of the three dimensions is randomly selected and set to 0.
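A NumPy sketch of the two transformation strategies (illustrative, not the patent's code; the S matrix follows the description: ones on the diagonal, off-diagonal elements drawn uniformly from [−1, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_mapping_clip(joints: np.ndarray) -> np.ndarray:
    # y = S x for every joint: S has ones on the diagonal and
    # off-diagonal elements sampled uniformly from [-1, 1].
    S = np.eye(3)
    S[~np.eye(3, dtype=bool)] = rng.uniform(-1.0, 1.0, size=6)
    return joints @ S.T  # row-wise application of y = S x

def axis_clip(joints: np.ndarray) -> np.ndarray:
    # Randomly pick one of the three coordinate dimensions and zero it.
    out = joints.copy()
    out[..., rng.integers(0, 3)] = 0.0
    return out

clip = rng.standard_normal((16, 25, 3))  # T=16 frames, 25 joints, 3D coords
augmented = axis_clip(linear_mapping_clip(clip))
print(augmented.shape)  # -> (16, 25, 3)
```

Superposing the two strategies, as the text describes, simply means applying one after the other, so each training view differs both in shape (shear) and in which axis is dropped.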
Based on the above, the encoder network can be trained end to end throughout characterization learning, without requiring label information for the samples and without constructing a sample reconstruction task with a decoder network. The behavior characterization vectors learned this way have the following properties:
(1) the characterization vectors learn invariance to the specified transformations;
(2) samples with similar behavioral semantic information lie as close as possible in the characterization space;
(3) all samples are distributed as uniformly as possible over the characterization space;
(4) the vector clusters formed on the characterization space according to behavioral semantic information are likewise distributed as evenly as possible.
Experiments show that the accuracy obtained by the method is significantly higher than that of other behavior characterization learning methods in the prior art, as shown in fig. 5. As can be seen from fig. 5, the Top-1 accuracy of the method is significantly improved over existing unsupervised behavior characterization learning methods, and its performance can even exceed that of many methods performing supervised characterization learning on labeled data sets. In the figure, S: Hand-Crafted and S: DNN-Based denote several classical supervised learning methods for skeleton-point-based behavior recognition, and SS: Depth Image and SS: Skeleton denote several of the best-performing auto-encoder-based unsupervised behavior characterization learning methods. "Ours" denotes the proposed prototype contrastive behavior learning method.
In another embodiment, a computer system is also disclosed, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps of the method when executing the computer program: obtaining samples from the skeleton to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
Claims (9)
1. A behavior recognition method based on prototype comparison learning is characterized by comprising the following steps:
sampling in the skeletal key point data set to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to each key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
2. The behavior recognition method according to claim 1, wherein the two groups of enhanced samples are defined as x̃_i^(q) and x̃_i^(k);
the query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
3. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., calculating the cosine similarity of vector a and vector b.
4. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the prototype comparison loss function is constructed as follows:
L_pc = L_p + L_c
where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
5. The behavior recognition method based on prototype comparison learning according to claim 1, characterized in that the encoder network and all prototype vectors are trained by the gradient descent method: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
6. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: linear mapping clipping method.
7. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: coordinate axis clipping method.
8. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented by the processor when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111413784.7A CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111413784.7A CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114120447A true CN114120447A (en) | 2022-03-01 |
Family
ID=80373072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111413784.7A Pending CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114120447A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114386079A (en) * | 2022-03-23 | 2022-04-22 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
CN114386079B (en) * | 2022-03-23 | 2022-12-06 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Improved deep hashing with soft pairwise similarity for multi-label image retrieval | |
CN111310707B (en) | Bone-based graph annotation meaning network action recognition method and system | |
CN111079532B (en) | Video content description method based on text self-encoder | |
Ye et al. | Attentive linear transformation for image captioning | |
Li et al. | 2-D stochastic configuration networks for image data analytics | |
Yan et al. | Multibranch attention networks for action recognition in still images | |
CN109948475B (en) | Human body action recognition method based on skeleton features and deep learning | |
CN109783666B (en) | Image scene graph generation method based on iterative refinement | |
Taylor et al. | Learning invariance through imitation | |
CN109447096B (en) | Glance path prediction method and device based on machine learning | |
Li et al. | Multiple VLAD encoding of CNNs for image classification | |
CN109960732B (en) | Deep discrete hash cross-modal retrieval method and system based on robust supervision | |
CN113780249B (en) | Expression recognition model processing method, device, equipment, medium and program product | |
Yuan et al. | Compositional scene representation learning via reconstruction: A survey | |
CN113763385A (en) | Video object segmentation method, device, equipment and medium | |
CN112115744B (en) | Point cloud data processing method and device, computer storage medium and electronic equipment | |
Zhao et al. | Deeply supervised active learning for finger bones segmentation | |
CN114120447A (en) | Behavior recognition method and system based on prototype comparison learning and storage medium | |
Li et al. | Image decomposition with multilabel context: Algorithms and applications | |
Robert | The Role of Deep Learning in Computer Vision | |
Cong et al. | Gradient-Semantic Compensation for Incremental Semantic Segmentation | |
CN114764865A (en) | Data classification model training method, data classification method and device | |
CN116977714A (en) | Image classification method, apparatus, device, storage medium, and program product | |
CN116485962A (en) | Animation generation method and system based on contrast learning | |
WO2023168818A1 (en) | Method and apparatus for determining similarity between video and text, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||