CN114120447A - Behavior recognition method and system based on prototype comparison learning and storage medium - Google Patents
- Publication number
- CN114120447A (application CN202111413784.7A)
- Authority
- CN
- China
- Prior art keywords
- prototype
- vector
- vectors
- behavior recognition
- samples
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a behavior recognition method and system based on prototype contrastive learning, and a storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: sampling the skeletal key point data set to generate a sample set; transforming each sample with at least two random data transformation methods to obtain two groups of enhanced samples; inputting the two groups of enhanced samples into an encoder network respectively to obtain two groups of characterization vectors; for each characterization vector in one of the groups, searching the prototype vector set for the prototype vector with the highest similarity and labeling it, generating a corresponding similar vector set; constructing a prototype contrast loss function; and performing back propagation with the prototype contrast loss function, training the encoder network and all prototype vectors simultaneously. The method can be trained on skeleton point sequence samples to obtain a model that effectively extracts behavior characterization vectors, without requiring behavior class labels and without relying on an auto-encoder.
Description
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to a behavior recognition method and system based on prototype contrastive learning, and a storage medium.
Background
In behavior recognition tasks, constrained by data volume and algorithms, models based on RGB images are often disturbed by viewpoint changes and complex backgrounds, so their generalization is insufficient and their robustness in practical applications is poor. Behavior recognition based on skeletal point data can address this problem well.
In skeletal point data, the human body is represented by the coordinates of several predefined key joint points in the camera coordinate system. Such data can be conveniently obtained with a depth camera (e.g., Kinect) or various pose estimation algorithms (e.g., OpenPose). Here the key joint points defined by the Kinect depth camera are used: the human body is described by the three-dimensional coordinates of 25 key joint points. Since a behavior is usually recorded as video, a behavior of length T frames can be represented by a tensor of shape T × 25 × 3.
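As a concrete illustration (not part of the patent text), the T × 25 × 3 representation can be sketched with NumPy, using a zero-filled placeholder array in place of real Kinect captures:

```python
import numpy as np

# A behavior clip of T frames, each frame holding the 3D (x, y, z)
# coordinates of the 25 Kinect-defined key joint points.
T = 64
skeleton_clip = np.zeros((T, 25, 3), dtype=np.float32)  # placeholder data

# Frame 0, joint 0 is a single 3D coordinate.
print(skeleton_clip.shape)        # -> (64, 25, 3)
print(skeleton_clip[0, 0].shape)  # -> (3,)
```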
In skeleton-point-based behavior recognition, learning a behavior characterization vector for a skeleton point sequence is the core problem. Most current methods train the model under a supervised learning paradigm: training a behavior recognition model requires a large number of skeleton point data samples annotated with behavior class labels, and obtaining such labels is difficult and costly. Meanwhile, the large number of skeleton point sequence samples without behavior class labels cannot be used effectively during training. Some methods learn behavior characterization vectors with unsupervised models built on an auto-encoder; however, the sample reconstruction task in an auto-encoder is structurally complex and inefficient to learn, and the learned characterization vectors often contain redundant information, so they perform poorly on the behavior recognition task.
Disclosure of Invention
The invention provides a behavior recognition method and system based on prototype comparison learning and a storage medium for solving the technical problems in the background art.
The invention adopts the following technical scheme: a behavior recognition method based on prototype comparison learning comprises the following steps:
acquiring joint points in bones and corresponding coordinate information to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
The query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
the key vector is obtained as k_i = f_θ(x̃_i^(k)), where k_i denotes the key vector and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., the cosine similarity of vector a and vector b.
Obtaining the prototype vector with the highest similarity: s_i = argmax_j sim(k_i, w_j), where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
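A minimal sketch of the similarity computation and prototype lookup (illustrative code, not from the patent; `prototypes` stands in for the learned prototype vector set):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # sim(a, b) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_prototype(key: np.ndarray, prototypes: np.ndarray) -> int:
    # s_i = argmax_j sim(k_i, w_j): index of the most similar prototype
    sims = [cosine_similarity(key, w) for w in prototypes]
    return int(np.argmax(sims))

key = np.array([1.0, 0.0])
prototypes = np.array([[0.0, 1.0],   # orthogonal to the key vector
                       [1.0, 0.1]])  # nearly parallel to the key vector
print(nearest_prototype(key, prototypes))  # -> 1
```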
In a further embodiment, the prototype contrast loss function is constructed as follows:
L_pc = L_p + L_c
where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
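The detailed forms of L_p and L_c do not survive this translation; only the symbols τ, B, j and k are described. A standard prototype-contrastive (InfoNCE-style) instantiation consistent with those symbols — an assumption for illustration, not the patent's verbatim formulas — would be:

```latex
L_p = -\frac{1}{B}\sum_{i=1}^{B}
      \log\frac{\exp\!\big(\mathrm{sim}(q_i, w_{s_i})/\tau\big)}
               {\sum_{k}\exp\!\big(\mathrm{sim}(q_i, w_k)/\tau\big)},
\qquad
L_c = -\frac{1}{B}\sum_{i=1}^{B}
      \log\frac{\exp\!\big(\mathrm{sim}(q_i, k_i)/\tau\big)}
               {\sum_{j=1}^{B}\exp\!\big(\mathrm{sim}(q_i, k_j)/\tau\big)}
```

Here L_p would pull each query vector toward its matched prototype (class level) and L_c toward the key vector of the same sample (sample level), matching the "sample-level plus class-level" combination described in Example 1.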
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
In a further embodiment, the data transformation method comprises: linear mapping clipping method.
In a further embodiment, the data transformation method comprises: coordinate axis clipping method.
A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method as described above when executing the computer program.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
The invention has the following beneficial effects: the prototype contrastive learning behavior recognition method can be trained on skeleton point sequence samples to obtain a model that effectively extracts behavior characterization vectors, without requiring behavior class labels and without relying on an auto-encoder. Experiments show that the method achieves a markedly better characterization learning effect while reducing the amount of computation.
Drawings
FIG. 1 is a flow chart of prototype contrastive learning behavior recognition according to the present invention.
Fig. 2 is a prior art supervised behavior characterization learning diagram.
Fig. 3 is a prior art unsupervised characterization learning diagram based on an auto-encoder.
Fig. 4 is a diagram of prototype contrastive behavior learning according to the present invention.
FIG. 5 is a graph comparing the accuracy of the prototype contrastive behavior learning of the present invention with existing methods.
Detailed Description
The invention is further described below with reference to the drawings and specific embodiments.
As shown in fig. 2, in the conventional supervised learning method, the characterization vector of a sample is first obtained with an encoder network, then fed into a linear fully-connected layer, and the probability of the sample belonging to each class is obtained with a Softmax activation function; back propagation is then performed with the following cross entropy loss function, training the encoder network and the fully-connected layer:

L_ce = −(1/B) Σ_{i=1..B} log p_{i, y_i}

where B is the batch size, y_i is the label of sample i, and p_{i, y_i} is the predicted probability output by the Softmax activation function for that label. Since sample labels must appear in the loss function, unlabeled data cannot be used for characterization learning in this process.

Unsupervised behavior characterization learning based on an auto-encoder is shown in fig. 3. It uses an encoder network to obtain the characterization vector of a sample and feeds that vector into a decoder network to regenerate the skeleton point sequence. The encoder and decoder networks are trained by back propagation with a squared error loss function of essentially the following form:

L_ae = (1/B) Σ_{i=1..B} ‖x_i − x̂_i‖²

where B is the batch size, x_i is the original sample, and x̂_i is the sample generated by the decoder. After the encoder network parameters are obtained by training on unlabeled data, they are fine-tuned on labeled data under a supervised learning paradigm. Although this method allows training on unlabeled skeleton point sequence samples, it requires an additional decoder network during training, which makes training inefficient, and the learned characterizations are poorly targeted at classification tasks such as behavior recognition.
Therefore, the prototype contrastive behavior learning of fig. 4 is proposed. Compared with auto-encoder-based methods, it back-propagates the loss function directly to the encoder, and the construction of the loss function makes implicit category information easier to learn, so the learned characterizations perform better on the behavior recognition task; meanwhile, removing the decoder's sequence reconstruction makes training more efficient. As with the auto-encoder-based unsupervised method, after the encoder network parameters are obtained by training on unlabeled data, they are fine-tuned on labeled data under a supervised learning paradigm.
Example 1
This embodiment proposes the prototype contrastive behavior learning shown in fig. 1. Instead of the encoder + decoder auto-encoder structure, it combines sample-level contrastive learning and class-level contrastive learning to train the encoder directly end to end. Specifically, the training process is as follows:
step one, obtaining joint points in bones and corresponding coordinate information, and generating a sample set X ═ Xi};xiIs the sample numbered i.
Step two, transform each sample x_i with at least two random data transformation methods to obtain two enhanced samples x̃_i^(q) and x̃_i^(k).
Step three, input the two groups of enhanced samples into the encoder network respectively to obtain the query vectors q_i and the key vectors k_i.
Step four, search the prototype vector set for the prototype vector with the highest similarity to each key vector, and label it, generating the corresponding similar vector set W_s.
Step five, construct the prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors.
Step six, perform back propagation with the prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
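Under stated assumptions, steps one through six can be sketched end to end in PyTorch. Everything here is illustrative: the encoder is a stand-in linear layer over flattened clips, `random_axis_mask` implements only the coordinate-axis transformation, and the loss uses an InfoNCE-style form, since the translation only preserves L_pc = L_p + L_c.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

def random_axis_mask(x: torch.Tensor) -> torch.Tensor:
    # Step two (one random transformation): zero one coordinate axis.
    x = x.clone().view(x.shape[0], -1, 3)
    x[..., torch.randint(0, 3, (1,)).item()] = 0.0
    return x.view(x.shape[0], -1)

def train_step(encoder, prototypes, batch, optimizer, tau=0.1):
    # Step three: encode two random views into query / key vectors.
    q = F.normalize(encoder(random_axis_mask(batch)), dim=1)
    with torch.no_grad():
        k = F.normalize(encoder(random_axis_mask(batch)), dim=1)
    w = F.normalize(prototypes, dim=1)
    # Step four: label each key vector with its most similar prototype.
    s = (k @ w.t()).argmax(dim=1)
    # Step five: contrastive loss pulling q_i toward prototype w_{s_i}
    # (illustrative InfoNCE form, an assumption rather than the exact L_pc).
    loss = F.cross_entropy(q @ w.t() / tau, s)
    # Step six: back-propagate, updating encoder and prototypes jointly.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage: clips of T=8 frames x 25 joints x 3 coords, flattened;
# 10 prototype vectors in a 32-dimensional characterization space.
encoder = nn.Linear(8 * 25 * 3, 32)
prototypes = nn.Parameter(torch.randn(10, 32))
opt = torch.optim.SGD(list(encoder.parameters()) + [prototypes], lr=0.01)
batch = torch.randn(4, 8 * 25 * 3)  # step one: a batch of 4 samples
loss_value = train_step(encoder, prototypes, batch, opt)
```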
In a further embodiment, the query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
the key vector is obtained as k_i = f_θ(x̃_i^(k)), where k_i denotes the key vector and i denotes the index of the sample.
In a further embodiment, the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., the cosine similarity of vector a and vector b.
Obtaining the prototype vector with the highest similarity: s_i = argmax_j sim(k_i, w_j), where s_i is the index of the prototype vector with the highest similarity to the key vector of the sample with index i.
In a further embodiment, the prototype contrast loss function is constructed as L_pc = L_p + L_c, where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
In a further embodiment, the encoder network and all prototype vectors are trained by gradient descent: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
As the random data transformation in step two, we select a superposition of the following two strategies:
1. Linear mapping clipping method: each joint point in the whole skeleton is a three-dimensional coordinate in three-dimensional space. Each joint point is transformed as follows: let x be the three-dimensional vector corresponding to the coordinate and y the new three-dimensional coordinate after transformation; the transformation is y = Sx, where S is a 3 × 3 matrix with ones on the diagonal and off-diagonal (shear) elements obtained by random sampling from [−1, 1].
2. Coordinate axis clipping method: for the three-dimensional coordinates of each joint point in the whole skeleton, one of the three dimensions is randomly selected and set to 0.
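A NumPy sketch of the two transformation strategies (illustrative, not the patent's code; the S matrix follows the description: ones on the diagonal, off-diagonal elements drawn uniformly from [−1, 1]):

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_mapping_clip(joints: np.ndarray) -> np.ndarray:
    # y = S x for every joint: S has ones on the diagonal and
    # off-diagonal elements sampled uniformly from [-1, 1].
    S = np.eye(3)
    S[~np.eye(3, dtype=bool)] = rng.uniform(-1.0, 1.0, size=6)
    return joints @ S.T  # row-wise application of y = S x

def axis_clip(joints: np.ndarray) -> np.ndarray:
    # Randomly pick one of the three coordinate dimensions and zero it.
    out = joints.copy()
    out[..., rng.integers(0, 3)] = 0.0
    return out

clip = rng.standard_normal((16, 25, 3))  # T=16 frames, 25 joints, 3D coords
augmented = axis_clip(linear_mapping_clip(clip))
print(augmented.shape)  # -> (16, 25, 3)
```

Superposing the two strategies, as the text describes, simply means applying one after the other, so each training view differs both in shape (shear) and in which axis is dropped.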
Based on the above, the encoder network can be trained end to end throughout characterization learning, without requiring label information for the samples and without constructing a sample reconstruction task with a decoder network. The behavior characterization vectors learned this way have the following properties:
(1) the characterization vectors learn invariance to the specified transformations;
(2) samples with similar behavioral semantic information lie as close as possible in the characterization space;
(3) all samples are distributed as uniformly as possible over the characterization space;
(4) the vector clusters formed on the characterization space according to behavioral semantic information are likewise distributed as evenly as possible.
Experiments show that the accuracy obtained by the method is significantly higher than that of other behavior characterization learning methods in the prior art, as shown in fig. 5. As can be seen from fig. 5, the Top-1 accuracy of the method is significantly improved over existing unsupervised behavior characterization learning methods, and its performance can even exceed that of many methods performing supervised characterization learning on labeled data sets. In the figure, S: Hand-Crafted and S: DNN-Based denote several classical supervised learning methods for skeleton-point-based behavior recognition, and SS: Depth Image and SS: Skeleton denote several of the best-performing auto-encoder-based unsupervised behavior characterization learning methods. "Ours" denotes the proposed prototype contrastive behavior learning method.
In another embodiment, a computer system is also disclosed, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the following steps of the method when executing the computer program: obtaining samples from the skeleton to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to the key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method as set forth above.
Claims (9)
1. A behavior recognition method based on prototype comparison learning is characterized by comprising the following steps:
sampling in the skeletal key point data set to generate a sample set;
converting the samples by adopting at least two random data conversion methods to obtain two groups of enhanced samples;
inputting the two groups of enhanced samples into a coding network respectively to obtain a query vector and a key vector;
searching the prototype vector with the highest similarity to each key vector in the prototype vector set, and labeling to generate a corresponding similar vector set;
constructing a prototype contrast loss function L_pc based on the prototype vectors in the similar vector set, the query vectors and the key vectors;
performing back propagation with said prototype contrast loss function L_pc, training the encoder network and all prototype vectors simultaneously.
2. The behavior recognition method according to claim 1, wherein the two groups of enhanced samples are defined as x̃_i^(q) and x̃_i^(k);
the query vector is obtained as q_i = f_θ(x̃_i^(q)), where q_i denotes the query vector, f_θ is the encoder neural network, and θ is the network parameter, i.e., the object to be trained;
3. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the similar vector set is obtained as follows:
similarity calculation: sim(a, b) = (a · b) / (‖a‖ ‖b‖), i.e., calculating the cosine similarity of vector a and vector b.
4. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the prototype comparison loss function is constructed as follows:
L_pc = L_p + L_c
where τ is a temperature parameter taking a value in the interval (0, 1), B is the number of samples in a batch, j denotes the index of a sample, and k denotes the index of a prototype vector.
5. The behavior recognition method based on prototype comparison learning according to claim 1, characterized in that the encoder network and all prototype vectors are trained by the gradient descent method: θ ← θ − α·∂L_pc/∂θ, w ← w − α·∂L_pc/∂w, where α is the learning rate of the gradient descent method and is set according to the training situation.
6. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: linear mapping clipping method.
7. The behavior recognition method based on prototype comparison learning according to claim 1, wherein the data transformation method comprises: coordinate axis clipping method.
8. A computer system comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method according to any of claims 1 to 7 are implemented by the processor when executing the computer program.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111413784.7A CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111413784.7A CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114120447A true CN114120447A (en) | 2022-03-01 |
Family
ID=80373072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111413784.7A Pending CN114120447A (en) | 2021-11-25 | 2021-11-25 | Behavior recognition method and system based on prototype comparison learning and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114120447A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114386079A (en) * | 2022-03-23 | 2022-04-22 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
CN114386079B (en) * | 2022-03-23 | 2022-12-06 | 清华大学 | Encrypted traffic classification method and device based on contrast learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhang et al. | Improved deep hashing with soft pairwise similarity for multi-label image retrieval | |
CN111310707B (en) | Bone-based graph annotation meaning network action recognition method and system | |
CN111079532B (en) | Video content description method based on text self-encoder | |
Ye et al. | Attentive linear transformation for image captioning | |
Li et al. | 2-D stochastic configuration networks for image data analytics | |
Yan et al. | Multibranch attention networks for action recognition in still images | |
CN109948475B (en) | Human body action recognition method based on skeleton features and deep learning | |
CN109783666B (en) | Image scene graph generation method based on iterative refinement | |
Taylor et al. | Learning invariance through imitation | |
CN109447096B (en) | Glance path prediction method and device based on machine learning | |
Li et al. | Multiple VLAD encoding of CNNs for image classification | |
CN109960732B (en) | Deep discrete hash cross-modal retrieval method and system based on robust supervision | |
CN113780249B (en) | Expression recognition model processing method, device, equipment, medium and program product | |
Yuan et al. | Compositional scene representation learning via reconstruction: A survey | |
CN113763385A (en) | Video object segmentation method, device, equipment and medium | |
CN112115744B (en) | Point cloud data processing method and device, computer storage medium and electronic equipment | |
Zhao et al. | Deeply supervised active learning for finger bones segmentation | |
CN114120447A (en) | Behavior recognition method and system based on prototype comparison learning and storage medium | |
Li et al. | Image decomposition with multilabel context: Algorithms and applications | |
Robert | The Role of Deep Learning in Computer Vision | |
Cong et al. | Gradient-Semantic Compensation for Incremental Semantic Segmentation | |
CN114764865A (en) | Data classification model training method, data classification method and device | |
CN116977714A (en) | Image classification method, apparatus, device, storage medium, and program product | |
CN116485962A (en) | Animation generation method and system based on contrast learning | |
WO2023168818A1 (en) | Method and apparatus for determining similarity between video and text, electronic device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||