CN113052073A - Meta learning-based few-sample behavior identification method - Google Patents

Meta learning-based few-sample behavior identification method

Info

Publication number
CN113052073A
CN113052073A
Authority
CN
China
Prior art keywords
meta
video
network
inquiry
support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110319209.4A
Other languages
Chinese (zh)
Inventor
陈朋
宗鹏程
党源杰
俞天纬
王海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110319209.4A priority Critical patent/CN113052073A/en
Publication of CN113052073A publication Critical patent/CN113052073A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A few-sample behavior recognition method based on meta-learning comprises the following steps: 1) the video data set is divided into a meta-training set and a meta-test set; several groups of support sets and query sets are drawn from the meta-training set to train the model, and several groups of support sets and query sets are drawn from the meta-test set to test it; 2) video features of the support set and the query set are extracted with a shallow three-dimensional convolutional neural network; 3) a meta-learning network is constructed to model the support set and generate the parameters of the shallow three-dimensional convolutional neural network of step 2); 4) the video features extracted in step 2) undergo a second-order transformation and normalization; 5) the processed support-set and query-set video features are spliced, a multilayer two-dimensional convolutional neural network extracts the nonlinear distance relation between query-set and support-set features, and the query-set videos are classified. The method generalizes well across tasks and achieves high recognition accuracy on new video behaviors.

Description

Meta learning-based few-sample behavior identification method
Technical Field
The invention relates to the technical field of video behavior recognition, and in particular to a few-sample behavior recognition method based on meta-learning.
Background
Behavior recognition is one of the key research topics in computer vision and is widely applied in fields such as urban traffic control and intelligent security.
With the rapid development of network technology and the large-scale deployment of intelligent cameras, video data grows explosively every day. Although advances in deep learning over the last decade have greatly improved the accuracy of video behavior recognition, labeling such massive volumes of video remains very difficult. In addition, videos of specific domains, such as abnormal-behavior scenes or dangerous behaviors in factories, are still scarce. How to train a model with only a small amount of sample data while still obtaining high accuracy has therefore become a focus of research in recent years.
Meta-learning techniques aim to let computers, like humans, learn general empirical knowledge from previous tasks and apply it to new tasks. The training set in meta-learning is composed of tasks rather than individual samples, the goal being to learn knowledge that transfers across tasks; researchers therefore often use meta-learning to address few-sample learning. Compared with static images, video behavior recognition adds a time dimension, so the central research problem is how to extract effective features and strengthen a model's generalization across different tasks while avoiding the overfitting that a deep neural network incurs when samples are few.
Disclosure of Invention
To overcome these problems, the invention provides a meta-learning-based few-sample behavior recognition method that uses a shallow network, can extract video features with different parameters for different video behavior recognition tasks, and has strong generalization capability.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A meta-learning-based few-sample behavior recognition method, comprising the following steps:
1) dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model;
2) extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2);
4) applying a second-order transformation and normalization to the video features extracted in step 2);
5) splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
Further, in step 1), the video data set is divided as follows: the data set is split into a meta-training set D_meta-train and a meta-test set D_meta-test. During training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}. During testing, the same operation is applied to D_meta-test.
Further, in step 3), the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
still further, the shallow three-dimensional convolution network f in the step 2)θExtracting video features using the parameter θ generated by the meta learning network in step 3), the video feature representation being calculated as:
w=fθ(x) (5)
where x is a video clip and x ∈ RC×T×H×WW is a video feature and w is RC'×T'×H'×W'
Still further, in step 4), the video features w extracted in step 2) are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
further, in step 5), the metric relationship between the support set and the query set is found, and the process of classifying the query set is as follows: the video characteristics of the support set and the inquiry set extracted in the step 4) are
Figure BDA0002992472910000035
Make a splice, represented as
Figure BDA0002992472910000036
Inputting the spliced features into a multilayer two-dimensional convolution network to obtain the similarity r, wherein the similarity r is shown as a formula (8):
Figure BDA0002992472910000037
in the formula ri,jIs a value of 0 to 1, and represents the support set video xiAnd query set video xjThe similarity of (2);
finally, the mean square error formula is used as the loss function, as in formula (9):
Figure BDA0002992472910000038
the technical conception of the invention is as follows: the method aims to solve the problems that massive videos are difficult to label and videos of special scenes are difficult to collect in the current society. The invention uses a training mode of meta-learning and uses multi-task training. Meanwhile, adaptive model parameters are adopted for different tasks, so that the video features are extracted more effectively. And performing second-order and normalization preprocessing on the extracted video features, finally splicing the video features of the support set and the query set, and acquiring a nonlinear measurement relation by using a two-dimensional convolution network.
The invention has the following beneficial effects: the method generalizes better across tasks and achieves high recognition accuracy on new video behaviors.
Drawings
FIG. 1 is an overall framework diagram of the model of the present invention.
Fig. 2 is a framework diagram of the meta learning network of the present invention.
Fig. 3 is a flow chart of the meta-learning-based few-sample behavior recognition method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a meta-learning-based few-sample behavior recognition method includes the following steps:
1) Dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model.
The video data set is divided into a meta-training set D_meta-train and a meta-test set D_meta-test. During training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}. During testing, the same operation is applied to D_meta-test.
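The episode construction above can be illustrated with a short Python sketch. This is only a minimal illustration under assumptions: the patent does not prescribe the dataset interface or the values of N, K, or the query-set size used here.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Draw one N-way K-shot episode (support set S and query set Q).

    `dataset` is assumed to be an iterable of (clip, label) pairs; this
    interface and the default sizes are illustrative, not from the patent.
    """
    by_class = defaultdict(list)
    for clip, label in dataset:
        by_class[label].append(clip)

    classes = random.sample(list(by_class), n_way)      # N distinct classes
    support, query = [], []
    for cls in classes:
        clips = random.sample(by_class[cls], k_shot + n_query)
        support += [(c, cls) for c in clips[:k_shot]]   # K samples per class -> S
        query += [(c, cls) for c in clips[k_shot:]]     # remaining samples -> Q
    return support, query
```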
2) Extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network.
3) Constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2). The process is as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
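The following PyTorch sketch illustrates one way formulas (1) to (4) could be realized: a three-dimensional convolutional task encoder E that outputs the mean and log-variance of q(t | S), a reparameterized sample of t, and a single-layer fully connected generator g. The layer widths, the reparameterization trick, and the use of ℓ2 normalization for the regularization of formula (4) are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Encodes the support set into the task distribution q(t | S); sizes are illustrative."""
    def __init__(self, in_ch=3, t_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.mu = nn.Linear(64, t_dim)
        self.logvar = nn.Linear(64, t_dim)

    def forward(self, support):                    # support: (N*K, C, T, H, W)
        h = self.conv(support).flatten(1).mean(0)  # average over the whole support set
        return self.mu(h), self.logvar(h)          # parameters of q(t | S), formula (1)

def sample_task_vector(mu, logvar):
    """t ~ N(mu, sigma^2), formula (2); reparameterization is an assumption here."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

class ParamGenerator(nn.Module):
    """Single-layer fully connected network g: theta = g(t), formula (3)."""
    def __init__(self, t_dim=64, n_params=32 * 3 * 3 * 3 * 3):
        super().__init__()
        self.fc = nn.Linear(t_dim, n_params)       # flat weights for one 32-filter 3x3x3 conv

    def forward(self, t):
        theta = self.fc(t)
        return theta / theta.norm(p=2)             # formula (4), assumed to be l2 normalization
```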
The shallow three-dimensional convolutional network f_θ of step 2) extracts video features using the parameter θ generated by the meta-learning network of step 3); the video feature representation is computed as:

w = f_θ(x)    (5)

where x is a video clip, x ∈ ℝ^{C×T×H×W}, and w is its video feature, w ∈ ℝ^{C′×T′×H′×W′}.
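Applying the generated parameter θ inside f_θ can be done with a functional convolution. The sketch below uses a single generated 3D layer; the patent only states that the network is shallow, so the depth, channel sizes, and kernel size here are assumptions.

```python
import torch
import torch.nn.functional as F

def extract_features(x, theta, out_ch=32, in_ch=3, k=3):
    """w = f_theta(x) with weights produced by the meta-learner (formula (5)).

    x: (B, C, T, H, W) batch of video clips; theta: flat vector from g(t).
    One convolution stands in for the whole shallow network here.
    """
    weight = theta.view(out_ch, in_ch, k, k, k)    # reshape theta into conv weights
    w = F.conv3d(x, weight, padding=k // 2)        # w in R^{C' x T' x H' x W'} per clip
    return F.relu(w)
```

For example, `extract_features(torch.randn(4, 3, 16, 112, 112), theta)` would return a (4, 32, 16, 112, 112) feature tensor under these assumed sizes.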
4) Applying a second-order transformation and normalization to the video features extracted in step 2). The video features w are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
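A sketch of the second-order transformation: the features are reshaped to C′×M and a Gram-matrix statistic is taken. The patent leaves ψ(·) unspecified beyond calling it a normalization function; the signed square root followed by ℓ2 normalization used below is a common choice for second-order features and is only an assumption here.

```python
import torch

def second_order(w):
    """(B, C', T', H', W') video features -> normalized second-order features (B, C', C')."""
    B, C = w.shape[:2]
    w2 = w.reshape(B, C, -1)                              # w' in R^{C' x M}, M = T'*H'*W'
    g = torch.bmm(w2, w2.transpose(1, 2)) / w2.shape[-1]  # second-order statistic w' w'^T / M
    g = torch.sign(g) * torch.sqrt(g.abs() + 1e-12)       # assumed psi, part 1: signed sqrt
    norm = g.flatten(1).norm(dim=1).clamp_min(1e-12)      # assumed psi, part 2: l2 normalize
    return g / norm[:, None, None]
```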
5) Splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
The support-set and query-set video features ŵ^s and ŵ^q extracted in step 4) are spliced, written [ŵ_i^s ; ŵ_j^q], and input into a multilayer two-dimensional convolutional network h to obtain the similarity r, as in formula (8):

r_{i,j} = h([ŵ_i^s ; ŵ_j^q])    (8)

where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j;

finally, the mean square error is used as the loss function, as in formula (9):

L = Σ_{i,j} (r_{i,j} − 1(y_i = y_j))²    (9)

where 1(·) is the indicator that x_i and x_j share a class.
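Finally, a hedged sketch of the relation step: each support/query pair of second-order feature maps is stacked as a two-channel map and scored by a small two-dimensional convolutional network with a sigmoid output, trained with the mean square error of formula (9) against the 0/1 class-match indicator. The channel counts and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationModule(nn.Module):
    """Maps a spliced (support, query) feature pair to a similarity r in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),    # 2 channels: support + query maps
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, support_feat, query_feat):          # each: (C', C')
        pair = torch.stack([support_feat, query_feat])    # splice -> (2, C', C')
        return self.net(pair.unsqueeze(0)).squeeze()      # scalar r_{i,j}

def episode_loss(r_matrix, support_labels, query_labels):
    """MSE of formula (9): target is 1 when the pair shares a class, else 0."""
    target = (support_labels[:, None] == query_labels[None, :]).float()
    return F.mse_loss(r_matrix, target)
```

In a full training loop, every support/query pair in an episode would be scored, the loss averaged over episodes drawn from D_meta-train, and the gradients propagated through the relation module, f_θ, g, and E.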
the embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims (6)

1. A meta-learning based few-sample behavior recognition method, characterized in that the method comprises the following steps:
1) dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model;
2) extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2);
4) applying a second-order transformation and normalization to the video features extracted in step 2);
5) splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
2. The meta-learning based few-sample behavior recognition method of claim 1, wherein in step 1) the division of the video data set comprises: splitting the video data set into a meta-training set D_meta-train and a meta-test set D_meta-test; during training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}; during testing, the same operation is applied to D_meta-test.
3. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein in step 3) the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
4. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein the shallow three-dimensional convolutional network f_θ of step 2) extracts video features using the parameter θ generated by the meta-learning network of step 3); the video feature representation is computed as:

w = f_θ(x)    (5)

where x is a video clip, x ∈ ℝ^{C×T×H×W}, and w is its video feature, w ∈ ℝ^{C′×T′×H′×W′}.
5. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein in step 4) the video features w extracted in step 2) are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
6. The meta-learning based few-sample behavior recognition method of claim 5, wherein in step 5) the metric relation between the support set and the query set is found and the query set is classified as follows: the support-set and query-set video features ŵ^s and ŵ^q extracted in step 4) are spliced, written [ŵ_i^s ; ŵ_j^q], and the spliced features are input into a multilayer two-dimensional convolutional network h to obtain the similarity r, as in formula (8):

r_{i,j} = h([ŵ_i^s ; ŵ_j^q])    (8)

where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j; finally, the mean square error is used as the loss function, as in formula (9):

L = Σ_{i,j} (r_{i,j} − 1(y_i = y_j))²    (9)

where 1(·) is the indicator that x_i and x_j share a class.
CN202110319209.4A 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method Pending CN113052073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319209.4A CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110319209.4A CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Publications (1)

Publication Number Publication Date
CN113052073A 2021-06-29

Family

ID=76515734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319209.4A Pending CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Country Status (1)

Country Link
CN (1) CN113052073A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535953A (en) * 2021-07-15 2021-10-22 湖南大学 Meta learning-based few-sample classification method
CN117077030A (en) * 2023-10-16 2023-11-17 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model
CN117077030B (en) * 2023-10-16 2024-01-26 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination