CN113052073A - Meta learning-based few-sample behavior identification method - Google Patents

Meta learning-based few-sample behavior identification method

Info

Publication number
CN113052073A
CN113052073A
Authority
CN
China
Prior art keywords
meta
video
network
inquiry
support
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110319209.4A
Other languages
Chinese (zh)
Inventor
陈朋
宗鹏程
党源杰
俞天纬
王海霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110319209.4A priority Critical patent/CN113052073A/en
Publication of CN113052073A publication Critical patent/CN113052073A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A few-sample behavior recognition method based on meta-learning comprises the following steps: 1) the video data set is divided into a meta-training set and a meta-test set; several groups of support sets and query sets are drawn from the meta-training set to train the model, and several groups of support sets and query sets are drawn from the meta-test set to test it; 2) video features of the support set and the query set are extracted with a shallow three-dimensional convolutional neural network; 3) a meta-learning network is constructed to model the support set and generate the parameters of the shallow three-dimensional convolutional neural network of step 2); 4) the video features extracted in step 2) undergo a second-order transformation and normalization; 5) the processed support-set and query-set video features are spliced, a multilayer two-dimensional convolutional neural network extracts the nonlinear distance relation between query-set and support-set features, and the query-set videos are classified. The method generalizes well across tasks and achieves high recognition accuracy on new video behaviors.

Description

Meta learning-based few-sample behavior identification method
Technical Field
The invention relates to the technical field of video behavior recognition, and in particular to a few-sample behavior recognition method based on meta-learning.
Background
Behavior recognition is one of the key research topics in computer vision and is widely applied in fields such as urban traffic control and intelligent security.
With the rapid development of network technology and the large-scale deployment of intelligent cameras, video data grows explosively every day. Although advances in deep learning over the last decade have greatly improved the accuracy of video behavior recognition, labeling such massive volumes of video remains very difficult. In addition, videos of specific domains, such as abnormal-behavior scenes or dangerous behaviors in factories, are still scarce. How to train a model with only a small amount of sample data while still obtaining high accuracy has therefore become a focus of research in recent years.
Meta-learning techniques aim to let computers, like humans, learn general empirical knowledge from previous tasks and apply it to new tasks. The training set in meta-learning is composed of tasks rather than individual samples, the goal being to learn knowledge that transfers across tasks; researchers therefore often use meta-learning to address few-sample learning. Compared with static images, video behavior recognition adds a time dimension, so the central research problem is how to extract effective features and strengthen a model's generalization across different tasks while avoiding the overfitting that a deep neural network incurs when samples are few.
Disclosure of Invention
To overcome these problems, the invention provides a meta-learning-based few-sample behavior recognition method that uses a shallow network, can extract video features with different parameters for different video behavior recognition tasks, and has strong generalization capability.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A meta-learning-based few-sample behavior recognition method, comprising the following steps:
1) dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model;
2) extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2);
4) applying a second-order transformation and normalization to the video features extracted in step 2);
5) splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
Further, in step 1), the video data set is divided as follows: the data set is split into a meta-training set D_meta-train and a meta-test set D_meta-test. During training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}. During testing, the same operation is applied to D_meta-test.
Further, in step 3), the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
still further, the shallow three-dimensional convolution network f in the step 2)θExtracting video features using the parameter θ generated by the meta learning network in step 3), the video feature representation being calculated as:
w=fθ(x) (5)
where x is a video clip and x ∈ RC×T×H×WW is a video feature and w is RC'×T'×H'×W'
Still further, in step 4), the video features w extracted in step 2) are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
further, in step 5), the metric relationship between the support set and the query set is found, and the process of classifying the query set is as follows: the video characteristics of the support set and the inquiry set extracted in the step 4) are
Figure BDA0002992472910000035
Make a splice, represented as
Figure BDA0002992472910000036
Inputting the spliced features into a multilayer two-dimensional convolution network to obtain the similarity r, wherein the similarity r is shown as a formula (8):
Figure BDA0002992472910000037
in the formula ri,jIs a value of 0 to 1, and represents the support set video xiAnd query set video xjThe similarity of (2);
finally, the mean square error formula is used as the loss function, as in formula (9):
Figure BDA0002992472910000038
the technical conception of the invention is as follows: the method aims to solve the problems that massive videos are difficult to label and videos of special scenes are difficult to collect in the current society. The invention uses a training mode of meta-learning and uses multi-task training. Meanwhile, adaptive model parameters are adopted for different tasks, so that the video features are extracted more effectively. And performing second-order and normalization preprocessing on the extracted video features, finally splicing the video features of the support set and the query set, and acquiring a nonlinear measurement relation by using a two-dimensional convolution network.
The invention has the following beneficial effects: the method generalizes better across tasks and achieves high recognition accuracy on new video behaviors.
Drawings
FIG. 1 is an overall framework diagram of the model of the present invention.
Fig. 2 is a framework diagram of the meta learning network of the present invention.
Fig. 3 is a flow chart of the meta-learning-based few-sample behavior recognition method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a meta-learning-based few-sample behavior recognition method includes the following steps:
1) Dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model.
The video data set is divided into a meta-training set D_meta-train and a meta-test set D_meta-test. During training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}. During testing, the same operation is applied to D_meta-test.
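The episode construction above can be illustrated with a short Python sketch. This is only a minimal illustration under assumptions: the patent does not prescribe the dataset interface or the values of N, K, or the query-set size used here.

```python
import random
from collections import defaultdict

def sample_episode(dataset, n_way=5, k_shot=1, n_query=5):
    """Draw one N-way K-shot episode (support set S and query set Q).

    `dataset` is assumed to be an iterable of (clip, label) pairs; this
    interface and the default sizes are illustrative, not from the patent.
    """
    by_class = defaultdict(list)
    for clip, label in dataset:
        by_class[label].append(clip)

    classes = random.sample(list(by_class), n_way)      # N distinct classes
    support, query = [], []
    for cls in classes:
        clips = random.sample(by_class[cls], k_shot + n_query)
        support += [(c, cls) for c in clips[:k_shot]]   # K samples per class -> S
        query += [(c, cls) for c in clips[k_shot:]]     # remaining samples -> Q
    return support, query
```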
2) Extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network.
3) Constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2). The process is as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
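The following PyTorch sketch illustrates one way formulas (1) to (4) could be realized: a three-dimensional convolutional task encoder E that outputs the mean and log-variance of q(t | S), a reparameterized sample of t, and a single-layer fully connected generator g. The layer widths, the reparameterization trick, and the use of ℓ2 normalization for the regularization of formula (4) are assumptions for illustration, not details fixed by the patent.

```python
import torch
import torch.nn as nn

class TaskEncoder(nn.Module):
    """Encodes the support set into the task distribution q(t | S); sizes are illustrative."""
    def __init__(self, in_ch=3, t_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.mu = nn.Linear(64, t_dim)
        self.logvar = nn.Linear(64, t_dim)

    def forward(self, support):                    # support: (N*K, C, T, H, W)
        h = self.conv(support).flatten(1).mean(0)  # average over the whole support set
        return self.mu(h), self.logvar(h)          # parameters of q(t | S), formula (1)

def sample_task_vector(mu, logvar):
    """t ~ N(mu, sigma^2), formula (2); reparameterization is an assumption here."""
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

class ParamGenerator(nn.Module):
    """Single-layer fully connected network g: theta = g(t), formula (3)."""
    def __init__(self, t_dim=64, n_params=32 * 3 * 3 * 3 * 3):
        super().__init__()
        self.fc = nn.Linear(t_dim, n_params)       # flat weights for one 32-filter 3x3x3 conv

    def forward(self, t):
        theta = self.fc(t)
        return theta / theta.norm(p=2)             # formula (4), assumed to be l2 normalization
```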
The shallow three-dimensional convolutional network f_θ of step 2) extracts video features using the parameter θ generated by the meta-learning network of step 3); the video feature representation is computed as:

w = f_θ(x)    (5)

where x is a video clip, x ∈ ℝ^{C×T×H×W}, and w is its video feature, w ∈ ℝ^{C′×T′×H′×W′}.
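Applying the generated parameter θ inside f_θ can be done with a functional convolution. The sketch below uses a single generated 3D layer; the patent only states that the network is shallow, so the depth, channel sizes, and kernel size here are assumptions.

```python
import torch
import torch.nn.functional as F

def extract_features(x, theta, out_ch=32, in_ch=3, k=3):
    """w = f_theta(x) with weights produced by the meta-learner (formula (5)).

    x: (B, C, T, H, W) batch of video clips; theta: flat vector from g(t).
    One convolution stands in for the whole shallow network here.
    """
    weight = theta.view(out_ch, in_ch, k, k, k)    # reshape theta into conv weights
    w = F.conv3d(x, weight, padding=k // 2)        # w in R^{C' x T' x H' x W'} per clip
    return F.relu(w)
```

For example, `extract_features(torch.randn(4, 3, 16, 112, 112), theta)` would return a (4, 32, 16, 112, 112) feature tensor under these assumed sizes.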
4) Applying a second-order transformation and normalization to the video features extracted in step 2). The video features w are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
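A sketch of the second-order transformation: the features are reshaped to C′×M and a Gram-matrix statistic is taken. The patent leaves ψ(·) unspecified beyond calling it a normalization function; the signed square root followed by ℓ2 normalization used below is a common choice for second-order features and is only an assumption here.

```python
import torch

def second_order(w):
    """(B, C', T', H', W') video features -> normalized second-order features (B, C', C')."""
    B, C = w.shape[:2]
    w2 = w.reshape(B, C, -1)                              # w' in R^{C' x M}, M = T'*H'*W'
    g = torch.bmm(w2, w2.transpose(1, 2)) / w2.shape[-1]  # second-order statistic w' w'^T / M
    g = torch.sign(g) * torch.sqrt(g.abs() + 1e-12)       # assumed psi, part 1: signed sqrt
    norm = g.flatten(1).norm(dim=1).clamp_min(1e-12)      # assumed psi, part 2: l2 normalize
    return g / norm[:, None, None]
```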
5) Splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
The support-set and query-set video features ŵ^s and ŵ^q extracted in step 4) are spliced, written [ŵ_i^s ; ŵ_j^q], and input into a multilayer two-dimensional convolutional network h to obtain the similarity r, as in formula (8):

r_{i,j} = h([ŵ_i^s ; ŵ_j^q])    (8)

where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j;

finally, the mean square error is used as the loss function, as in formula (9):

L = Σ_{i,j} (r_{i,j} − 1(y_i = y_j))²    (9)

where 1(·) is the indicator that x_i and x_j share a class.
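Finally, a hedged sketch of the relation step: each support/query pair of second-order feature maps is stacked as a two-channel map and scored by a small two-dimensional convolutional network with a sigmoid output, trained with the mean square error of formula (9) against the 0/1 class-match indicator. The channel counts and depth are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationModule(nn.Module):
    """Maps a spliced (support, query) feature pair to a similarity r in (0, 1)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),    # 2 channels: support + query maps
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1), nn.Sigmoid(),
        )

    def forward(self, support_feat, query_feat):          # each: (C', C')
        pair = torch.stack([support_feat, query_feat])    # splice -> (2, C', C')
        return self.net(pair.unsqueeze(0)).squeeze()      # scalar r_{i,j}

def episode_loss(r_matrix, support_labels, query_labels):
    """MSE of formula (9): target is 1 when the pair shares a class, else 0."""
    target = (support_labels[:, None] == query_labels[None, :]).float()
    return F.mse_loss(r_matrix, target)
```

In a full training loop, every support/query pair in an episode would be scored, the loss averaged over episodes drawn from D_meta-train, and the gradients propagated through the relation module, f_θ, g, and E.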
the embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.

Claims (6)

1. A meta-learning based few-sample behavior recognition method, characterized in that the method comprises the following steps:
1) dividing the video data set into a meta-training set and a meta-test set, extracting several groups of support sets and query sets from the meta-training set to train the model, and extracting several groups of support sets and query sets from the meta-test set to test the model;
2) extracting video features of the support set and the query set with a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network that models the support set and generates the parameters of the shallow three-dimensional convolutional neural network of step 2);
4) applying a second-order transformation and normalization to the video features extracted in step 2);
5) splicing the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features with a multilayer two-dimensional convolutional neural network, and classifying the query-set videos.
2. The meta-learning based few-sample behavior recognition method of claim 1, wherein in step 1) the division of the video data set comprises: splitting the video data set into a meta-training set D_meta-train and a meta-test set D_meta-test; during training, each round randomly draws N different classes from D_meta-train, each class containing K different samples, which form the support set S = {(x_i, y_i)}, i = 1, …, N×K; further samples of the same N classes are then randomly drawn from the remainder of D_meta-train to form the query set Q = {(x_j, y_j)}; during testing, the same operation is applied to D_meta-test.
3. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein in step 3) the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set S is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model and a task feature vector t is obtained, as in formulas (1) and (2):

q(t | S) = 𝒩(t; μ, σ²), with (μ, σ) = E(S)    (1)

t ~ q(t | S)    (2)

where q is a conditional probability distribution and 𝒩 is a normal distribution;

then a single-layer fully connected neural network g generates the network parameter θ of the three-dimensional convolution of step 2), as in formula (3):

θ = g(t)    (3)

finally, the network parameters obtained from formula (3) are regularized (formula (4)).
4. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein the shallow three-dimensional convolutional network f_θ of step 2) extracts video features using the parameter θ generated by the meta-learning network of step 3); the video feature representation is computed as:

w = f_θ(x)    (5)

where x is a video clip, x ∈ ℝ^{C×T×H×W}, and w is its video feature, w ∈ ℝ^{C′×T′×H′×W′}.
5. The meta-learning based few-sample behavior recognition method of claim 1 or 2, wherein in step 4) the video features w extracted in step 2) are processed as follows: first, w ∈ ℝ^{C′×T′×H′×W′} is reshaped to w′ ∈ ℝ^{C′×M} with M = T′×H′×W′, and the second-order feature ŵ is obtained as:

ŵ = ψ(w′w′ᵀ)    (6)

where ψ(·) is a normalization function (formula (7)).
6. The meta-learning based few-sample behavior recognition method of claim 5, wherein in step 5) the metric relation between the support set and the query set is found and the query set is classified as follows: the support-set and query-set video features ŵ^s and ŵ^q extracted in step 4) are spliced, written [ŵ_i^s ; ŵ_j^q], and the spliced features are input into a multilayer two-dimensional convolutional network h to obtain the similarity r, as in formula (8):

r_{i,j} = h([ŵ_i^s ; ŵ_j^q])    (8)

where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j; finally, the mean square error is used as the loss function, as in formula (9):

L = Σ_{i,j} (r_{i,j} − 1(y_i = y_j))²    (9)

where 1(·) is the indicator that x_i and x_j share a class.
CN202110319209.4A 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method Pending CN113052073A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110319209.4A CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110319209.4A CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Publications (1)

Publication Number Publication Date
CN113052073A 2021-06-29

Family

ID=76515734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110319209.4A Pending CN113052073A (en) 2021-03-25 2021-03-25 Meta learning-based few-sample behavior identification method

Country Status (1)

Country Link
CN (1) CN113052073A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113535953A (en) * 2021-07-15 2021-10-22 湖南大学 Meta learning-based few-sample classification method
CN117077030A (en) * 2023-10-16 2023-11-17 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model
CN117077030B (en) * 2023-10-16 2024-01-26 易停车物联网科技(成都)有限公司 Few-sample video stream classification method and system for generating model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination