CN113052073A - Meta learning-based few-sample behavior identification method - Google Patents
- Publication number: CN113052073A
- Application number: CN202110319209.4A
- Authority
- CN
- China
- Prior art keywords
- meta
- video
- network
- inquiry
- support
- Prior art date: 2021-03-25
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
A meta-learning-based few-sample behavior recognition method comprises the following steps: 1) the video data set is divided into a meta-training set and a meta-test set, and a plurality of groups of support sets and query sets are extracted from the meta-training set for training the model and from the meta-test set for testing the model; 2) video features of the support set and the query set are extracted by a shallow three-dimensional convolutional neural network; 3) a meta-learning network is constructed to model the support set and generate the parameters of the shallow three-dimensional convolutional neural network in step 2); 4) second-order transformation and normalization are applied to the video features extracted in step 2); 5) the processed support-set and query-set video features are concatenated, a multi-layer two-dimensional convolutional neural network extracts the nonlinear distance relation between the query-set and support-set video features, and the query videos are classified. The method has good generalization capability across tasks and high recognition accuracy on new video behaviors.
Description
Technical Field
The invention relates to the technical field of video behavior recognition, and in particular to a meta-learning-based few-sample behavior recognition method.
Background
Behavior recognition is one of the key research topics in computer vision and is widely applied in fields such as urban traffic control and intelligent security.
With the rapid development of network technology and the large-scale deployment of intelligent cameras, video data grows explosively every day. Although advances in deep learning over the last decade have greatly improved the accuracy of video behavior recognition, labeling such massive video data is extremely laborious. In addition, videos of specific domains, such as abnormal-behavior scenes or dangerous behaviors in factories, remain difficult to collect. How to train a model with only a small amount of sample data while still achieving high accuracy has therefore become a focus of research in recent years.
Meta-learning aims to let computers, like humans, learn common empirical knowledge from previous tasks and apply it to new tasks. The training set in meta-learning is composed of tasks rather than individual samples, the goal being to learn general knowledge, and it is therefore often used to address few-sample learning. Compared with static images, video behavior recognition adds a time dimension, so the key research problem is how to extract effective features and strengthen the generalization ability of the model across different tasks while avoiding the overfitting that comes with deep neural networks.
Disclosure of Invention
To overcome these problems, the invention provides a meta-learning-based few-sample behavior recognition method that uses a shallow network, extracts video features with different parameters for different video behavior recognition tasks, and has strong generalization capability.
The technical solution adopted by the invention to solve the above technical problems is as follows:
A meta-learning-based few-sample behavior recognition method, the method comprising the following steps:
1) dividing a video data set into a meta-training set and a meta-test set, extracting a plurality of groups of support sets and query sets from the meta-training set for training the model, and extracting a plurality of groups of support sets and query sets from the meta-test set for testing the model;
2) extracting video features of the support set and the query set by using a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network for modeling the support set and generating the parameters of the shallow three-dimensional convolutional neural network in step 2);
4) performing second-order transformation and normalization on the video features extracted in step 2);
5) concatenating the processed support-set and query-set video features, extracting a nonlinear distance relation between the query-set and support-set video features by using a multi-layer two-dimensional convolutional neural network, and classifying the query videos.
Further, in step 1), the video data set is divided as follows: the video data set is partitioned into a meta-training set D_meta-train and a meta-test set D_meta-test; during training, each episode randomly draws N different classes from D_meta-train, with K different samples per class, to form a support set, and samples of the same N classes are then randomly drawn from the remaining data in D_meta-train to form a query set; during testing, the same operation is applied to D_meta-test.
Further, in step 3), the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model, and a task feature vector t is obtained, as shown in formulas (1) and (2);
a single-layer fully-connected neural network g then generates the parameters θ of the three-dimensional convolution in step 2), as in formula (3):
θ = g(t)   (3)
the network parameters obtained from formula (3) are then regularized, as shown in formula (4).
Still further, the shallow three-dimensional convolutional network f_θ in step 2) extracts video features using the parameters θ generated by the meta-learning network in step 3), the video features being computed as:
w = f_θ(x)   (5)
where x ∈ R^(C×T×H×W) is a video clip and w ∈ R^(C'×T'×H'×W') is the corresponding video feature.
Still further, in step 4), the video features w extracted in step 2) are processed as follows: first, w ∈ R^(C'×T'×H'×W') is reshaped into w' ∈ R^(C'×M) with M = T'×H'×W', and the second-order feature is then obtained as in formula (6),
where ψ(·) is a normalization function given by formula (7).
Further, in step 5), the metric relationship between the support set and the query set is obtained and the query set is classified as follows: the support-set and query-set video features extracted in step 4) are concatenated, and the concatenated features are input into a multi-layer two-dimensional convolutional network to obtain the similarity r, as shown in formula (8),
where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j;
finally, mean square error is used as the loss function, as in formula (9).
the technical conception of the invention is as follows: the method aims to solve the problems that massive videos are difficult to label and videos of special scenes are difficult to collect in the current society. The invention uses a training mode of meta-learning and uses multi-task training. Meanwhile, adaptive model parameters are adopted for different tasks, so that the video features are extracted more effectively. And performing second-order and normalization preprocessing on the extracted video features, finally splicing the video features of the support set and the query set, and acquiring a nonlinear measurement relation by using a two-dimensional convolution network.
The invention has the beneficial effects that: the method has better inter-task generalization capability and high identification accuracy of new video behaviors.
Drawings
FIG. 1 is an overall framework diagram of the model of the present invention.
Fig. 2 is a framework diagram of the meta learning network of the present invention.
Fig. 3 is a flow chart of the meta-learning-based few-sample behavior recognition method.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to fig. 1 and 2, a method for identifying a few-sample behavior based on meta-learning includes the following steps:
1) Dividing a video data set into a meta-training set and a meta-test set, extracting a plurality of groups of support sets and query sets from the meta-training set for training the model, and extracting a plurality of groups of support sets and query sets from the meta-test set for testing the model.
The video data set is partitioned into a meta-training set D_meta-train and a meta-test set D_meta-test. During training, each episode randomly draws N different classes from D_meta-train, with K different samples per class, to form a support set; samples of the same N classes are then randomly drawn from the remaining data in D_meta-train to form a query set. During testing, the same operation is applied to D_meta-test.
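The episode construction described above can be sketched as follows (a minimal illustration assuming PyTorch-style video tensors; the function name sample_episode and the videos_by_class dictionary are illustrative and not part of the patent):

```python
# Minimal N-way K-shot episode sampling sketch. videos_by_class is assumed to map
# each class label to a list of preprocessed video clips (tensors of shape C x T x H x W).
import random

def sample_episode(videos_by_class, n_way=5, k_shot=1, q_per_class=1):
    """Draw one episode: a support set of N*K clips and a query set of N*Q clips."""
    classes = random.sample(sorted(videos_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        clips = random.sample(videos_by_class[cls], k_shot + q_per_class)
        support += [(clip, label) for clip in clips[:k_shot]]
        query += [(clip, label) for clip in clips[k_shot:]]
    return support, query

# During meta-training, episodes are drawn from D_meta-train in this way;
# during meta-testing, the same routine is applied to D_meta-test.
```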
2) Extracting video features of the support set and the query set by using a shallow three-dimensional convolutional neural network.
3) Constructing a meta-learning network for modeling the support set and generating the parameters of the shallow three-dimensional convolutional neural network in step 2). The process is as follows: the support set is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model, and a task feature vector t is obtained, as shown in formulas (1) and (2).
A single-layer fully-connected neural network g then generates the parameters θ of the three-dimensional convolution in step 2), as in formula (3):
θ = g(t)   (3)
The network parameters obtained from formula (3) are then regularized, as shown in formula (4).
the shallow three-dimensional convolution network f in the step 2)θExtracting video features using the parameter θ generated by the meta learning network in step 3), the video feature representation being calculated as:
w=fθ(x) (5)
where x is a video clip and x ∈ RC×T×H×WW is a video feature and w is RC'×T'×H'×W'。
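A compact sketch of steps 2) and 3) is given below, assuming PyTorch. Reducing the task encoder E to one 3D convolution with global pooling, pooling the support clips into a mean task embedding, L2-normalizing θ, and using a single generated conv3d kernel for f_θ are all simplifying assumptions; the patent's formulas (1), (2) and (4) are not reproduced here, and all class and function names are illustrative.

```python
# Sketch: a task encoder E summarizes the support set into a task vector t,
# a single-layer fully-connected network g maps t to the extractor parameters
# theta (theta = g(t), formula (3)), and f_theta is evaluated with these
# externally generated parameters (w = f_theta(x), formula (5)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaLearner(nn.Module):
    def __init__(self, task_dim=128, theta_dim=64 * 3 * 3 * 3 * 3):
        super().__init__()
        self.encoder = nn.Sequential(                      # task encoder E
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, task_dim))
        self.generator = nn.Linear(task_dim, theta_dim)    # single-layer FC g

    def forward(self, support_clips):                      # (N*K, 3, T, H, W)
        t = self.encoder(support_clips).mean(dim=0)        # task feature vector t
        theta = self.generator(t)                          # theta = g(t)
        return F.normalize(theta, dim=0)                   # regularized parameters

def extract_features(theta, clips):
    """f_theta: apply the generated parameters to a batch of clips (B, 3, T, H, W)."""
    weight = theta.view(64, 3, 3, 3, 3)                    # reshape theta into a conv3d kernel
    return F.relu(F.conv3d(clips, weight, padding=1))      # w in R^(C' x T' x H' x W')
```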
4) Performing second-order transformation and normalization on the video features extracted in step 2).
The video features w extracted in step 2) are processed as follows: first, w ∈ R^(C'×T'×H'×W') is reshaped into w' ∈ R^(C'×M) with M = T'×H'×W', and the second-order feature is then obtained as in formula (6),
where ψ(·) is a normalization function given by formula (7).
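Since formulas (6) and (7) are not reproduced above, the sketch below substitutes a common choice for second-order pooling, a Gram matrix of the reshaped features followed by signed square-root and L2 normalization; this is an assumption, not the patent's exact definition.

```python
# Second-order feature sketch: reshape w to (C', M), form a C' x C' second-order
# matrix, and normalize it (signed square root + L2 used here as stand-ins for psi).
import torch
import torch.nn.functional as F

def second_order(w):
    """w: (B, C', T', H', W') features from f_theta -> (B, C', C') normalized second-order features."""
    b, c = w.shape[0], w.shape[1]
    w_flat = w.reshape(b, c, -1)                              # w' in R^(C' x M)
    gram = torch.bmm(w_flat, w_flat.transpose(1, 2)) / w_flat.shape[-1]
    gram = torch.sign(gram) * torch.sqrt(torch.abs(gram))     # signed square root
    return F.normalize(gram.reshape(b, -1), dim=1).reshape(b, c, c)
```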
5) Concatenating the processed support-set and query-set video features, extracting the nonlinear distance relation between the query-set and support-set video features by using a multi-layer two-dimensional convolutional neural network, and classifying the query videos.
The support-set and query-set video features extracted in step 4) are concatenated, and the concatenated features are input into a multi-layer two-dimensional convolutional network to obtain the similarity r, as shown in formula (8),
where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j;
finally, mean square error is used as the loss function, as in formula (9).
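A sketch of the relation module and loss is given below, assuming PyTorch; the layer sizes, the pairing scheme, and the one-hot match targets for the mean-square-error loss of formula (9) are illustrative assumptions rather than the patent's exact configuration.

```python
# Relation-module sketch: each (support, query) pair of second-order features is
# stacked as a 2-channel map, a multi-layer 2D conv network outputs a similarity
# r_ij in [0, 1] (formula (8)), and MSE against match labels is the loss (formula (9)).
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelationModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1), nn.Sigmoid())                  # similarity in [0, 1]

    def forward(self, support_feat, query_feat):
        """support_feat: (Ns, C, C), query_feat: (Nq, C, C) second-order features."""
        n_s, n_q, c = support_feat.size(0), query_feat.size(0), support_feat.size(-1)
        s = support_feat.unsqueeze(0).expand(n_q, n_s, c, c)  # pair every query clip
        q = query_feat.unsqueeze(1).expand(n_q, n_s, c, c)    # with every support clip
        pairs = torch.stack([s, q], dim=2).reshape(n_q * n_s, 2, c, c)
        return self.net(pairs).view(n_q, n_s)                 # similarity matrix r

def episode_loss(r, support_labels, query_labels):
    """Mean square error between r and one-hot match targets (1 if same class, else 0)."""
    targets = (query_labels.unsqueeze(1) == support_labels.unsqueeze(0)).float()
    return F.mse_loss(r, targets)
```

A query clip can then be assigned the class of the support clip (or class prototype) with the highest similarity.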
the embodiments described in this specification are merely illustrative of implementations of the inventive concepts, which are intended for purposes of illustration only. The scope of the present invention should not be construed as being limited to the particular forms set forth in the examples, but rather as being defined by the claims and the equivalents thereof which can occur to those skilled in the art upon consideration of the present inventive concept.
Claims (6)
1. A meta-learning based few-sample behavior recognition method, characterized in that the method comprises the following steps:
1) dividing a video data set into a meta-training set and a meta-test set, extracting a plurality of groups of support sets and query sets from the meta-training set for training the model, and extracting a plurality of groups of support sets and query sets from the meta-test set for testing the model;
2) extracting video features of the support set and the query set by using a shallow three-dimensional convolutional neural network;
3) constructing a meta-learning network for modeling a support set and generating parameters of the shallow three-dimensional convolutional neural network in the step 2);
4) performing second-order transformation and normalization processing on the video features extracted in the step 2);
5) concatenating the processed support-set and query-set video features, extracting a nonlinear distance relation between the query-set and support-set video features by using a multi-layer two-dimensional convolutional neural network, and classifying the query videos.
2. The meta-learning based few-sample behavior recognition method of claim 1, wherein in step 1) the video data set is divided as follows: the video data set is partitioned into a meta-training set D_meta-train and a meta-test set D_meta-test; during training, each episode randomly draws N different classes from D_meta-train, with K different samples per class, to form a support set, and samples of the same N classes are then randomly drawn from the remaining data in D_meta-train to form a query set; during testing, the same operation is applied to D_meta-test.
3. The meta-learning based few-sample behavior recognition method as claimed in claim 1 or 2, characterized in that in step 3) the meta-learning network generates the parameters of the shallow three-dimensional convolutional network of step 2) as follows: the support set is input into a task encoder E composed of a three-dimensional convolutional network to obtain the probability distribution of the task; the task is expressed as a conditional probability distribution model, and a task feature vector t is obtained, as shown in formulas (1) and (2);
a single-layer fully-connected neural network g then generates the parameters θ of the three-dimensional convolution in step 2), as in formula (3):
θ = g(t)   (3)
the network parameters obtained from formula (3) are then regularized, as shown in formula (4).
4. The meta-learning based few-sample behavior recognition method as claimed in claim 1 or 2, characterized in that the shallow three-dimensional convolutional network f_θ in step 2) extracts video features using the parameters θ generated by the meta-learning network in step 3), the video features being computed as:
w = f_θ(x)   (5)
where x ∈ R^(C×T×H×W) is a video clip and w ∈ R^(C'×T'×H'×W') is the corresponding video feature.
5. The meta-learning based few-sample behavior recognition method as claimed in claim 1 or 2, characterized in that in step 4) the video features w extracted in step 2) are processed as follows: first, w ∈ R^(C'×T'×H'×W') is reshaped into w' ∈ R^(C'×M) with M = T'×H'×W', and the second-order feature is then obtained as in formula (6),
where ψ(·) is a normalization function given by formula (7).
6. The meta-learning based few-sample behavior recognition method of claim 5, characterized in that in step 5) the metric relationship between the support set and the query set is obtained and the query set is classified as follows: the support-set and query-set video features extracted in step 4) are concatenated, and the concatenated features are input into a multi-layer two-dimensional convolutional network to obtain the similarity r, as shown in formula (8),
where r_{i,j} takes a value between 0 and 1 and represents the similarity between support-set video x_i and query-set video x_j; finally, mean square error is used as the loss function, as in formula (9).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110319209.4A CN113052073A (en) | 2021-03-25 | 2021-03-25 | Meta learning-based few-sample behavior identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110319209.4A CN113052073A (en) | 2021-03-25 | 2021-03-25 | Meta learning-based few-sample behavior identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113052073A true CN113052073A (en) | 2021-06-29 |
Family
ID=76515734
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110319209.4A Pending CN113052073A (en) | 2021-03-25 | 2021-03-25 | Meta learning-based few-sample behavior identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052073A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113535953A (en) * | 2021-07-15 | 2021-10-22 | 湖南大学 | Meta learning-based few-sample classification method |
CN117077030A (en) * | 2023-10-16 | 2023-11-17 | 易停车物联网科技(成都)有限公司 | Few-sample video stream classification method and system for generating model |
CN117077030B (en) * | 2023-10-16 | 2024-01-26 | 易停车物联网科技(成都)有限公司 | Few-sample video stream classification method and system for generating model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |