CN111860697A - Local descriptor-based criticist-driven small sample learning method - Google Patents
- Publication number
- CN111860697A (application number CN202010777958.7A)
- Authority
- CN
- China
- Prior art keywords
- local
- picture
- pictures
- network
- local descriptors
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a local descriptor-based, critic-driven small sample learning method. A critic network scores the local descriptors extracted from a picture, and descriptors with high scores are given greater weight in the classification result. The critic network parameters are adjusted so that the network assigns high scores to the important local descriptors of a picture. This forces the overall small sample learning network to attend to the key regions of the picture, reduces the influence of cluttered backgrounds and irrelevant information on classification, and improves the final classification performance of the network.
Description
Technical Field
The invention relates to the fields of image recognition, small sample learning, image feature extraction, and deep learning, and in particular to a critic-driven small sample learning method based on local descriptors.
Background
In recent years, deep learning methods and technologies have advanced continuously, greatly driving research in image recognition tasks such as image classification, image segmentation, and object detection; on certain tasks, deep networks even exceed human performance. However, mainstream deep learning methods typically require a very large amount of data to train the network model, tend to overfit on specific tasks, and often perform poorly on entirely new data sets. The human ability to adapt rapidly to entirely new tasks remains out of their reach. Small sample learning arose in this context.
Small sample learning aims to make a model learn from a small number of samples and generalize beyond the training data set. Existing small sample learning methods based on meta-learning or metric learning generally improve model learning capability through data augmentation, improved embedding network structures, improved comparison metrics, or target localization. For example, a maximum-entropy patch sampler can imitate the sequence of patches that a human visual trajectory samples from a picture; a reinforcement learning method divides the sequence into background and target sequences and guides the model to focus on the target patches, which improves the accuracy of small sample image recognition. Image saliency maps have also been applied to small sample recognition: a data transformation method mixes the backgrounds and foregrounds of different pictures to enlarge the training data, while also weakening the influence of the picture background on recognition and improving accuracy. However, these approaches give relatively little consideration to how the human visual system responds when faced with a completely new picture.
Humans can learn to classify new pictures very quickly, mainly because of their extensive knowledge and past experience. Another important factor is their ability to capture information rapidly: faced with a new picture with a complex background, a human quickly locates the key points of the picture, extracts the important information, and tends to ignore everything irrelevant. This is because humans have developed, through past experience, a critic mechanism that forces their visual attention toward specific regions of a picture, and the shift happens so quickly that they do not notice it. This visual mechanism can guide the design of a network for small sample learning.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a critic-driven small sample learning method based on local descriptors. The method uses local descriptors of different regions of a picture to describe its detailed feature information, and adopts an image-to-class classification method to improve classification accuracy. A critic network scores the extracted local descriptors, and descriptors with high scores are given greater weight in the classification result. The critic network parameters are adjusted so that the network assigns high scores to the important local descriptors of a picture. This forces the overall small sample learning network to attend to the key regions of the picture, reduces the influence of cluttered backgrounds and irrelevant information on classification, and improves the final classification performance of the network.
The object of the invention is achieved by the following technical solution:
A critic-driven small sample learning method based on local descriptors comprises the following steps:
(1) extracting the local descriptors of a picture through a local descriptor extraction network;
(2) scoring the extracted local descriptors of the picture through a critic network;
(3) classifying the picture according to the extracted local descriptors and the assigned scores by using an image-to-class classification method.
Further, in step (1), the local descriptor extraction network extracts the local descriptors of all query set pictures and support set pictures; all local descriptors of one query set picture are gathered into one local descriptor pool, and the local descriptors of each class of pictures in the support set are gathered into the local descriptor pool of that class;
in step (2), the critic network scores all local descriptors of a query set picture;
in step (3), a K-nearest neighbor algorithm computes the similarity between all local descriptors of the query set picture and each local descriptor pool of the support set, with higher-scoring query set local descriptors given greater classification weight, so as to find the support set class most similar to the query set picture.
Compared with the prior art, the technical scheme of the invention has the following beneficial effects:
1. The critic network drives the small sample learning network to attend to the key information in a picture and to ignore background interference and irrelevant information, which improves the performance of the network on small sample learning tasks.
2. The method uses local descriptors to extract the detailed features of pictures, uses the image-to-class classification method to classify images under small sample conditions, and uses the critic network to score the importance of the extracted local descriptors. Local descriptors with high scores therefore have greater influence on the classification result, and the network is forced to pay more attention to the high-scoring regions, i.e., the more important regions of the picture. This focuses the classification task on key picture information, reduces the influence of interfering information, and addresses the problem of cluttered background information interfering with recognition.
3. The method addresses the problem that the embedding network extracts picture feature information too coarsely. In the prior art, a global descriptor is generally used as the embedding network's output for a picture, so the extracted features are global information and the fine features of the picture can be lost. The method uses local descriptors to extract local information of the picture and then gathers this information into the feature set of the picture, thereby preserving its detailed features.
Drawings
Fig. 1 is a diagram of the network architecture of the present invention.
Fig. 2 is a schematic diagram of the operating mechanism of the critic network.
Fig. 3 shows the experimental results of the present invention on the miniImageNet data set.
Fig. 4 shows the experimental results of the present invention on three fine-grained image classification data sets.
Detailed Description
The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A small sample learning task T is in essence a relationship learning task over a large number of picture classes. A support set S is given that contains several classes of pictures, each with K pictures (K is small, usually 1 or 5, hence the name small sample learning), together with a query set Q. The pictures in S and Q do not overlap, and both sets are drawn from T. Under this data setting, the small sample learning network is trained to indicate which class in S each picture in Q belongs to, so that after many training episodes, when faced with a new task T', the overall network can indicate which class of the new support set S' each picture in the new query set Q' belongs to.
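The data setting described above can be illustrated with a minimal sketch of how one episode might be drawn. The function name `sample_episode`, the parameters `n_way` and `n_query`, and the toy file names are hypothetical and are not taken from the patent; drawing `n_query` pictures per class is also an assumption.

```python
import random

def sample_episode(images_by_class, n_way, k_shot, n_query, seed=None):
    """Draw one episode: a support set S and a disjoint query set Q."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = {}, []
    for label in classes:
        # Draw k_shot + n_query distinct pictures so that S and Q never overlap.
        picked = rng.sample(images_by_class[label], k_shot + n_query)
        support[label] = picked[:k_shot]
        query.extend((img, label) for img in picked[k_shot:])
    return support, query

# Toy data: 10 classes with 30 placeholder picture identifiers each.
dataset = {
    f"class_{c}": [f"class_{c}/img_{i}.jpg" for i in range(30)]
    for c in range(10)
}
support, query = sample_episode(dataset, n_way=5, k_shot=1, n_query=15, seed=0)
```

Sampling the support and query pictures of a class in a single draw is one simple way to guarantee that the two sets share no picture.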
Therefore, the invention provides a local descriptor-based, critic-driven small sample learning method whose overall network mainly comprises the following parts:
(1) A local descriptor extraction network, which extracts the local descriptors of a picture.
(2) A critic network, which scores the extracted local descriptors.
(3) A score-aware comparison module, which classifies the picture according to the extracted local descriptors and the assigned scores by using an image-to-class classification method.
Regarding local descriptors: because of quantization error, an image-level descriptor, i.e., a global descriptor, loses detailed information that contributes to classification. With enough data the effect of this error can be mitigated, but under a small sample setting it cannot be recovered. In addition, the receptive field of an image-level descriptor is the whole picture, so the information it extracts is relatively coarse, whereas the receptive field of a local descriptor is a specific region of the picture. Local descriptors therefore provide more detailed feature information about different regions of the picture, which helps the overall network classify it.
Regarding the critic network: the invention adopts a critic mechanism to imitate the human visual mechanism and to drive the overall small sample learning network to attend to the key regions of pictures. The critic network is itself a deep learning network whose purpose is to score the extracted local descriptors. Under an unsupervised training mode, parameter adjustment leads the critic network to assign higher scores to the more important local descriptors of a picture, so that these descriptors carry more weight in the classification result.
Regarding the image-to-class classification method: image-to-image classification methods sometimes perform poorly because of intra-class variation. If the training set is small, a query set picture may not closely resemble any single picture in the support set. The image-to-class method gathers all local descriptors of the pictures of a class into one local descriptor pool and uses a K-nearest neighbor method to compute the similarity between all local descriptors of a query set picture and the local descriptor pool of each class in the support set. This determines which support set class the query set picture belongs to and effectively mitigates the influence of intra-class variation.
In this embodiment, the local descriptor extraction network extracts the local descriptors of all query set pictures and support set pictures. All local descriptors of a query set picture are gathered into one local descriptor pool, and the local descriptors of each class of pictures in the support set are gathered into the local descriptor pool of that class. The critic network then scores all local descriptors of the query set picture. Finally, the image-to-class classification method uses a K-nearest neighbor algorithm to compute the similarity between all local descriptors of the query set picture and each local descriptor pool of the support set, with higher-scoring query set local descriptors given greater classification weight, so as to find the support set class most similar to the query set picture. The overall network structure is shown in Fig. 1.
The specific operation flow of this embodiment is as follows:
1. A multi-layer convolutional neural network is used as the local descriptor extraction network $F_e$. A picture is input into the local descriptor extraction network, and the output is the set of local descriptors of the picture. In this embodiment, the fully connected layer of the extraction network is removed and only the convolutional layers are retained, so the output of the extraction network is not a global descriptor of the picture but a set of local descriptors. The extraction process is given by the following formula:
$F_e(X) = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{d \times m}$
where $X$ is the pixel value matrix of the currently input picture; $m$ is the number of extracted local descriptors, with $m = w \times h$ ($w$ and $h$ are the width and height of the feature map output by the extraction network); $x_i$ is the $i$-th local descriptor of the picture; $d$ is the number of filters in the last convolutional layer of the local descriptor extraction network and also the dimension of each output local descriptor; and $\mathbb{R}$ denotes the set of real numbers. In this embodiment, the picture is resized to 84 × 84 pixels, $m = w \times h = 21 \times 21 = 441$, and $d = 64$. The output set of local descriptors represents the feature information of the picture.
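The relationship between the convolutional output and the descriptor set can be made concrete with a short sketch. It assumes the output is a feature map with d channels on an h × w grid and stores the descriptors as a list of m vectors of length d (the columns of the d × m matrix in the formula). The helper name `feature_map_to_descriptors` and the dummy values are illustrative only.

```python
def feature_map_to_descriptors(feature_map):
    """Flatten a d x h x w feature map into m = h * w descriptors of length d."""
    d = len(feature_map)
    h = len(feature_map[0])
    w = len(feature_map[0][0])
    # One descriptor per spatial position: the values of all d channels there.
    return [
        [feature_map[c][i][j] for c in range(d)]
        for i in range(h)
        for j in range(w)
    ]

# Dimensions from the embodiment: d = 64 channels on a 21 x 21 grid.
d, h, w = 64, 21, 21
# Dummy feature map in which channel c holds the constant value c.
feature_map = [[[float(c) for _ in range(w)] for _ in range(h)] for c in range(d)]
descriptors = feature_map_to_descriptors(feature_map)
print(len(descriptors), len(descriptors[0]))  # 441 64
```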
2. For each local descriptor of a query set picture $X$ extracted by the local descriptor extraction network, the critic network $F_c$ gives a score that evaluates the importance of that descriptor, as shown in Fig. 2. The scoring process is given by the following formula:
$F_c(X) = [s_1, s_2, \ldots, s_m] \in \mathbb{R}^{d \times m}$
where $m$ is the same as in the local descriptor extraction network, $s_i$ is the score the critic network assigns to the $i$-th local descriptor, and $d$ is fixed to 1, that is, all components of the local descriptor at a given position of a picture share one score. As in the local descriptor extraction network, $m = w \times h = 21 \times 21 = 441$. The critic network also contains only convolutional layers, and the activation function of its last layer is the Sigmoid function, so that the scores output by the critic network are compressed to the range 0 to 1.
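The scoring step can be sketched as follows. The patent only states that the critic is a convolutional network ending in a Sigmoid; the sketch makes the simplifying assumption of a single 1 × 1 convolution, which amounts to one linear projection per descriptor. The names `critic_scores`, `weights` and `bias` are hypothetical.

```python
import math

def sigmoid(z):
    """Squash a real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def critic_scores(descriptors, weights, bias=0.0):
    """Give each descriptor one scalar score (d = 1 in the formula above)."""
    return [
        sigmoid(sum(w * x for w, x in zip(weights, desc)) + bias)
        for desc in descriptors
    ]

# Three toy descriptors of dimension 3 and hypothetical learned parameters.
descriptors = [[0.5, -1.0, 2.0], [0.0, 0.0, 0.0], [-3.0, 1.0, 0.5]]
weights = [1.0, 0.5, -0.25]
scores = critic_scores(descriptors, weights)
```

Because the Sigmoid output is bounded, the scores can be used directly as weights without further normalization.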
3. Score-aware comparison module. For a query set picture $X_q$, the extraction network $F_e$ outputs the local descriptors $F_e(X_q) = [x_1, x_2, \ldots, x_m] \in \mathbb{R}^{d \times m}$. For each class $c$ of the support set, the local descriptors of its K pictures are extracted and gathered into the local descriptor pool of that class, where K is the number of pictures per class in the support set. The critic network $F_c$ gives the score of each local descriptor of the query set picture, $F_c(X_q) = [s_1, s_2, \ldots, s_m] \in \mathbb{R}^{1 \times m}$. Each score-weighted query set local descriptor is compared with the local descriptor pool of each support set class to find its k nearest neighbors in that pool. The m × k similarity values between the query set picture $X_q$ and class $c$ are then summed to give the image-to-class similarity.
The similarity between two descriptors $a$ and $b$ is the cosine similarity, $\cos(a, b) = \dfrac{a^\top b}{\lVert a \rVert \, \lVert b \rVert}$.
After the query set picture has been compared with every class in the support set, a similarity score is obtained for each class, and the label with the highest score is the classification result for the current query set picture.
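The comparison step can be sketched as below. How exactly the score enters the similarity is not recoverable from the text, so the sketch assumes each descriptor's summed top-k cosine similarity is multiplied by its score (scaling a vector by a positive score would leave its cosine similarity unchanged). The function names and toy values are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def image_to_class_score(query_desc, scores, pool, k):
    """Sum the score-weighted top-k similarities of every query descriptor."""
    total = 0.0
    for desc, s in zip(query_desc, scores):
        sims = sorted((cosine(desc, p) for p in pool), reverse=True)
        total += s * sum(sims[:k])  # assumed weighting: score times similarity
    return total

def classify(query_desc, scores, pools, k=1):
    """Return the best-matching class label and all per-class scores."""
    results = {
        label: image_to_class_score(query_desc, scores, pool, k)
        for label, pool in pools.items()
    }
    return max(results, key=results.get), results

# Toy pools: two classes, two 2-dimensional descriptors each.
pools = {
    "cat": [[1.0, 0.0], [0.9, 0.1]],
    "dog": [[0.0, 1.0], [0.1, 0.9]],
}
query_desc = [[1.0, 0.1], [0.2, 1.0]]
label, results = classify(query_desc, [0.9, 0.1], pools, k=1)
```

In this toy case the first query descriptor resembles the "cat" pool and the second the "dog" pool, so swapping the two scores flips the predicted label. This is the effect the critic is intended to have on classification.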
4. The network is trained over many episodes of meta-tasks, each consisting of a support set and a query set. The parameters of the local descriptor extraction network and the critic network are adjusted in the direction that improves classification accuracy until they become relatively stable, so that when faced with a picture from a class it has never seen, the overall small sample learning classification network can indicate which support set class the picture belongs to.
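One way to turn the per-class similarity scores into a training signal is sketched below. It assumes the similarity scores are used directly as softmax logits for the cross entropy loss that the embodiment names; the patent does not spell out this detail, and the function name `cross_entropy` is illustrative.

```python
import math

def cross_entropy(similarities, target_index):
    """Softmax cross entropy with class similarity scores used as logits."""
    peak = max(similarities)  # subtract the maximum for numerical stability
    log_sum = peak + math.log(sum(math.exp(s - peak) for s in similarities))
    return log_sum - similarities[target_index]

# The loss is small when the true class has the highest similarity.
loss_right = cross_entropy([4.0, 1.0, 0.5], target_index=0)
loss_wrong = cross_entropy([4.0, 1.0, 0.5], target_index=1)
```

Minimizing this loss raises the similarity to the correct class relative to the other classes, which is what pushes the parameters of both networks toward better classification accuracy.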
In this embodiment, the overall network is built with the PyTorch library, and experiments are performed on four data sets, namely miniImageNet, Stanford Dogs, Stanford Cars and CUB-200, under 1-shot and 5-shot task settings. The method uses a cross entropy loss function and the Adam optimizer. In the training stage, 15 and 10 query set pictures are drawn in each meta-task for the 1-shot and 5-shot settings, respectively, and the initial learning rate is set to 5e-3. In the testing stage, 600 test episodes are sampled with the same data settings as in training, and the overall network is evaluated by top-1 mean accuracy with a 95% confidence interval. The experimental results show that the invention achieves strong performance on all four data sets, as shown in Figs. 3 and 4.
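The evaluation metric can be sketched as follows, assuming the 95% confidence interval is computed with the usual normal approximation over per-episode accuracies; the patent does not state the formula. The accuracy values are synthetic placeholders, not experimental results.

```python
import math
import statistics

def mean_and_ci95(accuracies):
    """Mean accuracy and half-width of a normal-approximation 95% interval."""
    mean = statistics.fmean(accuracies)
    half_width = 1.96 * statistics.stdev(accuracies) / math.sqrt(len(accuracies))
    return mean, half_width

# Synthetic per-episode accuracies for 600 test episodes (placeholders only).
episode_accuracies = [0.5, 0.6] * 300
mean, half_width = mean_and_ci95(episode_accuracies)
```

A result would then be reported as the mean plus or minus the half-width.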
The present invention is not limited to the above-described embodiments. The foregoing description of the specific embodiments is intended to describe and illustrate the technical solutions of the present invention, and the above specific embodiments are merely illustrative and not restrictive. Those skilled in the art can make many changes and modifications to the invention without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (2)
1. A critic-driven small sample learning method based on local descriptors, characterized by comprising the following steps:
(1) extracting the local descriptors of a picture through a local descriptor extraction network;
(2) scoring the extracted local descriptors of the picture through a critic network;
(3) classifying the picture according to the extracted local descriptors and the assigned scores by using an image-to-class classification method.
2. The local descriptor-based critic-driven small sample learning method according to claim 1, wherein in step (1), the local descriptor extraction network extracts the local descriptors of all query set pictures and support set pictures, all local descriptors of one query set picture are gathered into one local descriptor pool, and the local descriptors of each class of pictures in the support set are gathered into the local descriptor pool of that class;
in step (2), the critic network scores all local descriptors of a query set picture;
in step (3), a K-nearest neighbor algorithm computes the similarity between all local descriptors of the query set picture and each local descriptor pool of the support set, with higher-scoring query set local descriptors given greater classification weight, so as to find the support set class most similar to the query set picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010777958.7A CN111860697A (en) | 2020-08-05 | 2020-08-05 | Local descriptor-based criticist-driven small sample learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010777958.7A CN111860697A (en) | 2020-08-05 | 2020-08-05 | Local descriptor-based criticist-driven small sample learning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111860697A (en) | 2020-10-30 |
Family
ID=72971377
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010777958.7A Pending CN111860697A (en) | Local descriptor-based criticist-driven small sample learning method | 2020-08-05 | 2020-08-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111860697A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105938560A (en) * | 2016-03-23 | 2016-09-14 | Jilin University | Convolutional-neural-network-based vehicle model refined classification system |
CN107169415A (en) * | 2017-04-13 | 2017-09-15 | Xidian University | Human motion recognition method based on convolutional neural networks feature coding |
US20190005069A1 (en) * | 2017-06-28 | 2019-01-03 | Google Inc. | Image Retrieval with Deep Local Feature Descriptors and Attention-Based Keypoint Descriptors |
CN109784258A (en) * | 2019-01-08 | 2019-05-21 | South China University of Technology | A kind of pedestrian's recognition methods again cut and merged based on Analysis On Multi-scale Features |
CN110020682A (en) * | 2019-03-29 | 2019-07-16 | Beijing Technology and Business University | A kind of attention mechanism relationship comparison net model methodology based on small-sample learning |
CN110569886A (en) * | 2019-08-20 | 2019-12-13 | Tianjin University | Image classification method for bidirectional channel attention element learning |
CN110727844A (en) * | 2019-10-21 | 2020-01-24 | Northeast Forestry University | Online commented commodity feature viewpoint extraction method based on generation countermeasure network |
CN111062441A (en) * | 2019-12-18 | 2020-04-24 | Wuhan University | Scene classification method and device based on self-supervision mechanism and regional suggestion network |
Non-Patent Citations (8)
Title |
---|
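Wu Jian et al.: "Fine-grained image classification algorithm based on ensemble transfer learning", Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition) * |
Li Wanxiang et al.: "Small-sample human ear recognition based on deep learning algorithms", Computer Simulation * |
Li Guorui et al.: "Fine-grained bird recognition based on cross-layer feature fusion of semantic information", Computer Applications and Software * |
Li Juan et al.: "Traffic object detection based on region proposals and deep convolutional networks", Mathematics in Practice and Theory * |
Li Jie et al.: "Asynchronous advantage actor-critic algorithm based on a visual attention mechanism", Computer Science * |
Wang Ronggui et al.: "Few-shot learning with a multi-level attention feature network", Journal of Electronics & Information Technology * |
Wang Fei et al.: "Identification of cashmere and wool using convolutional networks and deep learning theory", Journal of Textile Research * |
Luo Jianhao et al.: "A survey of fine-grained image classification based on deep convolutional features", Acta Automatica Sinica * |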
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | Toward end-to-end car license plate detection and recognition with deep neural networks | |
Hu et al. | Pushing the limits of deep cnns for pedestrian detection | |
Amit et al. | A coarse-to-fine strategy for multiclass shape detection | |
Sivaraman et al. | A general active-learning framework for on-road vehicle recognition and tracking | |
Pan et al. | A robust system to detect and localize texts in natural scene images | |
Ruta et al. | Real-time traffic sign recognition from video by class-specific discriminative features | |
EP2291722B1 (en) | Method, apparatus and computer program product for providing gesture analysis | |
Serna et al. | Traffic signs detection and classification for European urban environments | |
CN103605972B (en) | Non-restricted environment face verification method based on block depth neural network | |
Ruta et al. | Robust class similarity measure for traffic sign recognition | |
CN109829467A (en) | Image labeling method, electronic device and non-transient computer-readable storage medium | |
CN112183468A (en) | Pedestrian re-identification method based on multi-attention combined multi-level features | |
CN109255284B (en) | Motion trajectory-based behavior identification method of 3D convolutional neural network | |
Yang et al. | Tracking based multi-orientation scene text detection: A unified framework with dynamic programming | |
CN106295532B (en) | A kind of human motion recognition method in video image | |
CN106384345B (en) | A kind of image detection and flow statistical method based on RCNN | |
Chen et al. | Recognizing human action from a far field of view | |
CN111008639B (en) | License plate character recognition method based on attention mechanism | |
CN110929593A (en) | Real-time significance pedestrian detection method based on detail distinguishing and distinguishing | |
CN113177612B (en) | Agricultural pest image identification method based on CNN few samples | |
CN108509861B (en) | Target tracking method and device based on combination of sample learning and target detection | |
CN111126401A (en) | License plate character recognition method based on context information | |
CN112233105A (en) | Road crack detection method based on improved FCN | |
CN111444816A (en) | Multi-scale dense pedestrian detection method based on fast RCNN | |
Gouet-Brunet et al. | Object recognition and segmentation in videos by connecting heterogeneous visual features |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned | ||
Effective date of abandoning: 20230707 |