CN112749738B - Zero sample object detection method for performing superclass reasoning by fusing context - Google Patents
- Publication number
- CN112749738B CN112749738B CN202011618077.7A CN202011618077A CN112749738B CN 112749738 B CN112749738 B CN 112749738B CN 202011618077 A CN202011618077 A CN 202011618077A CN 112749738 B CN112749738 B CN 112749738B
- Authority
- CN
- China
- Prior art keywords
- superclass
- cell
- class
- context
- reasoning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a zero-sample object detection method that performs superclass reasoning by fusing context, which can localize and recognize completely new, never-before-seen objects even when labeled training pictures for them are missing. First, an object detection network predicts the object boxes that may exist in an input picture. Next, the object positions are localized based on the visual features of the boxes, and the specific object class is predicted using label semantic vectors. Multi-layer dilated convolution then extracts the context information of the candidate object boxes, and the extracted context information is used to predict the corresponding superclass. Finally, the predicted specific class is fused with the superclass to obtain the final recognition result. The method is simple, convenient, and flexible, and markedly improves detection performance on unseen objects.
Description
Technical Field
The invention relates to the technical field of computer vision, and in particular to a zero-sample object detection method that performs superclass reasoning by fusing context.
Background
Object detection is one of the classical problems in computer vision, and object detection techniques based on deep neural networks have met with great success in the last few years. One key ingredient of that success is the use of large labeled training datasets with accurate bounding-box annotations. However, on the one hand, it is difficult to collect and annotate images of every object class beyond everyday objects, such as endangered species or continually released new products. On the other hand, when target-domain data are scarce or nonexistent, an object detector trained on the source domain generalizes poorly to the target domain. In contrast, humans have the extraordinary ability to quickly learn new concepts and new objects even without having seen a single image of them, and a good object detection system should share this ability. To bridge the gap between object detectors and human intelligence, giving detectors the ability to detect completely new, unknown target classes (i.e., zero-sample object detection) has become a hot research problem.
Zero-sample object detection aims to detect unknown object classes in the absence of supervised training samples. Compared with zero-sample recognition, it requires the model not only to recognize the class of an object but also to localize the target accurately among millions of potential candidate regions. The common practice is to incorporate a zero-sample classifier into an existing object detection framework, such as Faster R-CNN or YOLO, bridging the semantic gap between seen and unknown classes by aligning the visual features of each object region with the inherent attributes of the object class (i.e., its class semantic embedding).
However, this type of approach has two drawbacks. First, it uses only limited visual features to identify candidate regions, ignoring context information, which has shown great potential across many tasks. As a result, objects that are visually similar but semantically different are falsely detected; for example, a green apple in a kitchen may be mistaken for a tennis ball because the two objects look very similar. Second, such methods ignore the domain-shift problem caused by the different data distributions of the source and target domains, so a detector trained on the source domain does not generalize well to the target domain. The problem is exacerbated when both known and unknown classes appear in the same picture.
Disclosure of Invention
To overcome the shortcomings of the prior art and improve the detection accuracy of zero-sample objects, the invention adopts the following technical scheme:
a zero sample object detection method for performing super-class reasoning by fusing context includes the following steps:
step one: extracting depth picture characteristics;
step two: based on the extracted depth picture features, for each cell, predicting the position coordinates of the cell by using a coordinate prediction network, and predicting the confidence of whether an object exists in the cell by using a confidence prediction network;
step three: classifying candidate object frames by using a zero sample classifier based on the visual characteristics of each cell to obtain a fine-grained classification result, wherein each cell can predict one or more candidate object frame positions, the characteristics of each candidate object frame are the visual characteristics of the current cell, and zero sample classification is performed based on the visual characteristics;
step four: on the basis of the first step, extracting the context information of each cell by using a context extraction network, and simultaneously predicting the superclass information of each candidate object by using a superclass prediction network on the basis of the extracted context characteristics;
step five: and (3) organically fusing the super class predicted in the step four with the class predicted in the step three to obtain a final classification result.
Combining the superclass derived from context in step four with the class predicted in step three yields the final classification result. This addresses the problem that identifying candidate object regions from limited visual features alone ignores the importance of context information, causing visually similar but semantically different objects to be falsely detected.
The source domain and the target domain are mutually exclusive: the classes in the source domain (the training classes) are completely different from those in the target domain (the test classes). Through the superclass relation, however, the superclasses carry information about the unknown classes, so information about the test classes is effectively brought into the training process. The detector therefore takes target-domain information into account during training, gains generalization ability, and improves detection in the target domain, avoiding the domain-shift problem in which different source- and target-domain data distributions prevent a detector trained on the source domain from generalizing well to the target domain.
Further, the third step specifically includes the following steps:
step 3.1: based on the visual features of each cell, projecting the visual features into the semantic embedding space using a nonlinear function; the visual feature of the i-th cell is denoted x_i, and its projection is denoted k_i;
step 3.2: calculating, in the semantic embedding space, the similarity between the projection vector of the visual feature and the semantic embedding vector of each object class to obtain a classification score and give a fine-grained classification result; the score is expressed as s_{ij} = sim(k_i, e_j), j = 1, …, C_s, where sim(·,·) is the similarity computed in the embedding space, e_j is the semantic embedding vector of the j-th class, C_s represents the number of object classes, j denotes the j-th training class, and s denotes the known, i.e., training, classes. The object classes are given in advance: in zero-sample object detection, which classes must be detected, and their labels, are known beforehand, so classification is achieved simply by comparing the projection vectors with the semantic embedding vectors of the given object classes in the semantic embedding space.
Further, the fourth step specifically includes the following steps:
step 4.1: extracting a context feature matrix from the feature matrix of the given input picture via context feature extraction, using dilated (hole) convolution to obtain a feature matrix with a larger receptive field on the original image; this feature matrix fuses the context information of the candidate object boxes;
step 4.2: extracting the superclass relations among the object classes from a semantic net, such that each superclass contains at least one test object class; every object class thus belongs to a superclass;
step 4.3: based on the context features of each cell, predicting the corresponding superclasses with a multi-layer fully connected network; the true superclass of cell i is denoted y_i^{sc}, and the predicted superclass distribution is denoted ŷ_i^{sc}; the network is optimized with the following cross-entropy loss:
L_sc = − Σ_{i=1}^{H×W} Σ_{j=1}^{C} y_{ij}^{sc} log ŷ_{ij}^{sc}
where H×W is the number of cells, i indexes the cells (superclass prediction is performed for every cell), C is the number of superclasses, and j indexes the superclasses.
Further, the fifth step specifically includes the following steps:
step 5.1: the superclass score predicted in step 4.3, ŷ_{i,sc(j)}^{sc}, where sc(j) denotes the superclass to which class j belongs, is multiplied by the fine-grained classification score s_{ij} obtained in step 3.2 to give the final classification score, expressed as f_{ij} = ŷ_{i,sc(j)}^{sc} · s_{ij}; the network is optimized with the cross-entropy loss:
L_cls = − Σ_{i=1}^{H×W} 1_i^{obj} Σ_{j=1}^{C_s} y_{ij} log f_{ij}
where obj is short for "object" and 1_i^{obj} indicates whether the i-th cell contains an object: 1_i^{obj} = 1 if it does, and 0 otherwise. The optimal classification result is obtained by minimizing this cross-entropy loss.
Further, the context feature extraction in step 4.1 is as follows: X ∈ R^{H×W×d_v} is the picture feature matrix, where X denotes a tensor matrix, R denotes the whole real number interval, and H×W×d_v is the size of the three-dimensional matrix X, the dimensions being H, W, and d_v respectively; H×W is the number of cells, and each cell is described by a d_v-dimensional feature. A plurality of dilated convolution blocks are applied to X in succession, where each block contains K dilated convolutions with dilation rates r_1, …, r_K, r_i being the dilation rate of the i-th dilated convolution. The feature matrices obtained by the blocks are fused to obtain the final context feature matrix P ∈ R^{H×W×d_c}, where H×W×d_c is the size of the context feature matrix and the context feature of the i-th cell is denoted p_i.
Further, the nonlinear function in step 3.1 is a fully connected network.
Further, the dilated convolution in step 4.1 uses multiple layers of 3×3 dilated convolutions.
Further, the semantic net in the step 4.2 is WordNet.
The invention has the following advantages:
the invention combines the superclass corresponding to the context with the predicted class to finally obtain the final classification result, and solves the problem that when the limited visual features are used for identifying the candidate object areas, the importance of the context information is ignored, so that objects which are similar in vision but are semantically different are detected wrongly;
The source domain and the target domain are mutually exclusive: the classes in the source domain (the training classes) are completely different from those in the target domain (the test classes). Through the superclass relation, however, the superclasses carry information about the unknown classes, so information about the test classes is effectively brought into the training process. The detector therefore takes target-domain information into account during training, gains generalization ability, and improves detection in the target domain, avoiding the domain-shift problem in which different source- and target-domain data distributions prevent a detector trained on the source domain from generalizing well to the target domain.
Drawings
Fig. 1 is a diagram of an object detection framework of the present invention.
FIG. 2 is a schematic diagram of a hollow convolution structure in accordance with the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
As shown in FIG. 1, a zero sample object detection method for performing super-class reasoning by fusing context comprises the following steps:
step one: and inputting the picture into a deep neural network (CNN) to extract the depth picture characteristics.
Step two: based on the extracted depth picture features, predict the position coordinates of each cell with a coordinate prediction network, and predict with a confidence prediction network the confidence that the cell contains an object. In Fig. 1, p denotes the probability (between 0 and 1) that the cell contains an object; x, y, w, h denote the predicted object box in the cell, where (x, y) is the coordinate of the box center, w its width, and h its height; in p(c|s), c denotes the class and s the superclass.
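The patent does not fix a particular box parameterization; as an illustration of step two, the sketch below assumes the common YOLO-style decoding (sigmoid offsets within a grid cell, exponential width/height relative to an anchor), so all names and constants here are illustrative, not the patent's:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def decode_cell(tx, ty, tw, th, tp, cx, cy, cell_size, pw, ph):
    """Decode raw per-cell outputs into a box (x, y, w, h) and confidence p.
    (cx, cy) are the cell's grid indices; (pw, ph) an assumed anchor size."""
    x = (cx + sigmoid(tx)) * cell_size   # box center, image coordinates
    y = (cy + sigmoid(ty)) * cell_size
    w = pw * math.exp(tw)                # box width
    h = ph * math.exp(th)                # box height
    p = sigmoid(tp)                      # confidence that the cell holds an object
    return x, y, w, h, p

# Zero raw outputs place the box at the center of cell (3, 2) with the anchor size.
x, y, w, h, p = decode_cell(0.0, 0.0, 0.0, 0.0, 0.0, cx=3, cy=2, cell_size=32, pw=40, ph=40)
```

With this decoding, each cell yields one candidate box plus the objectness probability p used as the confidence score.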
Step three: classify the candidate object boxes with a zero-sample classifier based on the visual features of each cell to obtain a fine-grained classification result. Each cell may predict one or more candidate object box positions (i.e., p, x, y, w, h in Fig. 1); the feature of each candidate box is the visual feature of the current cell, and zero-sample classification is performed on that feature, as follows:
3.1. Based on the visual features of each cell, project the visual feature into the semantic embedding space using a nonlinear function FC (a fully connected network); the visual feature of the i-th cell is denoted x_i, and its projection is denoted k_i.
3.2. In the semantic embedding space, compute the similarity between the projected visual feature and the semantic embedding vector of each object class to obtain a classification score and give a fine-grained classification result. The score is expressed as s_{ij} = sim(k_i, e_j), j = 1, …, C_s, where sim(·,·) is the similarity computed in the embedding space, e_j is the semantic embedding vector of the j-th class, C_s is the number of object classes (i.e., there are C_s classes during training), j indexes the training classes (known classes, since their data are available at training time), and s is short for "seen", i.e., the training classes. The object classes are given in advance: in zero-sample object detection, which classes must be detected, and their labels, are known beforehand, so classification is achieved simply by comparing the projected vector with the semantic embedding vectors of the given object classes in the semantic embedding space.
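Step 3.2 can be sketched as follows; cosine similarity is assumed as the similarity measure, and the 3-dimensional class embeddings are toy values for illustration only:

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def zero_shot_scores(k_i, class_embeddings):
    """Score the projected cell feature k_i against every seen-class embedding."""
    return {name: cosine(k_i, e) for name, e in class_embeddings.items()}

# Toy 3-d semantic embeddings for two seen classes (illustrative values only).
embeddings = {"dog": [0.9, 0.1, 0.0], "car": [0.0, 0.2, 0.9]}
k_i = [0.8, 0.2, 0.1]                     # projected visual feature of cell i
scores = zero_shot_scores(k_i, embeddings)
best = max(scores, key=scores.get)
```

In practice k_i would come from the FC projection of step 3.1 and the embeddings from pretrained label semantic vectors; the cell is then assigned the class with the highest score.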
Step four: on the basis of step one, extract the context information of each cell with a context extraction network, and, based on the extracted context features, predict the superclass information of each candidate object with a superclass prediction network, as follows:
4.1. Extract a context feature matrix from the feature matrix of the given input picture by context feature extraction, CFE (contextual feature extraction), using multiple layers of 3×3 dilated (hole) convolutions to obtain a feature matrix with a larger receptive field on the original image; this feature matrix fuses the context information of the candidate object boxes. Concretely, let X ∈ R^{H×W×d_v} be the picture feature matrix, where X is a tensor (the three-dimensional matrix the deep network outputs for an input picture), R denotes the whole real number interval (every element of X is real-valued), and H×W×d_v is the size of X: H×W is the number of cells, and each cell is described by a d_v-dimensional feature (v is short for "visual"). Several dilated convolution blocks are applied to X in succession, where each block contains K dilated convolutions with dilation rates r_1, …, r_K (r_i being the dilation rate of the i-th dilated convolution, i = 1, …, K), as shown in Fig. 2. The feature matrices produced by the blocks are fused to obtain the final context feature matrix P ∈ R^{H×W×d_c} (c is short for "context", distinguishing d_c from d_v), where the context feature of the i-th cell is denoted p_i;
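A quick way to see why stacked dilated convolutions enlarge the field of view is to compute the receptive field of the stack; for stride-1 convolutions it grows by (kernel_size − 1) · dilation per layer. The dilation rates below are illustrative, not the patent's:

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (along one spatial axis) of stacked stride-1 convolutions:
    rf = 1 + sum over layers of (kernel_size - 1) * dilation."""
    rf = 1
    for k, r in zip(kernel_sizes, dilations):
        rf += (k - 1) * r
    return rf

# Three 3x3 convolutions with dilation rates 1, 2, 4 (illustrative rates):
rf = receptive_field([3, 3, 3], [1, 2, 4])   # 1 + 2 + 4 + 8 = 15
```

The dilated stack uses the same number of weights as three plain 3×3 convolutions (whose receptive field is only 7) while covering more than twice the field of view, which is why it can gather context around a candidate box cheaply.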
4.2. Extract the superclass relations among the object classes from WordNet (building the superclasses), such that each superclass contains at least one test object class; every object class thus belongs to a superclass;
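A minimal sketch of the superclass construction in step 4.2, using a toy hypernym table in place of real WordNet queries (all class names and groupings here are illustrative assumptions):

```python
def build_superclasses(hypernym, classes, test_classes):
    """Group object classes by their hypernym (superclass) and keep only
    superclasses that contain at least one test (unseen) class."""
    groups = {}
    for c in classes:
        groups.setdefault(hypernym[c], set()).add(c)
    return {s: members for s, members in groups.items()
            if members & set(test_classes)}

# Toy hypernym table standing in for WordNet lookups.
hypernym = {"dog": "animal", "cat": "animal", "zebra": "animal",
            "car": "vehicle", "bus": "vehicle"}
classes = ["dog", "cat", "zebra", "car", "bus"]
supers = build_superclasses(hypernym, classes, test_classes=["zebra"])
# Only "animal" survives, since "vehicle" contains no test class.
```

In the actual method the hypernym relation would be read from WordNet, and the surviving superclasses are the labels predicted by the superclass network.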
4.3. Based on the context feature of each cell, predict the corresponding superclass with a multi-layer fully connected network (e.g., an MLP, multilayer perceptron). The true superclass of cell i is denoted y_i^{sc}, and the predicted superclass distribution is denoted ŷ_i^{sc}; the network is optimized with the following cross-entropy loss:
L_sc = − Σ_{i=1}^{H×W} Σ_{j=1}^{C} y_{ij}^{sc} log ŷ_{ij}^{sc}
where i indexes the cells (superclass prediction is performed for every cell), C is the number of superclasses (each cell predicts a probability for each of the C superclasses), and j indexes the superclasses.
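The superclass cross-entropy loss of step 4.3 can be computed directly; the sketch below uses two cells and three superclasses with one-hot true labels (toy numbers for illustration):

```python
import math

def superclass_ce_loss(y_true, y_pred):
    """Cross-entropy over all cells and C superclasses:
    L = -sum_i sum_j y_ij * log(yhat_ij)."""
    loss = 0.0
    for yi, pi in zip(y_true, y_pred):        # one entry per cell
        loss -= sum(y * math.log(p) for y, p in zip(yi, pi) if y > 0)
    return loss

# Two cells, three superclasses; rows of y_true are one-hot true superclasses.
y_true = [[1, 0, 0], [0, 0, 1]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
loss = superclass_ce_loss(y_true, y_pred)     # -(log 0.7 + log 0.8)
```

Each cell contributes the negative log-probability assigned to its true superclass, so the loss falls as the context-based predictions sharpen.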
Step five: fuse the superclass predicted in step four with the class predicted in step three to obtain the final classification result, as follows:
5.1. The superclass score predicted in step 4.3 (one superclass score vector per cell), ŷ_{i,sc(j)}^{sc}, where sc(j) denotes the superclass to which class j belongs, is multiplied by the fine-grained classification score s_{ij} obtained in step 3.2 to give the final classification score, expressed as f_{ij} = ŷ_{i,sc(j)}^{sc} · s_{ij}. The network is optimized with the cross-entropy loss:
L_cls = − Σ_{i=1}^{H×W} 1_i^{obj} Σ_{j=1}^{C_s} y_{ij} log f_{ij}
where obj is short for "object" and 1_i^{obj} indicates whether the i-th cell contains an object: 1_i^{obj} = 1 if it does, and 0 otherwise. The optimal classification result is obtained by minimizing this cross-entropy loss.
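The object-masked cross-entropy of step 5.1 differs from the superclass loss only in the indicator 1_i^{obj}, which drops cells that contain no object; a minimal sketch with toy numbers:

```python
import math

def detection_ce_loss(obj_mask, y_true, y_fused):
    """Cross-entropy over cells, counting only cells that contain an object:
    L = -sum_i 1_i^obj * sum_j y_ij * log(f_ij)."""
    loss = 0.0
    for m, yi, fi in zip(obj_mask, y_true, y_fused):
        if m:  # 1_i^obj = 1: this cell contains an object
            loss -= sum(y * math.log(f) for y, f in zip(yi, fi) if y > 0)
    return loss

obj_mask = [1, 0]                          # only the first cell holds an object
y_true  = [[0, 1], [1, 0]]                 # one-hot class labels per cell
y_fused = [[0.4, 0.6], [0.5, 0.5]]         # fused class scores per cell
loss = detection_ce_loss(obj_mask, y_true, y_fused)   # -log 0.6
```

Masking by 1_i^{obj} keeps empty background cells from dominating the classification loss.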
Each cell has its own object class and context features. Through the superclass relation, the superclass corresponding to each object class is known in advance; for example, the superclass of "dog" and "cat" may be "animal". This mapping is constructed beforehand. The visual features of a cell are used to predict the object class (e.g., dog), while the context features of the cell are used to predict the superclass (e.g., animal). The two prediction paths are fused to obtain the final classification result.
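The two-path fusion just described can be played through with toy scores: the fused score of each class is its visual score times the context score of its superclass (all numbers and class names are illustrative):

```python
# Two prediction paths for one cell: visual features -> fine-grained class,
# context features -> superclass; the paths are fused by multiplication.
super_of = {"dog": "animal", "cat": "animal", "car": "vehicle"}
visual_scores = {"dog": 0.5, "cat": 0.2, "car": 0.3}   # from cell visual features
super_scores = {"animal": 0.8, "vehicle": 0.2}         # from cell context features
fused = {c: visual_scores[c] * super_scores[super_of[c]] for c in visual_scores}
best = max(fused, key=fused.get)
```

Here the context path reinforces "dog" (0.5 × 0.8 = 0.4) and suppresses "car" (0.3 × 0.2 = 0.06), illustrating how context-based superclass reasoning corrects purely visual confusions.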
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions according to the embodiments of the present invention.
Claims (7)
1. A zero sample object detection method for performing super-class reasoning by fusing context is characterized by comprising the following steps:
step one: extracting depth picture characteristics;
step two: based on the extracted depth picture features, for each cell, predicting the position coordinates of the cell by using a coordinate prediction network, and predicting the confidence of whether an object exists in the cell by using a confidence prediction network;
step three: classifying candidate object frames by using a zero sample classifier based on the visual characteristics of each cell to obtain a fine-grained classification result, wherein each cell can predict one or more candidate object frame positions, the characteristics of each candidate object frame are the visual characteristics of the current cell, and zero sample classification is performed based on the visual characteristics; the method specifically comprises the following steps:
step 3.1: based on the visual features of each cell, projecting the visual features into the semantic embedding space using a nonlinear function; the visual feature of the i-th cell is denoted x_i, and its projection is denoted k_i;
step 3.2: calculating, in the semantic embedding space, the similarity between the projection vector of the visual feature and the semantic embedding vector of each object class to obtain a classification score and give a fine-grained classification result; the score is expressed as s_{ij} = sim(k_i, e_j), j = 1, …, C_s, where C_s represents the number of object classes, j denotes the j-th training class, and s denotes the known classes, i.e., the training classes; the object classes are given in advance;
step four: on the basis of the first step, extracting the context information of each cell by using a context extraction network, and simultaneously predicting the superclass information of each candidate object by using a superclass prediction network on the basis of the extracted context characteristics;
step five: fusing the superclass predicted in step four with the class predicted in step three to obtain a final classification result; the superclass information predicted in step four is multiplied by the fine-grained classification result obtained in step 3.2 to obtain the final classification result.
2. The method for detecting zero sample object by fusion context to perform superclass reasoning as set forth in claim 1, wherein said step four comprises the steps of:
step 4.1: extracting a context feature matrix through context feature extraction on the feature matrix of a given input picture, and using hole convolution;
step 4.2: extracting a superclass relation between the object class and the object class from the semantic web, so that the superclass at least contains 1 test object class;
step 4.3: predicting the superclasses with a multi-layer fully connected network based on the context features of each cell; the true superclass of cell i is denoted y_i^{sc}, and the predicted superclass distribution is denoted ŷ_i^{sc}; the network is optimized with the following cross-entropy loss:
L_sc = − Σ_{i=1}^{H×W} Σ_{j=1}^{C} y_{ij}^{sc} log ŷ_{ij}^{sc}
where H×W is the number of cells, i indexes the cells (superclass prediction is performed for every cell), C is the number of superclasses, and j indexes the superclasses.
3. The method for detecting zero sample object by fusion context to perform superclass reasoning as set forth in claim 2, wherein said step five comprises the steps of:
step 5.1: multiplying the superclass score predicted in step 4.3, ŷ_{i,sc(j)}^{sc}, where sc(j) denotes the superclass to which class j belongs, by the fine-grained classification score s_{ij} obtained in step 3.2 to obtain the final classification result, expressed as f_{ij} = ŷ_{i,sc(j)}^{sc} · s_{ij}, with the cross-entropy loss:
L_cls = − Σ_{i=1}^{H×W} 1_i^{obj} Σ_{j=1}^{C_s} y_{ij} log f_{ij}.
4. A method of zero-sample object detection with context fusion for superclass reasoning as claimed in claim 2, wherein the context feature extraction in step 4.1 is as follows: X ∈ R^{H×W×d_v} is the picture feature matrix, where X denotes a tensor matrix, R denotes the whole real number interval, and H×W×d_v is the size of the three-dimensional matrix X, the dimensions being H, W, and d_v respectively; H×W is the number of cells, and each cell is described by a d_v-dimensional feature; a plurality of dilated convolution blocks are applied to X in succession, where each block contains K dilated convolutions with dilation rates r_1, …, r_K, r_i being the dilation rate of the i-th dilated convolution; the feature matrices obtained by the blocks are fused to obtain the final context feature matrix P ∈ R^{H×W×d_c}, where H×W×d_c is the size of the context feature matrix and the context feature of the i-th cell is denoted p_i.
5. The method for detecting zero-sample objects by fusing context to perform superclass reasoning according to claim 1, wherein the nonlinear function in step 3.1 is a fully connected network.
6. The method for detecting zero-sample objects by fusing context to perform superclass reasoning according to claim 2, wherein the dilated convolution in step 4.1 uses multiple layers of 3×3 dilated convolutions.
7. The method for detecting zero-sample objects by fusion context to perform superclass reasoning according to claim 2, wherein the semantic net in the step 4.2 is WordNet.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011618077.7A CN112749738B (en) | 2020-12-30 | 2020-12-30 | Zero sample object detection method for performing superclass reasoning by fusing context |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011618077.7A CN112749738B (en) | 2020-12-30 | 2020-12-30 | Zero sample object detection method for performing superclass reasoning by fusing context |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112749738A CN112749738A (en) | 2021-05-04 |
CN112749738B true CN112749738B (en) | 2023-05-23 |
Family
ID=75650165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011618077.7A Active CN112749738B (en) | 2020-12-30 | 2020-12-30 | Zero sample object detection method for performing superclass reasoning by fusing context |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112749738B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113672711B (en) * | 2021-08-09 | 2024-01-19 | 之江实验室 | Service type robot intention recognition device and training and recognition method thereof |
CN113887647A (en) * | 2021-10-14 | 2022-01-04 | 浙江大学 | Class increase and decrease sample object detection method integrating knowledge distillation and class representative point extraction |
CN116994104B (en) * | 2023-07-19 | 2024-06-11 | 湖北楚天高速数字科技有限公司 | Zero sample identification method and system based on tensor fusion and contrast learning |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9928448B1 (en) * | 2016-09-23 | 2018-03-27 | International Business Machines Corporation | Image classification utilizing semantic relationships in a classification hierarchy |
CN110582777A (en) * | 2017-05-05 | 2019-12-17 | 赫尔实验室有限公司 | Zero-sample machine vision system with joint sparse representation |
CN111428733A (en) * | 2020-03-12 | 2020-07-17 | 山东大学 | Zero sample target detection method and system based on semantic feature space conversion |
CN112036170A (en) * | 2020-09-03 | 2020-12-04 | 浙江大学 | Neural zero sample fine-grained entity classification method based on type attention |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10908616B2 (en) * | 2017-05-05 | 2021-02-02 | Hrl Laboratories, Llc | Attribute aware zero shot machine vision system via joint sparse representations |
US11055555B2 (en) * | 2018-04-20 | 2021-07-06 | Sri International | Zero-shot object detection |
CN109993197B (en) * | 2018-12-07 | 2023-04-28 | 天津大学 | Zero sample multi-label classification method based on depth end-to-end example differentiation |
CN110826638B (en) * | 2019-11-12 | 2023-04-18 | 福州大学 | Zero sample image classification model based on repeated attention network and method thereof |
CN111680757A (en) * | 2020-06-12 | 2020-09-18 | 汪金玲 | Zero sample image recognition algorithm and system based on self-encoder |
- 2020-12-30: Application CN202011618077.7A filed in China; granted as patent CN112749738B, status Active
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9928448B1 (en) * | 2016-09-23 | 2018-03-27 | International Business Machines Corporation | Image classification utilizing semantic relationships in a classification hierarchy |
CN110582777A (en) * | 2017-05-05 | 2019-12-17 | 赫尔实验室有限公司 | Zero-sample machine vision system with joint sparse representation |
CN111428733A (en) * | 2020-03-12 | 2020-07-17 | 山东大学 | Zero sample target detection method and system based on semantic feature space conversion |
CN112036170A (en) * | 2020-09-03 | 2020-12-04 | 浙江大学 | Neural zero sample fine-grained entity classification method based on type attention |
Also Published As
Publication number | Publication date |
---|---|
CN112749738A (en) | 2021-05-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xing et al. | A convolutional neural network-based method for workpiece surface defect detection | |
CN112749738B (en) | Zero sample object detection method for performing superclass reasoning by fusing context | |
Oliveira et al. | Deep learning for human part discovery in images | |
CN107133569B (en) | Monitoring video multi-granularity labeling method based on generalized multi-label learning | |
Sillito et al. | Semi-supervised learning for anomalous trajectory detection | |
EP3447727B1 (en) | A method, an apparatus and a computer program product for object detection | |
Yang et al. | Multi-object tracking with discriminant correlation filter based deep learning tracker | |
CN110633632A (en) | Weak supervision combined target detection and semantic segmentation method based on loop guidance | |
Yang et al. | Detecting coarticulation in sign language using conditional random fields | |
CN111125406A (en) | Visual relation detection method based on self-adaptive cluster learning | |
CN113378676A (en) | Method for detecting figure interaction in image based on multi-feature fusion | |
Zhang et al. | An efficient semi-supervised manifold embedding for crowd counting | |
Ghatak et al. | GAN based efficient foreground extraction and HGWOSA based optimization for video synopsis generation | |
Iqbal et al. | Classifier comparison for MSER-based text classification in scene images | |
Zhao et al. | BiTNet: a lightweight object detection network for real-time classroom behavior recognition with transformer and bi-directional pyramid network | |
Athira et al. | Underwater object detection model based on YOLOv3 architecture using deep neural networks | |
Mo et al. | Student behavior recognition based on multitask learning | |
Attia et al. | Efficient deep learning models based on tension techniques for sign language recognition | |
Abdulghani et al. | Discover human poses similarity and action recognition based on machine learning | |
CN115482436B (en) | Training method and device for image screening model and image screening method | |
Wang et al. | Spatial relationship recognition via heterogeneous representation: A review | |
Yao et al. | Extracting robust distribution using adaptive Gaussian Mixture Model and online feature selection | |
CN113223018A (en) | Fine-grained image analysis processing method | |
CN113516118A (en) | Image and text combined embedded multi-mode culture resource processing method | |
Dong et al. | Intelligent pixel-level pavement marking detection using 2D laser pavement images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||