CN112749738B - Zero sample object detection method for performing superclass reasoning by fusing context - Google Patents


Info

Publication number
CN112749738B
Authority
CN
China
Prior art keywords
superclass
cell
class
context
reasoning
Prior art date
Legal status
Active
Application number
CN202011618077.7A
Other languages
Chinese (zh)
Other versions
CN112749738A (en)
Inventor
李亚南
李太豪
Current Assignee
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang Lab
Priority to CN202011618077.7A
Publication of CN112749738A
Application granted
Publication of CN112749738B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a zero sample object detection method that performs superclass reasoning by fusing context, and can localize and recognize brand-new objects that have never been seen before, even when labeled training pictures are missing. First, an object detection network predicts the object frames that may exist in an input picture. Second, based on the visual features of each candidate object frame, the position of the object is located and its specific category is predicted using label semantic vectors. Multi-layer dilated convolution is then adopted to extract the context information of the candidate object frames, and the extracted context information is used to predict the corresponding superclass. Finally, the predicted specific category is fused with the superclass to obtain the final recognition result. The method is simple, convenient and flexible, and can significantly improve the detection performance on unseen objects.

Description

Zero sample object detection method for performing superclass reasoning by fusing context
Technical Field
The invention relates to the technical field of computer vision, in particular to a zero sample object detection method for performing superclass reasoning by fusing context.
Background
Object detection is one of the classical problems in the field of computer vision, and object detection techniques based on deep neural networks have achieved great success in recent years. One key ingredient of this success is the availability of large labeled training datasets with accurate bounding-box annotations. However, on the one hand, it is difficult to collect and annotate images for every object class beyond everyday objects, for example endangered species or continually released new products. On the other hand, when data from the target domain is scarce or absent, an object detector trained on the source domain is difficult to generalize to the target domain. In contrast, humans have the remarkable ability to quickly learn new concepts and new objects even without ever having seen an image of them, and a good object detection system should possess this learning ability. In order to bridge the gap between object detectors and human intelligence, giving object detectors the ability to detect completely new, unknown target classes (i.e., zero sample object detection) has become one of the hot research problems.
Zero sample object detection aims to detect unknown object classes in the absence of supervised training samples. Compared with zero sample recognition, it requires the model not only to recognize the class of an object but also to accurately localize the target among millions of potential candidate regions. The common practice for zero sample object detection is to incorporate a zero sample classifier into an existing object detection framework, such as Faster R-CNN or YOLO, and to bridge the semantic gap between seen and unseen classes by aligning the visual features of each object region with the intrinsic attributes of the object class (i.e., class semantic embeddings).
However, this type of approach has two drawbacks. First, such methods use only limited visual features to identify candidate regions and ignore the importance of context information, which has shown great potential in many tasks. As a result, objects that are visually similar but semantically different will be falsely detected; for example, a green apple in a kitchen may be mistaken for a tennis ball because the two objects look very similar. Second, such methods ignore the domain shift problem caused by the different data distributions of the source domain and the target domain, so a detector trained on the source domain does not generalize well to the target domain. This problem is exacerbated when both known and unknown classes are present in the same picture.
Disclosure of Invention
In order to overcome the defects of the prior art and improve the detection accuracy of zero sample object detection, the invention adopts the following technical scheme:
a zero sample object detection method for performing super-class reasoning by fusing context includes the following steps:
step one: extracting deep picture features;
step two: based on the extracted deep picture features, for each cell, predicting the position coordinates of the cell by using a coordinate prediction network, and predicting the confidence of whether an object exists in the cell by using a confidence prediction network;
step three: classifying candidate object frames by using a zero sample classifier based on the visual characteristics of each cell to obtain a fine-grained classification result, wherein each cell can predict one or more candidate object frame positions, the characteristics of each candidate object frame are the visual characteristics of the current cell, and zero sample classification is performed based on the visual characteristics;
step four: on the basis of the first step, extracting the context information of each cell by using a context extraction network, and simultaneously predicting the superclass information of each candidate object by using a superclass prediction network on the basis of the extracted context characteristics;
step five: fusing the superclass predicted in step four with the class predicted in step three to obtain the final classification result.
Combining the superclass predicted from the context in step four with the class predicted in step three yields the final classification result. This solves the problem that, when only limited visual features are used to identify candidate object regions, the importance of context information is ignored, so that objects which are visually similar but semantically different are detected incorrectly;
Although the source domain and the target domain are mutually exclusive, i.e., the classes in the source domain (the training classes) are completely different from the classes in the target domain (the test classes), the superclass relation carries information about the unknown classes, so information about the test classes is effectively brought into the training process. The detector therefore takes the target domain into account during training, which gives it generalization ability, improves its detection performance on the target domain, and alleviates the domain shift problem, caused by the different data distributions of the source and target domains, that would otherwise prevent a detector trained on the source domain from generalizing well to the target domain.
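For illustration, the five steps could be wired together as in the following minimal PyTorch-style sketch; the module names (backbone, box_head, zs_classifier, ctx_extractor, sc_head) and the tensor shapes noted in the comments are assumptions made for this example rather than details taken from the patent.

```python
import torch

# Hypothetical modules, each standing for one step of the method.
# An image batch is assumed to be mapped to an H x W grid of cells with
# d_v-dimensional visual features; class_to_superclass is a LongTensor of
# length C_s giving the superclass index of every fine-grained class.
def detect(image, backbone, box_head, zs_classifier, ctx_extractor, sc_head, class_to_superclass):
    feats = backbone(image)                    # step 1: (B, d_v, H, W) deep picture features
    boxes, objectness = box_head(feats)        # step 2: per-cell (x, y, w, h) and confidence p
    class_scores = zs_classifier(feats)        # step 3: (B, C_s, H, W) fine-grained class scores
    ctx = ctx_extractor(feats)                 # step 4: (B, d_c, H, W) context features
    superclass_scores = sc_head(ctx)           # step 4: (B, C, H, W) superclass scores
    # step 5: weight every class score by the score of the superclass it belongs to
    fused = class_scores * superclass_scores[:, class_to_superclass, :, :]
    return boxes, objectness, fused
```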
Further, the third step specifically includes the following steps:
step 3.1: based on the visual features of each cell, projecting the visual features into a semantic embedding space using a nonlinear function; the visual feature of a cell is denoted $x_i$ and the projected feature is denoted $k_i$, where $i$ indexes the $i$-th cell;
step 3.2: calculating, in the semantic embedding space, the similarity between the projected vector and the semantic embedding vector of each object class to obtain classification score values and give a fine-grained classification result; the score values are expressed as
$$s_{ij}=\frac{\exp(k_i^{\top}e_j)}{\sum_{j'=1}^{C_s}\exp(k_i^{\top}e_{j'})},\qquad j=1,\dots,C_s,$$
where $e_j$ is the semantic embedding vector of the $j$-th training class, $C_s$ is the number of seen object classes, $j$ indexes the $j$-th training class, and $s$ stands for the seen classes, i.e., the training classes. The object classes are given in advance: in zero sample object detection, which object classes need to be detected is specified beforehand, and the labels of these object classes are given in advance, so classification can be achieved simply by comparing the projected vector with the semantic embedding vectors of the given object classes in the semantic embedding space.
Further, the fourth step specifically includes the following steps:
step 4.1: for the feature matrix of a given input picture, extracting a context feature matrix through context feature extraction, using dilated convolution to obtain a feature matrix with a larger receptive field on the original image, so that this feature matrix fuses the context information of the candidate object frames;
step 4.2: extracting the superclass relations among object classes from a semantic net, so that each superclass contains at least 1 test object class; thus, each object class belongs to a superclass;
step 4.3: based on the context features of each cell, predicting the corresponding superclass by using a multi-layer fully connected network; the true superclass of the $i$-th cell is represented as a one-hot vector $q_i\in\{0,1\}^{C}$ and the predicted superclass distribution is denoted $\hat{q}_i$; the network is optimized using the following cross-entropy loss:
$$L_{sc}=-\sum_{i=1}^{H\times W}\sum_{j=1}^{C}q_{ij}\log\hat{q}_{ij},$$
where $H\times W$ is the number of cells, $i$ indexes the cells (superclass prediction is performed for every cell), $C$ is the number of superclasses, and $j$ indexes the $j$-th superclass.
Further, the fifth step specifically includes the following steps:
step 5.1: multiplying the superclass score value $\hat{q}_{i,\sigma(j)}$ predicted in step 4.3 by the fine-grained classification score obtained in step 3.2 to obtain the final classification result, expressed as
$$\tilde{s}_{ij}=\hat{q}_{i,\sigma(j)}\cdot s_{ij},$$
where $\sigma(j)$ is the superclass to which class $j$ belongs; the classification branch is trained with the cross-entropy loss:
$$L_{cls}=-\sum_{i=1}^{H\times W}\mathbb{1}_i^{obj}\sum_{j=1}^{C_s}y_{ij}\log\tilde{s}_{ij},$$
where obj is an abbreviation of object and indicates whether the $i$-th cell contains an object: if it does, $\mathbb{1}_i^{obj}=1$, otherwise $\mathbb{1}_i^{obj}=0$; $y_{ij}=1$ if the ground-truth class of the $i$-th cell is $j$ and $y_{ij}=0$ otherwise. An optimal classification result is obtained through this cross-entropy loss function.
Further, the context feature extraction in step 4.1 is specifically as follows: let $X\in\mathbb{R}^{H\times W\times d_v}$ be the picture feature matrix, where $X$ denotes a tensor, $\mathbb{R}$ denotes the set of real numbers, and $H\times W\times d_v$ is the size of the three-dimensional matrix $X$, whose dimensions are $H$, $W$ and $d_v$ respectively; $H\times W$ is the number of cells, and each cell is characterized by a $d_v$-dimensional feature. Several dilated convolution blocks are applied consecutively on $X$, where each dilated convolution block contains $K$ dilated convolutions with dilation rates $r_1, r_2, \dots, r_K$, $r_i$ denoting the dilation rate of the $i$-th dilated convolution. The feature matrices produced by the dilated convolution blocks are fused to obtain the final context feature matrix $P\in\mathbb{R}^{H\times W\times d_c}$, where $H\times W\times d_c$ is the size of the context feature matrix and the context feature of each cell is denoted $p_i$, with $i$ indexing the cells.
Further, the nonlinear function in step 3.1 is a fully connected network.
Further, the dilated convolution in step 4.1 uses multiple layers of 3×3 dilated convolutions.
Further, the semantic net in the step 4.2 is WordNet.
The invention has the advantages that:
the invention combines the superclass corresponding to the context with the predicted class to finally obtain the final classification result, and solves the problem that when the limited visual features are used for identifying the candidate object areas, the importance of the context information is ignored, so that objects which are similar in vision but are semantically different are detected wrongly;
Although the source domain and the target domain are mutually exclusive, i.e., the classes in the source domain (the training classes) are completely different from the classes in the target domain (the test classes), the superclass relation carries information about the unknown classes, so information about the test classes is effectively brought into the training process. The detector therefore takes the target domain into account during training, which gives it generalization ability, improves its detection performance on the target domain, and alleviates the domain shift problem, caused by the different data distributions of the source and target domains, that would otherwise prevent a detector trained on the source domain from generalizing well to the target domain.
Drawings
Fig. 1 is a diagram of an object detection framework of the present invention.
FIG. 2 is a schematic diagram of the dilated convolution structure in accordance with the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
As shown in FIG. 1, a zero sample object detection method for performing superclass reasoning by fusing context comprises the following steps:
Step one: the picture is input into a deep convolutional neural network (CNN) to extract deep picture features.
Step two: based on the extracted deep picture features, the position coordinates are predicted for each cell by a coordinate prediction network, while a confidence prediction network predicts the confidence that an object exists in the cell. In fig. 1, p represents the probability value (between 0 and 1) that the cell contains an object; x, y, w, h represent the predicted object frame position in the cell, where x, y are the center coordinates of the object frame, w is the width of the object frame, and h is the height of the object frame; c represents the category in p(c|s), and s represents the supercategory.
Step three: classifying the candidate object frames by using a zero sample classifier based on the visual characteristics of each cell, to obtain a fine-grained classification result, wherein each cell can predict one or more candidate object frame positions (i.e. p, x, y, w, h in fig. 1), and the characteristics of each candidate object frame are the visual characteristics of the current cell, and zero sample classification is performed based on the visual characteristics, and comprises the following steps:
3.1, based on the visual features of each cell, projecting the visual features into a semantic embedding space using a nonlinear function FC (a fully connected network); the visual feature of a cell is denoted $x_i$ and the projected feature is denoted $k_i$, where $i$ indexes the $i$-th cell;
3.2, calculating, in the semantic embedding space, the similarity between the projected vector and the semantic embedding vector of each object class to obtain classification score values and give a fine-grained classification result; the score values are expressed as
$$s_{ij}=\frac{\exp(k_i^{\top}e_j)}{\sum_{j'=1}^{C_s}\exp(k_i^{\top}e_{j'})},\qquad j=1,\dots,C_s,$$
where $e_j$ is the semantic embedding vector of the $j$-th class and $C_s$ is the number of object classes, i.e., there are $C_s$ classes in total during training; $j$ indexes the $j$-th training class (a seen class, because its data is available at training time), and $s$ is an abbreviation of seen, representing the known classes, i.e., the training classes. The object classes are given in advance: in zero sample object detection, which object classes need to be detected is specified beforehand, and the labels of these object classes are given in advance, so classification can be achieved simply by comparing the projected vector with the semantic embedding vectors of the given object classes in the semantic embedding space.
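The zero sample classifier of steps 3.1 and 3.2 could be sketched as follows; the two-layer projection network and the softmax over dot-product similarities are illustrative assumptions, since the text only specifies a fully connected projection followed by a similarity comparison with the class semantic embedding vectors.

```python
import torch
import torch.nn as nn

class ZeroShotClassifier(nn.Module):
    """Project cell visual features x_i to k_i and score them against class embeddings e_j."""
    def __init__(self, d_v: int, d_e: int, class_embeddings: torch.Tensor):
        super().__init__()
        # Nonlinear projection into the semantic embedding space (step 3.1).
        self.project = nn.Sequential(nn.Linear(d_v, d_e), nn.ReLU(), nn.Linear(d_e, d_e))
        # class_embeddings: (C_s, d_e) semantic vectors of the seen classes, given in advance.
        self.register_buffer("embeddings", class_embeddings)

    def forward(self, cell_feats: torch.Tensor) -> torch.Tensor:
        # cell_feats: (N, d_v), one row of visual features per cell.
        k = self.project(cell_feats)          # (N, d_e) projected features k_i
        sim = k @ self.embeddings.t()         # (N, C_s) similarities between k_i and each e_j (step 3.2)
        return sim.softmax(dim=-1)            # fine-grained classification scores
```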
Step four: on the basis of the first step, extracting the context information of each cell by using a context extraction network, and simultaneously predicting the superclass information of each candidate object by using a superclass prediction network based on the extracted context characteristics, comprising the following steps:
4.1, for the feature matrix of a given input picture, extracting a context feature matrix through context feature extraction CFE (contextual feature extraction), using multiple layers of 3×3 dilated convolutions to obtain a feature matrix with a larger receptive field on the original image, so that this feature matrix fuses the context information of the candidate object frames. Specifically, let $X\in\mathbb{R}^{H\times W\times d_v}$ be the picture feature matrix, where $X$ denotes a tensor (the three-dimensional matrix that the deep network outputs for the input picture), $\mathbb{R}$ denotes the set of real numbers (every element of $X$ is a real number), and $H\times W\times d_v$ is the size of the three-dimensional matrix $X$, whose dimensions are $H$, $W$ and $d_v$ respectively; $H\times W$ is the number of cells, each cell is characterized by a $d_v$-dimensional feature, and $v$ is an abbreviation of visual. Several dilated convolution blocks are applied consecutively on $X$, where each dilated convolution block contains $K$ dilated convolutions with dilation rates $r_1, r_2, \dots, r_K$, $r_i$ denoting the dilation rate of the $i$-th dilated convolution (from the 1st to the $K$-th), as shown in FIG. 2. The feature matrices produced by the dilated convolution blocks are fused to obtain the final context feature matrix $P\in\mathbb{R}^{H\times W\times d_c}$, where $H\times W\times d_c$ is the size of the context feature matrix, $c$ is an abbreviation of context (so $d_c$ is distinguished from $d_v$), and the context feature of each cell is denoted $p_i$, with $i$ indexing the cells;
4.2, extracting the superclass relations among object classes from WordNet (building the superclasses), so that each superclass contains at least 1 test object class; thus, each object class belongs to a superclass;
4.3, based on the context features of each cell, predicting the corresponding superclass using a multi-layer fully connected network (e.g., an MLP, multilayer perceptron); suppose the true superclass of the $i$-th cell is represented as a one-hot vector $q_i\in\{0,1\}^{C}$ and the predicted superclass distribution is denoted $\hat{q}_i$; the network is optimized using the following cross-entropy loss:
$$L_{sc}=-\sum_{i=1}^{H\times W}\sum_{j=1}^{C}q_{ij}\log\hat{q}_{ij},$$
where $i$ indexes the cells and superclass prediction is performed for every cell, $C$ denotes the number of superclasses (each cell needs to predict the probability of all $C$ superclasses), and $j$ indexes the $j$-th superclass.
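Steps 4.1 and 4.3 could be sketched as follows, assuming a single dilated-convolution block whose branches are fused by concatenation and a two-layer MLP superclass head; the dilation rates, the fusion-by-concatenation choice and the layer sizes are assumptions, since the text only specifies 3×3 dilated convolutions whose outputs are fused and a multi-layer fully connected prediction network.

```python
import torch
import torch.nn as nn

class ContextFeatureExtraction(nn.Module):
    """Step 4.1: K parallel 3x3 dilated convolutions whose outputs are fused (here by concatenation)."""
    def __init__(self, d_v: int, d_c: int, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(d_v, d_c, kernel_size=3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(d_c * len(rates), d_c, kernel_size=1)  # fuse the K feature matrices

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, d_v, H, W) picture feature matrix X
        ctx = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.fuse(ctx)                                        # (B, d_c, H, W) context matrix P

class SuperclassHead(nn.Module):
    """Step 4.3: per-cell superclass prediction from context features with an MLP."""
    def __init__(self, d_c: int, num_superclasses: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d_c, d_c), nn.ReLU(), nn.Linear(d_c, num_superclasses))

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        b, d_c, h, w = ctx.shape
        cells = ctx.permute(0, 2, 3, 1).reshape(b * h * w, d_c)      # one row per cell p_i
        return self.mlp(cells).log_softmax(dim=-1)                   # log of predicted superclass scores

# Cross-entropy loss L_sc summed over all cells (one superclass label per cell).
def superclass_loss(log_pred: torch.Tensor, true_superclass: torch.Tensor) -> torch.Tensor:
    return nn.functional.nll_loss(log_pred, true_superclass, reduction="sum")
```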
Step five: organically fusing the super class predicted in the fourth step with the class predicted in the third step to obtain a final classification result, wherein the method comprises the following steps of:
5.1, multiplying the superclass score value predicted in step 4.3 (one superclass score is predicted per cell), $\hat{q}_{i,\sigma(j)}$, by the fine-grained classification score obtained in step 3.2 to obtain the final classification result. Suppose the classification result is expressed as
$$\tilde{s}_{ij}=\hat{q}_{i,\sigma(j)}\cdot s_{ij},$$
where $\sigma(j)$ (denoted $s$ in fig. 1) is the superclass to which class $j$ belongs; the classification branch is trained with the cross-entropy loss:
$$L_{cls}=-\sum_{i=1}^{H\times W}\mathbb{1}_i^{obj}\sum_{j=1}^{C_s}y_{ij}\log\tilde{s}_{ij},$$
where obj is an abbreviation of object and indicates whether the $i$-th cell contains an object: if it does, $\mathbb{1}_i^{obj}=1$, otherwise $\mathbb{1}_i^{obj}=0$; $y_{ij}=1$ if the ground-truth class of the $i$-th cell is $j$ and $y_{ij}=0$ otherwise. An optimal classification result is obtained through this cross-entropy loss function.
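The score fusion of step 5.1 reduces to an indexed elementwise product, as in the sketch below; class_to_superclass, the lookup giving the superclass index of every class j, is an assumed pre-built table (for example derived from the WordNet-based superclass relations of step 4.2).

```python
import torch

def fuse_scores(class_scores: torch.Tensor,
                superclass_scores: torch.Tensor,
                class_to_superclass: torch.Tensor) -> torch.Tensor:
    """Step 5.1: final score = superclass score of the class's superclass * fine-grained class score.

    class_scores:        (N, C_s) scores from the zero sample classifier (step 3.2)
    superclass_scores:   (N, C)   scores predicted by the superclass head (step 4.3)
    class_to_superclass: (C_s,)   long tensor mapping each class j to its superclass index
    """
    gathered = superclass_scores[:, class_to_superclass]  # (N, C_s): score of each class's superclass
    return class_scores * gathered                         # (N, C_s): fused final scores

# Toy example: classes [dog, cat, tennis ball] under superclasses [animal, sports equipment].
fused = fuse_scores(torch.rand(2, 3), torch.rand(2, 2), torch.tensor([0, 0, 1]))
```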
Each cell has its own object class and context features. Through the superclass relations, the superclass corresponding to each object class is known in advance; for example, the superclass of "dog" and "cat" may be "animal". This mapping is constructed beforehand. The visual features of a cell are used to predict the corresponding object class, such as dog, while the context features of the cell are used to predict the superclass, i.e., animal. The two prediction paths are then fused to obtain the final classification result.
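As an illustration of how such a class-to-superclass mapping could be built from WordNet (step 4.2), the sketch below uses the NLTK WordNet interface and simply takes the ancestor at a fixed depth along each class's first hypernym path; the fixed-depth heuristic and the SUPERCLASS_DEPTH value are assumptions, since the text only requires that the superclass relations come from WordNet and that each superclass contains at least one test class.

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

SUPERCLASS_DEPTH = 4  # assumed depth along the hypernym path used as the superclass level

def superclass_of(class_name: str) -> str:
    """Map a class name (e.g. 'dog') to a coarser WordNet hypernym used as its superclass."""
    synset = wn.synsets(class_name, pos=wn.NOUN)[0]        # first noun sense of the class name
    path = synset.hypernym_paths()[0]                      # hypernym chain from the root down to the synset
    ancestor = path[min(SUPERCLASS_DEPTH, len(path) - 1)]  # clamp for shallow synsets
    return ancestor.name()

class_to_superclass = {c: superclass_of(c) for c in ["dog", "cat", "apple"]}
```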
Combining the superclass predicted from the context in step four with the class predicted in step three yields the final classification result. This solves the problem that, when only limited visual features are used to identify candidate object regions, the importance of context information is ignored, so that objects which are visually similar but semantically different are detected incorrectly;
Although the source domain and the target domain are mutually exclusive, i.e., the classes in the source domain (the training classes) are completely different from the classes in the target domain (the test classes), the superclass relation carries information about the unknown classes, so information about the test classes is effectively brought into the training process. The detector therefore takes the target domain into account during training, which gives it generalization ability, improves its detection performance on the target domain, and alleviates the domain shift problem, caused by the different data distributions of the source and target domains, that would otherwise prevent a detector trained on the source domain from generalizing well to the target domain.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the technical solutions according to the embodiments of the present invention.

Claims (7)

1. A zero sample object detection method for performing super-class reasoning by fusing context is characterized by comprising the following steps:
step one: extracting deep picture features;
step two: based on the extracted deep picture features, for each cell, predicting the position coordinates of the cell by using a coordinate prediction network, and predicting the confidence of whether an object exists in the cell by using a confidence prediction network;
step three: classifying candidate object frames by using a zero sample classifier based on the visual characteristics of each cell to obtain a fine-grained classification result, wherein each cell can predict one or more candidate object frame positions, the characteristics of each candidate object frame are the visual characteristics of the current cell, and zero sample classification is performed based on the visual characteristics; the method specifically comprises the following steps:
step 3.1: based on the visual features of each cell, projecting the visual features into a semantic embedding space using a nonlinear function; the visual feature of a cell is $x_i$ and the projected feature is $k_i$, where $i$ represents the $i$-th cell;
step 3.2: calculating, in the semantic embedding space, the similarity between the projected vector and the semantic embedding vector of each object class to obtain classification score values and give a fine-grained classification result; the score values are expressed as
$$s_{ij}=\frac{\exp(k_i^{\top}e_j)}{\sum_{j'=1}^{C_s}\exp(k_i^{\top}e_{j'})},\qquad j=1,\dots,C_s,$$
where $e_j$ is the semantic embedding vector of the $j$-th training class, $C_s$ represents the number of object classes, $j$ represents the $j$-th training class, and $s$ represents the known classes, namely the training classes; the object classes are given in advance;
step four: on the basis of the first step, extracting the context information of each cell by using a context extraction network, and simultaneously predicting the superclass information of each candidate object by using a superclass prediction network on the basis of the extracted context characteristics;
step five: fusing the superclass predicted in step four with the class predicted in step three to obtain the final classification result; the superclass information predicted in step four is multiplied by the fine-grained classification result obtained in step 3.2 to obtain the final classification result.
2. The method for detecting zero sample object by fusion context to perform superclass reasoning as set forth in claim 1, wherein said step four comprises the steps of:
step 4.1: extracting a context feature matrix through context feature extraction on the feature matrix of a given input picture, using dilated convolution;
step 4.2: extracting the superclass relations among object classes from a semantic net, so that each superclass contains at least 1 test object class;
step 4.3: predicting superclasses using a multi-layer fully connected network based on the context features of each cell; the true superclass of the $i$-th cell is represented as a one-hot vector $q_i\in\{0,1\}^{C}$ and the predicted superclass distribution is denoted $\hat{q}_i$; the network is optimized using the following cross-entropy loss:
$$L_{sc}=-\sum_{i=1}^{H\times W}\sum_{j=1}^{C}q_{ij}\log\hat{q}_{ij},$$
where $H\times W$ is the number of cells, $i$ indexes the cells and superclass prediction is performed for every cell, $C$ is the number of superclasses, and $j$ indexes the $j$-th superclass.
3. The method for detecting zero sample object by fusion context to perform superclass reasoning as set forth in claim 2, wherein said step five comprises the steps of:
step 5.1: multiplying the superclass score value $\hat{q}_{i,\sigma(j)}$ predicted in step 4.3 by the fine-grained classification score obtained in step 3.2 to obtain the final classification result, expressed as
$$\tilde{s}_{ij}=\hat{q}_{i,\sigma(j)}\cdot s_{ij},$$
where $\sigma(j)$ is the superclass to which class $j$ belongs; the classification branch is trained with the cross-entropy loss:
$$L_{cls}=-\sum_{i=1}^{H\times W}\mathbb{1}_i^{obj}\sum_{j=1}^{C_s}y_{ij}\log\tilde{s}_{ij},$$
where obj is an abbreviation of object and indicates whether the $i$-th cell contains an object: if it does, $\mathbb{1}_i^{obj}=1$, otherwise $\mathbb{1}_i^{obj}=0$; $y_{ij}=1$ if the ground-truth class of the $i$-th cell is $j$ and $y_{ij}=0$ otherwise; an optimal classification result is obtained through this cross-entropy loss function.
4. A method of zero sample object detection with context fusion for superclass reasoning as claimed in claim 2, characterized in that the context feature extraction in step 4.1 is specifically as follows: let $X\in\mathbb{R}^{H\times W\times d_v}$ be the picture feature matrix, where $X$ denotes a tensor, $\mathbb{R}$ denotes the set of real numbers, and $H\times W\times d_v$ is the size of the three-dimensional matrix $X$, whose dimensions are $H$, $W$ and $d_v$ respectively; $H\times W$ is the number of cells and each cell is characterized by a $d_v$-dimensional feature; several dilated convolution blocks are applied consecutively on $X$, where each dilated convolution block contains $K$ dilated convolutions with dilation rates $r_1, r_2, \dots, r_K$, $r_i$ denoting the dilation rate of the $i$-th dilated convolution; the feature matrices produced by the dilated convolution blocks are fused to obtain the final context feature matrix $P\in\mathbb{R}^{H\times W\times d_c}$, where $H\times W\times d_c$ is the size of the context feature matrix and the context feature of each cell is denoted $p_i$, with $i$ indexing the cells.
5. The method for detecting zero sample objects by fusing context to perform superclass reasoning according to claim 1, wherein the nonlinear function in step 3.1 is a fully connected network.
6. The method for detecting zero sample objects by fusing context to perform superclass reasoning according to claim 2, wherein the dilated convolution in step 4.1 uses multiple layers of 3×3 dilated convolutions.
7. The method for detecting zero-sample objects by fusion context to perform superclass reasoning according to claim 2, wherein the semantic net in the step 4.2 is WordNet.
CN202011618077.7A 2020-12-30 2020-12-30 Zero sample object detection method for performing superclass reasoning by fusing context Active CN112749738B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011618077.7A CN112749738B (en) 2020-12-30 2020-12-30 Zero sample object detection method for performing superclass reasoning by fusing context

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011618077.7A CN112749738B (en) 2020-12-30 2020-12-30 Zero sample object detection method for performing superclass reasoning by fusing context

Publications (2)

Publication Number Publication Date
CN112749738A CN112749738A (en) 2021-05-04
CN112749738B true CN112749738B (en) 2023-05-23

Family

ID=75650165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011618077.7A Active CN112749738B (en) 2020-12-30 2020-12-30 Zero sample object detection method for performing superclass reasoning by fusing context

Country Status (1)

Country Link
CN (1) CN112749738B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113672711B (en) * 2021-08-09 2024-01-19 之江实验室 Service type robot intention recognition device and training and recognition method thereof
CN113887647A (en) * 2021-10-14 2022-01-04 浙江大学 Class increase and decrease sample object detection method integrating knowledge distillation and class representative point extraction
CN116994104B (en) * 2023-07-19 2024-06-11 湖北楚天高速数字科技有限公司 Zero sample identification method and system based on tensor fusion and contrast learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9928448B1 (en) * 2016-09-23 2018-03-27 International Business Machines Corporation Image classification utilizing semantic relationships in a classification hierarchy
CN110582777A (en) * 2017-05-05 2019-12-17 赫尔实验室有限公司 Zero-sample machine vision system with joint sparse representation
CN111428733A (en) * 2020-03-12 2020-07-17 山东大学 Zero sample target detection method and system based on semantic feature space conversion
CN112036170A (en) * 2020-09-03 2020-12-04 浙江大学 Neural zero sample fine-grained entity classification method based on type attention

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10908616B2 (en) * 2017-05-05 2021-02-02 Hrl Laboratories, Llc Attribute aware zero shot machine vision system via joint sparse representations
US11055555B2 (en) * 2018-04-20 2021-07-06 Sri International Zero-shot object detection
CN109993197B (en) * 2018-12-07 2023-04-28 天津大学 Zero sample multi-label classification method based on depth end-to-end example differentiation
CN110826638B (en) * 2019-11-12 2023-04-18 福州大学 Zero sample image classification model based on repeated attention network and method thereof
CN111680757A (en) * 2020-06-12 2020-09-18 汪金玲 Zero sample image recognition algorithm and system based on self-encoder


Also Published As

Publication number Publication date
CN112749738A (en) 2021-05-04

Similar Documents

Publication Publication Date Title
Xing et al. A convolutional neural network-based method for workpiece surface defect detection
CN112749738B (en) Zero sample object detection method for performing superclass reasoning by fusing context
Oliveira et al. Deep learning for human part discovery in images
CN107133569B (en) Monitoring video multi-granularity labeling method based on generalized multi-label learning
Sillito et al. Semi-supervised learning for anomalous trajectory detection
EP3447727B1 (en) A method, an apparatus and a computer program product for object detection
Yang et al. Multi-object tracking with discriminant correlation filter based deep learning tracker
CN110633632A (en) Weak supervision combined target detection and semantic segmentation method based on loop guidance
Yang et al. Detecting coarticulation in sign language using conditional random fields
CN111125406A (en) Visual relation detection method based on self-adaptive cluster learning
CN113378676A (en) Method for detecting figure interaction in image based on multi-feature fusion
Zhang et al. An efficient semi-supervised manifold embedding for crowd counting
Ghatak et al. GAN based efficient foreground extraction and HGWOSA based optimization for video synopsis generation
Iqbal et al. Classifier comparison for MSER-based text classification in scene images
Zhao et al. BiTNet: a lightweight object detection network for real-time classroom behavior recognition with transformer and bi-directional pyramid network
Athira et al. Underwater object detection model based on YOLOv3 architecture using deep neural networks
Mo et al. Student behavior recognition based on multitask learning
Attia et al. Efficient deep learning models based on tension techniques for sign language recognition
Abdulghani et al. Discover human poses similarity and action recognition based on machine learning
CN115482436B (en) Training method and device for image screening model and image screening method
Wang et al. Spatial relationship recognition via heterogeneous representation: A review
Yao et al. Extracting robust distribution using adaptive Gaussian Mixture Model and online feature selection
CN113223018A (en) Fine-grained image analysis processing method
CN113516118A (en) Image and text combined embedded multi-mode culture resource processing method
Dong et al. Intelligent pixel-level pavement marking detection using 2D laser pavement images

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant