CN116665018A - Target detection method for open world unknown class identification - Google Patents

Target detection method for open world unknown class identification

Info

Publication number
CN116665018A
CN116665018A (application number CN202310940374.0A)
Authority
CN
China
Prior art keywords
unknown
model
class
ore
unknown class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310940374.0A
Other languages
Chinese (zh)
Inventor
黄阳阳
罗荣华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202310940374.0A priority Critical patent/CN116665018A/en
Publication of CN116665018A publication Critical patent/CN116665018A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/762 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V 10/763 Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/778 Active pattern-learning, e.g. online learning of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Abstract

The application discloses a target detection method for open world unknown class identification. The method comprises the following steps: using Faster R-CNN as the reference network, a Faster R-CNN model is trained to obtain the target detection model UC-ORE based on open world unknown class identification; the RPN is used to generate background boxes, and the top-scoring background boxes are marked as unknown classes; known and unknown classes are separated by feature clustering; an EBMs energy model is used to convert the classification head of the UC-ORE model into an energy function, and unknown classes are identified according to the energy value; and, according to the received unknown class labels, the new classes are learned by incremental learning, so that open world unknown class identification is realized cyclically. The application realizes detection of unknown classes in an open environment, reduces the cost of manual labeling, and improves target detection accuracy in the open world.

Description

Target detection method for open world unknown class identification
Technical Field
The application belongs to the field of image data identification, and particularly relates to a target detection method for open world unknown class identification.
Background
Deep learning has accelerated progress in object detection research; the task of a detection model is to identify and locate objects in images. All existing methods work under an important assumption: all classes to be detected are available during the training phase. When this assumption is relaxed, two challenging scenarios arise: 1) a test image may contain objects from unknown classes, which should be classified as unknown; 2) when information (labels) about these identified unknown items becomes available, the model should be able to learn the new classes incrementally. This problem is called open world object detection.
Current open world object detection methods do identify unknown classes, but they lump all of them into a single unified class. In reality the unknown objects are diverse and are not one class, so merging them can cause side effects. Subdividing the unknown classes also has great commercial value: in practical applications such as robots and self-driving cars, unknown environments must be explored and different strategies must be adopted for different unknown classes.
Likewise, existing implementations of open world target detection, such as prior patent document CN115797706a (a target detection method, target detection model training method and related devices), identify the unknown classes but group them into one class; since the unknown objects are varied and are not in fact the same class, this can cause side effects.
Disclosure of Invention
The application aims to solve the above problems: in an open world environment, when unknown objects appear and information (labels) about the identified unknown objects becomes available, the model not only learns to recognize the new classes incrementally but also optimizes the recognition of the unknown classes themselves, thereby detecting unknown classes in an open environment, reducing the cost of manual labeling, and improving target detection accuracy in the open world.
The object of the application is achieved by at least one of the following technical solutions.
An object detection method for open world unknown class identification, comprising the following steps:
s1, in a training stage, using a Faster R-CNN as a reference network, and using a known class image as a training set to train the Faster R-CNN model to obtain a target detection model UC-ORE based on open world unknown class identification;
s2, generating background boxes by utilizing the RPN of the Faster R-CNN reference network, and marking the several top-scoring background boxes as unknown classes;
s3, separating known and unknown categories by utilizing a characteristic clustering mode;
s4, in the reasoning stage, using an EBMs energy model to convert the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function, and identifying the unknown classes according to the energy value;
s5, learning a new class by utilizing an incremental learning mode according to the received unknown class label, and further circularly realizing the open world unknown class identification.
Further, in step S1, the training set adopts the Pascal VOC 2007 and MS-COCO standard datasets as detection benchmarks; the training tasks are run on these standard datasets with Faster R-CNN as the reference network; Faster R-CNN stands for Faster Region-based Convolutional Neural Network and is a two-stage target detection algorithm.
Further, in step S2, the background boxes generated by the candidate box extraction network RPN of Faster R-CNN are in fact unannotated regions, so a high-scoring background box is very likely an unannotated unknown-class object; the candidate boxes generated by the RPN are therefore sorted by score, and the top k background boxes are labeled as unknown classes, where k denotes the number of top-ranked background boxes so labeled; RPN stands for Region Proposal Network, the module in Faster R-CNN that generates candidate regions for target detection.
Further, in step S3, class separation in the potential space is a desirable property for an open world target detection method that must identify unknown classes; a natural approach is to model class separation in the potential space as a feature clustering problem, in which instances of the same class are forced to stay close together while instances of different classes are pushed far apart. Because the unknown objects may be diverse and are not actually one class, grouping them into a single class can have adverse effects; the known and unknown classes are therefore separated by feature clustering, and the separated unknown objects are given a preliminary classification that distinguishes same-class from different-class instances among them;
the class separation in the potential space is realized by utilizing the characteristic that the distance of the object of the same class in the characteristic space is smaller than the distance of the object of different classes in the characteristic space, and the method comprises the following steps:
firstly, carrying out k-means clustering on objects of known categories to obtain cluster centers of the corresponding categories;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is computed; if the minimum of these distances is below a set threshold, the object is assigned to the known class corresponding to that minimum, otherwise it is treated as a new unknown class; in this way the known and unknown classes are separated and the unknown objects receive a preliminary classification.
Further, in step S4, the EBMs energy model is used to convert the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function; EBMs refers to Energy-Based Models, a class of probabilistic generative models;
based on energy-based models (EBMs), given a feature space F with feature vectors f ∈ F and their corresponding class labels l ∈ L, where L is the set of known and unknown class labels of the feature space, the goal is to find an energy function E(F, L) whose output is a scalar estimating the compatibility between the observed variable F and a possible set of output variables L;
energy models (EBMs) assign low energy values to data of known classes, high energy values to unknown classes, and identify the unknown classes according to the energy values.
Further, in step S4, the energy is computed with the following free-energy formula, which combines the energies of all values in the label set L of known and unknown classes of the feature space:

E(f) = −T · log ∫ exp(−E(f, l′) / T) dl′, with l′ ranging over L,

where E(f) denotes the energy the model assigns to a feature f and is used to measure how well the model fits it, T is a temperature parameter controlling the smoothness of the energy function, l′ ranges over all possible labels of the object instance, and E(f, l′) is the model energy for a given label; the integral performs a probability-weighted combination over all possible labels, and the logarithm makes the energy function smoother and easier to optimize. In this way, the equation is used to assign a probability to each possible label of an object instance based on the model energies.
Further, in step S5, when the object tag of the identified unknown class is received, a new unknown class tag is input, and the object detection model UC-ORE based on the open world unknown class identification is obtained through retraining, thereby realizing the open world unknown class identification.
Further, when the target detection model UC-ORE based on open world unknown class identification is retrained, the new classes are learned with a sample-playback-based incremental learning method: a portion of representative old data is stored, the model is fine-tuned after each incremental step, the parameters of all layers of the target detection model UC-ORE other than the output layer are frozen, and only the parameters of the last output layer are adjusted.
Further, incremental learning based on sample playback is a machine learning method mainly used to handle the arrival of new data in online learning. Its basic idea is to train the target detection model UC-ORE on historical data and then use the new data together with the historical data to update the model. The main advantage is that the whole model need not be retrained, which greatly improves training efficiency. A common strategy is to randomly select a portion of the historical data to use together with the new data, which prevents the model from over-relying on particular historical samples and thus improves its generalization ability. The method specifically comprises the following steps:
s5.1, initializing a model: before incremental learning begins, the target detection model UC-ORE needs to be initialized and used for training a part of data;
s5.2, training a model: training a target detection model UC-ORE by using a part of new data;
s5.3, sample playback: a set proportion of samples from the previously trained datasets is stored in a buffer, called the playback buffer; then a set proportion of samples is randomly drawn from the playback buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating a model: combining the target detection model UC-ORE trained by using the samples in the playback buffer zone with the target detection model UC-ORE trained in the step S5.2 to obtain a new target detection model UC-ORE;
s5.5, testing model: the object detection model UC-ORE in step S5.3 is evaluated using the test dataset.
S5.6, if new data are needed to be trained, returning to the step S5.2, otherwise, ending the incremental learning.
Further, the fine-tuning trains the model on a portion of representative historical data together with the new data, so that the whole model need not be retrained whenever labels of unknown classes are received; in model fine-tuning, adjusting only the parameters of the last output layer is commonly called head fine-tuning;
the main idea of the method is to exploit the general features learned by the pre-trained model on large-scale data and fine-tune only the last few layers, so that the model adapts better to the new task; the specific implementation process is as follows:
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: only training the new output layer, so that the target detection model UC-ORE can be better adapted to new tasks;
a5, thawing parameters: if the parameters of other layers need to be fine-tuned, the parameters of the layers are thawed so that they can be changed in the fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
Compared with the prior art, the application has the advantages that:
in current open world target detection methods, although unknown classes are identified, they are unified into a single class; the unknown objects are in fact diverse and are not the same class, which can cause side effects, and subdividing the unknown classes has great commercial value, for example for robots and self-driving cars that must explore unknown environments and adopt different strategies for different unknown classes; the application improves the accuracy of open world target detection by subdividing the unknown classes.
Drawings
FIG. 1 is a flow chart of a method for detecting targets for open world unknown class identification in an embodiment of the application;
FIG. 2 is a schematic diagram of RPN labeling of unknown classes in an embodiment of the present application;
FIG. 3 is a schematic view of feature clustering in an embodiment of the present application;
fig. 4 is an effect diagram of an embodiment of the present application.
Detailed Description
To promote an understanding of the principles and advantages of the application, the embodiments of the application are described in detail below with reference to the accompanying drawings; all other embodiments obtained by those skilled in the art on this basis fall within the scope of the application.
Examples:
an object detection method for open world unknown class identification, as shown in fig. 1, comprises the following steps:
s1, in a training stage, using a Faster R-CNN as a reference network, and using a known class image as a training set to train the Faster R-CNN model to obtain a target detection model UC-ORE based on open world unknown class identification;
in one embodiment, the training set uses the Pascal VOC 2007 and MS-COCO standard datasets as detection benchmarks; the training tasks are run on these standard datasets with Faster R-CNN as the reference network; Faster R-CNN stands for Faster Region-based Convolutional Neural Network and is a two-stage target detection algorithm. In the training phase of the Faster R-CNN model, the target detection confidence score threshold is set to 0.35 and the non-maximum suppression (NMS) threshold is set to 0.35.
S2, generating background boxes by utilizing the RPN of the Faster R-CNN reference network, and marking the several top-scoring background boxes as unknown classes;
as shown in fig. 2, the background boxes generated by the candidate box extraction network RPN of Faster R-CNN are in fact unannotated regions, so a high-scoring background box is very likely an unannotated unknown-class object; the candidate boxes generated by the RPN are therefore sorted by score, and the top k background boxes are labeled as unknown classes, where k denotes the number of top-ranked background boxes so labeled; RPN stands for Region Proposal Network, the module in Faster R-CNN that generates candidate regions for target detection.
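As a concrete illustration of this step, the following minimal sketch (Python with NumPy; the function name, the matched_to_gt flag and the choice of k are illustrative assumptions, not details taken from the application) pseudo-labels the k highest-scoring unmatched RPN proposals as unknown:

```python
import numpy as np

def label_topk_background_as_unknown(scores, matched_to_gt, k, unknown_label=-1):
    """Pseudo-label the k highest-scoring unmatched (background) proposals as 'unknown'.

    scores        : (N,) RPN objectness scores of the candidate boxes
    matched_to_gt : (N,) bool array, True where a proposal overlaps an annotated known-class box
    k             : number of background boxes to relabel as unknown
    Returns an (N,) label array in which 0 means background and unknown_label marks pseudo-unknowns.
    """
    labels = np.zeros(len(scores), dtype=np.int64)      # every proposal starts as plain background
    bg_idx = np.flatnonzero(~matched_to_gt)              # proposals not matched to any ground-truth box
    topk = bg_idx[np.argsort(-scores[bg_idx])[:k]]       # highest-scoring background proposals
    labels[topk] = unknown_label
    return labels
```

In practice the value of k and the IoU-based matching against the annotated boxes are implementation choices of the detector.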
S3, separating known and unknown categories by utilizing a characteristic clustering mode;
as shown in fig. 3, class separation in the potential space is a desirable property for an open world target detection method that must identify unknown classes; a natural approach is to model class separation in the potential space as a feature clustering problem, in which instances of the same class are forced to stay close together while instances of different classes are pushed far apart. Because the unknown objects may be diverse and are not actually one class, grouping them into a single class can have adverse effects; the known and unknown classes are therefore separated by feature clustering, and the separated unknown objects are given a preliminary classification that distinguishes same-class from different-class instances among them;
the class separation in the potential space is realized by utilizing the characteristic that the distance of the object of the same class in the characteristic space is smaller than the distance of the object of different classes in the characteristic space, and the method comprises the following steps:
firstly, carrying out k-means clustering on objects of known categories to obtain cluster centers of the corresponding categories;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is computed; if the minimum of these distances is below a set threshold, the object is assigned to the known class corresponding to that minimum, otherwise it is treated as a new unknown class; in this way the known and unknown classes are separated and the unknown objects receive a preliminary classification (a minimal sketch of this procedure is given below).
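The following sketch illustrates the clustering-based separation just described, assuming numeric class ids, one cluster center per known class obtained with scikit-learn's KMeans, and a Euclidean distance threshold; the names and the single-center-per-class simplification are assumptions for illustration rather than the application's exact design:

```python
import numpy as np
from sklearn.cluster import KMeans

def separate_by_clustering(known_feats, known_labels, candidate_feats, dist_threshold):
    """Compute one cluster center per known class, then assign each candidate feature
    either to the nearest known class (if close enough) or to a provisional new unknown class."""
    classes = np.unique(known_labels)
    centers = np.stack([
        KMeans(n_clusters=1, n_init=10).fit(known_feats[known_labels == c]).cluster_centers_[0]
        for c in classes
    ])
    new_unknown_id = int(classes.max()) + 1                    # provisional id for a new unknown class
    assignments = []
    for f in candidate_feats:
        d = np.linalg.norm(centers - f, axis=1)                # distance to every known-class center
        assignments.append(int(classes[d.argmin()]) if d.min() < dist_threshold else new_unknown_id)
    return assignments
```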
S4, in the reasoning stage, using an EBMs energy model to convert the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function, and identifying the unknown classes according to the energy value;
the EBMs energy model is used to convert the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function; EBMs refers to Energy-Based Models, a class of probabilistic generative models;
based on energy-based models (EBMs), given a feature space F with feature vectors f ∈ F and their corresponding class labels l ∈ L, where L is the set of known and unknown class labels of the feature space, the goal is to find an energy function E(F, L) whose output is a scalar estimating the compatibility between the observed variable F and a possible set of output variables L;
energy models (EBMs) assign low energy values to data of known classes, high energy values to unknown classes, and identify the unknown classes according to the energy values.
The energy is computed with the following free-energy formula, which combines the energies of all values in the label set L of known and unknown classes of the feature space:

E(f) = −T · log ∫ exp(−E(f, l′) / T) dl′, with l′ ranging over L,

where E(f) denotes the energy the model assigns to a feature f and is used to measure how well the model fits it, T is a temperature parameter controlling the smoothness of the energy function, l′ ranges over all possible labels of the object instance, and E(f, l′) is the model energy for a given label; the integral performs a probability-weighted combination over all possible labels, and the logarithm makes the energy function smoother and easier to optimize. In this way, the equation is used to assign a probability to each possible label of an object instance based on the model energies.
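For a discrete label set the integral above reduces to a sum over the classification head's outputs, so the free energy can be read off the logits; the sketch below assumes the logits are treated as negative label energies (the function names and the thresholding rule are illustrative assumptions):

```python
import torch

def free_energy(logits, temperature=1.0):
    """Free energy of a box's classification logits.

    Treating each class logit as a negative label energy, the free energy is the negative
    temperature-scaled log-sum-exp over the label set; known-class boxes are expected to
    receive low energy and unknown boxes high energy.
    logits: (N, C) classification-head outputs for N boxes.
    """
    return -temperature * torch.logsumexp(logits / temperature, dim=1)

def flag_unknown(logits, energy_threshold, temperature=1.0):
    """Mark boxes whose free energy exceeds a tuned threshold as unknown."""
    return free_energy(logits, temperature) > energy_threshold
```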
S5, learning a new class by utilizing an incremental learning mode according to the received unknown class label, and further circularly realizing the open world unknown class identification;
when the object label of the identified unknown class is received, a new unknown class label is input, and the object detection model UC-ORE based on the open world unknown class identification is obtained through retraining, so that the open world unknown class identification is realized.
Further, when the target detection model UC-ORE based on open world unknown class identification is retrained, the new classes are learned with a sample-playback-based incremental learning method: a portion of representative old data is stored, the model is fine-tuned after each incremental step, the parameters of all layers of the target detection model UC-ORE other than the output layer are frozen, and only the parameters of the last output layer are adjusted.
Further, incremental learning based on sample playback is a machine learning method mainly used to handle the arrival of new data in online learning; its basic idea is to train the model on historical data and then use the new data together with the historical data to update the model. The main advantage is that the whole model need not be retrained, which greatly improves training efficiency. A common strategy is to randomly select a portion of the historical data to use together with the new data, which prevents the model from over-relying on particular historical samples and thus improves its generalization ability. The method specifically comprises the following steps (a compact sketch of the loop follows the steps below):
s5.1, initializing a model: before incremental learning begins, the target detection model UC-ORE needs to be initialized and used for training a part of data;
s5.2, training a model: training a target detection model UC-ORE by using a part of new data;
s5.3, sample playback: a set proportion of samples from the previously trained datasets is stored in a buffer, called the playback buffer; then a set proportion of samples is randomly drawn from the playback buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating a model: combining the target detection model UC-ORE trained by using the samples in the playback buffer zone with the target detection model UC-ORE trained in the step S5.2 to obtain a new target detection model UC-ORE;
s5.5, testing model: the object detection model UC-ORE in step S5.3 is evaluated using the test dataset.
S5.6, if new data are needed to be trained, returning to the step S5.2, otherwise, ending the incremental learning.
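A compact sketch of the playback loop in steps S5.1 to S5.6 is given below (plain Python; the buffer class, the ratios and the train_fn placeholder are illustrative assumptions rather than the application's exact procedure):

```python
import random

class PlaybackBuffer:
    """Keeps a random subset of past training samples for playback in later increments."""
    def __init__(self, keep_ratio=0.1, max_size=10000):
        self.keep_ratio, self.max_size = keep_ratio, max_size
        self.samples = []

    def add_task_data(self, task_samples):
        n_keep = min(len(task_samples), max(1, int(len(task_samples) * self.keep_ratio)))
        self.samples.extend(random.sample(task_samples, n_keep))
        if len(self.samples) > self.max_size:                  # keep the buffer bounded
            self.samples = random.sample(self.samples, self.max_size)

    def draw(self, n):
        return random.sample(self.samples, min(n, len(self.samples)))

def incremental_step(model, new_samples, buffer, train_fn):
    """One increment (roughly steps S5.2-S5.4): fine-tune on new data mixed with replayed old data."""
    mixed = list(new_samples) + buffer.draw(len(new_samples))
    train_fn(model, mixed)            # placeholder for the detector's actual training loop
    buffer.add_task_data(list(new_samples))
    return model
```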
Further, the fine-tuning trains the model on a portion of representative historical data together with the new data, so that the whole model need not be retrained whenever labels of unknown classes are received; in model fine-tuning, adjusting only the parameters of the last output layer is commonly called head fine-tuning;
the main idea of the method is to exploit the general features learned by the pre-trained model on large-scale data and fine-tune only the last few layers, so that the model adapts better to the new task; the specific implementation process is as follows (a torchvision-style sketch is given after steps A1 to A6 below):
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: only training the new output layer, so that the target detection model UC-ORE can be better adapted to new tasks;
a5, thawing parameters: if the parameters of other layers need to be fine-tuned, the parameters of the layers are thawed so that they can be changed in the fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
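The sketch below illustrates steps A1 to A4 with torchvision's Faster R-CNN implementation; the pre-trained weights, layer names and optimizer settings are illustrative assumptions (and num_classes here counts background plus all known and newly added classes), not details taken from the application:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_head_finetune_detector(num_classes):
    # A1: load a detector pre-trained on large-scale data (COCO weights assumed here)
    model = fasterrcnn_resnet50_fpn(weights="DEFAULT")

    # A2: freeze every parameter so the shared layers stay unchanged during training
    for p in model.parameters():
        p.requires_grad = False

    # A3: replace the last output layer (box predictor) with one sized for the enlarged class set
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # A4: only the new predictor's parameters remain trainable
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=0.005, momentum=0.9)
    return model, optimizer
```

For step A5, the parameters of additional layers can later be unfrozen by setting requires_grad back to True before the final fine-tuning pass of step A6.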
In order to demonstrate the effectiveness of the method proposed by the present application, the following verification experiments were performed:
a comprehensive evaluation standard is provided to discuss the performance of a target detection model UC-ORE based on open world unknown class recognition, including recognition of unknown class objects, detection of known classes, and gradual learning of new classes while providing labels to the unknown classes.
Data segmentation: the classes are divided into a set of tasks T = {T1, T2, ..., Tt, ...}; all classes of a particular task Tt are introduced into the system at time t. When training on task Tt, all classes of the tasks {Tτ : τ ≤ t} are regarded as known, while the classes of {Tτ : τ > t} are regarded as unknown;
in one embodiment, classes from Pascal VOC and MS-COCO are considered. All VOC classes and their data are grouped into the first task. The remaining 60 MS-COCO classes are divided into three sequential tasks with semantic drift. All corresponding images, drawn from the Pascal VOC and MS-COCO training splits, constitute the training data. For evaluation, the Pascal VOC test split and the MS-COCO validation split are used. For each task, 1k images are held out from the training data for validation. Table 1 below shows the task composition in the open world target detection evaluation criterion for unknown classification:
TABLE 1
Evaluation index: since unknown targets are easily confused with known targets, the Wilderness Impact (WI) metric is used to explicitly characterize this behavior; ideally WI should be small, because accuracy should not degrade when unknown targets are added to the test set. In addition to WI, the Absolute Open Set Error (A-OSE) is used to report the number of unknown targets misclassified as a known class. Both WI and A-OSE implicitly measure how effectively the model handles unknown targets.
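For reference, a sketch of how these two metrics might be computed is given below, assuming the commonly used formulations (WI as the relative precision drop once unknown-class objects enter the test set, A-OSE as the count of unknown ground-truth objects whose detections are assigned a known class); these definitions are assumptions about the standard metrics, not text from the application:

```python
def wilderness_impact(precision_closed, precision_open):
    """WI = P_closed / P_open - 1: relative precision drop when unknown objects are added;
    ideally close to zero (formulation assumed from the open-set detection literature)."""
    return precision_closed / precision_open - 1.0

def absolute_open_set_error(predicted_labels, gt_is_unknown, unknown_label=-1):
    """A-OSE: number of detections matched to unknown ground-truth objects that were
    nevertheless assigned a known class label."""
    return sum(1 for lbl, unk in zip(predicted_labels, gt_is_unknown)
               if unk and lbl != unknown_label)
```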
Table 2 shows a comparison of UC-ORE with Faster R-CNN on open world target detection. After each task is learned, the WI and A-OSE metrics quantify how much unknown instances are confused with the known classes. The WI and A-OSE scores of UC-ORE are significantly lower, owing to the explicit modeling of unknown targets. When the unknown classes are progressively labeled in task 2, the performance of the baseline detector on the set of known classes (quantified by mAP) drops sharply from 56.16% to 5.011%. UC-ORE achieves both goals simultaneously: maintaining detection of the known classes while reducing the impact of the unknown classes. Similar trends occur in tasks 3 and 4.
Table 2 shows the performance of UC-ORE in open world target detection. WI and A-OSE quantification evaluates how UC-ORE handles unknown classes, while mAP measures how well it detects known classes. It can be seen that UC-ORE is always better than the Faster R-CNN based benchmark in all indicators.
TABLE 2
In the application, the UC-ORE is used for clearly modeling the unknown object, so that the UC-ORE is well represented in the incremental target detection task. This is because the UC-ORE reduces confusion in which unknown objects are classified as known objects, which allows the detector to learn incrementally about the actual foreground objects. The UC-ORE was evaluated using the criteria used in ILOD (abbreviation for incremental target detector), and the Pascal VOC 2007 dataset was used to divide the dataset into three groups: 10 (known class) +10 (unknown class), 15 (known class) +5 (unknown class), 19 (known class) +1 (unknown class) to allow incremental learning of the detector. The UC-ORE was compared to ILOD at three different settings. As shown in table 3 below, UC-ORE performs very well in all settings.
TABLE 3 Table 3
It is to be understood that various changes and modifications in the form and detail herein disclosed may be made by those skilled in the art without departing from the spirit and scope of the present disclosure. Accordingly, equivalent modifications and variations of the present application should be within the scope of the claims of the present application. In addition, although specific terms are used in the present specification, these terms are for convenience of description only and do not limit the present application in any way.

Claims (10)

1. The object detection method for open world unknown class identification is characterized by comprising the following steps:
s1, in a training stage, using a Faster R-CNN as a reference network, and using a known class image as a training set to train the Faster R-CNN model to obtain a target detection model UC-ORE based on open world unknown class identification;
s2, generating background boxes by utilizing the RPN of the Faster R-CNN reference network, and marking the several top-scoring background boxes as unknown classes;
s3, separating known and unknown categories by utilizing a characteristic clustering mode;
s4, in the reasoning stage, using an EBMs energy model to convert the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function, and identifying the unknown classes according to the energy value;
s5, learning a new class by utilizing an incremental learning mode according to the received unknown class label, and further circularly realizing the open world unknown class identification.
2. The method for detecting an object for open world unknown class identification according to claim 1, wherein in step S1 the training set adopts the Pascal VOC 2007 and MS-COCO standard datasets as detection benchmarks; the training tasks are run on these standard datasets with Faster R-CNN as the reference network; Faster R-CNN stands for Faster Region-based Convolutional Neural Network and is a two-stage target detection algorithm.
3. The method for detecting an object for open world unknown class identification according to claim 1, wherein in step S2, among the background boxes generated by the candidate box extraction network RPN of Faster R-CNN, the background boxes are in fact unannotated regions; the background boxes generated by the RPN are sorted by score, and the top k background boxes are labeled as unknown classes, where k denotes the number of top-ranked background boxes so labeled; RPN stands for Region Proposal Network, the module in Faster R-CNN that generates candidate regions in target detection.
4. A method for detecting an object for identifying an unknown class in the open world as claimed in claim 3, wherein in step S3, class separation in the potential space is an ideal feature of the unknown class identified by the open world object detection method; separating known and unknown categories by means of feature clustering, and primarily classifying the separated unknown categories to distinguish the same category from different categories in the unknown categories;
the class separation in the potential space is realized by utilizing the characteristic that the distance of the object of the same class in the characteristic space is smaller than the distance of the object of different classes in the characteristic space, and the method comprises the following steps:
firstly, carrying out k-means clustering on objects of known categories to obtain cluster centers of the corresponding categories;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is computed; if the minimum of these distances is below a set threshold, the object is assigned to the known class corresponding to that minimum, otherwise it is treated as a new unknown class; in this way the known and unknown classes are separated and the unknown objects receive a preliminary classification.
5. The method according to claim 1, wherein in step S4, the classification head in the target detection model UC-ORE based on open world unknown class identification is converted into an energy function using the EBMs energy model; EBMs refers to Energy-Based Models, a class of probabilistic generative models;
based on the energy model EBMs, given a feature space F with feature vectors f ∈ F and their corresponding class labels l ∈ L, where L is the set of known and unknown class labels of the feature space, the goal is to find an energy function E(F, L) whose output is a scalar estimating the compatibility between the observed variable F and a possible set of output variables L;
the energy model EBMs allocate low energy values to the data of the known classes, allocate high energy values to the unknown classes, and identify the unknown classes according to the energy values.
6. The method for object detection for open world unknown class identification according to claim 1, wherein in step S4 the energy is computed with the following free-energy formula, which combines the energies of all values in the label set L of known and unknown classes of the feature space:

E(f) = −T · log ∫ exp(−E(f, l′) / T) dl′, with l′ ranging over L,

where E(f) denotes the energy the model assigns to a feature f and is used to measure how well the model fits it, T is a temperature parameter controlling the smoothness of the energy function, l′ ranges over all possible labels of the object instance, and E(f, l′) is the model energy for a given label; the integral performs a probability-weighted combination over all possible labels, and the logarithm makes the energy function smoother and easier to optimize. In this way, the equation is used to assign a probability to each possible label of an object instance based on the model energies.
7. The method for detecting an object identified by an open world unknown class according to claim 1, wherein in step S5, when an object tag of an identified unknown class is received, a new unknown class tag is input, and the object detection model UC-ORE based on the open world unknown class identification is obtained through retraining, thereby realizing the open world unknown class identification.
8. The method according to claim 7, wherein when the target detection model UC-ORE based on open world unknown class identification is retrained, the new classes are learned with a sample-playback-based incremental learning method: a portion of representative old data is stored, the model is fine-tuned after each incremental step, the parameters of all layers of the target detection model UC-ORE other than the output layer are frozen, and only the parameters of the last output layer are adjusted.
9. The method for object detection for open world unknown class identification according to claim 8, wherein the incremental learning based on sample playback is a machine learning method for handling the addition of new data in online learning, training the object detection model UC-ORE using history data, and then using the new data together with the history data to update the model, comprising the steps of:
s5.1, initializing a model: before incremental learning begins, the target detection model UC-ORE needs to be initialized and used for training a part of data;
s5.2, training a model: training a target detection model UC-ORE by using a part of new data;
s5.3, sample playback: a set proportion of samples from the previously trained datasets is stored in a buffer, called the playback buffer; then a set proportion of samples is randomly drawn from the playback buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating a model: combining the target detection model UC-ORE trained by using the samples in the playback buffer zone with the target detection model UC-ORE trained in the step S5.2 to obtain a new target detection model UC-ORE;
s5.5, testing model: evaluating the target detection model UC-ORE in step S5.3 using the test dataset;
s5.6, if new data are needed to be trained, returning to the step S5.2, otherwise, ending the incremental learning.
10. The method of claim 8, wherein the fine-tuning trains the model on a portion of representative historical data together with the new data, so that the whole model need not be retrained when labels of unknown classes are received; in model fine-tuning, adjusting only the parameters of the last output layer is commonly called head fine-tuning, and the specific implementation process is as follows:
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: training only the new output layer, so that the target detection model UC-ORE can adapt to new tasks;
a5, thawing parameters: if the parameters of other layers need to be fine-tuned, the parameters of the layers are thawed so that they can be changed in the fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
CN202310940374.0A 2023-07-28 2023-07-28 Target detection method for open world unknown class identification Pending CN116665018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310940374.0A CN116665018A (en) 2023-07-28 2023-07-28 Target detection method for open world unknown class identification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310940374.0A CN116665018A (en) 2023-07-28 2023-07-28 Target detection method for open world unknown class identification

Publications (1)

Publication Number Publication Date
CN116665018A true CN116665018A (en) 2023-08-29

Family

ID=87710045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310940374.0A Pending CN116665018A (en) 2023-07-28 2023-07-28 Target detection method for open world unknown class identification

Country Status (1)

Country Link
CN (1) CN116665018A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863250A (en) * 2023-09-01 2023-10-10 华南理工大学 Open scene target detection method related to multi-mode unknown class identification

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210319176A1 (en) * 2020-04-13 2021-10-14 Capital One Services, Llc Efficient automatic punctuation with robust inference
CN114139617A (en) * 2021-11-24 2022-03-04 山东力聚机器人科技股份有限公司 New class target identification method and device based on deep clustering
CN114241260A (en) * 2021-12-14 2022-03-25 四川大学 Open set target detection and identification method based on deep neural network
CN115690514A (en) * 2022-11-14 2023-02-03 深圳市华尊科技股份有限公司 Image recognition method and related equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210319176A1 (en) * 2020-04-13 2021-10-14 Capital One Services, Llc Efficient automatic punctuation with robust inference
CN114139617A (en) * 2021-11-24 2022-03-04 山东力聚机器人科技股份有限公司 New class target identification method and device based on deep clustering
CN114241260A (en) * 2021-12-14 2022-03-25 四川大学 Open set target detection and identification method based on deep neural network
CN115690514A (en) * 2022-11-14 2023-02-03 深圳市华尊科技股份有限公司 Image recognition method and related equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
K. J. Joseph et al., "Towards Open World Object Detection," pp. 1-16, retrieved from the Internet: https://arxiv.org/abs/2103.02603 *
Zhiheng Wu et al., "UC-OWOD: Unknown-Classified Open World Object Detection," pp. 1-9, retrieved from the Internet: https://arxiv.org/pdf/2207.11455.pdf *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116863250A (en) * 2023-09-01 2023-10-10 华南理工大学 Open scene target detection method related to multi-mode unknown class identification

Similar Documents

Publication Publication Date Title
Bendale et al. Towards open world recognition
CN111967294B (en) Unsupervised domain self-adaptive pedestrian re-identification method
Behrmann et al. Unified fully and timestamp supervised temporal action segmentation via sequence to sequence translation
CN108733778B (en) Industry type identification method and device of object
Dmochowski et al. Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds.
JP5588395B2 (en) System and method for efficiently interpreting images with respect to objects and their parts
US8606022B2 (en) Information processing apparatus, method and program
US11182602B2 (en) Method and system for person re-identification
WO2012141332A1 (en) Supervised and semi-supervised online boosting algorithm in machine learning framework
US11210555B2 (en) High-dimensional image feature matching method and device
CN116665018A (en) Target detection method for open world unknown class identification
CN107220663B (en) Automatic image annotation method based on semantic scene classification
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN109871891B (en) Object identification method and device and storage medium
WO2015146113A1 (en) Identification dictionary learning system, identification dictionary learning method, and recording medium
CN113762508A (en) Training method, device, equipment and medium for image classification network model
CN104376308A (en) Human action recognition method based on multitask learning
WO2022166578A1 (en) Method and apparatus for domain adaptation learning, and device, medium and product
CN111191033A (en) Open set classification method based on classification utility
CN116910571B (en) Open-domain adaptation method and system based on prototype comparison learning
US20140358960A1 (en) Rapid nearest neighbor searching using kd-ferns
KR20210071378A (en) Hierarchical object detection method for extended categories
CN112766423B (en) Training method and device for face recognition model, computer equipment and storage medium
Madokoro et al. Adaptive Category Mapping Networks for all-mode topological feature learning used for mobile robot vision
CN116863250A (en) Open scene target detection method related to multi-mode unknown class identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination