CN116665018A - Target detection method for open world unknown class identification - Google Patents
- Publication number: CN116665018A (application CN202310940374.0A)
- Authority: CN (China)
- Prior art keywords: unknown, model, class, ore, unknown class
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V 10/82 — Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06N 3/0464 — Convolutional networks [CNN, ConvNet]
- G06N 3/08 — Learning methods for neural networks
- G06V 10/763 — Clustering; non-hierarchical techniques, e.g. based on statistics of modelling distributions
- G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V 10/778 — Active pattern-learning, e.g. online learning of image or video features
- G06V 2201/07 — Target detection
Abstract
The application discloses a target detection method for open world unknown class identification. The method comprises the following steps: using Faster R-CNN as the reference network, training the Faster R-CNN model to obtain a target detection model, UC-ORE, based on open world unknown class identification; generating background boxes with the RPN and labeling the top-scoring background boxes as unknown classes; separating known and unknown classes by feature clustering; converting the classification head of UC-ORE into an energy function using an energy-based model (EBM) and identifying unknown classes according to the energy value; and, as labels for unknown classes are received, learning the new classes by incremental learning, thereby realizing open world unknown class identification in a continuing cycle. The application realizes detection of unknown classes in an open environment, reduces the cost of manual labeling, and improves target detection precision in the open world.
Description
Technical Field
The application belongs to the field of image data identification, and particularly relates to a target detection method for open world unknown class identification.
Background
Deep learning has accelerated progress in object detection research; the task of an object detection model is to identify and locate objects in images. All existing methods work under an important assumption: all classes to be detected are available during the training phase. When this assumption is relaxed, two challenging scenarios arise: 1) a test image may contain objects from unknown classes, which should be classified as unknown; 2) the model should be able to learn new classes incrementally when information (labels) about these identified unknown objects becomes available. This problem is called open world object detection.
In current open world target detection methods, unknown classes are identified but lumped into a single class. Unknown classes are, however, diverse and not actually one class, and merging them can cause side effects. Moreover, distinguishing among unknown classes has great commercial value: in practical applications such as robots and self-driving cars, unknown environments must be explored and different strategies adopted for different unknown classes.
For example, the prior patent document CN115797706a (a target detection method, a target detection model training method, and related devices) identifies unknown classes but groups them into a single class; since the unknown classes are diverse and in fact not the same class, this may cause side effects.
Disclosure of Invention
The application aims to solve these problems: in an open world environment, it not only learns and recognizes new unknown objects incrementally as information (labels) about identified unknown objects becomes available, but also optimizes the recognition of unknown classes, thereby realizing detection of unknown classes in the open world, reducing the cost of manual labeling, and improving target detection precision in the open world.
The object of the application is achieved by at least one of the following technical solutions.
An object detection method for open world unknown class identification, comprising the following steps:
s1, in a training stage, using a Faster R-CNN as a reference network, and using a known class image as a training set to train the Faster R-CNN model to obtain a target detection model UC-ORE based on open world unknown class identification;
S2, generating background boxes with the RPN of the Faster R-CNN reference network, and labeling the top-scoring background boxes as unknown classes;
s3, separating known and unknown categories by utilizing a characteristic clustering mode;
S4, in the inference stage, using an energy-based model (EBM) to convert the classification head of the target detection model UC-ORE based on open world unknown class identification into an energy function, and identifying unknown classes according to the energy value;
s5, learning a new class by utilizing an incremental learning mode according to the received unknown class label, and further circularly realizing the open world unknown class identification.
Further, in step S1, the training set adopts the Pascal VOC 2007 and MS-COCO standard data sets as detection benchmarks; the training tasks run on these standard data sets, with Faster R-CNN as the reference network. Faster R-CNN (Faster Region-based Convolutional Neural Network) is a two-stage target detection algorithm.
Further, in step S2, the background boxes generated by the candidate-box extraction network (RPN) of Faster R-CNN are in fact unannotated regions, so a high-scoring background box is likely to contain an unlabeled object of an unknown class. The candidate boxes are therefore taken directly from the background boxes generated by the RPN and sorted by score, and the top k background boxes b_1, ..., b_k are labeled as unknown classes, where b_i denotes the i-th background box. RPN stands for Region Proposal Network, the module in Faster R-CNN that generates candidate regions for target detection.
Further, in step S3, class separation in the latent space is an ideal property for an open world target detection method to identify unknown classes. One natural approach is to model class separation in the latent space as a feature clustering problem, in which instances of the same class are forced to stay close while instances of different classes are pushed apart. Since the unknown classes may be diverse and are not actually one class, lumping them into a single class can have adverse effects; instead, the known and unknown classes are separated by feature clustering, and the separated unknown classes are preliminarily classified so that same and different classes among the unknowns are distinguished;
the class separation in the potential space is realized by utilizing the characteristic that the distance of the object of the same class in the characteristic space is smaller than the distance of the object of different classes in the characteristic space, and the method comprises the following steps:
firstly, carrying out k-means clustering on objects of known categories to obtain cluster centers of the corresponding categories;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is calculated. If the minimum of the distances to all existing known-class cluster centers is below a set threshold, the object is assigned to the known class corresponding to that minimum; otherwise, it is treated as a new unknown class. In this way, the known and unknown classes are separated and the unknown classes are preliminarily classified.
Further, in step S4, an energy-based model is used to convert the classification head of the target detection model UC-ORE based on open world unknown class identification into an energy function; EBMs (Energy-Based Models) are probabilistic generative models;
based on energy models (EBMs), a given feature spaceFeature vector +.>And feature vector->Corresponding class label->,/> Label-> ,/>A set of known and unknown class labels for a feature space; the goal is to find an energy function +.>,/>The output is a scalar which estimates the observation variable +.>And possibly a set of output variables +.>Compatibility between;
energy models (EBMs) assign low energy values to data of known classes, high energy values to unknown classes, and identify the unknown classes according to the energy values.
Further, in step S4, the calculation uses the free-energy formula, which combines the energies over all values in the set L of known and unknown class labels of the feature space:

E(f) = -T · log ∫_{l' ∈ L} exp( -E(f, l') / T ) dl'

where E(f) denotes the free energy the energy-based model assigns to the feature vector f, used to measure the model's response to the input; T is a temperature parameter controlling the smoothness of the energy function; L denotes the set of all possible labels of an object instance; and E(f, l') denotes the model energy for a given label. The integral in the equation performs the probability weighting, summing over all possible labels, and the logarithm makes the energy function smoother and easier to optimize. Through the Gibbs distribution p(l | f) = exp(-E(f, l)/T) / exp(-E(f)/T), the equation assigns a probability to each possible label of an object instance based on its model energy.
Further, in step S5, when object labels for the identified unknown classes are received, the new unknown-class labels are input and the target detection model UC-ORE based on open world unknown class identification is retrained, thereby realizing open world unknown class identification.
Further, when retraining the target detection model UC-ORE based on open world unknown class identification, the new classes are learned with a sample-replay-based incremental learning method: a representative portion of old data is stored, the model is fine-tuned after each incremental step, the parameters of all layers except the output layer are frozen, and only the parameters of the last output layer are adjusted.
Further, sample-replay-based incremental learning is a machine learning method mainly used to handle the addition of new data in online learning. Its basic idea is to train the target detection model UC-ORE on historical data and then use the new data together with the historical data to update the model. Its main advantage is that retraining the whole model is avoided, which greatly improves training efficiency. One common strategy is to randomly select a portion of the historical data to use together with the new data; this prevents the model from becoming overly dependent on particular historical data and thus improves its generalization ability. The method specifically comprises the following steps:
S5.1, initializing the model: before incremental learning begins, the target detection model UC-ORE is initialized and trained on an initial portion of the data;
s5.2, training a model: training a target detection model UC-ORE by using a part of new data;
S5.3, sample playback: a set proportion of samples from previously trained data sets is stored in a buffer, called the playback buffer; a set proportion of samples is then randomly drawn from the playback buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating a model: combining the target detection model UC-ORE trained by using the samples in the playback buffer zone with the target detection model UC-ORE trained in the step S5.2 to obtain a new target detection model UC-ORE;
S5.5, testing the model: the target detection model UC-ORE obtained in step S5.4 is evaluated using the test data set.
S5.6, if new data still need to be trained, return to step S5.2; otherwise, end the incremental learning.
Further, the fine-tuning trains the model with a representative portion of historical data together with the new data, so as to avoid retraining the whole model when labels of unknown classes are received; in model fine-tuning, adjusting only the parameters of the last output layer is commonly called "head fine-tuning";
the main idea of the method is that only the last layers of the model are finely tuned by utilizing the general features learned by the pre-training model on large-scale data, so that the model can be better adapted to a new task, and the specific implementation process is as follows:
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: only training the new output layer, so that the target detection model UC-ORE can be better adapted to new tasks;
A5, unfreezing parameters: if the parameters of other layers need fine-tuning, those parameters are unfrozen so that they can change during fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
Compared with the prior art, the application has the advantages that:
in the current open world target detection method, although unknown classes are identified in the implementation process, the unknown classes are unified, but the unknown classes are various and are not actually the same class, which can cause side effects, and the classification of the unknown classes has great commercial value, for example, the unknown environments need to be explored in the practical application of robots and automatic driving automobiles, and different strategies are adopted for different unknown classes; according to the application, the accuracy of open world target detection is improved by subdividing the unknown class.
Drawings
FIG. 1 is a flow chart of a method for detecting targets for open world unknown class identification in an embodiment of the application;
FIG. 2 is a schematic diagram of the RPN labeling unknown classes in an embodiment of the present application;
FIG. 3 is a schematic view of feature clustering in an embodiment of the present application;
fig. 4 is an effect diagram of an embodiment of the present application.
Detailed Description
The embodiments of the application are described in detail below with reference to the accompanying drawings, in which examples are illustrated. The described embodiments are only some of the embodiments of the application; all other embodiments obtained by those skilled in the art without creative effort fall within the scope of the application.
Examples:
an object detection method for open world unknown class identification, as shown in fig. 1, comprises the following steps:
s1, in a training stage, using a Faster R-CNN as a reference network, and using a known class image as a training set to train the Faster R-CNN model to obtain a target detection model UC-ORE based on open world unknown class identification;
in one embodiment, the training set uses the Pascal VOC 2007 and MS-COCO standard data sets as detection benchmarks; training tasks on a standard data set, and taking a Faster R-CNN as a reference network; the Faster R-CNN is known as Faster Region-based Convolutional Neural Network, and is a two-stage target detection algorithm. In the training phase of the fast R-CNN model, the confidence SCORE of target detection is set to 0.35, and the non-maximum suppression NMS is set to 0.35.
S2, generating background boxes with the RPN of the Faster R-CNN reference network, and labeling the top-scoring background boxes as unknown classes;
As shown in fig. 2, the background boxes generated by the candidate-box extraction network (RPN) of Faster R-CNN are in fact unannotated regions, so a high-scoring background box is likely to contain an unlabeled object of an unknown class. The candidate boxes are therefore taken directly from the background boxes generated by the RPN and sorted by score, and the top k background boxes b_1, ..., b_k are labeled as unknown classes, where b_i denotes the i-th background box. RPN stands for Region Proposal Network, the module in Faster R-CNN that generates candidate regions for target detection.
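The top-scoring selection described above can be sketched as follows. This is an illustrative sketch only, not code from the patent; the function name and score values are hypothetical:

```python
def label_topk_unknown(bg_scores, k):
    """Return the indices of the k highest-scoring background proposals.

    bg_scores holds the RPN objectness scores of proposals that matched no
    ground-truth box; in the method above, these k boxes are relabeled as
    the 'unknown' class before further training.
    """
    order = sorted(range(len(bg_scores)), key=lambda i: bg_scores[i], reverse=True)
    return order[:k]

# toy objectness scores for five unmatched background proposals
scores = [0.10, 0.85, 0.40, 0.72, 0.05]
print(label_topk_unknown(scores, 2))  # [1, 3]
```

In a real detector the scores would come from the RPN head during training; only the ranking-and-relabeling step is shown here.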
S3, separating known and unknown categories by utilizing a characteristic clustering mode;
As shown in fig. 3, class separation in the latent space is an ideal property for an open world target detection method to identify unknown classes. One natural approach is to model class separation in the latent space as a feature clustering problem, in which instances of the same class are forced to stay close while instances of different classes are pushed apart. Since the unknown classes may be diverse and are not actually one class, lumping them into a single class can have adverse effects; instead, the known and unknown classes are separated by feature clustering, and the separated unknown classes are preliminarily classified so that same and different classes among the unknowns are distinguished;
The class separation in the latent space exploits the property that objects of the same class lie closer together in feature space than objects of different classes, and comprises the following steps:
firstly, carrying out k-means clustering on objects of known categories to obtain cluster centers of the corresponding categories;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is calculated. If the minimum of the distances to all existing known-class cluster centers is below a set threshold, the object is assigned to the known class corresponding to that minimum; otherwise, it is treated as a new unknown class. In this way, the known and unknown classes are separated and the unknown classes are preliminarily classified.
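The thresholded nearest-center assignment above can be sketched as follows. This is an illustrative sketch: the cluster centers would come from k-means over known-class features, and the distance threshold is a hypothetical tuning parameter, not a value from the patent:

```python
import math

def assign_or_flag_unknown(feat, centers, threshold):
    """Assign a feature vector to the nearest known-class cluster center,
    or flag it (-1) as a new unknown class when every center lies farther
    than `threshold` in feature space."""
    dists = [math.dist(feat, c) for c in centers]
    nearest = min(range(len(dists)), key=dists.__getitem__)
    return nearest if dists[nearest] < threshold else -1

# two toy cluster centers for known classes 0 and 1
centers = [(0.0, 0.0), (10.0, 10.0)]
print(assign_or_flag_unknown((0.5, 0.2), centers, threshold=2.0))  # 0: near class 0
print(assign_or_flag_unknown((5.0, 5.0), centers, threshold=2.0))  # -1: a new unknown
```

Real features would be high-dimensional RoI embeddings; the 2-D points stand in only to make the geometry visible.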
S4, in an reasoning stage, using an EBMs energy model to convert a classification head in a target detection model UC-ORE based on open world unknown class identification into an energy function, and identifying the unknown class according to the energy value;
Using an energy-based model (EBM), the classification head of the target detection model UC-ORE based on open world unknown class identification is converted into an energy function; EBMs (Energy-Based Models) are probabilistic generative models;
In an energy-based model (EBM), given a feature space F, a feature vector f ∈ F, and the class label l ∈ L corresponding to f, where L = K ∪ U is the set of known (K) and unknown (U) class labels of the feature space, the goal is to find an energy function E(f, l): F × L → R whose output is a scalar estimating the compatibility between the observed variable f and a possible set of output variables l;
The energy-based model assigns low energy values to data of known classes and high energy values to unknown classes, and the unknown classes are identified according to the energy value.
The calculation uses the free-energy formula, which combines the energies over all values in the set L of known and unknown class labels of the feature space:

E(f) = -T · log ∫_{l' ∈ L} exp( -E(f, l') / T ) dl'

where E(f) denotes the free energy the energy-based model assigns to the feature vector f, used to measure the model's response to the input; T is a temperature parameter controlling the smoothness of the energy function; L denotes the set of all possible labels of an object instance; and E(f, l') denotes the model energy for a given label. The integral in the equation performs the probability weighting, summing over all possible labels, and the logarithm makes the energy function smoother and easier to optimize. Through the Gibbs distribution p(l | f) = exp(-E(f, l)/T) / exp(-E(f)/T), the equation assigns a probability to each possible label of an object instance based on its model energy.
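With logits g_l(f) from the classification head and E(f, l) = -g_l(f), the free energy above reduces to a negative log-sum-exp over the labels. A minimal sketch follows; the logit values are invented for illustration and the function is not part of the patent:

```python
import math

def free_energy(logits, T=1.0):
    """E(f) = -T * log(sum_l exp(g_l(f) / T)): the free energy of a
    classification head's logits. A confident (known-class) input produces
    one dominant logit and hence low energy; an unknown input produces
    flatter logits and higher energy."""
    m = max(logits)  # shift for numerical stability of log-sum-exp
    return -T * (m / T + math.log(sum(math.exp((g - m) / T) for g in logits)))

known_logits = [9.0, 0.5, 0.3]    # one class clearly dominates
unknown_logits = [0.4, 0.5, 0.3]  # no class stands out
print(free_energy(known_logits) < free_energy(unknown_logits))  # True
```

Thresholding this scalar is what separates known from unknown detections at inference time in energy-based open-set methods.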
S5, learning a new class by utilizing an incremental learning mode according to the received unknown class label, and further circularly realizing the open world unknown class identification;
When object labels for the identified unknown classes are received, the new unknown-class labels are input and the target detection model UC-ORE based on open world unknown class identification is retrained, thereby realizing open world unknown class identification.
Further, when retraining the target detection model UC-ORE based on open world unknown class identification, the new classes are learned with a sample-replay-based incremental learning method: a representative portion of old data is stored, the model is fine-tuned after each incremental step, the parameters of all layers except the output layer are frozen, and only the parameters of the last output layer are adjusted.
Further, sample-replay-based incremental learning is a machine learning method mainly used to handle the addition of new data in online learning. Its basic idea is to train the model on historical data and then use the new data together with the historical data to update the model. Its main advantage is that retraining the whole model is avoided, which greatly improves training efficiency. One common strategy is to randomly select a portion of the historical data to use together with the new data; this prevents the model from becoming overly dependent on particular historical data and thus improves its generalization ability. The method specifically comprises the following steps:
S5.1, initializing the model: before incremental learning begins, the target detection model UC-ORE is initialized and trained on an initial portion of the data;
s5.2, training a model: training a target detection model UC-ORE by using a part of new data;
S5.3, sample playback: a set proportion of samples from previously trained data sets is stored in a buffer, called the playback buffer; a set proportion of samples is then randomly drawn from the playback buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating a model: combining the target detection model UC-ORE trained by using the samples in the playback buffer zone with the target detection model UC-ORE trained in the step S5.2 to obtain a new target detection model UC-ORE;
S5.5, testing the model: the target detection model UC-ORE obtained in step S5.4 is evaluated using the test data set.
S5.6, if new data still need to be trained, return to step S5.2; otherwise, end the incremental learning.
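Steps S5.1-S5.6 can be sketched as a single replay step. This is illustrative only: `model_update` stands in for the actual UC-ORE fine-tuning call, and the replay/storage fractions are hypothetical parameters, not values from the patent:

```python
import random

def incremental_step(model_update, new_data, playback_buffer,
                     replay_frac=0.2, store_frac=0.1, rng=None):
    """One replay-based incremental step (S5.2-S5.4 above).

    Mixes a random draw from the playback buffer into the new task's data,
    fine-tunes via `model_update`, then stores representatives of the new
    data for future replay."""
    rng = rng or random.Random(0)
    replayed = (rng.sample(playback_buffer, int(len(playback_buffer) * replay_frac))
                if playback_buffer else [])
    batch = list(new_data) + replayed          # S5.3: old samples join the new ones
    model = model_update(batch)                # S5.2/S5.4: fine-tune on the mixture
    kept = rng.sample(list(new_data), max(1, int(len(new_data) * store_frac)))
    playback_buffer.extend(kept)               # retain representatives for replay
    return model

buf = []
model = incremental_step(lambda batch: len(batch), list(range(10)), buf)
print(model, len(buf))  # trained on 10 samples; 1 representative stored
```

The stand-in `model_update` just reports the batch size; in practice it would run a few epochs of detector fine-tuning.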
Further, the fine-tuning trains the model with a representative portion of historical data together with the new data, so as to avoid retraining the whole model when labels of unknown classes are received; in model fine-tuning, adjusting only the parameters of the last output layer is commonly called "head fine-tuning";
The main idea of the method is to exploit the general features that the pre-trained model has learned on large-scale data and fine-tune only the last layers of the model, so that it adapts better to the new task. The specific implementation process is as follows:
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: only training the new output layer, so that the target detection model UC-ORE can be better adapted to new tasks;
A5, unfreezing parameters: if the parameters of other layers need fine-tuning, those parameters are unfrozen so that they can change during fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
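Steps A1-A4 amount to freezing every layer except a freshly sized output head. A framework-agnostic sketch follows; the parameter-dictionary layout is invented for illustration, and in a real deep learning framework this would be done with `requires_grad`-style trainability flags:

```python
def prepare_head_finetune(params, num_new_classes):
    """Freeze all layers except the output head and replace the head with
    one sized for the new task's class count (steps A2-A4 above).

    `params` maps layer name -> {"weights": ..., "trainable": bool}; it is
    a stand-in for a real framework's parameter store."""
    for layer in params.values():
        layer["trainable"] = False                 # A2: freeze everything
    params["output"] = {                           # A3: new head for the new classes
        "weights": [[0.0] * num_new_classes],      # re-initialised in practice
        "trainable": True,                         # A4: only this layer trains
    }
    return params

net = {"backbone": {"weights": [[1.0]], "trainable": True},
       "output":   {"weights": [[0.5, 0.5]], "trainable": True}}
net = prepare_head_finetune(net, num_new_classes=3)
print(net["backbone"]["trainable"], len(net["output"]["weights"][0]))  # False 3
```

Step A5 would simply flip selected `trainable` flags back to `True` before the full fine-tuning pass of step A6.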
In order to demonstrate the effectiveness of the method proposed by the present application, the following verification experiments were performed:
a comprehensive evaluation standard is provided to discuss the performance of a target detection model UC-ORE based on open world unknown class recognition, including recognition of unknown class objects, detection of known classes, and gradual learning of new classes while providing labels to the unknown classes.
Data segmentation: the classes are divided into a set of tasks T = {T_1, T_2, ..., T_t, ...}. All classes of a particular task T_t are introduced into the system at time t. While learning task T_t, all classes in {T_i : i ≤ t} are regarded as known, and all classes in {T_i : i > t} are regarded as unknown;
In one embodiment, classes from Pascal VOC and MS-COCO are considered. All VOC classes and their data are grouped into the first task. The remaining 60 classes of MS-COCO are divided into three sequential tasks, with semantic drift between them. The training data consists of all corresponding images drawn from the Pascal VOC and MS-COCO training splits. For evaluation, the Pascal VOC test split and the MS-COCO validation split are used. In addition, 1k images are set aside from the training data of each task for validation. Table 1 below shows the task composition in the open world object detection evaluation protocol for unknown-class identification:
TABLE 1
Evaluation metrics: since unknown objects are easily confused with known objects, the Wilderness Impact (WI) metric is used to explicitly characterize this behavior; ideally WI should be small, since precision should not degrade when unknown objects are added to the test set. In addition to WI, the Absolute Open-Set Error (A-OSE) is used to report the number of unknown objects misclassified as a known class. Both WI and A-OSE implicitly measure how effectively the model handles unknown objects.
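The two metrics can be stated compactly in code. This is a hedged sketch: WI is computed from precision measured with and without unknowns in the test set (following the usual definition WI = P_known / P_known∪unknown − 1), and A-OSE simply counts unknown instances predicted as a known class; the function and argument names are illustrative, not from the application.

```python
def wilderness_impact(precision_known_only, precision_with_unknowns):
    # WI = P(closed set) / P(open set) - 1; ideally close to 0, since adding
    # unknown objects to the test set should not degrade precision
    return precision_known_only / precision_with_unknowns - 1.0

def absolute_open_set_error(predicted, true, known_classes):
    # A-OSE: number of unknown-class instances misclassified as a known class
    return sum(1 for p, t in zip(predicted, true)
               if t not in known_classes and p in known_classes)
```

For example, `wilderness_impact(0.9, 0.75)` reports a 20% relative precision drop caused by the presence of unknowns.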
Table 2 compares UC-ORE with Faster R-CNN on open world object detection. After learning each task, the WI and A-OSE metrics quantify how often unknown instances are confused with known classes. The WI and A-OSE scores of UC-ORE are found to be significantly lower, owing to its explicit modeling of unknown objects. When the unknown classes are progressively labeled in task 2, the performance of the baseline detector on the set of known classes (quantified by mAP) is found to drop sharply from 56.16% to 5.011%. UC-ORE achieves both goals simultaneously: detecting known classes well and reducing the impact of unknown classes. Similar trends occur in tasks 3 and 4.
Table 2 shows the performance of UC-ORE in open world object detection. WI and A-OSE quantify how UC-ORE handles unknown classes, while mAP measures how well it detects known classes. UC-ORE consistently outperforms the Faster R-CNN based baseline on all metrics.
TABLE 2
In the present application, UC-ORE explicitly models unknown objects and therefore performs well on the incremental object detection task. This is because UC-ORE reduces the confusion in which unknown objects are classified as known objects, which allows the detector to incrementally learn the actual foreground objects. UC-ORE is evaluated using the protocol of ILOD (an incremental object detector), and the Pascal VOC 2007 dataset is divided into three settings: 10 (known classes) + 10 (unknown classes), 15 (known classes) + 5 (unknown classes), and 19 (known classes) + 1 (unknown class), to allow incremental learning of the detector. UC-ORE is compared with ILOD under these three settings. As shown in Table 3 below, UC-ORE performs very well in all settings.
TABLE 3
It is to be understood that various changes and modifications in the form and detail herein disclosed may be made by those skilled in the art without departing from the spirit and scope of the present disclosure. Accordingly, equivalent modifications and variations of the present application should be within the scope of the claims of the present application. In addition, although specific terms are used in the present specification, these terms are for convenience of description only and do not limit the present application in any way.
Claims (10)
1. An object detection method for open world unknown class identification, characterized by comprising the following steps:
S1, in a training stage, using Faster R-CNN as a reference network, and using known-class images as a training set to train the Faster R-CNN model, so as to obtain a target detection model UC-ORE based on open world unknown class identification;
S2, generating background boxes by using the RPN of the Faster R-CNN reference network, and labeling the top-scoring background boxes as unknown classes;
S3, separating known and unknown classes by means of feature clustering;
S4, in an inference stage, converting the classification head in the target detection model UC-ORE based on open world unknown class identification into an energy function using the EBMs energy model, and identifying unknown classes according to the energy value;
S5, learning new classes by means of incremental learning according to the received unknown-class labels, thereby cyclically realizing open world unknown class identification.
2. The method for detecting an object for open world unknown class identification according to claim 1, wherein in step S1, the training set uses the Pascal VOC 2007 and MS-COCO standard datasets as detection benchmarks; training tasks are conducted on the standard datasets, with Faster R-CNN as the reference network; the full name of Faster R-CNN is Faster Region-based Convolutional Neural Network, a two-stage object detection algorithm.
3. The method for detecting an object for open world unknown class identification according to claim 1, wherein in step S2, among the background boxes generated by the candidate-box extraction network RPN of the Faster R-CNN, the background boxes are in fact unlabeled regions; the background boxes generated by the RPN are sorted directly by score, and the top k background boxes are labeled as unknown classes, where k represents the number of selected background boxes; the full name of RPN is Region Proposal Network, a module in the Faster R-CNN for generating candidate regions in object detection.
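The top-k pseudo-labeling of step S2 reduces to sorting unmatched proposals by objectness score. A minimal sketch, with hypothetical (box, score) tuples rather than the actual RPN output format:

```python
def pseudo_label_unknowns(background_proposals, k):
    # background_proposals: (box, objectness_score) pairs not matched to any
    # ground-truth box; the k highest-scoring ones are labeled "unknown"
    top_k = sorted(background_proposals, key=lambda p: p[1], reverse=True)[:k]
    return [(box, "unknown") for box, score in top_k]
```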
4. The method for detecting an object for open world unknown class identification according to claim 3, wherein in step S3, class separation in the latent space is a desirable property for an open world object detection method; known and unknown classes are separated by means of feature clustering, and the separated unknown classes are preliminarily grouped so as to distinguish the same class from different classes among the unknown classes;
class separation in the latent space is realized by exploiting the property that objects of the same class are closer to each other in feature space than objects of different classes, comprising the following steps:
first, k-means clustering is performed on objects of the known classes to obtain the cluster center of each class;
then, the distance between a new unknown-class object and the cluster centers of the existing known classes is calculated; if the minimum of these distances is below a set threshold, the object is assigned to the known class corresponding to that minimum; otherwise it is treated as a new unknown class. In this way the known and unknown classes are separated and the unknown classes are preliminarily grouped.
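The distance-to-center test of claim 4 can be sketched as follows. For brevity, each per-class cluster center is taken as the mean feature of that class (a stand-in for running k-means with one cluster per known class); the names and the threshold are illustrative assumptions:

```python
import numpy as np

def class_centers(features, labels):
    # one cluster center per known class: the mean feature vector of the class
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def assign_or_mark_unknown(f, centers, threshold):
    # nearest known-class center; below the threshold -> that known class,
    # otherwise the object is treated as a new unknown class
    dists = {c: np.linalg.norm(f - mu) for c, mu in centers.items()}
    nearest = min(dists, key=dists.get)
    return nearest if dists[nearest] < threshold else "unknown"
```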
5. The method according to claim 1, wherein in step S4, the classification head in the target detection model UC-ORE based on open world unknown class identification is converted into an energy function using the EBMs energy model; wherein EBMs refers to Energy-Based Models, a class of probabilistic generative models;
based on the energy models EBMs, given a feature space F, a feature vector f ∈ F and its corresponding class label l ∈ L, where L is the set of known and unknown class labels of the feature space, the goal is to learn an energy function E(f, l) whose output is a single scalar estimating the compatibility between the observed variable f and the possible set of output variables L;
the energy model EBMs assigns low energy values to data of the known classes and high energy values to the unknown classes, and unknown classes are identified according to the energy value.
6. The method for object detection for open world unknown class identification according to claim 1, wherein in step S4, the calculation is performed using the free-energy formula, which combines the energies over all values in the set L of known and unknown class labels of the feature space:

E(f) = -T · log ∫_{l'} exp( -E(f, l') / T ) dl'

wherein E(f) denotes the free energy of the energy model, used to measure the model's compatibility with the input; T is a temperature parameter controlling the smoothness of the energy function; l' ranges over the set of all possible labels of an object instance; and E(f, l') denotes the model energy for a given label. The integral performs probability weighting over all possible labels, and the logarithm makes the energy function smoother and easier to optimize; in this way, the formula assigns a probability to each possible label of an object instance based on the model energies.
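A numerical sketch of the free-energy computation in claim 6, under the usual simplification that the label set is discrete, so the integral becomes a sum, and the classification-head logits g_l play the role of −E(f, l); the temperature, function names, and threshold are illustrative assumptions:

```python
import math

def free_energy(logits, T=1.0):
    # E(f) = -T * log( sum_l exp(g_l / T) ): the discrete-label form of the
    # integral in the formula, computed with log-sum-exp stabilization
    m = max(g / T for g in logits)
    return -T * (m + math.log(sum(math.exp(g / T - m) for g in logits)))

def is_unknown(logits, energy_threshold, T=1.0):
    # known-class data gets low energy; high energy flags an unknown object
    return free_energy(logits, T) > energy_threshold
```

A confident known-class prediction such as logits [10, 0, 0] yields an energy near −10, while a flat, uncertain prediction such as [5, 5, 5] yields a noticeably higher energy, which a threshold can then flag as unknown.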
7. The method for detecting an object for open world unknown class identification according to claim 1, wherein in step S5, when labels for identified unknown-class objects are received, the new unknown-class labels are added as input, and the target detection model UC-ORE based on open world unknown class identification is retrained, thereby realizing open world unknown class identification.
8. The method according to claim 7, wherein when the target detection model UC-ORE based on open world unknown class identification is retrained, the new classes are learned using a sample-replay-based incremental learning method, that is, a representative portion of the old data is stored; after each incremental step, the target detection model UC-ORE based on open world unknown class identification is fine-tuned: all parameters except those of the output layer are frozen, and only the parameters of the last output layer are adjusted.
9. The method for object detection for open world unknown class identification according to claim 8, wherein sample-replay-based incremental learning is a machine learning method for handling new data arriving during online learning: the target detection model UC-ORE is trained with historical data, and new data is then used together with the historical data to update the model, comprising the following steps:
s5.1, initializing the model: before incremental learning begins, the target detection model UC-ORE is initialized and trained with a portion of the data;
s5.2, training the model: the target detection model UC-ORE is trained with a portion of the new data;
s5.3, sample replay: a set proportion of samples from previously trained datasets is stored in a buffer, called the replay buffer; samples are then randomly drawn from the replay buffer and used together with the current training data to train the target detection model UC-ORE;
s5.4, updating the model: the target detection model UC-ORE trained with the samples from the replay buffer is combined with the target detection model UC-ORE trained in step S5.2 to obtain a new target detection model UC-ORE;
s5.5, testing the model: the target detection model UC-ORE obtained in step S5.4 is evaluated using the test dataset;
s5.6, if new data still needs to be trained, return to step S5.2; otherwise, incremental learning ends.
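The replay scheme of steps S5.1-S5.6 can be sketched with a simple buffer. This is a schematic only: the train function, capacity, and keep ratio are placeholders, and actual training on image data is abstracted away.

```python
import random

class ReplayBuffer:
    # stores a fixed-size subset of past training samples (sample replay)
    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []

    def store(self, batch, keep_ratio=0.1):
        # keep a set proportion of the batch, evicting oldest samples if full
        kept = random.sample(batch, max(1, int(len(batch) * keep_ratio)))
        self.samples.extend(kept)
        self.samples = self.samples[-self.capacity:]

    def draw(self, n):
        return random.sample(self.samples, min(n, len(self.samples)))

def incremental_step(train_fn, new_data, buffer):
    # train on new data mixed with replayed old samples (steps S5.2/S5.3),
    # then remember a portion of the new data for future steps
    mixed = new_data + buffer.draw(len(new_data))
    model = train_fn(mixed)
    buffer.store(new_data)
    return model
```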
10. The method of claim 8, wherein the fine-tuning is performed, when labels of unknown classes are received, by training the model with a representative portion of historical data together with the new data, so as to avoid retraining the model from scratch; in model fine-tuning, the approach of adjusting only the parameters of the last output layer is commonly called "head fine-tuning", and the specific implementation process is as follows:
a1, loading a pre-training model: using as an initial model a target detection model UC-ORE that has been pre-trained on large-scale data;
a2, freezing model parameters: for layers that do not require fine tuning, their parameters are frozen so that they do not change during the training process;
a3, replacing an output layer: replacing the last output layer of the target detection model UC-ORE with a new output layer adapting to the task, wherein the output layer contains the category number required by the new task;
a4, training only a new output layer: training only the new output layer, so that the target detection model UC-ORE can adapt to new tasks;
a5, unfreezing parameters: if the parameters of other layers need to be fine-tuned, those parameters are unfrozen so that they can change during fine-tuning;
a6, fine tuning a model: the whole object detection model UC-ORE is fine-tuned until the object detection model UC-ORE converges on the new task.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310940374.0A CN116665018A (en) | 2023-07-28 | 2023-07-28 | Target detection method for open world unknown class identification |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116665018A true CN116665018A (en) | 2023-08-29 |
Family
ID=87710045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310940374.0A Pending CN116665018A (en) | 2023-07-28 | 2023-07-28 | Target detection method for open world unknown class identification |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116665018A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116863250A (en) * | 2023-09-01 | 2023-10-10 | 华南理工大学 | Open scene target detection method related to multi-mode unknown class identification |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210319176A1 (en) * | 2020-04-13 | 2021-10-14 | Capital One Services, Llc | Efficient automatic punctuation with robust inference |
CN114139617A (en) * | 2021-11-24 | 2022-03-04 | 山东力聚机器人科技股份有限公司 | New class target identification method and device based on deep clustering |
CN114241260A (en) * | 2021-12-14 | 2022-03-25 | 四川大学 | Open set target detection and identification method based on deep neural network |
CN115690514A (en) * | 2022-11-14 | 2023-02-03 | 深圳市华尊科技股份有限公司 | Image recognition method and related equipment |
Non-Patent Citations (2)
Title |
---|
K J JOSEPH,ET AL.: "Towards Open World Object Detection", pages 1 - 16, Retrieved from the Internet <URL:https://arxiv.org/abs/2103.02603> * |
ZHIHENG WU, ET AL.: "UC-OWOD: Unknown-Classified Open World Object Detection", pages 1 - 9, Retrieved from the Internet <URL:https://arxiv.org/pdf/2207.11455.pdf> * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bendale et al. | Towards open world recognition | |
CN111967294B (en) | Unsupervised domain self-adaptive pedestrian re-identification method | |
Behrmann et al. | Unified fully and timestamp supervised temporal action segmentation via sequence to sequence translation | |
CN108733778B (en) | Industry type identification method and device of object | |
Dmochowski et al. | Maximum Likelihood in Cost-Sensitive Learning: Model Specification, Approximations, and Upper Bounds. | |
JP5588395B2 (en) | System and method for efficiently interpreting images with respect to objects and their parts | |
US8606022B2 (en) | Information processing apparatus, method and program | |
US11182602B2 (en) | Method and system for person re-identification | |
WO2012141332A1 (en) | Supervised and semi-supervised online boosting algorithm in machine learning framework | |
US11210555B2 (en) | High-dimensional image feature matching method and device | |
CN116665018A (en) | Target detection method for open world unknown class identification | |
CN107220663B (en) | Automatic image annotation method based on semantic scene classification | |
CN110008899B (en) | Method for extracting and classifying candidate targets of visible light remote sensing image | |
CN109871891B (en) | Object identification method and device and storage medium | |
WO2015146113A1 (en) | Identification dictionary learning system, identification dictionary learning method, and recording medium | |
CN113762508A (en) | Training method, device, equipment and medium for image classification network model | |
CN104376308A (en) | Human action recognition method based on multitask learning | |
WO2022166578A1 (en) | Method and apparatus for domain adaptation learning, and device, medium and product | |
CN111191033A (en) | Open set classification method based on classification utility | |
CN116910571B (en) | Open-domain adaptation method and system based on prototype comparison learning | |
US20140358960A1 (en) | Rapid nearest neighbor searching using kd-ferns | |
KR20210071378A (en) | Hierarchical object detection method for extended categories | |
CN112766423B (en) | Training method and device for face recognition model, computer equipment and storage medium | |
Madokoro et al. | Adaptive Category Mapping Networks for all-mode topological feature learning used for mobile robot vision | |
CN116863250A (en) | Open scene target detection method related to multi-mode unknown class identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||