CN116152721B

CN116152721B - Target detection method and device based on annealing type label transfer learning

Info

Publication number: CN116152721B
Application number: CN202310414703.8A
Authority: CN
Inventors: 刘祥龙; 马宇晴; 张湛舸; 吴妍
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2023-04-18
Filing date: 2023-04-18
Publication date: 2023-06-20
Anticipated expiration: 2043-04-18
Also published as: CN116152721A

Abstract

The invention discloses a target detection method and device based on annealing type label transfer learning. Based on the principle that known and unknown semantic features are informationally coupled during target image feature extraction, the method performs label transfer and feature decoupling on the known class instances in the data and constructs an annealing type scheduling curve to dynamically allocate the degree of decoupling between the two kinds of semantic features. Effective general target information in the known classes is thereby extracted to guide the learning of unknown knowledge, which improves the model's detection of unknown classes while retaining its strong detection capability on the original known classes. The method can be applied to open-world target detection scenarios such as automatic driving, defect detection, and target tracking, and has high practical value.

Description

Target detection method and device based on annealing type label transfer learning

Technical Field

The invention relates to a target detection method, in particular to a target detection method based on annealing type label transfer learning, and also relates to a corresponding target detection device, belonging to the technical field of computer vision.

Background

Object detection in computer vision currently faces new challenges. Researchers are increasingly dissatisfied with detection capabilities limited to conventional closed-set scenes and have begun to work on object detection for open-world scene images. The difficulty of this task is that knowledge keeps growing in the open world: the detector must detect unknown targets not included in the training set while detecting the known targets in an image, and must incrementally learn the identified unknown targets as the data set is updated.

Compared with traditional closed-set target detection, target detection in the open world presents new challenges: (1) detection of unknown classes: unknown instances must be detected and distinguished from similar known instances and from the background context; (2) incremental learning: the identified unknown classes must be learned incrementally while balancing the learning of the original known classes and the newly annotated known classes. To this end, Joseph et al. use an unknown class candidate region generation network with an automatic labeling strategy and provide an energy-based binary classifier to distinguish unknown classes from known classes. Yang et al. predefine a semantic centroid for each class and push object instances toward their centroids during incremental learning to enhance the discrimination of unknown classes from known classes. Gupta et al. add attention-driven pseudo-labeling, novelty classification, and objectness scoring to the DETR model to detect unknown classes. Zhao et al. revise the benchmarks and evaluation metrics of the open-world target detection task and use a non-parametric candidate box guidance module and a class-specific exclusion classifier to improve detection of unknown classes.

Although the above methods detect unknown class objects through specific network structures, the data set in open-world target detection contains only a large amount of known class annotation information, and detection of unknown classes depends on unknown class candidate box generation mechanisms with high uncertainty. The detection performance of existing methods on unknown classes therefore still needs to be improved.

Disclosure of Invention

The invention aims to provide a target detection method based on annealing type label transfer learning.

Another technical problem to be solved by the invention is to provide a target detection device based on annealing type label transfer learning.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

According to a first aspect of an embodiment of the present invention, there is provided a target detection method based on annealing type label transfer learning, including the following steps:

S1, using known class data with annotation information to guide the open target detection model of the current stage in pre-training on the known classes;

S2, adding a combined label to each data instance to realize information decoupling of known and unknown semantic features;

S3, constructing an annealing type scheduling curve to dynamically allocate the degree of information decoupling between the known and unknown semantic features, and guiding the open target detection model in collaborative learning of known and unknown knowledge;

S4, updating the training data set according to the open-world task setting, and guiding the open target detection model in incremental learning of new classes;

S5, iteratively executing steps S1 to S4 to complete training of the open target detection model, and performing target detection in the open world with the trained open target detection model.

Preferably, the step S2 specifically includes:

sampling an image from the training set of the current stage, the image containing a plurality of instances of known targets, each instance being denoted by a pair consisting of a sample instance and its corresponding truth label; for each instance in the sampled image, adding a new label, called the transfer label of the instance, the original label being called the truth label; and setting the transfer labels of all instances in each image to the unknown class, i.e., the transfer label of the instance is regarded as an unknown class.

Preferably, the step S3 specifically includes:

S31, inputting the image into the open target detection model of the current stage to obtain the output classification probability of each instance;

S32, obtaining the combined one-hot (unit effective) encoding of each instance from the truth label and the added transfer label of each target instance in the image;

S33, calculating the combined cross entropy loss from the classification probability and the combined one-hot encoding;

S34, adjusting the weights of the truth label and the transfer label in the calculation rule of the combined one-hot encoding with an annealing type scheduling strategy, and guiding the open target detection model in collaborative learning of the known classes and the unknown classes.

Preferably, the step S31 specifically includes:

inputting the image into the open target detection model of the current stage, and obtaining the classification probability output by the detection head.

Preferably, the step S32 specifically includes:

calculating the one-hot encoding of the truth label and the one-hot encoding of the transfer label of each instance according to a calculation rule defined in terms of the total number of known classes; and then calculating the combined one-hot encoding, which simultaneously contains the known class information carried by the truth label and the unknown class information carried by the transfer label, so that the effective unknown class features contained in the instance are decoupled. The combination is governed by a weight representing the degree of coupling between the known class and unknown class features.

Preferably, in the step S33, the combined cross entropy loss is calculated from the weight representing the degree of coupling between the known class and unknown class features and from the probability on each class given by the classifier output after normalization by the softmax function.

Preferably, in the step S34, the annealing type scheduling strategy used to regulate the change of the coupling degree is defined in terms of the number of iterations of the current stage, the total number of iterations of the pre-training stage in step S1, and a constant that expresses the speed of change of the coupling degree and adjusts how the weight varies as the number of iterations increases.

Preferably, the step S4 specifically includes:

S41, adding new known classes and updating the data set according to the open-world task setting;

S42, pre-training the open target detection model of the new stage on the known classes with the updated data set;

S43, performing label transfer on the updated data set, and using the annealing type scheduling strategy to guide the open target detection model of the new stage in collaborative learning of the known classes and the unknown classes;

S44, performing small sample incremental fine-tuning on the open target detection model of the new stage to retain its detection capability on the semantics of the original known classes.

Preferably, in the step S41, after training of the open target detection model has been completed with the training set of stage t, training of stage t+1 is performed according to the open-world task setting: n new classes are incrementally added to the known classes of the data set, i.e., the known class set is updated, and the training set is updated accordingly.

According to a second aspect of the embodiment of the present invention, there is provided an object detection device based on annealing type tag transfer learning, including a processor and a memory, where the processor reads a computer program in the memory, and is configured to execute the above object detection method based on annealing type tag transfer learning.

Compared with the prior art, the target detection method and device based on annealing type label transfer learning provided by the invention perform label transfer and feature decoupling on the known class instances in the data, based on the principle that known and unknown semantic features are informationally coupled during target image feature extraction, and construct an annealing type scheduling curve to dynamically allocate the degree of decoupling between the two kinds of semantic features. Effective general target information in the known classes is thereby extracted to guide the learning of unknown knowledge, which improves the model's detection of unknown classes while retaining its strong detection capability on the original known classes. The invention can be applied to open-world target detection scenarios such as automatic driving, defect detection, and target tracking, and has high practical value.

Drawings

FIG. 1 is a flowchart of a training process of an open target detection model used in a target detection method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a visual result of an open target detection model capable of detecting known and unknown class instances and performing incremental learning during an autopilot mission;

FIG. 3 is a flowchart of constructing an annealing type scheduling curve to dynamically allocate the information decoupling degrees of the known and unknown semantic features and guiding an open target detection model to perform collaborative learning of the known and unknown knowledge in the embodiment of the present invention;

FIG. 4 is a flowchart of performing an update operation of a training data set according to task settings in the open world and guiding an open target detection model to perform new types of incremental learning in accordance with an embodiment of the present invention;

fig. 5 is a schematic diagram of an object detection device based on annealing type label transfer learning according to an embodiment of the present invention.

Detailed Description

The technical contents of the present invention will be described in detail with reference to the accompanying drawings and specific examples.

Automatic driving, defect detection, and target tracking are typical open-world target detection application scenarios. The embodiments of the present invention are described mainly with the automatic driving scenario as an example, but the application scenarios of the invention are not limited thereto. As shown in fig. 1, the present invention first provides a target detection method based on annealing type label transfer learning. The open target detection model used by the method is obtained by training with the following steps: S1, using known class data with annotation information to guide the open target detection model (hereinafter, the model) of the current stage in pre-training on the known classes; S2, adding a combined label to each data instance to realize information decoupling of known and unknown semantic features; S3, constructing an annealing type scheduling curve to dynamically allocate the degree of information decoupling between the known and unknown semantic features, and guiding the open target detection model in collaborative learning of known and unknown knowledge; and S4, updating the training data set according to the open-world task setting, and guiding the open target detection model in incremental learning of new classes.

In one embodiment of the present invention, steps S1, S2, S3 and S4 are performed iteratively. Incremental learning is divided into four stages; each new stage must learn new known classes, retain the detection capability on the original known classes, and still detect unknown instances. The open target detection model is trained with an SGD optimizer and a batch size of 8. For the hyperparameters, the peak value of the coupling degree is set to 1 and the curve change rate is set to a fixed constant. In the known class pre-training phase of step S1, the initial learning rate is set to 0.01. Learning rate decay is added in the unknown class training phase of step S3: the initial learning rate is set to 0.0001 and is decayed to 1/10 of its previous value at training iterations 12000 and 16000. Training iterates until the loss function of the open target detection model converges; the parameters of each layer of the neural network that perform best on the validation set are saved, completing training of the open target detection model.
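The step-S3 learning rate schedule stated above (initial rate 0.0001, reduced to 1/10 at iterations 12000 and 16000) can be sketched as follows. This is a minimal illustration, not the inventors' code: the function name `step_decay_lr` is hypothetical, and the choice that the decay takes effect exactly at the milestone iteration is an assumption.

```python
def step_decay_lr(iteration, base_lr=1e-4, milestones=(12000, 16000), factor=0.1):
    """Return the learning rate in effect at a given training iteration.

    Assumption: the rate drops at the milestone iteration itself
    (iteration >= milestone), which the source text does not specify.
    """
    if iteration < 0:
        raise ValueError("iteration must be non-negative")
    # Each milestone already reached multiplies the rate by `factor` once.
    reached = sum(1 for m in milestones if iteration >= m)
    return base_lr * factor ** reached


if __name__ == "__main__":
    for it in (0, 11999, 12000, 16000):
        print(it, step_decay_lr(it))
```

Deep learning frameworks usually provide an equivalent multi-step scheduler; the plain function is used here only to make the arithmetic explicit.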

On the basis of the trained open target detection model, the target detection method can be applied to scenarios such as automatic driving, defect detection, and target tracking, i.e., step S5 is carried out: target detection in the open world is performed with the trained open target detection model. Taking the automatic driving scenario as an example, the inventors applied the method to an open target detection data set from natural real scenes to verify the practical effect of the target detection method provided by the embodiment of the invention, as explained below.

The inventors selected image data from the Pascal-VOC and MS-COCO data sets, which were collected in natural scenes and contain images of 80 classes of objects. Incremental learning is divided into four stages; each stage introduces 20 new known classes, and all classes other than the original known classes and the new known classes are treated as unknown. In the first stage, the number of original known classes is 0, so the data comprise only the 20 new classes and the remaining unknown classes; in the fourth stage, the new and original known classes total 80 and no unknown class remains.
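The four-stage class split described above can be made concrete with a short sketch. The function name `stage_split` and the dictionary keys are hypothetical; only the counts (80 classes, 20 new known classes per stage) come from the text.

```python
def stage_split(stage, total_classes=80, classes_per_stage=20):
    """Return how many classes are original-known, new-known and unknown at a stage.

    Stages are numbered from 1. The counts follow the experimental setting in
    the text; the function and key names are illustrative only.
    """
    num_stages = total_classes // classes_per_stage
    if not 1 <= stage <= num_stages:
        raise ValueError(f"stage must be in 1..{num_stages}")
    original_known = (stage - 1) * classes_per_stage
    new_known = classes_per_stage
    unknown = total_classes - original_known - new_known
    return {
        "original_known": original_known,
        "new_known": new_known,
        "unknown": unknown,
    }


if __name__ == "__main__":
    for s in range(1, 5):
        print(s, stage_split(s))
```

With the default arguments, stage 1 has no original known classes and 60 unknown classes, and stage 4 has no unknown classes, matching the description.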

Fig. 2 shows visual examples on the automatic driving task. The target detection method provided by the embodiment of the invention detects not only known classes such as automobiles and pedestrians in open road scenes, but also objects it has never learned, namely a bridge deck baffle (left in fig. 2), a tire barrier (right in fig. 2), and a skateboard (right in fig. 2) in the middle of the road, and marks them as unknown classes. Known and unknown targets in open road scenes are thus recognized together, so that unexpected situations caused by unknown objects appearing during automatic driving can be avoided.

To measure the performance of the target detection method quantitatively, the inventors adopted known class mean average precision (K-mAP), unknown class mean average precision (U-mAP), and unknown class recall (U-Recall) as metrics and compared the invention fairly with other methods of the same kind. The results from the first stage to the fourth stage are shown in Table 1.

TABLE 1

As can be seen from Table 1, the method provided by the invention performs better than other methods of the same kind in detecting both unknown and known classes; in particular, the average precision on unknown classes in stage one is improved by 200%, which gives the method good practical value in automatic driving tasks.

The specific training process of the open target detection model is described in further detail below.

In one embodiment of the present invention, step S1 specifically includes the following sub-steps: the training set of the current stage, which contains a set of known classes, is used to guide the open target detection model of the current stage through multiple rounds of iteration in pre-training on the known classes. The loss function during pre-training is defined in terms of the one-hot (unit effective) encoding of the data truth label and the probability on each class obtained by processing the classifier output with the normalized exponential function, i.e., the softmax function.

In one embodiment of the present invention, step S2 specifically includes the following sub-steps: an image is sampled from the training set of the current stage. The image contains a plurality of instances of known targets, each of which can be denoted by a pair consisting of a sample instance and its corresponding truth label. Label transfer is then performed on the sampled image: for every instance in the sampled image, a new label is added, called the transfer label of the instance, while the original label is called the truth label. The transfer labels of all instances in each image are set to the unknown class, i.e., the transfer label of the instance is regarded as an unknown class. Effective unknown class semantic features are thereby decoupled from the known class instances without any unknown class supervision information, which reduces the uncertainty of unknown class identification.
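The label transfer of step S2 can be sketched as below. This is a hedged illustration rather than the patented implementation: the function name `add_transfer_labels`, the `(box, truth_label)` tuple layout, and the convention that the unknown class takes index K after the K known classes (indexed 0 to K-1) are all assumptions introduced here.

```python
def add_transfer_labels(instances, num_known):
    """Attach a transfer label (the unknown class) to every annotated instance.

    instances: list of (box, truth_label) pairs, truth_label in 0..num_known-1.
    Assumption: the unknown class is stored at index `num_known`.
    Returns a new list of (box, truth_label, transfer_label) triples.
    """
    unknown_index = num_known
    labelled = []
    for box, truth_label in instances:
        if not 0 <= truth_label < num_known:
            raise ValueError(f"truth label {truth_label} is not a known class")
        # The truth label is kept; the transfer label is always the unknown class.
        labelled.append((box, truth_label, unknown_index))
    return labelled


if __name__ == "__main__":
    image_instances = [((0, 0, 10, 10), 3), ((5, 5, 20, 20), 7)]
    print(add_transfer_labels(image_instances, num_known=20))
```

Keeping both labels on each instance is what later allows the two to be weighted against each other in the combined loss.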

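As shown in fig. 3, the step S3 specifically includes the following sub-steps:

S31, inputting the image into the open target detection model of the current stage to obtain the output classification probability of each instance (the logits, i.e., the raw values output by the model before processing by the softmax function);

S32, obtaining the combined one-hot (unit effective) encoding of each instance from the truth label and the added transfer label of each target instance in the image;

S33, calculating the combined cross entropy loss from the classification probability and the combined one-hot encoding;

S34, adjusting the weights of the truth label and the transfer label in the calculation rule of the combined one-hot encoding with an annealing type scheduling strategy, and guiding the open target detection model in collaborative learning of the known classes and the unknown classes.

In one embodiment of the invention, the classification probability is obtained as follows: the image is input into the open target detection model of the current stage, and the classification probability output by the detection head is obtained.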

In one embodiment of the invention, the combined one-hot encoding is calculated by the following steps:

The one-hot encoding of the truth label and the one-hot encoding of the transfer label of each instance are calculated according to a rule defined in terms of the total number of known classes, and the combined one-hot encoding is then calculated from them.

The truth label and the transfer label carry known class information and unknown class information respectively, so the combined encoding decouples the effective unknown class features contained in the instance. The combination is controlled by the degree of coupling between the known class and unknown class features, a hyperparameter that sets the weights with which the truth label and the transfer label enter the combined loss calculation.
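One way to picture a combined encoding of this kind is a convex mixture of the two one-hot vectors. The sketch below assumes that form, (1 - coupling) times the truth encoding plus coupling times the transfer encoding, over K + 1 classes with the unknown class at index K. Both the mixing form and the index convention are assumptions made for illustration; the exact rule in the source is not reproduced here, and the function names are hypothetical.

```python
def one_hot(index, size):
    """Plain one-hot vector of length `size` with a 1.0 at `index`."""
    if not 0 <= index < size:
        raise ValueError("index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def combined_encoding(truth_label, num_known, coupling):
    """Mix the truth and transfer one-hot encodings (assumed convex combination).

    truth_label: known class index in 0..num_known-1.
    coupling: weight in [0, 1] given to the transfer (unknown) label.
    """
    if not 0.0 <= coupling <= 1.0:
        raise ValueError("coupling must lie in [0, 1]")
    if not 0 <= truth_label < num_known:
        raise ValueError("truth_label must be a known class index")
    size = num_known + 1                    # K known classes + 1 unknown class
    y_truth = one_hot(truth_label, size)
    y_transfer = one_hot(num_known, size)   # assumption: unknown class at index K
    return [(1.0 - coupling) * a + coupling * b for a, b in zip(y_truth, y_transfer)]


if __name__ == "__main__":
    print(combined_encoding(truth_label=1, num_known=3, coupling=0.25))
```

Under this assumed form the target remains a probability distribution, so it can be fed directly to a standard cross entropy loss.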

In one embodiment of the invention, the combined cross entropy loss is calculated as follows:

The corresponding cross entropy loss is calculated from the classification probability and the combined one-hot encoding. The calculation involves the degree of coupling between the known class and unknown class features, a hyperparameter that controls the weights with which the truth label and the transfer label enter the combined loss, and the probability on each class given by the classifier output after normalization by the softmax function.
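A combined cross entropy of this kind can be sketched for a single instance as follows. The weighting (1 - coupling) on the truth label term and coupling on the transfer label term is an assumed form chosen for illustration, as is the convention that the last logit belongs to the unknown class; the function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier outputs."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def combined_cross_entropy(logits, truth_label, coupling):
    """Cross entropy against a soft target mixing truth and transfer labels.

    Assumptions: `logits` has K + 1 entries with the unknown class last, and the
    loss is (1 - coupling) * CE(truth) + coupling * CE(unknown).
    """
    unknown_index = len(logits) - 1
    if not 0 <= truth_label < unknown_index:
        raise ValueError("truth_label must be a known class index")
    if not 0.0 <= coupling <= 1.0:
        raise ValueError("coupling must lie in [0, 1]")
    probs = softmax(logits)
    loss_truth = -math.log(probs[truth_label])
    loss_transfer = -math.log(probs[unknown_index])
    return (1.0 - coupling) * loss_truth + coupling * loss_transfer


if __name__ == "__main__":
    print(combined_cross_entropy([2.0, 0.5, -1.0, 0.0], truth_label=0, coupling=0.2))
```

Because cross entropy is linear in the target, this weighted sum equals the cross entropy against the mixed one-hot target, which is why the two formulations are interchangeable under the stated assumption.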

In one embodiment of the invention, collaborative learning guided by the annealing type scheduling strategy specifically comprises the following steps:

As the number of iterations changes, an annealing type scheduling strategy adjusts the degree of coupling between the known class and unknown class features, thereby regulating the weights with which the truth label and the transfer label participate in the combined loss calculation. This guides the open target detection model in collaborative learning of the unknown and known classes and finally balances the two kinds of knowledge. Specifically, the annealing type scheduling strategy that regulates the change of the coupling degree is defined in terms of the number of iterations of the current stage, the total number of iterations of the pre-training stage in step S1, and a constant that expresses the speed of change of the coupling degree and adjusts how the weight changes as the number of iterations increases.
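To illustrate how a schedule with these three ingredients (current iteration, total pre-training iterations, rate constant) might behave, the sketch below uses an exponential decay from a peak value. The exponential shape, the decreasing direction, and the function name `annealed_coupling` are assumptions made for illustration only; the exact curve defined in the source is not reproduced here.

```python
import math


def annealed_coupling(iteration, pretrain_iterations, rate, peak=1.0):
    """Coupling weight at a given iteration under an assumed exponential annealing.

    iteration: iteration count within the current stage (>= 0).
    pretrain_iterations: total iterations of the S1 pre-training phase (> 0),
        used here only to normalise the iteration count.
    rate: positive constant controlling how fast the weight changes.
    peak: value of the weight at iteration 0.
    """
    if iteration < 0 or pretrain_iterations <= 0 or rate <= 0:
        raise ValueError("iteration >= 0, pretrain_iterations > 0 and rate > 0 required")
    return peak * math.exp(-rate * iteration / pretrain_iterations)


if __name__ == "__main__":
    for it in (0, 2500, 5000, 10000):
        print(it, annealed_coupling(it, pretrain_iterations=10000, rate=5.0))
```

A larger rate constant makes the weight fall faster, shifting emphasis between the transfer label and the truth label earlier in training.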

As shown in fig. 4, the step S4 specifically includes the following sub-steps:

S41, adding new known classes and updating the data set according to the open-world task setting;

S42, pre-training the open target detection model of the new stage on the known classes with the updated data set;

S43, performing label transfer on the updated data set, and using the annealing type scheduling strategy to guide the model of the new stage in collaborative learning of the known classes and the unknown classes;

S44, performing small sample incremental fine-tuning on the model of the new stage to retain its detection capability on the semantics of the original known classes.

In one embodiment of the invention, the updating of the data set specifically comprises the following sub-steps:

After the model has been trained with the training set of stage t, training of stage t+1 is performed according to the open-world task setting: n new classes (n is a positive integer, the same applies below) are incrementally added to the known classes of the data set, i.e., the known class set is updated, and the training set is updated accordingly.

In one embodiment of the invention, known class pre-training at stage t+1 specifically comprises the following sub-steps:

The training set of stage t+1 is used to pre-train the stage t+1 model on the new set of known classes. The loss of the training process is calculated from the one-hot encoding of the truth labels of the stage t+1 known class data and the probability on each class given by the stage t+1 classifier output after normalization by the softmax function.

In one embodiment of the present invention, collaborative learning guided by the annealing type scheduling strategy at stage t+1 specifically includes the following sub-steps:

Label transfer is performed on the data set of the new stage, and the open target detection model is guided in collaborative training on the known and unknown classes. The loss function during training is defined in terms of the degree of coupling between the known class and unknown class features, the probability on each class given by the classifier output after normalization by the softmax function, and the one-hot encodings of the truth label, the transfer label, and the final combined label at stage t+1.

In one embodiment of the invention, the small sample fine-tuning training specifically comprises the following sub-steps:

The sample replay strategy of incremental learning is adopted to preserve the model's semantic recognition of old known classes. A small sample set is constructed for the previous stage, containing a fixed number of samples for each known class. After incremental learning with the new data set has been completed at stage t+1, the model is fine-tuned on the small sample set. The loss of the training process is calculated from the probability on each class given by the final classification output in fine-tuning after normalization by the softmax function. The updated open target detection model is finally obtained.
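Constructing a per-class replay set for small sample fine-tuning can be sketched as follows. Random selection with a fixed seed is an assumption; the source does not say how the samples are chosen. The function name `build_replay_set` and the `(sample_id, class_label)` input layout are hypothetical.

```python
import random
from collections import defaultdict


def build_replay_set(samples, per_class, seed=0):
    """Pick up to `per_class` samples for every known class.

    samples: iterable of (sample_id, class_label) pairs from earlier stages.
    Assumption: samples are drawn uniformly at random within each class.
    Returns a dict mapping class_label -> list of selected sample ids.
    """
    if per_class <= 0:
        raise ValueError("per_class must be positive")
    by_class = defaultdict(list)
    for sample_id, label in samples:
        by_class[label].append(sample_id)
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    replay = {}
    for label in sorted(by_class):
        ids = by_class[label]
        replay[label] = rng.sample(ids, min(per_class, len(ids)))
    return replay


if __name__ == "__main__":
    data = [(i, i % 3) for i in range(30)]
    print(build_replay_set(data, per_class=4))
```

Capping each class at the same number of samples keeps the fine-tuning set balanced between old and newly added known classes.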

Building on the target detection method based on annealing type label transfer learning, the invention further provides a target detection device based on annealing type label transfer learning. As shown in fig. 5, the target detection device includes one or more processors 11 and a memory 12. The memory 12 is coupled to the processor 11 and stores one or more programs that, when executed by the one or more processors 11, cause the one or more processors 11 to implement the target detection method based on annealing type label transfer learning of the above embodiments.

The processor 11 is configured to control the overall operation of the target detection apparatus to complete all or part of the steps of the target detection method based on annealing type label transfer learning. The processor 11 may be a Central Processing Unit (CPU), a Graphics Processor (GPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing (DSP) chip, or the like. The memory 12 is used to store various types of data to support operation at the object detection device, which may include, for example, instructions for any application or method operating on the object detection device, as well as application-related data.

The memory 12 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, etc.

In an exemplary embodiment, the target detection device based on the annealing type label transfer learning may be specifically implemented by a computer chip or an entity, or implemented by a product having a certain function, so as to perform the target detection method based on the annealing type label transfer learning, and achieve technical effects consistent with the method. One exemplary embodiment is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a car-mounted human-machine interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.

In another exemplary embodiment, the invention also provides a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the target detection method based on annealing type label transfer learning in any of the above embodiments. For example, the computer readable storage medium may be a memory containing program instructions that are executable by the processor of the target detection device to carry out the target detection method based on annealing type label transfer learning and achieve technical effects consistent with the method.

The target detection method and device based on annealing type label transfer learning provided by the invention have been described in detail above. Any obvious modification made to the invention without departing from its essence will constitute an infringement of the patent right of the invention, and corresponding legal liability will be borne.

Claims

1. The target detection method based on annealing type label transfer learning is characterized by comprising the following steps:

2. The target detection method based on annealing type label transfer learning as claimed in claim 1, wherein the step S2 specifically includes:

sampling an image from the training set of the current stage; adding a new label to each instance in the sampled image, called the transfer label of the instance, the original label being called the truth label; and setting the transfer labels of all instances in each image to the unknown class, i.e., the transfer label of the instance is regarded as an unknown class.

3. The method for detecting targets based on annealing type label transfer learning as claimed in claim 2, wherein the step S3 specifically includes:

s31, image is formed

s32, according to the image

4. The target detection method based on annealing type label transfer learning as claimed in claim 3, wherein the step S31 specifically includes:

inputting the image into the open target detection model of the current stage, and obtaining the classification probability output by the detection head.

5. The target detection method based on annealing type label transfer learning as claimed in claim 3, wherein the step S32 specifically includes:

calculating the one-hot encoding of the truth label and the one-hot encoding of the transfer label of each instance according to a calculation rule defined in terms of the total number of known classes, and then calculating the combined one-hot encoding from the two encodings.

6. The target detection method based on annealing type label transfer learning as claimed in claim 3, wherein in said step S33, the combined cross entropy loss is calculated from the probability on each class given by the classifier output after normalization by the softmax function.

7. The target detection method based on annealing type label transfer learning as claimed in claim 3, wherein in step S34, the annealing type scheduling strategy used to regulate the change of the coupling degree is defined in terms of the number of iterations of the current stage and a constant that expresses the speed of change of the coupling degree and adjusts how the weight changes as the number of iterations increases.

8. The method for detecting targets based on annealing type label transfer learning as claimed in claim 1, wherein the step S4 specifically includes:

9. The target detection method based on annealing type label transfer learning as claimed in claim 8, wherein in said step S41, after the training set of stage t has been used, training of stage t+1 is performed by incrementally adding n new classes to the known classes of the data set, i.e., the known class set is updated, and the training set is updated accordingly.

10. An annealing type label transfer learning-based target detection device, comprising a processor and a memory, wherein the processor reads a computer program in the memory, and is used for executing the annealing type label transfer learning-based target detection method according to any one of claims 1 to 9.