CN116152721B - Target detection method and device based on annealing type label transfer learning - Google Patents


Publication number
CN116152721B
Authority
CN
China
Prior art keywords
target detection
class
unknown
label
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310414703.8A
Other languages
Chinese (zh)
Other versions
CN116152721A (en
Inventor
刘祥龙
马宇晴
张湛舸
吴妍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a target detection method and device based on annealing type label transfer learning. The method builds on the principle that known and unknown semantic features are informationally coupled during feature extraction from target images. It performs label transfer and feature decoupling on the known-class instances in the data, and constructs an annealing type scheduling curve to dynamically adjust the degree of decoupling between the two kinds of semantic features. In this way, effective general target information in the known classes is extracted to guide the learning of unknown knowledge, which improves the model's detection of unknown classes while retaining its strong detection capability on the original known classes. The method can be applied to open-world target detection scenarios such as automatic driving, defect detection and target tracking, and has high practical application value.

Description

Target detection method and device based on annealing type label transfer learning
Technical Field
The invention relates to a target detection method, in particular to a target detection method based on annealing type label transfer learning, and also relates to a corresponding target detection device, belonging to the technical field of computer vision.
Background
Currently, the task of object detection in computer vision faces new challenges. People are increasingly dissatisfied with object detection limited to conventional closed-set scenes and have begun to work on object detection for open-world scene images. The difficulty of this task is that knowledge keeps growing in the open world: the detector must detect unknown targets not included in the training set while detecting known targets in the image, and must incrementally learn the identified unknown targets as the data set is updated.
Compared with traditional closed-set target detection, target detection in the open world presents two new challenges: (1) detection of unknown classes: unknown instances must be detected and distinguished from similar known instances and from the background; (2) incremental learning: the identified unknown classes must be learned incrementally while balancing the learning of the original known classes and the newly annotated known classes. To this end, Joseph et al. use an unknown-class candidate region generation network with an automatic labeling strategy and provide an energy-based binary classifier to distinguish unknown classes from known classes. Yang et al. predefine a semantic centroid for each class and push object instances toward their centroids during incremental learning to sharpen the distinction between unknown and known classes. Gupta et al. add attention-driven pseudo-labeling, novelty classification and objectness scoring to the DETR model to detect unknown classes. Zhao et al. further correct the benchmarks and evaluation metrics of the target detection task under open-world settings, and use a non-parametric candidate box guidance module and a class-specific exclusion classifier to improve detection of unknown classes.
The above methods detect unknown-class objects through specific network structures. However, in open-world target detection the data set contains annotation information only for known classes, and detection of unknown classes depends on an unknown-class candidate box generation mechanism with high uncertainty. The detection performance of the existing methods on unknown classes therefore still needs to be improved.
Disclosure of Invention
The invention aims to provide a target detection method based on annealing type label transfer learning.
Another technical problem to be solved by the invention is to provide an annealing-based target detection device for label transfer learning.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
According to a first aspect of an embodiment of the present invention, there is provided a target detection method based on annealing type label transfer learning, including the following steps:
S1, using known-class data with annotation information to guide the open target detection model of the current stage in pre-training on the known classes;
S2, adding a combined label to each data instance to decouple the information of the known and unknown semantic features;
S3, constructing an annealing type scheduling curve to dynamically adjust the degree of information decoupling between the known and unknown semantic features, and guiding the open target detection model in collaborative learning of known and unknown knowledge;
S4, updating the training data set according to the task setting in the open world, and guiding the open target detection model in incremental learning of new classes;
S5, iteratively executing steps S1 to S4 to complete training of the open target detection model, and performing target detection in the open world with the trained open target detection model.
Preferably, the step S2 specifically includes:
sampling an image (Figure SMS_5) from the training set (Figure SMS_2) of the current stage, where the image contains a plurality of instances of known targets, each denoted (Figure SMS_7), in which (Figure SMS_3) and (Figure SMS_6) represent the sample instance and the corresponding truth label, respectively; for each instance (Figure SMS_8) in the sampled image, adding a new label (Figure SMS_9), called the transfer label of the instance, while the original label (Figure SMS_1) is called the truth label; and setting the transfer label of every instance in each image to (Figure SMS_4), i.e., the transfer label of each instance is regarded as the unknown class.
Preferably, the step S3 specifically includes:
S31, inputting the image (Figure SMS_10) into the open target detection model of the current stage to obtain the classification probability output for each instance;
S32, obtaining the combined one-hot encoding of each instance from the truth label and the added transfer label of each target instance in the image (Figure SMS_11);
S33, calculating the combined cross-entropy loss from the classification probability and the combined one-hot encoding;
S34, using an annealing type scheduling strategy to adjust the weights of the truth label and the transfer label in the calculation rule of the combined one-hot encoding, thereby guiding the open target detection model in collaborative learning of the known and unknown classes.
Preferably, the step S31 specifically includes:
inputting the image (Figure SMS_12) into the open target detection model (Figure SMS_13) of the current stage, and obtaining the classification probability (Figure SMS_15) output by the detection head (Figure SMS_14) by the following formula:
Figure SMS_16
Preferably, the step S32 specifically includes:
computing the one-hot encoding (Figure SMS_18) of the truth label (Figure SMS_17) of the instance and the one-hot encoding (Figure SMS_20) of the transfer label (Figure SMS_19), according to the following rule:
Figure SMS_21
wherein (Figure SMS_22) represents the total number of known classes; and then computing, by the following formula, the combined one-hot encoding (Figure SMS_23), which contains both the known-class information carried by the truth label and the unknown-class information carried by the transfer label, so that the effective unknown-class features contained in the instance are decoupled:
Figure SMS_24
wherein (Figure SMS_25) represents the degree of coupling between the known-class and unknown-class features.
Preferably, in the step S33, the combined cross-entropy loss (Figure SMS_26) is calculated as follows:
Figure SMS_27
wherein (Figure SMS_28) represents the degree of coupling between the known-class and unknown-class features, and (Figure SMS_29) is the probability on class (Figure SMS_31) of the classifier output (Figure SMS_30) after normalization by the softmax function.
Preferably, in the step S34, the annealing type scheduling strategy used to regulate the variation of (Figure SMS_32) is defined as follows:
Figure SMS_33
wherein (Figure SMS_34) represents the number of iterations of the current stage, (Figure SMS_35) represents the total number of iterations of the pre-training stage in step S1, and (Figure SMS_36) is a constant expressing the rate of change of (Figure SMS_37), used to adjust how the weight changes as the number of iterations increases.
Preferably, the step S4 specifically includes:
S41, adding new known classes and updating the data set according to the task setting in the open world;
S42, pre-training the open target detection model of the new stage on the known classes using the updated data set;
S43, performing label transfer on the updated data set, and using the annealing type scheduling strategy to guide the open target detection model of the new stage in collaborative learning of the known and unknown classes;
S44, performing small-sample incremental fine-tuning on the open target detection model of the new stage to retain its detection capability on the semantics of the original known classes.
Preferably, in the step S41, after the open target detection model has been trained with the training set (Figure SMS_39) of stage (Figure SMS_38), training of stage (Figure SMS_40) is carried out according to the task setting in the open world: n new classes are incrementally added to the known classes of the data set, i.e., the known class set is updated to (Figure SMS_41), and the updated training set is (Figure SMS_42).
According to a second aspect of the embodiment of the present invention, there is provided an object detection device based on annealing type tag transfer learning, including a processor and a memory, where the processor reads a computer program in the memory, and is configured to execute the above object detection method based on annealing type tag transfer learning.
Compared with the prior art, the target detection method and device based on annealing type label transfer learning provided by the invention build on the principle that known and unknown semantic features are informationally coupled during feature extraction from target images. They perform label transfer and feature decoupling on the known-class instances in the data, and construct an annealing type scheduling curve to dynamically adjust the degree of decoupling between the two kinds of semantic features. In this way, effective general target information in the known classes is extracted to guide the learning of unknown knowledge, which improves the model's detection of unknown classes while retaining its strong detection capability on the original known classes. The method can be applied to open-world target detection scenarios such as automatic driving, defect detection and target tracking, and has high practical application value.
Drawings
FIG. 1 is a flowchart of a training process of an open target detection model used in a target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a visual result of an open target detection model capable of detecting known and unknown class instances and performing incremental learning during an autopilot mission;
FIG. 3 is a flowchart of constructing an annealing type scheduling curve to dynamically allocate the information decoupling degrees of the known and unknown semantic features and guiding an open target detection model to perform collaborative learning of the known and unknown knowledge in the embodiment of the present invention;
FIG. 4 is a flowchart of updating the training data set according to the task setting in the open world and guiding the open target detection model in incremental learning of new classes, in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram of a target detection device based on annealing type label transfer learning according to an embodiment of the present invention.
Detailed Description
The technical contents of the present invention will be described in detail with reference to the accompanying drawings and specific examples.
Automatic driving, defect detection, target tracking and the like are typical open-world target detection application scenarios. The embodiments of the present invention are described mainly using the automatic driving scenario as an example, but the application scenarios of the invention are not limited thereto. As shown in FIG. 1, the present invention first provides a target detection method based on annealing type label transfer learning. The open target detection model used by the target detection method is obtained by training through the following steps: S1, using known-class data with annotation information to guide the open target detection model (referred to simply as the model) of the current stage in pre-training on the known classes; S2, adding a combined label to each data instance to decouple the information of the known and unknown semantic features; S3, constructing an annealing type scheduling curve to dynamically adjust the degree of information decoupling between the known and unknown semantic features, and guiding the open target detection model in collaborative learning of known and unknown knowledge; and S4, updating the training data set according to the task setting in the open world, and guiding the open target detection model in incremental learning of new classes.
In one embodiment of the present invention, the above steps S1, S2, S3 and S4 are executed iteratively. Incremental learning is divided into four stages; in each new stage the model must learn new known classes, retain its detection capability on the original known classes, and still be able to detect unknown instances. The open target detection model is trained with an SGD optimizer and a batch size of 8. As for the hyperparameters, the peak value of (Figure SMS_43) is set to 1, and the curve change rate (Figure SMS_44) is set to (Figure SMS_45). In the known-class pre-training stage of step S1, the initial learning rate is set to 0.01. In the unknown-class training stage of step S3, learning rate decay is added: the initial learning rate is set to 0.0001 and is decayed to 1/10 of its previous value at training iterations 12000 and 16000. Training is iterated until the loss function of the open target detection model converges, and the parameters of each neural network layer that perform best on the validation set are saved, completing the training of the open target detection model.
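The learning rate decay described for step S3 can be illustrated with a minimal sketch. The function name step_lr and the choice to apply each decay starting exactly at the milestone iteration are assumptions made for illustration; the embodiment only states the initial rate (0.0001), the decay factor (1/10) and the iterations (12000 and 16000).

```python
def step_lr(iteration, base_lr=1e-4, milestones=(12000, 16000), gamma=0.1):
    """Learning rate after multiplying by `gamma` once per milestone reached.

    Assumption: a milestone counts as reached when iteration >= milestone.
    """
    reached = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** reached


# Print the rate just before and at each decay point.
for it in (0, 11999, 12000, 15999, 16000):
    print(it, step_lr(it))
```

In a deep learning framework the same behaviour would normally come from a built-in multi-step scheduler rather than a hand-written function.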
On the basis of the trained open target detection model, the target detection method can be applied to scenarios such as automatic driving, defect detection and target tracking, that is, step S5 is carried out: performing target detection in the open world with the trained open target detection model. Here, the inventor takes the automatic driving scenario as an example and applies the method to open target detection data sets collected in natural real scenes, so as to verify the practical effect of the target detection method provided by the embodiment of the invention. The details are as follows:
The inventor selects image data from the data sets Pascal-VOC and MS-COCO, which were collected in natural scenes and contain images of 80 different classes of objects. Incremental learning is divided into four stages. In each stage, 20 classes are set as new known classes, and all classes other than the original known classes and the new known classes are set as unknown classes. In the first stage the number of original known classes is 0, so the data contains only the 20 new classes and the remaining unknown classes; in the fourth stage the new known classes and the original known classes total 80, and no unknown class remains.
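The stage layout above can be checked with a short sketch. The class names are placeholders and the function name stage_split is hypothetical; only the counts (80 classes, four stages, 20 new known classes per stage) come from the embodiment.

```python
def stage_split(all_classes, stage, per_stage=20):
    """Split classes for a 1-based stage into (original known, new known, unknown)."""
    start = (stage - 1) * per_stage
    end = stage * per_stage
    return all_classes[:start], all_classes[start:end], all_classes[end:]


# Placeholder class names standing in for the 80 object classes.
classes = [f"class_{i:02d}" for i in range(80)]

for stage in range(1, 5):
    old, new, unknown = stage_split(classes, stage)
    print(stage, len(old), len(new), len(unknown))
```

The printed counts follow the text: stage one has no original known classes and 60 unknown classes, and stage four has 80 known classes in total and no unknown class.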
FIG. 2 shows a visual example on the automatic driving task. The target detection method provided by the embodiment of the invention not only detects known classes such as automobiles and pedestrians in an open road scene, but also detects objects it has not learned, namely a bridge deck baffle (left in FIG. 2), a tire barrier (right in FIG. 2) and a skateboard in the middle of the road (right in FIG. 2), and marks them as unknown classes. This achieves joint recognition of known and unknown targets in an open road scene and helps avoid accidents that unknown objects might cause during automatic driving.
To measure the performance of the target detection method provided by the embodiment of the invention quantitatively and accurately, the inventor adopts the known-class mean average precision (K-mAP), the unknown-class mean average precision (U-mAP) and the unknown-class recall (U-Recall) as metrics, and compares the invention fairly with other similar methods. The results from the first stage to the fourth stage are shown in Table 1.
TABLE 1
Figure SMS_46
As can be seen from Table 1, compared with other similar methods, the method provided by the invention performs better in detecting both unknown classes and known classes. In particular, the average precision on unknown classes in stage one is improved by 200%, which gives the method good practical value in automatic driving tasks.
The specific training process of the open target detection model is described in further detail below.
In one embodiment of the present invention, step S1 specifically includes the following sub-steps: for the training set (Figure SMS_48) of the current stage (Figure SMS_47), which contains the known class set (Figure SMS_49), the training set (Figure SMS_50) is used to guide the open target detection model (Figure SMS_51) of the current stage, through multiple rounds of iteration, in pre-training on the known classes. The loss function during pre-training is as follows:
Figure SMS_52
wherein (Figure SMS_53) is the one-hot encoding of the truth label of the data, and (Figure SMS_54) is the probability on class (Figure SMS_56) of the classifier output (Figure SMS_55) after normalization by the normalized exponential function, i.e., the softmax function.
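The pre-training loss is defined in terms of a one-hot truth encoding and softmax-normalized classifier outputs, which is the form of a standard softmax cross-entropy. The sketch below shows that standard form in plain Python; it is an assumption that Figure SMS_52 has exactly this form, and the function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw classifier outputs."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, logits):
    """Cross-entropy between a target distribution (e.g. one-hot) and softmax(logits)."""
    probs = softmax(logits)
    # Terms with zero target weight contribute nothing and are skipped.
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


# Example: 4 classes, truth label is class 0, logits favour class 0.
print(cross_entropy([1, 0, 0, 0], [2.0, 0.5, 0.1, -1.0]))
```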
In one embodiment of the present invention, step S2 specifically includes the following sub-steps: an image (Figure SMS_62) is sampled from the training set (Figure SMS_59) of the current stage; the image contains a plurality of instances of known targets, each of which can be denoted (Figure SMS_65), where (Figure SMS_58) and (Figure SMS_61) represent the sample instance and the corresponding truth label, respectively. Label transfer is then performed on the sampled image (Figure SMS_64). That is, for every instance (Figure SMS_57) in the sampled image (Figure SMS_67), a new label (Figure SMS_60) is added, called the transfer label of the instance, while the original label (Figure SMS_63) is called the truth label. The transfer label of every instance in each image is set to (Figure SMS_66), i.e., the transfer label of each instance is regarded as the unknown class. In this way, effective unknown-class semantic features are decoupled from the known-class instances without requiring any unknown-class supervision information, and the uncertainty of unknown-class identification is reduced.
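A minimal sketch of the label transfer step follows. The Instance container, the field names and the UNKNOWN marker are hypothetical; the only behaviour taken from the text is that every known-class instance keeps its truth label and additionally receives a transfer label set to the unknown class.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

UNKNOWN = "unknown"  # hypothetical marker for the unknown class


@dataclass
class Instance:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2), illustrative
    truth_label: str
    transfer_label: Optional[str] = None


def add_transfer_labels(instances):
    """Attach the unknown-class transfer label to every instance of one image."""
    for inst in instances:
        inst.transfer_label = UNKNOWN
    return instances


image_instances = [
    Instance((10, 20, 50, 80), "car"),
    Instance((60, 30, 90, 120), "person"),
]
add_transfer_labels(image_instances)
print([(i.truth_label, i.transfer_label) for i in image_instances])
```

No unknown-class annotation is consulted anywhere in this step, which matches the statement that no unknown-class supervision information is needed.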
As shown in fig. 3, the step S3 specifically includes the following sub-steps:
S31, inputting the image (Figure SMS_68) into the open target detection model of the current stage to obtain the classification probability output for each instance (the logits, i.e., the raw values output by the model before processing by the softmax function);
S32, obtaining the combined one-hot encoding of each instance from the truth label and the added transfer label of each target instance in the image (Figure SMS_69);
S33, calculating the combined cross-entropy loss from the classification probability and the combined one-hot encoding;
S34, using an annealing type scheduling strategy to adjust the weights of the truth label and the transfer label in the calculation rule of the combined one-hot encoding, thereby guiding the open target detection model in collaborative learning of the known and unknown classes.
In one embodiment of the invention, the classification probability is obtained as follows:
the image (Figure SMS_70) is input into the open target detection model (Figure SMS_71) of the current stage, and the classification probability (Figure SMS_73) output by the detection head (Figure SMS_72) is obtained:
Figure SMS_74
In one embodiment of the invention, the combined one-hot encoding is calculated by the following steps:
the one-hot encoding (Figure SMS_76) of the truth label (Figure SMS_75) of the instance and the one-hot encoding (Figure SMS_78) of the transfer label (Figure SMS_77) are computed according to the following rule:
Figure SMS_79
wherein (Figure SMS_80) represents the total number of known classes. The combined one-hot encoding (Figure SMS_81) is then calculated. (Figure SMS_82) contains both the known-class information carried by the truth label and the unknown-class information carried by the transfer label, so that the effective unknown-class features contained in the instance are decoupled. The specific calculation is as follows:
Figure SMS_83
wherein (Figure SMS_84) represents the degree of coupling between the known-class and unknown-class features and is the weight (a hyperparameter) that controls how the truth label and the transfer label enter the combined loss calculation.
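One way to picture a combined encoding weighted by a coupling degree is a convex combination of the two one-hot vectors. The sketch below is an assumption for illustration only: the exact rule is given in the formula figures and is not reproduced here, and both the placement of the unknown class in the last slot and the use of alpha as the weight on the transfer label are hypothetical choices.

```python
def one_hot(index, length):
    """One-hot vector of the given length with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(length)]


def combined_encoding(truth_index, num_known, alpha):
    """Blend the truth-label and transfer-label encodings.

    Assumptions: the vector has num_known + 1 slots, the last slot is the
    unknown class, and alpha in [0, 1] is the weight on the transfer label.
    """
    length = num_known + 1
    truth = one_hot(truth_index, length)
    transfer = one_hot(num_known, length)
    return [(1.0 - alpha) * t + alpha * u for t, u in zip(truth, transfer)]


# Three known classes, truth label is class 1, a quarter of the weight on "unknown".
print(combined_encoding(1, 3, 0.25))  # [0.0, 0.75, 0.0, 0.25]
```

With alpha equal to 0 the encoding reduces to the ordinary one-hot truth label, so under this assumption the pre-training target is a special case of the combined target.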
In one embodiment of the invention, the combined cross-entropy loss is calculated as follows:
based on the classification probability and the combined one-hot encoding (Figure SMS_85), the corresponding cross-entropy loss (Figure SMS_86) is calculated by the following formula:
Figure SMS_87
wherein (Figure SMS_88) represents the degree of coupling between the known-class and unknown-class features and is the weight (a hyperparameter) that controls how the truth label and the transfer label enter the combined loss calculation, and (Figure SMS_89) is the probability on class (Figure SMS_91) of the classifier output (Figure SMS_90) after normalization by the softmax function.
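A cross-entropy against a blended target can be written as a weighted sum of two log-probabilities. The sketch below assumes, purely for illustration, that the combined target is a convex combination of the truth one-hot vector and the unknown-class one-hot vector with weight alpha on the unknown class, and that the unknown class occupies the last logit; the exact loss is the one shown in the formula figure.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw classifier outputs."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_total for z in logits]


def combined_cross_entropy(logits, truth_index, alpha):
    """Cross-entropy against (1 - alpha) * truth one-hot + alpha * unknown one-hot.

    Assumption: the last logit corresponds to the unknown class.
    """
    log_p = log_softmax(logits)
    unknown_index = len(logits) - 1
    return -((1.0 - alpha) * log_p[truth_index] + alpha * log_p[unknown_index])


# Three known classes plus the unknown slot; truth label is class 0.
print(combined_cross_entropy([2.0, 0.1, -0.3, 0.5], truth_index=0, alpha=0.2))
```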
In one embodiment of the invention, collaborative learning guided by the annealing type scheduling strategy specifically comprises the following steps:
as the number of iterations changes, an annealing type scheduling strategy is used to adjust the degree of coupling (Figure SMS_92) between the known-class and unknown-class features, thereby regulating the weights with which the truth label and the transfer label take part in the combined loss calculation. This guides the open target detection model in collaborative learning of the unknown and known classes and finally balances the two kinds of knowledge. Specifically, the annealing type scheduling strategy used to regulate the variation of (Figure SMS_93) is defined as follows:
Figure SMS_94
wherein (Figure SMS_95) represents the number of iterations of the current stage, (Figure SMS_96) represents the total number of iterations of the pre-training stage in step S1, and (Figure SMS_97) is a constant expressing the rate of change of (Figure SMS_98), used to adjust how the weight changes as the number of iterations increases.
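The scheduling formula itself appears only as a figure, so the following is a hypothetical stand-in rather than the curve used by the invention. It shows one possible schedule built from the same three ingredients named in the text (current iteration, total iterations of the pre-training stage, and a constant rate), with an exponential decay from a peak value of 1 chosen purely for illustration.

```python
import math


def coupling_degree(iteration, total_iterations, rate=5.0, peak=1.0):
    """Hypothetical annealing curve: exponential decay from `peak`.

    Assumptions: the curve shape, its direction and the default rate are
    illustrative and are not taken from the formula figure.
    """
    if total_iterations <= 0:
        raise ValueError("total_iterations must be positive")
    progress = iteration / total_iterations
    return peak * math.exp(-rate * progress)


# Coupling degree at a few points of a 1000-iteration schedule.
for it in (0, 250, 500, 1000):
    print(it, round(coupling_degree(it, 1000), 4))
```

A larger rate makes the weight move faster as iterations increase, which is the role the text assigns to the constant.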
As shown in fig. 4, the step S4 specifically includes the following sub-steps:
S41, adding new known classes and updating the data set according to the task setting in the open world;
S42, pre-training the detection model of the new stage on the known classes using the updated data set;
S43, performing label transfer on the updated data set, and using the annealing type scheduling strategy to guide the model of the new stage in collaborative learning of the known and unknown classes;
S44, performing small-sample incremental fine-tuning on the model of the new stage to retain its detection capability on the semantics of the original known classes.
In one embodiment of the invention, updating the data set specifically comprises the following sub-steps:
after the model has been trained with the training set (Figure SMS_100) of stage (Figure SMS_99), training of stage (Figure SMS_101) is carried out according to the task setting in the open world: n new classes (n is a positive integer, the same applies below) are incrementally added to the known classes of the data set, i.e., the known class set is updated to (Figure SMS_102), and the updated training set is
Figure SMS_103
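The update of the known class set can be sketched as a set union. The function names are hypothetical, and the annotation filter is an assumption based on the background statement that the data set carries annotation information only for known classes.

```python
def update_known_classes(known, new_classes):
    """Return the known class set for the next stage (old classes plus n new ones)."""
    new_set = set(new_classes)
    overlap = known & new_set
    if overlap:
        raise ValueError(f"classes already known: {sorted(overlap)}")
    return known | new_set


def visible_annotations(annotations, known):
    """Keep only annotations whose label is known (assumed training-set rule)."""
    return [(box, label) for box, label in annotations if label in known]


known_t = {"car", "person"}
known_next = update_known_classes(known_t, ["bus"])
annotations = [((0, 0, 1, 1), "car"), ((1, 1, 2, 2), "skateboard")]
print(sorted(known_next), visible_annotations(annotations, known_next))
```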
In one embodiment of the invention, knowledge pre-training on the known-class data at stage (Figure SMS_104) specifically comprises the following sub-steps:
the training set (Figure SMS_106) of stage (Figure SMS_105) is used to pre-train the model (Figure SMS_108) of stage (Figure SMS_107) on the new known class set (Figure SMS_109), and the loss during training is calculated as follows:
Figure SMS_110
wherein (Figure SMS_111) is the one-hot encoding of the truth labels of the known-class data at stage (Figure SMS_112), and (Figure SMS_113) is the probability on class (Figure SMS_116) of the classifier output (Figure SMS_115) at stage (Figure SMS_114) after normalization by the softmax function.
In one embodiment of the present invention, collaborative learning guided by the annealing type scheduling strategy at stage (Figure SMS_117) specifically comprises the following sub-steps:
label transfer is performed on the data set (Figure SMS_118) of the new stage, and the open target detection model (Figure SMS_119) is guided in collaborative training on the known and unknown classes. The loss function during training is as follows:
Figure SMS_120
wherein (Figure SMS_121) is the degree of coupling between the known-class and unknown-class features, and (Figure SMS_122) is the probability on class (Figure SMS_124) of the classifier output (Figure SMS_123) after normalization by the softmax function. (Figure SMS_125), (Figure SMS_126) and (Figure SMS_127) are the one-hot encodings of the truth label, the transfer label and the final combined label at stage t+1, respectively.
In one embodiment of the invention, the small sample fine tuning training specifically comprises the following sub-steps:
a sample replay strategy from incremental learning is adopted to preserve the model's semantic recognition capability for the old known categories. A small sample set [Figure SMS_129] of the previous stage [Figure SMS_128] is constructed, which contains [Figure SMS_130] samples for each known class. After incremental learning with the new data set is completed at stage [Figure SMS_131], the small sample set [Figure SMS_132] is used to fine-tune the model, and the loss of the training process is calculated as follows:
[Figure SMS_133]
wherein [Figure SMS_134] is the probability, on class [Figure SMS_135], of the final classification output in the fine tuning training after normalization by the softmax function. Finally, an updated open target detection model [Figure SMS_136] is obtained.
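The construction of the small sample set used for replay can be sketched as follows. This is an illustration under assumptions: the text does not say how the samples of each class are chosen, so random selection is used here, and the function name `build_replay_set`, the `(item, label)` data layout, and the example data are hypothetical.

```python
import random
from collections import defaultdict

def build_replay_set(samples, per_class, seed=0):
    """Keep at most `per_class` samples for each known class.

    `samples` is a list of (item, label) pairs. Random selection is an
    assumption; the source does not state the selection rule.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)
    rng = random.Random(seed)
    replay = []
    for label in sorted(by_class):
        items = by_class[label]
        chosen = rng.sample(items, min(per_class, len(items)))
        replay.extend((item, label) for item in chosen)
    return replay

# Hypothetical previous-stage data: 30 items spread over 3 known classes.
old_stage = [(f"img_{i}", i % 3) for i in range(30)]
replay_set = build_replay_set(old_stage, per_class=2)
```

Fine-tuning on such a class-balanced set after incremental learning is what lets the model revisit the old known classes without storing the full earlier data set.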
On the basis of the target detection method based on annealing type label transfer learning, the invention further provides a target detection device based on annealing type label transfer learning. As shown in FIG. 5, the target detection device includes one or more processors 11 and a memory 12. The memory 12 is coupled to the processor 11 and stores one or more programs that, when executed by the one or more processors 11, cause the one or more processors 11 to implement the target detection method based on annealing type label transfer learning as in the above embodiments.
The processor 11 is configured to control the overall operation of the target detection apparatus to complete all or part of the steps of the target detection method based on annealing type label transfer learning. The processor 11 may be a Central Processing Unit (CPU), a Graphics Processor (GPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), a Digital Signal Processing (DSP) chip, or the like. The memory 12 is used to store various types of data to support operation at the object detection device, which may include, for example, instructions for any application or method operating on the object detection device, as well as application-related data.
The memory 12 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, etc.
In an exemplary embodiment, the target detection device based on the annealing type label transfer learning may be specifically implemented by a computer chip or an entity, or implemented by a product having a certain function, so as to perform the target detection method based on the annealing type label transfer learning, and achieve technical effects consistent with the method. One exemplary embodiment is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a car-mounted human-machine interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In another exemplary embodiment, the invention also provides a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the target detection method based on annealing type label transfer learning in any of the above embodiments. For example, the computer readable storage medium may be a memory including program instructions executable by a processor of the target detection device based on annealing type label transfer learning to complete the target detection method based on annealing type label transfer learning and achieve technical effects consistent with the method described above.
Compared with the prior art, the target detection method and device based on annealing type label transfer learning provided by the invention start from the principle that known and unknown semantic features are informationally coupled during target image feature extraction. Label transfer and feature decoupling are performed on the known class instances in the data, and an annealing type scheduling curve is constructed to dynamically allocate the degree of decoupling between the two kinds of semantic features. In this way, effective general target information in the known classes is extracted to guide the learning of unknown knowledge, which improves the detection effect of the model on unknown classes while retaining its strong detection capability on the original known classes. The method can be applied to open-world target detection scenarios such as automatic driving, defect detection, and target tracking, and has high practical application value.
The target detection method and device based on annealing type label transfer learning provided by the invention have been described in detail above. Any obvious modification made to the invention without departing from its essential spirit will constitute an infringement of the patent right of the invention, and the party responsible will bear the corresponding legal liability.

Claims (10)

1. The target detection method based on annealing type label transfer learning is characterized by comprising the following steps:
S1, guiding an open target detection model in the current stage to perform pre-training of a known class by using known class data with labeling information;
S2, adding a combined label for the data example to realize information decoupling of known and unknown semantic features;
S3, constructing an annealing type scheduling curve to dynamically allocate the information decoupling degrees of the known semantic features and the unknown semantic features, and guiding an open target detection model to perform collaborative learning of the known knowledge and the unknown knowledge;
S4, performing updating operation of a training data set according to task setting in the open world, and guiding the open target detection model to perform new incremental learning;
S5, iteratively executing the steps S1 to S4, completing training of the open target detection model, and performing target detection in the open world by using the trained open target detection model.
2. The method for detecting targets based on annealing type label transfer learning as claimed in claim 1, wherein the step S2 specifically includes:
sampling an image [Figure QLYQS_5] from the training set [Figure QLYQS_2] of the current stage, the image comprising a plurality of instances of known targets, each of which is marked as [Figure QLYQS_7], where [Figure QLYQS_3] and [Figure QLYQS_6] respectively represent a sample instance and the corresponding truth label; for each instance [Figure QLYQS_8] in the sampled image, adding a new label [Figure QLYQS_9], called the transfer label of this instance, the original label [Figure QLYQS_1] being called the truth label; and setting the transfer label of all instances in each image to [Figure QLYQS_4], i.e., the transfer label of the instance is considered an unknown class.
3. The method for detecting targets based on annealing type label transfer learning as claimed in claim 2, wherein the step S3 specifically includes:
S31, inputting the image [Figure QLYQS_10] into the open target detection model of the current stage to obtain the output classification probability of each instance;
S32, obtaining the combined one-hot (unit-effective) encoding of each instance according to the truth label and the added transfer label of each target instance in the image [Figure QLYQS_11];
S33, calculating the combined cross entropy loss according to the classification probability and the combined one-hot encoding;
S34, adjusting the weights of the truth label and the transfer label in the calculation rule of the combined one-hot encoding by adopting an annealing type scheduling strategy, and guiding the open target detection model to perform collaborative learning of the known class and the unknown class.
4. The method for detecting targets based on annealing type label transfer learning as claimed in claim 3, wherein the step S31 specifically includes:
inputting the image [Figure QLYQS_12] into the open target detection model [Figure QLYQS_13] of the current stage, wherein the classification probability [Figure QLYQS_15] output by the detection head [Figure QLYQS_14] is obtained by the following formula:
[Figure QLYQS_16]
5. The method for detecting targets based on annealing type label transfer learning as claimed in claim 3, wherein the step S32 specifically includes:
computing the one-hot (unit-effective) encoding [Figure QLYQS_18] of the truth label [Figure QLYQS_17] of the instance and the one-hot encoding [Figure QLYQS_20] of the transfer label [Figure QLYQS_19], the calculation rule being as follows:
[Figure QLYQS_21]
wherein [Figure QLYQS_22] represents the total number of known classes; then calculating the combined one-hot encoding [Figure QLYQS_23], where [Figure QLYQS_24] contains both the known class and unknown class information carried by the truth label and the transfer label, so that the effective unknown class features contained in the instance are decoupled:
[Figure QLYQS_25]
wherein [Figure QLYQS_26] represents the degree of coupling between the known class and unknown class features.
6. The method for detecting targets based on annealing type label transfer learning as claimed in claim 3, wherein in said step S33, the combined cross entropy loss [Figure QLYQS_27] is calculated by the following formula:
[Figure QLYQS_28]
wherein [Figure QLYQS_29] represents the degree of coupling between the known class and unknown class features, and [Figure QLYQS_30] is the probability, on class [Figure QLYQS_32], of the classifier output [Figure QLYQS_31] after normalization by the softmax function.
7. The method for detecting targets based on annealing type label transfer learning as claimed in claim 3, wherein in step S34, the annealing type scheduling strategy used for controlling the variation of [Figure QLYQS_33] is defined as follows:
[Figure QLYQS_34]
wherein [Figure QLYQS_35] represents the number of iterations of the current stage, [Figure QLYQS_36] represents the total number of iterations of the pre-training stage in step S1, and [Figure QLYQS_37] is a constant; [Figure QLYQS_38] is used for adjusting the change in weight as the number of iterations increases.
8. The method for detecting targets based on annealing type label transfer learning as claimed in claim 1, wherein the step S4 specifically includes:
S41, adding new known categories and updating the data set according to task settings in the open world;
S42, pre-training the known class of the open target detection model in a new stage by using the updated data set;
S43, performing label transfer on the updated data set, and adopting an annealing scheduling strategy to guide the open target detection model in the new stage to perform collaborative learning of known classes and unknown classes;
S44, performing fine-tuning training of small sample increment on the open target detection model in the new stage to keep the detection capability of the open target detection model on the original known category semantics.
9. The method for detecting targets based on annealing type label transfer learning as claimed in claim 8, wherein in said step S41, after the training of the open target detection model is completed using the training set [Figure QLYQS_40] of stage [Figure QLYQS_39], n new classes are incrementally added, according to the task setting in the open world, to the known classes of the training data set of stage [Figure QLYQS_41], i.e., the known class set is updated to [Figure QLYQS_42], and the updated training set is [Figure QLYQS_43].
10. An annealing type label transfer learning-based target detection device, comprising a processor and a memory, wherein the processor reads a computer program in the memory, and is used for executing the annealing type label transfer learning-based target detection method according to any one of claims 1 to 9.
CN202310414703.8A 2023-04-18 2023-04-18 Target detection method and device based on annealing type label transfer learning Active CN116152721B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310414703.8A CN116152721B (en) 2023-04-18 2023-04-18 Target detection method and device based on annealing type label transfer learning

Publications (2)

Publication Number Publication Date
CN116152721A CN116152721A (en) 2023-05-23
CN116152721B true CN116152721B (en) 2023-06-20

Family

ID=86360369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310414703.8A Active CN116152721B (en) 2023-04-18 2023-04-18 Target detection method and device based on annealing type label transfer learning

Country Status (1)

Country Link
CN (1) CN116152721B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104751198A (en) * 2013-12-27 2015-07-01 华为技术有限公司 Method and device for identifying target object in image
CN112084866A (en) * 2020-08-07 2020-12-15 浙江工业大学 Target detection method based on improved YOLO v4 algorithm
CN115731445A (en) * 2021-08-26 2023-03-03 丰田自动车株式会社 Learning method, information processing apparatus, and recording medium having learning program recorded thereon

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220092407A1 (en) * 2020-09-23 2022-03-24 International Business Machines Corporation Transfer learning with machine learning systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant