CN116342851A - Target detection model construction method, target detection method and device - Google Patents

Target detection model construction method, target detection method and device

Info

Publication number
CN116342851A
CN116342851A (application CN202211708845.7A)
Authority
CN
China
Prior art keywords
data set
training
model
prediction
pseudo tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211708845.7A
Other languages
Chinese (zh)
Inventor
李林超
王威
何林阳
周凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co ltd filed Critical Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority to CN202211708845.7A priority Critical patent/CN116342851A/en
Publication of CN116342851A publication Critical patent/CN116342851A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection model construction method, a target detection method and a target detection device, wherein the target detection model construction method comprises the following steps: acquiring a marked data set and an unmarked data set; model training is carried out based on the marked data set, and a first training model is obtained; training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model. The invention can improve the detection capability of the model.

Description

Target detection model construction method, target detection method and device
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method for constructing a target detection model, a method for detecting a target, and a device for constructing a target detection model.
Background
With the rapid development of deep learning, it has been widely applied across many fields; in the field of vision in particular, deep learning techniques have achieved very good results in target detection. Existing target detection model construction methods generally predict directly on an unlabeled data set to calibrate pseudo tags, and construct the target detection model based on the calibrated pseudo tags. As a result, the model cannot correctly learn the target features and background features of the unlabeled data during training, so its detection capability is poor and missed detections and false detections easily occur.
Disclosure of Invention
The invention provides a target detection model construction method, a target detection method and a target detection device, which are used for solving the technical problems that the existing target detection model construction method cannot accurately learn target characteristics and background characteristics of unlabeled data during model training, so that the detection capability of a model is poor, and the conditions of missed detection and false detection are easy to occur.
One embodiment of the present invention provides a method for constructing a target detection model, including:
acquiring a marked data set and an unmarked data set;
model training is carried out based on the marked data set, and a first training model is obtained;
initializing a second training model according to the model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
Further, a two-class branch network for judging whether the prediction frame is a foreground or not is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
Further, performing first pseudo tag calibration according to a plurality of the prediction frames to obtain a pseudo tag calibration result, including:
marking as a first error label, among the prediction frames, each prediction frame whose predicted class is not background but whose binary class is background; and marking as a second error label each prediction frame whose confidence is smaller than a first preset value and larger than a second preset value, thereby obtaining the pseudo tag calibration result.
Further, performing a target pasting operation or an image stitching operation on the unlabeled dataset includes:
pasting, onto the marked data set, images of the unmarked data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label;
or performing a stitching operation on images whose prediction frame confidence is smaller than the second preset value and which carry neither the first error label nor the second error label.
One embodiment of the present invention provides a method for constructing a target detection model, including:
according to the second training model obtained by the target detection model construction method, performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set;
initializing a third training model according to model parameters of the second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; at each training iteration, the input of the third training model comprises at least one marked image from the marked data set and at least one pseudo tag data image from the pseudo tag data set; after forward propagation obtains prediction frames, the obtained prediction frames are matched with the input second pseudo tags, and the loss function is calculated after matching.
Further, performing second pseudo tag calibration on the unlabeled dataset to obtain a pseudo tag dataset, including:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition for the second pseudo tag according to the prediction frame confidence and the IoU threshold;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
Further, the calibration conditions of the second pseudo tag include:
the prediction frame confidence is larger than a third preset value, and the IoU threshold is smaller than a fourth preset value;
the prediction frame confidence is smaller than the third preset value and larger than a fifth preset value, and the IoU threshold is smaller than a sixth preset value.
Further, matching the obtained prediction frames with the input second pseudo tags and performing loss function calculation after matching includes:
performing a non-maximum suppression operation according to the prediction frame confidence to obtain a candidate frame among the prediction frames;
and performing IoU calculation between the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IoU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
One embodiment of the present invention provides a target detection method, including:
acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
One embodiment of the present invention provides an object detection apparatus, including:
the target detection module is used for acquiring a first image to be identified, inputting the first image into a target detection model and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
An embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program, wherein when the computer program runs, a device on which the computer-readable storage medium is located is controlled to execute the target detection model construction method described above.
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections and improving the accuracy of the model's target detection.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a target detection model according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a method for constructing a target detection model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an object detection model construction device according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of an object detection model construction device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing an object detection model, including:
s1, acquiring a marked data set and an unmarked data set;
The embodiment of the invention can be applied to target detection scenarios such as X-ray contraband detection. Before model training is performed, the marked data set and the unmarked data set need to be acquired.
In the embodiment of the invention, the marked data set can be a plurality of marked image data, and the unmarked data set can be a plurality of unmarked image data.
S2, model training is carried out based on the marked data set, and a first training model is obtained;
in the embodiment of the invention, model training is performed in a target detection network through the marked data set, so as to obtain a first training model.
In the embodiment of the invention, the target detection network is provided with a binary classification branch network for judging whether a prediction frame is foreground, so as to obtain a judgment of whether the prediction frame is background or foreground. This constrains the pseudo tag training set during semi-supervised training, reduces false detections, and effectively improves the detection capability of the model.
In the embodiment of the invention, a plurality of loss functions are set for model training, wherein a sigmoid-based binary cross-entropy function is set as the loss function for judging the binary class of the prediction frame. Its expression may be as follows:
Loss_2_cls = -(y·log(p) + (1-y)·log(1-p)) (1)
wherein Loss_2_cls represents the binary class loss function, y is the binary (foreground/background) label and p is the predicted foreground probability.
On the other hand, the embodiment of the invention also sets a loss function for the prediction frame confidence and a loss function for judging the prediction frame class, whose expressions may be as follows:
Loss_det = Loss_det_cls + Loss_det_bbox (2)
Loss_sum = Loss_det + Loss_2_cls (3)
wherein Loss_det represents the loss function of the prediction frame confidence, Loss_det_cls represents the class loss value of the detection algorithm, Loss_det_bbox represents the regression loss of the target detection algorithm, and Loss_sum represents the total loss function.
According to the embodiment of the invention, model training is carried out according to these three loss functions, so that the trained model can predict the binary class loss, the class loss and the regression loss of the prediction frame. When the loss value tends to be stable, that is, its fluctuation stays within a set range, the first training model has converged and the target detection network is no longer trained.
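As an illustrative sketch only, the three loss terms above can be combined as follows, assuming the standard binary cross-entropy form for equation (1); all function names and example values here are hypothetical, not from the patent:

```python
import math

def binary_class_loss(y, p):
    """Binary cross-entropy for the foreground/background branch, Eq. (1).
    p is clamped for numerical stability."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_loss(det_cls_loss, det_bbox_loss, y, p):
    """Combine the detector losses, Eq. (2), with the binary-class
    loss to obtain the total loss, Eq. (3)."""
    loss_det = det_cls_loss + det_bbox_loss
    return loss_det + binary_class_loss(y, p)
```

For example, with a class loss of 0.5, a regression loss of 0.3, and a foreground box (y = 1) predicted at p = 0.5, the total loss is 0.8 plus ln 2.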
In addition, when the first training model is used for target prediction, coordinate information of a prediction frame can be obtained.
S3, initializing a second training model according to model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
In the embodiment of the invention, the second training model can be initialized according to the parameters of the first training model, so that the second training model has certain detection capability when training is started, and further, the subsequent training steps can be started based on the initialized second training model.
In the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections.
In the embodiment of the invention, the second training model converges once its loss function meets the preset requirement; for example, when the loss value of the second training model tends to be stable, the model is judged to have converged, and the converged second training model is then taken as the first target detection model.
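The convergence judgment described above, loss fluctuation staying within a set range, can be sketched as follows; the window contents and tolerance are illustrative assumptions:

```python
def has_converged(recent_losses, tolerance):
    """Judge convergence when the loss values over a recent window
    fluctuate within a set range (here: max-min spread <= tolerance)."""
    if len(recent_losses) < 2:
        return False  # not enough history to judge stability
    return max(recent_losses) - min(recent_losses) <= tolerance
```

For example, a window of losses [0.51, 0.50, 0.505] with a tolerance of 0.02 would be judged converged, while [0.9, 0.5] would not.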
In one embodiment, a two-class branch network for judging whether the prediction frame is a foreground is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
In the embodiment of the invention, the second training model is utilized to conduct target prediction on the unlabeled data set, a plurality of prediction frames can be obtained, and according to the category or the confidence level of the prediction frames, a first pseudo tag can be selected from the prediction frames and calibrated, so that a pseudo tag calibration result can be obtained.
In one embodiment, performing a first pseudo tag calibration according to a plurality of prediction frames to obtain a pseudo tag calibration result, including:
marking as a first error label, among the prediction frames, each prediction frame whose predicted class is not background but whose binary class is background; and marking as a second error label each prediction frame whose confidence is smaller than the first preset value and larger than the second preset value, thereby obtaining the pseudo tag calibration result.
In the embodiment of the invention, whether a prediction frame meets the calibration condition of an error label can be judged by combining the prediction frame class and the prediction frame binary class results. For example, when the prediction frame class is not background but the binary class judges the frame to be background, the prediction frame is marked as the first error label.
The embodiment of the invention can also judge whether a prediction frame meets the calibration condition of an error label according to the prediction frame confidence. For example, when the confidence of the prediction frame is smaller than the first preset value and larger than the second preset value, the prediction frame is marked as the second error label, and the pseudo tag calibration result is then obtained. The first preset value and the second preset value can be set and adjusted according to actual needs; in a specific embodiment, the first preset value may be 0.9 and the second preset value may be 0.05.
In the embodiment of the present invention, the first error label and the second error label may both be marked as -1.
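The two error-label rules above, with the example preset values 0.9 and 0.05 and the -1 marking, might be sketched as follows; the dictionary keys and class names are illustrative assumptions:

```python
FIRST_PRESET = 0.9    # example value from the embodiment
SECOND_PRESET = 0.05  # example value from the embodiment

def mark_error_labels(boxes):
    """Mark first/second error labels as -1.
    Each box is a dict with 'cls' (predicted class), 'binary_cls'
    ('fg' or 'bg' from the binary branch) and 'conf' (confidence).
    First error label: predicted class is not background, but the
    binary branch judges the box to be background.
    Second error label: confidence falls strictly between the two
    preset values."""
    for box in boxes:
        first_error = box['cls'] != 'background' and box['binary_cls'] == 'bg'
        second_error = SECOND_PRESET < box['conf'] < FIRST_PRESET
        box['label'] = -1 if (first_error or second_error) else box['cls']
    return boxes
```

A high-confidence foreground box keeps its class; a class/binary-class conflict or an uncertain confidence yields -1.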
In one embodiment, performing a target pasting operation or an image stitching operation on an unlabeled dataset includes:
pasting, onto the marked data set, images of the unmarked data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label;
In the embodiment of the invention, images in the unlabeled data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label can be pasted onto blank areas of images in the labeled data set.
Or performing a stitching operation on images whose prediction frame confidence is smaller than the second preset value and which carry neither the first error label nor the second error label.
In the embodiment of the invention, images of the labeled data set and images of the unlabeled data set whose prediction frame confidence is smaller than the second preset value and which carry neither error label can be stitched together.
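The paste-or-stitch routing described above can be sketched as follows, under the assumption that an image qualifies only when none of its prediction frames carries an error label; the thresholds, field names, and return values are illustrative:

```python
def fuse_operation(image_boxes):
    """Decide how an unlabeled image is fused with the labeled set.
    Assumed reading of the embodiment: an image with any error label
    (marked -1) is skipped; otherwise it is pasted when every box
    confidence exceeds the first preset value (0.9), and stitched
    when every confidence is below the second (0.05)."""
    if any(b['label'] == -1 for b in image_boxes):
        return 'skip'      # image carries a first or second error label
    confs = [b['conf'] for b in image_boxes]
    if all(c > 0.9 for c in confs):
        return 'paste'     # paste onto blank areas of labeled images
    if all(c < 0.05 for c in confs):
        return 'stitch'    # mosaic-style stitching with labeled images
    return 'skip'
```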
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set until all images are traversed, so that the second training model can learn correct target feature information and background feature information, avoiding missed detections and false detections and effectively improving the detection capability of the model.
The embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections.
Referring to fig. 2, an embodiment of the present invention provides a method for constructing an object detection model, including:
s10, according to a second training model obtained by the target detection model construction method, performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set;
In the embodiment of the invention, the second training model has good prediction capability. A plurality of prediction frames are obtained by predicting the unlabeled data set with the second training model; second pseudo tag calibration can be performed on the prediction frames by setting a confidence condition and an IoU threshold condition, and after calibration the unlabeled data set carrying the second pseudo tags is taken as the pseudo tag data set.
In the embodiment of the invention, when the second pseudo tags are calibrated, the number of second pseudo tag categories may be 2, 3, 4 and so on, so that multi-frame, multi-category pseudo tags are obtained; such pseudo tags can effectively improve the detection capability of the model obtained by semi-supervised training.
S20, initializing a third training model according to the model parameters of the converged second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; at each training iteration, the input of the third training model comprises at least one marked image from the marked data set and at least one pseudo tag data image from the pseudo tag data set; after forward propagation obtains prediction frames, the obtained prediction frames are matched with the input second pseudo tags, and the loss function is calculated after matching.
In the embodiment of the invention, the third training model is initialized according to the model parameters of the converged second training model, so that the third training model has the same detection capability as the converged second training model, and further, the follow-up semi-supervised training can be performed based on the initialized training model, and the training efficiency of the model can be effectively improved.
In the embodiment of the invention, the proportion of marked images to pseudo tag data images input to the third training model can be set by modifying the training sampler. With the marked images and the pseudo tag data set as the training set of the third training model, prediction frames can be obtained through forward propagation, so that the best second pseudo tag can be determined by matching the prediction frames with the second pseudo tags, and the loss function calculation can then be performed based on the best second pseudo tag.
In one embodiment, step S10 of performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set further includes the following sub-steps:
s201, predicting unlabeled data to obtain a plurality of prediction frames;
In the embodiment of the invention, the unlabeled data set is predicted by the second training model to obtain a plurality of prediction frames, and each prediction frame records its corresponding prediction frame confidence, prediction frame class and prediction frame binary class.
S202, setting a calibration condition for the second pseudo tag according to the prediction frame confidence and the IoU threshold;
in the embodiment of the invention, a plurality of second pseudo tags can be marked in the prediction frame by setting the marking conditions of the second pseudo tags.
S203, marking a plurality of second pseudo tags in a plurality of prediction frames based on the calibration conditions;
in the embodiment of the invention, a plurality of second pseudo tags can be provided.
Taking the unlabeled data set corresponding to the second pseudo tags as the pseudo tag data set.
In one embodiment, the calibration conditions of the second pseudo tag include:
the prediction frame confidence coefficient is larger than a third preset value, and the IOU threshold value is smaller than a fourth preset value;
in the embodiment of the present invention, the third preset value may be 0.7, and the fourth preset value may be 0.75.
or the prediction frame confidence coefficient is smaller than the third preset value and larger than a fifth preset value, and the IOU threshold value is smaller than a sixth preset value.
In the embodiment of the present invention, the third preset value may be 0.7, the fifth preset value may be 0.1, and the sixth preset value may be 0.45.
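A minimal sketch of the two calibration conditions, using the example preset values given above (0.7, 0.75, 0.1 and 0.45) as defaults; the function name and signature are hypothetical illustrations, not part of the disclosure:

```python
def passes_calibration(conf, iou, third=0.7, fourth=0.75, fifth=0.1, sixth=0.45):
    """Return True if a prediction frame qualifies as a second pseudo tag.
    Condition A: high-confidence frame with IOU below the fourth preset value.
    Condition B: medium-confidence frame with the stricter sixth-preset-value
    IOU bound."""
    if conf > third and iou < fourth:
        return True
    if fifth < conf < third and iou < sixth:
        return True
    return False
```

For example, a frame with confidence 0.8 and IOU 0.5 is calibrated under condition A, while a frame with confidence 0.3 requires IOU below 0.45.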
In one embodiment, step S20, matching the obtained prediction frames with the input second pseudo tags and performing loss function calculation after matching, further includes the following sub-steps:
step 2011, performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
in the embodiment of the invention, one candidate frame can be selected from all the prediction frames through non-maximum suppression.
Step S2012, performing IOU calculation on the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
In the embodiment of the invention, IOU calculation can be performed on the position information of the candidate frame and each second pseudo tag; according to the calculation result, the second pseudo tag with the largest IOU value is taken as the best pseudo tag, and the loss function is calculated based on the best pseudo tag.
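The candidate-frame-to-pseudo-tag matching of steps S2011-S2012 can be sketched as follows. This is a plain-Python illustration with boxes as (x1, y1, x2, y2) tuples; the non-maximum suppression step that produces the candidate frame is assumed to have already run, and the function names are hypothetical:

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def best_pseudo_tag(candidate, pseudo_tags):
    """Pick the second pseudo tag with the largest IOU against the candidate
    frame; this tag is then used for the loss function calculation."""
    return max(pseudo_tags, key=lambda p: iou(candidate, p))
```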
In the embodiment of the present invention, the target loss function of the second target detection model may be composed of a pseudo tag training set loss function corresponding to the best pseudo tag and a labeled data set loss function, where:
the loss function of the annotated data set is shown in equation (4):
Loss_ren_det = Loss_ren_cls + Loss_ren_bbox + Loss_ren_sigmoid    (4)

where Loss_ren_det is the loss function of the marked data set, Loss_ren_cls is the classification loss value of the marked data set, Loss_ren_bbox is the regression loss value of the marked data set, and Loss_ren_sigmoid is the target (objectness) loss value of the marked data set;
the loss functions of the pseudo tag training set are shown in equations (5), (6), (7) and (8):
Loss_pre_bbox = score_wei_bbox × smoothL1(x_pre_bbox, x_wei_bbox)    (5)

Loss_pre_cls = -Σ_{c=1}^{M} weight_c_wei_label × y_c × log(p_c)    (6)

Loss_pre_sigmoid = -(y_wei × log(p) + (1 - y_wei) × log(1 - p))    (7)

Loss_wei = weight_wei_sigmoid × (Loss_pre_bbox + Loss_pre_cls + Loss_pre_sigmoid)    (8)
where Loss_pre_bbox is the regression loss function of the second pseudo tag, score_wei_bbox is the confidence coefficient of the second pseudo tag candidate frame, x_pre_bbox is the predicted target coordinates, and x_wei_bbox is the second pseudo tag coordinates;

Loss_pre_cls is the classification loss function of the second pseudo tag, M is the number of categories, weight_c_wei_label is the weight of the c-th category, y_c indicates whether the c-th predicted category is the true value (assigned 1 if true, 0 otherwise), and p_c is the predicted probability of category c;

Loss_pre_sigmoid represents the target (objectness) loss function of the second pseudo tag; y_wei is assigned 1 if the prediction is a real target and 0 otherwise; p represents the probability of being predicted as a real target;

Loss_wei is the total second pseudo tag loss function, and weight_wei_sigmoid, the probability of being a real target, is used to suppress pseudo tag noise.
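Equations (5)-(8) can be combined into a single illustrative function. All argument names are hypothetical shorthand for the quantities defined above, and the smooth-L1 regression term is passed in precomputed:

```python
import math

def pseudo_tag_loss(score_wei_bbox, smooth_l1, weights, y_onehot, p_cls,
                    y_wei, p_obj, weight_wei_sigmoid):
    """Sketch of equations (5)-(8): weighted regression, weighted
    classification, objectness, and the noise-suppressed total."""
    loss_bbox = score_wei_bbox * smooth_l1                          # eq. (5)
    loss_cls = -sum(w * y * math.log(p)                             # eq. (6)
                    for w, y, p in zip(weights, y_onehot, p_cls) if y)
    loss_sig = -(y_wei * math.log(p_obj) +                          # eq. (7)
                 (1 - y_wei) * math.log(1 - p_obj))
    return weight_wei_sigmoid * (loss_bbox + loss_cls + loss_sig)   # eq. (8)
```

The outer factor weight_wei_sigmoid down-weights the whole term for boxes that are unlikely to be real targets, which is how the pseudo tag noise reduction described above enters the loss.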
In a specific example, test data for different semi-supervised training methods are presented in Table 1 below:
TABLE 1
(The data of Table 1 are reproduced as images in the original publication and are not recoverable here.)
As can be seen from Table 1, the target detection model construction method provided by the embodiment of the invention delivers substantial improvements in AP50 detection accuracy and in the false positive rate.
The embodiment of the invention has the following beneficial effects:
according to the embodiment of the invention, a target pasting operation or an image stitching operation is performed on the unlabeled data set, and all images are traversed, so that the unlabeled data set is fused with the marked data set to obtain a new training data set. Prediction frames with higher confidence coefficients can be pasted onto the marked data set, lowering the probability that spurious targets appear in the pasted data and thus reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence coefficient, so that the converged model learns correct target feature information and background feature information, further reducing false detections.
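Under the assumption that the first and second preset values act as the paste and stitch confidence thresholds (the concrete values 0.7 and 0.1 below are illustrative only, and the function name is hypothetical), the routing of an unlabeled image into the pasting or stitching operation can be sketched as:

```python
def route_unlabeled_image(conf, has_error_tag,
                          paste_thresh=0.7, stitch_thresh=0.1):
    """Decide the fusion operation for an unlabeled image: high-confidence
    prediction frames are pasted onto marked images, low-confidence ones
    are stitched, and images carrying a first or second error label are
    excluded from fusion."""
    if has_error_tag:
        return "skip"
    if conf > paste_thresh:
        return "paste"
    if conf < stitch_thresh:
        return "stitch"
    return "skip"
```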
Further, in the embodiment of the invention, multi-frame, multi-class second pseudo tags are obtained through calibration; after prediction frames are obtained through forward propagation of the third training model, they are matched with the input second pseudo tags, and the loss function is calculated after matching, so that a converged target detection model is obtained.
Referring to fig. 3, an embodiment of the present invention provides a target detection method, which includes:
s11, acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
Referring to fig. 4, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides an object detection model construction apparatus, including:
a data set acquisition module 10 for acquiring a marked data set and an unmarked data set;
a first model training module 20, configured to perform model training to converge based on the labeled dataset, to obtain a first training model;
the first target detection model determining module 30 is configured to initialize the second training model according to the model parameters of the first training model, and train the initialized second training model according to the labeled data set and the unlabeled data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
In one embodiment, a two-class branch network for judging whether the prediction frame is a foreground is arranged in the first training model;
the first object detection model determination module 30 is further configured to:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
In one embodiment, performing a first pseudo tag calibration according to a plurality of prediction frames to obtain a pseudo tag calibration result, including:
among the prediction frames whose two-class category is not background, marking the prediction frames whose prediction frame category is background as a first error label; and marking the prediction frames whose prediction frame confidence coefficient is smaller than the first preset value and larger than the second preset value as a second error label, thereby obtaining the pseudo tag calibration result.
In one embodiment, performing a target pasting operation or an image stitching operation on an unlabeled dataset includes:
pasting, onto the marked data set, the images of the unlabeled data set whose prediction frame confidence coefficient is larger than the first preset value and which carry neither the first error label nor the second error label;

or performing the stitching operation on the images whose prediction frame confidence coefficient is smaller than the second preset value and which carry neither the first error label nor the second error label.
Referring to fig. 5, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides an object detection model construction apparatus, including:
the pseudo tag calibration module 11 is configured to perform second pseudo tag calibration on the unlabeled data set according to the second training model obtained by the target detection model construction method to obtain a pseudo tag data set;
the second target detection model determining module 21 is configured to initialize a third training model according to model parameters of the second training model, and train the initialized third training model according to the labeled data set and the pseudo tag data set to obtain a second target detection model; and when training is performed each time, inputting a third training model, wherein the input of the third training model comprises at least one marked image in marked data sets and at least one pseudo tag data image in pseudo tag data sets, after forward propagation is performed to obtain a prediction frame, the obtained prediction frame is matched with an input second pseudo tag, and after matching, loss function calculation is performed.
In one embodiment, the pseudo tag calibration module 11 is further configured to:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition for the second pseudo tag according to the prediction frame confidence coefficient of the prediction frame and the IOU threshold value;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
In one embodiment, the calibration conditions of the second pseudo tag include:
the prediction frame confidence coefficient is larger than a third preset value, and the IOU threshold value is smaller than a fourth preset value;

or the prediction frame confidence coefficient is smaller than the third preset value and larger than a fifth preset value, and the IOU threshold value is smaller than a sixth preset value.
In one embodiment, the second target detection model determining module 21 is further configured to:
performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
and performing IOU calculation on the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
Referring to fig. 6, an embodiment of the present invention provides a target detection apparatus, including:

the target detection module 12, configured to acquire a first image to be identified, input the first image into a target detection model, and output a target detection result of the first image;

wherein the target detection model is constructed according to the target detection model construction method described above.
An embodiment of the present invention provides a computer readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device in which the computer readable storage medium is located is controlled to execute the target detection model construction method described above.
The foregoing is a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention and are intended to be comprehended within the scope of the present invention.

Claims (11)

1. A target detection model construction method, characterized by comprising the following steps:
acquiring a marked data set and an unmarked data set;
model training is carried out based on the marked data set, and a first training model is obtained;
initializing a second training model according to the model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
2. The method for constructing an object detection model according to claim 1, wherein,
a two-class branch network for judging whether the prediction frame is a foreground or not is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
3. The method for constructing a target detection model according to claim 2, wherein the step of performing the first pseudo tag calibration according to the plurality of prediction frames to obtain the pseudo tag calibration result comprises:
marking, among the prediction frames whose two-class category is not background, the prediction frames whose prediction frame category is background as a first error label; and marking the prediction frames whose prediction frame confidence coefficient is smaller than the first preset value and larger than the second preset value as a second error label, thereby obtaining the pseudo tag calibration result.
4. The method for constructing an object detection model according to claim 3, wherein performing an object pasting operation or an image stitching operation on the unlabeled dataset comprises:
pasting images which do not have the first error label and the second error label on the marked data set, wherein the confidence coefficient of the predicted frame of the unmarked data set is larger than that of a predicted frame of a first preset value;
or performing splicing operation on the images with the confidence coefficient of the predicted frame smaller than a second preset value and without the first error label and the second error label.
5. A target detection model construction method, characterized by comprising the following steps:
performing second pseudo tag calibration on the unlabeled data set according to the second training model obtained by the target detection model construction method according to any one of claims 1-4, to obtain a pseudo tag data set;
initializing a third training model according to model parameters of the second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; and when training is performed each time, inputting a third training model, wherein the input of the third training model comprises at least one marked image in marked data sets and at least one pseudo tag data image in pseudo tag data sets, after forward propagation is performed to obtain a prediction frame, the obtained prediction frame is matched with an input second pseudo tag, and after matching, loss function calculation is performed.
6. The method of claim 5, wherein performing a second pseudo tag calibration on the unlabeled dataset to obtain a pseudo tag dataset comprises:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition of a second pseudo tag according to the prediction frame confidence coefficient of the prediction frame and an IOU threshold value;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
7. The method of claim 6, wherein the calibration conditions of the second pseudo tag comprise:
the confidence of the prediction frame is larger than a third preset value, and the IOU threshold is smaller than a fourth preset value;
the prediction frame confidence is less than a third preset value, greater than a fifth preset value, and the IOU threshold is less than a sixth preset value.
8. The method of claim 5, wherein matching the obtained prediction box with the second pseudo tag and performing a loss function calculation after the matching, comprises:
performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
and carrying out IOU calculation on the candidate frame and the second pseudo tag, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and carrying out loss function calculation based on the best pseudo tag.
9. A method of detecting an object, comprising:
acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
wherein the object detection model is constructed according to the object detection model construction method according to any one of claims 1 to 4 or according to any one of claims 5 to 8.
10. An object detection apparatus, comprising:
the target detection module is used for acquiring a first image to be identified, inputting the first image into a target detection model and outputting a target detection result of the first image;
wherein the object detection model is constructed according to the object detection model construction method according to any one of claims 1 to 4 or according to any one of claims 5 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the apparatus in which the computer-readable storage medium is located is controlled to perform the object detection model construction method according to any one of claims 1 to 4 when the computer program is run, or the apparatus in which the computer-readable storage medium is located is controlled to perform the object detection model construction method according to any one of claims 5 to 8 when the computer program is run.
CN202211708845.7A 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device Pending CN116342851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211708845.7A CN116342851A (en) 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device

Publications (1)

Publication Number Publication Date
CN116342851A true CN116342851A (en) 2023-06-27

Family

ID=86886482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211708845.7A Pending CN116342851A (en) 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device

Country Status (1)

Country Link
CN (1) CN116342851A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117907970A (en) * 2024-03-19 2024-04-19 清华大学苏州汽车研究院(相城) Method and device for generating target detection model of laser radar and method and device for detecting target
CN117907970B (en) * 2024-03-19 2024-05-28 清华大学苏州汽车研究院(相城) Method and device for generating target detection model of laser radar and method and device for detecting target

Similar Documents

Publication Publication Date Title
CN108197670B (en) Pseudo label generation model training method and device and pseudo label generation method and device
US10417524B2 (en) Deep active learning method for civil infrastructure defect detection
CN111310808B (en) Training method and device for picture recognition model, computer system and storage medium
CN109902202B (en) Video classification method and device
CN111695385A (en) Text recognition method, device and equipment
CN113076872B (en) Intelligent test paper correcting method
CN114241505B (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN116342851A (en) Target detection model construction method, target detection method and device
CN110968725A (en) Image content description information generation method, electronic device, and storage medium
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
CN112270334B (en) Few-sample image classification method and system based on abnormal point exposure
US11948387B2 (en) Optimized policy-based active learning for content detection
CN114328942A (en) Relationship extraction method, apparatus, device, storage medium and computer program product
CN113688757A (en) SAR image recognition method and device and storage medium
CN109101984B (en) Image identification method and device based on convolutional neural network
CN113033525B (en) Training method of image recognition network, electronic device and storage medium
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
CN115620083A (en) Model training method, face image quality evaluation method, device and medium
CN115457305A (en) Semi-supervised target detection method and system
CN114637877A (en) Labeling method, electronic device and storage medium
CN114663751A (en) Power transmission line defect identification method and system based on incremental learning technology
CN116012656B (en) Sample image generation method and image processing model training method and device
EP4083870A1 (en) Method and system for classifying data
CN113554127B (en) Image recognition method, device and medium based on hybrid model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination