CN116342851A - Target detection model construction method, target detection method and device - Google Patents

Target detection model construction method, target detection method and device

Info

Publication number
CN116342851A
CN116342851A (application CN202211708845.7A)
Authority
CN
China
Prior art keywords
data set
training
model
prediction
pseudo tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211708845.7A
Other languages
Chinese (zh)
Inventor
李林超
王威
何林阳
周凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Zhuoyun Intelligent Technology Co ltd
Original Assignee
Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Zhuoyun Intelligent Technology Co ltd filed Critical Zhejiang Zhuoyun Intelligent Technology Co ltd
Priority to CN202211708845.7A priority Critical patent/CN116342851A/en
Publication of CN116342851A publication Critical patent/CN116342851A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/16Image acquisition using multiple overlapping images; Image stitching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/0895Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection model construction method, a target detection method and a target detection device, wherein the target detection model construction method comprises the following steps: acquiring a marked data set and an unmarked data set; model training is carried out based on the marked data set, and a first training model is obtained; training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model. The invention can improve the detection capability of the model.

Description

Target detection model construction method, target detection method and device
Technical Field
The present invention relates to the field of target detection technologies, and in particular, to a method for constructing a target detection model, a method for detecting a target, and a device for constructing a target detection model.
Background
With the rapid development of deep learning, it has been widely applied across many fields; in the field of vision in particular, deep learning techniques have achieved very good results in target detection. Existing target detection model construction methods generally predict directly on an unlabeled data set to calibrate pseudo tags, and construct the target detection model based on the calibrated pseudo tags. As a result, the model cannot correctly learn the target features and background features of the unlabeled data during training, so its detection capability is poor and missed detections and false detections easily occur.
Disclosure of Invention
The invention provides a target detection model construction method, a target detection method and a target detection device, which are used for solving the technical problems that the existing target detection model construction method cannot accurately learn target characteristics and background characteristics of unlabeled data during model training, so that the detection capability of a model is poor, and the conditions of missed detection and false detection are easy to occur.
One embodiment of the present invention provides a method for constructing a target detection model, including:
acquiring a marked data set and an unmarked data set;
model training is carried out based on the marked data set, and a first training model is obtained;
initializing a second training model according to the model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
Further, a two-class branch network for judging whether the prediction frame is a foreground or not is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
Further, performing first pseudo tag calibration according to a plurality of the prediction frames to obtain a pseudo tag calibration result, including:
marking as a first error label, among the prediction frames, each prediction frame whose predicted class is not background but whose binary class is background; and marking as a second error label each prediction frame whose confidence is smaller than a first preset value and larger than a second preset value, thereby obtaining the pseudo tag calibration result.
Further, performing a target pasting operation or an image stitching operation on the unlabeled dataset includes:
pasting, onto the marked data set, images of the unmarked data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label;
or performing a stitching operation on images whose prediction frame confidence is smaller than the second preset value and which carry neither the first error label nor the second error label.
One embodiment of the present invention provides a method for constructing a target detection model, including:
according to the second training model obtained by the target detection model construction method, performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set;
initializing a third training model according to model parameters of the second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; at each training iteration, the input of the third training model comprises at least one marked image from the marked data set and at least one pseudo tag data image from the pseudo tag data set; after forward propagation obtains prediction frames, the obtained prediction frames are matched with the input second pseudo tags, and the loss function is calculated after matching.
Further, performing second pseudo tag calibration on the unlabeled dataset to obtain a pseudo tag dataset, including:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition for the second pseudo tag according to the prediction frame confidence and the IoU threshold;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
Further, the calibration conditions of the second pseudo tag include:
the prediction frame confidence is larger than a third preset value, and the IoU threshold is smaller than a fourth preset value;
the prediction frame confidence is smaller than the third preset value and larger than a fifth preset value, and the IoU threshold is smaller than a sixth preset value.
Further, matching the obtained prediction frames with the input second pseudo tags and performing loss function calculation after matching includes:
performing a non-maximum suppression operation according to the prediction frame confidence to obtain a candidate frame among the prediction frames;
and performing IoU calculation between the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IoU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
One embodiment of the present invention provides a target detection method, including:
acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
One embodiment of the present invention provides an object detection apparatus, including:
the target detection module is used for acquiring a first image to be identified, inputting the first image into a target detection model and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
An embodiment of the present invention provides a computer-readable storage medium comprising a stored computer program, wherein when the computer program runs, a device on which the computer-readable storage medium is located is controlled to execute the target detection model construction method described above.
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections and improving the accuracy of the model's target detection.
Drawings
FIG. 1 is a schematic flow chart of a method for constructing a target detection model according to an embodiment of the present invention;
FIG. 2 is another schematic flow chart of a method for constructing a target detection model according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a target detection method according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an object detection model construction device according to an embodiment of the present invention;
FIG. 5 is another schematic structural diagram of an object detection model construction device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an object detection device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the description of the present application, it should be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or an implicit indication of the number of technical features being indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
In the description of the present application, it should be noted that, unless explicitly specified and limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be either fixedly connected, detachably connected, or integrally connected, for example; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between two elements. The specific meaning of the terms in this application will be understood by those of ordinary skill in the art in a specific context.
Referring to fig. 1, an embodiment of the present invention provides a method for constructing an object detection model, including:
s1, acquiring a marked data set and an unmarked data set;
The embodiment of the invention can be applied to target detection scenarios such as X-ray contraband detection. Before model training is performed, the marked data set and the unmarked data set need to be acquired.
In the embodiment of the invention, the marked data set can be a plurality of marked image data, and the unmarked data set can be a plurality of unmarked image data.
S2, model training is carried out based on the marked data set, and a first training model is obtained;
in the embodiment of the invention, model training is performed in a target detection network through the marked data set, so as to obtain a first training model.
In the embodiment of the invention, the target detection network is provided with a binary classification branch network for judging whether a prediction frame is foreground, so as to obtain a judgment of whether the prediction frame is background or foreground. This constrains the pseudo tag training set during semi-supervised training, reduces false detections, and effectively improves the detection capability of the model.
In the embodiment of the invention, a plurality of loss functions are set for model training, wherein a sigmoid-based binary cross-entropy function is set as the loss function for judging the binary class of the prediction frame. Its expression may be as follows:
Loss_2_cls = -(y·log(p) + (1-y)·log(1-p)) (1)
wherein Loss_2_cls represents the binary class loss function, y is the binary (foreground/background) label and p is the predicted foreground probability.
On the other hand, the embodiment of the invention also sets a loss function for the prediction frame confidence and a loss function for judging the prediction frame class, whose expressions may be as follows:
Loss_det = Loss_det_cls + Loss_det_bbox (2)
Loss_sum = Loss_det + Loss_2_cls (3)
wherein Loss_det represents the loss function of the prediction frame confidence, Loss_det_cls represents the class loss value of the detection algorithm, Loss_det_bbox represents the regression loss of the target detection algorithm, and Loss_sum represents the total loss function.
According to the embodiment of the invention, model training is carried out according to these three loss functions, so that the trained model can predict the binary class loss, the class loss and the regression loss of the prediction frame. When the loss value tends to be stable, that is, its fluctuation stays within a set range, the first training model has converged and the target detection network is no longer trained.
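As an illustrative sketch only, the three loss terms above can be combined as follows, assuming the standard binary cross-entropy form for equation (1); all function names and example values here are hypothetical, not from the patent:

```python
import math

def binary_class_loss(y, p):
    """Binary cross-entropy for the foreground/background branch, Eq. (1).
    p is clamped for numerical stability."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def total_loss(det_cls_loss, det_bbox_loss, y, p):
    """Combine the detector losses, Eq. (2), with the binary-class
    loss to obtain the total loss, Eq. (3)."""
    loss_det = det_cls_loss + det_bbox_loss
    return loss_det + binary_class_loss(y, p)
```

For example, with a class loss of 0.5, a regression loss of 0.3, and a foreground box (y = 1) predicted at p = 0.5, the total loss is 0.8 plus ln 2.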
In addition, when the first training model is used for target prediction, coordinate information of a prediction frame can be obtained.
S3, initializing a second training model according to model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
In the embodiment of the invention, the second training model can be initialized according to the parameters of the first training model, so that the second training model has certain detection capability when training is started, and further, the subsequent training steps can be started based on the initialized second training model.
In the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections.
In the embodiment of the invention, the second training model converges once its loss function meets the preset requirement; for example, when the loss value of the second training model tends to be stable, the model is judged to have converged, and the converged second training model is then taken as the first target detection model.
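The convergence judgment described above, loss fluctuation staying within a set range, can be sketched as follows; the window contents and tolerance are illustrative assumptions:

```python
def has_converged(recent_losses, tolerance):
    """Judge convergence when the loss values over a recent window
    fluctuate within a set range (here: max-min spread <= tolerance)."""
    if len(recent_losses) < 2:
        return False  # not enough history to judge stability
    return max(recent_losses) - min(recent_losses) <= tolerance
```

For example, a window of losses [0.51, 0.50, 0.505] with a tolerance of 0.02 would be judged converged, while [0.9, 0.5] would not.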
In one embodiment, a two-class branch network for judging whether the prediction frame is a foreground is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
In the embodiment of the invention, the second training model is utilized to conduct target prediction on the unlabeled data set, a plurality of prediction frames can be obtained, and according to the category or the confidence level of the prediction frames, a first pseudo tag can be selected from the prediction frames and calibrated, so that a pseudo tag calibration result can be obtained.
In one embodiment, performing a first pseudo tag calibration according to a plurality of prediction frames to obtain a pseudo tag calibration result, including:
marking as a first error label, among the prediction frames, each prediction frame whose predicted class is not background but whose binary class is background; and marking as a second error label each prediction frame whose confidence is smaller than the first preset value and larger than the second preset value, thereby obtaining the pseudo tag calibration result.
In the embodiment of the invention, whether a prediction frame meets the calibration condition of an error label can be judged by combining the prediction frame class and the prediction frame binary class results. For example, when the prediction frame class is not background but the binary class judges the frame to be background, the prediction frame is marked as the first error label.
The embodiment of the invention can also judge whether a prediction frame meets the calibration condition of an error label according to the prediction frame confidence. For example, when the confidence of the prediction frame is smaller than the first preset value and larger than the second preset value, the prediction frame is marked as the second error label, and the pseudo tag calibration result is then obtained. The first preset value and the second preset value can be set and adjusted according to actual needs; in a specific embodiment, the first preset value may be 0.9 and the second preset value may be 0.05.
In the embodiment of the present invention, the first error label and the second error label may both be marked as -1.
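The two error-label rules above, with the example preset values 0.9 and 0.05 and the -1 marking, might be sketched as follows; the dictionary keys and class names are illustrative assumptions:

```python
FIRST_PRESET = 0.9    # example value from the embodiment
SECOND_PRESET = 0.05  # example value from the embodiment

def mark_error_labels(boxes):
    """Mark first/second error labels as -1.
    Each box is a dict with 'cls' (predicted class), 'binary_cls'
    ('fg' or 'bg' from the binary branch) and 'conf' (confidence).
    First error label: predicted class is not background, but the
    binary branch judges the box to be background.
    Second error label: confidence falls strictly between the two
    preset values."""
    for box in boxes:
        first_error = box['cls'] != 'background' and box['binary_cls'] == 'bg'
        second_error = SECOND_PRESET < box['conf'] < FIRST_PRESET
        box['label'] = -1 if (first_error or second_error) else box['cls']
    return boxes
```

A high-confidence foreground box keeps its class; a class/binary-class conflict or an uncertain confidence yields -1.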
In one embodiment, performing a target pasting operation or an image stitching operation on an unlabeled dataset includes:
pasting, onto the marked data set, images of the unmarked data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label;
In the embodiment of the invention, images in the unlabeled data set whose prediction frame confidence is larger than the first preset value and which carry neither the first error label nor the second error label can be pasted onto blank areas of images in the labeled data set.
Or performing a stitching operation on images whose prediction frame confidence is smaller than the second preset value and which carry neither the first error label nor the second error label.
In the embodiment of the invention, images of the labeled data set and images of the unlabeled data set whose prediction frame confidence is smaller than the second preset value and which carry neither error label can be stitched together.
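The paste-or-stitch routing described above can be sketched as follows, under the assumption that an image qualifies only when none of its prediction frames carries an error label; the thresholds, field names, and return values are illustrative:

```python
def fuse_operation(image_boxes):
    """Decide how an unlabeled image is fused with the labeled set.
    Assumed reading of the embodiment: an image with any error label
    (marked -1) is skipped; otherwise it is pasted when every box
    confidence exceeds the first preset value (0.9), and stitched
    when every confidence is below the second (0.05)."""
    if any(b['label'] == -1 for b in image_boxes):
        return 'skip'      # image carries a first or second error label
    confs = [b['conf'] for b in image_boxes]
    if all(c > 0.9 for c in confs):
        return 'paste'     # paste onto blank areas of labeled images
    if all(c < 0.05 for c in confs):
        return 'stitch'    # mosaic-style stitching with labeled images
    return 'skip'
```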
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set until all images are traversed, so that the second training model can learn correct target feature information and background feature information, avoiding missed detections and false detections and effectively improving the detection capability of the model.
The embodiment of the invention has the following beneficial effects:
According to the embodiment of the invention, the target pasting operation or the image stitching operation is performed on the unlabeled data set and all images are traversed, so that the unlabeled data set is fused with the labeled data set to obtain a new training data set. Prediction frames with higher confidence can be pasted onto the labeled data set, so that the probability of an unlabeled target appearing in the pasted data is lower, reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence, so that the converged model can learn correct target feature information and background feature information, further reducing false detections.
Referring to fig. 2, an embodiment of the present invention provides a method for constructing an object detection model, including:
s10, according to a second training model obtained by the target detection model construction method, performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set;
In the embodiment of the invention, the second training model has good prediction capability. A plurality of prediction frames are obtained by predicting the unlabeled data set with the second training model; second pseudo tag calibration can be performed on the prediction frames by setting a confidence condition and an IoU threshold condition, and after calibration the unlabeled data set carrying the second pseudo tags is taken as the pseudo tag data set.
In the embodiment of the invention, when the second pseudo tags are calibrated, the number of second pseudo tag categories may be 2, 3, 4 and so on, so that multi-frame, multi-category pseudo tags are obtained; such pseudo tags can effectively improve the detection capability of the model obtained by semi-supervised training.
S20, initializing a third training model according to the model parameters of the converged second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; at each training iteration, the input of the third training model comprises at least one marked image from the marked data set and at least one pseudo tag data image from the pseudo tag data set; after forward propagation obtains prediction frames, the obtained prediction frames are matched with the input second pseudo tags, and the loss function is calculated after matching.
In the embodiment of the invention, the third training model is initialized according to the model parameters of the converged second training model, so that the third training model has the same detection capability as the converged second training model, and further, the follow-up semi-supervised training can be performed based on the initialized training model, and the training efficiency of the model can be effectively improved.
In the embodiment of the invention, the proportion of marked images to pseudo tag data images input to the third training model can be set by modifying the training sampler. With the marked images and the pseudo tag data set as the training set of the third training model, prediction frames can be obtained through forward propagation, so that the best second pseudo tag can be determined by matching the prediction frames with the second pseudo tags, and the loss function calculation can then be performed based on the best second pseudo tag.
In one embodiment, step S10 of performing second pseudo tag calibration on the unlabeled data set to obtain a pseudo tag data set further includes the following sub-steps:
s201, predicting unlabeled data to obtain a plurality of prediction frames;
In the embodiment of the invention, the unlabeled data set is predicted by the second training model to obtain a plurality of prediction frames, and each prediction frame records its corresponding prediction frame confidence, prediction frame class and prediction frame binary class.
S202, setting a calibration condition for the second pseudo tag according to the prediction frame confidence and the IoU threshold;
in the embodiment of the invention, a plurality of second pseudo tags can be marked in the prediction frame by setting the marking conditions of the second pseudo tags.
S203, marking a plurality of second pseudo tags in a plurality of prediction frames based on the calibration conditions;
in the embodiment of the invention, a plurality of second pseudo tags can be provided.
Taking the unlabeled data set corresponding to the second pseudo tags as the pseudo tag data set.
In one embodiment, the calibration conditions of the second pseudo tag include:
the prediction frame confidence coefficient is larger than a third preset value, and the IOU threshold value is smaller than a fourth preset value;
in the embodiment of the present invention, the third preset value may be 0.7, and the fourth preset value may be 0.75.
or the prediction frame confidence coefficient is smaller than the third preset value and larger than a fifth preset value, and the IOU threshold value is smaller than a sixth preset value.
In the embodiment of the present invention, the third preset value may be 0.7, the fifth preset value may be 0.1, and the sixth preset value may be 0.45.
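A minimal sketch of the two calibration conditions, using the example preset values given above (0.7, 0.75, 0.1 and 0.45) as defaults; the function name and signature are hypothetical illustrations, not part of the disclosure:

```python
def passes_calibration(conf, iou, third=0.7, fourth=0.75, fifth=0.1, sixth=0.45):
    """Return True if a prediction frame qualifies as a second pseudo tag.
    Condition A: high-confidence frame with IOU below the fourth preset value.
    Condition B: medium-confidence frame with the stricter sixth-preset-value
    IOU bound."""
    if conf > third and iou < fourth:
        return True
    if fifth < conf < third and iou < sixth:
        return True
    return False
```

For example, a frame with confidence 0.8 and IOU 0.5 is calibrated under condition A, while a frame with confidence 0.3 requires IOU below 0.45.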
In one embodiment, step S20, matching the obtained prediction frames with the input second pseudo tags and performing loss function calculation after matching, further includes the following sub-steps:
step 2011, performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
in the embodiment of the invention, one candidate frame can be selected from all the prediction frames through non-maximum suppression.
Step S2012, performing IOU calculation on the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
In the embodiment of the invention, IOU calculation can be performed on the position information of the candidate frame and each second pseudo tag; according to the calculation result, the second pseudo tag with the largest IOU value is taken as the best pseudo tag, and the loss function is calculated based on the best pseudo tag.
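The candidate-frame-to-pseudo-tag matching of steps S2011-S2012 can be sketched as follows. This is a plain-Python illustration with boxes as (x1, y1, x2, y2) tuples; the non-maximum suppression step that produces the candidate frame is assumed to have already run, and the function names are hypothetical:

```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def best_pseudo_tag(candidate, pseudo_tags):
    """Pick the second pseudo tag with the largest IOU against the candidate
    frame; this tag is then used for the loss function calculation."""
    return max(pseudo_tags, key=lambda p: iou(candidate, p))
```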
In the embodiment of the present invention, the target loss function of the second target detection model may be composed of a pseudo tag training set loss function corresponding to the best pseudo tag and a labeled data set loss function, where:
the loss function of the annotated data set is shown in equation (4):
Loss_ren_det = Loss_ren_cls + Loss_ren_bbox + Loss_ren_sigmoid    (4)

where Loss_ren_det is the loss function of the marked data set, Loss_ren_cls is the classification loss value of the marked data set, Loss_ren_bbox is the regression loss value of the marked data set, and Loss_ren_sigmoid is the target (objectness) loss value of the marked data set;
the loss functions of the pseudo tag training set are shown in equations (5), (6), (7) and (8):
Loss_pre_bbox = score_wei_bbox × smoothL1(x_pre_bbox, x_wei_bbox)    (5)

Loss_pre_cls = -Σ_{c=1}^{M} weight_c_wei_label × y_c × log(p_c)    (6)

Loss_pre_sigmoid = -(y_wei × log(p) + (1 - y_wei) × log(1 - p))    (7)

Loss_wei = weight_wei_sigmoid × (Loss_pre_bbox + Loss_pre_cls + Loss_pre_sigmoid)    (8)
where Loss_pre_bbox is the regression loss function of the second pseudo tag, score_wei_bbox is the confidence coefficient of the second pseudo tag candidate frame, x_pre_bbox is the predicted target coordinates, and x_wei_bbox is the second pseudo tag coordinates;

Loss_pre_cls is the classification loss function of the second pseudo tag, M is the number of categories, weight_c_wei_label is the weight of the c-th category, y_c indicates whether the c-th predicted category is the true value (assigned 1 if true, 0 otherwise), and p_c is the predicted probability of category c;

Loss_pre_sigmoid represents the target (objectness) loss function of the second pseudo tag; y_wei is assigned 1 if the prediction is a real target and 0 otherwise; p represents the probability of being predicted as a real target;

Loss_wei is the total second pseudo tag loss function, and weight_wei_sigmoid, the probability of being a real target, is used to suppress pseudo tag noise.
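Equations (5)-(8) can be combined into a single illustrative function. All argument names are hypothetical shorthand for the quantities defined above, and the smooth-L1 regression term is passed in precomputed:

```python
import math

def pseudo_tag_loss(score_wei_bbox, smooth_l1, weights, y_onehot, p_cls,
                    y_wei, p_obj, weight_wei_sigmoid):
    """Sketch of equations (5)-(8): weighted regression, weighted
    classification, objectness, and the noise-suppressed total."""
    loss_bbox = score_wei_bbox * smooth_l1                          # eq. (5)
    loss_cls = -sum(w * y * math.log(p)                             # eq. (6)
                    for w, y, p in zip(weights, y_onehot, p_cls) if y)
    loss_sig = -(y_wei * math.log(p_obj) +                          # eq. (7)
                 (1 - y_wei) * math.log(1 - p_obj))
    return weight_wei_sigmoid * (loss_bbox + loss_cls + loss_sig)   # eq. (8)
```

The outer factor weight_wei_sigmoid down-weights the whole term for boxes that are unlikely to be real targets, which is how the pseudo tag noise reduction described above enters the loss.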
In a specific example, test data for different semi-supervised training methods are presented in Table 1 below:
TABLE 1
(The data of Table 1 are reproduced as images in the original publication and are not recoverable here.)
As can be seen from Table 1, the target detection model construction method provided by the embodiment of the invention delivers substantial improvements in AP50 detection accuracy and in the false positive rate.
The embodiment of the invention has the following beneficial effects:
according to the embodiment of the invention, a target pasting operation or an image stitching operation is performed on the unlabeled data set, and all images are traversed, so that the unlabeled data set is fused with the marked data set to obtain a new training data set. Prediction frames with higher confidence coefficients can be pasted onto the marked data set, lowering the probability that spurious targets appear in the pasted data and thus reducing false detections. Alternatively, the image stitching operation is performed according to the prediction frame confidence coefficient, so that the converged model learns correct target feature information and background feature information, further reducing false detections.
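Under the assumption that the first and second preset values act as the paste and stitch confidence thresholds (the concrete values 0.7 and 0.1 below are illustrative only, and the function name is hypothetical), the routing of an unlabeled image into the pasting or stitching operation can be sketched as:

```python
def route_unlabeled_image(conf, has_error_tag,
                          paste_thresh=0.7, stitch_thresh=0.1):
    """Decide the fusion operation for an unlabeled image: high-confidence
    prediction frames are pasted onto marked images, low-confidence ones
    are stitched, and images carrying a first or second error label are
    excluded from fusion."""
    if has_error_tag:
        return "skip"
    if conf > paste_thresh:
        return "paste"
    if conf < stitch_thresh:
        return "stitch"
    return "skip"
```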
Further, in the embodiment of the invention, multi-frame, multi-class second pseudo tags are obtained through calibration; after prediction frames are obtained through forward propagation of the third training model, they are matched with the input second pseudo tags, and the loss function is calculated after matching, so that a converged target detection model is obtained.
Referring to fig. 3, an embodiment of the present invention provides a target detection method, which includes:
s11, acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
the target detection model is constructed according to the target detection model construction method.
Referring to fig. 4, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides an object detection model construction apparatus, including:
a data set acquisition module 10 for acquiring a marked data set and an unmarked data set;
a first model training module 20, configured to perform model training to converge based on the labeled dataset, to obtain a first training model;
the first target detection model determining module 30 is configured to initialize the second training model according to the model parameters of the first training model, and train the initialized second training model according to the labeled data set and the unlabeled data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
In one embodiment, a two-class branch network for judging whether the prediction frame is a foreground is arranged in the first training model;
the first object detection model determination module 30 is further configured to:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
In one embodiment, performing a first pseudo tag calibration according to a plurality of prediction frames to obtain a pseudo tag calibration result, including:
among the prediction frames whose two-class category is not background, marking the prediction frames whose prediction frame category is background as a first error label; and marking the prediction frames whose prediction frame confidence coefficient is smaller than the first preset value and larger than the second preset value as a second error label, thereby obtaining the pseudo tag calibration result.
In one embodiment, performing a target pasting operation or an image stitching operation on an unlabeled dataset includes:
pasting, onto the marked data set, the images of the unlabeled data set whose prediction frame confidence coefficient is larger than the first preset value and which carry neither the first error label nor the second error label;

or performing the stitching operation on the images whose prediction frame confidence coefficient is smaller than the second preset value and which carry neither the first error label nor the second error label.
Referring to fig. 5, based on the same inventive concept as the above embodiment, an embodiment of the present invention provides an object detection model construction apparatus, including:
the pseudo tag calibration module 11 is configured to perform second pseudo tag calibration on the unlabeled data set according to the second training model obtained by the target detection model construction method to obtain a pseudo tag data set;
the second target detection model determining module 21 is configured to initialize a third training model according to model parameters of the second training model, and train the initialized third training model according to the labeled data set and the pseudo tag data set to obtain a second target detection model; and when training is performed each time, inputting a third training model, wherein the input of the third training model comprises at least one marked image in marked data sets and at least one pseudo tag data image in pseudo tag data sets, after forward propagation is performed to obtain a prediction frame, the obtained prediction frame is matched with an input second pseudo tag, and after matching, loss function calculation is performed.
In one embodiment, the pseudo tag calibration module 11 is further configured to:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition for the second pseudo tag according to the prediction frame confidence coefficient of the prediction frame and the IOU threshold value;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
In one embodiment, the calibration conditions of the second pseudo tag include:
the prediction frame confidence coefficient is larger than a third preset value, and the IOU threshold value is smaller than a fourth preset value;

or the prediction frame confidence coefficient is smaller than the third preset value and larger than a fifth preset value, and the IOU threshold value is smaller than a sixth preset value.
In one embodiment, the second target detection model determining module 21 is further configured to:
performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
and performing IOU calculation on the candidate frame and the second pseudo tags, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and performing loss function calculation based on the best pseudo tag.
Referring to fig. 6, an embodiment of the present invention provides a target detection apparatus, including:

the target detection module 12, configured to acquire a first image to be identified, input the first image into a target detection model, and output a target detection result of the first image;

wherein the target detection model is constructed according to the target detection model construction method described above.
An embodiment of the present invention provides a computer readable storage medium comprising a stored computer program, wherein, when the computer program runs, the device in which the computer readable storage medium is located is controlled to execute the target detection model construction method described above.
The foregoing is a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention and are intended to be comprehended within the scope of the present invention.

Claims (11)

1. A target detection model construction method, characterized by comprising the following steps:
acquiring a marked data set and an unmarked data set;
model training is carried out based on the marked data set, and a first training model is obtained;
initializing a second training model according to the model parameters of the first training model, and training the initialized second training model according to the marked data set and the unmarked data set; and when training is performed each time, performing first pseudo-label calibration on the unlabeled data set according to the current second training model, performing target pasting operation or image splicing operation on the unlabeled data set according to a pseudo-label calibration result so as to fuse the unlabeled data set with the labeled data set, and performing local training on the current second training model based on the fused data set to obtain a first target detection model.
2. The method for constructing an object detection model according to claim 1, wherein,
a two-class branch network for judging whether the prediction frame is a foreground or not is arranged in the first training model;
performing first pseudo tag calibration on the unlabeled data set according to the current second training model, including:
and carrying out target prediction on the unlabeled data set according to the current second training model to obtain a plurality of prediction frames, and carrying out first pseudo tag calibration according to the plurality of prediction frames to obtain a pseudo tag calibration result, wherein each prediction frame comprises a prediction frame category, a prediction frame confidence coefficient and a prediction frame binary category.
3. The method for constructing a target detection model according to claim 2, wherein the step of performing the first pseudo tag calibration according to the plurality of prediction frames to obtain the pseudo tag calibration result comprises:
marking, among the prediction frames whose two-class category is not background, the prediction frames whose prediction frame category is background as a first error label; and marking the prediction frames whose prediction frame confidence coefficient is smaller than the first preset value and larger than the second preset value as a second error label, thereby obtaining the pseudo tag calibration result.
4. The method for constructing an object detection model according to claim 3, wherein performing an object pasting operation or an image stitching operation on the unlabeled dataset comprises:
pasting images which do not have the first error label and the second error label on the marked data set, wherein the confidence coefficient of the predicted frame of the unmarked data set is larger than that of a predicted frame of a first preset value;
or performing splicing operation on the images with the confidence coefficient of the predicted frame smaller than a second preset value and without the first error label and the second error label.
5. A target detection model construction method, characterized by comprising the following steps:
performing second pseudo tag calibration on the unlabeled data set according to the second training model obtained by the target detection model construction method according to any one of claims 1-4, to obtain a pseudo tag data set;
initializing a third training model according to model parameters of the second training model, and training the initialized third training model according to the marked data set and the pseudo tag data set to obtain a second target detection model; and when training is performed each time, inputting a third training model, wherein the input of the third training model comprises at least one marked image in marked data sets and at least one pseudo tag data image in pseudo tag data sets, after forward propagation is performed to obtain a prediction frame, the obtained prediction frame is matched with an input second pseudo tag, and after matching, loss function calculation is performed.
6. The method of claim 5, wherein performing a second pseudo tag calibration on the unlabeled dataset to obtain a pseudo tag dataset comprises:
predicting unlabeled data to obtain a plurality of prediction frames;
setting a calibration condition of a second pseudo tag according to the prediction frame confidence coefficient of the prediction frame and an IOU threshold value;
marking a plurality of second pseudo tags in a plurality of prediction frames based on the marking condition;
and taking the unlabeled data set corresponding to the second pseudo tag as a pseudo tag data set.
7. The method of claim 6, wherein the calibration conditions of the second pseudo tag comprise:
the confidence of the prediction frame is larger than a third preset value, and the IOU threshold is smaller than a fourth preset value;
the prediction frame confidence is less than a third preset value, greater than a fifth preset value, and the IOU threshold is less than a sixth preset value.
8. The method of claim 5, wherein matching the obtained prediction box with the second pseudo tag and performing a loss function calculation after the matching, comprises:
performing non-maximum operation according to the confidence coefficient of the predicted frame to obtain a candidate frame in the predicted frame;
and carrying out IOU calculation on the candidate frame and the second pseudo tag, taking the second pseudo tag with the largest IOU value as the best pseudo tag, and carrying out loss function calculation based on the best pseudo tag.
9. A method of detecting an object, comprising:
acquiring a first image to be identified, inputting the first image into a target detection model, and outputting a target detection result of the first image;
wherein the object detection model is constructed according to the object detection model construction method according to any one of claims 1 to 4 or according to any one of claims 5 to 8.
10. An object detection apparatus, comprising:
the target detection module is used for acquiring a first image to be identified, inputting the first image into a target detection model and outputting a target detection result of the first image;
wherein the object detection model is constructed according to the object detection model construction method according to any one of claims 1 to 4 or according to any one of claims 5 to 8.
11. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored computer program, wherein the apparatus in which the computer-readable storage medium is located is controlled to perform the object detection model construction method according to any one of claims 1 to 4 when the computer program is run, or the apparatus in which the computer-readable storage medium is located is controlled to perform the object detection model construction method according to any one of claims 5 to 8 when the computer program is run.
CN202211708845.7A 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device Pending CN116342851A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211708845.7A CN116342851A (en) 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device

Publications (1)

Publication Number Publication Date
CN116342851A true CN116342851A (en) 2023-06-27

Family

ID=86886482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211708845.7A Pending CN116342851A (en) 2022-12-29 2022-12-29 Target detection model construction method, target detection method and device

Country Status (1)

Country Link
CN (1) CN116342851A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117907970A (en) * 2024-03-19 2024-04-19 清华大学苏州汽车研究院(相城) Method and device for generating target detection model of laser radar and method and device for detecting target
CN117907970B (en) * 2024-03-19 2024-05-28 清华大学苏州汽车研究院(相城) Method and device for generating target detection model of laser radar and method and device for detecting target

Similar Documents

Publication Publication Date Title
CN108197670B (en) Pseudo label generation model training method and device and pseudo label generation method and device
US10417524B2 (en) Deep active learning method for civil infrastructure defect detection
CN111310808B (en) Training method and device for picture recognition model, computer system and storage medium
CN109902202B (en) Video classification method and device
CN111695385A (en) Text recognition method, device and equipment
CN113076872B (en) Intelligent test paper correcting method
CN114241505B (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN116342851A (en) Target detection model construction method, target detection method and device
CN110968725A (en) Image content description information generation method, electronic device, and storage medium
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN116091836A (en) Multi-mode visual language understanding and positioning method, device, terminal and medium
CN112270334B (en) Few-sample image classification method and system based on abnormal point exposure
US11948387B2 (en) Optimized policy-based active learning for content detection
CN114328942A (en) Relationship extraction method, apparatus, device, storage medium and computer program product
CN113688757A (en) SAR image recognition method and device and storage medium
CN109101984B (en) Image identification method and device based on convolutional neural network
CN113033525B (en) Training method of image recognition network, electronic device and storage medium
CN116681961A (en) Weak supervision target detection method based on semi-supervision method and noise processing
CN115620083A (en) Model training method, face image quality evaluation method, device and medium
CN115457305A (en) Semi-supervised target detection method and system
CN114637877A (en) Labeling method, electronic device and storage medium
CN114663751A (en) Power transmission line defect identification method and system based on incremental learning technology
CN116012656B (en) Sample image generation method and image processing model training method and device
EP4083870A1 (en) Method and system for classifying data
CN113554127B (en) Image recognition method, device and medium based on hybrid model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination