WO2023088174A1 - Target detection method and device - Google Patents
Target detection method and device
- Publication number
- WO2023088174A1 (application PCT/CN2022/131320)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pseudo
- label
- predicted
- initial
- management model
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Definitions
- the present disclosure relates to the technical field of artificial intelligence, and in particular to a target detection method and device.
- Machine learning is a way to realize artificial intelligence. It is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. Machine learning is used to study how computers simulate or implement human learning behaviors to acquire new knowledge or skills, and reorganize existing knowledge structures to continuously improve their own performance. Machine learning pays more attention to algorithm design, enabling computers to automatically learn laws from data and use the laws to predict unknown data. Machine learning has been used in a wide range of applications, such as deep learning, data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, speech recognition, and handwriting recognition.
- A training data set can be constructed, which includes a large amount of labeled data (such as image data, that is, images with labeled boxes and labeled categories).
- A machine learning model is then trained based on the training data set.
- The trained machine learning model has a target detection function and can be used to perform target detection on the data to be detected, for example, to detect the target frame in the data to be detected and identify the target category.
- Target categories include, for example, a vehicle category, an animal category, and an electronic product category.
- the present disclosure provides a target detection method, the method comprising:
- the target management model is used for target detection on the data to be detected.
- the present disclosure provides a target detection device, the device comprising:
- An acquisition module configured to acquire an initial management model and an initial learning model, add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels;
- a determination module configured to input the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determine a first predicted label and a first predicted frame based on the first predicted value corresponding to the high-quality pseudo-label, and determine a second predicted label and a second predicted frame based on the first predicted value corresponding to the uncertain pseudo-label; input the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determine a third predicted label and a third predicted frame based on the second predicted value corresponding to the uncertain pseudo-label;
- a processing module configured to train the initial management model based on the first predicted label, the first predicted frame, the second predicted label, the second predicted frame, the third predicted label and the third predicted frame to obtain a target management model, wherein the target management model is used for target detection on the data to be detected.
- the present disclosure provides a target detection device, including: a processor and a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions that can be executed by the processor;
- the processor is configured to execute machine-executable instructions, so as to implement the object detection method in the above examples.
- the embodiment of the present disclosure also provides a machine-readable storage medium, on which several computer instructions are stored, and when the computer instructions are executed by a processor, the object detection method disclosed in the above examples of the present disclosure can be implemented.
- The target management model can be obtained by training on unlabeled data, that is, it can be trained with a small amount of labeled data and a large amount of unlabeled data, which avoids the need to obtain a large amount of labeled data.
- A semi-supervised target detection training method is proposed. Drawing on the idea from unsupervised representation learning that the features of the same image should remain consistent after different augmentations, it establishes consistency constraints on the regression boxes and classifications of unlabeled data under different augmentations, and combines pseudo-labels with these consistency constraints: for reliable target frames, the pseudo-label is used as the real category; for uncertain targets, a consistency comparison loss is established between the different prediction results (or features).
- FIG. 1 is a schematic flow diagram of a target detection method in an embodiment of the present disclosure
- FIG. 2 is a schematic flow diagram of a target detection method in another embodiment of the present disclosure.
- FIG. 3 is a schematic structural diagram of a target detection device in an embodiment of the present disclosure.
- Fig. 4 is a hardware structural diagram of an object detection device in an embodiment of the present disclosure.
- Although the present disclosure may use the terms first, second, third, etc. to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Furthermore, depending on the context, the word "if" could be interpreted as "at", "when", or "in response to a determination".
- An object detection method is proposed in an embodiment of the present disclosure, which can be applied to an object detection device.
- the object detection device can be any type of device, such as a server, a terminal device, a management device, etc., without limitation.
- Referring to FIG. 1, which is a schematic flow chart of the target detection method: the target detection method of this embodiment can be a semi-supervised target detection method, and the method can include:
- Step 101 Obtain an initial management model and an initial learning model, and add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels.
- obtaining an initial management model and an initial learning model may include, but is not limited to: using labeled data to train to obtain a baseline model, and generating an initial management model and an initial learning model based on the baseline model.
- the network structure of the initial management model and the network structure of the baseline model may be the same, and the network parameters of the initial management model and the network parameters of the baseline model may be the same or different.
- the network structure of the initial learning model is the same as that of the baseline model, and the network parameters of the initial learning model are the same or different from those of the baseline model.
- the network structure of the initial management model is the same as that of the initial learning model, and the network parameters of the initial management model are the same or different from those of the initial learning model.
- Adding pseudo-labels to unlabeled data based on the initial management model, and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels, may include but is not limited to: for each piece of unlabeled data, inputting the unlabeled data into the initial management model to obtain the pseudo-label corresponding to the unlabeled data and the probability value corresponding to the pseudo-label.
- Among all pseudo-labels corresponding to a category, K pseudo-labels are used as the high-quality pseudo-labels corresponding to the category, and the remaining pseudo-labels are used as the uncertain pseudo-labels corresponding to the category.
- K is a positive integer.
- Step 102 Input the unlabeled data into the initial learning model to obtain the first predicted value corresponding to the unlabeled data; determine the first predicted label and the first predicted frame based on the first predicted value corresponding to the high-quality pseudo-label, and determine the second predicted label and the second predicted frame based on the first predicted value corresponding to the uncertain pseudo-label.
- the first data augmentation may be performed on the unlabeled data, and the unlabeled data after the first data augmentation is input to the initial learning model to obtain the first predicted value corresponding to the unlabeled data.
- the second data augmentation may be performed on the unlabeled data, and the unlabeled data augmented by the second data is input into the initial management model to obtain the second predicted value corresponding to the unlabeled data.
- the manner of the first data augmentation may be different from the manner of the second data augmentation.
- Step 103 Train the initial management model based on the first prediction label, the first prediction frame, the second prediction label, the second prediction frame, the third prediction label and the third prediction frame to obtain a target management model, wherein the target management model is used for target detection on the data to be detected.
- the following steps 1031-1032 can be used to train the initial management model to obtain the target management model:
- Step 1031 Determine a first loss value based on the first predicted label and the first predicted frame, and determine a second loss value based on the second predicted label, the second predicted frame, the third predicted label, and the third predicted frame.
- Determining the second loss value based on the second prediction label, the second prediction frame, the third prediction label, and the third prediction frame may include but is not limited to: if the second prediction label includes C first probability values corresponding to the C categories that the initial learning model supports detecting, and the third prediction label includes C second probability values corresponding to the C categories that the initial management model supports detecting, then the category loss value of the consistency constraint is determined based on the C first probability values and the C second probability values, where C can be a positive integer greater than 1.
- a first probability distribution of coordinate point offsets corresponding to the second prediction frame is determined, and a second probability distribution of coordinate point offsets corresponding to the third prediction frame is determined.
- determine the coordinate frame loss value of the consistency constraint based on the first probability distribution and the second probability distribution.
- a second loss value is determined based on the category loss value and the coordinate frame loss value.
- Determining the coordinate frame loss value of the consistency constraint based on the first probability distribution and the second probability distribution may include but is not limited to: determining the relative entropy between the first probability distribution and the second probability distribution, and determining the coordinate frame loss value of the consistency constraint based on this relative entropy.
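The relative-entropy step can be illustrated with a minimal sketch. It assumes each coordinate point offset is described by a discrete distribution over the same bins, and it takes the management (teacher) model's distribution as the reference; the direction of the relative entropy, the averaging over offsets, and the function names are assumptions made for illustration, not details stated in the text.

```python
import math


def relative_entropy(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions defined over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def frame_consistency_loss(learning_dists, management_dists):
    """Average relative entropy over the coordinate point offsets of one frame.

    learning_dists: distributions from the second prediction frame (learning model).
    management_dists: distributions from the third prediction frame (management model).
    Assumption: the management model's distribution is the reference (first argument).
    """
    terms = [relative_entropy(m, s)
             for s, m in zip(learning_dists, management_dists)]
    return sum(terms) / len(terms)
```

When the two models agree exactly the loss is zero, and it grows as their offset distributions diverge, which is the behavior a consistency constraint needs.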
- Step 1032 Adjust the initial management model based on the first loss value and the second loss value to obtain a target management model; wherein, the target management model is used for target detection on the data to be detected.
- labeled data may also be input into the initial learning model to obtain a third predicted value corresponding to the labeled data, and determine a fourth predicted label and a fourth predicted frame based on the third predicted value, And determine a third loss value based on the fourth predicted label and the fourth predicted frame.
- the initial management model can be adjusted based on the first loss value, the second loss value and the third loss value to obtain a target management model.
- Adjusting the initial management model based on the first loss value and the second loss value to obtain the target management model may include but is not limited to: adjusting the network parameters of the initial learning model based on the first loss value and the second loss value to obtain an adjusted learning model; adjusting the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model; if the adjusted management model has not converged, determining the adjusted learning model as the initial learning model and the adjusted management model as the initial management model, and returning to the operation of adding pseudo-labels to unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels (i.e. step 101); if the adjusted management model has converged, determining the adjusted management model as the target management model.
- Adjusting the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain the adjusted management model may include but is not limited to: determining a parameter correction value for the network parameters based on the network parameters of the adjusted learning model and a configured proportional coefficient, and adjusting the network parameters of the initial management model based on the parameter correction value to obtain the adjusted management model.
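One common way to realize such an update is an exponential moving average of the learning model's parameters. The sketch below assumes that form; the text only says that a parameter correction value is derived from the adjusted learning model and a configured proportional coefficient, so the EMA rule, the default coefficient, and the function name are illustrative assumptions.

```python
def update_management_params(management_params, learning_params, coefficient=0.99):
    """Move each management-model parameter toward the learning model's value.

    Assumed rule (exponential moving average):
        correction = (1 - coefficient) * (learning - management)
        new_management = management + correction
    Parameters are plain floats keyed by name to keep the sketch self-contained.
    """
    if not 0.0 <= coefficient <= 1.0:
        raise ValueError("coefficient must lie in [0, 1]")
    updated = {}
    for name, management_value in management_params.items():
        learning_value = learning_params[name]
        correction = (1.0 - coefficient) * (learning_value - management_value)
        updated[name] = management_value + correction
    return updated
```

With a coefficient close to 1 the management model changes slowly, which tends to keep its pseudo-labels stable between iterations.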
- The target management model can be obtained by training on unlabeled data, that is, it can be trained with a small amount of labeled data and a large amount of unlabeled data. This avoids the human resources consumed in obtaining a large amount of labeled data, reduces the workload of labeling operations, and yields a target management model with relatively good performance and high reliability. Because pseudo-labels are used effectively, robustness to noisy samples during training is improved, and the target management model shows a large improvement over the baseline model.
- A semi-supervised target detection training method is proposed. Drawing on the idea from unsupervised representation learning that the features of the same image should remain consistent after different augmentations, it establishes consistency constraints on the regression frames and classifications of unlabeled data under different augmentations, and combines pseudo-labels with these consistency constraints: for reliable target frames, the pseudo-label is used as the real category; for uncertain targets, a consistency comparison loss is established between the different prediction results (or features).
- An object detection method is proposed in the embodiment of the present disclosure, which is a semi-supervised object detection method based on consistency constraints.
- Part of the labeled data can be combined with large-scale unlabeled data to train the model and achieve performance close to that of training on fully labeled data.
- Fig. 2 is a schematic flow chart of the target detection method, as shown in Fig. 2, the method may include:
- Step 201 using labeled data training to obtain a baseline model.
- A training data set can be constructed in advance, and the training data set can include a plurality of labeled data (such as labeled images). Each labeled data corresponds to calibration information, which includes but is not limited to the calibration frame (for example, when the calibration frame is a rectangular calibration frame, it can be the coordinates of the four vertices of the rectangular calibration frame) and the calibration category (i.e., the category of the target object in the calibration frame).
- An initial network model can be obtained in advance, and the initial network model can be a machine learning model, such as a machine learning model based on a deep learning algorithm or a machine learning model based on a neural network. The type of the machine learning model is not limited.
- the structure of the initial network model is not limited in this embodiment.
- the initial network model can realize object detection.
- the initial network model can be trained.
- The trained initial network model is called the baseline model. That is to say, the labeled data in the training data set can be used to train a baseline model.
- the labeled data is input to the initial network model, and the initial network model processes the labeled data to obtain the prediction frame and prediction label corresponding to the labeled data.
- the prediction frame is used to indicate the location of the target object, such as four vertex coordinates.
- the predicted label is used to represent the category of the target object, such as category 1. If the initial network model supports the detection of target objects of category 1 and target objects of category 2, when the predicted label is 0, it indicates that the category of the target object is category 1, and when the predicted label is 1, it indicates that the category of the target object is category 2.
- The category loss value can be determined based on the predicted label corresponding to the labeled data and the calibration category corresponding to the labeled data (that is, the real category), and the coordinate frame loss value can be determined based on the prediction frame corresponding to the labeled data and the calibration frame corresponding to the labeled data (that is, the real frame). Then, the target loss value of the initial network model is determined based on the category loss value and the coordinate frame loss value. For example, the following formula (1) can be used to determine the target loss value of the initial network model:
- L represents the target loss value of the initial network model
- L_loc represents the coordinate frame loss value
- L_cls represents the category loss value
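One form of formula (1) consistent with these three definitions is an unweighted sum of the two terms. The absence of weighting coefficients is an assumption made here for illustration; a weighted combination would fit the same definitions.

```latex
L = L_{loc} + L_{cls} \tag{1}
```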
- For the category loss value L_cls: if the predicted label matches the calibration category (for example, they are the same), the category loss value is smaller (for example, the category loss value is the minimum loss value). If the predicted label does not match the calibration category (for example, they are not the same), the category loss value is larger (for example, the category loss value is the maximum loss value).
- The above is just an example of determining the category loss value, and is not limited thereto.
- For the coordinate frame loss value L_loc: if the prediction frame matches the calibration frame (for example, the four vertex coordinates of the prediction frame are the same as the four vertex coordinates of the calibration frame), the coordinate frame loss value is small (for example, the coordinate frame loss value is the minimum loss value). If the prediction frame does not match the calibration frame (for example, the four vertex coordinates of the prediction frame differ from the four vertex coordinates of the calibration frame), the closeness between the prediction frame (such as its four vertex coordinates) and the calibration frame (such as its four vertex coordinates) is determined. The closer the prediction frame is to the calibration frame, the smaller the coordinate frame loss value; the greater the difference between the prediction frame and the calibration frame, the greater the coordinate frame loss value.
- Each vertex coordinate of the prediction frame corresponds to a probability distribution: probability distribution a1 for the vertex coordinates in the upper left corner, a2 for the upper right corner, a3 for the lower right corner, and a4 for the lower left corner of the prediction frame.
- Each vertex coordinate of the calibration frame likewise corresponds to a probability distribution: probability distribution b1 for the vertex coordinates in the upper left corner, b2 for the upper right corner, b3 for the lower right corner, and b4 for the lower left corner of the calibration frame.
- the probability distribution can be expressed by the mean and variance.
- The probability distribution corresponding to a vertex coordinate x (such as a vertex coordinate of the prediction frame or of the calibration frame) is expressed as N(μ_tx, Σ_tx), where μ_tx is the mean and Σ_tx is the variance.
- Based on these probability distributions, the coordinate frame loss value between the prediction frame and the calibration frame can be calculated. For example, a negative log likelihood loss value is calculated for each pair of corresponding distributions: a1 and b1, a2 and b2, a3 and b3, and a4 and b4.
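As a rough illustration of a negative log likelihood loss, the sketch below assumes the prediction for each scalar coordinate is a Gaussian with a mean and variance, and that the calibration coordinate is treated as an observed point. Both the Gaussian form of the likelihood and the summation over coordinates are assumptions for illustration; the text does not specify how the per-vertex losses are combined.

```python
import math


def gaussian_nll(target, mean, variance):
    """Negative log likelihood of `target` under a Gaussian N(mean, variance)."""
    if variance <= 0:
        raise ValueError("variance must be positive")
    return (0.5 * math.log(2.0 * math.pi * variance)
            + (target - mean) ** 2 / (2.0 * variance))


def frame_nll(calibration_coords, predicted_means, predicted_variances):
    """Sum the per-coordinate NLL over all scalar coordinates of one frame."""
    return sum(gaussian_nll(t, m, v)
               for t, m, v in zip(calibration_coords,
                                  predicted_means,
                                  predicted_variances))
```

The loss is smallest when the predicted mean coincides with the calibration coordinate, and a large predicted variance softens the penalty for a miss at the cost of a larger log term.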
- The network parameters of the initial network model can be adjusted based on the target loss value, for example by using the gradient descent method; this embodiment does not limit the adjustment process.
- The adjusted network model is then used as the initial network model, and the operation of inputting labeled data into the initial network model is performed again, and so on, until the initial network model has converged (for example, the number of iterations of the initial network model reaches a number threshold, or the target loss value is less than a loss value threshold). The converged initial network model is used as the baseline model. At this point, a baseline model has been obtained by training with labeled data.
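The iterate-until-converged loop with its two stopping conditions can be sketched on a toy one-parameter problem. The quadratic loss, learning rate, and thresholds in the usage note are hypothetical stand-ins for the real network and its target loss value.

```python
def train_until_converged(param, grad_fn, loss_fn,
                          lr=0.1, max_iters=1000, loss_threshold=1e-6):
    """Gradient descent that stops on either convergence condition.

    Stops when the loss falls below `loss_threshold` or when the number of
    iterations reaches `max_iters`. Returns the parameter and iterations used.
    """
    for iteration in range(1, max_iters + 1):
        param = param - lr * grad_fn(param)   # adjust the parameter
        if loss_fn(param) < loss_threshold:   # loss-value threshold reached
            return param, iteration
    return param, max_iters                   # iteration threshold reached
```

For example, `train_until_converged(0.0, lambda w: 2.0 * (w - 3.0), lambda w: (w - 3.0) ** 2)` drives the parameter toward 3 and stops well before the iteration limit.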
- Step 202 generating an initial management model and an initial learning model based on the baseline model.
- An initial management model (also called an initial teacher model) can be generated based on the baseline model. The network structure of the initial management model can be the same as that of the baseline model, and the network parameters of the initial management model can be the same as or different from those of the baseline model.
- the baseline model can be directly used as the initial management model, or the network parameters of the baseline model can be adjusted, and the baseline model with adjusted network parameters can be used as the initial management model.
- An initial learning model (also called an initial student model) can be generated based on the baseline model. The network structure of the initial learning model can be the same as that of the baseline model, and the network parameters of the initial learning model can be the same as or different from those of the baseline model.
- the baseline model can be directly used as the initial learning model, or the network parameters of the baseline model can be adjusted, and the baseline model with adjusted network parameters can be used as the initial learning model.
- Step 203 Add pseudo-labels to the unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels. For example, determine the pseudo-label corresponding to the unlabeled data based on the initial management model, and divide the pseudo-label into high-quality pseudo-label or uncertain pseudo-label.
- a sample data set may be pre-built, and the sample data set may include multiple unlabeled data (such as unlabeled images), that is, add unlabeled data to the sample data set.
- the unlabeled data has no calibration information, that is, there is no corresponding calibration frame and calibration category.
- The unlabeled data can be input to the initial management model, and the initial management model processes the unlabeled data to obtain the prediction frame corresponding to the unlabeled data, the prediction label corresponding to that prediction frame, and the probability value corresponding to the prediction label (that is, the probability that the target object in the prediction frame belongs to the predicted label).
- For example, assume the initial management model supports target detection of category 1, category 2, and category 3.
- After the initial management model processes the unlabeled data, the prediction frame and the probability vector corresponding to the prediction frame can be obtained.
- For example, the probability vector can be [0.9, 0.06, 0.04]; based on this probability vector, the predicted label corresponding to the prediction frame is category 1, and the probability value corresponding to the predicted label is 0.9.
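Reading the predicted label off the probability vector amounts to taking the entry with the largest probability. A minimal sketch, with a hypothetical function name and category names supplied by the caller:

```python
def predicted_label(probabilities, categories):
    """Return the most probable category and its probability value."""
    if len(probabilities) != len(categories):
        raise ValueError("one probability is needed per category")
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return categories[best], probabilities[best]


label, prob = predicted_label([0.9, 0.06, 0.04],
                              ["category 1", "category 2", "category 3"])
# label is "category 1" and prob is 0.9, matching the example above
```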
- The unlabeled data may correspond to P prediction frames, where P is a positive integer. Each prediction frame corresponds to a prediction label, that is, P prediction frames correspond to P prediction labels, and each prediction label corresponds to a probability value. In addition, each prediction frame can also correspond to a probability value.
- For example, the following are obtained: prediction frame 1 and the probability value of prediction frame 1, prediction label 1 corresponding to prediction frame 1 and the probability value of prediction label 1, prediction frame 2 and the probability value of prediction frame 2, and prediction label 2 corresponding to prediction frame 2 and the probability value of prediction label 2.
- The prediction frame corresponding to the unlabeled data and the prediction label corresponding to the prediction frame can be called a pseudo-label, and the probability value corresponding to the prediction label and the probability value corresponding to the prediction frame can be called the probability value corresponding to the pseudo-label.
- For a given category, all pseudo-labels corresponding to the category are sorted based on their probability values. For example, based on the probability value corresponding to the predicted label in each pseudo-label, all pseudo-labels corresponding to the category are sorted from large to small, or from small to large. As another example, the probability product value (or probability average value) of the probability value corresponding to the predicted label and the probability value corresponding to the prediction frame is calculated for each pseudo-label, and all pseudo-labels corresponding to the category are sorted in descending or ascending order of the probability product value. Of course, other ways of sorting may also be used, and there is no limitation on this.
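The product-based and average-based orderings can be sketched as follows. The dictionary keys `label_prob` and `frame_prob` and the function names are hypothetical, chosen only for this illustration.

```python
def pseudo_label_score(label_prob, frame_prob, mode="product"):
    """Combine the label probability and the frame probability into one score."""
    if mode == "product":
        return label_prob * frame_prob
    if mode == "average":
        return (label_prob + frame_prob) / 2.0
    raise ValueError("mode must be 'product' or 'average'")


def sort_pseudo_labels(pseudo_labels, mode="product", descending=True):
    """Sort the pseudo-labels of one category by their combined score."""
    return sorted(
        pseudo_labels,
        key=lambda p: pseudo_label_score(p["label_prob"], p["frame_prob"], mode),
        reverse=descending,
    )
```

Using the product penalizes a pseudo-label when either the category or the frame is doubtful, whereas the average lets a confident frame partly offset an uncertain category.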
- Based on the sorting result, the K pseudo-labels with the largest probability values can be selected as the high-quality pseudo-labels corresponding to the category, and the remaining pseudo-labels among all pseudo-labels corresponding to the category can be used as the uncertain pseudo-labels corresponding to the category. For example, if the probability values (or probability product values) corresponding to the predicted labels are sorted from large to small, the top K pseudo-labels can be selected as high-quality pseudo-labels, and the remaining pseudo-labels can be used as uncertain pseudo-labels.
- If the values are sorted from small to large, the last K pseudo-labels in the ranking can be selected as high-quality pseudo-labels, and the remaining pseudo-labels can be used as uncertain pseudo-labels.
- For example, for the pseudo-labels c1-c100 corresponding to category 1, sort them in descending order of the probability value corresponding to the predicted label (the probability value corresponding to category 1), select the top K pseudo-labels (such as c1-c10) as high-quality pseudo-labels, and use the remaining pseudo-labels (such as c11-c100) as uncertain pseudo-labels.
- For the pseudo-labels c101-c300 corresponding to category 2, sort them in descending order of the probability value corresponding to the predicted label (the probability value corresponding to category 2), select the top K pseudo-labels as high-quality pseudo-labels, and use the remaining pseudo-labels as uncertain pseudo-labels, and so on. In this way, the high-quality pseudo-labels and uncertain pseudo-labels corresponding to each category can be obtained.
- K can be configured based on experience, or can be determined based on the total number of pseudo-labels corresponding to the category.
- For example, K is equal to the total number multiplied by M.
- M is a value between 0 and 1 that can be configured based on experience, such as 20% or 30%.
- Taking M equal to 20% as an example:
- Category 1 corresponds to 100 pseudo-labels, so the value of K is 20; that is, the top 20 pseudo-labels are selected from all pseudo-labels corresponding to category 1 as high-quality pseudo-labels, and the remaining 80 pseudo-labels are used as uncertain pseudo-labels.
- Category 2 corresponds to 200 pseudo-labels, so the value of K is 40; that is, the top 40 pseudo-labels are selected from all pseudo-labels corresponding to category 2 as high-quality pseudo-labels, and the remaining 160 pseudo-labels are used as uncertain pseudo-labels, and so on.
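- The per-category selection described above can be sketched as follows. This is a minimal illustration only, assuming each pseudo-label is stored as a (label id, category, probability value) tuple and that K is rounded down from total number × M; the names `split_pseudo_labels` and `ratio_m` are hypothetical and do not come from the source.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: (label id, category, probability value).
PseudoLabel = Tuple[str, int, float]


def split_pseudo_labels(
    pseudo_labels: List[PseudoLabel], ratio_m: float = 0.2
) -> Tuple[List[PseudoLabel], List[PseudoLabel]]:
    """Split pseudo-labels into (high-quality, uncertain), category by category."""
    by_category: Dict[int, List[PseudoLabel]] = defaultdict(list)
    for item in pseudo_labels:
        by_category[item[1]].append(item)

    high_quality: List[PseudoLabel] = []
    uncertain: List[PseudoLabel] = []
    for items in by_category.values():
        # Sort from large to small by probability value.
        ranked = sorted(items, key=lambda it: it[2], reverse=True)
        # K = total number * M (rounding down is an assumption).
        k = int(len(ranked) * ratio_m)
        high_quality.extend(ranked[:k])
        uncertain.extend(ranked[k:])
    return high_quality, uncertain


# 100 pseudo-labels for category 1 and 200 for category 2, as in the example above.
labels = [(f"c{i}", 1, 1.0 - i / 1000) for i in range(1, 101)]
labels += [(f"c{i}", 2, 1.0 - i / 1000) for i in range(101, 301)]
high, unsure = split_pseudo_labels(labels, ratio_m=0.2)
print(len(high), len(unsure))
```

- With M = 20%, this selects 20 of the 100 pseudo-labels of category 1 and 40 of the 200 pseudo-labels of category 2, matching the worked numbers above.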
- all pseudo-labels corresponding to all unlabeled data in the sample dataset can be divided into high-quality pseudo-labels and uncertain pseudo-labels.
- High-quality pseudo-labels can be used as reliable labels.
- high-quality pseudo-labels can be used as labeled data, that is, high-quality pseudo-labels correspond to calibration boxes and calibration categories.
- the predicted frame output by the initial management model is used as the calibration frame of the high-quality pseudo-label
- the predicted label output by the initial management model is used as the calibration category of the high-quality pseudo-label.
- labeled data, high-quality pseudo-labels and uncertain pseudo-labels can be obtained.
- The unlabeled data can include unlabeled data with high-quality pseudo-labels and unlabeled data with uncertain pseudo-labels
- The ratio of labeled data to unlabeled data can be m:n; that is, the ratio of the total number of labeled data to the total number of unlabeled data is m:n. The ratio m:n can be configured based on experience, and there is no restriction on its value, such as 1:1, 1:2, or 2:1.
- Step 204 Perform data augmentation on the labeled data, input the augmented labeled data into the initial learning model to obtain the third predicted value corresponding to the labeled data, determine a fourth predicted label and a fourth prediction frame based on the third predicted value, and determine a third loss value based on the fourth predicted label and the fourth prediction frame.
- space transformation and/or color transformation can be used to perform data augmentation on the labeled data. This process is not limited, and the labeled data after data augmentation can be obtained.
- the labeled data after data augmentation can be input to the initial learning model, and the initial learning model processes the labeled data to obtain the third predicted value corresponding to the labeled data.
- The third predicted value can include a prediction frame and a prediction label.
- For ease of distinction, this prediction frame is called the fourth prediction frame and this prediction label is called the fourth prediction label; that is, the fourth prediction label and the fourth prediction frame can be determined based on the third predicted value.
- the fourth prediction frame is used to represent the location of the target object, such as four vertex coordinates
- the fourth prediction label is used to represent the category of the target object, such as category 1, category 2, or category 3.
- the labeled data corresponds to calibration information, such as a calibration frame and a calibration category.
- The category loss value can be determined based on the fourth prediction label corresponding to the labeled data and the calibration category corresponding to the labeled data, and the coordinate frame loss value can be determined based on the fourth prediction frame corresponding to the labeled data and the calibration frame corresponding to the labeled data.
- A third loss value is determined based on the category loss value and the coordinate frame loss value. For example, the third loss value is determined using formula (2), in which:
- $L$ represents the third loss value
- $L_{loc}$ represents the coordinate frame loss value
- $L_{cls}$ represents the category loss value.
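- Formula (2) itself does not survive in this text. One form consistent with the three symbols defined here, assuming an unweighted sum of the two terms (a weighted sum would fit the description equally well), is:

```latex
L = L_{loc} + L_{cls} \tag{2}
```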
- For the category loss value $L_{cls}$: if the fourth predicted label matches the calibration category, the category loss value is smaller, for example the minimum loss value; if the fourth predicted label does not match the calibration category, the category loss value is larger, for example the maximum loss value.
- For the coordinate frame loss value $L_{loc}$: if the fourth prediction frame matches the calibration frame, the coordinate frame loss value is smaller, for example the minimum loss value; if the fourth prediction frame does not match the calibration frame, the closeness of the four vertex coordinates of the fourth prediction frame to the four vertex coordinates of the calibration frame can be determined. The closer the fourth prediction frame is to the calibration frame, the smaller the coordinate frame loss value; the greater the difference between them, the larger the coordinate frame loss value.
- each vertex coordinate of the fourth prediction frame corresponds to a probability distribution
- Likewise, the probability distribution of the coordinate point offsets corresponding to the four vertex coordinates of the calibration frame is determined; that is, each vertex coordinate of the calibration frame corresponds to a probability distribution.
- the probability distribution is represented by the mean and variance.
- For example, the probability distribution corresponding to the vertex coordinate x (such as a vertex coordinate of the prediction frame or of the calibration frame) is $N(\mu_{tx}, \Sigma_{tx})$, where $\mu_{tx}$ is the mean and $\Sigma_{tx}$ is the variance.
- the coordinate frame loss value between the fourth prediction frame and the calibration frame can be calculated.
- The negative log-likelihood loss value can be calculated based on the probability distribution corresponding to the fourth prediction frame and the probability distribution corresponding to the calibration frame, and the coordinate frame loss value can be determined based on the negative log-likelihood loss value.
- the category loss value and the coordinate frame loss value corresponding to the labeled data can be obtained, and the third loss value corresponding to the labeled data can be determined based on the category loss value and the coordinate frame loss value.
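- The negative log-likelihood computation can be illustrated as below. This is a sketch under stated assumptions, not the exact expression of the source: each predicted coordinate is modelled as a univariate Gaussian (mean, variance), the calibration frame contributes only its coordinate value as the observation, and the per-coordinate terms are averaged. The names `gaussian_nll` and `box_nll_loss` are hypothetical.

```python
import math
from typing import Sequence


def gaussian_nll(mean: float, variance: float, target: float) -> float:
    """Negative log-likelihood of `target` under N(mean, variance)."""
    return 0.5 * (math.log(2.0 * math.pi * variance) + (target - mean) ** 2 / variance)


def box_nll_loss(
    pred_means: Sequence[float],
    pred_variances: Sequence[float],
    calib_coords: Sequence[float],
) -> float:
    """Average NLL over the coordinates of one frame (averaging is an assumption)."""
    terms = [
        gaussian_nll(m, v, t)
        for m, v, t in zip(pred_means, pred_variances, calib_coords)
    ]
    return sum(terms) / len(terms)


calib = [10.0, 20.0, 50.0, 80.0]  # calibration frame coordinates (illustrative values)
close = box_nll_loss([10.5, 19.5, 50.0, 80.5], [1.0] * 4, calib)
far = box_nll_loss([15.0, 25.0, 45.0, 70.0], [1.0] * 4, calib)
print(close < far)  # a prediction closer to the calibration frame gives a smaller loss
```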
- Step 205 Perform the first data augmentation on the unlabeled data, input the unlabeled data after the first data augmentation into the initial learning model, and obtain the first predicted value corresponding to the unlabeled data; determine the first predicted label and the first predicted frame based on the first predicted value corresponding to the high-quality pseudo-label, and determine the second predicted label and the second predicted frame based on the first predicted value corresponding to the uncertain pseudo-label.
- the first data augmentation may be performed on the unlabeled data by means of space transformation and/or color transformation, to obtain unlabeled data after the first data augmentation.
- The unlabeled data after the first data augmentation can be input to the initial learning model, which processes it to obtain the first predicted value corresponding to the unlabeled data. The first predicted value may include a prediction frame and a prediction label.
- pseudo-labels have been added to unlabeled data, and the pseudo-labels are divided into high-quality pseudo-labels and uncertain pseudo-labels.
- High-quality pseudo-labels can have corresponding prediction frames (that is, used as calibration frames) and prediction labels (that is, used as calibration categories), and uncertain pseudo-labels can also have corresponding prediction frames and prediction labels.
- If the prediction frame in a first predicted value matches the prediction frame corresponding to a high-quality pseudo-label, that is, the two represent the same region of the same unlabeled data (such as an unlabeled image), then this first predicted value is the first predicted value corresponding to the high-quality pseudo-label. The predicted label in this first predicted value is used as the first predicted label, and the predicted frame in this first predicted value is used as the first predicted frame; that is, the first predicted label and the first predicted frame are determined based on the first predicted value corresponding to the high-quality pseudo-label.
- If the prediction frame in a first predicted value matches the prediction frame corresponding to an uncertain pseudo-label, then this first predicted value is the first predicted value corresponding to the uncertain pseudo-label. The predicted label in this first predicted value is used as the second predicted label, and the predicted frame in this first predicted value is used as the second predicted frame; that is, the second predicted label and the second predicted frame are determined based on the first predicted value corresponding to the uncertain pseudo-label.
- the second prediction frame is used to represent the location of the target object, such as four vertex coordinates
- the second prediction label is used to represent the category of the target object.
- the second predicted label may include C first probability values corresponding to C categories supported by the initial learning model, and C may be a positive integer greater than 1.
- For example, the second predicted label may include the first probability value corresponding to category 1 (such as 0.5), the first probability value corresponding to category 2 (such as 0.3), and the first probability value corresponding to category 3 (such as 0.2); that is, the second predicted label is [0.5, 0.3, 0.2].
- Step 206 Perform the second data augmentation on the unlabeled data, input the unlabeled data after the second data augmentation into the initial management model, and obtain the second predicted value corresponding to the unlabeled data; determine the third predicted label and the third predicted frame based on the second predicted value corresponding to the uncertain pseudo-label.
- the second data augmentation may be performed on the unlabeled data by means of space transformation and/or color transformation, to obtain unlabeled data after the second data augmentation.
- The first data augmentation and the second data augmentation can use different methods; that is, the same unlabeled data is augmented twice with different augmentation methods to obtain two augmented copies of the unlabeled data. One copy is input to the initial learning model, and the other copy is input to the initial management model.
- The unlabeled data after the second data augmentation can be input to the initial management model, which processes it to obtain the second predicted value corresponding to the unlabeled data. The second predicted value may include a prediction frame and a prediction label.
- pseudo-labels have been added to unlabeled data, and the pseudo-labels are divided into high-quality pseudo-labels and uncertain pseudo-labels.
- High-quality pseudo-labels can have corresponding prediction frames and prediction labels, and uncertain pseudo-labels can also have corresponding prediction frames and prediction labels.
- If the predicted frame in a second predicted value matches the predicted frame corresponding to a high-quality pseudo-label, the predicted frame and predicted label in that second predicted value are no longer considered; that is, they do not participate in subsequent training.
- If the predicted frame in a second predicted value matches the predicted frame corresponding to an uncertain pseudo-label, this second predicted value is the second predicted value corresponding to the uncertain pseudo-label. The predicted label in it is used as the third predicted label, and the predicted frame in it is used as the third predicted frame; that is, the third predicted label and the third predicted frame are determined based on the second predicted value corresponding to the uncertain pseudo-label.
- the third prediction frame is used to represent the location of the target object, such as four vertex coordinates, and the third prediction label is used to represent the category of the target object.
- the third predicted label may include C second probability values corresponding to C categories supported by the initial management model, and C may be a positive integer greater than 1.
- For example, the third prediction label may include the second probability value corresponding to category 1 (such as 0.6), the second probability value corresponding to category 2 (such as 0.2), and the second probability value corresponding to category 3 (such as 0.2); that is, the third predicted label is [0.6, 0.2, 0.2].
- Step 207 Determine a first loss value based on the first predicted label and the first predicted frame.
- The high-quality pseudo-label corresponds to calibration information (the prediction frame corresponding to the high-quality pseudo-label is used as the calibration frame, and the prediction label corresponding to the high-quality pseudo-label is used as the calibration category). On this basis, the category loss value can be determined based on the first prediction label corresponding to the high-quality pseudo-label and the calibration category corresponding to the high-quality pseudo-label, and the coordinate frame loss value can be determined based on the first prediction frame corresponding to the high-quality pseudo-label and the calibration frame corresponding to the high-quality pseudo-label. Then, a first loss value may be determined based on the category loss value and the coordinate frame loss value. For the determination process of the first loss value, reference may be made to the determination process of the third loss value above, which will not be repeated here.
- Step 208 Determine a second loss value based on the second predicted label, the second predicted frame, the third predicted label, and the third predicted frame.
- The second prediction label and the second prediction frame are the prediction label and prediction frame corresponding to the uncertain pseudo-label, as output by the initial learning model, and the third prediction label and the third prediction frame are the prediction label and prediction frame corresponding to the same uncertain pseudo-label, as output by the initial management model. On this basis, a category loss value can be determined based on the second predicted label and the third predicted label corresponding to the uncertain pseudo-label, and a coordinate frame loss value can be determined based on the second prediction frame and the third prediction frame corresponding to the uncertain pseudo-label.
- a second loss value may be determined based on the category loss value and the coordinate frame loss value.
- the second predicted label includes C first probability values
- the third predicted label includes C second probability values
- The category loss value of the consistency constraint can be determined based on the C first probability values and the C second probability values
- The category loss value of the consistency constraint refers to the consistency constraint between the predicted label of the initial management model and the predicted label of the initial learning model.
- A first probability distribution of the coordinate point offsets corresponding to the second prediction frame and a second probability distribution of the coordinate point offsets corresponding to the third prediction frame may be determined, and the coordinate frame loss value of the consistency constraint is determined based on the first probability distribution and the second probability distribution
- The coordinate frame loss value of the consistency constraint refers to the consistency constraint between the prediction frame of the initial management model and the prediction frame of the initial learning model.
- A second loss value may be determined based on the category loss value and the coordinate frame loss value.
- a relative entropy between the first probability distribution and the second probability distribution may be determined, and a coordinate frame loss value of the consistency constraint may be determined based on the relative entropy.
- The uncertain pseudo-label corresponds to the second prediction label and the second prediction frame, and to the third prediction label and the third prediction frame. The category loss value is determined based on the second prediction label and the third prediction label, and the coordinate frame loss value is determined based on the second prediction frame and the third prediction frame.
- A second loss value is determined based on the category loss value and the coordinate frame loss value.
- For example, the second loss value is determined using formula (3), in which:
- $L$ represents the second loss value
- $L_{loc}$ represents the coordinate frame loss value
- $L_{cls}$ represents the category loss value.
- The second predicted label includes C first probability values corresponding to C categories, and the third predicted label includes C second probability values corresponding to C categories. Based on the C first probability values and the C second probability values, formula (4) can be used to calculate the category loss value of the consistency constraint.
- Formula (4) is just an example, and there is no limit to this, as long as the category loss value of the consistency constraint can be determined based on the C first probability values and the C second probability values.
- $L_{c-cls}$ represents the category loss value of the consistency constraint
- C represents the number of categories supported by the initial learning model or the initial management model
- i represents the i-th category in the C categories
- $p_{ti}$ represents the second probability value corresponding to the i-th category output by the initial management model (belonging to the third prediction label); $p_{ti}$ can also be the second probability value corresponding to the i-th category after sharpening
- $p_{si}$ represents the first probability value corresponding to the i-th category output by the initial learning model (belonging to the second predicted label).
- the value range of i is 1-C.
- When i = 1, $p_{ti}$ indicates the second probability value corresponding to the first category and $p_{si}$ indicates the first probability value corresponding to the first category, and so on.
- When i = C, $p_{ti}$ indicates the second probability value corresponding to the C-th category and $p_{si}$ indicates the first probability value corresponding to the C-th category.
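- Formula (4) is not reproduced in this text. The sketch below shows one possible category loss for the consistency constraint, assuming a cross-entropy between the sharpened probabilities of the initial management model and the probabilities of the initial learning model. Both the cross-entropy form and the temperature-based sharpening are assumptions for illustration, and `sharpen` and `consistency_cls_loss` are hypothetical names.

```python
import math
from typing import List, Sequence


def sharpen(probs: Sequence[float], temperature: float = 0.5) -> List[float]:
    """Temperature sharpening: raise to 1/T and renormalise (an assumed scheme)."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def consistency_cls_loss(
    p_s: Sequence[float], p_t: Sequence[float], eps: float = 1e-12
) -> float:
    """Cross-entropy -sum_i p_ti * log(p_si) over the C categories."""
    return -sum(t * math.log(s + eps) for s, t in zip(p_s, p_t))


p_s = [0.5, 0.3, 0.2]            # second predicted label (initial learning model)
p_t = sharpen([0.6, 0.2, 0.2])   # third predicted label (initial management model), sharpened
loss = consistency_cls_loss(p_s, p_t)
print(p_t, loss)
```

- Sharpening pushes the largest management-model probability closer to 1, so the learning model is penalised more strongly when it disagrees on the dominant category.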
- For the coordinate frame loss value $L_{loc}$, the closeness of the four vertex coordinates of the second prediction frame to the four vertex coordinates of the third prediction frame can be determined. The closer the second prediction frame is to the third prediction frame, the smaller the coordinate frame loss value; the greater the difference between them, the larger the coordinate frame loss value.
- The first probability distribution of the coordinate point offsets corresponding to the four vertex coordinates of the second prediction frame can be determined; that is, each vertex coordinate corresponds to a first probability distribution.
- The second probability distribution of the coordinate point offsets corresponding to the four vertex coordinates of the third prediction frame can be determined; that is, each vertex coordinate corresponds to a second probability distribution.
- The probability distribution is represented by a mean and a variance; for example, the probability distribution corresponding to the vertex coordinate x is $N(\mu_{tx}, \Sigma_{tx})$.
- the coordinate frame loss value of the consistency constraint can be determined.
- Formula (5) can be used to calculate the coordinate frame loss value of the consistency constraint.
- Formula (5) is just an example, and there is no restriction on this, as long as the coordinate frame loss value can be obtained.
- $L_{c-loc}$ represents the coordinate frame loss value of the consistency constraint
- a ranges over the four vertex coordinates of the prediction frame; that is, a can be the vertex coordinate (x, y), (x+w, y), (x, y+h), or (x+w, y+h).
- KL stands for KL (Kullback-Leibler) divergence, which can also be called relative entropy or information divergence, etc.
- $N(\mu_{ta-s}, \Sigma_{ta-s})$ represents the first probability distribution (a Gaussian distribution) of the coordinate point offset corresponding to the vertex coordinate a, that is, the probability distribution corresponding to the second prediction frame output by the initial learning model
- $N(\mu_{ta-t}, \Sigma_{ta-t})$ represents the second probability distribution of the coordinate point offset corresponding to the vertex coordinate a, that is, the probability distribution corresponding to the third prediction frame output by the initial management model.
- $\Sigma_{ta-t}$ can also be sharpened.
- For example, calculate the relative entropy between the first probability distribution and the second probability distribution of the coordinate point offset corresponding to the vertex coordinate (x, y); likewise calculate the relative entropy between the first and second probability distributions for the vertex coordinate (x+w, y), for the vertex coordinate (x, y+h), and for the vertex coordinate (x+w, y+h). Then, calculate the sum of the above four relative entropies to obtain the coordinate frame loss value of the consistency constraint.
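- The sum of four relative entropies can be sketched as follows. Formula (5) is not reproduced in this text, so this is only an illustration under assumptions: each vertex is given a single univariate Gaussian (mean, variance) for brevity, and the divergence is taken from the management-model distribution to the learning-model distribution. The direction of the KL divergence and the names `kl_gaussian` and `consistency_loc_loss` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, variance)


def kl_gaussian(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for two univariate Gaussians."""
    (mu_p, var_p), (mu_q, var_q) = p, q
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def consistency_loc_loss(
    learner: Sequence[Gaussian], manager: Sequence[Gaussian]
) -> float:
    """Sum of relative entropies over the four vertices (direction is an assumption)."""
    return sum(kl_gaussian(t, s) for s, t in zip(learner, manager))


# One (mean, variance) pair per vertex: (x, y), (x+w, y), (x, y+h), (x+w, y+h).
learner = [(10.0, 4.0), (50.0, 4.0), (10.0, 4.0), (50.0, 4.0)]  # second prediction frame
manager = [(11.0, 1.0), (49.0, 1.0), (10.0, 1.0), (50.0, 1.0)]  # third prediction frame
print(consistency_loc_loss(learner, manager))
```

- The loss is zero when the two models output identical distributions for every vertex and grows as the means or variances diverge.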
- the category loss value and the coordinate frame loss value corresponding to the uncertain pseudo-label can be obtained, and the second loss value corresponding to the uncertain pseudo-label can be determined based on the category loss value and the coordinate frame loss value.
- Step 209 Determine a target loss value of the initial learning model based on the first loss value, the second loss value and the third loss value.
- For example, the average of the first loss value, the second loss value and the third loss value can be used as the target loss value of the initial learning model, or the sum of the first loss value, the second loss value and the third loss value can be used as the target loss value of the initial learning model, which is not limited.
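- The two combination options named in step 209 can be written as a small helper. This is a minimal sketch; the function name `target_loss` and the `mode` parameter are hypothetical.

```python
def target_loss(first: float, second: float, third: float, mode: str = "sum") -> float:
    """Combine the three loss values by sum or by average."""
    losses = (first, second, third)
    if mode == "sum":
        return sum(losses)
    if mode == "mean":
        return sum(losses) / len(losses)
    raise ValueError(f"unsupported mode: {mode}")


print(target_loss(1.0, 2.0, 3.0, mode="sum"))
print(target_loss(1.0, 2.0, 3.0, mode="mean"))
```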
- Step 210 Adjust the network parameters of the initial learning model based on the target loss value to obtain an adjusted learning model. For example, after obtaining the target loss value of the initial learning model, the network parameters of the initial learning model can be adjusted based on the target loss value, such as using gradient descent method to adjust the network parameters of the initial learning model. There is no limit to this adjustment process.
- Step 211 Adjust the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model. For example, after the adjusted learning model is obtained, a parameter correction value is determined based on the network parameters of the adjusted learning model and a configured proportional coefficient, and the network parameters of the initial management model are adjusted based on the parameter correction value to obtain the adjusted management model.
- For example, using the EMA (Exponential Moving Average) algorithm, the parameter correction value can be determined based on the network parameters of the adjusted learning model and the proportional coefficient, and the network parameters of the initial management model can be adjusted based on the parameter correction value to obtain the adjusted management model. There is no limit to this process.
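- An EMA-style update can be sketched as below, assuming parameters are stored as a name-to-value mapping and that the correction value is the learning-model parameter scaled by (1 - alpha). The coefficient name `alpha`, its default value, and the function `ema_update` are assumptions for illustration; the source does not fix them.

```python
from typing import Dict


def ema_update(
    manager: Dict[str, float], learner: Dict[str, float], alpha: float = 0.99
) -> Dict[str, float]:
    """Return manager parameters after one update: alpha * manager + (1 - alpha) * learner."""
    updated = {}
    for name, value in manager.items():
        correction = (1.0 - alpha) * learner[name]  # parameter correction value
        updated[name] = alpha * value + correction
    return updated


manager_params = {"w": 1.0, "b": 0.0}
learner_params = {"w": 0.0, "b": 1.0}
print(ema_update(manager_params, learner_params, alpha=0.9))
```

- A coefficient close to 1 makes the management model change slowly, which smooths out noisy updates of the learning model.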
- Step 212 judging whether the adjusted management model has converged. If the adjusted management model does not converge, then step 213 may be performed; if the adjusted management model has converged, then step 214 may be performed.
- For example, if the number of iterations of the initial management model or the initial learning model reaches the number threshold, it is determined that the adjusted management model has converged; if the number of iterations of the initial management model or the initial learning model does not reach the number threshold, it is determined that the adjusted management model has not converged.
- For another example, if the target loss value of the initial learning model is less than the loss value threshold, it is determined that the adjusted management model has converged; if the target loss value of the initial learning model is not less than the loss value threshold, it is determined that the adjusted management model has not converged.
- Step 213 if the adjusted management model does not converge, determine the adjusted learning model as the initial learning model, determine the adjusted management model as the initial management model, and return to step 203 .
- Step 214 If the adjusted management model has converged, then determine the converged adjusted management model as the target management model, and the target management model is the final model to be output.
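- The iteration in steps 212 to 214 can be sketched as a loop. This is only an outline under assumptions: `step_fn` is a hypothetical callable standing in for one pass of steps 203 to 211 and returning the target loss value, and the two convergence criteria, which the text presents as alternatives, are combined here with a logical "or".

```python
from typing import Callable, Tuple


def train_until_converged(
    step_fn: Callable[[int], float], max_iterations: int, loss_threshold: float
) -> Tuple[int, float]:
    """Repeat training passes until the iteration or loss threshold is reached."""
    iteration, loss = 0, float("inf")
    while iteration < max_iterations and loss >= loss_threshold:
        loss = step_fn(iteration)  # one pass of steps 203-211 (stubbed here)
        iteration += 1
    return iteration, loss


# Stub whose loss decays with the iteration count, purely for demonstration.
iterations, final_loss = train_until_converged(
    lambda i: 1.0 / (i + 1), max_iterations=100, loss_threshold=0.1
)
print(iterations, final_loss)
```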
- target detection may be performed on the data to be detected based on the target management model.
- For example, the data to be detected (such as an image to be detected) can be input to the target management model, which outputs the target frame in the data to be detected and recognizes the target or target category, such as recognizing a face or identifying a vehicle category, an animal category, or an electronic product category. The process will not be repeated here.
- The target management model can be obtained by training with a small amount of labeled data and a large amount of unlabeled data, which avoids the need to obtain a large amount of labeled data, reduces the workload of labeling operations, and saves human resources, while the target management model has relatively good performance and high reliability.
- Robustness to noise samples in the training process is improved, and the target management model is greatly improved compared with the baseline model. Setting different proportions of pseudo-labels as high-quality pseudo-labels can all achieve good training results, so the method is robust to noise samples and insensitive to hyperparameters.
- A consistency constraint is established for the prediction frames and prediction labels of unlabeled data under different augmentations, which can improve robustness to noise samples in the training process on the basis of effective use of pseudo-labels. Pseudo-labels and consistency constraints can be combined.
- Pseudo-labels are used as real categories.
- A consistency comparison loss is established for prediction results (or features). The prediction results (or features) of different views are used to constrain the same target not to change its category characteristics under different augmentations, the management model and learning model pair is used to generate a smoother and more stable classifier, and gradients are back-propagated only to the learning model.
- FIG. 3 is a schematic structural diagram of the target detection device.
- the device may include:
- The acquisition module 31 is used to acquire an initial management model and an initial learning model, add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels;
- The determination module 32 is used to input the unlabeled data into the initial learning model to obtain the first predicted value corresponding to the unlabeled data; determine the first predicted label and the first predicted frame based on the first predicted value corresponding to the high-quality pseudo-label, and determine the second predicted label and the second predicted frame based on the first predicted value corresponding to the uncertain pseudo-label; input the unlabeled data to the initial management model to obtain the second predicted value corresponding to the unlabeled data, and determine the third predicted label and the third predicted frame based on the second predicted value corresponding to the uncertain pseudo-label;
- The processing module 33 is configured to train the initial management model based on the first predicted label, the first predicted frame, the second predicted label, the second predicted frame, the third predicted label and the third predicted frame to obtain a target management model.
- When acquiring the initial management model and the initial learning model, the acquisition module 31 is specifically used to: train a baseline model using labeled data; generate an initial management model and an initial learning model based on the baseline model; wherein the network structure of the initial management model is the same as that of the baseline model, and the network parameters of the initial management model are the same as or different from those of the baseline model; the network structure of the initial learning model is the same as that of the baseline model, and the network parameters of the initial learning model are the same as or different from those of the baseline model.
- When adding pseudo-labels to the unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels, the acquisition module 31 is specifically used to: for each piece of unlabeled data, input the unlabeled data into the initial management model to obtain the pseudo-label corresponding to the unlabeled data and the probability value corresponding to the pseudo-label; for each category that the initial management model supports detecting, sort all the pseudo-labels corresponding to the category based on their probability values; based on the sorting result, select the K pseudo-labels with the highest probability values as the high-quality pseudo-labels corresponding to the category, and use the remaining pseudo-labels corresponding to the category as the uncertain pseudo-labels corresponding to the category; where K is
- the processing module 33 trains the initial management model based on the first predicted label, the first predicted frame, the second predicted label, the second predicted frame, the third predicted label and the third predicted frame to obtain the target
- it is specifically used to: determine the first loss value based on the first prediction label and the first prediction frame; determine the first loss value based on the second prediction label, the second prediction frame, the third prediction label and the The third prediction frame determines a second loss value; the initial management model is adjusted based on the first loss value and the second loss value to obtain the target management model.
- the processing module 33 determines the second loss value based on the second predicted label, the second predicted frame, the third predicted label and the third predicted frame, it is specifically used to: if the second predicted label includes the same C first probability values corresponding to C categories that the initial learning model supports detection, and the third prediction label includes C second probability values corresponding to C categories that the initial management model supports detection, then based on The C first probability values and the C second probability values determine the category loss value of the consistency constraint; wherein, the C is a positive integer greater than 1; determine the coordinate point corresponding to the second prediction frame The first probability distribution of the offset, and determine the second probability distribution of the coordinate point offset corresponding to the third prediction frame, and determine the consistency constraint based on the first probability distribution and the second probability distribution A coordinate frame loss value; determining a second loss value based on the category loss value and the coordinate frame loss value.
- the processing module 33 determines the coordinate frame loss value of the consistency constraint based on the first probability distribution and the second probability distribution, it is specifically used to: determine the first probability distribution and the second probability distribution A relative entropy between distributions; a frame loss value for the consistency constraint is determined based on the relative entropy.
- the processing module 33 adjusts the initial management model based on the first loss value and the second loss value, and when obtaining the target management model is specifically used to: based on the first loss value and the The second loss value adjusts the network parameters of the initial learning model to obtain an adjusted learning model; adjusts the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model ; If the adjusted management model does not converge, then determine the adjusted learning model as the initial learning model, determine the adjusted management model as the initial management model, and return to execute based on the initial management model as unlabeled data The operations of adding pseudo-labels and dividing pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels; if the adjusted management model has converged, then determining the adjusted management model as the target management model.
- the processing module 33 adjusts the network parameters of the initial management model based on the network parameters of the adjusted learning model, and when obtaining the adjusted management model is specifically used for: based on the network parameters of the adjusted learning model and The configured proportional coefficient determines the parameter correction value of the network parameter, and adjusts the network parameter of the initial management model based on the parameter correction value to obtain the adjusted management model.
- the readable storage medium 42 stores machine-executable instructions that can be executed by the processor 41; the processor 41 is configured to execute the machine-executable instructions to implement the object detection method disclosed in the above examples of the present disclosure.
- the processor 41 is used to execute machine-executable instructions to implement the following steps:
- the target management model is used for target detection on the data to be detected.
- the embodiment of the present disclosure also provides a machine-readable storage medium on which several computer instructions are stored; when the computer instructions are executed by a processor, the target detection method disclosed in the above examples of the present disclosure can be implemented.
- the above-mentioned machine-readable storage medium may be any electronic, magnetic, optical or other physical storage device, which may contain or store information, such as executable instructions, data, and so on.
- the machine-readable storage medium can be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as a CD, DVD, etc.), or a similar storage medium, or a combination of them.
- a typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or any combination of these devices.
- embodiments of the present disclosure may be provided as methods, systems, or computer program products. Accordingly, the present disclosure can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instruction means implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- these computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
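The present disclosure provides a target detection method and apparatus, including: adding pseudo-labels to unlabeled data and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels; inputting the unlabeled data into an initial learning model to obtain first predicted values; determining a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determining a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label; inputting the unlabeled data into an initial management model to obtain second predicted values, and determining a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label; and training the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model, the target management model being used to perform target detection on data to be detected.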
Description
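Cross-Reference to Related Applications
The present disclosure claims priority to Chinese patent application No. 202111401508.9, filed on November 19, 2021, which is incorporated herein by reference.
The present disclosure relates to the technical field of artificial intelligence, and in particular to a target detection method and apparatus.
Machine learning is one way of realizing artificial intelligence. It is an interdisciplinary field involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. Machine learning studies how computers can simulate or implement human learning behavior in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve their own performance. It places particular emphasis on algorithm design, enabling computers to learn regularities from data automatically and to use them to make predictions on unseen data. Machine learning is already very widely applied, for example in deep learning, data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnosis, speech recognition and handwriting recognition.
To implement artificial intelligence processing with machine learning, a training data set can be constructed that includes a large amount of labeled data (for example image data, i.e. images with annotated boxes and annotated categories). A machine learning model, such as one with a target detection function, is trained on the training data set and can then be used to perform target detection on data to be detected, for example detecting a target box in the data and identifying the target category, such as a vehicle category, an animal category or an electronic product category.
To improve the performance of the machine learning model, a large amount of labeled data is needed: the more labeled data there is, the better the trained model performs. However, obtaining labeled data requires annotating a large amount of data, which consumes considerable human resources.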
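Summary
The present disclosure provides a target detection method, the method including:
acquiring an initial management model and an initial learning model, adding pseudo-labels to unlabeled data based on the initial management model, and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels;
inputting the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determining a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determining a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label;
inputting the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determining a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label;
training the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model;
wherein the target management model is used to perform target detection on data to be detected.
The present disclosure provides a target detection apparatus, the apparatus including:
an acquisition module, configured to acquire an initial management model and an initial learning model, add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels;
a determination module, configured to input the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determine a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determine a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label; and input the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determine a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label;
a processing module, configured to train the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model; wherein the target management model is used to perform target detection on data to be detected.
The present disclosure provides a target detection device, including a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions that can be executed by the processor;
the processor is configured to execute the machine-executable instructions to implement the target detection method of the above examples.
An embodiment of the present disclosure further provides a machine-readable storage medium on which several computer instructions are stored; when the computer instructions are executed by a processor, the target detection method disclosed in the above examples of the present disclosure can be implemented.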
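As can be seen from the above technical solutions, in the embodiments of the present disclosure the target management model can be trained on unlabeled data, that is, from a small amount of labeled data together with a large amount of unlabeled data. This avoids having to acquire a large amount of labeled data, reduces the annotation workload and saves human resources, while the resulting target management model performs well and is highly reliable. While making effective use of pseudo-labels, the method improves robustness to noisy samples during training, and the target management model improves considerably over the baseline model. Setting different proportions of the pseudo-labels as high-quality pseudo-labels yields good training results in each case, so the method is robust to noisy samples and insensitive to this hyperparameter. A semi-supervised target detection training scheme is proposed: borrowing the idea from unsupervised representation learning that the features of the same image remain consistent under different augmentations, consistency constraints are established on the regression boxes and classifications of unlabeled data under different augmentations, and pseudo-labels are combined with the consistency constraints. For reliable target boxes the pseudo-label is used as the true category, while for uncertain targets a consistency contrast loss is established between the different prediction results (or features).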
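To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present disclosure, and a person of ordinary skill in the art may derive other drawings from them.
FIG. 1 is a schematic flowchart of a target detection method in one embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a target detection method in another embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a target detection apparatus in one embodiment of the present disclosure;
FIG. 4 is a hardware structure diagram of a target detection device in one embodiment of the present disclosure.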
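The terms used in the embodiments of the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the claims are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in the embodiments of the present disclosure to describe various pieces of information, the information should not be limited by these terms, which are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly second information may also be called first information. In addition, depending on the context, the word "if" as used herein may be interpreted as "when", "upon" or "in response to determining".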
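An embodiment of the present disclosure proposes a target detection method that can be applied to a target detection device. The target detection device may be any type of device, such as a server, a terminal device or a management device, and is not limited here. FIG. 1 is a schematic flowchart of the target detection method. The target detection method of this embodiment may be a semi-supervised target detection method and may include:
Step 101: acquire an initial management model and an initial learning model, add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels.
Exemplarily, acquiring the initial management model and the initial learning model may include, but is not limited to: training a baseline model using labeled data, and generating the initial management model and the initial learning model based on the baseline model. The network structure of the initial management model may be the same as that of the baseline model, and its network parameters may be the same as or different from those of the baseline model. The network structure of the initial learning model is the same as that of the baseline model, and its network parameters are the same as or different from those of the baseline model. The initial management model and the initial learning model have the same network structure, and their network parameters are the same or different.
Exemplarily, adding pseudo-labels to unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels may include, but is not limited to: for each piece of unlabeled data, inputting the unlabeled data into the initial management model to obtain the pseudo-label corresponding to the unlabeled data and the probability value corresponding to that pseudo-label. On this basis, for each category that the initial management model supports detecting, all pseudo-labels corresponding to the category are sorted by their probability values; based on the sorting result, the K pseudo-labels with the largest probability values are selected as the high-quality pseudo-labels of the category, and the remaining pseudo-labels of the category are taken as its uncertain pseudo-labels, where K is a positive integer.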
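Step 102: input the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determine a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determine a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label. Input the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determine a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label.
Exemplarily, a first data augmentation may be applied to the unlabeled data, and the unlabeled data after the first data augmentation is input into the initial learning model to obtain the first predicted value corresponding to the unlabeled data.
Exemplarily, a second data augmentation may be applied to the unlabeled data, and the unlabeled data after the second data augmentation is input into the initial management model to obtain the second predicted value corresponding to the unlabeled data.
Exemplarily, the first data augmentation and the second data augmentation may use different augmentation methods.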
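Step 103: train the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model, where the target management model is used to perform target detection on data to be detected. For example, the following steps 1031-1032 may be used to train the initial management model and obtain the target management model:
Step 1031: determine a first loss value based on the first predicted label and the first predicted box, and determine a second loss value based on the second predicted label, the second predicted box, the third predicted label and the third predicted box.
Exemplarily, the initial learning model and the initial management model support detecting the same number of categories. Determining the second loss value based on the second predicted label, the second predicted box, the third predicted label and the third predicted box may include, but is not limited to: if the second predicted label includes C first probability values corresponding to the C categories that the initial learning model supports detecting, and the third predicted label includes C second probability values corresponding to the C categories that the initial management model supports detecting, determining a category loss value of the consistency constraint based on the C first probability values and the C second probability values, where C may be a positive integer greater than 1. A first probability distribution of the coordinate point offsets corresponding to the second predicted box and a second probability distribution of the coordinate point offsets corresponding to the third predicted box are determined, and a coordinate box loss value of the consistency constraint is determined based on the first probability distribution and the second probability distribution. The second loss value is determined based on the category loss value and the coordinate box loss value.
In one possible implementation, determining the coordinate box loss value of the consistency constraint based on the first probability distribution and the second probability distribution may include, but is not limited to: determining the relative entropy between the first probability distribution and the second probability distribution, and determining the coordinate box loss value of the consistency constraint based on the relative entropy.
Step 1032: adjust the initial management model based on the first loss value and the second loss value to obtain the target management model, where the target management model is used to perform target detection on data to be detected.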
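Exemplarily, before step 1032, labeled data may also be input into the initial learning model to obtain a third predicted value corresponding to the labeled data; a fourth predicted label and a fourth predicted box are determined based on the third predicted value, and a third loss value is determined based on the fourth predicted label and the fourth predicted box.
On this basis, in step 1032 the initial management model may be adjusted based on the first loss value, the second loss value and the third loss value to obtain the target management model.
Exemplarily, adjusting the initial management model based on the first loss value and the second loss value to obtain the target management model may include, but is not limited to: adjusting the network parameters of the initial learning model based on the first loss value and the second loss value to obtain an adjusted learning model; adjusting the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model; if the adjusted management model has not converged, taking the adjusted learning model as the initial learning model and the adjusted management model as the initial management model, and returning to the operation of adding pseudo-labels to unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels (i.e. step 101); if the adjusted management model has converged, taking the adjusted management model as the target management model.
Exemplarily, adjusting the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain the adjusted management model may include, but is not limited to: determining a parameter correction value for the network parameters based on the network parameters of the adjusted learning model and a configured proportional coefficient, and adjusting the network parameters of the initial management model based on the parameter correction value to obtain the adjusted management model.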
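As can be seen from the above technical solutions, in the embodiments of the present disclosure the target management model can be trained on unlabeled data, that is, from a small amount of labeled data together with a large amount of unlabeled data. This avoids the human resources otherwise spent on acquiring a large amount of labeled data, helps reduce the annotation workload and saves human resources, while the resulting target management model performs well and is highly reliable. While making effective use of pseudo-labels, the method improves robustness to noisy samples during training, and the target management model improves considerably over the baseline model. Setting different proportions of the pseudo-labels as high-quality pseudo-labels yields good training results in each case, so the method is robust to noisy samples and insensitive to this hyperparameter. A semi-supervised target detection training scheme is proposed: borrowing the idea from unsupervised representation learning that the features of the same image remain consistent under different augmentations, consistency constraints are established on the regression boxes and classifications of unlabeled data under different augmentations, and pseudo-labels are combined with the consistency constraints. For reliable target boxes the pseudo-label is used as the true category, while for uncertain targets a consistency contrast loss is established between the different prediction results (or features).
The technical solutions of the embodiments of the present disclosure are described below with reference to a specific application scenario.
An embodiment of the present disclosure proposes a target detection method that is a semi-supervised object detection approach based on consistency constraints. In semi-supervised object detection, a model can be trained on partially annotated data combined with large-scale unannotated data and achieve performance close to that obtained with fully annotated data. FIG. 2 is a schematic flowchart of the target detection method. As shown in FIG. 2, the method may include: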
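Step 201: train a baseline model using labeled data.
Exemplarily, a training data set may be constructed in advance. The training data set may include multiple pieces of labeled data (such as labeled images). Each piece of labeled data has corresponding annotation information, which includes but is not limited to an annotated box (for a rectangular annotated box, this may be the coordinates of its four vertices) and an annotated category (i.e. the category of the target object inside the annotated box).
Exemplarily, an initial network model may be obtained in advance. The initial network model may be a machine learning model, such as one based on a deep learning algorithm or on a neural network; the type of machine learning model is not limited here. This embodiment does not limit the structure of the initial network model. As for its function, the initial network model can perform target detection.
Exemplarily, the initial network model may be trained on the labeled data in the training data set; the training process is not limited. The trained initial network model is called the baseline model. In other words, a baseline model can be obtained by training on the labeled data in the training data set.
During training of the initial network model, labeled data is input into the initial network model, which processes it to obtain a predicted box and a predicted label corresponding to the labeled data. The predicted box indicates the position of the target object, for example as four vertex coordinates. The predicted label indicates the category of the target object, for example category 1. If the initial network model supports detecting target objects of category 1 and category 2, a predicted label of 0 indicates that the target object belongs to category 1, and a predicted label of 1 indicates that it belongs to category 2.
A category loss value can be determined from the predicted label of the labeled data and its annotated category (i.e. the true category), and a coordinate box loss value can be determined from the predicted box of the labeled data and its annotated box (i.e. the true box). The target loss value of the initial network model is then determined from the category loss value and the coordinate box loss value, for example using the following formula (1):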
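$L = L_{loc} + L_{cls}$ (1)
In formula (1), $L$ denotes the target loss value of the initial network model, $L_{loc}$ denotes the coordinate box loss value, and $L_{cls}$ denotes the category loss value.
For the category loss value $L_{cls}$, if the predicted label matches the annotated category (for example, the two are the same), the category loss value is small (for example, it takes the minimum loss value). If the predicted label does not match the annotated category (for example, the two differ), the category loss value is large (for example, it takes the maximum loss value). This is only one example of determining the category loss value and is not limiting.
For the coordinate box loss value $L_{loc}$, if the predicted box matches the annotated box (for example, the four vertex coordinates of the predicted box are the same as those of the annotated box), the coordinate box loss value is small (for example, it takes the minimum loss value). If the predicted box does not match the annotated box (for example, their four vertex coordinates differ), the closeness between the predicted box (for example, its four vertex coordinates) and the annotated box (for example, its four vertex coordinates) is determined. The closer the predicted box is to the annotated box, the smaller the coordinate box loss value; the more they differ, the larger the coordinate box loss value.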
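For example, a probability distribution (such as a Gaussian distribution) of the coordinate point offset is determined for each of the four vertex coordinates of the predicted box, so that each vertex coordinate of the predicted box corresponds to one probability distribution: the top-left vertex corresponds to distribution a1, the top-right vertex to distribution a2, the bottom-right vertex to distribution a3, and the bottom-left vertex to distribution a4. Likewise, a probability distribution of the coordinate point offset is determined for each of the four vertex coordinates of the annotated box: the top-left vertex corresponds to distribution b1, the top-right vertex to distribution b2, the bottom-right vertex to distribution b3, and the bottom-left vertex to distribution b4. A probability distribution corresponding to a vertex coordinate can be represented by its mean and variance; for example, the probability distribution corresponding to vertex coordinate x (a vertex coordinate of the predicted box or of the annotated box) is written as $N(\mu_{tx}, \Sigma_{tx})$.
Based on the probability distributions corresponding to the four vertex coordinates of the predicted box and those corresponding to the four vertex coordinates of the annotated box, the coordinate box loss value between the predicted box and the annotated box can be calculated. For example, a negative log-likelihood loss value is calculated from distributions a1 and b1, another from a2 and b2, another from a3 and b3, and another from a4 and b4. The coordinate box loss value is determined from these four negative log-likelihood loss values, for example as their mean or as their sum. This is only an example and is not limiting, as long as the coordinate box loss value can be obtained from the probability distributions.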
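The negative log-likelihood computation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosure's implementation: each predicted coordinate is modeled as a univariate Gaussian with a mean and variance, the annotated coordinate is treated as a single observed value rather than a distribution, and the function names `gaussian_nll` and `box_nll_loss` are hypothetical.

```python
import math


def gaussian_nll(target, mean, var):
    """Negative log-likelihood of `target` under N(mean, var) (assumed univariate)."""
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)


def box_nll_loss(targets, means, variances, reduce="mean"):
    """Combine the per-coordinate losses by mean or by sum, as the text allows."""
    losses = [gaussian_nll(t, m, v) for t, m, v in zip(targets, means, variances)]
    total = sum(losses)
    return total / len(losses) if reduce == "mean" else total


# A prediction centered on the annotation yields a lower loss than one that is offset.
close = box_nll_loss([0.0] * 4, [0.0] * 4, [1.0] * 4)
far = box_nll_loss([0.0] * 4, [2.0] * 4, [1.0] * 4)
```

Whether the four terms are averaged or summed only rescales the loss, which is why the text treats both options as acceptable.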
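During training of the initial network model, once its target loss value has been obtained, the network parameters of the initial network model can be adjusted based on the target loss value, for example by gradient descent; this embodiment does not limit the adjustment process. The adjusted network model is taken as the initial network model, and the operation of inputting labeled data into the initial network model is executed again, and so on, until the initial network model has converged (for example, the number of iterations reaches a threshold, or the target loss value falls below a loss threshold). The converged initial network model is taken as the baseline model. At this point, a baseline model has been obtained by training on labeled data.
Step 202: generate an initial management model and an initial learning model based on the baseline model.
For example, an initial management model (which may also be called an initial teacher model) may be generated based on the baseline model. Its network structure may be the same as that of the baseline model, and its network parameters may be the same as or different from those of the baseline model. For example, the baseline model may be used directly as the initial management model, or the network parameters of the baseline model may be adjusted and the adjusted baseline model used as the initial management model.
For example, an initial learning model (which may also be called an initial student model) may be generated based on the baseline model. Its network structure may be the same as that of the baseline model, and its network parameters may be the same as or different from those of the baseline model. For example, the baseline model may be used directly as the initial learning model, or the network parameters of the baseline model may be adjusted and the adjusted baseline model used as the initial learning model.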
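Step 203: add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels. For example, the pseudo-label corresponding to each piece of unlabeled data is determined based on the initial management model and is classified as either a high-quality pseudo-label or an uncertain pseudo-label.
Exemplarily, a sample data set may be constructed in advance. The sample data set may include multiple pieces of unlabeled data (such as unlabeled images); that is, unlabeled data is added to the sample data set. Each piece of unlabeled data has no annotation information, i.e. no corresponding annotated box or annotated category.
Exemplarily, each piece of unlabeled data in the sample data set may be input into the initial management model, which processes it to obtain the predicted box corresponding to the unlabeled data, the predicted label corresponding to that predicted box, and the probability value corresponding to that predicted label (i.e. the probability that the target object in the predicted box belongs to the predicted label).
For example, if the initial management model supports target detection for category 1, category 2 and category 3, then after processing the unlabeled data it can output a predicted box and a probability vector corresponding to that predicted box, such as [0.9, 0.06, 0.04]. From this probability vector, the predicted label corresponding to the predicted box is category 1, and the probability value corresponding to the predicted label is 0.9.
Each piece of unlabeled data may correspond to P predicted boxes, where P is a positive integer. Each predicted box corresponds to one predicted label, so the P predicted boxes correspond to P predicted labels, and each predicted label corresponds to one probability value. In addition, each predicted box may itself correspond to a probability value. For example, after the initial management model processes the unlabeled data, it yields predicted box 1 and its probability value, predicted label 1 corresponding to predicted box 1 and its probability value, as well as predicted box 2 and its probability value, and predicted label 2 corresponding to predicted box 2 and its probability value.
The predicted box corresponding to the unlabeled data together with its predicted label may be called a pseudo-label, and the probability value of the predicted label together with the probability value of the predicted box may be called the probability value corresponding to the pseudo-label. Therefore, after the unlabeled data is input into the initial management model, the pseudo-label (i.e. predicted box and predicted label) corresponding to the unlabeled data and the probability value corresponding to that pseudo-label are obtained.
In summary, after the multiple pieces of unlabeled data in the sample data set are input into the initial management model, a large number of pseudo-labels are obtained, each with a corresponding probability value.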
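Exemplarily, for each category that the initial management model supports detecting, all pseudo-labels corresponding to the category (the category of a pseudo-label is known from its predicted label) are sorted by their probability values. Based on the sorting result, the K pseudo-labels with the largest probability values are selected as the high-quality pseudo-labels of the category, and the remaining pseudo-labels of the category are taken as its uncertain pseudo-labels.
Suppose the initial management model supports target detection for category 1, category 2 and category 3. For each pseudo-label, if its predicted label is category 1 the pseudo-label corresponds to category 1, if its predicted label is category 2 it corresponds to category 2, and if its predicted label is category 3 it corresponds to category 3. In this way, for each category that the initial management model supports detecting, all pseudo-labels (i.e. predicted boxes and predicted labels) corresponding to the category are obtained; for example, category 1 corresponds to pseudo-labels c1-c100, category 2 to pseudo-labels c101-c300, and category 3 to pseudo-labels c301-c600.
All pseudo-labels corresponding to a category are sorted by their probability values. For example, they may be sorted by the probability value of the predicted label in descending order, or in ascending order. As another example, the product (or the average) of the probability value of the predicted label and the probability value of the predicted box may be calculated for each pseudo-label, and the pseudo-labels of the category sorted by this product in descending order, or in ascending order. Other sorting methods may also be used, and this is not limited.
Based on the sorting result, the K pseudo-labels with the largest probability values are selected as the high-quality pseudo-labels of the category, and the remaining pseudo-labels of the category are taken as its uncertain pseudo-labels. For example, if the pseudo-labels are sorted by the probability value of the predicted label (or by the probability product) in descending order, the first K pseudo-labels are selected as high-quality pseudo-labels and the rest are uncertain pseudo-labels. If they are sorted in ascending order, the last K pseudo-labels are selected as high-quality pseudo-labels and the rest are uncertain pseudo-labels.
For the pseudo-labels c1-c100 corresponding to category 1, the pseudo-labels are sorted by the probability value of the predicted label (the probability value for category 1) in descending order; the first K pseudo-labels (e.g. c1-c10) are selected as high-quality pseudo-labels, and the remaining pseudo-labels (e.g. c11-c100) are uncertain pseudo-labels. For the pseudo-labels c101-c300 corresponding to category 2, the pseudo-labels are sorted by the probability value of the predicted label (the probability value for category 2) in descending order; the first K pseudo-labels are selected as high-quality pseudo-labels and the rest are uncertain pseudo-labels. Proceeding in the same way gives the high-quality and uncertain pseudo-labels of every category.
The value of K may be configured empirically, or determined from the total number of pseudo-labels of the category, for example K = total number × M, where M is a value between 0 and 1 that may be configured empirically, such as 20% or 30%. With M equal to 20%, since category 1 has 100 pseudo-labels, K is 20: the first 20 pseudo-labels in the sorted order are selected from all pseudo-labels of category 1 as high-quality pseudo-labels, and the remaining 80 are uncertain pseudo-labels. Since category 2 has 200 pseudo-labels, K is 40: the first 40 pseudo-labels in the sorted order are selected from all pseudo-labels of category 2 as high-quality pseudo-labels, and the remaining 160 are uncertain pseudo-labels, and so on.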
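The per-category top-K split described above can be sketched as follows. This is an illustrative sketch only: the dictionary layout (`category`, `score`), the function name `split_pseudo_labels` and the rounding of K are assumptions, and `score` may stand for either the label probability or the label-box probability product mentioned in the text.

```python
from collections import defaultdict


def split_pseudo_labels(pseudo_labels, ratio=0.2):
    """Split pseudo-labels into (high_quality, uncertain) per category.

    pseudo_labels: list of dicts with hypothetical keys 'category' and 'score'.
    ratio: the proportion M; K = round(count * M), with at least one label kept.
    """
    by_category = defaultdict(list)
    for item in pseudo_labels:
        by_category[item["category"]].append(item)

    high_quality, uncertain = [], []
    for items in by_category.values():
        # Sort in descending order of probability so the first K are the most confident.
        ranked = sorted(items, key=lambda it: it["score"], reverse=True)
        k = max(1, round(len(ranked) * ratio))
        high_quality.extend(ranked[:k])
        uncertain.extend(ranked[k:])
    return high_quality, uncertain


# Mirrors the worked example: 100 labels of category 1 and 200 of category 2, M = 20%.
labels = [{"category": 1, "score": (i + 1) / 100} for i in range(100)]
labels += [{"category": 2, "score": (i + 1) / 200} for i in range(200)]
high, unsure = split_pseudo_labels(labels, ratio=0.2)
```

Because K is computed per category, a category with few confident detections still contributes its own top fraction instead of being crowded out by a dominant category.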
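In summary, all pseudo-labels corresponding to all unlabeled data in the sample data set can be divided into high-quality pseudo-labels and uncertain pseudo-labels. High-quality pseudo-labels can be treated as reliable labels and used as labeled data in subsequent training; that is, a high-quality pseudo-label has a corresponding annotated box and annotated category. For example, the predicted box output by the initial management model is used as the annotated box of the high-quality pseudo-label, and the predicted label output by the initial management model is used as its annotated category.
In summary, labeled data, high-quality pseudo-labels and uncertain pseudo-labels are obtained. On this basis, joint training, i.e. semi-supervised joint training, can be performed on the labeled data and the unlabeled data (the unlabeled data may be unlabeled data with high-quality pseudo-labels and unlabeled data with uncertain pseudo-labels). During joint training, the ratio of labeled data to unlabeled data may be m:n, i.e. the ratio of the total amount of labeled data to the total amount of unlabeled data is m:n. The value of m:n may be configured empirically and is not limited, for example 1:1, 1:2 or 2:1.
The semi-supervised training process, still with reference to FIG. 2, may include the following steps: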
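Step 204: apply data augmentation to the labeled data, input the augmented labeled data into the initial learning model to obtain a third predicted value corresponding to the labeled data, determine a fourth predicted label and a fourth predicted box based on the third predicted value, and determine a third loss value based on the fourth predicted label and the fourth predicted box.
For each piece of labeled data, data augmentation may be applied using spatial transformation and/or color transformation or similar methods to obtain augmented labeled data; this process is not limited.
During joint training, the augmented labeled data may be input into the initial learning model, which processes it to obtain the third predicted value corresponding to the labeled data. The third predicted value may include a predicted box and a predicted label. For ease of distinction, this predicted box is called the fourth predicted box and this predicted label is called the fourth predicted label; that is, the fourth predicted label and the fourth predicted box can be determined from the third predicted value. The fourth predicted box indicates the position of the target object, for example as four vertex coordinates, and the fourth predicted label indicates the category of the target object, such as category 1, category 2 or category 3.
Labeled data has corresponding annotation information, such as an annotated box and an annotated category. On this basis, a category loss value can be determined from the fourth predicted label of the labeled data and its annotated category, and a coordinate box loss value can be determined from the fourth predicted box of the labeled data and its annotated box. The third loss value is then determined from the category loss value and the coordinate box loss value, for example using the following formula (2):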
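$L = L_{loc} + L_{cls}$ (2)
where $L$ denotes the third loss value, $L_{loc}$ denotes the coordinate box loss value, and $L_{cls}$ denotes the category loss value.
For the category loss value $L_{cls}$, if the fourth predicted label matches the annotated category, the category loss value is small, for example the minimum loss value; if the fourth predicted label does not match the annotated category, the category loss value is large, for example the maximum loss value.
For the coordinate box loss value $L_{loc}$, if the fourth predicted box matches the annotated box, the coordinate box loss value is small, for example the minimum loss value. If the fourth predicted box does not match the annotated box, the closeness between the four vertex coordinates of the fourth predicted box and those of the annotated box can be determined: the closer the fourth predicted box is to the annotated box, the smaller the coordinate box loss value, and the more they differ, the larger the coordinate box loss value.
For example, a probability distribution of the coordinate point offset is determined for each of the four vertex coordinates of the fourth predicted box, so that each vertex coordinate of the fourth predicted box corresponds to one probability distribution, and a probability distribution of the coordinate point offset is determined for each of the four vertex coordinates of the annotated box, so that each vertex coordinate of the annotated box corresponds to one probability distribution. A probability distribution corresponding to a vertex coordinate is represented by its mean and variance; for example, the probability distribution corresponding to vertex coordinate x (a vertex coordinate of the predicted box or of the annotated box) is $N(\mu_{tx}, \Sigma_{tx})$.
Based on the probability distributions corresponding to the four vertex coordinates of the fourth predicted box and those corresponding to the four vertex coordinates of the annotated box, the coordinate box loss value between the fourth predicted box and the annotated box can be calculated. For example, a negative log-likelihood loss value may be calculated from the probability distributions of the fourth predicted box and those of the annotated box, and the coordinate box loss value determined from the negative log-likelihood loss value.
In summary, the category loss value and the coordinate box loss value corresponding to the labeled data are obtained, and the third loss value corresponding to the labeled data is determined from them.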
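Step 205: apply a first data augmentation to the unlabeled data, and input the unlabeled data after the first data augmentation into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determine a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determine a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label.
Exemplarily, for each piece of unlabeled data, the first data augmentation may be applied using spatial transformation and/or color transformation or similar methods to obtain the unlabeled data after the first data augmentation.
Exemplarily, during joint training, the unlabeled data after the first data augmentation may be input into the initial learning model, which processes it to obtain the first predicted value corresponding to the unlabeled data. The first predicted value may include a predicted box and a predicted label.
As described in step 203, pseudo-labels have already been added to the unlabeled data and divided into high-quality pseudo-labels and uncertain pseudo-labels. A high-quality pseudo-label has a corresponding predicted box (serving as the annotated box) and predicted label (serving as the annotated category), and an uncertain pseudo-label has a corresponding predicted box and predicted label.
On this basis, if the predicted box in a first predicted value matches the predicted box of a high-quality pseudo-label, i.e. the two are predicted boxes for the same region of the same unlabeled data (such as an unlabeled image), this first predicted value is the first predicted value corresponding to that high-quality pseudo-label. The predicted label in this first predicted value is taken as the first predicted label and its predicted box as the first predicted box; that is, the first predicted label and the first predicted box are determined based on the first predicted value corresponding to the high-quality pseudo-label.
If the predicted box in a first predicted value matches the predicted box of an uncertain pseudo-label, i.e. the two are predicted boxes for the same region of the same unlabeled data (such as an unlabeled image), this first predicted value is the first predicted value corresponding to that uncertain pseudo-label. The predicted label in this first predicted value is taken as the second predicted label and its predicted box as the second predicted box; that is, the second predicted label and the second predicted box are determined based on the first predicted value corresponding to the uncertain pseudo-label.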
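The text does not specify how a prediction is judged to cover "the same region" as a pseudo-label. One common way to make this concrete is an intersection-over-union (IoU) threshold; the sketch below uses that as an assumption. The names `iou` and `assign_predictions`, the `(x1, y1, x2, y2)` box format, the `quality` field and the 0.5 threshold are all hypothetical, and boxes are assumed to have been mapped back to a shared coordinate frame after augmentation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_predictions(predictions, pseudo_labels, threshold=0.5):
    """Route each student prediction to its best-matching pseudo-label.

    Returns (matched_high_quality, matched_uncertain) as lists of
    (prediction, pseudo_label) pairs; unmatched predictions are dropped.
    """
    matched_high, matched_uncertain = [], []
    for pred in predictions:
        best = max(pseudo_labels, key=lambda pl: iou(pred["box"], pl["box"]), default=None)
        if best is None or iou(pred["box"], best["box"]) < threshold:
            continue
        target = matched_high if best["quality"] == "high" else matched_uncertain
        target.append((pred, best))
    return matched_high, matched_uncertain


pseudo = [
    {"box": (0, 0, 10, 10), "quality": "high"},
    {"box": (21, 21, 30, 30), "quality": "uncertain"},
]
preds = [{"box": (0, 0, 10, 10)}, {"box": (20, 20, 30, 30)}, {"box": (100, 100, 110, 110)}]
first, second = assign_predictions(preds, pseudo)
```

Predictions matched to high-quality pseudo-labels would supply the first predicted label and box, and those matched to uncertain pseudo-labels would supply the second predicted label and box.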
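The second predicted box indicates the position of the target object, for example as four vertex coordinates, and the second predicted label indicates the category of the target object. The second predicted label may include C first probability values corresponding to the C categories that the initial learning model supports detecting, where C may be a positive integer greater than 1. For example, if the initial learning model supports detecting three categories, category 1, category 2 and category 3, the second predicted label may include a first probability value for category 1 (e.g. 0.5), a first probability value for category 2 (e.g. 0.3) and a first probability value for category 3 (e.g. 0.2), i.e. the second predicted label is [0.5, 0.3, 0.2].
Step 206: apply a second data augmentation to the unlabeled data, and input the unlabeled data after the second data augmentation into the initial management model to obtain a second predicted value corresponding to the unlabeled data; determine a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label.
Exemplarily, for each piece of unlabeled data, the second data augmentation may be applied using spatial transformation and/or color transformation or similar methods to obtain the unlabeled data after the second data augmentation. The first data augmentation and the second data augmentation may use different methods; that is, the same unlabeled data is augmented twice with different augmentation methods to obtain two augmented versions, one of which is input into the initial learning model and the other into the initial management model.
Exemplarily, during joint training, the unlabeled data after the second data augmentation may be input into the initial management model, which processes it to obtain the second predicted value corresponding to the unlabeled data. The second predicted value may include a predicted box and a predicted label.
As described in step 203, pseudo-labels have already been added to the unlabeled data and divided into high-quality pseudo-labels and uncertain pseudo-labels, each with a corresponding predicted box and predicted label. On this basis, if the predicted box in a second predicted value matches the predicted box of a high-quality pseudo-label, the predicted box and predicted label in that second predicted value are no longer considered, i.e. they do not take part in subsequent training. If the predicted box in a second predicted value matches the predicted box of an uncertain pseudo-label, i.e. the two are predicted boxes for the same region of the same unlabeled data (such as an unlabeled image), this second predicted value is the second predicted value corresponding to that uncertain pseudo-label. The predicted label in this second predicted value is taken as the third predicted label and its predicted box as the third predicted box; that is, the third predicted label and the third predicted box are determined based on the second predicted value corresponding to the uncertain pseudo-label.
The third predicted box indicates the position of the target object, for example as four vertex coordinates, and the third predicted label indicates the category of the target object. The third predicted label may include C second probability values corresponding to the C categories that the initial management model supports detecting, where C may be a positive integer greater than 1. For example, if the initial management model supports three categories, category 1, category 2 and category 3, the third predicted label may include a second probability value for category 1 (e.g. 0.6), a second probability value for category 2 (e.g. 0.2) and a second probability value for category 3 (e.g. 0.2), i.e. the third predicted label is [0.6, 0.2, 0.2].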
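Step 207: determine a first loss value based on the first predicted label and the first predicted box.
The first predicted label and the first predicted box are the predicted label and predicted box corresponding to a high-quality pseudo-label, and the high-quality pseudo-label also has annotation information, namely an annotated box and an annotated category (see step 203: the predicted box of the high-quality pseudo-label serves as the annotated box, and its predicted label serves as the annotated category). On this basis, a category loss value can be determined from the first predicted label corresponding to the high-quality pseudo-label and the annotated category of that pseudo-label, and a coordinate box loss value can be determined from the first predicted box corresponding to the high-quality pseudo-label and the annotated box of that pseudo-label. The first loss value can then be determined from the category loss value and the coordinate box loss value. The first loss value is determined in the same way as the third loss value described above, which is not repeated here.
Step 208: determine a second loss value based on the second predicted label, the second predicted box, the third predicted label and the third predicted box. Exemplarily, the second predicted label and the second predicted box are the predicted label and predicted box corresponding to an uncertain pseudo-label (output by the initial learning model), and the third predicted label and the third predicted box are also the predicted label and predicted box corresponding to that uncertain pseudo-label (output by the initial management model). On this basis, a category loss value can be determined from the second predicted label and the third predicted label corresponding to the uncertain pseudo-label, and a coordinate box loss value can be determined from the second predicted box and the third predicted box corresponding to the uncertain pseudo-label. The second loss value can then be determined from the category loss value and the coordinate box loss value.
Exemplarily, the second predicted label includes C first probability values and the third predicted label includes C second probability values. A category loss value of the consistency constraint can be determined from the C first probability values and the C second probability values; it expresses a consistency constraint between the predicted label of the initial management model and the predicted label of the initial learning model. A first probability distribution of the coordinate point offsets corresponding to the second predicted box and a second probability distribution of the coordinate point offsets corresponding to the third predicted box can be determined, and a coordinate box loss value of the consistency constraint determined from the first and second probability distributions; it expresses a consistency constraint between the predicted box of the initial management model and the predicted box of the initial learning model. The second loss value can be determined from the category loss value and the coordinate box loss value.
In one possible implementation, the relative entropy between the first probability distribution and the second probability distribution can be determined, and the coordinate box loss value of the consistency constraint determined based on the relative entropy.
For example, an uncertain pseudo-label corresponds to the second predicted label and second predicted box as well as the third predicted label and third predicted box. A category loss value is determined from the second and third predicted labels, and a coordinate box loss value is determined from the second and third predicted boxes. The second loss value is then determined from the category loss value and the coordinate box loss value, for example using the following formula (3):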
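$L = L_{loc} + L_{cls}$ (3)
where $L$ denotes the second loss value, $L_{loc}$ denotes the coordinate box loss value, and $L_{cls}$ denotes the category loss value.
For the category loss value $L_{cls}$, the second predicted label includes C first probability values corresponding to the C categories, and the third predicted label includes C second probability values corresponding to the C categories. Based on the C first probability values and the C second probability values, the category loss value of the consistency constraint can be calculated using the following formula. Formula (4) is only an example and is not limiting, as long as the category loss value of the consistency constraint can be determined from the C first probability values and the C second probability values.
In formula (4), $L_{c\text{-}cls}$ denotes the category loss value of the consistency constraint, C denotes the C categories that the initial learning model or the initial management model supports detecting, i denotes the i-th of the C categories, and $p_{ti}$ denotes the second probability value for the i-th category output by the initial management model (belonging to the third predicted label). In practice, $p_{ti}$ may be the probability value obtained by sharpening the second probability value for the i-th category. $p_{si}$ denotes the first probability value for the i-th category output by the initial learning model (belonging to the second predicted label).
The value of i ranges from 1 to C. When i is 1, it denotes the 1st category that the initial learning model supports detecting, $p_{ti}$ denotes the second probability value for the 1st category, and $p_{si}$ denotes the first probability value for the 1st category; and so on, until when i is C, it denotes the C-th category that the initial learning model supports detecting, $p_{ti}$ denotes the second probability value for the C-th category, and $p_{si}$ denotes the first probability value for the C-th category.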
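The expression for formula (4) does not appear in the text above. Purely as a hedged illustration, a cross-entropy form is one common choice that is consistent with the symbols defined there ($L_{c\text{-}cls}$, $C$, $p_{ti}$, $p_{si}$). This particular form is an assumption and should not be read as the formula used in the disclosure:

```latex
L_{c\text{-}cls} = -\sum_{i=1}^{C} p_{ti} \,\log p_{si}
```

Under this assumed form, for fixed management-model probabilities $p_{ti}$ the loss is minimized when the learning model's probabilities satisfy $p_{si} = p_{ti}$ for every category $i$, which is the behavior a consistency constraint on the predicted labels is meant to encourage.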
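For the coordinate box loss value $L_{loc}$, the closeness between the four vertex coordinates of the second predicted box and those of the third predicted box can be determined: the closer the second predicted box is to the third predicted box, the smaller the coordinate box loss value, and the more they differ, the larger the coordinate box loss value. For example, a first probability distribution of the coordinate point offset is determined for each of the four vertex coordinates of the second predicted box, i.e. each vertex coordinate corresponds to one first probability distribution, and a second probability distribution of the coordinate point offset is determined for each of the four vertex coordinates of the third predicted box, i.e. each vertex coordinate corresponds to one second probability distribution. A probability distribution corresponding to a vertex coordinate is represented by its mean and variance; for example, the probability distribution corresponding to vertex coordinate x is $N(\mu_{tx}, \Sigma_{tx})$.
Based on the first probability distribution and the second probability distribution, the coordinate box loss value of the consistency constraint can be determined, for example using the following formula. Formula (5) is only an example and is not limiting, as long as the coordinate box loss value can be obtained.
$L_{c\text{-}loc} = \sum_{a=\{x,y,w,h\}} KL\left(N(\mu_{ta\text{-}t}, \Sigma_{ta\text{-}t}) \,\|\, N(\mu_{ta\text{-}s}, \Sigma_{ta\text{-}s})\right)$ (5)
In formula (5), $L_{c\text{-}loc}$ denotes the coordinate box loss value of the consistency constraint, and a denotes the four vertex coordinates of the predicted box: a may be the vertex coordinate (x, y), the vertex coordinate (x+w, y), the vertex coordinate (x, y+h) or the vertex coordinate (x+w, y+h).
KL denotes the KL (Kullback-Leibler) divergence, also called relative entropy or information divergence. $N(\mu_{ta\text{-}s}, \Sigma_{ta\text{-}s})$ denotes the first probability distribution (a Gaussian distribution) of the coordinate point offset corresponding to vertex coordinate a, i.e. the probability distribution corresponding to the second predicted box output by the initial learning model. $N(\mu_{ta\text{-}t}, \Sigma_{ta\text{-}t})$ denotes the second probability distribution of the coordinate point offset corresponding to vertex coordinate a, i.e. the probability distribution corresponding to the third predicted box output by the initial management model. In practice, $\Sigma_{ta\text{-}t}$ may additionally be sharpened.
In summary, the relative entropy between the first and second probability distributions of the coordinate point offset is calculated for vertex coordinate (x, y), for vertex coordinate (x+w, y), for vertex coordinate (x, y+h), and for vertex coordinate (x+w, y+h). The sum of these four relative entropies is then calculated, which is the coordinate box loss value of the consistency constraint $L_{c\text{-}loc}$.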
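Formula (5) can be sketched numerically using the closed-form KL divergence between two Gaussians. The sketch below is a minimal illustration that assumes each coordinate's distribution is a univariate Gaussian given as a (mean, variance) pair; the function names `kl_gaussian` and `consistency_box_loss` and the dictionary layout are hypothetical.

```python
import math


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(N(mu_p, var_p) || N(mu_q, var_q)) for univariate Gaussians (closed form)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def consistency_box_loss(teacher, student):
    """Sum KL(teacher || student) over the four terms indexed by a, as in formula (5).

    teacher / student: dicts mapping 'x', 'y', 'w', 'h' to (mean, variance),
    taken from the management model and the learning model respectively.
    """
    return sum(kl_gaussian(*teacher[a], *student[a]) for a in ("x", "y", "w", "h"))


teacher_box = {"x": (0.0, 1.0), "y": (0.0, 1.0), "w": (0.0, 1.0), "h": (0.0, 1.0)}
student_box = {"x": (1.0, 1.0), "y": (0.0, 1.0), "w": (0.0, 1.0), "h": (0.0, 1.0)}
loss = consistency_box_loss(teacher_box, student_box)
```

The management model's distribution is placed first in the divergence, matching the argument order in formula (5); the KL divergence is not symmetric, so swapping the arguments would give a different loss.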
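In summary, the category loss value and the coordinate box loss value corresponding to the uncertain pseudo-label are obtained, and the second loss value corresponding to the uncertain pseudo-label is determined from them.
Step 209: determine a target loss value of the initial learning model based on the first loss value, the second loss value and the third loss value. For example, the average of the first, second and third loss values may be used as the target loss value of the initial learning model, or their sum may be used; this is not limited.
Step 210: adjust the network parameters of the initial learning model based on the target loss value to obtain an adjusted learning model. For example, after the target loss value of the initial learning model is obtained, the network parameters of the initial learning model may be adjusted based on it, for example by gradient descent; this embodiment does not limit the adjustment process.
Step 211: adjust the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model. For example, after the adjusted learning model is obtained, a parameter correction value for the network parameters is determined based on the network parameters of the adjusted learning model and a configured proportional coefficient, and the network parameters of the initial management model are adjusted based on the parameter correction value to obtain the adjusted management model.
For example, based on the network parameters of the adjusted learning model, the EMA (Exponential Moving Average) algorithm may be used to determine the network parameters of the management model, thereby obtaining the adjusted management model. When the EMA algorithm is used, the parameter correction value may be determined from the network parameters of the adjusted learning model and the proportional coefficient, and the network parameters of the initial management model adjusted based on the parameter correction value to obtain the adjusted management model; this process is not limited.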
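The EMA update of the management model can be sketched as follows. This is a minimal sketch using the conventional EMA form; the text only states that a correction value is derived from the adjusted learning model's parameters and a proportional coefficient, so the exact weighting, the name `ema_update`, the `decay` value and the flat name-to-float parameter dictionary are all assumptions.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Return new management-model parameters as an exponential moving average.

    The correction term (1 - decay) * student is the part derived from the
    adjusted learning model and the proportional coefficient (assumed form).
    """
    updated = {}
    for name, teacher_value in teacher_params.items():
        correction = (1.0 - decay) * student_params[name]
        updated[name] = decay * teacher_value + correction
    return updated


teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, decay=0.9)
```

With a decay close to 1 the management model changes slowly, so it averages over many learning-model updates, and no gradient is propagated through it.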
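Step 212: determine whether the adjusted management model has converged. If it has not converged, step 213 may be executed; if it has converged, step 214 may be executed.
Exemplarily, if the number of iterations of the initial management model or the initial learning model reaches a threshold, the adjusted management model is determined to have converged; otherwise it is determined not to have converged. Alternatively, if the target loss value of the initial learning model is less than a loss threshold, the adjusted management model is determined to have converged; if the target loss value is not less than the loss threshold, it is determined not to have converged.
Step 213: if the adjusted management model has not converged, take the adjusted learning model as the initial learning model and the adjusted management model as the initial management model, and return to step 203.
Step 214: if the adjusted management model has converged, take the converged adjusted management model as the target management model, which is the model that is finally output.
Exemplarily, after the target management model is obtained, target detection can be performed on data to be detected based on the target management model. For example, the data to be detected (such as an image to be detected) may be input into the target management model, which outputs the target box in the data to be detected and identifies the target or target category, such as identifying a face, a vehicle category, an animal category or an electronic product category; this process is not described further.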
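As can be seen from the above technical solutions, in the embodiments of the present disclosure the target management model can be trained from a small amount of labeled data and a large amount of unlabeled data. This avoids having to acquire a large amount of labeled data, reduces the annotation workload and saves human resources, while the resulting target management model performs well and is highly reliable. While making effective use of pseudo-labels, the method improves robustness to noisy samples during training, and the target management model improves considerably over the baseline model. Setting different proportions of the pseudo-labels as high-quality pseudo-labels yields good training results in each case, so the method is robust to noisy samples and insensitive to this hyperparameter. Based on the principle that the features of the same image remain consistent under different augmentations, consistency constraints are established on the predicted boxes and predicted labels of unlabeled data under different augmentations. Pseudo-labels can be combined with the consistency constraints: for reliable predicted boxes the pseudo-label is used as the true category, and for uncertain predicted boxes a consistency contrast loss is established on the prediction results (or features). The prediction results (or features) of different views are used to constrain the same target so that its category characteristics do not change under different augmentations; the management model and learning model pair is used to produce a smoother and more stable classifier, and gradients are back-propagated only through the learning model.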
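Based on the same application concept as the above method, an embodiment of the present disclosure proposes a target detection apparatus. FIG. 3 is a schematic structural diagram of the target detection apparatus. The apparatus may include:
an acquisition module 31, configured to acquire an initial management model and an initial learning model, add pseudo-labels to unlabeled data based on the initial management model, and divide the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels; a determination module 32, configured to input the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determine a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determine a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label; and input the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determine a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label; and a processing module 33, configured to train the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model, where the target management model is used to perform target detection on data to be detected.
Exemplarily, when acquiring the initial management model and the initial learning model, the acquisition module 31 is specifically configured to: train a baseline model using labeled data; and generate the initial management model and the initial learning model based on the baseline model; where the network structure of the initial management model is the same as that of the baseline model, and the network parameters of the initial management model are the same as or different from those of the baseline model; the network structure of the initial learning model is the same as that of the baseline model, and the network parameters of the initial learning model are the same as or different from those of the baseline model.
Exemplarily, when adding pseudo-labels to unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels, the acquisition module 31 is specifically configured to: for each piece of unlabeled data, input the unlabeled data into the initial management model to obtain the pseudo-label corresponding to the unlabeled data and the probability value corresponding to the pseudo-label; for each category that the initial management model supports detecting, sort all pseudo-labels corresponding to the category based on their probability values; based on the sorting result, select the K pseudo-labels with the largest probability values as the high-quality pseudo-labels of the category, and take the remaining pseudo-labels of the category as its uncertain pseudo-labels; where K is a positive integer.
Exemplarily, when training the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain the target management model, the processing module 33 is specifically configured to: determine a first loss value based on the first predicted label and the first predicted box; determine a second loss value based on the second predicted label, the second predicted box, the third predicted label and the third predicted box; and adjust the initial management model based on the first loss value and the second loss value to obtain the target management model.
Exemplarily, when determining the second loss value based on the second predicted label, the second predicted box, the third predicted label and the third predicted box, the processing module 33 is specifically configured to: if the second predicted label includes C first probability values corresponding to the C categories that the initial learning model supports detecting, and the third predicted label includes C second probability values corresponding to the C categories that the initial management model supports detecting, determine a category loss value of the consistency constraint based on the C first probability values and the C second probability values, where C is a positive integer greater than 1; determine a first probability distribution of the coordinate point offsets corresponding to the second predicted box and a second probability distribution of the coordinate point offsets corresponding to the third predicted box, and determine a coordinate box loss value of the consistency constraint based on the first probability distribution and the second probability distribution; and determine the second loss value based on the category loss value and the coordinate box loss value.
Exemplarily, when determining the coordinate box loss value of the consistency constraint based on the first probability distribution and the second probability distribution, the processing module 33 is specifically configured to: determine the relative entropy between the first probability distribution and the second probability distribution; and determine the coordinate box loss value of the consistency constraint based on the relative entropy.
Exemplarily, when adjusting the initial management model based on the first loss value and the second loss value to obtain the target management model, the processing module 33 is specifically configured to: adjust the network parameters of the initial learning model based on the first loss value and the second loss value to obtain an adjusted learning model; adjust the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain an adjusted management model; if the adjusted management model has not converged, take the adjusted learning model as the initial learning model and the adjusted management model as the initial management model, and return to the operation of adding pseudo-labels to unlabeled data based on the initial management model and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels; if the adjusted management model has converged, take the adjusted management model as the target management model.
Exemplarily, when adjusting the network parameters of the initial management model based on the network parameters of the adjusted learning model to obtain the adjusted management model, the processing module 33 is specifically configured to: determine a parameter correction value for the network parameters based on the network parameters of the adjusted learning model and a configured proportional coefficient, and adjust the network parameters of the initial management model based on the parameter correction value to obtain the adjusted management model.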
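Based on the same application concept as the above method, an embodiment of the present disclosure proposes a target detection device. As shown in FIG. 4, the target detection device may include a processor 41 and a machine-readable storage medium 42. The machine-readable storage medium 42 stores machine-executable instructions that can be executed by the processor 41; the processor 41 is configured to execute the machine-executable instructions to implement the target detection method disclosed in the above examples of the present disclosure. For example, the processor 41 is configured to execute the machine-executable instructions to implement the following steps:
acquiring an initial management model and an initial learning model, adding pseudo-labels to unlabeled data based on the initial management model, and dividing the pseudo-labels into high-quality pseudo-labels and uncertain pseudo-labels;
inputting the unlabeled data into the initial learning model to obtain a first predicted value corresponding to the unlabeled data; determining a first predicted label and a first predicted box based on the first predicted value corresponding to a high-quality pseudo-label, and determining a second predicted label and a second predicted box based on the first predicted value corresponding to an uncertain pseudo-label;
inputting the unlabeled data into the initial management model to obtain a second predicted value corresponding to the unlabeled data, and determining a third predicted label and a third predicted box based on the second predicted value corresponding to the uncertain pseudo-label;
training the initial management model based on the first predicted label, the first predicted box, the second predicted label, the second predicted box, the third predicted label and the third predicted box to obtain a target management model;
wherein the target management model is used to perform target detection on data to be detected.
Based on the same application concept as the above method, an embodiment of the present disclosure further provides a machine-readable storage medium on which several computer instructions are stored; when the computer instructions are executed by a processor, the target detection method disclosed in the above examples of the present disclosure can be implemented.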
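The above machine-readable storage medium may be any electronic, magnetic, optical or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (such as a hard disk drive), a solid state drive, any type of storage disk (such as a CD or DVD), or a similar storage medium, or a combination of these.
The systems, apparatuses, modules or units described in the above embodiments may be implemented by a computer or an entity, or by a product having a certain function. A typical implementing device is a computer, which may take the form of a personal computer, laptop computer, cellular phone, camera phone, smart phone, personal digital assistant, media player, navigation device, e-mail device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above apparatus is described in terms of units divided by function. When implementing the present disclosure, the functions of the units may of course be implemented in one or more pieces of software and/or hardware.
Those skilled in the art should understand that the embodiments of the present disclosure may be provided as a method, a system or a computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the embodiments of the present disclosure may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.
The present disclosure is described with reference to flowcharts and/or block diagrams of the method, device (system) and computer program product according to the embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
The above are merely embodiments of the present disclosure and are not intended to limit it. Various modifications and changes to the present disclosure are possible for those skilled in the art. Any modification, equivalent replacement or improvement made within the spirit and principles of the present disclosure shall fall within the scope of the claims of the present disclosure.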
Claims (12)
- 一种目标检测方法,所述方法包括:获取初始管理模型和初始学习模型,并基于所述初始管理模型为无标签数据添加伪标签,并将所述伪标签划分为高质量伪标签和不确定伪标签;将所述无标签数据输入给所述初始学习模型,得到所述无标签数据对应的第一预测值;基于与所述高质量伪标签对应的第一预测值确定第一预测标签和第一预测框,基于与所述不确定伪标签对应的第一预测值确定第二预测标签和第二预测框;将所述无标签数据输入给所述初始管理模型,得到所述无标签数据对应的第二预测值,基于与所述不确定伪标签对应的第二预测值确定第三预测标签和第三预测框;基于所述第一预测标签、所述第一预测框、所述第二预测标签、所述第二预测框、所述第三预测标签和所述第三预测框对所述初始管理模型进行训练,得到目标管理模型;其中,所述目标管理模型用于对待检测数据进行目标检测。
- 根据权利要求1所述的方法,其特征在于,所述获取初始管理模型和初始学习模型,包括:利用有标签数据训练得到基线模型;基于所述基线模型生成所述初始管理模型和所述初始学习模型;其中,所述初始管理模型的网络结构与所述基线模型的网络结构相同;其中,所述初始学习模型的网络结构与所述基线模型的网络结构相同。
- 根据权利要求1所述的方法,其特征在于,基于所述初始管理模型为无标签数据添加伪标签,并将所述伪标签划分为高质量伪标签和不确定伪标签,包括:针对每个无标签数据,将该无标签数据输入给所述初始管理模型,得到该无标签数据对应的伪标签、及与所述伪标签对应的概率值;针对所述初始管理模型支持检测的类别中的每种类别,基于与该类别对应的所有伪标签对应的概率值,对该类别对应的所有伪标签进行排序;基于排序结果,选取概率值大的K个伪标签作为该类别对应的高质量伪标签,并将该类别对应的所有伪标签中除该类别对应的高质量伪标签之外的剩余伪标签确定为该类别对应的不确定伪标签;其中,所述K为正整数。
- 根据权利要求1所述的方法,其特征在于,所述将无标签数据输入给初始学习模型,包括:对所述无标签数据进行第一数据增广,将第一数据增广后的无标签数据输入给所述初始学习模型;所述将无标签数据输入给初始管理模型,包括:对所述无标签数据进行第二数据增广,将第二数据增广后的无标签数据输入给所述初始管理模型;其中,所述第一数据增广的方式与所述第二数据增广的方式不同。
- 根据权利要求1所述的方法,其特征在于,所述基于所述第一预测标签、所述第一预测框、所述第二预测标签、所述第二预测框、所述第三预测标签和所述第三预测框对所述初始管理模型进行训练,得到目标管理模型,包括:基于所述第一预测标签和所述第一预测框确定第一损失值;基于所述第二预测标签、所述第二预测框、所述第三预测标签和所述第三预测框确定第二损失值;基于所述第一损失值和所述第二损失值对所述初始管理模型进行调整,得到所述目标管理模型。
- 根据权利要求5所述的方法,其特征在于,所述基于所述第二预测标签、所述第二预测框、所述第三预测标签和所述第三预测框确定第二损失值,包括:响应于确定所述第二预测标签包括与所述初始学习模型支持检测的C种类别对应的C个第一概率值,所述第三预测标签包括与所述初始管理模型支持检测的C种类别对应的C个第二概率值,则基于所述C个第一概率值和所述C个第二概率值确定一致性约束的类别损失值;其中,所述C为大于1的正整数;确定与所述第二预测框对应的坐标点偏移量的第一概率分布,并确定与所述第三预测框对应的坐标点偏移量的第二概率分布,基于所述第一概率分布和所述第二概率分布确定一致性约束的坐标框损失值;基于所述类别损失值和所述坐标框损失值确定所述第二损失值。
- 根据权利要求5所述的方法,其特征在于,所述基于所述第一损失值和所述第二损失值对所述初始管理模型进行调整,得到所述目标管理模型之前,所述方法还包括:将有标签数据输入给所述初始学习模型,得到该有标签数据对应的第三预测值,基于所述第三预测值确定第四预测标签和第四预测框,并基于第四预测标签和第四预测框确定第三损失值;所述基于所述第一损失值和所述第二损失值对所述初始管理模型进行调整,得到所述目标管理模型,包括:基于所述第一损失值、所述第二损失值和所述第三损失值对所述初始管理模型进行调整,得到所述目标管理模型。
- 根据权利要求5或7所述的方法,其特征在于,所述基于所述第一损失值和所述第二损失值对所述初始管理模型进行调整,得到所述目标管理模型,包括:基于所述第一损失值和所述第二损失值对所述初始学习模型的网络参数进行调整,得到调整后学习模型;基于所述调整后学习模型的网络参数对所述初始管理模型的网络参数进行调整,得到调整后管理模型;响应于确定所述调整后管理模型未收敛,则将调整后学习模型确定为初始学习模型,将调整后管理模型确定为初始管理模型,返回执行基于所述初始管理模型为无标签数据添加伪标签,并将所述伪标签划分为高质量伪标签和不确定伪标签的操作;响应于确定所述调整后管理模型已收敛,则将所述调整后管理模型确定为所述目标管理模型。
- 根据权利要求8所述的方法,其特征在于,所述基于所述调整后学习模型的网络参数对所述初始管理模型的网络参数进行调整,得到调整后管理模型,包括:基于所述调整后学习模型的网络参数和已配置的比例系数确定网络参数的参数修正值,并基于所述参数修正值对所述初始管理模型的网络参数进行调整,得到所述调整后管理模型。
- 一种目标检测装置,所述装置包括:获取模块,用于获取初始管理模型和初始学习模型,基于所述初始管理模型为无标签数据添加伪标签,并将所述伪标签划分为高质量伪标签和不确定伪标签;确定模块,用于将所述无标签数据输入给所述初始学习模型,得到所述无标签数据对应的第一预测值;基于与所述高质量伪标签对应的第一预测值确定第一预测标签和第一预测框,基于与所述不确定伪标签对应的第一预测值确定第二预测标签和第二预测框;将所述无标签数据输入给初始管理模型,得到所述无标签数据对应的第二预测值,基于与所述不确定伪标签对应的第二预测值确定第三预测标签和第三预测框;处理模块,用于基于所述第一预测标签、所述第一预测框、所述第二预测标签、所述第二预测框、所述第三预测标签和所述第三预测框对所述初始管理模型进行训练,得到目标管理模型;其中,所述目标管理模型用于对待检测数据进行目标检测。
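- A target detection device, comprising a processor and a machine-readable storage medium, the machine-readable storage medium storing machine-executable instructions executable by the processor; wherein the processor is configured to execute the machine-executable instructions to implement the target detection method according to any one of claims 1 to 9.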
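- A machine-readable storage medium storing a plurality of computer instructions which, when executed by a processor, implement the target detection method according to any one of claims 1 to 9.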
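The per-category ranking in the claim on dividing pseudo-labels can be illustrated with a minimal Python sketch. The function name `split_pseudo_labels` and the `(sample_id, category, probability)` tuple layout are assumptions made for illustration and do not come from the claims.

```python
from collections import defaultdict


def split_pseudo_labels(pseudo_labels, k):
    """Split pseudo-labels into high-quality and uncertain sets.

    pseudo_labels: list of (sample_id, category, probability) tuples
    (a hypothetical layout). k: positive integer, labels kept per category.
    """
    # Group the pseudo-labels by the category they predict.
    by_category = defaultdict(list)
    for item in pseudo_labels:
        by_category[item[1]].append(item)

    high_quality, uncertain = [], []
    for items in by_category.values():
        # Sort by probability, largest first, within each category.
        ranked = sorted(items, key=lambda it: it[2], reverse=True)
        high_quality.extend(ranked[:k])  # the K most confident labels
        uncertain.extend(ranked[k:])     # everything else in this category
    return high_quality, uncertain
```

Because the cut is taken per category rather than by a single global threshold, a category whose predictions are generally less confident still contributes up to K high-quality pseudo-labels.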
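The consistency-constrained second loss value (category term plus coordinate-box term) might be sketched as follows. The claims do not state here which distance is used between the two models' distributions; this sketch assumes KL divergence with the management model's output as the target, and the names `kl_divergence`, `consistency_loss`, and `box_weight` are hypothetical.

```python
import math


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions of equal length."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
    )


def consistency_loss(learn_cls, mgmt_cls, learn_box, mgmt_box, box_weight=1.0):
    """Second loss value for one uncertain pseudo-label (assumed form).

    learn_cls / mgmt_cls: C class probabilities from each model.
    learn_box / mgmt_box: one offset distribution per box coordinate.
    """
    # Category term: compare the C class probabilities of the two models.
    category_loss = kl_divergence(mgmt_cls, learn_cls)
    # Box term: compare the offset distributions coordinate by coordinate.
    box_loss = sum(kl_divergence(m, l) for m, l in zip(mgmt_box, learn_box))
    # Combine the two terms; the weighting is an assumption.
    return category_loss + box_weight * box_loss
```

When both models produce identical distributions the loss is zero, and it grows as the learning model's predictions drift from those of the management model.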
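The parameter correction step in the claim depending on claim 8 can be read as a moving-average style update. The sketch below assumes that reading: the correction value is the learning model's parameter scaled by the configured proportional coefficient, blended into the management model's parameter. The function name, the `ratio` argument, and the dictionary-of-floats parameter layout are illustrative assumptions.

```python
def update_management_params(mgmt_params, learn_params, ratio=0.01):
    """Return adjusted management-model parameters (assumed EMA-style rule).

    mgmt_params / learn_params: dicts mapping parameter name to a float.
    ratio: the configured proportional coefficient, assumed to lie in (0, 1).
    """
    updated = {}
    for name, mgmt_value in mgmt_params.items():
        # Correction value derived from the adjusted learning model.
        correction = ratio * learn_params[name]
        # Blend the correction into the current management parameter.
        updated[name] = (1.0 - ratio) * mgmt_value + correction
    return updated
```

With a small coefficient the management model changes slowly, which keeps the pseudo-labels it produces stable across training iterations.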
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP22894711.5A EP4435660A1 (en) | 2021-11-19 | 2022-11-11 | Target detection method and apparatus |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111401508.9A CN114118259A (zh) | 2021-11-19 | 2021-11-19 | Target detection method and apparatus |
CN202111401508.9 | 2021-11-19 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023088174A1 (zh) | 2023-05-25 |
Family
ID=80371546
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/131320 WO2023088174A1 (zh) | 2021-11-19 | 2022-11-11 | Target detection method and apparatus |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP4435660A1 (zh) |
CN (1) | CN114118259A (zh) |
WO (1) | WO2023088174A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
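CN117612206A (zh) * | 2023-11-27 | 2024-02-27 | Shenzhen Research Institute of Big Data | Pedestrian re-identification network model generation method and apparatus, computer device, and medium |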
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
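CN114118259A (zh) * | 2021-11-19 | 2022-03-01 | Hangzhou Hikvision Digital Technology Co., Ltd. | Target detection method and apparatus |
CN114882243B (zh) * | 2022-07-11 | 2022-11-22 | Zhejiang Dahua Technology Co., Ltd. | Target detection method, electronic device, and computer-readable storage medium |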
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200012953A1 (en) * | 2018-07-03 | 2020-01-09 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus for generating model |
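CN111222648A (zh) * | 2020-01-15 | 2020-06-02 | Shenzhen Qianhai WeBank Co., Ltd. | Semi-supervised machine learning optimization method, apparatus, device, and storage medium |
CN111291755A (zh) * | 2020-02-13 | 2020-06-16 | Tencent Technology (Shenzhen) Co., Ltd. | Object detection model training and object detection method and apparatus, computer device, and storage medium |
CN111898696A (zh) * | 2020-08-10 | 2020-11-06 | Tencent Cloud Computing (Changsha) Co., Ltd. | Method, apparatus, medium, and device for generating pseudo-labels and a label prediction model |
CN112686300A (zh) * | 2020-12-29 | 2021-04-20 | Hangzhou Hikvision Digital Technology Co., Ltd. | Data processing method, apparatus, and device |
CN114118259A (zh) * | 2021-11-19 | 2022-03-01 | Hangzhou Hikvision Digital Technology Co., Ltd. | Target detection method and apparatus |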
- 2021-11-19: CN application CN202111401508.9A, patent CN114118259A (zh), active, Pending
- 2022-11-11: WO application PCT/CN2022/131320, patent WO2023088174A1 (zh), active, Application Filing
- 2022-11-11: EP application EP22894711.5A, patent EP4435660A1 (en), active, Pending
Also Published As
Publication number | Publication date |
---|---|
CN114118259A (zh) | 2022-03-01 |
EP4435660A1 (en) | 2024-09-25 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
Chen et al. | Underwater object detection using Invert Multi-Class Adaboost with deep learning | |
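CN111079639B (zh) | Method, apparatus, device, and storage medium for constructing a garbage image classification model | |
WO2023088174A1 (zh) | Target detection method and apparatus | |
CN109086658B (zh) | Sensor data generation method and system based on a generative adversarial network | |
CN108564129B (zh) | Trajectory data classification method based on a generative adversarial network | |
EP3798917A1 (en) | Generative adversarial network (GAN) for generating images | |
CN107209861B (zh) | Optimizing multi-class multimedia data classification using negative data | |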
Zhuang et al. | Visual tracking via discriminative sparse similarity map | |
WO2020114378A1 (zh) | Video watermark identification method, apparatus, device, and storage medium | |
US20170344881A1 (en) | Information processing apparatus using multi-layer neural network and method therefor | |
US20210383225A1 (en) | Self-supervised representation learning using bootstrapped latent representations | |
US10878297B2 (en) | System and method for a visual recognition and/or detection of a potentially unbounded set of categories with limited examples per category and restricted query scope | |
US11288542B1 (en) | Learning graph-based priors for generalized zero-shot learning | |
Danisman et al. | Intelligent pixels of interest selection with application to facial expression recognition using multilayer perceptron | |
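CN108961358B (zh) | Method, apparatus, and electronic device for obtaining sample pictures | |
CN110111365B (zh) | Deep-learning-based training method and apparatus, and target tracking method and apparatus | |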
Duman et al. | Distance estimation from a monocular camera using face and body features | |
US20210365719A1 (en) | System and method for few-shot learning | |
Yao | [Retracted] Application of Higher Education Management in Colleges and Universities by Deep Learning | |
US20240320493A1 (en) | Improved Two-Stage Machine Learning for Imbalanced Datasets | |
CN111753583A (zh) | Recognition method and apparatus | |
Barcic et al. | Convolutional Neural Networks for Face Recognition: A Systematic Literature Review | |
CN111858999A (zh) | Retrieval method and apparatus based on segmented hard sample generation | |
Dornier et al. | Scaf: Skip-connections in auto-encoder for face alignment with few annotated data | |
JP7270839B2 (ja) | Universal feature representation learning for face recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 22894711; Country of ref document: EP; Kind code of ref document: A1 |
 | WWE | WIPO information: entry into national phase | Ref document number: 2022894711; Country of ref document: EP |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | ENP | Entry into the national phase | Ref document number: 2022894711; Country of ref document: EP; Effective date: 20240619 |