CN117372813A

CN117372813A - Target detection method and device based on pre-marking

Info

Publication number: CN117372813A
Application number: CN202311416735.8A
Authority: CN
Inventors: 杨云聪; 李俊杰; 魏茂敏; 廖洪波
Original assignee: Shanghai Dingge Information Technology Co ltd
Current assignee: Shanghai Dingge Information Technology Co ltd
Priority date: 2023-10-30
Filing date: 2023-10-30
Publication date: 2024-01-09

Abstract

The invention discloses a target detection method and device based on pre-labeling. The method comprises the following steps: acquiring a first data set; labeling the first data set to obtain a second data set; inputting the second data set into a first target detection model for training according to preset configuration parameters to obtain a second target detection model; inputting a third data set into the second target detection model for detection to obtain a fourth data set; performing confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set; adjusting the detection results in the fifth data set to obtain a sixth data set; and inputting the second data set and the sixth data set as training data sets into the second target detection model for training to obtain a third target detection model. Through pre-labeling, the method and device greatly reduce the user's labeling workload and lower the user's cost of use.

Description

Target detection method and device based on pre-marking

Technical Field

The embodiment of the invention relates to the technical field of image recognition, in particular to a target detection method and device based on pre-labeling.

Background

At present, artificial intelligence technology is widely applied across industries and in many areas of production and daily life, and the range of applications of target detection based on deep convolutional neural networks is steadily widening.

Existing target detection applications follow roughly two schemes. In the first, a developer provides a complete software and hardware solution; the project construction period is long and the project cost is high, which is hard to bear for smaller users with a single purpose. In the second, the user provides training data, which are transmitted to a deep learning platform for training; the data security of this scheme is poor, which makes it hard to use for users who are sensitive about data security.

For example, patent application 201910759146.7 discloses an image classification model training method and system that automatically generates a training data set. The method comprises:

step 1, building a key feature learning model, and training the feature learning model with an original training set consisting of pictures in which only the key features of an object are labeled;

step 2, capturing the target features of each picture in the original training set, and automatically labeling the captured features according to their classification to generate key feature data;

step 3, training on the key feature data;

step 4, after training on the key feature data, feeding the misclassified pictures back to the feature learning model and adding them to the training set of the feature learning model for iterative upgrading.

Step 1 specifically comprises: 1) performing a convolution operation on the image with a WideResNet network to extract image features, which are used for subsequent candidate region screening and classification; 2) screening the foreground candidate regions with the highest probability from the image features extracted by WideResNet, through a 3×3 convolution layer and a softmax fully connected layer; 3) classifying the candidate regions using the image features extracted in 1); 4) correcting the positions of the candidate regions by linear regression.

Step 2 specifically comprises: 1) automatic generation: the feature learning model returns the coordinates of the upper left corner (x1, y1) and the lower right corner (x2, y2) of the key feature region relative to the original image, and a local key feature picture is automatically generated from these coordinates; 2) automatic labeling: the original picture and the local key feature picture are automatically labeled according to the identified key feature category, so that the training data are labeled automatically; 3) automatic data expansion: based on a comparison of the amounts of generated training data of each category, the categories with less data are automatically expanded, by random cropping of the original pictures, until the categories reach a 1:1 ratio.

However, this method is complicated, and the system requires a great deal of computation time.

Therefore, it is necessary to provide a target detection method and device based on pre-labeling to solve the above problems.

Disclosure of Invention

The object of the embodiment of the application is to provide a target detection method and device based on pre-marking, which greatly reduces the marking workload of a user and reduces the use cost of the user through pre-marking.

According to one aspect of the present invention, there is provided a target detection method based on pre-labeling, including:

acquiring a first data set, wherein the first data set is a first number of pictures;

labeling the first data set to obtain a second data set, wherein labeling the first data set comprises labeling a target to be detected in the first number of pictures by using a maximum circumscribed rectangular frame, and the second data set is a set of picture files and corresponding labeling information files;

inputting the second data set as a training data set into a first target detection model for training according to preset configuration parameters to obtain a second target detection model;

inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files;

performing confidence threshold filtering and non-maximum suppression filtering on the detection result in the fourth data set to obtain a fifth data set;

adjusting the detection result in the fifth data set to obtain a sixth data set, wherein the adjustment comprises newly adding a target to be detected which is not detected by the second target detection model, modifying the target to be detected which is not accurately detected by the second target detection model, and deleting the target to be detected which is erroneously detected by the second target detection model, and the sixth data set is a pre-labeling result;

and inputting the second data set and the sixth data set as training data sets into the second target detection model for training to obtain a third target detection model, wherein the third target detection model detects a seventh data set to obtain an eighth data set, the seventh data set is a picture acquired in real time, and the eighth data set is a real-time detection result.

Preferably, picture data are input into the first target detection model, a detection result for the target to be detected is obtained through forward inference, and an error is calculated from the detection result and the manual labeling result in the corresponding labeling file; the partial derivatives of the error with respect to the weight parameters in the first target detection model are then calculated, and the corresponding weight parameters are updated with these partial derivatives according to a gradient descent algorithm.

Preferably, the model training updates the weight parameters in the model using forward inference, back-propagation and gradient descent algorithms.

Preferably, the fourth data set and the eighth data set comprise a category, probability, location and size of the detection result.

Preferably, the eighth data set is filtered with a confidence threshold to obtain a ninth data set, retaining the results in the eighth data set whose prediction probability is greater than the confidence threshold.

Preferably, the ninth data set is filtered by non-maximum suppression to obtain a tenth data set: the degree of overlap between detection results in the ninth data set is calculated from their positions and sizes, and among multiple detection results whose degree of overlap is greater than a first preset threshold, only the one with the highest probability is retained.

Preferably, the tenth data set is filtered by size to obtain an eleventh data set, filtering out the detection results in the tenth data set whose size is greater than or less than a second preset threshold.

Preferably, the eleventh data set is filtered by number to obtain a twelfth data set, filtering out the categories whose number of detection results in the eleventh data set is smaller than a third preset threshold.

Preferably, an alarm is issued when the ratio of the number of pictures in the twelfth data set that contain a target to be detected to the number of field pictures acquired in real time in the seventh data set exceeds a fourth preset threshold.

According to another aspect of the present invention, there is provided a target detection apparatus based on pre-labeling, comprising:

the data set acquisition module is used for acquiring a first data set, wherein the first data set is a first number of pictures;

the data set labeling module is used for labeling the first data set to obtain a second data set, wherein labeling the first data set comprises labeling a to-be-detected target in the first number of pictures by using a maximum circumscribed rectangular frame, and the second data set is a set of picture files and corresponding labeling information files;

the model pre-training module is used for inputting the second data set as a training data set into the first target detection model for training according to preset configuration parameters so as to obtain a second target detection model;

the model pre-labeling module is used for inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files, and for performing confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set;

the data set adjusting module is used for adjusting the detection result in the fifth data set to obtain a sixth data set, wherein the adjustment comprises the steps of adding a target to be detected which is not detected by the second target detection model, modifying the target to be detected which is not accurately detected by the second target detection model, and deleting the target to be detected which is erroneously detected by the second target detection model, and the sixth data set is a pre-labeling result;

the model retraining module is used for inputting the second data set and the sixth data set as training data sets into the second target detection model for retraining to obtain a third target detection model, the third target detection model detects a seventh data set to obtain an eighth data set, the seventh data set is a picture acquired in real time, and the eighth data set is a real-time detection result.

The application discloses a target detection method based on pre-labeling, which comprises the following steps: acquiring a first data set, wherein the first data set is a first number of pictures; labeling the first data set to obtain a second data set, wherein labeling the first data set comprises labeling the targets to be detected in the first number of pictures with maximum circumscribed rectangular frames, and the second data set is a set of picture files and corresponding labeling information files; inputting the second data set as a training data set into a first target detection model for training according to preset configuration parameters to obtain a second target detection model; inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files; performing confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set; adjusting the detection results in the fifth data set to obtain a sixth data set, wherein the adjustment comprises adding targets to be detected that the second target detection model failed to detect, modifying targets that it detected inaccurately, and deleting targets that it detected erroneously, and the sixth data set is the pre-labeling result; and inputting the second data set and the sixth data set as training data sets into the second target detection model for training to obtain a third target detection model, wherein the third target detection model detects a seventh data set to obtain an eighth data set, the seventh data set consists of pictures acquired in real time, and the eighth data set is the real-time detection result. Because the pre-labeling result is obtained by adjusting the data set, the target detection model is continuously optimized and the target detection precision is improved;

further, the data are refined by confidence threshold filtering, non-maximum suppression, size filtering, number filtering and similar methods, which further improves the target detection precision;

further, an alarm is issued when the ratio of the number of pictures in the twelfth data set that contain a target to be detected to the number of field pictures acquired in real time in the seventh data set exceeds a fourth preset threshold, so that on-site personnel are warned promptly when too many of the field pictures acquired in real time contain the target to be detected.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the prior art, a brief description of the drawings is provided below, wherein it is apparent that the drawings in the following description are some, but not all, embodiments of the present invention. Other figures may be derived from these figures without inventive effort for a person of ordinary skill in the art.

FIG. 1 is a flow diagram of a target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 2 is a flow diagram of manual labeling in the target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 3 is a flow diagram of model pre-labeling in the target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 4 is a flow diagram of manual adjustment in the target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 5 is a flow diagram of training of the third target detection model in the target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 6 is a flow diagram of specific detection in the target detection method based on pre-labeling according to an embodiment of the invention;

FIG. 7 is a schematic structural diagram of a target detection device based on pre-labeling according to an embodiment of the invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are intended to be within the scope of the present disclosure.

The technical scheme of the invention is described in detail below by specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.

According to the problems existing in the prior art, the embodiment of the invention provides the target detection method and the target detection device based on the pre-marking, which greatly lighten the marking workload of a user and reduce the use cost of the user through the pre-marking.

As shown in fig. 1 to 6, an embodiment of the present invention provides a target detection method based on pre-labeling, including:

step S101: acquiring a first data set, wherein the first data set is a first number of pictures;

step S102: labeling the first data set to obtain a second data set, wherein labeling the first data set comprises labeling a target to be detected in the first number of pictures by using a maximum circumscribed rectangular frame, and the second data set is a set of picture files and corresponding labeling information files;

step S103: inputting the second data set as a training data set into a first target detection model for training according to preset configuration parameters to obtain a second target detection model;

The preset configuration parameters comprise: the initial learning rate of the optimizer, the learning rate decay method and the learning rate update method used during training; the size of the input pictures during training; the image augmentation method used during training; the coefficient of each component of the loss value calculated during training; and the number of epochs, that is, the number of times the training data set is traversed during training.
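As an illustration only, the preset configuration parameters listed above could be collected in a single structure such as the following Python sketch. The field names, default values and the `problems` helper are all assumptions made for this example; the source does not specify any concrete values, names or file format.

```python
from dataclasses import dataclass, field


@dataclass
class TrainConfig:
    """Hypothetical container for the preset configuration parameters."""
    initial_lr: float = 0.01        # initial learning rate of the optimizer
    lr_decay: str = "cosine"        # learning rate decay method (assumed name)
    lr_update: str = "per_epoch"    # learning rate update method (assumed name)
    input_size: tuple = (640, 640)  # size of the input pictures (width, height)
    augmentations: list = field(default_factory=lambda: ["hflip"])
    # coefficient of each component of the loss value (assumed components)
    loss_weights: dict = field(default_factory=lambda: {"box": 1.0, "cls": 1.0})
    epochs: int = 100               # passes over the training data set


def problems(cfg):
    """Return a list of human-readable problems found in the configuration."""
    found = []
    if cfg.initial_lr <= 0:
        found.append("initial_lr must be positive")
    if cfg.epochs < 1:
        found.append("epochs must be at least 1")
    if any(side <= 0 for side in cfg.input_size):
        found.append("input_size must be positive")
    if any(weight < 0 for weight in cfg.loss_weights.values()):
        found.append("loss_weights must be non-negative")
    return found
```

Checking the configuration before training starts is a design choice of this sketch, not something the source describes.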

step S104: inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files;

step S105: performing confidence threshold filtering and non-maximum suppression filtering on the detection result in the fourth data set to obtain a fifth data set;

step S106: adjusting the detection result in the fifth data set to obtain a sixth data set, wherein the adjustment comprises newly adding a target to be detected which is not detected by the second target detection model, modifying the target to be detected which is not accurately detected by the second target detection model, and deleting the target to be detected which is erroneously detected by the second target detection model, and the sixth data set is a pre-labeling result;

step S107: and inputting the second data set and the sixth data set as training data sets into the second target detection model for training to obtain a third target detection model, wherein the third target detection model detects a seventh data set to obtain an eighth data set, the seventh data set is a picture acquired in real time, and the eighth data set is a real-time detection result.

Specifically, the first number of pictures is a small number of pictures, and the second number of pictures is a large number of pictures.

First, a second data set is obtained by labeling a small number of pictures, and is input into the first target detection model as a training data set to preliminarily obtain a second target detection model.

Secondly, the large number of pictures is input into the second target detection model for detection to obtain a fourth data set, confidence threshold filtering and non-maximum suppression filtering are performed on the data in the fourth data set to obtain a fifth data set, and the fifth data set is adjusted to obtain a sixth data set, wherein the adjustment comprises adding targets to be detected that the second target detection model failed to detect, modifying targets that it detected inaccurately, and deleting targets that it detected erroneously.
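The adjustment is carried out by a person, but the way the three kinds of correction (add, modify, delete) act on the filtered detection results of one picture can be sketched in Python as follows. The function name, the dictionary layout of a detection result and the index-based addressing are assumptions made for this example and are not taken from the source.

```python
def apply_adjustments(detections, add=(), modify=None, delete=()):
    """Apply manual corrections to the pre-labeled detections of one picture.

    detections: list of dicts describing detection results (assumed layout)
    add:        detections the model missed, supplied by the annotator
    modify:     {index: {field: new_value}} for inaccurate detections
    delete:     indices of erroneous detections to remove
    """
    modify = modify or {}
    delete = set(delete)
    adjusted = []
    for index, det in enumerate(detections):
        if index in delete:
            continue  # erroneous detection is dropped
        # copy the detection and overwrite only the corrected fields
        adjusted.append({**det, **modify.get(index, {})})
    adjusted.extend(dict(det) for det in add)  # missed targets are appended
    return adjusted
```

The input list is left unchanged, so the unadjusted filtered results remain available for comparison.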

And finally, the second data set and the sixth data set are used as training data sets to be input into a second target detection model for training, and then a third target detection model with more accurate detection is obtained.

In a specific implementation, picture data are input into the first target detection model, a detection result for the target to be detected is obtained through forward inference, and an error is calculated from the detection result and the manual labeling result in the corresponding labeling file; the partial derivatives of the error with respect to the weight parameters in the first target detection model are then calculated, and the corresponding weight parameters are updated with these partial derivatives according to a gradient descent algorithm.

In a specific implementation, the model training updates the weight parameters in the model using forward inference, back-propagation and gradient descent algorithms.

Forward inference refers to the process of inputting the third data set into the second target detection model and obtaining the fourth data set through the successive operation of each module in the second target detection model. Back-propagation refers to calculating the deviation between the fourth data set output by the second target detection model and the manual labeling result, and using this deviation to calculate the partial derivatives with respect to the learnable parameters in each module of the second target detection model. Gradient descent refers to updating the corresponding learnable parameters using these partial derivatives.
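The three stages can be illustrated on a deliberately tiny model. The Python sketch below fits a two-parameter linear function rather than a detection network, and the squared-error loss, learning rate and sample data are assumptions chosen for the example; it only shows how forward inference, the partial derivatives of the error, and the gradient descent update follow one another.

```python
def forward(w, b, x):
    """Forward inference of a toy model with two learnable parameters."""
    return w * x + b


def train(samples, lr=0.05, epochs=200):
    """Fit w and b to (input, label) pairs by per-sample gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, label in samples:
            prediction = forward(w, b, x)  # forward inference
            deviation = prediction - label  # deviation from the manual label
            # partial derivatives of the squared error deviation**2
            grad_w = 2 * deviation * x
            grad_b = 2 * deviation
            # gradient descent: step against the partial derivatives
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


# Labels generated by y = 2x + 1, so training should recover w≈2 and b≈1.
w, b = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

In a real detection network the partial derivatives are obtained by propagating the deviation backwards through every module, which deep learning frameworks automate.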

In a specific implementation, the fourth data set and the eighth data set include a category, probability, location, and size of the detection result.

The category refers to the category to which the detection result belongs, the probability refers to the possibility that the detection result is correct, and the position and the size refer to the position and the size of the frame of the detection result in the picture. After the third data set is input into the second target detection model, a corresponding xml format file is obtained, wherein the xml format file contains the prediction of the target to be detected in the third data set by the second target detection model. The fourth data set is an xml file and a corresponding input picture file set obtained by inputting pictures in the third data set into the second target detection model one by one.
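The source states only that the detection results are stored in an xml format file and does not give its schema. The following Python sketch therefore uses a hypothetical layout (element names `annotation`, `object`, `name`, `probability`, `box` are assumptions) to show how the category, probability, position and size of each detection result might be written and read back with the standard library.

```python
import xml.etree.ElementTree as ET

BOX_KEYS = ("x", "y", "width", "height")  # assumed position and size fields


def detections_to_xml(filename, detections):
    """Serialize the detection results of one picture to an XML string."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    for det in detections:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = det["category"]
        ET.SubElement(obj, "probability").text = f"{det['probability']:.4f}"
        box = ET.SubElement(obj, "box")
        for key in BOX_KEYS:
            ET.SubElement(box, key).text = str(det[key])
    return ET.tostring(root, encoding="unicode")


def xml_to_detections(xml_text):
    """Parse the XML produced by detections_to_xml back into dicts."""
    root = ET.fromstring(xml_text)
    detections = []
    for obj in root.findall("object"):
        det = {
            "category": obj.findtext("name"),
            "probability": float(obj.findtext("probability")),
        }
        box = obj.find("box")
        for key in BOX_KEYS:
            det[key] = int(box.findtext(key))
        detections.append(det)
    return detections
```

Keeping one XML file per picture mirrors the pairing of picture files and detection result information files described for the fourth data set.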

In a specific implementation, the eighth data set is filtered with a confidence threshold to obtain a ninth data set, retaining the results in the eighth data set whose prediction probability is greater than the confidence threshold.
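A minimal Python sketch of this confidence threshold filter is given below. It assumes each detection result is a dictionary with a `probability` key, which is a representation chosen for the example rather than one given in the source.

```python
def filter_by_confidence(detections, threshold):
    """Keep detections whose predicted probability is strictly greater
    than the confidence threshold."""
    return [det for det in detections if det["probability"] > threshold]
```

For instance, with a threshold of 0.5 a result with probability 0.5 is discarded, because the text retains only results greater than the threshold.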

In a specific implementation, the ninth data set is filtered by non-maximum suppression to obtain a tenth data set: the degree of overlap between detection results in the ninth data set is calculated from their positions and sizes, and among multiple detection results whose degree of overlap is greater than a first preset threshold, only the one with the highest probability is retained.
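The following Python sketch shows one common way to realize this step, greedy non-maximum suppression. Several points are assumptions not stated in the source: the degree of overlap is measured as intersection over union, `x` and `y` denote the top-left corner of the frame, and suppression is applied across all categories (many detectors apply it per category instead).

```python
def iou(a, b):
    """Intersection over union of two frames given as x, y, width, height."""
    inter_w = min(a["x"] + a["width"], b["x"] + b["width"]) - max(a["x"], b["x"])
    inter_h = min(a["y"] + a["height"], b["y"] + b["height"]) - max(a["y"], b["y"])
    inter = max(0, inter_w) * max(0, inter_h)
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(detections, overlap_threshold):
    """Among detections overlapping by more than the threshold, keep only
    the one with the highest probability (greedy variant)."""
    ordered = sorted(detections, key=lambda d: d["probability"], reverse=True)
    kept = []
    for det in ordered:
        # keep det only if it does not overlap too much with a stronger one
        if all(iou(det, other) <= overlap_threshold for other in kept):
            kept.append(det)
    return kept
```

Sorting by probability first guarantees that whenever two frames overlap beyond the threshold, the more probable one has already been kept.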

In a specific implementation, the tenth data set is filtered by size to obtain an eleventh data set, filtering out the detection results in the tenth data set whose size is greater than or less than a second preset threshold.

In a specific implementation, the eleventh data set is filtered by number to obtain a twelfth data set, filtering out the categories whose number of detection results in the eleventh data set is smaller than a third preset threshold.
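The size filter and the number filter can be sketched together in Python. The source does not say how size is measured or whether the second preset threshold is one bound or two, so this example assumes that size means frame area and that a lower and an upper bound are both supplied; the dictionary keys and the category names in the usage comment are likewise invented for illustration.

```python
from collections import Counter


def filter_by_size(detections, min_area, max_area):
    """Drop detections whose frame area falls outside [min_area, max_area]."""
    return [
        det for det in detections
        if min_area <= det["width"] * det["height"] <= max_area
    ]


def filter_by_count(detections, min_count):
    """Drop every category that has fewer than min_count detections."""
    counts = Counter(det["category"] for det in detections)
    return [det for det in detections if counts[det["category"]] >= min_count]


# Usage with made-up categories: size filtering runs first, then counting.
sample = [
    {"category": "helmet", "width": 10, "height": 10},
    {"category": "helmet", "width": 20, "height": 20},
    {"category": "vest", "width": 2, "height": 2},
    {"category": "vest", "width": 30, "height": 30},
]
sized = filter_by_size(sample, min_area=50, max_area=1000)
counted = filter_by_count(sized, min_count=2)
```

Running the number filter after the size filter means a category is judged on the detections that survived the earlier stages, matching the order of the ninth to twelfth data sets.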

In a specific implementation, an alarm is issued when the ratio of the number of pictures in the twelfth data set that contain a target to be detected to the number of field pictures acquired in real time in the seventh data set exceeds a fourth preset threshold.
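The alarm condition reduces to a ratio test, sketched here in Python. The mapping from picture name to its remaining detection results is an assumed representation, and the decision to return `False` for an empty batch is a choice made for this example.

```python
def should_alarm(detections_per_picture, ratio_threshold):
    """Return True when the share of pictures that still contain at least
    one detected target exceeds the ratio threshold.

    detections_per_picture: {picture_name: list of detections left after
    all filtering}; its length stands for the number of field pictures.
    """
    total = len(detections_per_picture)
    if total == 0:
        return False  # nothing acquired, so nothing to report
    with_target = sum(1 for dets in detections_per_picture.values() if dets)
    return with_target / total > ratio_threshold
```

Because the text speaks of the ratio exceeding the threshold, a ratio exactly equal to the threshold does not trigger the alarm in this sketch.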

FIG. 7 is a schematic structural diagram of a target detection device based on pre-labeling according to an embodiment of the present invention. As shown in FIG. 7, an embodiment of the present invention provides a target detection device based on pre-labeling, including:

a data set acquisition module 71 for acquiring a first data set, the first data set being a first number of pictures;

a dataset labeling module 72, configured to label the first dataset to obtain a second dataset, where labeling the first dataset includes labeling a target to be detected in the first number of pictures with a maximum circumscribed rectangle, and the second dataset is a set of picture files and corresponding labeling information files;

the model pre-training module 73 is configured to input the second data set as a training data set into the first target detection model for training according to a preset configuration parameter to obtain a second target detection model;

the model pre-labeling module 74 is configured to input a third data set into the second target detection model for detection to obtain a fourth data set, where the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set includes a set of picture files and corresponding detection result information files, and to perform confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set;

a data set adjustment module 75, configured to adjust the detection results in the fifth data set to obtain a sixth data set, where the adjustment includes adding targets to be detected that the second target detection model failed to detect, modifying targets to be detected that the second target detection model detected inaccurately, and deleting targets that the second target detection model detected erroneously, and the sixth data set is a pre-labeling result;

a model retraining module 76 for inputting the second data set and the sixth data set as training data sets into the second object detection model for retraining to obtain a third object detection model, wherein the third object detection model detects a seventh data set to obtain an eighth data set, the seventh data set is a picture acquired in real time, and the eighth data set is a real-time detection result.

In summary, the target detection method based on pre-labeling disclosed in the present application includes: acquiring a first data set, wherein the first data set is a first number of pictures; labeling the first data set to obtain a second data set, wherein labeling the first data set comprises labeling the targets to be detected in the first number of pictures with maximum circumscribed rectangular frames, and the second data set is a set of picture files and corresponding labeling information files; inputting the second data set as a training data set into a first target detection model for training according to preset configuration parameters to obtain a second target detection model; inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files; performing confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set; adjusting the detection results in the fifth data set to obtain a sixth data set, wherein the adjustment comprises adding targets to be detected that the second target detection model failed to detect, modifying targets that it detected inaccurately, and deleting targets that it detected erroneously, and the sixth data set is the pre-labeling result; and inputting the second data set and the sixth data set as training data sets into the second target detection model for training to obtain a third target detection model, wherein the third target detection model detects a seventh data set to obtain an eighth data set, the seventh data set consists of pictures acquired in real time, and the eighth data set is the real-time detection result. Because the pre-labeling result is obtained by adjusting the data set, the target detection model is continuously optimized and the target detection precision is improved.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. The target detection method based on the pre-marking is characterized by comprising the following steps of:

2. The target detection method based on pre-labeling according to claim 1, wherein picture data are input into the first target detection model, a detection result for the target to be detected is obtained through forward inference, and an error is calculated from the detection result and the manual labeling result in the corresponding labeling file; the partial derivatives of the error with respect to the weight parameters in the first target detection model are calculated, and the corresponding weight parameters are updated with these partial derivatives according to a gradient descent algorithm.

3. The target detection method based on pre-labeling according to claim 2, wherein the model training updates the weight parameters in the model using forward inference, back-propagation and gradient descent algorithms.

4. The pre-labeled based object detection method of claim 1 wherein the fourth dataset and the eighth dataset comprise categories, probabilities, locations, and sizes of detection results.

5. The pre-label based target detection method of claim 4, wherein filtering the eighth dataset with a confidence threshold results in a ninth dataset, retaining results in the eighth dataset for which the prediction probability is greater than the confidence threshold.

6. The target detection method based on pre-labeling according to claim 5, wherein the ninth data set is filtered by non-maximum suppression to obtain a tenth data set, the degree of overlap between detection results in the ninth data set is calculated from their positions and sizes, and among multiple detection results whose degree of overlap is greater than a first preset threshold, only the one with the highest probability is retained.

7. The target detection method based on pre-labeling according to claim 6, wherein the tenth data set is filtered by size to obtain an eleventh data set, and the detection results in the tenth data set whose size is greater than or less than a second preset threshold are filtered out.

8. The target detection method based on pre-labeling according to claim 7, wherein the eleventh data set is filtered by number to obtain a twelfth data set, and the categories whose number of detection results in the eleventh data set is smaller than a third preset threshold are filtered out.

9. The pre-labeling-based object detection method according to claim 1, wherein an alarm is sent when a ratio of the number of pictures in the twelfth dataset containing the object to be detected to the number of field pictures acquired in real time by the seventh dataset exceeds a fourth preset threshold.

10. A pre-labeled-based object detection device, comprising:

the model pre-labeling module is used for inputting a third data set into the second target detection model for detection to obtain a fourth data set, wherein the third data set is a second number of pictures with the same distribution as the first data set, and the fourth data set comprises a set of picture files and corresponding detection result information files, and for performing confidence threshold filtering and non-maximum suppression filtering on the detection results in the fourth data set to obtain a fifth data set;