CN118172619A

CN118172619A - Difficult sample mining method, device, computer equipment and storage medium

Info

Publication number: CN118172619A
Application number: CN202211579434.2A
Authority: CN
Inventors: 鲁帅; 郑加希; 李智; 周浩
Original assignee: Beijing Wanji Technology Co Ltd
Current assignee: Beijing Wanji Technology Co Ltd
Priority date: 2022-12-08
Filing date: 2022-12-08
Publication date: 2024-06-11

Abstract

The application relates to a difficult sample mining method, a device, computer equipment, a storage medium and a computer program product. The method comprises: training a target detection model on an initial image training sample set with labeling results; using the trained target detection model to detect target image samples from the image sample set to be mined, obtaining a detection result for each target image sample; determining the difficult samples mined in each round according to the detection results, and updating a difficult sample set with them; and updating the initial image training sample set with the difficult samples, and repeating the training of the target detection model and the mining of difficult samples until the updated difficult sample set meets a preset condition. Because mining difficult samples with the target detection model and retraining the model on those samples are repeated continuously, difficult samples can be mined from a massive image sample set, and because the model is continuously optimized, the quality of the mined difficult samples can be improved.

Description

Difficult sample mining method, device, computer equipment and storage medium

Technical Field

The present application relates to the field of image processing technology, and in particular, to a difficult sample mining method, apparatus, computer device, storage medium, and computer program product.

Background

With the development of science and technology, the cost of acquiring image data keeps falling, so massive amounts of unlabeled image data are easy to obtain; at the same time, this poses great challenges for manual screening and manual labeling of image samples. As an important component of the roadside system, the camera plays an irreplaceable role in the perception, tracking and event detection of participants in an intelligent traffic system. The image information acquired by the camera is the most upstream input data, and its quality directly affects the performance of downstream algorithms. For example, when an image detection model is trained with the acquired image information as samples, manually screening the samples is difficult, and there is no guarantee that the samples cover all actual scenes. As a result, the image detection model learns poorly for scenes that lack corresponding samples during training, and its detection performance in those scenes is greatly reduced.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a difficult sample mining method, apparatus, computer device, computer readable storage medium, and computer program product that are capable of being quickly and efficiently executed.

In a first aspect, the present application provides a method of difficult sample mining. The method comprises the following steps:

Acquiring an initial image training sample set with a labeling result, and training a target detection model according to the initial image training sample set;

Selecting a first preset number of target image samples from an image sample set to be mined, inputting the target image samples into a trained target detection model to obtain a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection category;

Determining difficult samples in a first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset;

and acquiring the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition.

In one embodiment, determining a difficult sample in a first preset number of target image samples according to a prediction probability that the image content in the detection frame belongs to each detection category includes:

Aiming at each detection frame in the target image sample, determining the difficulty of the detection frame according to the target prediction probability of the image content in the detection frame belonging to the target detection category and the maximum value in the prediction probability of the image content in the detection frame belonging to each detection category, wherein the target detection category is determined based on the application scene of the target detection model, and the difficulty is used for indicating the detection difficulty degree of the target detection model on the detection object;

determining the difficulty of the target image sample according to the difficulty of each detection frame in the target image sample;

And determining the difficult samples in the first preset number of target image samples according to the difficulty degree of all the target image samples.

In one embodiment, determining the difficulty of the detection frame according to the maximum value of the target prediction probability of the image content in the detection frame belonging to the target detection category and the prediction probability of the image content in the detection frame belonging to each detection category includes:

the maximum value is differenced with the target prediction probability of the image content in the detection frame belonging to the target detection category, so that a probability difference value is obtained;

and taking the ratio between the probability difference value and the target prediction probability of the image content belonging to the target detection category in the detection frame as the difficulty of the detection frame.

In one embodiment, determining the difficulty level of the target image sample according to the difficulty level of all the detection frames in the target image sample includes:

And taking the minimum value of the difficulties of all the detection frames in the target image sample as the difficulty of the target image sample.

In one embodiment, determining a difficult sample of the first preset number of target image samples according to the difficulty level of all the target image samples comprises:

sorting all target image samples from small to large according to the difficulty of all target image samples to obtain sorted all target image samples;

and selecting a second preset number of target image samples from front to back from all the sequenced target image samples, and taking the second preset number of target image samples as difficult samples.

In one embodiment, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result comprises:

Taking the union of the initial image training sample set and the difficult sample subset with its corresponding manual labeling results as the new initial image training sample set.

In one embodiment, the preset condition includes that the number of difficult samples in the set of difficult samples is greater than a third preset number.

In one embodiment, the preset condition includes that a difference between an evaluation value of the target detection model after the current training and an evaluation value of the target detection model after the previous training is smaller than a preset threshold, where the evaluation value is used to indicate the target detection effect of the target detection model on the target category.
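The two preset conditions described in the embodiments above can be sketched as simple predicates. This is a minimal illustration only: the function and parameter names are hypothetical, and the use of the absolute difference between evaluation values is an assumption, since the application only speaks of a "difference".

```python
def enough_difficult_samples(difficult_set, third_preset_number):
    """True when the difficult sample set holds more than the third preset number."""
    return len(difficult_set) > third_preset_number


def evaluation_converged(current_eval, previous_eval, preset_threshold):
    """True when the evaluation value changed by less than the preset threshold.

    Assumption: the absolute difference is compared with the threshold.
    """
    return abs(current_eval - previous_eval) < preset_threshold
```

Either predicate could serve as the stopping test of the mining loop, depending on whether the application scene cares about the number of mined samples or about the model having stopped improving.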

In a second aspect, the present application also provides a difficult sample mining device. The device comprises:

The data acquisition module is used for acquiring an initial image training sample set with a labeling result, and training the target detection model according to the initial image training sample set;

The class prediction module is used for selecting a first preset number of target image samples from the image sample set to be mined, inputting the target image samples into the trained target detection model, and obtaining a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection class;

The sample determining module is used for determining difficult samples in a first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset;

The iteration training module is used for acquiring the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition.

In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, performs the steps of the difficult sample mining method described above.

In a fourth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the difficult sample mining method described above.

In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the difficult sample mining method described above.

The difficult sample mining method, the difficult sample mining device, the computer equipment, the storage medium and the computer program product acquire an initial image training sample set with a labeling result, and train a target detection model according to the initial image training sample set; selecting a first preset number of target image samples from an image sample set to be mined, inputting the target image samples into a trained target detection model to obtain a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection category; determining difficult samples in a first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset; and acquiring the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition. The process of mining the difficult sample by the target detection model and training the target model according to the difficult sample is repeated continuously, so that the mining of the difficult sample in the massive image sample set is realized, and the quality of the mined difficult sample can be improved due to the fact that the training model is optimized continuously.

Drawings

FIG. 1 is an application environment diagram of a difficult sample mining method in one embodiment;

FIG. 2 is a flow diagram of a method of difficult sample mining in one embodiment;

FIG. 3 is a flow chart of a method of mining a difficult sample according to another embodiment;

FIG. 4 is a flow chart of a method of mining a difficult sample according to yet another embodiment;

FIG. 5 is a flow diagram of a method of mining difficult image samples in one embodiment;

FIG. 6 is a block diagram of a difficult sample mining apparatus in one embodiment;

FIG. 7 is an internal structural diagram of a computer device in one embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The difficult sample mining method provided by the embodiments of the present application can be used to mine difficult samples from an image set acquired by a camera in real time, and can also be used to process a large number of images acquired by a camera as the sample set. In one embodiment, the method may be applied to the application environment shown in fig. 1. The terminal 102 is a terminal for acquiring images; it communicates with the server 104 through a network and sends the acquired images to the server 104, which uses them as image samples and completes the mining of difficult samples. A data storage system may store the data that the server 104 needs to process. The data storage system may be integrated on the server 104, or may be located on a cloud or other network server.

The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.

In one embodiment, as shown in fig. 2, a difficult sample mining method is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:

step 202, acquiring an initial image training sample set with a labeling result, and training a target detection model according to the initial image training sample set;

The target detection model is a model capable of detecting objects in an image; target detection determines the position and the category of each object in the image. Examples include the YOLO model, Fast R-CNN and SSD. For ease of understanding, the present application takes the Yolov model as an example, and uses the Yolov network structure and network parameters to process the image.

The initial image training sample set comprises image samples with labeling results, wherein the labeling results can be obtained by labeling the image samples and can also be manual labeling results. It should be noted that the initial image training sample set is used only for the initial training of the target detection model, so the number of samples in it is limited. Keeping this set small avoids the problem of low-quality samples in it strongly affecting the subsequent optimization of the target detection model.

In addition, a small number of image samples with labeling results can be obtained as a verification set to verify the initially trained target detection model. If the error is large and verification fails, an initial image training sample set can be obtained again to retrain the target detection model.

It should be noted that model training in the present application refers to training the parameters of the target detection model, such as its network weights. The hyperparameters of the target detection model may be adjusted according to its application scenario before the first training. For example, in one embodiment, the learning rate of the target detection model may be set to 0.0001 and the batch size to 16.

Step 204, selecting a first preset number of target image samples from the image sample set to be mined, inputting the target image samples into a trained target detection model to obtain a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection category;

The image sample set to be mined comprises a large number of image samples without labeling results, and the first preset number is the number of image samples taken each time the image sample set to be mined is mined in batches. It will be appreciated that the first preset number is much smaller than the number of samples in the image sample set to be mined. For example, the image sample set to be mined comprises 1000 image samples without labeling results and the first preset number is set to 50; each time difficult sample mining is performed, 50 samples are randomly selected from the 1000 unlabeled image samples as the target image samples. It should be noted that, when obtaining the target image samples, all samples in the image sample set to be mined may instead be numbered, and a first preset number of samples taken in sequence each time as the target image samples, so that random selection does not happen to pick the same samples again.
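The two ways of obtaining a batch of target image samples described above can be contrasted in a short sketch. The function names and the use of integer sample numbers are assumptions made for illustration; the numbers 1000 and 50 are those of the example in the text.

```python
import random


def sequential_batches(sample_ids, first_preset_number):
    """Yield consecutive batches of numbered samples; no sample is selected twice."""
    for start in range(0, len(sample_ids), first_preset_number):
        yield sample_ids[start:start + first_preset_number]


def random_batch(sample_ids, first_preset_number, rng=random):
    """Draw one batch at random; separate calls may return overlapping batches."""
    return rng.sample(sample_ids, first_preset_number)


sample_ids = list(range(1000))  # 1000 unlabeled samples, numbered 0..999
batches = list(sequential_batches(sample_ids, 50))
```

With sequential numbering the 1000 samples split into 20 disjoint batches, whereas repeated calls to `random_batch` give no such guarantee.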

After each target image sample is input into the target detection model, a detection result corresponding to that sample is obtained. The detection result comprises information on the detection frame of each identified object, such as the classification result and classification probability of the object in the detection frame, and the position and size of the detection frame. It will be appreciated that each target detection model can identify a number of object categories, i.e., detection categories; for example, one target detection model may identify 50 categories of objects. When the target detection model detects a target, it predicts, for the object in each detection frame, the probability that the object belongs to each detection category, and outputs the category with the highest probability.

For example, in one embodiment, the object detection model has 4 detection categories, A, B, C and D respectively, and after one of the object image samples is input to the object detection model, a prediction probability [0.2, 0.3, 0.4, 0.1] for each category is obtained, where the probability of the corresponding prediction being C category is the largest.
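As a small illustration of this example, the category that the model outputs is simply the one with the largest prediction probability. The variable names below are hypothetical; the probabilities are those given in the text.

```python
detection_categories = ["A", "B", "C", "D"]
predicted_probabilities = [0.2, 0.3, 0.4, 0.1]

# Index of the largest prediction probability.
best_index = max(range(len(predicted_probabilities)),
                 key=predicted_probabilities.__getitem__)
predicted_category = detection_categories[best_index]
max_probability = predicted_probabilities[best_index]
```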

Step 206, determining a difficult sample in a first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset;

A difficult sample is a sample whose prediction deviates substantially from its ground-truth label. In difficult sample mining, a detection category is preset as the target category according to the definition of the difficult sample, and whether a sample is a difficult sample for the target detection category can then be determined from the category corresponding to the maximum prediction probability that is finally output. Specifically, for each detection frame in the target image sample, the prediction probabilities of all detection categories can be compared with the maximum prediction probability to calculate a difficulty parameter for that detection frame. The smaller the difficulty parameter, the harder the detection frame is for the model to detect, and whether the detection frame corresponds to a difficult sample is determined according to the difficulty parameter.

It should be noted that, after the difficulty parameter of each target image sample is obtained, the difficult samples among the first preset number of target image samples may be determined by screening the difficulty parameters against a preset threshold. Alternatively, the number of difficult samples to be mined each time may be set, and that number of difficult samples selected from the first preset number of target image samples. After the difficult samples among the first preset number of target image samples are determined, a difficult sample set is constructed, and all the mined difficult samples are stored in the difficult sample set.

Step 208, obtaining the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition.

After the excavated difficult samples are determined, more accurate manual labeling results of each difficult sample are obtained in a manual labeling mode, so that the difficult samples and the labeling results corresponding to the difficult samples can be obtained, the initial image training sample set is updated through the excavated difficult samples to obtain a new image training sample set, and training of the target detection model and the difficult sample excavation process are repeated until preset conditions are met, so that excavation is stopped.

The preset conditions are set according to the requirements of difficult samples required to be mined in a specific application scene, such as the number of the difficult samples, the quality of the difficult samples, and the like. In one embodiment, the preset condition includes that the number of difficult samples in the set of difficult samples is greater than a third preset number.

In the method provided by the embodiment, an initial image training sample set with a labeling result is obtained, and a target detection model is trained according to the initial image training sample set; selecting a first preset number of target image samples from an image sample set to be mined, inputting the target image samples into a trained target detection model to obtain a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection category; determining difficult samples in a first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset; and acquiring the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition. The process of mining the difficult sample by the target detection model and training the target model according to the difficult sample is repeated continuously, so that the mining of the difficult sample in the massive image sample set is realized, and the quality of the mined difficult sample can be improved due to the fact that the training model is optimized continuously.
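Steps 202 to 208 can be summarized as one loop. The sketch below is only an outline under stated assumptions: `train_fn`, `detect_fn`, `select_fn`, `label_fn` and `stop_fn` are hypothetical callables supplied by the caller (the application does not name such functions), batches are taken in sequence from the pool, and the loop also ends when the pool is exhausted, which is an added safeguard not stated in the text.

```python
def mine_difficult_samples(train_set, unlabeled_pool, first_preset_number,
                           train_fn, detect_fn, select_fn, label_fn, stop_fn):
    """Alternate training and mining until stop_fn(difficult_set) is true.

    train_set:      list of (sample, label) pairs with labeling results
    unlabeled_pool: list of unlabeled samples still to be mined
    """
    train_set = list(train_set)
    pool = list(unlabeled_pool)
    difficult_set = []
    while True:
        model = train_fn(train_set)                                # step 202
        batch = pool[:first_preset_number]
        pool = pool[first_preset_number:]
        detections = [(s, detect_fn(model, s)) for s in batch]     # step 204
        subset = select_fn(detections)                             # step 206
        difficult_set.extend(subset)
        train_set.extend((s, label_fn(s)) for s in subset)         # step 208
        # Assumption: also stop when no unlabeled samples remain.
        if stop_fn(difficult_set) or not pool:
            return model, difficult_set, train_set
```

Passing the steps in as callables keeps the outline independent of any particular detection framework; `label_fn` stands in for the manual labeling step, which in practice involves a human annotator.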

In one embodiment, referring to fig. 3, determining a difficult sample in a first preset number of target image samples according to a prediction probability of the image content in the detection frame belonging to each detection category includes:

Step 302, determining, for each detection frame in the target image sample, the difficulty level of the detection frame according to the target prediction probability that the image content in the detection frame belongs to the target detection category and the maximum value in the prediction probability that the image content in the detection frame belongs to each detection category, wherein the target detection category is determined based on the application scene of the target detection model, and the difficulty level is used for indicating the detection difficulty level of the target detection model on the detection object;

step 304, determining the difficulty of the target image sample according to the difficulty of each detection frame in the target image sample;

Step 306, determining a difficult sample in the first preset number of target image samples according to the difficulty level of all the target image samples.

After the target image sample is input into the target detection model, for any detection frame detected in the target image sample, a probability that the image content in the detection frame is predicted to be each detection category is obtained, wherein the probability comprises the probability of the target detection category. And then outputting the maximum value in all probabilities and the corresponding detection category by the target detection model, and judging the detection difficulty of the target detection model on the detection frame according to the relation between the target detection probability and the maximum probability. According to the detection difficulty of all the detection frames in the target image sample, the detection difficulty of the target image sample can be determined, and whether the target image sample is a difficult sample or not is judged.

For example, the target detection model has 4 detection categories, A, B, C and D. After one of the target image samples is input to the target detection model, the prediction probabilities [0.2, 0.3, 0.4, 0.1] over the detection categories are obtained for one of the detection frames. The prediction probability of category C is the largest, while the target detection category is category B, so the detection difficulty of the detection frame can be determined by comparing 0.4 with 0.3. For convenience of computer processing, a difficulty value may be calculated from the two probability values and used to represent the degree of detection difficulty. In a specific embodiment, the difference between the two probabilities may be used directly as the difficulty, or the ratio of the two probabilities may be used as the difficulty, which is not particularly limited herein.

It will be appreciated that the detection difficulty level of the target image sample should be closely related to the detection difficulty level of all the detection frames, and thus, in one embodiment, after the detection difficulty level of each detection frame is calculated, the detection difficulty level of the target image sample may be determined by selecting the minimum value, the maximum value or the average value.
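The three aggregation options mentioned here (minimum, maximum or average) can be sketched as follows. The function name, the `how` parameter and the error raised for an image without detection frames are assumptions made for illustration.

```python
from statistics import mean

AGGREGATORS = {"min": min, "max": max, "mean": mean}


def image_difficulty(box_difficulties, how="min"):
    """Combine per-frame difficulty values into one value for the image."""
    if not box_difficulties:
        # Assumption: an image without detection frames is treated as an error.
        raise ValueError("image has no detection frames")
    return AGGREGATORS[how](box_difficulties)
```

For instance, `image_difficulty([0.4, 1.0, 0.0])` uses the default minimum, while `how="max"` or `how="mean"` selects the other two options.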

It should be noted that, when difficult samples in the image sample set to be mined are mined, the number of target detection categories is not limited: there may be only one target detection category or several. The difficult sample mining process is the same for every target detection category, the processes for different target detection categories do not interfere with one another, and a difficult sample set is finally obtained for each target detection category. Therefore, this embodiment explains the mining process of difficult samples for one target detection category.

In the method provided by this embodiment, each detection frame of each target image sample is analyzed individually, and the difficulty is calculated level by level, from the detection frame to the target image sample, before the difficult samples are determined. The logic is clear, all of the first preset number of target image samples can be analyzed accurately, and the accuracy of difficult sample mining is improved.

In one embodiment, the difficulty level calculation formula for each detection box is as follows:

$$\mathrm{Score}_a = \frac{\max_i p_a^i - p_a^l}{p_a^l}$$

Wherein $\mathrm{Score}_a$ is the difficulty of the a-th detection frame; $p_a^i$ is the prediction probability that the a-th detection box is identified as the i-th category; and $p_a^l$ is the prediction probability that the a-th detection box is predicted as the target class l.

In the difficulty calculation, a difficulty determined by the difference alone does not reflect the actual detection difficulty. For example, in the detection of a target image sample, one detection frame has a maximum probability of 0.7 and a target detection category probability of 0.5, while another detection frame has a maximum probability of 0.4 and a target detection category probability of 0.2. If only the probability difference is used as the difficulty, both detection frames have a difficulty of 0.2, which would mean that they are equally difficult to detect; in practice, however, the first detection frame is the more difficult one. Therefore, the difficulty is defined as the difference divided by the probability of the target detection category (0.2/0.5 = 0.4 for the first detection frame and 0.2/0.2 = 1.0 for the second), so that the calculated difficulty reflects the real detection situation more effectively.
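The ratio-based difficulty can be written as a short function and applied to the two detection frames of the example. This is a minimal sketch: the function name is hypothetical, each frame is represented only by its maximum and target probabilities (the remaining class probabilities are omitted because they do not affect the result), and rejecting a non-positive target probability is an added assumption to avoid division by zero.

```python
def box_difficulty(class_probabilities, target_index):
    """(max probability - target probability) / target probability for one frame."""
    target_probability = class_probabilities[target_index]
    if target_probability <= 0:
        # Assumption: a non-positive target probability is rejected.
        raise ValueError("target probability must be positive")
    return (max(class_probabilities) - target_probability) / target_probability


# Each frame is written as [maximum probability, target probability].
first_frame = box_difficulty([0.7, 0.5], target_index=1)
second_frame = box_difficulty([0.4, 0.2], target_index=1)
```

The first frame evaluates to roughly 0.4 and the second to roughly 1.0, so the ratio separates two frames that the plain difference (0.2 in both cases) cannot distinguish.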

The smaller the difficulty is, the greater the difficulty of the target detection model to detect the detection frame is, so that when the difficulty of the target image sample where the detection frame is located is determined, the minimum value in the difficulty of all the detection frames is used as the difficulty of the target image sample, and the detection difficulty of the target detection model to the target image sample can be better reflected.

In one embodiment, referring to fig. 4, determining a difficult sample of the first preset number of target image samples according to the difficulty level of all target image samples includes:

Step 402, sorting all target image samples in ascending order of difficulty to obtain the sorted target image samples;

Step 404, selecting, from front to back, a second preset number of target image samples from the sorted target image samples as the difficult samples.

The second preset number is set according to the first preset number and determines how many difficult samples are selected from the first preset number of target image samples; it is therefore smaller than the first preset number. For example, if the first preset number is 50, the second preset number may be set to 10, that is, in each round of difficult sample mining, 10 of the 50 target image samples are selected as difficult samples. It should be noted that the second preset number affects both the quality of the mined difficult samples and the quality of the trained target detection model, so it needs to be set reasonably according to the first preset number.

Because a smaller difficulty value means a greater detection difficulty, the first preset number of target image samples are sorted by difficulty, and the second preset number of difficult samples are selected from the low-value end, that is, starting from the minimum difficulty value and proceeding from front to back.
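Steps 402 and 404 amount to an ascending sort followed by taking the first entries. A minimal sketch, assuming the per-image difficulty values are already available in a dictionary keyed by a hypothetical image identifier; the function name and the validation of the second preset number are illustrative choices, not taken from the source.

```python
from typing import Dict, List


def select_difficult_samples(
    difficulty_by_image: Dict[str, float], second_preset_number: int
) -> List[str]:
    """Return the images with the smallest difficulty values (hardest first)."""
    first_preset_number = len(difficulty_by_image)
    # The text requires the second preset number to be smaller than the first.
    if not 0 < second_preset_number < first_preset_number:
        raise ValueError("second preset number must be in (0, first preset number)")
    # Step 402: sort image identifiers in ascending order of difficulty.
    ranked = sorted(difficulty_by_image, key=difficulty_by_image.get)
    # Step 404: take the leading entries, front to back.
    return ranked[:second_preset_number]


batch = {"img_a": 1.0, "img_b": 0.4, "img_c": 0.05, "img_d": 2.3}
print(select_difficult_samples(batch, 2))  # ['img_c', 'img_b']
```

Python's `sorted` is stable, so images with equal difficulty keep their original relative order; the source does not say how ties should be broken.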

In the method provided by this embodiment, difficult samples are selected from each batch of target image samples according to their difficulty, which ensures the quality of the mined difficult samples and the accuracy of the target detection model trained on them.

Because the initial image training sample set contains few samples, and the accuracy of the labeling results of those samples is low, the difficult samples mined in each round can be used as input for the next training of the target detection model, which improves the effect of the next round of difficult sample mining. Specifically:

$$Set_{train}^{j+1} = Set_{train}^{j} \cup Set_{mining}^{j}$$

where $Set_{train}^{j}$ is the training sample set used for the j-th training of the target detection model, and $Set_{mining}^{j}$ is the subset of difficult samples mined by the target detection model after the j-th training.

The evaluation value of the target detection model is a score obtained by evaluating the model's mining effect against the manual labeling results of the mined difficult samples; it represents the model's ability to recognize difficult samples. In a specific scenario, that is, once the target detection category has been determined, the mining effect (evaluation value) of the currently trained target detection model on difficult samples can be obtained after each round of difficult sample mining, and the evaluation values from multiple rounds are used to decide whether the target detection model needs further training to improve the quality of the mined difficult samples.

Specifically, in one embodiment, the difference between the evaluation values of the target detection model after two consecutive trainings is compared with a preset threshold to determine whether the training effect has improved significantly, and thus whether to continue the training-mining iteration. For example, if the difference between the evaluation values after the two consecutive trainings is greater than the preset threshold, the process of mining difficult samples with the target detection model and further training the model on the mined difficult samples is repeated; if the difference is not greater than the preset threshold, the mining of difficult samples and the training of the target detection model are stopped. It should be noted that the preset-number condition and the evaluation-value condition may be used together or separately, which is not specifically limited in the present application.
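The two stopping conditions can be sketched as a single check. The sketch rests on assumptions the text leaves open: the difference is taken as the signed improvement (current minus previous evaluation value), either condition alone is enough to stop when both are configured, and the function and parameter names are illustrative.

```python
from typing import Optional, Sequence


def should_stop(
    num_difficult: int,
    eval_history: Sequence[float],
    third_preset_number: Optional[int] = None,
    preset_threshold: Optional[float] = None,
) -> bool:
    """Decide whether the training-mining iteration should end."""
    # Condition 1: enough difficult samples have been collected.
    if third_preset_number is not None and num_difficult > third_preset_number:
        return True
    # Condition 2: the evaluation value no longer improves by more than the
    # threshold between two consecutive trainings (assumed signed difference).
    if preset_threshold is not None and len(eval_history) >= 2:
        improvement = eval_history[-1] - eval_history[-2]
        if improvement <= preset_threshold:
            return True
    return False


# Evaluation value rose from 0.60 to 0.70: keep iterating.
print(should_stop(50, [0.60, 0.70], third_preset_number=100, preset_threshold=0.01))
# Evaluation value rose only from 0.700 to 0.705: stop.
print(should_stop(50, [0.700, 0.705], preset_threshold=0.01))
```

Passing only one of `third_preset_number` or `preset_threshold` corresponds to using that condition alone.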

In one embodiment, a method for mining difficult image samples is provided; referring to FIG. 5, it comprises the following steps:

(1) Initial detection model and data acquisition

The structure and network parameters of the existing image detection model yolov are acquired; the small initial amount of labeled data is divided into a training set $Set_{train}$ and a validation set Set_val; the parameters of the image detection model are fine-tuned on the initial training set $Set_{train}$; and the massive unlabeled data is organized into an unlabeled pool Set_unlab.

(2) Optimization of image detection models

The training hyperparameters are modified: the learning rate is 0.0001 and the batch size is 16. When the model is optimized for the first time, the input data is the initial training set $Set_{train}$; in subsequent optimizations, the difficult sample data $Set_{mining}^{j}$ obtained in the j-th round of mining is merged with the training set data $Set_{train}^{j}$ to obtain a new training set, which is used as the input data of the model.

(3) Mining of difficult samples

The unlabeled pool Set_unlab is input to the optimized model to obtain a detection result for each image. The detection result of one image contains several detection frames, and each detection frame comprises position information (boxes), a confidence (conf), and class probabilities (class).
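The shape of one image's detection result can be pictured with two small data classes. This is only an illustrative layout: the type names, the (x1, y1, x2, y2) box convention, and the use of `class_probs` in place of the reserved word `class` are assumptions, not details given in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DetectionFrame:
    box: Tuple[float, float, float, float]  # position; assumed (x1, y1, x2, y2)
    conf: float                             # confidence of the detection frame
    class_probs: Dict[str, float]           # probability for each detection class


@dataclass
class ImageDetections:
    image_id: str                           # hypothetical identifier in Set_unlab
    frames: List[DetectionFrame] = field(default_factory=list)


result = ImageDetections(
    image_id="unlab_000001",
    frames=[
        DetectionFrame(box=(10.0, 20.0, 110.0, 220.0), conf=0.83,
                       class_probs={"car": 0.5, "truck": 0.7}),
    ],
)
print(len(result.frames), result.frames[0].class_probs["car"])
```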

The difficulty of each detection frame is calculated as follows:

$$Score_a = \frac{\max_i\left(p_i^a\right) - p_l^a}{p_l^a}$$

where $p_i^a$ represents the probability that the corresponding detection frame is predicted as a given class i, $p_l^a$ represents the prediction probability of the class of interest l, and $Score_a$ represents the difficulty of a detection frame in the input image; the smaller the value of $Score_a$, the harder the detection frame is for the model. The difficulty of each image sample and the construction logic of the new training set are as follows:

$$Uncertainty_i = \min_{a \in \{1, \ldots, k\}} Score_a$$

where $Uncertainty_i$ represents the difficulty of the i-th image and k represents the number of detection frames in that image. The images are sorted by $Uncertainty_i$, and the number of samples to mine in each round is set in order to mine the difficult samples. The mined samples are then labeled manually and accurately to obtain the j-th mined sample set $Set_{mining}^{j}$, which is merged into the current training set $Set_{train}^{j}$ to form the new training set $Set_{train}^{j+1}$.

(4) Iterative optimization

Steps (2) and (3) are repeated so that the model is optimized and difficult samples are mined in turn. Through this repeated iterative mining process, a labeled difficult sample data set Set_mining of the desired size is finally obtained, and the effectiveness of the mined difficult samples can also be verified over the repeated model iterations.
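The overall loop of steps (2) to (4) can be sketched with the model-specific parts passed in as callables. Everything named here is hypothetical: `train_model`, `score_images`, and `label_manually` stand in for the fine-tuning, difficulty scoring, and manual labeling described above, and removing mined images from the unlabeled pool is an assumption the source does not state.

```python
from typing import Callable, Dict, List, Set


def mine_difficult_samples(
    train_set: Set[str],
    unlabeled_pool: Set[str],
    train_model: Callable[[Set[str]], object],
    score_images: Callable[[object, List[str]], Dict[str, float]],
    label_manually: Callable[[List[str]], None],
    per_round: int,
    desired_total: int,
) -> Set[str]:
    """Alternate model optimization and difficult sample mining."""
    train_set = set(train_set)   # work on copies of the caller's sets
    pool = set(unlabeled_pool)
    mined: Set[str] = set()      # plays the role of Set_mining
    while pool and len(mined) < desired_total:
        model = train_model(train_set)                 # step (2): optimize
        scores = score_images(model, sorted(pool))     # step (3): score pool
        hardest = sorted(scores, key=scores.get)[:per_round]
        if not hardest:
            break
        label_manually(hardest)                        # manual labeling
        mined.update(hardest)
        train_set.update(hardest)                      # merge into training set
        pool.difference_update(hardest)                # assumption: no re-mining
    return mined


# Toy run with fixed, made-up difficulty values instead of a real detector.
fake_scores = {"u1": 0.9, "u2": 0.1, "u3": 0.5, "u4": 0.3}
result = mine_difficult_samples(
    train_set={"t1", "t2"},
    unlabeled_pool=set(fake_scores),
    train_model=lambda data: len(data),
    score_images=lambda model, images: {name: fake_scores[name] for name in images},
    label_manually=lambda images: None,
    per_round=1,
    desired_total=3,
)
print(sorted(result))  # ['u2', 'u3', 'u4']
```

In the toy run the three images with the smallest scores are mined one per round, and the loop ends once the desired total is reached.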

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in the sequence indicated by the arrows, these steps are not necessarily performed in that sequence. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; these sub-steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the application also provides a difficult sample mining device for implementing the difficult sample mining method described above. The solution provided by the device is implemented in a manner similar to that described for the method, so for the specific limitations in the one or more embodiments of the difficult sample mining device provided below, reference may be made to the limitations of the difficult sample mining method above, which are not repeated here.

In one embodiment, as shown in FIG. 6, a difficult sample mining device is provided, comprising: a data acquisition module 601, a class prediction module 602, a sample determination module 603, and an iterative training module 604, wherein:

the data acquisition module 601 is configured to acquire an initial image training sample set with a labeling result, and train the target detection model according to the initial image training sample set;

The class prediction module 602 is configured to select a first preset number of target image samples from the image sample set to be mined, input the target image samples into a trained target detection model, and obtain a detection result of the target image samples, where the detection result includes a detection frame and a prediction probability that image content in the detection frame belongs to each detection class;

the sample determining module 603 is configured to determine, according to a prediction probability that the image content in the detection frame belongs to each detection category, a difficult sample in a first preset number of target image samples, construct a difficult sample subset, and update the difficult sample set according to the difficult sample subset;

The iterative training module 604 is configured to obtain a manual labeling result of each difficult sample in the difficult sample subset, update the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and return to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets a preset condition.

In one embodiment, the sample determination module 603 is further configured to:

In one embodiment, iterative training module 604 is further configured to:

In one embodiment, iterative training module 604 is further configured to determine that the preset condition includes a number of difficult samples in the set of difficult samples being greater than a third preset number.

In one embodiment, the iterative training module 604 is further configured to determine that the preset condition includes that a difference between an evaluation value of the target detection model after the current training and an evaluation value of the target detection model after the previous training is less than a preset threshold, where the evaluation value is used to indicate a target detection effect of the target detection model on the target class.

The modules in the difficult sample mining apparatus described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.

In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing sample set data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements a difficult sample mining method.

It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.

In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of:

In one embodiment, the processor when executing the computer program further performs the steps of: determining the preset condition includes that the number of difficult samples in the set of difficult samples is greater than a third preset number.

In one embodiment, the processor when executing the computer program further performs the steps of: the determining of the preset condition comprises that the difference between the evaluation value of the target detection model after the current training and the evaluation value of the target detection model after the last training is smaller than a preset threshold value, and the evaluation value is used for indicating the target detection effect of the target detection model on the target class.

In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of:

In one embodiment, the computer program when executed by the processor further performs the steps of: determining the preset condition includes that the number of difficult samples in the set of difficult samples is greater than a third preset number.

In one embodiment, the computer program when executed by the processor further performs the steps of: the determining of the preset condition comprises that the difference between the evaluation value of the target detection model after the current training and the evaluation value of the target detection model after the last training is smaller than a preset threshold value, and the evaluation value is used for indicating the target detection effect of the target detection model on the target class.

In one embodiment, a computer program product is provided comprising a computer program which, when executed by a processor, performs the steps of:

Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer readable storage medium; when executed, the computer program may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric random access memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take various forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processor referred to in the embodiments provided in the present application may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims

1. A method of difficult sample mining, the method comprising:

Selecting a first preset number of target image samples from an image sample set to be mined, inputting the target image samples into the trained target detection model to obtain a detection result of the target image samples, wherein the detection result comprises a detection frame and a prediction probability that the image content in the detection frame belongs to each detection category;

determining difficult samples in the first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset;

and acquiring a manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition.

2. The method of claim 1, wherein determining a difficult sample of the first predetermined number of target image samples based on the predicted probability that the image content in the detection box belongs to each detection class comprises:

For each detection frame in the target image sample, determining the difficulty of the detection frame according to the maximum value of the target prediction probability of the image content in the detection frame belonging to the target detection category and the prediction probability of the image content in the detection frame belonging to each detection category, wherein the target detection category is determined based on the application scene of the target detection model, and the difficulty is used for indicating the detection difficulty of the target detection model on a detection object;

3. The method according to claim 2, wherein determining the difficulty of the detection frame based on the maximum of the target prediction probability of the image content in the detection frame belonging to the target detection category and the prediction probability of the image content in the detection frame belonging to each detection category comprises:

the maximum value is differenced with target prediction probability of the image content in the detection frame belonging to a target detection category, so that a probability difference value is obtained;

4. The method of claim 2, wherein determining the difficulty level of the target image sample based on the difficulty levels of all the detection frames in the target image sample comprises:

5. The method of claim 2, wherein determining a difficult sample of the first predetermined number of target image samples based on the difficulty levels of all target image samples comprises:

6. The method of claim 1, wherein the updating the initial image training sample set based on the difficult sample subset and the corresponding artificial annotation result comprises:

7. The method of claim 1, wherein the preset condition comprises a number of difficult samples in the set of difficult samples being greater than a third preset number.

8. The method according to claim 1, wherein the preset condition includes that a difference between an evaluation value of the target detection model after the current training and an evaluation value of the target detection model after the previous training is smaller than a preset threshold, and the evaluation value is used for indicating a target detection effect of the target detection model on a target class.

9. A difficult sample mining apparatus, the apparatus comprising:

The data acquisition module is used for acquiring an initial image training sample set with a labeling result, and training a target detection model according to the initial image training sample set;

The sample determining module is used for determining difficult samples in the first preset number of target image samples according to the prediction probability that the image content in the detection frame belongs to each detection category, constructing a difficult sample subset, and updating the difficult sample set according to the difficult sample subset;

And the iterative training module is used for acquiring the manual labeling result of each difficult sample in the difficult sample subset, updating the initial image training sample set according to the difficult sample subset and the corresponding manual labeling result, and returning to the step of training the target detection model according to the initial image training sample set to continue execution until the updated difficult sample set meets the preset condition.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.

11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.

12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.