CN112418362B - Target detection training sample screening method - Google Patents
- Publication number
- CN112418362B (grant), application CN202110093092.2A
- Authority
- CN
- China
- Prior art keywords
- sample
- labeled
- training
- screening
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Multimedia (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a method for screening target-detection training samples. Models from different stages of the training process are used to detect targets in the training-set data, yielding a detection result for each image sample on each stage model M. These per-stage detection results are then screened to identify completely forgotten samples and partially forgotten samples. Because the model, rather than a human, analyzes noisy samples in a very large target-detection dataset, manual labor is saved, the subjective influence of manual data screening is eliminated, and the efficiency and accuracy of target-detection tasks performed with deep-learning methods are improved.
Description
Technical Field
The invention belongs to the technical field of target detection, and particularly relates to a method for screening a target detection training sample.
Background
In recent years, with the continuous development of artificial-intelligence technology, deep learning has made breakthrough progress in computer-vision tasks such as classification, recognition, detection, segmentation, and tracking. Compared with traditional machine-vision methods, a deep convolutional neural network trained on big data learns useful features from large amounts of data and offers advantages in speed, accuracy, and cost. However, a large part of this advantage over conventional methods rests on the availability of large amounts of effective data, particularly in the field of target detection. To provide a sufficient quantity of effective data, the current mainstream practice is data augmentation, and many other sample-generation methods have also appeared. After a sufficient number of samples is obtained, however, some noisy samples inevitably arise that the model can recognize in the early stages of training but cannot recognize in later stages. These are called "forgotten samples", and they have a negative effect on the model-training process.
At present, forgotten samples in a dataset (such as mislabeled samples) generally must be screened out manually. The workload is enormous and the judgment is not representative: some samples that people subjectively consider noise are, from the model's perspective, not noise at all or have no effect on training, so manual screening can harm the target-detection result.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method for screening target-detection training samples that addresses the technical defects described in the background, thereby reducing noise in the samples, raising the proportion of effective samples, and improving the detection efficiency and accuracy of target detection performed with deep-learning methods.
According to one aspect of the present invention, there is provided a target detection training sample screening method, including:
Targets in the training-set data are detected with the models at different stages obtained during training, yielding the detection result of each image sample on each stage model M. Each image sample is then processed as follows:
setting the recall to 1 when every pre-labeled target in the detection result of model M has a recognition box whose class matches the pre-labeled class and whose IOU is greater than or equal to A;
setting the recall to 0 when no pre-labeled target has a recognition box with a matching class and IOU greater than or equal to A;
setting 0 < recall < 1 when only some of the pre-labeled targets have recognition boxes with a matching class and IOU greater than or equal to A;
where the IOU (Intersection over Union) is the intersection-over-union ratio of the recognition-box area in the detection result and the pre-labeled box area, and A is a constant between 0 and 1 set according to an empirical value.
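The per-image recall rules above can be sketched in code. This is an illustrative sketch, not reference code from the patent; the box format `(x1, y1, x2, y2)` and the helper names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def image_recall(labels, detections, a=0.5):
    """Fraction of pre-labeled targets matched by a detection of the same
    class with IoU >= a. labels/detections: lists of (cls, box) pairs."""
    if not labels:
        return 1.0
    matched = sum(
        1 for cls, box in labels
        if any(d_cls == cls and iou(box, d_box) >= a
               for d_cls, d_box in detections)
    )
    return matched / len(labels)
```

A sample with two labeled targets of which the model re-detects only one would get recall 0.5, placing it in the 0 < recall < 1 case above.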
Image samples with recall > 0 on M1–Mm and recall = 0 on Mm+1–Mn are screened out as completely forgotten samples.
Image samples with recall = 1 on M1–Mm and 0 < recall < 1 on Mm+1–Mn are screened out as partially forgotten samples.
Here m < n, n is the number of model stages, and n is a natural number greater than 1.
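The two screening rules can be sketched as a classifier over the per-stage recall values of one image. This is an assumed illustration (function and label names are not from the patent): `recalls` holds the recall of the sample on each stage model M1..Mn, and `m` is the split between early and late stages.

```python
def classify_sample(recalls, m):
    """Classify one image sample from its per-stage recalls [r1, ..., rn].

    m separates the early stages M1..Mm from the late stages Mm+1..Mn.
    """
    early, late = recalls[:m], recalls[m:]
    # recall > 0 on all early stages, recall = 0 on all late stages
    if all(r > 0 for r in early) and all(r == 0 for r in late):
        return "completely_forgotten"
    # recall = 1 on all early stages, 0 < recall < 1 on all late stages
    if all(r == 1 for r in early) and all(0 < r < 1 for r in late):
        return "partially_forgotten"
    return "kept"
```

For instance, with three stage models and m = 2, recalls of [0.5, 1.0, 0.0] mark a completely forgotten sample, while [1.0, 1.0, 0.5] mark a partially forgotten one.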
Preferably, n = m + 1 and A = 0.5.
The training set consists of pre-labeled image samples; the pre-labeling information optionally includes the sample name, the target class, and the pre-labeled recognition-box coordinates.
The models at different stages are models loaded with the weight files learned from the training set, where n corresponds to the number of learning passes.
The detection result optionally contains the sample name, the recognized target class, and the recognition-box coordinates.
Compared with the prior art, the invention has at least the following beneficial effects: the model replaces manual labor in analyzing noisy samples in very large target-detection datasets, which saves labor, eliminates the subjective influence of manual data screening, and improves the efficiency and accuracy of target-detection tasks performed with deep-learning methods.
Drawings
Fig. 1 is a schematic diagram of a completely forgotten sample screened by a target detection training sample screening method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a completely forgotten sample screened by the target detection training sample screening method according to the embodiment of the present invention.
Fig. 3 is a schematic diagram of a part of forgotten samples screened by the target detection training sample screening method according to the embodiment of the present invention.
Detailed Description
In order to make the technical solutions in one or more embodiments of the present disclosure better understood, the technical solutions in one or more embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in one or more embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of one or more embodiments of the present disclosure, but not all embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments in one or more of the specification without inventive faculty are intended to fall within the scope of one or more of the specification.
Example 1: in order to solve the above technical problems, this embodiment takes a model of a contraband detection scenario as an example, and describes a sample screening method, including the following steps:
1. and detecting the target in the training set data by using the models at different stages obtained in the training process to obtain the detection result of each image sample on the model M at different stages.
The training set consists of manually labeled image samples. The number of image samples is very large, generally thousands to hundreds of thousands. Sample acquisition is not limited to direct image capture; the training set may also include samples obtained by data augmentation and/or new samples produced by image-fusion methods disclosed in the prior art. The image type is not limited: images may be acquired by a camera, an X-ray security-inspection device, or a terahertz imaging device, but in general the images in one training set come from the same kind of device, and images acquired by different kinds of devices are not mixed into the same training set. Manual labeling means labeling the detection targets, and the pre-labeling information optionally includes the sample name, the target class, and the coordinates of the pre-labeled recognition box. The pre-labeled image samples form the training set used to train the constructed contraband-detection model.
The type of the model is not limited in this embodiment; any target-detection model based on deep learning may be used, such as a two-stage detection framework represented by Faster R-CNN or a one-stage detection framework represented by SSD or YOLO.
The training process requires the model to learn the training set multiple times; each pass (called an epoch) adjusts the model parameters, that is, a weight file is generated per epoch. In one embodiment of the invention, the model is trained for 14 epochs and the 14 weight files produced during training are saved. Loading the weight file of each epoch into the model yields the stage models M1–M14, and these stage models are used to detect the targets in the training set, giving the detection result of each image sample on each stage model.
The detection result optionally contains information such as the name of the sample, the identified target class, the coordinates of the identification frame and the like.
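The stage-wise detection pass described above can be sketched as a loop over saved weight files. This is a hedged sketch: `build_model`, `load_weights`, and `detect` are hypothetical stand-ins for whatever detection framework is used (e.g. an SSD or YOLO implementation); only the control flow mirrors the text.

```python
def stagewise_results(weight_files, training_set, build_model, load_weights, detect):
    """Return results[i][name] = detections of that sample on stage model M(i+1).

    weight_files: one saved weight file per epoch, ordered M1 .. Mn.
    training_set: iterable of dicts with at least "name" and "image" keys.
    """
    results = []
    for wf in weight_files:
        model = build_model()          # fresh model instance
        load_weights(model, wf)        # load this epoch's weights -> stage model
        stage = {s["name"]: detect(model, s["image"]) for s in training_set}
        results.append(stage)
    return results
```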
For a target in an image, when the class recognized by the model matches the target's pre-labeled class and the IOU (Intersection over Union) of the model's recognition box and the labeled box exceeds a threshold, the target is considered correctly detected by the model. The threshold may be set empirically and is preferably 0.5.
2. And screening the detection results of the models in different stages to obtain a complete forgotten sample and a partial forgotten sample.
The screening process of this noise-sample screening method for training-set data in the target-detection field is based on a forgetting mechanism of the target-detection model, so the noise samples screened out by the method are called forgotten samples in this invention. In this embodiment, the concepts of IOU and recall are introduced to screen forgotten samples in a target-detection dataset, and forgotten samples are subdivided into completely forgotten and partially forgotten samples according to the degree of forgetting.
Specifically, one image sample may contain multiple targets, and the model's detection result for one target may contain multiple recognition boxes. Each image sample is processed as follows. A completely forgotten sample is defined as an image sample for which, in the detection results of models M1–M13, the pre-labeled targets have recognition boxes with matching class and IOU ≥ 0.5 (recall > 0), but in the detection result of model M14 no pre-labeled target has a recognition box with matching class and IOU ≥ 0.5 (recall = 0).
A partially forgotten sample is defined as an image sample for which, in the detection results of models M1–M13, every pre-labeled target has a recognition box with matching class and IOU ≥ 0.5 (recall = 1), but in the detection result of model M14 only some of the pre-labeled targets have such recognition boxes (0 < recall < 1).
This model self-checking method solves the problems of manual sample screening: the workload is enormous, and subjective human judgment may delete a small number of mislabeled samples that actually benefit model robustness, introducing large errors into the data screening. Because the method screens out noisy data that is harmful to the model via model self-checking, it is more targeted than manual screening and does not treat a small number of mislabeled samples that benefit model training as noise, thereby preserving model robustness while deleting the screened-out noisy data.
Applying the method of Example 1 to a training set of 25,000 image samples screened out 10 completely forgotten samples and 167 partially forgotten samples. Figures 1 and 2 show original images of screened completely forgotten samples; the targets in these image samples are mislabeled. Figure 3 shows a screened partially forgotten sample; the image contains correctly labeled firecrackers but also a target incorrectly labeled as a pistol.
The screened-out completely and partially forgotten samples were removed, and the remaining data was used as a new training set. Training the same model separately on the new training set and the original training set and then testing showed that the model trained on the new training set had higher recognition accuracy.
Example 2: in example 1 for M1-M13And M14After the identification frame is used for judging and screening the forgotten sample composition set C1, the M is continuously carried out1-M5And M6-M14And judging and screening the forgotten samples by the obtained identification frame to form a set C2, and removing a union set of C1 and C2 to obtain a new training set. The forgetting samples with different forgetting degrees can be obtained by screening for multiple times at different stages, and the specific times can be selected according to the actual effect.
The technical scheme of the invention can also be applied to target-recognition and detection scenarios other than the contraband detection of this embodiment, such as face recognition, license-plate recognition, road recognition, autonomous driving, and lesion detection and analysis in medical CT imaging.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and the practical application, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.
Claims (7)
1. A method for screening a target detection training sample is characterized by comprising the following steps:
s1, detecting the target in the training set data by using the models at different stages obtained in the training process to obtain the detection result of each image sample on the model M at different stages;
s2, the following processing is carried out on each image sample:
setting the recall to 1 when every pre-labeled target in the detection result of model M has a recognition box whose class matches the pre-labeled class and whose IOU is greater than or equal to A;
setting the recall to 0 when no pre-labeled target has a recognition box with a matching class and IOU greater than or equal to A;
setting 0 < recall < 1 when only some of the pre-labeled targets have recognition boxes with a matching class and IOU greater than or equal to A;
wherein the IOU is the intersection-over-union ratio of the recognition-box area in the detection result and the pre-labeled box area, and A is a constant between 0 and 1 set according to an empirical value;
s3-will be at M1-MmUpper recall > 0, and at Mm+1-MnTaking the image sample with the upper call being 0 as a complete forgetting sample;
will be at M1-MmUpper recall is 1, and is at Mm+1-MnTaking the image sample with upper 0 < recall < 1 as a partial forgetting sample;
wherein m belongs to n, n is the number of stages of the model, and n is a natural number more than 1;
s4: screening out a complete forgotten sample and/or a partial forgotten sample;
the models in different stages are selected from models loaded with weight files learned from a training set, and n corresponds to the learning times.
2. The method as claimed in claim 1, wherein n is m + 1.
3. The method for screening a target detection training sample according to claim 1 or 2, wherein A is 0.5.
4. The method for screening the training samples for target detection according to claim 1, wherein the training set is composed of pre-labeled image samples, and the pre-labeled information optionally includes target category and pre-labeled identification frame coordinate information.
5. The method as claimed in claim 1, wherein the detection result optionally includes the recognized target class and the recognition frame coordinate information.
6. The method as claimed in claim 1, wherein step S3 is performed a plurality of times with different values of m.
7. The method for screening a training sample for target detection according to any one of claims 1-2 and 4-6, wherein the screened sample is deleted and the remaining samples constitute a training set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110093092.2A CN112418362B (en) | 2021-01-25 | 2021-01-25 | Target detection training sample screening method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112418362A (en) | 2021-02-26
CN112418362B true CN112418362B (en) | 2021-04-30 |
Family
ID=74782558
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110093092.2A Active CN112418362B (en) | 2021-01-25 | 2021-01-25 | Target detection training sample screening method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112418362B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113344858B (en) * | 2021-05-14 | 2024-07-09 | 云从科技集团股份有限公司 | Feature detection method, device and computer storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738264A (en) * | 2019-10-18 | 2020-01-31 | 上海眼控科技股份有限公司 | Abnormal sample screening, cleaning and training method, device, equipment and storage medium |
CN111310826A (en) * | 2020-02-13 | 2020-06-19 | 南京旷云科技有限公司 | Method and device for detecting labeling abnormity of sample set and electronic equipment |
CN111353555A (en) * | 2020-05-25 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Label detection method and device and computer readable storage medium |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738264A (en) * | 2019-10-18 | 2020-01-31 | 上海眼控科技股份有限公司 | Abnormal sample screening, cleaning and training method, device, equipment and storage medium |
CN111310826A (en) * | 2020-02-13 | 2020-06-19 | 南京旷云科技有限公司 | Method and device for detecting labeling abnormity of sample set and electronic equipment |
CN111353555A (en) * | 2020-05-25 | 2020-06-30 | 腾讯科技(深圳)有限公司 | Label detection method and device and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
AN EMPIRICAL STUDY OF EXAMPLE FORGETTING;Mariya Toneva等;《arxiv.org》;20191115;全文 * |
Also Published As
Publication number | Publication date |
---|---|
CN112418362A (en) | 2021-02-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rijal et al. | Ensemble of deep neural networks for estimating particulate matter from images | |
CN111178197A (en) | Mass R-CNN and Soft-NMS fusion based group-fed adherent pig example segmentation method | |
CN112232450B (en) | Multi-stage comprehensive difficult sample mining method and target detection method | |
CN108830332A (en) | A kind of vision vehicle checking method and system | |
CN109509187A (en) | A kind of efficient check algorithm for the nibs in big resolution ratio cloth image | |
CN109902662B (en) | Pedestrian re-identification method, system, device and storage medium | |
CN112102229A (en) | Intelligent industrial CT detection defect identification method based on deep learning | |
CN116843650A (en) | SMT welding defect detection method and system integrating AOI detection and deep learning | |
CN110161233B (en) | Rapid quantitative detection method for immunochromatography test paper card | |
CN116310785B (en) | Unmanned aerial vehicle image pavement disease detection method based on YOLO v4 | |
CN113205163B (en) | Data labeling method and device | |
CN109146873A (en) | A kind of display screen defect intelligent detecting method and device based on study | |
US12051253B2 (en) | Method and apparatus for training a neural network classifier to classify an image depicting one or more objects of a biological sample | |
CN112365497A (en) | High-speed target detection method and system based on Trident Net and Cascade-RCNN structures | |
US20220092359A1 (en) | Image data classification method, device and system | |
CN114494780A (en) | Semi-supervised industrial defect detection method and system based on feature comparison | |
CN113158969A (en) | Apple appearance defect identification system and method | |
CN112418362B (en) | Target detection training sample screening method | |
CN114092935A (en) | Textile fiber identification method based on convolutional neural network | |
CN115272252A (en) | Improved YOLOX-based carbon fiber defect detection method, device and system | |
CN111310837A (en) | Vehicle refitting recognition method, device, system, medium and equipment | |
CN114596244A (en) | Infrared image identification method and system based on visual processing and multi-feature fusion | |
CN116521917A (en) | Picture screening method and device | |
CN116485766A (en) | Grain imperfect grain detection and counting method based on improved YOLOX | |
CN113505784B (en) | Automatic nail labeling analysis method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||