CN115496960A - Sample generation method, target detection model training method, target detection method and system - Google Patents


Info

Publication number
CN115496960A
CN115496960A (application CN202211099128.9A)
Authority
CN
China
Prior art keywords
sample
target
picture
target detection
detection model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202211099128.9A
Other languages
Chinese (zh)
Inventor
夏威
邓文平
梁鸿
周颖超
项载尉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hunan Shibite Robot Co Ltd
Original Assignee
Hunan Shibite Robot Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hunan Shibite Robot Co Ltd filed Critical Hunan Shibite Robot Co Ltd
Priority to CN202211099128.9A priority Critical patent/CN115496960A/en
Publication of CN115496960A publication Critical patent/CN115496960A/en
Priority to CN202310210125.6A priority patent/CN116152606A/en
Legal status: Withdrawn (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a sample generation method, a target detection model training method, and a target detection method and system. The sample generation method comprises the following steps: (1) train on an original sample set to obtain a target detection model; (2) acquire a candidate sample set comprising N candidate sample pictures to be detected; (3) based on the candidate sample set and the target detection model, output the target information at each location on each candidate sample picture; (4) based on the output of step 3, screen target sample pictures from the candidate sample set and determine the target positions and category information at each location on each target sample picture; (5) output the target sample pictures, together with the target positions and category information at each location on them, as training samples. The invention uses an unlabeled candidate sample set to automatically generate samples for training the target detection model, greatly reducing labor cost; through continuous iterative learning, the model's adaptation to data in the online workflow, and hence the accuracy of its detection results, is continuously improved.

Description

Sample generation method, target detection model training method, target detection method and system
Technical Field
The invention relates to the technical field of machine learning, and in particular to a sample generation method, a target detection model training method, and a target detection method and system.
Background
In real scenes, target detection must often be performed continuously on various objects, such as steel plates and parts on a factory production line, vehicles on a road, or pedestrians on a sidewalk.
Existing target detection methods mainly study optimization over static data sets: the target detection model is trained on a fixed data set. Such models therefore have low detection accuracy on continuously changing data, and improving their accuracy still relies on manually identifying and labeling newly acquired images to generate training samples, then retraining and redeploying the model.
Generating samples by manual identification and labeling consumes substantial manpower and material resources, and both sample labeling and model training are inefficient. In addition, manual labeling inevitably suffers from omissions and subjectivity, which degrade the accuracy of target detection results.
Disclosure of Invention
To address the prior-art problem that a large number of samples must be labeled manually in order to train a new target detection model, the invention provides a sample generation method, a target detection model training method, and a target detection method and system that can continuously and automatically generate training samples for a target detection model, greatly reducing labor cost while continuously improving the accuracy of target detection results.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows:
a sample generation method is characterized by comprising the following steps:
step 1, training based on an original sample set to obtain a target detection model, wherein the original sample set comprises a plurality of original sample pictures, and the output of the target detection model is target detection information at each position on each original sample picture;
step 2, acquiring a candidate sample set, wherein the candidate sample set comprises N candidate sample pictures to be detected;
step 3, outputting the target information at each location on each candidate sample picture based on the candidate sample set and the target detection model;
step 4, screening target sample pictures from the candidate sample set based on the output of step 3, and determining the target positions and category information at each location on each target sample picture;
step 5, outputting the target sample pictures, together with the target positions and category information at each location on them, as training samples.
By means of the method, the target detection model training sample is automatically generated by using the label-free candidate sample set, so that the labor cost is greatly reduced, and the accuracy of the target detection result is continuously improved.
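The five steps above can be sketched in a few lines of Python. This is an illustrative skeleton, not the patent's implementation: `train`, `detect`, and the threshold values `t_low`/`t_high` are hypothetical stand-ins for the model training routine, the detector, and the two confidence thresholds introduced later in the description.

```python
def generate_training_samples(original_set, candidate_set, train, detect,
                              t_low=0.1, t_high=0.9):
    """Sketch of steps 1-5: train on the original set, run the model over
    unlabeled candidates, and keep only pictures with no ambiguous detection
    (every score either below t_low or above t_high)."""
    model = train(original_set)                          # step 1
    new_samples = []
    for picture in candidate_set:                        # step 2
        detections = detect(model, picture)              # step 3: (box, cls, prob)
        scores = [prob for _box, _cls, prob in detections]
        if any(t_low <= p <= t_high for p in scores):    # step 4: ambiguous, drop
            continue
        labels = [(b, c, p) for b, c, p in detections if p > t_high]
        new_samples.append((picture, labels))            # step 5
    return new_samples
```

A picture with no detections at all is kept as a pure-background sample, which matches the screening rule of keeping confidently empty pictures.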
As a preferred mode, the target detection information at each location on a picture comprises whether an object to be detected is present at that location and its object class probability; in step 3, for each candidate sample picture, the process of outputting the target information at each location comprises:
step 301, performing a plurality of reversible transformation processes (e.g., adjusting illumination, rotating, scaling, flipping, etc.) on the candidate sample picture to obtain m different pictures P;
step 302, inputting the m different pictures P into a target detection model, and outputting whether targets to be detected and target category probabilities are detected at each position on each picture P;
step 303, calculating target positions and target category probabilities of all places on the candidate sample pictures corresponding to the m pictures P based on the output result of the step 302;
and step 304, obtaining final target positions and target category probability information of each part on the candidate sample picture based on the calculation result of the step 303.
In this way, during sample labeling the candidate sample picture is augmented into a plurality of pictures, the target detection information for each picture is obtained by inference with the existing detection model, and the target detection information on the original candidate sample picture is finally determined by jointly considering the target detection information of all the pictures, which improves labeling accuracy.
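The augment-infer-invert-average idea of steps 301 to 304 can be sketched as follows. This is a minimal illustration under assumptions: NumPy arrays stand in for pictures, a pluggable `predict` callable stands in for the detection model, and horizontal flip is the single reversible transform shown; all names are hypothetical.

```python
import numpy as np

def tta_class_probabilities(picture, predict, transforms):
    """Average per-location class-probability maps over reversible transforms
    (steps 301-304). `predict` maps a picture to a probability map with the
    same spatial shape; each transform is a (forward, inverse) pair."""
    maps = []
    for forward, inverse in transforms:
        prob_map = predict(forward(picture))   # step 302: infer on a transformed copy
        maps.append(inverse(prob_map))         # step 303: restore original coordinates
    return np.mean(maps, axis=0)               # steps 303-304: average the m results

# Identity and horizontal flip are both their own inverses.
TRANSFORMS = [
    (lambda im: im, lambda im: im),
    (np.fliplr, np.fliplr),
]
```

Illumination or brightness changes, being pixel-value transforms, need no coordinate inversion; geometric transforms such as rotation or scaling each need a matching inverse.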
As a preferable mode, the step 4 includes:
step 401, using the object class probabilities of various places on the candidate sample picture output in step 303 as confidence scores;
step 402, setting a first preset threshold and a second preset threshold, the first threshold being smaller than the second. Candidate sample pictures whose confidence falls between the first and second preset thresholds are regarded as ambiguous and temporarily discarded. Candidate sample pictures whose confidence is lower than the first preset threshold or higher than the second preset threshold are kept as target sample pictures to be output; on such pictures, the locations with confidence lower than the first preset threshold are set as background, and the locations with confidence higher than the second preset threshold are set as target objects with an object bounding box;
and step 403, acquiring the object class probability of the corresponding position of the object bounding box on the candidate sample picture corresponding to each picture P, and taking the maximum value of the object class probabilities as the object class probability at the object bounding box for output.
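The confidence-band filtering of steps 401 and 402 can be illustrated with a short, hypothetical sketch; the function name and the per-location probability map are assumptions for illustration, not the patent's data structures.

```python
import numpy as np

def self_label(prob_map, t_low, t_high):
    """Steps 401-402 sketch: given the per-location confidence map of one
    candidate picture, return a label mask (0 = background, 1 = foreground),
    or None when any location falls in the ambiguous band [t_low, t_high]."""
    if np.any((prob_map >= t_low) & (prob_map <= t_high)):
        return None                          # ambiguous picture: discard it
    return (prob_map > t_high).astype(int)   # confident foreground vs background
```

Discarding whole pictures that contain any mid-band confidence keeps only labels the current model is already sure about, which is what makes the self-labeling safe to feed back into training.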
In a preferred embodiment, in the step 303, the object class probabilities of the respective locations on each picture P are averaged to obtain the object class probability of the respective locations on the candidate sample picture.
As a preferable mode, in step 2, the N candidate sample pictures of the area to be detected are obtained by continuously acquiring real-time images of the area to be detected. As another preferable mode, the N candidate sample pictures are obtained from the historically stored image set of the area to be detected.
Preferably, the augmentation processing includes flipping, picture brightness/contrast change, reduction, or enlargement.
Based on the same inventive concept, the invention also provides a target detection model training method, which is characterized in that the training sample generated by the sample generation method is used for training the target detection model to obtain an updated target detection model. The method can continuously improve the adaptability of the target detection model to the data in the online workflow and improve the target detection accuracy through continuous iterative learning.
Preferably, the present invention further provides another method for training a target detection model, which is characterized in that an original sample set and training samples generated by the sample generation method are used to train the target detection model, so as to obtain an updated target detection model.
Based on the same inventive concept, the invention also provides a target detection model, which is characterized in that the target detection model carries out continuous self-learning updating through the target detection model training method.
Based on the same inventive concept, the invention also provides a target detection method, which is characterized in that the target detection model is utilized to carry out target detection on the picture to be detected.
Based on the same inventive concept, the invention also provides a target detection system, which is characterized by comprising an image acquisition unit, a model training unit and the target detection model, wherein:
an image acquisition unit: the image acquisition system is used for acquiring a picture to be detected, wherein one part of the picture to be detected is used as a candidate sample picture for generating a training sample set to train and update a target detection model, and the other part of the picture to be detected is used for identifying by the target detection model to output a target detection result;
a model training unit: for training the target detection model based on the generated training sample set to update the target detection model.
Compared with the prior art, the method utilizes the label-free candidate sample set to automatically generate the sample for training the target detection model, thereby greatly reducing the labor cost; meanwhile, the target detection method and the system continuously improve the adaptability of the target detection model to data in the online workflow and improve the accuracy of the target detection result through continuous iterative learning.
Drawings
Fig. 1 is a layout diagram of an image capturing unit according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a training method of a target detection model according to an embodiment of the present invention.
In fig. 1, 1 is an image acquisition unit, 101 is an online acquisition camera, 102 is a local storage, 103 is a data storage center, and 104 is a data transmission module.
Detailed Description
In order to make the technical solutions of the present invention better understood by those skilled in the art, the technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments. It is to be understood that the described embodiments are merely exemplary of a portion of the invention and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the description and claims of the present invention and in the foregoing description and drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprising" and "having," and any variations thereof, in the description and claims of this invention are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention automatically generates the detection target frame for the continuously input picture data and automatically trains and updates, thereby achieving the continuous self-learning of the continuous data stream.
In one embodiment, the present invention provides a sample generation method comprising the steps of:
Step 1: train on an original sample set D_0 to obtain a target detection model M, and deploy M to the production environment. The original sample set comprises a plurality of original sample pictures, and the output of the target detection model is the target detection information at each location on each original sample picture. The original sample set may be an initial set of manually labeled samples; it may also be obtained by other means, which is not limited here.
Step 2: obtain a candidate sample set D = {x_1, x_2, ..., x_N}, where the candidate sample set comprises N candidate sample pictures to be detected, i is an integer with 1 <= i <= N, and x_i denotes the i-th candidate sample picture to be detected.
step 3, based on the candidate sample set and the target detection model, predicting and outputting each candidate sample picture
Figure 543989DEST_PATH_IMAGE004
Target information at various positions;
step 4, based on the prediction output information of step 3, from the candidate sample set
Figure 365314DEST_PATH_IMAGE003
Screening target sample pictures and determining target positions and category information of all the positions on all the target sample pictures;
and 5, outputting the target sample picture and the target positions and the class information of all places on the picture as a training sample.
The invention utilizes the label-free candidate sample set to automatically generate the sample for training the target detection model, thereby greatly reducing the labor cost and continuously improving the accuracy of the target detection result.
In some embodiments, the target detection information at each location on a picture includes whether an object to be detected is present at that location and its object class probability. In step 3, for each candidate sample picture x_i, the process of outputting the target information at each location on x_i comprises the following steps:
Step 301: apply augmentation processing to the candidate sample picture x_i to obtain m different pictures P_j, where j is an integer with 1 <= j <= m and P_j denotes the candidate sample picture x_i under the j-th transform.
Step 302: input the m differently transformed pictures P_j into the target detection model M; use M to predict the detection result of each picture P_j and output, for each location on each P_j, whether an object to be detected is present and its object class probability, yielding for each P_j an object class probability matrix C_j and an object position matrix B_j.
Step 303: based on the output of step 302, apply to the m transformed pictures P_j the inverse of the processing operations of step 301 so that each prediction is restored to the original image coordinates; average the m prediction results and output, for the candidate sample picture x_i, whether an object to be detected is present at each location, the averaged object class probability c, and the object bounding-box position b, thereby obtaining the most accurate prediction.
In the sample labeling process, the candidate sample picture is transformed into several different pictures, the target detection information at each location on each transformed picture is obtained, and the target detection information on the original candidate sample picture is finally determined by jointly considering the information from all the transforms, which improves sample labeling accuracy.
In some embodiments, the step 4 comprises:
Step 401: take the object class probability c at each location of the candidate sample picture output in step 303 as a confidence score, and set a first preset threshold t_1 and a second preset threshold t_2, where t_1 < t_2.
Step 402: keep candidate sample pictures whose confidence is lower than the first preset threshold t_1 or higher than the second preset threshold t_2 as target sample pictures to be output; on such pictures, set the locations with confidence lower than t_1 as background and the locations with confidence higher than t_2 as target objects with an object bounding box; discard candidate sample pictures whose confidence falls between t_1 and t_2. Specifically, if any location on a candidate sample picture has a confidence c with t_1 <= c <= t_2, the picture is rejected as an ambiguous sample. On each remaining candidate sample, every location with confidence below t_1 is set as background and every location with confidence above t_2 is set as foreground, with an object bounding box placed at the corresponding position; finally, the prediction results of all samples are fused to generate a self-labeled data set D_s.
Step 403: obtain the object class probabilities at the position corresponding to the object bounding box on each picture P_j, and output the maximum of these probabilities as the object class probability at the object bounding box.
In some embodiments, in the step 303, the object class probabilities of the positions on each picture P are averaged to serve as the object class probabilities of the positions on the candidate sample picture.
In some embodiments, in step 2, N candidate sample pictures of the area to be detected are obtained by continuously acquiring real-time images of the area to be detected. In other embodiments, N candidate sample pictures of the region to be detected are obtained from the historical stored image set of the region to be detected.
In some embodiments, the augmentation processing includes flipping, picture brightness/contrast change, reduction, or enlargement. For example, each candidate sample picture x_i is flipped left-right, flipped up-down, reduced to 0.5 times the original, and enlarged to 1.5 times the original, among other augmentations, yielding multiple pictures including the original candidate sample picture; the model M then predicts the target detection results for these pictures in parallel, obtaining, for each corresponding location on each picture, whether an object to be detected is present and its object class probability. In this embodiment, it is subsequently determined whether objects to be detected are present at each location of the original picture corresponding to the multiple pictures, and the average of the object class probabilities at the corresponding positions is computed to obtain the most accurate averaged prediction probability.
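For the predictions on the transformed pictures to be averaged, each transform must be invertible so that a box detected on a transformed picture can be mapped back to original-image coordinates. A hypothetical sketch for the flips and uniform scaling mentioned above (coordinate conventions and function name are assumptions):

```python
def invert_box(box, transform, width, height, scale=1.0):
    """Map a box (x1, y1, x2, y2) detected on a transformed picture back to
    original-image coordinates. The transforms mirror the embodiment:
    left-right flip, up-down flip, and uniform scaling (e.g. 0.5x or 1.5x)."""
    x1, y1, x2, y2 = box
    if transform == "hflip":                 # mirror x about the image width
        return (width - x2, y1, width - x1, y2)
    if transform == "vflip":                 # mirror y about the image height
        return (x1, height - y2, x2, height - y1)
    if transform == "scale":                 # divide out the resize factor
        return (x1 / scale, y1 / scale, x2 / scale, y2 / scale)
    return box                               # identity transform
```

Note that flipping swaps which corner is top-left, so the mirrored coordinates are also reordered to keep x1 < x2 and y1 < y2.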
In some embodiments, the present invention provides a target detection model training method, which trains a target detection model by using a training sample generated by the sample generation method to obtain an updated target detection model. According to the invention, through continuous iterative learning, the adaptability of the target detection model to data in an online workflow can be continuously improved, and the target detection accuracy is improved.
In a more preferred embodiment, as shown in FIG. 2, the invention also provides another target detection model training method, which trains the target detection model M using both the manually labeled data set (the original sample set D_0) and the self-labeled training samples D_s generated by the sample generation method, to obtain an updated target detection model. In some embodiments, the manually labeled data set and the self-labeled training sample set are fused by 1:1 sampling, and the target detection model is trained continuously. Meanwhile, methods such as cropping, scaling, color change, brightness change, mosaic, and picture mixing can be adopted to enhance picture diversity, strengthen the fusion of the original sample data set and the self-labeled new sample data set, and improve the combined learning over the two data sets. The fused sample set is used to continue training the target detection model M until the model converges, yielding a new model M'; the new model M' replaces the original model M and is deployed to the production environment. Steps 2 to 5 are then executed continuously, so that automatically labeled samples are obtained again and the model is retrained and updated, achieving continuous self-learning of the model until the model loss converges.
Methods such as cropping, scaling, color change, brightness change, mosaic, and picture mixing for fusing images belong to the prior art and are not described further here; this does not affect the understanding and implementation of the invention by those skilled in the art.
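The 1:1 fusion of the manually labeled and self-labeled sets could, for example, be realized by drawing half of each training batch from each set. This is a hypothetical sketch; the sampling scheme and names are assumptions, not specified by the patent.

```python
import random

def fused_batches(manual_set, self_labeled_set, batch_size,
                  rng=random.Random(0)):
    """Sketch of 1:1 fusion: each batch draws half its samples from the
    manually labeled set and half from the self-labeled set, then shuffles."""
    half = batch_size // 2
    while True:
        batch = rng.sample(manual_set, half) + rng.sample(self_labeled_set, half)
        rng.shuffle(batch)   # mix the two sources within the batch
        yield batch
```

Balanced sampling prevents the (usually much larger) self-labeled set from drowning out the manually verified labels during continued training.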
In some embodiments, the invention further provides a target detection model, and the target detection model is continuously updated by self-learning through the target detection model training method. And after the target detection model is trained and updated, deploying the updated target detection model to a production environment.
In some embodiments, the present invention further provides a target detection method, which performs target detection on a picture to be detected by using the target detection model.
In some embodiments, the present invention further provides an object detection system, which includes an image acquisition unit, a model training unit and the object detection model, wherein:
an image acquisition unit: the image acquisition device is used for acquiring images to be detected, one part of the images to be detected is used as candidate sample images for generating a training sample set to train and update a target detection model, and the other part of the images to be detected is used for identifying by means of the target detection model to output a target detection result.
A model training unit: for training the target detection model based on the generated training sample set to update the target detection model.
In some embodiments, the object detection system further comprises an online model deployment module for deploying the image acquisition unit, the model training unit and the object detection model at desired locations.
In some embodiments, the image acquisition unit employs a data collection unit responsible for collecting picture data from the continuous data stream. As shown in fig. 1, in this embodiment the image capturing unit 1 includes an online capture camera 101, a local storage 102, a data storage center 103, and a data transmission module 104. In fig. 1, the hollow arrow indicates the direction in which workpieces are continuously fed during production, and the diamond and cross shapes represent different types of target workpieces to be identified.
The image acquisition unit 1 works as follows:
first, the on-line capture camera 101 takes a picture of a region to be detected, and after the picture is taken, the picture is stored in the local storage 102 (e.g., a local disk).
Then, the data transmission module 104 reads the picture from the local storage 102 and transmits it back to the data storage center 103 via a network or other means; the data storage center 103 stores the received picture data, denoted D_t, ordered by reception time.
The picture data D_t stored in the data storage center 103 can serve as candidate sample pictures x_i that, after automatic labeling, are finally used as training samples; it can also serve as the initial manually labeled data set D_0.
The invention is applicable to object detection scenes with a continuous stream of samples: it makes full use of unlabeled samples on the production line to automatically generate labels and samples for target detection, continuously and cyclically self-updates the sample set and the target detection model, greatly reduces labor cost, continuously improves the model's adaptation to data in the online workflow, and keeps optimizing the model to fit new data, improving the accuracy of target detection results. It is particularly suitable for scenes with continuous data, such as workpiece detection on a factory production line (e.g. steel plate and part detection), vehicle detection on roads, and pedestrian detection.
As shown in table 1, the experimental comparison is as follows:
production-line data from a factory was used as the experimental object, with 5000 historical manually labeled pictures as the starting original sample set, and part of the data recovered from the online data of March, April, and May as the test set. For the initial target detection model (the Baseline model), the all-class mean average precision (mAP) was only 94.1. After adding manually labeled data, the mAP improved to 95. After the target detection model was continuously trained with the method of the invention, the mAP improved to 97.87, far better than the original baseline target detection model and better than the detection model obtained with a large number of manually labeled samples. These results show that the method markedly improves the mAP.
TABLE 1 Experimental comparison results (mAP: all-class mean average precision)
While the embodiments of the present invention have been described in connection with the drawings, the present invention is not limited to the above-described embodiments, which are illustrative rather than restrictive; many modifications may be made by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A method of generating a sample, comprising the steps of:
step 1, training based on an original sample set to obtain a target detection model, wherein the original sample set comprises a plurality of original sample pictures, and the output of the target detection model is target detection information at each position on each original sample picture;
step 2, acquiring a candidate sample set, wherein the candidate sample set comprises N candidate sample pictures to be detected;
step 3, outputting each target information on each candidate sample picture based on the candidate sample set and the target detection model;
step 4, based on the output of the step 3, screening target sample pictures out of the candidate sample set and determining the target position and category information at each location on the target sample pictures;
and 5, outputting the target sample picture and the target position and category information of each position on the picture as a training sample.
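The five steps of claim 1 can be sketched as the loop below. This is a minimal illustration under stated assumptions: `detector` stands in for the target detection model trained in step 1, and `screen` for the screening rule of step 4; both names are hypothetical.

```python
def generate_samples(detector, candidates, screen):
    """Steps 2-5 of claim 1: run the detector over each candidate sample
    picture, screen the results, and emit (picture, labels) pairs as
    training samples."""
    samples = []
    for pic in candidates:                 # step 2: candidate sample set
        detections = detector(pic)         # step 3: target info at each location
        labels = screen(pic, detections)   # step 4: keep or discard decision
        if labels is not None:
            samples.append((pic, labels))  # step 5: output as training sample
    return samples
```

A `screen` that returns `None` corresponds to discarding a candidate picture rather than outputting it as a training sample.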
2. The sample generation method according to claim 1, wherein the target detection information at each position on the picture includes whether an object is to be detected at each position on the picture and an object class probability;
in the step 3, for each candidate sample picture, the process of outputting the target information at each location on the candidate sample picture includes:
step 301, applying a plurality of reversible transformations to the candidate sample picture to obtain m different pictures P;
step 302, inputting the m different pictures P into the target detection model, and outputting, for each location on each picture P, whether a target is to be detected and the target class probability;
step 303, based on the output of the step 302, calculating the target positions and target class probabilities at each location on the candidate sample picture corresponding to the m pictures P;
and step 304, based on the calculation result of the step 303, obtaining the final target position and target class probability information at each location on the candidate sample picture.
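Steps 301-304, combined with the averaging of claim 4, amount to test-time augmentation with reversible transforms. A minimal sketch, assuming the detector returns a per-location class-probability map and each transform is paired with its inverse; the horizontal flip used in the test is one hypothetical choice of reversible transform:

```python
import numpy as np

def tta_class_probs(detector, image, transforms, inverses):
    """Steps 301-304: run the detector on m reversibly transformed copies
    of the candidate picture, map each output back to the original frame
    with the inverse transform, and average the m probability maps
    (the averaging of claim 4)."""
    maps = []
    for t, inv in zip(transforms, inverses):
        probs = detector(t(image))   # step 302: detect on transformed picture
        maps.append(inv(probs))      # step 303: undo the transform on outputs
    return np.mean(maps, axis=0)     # step 304 / claim 4: average the m maps
```

Claim 3 then uses the averaged probabilities as confidence scores for screening.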
3. The sample generation method according to claim 2, wherein the step 4 comprises:
step 401, taking the class probability of each object on the candidate sample picture output in the step 303 as a confidence score;
step 402, retaining all candidate sample pictures whose confidences are below a first preset threshold or above a second preset threshold as target sample pictures to be output; setting the locations on the candidate sample picture with confidence below the first preset threshold as background, setting the locations with confidence above the second preset threshold as target objects with object bounding boxes, and discarding the candidate sample pictures whose confidence lies between the first and second preset thresholds;
and step 403, obtaining the object class probabilities at the locations corresponding to each object bounding box on the candidate sample pictures corresponding to the pictures P, and outputting the maximum of these probabilities as the object class probability at that object bounding box.
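The thresholding of steps 401-402 can be sketched as below. The threshold values are hypothetical defaults, and returning `None` stands for discarding the candidate picture:

```python
def screen_by_confidence(confidences, low=0.1, high=0.9):
    """Steps 401-402: keep a candidate picture only if every confidence is
    clearly background (< low) or clearly a target object (> high); any
    ambiguous score between the two thresholds discards the whole picture."""
    labels = []
    for c in confidences:
        if c < low:
            labels.append("background")
        elif c > high:
            labels.append("object")  # an object bounding box would be set here
        else:
            return None  # ambiguous confidence: discard the candidate picture
    return labels
```

Keeping only confidently labeled pictures is what lets the generated pseudo-labels serve as training samples without manual review.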
4. The sample generation method according to claim 2 or 3, wherein in the step 303, the target class probabilities at each location on the m pictures P are averaged to give the target class probability at the corresponding location on the candidate sample picture.
5. The sample generation method according to any one of claims 1 to 3, wherein in the step 2, N candidate sample pictures of the area to be detected are obtained by continuously acquiring real-time images of the area to be detected; or the N candidate sample pictures of the region to be detected are obtained from the historical storage image set of the region to be detected.
6. The sample generation method according to claim 2 or 3, wherein in the step 301, the processing includes position flipping, brightness and contrast changes, reduction, or enlargement of the picture.
7. A method for training a target detection model is characterized in that,
training a target detection model by using a training sample generated by the sample generation method according to any one of claims 1 to 6 to obtain an updated target detection model;
or, training the target detection model by using the original sample set and the training samples generated by the sample generation method of any one of claims 1 to 6 to obtain the updated target detection model.
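The self-updating loop of claim 7 (retraining on the original sample set plus the automatically generated samples) can be sketched as follows, with `train` and `generate` as hypothetical stand-ins for the training procedure and the sample generation method of claims 1 to 6:

```python
def self_update(train, original_set, generate):
    """Claim 7: train an initial detector on the original sample set, use it
    to generate new labeled samples automatically, then retrain on the union
    of the original set and the generated samples to obtain the updated
    target detection model."""
    model = train(original_set)               # initial target detection model
    new_samples = generate(model)             # samples from the generation method
    return train(original_set + new_samples)  # updated target detection model
```

Repeating this function with each updated model gives the continuous self-learning loop of claim 8.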
8. An object detection model, characterized in that the object detection model is continuously self-learning updated by the object detection model training method as claimed in claim 7.
9. An object detection method, characterized in that the object detection model of claim 8 is used to perform object detection on the picture to be detected.
10. An object detection system comprising an image acquisition unit, a model training unit and an object detection model according to claim 8, wherein:
an image acquisition unit: used for acquiring pictures to be detected, wherein one part of the pictures to be detected serves as candidate sample pictures for generating a training sample set to train and update the target detection model, and the other part is recognized by the target detection model to output target detection results;
a model training unit: for training the target detection model based on the generated training sample set to update the target detection model.
CN202211099128.9A 2022-09-09 2022-09-09 Sample generation method, target detection model training method, target detection method and system Withdrawn CN115496960A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211099128.9A CN115496960A (en) 2022-09-09 2022-09-09 Sample generation method, target detection model training method, target detection method and system
CN202310210125.6A CN116152606A (en) 2022-09-09 2023-03-07 Sample generation method, target detection model training, target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211099128.9A CN115496960A (en) 2022-09-09 2022-09-09 Sample generation method, target detection model training method, target detection method and system

Publications (1)

Publication Number Publication Date
CN115496960A true CN115496960A (en) 2022-12-20

Family

ID=84467545

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202211099128.9A Withdrawn CN115496960A (en) 2022-09-09 2022-09-09 Sample generation method, target detection model training method, target detection method and system
CN202310210125.6A Pending CN116152606A (en) 2022-09-09 2023-03-07 Sample generation method, target detection model training, target detection method and system

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202310210125.6A Pending CN116152606A (en) 2022-09-09 2023-03-07 Sample generation method, target detection model training, target detection method and system

Country Status (1)

Country Link
CN (2) CN115496960A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117198551A (en) * 2023-11-08 2023-12-08 天津医科大学第二医院 Kidney function deterioration pre-judging system based on big data analysis
CN117198551B (en) * 2023-11-08 2024-01-30 天津医科大学第二医院 Kidney function deterioration pre-judging system based on big data analysis

Also Published As

Publication number Publication date
CN116152606A (en) 2023-05-23

Similar Documents

Publication Publication Date Title
US10692050B2 (en) Automatic assessment of damage and repair costs in vehicles
CN112270280B (en) Open-pit mine detection method in remote sensing image based on deep learning
CN113298757A (en) Metal surface defect detection method based on U-NET convolutional neural network
CN109543753B (en) License plate recognition method based on self-adaptive fuzzy repair mechanism
US10726535B2 (en) Automatically generating image datasets for use in image recognition and detection
CN112581483B (en) Self-learning-based plant leaf vein segmentation method and device
CN114743102A (en) Furniture board oriented flaw detection method, system and device
WO2024060529A1 (en) Pavement disease recognition method and system, device, and storage medium
CN113963210A (en) Deep learning-based detection method and sorting system for waste data storage equipment
CN115205727A (en) Experiment intelligent scoring method and system based on unsupervised learning
CN115496960A (en) Sample generation method, target detection model training method, target detection method and system
CN113962951B (en) Training method and device for detecting segmentation model, and target detection method and device
JP6988995B2 (en) Image generator, image generator and image generator
CN116580026B (en) Automatic optical detection method, equipment and storage medium for appearance defects of precision parts
CN109063708B (en) Industrial image feature identification method and system based on contour extraction
CN117036798A (en) Power transmission and distribution line image recognition method and system based on deep learning
CN115587989B (en) Workpiece CT image defect detection segmentation method and system
CN111325076A (en) Aviation ground building extraction method based on U-net and Seg-net network fusion
US20230084761A1 (en) Automated identification of training data candidates for perception systems
CN113657162A (en) Bill OCR recognition method based on deep learning
Das et al. Object Detection on Scene Images: A Novel Approach
CN110807456A (en) Method and device for positioning bank card number
WO2023007535A1 (en) Sewage pipe interior abnormality diagnosis assistance system, client machine and server machine for sewage pipe interior abnormality diagnosis assistance system, and related method
CN117689875A (en) Image detection method, device, electronic equipment and storage medium
Priyadarshi et al. Deblurring of Images and Barcode Extraction of PV Modules using Supervised Machine learning for Plant Operation and Maintenance

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20221220