CN115527089A

CN115527089A - Yolo-based target detection model training method and application and device thereof

Info

Publication number: CN115527089A
Application number: CN202210959911.1A
Authority: CN
Inventors: 郝矿荣; 杜少帅; 张海超; 郝灵广; 隗兵; 唐雪嵩
Original assignee: Donghua University
Current assignee: Donghua University
Priority date: 2022-08-11
Filing date: 2022-08-11
Publication date: 2022-12-27

Abstract

The invention relates to a method for training a target detection model based on Yolo, and application and a device thereof, wherein the method comprises the following steps: (a) Loading a training set and a test set, performing regional elastic deformation data enhancement on the training set, and setting corresponding training parameters; (b) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model; (c) Training the neural network model to obtain a trained target detection model; the application is as follows: after a sample set to be tested is obtained, inputting the sample set to be tested into the trained target detection model, and outputting a prediction label of a Yolo format of the sample set to be tested; the device comprises a data set marking unit, a data set segmentation and preprocessing unit, a parameter tuning unit, a neural network architecture searching unit and a training unit. The method simplifies the operation and realizes the standardization of the whole detection model training process; the device has simple structure and convenient operation.

Description

Yolo-based target detection model training method and application and device thereof

Technical Field

The invention belongs to the technical field of deep learning, and particularly relates to a Yolo-based target detection model training method, application and a device thereof.

Background

Chips are the brains of modern society: mobile phones, smart wearable devices, large servers, sensors and the like all contain them. Chip manufacturing is therefore a core technology for aerospace and national defense, the basis of intelligent manufacturing, and a key technology for informatization. Appearance defects can seriously affect chip performance, so detecting them is an important part of chip production. Chip appearance defects are irregular in shape, varied in characteristics and unfixed in position, the background noise is high, and the object to be detected is small relative to the background. Traditional chip defect detection usually relies on manual visual inspection, which has low efficiency and reliability and greatly increases the manufacturing cost of enterprises.

In recent years, artificial intelligence technology based on deep learning has gradually matured. In computer vision, target detection is one of the most active research fields, with important applications in real scenes such as intelligent monitoring, automatic driving and face detection. Target detection models based on deep neural networks offer high recognition accuracy and high speed, and have become the mainstream of target detection algorithms. Applying such models to chip defect detection is therefore of great significance for improving chip production yield and reducing enterprise production costs. However, the conventional automatic chip defect detection task is generally defined simply as an object detection problem in which more samples are assumed to be better, without fully considering the problem itself and the actual application environment. In addition, the threshold of deep learning technology is high: practitioners need programming ability, a mathematical foundation, knowledge of intelligent algorithms and a sufficient understanding of the data set before they can design and optimize a suitable neural network model.

Thorough investigation of enterprises and production lines shows that users prefer deep neural network models that can be deployed quickly and upgraded easily. The smaller the impact of sample collection, sample labeling and model deployment on the whole production line, the better. When the detected target changes, the user should be able to adjust the deployed model quickly without lengthy re-tuning.

Disclosure of Invention

The invention aims to solve the problems in the prior art and provides a method for training a target detection model based on Yolo and application and a device thereof.

In order to achieve the purpose, the invention adopts the following scheme:

a target detection model training method based on Yolo comprises the following steps:

(a) Loading a training set and a test set, and setting corresponding training parameters;

the training set comprises a chip appearance defect picture and a target detection label of a Yolo format of the chip appearance defect picture;

the training parameters comprise the predefined number of chip appearance defect categories, the maximum training round, the learning rate, the picture input width, the picture input height and the filters, wherein the predefined number of chip appearance defect categories is the number of types of chip appearance defects and is set to n ≥ 1, the maximum training round is set to be greater than or equal to 2000 × n, the learning rate is set to 0.00111, the picture input width is set to 512, the picture input height is set to 512, and the filters are set to (n + 5) × 3;

(b) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;

dividing the training set of step (a) into a training data set D_train and a validation data set D_val, constructing a meta-structure search space, modeling it with a differentiable neural network architecture search method, and using the training data set D_train to train the differentiable neural network architecture search model;

in the training process, the structure weights of the meta-structures are globally normalized, and then the structure weights of the meta-structures and the network parameters are optimized by bi-level optimization, namely, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta-structures are adjusted simultaneously through the back propagation algorithm;

after training is finished, the meta-structures are sorted according to their structure weights, and the meta-structure with the largest weight is retained to form a deep neural network model, thereby obtaining the neural network model;

(c) Training the neural network model to obtain a trained target detection model;

after data enhancement is performed on the training set of step (a), the model parameters of the neural network model are optimized according to the enhanced training set, the test set and the training parameters, and a trained target detection model is obtained.
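The training parameters of step (a) depend only on the number of defect categories n, so they can be derived in one place. The following is a minimal sketch under stated assumptions: the names `TrainingParams`, `make_training_params` and `rounds_per_class` are hypothetical and do not come from the text. In Darknet-style Yolo configurations, a filter count of (n + 5) × 3 typically corresponds to 3 anchors per detection layer, each predicting 4 box coordinates, 1 objectness score and n class scores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingParams:
    num_classes: int      # n, predefined number of defect categories (n >= 1)
    max_rounds: int       # maximum training round, at least 2000 * n
    learning_rate: float  # 0.00111 in the text
    input_width: int      # 512 in the text
    input_height: int     # 512 in the text
    filters: int          # (n + 5) * 3


def make_training_params(num_classes: int, rounds_per_class: int = 2000) -> TrainingParams:
    """Derive the training parameters of step (a) from the category count n."""
    if num_classes < 1:
        raise ValueError("num_classes must be >= 1")
    if rounds_per_class < 2000:
        raise ValueError("the maximum training round must be at least 2000 * n")
    return TrainingParams(
        num_classes=num_classes,
        max_rounds=rounds_per_class * num_classes,
        learning_rate=0.00111,
        input_width=512,
        input_height=512,
        filters=(num_classes + 5) * 3,
    )
```

For example, with four defect categories the sketch yields 27 filters and a maximum of 8000 training rounds.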

As a preferred technical scheme:

in the Yolo-based target detection model training method described above, step (c) comprises the following specific steps:

(c1) After data enhancement is carried out on the training set in the step (a), chip appearance defect pictures in the training set are input into a neural network model, and a target detection label prediction value is obtained;

(c2) Calculating a loss function value by using the real value of the target detection label and the predicted value of the target detection label;

(c3) Updating model parameters (parameters are divided into hyper-parameters and model parameters, the hyper-parameters are set by people, the model parameters are optimized by an algorithm, and the above-mentioned training parameters are the hyper-parameters) by using the loss function values;

(c4) Inputting the chip appearance defect pictures in the test set into the neural network model to obtain a target detection label predicted value;

(c5) Calculating a loss function value and a test set accuracy by using the real value of the target detection label and the predicted value of the target detection label;

(c6) Judging whether the accuracy of the test set is greater than the maximum accuracy R (the value range is 0-100%), if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;

(c7) Judging whether the neural network model converges (convergence is judged by whether the loss function values of the training set and the test set are gradually decreasing), and if so, entering the next step; otherwise, reducing the learning rate (the specific adjustment is determined empirically, for example reducing it by a factor of 10, namely to one tenth of the previous learning rate) and returning to step (c1);

(c8) Judging whether the maximum training round is reached, if so, ending and outputting the trained target detection model; otherwise, returning to step (c1).
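The control flow of steps (c1) to (c8) can be sketched as a single loop. This is an illustrative outline, not the patented implementation: `train_step`, `evaluate` and `save_model` are hypothetical callables standing in for the real network, and the return to step (c1) is folded into the loop so that every pass counts as one training round.

```python
from typing import Callable, Tuple


def train_detector(
    train_step: Callable[[float], float],         # (c1)-(c3): one round, returns training loss
    evaluate: Callable[[], Tuple[float, float]],  # (c4)-(c5): returns (test loss, test accuracy)
    save_model: Callable[[], None],               # called whenever a new best accuracy is reached
    max_rounds: int,
    learning_rate: float,
    lr_decay: float = 0.1,                        # "one tenth of the previous learning rate"
) -> Tuple[float, float]:
    """Run the (c1)-(c8) loop and return (best accuracy R, final learning rate)."""
    best_acc = 0.0  # R, initially 0%
    prev_train_loss = float("inf")
    prev_test_loss = float("inf")
    for _ in range(max_rounds):                   # (c8): stop at the maximum training round
        train_loss = train_step(learning_rate)
        test_loss, test_acc = evaluate()
        if test_acc > best_acc:                   # (c6): keep the best model so far
            save_model()
            best_acc = test_acc
        # (c7): treat the model as converging only while both losses keep decreasing
        converging = train_loss < prev_train_loss and test_loss < prev_test_loss
        if not converging:
            learning_rate *= lr_decay
        prev_train_loss, prev_test_loss = train_loss, test_loss
    return best_acc, learning_rate
```

Passing the training and evaluation steps in as callables keeps the sketch independent of any particular deep learning framework.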

In the Yolo-based target detection model training method described above, the ratio of the data quantity of the training data set D_train to that of the validation data set D_val is 9:1.
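For step (b), the global normalization of the structure weights and the final selection of meta-structures can be illustrated as follows. This is a sketch under two assumptions that the text does not spell out: global normalization is taken to be a softmax over all structure weights at once, and the meta-structure with the largest weight is retained at each position of the network. The bi-level optimization itself is not shown, and the names `globally_normalize` and `select_architecture` are hypothetical.

```python
import math
from typing import Dict, Tuple

Key = Tuple[int, str]  # (position in the network, candidate meta-structure name)


def globally_normalize(weights: Dict[Key, float]) -> Dict[Key, float]:
    """Softmax over ALL structure weights together (assumed form of global normalization)."""
    largest = max(weights.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {key: math.exp(value - largest) for key, value in weights.items()}
    total = sum(exps.values())
    return {key: value / total for key, value in exps.items()}


def select_architecture(weights: Dict[Key, float]) -> Dict[int, str]:
    """Sort by normalized weight and keep the heaviest candidate at each position."""
    normalized = globally_normalize(weights)
    chosen: Dict[int, str] = {}
    for (position, name), _ in sorted(normalized.items(), key=lambda kv: kv[1], reverse=True):
        # The first candidate seen for a position is the one with the largest weight.
        chosen.setdefault(position, name)
    return dict(sorted(chosen.items()))
```

Because the normalization is shared across positions, the weights of all candidates compete in one distribution rather than in a separate distribution per position.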

according to the above method for training the target detection model based on Yolo, if the training set contains small targets, the data enhancement process comprises sequentially performing CutMix, Mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and random brightness-contrast transformation; otherwise, the data enhancement process comprises sequentially performing CutMix, Mosaic data enhancement, class label smoothing, region elastic deformation, flipping and random brightness-contrast transformation;

the small target is an appearance defect whose bounding-box width and height, relative to the width and height of the image, are less than 0.1, or an appearance defect whose size is less than 32 × 32 pixels.
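The small target criterion can be written as a short predicate. The sketch below assumes that both the width ratio and the height ratio must be below 0.1, and that "less than 32 × 32 pixels" means both sides are shorter than 32 pixels; the text leaves both readings open. The function names are hypothetical.

```python
from typing import Iterable, Tuple

YoloBox = Tuple[int, float, float, float, float]  # class_id, x_center, y_center, width, height (normalized)


def is_small_target(box_w: float, box_h: float, img_w: int, img_h: int,
                    ratio_threshold: float = 0.1, min_side: int = 32) -> bool:
    """Return True if a bounding box (in pixels) counts as a small target."""
    relatively_small = box_w / img_w < ratio_threshold and box_h / img_h < ratio_threshold
    absolutely_small = box_w < min_side and box_h < min_side
    return relatively_small or absolutely_small


def contains_small_target(labels: Iterable[YoloBox], img_w: int, img_h: int) -> bool:
    """Check Yolo-format labels, whose widths and heights are fractions of the image size."""
    return any(
        is_small_target(w * img_w, h * img_h, img_w, img_h)
        for _, _, _, w, h in labels
    )
```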

In the above method for training a target detection model based on Yolo, the specific steps of the elastic deformation of the region are as follows:

(i) The range of the area ratio of the rectangular frame to the image is set to (r_min, r_max), and the range of the aspect ratio of the rectangular frame is set to (a_min, a_max);

(ii) Randomly selecting a coordinate point (x, y) in the image, randomly selecting an area ratio r_i within the range (r_min, r_max), and randomly selecting an aspect ratio a_i within the range (a_min, a_max);

(iii) Calculating the length and width of the rectangular frame according to r_i, a_i and the image area, and determining the rectangular frame with (x, y) as its center point;

(iv) Performing elastic deformation on the image in the rectangular frame;

(v) Obtaining the image region containing the target according to the target detection label, and repeating steps (i) to (iv) for that image region.
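Steps (i) to (iii) of the region elastic deformation amount to sampling a rectangle from an area ratio and an aspect ratio. The sketch below covers only that sampling; the elastic deformation of step (iv) is not shown. It assumes that the aspect ratio is width divided by height and that a rectangle extending past the image border is clipped, neither of which is stated in the text. The name `sample_region` is hypothetical.

```python
import math
import random
from typing import Tuple


def sample_region(img_w: int, img_h: int,
                  r_range: Tuple[float, float],
                  a_range: Tuple[float, float],
                  rng: random.Random) -> Tuple[int, int, int, int]:
    """Sample a rectangle and return (left, top, right, bottom) in pixels."""
    # (ii) random center point, area ratio r_i and aspect ratio a_i
    x = rng.uniform(0, img_w)
    y = rng.uniform(0, img_h)
    r_i = rng.uniform(*r_range)
    a_i = rng.uniform(*a_range)
    # (iii) rect_w * rect_h = r_i * image area, and rect_w / rect_h = a_i
    area = r_i * img_w * img_h
    rect_w = math.sqrt(area * a_i)
    rect_h = math.sqrt(area / a_i)
    # Center the rectangle on (x, y) and clip it to the image (assumption).
    left = max(0, int(round(x - rect_w / 2)))
    top = max(0, int(round(y - rect_h / 2)))
    right = min(img_w, int(round(x + rect_w / 2)))
    bottom = min(img_h, int(round(y + rect_h / 2)))
    return left, top, right, bottom
```

For step (v), the same function could be called with the width and height of the target region instead of the whole image, with the result offset by the region's top-left corner.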

In the Yolo-based target detection model training method described above, the training set and the test set are obtained through the following steps:

(i) Marking the picture with the appearance defects of the chip to obtain a marked data set;

predefining the classes of the chip appearance defects to obtain the predefined chip appearance defect class configuration; calling the graphical image annotation tool labelImg to label rectangular frames, obtaining Yolo-format target detection labels of the chip appearance defect pictures, and finally obtaining a chip appearance defect data set consisting of the chip appearance defect pictures and the target detection labels, namely the labeled data set;

(ii) Preprocessing the labeled data set to obtain a training set and a test set;

judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures, simultaneously dividing the target detection labels into sub-labels according to the rule corresponding to the division of the picture, extracting the sub-pictures and sub-labels to obtain the divided chip appearance defect data set, splitting the divided chip appearance defect data set into a training set and a test set, and backing them up at the same time, wherein the copy function used in the backup process is rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function; otherwise, directly extracting the chip appearance defect pictures and target detection labels in the chip appearance defect data set, splitting them into a training set and a test set, and backing them up at the same time, wherein the copy function used in the backup process is likewise rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function.
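Dividing the labels along with the picture can be sketched as follows. The text does not specify the tiling or the rule for boxes that cross a tile border, so this sketch assumes an 8 × 8 grid (64 sub-pictures) and clips each box to every tile it overlaps. The name `split_labels` and the `min_size` cutoff are hypothetical.

```python
from typing import Dict, List, Tuple

YoloBox = Tuple[int, float, float, float, float]  # class_id, x_center, y_center, width, height (normalized)


def split_labels(labels: List[YoloBox], grid: int = 8,
                 min_size: float = 1e-6) -> Dict[Tuple[int, int], List[YoloBox]]:
    """Map the labels of a full picture onto a grid x grid tiling, keyed by (row, col)."""
    tiles: Dict[Tuple[int, int], List[YoloBox]] = {
        (row, col): [] for row in range(grid) for col in range(grid)
    }
    step = 1.0 / grid
    for cls, xc, yc, w, h in labels:
        x0, x1 = xc - w / 2, xc + w / 2
        y0, y1 = yc - h / 2, yc + h / 2
        for row in range(grid):
            for col in range(grid):
                tx0, ty0 = col * step, row * step
                # Intersection of the box with this tile, in full-picture coordinates.
                ix0, ix1 = max(x0, tx0), min(x1, tx0 + step)
                iy0, iy1 = max(y0, ty0), min(y1, ty0 + step)
                if ix1 - ix0 <= min_size or iy1 - iy0 <= min_size:
                    continue  # the box does not overlap this tile
                # Re-normalize the clipped box to the tile's own [0, 1] frame.
                tiles[(row, col)].append((
                    cls,
                    ((ix0 + ix1) / 2 - tx0) / step,
                    ((iy0 + iy1) / 2 - ty0) / step,
                    (ix1 - ix0) / step,
                    (iy1 - iy0) / step,
                ))
    return tiles
```

A box that lies entirely inside one tile keeps its pixel size, so its normalized width and height grow by the grid factor, which is what makes small defects occupy a larger share of each sub-picture.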

According to the above Yolo-based target detection model training method, the segmentation adopts a random sampling method, and the ratio of the data quantity of the training set to the data quantity of the testing set is 9:1.
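A random-sampling split with a 9:1 ratio can be sketched in a few lines. The function name `split_dataset` and the fixed `seed` (used here only to make the split reproducible) are assumptions for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.9,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and split them into (training set, test set)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

In practice each sample would be a pair of a picture path and its label path, so that pictures and labels stay together.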

According to the Yolo-based target detection model training method, the segmentation process is accelerated by adopting a multithreading method.
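The multithreaded acceleration can be illustrated with a thread pool that applies the same per-picture work to many pictures. This is a generic sketch, not the patented implementation: `process_in_threads` and its `worker` argument are hypothetical, and the worker would be whatever function divides one picture and its labels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def process_in_threads(items: Iterable[T], worker: Callable[[T], R],
                       max_workers: int = 8) -> List[R]:
    """Apply worker to every item using a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, items))
```

Threads help most when the worker spends its time on disk I/O or in native image libraries that release Python's global interpreter lock; for pure-Python computation a process pool would be the usual alternative.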

The invention also provides an application of the Yolo-based target detection model training method described above: after the sample set to be tested is obtained, it is input into the trained target detection model, and the model outputs the Yolo-format prediction labels of the sample set to be tested;

the acquisition process of the sample set to be detected comprises the following steps: collecting an appearance defect picture of a chip to be detected, judging whether the appearance defect picture of the chip to be detected contains the small target or not, if so, backing up the appearance defect picture of the chip to be detected, dividing the appearance defect picture of the chip to be detected into 64 sub-pictures, extracting the sub-pictures to obtain a sample set to be detected, and backing up the sample set at the same time; otherwise, directly extracting the picture of the appearance defect of the chip to be detected to obtain a sample set to be detected, and simultaneously carrying out backup.

The invention also provides a device adopting the Yolo-based target detection model training method described above, which comprises:

the data set marking unit is used for marking the chip appearance defect picture to obtain a marked data set;

the data set segmentation and preprocessing unit is used for preprocessing the labeled data set to obtain a training set and a test set;

the parameter tuning unit is used for loading the training set and the test set and setting corresponding training parameters;

the neural network architecture searching unit is used for constructing a meta structure searching space according to the training set, searching the neural network architecture and obtaining a neural network model;

and the training unit is used for training the neural network model to obtain a trained target detection model.

As a preferred technical scheme:

the apparatus as described above further includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the flow of the computer program is as follows:

(S1) marking the picture with the appearance defects of the chip to obtain a marked data set;

(S2) preprocessing the labeled data set to obtain a training set and a test set;

(S3) loading a training set and a test set, and setting corresponding training parameters;

(S4) constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;

and (S5) training the neural network model to obtain a trained target detection model.

The method of the invention has the following four characteristics:

(1) Existing appearance detection methods cover only the stages from training to inference, whereas the method of the invention also incorporates data labeling, preprocessing and model optimization, integrating the whole appearance detection flow and unifying multiple algorithms and steps under one framework; these algorithms were originally unrelated, and each step had to be executed manually with intermediate conversion operations, but the invention overcomes the incompatibility among the algorithms and connects them, so that for the user the method greatly simplifies the complex operation process, reduces the difficulty of applying the algorithms, improves the efficiency of taking a model from training to deployment, and is better suited to the complex and changeable scenes of a production site;

(2) Aiming at the problems of tiny and irregular chip appearance defects, the invention designs a small target mode, namely when a chip appearance defect picture contains small targets, the picture is divided into 64 sub-pictures, tags are divided according to rules corresponding to the picture, then the sub-pictures and the sub-tags are extracted and divided into a training set and a test set, and then corresponding data enhancement is respectively carried out; the small target mode enables the model to pay more attention to chip appearance defects instead of a large number of backgrounds, and detection accuracy is improved;

(3) In the stage of preprocessing the labeled data set, a multithreading method is adopted to speed up the segmentation process; after segmentation is completed, the copy function used in the backup process is rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function, which solves the problem that the Python copy function executes too slowly and multiplies the preprocessing speed of the system; in a running environment with an i7-10700 CPU and an NVMe KIOXIA 256G hard disk, preprocessing 223 chip appearance defect pictures containing small targets (picture size 6464 × 4852) takes less than 20 min instead of 6 h;

(4) The neural network architecture is adopted for searching to obtain an optimized network structure, so that automatic optimization of the target detection model is realized, dependence on expert knowledge is reduced, and dependence of the model on a data set is greatly reduced.

With respect to feature (1): comparison document 1 (CN 202110944135) mainly solves the technical problem that a Yolo network cannot converge when detecting data sets of the same target that contain different types of feature richness, and has no labeling or preprocessing method, whereas the present invention is directed to the chip appearance defect detection task, makes adjustments for it such as the small target mode, and integrates labeling and preprocessing methods; comparison document 2 (CN 202110905324) mainly provides a sorting system whose built-in algorithm is simple, whereas the focus of the present invention is the flow from data labeling to model training, with no mechanical structure involved; comparison document 3 (CN 113410154A) judges whether a chip is qualified by a region division method, which can only classify chips and cannot accurately identify the unqualified region of a chip, and is essentially different from the method of the present invention; comparison document 4 (an IC chip appearance detection system based on machine vision, university of southern China) mainly improves the positioning precision of chip pins through the light source setting of image acquisition and image processing methods, and detects pin defects and printed-information clarity defects of SOP type chips, whereas the present invention involves no mechanical facilities such as image acquisition, can detect various types of appearance defects, and is a universal detection model training framework, so the two are essentially different; comparison document 5 (a QFP chip appearance visual detection system and a detection method, Chinese mechanical engineering 24 (3)) processes images of QFP pins with a Canny operator edge detection algorithm, emphasizes a pin stack height detection method and a pin coplanarity detection method based on a three-point method, and, like the preceding document, detects the pins with traditional image processing methods.

With respect to feature (2): comparison document 6 (CN 202110880257) mainly realizes defect detection of small chips through transfer learning, whereas the present method improves the detection precision of small targets by segmenting high-resolution pictures and searches for a model suited to the data set through a neural network architecture search method; comparison document 7 (carrier chip defect detection based on a lightweight convolutional neural network, computer engineering and application: 1-10) proposes YOLO-Effectinenet, a carrier chip defect detection algorithm based on a lightweight convolutional neural network, for the real-time detection of three different types of defects, namely carrier chip breakouts, positioning column damage and waveguide stains; that method targets only these three defects of carrier chips, whereas the present method is suitable for detecting defects of various chips, is adjusted in particular for small target appearance defects, and covers the flows from data labeling to preprocessing as a complete system, which the other inventions do not have.

With respect to feature (3), no relevant document mentions the application of such an acceleration method in the detection of defects in the chip appearance, since this is related to the small target mode of the present invention and is unique to the present invention.

With respect to feature (4): comparison document 8 (CN 202110642625) only detects chip solder balls and recognizes characters, so its detection content is narrow and its adaptability is poor, whereas the present method can rebuild the model at any time as the training data set changes and searches for an optimal network structure based on a neural network architecture search method, which is convenient and efficient; comparison document 9 (IC chip appearance defect recognition algorithm research based on deep learning, university in south of the Yangtze river) mainly studies traditional convolutional neural network algorithms, which differs from the full-flow system method provided by the present invention and is essentially different from Yolo and neural network architecture search.

In general, compared with the comparison documents, among the many methods that adopt deep learning technology, only the present invention builds data labeling, preprocessing, parameter setting, neural network architecture search and model training into one system, forming a complete method for detecting chip appearance defects. The method is systematic, integrated and automated, takes a global view, and at the same time optimizes individual components, such as the small target mode and the preprocessing acceleration method. Other methods focus on only one part of the detection system and cannot be put directly into use on a production line.

Compared with the prior art, in which other methods optimize only a certain part, the invention considers from an overall perspective how to simplify the operation flow so that the method can be better implemented; prior-art methods have limitations, whereas the invention carefully designs methods such as preprocessing acceleration and the small target mode, and optimizes the main model using the latest neural network architecture search algorithm. The invention integrates these processes and algorithms into a new whole, which cannot be regarded as a simple combination.

Advantageous effects:

(1) The method for training the target detection model based on the Yolo simplifies the operation, realizes the standardization of the whole detection model training process, enables the whole process to be highly automatic, improves the efficiency and reduces the dependence on expert knowledge;

(2) According to the Yolo-based target detection model training method, a front-line worker can autonomously and normatively finish data acquisition, labeling and preprocessing according to the actual condition of the obtained sample; obtaining a deep neural network model suitable for the current data set through a smaller data set and a standard data labeling and neural network architecture searching module, and finishing the training and deployment of the model;

(3) The Yolo-based target detection model training device is simple in structure and convenient to operate.

Drawings

FIG. 1 is a schematic flow chart of a method for training a target detection model based on Yolo according to the present invention;

FIG. 2 is a schematic diagram of a partial structure of a Yolo-based target detection model training apparatus according to the present invention;

FIG. 3 is a schematic flow chart illustrating a process of training a neural network model to obtain a trained target detection model in the Yolo-based target detection model training method of the present invention;

FIG. 4 shows the specific steps of the region elastic deformation algorithm of the present invention;

FIG. 5 is a schematic diagram illustrating the deformation principle of elastic deformation of the region according to the present invention, wherein (a) is an original drawing and (b) is a drawing after deformation;

FIG. 6 shows the actual effects of the region elastic deformation of the present invention, in which (a) to (i) are actual effects 1 to 9, respectively.

Detailed Description

The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes and modifications of the present invention may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.

A method for training a target detection model based on Yolo is disclosed, as shown in FIG. 1, and comprises the following specific steps:

(a) Acquiring a training set and a test set, and specifically comprising the following steps:

predefining the classes of the chip appearance defects to obtain the configuration of the classes of the predefined chip appearance defects; calling a graphic image annotation tool labelImg to label a rectangular frame to obtain a Yolo-format target detection label of the chip appearance defect picture, and finally obtaining a chip appearance defect data set consisting of the chip appearance defect picture and the target detection label, namely obtaining a labeled data set;

judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures by a multithreading method, simultaneously dividing the target detection labels into sub-labels by a multithreading method according to the rule corresponding to the division of the picture, extracting the sub-pictures and sub-labels to obtain the divided chip appearance defect data set, and splitting it by random sampling into a training set and a test set with a data quantity ratio of 9:1, which are backed up at the same time, wherein the copy function used in the backup process is rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function; otherwise, directly extracting the chip appearance defect pictures and target detection labels in the chip appearance defect data set, splitting them by random sampling into a training set and a test set with a data quantity ratio of 9:1, and backing them up at the same time, wherein the copy function used in the backup process is likewise rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function;

the small target is an appearance defect whose bounding-box width and height, relative to the width and height of the image, are less than 0.1, or an appearance defect whose size is less than 32 × 32 pixels;

(b) Loading a training set and a test set, and setting corresponding training parameters;

(c) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;

dividing the training set in step (a) into a training data set D_train and a validation data set D_val with a data quantity ratio of 9:1, constructing a meta-structure search space, modeling it with a differentiable neural network architecture search method, and using the training data set D_train to train the differentiable neural network architecture search model;

in the training process, the structure weights of the meta-structures are globally normalized, and then the structure weights of the meta-structures and the network parameters are optimized by bi-level optimization, namely, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta-structures are adjusted simultaneously through the back propagation algorithm;

(d) Training the neural network model to obtain a trained target detection model, as shown in fig. 3, specifically including the following steps:

(d1) After data enhancement is carried out on the training set in step (a), the chip appearance defect pictures in the training set (namely the 'training pictures' in the figure) are input into the neural network model, and a target detection label predicted value (namely the 'prediction result' in the figure) is obtained; if the training set contains small targets, the data enhancement process comprises sequentially performing CutMix, Mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and random brightness-contrast transformation; otherwise, the data enhancement process comprises sequentially performing CutMix, Mosaic data enhancement, class label smoothing, region elastic deformation, flipping and random brightness-contrast transformation;

(d2) Calculating a loss function value using the real value of the target detection label (namely the 'training picture label' in the figure) and the predicted value of the target detection label (namely the 'prediction result' in the figure);

(d3) Updating the model parameters using the loss function value (parameters are divided into hyper-parameters and model parameters; hyper-parameters are set manually, whereas model parameters are optimized by the algorithm; the training parameters mentioned above are hyper-parameters);

(d4) Inputting the chip appearance defect pictures in the test set (namely the 'test picture' in the figure) into the neural network model to obtain target detection label predicted values (namely the 'predicted result' in the figure);

(d5) Calculating a loss function value and the test set accuracy using the real value of the target detection label (namely the 'test picture label' in the figure) and the predicted value of the target detection label (namely the 'prediction result' in the figure);

(d6) Judging whether the accuracy of the test set is greater than the maximum accuracy R (the initial assignment of R is 0%), if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;

(d7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, decreasing the learning rate (the specific adjustment is determined empirically, for example reducing it by a factor of 10, that is, to one tenth of the previous learning rate) and returning to step (d1);

(d8) Judging whether the maximum training round has been reached; if so, ending and outputting the trained target detection model; otherwise, returning to step (d1).
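The control flow of steps (d1) to (d8) can be sketched as follows. This is a simplified illustration, not the implementation of the invention: the four callables are hypothetical stand-ins for the training pass, the test pass, the convergence check and model saving, and the sketch bounds the loop by the maximum training round in both branches of step (d7).

```python
def train(run_epoch, evaluate, has_converged, save_model,
          learning_rate=0.00111, max_rounds=2000, decay=0.1):
    """Control flow of steps (d1)-(d8); all callables are hypothetical stubs."""
    best_accuracy = 0.0                      # R, initially 0%
    for _ in range(max_rounds):              # (d8) stop at the maximum training round
        run_epoch(learning_rate)             # (d1)-(d3) forward pass, loss, update
        accuracy = evaluate()                # (d4)-(d5) test set accuracy
        if accuracy > best_accuracy:         # (d6) keep the best model so far
            save_model()
            best_accuracy = accuracy
        if not has_converged():              # (d7) not converged: lower the learning rate
            learning_rate *= decay
    return best_accuracy, learning_rate

# Usage with trivial stubs: three rounds with fixed test accuracies.
accuracies = iter([0.5, 0.7, 0.6])
saved = []
best, final_lr = train(
    run_epoch=lambda lr: None,
    evaluate=lambda: next(accuracies),
    has_converged=lambda: True,
    save_model=lambda: saved.append("checkpoint"),
    max_rounds=3,
)
```

In this usage the model is saved twice (after the first and second rounds), and the learning rate stays unchanged because the stub always reports convergence.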

In step (d1), one of the data enhancement operations is region elastic deformation; as shown in fig. 4, the specific steps are as follows:

(i) The range of the area ratio of the rectangular frame to the image is set to (r_min, r_max), and the range of the aspect ratio of the rectangular frame is set to (a_min, a_max);

(iv) Elastically deforming the image in the rectangular frame;

The mathematical principle of elastic deformation is based on bilinear interpolation, as shown in FIG. 5.
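Bilinear interpolation is what allows a deformed image to be resampled at non-integer pixel positions. The following minimal sketch shows only the interpolation itself on a grayscale image stored as a list of rows; the function name and the clamping at the image border are assumptions, and the displacement field used by the invention is not reproduced here.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a fractional position (x, y)."""
    height, width = len(image), len(image[0])
    # Clamp the position so that the four neighbours lie inside the image.
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    dx, dy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
    bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

patch = [[0.0, 10.0],
         [20.0, 30.0]]
center = bilinear_sample(patch, 0.5, 0.5)
```

Sampling at the exact centre of the four pixels returns their average, 15.0, and sampling at an integer position returns the pixel value itself.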

The application of the Yolo-based target detection model training method as described above: after a sample set to be tested is obtained, inputting the sample set to be tested into the trained target detection model, and outputting a prediction label of a Yolo format of the sample set to be tested; the acquisition process of the sample set to be detected is as follows: collecting an appearance defect picture of a chip to be detected, judging whether the appearance defect picture of the chip to be detected contains the small target or not, if so, backing up the appearance defect picture of the chip to be detected, dividing the appearance defect picture of the chip to be detected into 64 sub-pictures, extracting the sub-pictures to obtain a sample set to be detected, and backing up the sample set at the same time; otherwise, directly extracting the picture of the appearance defect of the chip to be detected to obtain a sample set to be detected, and simultaneously carrying out backup.

The device adopting the Yolo-based target detection model training method comprises a data set labeling unit, a data set segmentation and preprocessing unit, a parameter tuning unit, a training unit, a memory, a processor and a computer program which is stored on the memory and can run on the processor; wherein, the partial structure schematic diagram of the device is shown in FIG. 2;

the training unit is used for training the neural network model to obtain a trained target detection model;

the flow of the computer program is as follows:

The Yolo-based target detection model training method and its application described above are now explained with a specific case. The task is to detect scratch-type appearance defects of chips, and the steps are as follows:

(a) Acquiring a training set and a test set;

predefining a chip appearance defect class file whose content is 'hushang', the class label of the chip appearance defect; first, collecting a batch of chip appearance defect pictures and opening the graphical image annotation tool labelImg to annotate them, wherein the annotated content is scratch defects and the annotation form is a rectangular frame; during annotation, the rectangular frame should just enclose the scratch; all pictures containing scratches are annotated to obtain the initial data set;

the size of the original chip appearance defect picture is 6464 × 4852 and the size of the scratch defects is smaller than 600 × 480, which meets the small-target condition, so the pictures and labels are segmented; each chip appearance defect picture is divided into 64 sub-pictures, and the scratch labels are divided into sub-labels according to the rule corresponding to the pictures; then the sub-pictures containing scratches and the corresponding sub-labels are extracted to obtain the small-target chip appearance defect data set, that is, the chip appearance scratch data set;
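The division of a picture and its labels into 64 parts can be sketched as follows. The text does not give the grid layout or the label-splitting rule, so the sketch assumes an 8 × 8 grid and clips each rectangular frame to the tiles it overlaps; the function names and the example scratch box are hypothetical.

```python
def tile_boxes(width, height, grid=8):
    """Return grid*grid tile rectangles (x0, y0, x1, y1) covering the picture."""
    xs = [i * width // grid for i in range(grid + 1)]
    ys = [j * height // grid for j in range(grid + 1)]
    return [(xs[i], ys[j], xs[i + 1], ys[j + 1])
            for j in range(grid) for i in range(grid)]

def split_label(box, tile, class_id=0):
    """Clip a pixel box to one tile; return a Yolo-format tuple or None."""
    x0, y0 = max(box[0], tile[0]), max(box[1], tile[1])
    x1, y1 = min(box[2], tile[2]), min(box[3], tile[3])
    if x1 <= x0 or y1 <= y0:
        return None  # the scratch does not overlap this tile
    tile_w, tile_h = tile[2] - tile[0], tile[3] - tile[1]
    # Yolo format: class, centre x, centre y, width, height, relative to the tile.
    return (class_id,
            ((x0 + x1) / 2 - tile[0]) / tile_w,
            ((y0 + y1) / 2 - tile[1]) / tile_h,
            (x1 - x0) / tile_w,
            (y1 - y0) / tile_h)

tiles = tile_boxes(6464, 4852)
scratch = (700, 500, 900, 700)  # hypothetical scratch frame in pixels
sub_labels = []
for index, tile in enumerate(tiles):
    label = split_label(scratch, tile)
    if label is not None:
        sub_labels.append((index, label))
```

In this example the scratch straddles a tile corner, so it yields four sub-labels (tiles 0, 1, 8 and 9); only those sub-pictures would be kept for the scratch data set.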

segmenting the small target chip appearance defect data set according to the proportion of 9:1 to obtain a training set and a test set;
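A 9:1 split by random sampling can be sketched as below. The file names and the fixed seeds are assumptions made for reproducibility of the illustration; the same helper is reused for the later 9:1 division of the training set into D_train and D_val.

```python
import random

def split_dataset(samples, first_ratio=0.9, seed=0):
    """Randomly split samples into two lists at first_ratio : (1 - first_ratio)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * first_ratio)
    return shuffled[:cut], shuffled[cut:]

pictures = [f"sub_{i:04d}.jpg" for i in range(1000)]  # hypothetical file names
train_set, test_set = split_dataset(pictures)
d_train, d_val = split_dataset(train_set, seed=1)     # second 9:1 split for the search
```

With 1000 sub-pictures this gives 900 training and 100 test samples, and the training set is further divided into 810 samples for D_train and 90 for D_val.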

setting the number n of predefined chip appearance defect classes to 1, the maximum training round to 2000, the learning rate to 0.00111, the picture input width to 512, the picture input height to 512 and the filters to 18;
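These values follow the rules given in claim 1, where the filters are set to (n + 5) × 3 and the maximum training round to at least 2000 × n. A short sketch of that arithmetic, with hypothetical dictionary keys:

```python
def training_parameters(n_classes):
    """Derive the training parameters from the number of defect classes n."""
    if n_classes < 1:
        raise ValueError("n must be at least 1")
    return {
        "classes": n_classes,
        "max_rounds": 2000 * n_classes,   # lower bound stated in claim 1
        "learning_rate": 0.00111,
        "width": 512,
        "height": 512,
        "filters": (n_classes + 5) * 3,
    }

params = training_parameters(1)
```

For n = 1 this reproduces the maximum training round of 2000 and the filters value of 18 used in this case.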

dividing the training set obtained above into a training data set D_train and a validation data set D_val at a ratio of 9:1, constructing a meta-structure search space under the overall architecture of Yolo, modeling with a differentiable neural network architecture search method, and training the differentiable neural network architecture search model using the training data set D_train;

(d) Training the neural network model to obtain a trained target detection model, and specifically comprising the following steps:

(d1) First, the data enhancement methods Cutmix, Mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and random brightness-contrast transformation are applied sequentially to the training set to obtain enhanced training set pictures and the corresponding scratch labels; then, the data enhancement methods flipping, random scaling and random brightness-contrast transformation are applied to the test set data to obtain enhanced test set pictures and the corresponding scratch labels; finally, the enhanced training set chip appearance defect pictures are input into the neural network to obtain scratch label predicted values;

(d2) Calculating a loss function value by using the real value of the scratch label and the predicted value of the scratch label;

(d3) Updating the model parameters by using the loss function values;

(d4) Inputting the chip appearance scratch pictures in the test set into a neural network model to obtain a scratch predicted value of the test set;

(d5) Calculating a loss function value and a test set accuracy by using the real value of the scratch label of the test set and the predicted value of the scratch label of the test set;

(d7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, decreasing the learning rate (the specific adjustment is determined empirically, for example reducing it by a factor of 10, that is, to one tenth of the previous learning rate) and returning to step (d1);

(d8) Judging whether the maximum number of training rounds, 2000, has been reached; if so, finishing the training and outputting the trained target detection model; otherwise, returning to step (d1);

(e) Outputting a label of an appearance defect picture of the chip to be detected;

collecting a chip appearance defect picture to be detected, and dividing the chip appearance defect picture into 64 sub-pictures to obtain a divided chip appearance defect picture; extracting the segmented chip appearance defect picture to obtain a sample set to be detected, and simultaneously carrying out backup;

inputting a sample set to be tested into the trained neural network model to obtain a scratch prediction label of the sample set to be tested;

there are 100 appearance defect pictures of chips to be detected, containing 110 scratch defects in total, of which 90 are small targets; the samples to be detected are input into the model of the invention and into the prior art model (master-rcnn) to obtain detection results; the model of the invention processes the 100 pictures and detects 102 scratch defects, of which 85 are small-target scratch defects, at a detection speed of 35 pictures per second; the prior art model (master-rcnn) detects 93 scratch defects, of which 79 are small-target scratch defects, at a detection speed of 5 pictures per second.
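The reported counts can be turned into detection rates with simple arithmetic. The sketch below assumes that every reported detection corresponds to a real scratch defect, since false detections are not reported in the text; the dictionary keys are hypothetical.

```python
total_defects, total_small = 110, 90
results = {
    "invention": {"detected": 102, "small": 85, "pictures_per_second": 35},
    "master-rcnn": {"detected": 93, "small": 79, "pictures_per_second": 5},
}
# Share of all scratch defects and of small-target scratch defects found by each model.
rates = {
    name: {
        "all": r["detected"] / total_defects,
        "small": r["small"] / total_small,
    }
    for name, r in results.items()
}
speedup = (results["invention"]["pictures_per_second"]
           / results["master-rcnn"]["pictures_per_second"])
```

Under that assumption the model of the invention finds about 92.7% of all scratches and 94.4% of the small-target scratches, against about 84.5% and 87.8% for the prior art model, while processing 7 times as many pictures per second.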

In addition to chip data sets, the invention also works well on other common data sets. On the PASCAL VOC 2007 target detection data set, the method improves the AP50 index by 0.5 percent compared with the baseline model. In addition, the region elastic deformation algorithm of the invention can also be applied to image classification tasks, and the actual effect is shown in fig. 6. On the CIFAR-10 data set, the accuracy of the original ResNet18 model is 94.92%, and with the region elastic deformation algorithm the accuracy is 95.86%. On the CIFAR-100 data set, the accuracy of the original ResNet50 model is 80.60%, and with the region elastic deformation algorithm the accuracy is 81.68%.

The advantages of the method are as follows: compared with the original Yolo network, the method inherits the efficient detection speed of the Yolo network model, achieves good accuracy on small targets, and can adjust the structure of the neural network model according to the data set; compared with the fast-rcnn method, the method has a higher detection speed and is simpler to operate; from the perspective of data enhancement, the invention designs a region elastic deformation data enhancement method, which simulates the local elastic deformation of objects in the real world, increases the richness of the samples and improves the robustness of the model; from the perspective of standardization, compared with other methods that only improve the model structure, the method links data set construction with the model training process for variable data sets, reduces the complexity of data preprocessing, and provides a more standardized and complete model training method.

Claims

1. A method for training a target detection model based on Yolo is characterized by comprising the following steps:

the training parameters comprise the number of predefined chip appearance defect classes, the maximum training round, the learning rate, the picture input width, the picture input height and the filters, wherein the number of predefined chip appearance defect classes is set to n, with n ≥ 1, the maximum training round is set to be greater than or equal to 2000 × n, the learning rate is set to 0.00111, the picture input width is set to 512, the picture input height is set to 512, and the filters are set to (n + 5) × 3;

dividing the training set in step (a) into a training data set D_train and a validation data set D_val, building a meta-structure search space, modeling with a differentiable neural network architecture search method, and training the differentiable neural network architecture search model using the training data set D_train;

2. The method for training a Yolo-based target detection model according to claim 1, wherein the specific process of step (c) is as follows:

(c3) Updating the model parameters by using the loss function values;

(c4) Inputting the chip appearance defect pictures in the test set into a neural network model to obtain a target detection label predicted value;

(c6) Judging whether the accuracy of the test set is greater than the maximum accuracy R, if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;

(c7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, reducing the learning rate and returning to step (c1);

3. The method of claim 1, wherein the data quantity ratio of the training data set D_train to the validation data set D_val is 9:1.

4. The Yolo-based target detection model training method as claimed in claim 1, wherein if the training set contains small targets, the data enhancement process consists of sequentially performing Cutmix, Mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and random brightness-contrast transformation; otherwise, the data enhancement process consists of sequentially performing Cutmix, Mosaic data enhancement, class label smoothing, region elastic deformation, flipping and random brightness-contrast transformation;

5. The method for training the Yolo-based target detection model according to claim 4, wherein the specific steps of elastic deformation of the region are as follows:

(iv) Elastically deforming the image in the rectangular frame;

6. The method for training a Yolo-based target detection model according to claim 4, wherein the training set and the test set are obtained by the following steps:

judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures while dividing the target detection labels into sub-labels according to the rule corresponding to the chip appearance defect pictures, extracting the sub-pictures and the sub-labels to obtain a segmented chip appearance defect data set, splitting the segmented chip appearance defect data set to obtain the training set and the test set, and backing up at the same time, wherein in the backup process a copy function is rewritten in C language and is called by the preprocessing module as a DLL (dynamic link library) function; otherwise, directly extracting the chip appearance defect pictures and the target detection labels in the chip appearance defect data set, splitting them to obtain the training set and the test set, and backing up at the same time, wherein in the backup process a copy function is rewritten in C language and is called by the preprocessing module as a DLL (dynamic link library) function.

7. The method as claimed in claim 6, wherein the segmentation adopts a random sampling method, and the ratio of the data quantity of the training set to the data quantity of the testing set is 9:1; and the speed of the segmentation process is increased by adopting a multithreading method.

8. The application of the Yolo-based target detection model training method as claimed in claim 6 or 7, wherein after a sample set to be tested is obtained, the sample set is input into the trained target detection model, and a prediction label in the Yolo format of the sample set to be tested is output;

9. The apparatus for the Yolo-based target detection model training method as claimed in claim 6 or 7, comprising:

10. The apparatus of claim 9, further comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program having a flow chart as follows: