CN115527089A - Yolo-based target detection model training method and application and device thereof - Google Patents

Info

Publication number
CN115527089A
Authority
CN
China
Prior art keywords
training
target detection
chip
neural network
appearance defect
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210959911.1A
Other languages
Chinese (zh)
Inventor
郝矿荣
杜少帅
张海超
郝灵广
隗兵
唐雪嵩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Donghua University
Original Assignee
Donghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Donghua University
Priority to CN202210959911.1A
Publication of CN115527089A
Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a Yolo-based target detection model training method, and an application and a device thereof. The method comprises the following steps: (a) loading a training set and a test set, performing region elastic deformation data enhancement on the training set, and setting corresponding training parameters; (b) constructing a meta-structure search space according to the training set, and searching the neural network architecture to obtain a neural network model; (c) training the neural network model to obtain a trained target detection model. The application is as follows: after a sample set to be tested is obtained, it is input into the trained target detection model, which outputs Yolo-format prediction labels for the sample set. The device comprises a data set labeling unit, a data set segmentation and preprocessing unit, a parameter tuning unit, a neural network architecture search unit and a training unit. The method simplifies operation and standardizes the whole detection model training process; the device is simple in structure and convenient to operate.

Description

Yolo-based target detection model training method and application and device thereof
Technical Field
The invention belongs to the technical field of deep learning, and particularly relates to a Yolo-based target detection model training method, application and a device thereof.
Background
Chips are the brains of modern society: mobile phones, smart wearable devices, large-scale servers, sensors and the like all contain them. Chip manufacturing is therefore a core technology for aerospace and national defense, the basis of intelligent manufacturing, and a key technology for informatization. Appearance defects can seriously affect chip performance, so appearance defect detection is an important link in chip production. Chip appearance defects are irregular in shape, varied in their features, not fixed in position, and subject to heavy background noise; the object to be detected is small relative to the background. Traditional chip defect detection usually relies on manual visual inspection, which has low efficiency and reliability and greatly increases an enterprise's production cost.
In recent years, artificial intelligence technology based on deep learning has gradually matured. In computer vision, target detection is one of the most active research fields and has important real-world applications such as intelligent monitoring, automatic driving and face detection. Target detection models based on deep neural networks offer high recognition accuracy and high speed, and have become the mainstream among target detection algorithms. Applying such models to chip defect detection is therefore of great significance for improving chip production yield and reducing production cost. Conventional automatic chip defect detection is generally framed as an object detection problem for which more samples are simply assumed to be better, without fully considering the problem itself or the actual application environment. In addition, the threshold of deep learning technology is high: practitioners need programming ability, a mathematical foundation, knowledge of intelligent algorithms and a sufficient understanding of the data set before they can design a suitable neural network model and then optimize it.
Extensive investigation of enterprises and production lines shows that users prefer deep neural network models that can be deployed quickly and upgraded easily. The smaller the impact of sample collection, sample labeling and model deployment on the whole production line, the better. When the detection target changes, the user should be able to adjust the deployed model quickly without a long period of re-tuning.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a method for training a target detection model based on Yolo and application and a device thereof.
In order to achieve the purpose, the invention adopts the following scheme:
A Yolo-based target detection model training method comprises the following steps:
(a) Loading a training set and a test set, and setting corresponding training parameters;
the training set comprises chip appearance defect pictures and the Yolo-format target detection labels of those pictures;
the training parameters comprise the predefined number of chip appearance defect categories, the maximum training round, the learning rate, the picture input width, the picture input height and the filters, wherein the predefined number of chip appearance defect categories is the number of types of chip appearance defects and is set to n, with n ≥ 1; the maximum training round is set to at least 2000 × n; the learning rate is set to 0.00111; the picture input width is set to 512; the picture input height is set to 512; and the filters are set to (n + 5) × 3;
(b) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
dividing the training set of step (a) into a training data set D_train and a validation data set D_val, constructing a meta-structure search space, modeling it with a neural network architecture search method, and training a differentiable neural network architecture search model on the training data set D_train;
during training, the structure weights of the meta-structures are globally normalized, and the structure weights of the meta-structures and the network parameters are then optimized by bilevel optimization; that is, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta-structures are adjusted simultaneously by the back-propagation algorithm;
after training is finished, the meta-structures are sorted by structure weight and the meta-structure with the largest weight is retained to form a deep neural network model, thereby obtaining the neural network model;
(c) Training the neural network model to obtain a trained target detection model;
after data enhancement is performed on the training set of step (a), the model parameters of the neural network model are optimized according to the enhanced training set, the test set and the training parameters, obtaining the trained target detection model.
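The parameter rules of step (a) can be illustrated with a minimal sketch. The class and function names (`TrainingParams`, `make_training_params`) are hypothetical and not part of the disclosure; only the numeric rules (n ≥ 1, maximum training round ≥ 2000 × n, learning rate 0.00111, 512 × 512 input, filters = (n + 5) × 3) come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingParams:
    """Hypothetical container for the training parameters of step (a)."""
    num_classes: int       # n, predefined number of defect categories
    max_rounds: int        # maximum training round, at least 2000 * n
    learning_rate: float
    input_width: int
    input_height: int
    filters: int           # (n + 5) * 3


def make_training_params(num_classes: int,
                         max_rounds: Optional[int] = None) -> TrainingParams:
    """Derive the parameters from n, enforcing the stated constraints."""
    if num_classes < 1:
        raise ValueError("num_classes (n) must be >= 1")
    minimum_rounds = 2000 * num_classes
    if max_rounds is None:
        max_rounds = minimum_rounds          # smallest value the text allows
    if max_rounds < minimum_rounds:
        raise ValueError(f"max_rounds must be >= {minimum_rounds}")
    return TrainingParams(
        num_classes=num_classes,
        max_rounds=max_rounds,
        learning_rate=0.00111,
        input_width=512,
        input_height=512,
        filters=(num_classes + 5) * 3,
    )
```

For example, three defect categories would give filters = 24 and a maximum training round of at least 6000.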
As a preferred technical scheme:
In the above Yolo-based target detection model training method, step (c) comprises the following specific steps:
(c1) After data enhancement is performed on the training set of step (a), inputting the chip appearance defect pictures in the training set into the neural network model to obtain predicted values of the target detection labels;
(c2) Calculating the loss function value from the true values and the predicted values of the target detection labels;
(c3) Updating the model parameters using the loss function value (parameters are divided into hyper-parameters and model parameters; hyper-parameters are set manually, model parameters are optimized by the algorithm, and the training parameters mentioned above are hyper-parameters);
(c4) Inputting the chip appearance defect pictures in the test set into the neural network model to obtain predicted values of the target detection labels;
(c5) Calculating the loss function value and the test set accuracy from the true values and the predicted values of the target detection labels;
(c6) Judging whether the test set accuracy is greater than the maximum accuracy R (value range 0-100%); if so, saving the neural network model, updating R, and proceeding to the next step; otherwise, proceeding directly to the next step;
(c7) Judging whether the neural network model is converging (convergence is judged by whether the loss function values of the training set and the test set are gradually decreasing); if so, proceeding to the next step; otherwise, reducing the learning rate (the specific adjustment is determined empirically, for example reducing it by a factor of 10, i.e. to one tenth of the previous learning rate) and returning to step (c1);
(c8) Judging whether the maximum training round has been reached; if so, ending and outputting the trained target detection model; otherwise, returning to step (c1).
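The control flow of steps (c1) to (c8) can be sketched as follows. This is an illustrative simplification, not the disclosed implementation: the callables `train_one_round`, `evaluate` and `save_model` are hypothetical placeholders, convergence is approximated by comparing each loss with its value in the previous round, and every pass through the loop is counted toward the maximum training round so that the sketch always terminates.

```python
from typing import Callable, Tuple


def run_training(train_one_round: Callable[[float], float],
                 evaluate: Callable[[], Tuple[float, float]],
                 save_model: Callable[[], None],
                 max_rounds: int,
                 learning_rate: float,
                 decay: float = 0.1) -> Tuple[float, float]:
    """Return (best test accuracy R, final learning rate)."""
    best_accuracy = 0.0                        # R in step (c6)
    prev_train_loss = float("inf")
    prev_test_loss = float("inf")
    for _ in range(max_rounds):                # (c8) stop at the maximum round
        train_loss = train_one_round(learning_rate)   # (c1)-(c3)
        test_loss, accuracy = evaluate()               # (c4)-(c5)
        if accuracy > best_accuracy:                   # (c6) keep the best model
            save_model()
            best_accuracy = accuracy
        # (c7) assumed convergence test: both losses decreased this round
        converging = (train_loss < prev_train_loss
                      and test_loss < prev_test_loss)
        if not converging:
            learning_rate *= decay             # e.g. one tenth of the previous value
        prev_train_loss, prev_test_loss = train_loss, test_loss
    return best_accuracy, learning_rate
```

Passing the training and evaluation steps in as callables keeps the sketch independent of any particular deep learning framework.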
In the above Yolo-based target detection model training method, the ratio of the amount of data in the training data set D_train to that in the validation data set D_val is 9:1.
In the above Yolo-based target detection model training method, if the training set contains small targets, the data enhancement process comprises, in sequence, Cutmix, Mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and random brightness-contrast transformation; otherwise, the data enhancement process comprises, in sequence, Cutmix, Mosaic data enhancement, class label smoothing, region elastic deformation, flipping and random brightness-contrast transformation;
a small target is an appearance defect for which the ratios of the bounding-box width and height to the image width and height are less than 0.1, or an appearance defect whose resolution is less than 32 pixels × 32 pixels.
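The small target criterion can be written as a short predicate. The sketch assumes that both the width ratio and the height ratio must be below 0.1, and that "less than 32 pixels × 32 pixels" means both sides are under 32 pixels; the text does not spell out either reading. The function name `is_small_target` is hypothetical.

```python
def is_small_target(box_w: float, box_h: float,
                    image_w: float, image_h: float,
                    ratio_threshold: float = 0.1,
                    pixel_threshold: float = 32) -> bool:
    """Return True if a bounding box (in pixels) counts as a small target."""
    # Relative criterion: box is small compared with the image (assumed: both sides)
    relative_small = (box_w / image_w < ratio_threshold
                      and box_h / image_h < ratio_threshold)
    # Absolute criterion: box is smaller than 32 x 32 pixels (assumed: both sides)
    absolute_small = box_w < pixel_threshold and box_h < pixel_threshold
    return relative_small or absolute_small
```

A data set would then be treated as containing small targets if the predicate holds for at least one labeled bounding box.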
In the above Yolo-based target detection model training method, the specific steps of the region elastic deformation are as follows:
(i) Setting the range of the area ratio of the rectangular frame to the image to (r_min, r_max), and the range of the aspect ratio of the rectangular frame to (a_min, a_max);
(ii) Randomly selecting a coordinate point (x, y) in the image, randomly selecting an area ratio r_i within the range (r_min, r_max), and randomly selecting an aspect ratio a_i within the range (a_min, a_max);
(iii) Calculating the length and width of the rectangular frame from r_i, a_i and the image area, and determining the rectangular frame by taking (x, y) as its center point;
(iv) Performing elastic deformation on the image inside the rectangular frame;
(v) Obtaining the image region containing the target according to the target detection label, and repeating steps (i) to (iv) for that image region.
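Steps (i) to (iii), the selection of the rectangular frame, can be sketched as below. The sketch assumes the aspect ratio is width divided by height and clips the rectangle to the image bounds; neither detail is stated in the text. The elastic deformation of step (iv) is not implemented here, and the function names are hypothetical.

```python
import math
import random
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]   # (left, top, right, bottom)


def rectangle_from(cx: float, cy: float, area_ratio: float, aspect: float,
                   image_w: float, image_h: float) -> Rect:
    """Step (iii): build the rectangle centered at (cx, cy)."""
    area = area_ratio * image_w * image_h
    rect_w = math.sqrt(area * aspect)      # assumes aspect = width / height
    rect_h = math.sqrt(area / aspect)
    left = max(0.0, cx - rect_w / 2)       # clip to the image (assumption)
    top = max(0.0, cy - rect_h / 2)
    right = min(float(image_w), cx + rect_w / 2)
    bottom = min(float(image_h), cy + rect_h / 2)
    return left, top, right, bottom


def sample_rectangle(image_w: float, image_h: float,
                     area_ratio_range: Tuple[float, float],
                     aspect_ratio_range: Tuple[float, float],
                     rng: Optional[random.Random] = None) -> Rect:
    """Steps (i)-(ii): draw a center point, r_i and a_i, then build the frame."""
    rng = rng or random.Random()
    cx = rng.uniform(0, image_w)
    cy = rng.uniform(0, image_h)
    r_i = rng.uniform(*area_ratio_range)   # r_i in (r_min, r_max)
    a_i = rng.uniform(*aspect_ratio_range) # a_i in (a_min, a_max)
    return rectangle_from(cx, cy, r_i, a_i, image_w, image_h)
```

For step (v), the same function could be applied with the target's bounding region substituted for the whole image, offsetting the result by the region's top-left corner.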
In the above Yolo-based target detection model training method, the training set and the test set are obtained as follows:
(i) Labeling the chip appearance defect pictures to obtain a labeled data set;
predefining the categories of chip appearance defects to obtain the predefined chip appearance defect category configuration; calling the graphical image annotation tool labelImg to label rectangular frames and obtain the Yolo-format target detection labels of the chip appearance defect pictures, finally obtaining a chip appearance defect data set consisting of the chip appearance defect pictures and the target detection labels, i.e. the labeled data set;
(ii) Preprocessing the labeled data set to obtain a training set and a test set;
judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures, dividing the target detection labels into sub-labels according to the rule corresponding to the division of the picture, extracting the sub-pictures and sub-labels to obtain the divided chip appearance defect data set, splitting the divided chip appearance defect data set into a training set and a test set, and backing up the training set and the test set at the same time, wherein during backup the copy function is rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function; otherwise, directly extracting the chip appearance defect pictures and target detection labels from the chip appearance defect data set, splitting them into a training set and a test set, and backing them up at the same time, wherein during backup the copy function is rewritten in C and called by the preprocessing module as a DLL function.
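The division of target detection labels into sub-labels can be sketched as follows. The text states only that each picture is divided into 64 sub-pictures and that labels are divided by a corresponding rule; the 8 × 8 grid, the clipping of a box to every tile it overlaps, and the function name `split_labels` are assumptions made for illustration. Labels use the Yolo format (class, center x, center y, width, height), normalized to the image.

```python
from typing import Dict, List, Tuple

Label = Tuple[int, float, float, float, float]   # (class, cx, cy, w, h), normalized


def split_labels(labels: List[Label],
                 grid: int = 8) -> Dict[Tuple[int, int], List[Label]]:
    """Map (row, col) of each tile to the labels re-normalized to that tile."""
    tile = 1.0 / grid                      # tile side in normalized image units
    result: Dict[Tuple[int, int], List[Label]] = {}
    for cls, cx, cy, w, h in labels:
        x0, x1 = cx - w / 2, cx + w / 2
        y0, y1 = cy - h / 2, cy + h / 2
        for row in range(grid):
            for col in range(grid):
                tx0, ty0 = col * tile, row * tile
                # Intersection of the box with this tile
                ix0, ix1 = max(x0, tx0), min(x1, tx0 + tile)
                iy0, iy1 = max(y0, ty0), min(y1, ty0 + tile)
                if ix1 <= ix0 or iy1 <= iy0:
                    continue               # box does not overlap this tile
                sub = (cls,
                       ((ix0 + ix1) / 2 - tx0) / tile,
                       ((iy0 + iy1) / 2 - ty0) / tile,
                       (ix1 - ix0) / tile,
                       (iy1 - iy0) / tile)
                result.setdefault((row, col), []).append(sub)
    return result
```

A box lying entirely inside one tile yields a single sub-label; a box straddling a tile border yields one clipped sub-label per overlapped tile.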
In the above Yolo-based target detection model training method, the split into training set and test set is performed by random sampling, and the ratio of the amount of data in the training set to that in the test set is 9:1.
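A random-sampling split with a 9:1 ratio can be sketched in a few lines. The function name `split_dataset` and the optional `seed` parameter are hypothetical additions for reproducibility.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T],
                  train_fraction: float = 0.9,
                  seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Shuffle the samples and cut them into (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # random sampling without replacement
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

The same helper could also produce the D_train and D_val split of the training set, which uses the same 9:1 ratio.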
In the above Yolo-based target detection model training method, the division process (dividing pictures into sub-pictures and labels into sub-labels) is accelerated by multithreading.
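One way to parallelize the per-picture work with threads is a thread pool, sketched below. The worker `split_one` is a hypothetical placeholder for whatever routine divides one picture and its labels; the text does not state which threading mechanism is actually used.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")


def split_all(image_paths: Sequence[str],
              split_one: Callable[[str], R],
              max_workers: int = 8) -> List[R]:
    """Run split_one on every picture using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the input order of image_paths in its results
        return list(pool.map(split_one, image_paths))
```

Threads tend to help most when the per-picture work is dominated by file reads and writes, as is typical for image splitting.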
The invention also provides an application of the above Yolo-based target detection model training method: after the sample set to be tested is obtained, it is input into the trained target detection model, and the model outputs the Yolo-format prediction labels of the sample set to be tested;
the sample set to be tested is obtained as follows: collecting appearance defect pictures of the chip to be tested and judging whether they contain the small target; if so, backing up the pictures, dividing each picture into 64 sub-pictures, extracting the sub-pictures to obtain the sample set to be tested, and backing it up at the same time; otherwise, directly extracting the appearance defect pictures of the chip to be tested to obtain the sample set to be tested, and backing it up at the same time.
The invention also provides a device adopting the above Yolo-based target detection model training method, which comprises:
the data set labeling unit is used for labeling the chip appearance defect pictures to obtain a labeled data set;
the data set segmentation and preprocessing unit is used for preprocessing the labeled data set to obtain a training set and a test set;
the parameter tuning unit is used for loading the training set and the test set and setting corresponding training parameters;
the neural network architecture searching unit is used for constructing a meta structure searching space according to the training set, searching the neural network architecture and obtaining a neural network model;
and the training unit is used for training the neural network model to obtain a trained target detection model.
As a preferred technical scheme:
the apparatus as described above further includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the flow of the computer program is as follows:
(S1) labeling the chip appearance defect pictures to obtain a labeled data set;
(S2) preprocessing the labeled data set to obtain a training set and a test set;
(S3) loading a training set and a test set, and setting corresponding training parameters;
(S4) constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
and (S5) training the neural network model to obtain a trained target detection model.
The method of the invention has the following four characteristics:
(1) Existing appearance detection methods cover only the path from training to inference, whereas the present method integrates data labeling, preprocessing, model optimization and training, covering the whole appearance detection flow and unifying multiple algorithms and steps under one framework; these algorithms were originally unrelated, and each step had to be executed manually with intermediate conversion operations, but the invention overcomes the incompatibility among them and connects them, so that for the user the method greatly simplifies a complex operation process, lowers the difficulty of applying the algorithms, improves the efficiency of taking a model from training to deployment, and is better suited to the complex and changeable scenes of a production site;
(2) To address chip appearance defects that are tiny and irregular, the invention designs a small target mode: when a chip appearance defect picture contains small targets, the picture is divided into 64 sub-pictures and the labels are divided according to the rule corresponding to the division of the picture; the sub-pictures and sub-labels are then extracted and split into a training set and a test set, after which the corresponding data enhancement is performed on each; the small target mode makes the model pay more attention to the chip appearance defects rather than to the large amount of background, improving detection accuracy;
(3) In the stage of preprocessing the labeled data set, the division process is accelerated by multithreading; after the division is completed, the copy function used for backup is rewritten in C and called by the preprocessing module as a DLL (dynamic link library) function, which solves the problem that the Python copy function executes too slowly and multiplies the preprocessing speed of the system; in a running environment with an i7-10700 CPU and a KIOXIA 256G NVMe hard disk, preprocessing 223 chip appearance defect pictures containing small targets (picture size 6464 × 4852) takes less than 20 min instead of 6 h;
(4) Neural network architecture search is used to obtain an optimized network structure, which realizes automatic optimization of the target detection model, reduces dependence on expert knowledge, and greatly reduces the dependence of the model on the data set.
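The selection step of the architecture search (global normalization of the structure weights, then keeping the meta-structure with the largest weight) can be illustrated with a small sketch. It assumes that global normalization means a softmax over all candidate meta-structures, which the text does not specify; the candidate names in the usage example and the function names are hypothetical, and the bilevel training that produces the raw weights is not shown.

```python
import math
from typing import Dict, Tuple


def normalize_globally(raw_weights: Dict[str, float]) -> Dict[str, float]:
    """Softmax over all candidates (assumed form of global normalization)."""
    peak = max(raw_weights.values())       # subtract the max for numerical stability
    exps = {name: math.exp(w - peak) for name, w in raw_weights.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def select_meta_structure(raw_weights: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Sort by normalized structure weight and keep the largest one."""
    normalized = normalize_globally(raw_weights)
    ranked = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[0][0], normalized
```

For instance, `select_meta_structure({"conv3x3": 1.2, "conv5x5": 0.3, "skip": -0.5})` would retain the hypothetical candidate `"conv3x3"`.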
With regard to feature (1): comparison document 1 (CN 202110944135) mainly solves the technical problem that a Yolo network cannot converge when detecting a data set of the same target containing categories of differing feature richness, and it has no labeling or preprocessing method, whereas the present invention is adapted to the chip appearance defect detection task (for example, through the small target mode) and integrates labeling and preprocessing methods; comparison document 2 (CN 202110905324) mainly provides a sorting system whose built-in algorithm is simple, whereas the present invention focuses on the flow from data labeling to model training and involves no mechanical structure; comparison document 3 (CN 113410154A) judges whether a chip is qualified by a region division method, which can only classify chips and cannot accurately identify the unqualified region of a chip, and is therefore essentially different from the method of the present invention; comparison document 4 (an IC chip appearance detection system based on machine vision, university of southern China) mainly improves the positioning precision of chip pins through the light source setup for image acquisition and through image processing, and detects pin defects and print legibility defects of SOP chips, whereas the present invention involves no mechanical facilities such as image acquisition, can detect various types of appearance defects, and is a general detection model training framework, so the two are essentially different; comparison document 5 (a QFP chip appearance visual detection system and detection method, Chinese Mechanical Engineering 24 (3)) processes images of QFP pins with a Canny-operator edge detection algorithm and focuses on a pin standoff height detection method and a pin coplanarity detection method based on a three-point method; like comparison document 4, it detects the pins with traditional image processing methods.
With regard to feature (2): comparison document 6 (CN 202110880257) mainly realizes defect detection for small chips through transfer learning, whereas the present invention improves the detection precision for small targets by dividing high-resolution pictures, and searches for a model suited to the data set through a neural network architecture search method; comparison document 7 (carrier chip defect detection based on a lightweight convolutional neural network, Computer Engineering and Applications: 1-10) proposes YOLO-Effectinenet, a carrier chip defect detection algorithm based on a lightweight convolutional neural network, for the real-time detection of three different types of defects, namely carrier chip breakouts, positioning column damage and waveguide stains; that method targets only these three carrier chip defects, whereas the present invention is suitable for detecting defects of various chips, is specifically adjusted for small-target appearance defects, and covers the whole flow from data labeling to preprocessing as a complete system, which the other inventions do not.
With respect to feature (3), no relevant document mentions the application of such an acceleration method in the detection of defects in the chip appearance, since this is related to the small target mode of the present invention and is unique to the present invention.
With regard to feature (4): comparison document 8 (CN 202110642625) only detects chip solder balls and recognizes characters, so its detection content is narrow and its adaptability poor, whereas the present invention can rebuild the model at any time as the training data set changes and searches out an optimal network structure with the neural network architecture search method, which is convenient and efficient; comparison document 9 (research on an IC chip appearance defect recognition algorithm based on deep learning, Jiangnan University) mainly studies traditional convolutional neural network algorithms, which differs from the full-flow system method provided by the present invention and is essentially different from Yolo and neural network architecture search.
In general, among the many methods that adopt deep learning technology, including the comparison documents, only the present invention builds data labeling, preprocessing, parameter setting, neural network architecture search and model training into a single system, forming a complete method for detecting chip appearance defects. The method is systematic, integrated and automated, takes a global view, and at the same time optimizes individual components such as the small target mode and the preprocessing acceleration method. The other methods each focus on only one part of the detection system and cannot be put directly into use on a production line.
Compared with the prior art, in which other methods optimize only a certain part, the present invention considers from an overall perspective how to simplify the operation flow so that the method can be better implemented; where prior methods have limitations, the invention carefully designs methods such as preprocessing acceleration and the small target mode, and optimizes the main model with a recent neural network architecture search algorithm. The invention integrates these processes and algorithms so that they form a new whole, which cannot be regarded as a simple combination.
Advantageous effects:
(1) The Yolo-based target detection model training method of the invention simplifies operation, standardizes the whole detection model training process, makes the whole process highly automated, improves efficiency and reduces dependence on expert knowledge;
(2) With the Yolo-based target detection model training method of the invention, a front-line worker can independently complete data acquisition, labeling and preprocessing in a standardized way according to the actual condition of the samples obtained; a deep neural network model suited to the current data set is obtained from a relatively small data set through the standardized data labeling and neural network architecture search modules, and the training and deployment of the model are completed;
(3) The Yolo-based target detection model training device is simple in structure and convenient to operate.
Drawings
FIG. 1 is a schematic flow chart of a method for training a target detection model based on Yolo according to the present invention;
FIG. 2 is a schematic diagram of a partial structure of a Yolo-based target detection model training apparatus according to the present invention;
FIG. 3 is a schematic flow chart illustrating a process of training a neural network model to obtain a trained target detection model in the Yolo-based target detection model training method of the present invention;
FIG. 4 shows the specific steps of the region elastic deformation algorithm of the present invention;
FIG. 5 is a schematic diagram illustrating the deformation principle of the region elastic deformation of the present invention, wherein (a) is the original image and (b) is the image after deformation;
FIG. 6 shows actual effects of the region elastic deformation of the present invention, wherein (a) is actual effect 1, (b) is actual effect 2, (c) is actual effect 3, (d) is actual effect 4, (e) is actual effect 5, (f) is actual effect 6, (g) is actual effect 7, (h) is actual effect 8, and (i) is actual effect 9.
Detailed Description
The invention will be further illustrated with reference to specific embodiments. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes and modifications of the present invention may be made by those skilled in the art after reading the teachings of the present invention, and such equivalents may fall within the scope of the present invention as defined in the appended claims.
A method for training a target detection model based on Yolo is disclosed, as shown in FIG. 1, and comprises the following specific steps:
(a) Acquiring a training set and a test set, and specifically comprising the following steps:
(i) Labeling the chip appearance defect pictures to obtain a labeled data set;
predefining the classes of the chip appearance defects to obtain the configuration of the classes of the predefined chip appearance defects; calling a graphic image annotation tool labelImg to label a rectangular frame to obtain a Yolo-format target detection label of the chip appearance defect picture, and finally obtaining a chip appearance defect data set consisting of the chip appearance defect picture and the target detection label, namely obtaining a labeled data set;
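As an illustration of the Yolo-format label mentioned above, the following minimal sketch converts one rectangular frame given in pixels into a label line of the commonly used form "class x_center y_center width height", with all four box values normalized by the picture size. The function name and the sample numbers are hypothetical and are not taken from the invention.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-space rectangle into one Yolo-format label line."""
    x_center = (x_min + x_max) / 2.0 / img_w  # normalized box centre, x
    y_center = (y_min + y_max) / 2.0 / img_h  # normalized box centre, y
    width = (x_max - x_min) / img_w           # normalized box width
    height = (y_max - y_min) / img_h          # normalized box height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: class 0, a 200 x 60 pixel frame in a 1000 x 800 picture.
print(to_yolo_line(0, 100, 200, 300, 260, 1000, 800))
```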
(ii) Preprocessing the labeled data set to obtain a training set and a test set;
judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures by a multithreading method, simultaneously dividing each target detection label into sub-labels by a multithreading method according to the rule used to divide the corresponding chip appearance defect picture, extracting the sub-pictures and the sub-labels to obtain a divided chip appearance defect data set, and splitting the divided data set by a random sampling method to obtain a training set and a test set with a data quantity ratio of 9:1; otherwise, directly extracting the chip appearance defect pictures and the target detection labels from the chip appearance defect data set, splitting them by a random sampling method to obtain a training set and a test set with a data quantity ratio of 9:1, and carrying out backup at the same time; in both cases, the copy function used in the backup process is rewritten in C language and is called by the preprocessing module as a DLL (dynamic-link library) function;
the small target is an appearance defect with the ratio of the width and the height of the bounding box to the width and the height of the image being less than 0.1, or an appearance defect with the resolution being less than 32 pixels multiplied by 32 pixels;
the training set comprises a chip appearance defect picture and a target detection label of a Yolo format of the chip appearance defect picture;
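The small-target judgment and the division of a label into sub-labels can be sketched as follows. This is only one possible reading: it assumes that both the width ratio and the height ratio must be below 0.1, that "less than 32 × 32 pixels" means both sides are below 32 pixels, that the 64 sub-pictures form an 8 × 8 grid, and that a frame crossing a grid line is clipped to each sub-picture it overlaps. None of these details is stated in the text, and all function names are hypothetical.

```python
def is_small_target(box_w, box_h, img_w, img_h, ratio=0.1, min_side=32):
    """Return True if a bounding box counts as a small target (assumed reading)."""
    relative_small = box_w / img_w < ratio and box_h / img_h < ratio
    absolute_small = box_w < min_side and box_h < min_side
    return relative_small or absolute_small


def split_label(box, img_w, img_h, grid=8):
    """Split one pixel-space box (x_min, y_min, x_max, y_max) into sub-labels.

    Returns {(row, col): box clipped to that sub-picture, in sub-picture
    coordinates}. An 8 x 8 grid (64 sub-pictures) is an assumption.
    """
    tile_w, tile_h = img_w / grid, img_h / grid
    x_min, y_min, x_max, y_max = box
    sub_labels = {}
    for row in range(grid):
        for col in range(grid):
            tx0, ty0 = col * tile_w, row * tile_h
            tx1, ty1 = tx0 + tile_w, ty0 + tile_h
            # Intersection of the box with this sub-picture.
            ix0, iy0 = max(x_min, tx0), max(y_min, ty0)
            ix1, iy1 = min(x_max, tx1), min(y_max, ty1)
            if ix1 > ix0 and iy1 > iy0:
                sub_labels[(row, col)] = (ix0 - tx0, iy0 - ty0, ix1 - tx0, iy1 - ty0)
    return sub_labels


# Hypothetical example: an 800 x 800 picture, one box crossing a vertical grid line.
print(split_label((150, 150, 250, 180), 800, 800))
```

Only sub-pictures that receive at least one sub-label appear in the returned dictionary, which corresponds to extracting the sub-pictures that actually contain a defect.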
(b) Loading a training set and a test set, and setting corresponding training parameters;
the training parameters comprise the predefined number of chip appearance defect categories, the maximum training round, the learning rate, the picture input width, the picture input height and filters, wherein the predefined number of chip appearance defect categories is the number of types of chip appearance defects and is set to n, with n ≥ 1, the maximum training round is set to be greater than or equal to 2000 × n, the learning rate is set to 0.00111, the picture input width is set to 512, the picture input height is set to 512, and filters is set to (n + 5) × 3;
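The dependence of the training parameters on the number of categories n can be written out as a short sketch. The dictionary keys are hypothetical names, and the maximum training round is set to its lower bound 2000 × n. As background that the text does not state: in Darknet-style Yolo configurations, the factor 3 in the filters formula is usually the number of anchor boxes per detection layer, and the 5 covers four box coordinates plus one objectness score.

```python
def training_parameters(n_classes):
    """Build the training parameters for n_classes defect categories."""
    if n_classes < 1:
        raise ValueError("the number of defect categories must be at least 1")
    return {
        "classes": n_classes,
        "max_rounds": 2000 * n_classes,   # lower bound given in the text
        "learning_rate": 0.00111,
        "width": 512,
        "height": 512,
        "filters": (n_classes + 5) * 3,
    }


# One defect category gives 2000 rounds and 18 filters.
print(training_parameters(1))
```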
(c) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
dividing the training set in step (a) into a training data set D_train and a validation data set D_val with a data quantity ratio of 9:1, constructing a meta-structure search space, building a differentiable neural network architecture search model, and training the differentiable neural network architecture search model with the training data set D_train;
in the training process, the structure weights of the meta structures are globally normalized, and then the structure weights of the meta structures and the network parameters are subjected to bi-level optimization, namely, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta structures are adjusted simultaneously through a back propagation algorithm;
after training is finished, the meta structures are ranked by structure weight, and the meta structure with the largest weight is retained to form a deep neural network model, thereby obtaining the neural network model;
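The final selection step can be illustrated with a small sketch. It assumes that the global normalization is a softmax taken jointly over the structure weights of all candidate meta structures. The text only says "global normalization", so the softmax, the candidate names and the weight values below are all hypothetical. The bi-level optimization itself is not shown.

```python
import math


def global_softmax(structure_weights):
    """Normalize the raw structure weights of all candidates jointly (sum to 1)."""
    names = list(structure_weights)
    largest = max(structure_weights.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = [math.exp(structure_weights[name] - largest) for name in names]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def select_meta_structure(structure_weights):
    """Rank candidates by normalized weight and keep the one with the largest."""
    normalized = global_softmax(structure_weights)
    ranked = sorted(normalized.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked


# Hypothetical learned structure weights for three candidate meta structures.
best, ranking = select_meta_structure({"conv3x3": 0.2, "conv5x5": 1.5, "skip": -0.3})
print(best, ranking)
```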
(d) Training the neural network model to obtain a trained target detection model, as shown in fig. 3, specifically including the following steps:
(d1) After data enhancement is carried out on the training set in step (a), the chip appearance defect pictures in the training set (namely the 'training pictures' in the figure) are input into the neural network model to obtain target detection label predicted values (namely the 'prediction result' in the figure); if the training set contains small targets, the data enhancement process comprises sequentially performing Cutmix, mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and brightness contrast random transformation; otherwise, the data enhancement process comprises sequentially performing Cutmix, mosaic data enhancement, class label smoothing, region elastic deformation, flipping and brightness contrast random transformation;
(d2) Calculating a loss function value by using the real value of the target detection label (namely the 'training picture label' in the figure) and the predicted value of the target detection label (namely the 'prediction result' in the figure);
(d3) Updating the model parameters by using the loss function value (parameters are divided into hyper-parameters, which are set manually, and model parameters, which are optimized by the algorithm; the training parameters mentioned above are hyper-parameters);
(d4) Inputting the chip appearance defect pictures in the test set (namely the 'test pictures' in the figure) into the neural network model to obtain target detection label predicted values (namely the 'prediction result' in the figure);
(d5) Calculating a loss function value and a test set accuracy by using the real value of the target detection label (namely the 'test picture label' in the figure) and the predicted value of the target detection label (namely the 'prediction result' in the figure);
(d6) Judging whether the test set accuracy is greater than the maximum accuracy R (R is initially 0%); if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;
(d7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, reducing the learning rate (the specific adjustment is determined empirically, for example, reducing it to one tenth of the previous learning rate) and returning to step (d1);
(d8) Judging whether the maximum training round has been reached; if so, ending and outputting the trained target detection model; otherwise, returning to step (d1).
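The control flow of steps (d1) to (d8) can be sketched as a loop skeleton. The three callables stand in for the actual training, evaluation and convergence test, which are not specified here, and their names are hypothetical. One deliberate deviation is an assumption of this sketch: the loop is bounded by the maximum training round in both branches of (d7), whereas the text as written returns from (d7) to (d1) without passing through the check in (d8).

```python
def train(model_step, evaluate, has_converged, max_rounds, learning_rate, decay=0.1):
    """Skeleton of steps (d1)-(d8); returns (best accuracy, best round, final rate)."""
    best_accuracy = 0.0   # maximum accuracy R, initially 0%
    best_round = None     # stands in for the saved neural network model
    for round_index in range(1, max_rounds + 1):   # (d8) maximum training round
        model_step(learning_rate)                  # (d1)-(d3) forward, loss, update
        accuracy = evaluate()                      # (d4)-(d5) test set accuracy
        if accuracy > best_accuracy:               # (d6) keep the best model so far
            best_accuracy = accuracy
            best_round = round_index
        if not has_converged():                    # (d7) not converged: lower the rate
            learning_rate *= decay
    return best_accuracy, best_round, learning_rate


# Hypothetical stand-ins: fixed accuracies and convergence flags for three rounds.
accuracies = iter([0.5, 0.7, 0.6])
converged = iter([False, True, True])
print(train(lambda lr: None, lambda: next(accuracies), lambda: next(converged), 3, 0.00111))
```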
In step (d1), one stage of the data enhancement is region elastic deformation; as shown in FIG. 4, the specific steps are as follows:
(i) The range of the area ratio of the rectangular frame to the image is set to (r_min, r_max), and the range of the aspect ratio of the rectangular frame is set to (a_min, a_max);
(ii) Randomly selecting a coordinate point (x, y) in the image, randomly selecting an area ratio r_i within the range (r_min, r_max), and randomly selecting an aspect ratio a_i within the range (a_min, a_max);
(iii) Calculating the length and width of the rectangular frame according to r_i, a_i and the image area, and determining the rectangular frame by taking (x, y) as its center point;
(iv) Elastically deforming the image within the rectangular frame;
(v) Obtaining the image area containing the target according to the target detection label, and repeating steps (i) to (iv) for that image area.
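Steps (i) to (iii) can be sketched as follows. The sketch assumes that the aspect ratio means width divided by height, so that a frame with area ratio r_i and aspect ratio a_i inside a region of area S has width sqrt(r_i · S · a_i) and height sqrt(r_i · S / a_i), and that a frame extending beyond the region is clipped to it. Neither convention is stated in the text. The same function serves step (v) when the region passed in is the image area containing the target. All names and sample ranges are hypothetical.

```python
import math
import random


def rectangle_size(r, a, region_w, region_h):
    """Width and height of a frame with area ratio r and aspect ratio a (w / h)."""
    area = r * region_w * region_h
    return math.sqrt(area * a), math.sqrt(area / a)


def sample_rectangle(region, r_range, a_range, rng=random):
    """Sample a frame inside region = (x0, y0, x1, y1), following steps (i)-(iii)."""
    rx0, ry0, rx1, ry1 = region
    x = rng.uniform(rx0, rx1)          # (ii) random centre point
    y = rng.uniform(ry0, ry1)
    r_i = rng.uniform(*r_range)        # (ii) random area ratio
    a_i = rng.uniform(*a_range)        # (ii) random aspect ratio
    w, h = rectangle_size(r_i, a_i, rx1 - rx0, ry1 - ry0)   # (iii)
    # Clip the frame to the region (an assumption of this sketch).
    return (max(rx0, x - w / 2), max(ry0, y - h / 2),
            min(rx1, x + w / 2), min(ry1, y + h / 2))


# Hypothetical ranges applied to a 100 x 100 image.
print(sample_rectangle((0, 0, 100, 100), (0.02, 0.4), (0.3, 3.3), random.Random(0)))
```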
The mathematical principle of elastic deformation is based on bilinear interpolation, as shown in FIG. 5.
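A minimal sketch of how bilinear interpolation can drive such a deformation: each output pixel inside the rectangular frame is read from a displaced, generally non-integer position in the source image, and its value is interpolated from the four surrounding pixels. The displacement function and the tiny grayscale image are hypothetical. How the displacement field is generated is not described in this passage, so it is left as a caller-supplied function.

```python
import math


def bilinear_sample(img, x, y):
    """Interpolate a grayscale image (list of rows) at a real-valued position."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)      # clamp to the image bounds
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp_region(img, box, displacement):
    """Deform only the pixels inside box = (x0, y0, x1, y1), integer bounds."""
    out = [row[:] for row in img]
    x0, y0, x1, y1 = box
    for y in range(y0, y1):
        for x in range(x0, x1):
            dx, dy = displacement(x, y)    # caller-supplied displacement field
            out[y][x] = bilinear_sample(img, x + dx, y + dy)
    return out


# Hypothetical 2 x 2 image; the centre value is the mean of the four pixels.
image = [[0, 10], [20, 30]]
print(bilinear_sample(image, 0.5, 0.5))
```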
The application of the Yolo-based target detection model training method described above is as follows: after a sample set to be tested is obtained, the sample set is input into the trained target detection model, and Yolo-format prediction labels of the sample set to be tested are output. The sample set to be tested is obtained as follows: collecting chip appearance defect pictures to be detected and judging whether they contain the small target; if so, backing up the chip appearance defect pictures to be detected, dividing each picture into 64 sub-pictures, and extracting the sub-pictures to obtain the sample set to be tested, which is backed up at the same time; otherwise, directly extracting the chip appearance defect pictures to be detected to obtain the sample set to be tested, and carrying out backup at the same time.
The device adopting the Yolo-based target detection model training method comprises a data set labeling unit, a data set segmentation and preprocessing unit, a parameter tuning unit, a neural network architecture searching unit, a training unit, a memory, a processor and a computer program which is stored on the memory and can run on the processor; a schematic diagram of part of the structure of the device is shown in FIG. 2;
the data set labeling unit is used for labeling the chip appearance defect pictures to obtain a labeled data set;
the data set segmentation and preprocessing unit is used for preprocessing the labeled data set to obtain a training set and a test set;
the parameter tuning unit is used for loading the training set and the test set and setting corresponding training parameters;
the neural network architecture searching unit is used for constructing a meta structure searching space according to the training set, searching the neural network architecture and obtaining a neural network model;
the training unit is used for training the neural network model to obtain a trained target detection model;
the flow of the computer program is as follows:
(S1) labeling the chip appearance defect pictures to obtain a labeled data set;
(S2) preprocessing the labeled data set to obtain a training set and a test set;
(S3) loading the training set and the test set, and setting corresponding training parameters;
(S4) constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
(S5) training the neural network model to obtain a trained target detection model.
The Yolo-based target detection model training method and its application are now explained with a specific case. The task is to detect scratch defects in chip appearance, and the steps are as follows:
(a) Acquiring a training set and a test set;
predefining a chip appearance defect class file whose content, 'hushang', is the class label of the chip appearance defect; firstly, collecting a batch of chip appearance defect pictures and opening the graphic image annotation tool labelImg for labeling, wherein the labeled content is scratch defects and the labeling form is a rectangular frame; during labeling, each rectangular frame should tightly enclose the scratch; labeling all pictures containing scratches to obtain an initial data set;
the original chip appearance defect pictures are 6464 × 4852 in size and the scratch defects are smaller than 600 × 480, which satisfies the small-target condition, so the pictures and the labels are divided; each chip appearance defect picture is divided into 64 sub-pictures, and each scratch label is divided into sub-labels according to the rule used to divide the corresponding picture; then, the sub-pictures containing scratches and the corresponding sub-labels are taken out to obtain a small-target chip appearance defect data set, namely a chip appearance scratch data set;
splitting the small-target chip appearance defect data set in a ratio of 9:1 to obtain a training set and a test set;
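The 9:1 random split can be sketched with the standard library alone. The fixed seed and the function name are hypothetical choices made for reproducibility. Applying the same function a second time to the resulting training set gives the training and validation data sets used for the architecture search in step (c).

```python
import random


def split_9_to_1(samples, seed=0):
    """Randomly split samples into two parts with a 9:1 data quantity ratio."""
    items = list(samples)
    random.Random(seed).shuffle(items)   # random sampling without replacement
    cut = (len(items) * 9) // 10         # integer arithmetic avoids rounding issues
    return items[:cut], items[cut:]


# Hypothetical data set of 100 sub-picture identifiers.
train_set, test_set = split_9_to_1(range(100))
d_train, d_val = split_9_to_1(train_set)
print(len(train_set), len(test_set), len(d_train), len(d_val))
```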
(b) Loading a training set and a test set, and setting corresponding training parameters;
setting the predefined number of chip appearance defect categories n to 1, the maximum training round to 2000, the learning rate to 0.00111, the picture input width to 512, the picture input height to 512 and filters to 18;
(c) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
dividing the training set obtained above in a ratio of 9:1 into a training data set D_train and a validation data set D_val, constructing a meta-structure search space under the overall Yolo architecture, building a differentiable neural network architecture search model, and training the differentiable neural network architecture search model with the training data set D_train;
in the training process, the structure weights of the meta structures are globally normalized, and then the structure weights of the meta structures and the network parameters are subjected to bi-level optimization, namely, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta structures are adjusted simultaneously through a back propagation algorithm;
after training is finished, the meta structures are ranked by structure weight, and the meta structure with the largest weight is retained to form a deep neural network model, thereby obtaining the neural network model;
(d) Training the neural network model to obtain a trained target detection model, and specifically comprising the following steps:
(d1) Firstly, the data enhancement methods of Cutmix, mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and brightness contrast random transformation are sequentially applied to the training set to obtain enhanced training set pictures and the corresponding scratch labels; then, the data enhancement methods of flipping, random scaling and brightness contrast random transformation are applied to the test set data to obtain enhanced test set pictures and the corresponding scratch labels; finally, the enhanced training set chip appearance defect pictures are input into the neural network to obtain scratch label predicted values;
(d2) Calculating a loss function value by using the real value of the scratch label and the predicted value of the scratch label;
(d3) Updating the model parameters by using the loss function values;
(d4) Inputting the chip appearance scratch pictures in the test set into a neural network model to obtain a scratch predicted value of the test set;
(d5) Calculating a loss function value and a test set accuracy by using the real value of the scratch label of the test set and the predicted value of the scratch label of the test set;
(d6) Judging whether the accuracy of the test set is greater than the maximum accuracy R (the initial assignment of R is 0%), if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;
(d7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, reducing the learning rate (the specific adjustment is determined empirically, for example, reducing it to one tenth of the previous learning rate) and returning to step (d1);
(d8) Judging whether the maximum training round of 2000 has been reached; if so, finishing the training and outputting the trained target detection model; otherwise, returning to step (d1);
(e) Outputting a label of an appearance defect picture of the chip to be detected;
collecting a chip appearance defect picture to be detected, and dividing the chip appearance defect picture into 64 sub-pictures to obtain a divided chip appearance defect picture; extracting the segmented chip appearance defect picture to obtain a sample set to be detected, and simultaneously carrying out backup;
inputting a sample set to be tested into the trained neural network model to obtain a scratch prediction label of the sample set to be tested;
there are 100 chip appearance defect pictures to be detected, containing 110 scratch defects, of which 90 are small targets; the samples to be detected are input into the model of the invention and into the prior-art model (Faster R-CNN) to obtain detection results; the model of the invention processes the 100 pictures and detects 102 scratch defects, of which 85 are small targets, at a detection speed of 35 pictures per second; the prior-art model (Faster R-CNN) detects 93 scratch defects, of which 79 are small targets, at a detection speed of 5 pictures per second.
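The counts reported above can be turned into detection rates with a few lines of arithmetic. The sketch assumes that every reported detection corresponds to a real scratch, so that detected divided by present can be read as a detection rate; the text does not report false detections. The dictionary keys are hypothetical names.

```python
def detection_rate(detected, present):
    """Fraction of the defects present that were detected."""
    return detected / present


# Counts and speeds as reported for the 100 test pictures.
results = {
    "invention": {"all": (102, 110), "small": (85, 90), "pictures_per_second": 35},
    "prior_art": {"all": (93, 110), "small": (79, 90), "pictures_per_second": 5},
}

for name, r in results.items():
    print(name,
          f"all defects: {detection_rate(*r['all']):.1%}",
          f"small targets: {detection_rate(*r['small']):.1%}")

speedup = results["invention"]["pictures_per_second"] / results["prior_art"]["pictures_per_second"]
print("speed ratio:", speedup)
```

Under that assumption the model of the invention finds about 92.7% of all scratches and 94.4% of the small-target scratches, against about 84.5% and 87.8% for the prior-art model, at 7 times the detection speed.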
In addition to chip datasets, the present invention works well on other common datasets as well. On a PASCAL VOC 2007 target detection data set, by using the method, the AP50 index is improved by 0.5 percent compared with a basic model. In addition, the region elastic deformation algorithm in the invention can also be applied to an image classification task, and the actual effect is as shown in fig. 6. On a CIFAR-10 data set, the accuracy of the original ResNet18 model is 94.92%, and after the regional elastic deformation algorithm is adopted, the accuracy is 95.86%. On a CIFAR-100 data set, the accuracy of an original ResNet50 model is 80.60%, and after a region elastic deformation algorithm is adopted, the accuracy is 81.68%.
The advantages of the method are as follows: compared with the original Yolo network, the method retains the high detection speed of the Yolo network model, achieves good accuracy on small targets, and can adjust the structure of the neural network model according to the data set; compared with the Faster R-CNN method, it has a higher detection speed and is simpler to operate; in terms of data enhancement, the invention provides a region elastic deformation data enhancement method, which simulates the local elastic deformation of real-world objects, increases the diversity of samples and improves the robustness of the model; in terms of standardization, compared with other methods that only improve the model structure, the method links data set construction with the model training process for varying data sets, reduces the complexity of data preprocessing, and provides a more standardized and complete model training method.

Claims (10)

1. A method for training a target detection model based on Yolo is characterized by comprising the following steps:
(a) Loading a training set and a test set, and setting corresponding training parameters;
the training set comprises a chip appearance defect picture and a target detection label of a Yolo format of the chip appearance defect picture;
the training parameters comprise the predefined number of chip appearance defect categories, the maximum training round, the learning rate, the picture input width, the picture input height and filters, wherein the predefined number of chip appearance defect categories is set to n, with n ≥ 1, the maximum training round is set to be greater than or equal to 2000 × n, the learning rate is set to 0.00111, the picture input width is set to 512, the picture input height is set to 512, and filters is set to (n + 5) × 3;
(b) Constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
dividing the training set in step (a) into a training data set D_train and a validation data set D_val, constructing a meta-structure search space, building a differentiable neural network architecture search model, and training the differentiable neural network architecture search model with the training data set D_train;
in the training process, the structure weights of the meta structures are globally normalized, and then the structure weights of the meta structures and the network parameters are subjected to bi-level optimization, namely, the loss value on the validation data set D_val is used as the objective function of the optimization process, and the network parameters and the structure weights of the meta structures are adjusted simultaneously through a back propagation algorithm;
after training is finished, the meta structures are ranked by structure weight, and the meta structure with the largest weight is retained to form a deep neural network model, thereby obtaining the neural network model;
(c) Training the neural network model to obtain a trained target detection model;
after data enhancement is carried out on the training set in step (a), the model parameters of the neural network model are optimized according to the data-enhanced training set, the test set and the training parameters to obtain a trained target detection model.
2. The method for training a Yolo-based target detection model according to claim 1, wherein the specific process of step (c) is as follows:
(c1) After data enhancement is carried out on the training set in the step (a), chip appearance defect pictures in the training set are input into a neural network model, and a target detection label prediction value is obtained;
(c2) Calculating a loss function value by using the real value of the target detection label and the predicted value of the target detection label;
(c3) Updating the model parameters by using the loss function values;
(c4) Inputting the chip appearance defect pictures in the test set into a neural network model to obtain a target detection label predicted value;
(c5) Calculating a loss function value and a test set accuracy by using the real value of the target detection label and the predicted value of the target detection label;
(c6) Judging whether the accuracy of the test set is greater than the maximum accuracy R, if so, saving the neural network model, updating R, and entering the next step; otherwise, directly entering the next step;
(c7) Judging whether the neural network model has converged; if so, entering the next step; otherwise, reducing the learning rate and returning to step (c1);
(c8) Judging whether the maximum training round has been reached; if so, ending and outputting the trained target detection model; otherwise, returning to step (c1).
3. The Yolo-based target detection model training method according to claim 1, wherein the data quantity ratio of the training data set D_train to the validation data set D_val is 9:1.
4. The Yolo-based target detection model training method as claimed in claim 1, wherein if the training set contains small targets, the data enhancement process comprises sequentially performing Cutmix, mosaic data enhancement, class label smoothing, instance-level random copy-paste, region elastic deformation, flipping, random scaling and brightness contrast random transformation; otherwise, the data enhancement process comprises sequentially performing Cutmix, mosaic data enhancement, class label smoothing, region elastic deformation, flipping and brightness contrast random transformation;
the small target is an appearance defect with the ratio of the width and the height of the bounding box to the width and the height of the image being less than 0.1, or an appearance defect with the resolution being less than 32 pixels multiplied by 32 pixels.
5. The method for training the Yolo-based target detection model according to claim 4, wherein the specific steps of elastic deformation of the region are as follows:
(i) The range of the area ratio of the rectangular frame to the image is set to (r_min, r_max), and the range of the aspect ratio of the rectangular frame is set to (a_min, a_max);
(ii) Randomly selecting a coordinate point (x, y) in the image, randomly selecting an area ratio r_i within the range (r_min, r_max), and randomly selecting an aspect ratio a_i within the range (a_min, a_max);
(iii) Calculating the length and width of the rectangular frame according to r_i, a_i and the image area, and determining the rectangular frame by taking (x, y) as its center point;
(iv) Elastically deforming the image within the rectangular frame;
(v) Obtaining the image area containing the target according to the target detection label, and repeating steps (i) to (iv) for that image area.
6. The method for training a Yolo-based target detection model according to claim 4, wherein the training set and the test set are obtained by the following steps:
(i) Marking the picture with the appearance defects of the chip to obtain a marked data set;
predefining the classes of the chip appearance defects to obtain the configuration of the classes of the predefined chip appearance defects; calling a graphic image annotation tool labelImg to label a rectangular frame to obtain a Yolo-format target detection label of the chip appearance defect picture, and finally obtaining a chip appearance defect data set consisting of the chip appearance defect picture and the target detection label, namely obtaining a labeled data set;
(ii) Preprocessing the labeled data set to obtain a training set and a test set;
judging whether the chip appearance defect data set contains the small target; if so, backing up the chip appearance defect data set, dividing each chip appearance defect picture into 64 sub-pictures, simultaneously dividing each target detection label into sub-labels according to the rule used to divide the corresponding chip appearance defect picture, extracting the sub-pictures and the sub-labels to obtain a divided chip appearance defect data set, and splitting the divided chip appearance defect data set to obtain a training set and a test set; otherwise, directly extracting the chip appearance defect pictures and the target detection labels from the chip appearance defect data set, splitting them to obtain a training set and a test set, and carrying out backup at the same time; in both cases, the copy function used in the backup process is rewritten in C language and is called by the preprocessing module as a DLL (dynamic-link library) function.
7. The method as claimed in claim 6, wherein the segmentation adopts a random sampling method, and the ratio of the data quantity of the training set to the data quantity of the testing set is 9:1; and the speed of the segmentation process is increased by adopting a multithreading method.
8. The application of the Yolo-based target detection model training method as claimed in claim 6 or 7, wherein after a sample set to be tested is obtained, the sample set is input into the trained target detection model, and a prediction label in the Yolo format of the sample set to be tested is output;
the acquisition process of the sample set to be detected comprises the following steps: collecting an appearance defect picture of a chip to be detected, judging whether the appearance defect picture of the chip to be detected contains the small target or not, if so, backing up the appearance defect picture of the chip to be detected, dividing the appearance defect picture of the chip to be detected into 64 sub-pictures, extracting the sub-pictures to obtain a sample set to be detected, and backing up the sample set at the same time; otherwise, directly extracting the picture of the appearance defect of the chip to be detected to obtain a sample set to be detected, and simultaneously carrying out backup.
9. The apparatus for the Yolo-based target detection model training method as claimed in claim 6 or 7, comprising:
the data set marking unit is used for marking the chip appearance defect picture to obtain a marked data set;
the data set segmentation and preprocessing unit is used for preprocessing the labeled data set to obtain a training set and a test set;
the parameter tuning unit is used for loading the training set and the test set and setting corresponding training parameters;
the neural network architecture searching unit is used for constructing a meta structure searching space according to the training set, searching the neural network architecture and obtaining a neural network model;
and the training unit is used for training the neural network model to obtain a trained target detection model.
10. The apparatus of claim 9, further comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the computer program having a flow chart as follows:
(S1) marking the picture with the appearance defects of the chip to obtain a marked data set;
(S2) preprocessing the labeled data set to obtain a training set and a test set;
(S3) loading a training set and a test set, and setting corresponding training parameters;
(S4) constructing a meta-structure search space according to the training set, and searching a neural network architecture to obtain a neural network model;
and (S5) training the neural network model to obtain a trained target detection model.
CN202210959911.1A 2022-08-11 2022-08-11 Yolo-based target detection model training method and application and device thereof Pending CN115527089A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210959911.1A CN115527089A (en) 2022-08-11 2022-08-11 Yolo-based target detection model training method and application and device thereof

Publications (1)

Publication Number Publication Date
CN115527089A true CN115527089A (en) 2022-12-27

Family

ID=84696164

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210959911.1A Pending CN115527089A (en) 2022-08-11 2022-08-11 Yolo-based target detection model training method and application and device thereof

Country Status (1)

Country Link
CN (1) CN115527089A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984274A (en) * 2023-03-20 2023-04-18 菲特(天津)检测技术有限公司 Vehicle appearance detection model, construction method and detection method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984274A (en) * 2023-03-20 2023-04-18 菲特(天津)检测技术有限公司 Vehicle appearance detection model, construction method and detection method
CN115984274B (en) * 2023-03-20 2023-05-30 菲特(天津)检测技术有限公司 Vehicle appearance detection model, construction method and detection method

CN116665054A (en) Remote sensing image small target detection method based on improved YOLOv3
CN115424280A (en) Handwritten digit detection method based on improved Faster-RCNN
CN111259974B (en) Surface defect positioning and classifying method for small-sample flexible IC substrate

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination