CN117593264A - Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation - Google Patents


Publication number
CN117593264A
Authority
CN
China
Prior art keywords
model
loss
yolov5
training
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311544930.9A
Other languages
Chinese (zh)
Inventor
金晶
陈铎
何旭杰
冯怡园
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Application filed by Harbin Institute of Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an improved method for detecting defects on the inner wall of an automobile engine cylinder bore by combining YOLOv5 with knowledge distillation. The method comprises the following steps: enhancing the images; constructing and training a teacher model using the YOLOv5 model; and, based on the teacher model, training a YOLOv5 student model on the newly added defect classes and samples through improved knowledge distillation. By combining the YOLOv5 object detection algorithm with an improved knowledge distillation training scheme, the invention improves the ability of a model to recognize old samples after it has been trained using only the newly added cylinder bore inner wall defect classes and defect samples.

Description

Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation
Technical Field
The invention relates to the technical field of image defect detection of an inner wall of an engine cylinder hole, in particular to an improved detection method of the inner wall of the cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation.
Background
During engine production, workpiece quality is easily affected by factors such as the manufacturing process and working conditions. Surface defects are the most direct manifestation of degraded engine product quality. To ensure product yield, the inner wall surface of the engine cylinder bore must be inspected for defects. Defect detection means detecting defects such as scratches, foreign-matter occlusion, color contamination, and holes on the surface of the sample under inspection, and obtaining relevant information such as defect type and position.
Conventional surface defect detection methods played an important role for a long time and are mainly divided into texture-feature-based, color-feature-based, and shape-feature-based methods. Texture features reflect the arrangement characteristics of a surface through the gray-level distribution of pixels and their neighborhoods; such methods include statistical methods, signal processing methods, and the like. Color-feature-based methods require relatively little computation, are robust, and do not depend on image viewing angle, orientation, and the like. Contour-based methods are the main representatives of shape-based methods; they obtain shape parameters of an image by describing the outer boundary features of an object, for example with the Hough transform. Conventional solutions focus on features designed by hand for a particular problem, but in complex cases such features are difficult to describe accurately. Deep learning, by contrast, learns from data and converts it into abstract feature representations, so that the system learns the relevant features by itself and no longer needs complex features designed for specific defects. Deep learning defect detection methods are roughly divided into supervised, unsupervised, and semi-supervised approaches. Supervised learning requires the training samples to be labeled so that the model can find the underlying patterns during training and generalize to the test set. Common supervised methods are broadly divided into defect classification networks, defect detection networks, and defect segmentation networks.
Unsupervised learning methods take only unlabeled data as input, learn the intrinsic characteristics of the data through a network, and judge new data according to the learned model; common methods include autoencoders and generative adversarial networks. Semi-supervised methods combine the characteristics of supervised and unsupervised methods, achieve better performance under specific conditions, and avoid high labeling costs.
The rapid development of deep-learning-based defect detection has led to its wide application in detecting defects of the inner wall of engine cylinder bores. In real scenarios, however, new samples and even new defect classes appear continuously. Because of factors such as illumination and differences between cylinder bore batches, the new classes and new samples differ to some extent from the old samples. When existing engine defect detection algorithms encounter this situation, the whole model must be retrained on both the new and the old samples, which consumes a large amount of time, whereas training only on the new samples causes catastrophic forgetting of the old samples. To address these problems, the invention provides an improved method for detecting the inner wall of an automobile engine cylinder bore by combining YOLOv5 with knowledge distillation. The method retains the detection capability for old samples while learning new classes and new samples, reduces the catastrophic forgetting of old samples caused by existing algorithms, and provides a new option for defect detection tasks.
Disclosure of Invention
The invention aims to solve two problems in the field of engine cylinder bore inner wall defect detection: retraining on the new classes and new samples together with the old samples consumes too much time, and training only on the new classes and new samples causes catastrophic forgetting of the old samples. To this end, it provides an improved method for detecting the inner wall of an automobile engine cylinder bore by combining YOLOv5 with knowledge distillation.
The invention is realized by the following technical scheme, and provides an improved detection method for the inner wall of a cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation, which comprises the following steps:
S1: performing data enhancement on the pictures in the engine cylinder bore inner wall image dataset, wherein the data enhancement comprises HSV enhancement, image translation, image scaling, left-right horizontal flipping, and Mosaic data enhancement;
S2: constructing and training a teacher model with the YOLOv5 model as the framework;
S3: constructing a student model with the YOLOv5 model as the framework and, based on the trained teacher model, training the student model by knowledge distillation using an engine cylinder bore inner wall image dataset composed of the new defect classes and new samples;
S4: using the student model obtained through knowledge distillation training to detect defect samples of both the new and the old classes.
Further, the step S1 specifically includes:
S11: HSV enhancement: randomly adjusting the hue, saturation, and value (brightness) of the original image to obtain different sub-images;
S12: image translation, scaling, and flipping: shifting the original image in the horizontal or vertical direction and enlarging or reducing it, filling in the borders that are missing relative to the original image and cropping the regions that extend beyond it, while keeping the aspect ratio of the scaled image consistent with that of the original image; and flipping the original image in the horizontal and vertical directions to obtain sub-images;
S13: Mosaic data enhancement: taking four pictures from the dataset, applying random flipping, scaling, and cropping to them, and then combining the four processed pictures into one image to obtain a sub-image.
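As an illustration of steps S11 and S12, the following is a minimal sketch of HSV jitter and a left-right flip in plain Python. The function names, the gain values, and the image representation (nested lists of RGB tuples in [0, 1], with boxes as normalized center/width/height tuples) are assumptions chosen for the example, not details given in the patent text.

```python
import colorsys
import random


def hsv_jitter(image, h_gain=0.015, s_gain=0.7, v_gain=0.4, rng=None):
    """Randomly scale hue, saturation and value of an RGB image.

    `image` is a list of rows, each row a list of (r, g, b) floats in [0, 1].
    The gain defaults are illustrative assumptions, not values from the text.
    """
    rng = rng or random.Random()
    # One random multiplier per HSV channel, shared by every pixel.
    h_mul = 1 + rng.uniform(-1, 1) * h_gain
    s_mul = 1 + rng.uniform(-1, 1) * s_gain
    v_mul = 1 + rng.uniform(-1, 1) * v_gain
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h * h_mul) % 1.0              # hue wraps around
            s = min(max(s * s_mul, 0.0), 1.0)  # clamp saturation
            v = min(max(v * v_mul, 0.0), 1.0)  # clamp brightness
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def flip_horizontal(image, boxes):
    """Mirror the image left-right and update (xc, yc, w, h) normalized boxes."""
    flipped = [list(reversed(row)) for row in image]
    new_boxes = [(1.0 - xc, yc, w, h) for xc, yc, w, h in boxes]
    return flipped, new_boxes
```

Note that a geometric transform such as the flip must update the box labels as well as the pixels, whereas the HSV jitter leaves the labels unchanged.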
Further, in step S2, adaptive anchor box calculation is first performed according to the characteristics of the dataset to obtain three anchor box ratios at each of the three scales (large, medium, and small) for subsequent training, and adaptive image scaling is then applied to the sample images to suit the detection of targets of different sizes.
Further, the teacher model adopts the YOLOv5_small structure. The backbone network comprises the slicing structure Focus, the convolution module Conv, the bottleneck layer C3, and spatial pyramid pooling SPP. The input image is downsampled repeatedly by the backbone to extract features at several scales; the features are then fused top-down by the feature pyramid network (FPN) structure, and feature information at different scales is fused bottom-up by the path aggregation network (PAN) structure; finally, the prediction head makes predictions for small, medium, and large targets on feature maps of 80×80, 40×40, and 20×20, respectively.
Further, in step S2, the teacher model is trained using the YOLOv5 loss function, which consists of three parts: the bounding box loss $loss_{box}$, the confidence loss $loss_{obj}$, and the classification loss $loss_{cls}$. The loss function is the weighted sum of the three parts:
$Loss = a \times loss_{box} + b \times loss_{obj} + c \times loss_{cls}$ (1)
The bounding box loss is calculated by the CIOU loss, with the formula:
$loss_{CIOU} = 1 - CIOU$ (6)
where $S_1$ and $S_2$ are the intersection area and the union area of the prediction box and the ground-truth box, respectively; $\rho$ is the distance between the center of the prediction box and the center of the ground-truth box; $c$ is the diagonal length of the smallest rectangle enclosing the prediction box and the ground-truth box; $v$ is the aspect-ratio similarity of the prediction box and the ground-truth box; $\alpha$ is an influence factor; $w_{gt}$ and $h_{gt}$ are the width and height of the ground-truth box; and $w_p$ and $h_p$ are the width and height of the prediction box;
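Formulas (2) to (5) are not reproduced in this text. The symbols defined above match the commonly used CIoU formulation, so a plausible reconstruction is given below. The order and grouping of the intermediate formulas are assumptions.

```latex
\begin{aligned}
IoU    &= \frac{S_1}{S_2} \\
v      &= \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\right)^2 \\
\alpha &= \frac{v}{(1 - IoU) + v} \\
CIOU   &= IoU - \frac{\rho^2}{c^2} - \alpha v
\end{aligned}
```

With these definitions, $loss_{CIOU}$ is 0 for a perfect match and exceeds 1 when the boxes do not overlap, because the center-distance term still contributes.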
The confidence loss and the classification loss are calculated by the BCE loss. Taking the 80×80 feature map as an example, YOLOv5 predicts three rectangular boxes near each grid cell, which gives the value ranges of z, x, and y below. With the confidence label matrix denoted L and the predicted confidence matrix denoted P, the BCE loss of each element is $loss_{BCE}(z, x, y)$:
$loss_{BCE}(z,x,y) = -L(z,x,y) \log P(z,x,y) - (1 - L(z,x,y)) \log(1 - P(z,x,y))$ (7)
$0 \le z < 3,\ 0 \le x < 80,\ 0 \le y < 80$
The model is trained with samples of the old defect classes to obtain the teacher model, which serves as the basis for distilling the new defect classes and new defect samples that appear later.
Further, in step S3, the teacher model is used to distill the student model as follows. First, the student model is built according to the YOLOv5 framework and the teacher model parameters are frozen. An input picture is passed through the teacher model and the student model separately, up to and including the last detection layer. The elements of the output vector are, in order: the x-coordinate of the target box center, the y-coordinate of the target box center, the width of the target box, the height of the target box, the foreground probability, and the probability of belonging to each class. Because the student model is trained to recognize both the newly added cylinder bore inner wall defect classes and the old defect classes, its output has more dimensions than the teacher model's output, and the number of extra dimensions equals the number of newly added classes. The output of the student model is therefore sliced to take out the part whose dimensions and meaning correspond to the teacher model's output. The teacher output vector and the sliced student output vector are each processed by a modified softmax function; the teacher result serves as the soft label and the student result as the soft prediction. The weighted sum of the KL loss function and the L2 loss function is used as the distillation loss, so that the teacher model's output guides the training of the student model, the student model's outputs on the old classes stay close to the teacher model's, and the student model retains its ability to recognize the old classes. The L2 loss function formula is:
The KL loss function formula is:
where $y_i$ is the soft label and $f(x_i)$ is the soft prediction; the distillation loss L is:
$L = 0.1\,KL(y_i \,\|\, f(x_i)) + 0.9\,L2$ (10).
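The L2 and KL formulas, presumably numbered (8) and (9), are not reproduced in this text. The widely used forms of the two losses, written with the soft label $y_i$ and soft prediction $f(x_i)$ over the $n$ elements of the sliced output, are sketched below. The normalization (mean rather than sum) and the index range are assumptions, not statements from the patent.

```latex
\begin{aligned}
L2 &= \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - f(x_i)\bigr)^2 \\
KL\bigl(y \,\|\, f(x)\bigr) &= \sum_{i=1}^{n} y_i \log\frac{y_i}{f(x_i)}
\end{aligned}
```

Both terms are zero when the student's soft prediction equals the teacher's soft label, which is the condition the distillation loss drives toward on the old classes.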
Further, in step S3, the output of the student model is passed through a softmax function to obtain the hard prediction, and the YOLOv5 loss function is computed between the hard prediction and the corresponding hard labels annotated in advance in the dataset, so that the student model acquires the ability to recognize samples of the newly added defect classes;
thus, the total loss function for training the student model is:
$Loss_{total} = Loss + \lambda \times L$ (11)
where $\lambda$ is a weight parameter;
the hard prediction of the student model is obtained by the softmax function:
$q_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$ (12)
where $q_i$ is the output probability of each class and $z_j$ is the fully connected layer output for each class.
Further, a modified softmax function is introduced for smoothing by adding a parameter T:
$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$ (13)
After T is added, the probability values become closer to one another, which reduces as far as possible the problems caused by excessively large differences between the values.
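The smoothing effect of the parameter T can be seen in a small numerical sketch. The function name and the example logits are hypothetical; the computation is the softmax with each logit divided by T, and the smoothing applies when T is greater than 1.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Return exp(z_i / T) / sum_j exp(z_j / T) for a list of logits."""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [4.0, 1.0, 0.5]                         # hypothetical class scores
hard = softmax_with_temperature(logits, T=1.0)   # ordinary softmax
soft = softmax_with_temperature(logits, T=4.0)   # smoothed distribution
```

With T = 4 the largest probability shrinks and the smaller ones grow, so the low-probability classes carry more weight in the distillation loss than they would under the ordinary softmax.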
The invention provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the improved method for detecting the inner wall of the cylinder hole of the automobile engine by combining YOLOv5 with knowledge distillation when executing the computer program.
The present invention proposes a computer readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the improved method for detecting an inner wall of a cylinder bore of an automotive engine in combination with YOLOv5 and knowledge distillation.
The invention has the beneficial effects that:
The invention is applied to the field of engine cylinder bore inner wall defect detection and provides an improved method for detecting defects of the inner wall of an automobile engine cylinder bore that combines YOLOv5 with knowledge distillation. It improves the ability of an existing model to recognize old samples after the model has been trained using only the newly added cylinder bore inner wall defect classes and defect samples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for improved detection of defects in the inner wall of a cylinder bore of an automotive engine combining YOLOv5 with knowledge distillation;
FIG. 2 is a diagram of the structure of a teacher model and a student model in a distillation method;
FIG. 3 is a framework diagram of the knowledge distillation method.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides an improved method for detecting defects of the inner wall of an automobile engine cylinder bore by combining the YOLOv5_small model with knowledge distillation. In practical cylinder bore inner wall inspection, new samples and even new classes appear continuously, and they differ to some extent from the old samples. To give a model the ability to detect the new classes and new samples, one option is to train the existing model directly on them, but this damages its ability to detect the old samples; the other is to retrain the existing model on all samples, but the time cost is too high. Knowledge distillation, through an improved distillation loss design, can protect the knowledge learned by the original model, and can be used for model compression or incremental learning.
Referring to fig. 1, the invention provides an improved method for detecting the inner wall of a cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation, which comprises the following steps:
S1: performing data enhancement on the pictures in the engine cylinder bore inner wall image dataset, wherein the data enhancement comprises HSV (hue, saturation, and brightness) enhancement, image translation, image scaling, left-right horizontal flipping, and Mosaic data enhancement;
S2: constructing and training a teacher model with the YOLOv5 model as the framework;
S3: constructing a student model with the YOLOv5 model as the framework and, based on the trained teacher model, training the student model by knowledge distillation using an engine cylinder bore inner wall image dataset composed of the new defect classes and new samples;
S4: using the student model obtained through knowledge distillation training to detect defect samples of both the new and the old classes.
In the embodiment of the present invention, step S1 specifically includes:
S11: HSV enhancement: H represents hue, S represents saturation, and V represents value (brightness); the original image is randomly adjusted in hue, saturation, and brightness to obtain different sub-images;
S12: image translation, scaling, and flipping: shifting the original image in the horizontal or vertical direction and enlarging or reducing it, filling in the borders that are missing relative to the original image and cropping the regions that extend beyond it, while keeping the aspect ratio of the scaled image consistent with that of the original image; and flipping the original image in the horizontal and vertical directions to obtain sub-images;
S13: Mosaic data enhancement: taking four pictures from the dataset, applying random flipping, scaling, and cropping to them, and then combining the four processed pictures into one image to obtain a sub-image. Data enhancement increases the diversity of targets and the number of targets at different scales, which benefits model training. It also increases the number of samples in a single training step, enriches the background of the detected objects, and increases the number of small-scale targets, which benefits the detection accuracy of the subsequent model and the convergence speed of network training.
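The combination step of Mosaic enhancement in S13 can be sketched as follows. This is a deliberately simplified version under stated assumptions: the four images are already the same size, they are tiled on a fixed 2×2 grid, and the random flipping, scaling, and cropping described in the text are omitted. The function name and data layout are hypothetical.

```python
def mosaic4(images, boxes_per_image):
    """Tile four equally sized images into a 2x2 mosaic.

    images: four images, each a list of H rows of W pixels.
    boxes_per_image: for each image, a list of (xc, yc, w, h) boxes
    normalized to that image. Returns the mosaic and the boxes
    renormalized to the mosaic.
    """
    assert len(images) == 4 and len(boxes_per_image) == 4
    # Concatenate rows side by side, then stack the two halves vertically.
    top = [r0 + r1 for r0, r1 in zip(images[0], images[1])]
    bottom = [r2 + r3 for r2, r3 in zip(images[2], images[3])]
    canvas = top + bottom
    # Quadrant offsets in normalized mosaic coordinates: (x_offset, y_offset).
    offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]
    merged = []
    for (ox, oy), boxes in zip(offsets, boxes_per_image):
        for xc, yc, w, h in boxes:
            # Each source image occupies half of the mosaic in each direction.
            merged.append((ox + xc / 2, oy + yc / 2, w / 2, h / 2))
    return canvas, merged
```

Because every box is halved in size, a defect that was medium-sized in its source picture becomes a small target in the mosaic, which is one way the enhancement increases the number of small-scale targets.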
In step S2, adaptive anchor box calculation is first performed according to the characteristics of the dataset to obtain three anchor box ratios at each of the three scales (large, medium, and small) for subsequent training, and adaptive image scaling is then applied to the sample images to suit the detection of targets of different sizes.
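The text does not spell out the scaling rule. Assuming it refers to the letterbox-style resizing commonly used with YOLOv5 (scale the image to fit the target size while keeping its aspect ratio, then pad only up to a multiple of the network stride), the arithmetic can be sketched as below. The target size of 640 and the stride of 32 are assumptions, as is the function name.

```python
def letterbox_params(width, height, target=640, stride=32):
    """Compute resize scale, resized size and padding for adaptive scaling."""
    # Fit the longer side to the target while preserving the aspect ratio.
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Pad only up to the next multiple of the stride, not to a full square.
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return scale, (new_w, new_h), (pad_w, pad_h)


# A hypothetical 1280x720 picture: scaled to 640x360, padded to 640x384.
scale, resized, padding = letterbox_params(1280, 720)
```

Padding to the nearest stride multiple instead of to a square keeps the added border small, which reduces wasted computation at inference time.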
The teacher model adopts the YOLOv5_small structure. The backbone network comprises the slicing structure Focus, the convolution module Conv, the bottleneck layer C3, and spatial pyramid pooling SPP. The input image is downsampled repeatedly by the backbone to extract features at several scales; the features are then fused top-down by the feature pyramid network (FPN) structure, and feature information at different scales is fused bottom-up by the path aggregation network (PAN) structure; finally, the prediction head makes predictions for small, medium, and large targets on feature maps of 80×80, 40×40, and 20×20, respectively.
Referring to FIG. 2, the teacher model adopts the YOLOv5_small structure. An input picture first passes through the Focus structure, then through a convolution layer with batch normalization and the SiLU activation function (all subsequent convolution layers likewise have batch normalization and the SiLU activation function) and a C3 module. It then passes through two further stages, each consisting of a convolution layer followed by a C3 module, and the output is fed into a convolution layer and the SPP structure, followed by a C3 module and a convolution layer. The output is upsampled and concatenated with the output of the sixth layer through a concatenation layer; after a C3 module and a convolution layer it is upsampled again and concatenated with the output of the fourth layer through a concatenation layer, and the feature map for detecting small targets in multi-scale detection is then obtained through a C3 module and a convolution layer. This output is fed into a convolution layer and concatenated with the output of the fourteenth layer through a concatenation layer, and the feature map for detecting medium targets is obtained through a C3 module and a convolution layer. Similarly, that output is fed into a convolution layer and concatenated with the output of the tenth layer through a concatenation layer, and the feature map for detecting large targets is finally obtained through a C3 module and a convolution layer. In the last detection layer, the three feature maps of different scales are reshaped into the vector form corresponding to the hard labels for subsequent training and distillation.
The Focus structure slices the input, concatenates the four slices through a concatenation layer, and produces its output through a convolution layer. In the C3 structure, the input is processed along two branches, through several residual components and through a convolution layer respectively; the results are concatenated through a concatenation layer and finally output through a convolution layer. In the SPP structure, the input first passes through a convolution layer; max-pooling layers with kernel sizes of 5×5, 9×9, and 13×13 are then applied separately; the output of the convolution layer is concatenated with the three pooled results; and the concatenated result is finally output through a convolution layer.
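The slicing performed by the Focus structure is a space-to-depth rearrangement: every second pixel is taken at four different offsets, and the four half-resolution slices are stacked along the channel axis. The sketch below shows only this slicing step on nested lists, without the convolution that follows. The function name, the data layout, and the slice order (modeled on the public YOLOv5 implementation) are assumptions.

```python
def focus_slice(feature):
    """Space-to-depth slicing as used by the Focus structure.

    feature: list of C channels, each a list of H rows of W values
    (H and W even). Returns 4*C channels of size H/2 x W/2.
    """
    def take(channel, row_start, col_start):
        # Keep every second row and every second column from the offsets.
        return [row[col_start::2] for row in channel[row_start::2]]

    out = []
    # Four (row, column) offsets; each yields one half-resolution slice.
    for row_start, col_start in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        for channel in feature:
            out.append(take(channel, row_start, col_start))
    return out


# A single 4x4 channel with values 0..15 becomes four 2x2 channels.
x = [[[r * 4 + c for c in range(4)] for r in range(4)]]
y = focus_slice(x)
```

No pixel is discarded: the spatial resolution is halved while the channel count is multiplied by four, so the downsampling loses no information before the first convolution.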
In step S2, the teacher model is trained using the YOLOv5 loss function, which consists of three parts: the bounding box loss $loss_{box}$, the confidence loss $loss_{obj}$, and the classification loss $loss_{cls}$. The loss function is the weighted sum of the three parts:
$Loss = a \times loss_{box} + b \times loss_{obj} + c \times loss_{cls}$ (1)
The bounding box loss is calculated by the CIOU loss, with the formula:
$loss_{CIOU} = 1 - CIOU$ (6)
where $S_1$ and $S_2$ are the intersection area and the union area of the prediction box and the ground-truth box, respectively; $\rho$ is the distance between the center of the prediction box and the center of the ground-truth box; $c$ is the diagonal length of the smallest rectangle enclosing the prediction box and the ground-truth box; $v$ is the aspect-ratio similarity of the prediction box and the ground-truth box; $\alpha$ is an influence factor; $w_{gt}$ and $h_{gt}$ are the width and height of the ground-truth box; and $w_p$ and $h_p$ are the width and height of the prediction box;
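A minimal sketch of the CIOU loss for a single pair of boxes is given below. The intermediate formulas are not reproduced in this text, so the code follows the commonly used CIoU definition, which matches the symbols described above. The function name, the box format (center x, center y, width, height), and the small `eps` stabilizer are assumptions.

```python
import math


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss for two boxes given as (x_center, y_center, width, height)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = target
    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2
    # S1: intersection area, S2: union area.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    s1 = inter_w * inter_h
    s2 = pw * ph + tw * th - s1 + eps
    iou = s1 / s2
    # rho^2: squared center distance; c^2: squared diagonal of the
    # smallest rectangle enclosing both boxes.
    rho2 = (px - tx) ** 2 + (py - ty) ** 2
    c2 = ((max(p_x2, t_x2) - min(p_x1, t_x1)) ** 2
          + (max(p_y2, t_y2) - min(p_y1, t_y1)) ** 2 + eps)
    # v: aspect-ratio consistency; alpha: its weighting factor.
    v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    ciou = iou - rho2 / c2 - alpha * v
    return 1 - ciou
```

Unlike a plain IoU loss, this loss still varies with the center distance when the boxes do not overlap, so a prediction far from the ground truth receives a useful training signal.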
The confidence loss and the classification loss are calculated by the BCE loss. Taking the 80×80 feature map as an example, YOLOv5 predicts three rectangular boxes near each grid cell, which gives the value ranges of z, x, and y below. With the confidence label matrix denoted L and the predicted confidence matrix denoted P, the BCE loss of each element is $loss_{BCE}(z, x, y)$:
$loss_{BCE}(z,x,y) = -L(z,x,y) \log P(z,x,y) - (1 - L(z,x,y)) \log(1 - P(z,x,y))$ (7)
$0 \le z < 3,\ 0 \le x < 80,\ 0 \le y < 80$
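Formula (7) can be evaluated element by element over the 3×80×80 label and prediction matrices, as in the sketch below. The text defines only the per-element loss; reducing it by averaging over all elements is an assumption, as are the function names and the clamping constant `eps`.

```python
import math


def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one (label, probability) pair, as in formula (7)."""
    prob = min(max(prob, eps), 1 - eps)  # clamp so that log() stays finite
    return -label * math.log(prob) - (1 - label) * math.log(1 - prob)


def mean_bce(L, P):
    """Average BCE over matrices indexed [z][x][y] (anchor, grid x, grid y)."""
    total, count = 0.0, 0
    for z in range(len(L)):
        for x in range(len(L[z])):
            for y in range(len(L[z][x])):
                total += bce(L[z][x][y], P[z][x][y])
                count += 1
    return total / count


# Hypothetical 3 x 80 x 80 matrices: no objects, every prediction at 0.5.
labels = [[[0.0] * 80 for _ in range(80)] for _ in range(3)]
preds = [[[0.5] * 80 for _ in range(80)] for _ in range(3)]
loss = mean_bce(labels, preds)
```

In this example every element contributes ln 2, so the averaged loss is also ln 2; confident correct predictions drive it toward zero.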
The model is trained with samples of the old defect classes to obtain the teacher model, which serves as the basis for distilling the new defect classes and new defect samples that appear later.
Referring to fig. 3, the knowledge distillation method can protect the knowledge learned in the original model through the design of distillation loss, and in this embodiment, when the student model is trained by adopting the knowledge distillation method, the output of the teacher model is used to constrain the student model through the distillation loss, so as to protect the knowledge learned in the teacher model for the old sample.
In step S3, the teacher model is used to distill the student model as follows. First, a student model is built on the YOLOv5 framework and the teacher model parameters are frozen. An input picture is passed through the teacher model and the student model separately, up to and including the last detection layer. The elements of each output vector are, in order: the abscissa of the center point of the target frame, the ordinate of the center point of the target frame, the width of the target frame, the height of the target frame, the foreground probability, and the probability of belonging to each category. Because the student model is trained to identify both the newly added defect types of the engine cylinder bore inner wall and the old defect types, its output has a higher dimension than that of the teacher model, and the number of extra dimensions equals the number of newly added types. The output of the student model is therefore sliced, and the part that matches the teacher model output in dimension and meaning is taken out. The output vector of the teacher model and the sliced output vector of the student model are each processed by a modified softmax function; the teacher result serves as the soft label and the student result as the soft prediction. The weighted sum of the KL loss function and the L2 loss function is used as the distillation loss, so that the output of the teacher model guides the training of the student model, the output of the student model on the old types stays close to that of the teacher model, and the student model retains its ability to recognize the old types. The L2 loss function formula is:
the KL loss function formula is:
wherein y_i is the soft label and f(x_i) is the soft prediction. The distillation loss L is:
L = 0.1 KL(y_i || f(x_i)) + 0.9 L2 (10).
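The L2 and KL formulas themselves are not reproduced in this text. The following is a minimal sketch of the distillation loss (10), assuming the standard definitions KL(y || f) = Σ_i y_i log(y_i / f(x_i)) and L2 = Σ_i (y_i - f(x_i))², and assuming the modified softmax is applied to the whole sliced vector as the paragraph above describes. The function names, the temperature value T = 2.0 and the example vectors are hypothetical and are not taken from the patent.

```python
import numpy as np

def softmax_t(z, T=1.0):
    """Softmax with temperature T along the last axis (numerically stabilized)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_out, student_out, T=2.0, w_kl=0.1, w_l2=0.9, eps=1e-12):
    """Weighted sum of KL and L2 between teacher soft labels and student soft predictions.

    teacher_out: (..., 5 + n_old) raw outputs of the frozen teacher.
    student_out: (..., 5 + n_old + n_new) raw outputs of the student.
    T is an assumed value; the 0.1 / 0.9 weights come from equation (10).
    """
    teacher_out = np.asarray(teacher_out, dtype=float)
    student_out = np.asarray(student_out, dtype=float)
    n_old = teacher_out.shape[-1]
    student_old = student_out[..., :n_old]      # slice away the new-class entries
    y = softmax_t(teacher_out, T)               # soft label
    f = softmax_t(student_old, T)               # soft prediction
    kl = np.sum(y * (np.log(y + eps) - np.log(f + eps)), axis=-1).mean()
    l2 = np.sum((y - f) ** 2, axis=-1).mean()   # assumed un-normalized squared error
    return w_kl * kl + w_l2 * l2

# Hypothetical outputs: teacher knows 2 classes, student knows 3.
teacher = np.array([0.5, 0.5, 0.2, 0.3, 2.0, 3.0, -1.0])
student = np.array([0.4, 0.6, 0.2, 0.3, 1.5, 2.0, 0.5, 0.7])
print(distillation_loss(teacher, student))
```

The loss is zero when the sliced student output equals the teacher output and grows as the two drift apart, which is the constraint that keeps the old-class behaviour of the student close to the teacher.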
In step S3, the output of the student model is also passed through a softmax function to obtain the hard prediction. The loss between the hard prediction and the corresponding hard label annotated in advance in the dataset is calculated with the YOLOv5 loss function, so that the student model acquires the ability to identify samples of the newly added defect types;
thus, the total loss function for training the student model is:
Loss_total = Loss + λ × L (11)
wherein λ is a weight parameter;
the hard prediction of the student model is obtained by the softmax function, given by:
q_i = exp(z_i) / Σ_j exp(z_j)
wherein q_i is the probability output for each category and z_j is the fully connected layer output for each category.
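As an illustration of the hard prediction and of equation (11), the following sketch computes a standard softmax and combines a detection loss with a distillation loss. The function names and the numeric values (including the value of λ) are hypothetical; the patent reports its own λ experiments in Table 2.

```python
import numpy as np

def softmax(z):
    """Standard softmax: q_i = exp(z_i) / sum_j exp(z_j)."""
    z = np.asarray(z, dtype=float)
    z = z - z.max()              # stabilizes the exponentials without changing q
    e = np.exp(z)
    return e / e.sum()

def total_loss(yolo_loss, distill_loss, lam):
    """Equation (11): Loss_total = Loss + lambda * L."""
    return yolo_loss + lam * distill_loss

# Hypothetical class scores for three defect categories.
q = softmax([1.0, 2.0, 3.0])
print(q, q.sum())

# Hypothetical loss values and weight.
print(total_loss(yolo_loss=1.2, distill_loss=0.3, lam=2.0))
```

A larger λ puts more weight on agreeing with the teacher on the old classes, while a smaller λ favours fitting the hard labels of the new dataset.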
During distillation training, although the class with the largest probability value in the soft label receives the most attention, the other, smaller probability values are also knowledge learned by the teacher network and should be utilized. Because the differences between these values are too large, a modified softmax function with an added parameter T is introduced for smoothing, with the formula:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
After T is added, the probability values tend toward one another, which reduces as far as possible the problems caused by overly large differences between the values.
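The smoothing effect of T can be checked numerically. The sketch below assumes the usual temperature form q_i = exp(z_i / T) / Σ_j exp(z_j / T); the logits and the T values are hypothetical.

```python
import numpy as np

def softmax_with_temperature(z, T=1.0):
    """Softmax of z / T; larger T flattens the resulting distribution."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

logits = np.array([1.0, 5.0, 9.0])   # hypothetical teacher scores
for T in (1.0, 4.0, 20.0):
    q = softmax_with_temperature(logits, T)
    # The gap between the largest and smallest probability shrinks as T grows.
    print(T, np.round(q, 3), round(float(q.max() - q.min()), 3))
```

With T = 1 almost all of the probability mass sits on the top class, so the small values carry little gradient signal; raising T exposes the relative ordering of the remaining classes to the student.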
The experimental results of the present invention on the engine cylinder bore inner wall dataset are described in detail below.
The engine cylinder bore inner wall dataset consists of two parts: the old sample dataset, and the new class and new sample dataset. The old sample dataset contains 438 pictures in total, divided into a training set and a test set at a ratio of 3:1, with 326 training pictures and 112 test pictures; its defect categories are cracks and sand holes. The new class and new sample dataset contains 265 pictures in total, with 212 in the training set and 53 in the test set; its defect categories are cracks, sand holes and bumps. The samples in the two datasets do not overlap.
Model weight A is obtained by training on the new class and new sample dataset with the teacher model as the pre-training weights. Model weight B is obtained by training on the same dataset with the method provided by the invention, under the supervision of the teacher model. Table 1 shows the test results of model weights A and B on the old sample test set, with mAP@0.5 as the detection metric. Table 2 shows the experimental results for different values of λ.
Table 1 Test results
Table 2 Experimental results for different values of λ
The results show that the improved method for detecting defects of the inner wall of the cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation avoids catastrophic forgetting while the model learns new classes and new samples, and can save considerable time and cost in practical application scenarios.
The invention provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program, and the processor, when executing the computer program, implements the steps of the improved method for detecting the inner wall of the cylinder hole of the automobile engine by combining YOLOv5 with knowledge distillation.
The present invention proposes a computer readable storage medium for storing computer instructions which, when executed by a processor, implement the steps of the improved method for detecting an inner wall of a cylinder bore of an automotive engine in combination with YOLOv5 and knowledge distillation.
The memory in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory described herein is intended to include, without being limited to, these and any other suitable types of memory.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.
It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The processor may be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, registers, or other storage media well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
The improved method for detecting the inner wall of the cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the above description of the embodiments is intended only to help in understanding the method of the invention and its core idea. Those skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the invention.

Claims (10)

1. An improved method for detecting the inner wall of a cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation, which is characterized by comprising the following steps:
S1: performing data enhancement on the pictures in the image dataset of the inner wall of the engine cylinder hole, wherein the data enhancement comprises HSV enhancement, image translation, image scaling, left-right horizontal flipping and Mosaic data enhancement;
S2: constructing and training a teacher model with the YOLOv5 model as the framework;
S3: constructing a student model with the YOLOv5 model as the framework and, based on the trained teacher model, training the student model by knowledge distillation on an engine cylinder hole inner wall image dataset composed of new defect types and new samples;
S4: using the student model obtained through knowledge distillation training to detect defect samples of both the new and old classes.
2. The method according to claim 1, wherein step S1 specifically comprises:
S11: HSV enhancement: randomly adjusting the hue, saturation and brightness of the original image to obtain different sub-images;
S12: image translation, scaling and flipping: moving the original image in the horizontal or vertical direction, or enlarging or reducing it; filling the missing border according to the characteristics of the original image and cropping the area that exceeds the border, while keeping the aspect ratio of the scaled image consistent with that of the original image; and flipping the original image in the horizontal and vertical directions to obtain sub-images;
S13: Mosaic data enhancement: taking four pictures from the dataset, applying random flipping, scaling and cropping to them, and then combining the four processed pictures into one image to obtain a sub-image.
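Step S13 can be sketched as follows. This is a simplified illustration only: it flips and rescales four images into the four quadrants of one canvas, omits the random cropping, and does not transform the bounding-box labels, which a real training pipeline would also need to do. The function names, the nearest-neighbour resizing and the canvas size are assumptions and do not reproduce the YOLOv5 implementation.

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an H x W x C array using index sampling."""
    rows = np.arange(out_h) * img.shape[0] // out_h
    cols = np.arange(out_w) * img.shape[1] // out_w
    return img[rows][:, cols]

def simple_mosaic(images, out_size=640, rng=None):
    """Combine four images into one canvas around a random split point."""
    assert len(images) == 4
    rng = np.random.default_rng() if rng is None else rng
    # The random split point controls how much each picture is scaled.
    cx = int(rng.integers(out_size // 4, 3 * out_size // 4))
    cy = int(rng.integers(out_size // 4, 3 * out_size // 4))
    canvas = np.zeros((out_size, out_size, 3), dtype=images[0].dtype)
    regions = [(0, cy, 0, cx), (0, cy, cx, out_size),
               (cy, out_size, 0, cx), (cy, out_size, cx, out_size)]
    for img, (y0, y1, x0, x1) in zip(images, regions):
        if rng.random() < 0.5:
            img = img[:, ::-1]                   # random left-right flip
        canvas[y0:y1, x0:x1] = resize_nearest(img, y1 - y0, x1 - x0)
    return canvas

# Four hypothetical single-colour pictures stand in for dataset samples.
pics = [np.full((32, 48, 3), v, dtype=np.uint8) for v in (10, 20, 30, 40)]
print(simple_mosaic(pics, out_size=64, rng=np.random.default_rng(0)).shape)
```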
3. The method according to claim 2, wherein in step S2, adaptive anchor frame calculation is first performed according to the characteristics of the dataset to obtain three anchor frame ratios at each of the three scales (large, medium and small) for subsequent training, and adaptive picture scaling is then applied to the sample pictures to suit the detection of targets of different sizes.
4. The method of claim 3, wherein the teacher model adopts the YOLOv5_small structure; the backbone network comprises a slice structure Focus, a convolution module Conv, a bottleneck layer C3 and a spatial pyramid pooling SPP; the input image is downsampled multiple times by the backbone network to extract features at several different scales; the features of different scales are then fused through a top-down feature pyramid network structure FPN followed by a bottom-up path aggregation network structure PAN; and finally the head prediction network makes predictions on the 80×80, 40×40 and 20×20 feature maps for small, medium and large targets respectively.
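The three feature map sizes follow from the input resolution and the downsampling strides. The sketch below assumes a 640×640 input and strides of 8, 16 and 32, which are common YOLOv5 settings and are consistent with the 80×80, 40×40 and 20×20 maps named above, but are not stated in this text; the function names are hypothetical.

```python
def detection_grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Side length of each detection feature map for a square input."""
    return [input_size // s for s in strides]

def predictions_per_image(input_size=640, strides=(8, 16, 32), anchors_per_cell=3):
    """Total number of candidate frames, with three anchor frames per grid cell."""
    return sum(anchors_per_cell * g * g
               for g in detection_grid_sizes(input_size, strides))

print(detection_grid_sizes())     # grid sides for small, medium and large targets
print(predictions_per_image())    # candidate frames before non-maximum suppression
```

The finest grid has the smallest receptive field per cell, which is why it is assigned to small targets, while the coarsest grid handles large targets.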
5. The method of claim 4, wherein in step S2 the YOLOv5 loss function is used when training the teacher model; the loss function consists of three parts, namely the rectangular frame loss lossbox, the confidence loss lossobj and the classification loss losscls, and is the weighted sum of the three parts, with the formula:
Loss = a × lossbox + b × lossobj + c × losscls (1)
the rectangular frame loss is calculated by the CIOU loss, with the formula:
loss_CIOU = 1 - CIOU (6)
wherein S_1 and S_2 are respectively the intersection area and the union area of the prediction frame and the real frame; ρ is the distance between the center of the prediction frame and the center of the real frame; c is the diagonal length of the smallest rectangle enclosing both the prediction frame and the real frame; v is the aspect ratio similarity of the prediction frame and the real frame; α is an influence factor; w_gt and h_gt are respectively the width and the height of the real frame; and w_p and h_p are respectively the width and the height of the prediction frame;
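The intermediate CIOU formulas are not reproduced in this text. The following sketch assumes the commonly used definitions, which match the symbols defined above: IOU = S_1 / S_2, v = (4 / π²) × (arctan(w_gt / h_gt) - arctan(w_p / h_p))², α = v / ((1 - IOU) + v), and CIOU = IOU - ρ² / c² - α × v. The function name and the (center x, center y, width, height) frame format are assumptions.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """CIOU loss, 1 - CIOU, for two frames given as (cx, cy, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both frames.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # S_1: intersection area, S_2: union area.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    s1 = inter_w * inter_h
    s2 = pw * ph + gw * gh - s1
    iou = s1 / (s2 + eps)
    # rho^2: squared center distance; c^2: squared diagonal of the enclosing rectangle.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    cw = max(p_x2, g_x2) - min(p_x1, g_x1)
    ch = max(p_y2, g_y2) - min(p_y1, g_y1)
    c2 = cw ** 2 + ch ** 2
    # v: aspect ratio similarity; alpha: influence factor.
    v = (4 / math.pi ** 2) * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    ciou = iou - rho2 / (c2 + eps) - alpha * v
    return 1 - ciou

# Hypothetical frames: a perfect match, a partial overlap, and no overlap.
print(ciou_loss((5, 5, 4, 2), (5, 5, 4, 2)))
print(ciou_loss((5, 5, 4, 2), (6, 5, 2, 4)))
print(ciou_loss((0, 0, 2, 2), (10, 10, 2, 2)))
```

Unlike a plain IOU loss, the center-distance term still gives a useful gradient when the two frames do not overlap at all.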
the confidence loss and the classification loss are calculated by the BCE loss; taking the 80×80 feature map as an example, YOLOv5 produces three predictions for each pixel grid cell, so the indices z, x and y each have a value range; the confidence label is a matrix L, the predicted confidence is a matrix P, and the BCE loss of each value in the matrix is loss_BCE(z, x, y), with the formula:
loss_BCE(z, x, y) = -L(z, x, y) × log P(z, x, y) - (1 - L(z, x, y)) × log(1 - P(z, x, y)) (7)
0 ≤ z < 3, 0 ≤ x < 80, 0 ≤ y < 80
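Equation (7) can be evaluated element-wise over the whole 3 × 80 × 80 matrix. In the sketch below the function name, the clipping constant and the example label and prediction values are assumptions added for illustration.

```python
import numpy as np

def bce_map(L, P, eps=1e-7):
    """Element-wise BCE of equation (7) for label matrix L and prediction matrix P."""
    L = np.asarray(L, dtype=float)
    P = np.clip(np.asarray(P, dtype=float), eps, 1 - eps)  # avoids log(0)
    return -L * np.log(P) - (1 - L) * np.log(1 - P)

# Hypothetical confidence labels: one positive cell on the 80 x 80 map.
L = np.zeros((3, 80, 80))
L[1, 40, 40] = 1.0
P = np.full((3, 80, 80), 0.1)    # low predicted confidence everywhere
P[1, 40, 40] = 0.9               # high predicted confidence at the positive cell

loss = bce_map(L, P)
print(loss.shape, float(loss.mean()))
```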
the model is trained with samples of the old defect types to obtain the teacher model, which serves as the basis for the subsequent distillation on newly appearing defect types and new defect samples.
6. The method according to claim 5, wherein in step S3, the teacher model is used to distill the student model as follows: first, a student model is built on the YOLOv5 framework and the teacher model parameters are frozen; an input picture is passed through the teacher model and the student model separately, up to and including the last detection layer, wherein the elements of each output vector are, in order: the abscissa of the center point of the target frame, the ordinate of the center point of the target frame, the width of the target frame, the height of the target frame, the foreground probability, and the probability of belonging to each category; because the student model is trained to identify both the newly added defect types of the engine cylinder bore inner wall and the old defect types, its output has a higher dimension than that of the teacher model, and the number of extra dimensions equals the number of newly added types; the output of the student model is sliced, and the part that matches the teacher model output in dimension and meaning is taken out; the output vector of the teacher model and the sliced output vector of the student model are each processed by a modified softmax function, the teacher result serving as the soft label and the student result as the soft prediction; the weighted sum of the KL loss function and the L2 loss function is used as the distillation loss, so that the output of the teacher model guides the training of the student model, the output of the student model on the old types stays close to that of the teacher model, and the student model retains its ability to recognize the old types; the L2 loss function formula is:
the KL loss function formula is:
wherein y_i is the soft label and f(x_i) is the soft prediction, and the distillation loss L is:
L = 0.1 KL(y_i || f(x_i)) + 0.9 L2 (10).
7. The method according to claim 6, wherein in step S3, the output of the student model is passed through a softmax function to obtain the hard prediction, and the loss between the hard prediction and the corresponding hard label annotated in advance in the dataset is calculated with the YOLOv5 loss function, so that the student model acquires the ability to identify samples of the newly added defect types;
thus, the total loss function for training the student model is:
Loss_total = Loss + λ × L (11)
wherein λ is a weight parameter;
the hard prediction of the student model is obtained by the softmax function, given by:
q_i = exp(z_i) / Σ_j exp(z_j)
wherein q_i is the probability output for each category and z_j is the fully connected layer output for each category.
8. The method of claim 7, wherein a modified softmax function with an added parameter T is introduced for smoothing, with the formula:
q_i = exp(z_i / T) / Σ_j exp(z_j / T)
after T is added, the probability values tend toward one another, which reduces as far as possible the problems caused by overly large differences between the values.
9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1-8 when the computer program is executed.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-8.
CN202311544930.9A 2023-11-20 2023-11-20 Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation Pending CN117593264A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311544930.9A CN117593264A (en) 2023-11-20 2023-11-20 Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311544930.9A CN117593264A (en) 2023-11-20 2023-11-20 Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation

Publications (1)

Publication Number Publication Date
CN117593264A true CN117593264A (en) 2024-02-23

Family

ID=89916016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311544930.9A Pending CN117593264A (en) 2023-11-20 2023-11-20 Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation

Country Status (1)

Country Link
CN (1) CN117593264A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911403A (en) * 2024-03-18 2024-04-19 沈阳派得林科技有限责任公司 Knowledge distillation-based light-weight dynamic DR steel pipe weld defect detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination