CN117593264A

CN117593264A - Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation

Info

Publication number: CN117593264A
Application number: CN202311544930.9A
Authority: CN
Inventors: 金晶; 陈铎; 何旭杰; 冯怡园
Original assignee: Harbin Institute of Technology
Current assignee: Harbin Institute of Technology
Priority date: 2023-11-20
Filing date: 2023-11-20
Publication date: 2024-02-23

Abstract

The invention provides an improved method for detecting defects on the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation. The method comprises the following steps: augmenting the images; constructing and training a teacher model based on the YOLOv5 model; and, based on the teacher model, training a YOLOv5 student model on the newly added defect classes and samples by improved knowledge distillation. By combining an improved YOLOv5 object detection algorithm with knowledge distillation training, the invention improves the ability of the existing model to recognize old samples after it has been trained using only the newly added defect classes and newly added defect samples of the engine cylinder hole inner wall.

Description

Improved detection method for inner wall of cylinder hole of automobile engine by combining YOLOv5 with knowledge distillation

Technical Field

The invention relates to the technical field of image defect detection of an inner wall of an engine cylinder hole, in particular to an improved detection method of the inner wall of the cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation.

Background

In the production process of the engine, the quality of the workpiece is easily affected by factors such as the existing manufacturing process and the working conditions. Among the resulting problems, surface defects are the most intuitive manifestation of degraded engine product quality. To ensure the product yield, defect detection must be performed on the inner wall surface of the engine cylinder hole. Defect detection means detecting defects such as scratches, foreign-matter occlusion, color contamination and holes on the surface of the sample under inspection, so as to obtain relevant information such as the defect type and position.

Conventional surface defect detection methods played an important role for a long time and are mainly classified into texture-feature-based, color-feature-based and shape-feature-based methods. Texture features reflect the arrangement characteristics of a surface through the gray-level distribution of pixels and their neighborhoods; texture-based methods include statistical methods, signal processing methods and the like. Color-feature-based methods require relatively little computation, are highly robust, and do not depend on the viewing angle or orientation of the image. Among shape-based methods, contour-based methods are the main representative: they obtain shape parameters of an image by describing the external boundary features of an object, for example with the Hough transform. Conventional solutions focus on hand-designed features tailored to a particular problem, but in complex cases such features are difficult to design accurately. Deep learning, by contrast, learns from data and converts it into abstract feature representations, so that the system learns the relevant features by itself and no longer requires complex hand-crafted features for specific defects. Deep-learning defect detection methods are roughly classified into supervised, unsupervised and semi-supervised approaches. Supervised learning requires that the training samples be labeled, so that the model can find the underlying regularities during training and generalize to the test set. Common supervised methods are broadly divided into defect classification networks, defect detection networks and defect segmentation networks. Unsupervised methods take only unlabeled data as input, learn the inherent characteristics of the data through a network, and judge new data according to the learned model; common methods include autoencoders and generative adversarial networks. Semi-supervised methods combine the characteristics of supervised and unsupervised methods, achieve better performance under specific conditions, and avoid high labeling costs.

The rapid development of deep-learning-based defect detection has led to its wide application in detecting defects on the inner wall of engine cylinder holes. In real scenarios, however, new samples and even new defect categories appear continuously. Because of factors such as illumination and differences between cylinder-hole batches, the new categories and new samples differ to some extent from the old samples. When existing engine defect detection algorithms encounter this situation, they must either retrain the whole model on both the new and the old samples, which consumes a large amount of time, or train only on the new samples, which causes catastrophic forgetting of the old samples. To address these problems, the invention provides an improved method for detecting the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation. The method retains the ability to detect old samples while learning new categories and new samples, reduces the catastrophic forgetting of old samples caused by existing algorithms, and provides a new option for defect detection tasks.

Disclosure of Invention

The invention aims to solve two problems in the field of engine cylinder hole inner wall defect detection: retraining on the new categories and new samples together with the old samples takes too long, while training only on the new categories and new samples causes catastrophic forgetting of the old samples. To this end, the invention provides an improved method for detecting the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation.

The invention is realized by the following technical scheme, and provides an improved detection method for the inner wall of a cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation, which comprises the following steps:

S1: performing data augmentation on the pictures in the engine cylinder hole inner wall image dataset, the data augmentation comprising HSV augmentation, image translation, image scaling, left-right horizontal flipping and Mosaic data augmentation;

S2: constructing and training a teacher model using the YOLOv5 model as the framework;

S3: constructing a student model using the YOLOv5 model as the framework and, based on the trained teacher model, training the student model by knowledge distillation on an engine cylinder hole inner wall image dataset composed of the new defect categories and new samples;

S4: using the student model obtained through knowledge distillation training to detect defect samples of both the new and the old categories.

Further, the step S1 specifically includes:

S11: HSV augmentation: randomly adjusting the hue, saturation and brightness of the original image to obtain different sub-images;

S12: image translation, scaling and flipping: shifting the original image horizontally or vertically and enlarging or reducing it, filling the missing borders according to the characteristics of the original image, cropping the regions that exceed the frame, and keeping the aspect ratio of the scaled image consistent with the original; flipping the original image horizontally and vertically to obtain sub-images;

S13: Mosaic data augmentation: taking four pictures from the dataset, applying random flipping, scaling and cropping to them, and then stitching the four processed pictures into one image to obtain a sub-image.
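Steps S11 and S12 can be illustrated with a minimal sketch in pure Python. It is not the patent's implementation: the gain values, the image representation (rows of RGB tuples in [0, 1]) and the function names `hsv_augment` and `flip_horizontal` are assumptions chosen for illustration.

```python
import colorsys
import random


def hsv_augment(image, h_gain=0.015, s_gain=0.7, v_gain=0.4, rng=random):
    """Randomly jitter hue, saturation and value of an RGB image.

    `image` is a list of rows of (r, g, b) tuples with channels in [0, 1].
    The default gains are illustrative assumptions, not values from the text.
    """
    # One random multiplier per HSV channel, shared by every pixel.
    dh = 1 + rng.uniform(-1, 1) * h_gain
    ds = 1 + rng.uniform(-1, 1) * s_gain
    dv = 1 + rng.uniform(-1, 1) * v_gain
    out = []
    for row in image:
        new_row = []
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h * dh) % 1.0                  # hue wraps around
            s = min(max(s * ds, 0.0), 1.0)      # clamp saturation
            v = min(max(v * dv, 0.0), 1.0)      # clamp brightness
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out


def flip_horizontal(image, boxes):
    """Mirror an image left-right together with its boxes.

    Boxes are (x_center, y_center, w, h), normalised to [0, 1].
    """
    flipped = [list(reversed(row)) for row in image]
    flipped_boxes = [(1.0 - xc, yc, w, h) for xc, yc, w, h in boxes]
    return flipped, flipped_boxes
```

Because every geometric transform moves the defects, the labels have to be transformed together with the pixels, which is why `flip_horizontal` returns the mirrored boxes as well.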

Further, in step S2, adaptive anchor box calculation is first performed according to the characteristics of the dataset to obtain three anchor boxes at each of the three scales (large, medium and small) for subsequent training; adaptive image scaling is then applied to the sample images to suit the detection of targets of different sizes.
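The adaptive image scaling step can be sketched as the usual aspect-preserving "letterbox" computation. The text does not give the target size or the stride, so `target=640` and `stride=32` below are assumptions, as is the function name `letterbox_params`.

```python
def letterbox_params(width, height, target=640, stride=32):
    """Return (scale, new_w, new_h, pad_w, pad_h) for an aspect-preserving resize.

    target=640 and stride=32 are assumptions for illustration.
    """
    # Scale so that the longer side fits the target without distortion.
    scale = min(target / width, target / height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Pad only up to the next multiple of the stride rather than to a full square,
    # which keeps the amount of padding (and wasted computation) small.
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return scale, new_w, new_h, pad_w, pad_h


# A 1280x720 picture is halved to 640x360 and padded by 24 rows to 640x384.
print(letterbox_params(1280, 720))
```

In practice the padding is typically split between the two opposite borders; the sketch only computes the total amount per axis.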

Further, the teacher model adopts the YOLOv5_small structure. The backbone network comprises the slicing structure Focus, the convolution module Conv, the bottleneck layer C3 and spatial pyramid pooling SPP. The input image is downsampled several times by the backbone network to extract features at several different scales; the features are then fused top-down by a feature pyramid network (FPN) and bottom-up by a path aggregation network (PAN); finally, the prediction head predicts small, medium and large targets on the 80×80, 40×40 and 20×20 feature maps respectively.

Further, in step S2, the teacher model is trained using the loss function of YOLOv5, which consists of three parts: the bounding box loss $loss_{box}$, the confidence loss $loss_{obj}$ and the classification loss $loss_{cls}$. The loss function is the weighted sum of the three parts:

$$Loss = a \times loss_{box} + b \times loss_{obj} + c \times loss_{cls} \tag{1}$$

The bounding box loss is calculated by the CIOU loss, as follows:

$$IOU = \frac{S_1}{S_2} \tag{2}$$

$$CIOU = IOU - \frac{\rho^2}{c^2} - \alpha v \tag{3}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\right)^2 \tag{4}$$

$$\alpha = \frac{v}{(1 - IOU) + v} \tag{5}$$

$$loss_{CIOU} = 1 - CIOU \tag{6}$$

where $S_1$ and $S_2$ are the intersection area and the union area of the prediction box and the ground-truth box respectively, $\rho$ is the distance between the center of the prediction box and the center of the ground-truth box, $c$ is the diagonal length of the smallest rectangle enclosing both the prediction box and the ground-truth box, $v$ is the aspect-ratio similarity of the prediction box and the ground-truth box, and $\alpha$ is an influence factor; $w_{gt}$ and $h_{gt}$ are the width and height of the ground-truth box, and $w_p$ and $h_p$ are the width and height of the prediction box.
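The CIOU loss can be worked through with a short sketch based on the standard CIoU definitions, which match the symbols defined above. The box format (center x, center y, width, height), the small `eps` guard and the function name `ciou_loss` are assumptions made for this illustration.

```python
import math


def ciou_loss(pred, gt, eps=1e-9):
    """Return 1 - CIoU for two boxes given as (x_center, y_center, w, h)."""
    px, py, pw, ph = pred
    gx, gy, gw, gh = gt
    # Corner coordinates of both boxes.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    g_x1, g_y1, g_x2, g_y2 = gx - gw / 2, gy - gh / 2, gx + gw / 2, gy + gh / 2
    # S1: intersection area, S2: union area.
    inter_w = max(0.0, min(p_x2, g_x2) - max(p_x1, g_x1))
    inter_h = max(0.0, min(p_y2, g_y2) - max(p_y1, g_y1))
    s1 = inter_w * inter_h
    s2 = pw * ph + gw * gh - s1
    iou = s1 / (s2 + eps)
    # rho^2: squared distance between the two box centers.
    rho2 = (px - gx) ** 2 + (py - gy) ** 2
    # c^2: squared diagonal of the smallest rectangle enclosing both boxes.
    c2 = (max(p_x2, g_x2) - min(p_x1, g_x1)) ** 2 + (max(p_y2, g_y2) - min(p_y1, g_y1)) ** 2
    # v: aspect-ratio consistency term, alpha: its weight.
    v = 4 / math.pi ** 2 * (math.atan(gw / gh) - math.atan(pw / ph)) ** 2
    alpha = v / ((1 - iou) + v + eps)
    ciou = iou - rho2 / (c2 + eps) - alpha * v
    return 1 - ciou
```

Unlike a plain IoU loss, the center-distance term still produces a useful gradient when the two boxes do not overlap at all, so the loss for disjoint boxes is greater than 1.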

The confidence loss and the classification loss are calculated by the BCE loss. Taking the 80×80 feature map as an example, YOLOv5 predicts three rectangular boxes at each grid cell, which gives the value ranges of z, x and y below. The confidence labels form the matrix L, the predicted confidences form the matrix P, and the BCE loss of each element of the matrix is $loss_{BCE}(z,x,y)$:

$$loss_{BCE}(z,x,y) = -L(z,x,y)\log P(z,x,y) - \bigl(1 - L(z,x,y)\bigr)\log\bigl(1 - P(z,x,y)\bigr) \tag{7}$$

$$0 \le z < 3,\quad 0 \le x < 80,\quad 0 \le y < 80$$
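As a small illustration of formula (7), the sketch below evaluates the element-wise BCE loss over a 3×80×80 grid of confidence values. The text only gives the per-element loss; averaging over all elements and clamping the probabilities with `eps` are assumptions, and the names `bce` and `confidence_loss` are hypothetical.

```python
import math


def bce(label, prob, eps=1e-7):
    """Binary cross-entropy for one element; prob is clamped to avoid log(0)."""
    p = min(max(prob, eps), 1 - eps)
    return -label * math.log(p) - (1 - label) * math.log(1 - p)


def confidence_loss(labels, probs):
    """Mean BCE over nested lists indexed as [z][x][y] (anchor, grid row, grid column).

    Taking the mean is an assumption; the text defines only the per-element loss.
    """
    total, count = 0.0, 0
    for plane_l, plane_p in zip(labels, probs):
        for row_l, row_p in zip(plane_l, plane_p):
            for l, p in zip(row_l, row_p):
                total += bce(l, p)
                count += 1
    return total / count


# 3 anchors on an 80x80 grid, as in the example of the text.
labels = [[[0.0] * 80 for _ in range(80)] for _ in range(3)]
labels[1][40][40] = 1.0  # one cell contains a defect
probs = [[[0.5] * 80 for _ in range(80)] for _ in range(3)]
print(confidence_loss(labels, probs))
```

With every prediction at 0.5 the loss of each element is log 2 regardless of its label, which makes this a convenient sanity check for an untrained head.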

The model is trained on samples of the old defect types to obtain the teacher model, which serves as the basis for distilling the new defect types and new defect samples that appear later.

Further, in step S3, the student model is distilled from the teacher model as follows. First, a student model is built according to the YOLOv5 model framework and the teacher model parameters are frozen. An input picture is passed through the teacher model and the student model respectively, up to the last detection layer, whose output vector contains, in order: the abscissa of the target box center, the ordinate of the target box center, the width of the target box, the height of the target box, the foreground probability, and the probability of belonging to each category. The student model is trained to recognize both the newly added defect types of the engine cylinder hole inner wall and the old defect types, so its output has more dimensions than that of the teacher model, the number of extra dimensions being the number of newly added types. The output of the student model is therefore sliced, and the part whose dimensions and meaning correspond to the teacher model is taken out. The output vector of the teacher model and the sliced output vector of the student model are each processed by a modified softmax function; the result of the teacher model is used as the soft label and the result of the student model as the soft prediction. The weighted sum of the KL loss function and the L2 loss function is used as the distillation loss, so that the output of the teacher model guides the training of the student model, the output of the student model on the old types stays close to that of the teacher model, and the student model retains its ability to recognize the old types. The L2 loss function is:

$$L2 = \sum_{i}\bigl(y_i - f(x_i)\bigr)^2 \tag{8}$$

The KL loss function is:

$$KL\bigl(y_i \,\|\, f(x_i)\bigr) = \sum_{i} y_i \log\frac{y_i}{f(x_i)} \tag{9}$$

where $y_i$ is the soft label and $f(x_i)$ is the soft prediction. The distillation loss L is:

$$L = 0.1\,KL\bigl(y_i \,\|\, f(x_i)\bigr) + 0.9\,L2 \tag{10}$$
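The slicing and the weighted distillation loss can be sketched as follows. This is a minimal illustration rather than the patented implementation: it operates on the class scores of a single prediction only, assumes that the new classes are appended after the old ones, uses a hypothetical temperature of `T=2.0`, and takes the L2 term as a sum of squared differences. The 0.1 and 0.9 weights come from the text.

```python
import math


def softmax_t(logits, T=1.0):
    """Softmax with temperature T (shifted by the maximum for numerical stability)."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, T=2.0, w_kl=0.1, w_l2=0.9):
    """0.1 * KL + 0.9 * L2 between teacher soft labels and sliced student predictions."""
    n_old = len(teacher_logits)
    # Keep only the student outputs that correspond to the teacher's (old) classes;
    # this assumes the new classes occupy the trailing positions.
    student_old = student_logits[:n_old]
    y = softmax_t(teacher_logits, T)   # soft labels
    f = softmax_t(student_old, T)      # soft predictions
    kl = sum(yi * math.log(yi / fi) for yi, fi in zip(y, f) if yi > 0)
    l2 = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    return w_kl * kl + w_l2 * l2


# Teacher knows 2 old classes; the student has 1 extra output for the new class.
print(distillation_loss([2.0, 0.5], [1.0, 1.5, 3.0]))
```

Because the new-class output is sliced away before the comparison, the distillation term constrains only the behaviour on the old classes and leaves the new-class output free to be shaped by the hard labels.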

Further, in step S3, the output of the student model is passed through a softmax function to obtain the hard prediction, and the loss between the hard prediction and the corresponding hard labels annotated in advance in the dataset is calculated according to the loss function of YOLOv5, so that the student model acquires the ability to recognize samples of the newly added defect types.

The total loss function for training the student model is therefore:

$$Loss_{total} = Loss + \lambda \times L \tag{11}$$

where $\lambda$ is a weight parameter.

The hard prediction of the student model is obtained by the softmax function:

$$q_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)} \tag{12}$$

where $q_i$ is the probability output for each category and $z_j$ is the output of the fully connected layer for each category.

Further, a modified softmax function is introduced for smoothing by adding a parameter T:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} \tag{13}$$

With T greater than 1, the probability values become closer to one another, which reduces as far as possible the problems caused by excessively large differences between the values.
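The smoothing effect of the parameter T can be seen in a few lines. The logits and the temperatures below are made-up values for illustration, and `softmax_with_temperature` is a hypothetical name.

```python
import math


def softmax_with_temperature(logits, T=1.0):
    """Return exp(z_i / T) / sum_j exp(z_j / T) for a list of logits."""
    m = max(logits)  # subtracting the maximum does not change the result
    exps = [math.exp((z - m) / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 3.0, 8.0]  # made-up scores for three categories
for T in (1.0, 4.0, 20.0):
    probs = softmax_with_temperature(logits, T)
    print(T, [round(p, 3) for p in probs])
```

At T = 1 almost all of the probability mass sits on the largest logit, so the smaller entries contribute very little to a KL or L2 comparison; raising T spreads the mass out and lets the relative ordering of the minor categories influence the distillation loss.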

The invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the steps of the improved method for detecting the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation.

The invention further provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the improved method for detecting the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation.

The invention has the beneficial effects that:

The invention is applied to the field of engine cylinder hole inner wall defect detection and provides an improved method for detecting defects on the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation, which improves the ability of the existing model to recognize old samples after it has been trained using only the newly added defect types and newly added defect samples of the engine cylinder hole inner wall.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for improved detection of defects in the inner wall of a cylinder bore of an automotive engine combining YOLOv5 with knowledge distillation;

FIG. 2 is a diagram of the structure of a teacher model and a student model in a distillation method;

FIG. 3 is a framework diagram of the knowledge distillation method.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The embodiment of the invention provides an improved method for detecting defects on the inner wall of an automobile engine cylinder hole by combining the YOLOv5_small model with knowledge distillation. In practical cylinder hole inner wall inspection, new samples and even new defect types appear continuously, and they differ to some extent from the old samples. To give the model the ability to detect the new types and new samples, one option is to train the existing model directly on them, but this damages its ability to detect the old samples; the other option is to retrain the existing model on all samples, but the time cost is too high. Through an improved distillation loss design, the knowledge distillation method protects the knowledge learned by the original model, and it can be used for model compression or incremental learning.

Referring to FIG. 1, the invention provides an improved method for detecting the inner wall of an automobile engine cylinder hole by combining YOLOv5 with knowledge distillation, which comprises the following steps:

S1: performing data augmentation on the pictures in the engine cylinder hole inner wall image dataset, the data augmentation comprising HSV (hue, saturation, value) augmentation, image translation, image scaling, left-right horizontal flipping and Mosaic data augmentation;

In the embodiment of the present invention, step S1 specifically includes:

S11: HSV augmentation: H represents hue, S represents saturation and V represents value (brightness); the hue, saturation and brightness of the original image are randomly adjusted to obtain different sub-images;

S13: Mosaic data augmentation: four pictures are taken from the dataset, random flipping, scaling and cropping are applied to them, and the four processed pictures are then stitched into one image to obtain a sub-image. Data augmentation increases the diversity of targets and adds targets at different scales, which benefits the training of the model. It also increases the number of samples seen in a single training step, enriches the background of the detected objects and increases the number of small-scale targets, which benefits the detection accuracy of the subsequent model and the convergence speed of network training.
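The stitching step of Mosaic augmentation can be sketched on toy data. The sketch below is a simplification under stated assumptions: the four inputs are equally sized square single-channel images stored as nested lists, scaling is omitted, each quadrant is a plain crop of one image, and the remapping of bounding-box labels is left out. The name `mosaic` is hypothetical.

```python
import random


def mosaic(images, size, rng=random):
    """Stitch four size x size images (2-D lists) into one size x size mosaic.

    A random center point splits the canvas into four quadrants; each quadrant
    is filled with the matching crop of one (optionally flipped) source image.
    """
    assert len(images) == 4
    cx = rng.randint(size // 4, 3 * size // 4)  # column of the split point
    cy = rng.randint(size // 4, 3 * size // 4)  # row of the split point
    canvas = [[0] * size for _ in range(size)]
    # (row_start, row_end, col_start, col_end) of each quadrant on the canvas.
    quads = [(0, cy, 0, cx), (0, cy, cx, size), (cy, size, 0, cx), (cy, size, cx, size)]
    for img, (r0, r1, c0, c1) in zip(images, quads):
        if rng.random() < 0.5:                   # random left-right flip
            img = [row[::-1] for row in img]
        for r in range(r0, r1):
            for c in range(c0, c1):
                canvas[r][c] = img[r][c]
    return canvas, (cx, cy)
```

A full implementation would also scale each picture and shift, clip and merge the defect boxes of all four pictures into the coordinate system of the mosaic.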

In step S2, adaptive anchor box calculation is first performed according to the characteristics of the dataset to obtain three anchor boxes at each of the three scales (large, medium and small) for subsequent training; adaptive image scaling is then applied to the sample images to suit the detection of targets of different sizes.
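One common way to derive anchor boxes from a dataset is to cluster the labeled box sizes. The sketch below uses plain k-means with an IoU-based assignment; it is a simplified stand-in and not YOLOv5's exact anchor routine, and the names `wh_iou` and `kmeans_anchors`, the iteration count and the seed are assumptions.

```python
import random


def wh_iou(a, b):
    """IoU of two boxes given only as (w, h), assuming they share a center."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k=9, iters=50, seed=0):
    """Cluster (w, h) pairs into k anchors, returned as 3 groups sorted by area."""
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for box in boxes:
            # Assign each box to the center it overlaps most.
            best = max(range(k), key=lambda i: wh_iou(box, centers[i]))
            groups[best].append(box)
        new_centers = []
        for i, group in enumerate(groups):
            if group:
                mean_w = sum(b[0] for b in group) / len(group)
                mean_h = sum(b[1] for b in group) / len(group)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    centers.sort(key=lambda c: c[0] * c[1])
    # Smallest three anchors for the finest grid, largest three for the coarsest.
    return [centers[i:i + 3] for i in range(0, k, 3)]
```

With `k=9` the result is three groups of three (width, height) pairs, one group for each of the small, medium and large detection scales.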

The teacher model adopts the YOLOv5_small structure. The backbone network comprises the slicing structure Focus, the convolution module Conv, the bottleneck layer C3 and spatial pyramid pooling SPP. The input image is downsampled several times by the backbone network to extract features at several different scales; the features are then fused top-down by a feature pyramid network (FPN) and bottom-up by a path aggregation network (PAN); finally, the prediction head predicts small, medium and large targets on the 80×80, 40×40 and 20×20 feature maps respectively.
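The sizes of the three prediction maps follow from the downsampling factors of the network. The short calculation below assumes a 640×640 input and strides of 8, 16 and 32, which reproduce the 80×80, 40×40 and 20×20 maps named in the text; the input size, the strides and the function name `head_shapes` are assumptions.

```python
def head_shapes(img_size=640, num_classes=2, strides=(8, 16, 32), anchors_per_cell=3):
    """Return the per-scale output shapes and the total number of predicted boxes.

    Each prediction vector holds 4 box values, 1 foreground probability and
    one probability per class, so its length is 5 + num_classes.
    """
    vec_len = 5 + num_classes
    shapes = [(anchors_per_cell, img_size // s, img_size // s, vec_len) for s in strides]
    total_boxes = sum(a * h * w for a, h, w, _ in shapes)
    return shapes, total_boxes


print(head_shapes(num_classes=2))  # e.g. a model with two defect classes
print(head_shapes(num_classes=3))  # one extra class lengthens each vector by one
```

Adding a defect class changes only the last dimension of each output, which is what makes it possible to compare a slice of the student output with the teacher output.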

Referring to FIG. 2, the teacher model adopts the YOLOv5_small structure. An input picture first passes through the Focus structure, then through a convolution layer with batch normalization and a SiLU activation function (all subsequent convolution layers likewise have batch normalization and a SiLU activation function) and a C3 module, and then twice more through a convolution layer followed by a C3 module. The output is fed into a convolution layer and the SPP structure, followed by a C3 module and a convolution layer. The output is then upsampled and concatenated with the output of the sixth layer through a concatenation layer; after passing through a C3 structure and a convolution layer it is upsampled again and concatenated with the output of the fourth layer, and a C3 structure and a convolution layer then yield the feature map used for detecting small targets in multi-scale detection. This output is passed through a convolution layer and concatenated with the output of the fourteenth layer, and a C3 structure and a convolution layer yield the feature map used for detecting medium targets. Similarly, that output is passed through a convolution layer and concatenated with the output of the tenth layer, and a C3 structure and a convolution layer finally yield the feature map used for detecting large targets. In the last detection layer, the three feature maps of different scales are reshaped into the vector form corresponding to the hard labels for the subsequent training and distillation.

The Focus structure slices the input, concatenates the four slices through a concatenation layer, and produces its output through a convolution layer. In the C3 structure, the input is processed along two paths, one through several residual components and one through a convolution layer; the two outputs are concatenated through a concatenation layer and finally output through a convolution layer. In the SPP structure, the input first passes through a convolution layer and is then pooled by max-pooling layers with kernel sizes of 5×5, 9×9 and 13×13; the output of the convolution layer is concatenated with the three pooled results, and the concatenated result is output through a convolution layer.
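The slicing performed by the Focus structure can be shown on a single 4×4 channel. The sketch is only an illustration on nested lists with the hypothetical name `focus_slice`; the subsequent concatenation along the channel axis and the convolution are not shown.

```python
def focus_slice(channel):
    """Split one H x W channel (2-D list, even H and W) into four H/2 x W/2 slices."""
    return [
        [row[0::2] for row in channel[0::2]],  # even rows, even columns
        [row[0::2] for row in channel[1::2]],  # odd rows, even columns
        [row[1::2] for row in channel[0::2]],  # even rows, odd columns
        [row[1::2] for row in channel[1::2]],  # odd rows, odd columns
    ]


channel = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
for piece in focus_slice(channel):
    print(piece)
```

Every pixel ends up in exactly one slice, so the spatial resolution is halved without discarding information; the channel count quadruples instead.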

In step S2, the teacher model is trained using the loss function of YOLOv5, which consists of three parts: the bounding box loss $loss_{box}$, the confidence loss $loss_{obj}$ and the classification loss $loss_{cls}$. The loss function is the weighted sum of the three parts:

$$Loss = a \times loss_{box} + b \times loss_{obj} + c \times loss_{cls} \tag{1}$$

The bounding box loss is calculated by the CIOU loss, as follows:

$$IOU = \frac{S_1}{S_2} \tag{2}$$

$$CIOU = IOU - \frac{\rho^2}{c^2} - \alpha v \tag{3}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w_{gt}}{h_{gt}} - \arctan\frac{w_p}{h_p}\right)^2 \tag{4}$$

$$\alpha = \frac{v}{(1 - IOU) + v} \tag{5}$$

$$loss_{CIOU} = 1 - CIOU \tag{6}$$

where $S_1$ and $S_2$ are the intersection area and the union area of the prediction box and the ground-truth box respectively, $\rho$ is the distance between the center of the prediction box and the center of the ground-truth box, $c$ is the diagonal length of the smallest rectangle enclosing both the prediction box and the ground-truth box, $v$ is the aspect-ratio similarity of the prediction box and the ground-truth box, and $\alpha$ is an influence factor; $w_{gt}$ and $h_{gt}$ are the width and height of the ground-truth box, and $w_p$ and $h_p$ are the width and height of the prediction box.

The confidence loss and the classification loss are calculated by the BCE loss. Taking the 80×80 feature map as an example, YOLOv5 predicts three rectangular boxes at each grid cell, which gives the value ranges of z, x and y below. The confidence labels form the matrix L, the predicted confidences form the matrix P, and the BCE loss of each element of the matrix is $loss_{BCE}(z,x,y)$:

$$loss_{BCE}(z,x,y) = -L(z,x,y)\log P(z,x,y) - \bigl(1 - L(z,x,y)\bigr)\log\bigl(1 - P(z,x,y)\bigr) \tag{7}$$

$$0 \le z < 3,\quad 0 \le x < 80,\quad 0 \le y < 80$$

Referring to FIG. 3, the knowledge distillation method protects the knowledge learned by the original model through the design of the distillation loss. In this embodiment, when the student model is trained by knowledge distillation, the output of the teacher model constrains the student model through the distillation loss, thereby protecting the knowledge about the old samples learned by the teacher model.

In step S3, the student model is distilled from the teacher model as follows. First, a student model is built according to the YOLOv5 model framework and the teacher model parameters are frozen. An input picture is passed through the teacher model and the student model respectively, up to the last detection layer, whose output vector contains, in order: the abscissa of the target box center, the ordinate of the target box center, the width of the target box, the height of the target box, the foreground probability, and the probability of belonging to each category. The student model is trained to recognize both the newly added defect types of the engine cylinder hole inner wall and the old defect types, so its output has more dimensions than that of the teacher model, the number of extra dimensions being the number of newly added types. The output of the student model is therefore sliced, and the part whose dimensions and meaning correspond to the teacher model is taken out. The output vector of the teacher model and the sliced output vector of the student model are each processed by a modified softmax function; the result of the teacher model is used as the soft label and the result of the student model as the soft prediction. The weighted sum of the KL loss function and the L2 loss function is used as the distillation loss, so that the output of the teacher model guides the training of the student model, the output of the student model on the old types stays close to that of the teacher model, and the student model retains its ability to recognize the old types. The L2 loss function is:

$$L2 = \sum_{i}\bigl(y_i - f(x_i)\bigr)^2 \tag{8}$$

The KL loss function is:

$$KL\bigl(y_i \,\|\, f(x_i)\bigr) = \sum_{i} y_i \log\frac{y_i}{f(x_i)} \tag{9}$$

where $y_i$ is the soft label and $f(x_i)$ is the soft prediction. The distillation loss L is:

$$L = 0.1\,KL\bigl(y_i \,\|\, f(x_i)\bigr) + 0.9\,L2 \tag{10}$$

In step S3, the output of the student model is passed through a softmax function to obtain the hard prediction, and the loss between the hard prediction and the corresponding hard labels annotated in advance in the dataset is calculated according to the loss function of YOLOv5, so that the student model acquires the ability to recognize samples of the newly added defect types.

The total loss function for training the student model is therefore:

$$Loss_{total} = Loss + \lambda \times L \tag{11}$$

where $\lambda$ is a weight parameter.

During distillation training, although the value of the class with the largest probability in the soft label receives the most attention, the other, smaller probability values are also knowledge learned by the teacher network and should be used. Because the differences between these values are too large, a modified softmax function is introduced for smoothing by adding a parameter T:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)} \tag{13}$$

where $q_i$ is the probability output for each category and $z_j$ is the output of the fully connected layer for each category.

The experimental results of the present invention on the engine cylinder bore inner wall dataset are described in detail below.

The engine cylinder hole inner wall dataset consists of two parts: the old sample dataset, and the new category and new sample dataset. The old sample dataset contains 438 pictures, split into a training set and a test set at a ratio of about 3:1, with 326 pictures in the training set and 112 in the test set; its defect categories are cracks and sand holes. The new category and new sample dataset contains 265 pictures, with 212 in the training set and 53 in the test set; its defect categories are cracks, sand holes and bumps. The samples in the two datasets do not overlap.

Model weight A is obtained by training on the new category and new sample dataset with the teacher model as the pre-trained weights. Model weight B is obtained by training on the same dataset with the method of the invention under the supervision of the teacher model. Table 1 gives the test results of model weights A and B on the old sample test set, with mAP@0.5 as the detection metric. Table 2 gives the experimental results for different values of λ.

Table 1 Test results

Table 2 Experimental results for different values of λ
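For readers unfamiliar with the mAP@0.5 metric used in the tables, the sketch below computes the average precision of one class on one image with an IoU threshold of 0.5; mAP is the mean of this value over all classes. It is a simplified illustration that assumes at least one ground-truth box and uses all-point interpolation, so it will not reproduce the exact numbers of any particular toolkit. The names `iou_xyxy` and `average_precision` are hypothetical.

```python
def iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """preds: list of (confidence, box); gts: non-empty list of ground-truth boxes."""
    matched, tp_flags = set(), []
    # Visit predictions from most to least confident and match each greedily.
    for _, box in sorted(preds, key=lambda p: -p[0]):
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j in matched:
                continue
            overlap = iou_xyxy(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)
            tp_flags.append(1)   # true positive
        else:
            tp_flags.append(0)   # false positive
    precisions, recalls, tp = [], [], 0
    for k, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Make precision non-increasing in recall (precision envelope).
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    # Area under the precision-recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap
```

A model that has forgotten the old defect types produces few confident, well-placed boxes on the old test set, which lowers recall and therefore this metric.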

The results show that the method for detecting defects on the inner wall of an automobile engine cylinder hole by combining YOLOv5 with improved knowledge distillation avoids catastrophic forgetting while the model learns new categories and new samples, and can save a great amount of time and cost in practical applications.

The memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory of the methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a high-density digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein.

It should be noted that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. In implementation, the steps of the above method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.

The improved method for detecting the inner wall of the cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation provided by the invention has been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the above examples are intended only to aid understanding of the method and its core idea. Meanwhile, a person skilled in the art may, following the idea of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the invention.

Claims

1. An improved method for detecting the inner wall of a cylinder hole of an automobile engine by combining YOLOv5 with knowledge distillation, which is characterized by comprising the following steps:

2. The method according to claim 1, wherein step S1 is specifically:

3. The method according to claim 2, wherein in step S2, adaptive anchor frame calculation is first performed according to the characteristics of the dataset to obtain three anchor frame ratios at each of three scales (large, medium and small) for subsequent training, and adaptive picture scaling is then performed on the sample pictures to suit the detection of targets of different sizes.
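
By way of non-limiting illustration, the adaptive picture scaling of claim 3 can be sketched as an aspect-preserving resize followed by minimal padding, in the spirit of the letterbox-style resizing commonly used with YOLOv5. The function name `letterbox_geometry`, the target size of 640 and the stride of 32 are assumptions introduced for this sketch and are not taken from the claims.

```python
def letterbox_geometry(width, height, target=640, stride=32):
    """Return (new_w, new_h, pad_w, pad_h) for aspect-preserving scaling.

    `target` and `stride` are illustrative assumptions, not claimed values.
    """
    # Scale so that the longer side fits the target size without distortion.
    ratio = min(target / width, target / height)
    new_w, new_h = round(width * ratio), round(height * ratio)
    # Pad only up to the next multiple of the stride rather than to a full
    # square, which keeps the amount of padded (empty) area small.
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return new_w, new_h, pad_w, pad_h


if __name__ == "__main__":
    # A 1280x720 picture is scaled to 640x360 and padded by 24 rows to 640x384.
    print(letterbox_geometry(1280, 720))
```

Under these assumptions the padded height (360 + 24 = 384) is a multiple of the stride, so the picture can be downsampled by the network without further cropping.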

4. The method of claim 3, wherein the teacher model adopts the yolov5_small structure; the backbone network comprises a slice structure Focus, a convolution module Conv, a bottleneck layer C3 and a spatial pyramid pooling SPP; the input image is downsampled multiple times by the backbone network to extract features at a plurality of scales; the features of different scales are then fused first through a top-down feature pyramid network structure FPN and then through a bottom-up path aggregation network structure PAN; and finally the head prediction network makes predictions on the 80×80, 40×40 and 20×20 feature maps for small-sized, medium-sized and large-sized targets respectively.
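
As a brief illustration of the three prediction scales in claim 4, the sketch below derives the grid sizes from an input resolution and a set of downsampling strides. The 640×640 input and the strides 8, 16 and 32 are assumptions chosen because they reproduce the 80×80, 40×40 and 20×20 feature maps named in the claim; the count of three predictions per grid cell follows the range 0≤z<3 given in claim 5. Both function names are hypothetical.

```python
def head_grid_sizes(input_size=640, strides=(8, 16, 32)):
    """Side length of each prediction grid (assumed input size and strides)."""
    return [input_size // s for s in strides]


def total_predictions(grid_sizes, per_cell=3):
    """Number of candidate frames over all grids, with `per_cell` per grid cell."""
    return per_cell * sum(g * g for g in grid_sizes)


if __name__ == "__main__":
    grids = head_grid_sizes()
    print(grids)                     # grid side lengths, finest first
    print(total_predictions(grids))  # candidate frames before filtering
```

The finest grid has the smallest cells and therefore serves small targets, while the coarsest grid serves large targets.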

5. The method of claim 4, wherein in step S2, the loss function of YOLOv5 is used in training the teacher model; the loss function consists of three parts, namely a rectangular box loss $loss_{box}$, a confidence loss $loss_{obj}$ and a classification loss $loss_{cls}$, and is the weighted sum of the three parts, with the formula:

$$ Loss = a \times loss_{box} + b \times loss_{obj} + c \times loss_{cls} \tag{1} $$

the rectangular frame loss is calculated by the CIOU loss, with the formula:

$$ loss_{CIOU} = 1 - CIOU \tag{6} $$

the confidence loss and the classification loss are calculated by the BCE loss; taking the 80×80 feature map as an example, YOLOv5 predicts three rectangular frames located near each grid cell, so the indices z, x and y have the value ranges given below; the confidence label is a matrix L, the predicted confidence is a matrix P, and the BCE loss of each element of the matrix is $loss_{BCE}(z, x, y)$, with the formula:

$$ loss_{BCE}(z,x,y) = -L(z,x,y)\,\log P(z,x,y) - (1 - L(z,x,y))\,\log(1 - P(z,x,y)) \tag{7} $$

$$ 0 \le z < 3,\quad 0 \le x < 80,\quad 0 \le y < 80 $$
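
The loss terms of claim 5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the CIOU term uses the widely published definition (IoU minus a normalized center-distance penalty and an aspect-ratio penalty), boxes are assumed to be given as corner coordinates (x1, y1, x2, y2), the BCE values of formula (7) are averaged over the matrix, and the weights a, b and c default to placeholder values. All function names are hypothetical.

```python
import math


def ciou_loss(box, gt, eps=1e-9):
    """Formula (6): 1 - CIOU, for boxes given as (x1, y1, x2, y2) (assumed layout)."""
    x1, y1, x2, y2 = box
    gx1, gy1, gx2, gy2 = gt
    w, h, gw, gh = x2 - x1, y2 - y1, gx2 - gx1, gy2 - gy1
    # Intersection over union.
    inter_w = max(0.0, min(x2, gx2) - max(x1, gx1))
    inter_h = max(0.0, min(y2, gy2) - max(y1, gy1))
    inter = inter_w * inter_h
    iou = inter / (w * h + gw * gh - inter + eps)
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(x2, gx2) - min(x1, gx1)) ** 2 + (max(y2, gy2) - min(y1, gy1)) ** 2 + eps
    # Squared distance between the two box centers.
    rho2 = ((x1 + x2 - gx1 - gx2) ** 2 + (y1 + y2 - gy1 - gy2) ** 2) / 4
    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(gw / (gh + eps)) - math.atan(w / (h + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)


def bce(label, pred, eps=1e-7):
    """Formula (7) for a single element; pred is clamped to avoid log(0)."""
    p = min(max(pred, eps), 1 - eps)
    return -label * math.log(p) - (1 - label) * math.log(1 - p)


def mean_bce(labels, preds):
    """Average of formula (7) over nested lists indexed [z][x][y] (assumed reduction)."""
    total, count = 0.0, 0
    for lz, pz in zip(labels, preds):
        for lx, px in zip(lz, pz):
            for l, p in zip(lx, px):
                total += bce(l, p)
                count += 1
    return total / count


def weighted_loss(loss_box, loss_obj, loss_cls, a=1.0, b=1.0, c=1.0):
    """Formula (1); the default weights are placeholders, not claimed values."""
    return a * loss_box + b * loss_obj + c * loss_cls
```

For two identical boxes `ciou_loss` is approximately zero, and for two disjoint boxes it exceeds one because the center-distance penalty is added to the zero overlap.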

6. The method according to claim 5, wherein in step S3, the distillation operation is performed on the student model by using the teacher model as follows: first, the student model is built according to the YOLOv5 model framework and the teacher model parameters are frozen; an input picture is passed through the teacher model and the student model respectively, and after the last detection layer the elements of each output vector denote, in sequence: the abscissa of the center point of the target frame, the ordinate of the center point of the target frame, the width of the target frame, the height of the target frame, the foreground probability, and the probability of belonging to each category; because the student model is trained to identify both the newly added defect types of the inner wall of the engine cylinder hole and the old defect types, its output has more dimensions than that of the teacher model, the number of additional dimensions being equal to the number of newly added types; the output of the student model is sliced, and the part corresponding in dimension and meaning to the output of the teacher model is taken out; the output vector of the teacher model and the sliced output vector of the student model are each processed by a modified softmax function, the result of the teacher model being taken as the soft label and the result of the student model as the soft prediction; the weighted sum of the KL loss function and the L2 loss function is taken as the distillation loss, so that the output of the teacher model guides the training of the student model, the output of the student model on the old types stays close to that of the teacher model, and the recognition capability of the student model on the old types is retained; the L2 loss function formula is:

the KL loss function formula is:

$$ L = 0.1\,\mathrm{KL}(y_i \,\|\, f(x_i)) + 0.9\,\mathrm{L2} \tag{10} $$
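
The distillation step of claim 6 can be sketched for a single output vector as follows. This is a simplified illustration under stated assumptions rather than the claimed implementation: the entries for the newly added types are assumed to sit at the end of the student output so that slicing drops the tail, the modified softmax is assumed to divide the inputs by a parameter T, and the L2 term is assumed to be the mean squared difference. The 0.1 and 0.9 weights are those of formula (10); all function names and the value T = 2.0 are hypothetical.

```python
import math


def softmax_t(values, T=2.0):
    """Softmax with a smoothing parameter T (assumed form of the modified softmax)."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp((v - m) / T) for v in values]
    s = sum(exps)
    return [e / s for e in exps]


def kl_div(p, q, eps=1e-12):
    """KL(p || q), with p the soft label and q the soft prediction."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def l2(p, q):
    """Mean squared difference (assumed form of the L2 loss)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)


def distillation_loss(teacher_out, student_out, num_new, T=2.0):
    """Formula (10): 0.1 * KL + 0.9 * L2 on the old-type part of the outputs."""
    # Slice the student output down to the part shared with the teacher.
    student_old = student_out[: len(student_out) - num_new]
    if len(student_old) != len(teacher_out):
        raise ValueError("sliced student output must match the teacher output")
    soft_label = softmax_t(teacher_out, T)  # teacher parameters are frozen
    soft_pred = softmax_t(student_old, T)
    return 0.1 * kl_div(soft_label, soft_pred) + 0.9 * l2(soft_label, soft_pred)


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]       # three old types
    student = [0.1, 1.0, 2.0, 5.0]  # three old types plus one new type
    print(distillation_loss(teacher, student, num_new=1))
```

Because only the sliced part enters the loss, the entries for the newly added types are left unconstrained by the teacher and are trained solely by the hard-label loss of claim 7.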

7. The method according to claim 6, wherein in step S3, the output of the student model is passed through a softmax function to obtain a hard prediction, and the loss between the hard prediction and the corresponding hard label annotated in advance in the dataset is calculated according to the loss function of YOLOv5, so that the student model acquires the ability to identify samples of the new defect types;

thus, the total loss function for training the student model is:

$$ Loss_{total} = Loss + \lambda \times L \tag{11} $$

wherein λ is a weight parameter.

8. The method of claim 7, wherein smoothing is performed by introducing a modified Softmax function with an added parameter T, the formula being:
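
The formula itself does not appear in this text. For orientation only, a form of Softmax with a parameter T that is widely used in knowledge distillation is given below; whether it matches the claimed formula exactly is an assumption, and the symbols $z_i$ (the i-th input value) and $q_i$ (the i-th smoothed output) are introduced here for illustration.

```latex
q_i = \frac{\exp(z_i / T)}{\sum_{j} \exp(z_j / T)}
```

With T = 1 this reduces to the ordinary Softmax, and a larger T yields a smoother output distribution.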

9. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1-8 when executing the computer program.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-8.