CN115456938A - Metal part crack detection method based on deep learning and ultrasonic infrared image - Google Patents
- Publication number
- CN115456938A (publication of application CN202210872244.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- data set
- metal part
- image
- crack detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
- G06T7/0008—Industrial image inspection checking presence/absence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/34—Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
- G06V10/763—Non-hierarchical techniques, e.g. based on statistics of modelling distributions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10048—Infrared image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30164—Workpiece; Machine component
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Databases & Information Systems (AREA)
- Medical Informatics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Molecular Biology (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Quality & Reliability (AREA)
- Image Analysis (AREA)
Abstract
The invention relates to the field of machine vision and discloses a metal part crack detection method based on deep learning and ultrasonic infrared images: S1, constructing an infrared image data set of metal parts with cracks under ultrasonic excitation; S2, performing image preprocessing on the acquired data set; S3, training an improved YOLOv3 network model with the preprocessed data set; and S4, inputting an image or video to be detected, detecting it with the trained metal part crack detection model, judging whether a defect exists, and locating it. The invention adopts a cross-stage residual structure and a cross-stage dense feature reuse block in the backbone network to improve feature reusability. The feature pyramid composite neural network structure is improved so that rich context information can be obtained, and a feature refinement mechanism is introduced to suppress conflicting information and prevent tiny targets from being submerged in conflicting semantic information, so that finer crack defects are detected.
Description
Technical Field
The invention belongs to the field of machine vision, and particularly relates to a metal part crack detection method based on deep learning and ultrasonic infrared images.
Background
When an ultrasonic wave excites a tested piece, the object vibrates. The vibration states of the contact surfaces on the two sides of a crack defect are inconsistent, so the surfaces rub against and collide with each other; the friction converts ultrasonic energy into heat, and through heat conduction the temperature around the defect rises and takes on a gradient distribution.
Infrared imaging relies on the fact that every object above absolute zero emits infrared radiation at different wavelengths: the higher the object's temperature, the stronger the radiation, which carries characteristic information about the object. A thermal imager then converts this radiation, invisible to the human eye, into a clear, visible image.
Ultrasonic infrared thermal imaging defect detection has advantages such as a large detection range and wide material applicability, and is widely applied to the inspection of metal and composite materials. However, it demands substantial operator experience, suffers from a high manual false-detection rate, and is costly; moreover, in many complex and dangerous environments personnel cannot carry out defect detection at all. An intelligent, automatic defect detection method capable of remote monitoring is therefore proposed by combining a deep learning algorithm. Deep learning target detection algorithms represented by YOLOv3 are widely applied in daily life, production and research owing to their high detection speed and accuracy. However, YOLOv3 is designed for general target detection; applied directly to infrared detection of crack defects, its detection precision and speed drop considerably. The invention therefore improves the YOLOv3 algorithm so that it has stronger detection capability for crack defects of metal components in ultrasonically excited infrared images.
Disclosure of Invention
The invention aims to provide a metal part crack detection method based on deep learning and ultrasonic infrared images, which is used for solving the problem of automatic detection of internal crack defects of metal parts in industrial production.
In order to achieve the purpose, the invention adopts the following technical scheme:
a metal part crack detection method based on deep learning and ultrasonic infrared images comprises the following steps:
s1, constructing an infrared image data set of a metal part with cracks under ultrasonic excitation;
s2, carrying out image preprocessing on the acquired data set;
s3, performing feature extraction on metal part cracks under ultrasonic excitation on each preprocessed image by adopting an improved YOLOv3 feature extraction network to obtain a feature map, and training a crack detection network model;
and S4, inputting the image to be detected, detecting it with the trained crack detection model, judging whether it contains defects, and locating them.
Furthermore, the steps of obtaining the metal part crack detection network model are as follows:
a1: collecting infrared images of metal parts with cracks under ultrasonic excitation and making the images into a VOC data set;
a2: preprocessing the infrared image in the data set by a data enhancement technology;
a3: clustering the data set by a K-means clustering algorithm;
a4: and training the improved YOLOv3 network model through a data set to obtain the network model capable of accurately detecting the cracks of the metal parts.
Furthermore, the steps of training the metal part crack detection network model are as follows:
a41: Inputting the data set images into the Darknet-53 backbone network, and acquiring feature information in the data set through a cross-stage residual structure;
a42: Introducing a channel attention module and a spatial attention module to form a CBAM (Convolutional Block Attention Module), assigning different weights to feature information at different spatial positions or channels so that the model focuses on channels rich in feature information while ignoring useless feature information, and then generating three feature maps of different scales;
a43: Carrying out feature fusion on the feature maps through the improved feature pyramid composite network, thereby carrying out multi-scale crack position prediction.
Further, the steps of obtaining the feature map in step a42 are as follows:
l1: Obtaining feature information of different gradients through successive multi-layer cross-stage residual blocks;
l2: Reusing the feature information of different gradients through the cross-stage dense feature reuse block, generating feature maps of different gradients through the CBAM module, and inputting these feature maps into the improved feature pyramid network.
The metal part crack detection method based on deep learning and ultrasonic infrared images provided by the invention has the beneficial effects that:
(1) In the invention, the residual units in Darknet-53 adopt a cross-stage connection structure and a cross-stage dense feature reuse block, which further improves the feature extraction capability of the network and the reusability of features.
(2) The invention provides a feature pyramid composite neural network structure combining context enhancement and feature refinement, which can acquire rich context information and introduces a feature refinement mechanism to suppress conflicting information, preventing tiny targets from being submerged in conflicting semantic information and improving the detection precision for finer crack defects.
(3) The invention introduces a channel attention module and a spatial attention module, assigning different weights to feature information at different spatial positions or channels so that the model focuses on channels rich in feature information while ignoring useless feature information, thereby improving the recognition capability of the model.
Drawings
FIG. 1 is a schematic flow chart of a crack detection method for a metal part;
FIG. 2 is a schematic diagram of the YOLOv3 network structure of the present invention;
FIG. 3 is a schematic view of a channel attention module;
FIG. 4 is a schematic view of a spatial attention module;
FIG. 5 is a schematic diagram of a CBAM module;
FIG. 6 is a schematic diagram of a pyramid composite network structure according to the present invention.
Detailed description of the preferred embodiments
Further details of the present invention are provided below with reference to the accompanying drawings.
Example one
Fig. 1 is a schematic flow chart of a method for detecting cracks of a metal part based on deep learning and ultrasonic infrared images, and as can be seen from the schematic flow chart, the method comprises the following steps:
s1, constructing an infrared image data set of a metal part with cracks under ultrasonic excitation;
s2, carrying out image preprocessing on the acquired data set;
s3, carrying out feature extraction on the cracks of the metal part under ultrasonic excitation on each preprocessed image by adopting an improved YOLOv3 feature extraction network to obtain a feature map, and training a crack detection network model;
and S4, inputting the image to be detected, detecting it with the trained crack detection model, judging whether it contains defects, and locating them.
In step S3, the step of obtaining the metal part crack detection network model is as follows:
a1: collecting infrared images of metal parts with cracks under ultrasonic excitation and making the images into a VOC data set;
the metal part crack detection algorithm based on deep learning and ultrasonic infrared images needs to learn the characteristics of metal part cracks under ultrasonic excitation infrared images from sample data sets, so that the self-established VOC representative data set comprises infrared images of metal parts with various specifications and crack defects inside, and the method comprises the following specific steps:
(1) Storing the training pictures in the created data set folder;
(2) Marking the crack defect positions in the training pictures with the LabelImg annotation tool to generate xml labels;
(3) Generating the txt file needed for training.
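As an illustration of steps (2) and (3), the conversion from a LabelImg VOC-style xml annotation to one line of the training txt file can be sketched as follows. The sample annotation, the single `crack` class, and the `path xmin,ymin,xmax,ymax,class` line format are assumptions for illustration; the patent does not fix them:

```python
import xml.etree.ElementTree as ET

# Illustrative LabelImg-style VOC annotation (not from the patent).
SAMPLE_XML = """<annotation>
  <filename>crack_001.jpg</filename>
  <object>
    <name>crack</name>
    <bndbox><xmin>48</xmin><ymin>120</ymin><xmax>96</xmax><ymax>180</ymax></bndbox>
  </object>
</annotation>"""

CLASSES = ["crack"]  # assumed single defect class

def voc_xml_to_txt_line(xml_text: str) -> str:
    """Turn one xml annotation into 'filename xmin,ymin,xmax,ymax,class'."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.findtext("name"))
        b = obj.find("bndbox")
        coords = [b.findtext(t) for t in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append(",".join(coords + [str(cls_id)]))
    return filename + " " + " ".join(boxes)

print(voc_xml_to_txt_line(SAMPLE_XML))  # -> crack_001.jpg 48,120,96,180,0
```

One such line per training image is then collected into the txt file consumed by the training script.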
A2: preprocessing the infrared image in the data set by a data enhancement technology;
successful model training requires a large number of parameters, so that the parameters can work correctly, and a large amount of data is required for training, but in actual situations, the data is not too much as imaginable, so that a data enhancement technology is required for enriching a training set, and the generalization capability and robustness of the model are improved.
A CutMix data enhancement method is adopted, namely after a part of area is cut off, area images of other data in a training set are randomly filled for patching, and then training is carried out, wherein the principle is as follows:
x_A and x_B are two different training samples, and y_A and y_B are the corresponding label values; CutMix generates a new training sample x̃ and its label ỹ according to the formula:
x̃ = M ⊙ x_A + (1 − M) ⊙ x_B, ỹ = λ·y_A + (1 − λ)·y_B (1)
where M ∈ {0, 1}^(W×H) is the binary mask marking the region to cut out and patch, ⊙ is pixel-wise multiplication, 1 is a mask whose pixels are all 1, and λ is drawn from the Beta distribution λ ~ Beta(α, α); with α = 1, λ obeys a uniform distribution on (0, 1).
First, the bounding box of the clipping region B = (r_x, r_y, r_w, r_h) is sampled to indicate the clipping regions of samples x_A and x_B: the region B of sample x_A is removed and filled with the region B clipped from sample x_B.
The mask M is then obtained by sampling the bounding box coordinates, whose width and height are proportional to the sample size:
r_x ~ Unif(0, W), r_w = W·√(1 − λ), r_y ~ Unif(0, H), r_h = H·√(1 − λ) (2)
which ensures that the cut-out area ratio is r_w·r_h / (W·H) = 1 − λ. After the clipping region B is determined, the pixels of M inside B are set to 0 and all other pixels to 1; the new sample x̃ and label ỹ are then obtained from equation (1).
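A CutMix step following the formulas above can be sketched in NumPy as follows. Clipping the box at the image border and recomputing λ from the actually patched area are common practical details assumed here, not taken from the patent:

```python
import numpy as np

def cutmix(x_a, y_a, x_b, y_b, alpha=1.0, rng=None):
    """Mix two (H, W, C) samples: cut box B out of x_a, patch it with the
    same region of x_b, and mix the labels by the patched-area ratio."""
    rng = np.random.default_rng(rng)
    H, W = x_a.shape[:2]
    lam = rng.beta(alpha, alpha)                # alpha=1 -> Unif(0, 1)
    r_w = int(W * np.sqrt(1.0 - lam))           # box size, equation (2)
    r_h = int(H * np.sqrt(1.0 - lam))
    r_x, r_y = rng.integers(0, W), rng.integers(0, H)   # box center
    x1, y1 = max(r_x - r_w // 2, 0), max(r_y - r_h // 2, 0)
    x2, y2 = min(r_x + r_w // 2, W), min(r_y + r_h // 2, H)
    x = x_a.copy()
    x[y1:y2, x1:x2] = x_b[y1:y2, x1:x2]         # patch region B from x_b
    lam = 1.0 - (x2 - x1) * (y2 - y1) / (W * H) # actual area ratio
    return x, lam * y_a + (1.0 - lam) * y_b     # equation (1)
```

For example, mixing a crack sample with a defect-free sample yields a label whose components reflect the fraction of pixels contributed by each source image.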
A3: clustering the data set by a K-means clustering algorithm;
the prior box of YOLOv3 was obtained by the authors by clustering the COCO dataset, while our detection target has differences in object size, position and shape from the target in the COCO dataset, and if it is used on their own dataset, part of the prior box is unreasonable. Therefore, the invention adopts a K-means clustering algorithm to cluster the VOC data set established by the invention, thereby obtaining a more accurate anchor frame, and the algorithm is as follows:
(1) Randomly select K points from the data set as the initial cluster centers C = {c_1, c_2, ..., c_k};
(2) For each sample x_i in the data set, calculate its distance to each cluster center and assign it to the class of the nearest cluster center;
(3) For each class i, recalculate the cluster center c_i = (1/|i|) Σ_{x ∈ class i} x (where |i| represents the total number of samples in the class);
(4) Repeat steps (2) and (3) until the cluster centers no longer change.
A4: and training the improved YOLOv3 network model through a data set to obtain the network model capable of accurately detecting the cracks of the metal part.
In step A4, the steps of training the metal part crack detection network model are as follows:
a41: Inputting the data set images into the Darknet-53 backbone network, and acquiring feature information in the data set through a cross-stage residual structure;
The improved YOLOv3 target detection network is adopted; its backbone is a convolutional feature extraction network based on Darknet-53, whose structure is shown in figure 2. The invention introduces cross-stage residual blocks (CSRes-block) into Darknet-53: the input features are split into two convolution branches and the channels are halved before entering the residual unit, which alleviates the problems caused by deepening the network and further improves its feature extraction capability. Cross-stage dense feature reuse blocks (CSDense-block) are also introduced, in which conventional convolution is replaced by depthwise convolution (one convolution kernel is responsible for one channel, and each channel is convolved by only one kernel); this improves feature reusability and reduces the amount of computation while preserving the information of each channel. In addition, spatial pyramid pooling (SPP) blocks are added to enlarge the receptive field of the backbone features more effectively.
Furthermore, the Leaky ReLU activation function in the Darknet-53 convolutional network is replaced by the Mish function, allowing feature information to propagate deeper into the neural network; the minimum component of the proposed algorithm is therefore the CBM unit, i.e. convolution + batch normalization (BN) + Mish activation.
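As a small illustration of the activation swap in the CBM unit, Mish and Leaky ReLU can be written in NumPy as follows (a generic sketch, not code from the patent):

```python
import numpy as np

def mish(x):
    # Mish(x) = x * tanh(softplus(x)), where softplus(x) = ln(1 + e^x)
    return x * np.tanh(np.log1p(np.exp(x)))

def leaky_relu(x, slope=0.1):
    # the activation that Mish replaces in the CBM unit
    return np.where(x > 0, x, slope * x)
```

Unlike Leaky ReLU, Mish is smooth everywhere and bounded below while still letting small negative values pass, which is commonly credited with helping gradients flow through deep networks.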
A42: introducing a channel attention module and a space attention module to form a CBAM (convolutional block attention) module, distributing different weights to the characteristic information at different spatial positions or channels, allowing the model to focus on the channels rich in the characteristic information, simultaneously neglecting useless characteristic information, and then generating three characteristic graphs with different scales;
in step a42, the steps of obtaining the feature map are as follows:
l1: obtaining characteristic information of different gradients through continuous multilayer trans-level residual blocks;
l2: reusing the feature information of different gradients through the cross-stage dense feature reuse block, generating feature maps of different gradients through the CBAM module, and inputting the feature maps of different gradients into the improved feature pyramid network.
The YOLOv3 model fuses deep features with shallow features through up-sampling operations, and the low-level feature maps fused with deep semantic information have stronger feature expression capability. Because different feature channels differ in importance, as do different positions within the same feature channel, important information obtained by multi-layer convolution in some of the original feature maps can be lost when the feature maps are concatenated. The invention introduces a Channel Attention Module (CAM) and a Spatial Attention Module (SAM) to form a Convolutional Block Attention Module (CBAM), combined with the cross-stage residual block (CSRes-block) and the cross-stage dense feature reuse block (CSDense-block), thereby modeling the correlations among channels and among pixels in the low-level feature maps and improving the model's feature expression capability so as to capture important feature information.
Specifically, the channel attention module first applies global average pooling and global max pooling to derive a refined channel attention, obtaining average-pooled and max-pooled features; these are fed into a shared network consisting of two fully connected layers and their activation functions to model the channels, and the output feature vectors are finally combined to obtain the channel attention weight of the input feature map, computed as shown in formula (3). After the channel weights are obtained, the original features are recalibrated along the channel dimension by multiplying the weights with the input feature map.
M_c(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) (3)
where M_c(F) represents the channel attention weight, σ the sigmoid activation function, MLP a multilayer perceptron with shared parameters, AvgPool and MaxPool the average pooling and max pooling layers respectively, and F the input feature map. The structure is shown in fig. 3.
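Formula (3) can be sketched in NumPy as follows. The reduction ratio r, the ReLU between the two shared fully connected layers, and the random weights are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

def channel_attention(F, W0, W1):
    """M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))), formula (3).
    F: (C, H, W); W0 (C/r x C) and W1 (C x C/r) are the weights of the
    shared two-layer MLP. Returns one attention weight per channel."""
    avg = F.mean(axis=(1, 2))                    # global average pooling
    mx = F.max(axis=(1, 2))                      # global max pooling
    mlp = lambda v: W1 @ np.maximum(W0 @ v, 0.0) # shared parameters
    s = mlp(avg) + mlp(mx)
    return 1.0 / (1.0 + np.exp(-s))              # sigmoid

C, H, W, r = 8, 4, 4, 2                          # r: assumed reduction ratio
F = rng.standard_normal((C, H, W))
W0 = rng.standard_normal((C // r, C))
W1 = rng.standard_normal((C, C // r))
w = channel_attention(F, W0, W1)
F_recal = F * w[:, None, None]                   # recalibrate the channels
```

The final line is the channel-wise recalibration described in the text: each channel of the input feature map is scaled by its learned weight.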
Spatial attention focuses on the most informative locations and is complementary to channel attention. Global average pooling and max pooling are performed along the channel dimension, turning all input channels at each position into two real numbers: a feature map of size (h × w × c) is converted into two single-channel feature maps of size (h × w × 1), which are concatenated and passed through a 3 × 3 convolution to form one (h × w × 1) feature map; the spatial attention weight is finally obtained through an activation function, as shown in formula (4).
M_s(F) = σ(f_3×3([AvgPool(F); MaxPool(F)])) (4)
where σ denotes the sigmoid activation function, f_3×3 a 3 × 3 convolution operation, and F the input feature map. The structure is shown in fig. 4.
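Formula (4) can be sketched the same way; the naive 3 × 3 convolution loop, zero padding, and the random kernel are for illustration only:

```python
import numpy as np

def spatial_attention(F, kernel):
    """M_s(F) = sigmoid(f_3x3([AvgPool(F); MaxPool(F)])), formula (4),
    with pooling over the channel axis. F: (C, H, W); kernel: (2, 3, 3)."""
    pooled = np.stack([F.mean(axis=0), F.max(axis=0)])   # (2, H, W)
    H, W = pooled.shape[1:]
    padded = np.pad(pooled, ((0, 0), (1, 1), (1, 1)))    # zero padding
    out = np.zeros((H, W))
    for i in range(H):                                   # naive 3x3 conv
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + 3, j:j + 3] * kernel)
    return 1.0 / (1.0 + np.exp(-out))                    # sigmoid -> (H, W)
```

The resulting (h × w) map multiplies the feature map position-wise, emphasizing the spatial locations most likely to contain a crack.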
A43: and carrying out feature fusion on the feature map through the improved feature pyramid composite network, thereby carrying out multi-scale crack position prediction.
A feature pyramid composite network combining context enhancement and feature refinement is presented. Features obtained by multi-scale dilated convolution are fused from top to bottom and injected into the feature pyramid network to supplement context information. A channel and spatial feature refinement mechanism is introduced to suppress the conflicts that arise in multi-scale feature fusion and prevent tiny targets from being submerged in conflicting information.
As shown in fig. 6, in the improved pyramid composite network, C2, C3, C4 and C5 represent features of different scales after the input image is downsampled 4, 8, 16 and 32 times respectively. F1, F2 and F3 are generated from C3, C4 and C5 by one layer of convolution (C2 is discarded because of noise clutter). L1, L2 and L3 represent the corresponding output features after the results of the context enhancement module (CAM) are fused with the features generated by the feature pyramid network (FPN), and P1, P2 and P3 represent the features generated by the feature refinement module (FRM).
the invention first adopts a cross-level residual block in the backbone network to improve the reusability of the characteristics. And then, introducing a space attention and channel attention mechanism, forming a CBAM module to be inserted into the neck part of the backbone network, and enabling the model to realize efficient expression aiming at the required characteristic information. Secondly, a feature pyramid composite network combining enhanced context and refined features is designed, rich context information can be obtained, a tiny target is prevented from being submerged in conflict information, and effective detection of a model on finer crack defects is improved.
Claims (4)
1. A metal part crack detection method based on deep learning and ultrasonic infrared images is characterized by comprising the following steps:
s1, constructing an infrared image data set of a metal part with cracks under ultrasonic excitation;
s2, carrying out image preprocessing on the acquired data set;
s3, performing feature extraction on metal part cracks under ultrasonic excitation on each preprocessed image by adopting an improved YOLOv3 feature extraction network to obtain a feature map, and training a crack detection network model;
and S4, inputting the image to be detected, detecting it with the trained crack detection model, judging whether it contains defects, and locating them.
2. The metal part crack detection method based on deep learning and ultrasonic infrared images as claimed in claim 1, wherein the step of obtaining the crack detection network model is as follows:
a1: acquiring an infrared image of a metal part with cracks under ultrasonic excitation and making the infrared image into a VOC data set;
a2: preprocessing the infrared image in the data set by a data enhancement technology;
a3: clustering the data set by a K-means clustering algorithm;
a4: and training the improved YOLOv3 network model through a data set to obtain the network model capable of accurately detecting the cracks of the metal parts.
3. The method for detecting the crack of the metal component based on the deep learning and the ultrasonic infrared image as claimed in claim 2, wherein the step of training the crack detection network model is as follows:
a41: inputting the data set images into the Darknet-53 backbone network, and acquiring feature information in the data set through a cross-stage residual structure;
a42: introducing a channel attention module and a spatial attention module to form a CBAM (Convolutional Block Attention Module), assigning different weights to feature information at different spatial positions or channels so that the model focuses on channels rich in feature information while ignoring useless feature information, and then generating three feature maps of different scales;
a43: performing feature fusion on the feature maps through the improved feature pyramid composite network, thereby carrying out multi-scale crack position prediction.
4. The metal part crack detection method based on the deep learning and the ultrasonic infrared image as claimed in claim 3, characterized in that the step of obtaining the feature map is as follows:
l1: obtaining feature information of different gradients through successive multi-layer cross-stage residual blocks;
l2: reusing the feature information of different gradients through the cross-stage dense feature reuse block, generating feature maps of different gradients through the CBAM module, and inputting these feature maps into the improved feature pyramid network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210872244.3A CN115456938A (en) | 2022-07-20 | 2022-07-20 | Metal part crack detection method based on deep learning and ultrasonic infrared image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115456938A true CN115456938A (en) | 2022-12-09 |
Family
ID=84296465
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210872244.3A Pending CN115456938A (en) | 2022-07-20 | 2022-07-20 | Metal part crack detection method based on deep learning and ultrasonic infrared image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115456938A (en) |
- 2022-07-20: CN application CN202210872244.3A filed; publication CN115456938A, status Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113392852A * | 2021-04-30 | 2021-09-14 | Zhejiang Wanli University | Vehicle detection method and system based on deep learning |
CN113392852B * | 2021-04-30 | 2024-02-13 | Zhejiang Wanli University | Vehicle detection method and system based on deep learning |
CN116229065A * | 2023-02-14 | 2023-06-06 | Hunan University | Multi-branch fusion-based robotic surgical instrument segmentation method |
CN116229065B * | 2023-02-14 | 2023-12-01 | Hunan University | Multi-branch fusion-based robotic surgical instrument segmentation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||