CN115439408A - Metal surface defect detection method and device and storage medium - Google Patents


Info

Publication number
CN115439408A
Authority
CN
China
Prior art keywords
image
defect
frequency component
mask
reconstructed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210920393.2A
Other languages
Chinese (zh)
Inventor
胡广华
何文亮
涂千禧
王清辉
焦安强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology (SCUT)
Priority to CN202210920393.2A
Publication of CN115439408A
Legal status: Pending

Classifications

    • G06T7/0004 Industrial image inspection (G06T7/00 Image analysis; G06T7/0002 Inspection of images, e.g. flaw detection)
    • G06N3/08 Learning methods (G06N3/02 Neural networks)
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/52 Scale-space analysis, e.g. wavelet analysis
    • G06V10/764 Recognition using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/82 Recognition using pattern recognition or machine learning, using neural networks
    • G06T2207/10024 Color image
    • G06T2207/20064 Wavelet transform [DWT]
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30108 Industrial image inspection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a metal surface defect detection method, a device, and a storage medium. The method comprises the following steps: acquiring a defect composite image from a normal sample image; performing a dual-tree complex wavelet transform on the defect composite image to transform the image features from the pixel domain to the wavelet domain, obtaining low-frequency and high-frequency components at multiple scales; modifying the obtained low-frequency and high-frequency components and performing an inverse dual-tree complex wavelet transform to obtain a reconstructed image; training an image prediction model using the defect composite image and the reconstructed image; and acquiring an image to be detected, inputting it into the trained image prediction model, and outputting a detection result. The method requires no real defect samples: trained on normal samples alone, it achieves good defect detection and localization capability, and can be widely applied to automated online inspection of metal surfaces with a degree of metallic luster in the technical field of metal surface defect detection.

Description

Metal surface defect detection method and device and storage medium
Technical Field
The invention relates to the technical field of metal surface defect detection, and in particular to a metal surface defect detection method, device, and storage medium.
Background
In the production of curved metal workpieces (such as electronic commutators and automotive bodies-in-white), surface defects directly affect product quality. Because surfaces such as those of electronic commutators and bodies-in-white suffer from poor imaging characteristics, diverse defect types and shapes, and low defect contrast, defect inspection of these products still relies mainly on manual visual inspection by experienced technical workers. Its low efficiency, low accuracy, poor repeatability, and tedium have become a key bottleneck preventing enterprises from improving product quality and market competitiveness. For example, internal data from one automobile company show that in 2019, defects present on the body before painting (i.e., on the body-in-white) accounted for 46% of all defects, yet 65% of those defects were only discovered after the vehicle was painted, greatly increasing repair costs.
At present, surface defects are mainly detected with methods based on supervised learning. Although these methods are mature, model training requires collecting a large number of defect samples, which is very difficult in many applications. Some studies therefore generate training samples by artificial synthesis, but the synthesized samples are distorted and do not fit the true defect distribution well, so the model's detection capability degrades severely once it is deployed in a real production scenario.
Unsupervised methods train the neural network model using only normal (i.e., defect-free) samples, which greatly reduces training cost. The basic principle is to reconstruct a 'normal' sample image from the test sample and declare the regions where the two images differ to be defect regions. The core of any algorithm based on this principle is therefore the image reconstruction module of the network model. An ideal reconstruction module should keep the defect-free regions of the input sample 'as-is' while 'seamlessly' fusing context information to repair the defect regions, so that the overall reconstruction is highly faithful. However, guiding the network model to distinguish normal regions from defect regions, and to perform adaptive image restoration, using only normal samples is the main technical difficulty of such methods. On the one hand, current methods generally constrain and reconstruct the image in the pixel domain; this works well for regular textures, but such constraints struggle to describe the principal characteristics of random textures effectively. On the other hand, because curved metal surfaces such as electronic commutators have strongly random surface textures, mainstream reconstruction modules have difficulty reconstructing the normal regions of the image accurately, so the detection result contains a large amount of noise caused by reconstruction errors, limiting the detection capability of these methods.
Disclosure of Invention
To solve at least one of the technical problems in the prior art, an object of the present invention is to provide a metal surface defect detection method, apparatus, and storage medium.
The technical scheme adopted by the invention is as follows:
a metal surface defect detection method comprises the following steps:
acquiring a defect composite image from a normal sample image, wherein the defect composite image contains a defect region;
performing dual-tree complex wavelet transform on the defect composite image, and transforming the image characteristics from a pixel domain to a wavelet domain to obtain low-frequency components and high-frequency components of multiple scales;
modifying the obtained low-frequency component and high-frequency component, and performing inverse dual-tree complex wavelet transform to obtain a reconstructed image;
training an image prediction model using the defect composite image and the reconstructed image;
and acquiring an image to be detected, inputting the image to be detected into the trained image prediction model, and outputting a detection result.
Further, acquiring the defect composite image from the normal sample image includes:
acquiring a defect candidate region and obtaining a mask image from the defect candidate region;
generating an abnormal image for the defect candidate region, the abnormal image being a salt-and-pepper noise image, a Gaussian noise image, a fading image, or an image from a heterogeneous dataset;
and fusing the normal sample image and the abnormal image according to the mask image to generate the defect region and obtain the defect composite image.
Further, the defect composite image is expressed as:

\tilde{I} = \bar{M} \odot I + (1-\beta)\,(M \odot I) + \beta\,(M \odot A)

where $\tilde{I}$ is the composite image, $I$ is the normal sample image, $A$ is the abnormal image, $\beta$ is the blending factor (a random floating-point number drawn from a uniform distribution), $\odot$ denotes pixel-wise multiplication, $M$ is the mask image, and $\bar{M}$ is the inverted binary mask image of $M$.
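The mask-based blending of normal and abnormal images described above can be sketched in NumPy. This is a minimal sketch; the function name is illustrative, and in training β would be drawn uniformly at random outside the function:

```python
import numpy as np

def synthesize_defect(normal, anomaly, mask, beta):
    """Blend an anomaly source into a normal image inside a mask region.

    normal, anomaly: float images in [0, 1] with the same shape.
    mask: binary array, 1 inside the defect candidate region.
    beta: blending factor in (0, 1).
    """
    mask = mask.astype(float)
    inv_mask = 1.0 - mask  # the inverted mask keeps the normal region as-is
    return inv_mask * normal + (1.0 - beta) * mask * normal + beta * mask * anomaly
```

Outside the mask the normal image passes through unchanged; inside, normal content and the anomaly source are mixed with factor β, so small β produces low-contrast synthetic defects.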
Further, modifying the obtained low-frequency and high-frequency components and performing the inverse dual-tree complex wavelet transform to obtain the reconstructed image includes:
for the low-frequency wavelet coefficient map, reconstructing the low-frequency component with a reconstruction network and using the result as the modified low-frequency component, the reconstructed wavelet coefficient map being required to be consistent with the wavelet coefficient map of the normal sample image;
for the high-frequency wavelet coefficient maps, assigning one decision module to the high-frequency component of each scale; the decision module computes the modulus from the real and imaginary parts of the wavelet coefficient map in each direction and outputs a score value for each local patch of the image; if the score value is greater than a preset threshold, the local wavelet coefficients are retained, otherwise they are set to 0, yielding the modified high-frequency component;
and performing inverse dual-tree complex wavelet transform on the modified low-frequency component and the modified high-frequency component to obtain a reconstructed image.
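The decompose–modify–reconstruct round trip can be illustrated with a single-level 2-D Haar transform in plain NumPy. This is only a stand-in: the patent uses the multi-scale Q-Shift dual-tree complex wavelet transform (available, for example, in the Python `dtcwt` package), not the Haar wavelet.

```python
import numpy as np

def haar_decompose(img):
    """Single-level 2-D Haar transform of an image with even side lengths:
    returns one low-frequency band (LL) and three high-frequency bands
    (LH, HL, HH). A simplified stand-in for the multi-scale Q-Shift
    dual-tree complex wavelet transform used in the patent."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a + b - c - d) / 4.0
    hl = (a - b + c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, (lh, hl, hh)

def haar_reconstruct(ll, highs):
    """Inverse of haar_decompose: modified bands go in, an image comes out."""
    lh, hl, hh = highs
    out = np.empty((ll.shape[0] * 2, ll.shape[1] * 2))
    out[0::2, 0::2] = ll + lh + hl + hh
    out[0::2, 1::2] = ll + lh - hl - hh
    out[1::2, 0::2] = ll - lh + hl - hh
    out[1::2, 1::2] = ll - lh - hl + hh
    return out
```

Zeroing a high-frequency band before calling `haar_reconstruct` mimics the decision module discarding the coefficients of a suspect region, while replacing the low band mimics the reconstruction network.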
Further, the reconstruction network is an auto-encoder network, and its loss function is expressed as:

L_{low} = \mathcal{L}_{SSIM}(\hat{y}_L,\; y_L)

where $\hat{y}_L$ is the low-frequency wavelet coefficient map output by the reconstruction network, $y_L$ is the low-frequency wavelet coefficient map obtained by applying the same dual-tree complex wavelet decomposition to the normal sample image, and $\mathcal{L}_{SSIM}$ denotes the structural-similarity loss value between the two images.
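A minimal sketch of a structural-similarity loss, computed globally over the whole coefficient map. Real implementations typically use a sliding Gaussian window; the constants c1 and c2 follow common SSIM defaults and are not taken from the patent:

```python
import numpy as np

def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified global SSIM loss (1 - SSIM) between two maps in [0, 1]."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
    return 1.0 - ssim
```

The loss is 0 for identical inputs and grows as luminance, contrast, or structure diverge, which is why it suits constraining the reconstructed low-frequency map toward the normal sample's map.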
Further, the loss function corresponding to the high-frequency components is expressed as:

L_{high} = \left\| \bar{M} \odot (\hat{H} - H) \right\|_2^2 + \left\| M \odot \hat{H} \right\|_2^2

where $\hat{H}$ is the high-frequency component image processed by the decision module, $H$ is the high-frequency component image of the normal sample image, $\odot$ denotes pixel-wise multiplication, $M$ is the mask image, and $\bar{M}$ is the inverted binary mask image of $M$.
Further, the pixel-domain loss function corresponding to the reconstructed image is expressed as:

L_{pixel} = \left\| \bar{M} \odot (\hat{I} - I) \right\|_2^2

where $\hat{I}$ is the reconstructed image, $I$ is the normal sample image, $M$ is the mask image, and $\bar{M}$ is the inverted binary mask image of $M$.
Further, training the image prediction model using the defect composite image and the reconstructed image includes:
acquiring the defect composite image, the reconstructed image, and the residual image between them, and stacking the three along the channel dimension as the input of the image prediction model;
and training the image prediction model with the Focal loss function so that the defect region detected by the model is consistent with the defect candidate region on the defect composite image.
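A minimal NumPy sketch of the binary Focal loss on per-pixel defect probabilities. The γ and α values are common defaults; the patent does not specify them:

```python
import numpy as np

def focal_loss(p, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over all pixels.

    p: predicted defect probability map in (0, 1).
    target: binary ground-truth mask (the synthetic defect candidate region).
    """
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(target == 1, p, 1.0 - p)          # prob. of the true class
    at = np.where(target == 1, alpha, 1.0 - alpha)  # class weighting
    return float(np.mean(-at * (1.0 - pt) ** gamma * np.log(pt)))
```

The (1 − pt)^γ factor down-weights easy pixels, which matters here because the synthetic defect region is tiny compared with the normal background.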
The other technical scheme adopted by the invention is as follows:
a metal surface defect detection apparatus comprising:
at least one processor;
at least one memory for storing at least one program;
wherein the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method described above.
The invention adopts another technical scheme that:
a computer-readable storage medium in which a processor-executable program is stored, the program, when executed by a processor, performing the method described above.
The beneficial effects of the invention are: the method requires no real defect samples; trained on normal samples alone, it achieves good defect detection and localization capability, and can be widely used for automated online inspection of metal surfaces with a degree of metallic luster.
Drawings
To more clearly illustrate the embodiments of the present invention and the technical solutions in the prior art, the drawings of the embodiments are described below. It should be understood that the drawings described below illustrate only some embodiments of the invention, and that those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of a network framework in an embodiment of the invention;
FIG. 2 shows example composite images in an embodiment of the invention, where fig. 2 (a) is a salt-and-pepper noise composite image, fig. 2 (b) a Gaussian noise composite image, fig. 2 (c) a fading composite image, and fig. 2 (d) a heterogeneous-dataset composite image;
FIG. 3 is a schematic diagram of a reconstructed network model according to an embodiment of the present invention;
FIG. 4 is a block diagram of a decision module according to an embodiment of the present invention;
FIG. 5 shows original and reconstructed images produced by the reconstruction network for a normal sample and a defect sample from the KSDD2 dataset, where fig. 5 (a) is the normal sample original, fig. 5 (b) the corresponding reconstructed image, fig. 5 (c) the defect sample original, and fig. 5 (d) the reconstructed image of the defect sample;
FIG. 6 is a diagram illustrating an exemplary architecture of an image prediction module according to an embodiment of the present invention;
FIG. 7 is a graph of the results of the tests on the KSDD2 open source dataset and the bondboard dataset for a simulated body-in-white surface in accordance with an embodiment of the present invention;
FIG. 8 is a flowchart illustrating a method for detecting defects on a metal surface according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar functions throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
In the description of the present invention, it should be understood that the orientation or positional relationship referred to in the description of the orientation, such as upper, lower, front, rear, left, right, etc., is based on the orientation or positional relationship shown in the drawings only for the convenience of description of the present invention and simplification of the description, and does not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention.
In the description of the present invention, 'several' means one or more and 'a plurality' means two or more; terms such as 'greater than', 'less than', and 'exceeding' are understood as excluding the stated number, while terms such as 'above', 'below', and 'within' are understood as including it. If 'first' and 'second' are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the indicated technical features, or their precedence.
In the description of the present invention, unless otherwise explicitly limited, terms such as arrangement, installation, connection and the like should be understood in a broad sense, and those skilled in the art can reasonably determine the specific meanings of the above terms in the present invention in combination with the specific contents of the technical solutions.
As shown in fig. 1 and 8, the metal surface defect detection method of this embodiment is based on the Q-Shift dual-tree complex wavelet transform and convolutional networks. First, an artificial defect image is generated by an image enhancement module. A dual-tree complex wavelet transform is then applied to the image, transforming the image features from the pixel domain to the wavelet domain, and the low-frequency and high-frequency components are processed by a reconstruction network and a decision network, respectively. An inverse dual-tree complex wavelet transform then maps the image from the wavelet domain back to the pixel domain to obtain the reconstructed image, and finally an image prediction module outputs the detection result. Before detection, the network model is trained on a certain number of normal samples only, without collecting any real defect samples, enabling end-to-end defect detection. The method specifically comprises the following steps:
s1, acquiring a defect composite image according to a normal sample image; wherein, the defect composite image is provided with a defect candidate area.
An image enhancement module: using image processing techniques, the normal image is randomly corrupted to obtain a composite image carrying defects, providing defect samples for the subsequent modules.
Specifically, the image enhancement module consists of a mask generation module and an abnormal image generation module. The mask generation module first generates a defect candidate region; the abnormal image generation module then fills content into the defect candidate region to obtain the composite image, which guides the network model to distinguish normal regions from defect regions.
As an alternative embodiment, the mask generation module is a random algorithm for generating defect candidate regions. It can generate three types of mask regions of different shapes and sizes: regular regions, irregular regions, and Perlin noise regions.
The random generation algorithm for regular-region masks produces rectangular regions of varying sizes. The algorithm for irregular-region masks produces irregular regions composed of three elements: line segments, circles, and squares. The algorithm for Perlin noise region masks is based on a Perlin noise image; candidate regions are obtained by thresholding the noise image.
With the mask generation module, a series of diverse candidate regions can be obtained, whose content is then filled in by the abnormal image generation module. The abnormal image generation module can generate four types of abnormal images: salt-and-pepper noise images, Gaussian noise images, fading images, and images from a heterogeneous dataset.
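The noise-mask generator can be approximated with a much simpler smooth random field, upsampled from a coarse random grid and thresholded. This is a simplified stand-in for true Perlin noise; the function name and default parameters are illustrative:

```python
import numpy as np

def noise_mask(h, w, cell=8, thresh=0.5, rng=None):
    """Threshold a smooth random field to obtain an irregular mask region.

    A coarse random grid is bilinearly upsampled to (h, w) and thresholded;
    `cell` controls the typical blob size of the resulting regions.
    """
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = h // cell + 2, w // cell + 2
    coarse = rng.random((gh, gw))
    ys = np.linspace(0.0, gh - 1.0, h)
    xs = np.linspace(0.0, gw - 1.0, w)
    y0 = np.minimum(ys.astype(int), gh - 2)
    x0 = np.minimum(xs.astype(int), gw - 2)
    fy = (ys - y0)[:, None]
    fx = (xs - x0)[None, :]
    # bilinear interpolation of the four surrounding grid values
    field = (coarse[y0][:, x0] * (1 - fy) * (1 - fx)
             + coarse[y0][:, x0 + 1] * (1 - fy) * fx
             + coarse[y0 + 1][:, x0] * fy * (1 - fx)
             + coarse[y0 + 1][:, x0 + 1] * fy * fx)
    return (field > thresh).astype(np.uint8)
```

Raising `thresh` shrinks the candidate regions; rotating the field before thresholding, as the embodiment does, would add further randomness.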
After the mask image and the abnormal image are obtained, image synthesis can be performed, and finally a synthesized image is obtained. The composite image may be represented by the following formula:
\tilde{I} = \bar{M} \odot I + (1-\beta)\,(M \odot I) + \beta\,(M \odot A)

where $\tilde{I}$ is the composite image, $I$ is the normal sample image, $A$ is the abnormal image, $\beta$ is the blending factor (a random floating-point number drawn from a uniform distribution), $\odot$ denotes pixel-wise multiplication, and $M$ is the mask image ($\bar{M}$ being its inverted binary form).
And S2, performing dual-tree complex wavelet transformation on the defect composite image, and transforming the image characteristics from a pixel domain to a wavelet domain to obtain low-frequency components and high-frequency components with multiple scales.
And S3, modifying the obtained low-frequency component and high-frequency component, and performing inverse dual-tree complex wavelet transformation to obtain a reconstructed image.
An image reconstruction module: a network model combining the Q-Shift dual-tree complex wavelet transform with a convolutional neural network, composed of a reconstruction network and decision modules. In the wavelet domain, the reconstruction network reconstructs the low-frequency components and the decision modules reconstruct the high-frequency components by classification; in the pixel domain, an additional constraint is applied to the normal regions of the image.
Specifically, the image reconstruction module consists of a reconstruction network and several decision modules. Taking the defect composite image from step S1 as input, the module first applies a dual-tree complex wavelet transform to the image to obtain multi-level high-frequency and low-frequency components. The reconstruction network then reconstructs the low-frequency components, while a decision module makes a patch-by-patch decision on each high-frequency component image so that only the high-frequency components of normal regions are retained; finally, an inverse dual-tree complex wavelet transform yields the reconstructed image.
As an alternative embodiment, the image is first subjected to a multi-level Q-Shift dual-tree complex wavelet transform to obtain high-frequency and low-frequency components at multiple scales. For the low-frequency wavelet coefficient map, the reconstruction network reconstructs the map, and the reconstructed map is required to be consistent with the wavelet coefficient map of the image before corruption; the reconstruction network used in this embodiment is an auto-encoder network. For the high-frequency wavelet coefficient maps, one decision module corresponds to the high-frequency component of each scale. The decision module first computes the modulus from the real and imaginary parts of the wavelet coefficient map in each direction, and a convolutional network then outputs a score value for each small local patch of the image. If the score value is greater than 0.5, the wavelet coefficients of that patch are retained; otherwise they are set to 0. After this processing, the modified low-frequency and high-frequency components are obtained, and an inverse dual-tree complex wavelet transform restores the image from the wavelet domain to the pixel domain, yielding the reconstructed image.
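The keep-or-zero decision on the high-frequency coefficients can be sketched as follows. Here the per-location scores are passed in directly, standing in for the output of the small convolutional decision network described above; the function name and array layout are illustrative:

```python
import numpy as np

def decide_highpass(coeffs, scores, thresh=0.5):
    """Hard keep/zero decision on complex high-frequency coefficients.

    coeffs: complex array (H, W, n_orientations) of wavelet coefficients.
    scores: array (H, W) of values in [0, 1] that a small convolutional
            network would produce from the coefficient modulus.
    Returns the gated coefficients and the modulus map.
    """
    modulus = np.abs(coeffs)  # magnitude from real and imaginary parts
    keep = (scores > thresh).astype(coeffs.real.dtype)
    # broadcast the per-location decision over all orientations
    return coeffs * keep[..., None], modulus
```

Locations judged abnormal (score ≤ 0.5) have their coefficients zeroed, so the inverse transform repaints those regions from the low-frequency content alone.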
And S4, training the image prediction model by adopting the defect synthetic image and the reconstructed image.
An image prediction module: the composite image, the reconstructed image, and the residual image between them, stacked along the channel dimension, form the input of the model. The network structure is a model similar to the U-Net architecture; the model outputs an anomaly score value for each pixel position in the image, and the maximum anomaly score value in the image is taken as the anomaly score of the image.
Specifically, defect segmentation and localization: the composite image obtained in step S1, the reconstructed image obtained in step S3, and the residual image between them are stacked along the channel dimension and used as the input of the image prediction module; the network model directly outputs an anomaly score value for each pixel position to obtain an anomaly score map, and the maximum anomaly score value in the image is taken as the anomaly score of the image.
As an alternative embodiment, the defect composite image from step S1, the reconstructed image from step S3, and the residual map between them are first taken as input. During training, the Focal loss requires the output of the image prediction module to be consistent with the defect candidate region from step S1. Finally, the maximum value in the detection result map output by the image prediction module is taken as the anomaly score of the image.
And S5, obtaining an image to be detected, inputting the image to be detected into the trained image prediction model, and outputting a detection result.
The above method is explained in detail with reference to the drawings and the specific embodiments.
In this embodiment, the KolektorSDD2 open-source dataset (KSDD2 for short) is used as the detection object to introduce the defect detection method for industrial metal products, which includes the following steps:
S101: The images of the original KSDD2 data set are not of fixed size, and directly adjusting the resolution has a certain negative influence on the performance of the method. Therefore, the minimum side length over the training and test sets (184 pixels) is first obtained; then, for each image in the training and test sets, a square image block with a side length of 184 pixels is randomly cropped and uniformly resized to 256 × 256 pixels. The training set comprises 2085 normal samples, and the test set comprises 51 defect samples and 99 normal samples.
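The crop-and-resize preprocessing can be sketched as follows (a minimal NumPy stand-in; the patent does not specify the interpolation used for resizing, so nearest-neighbor is assumed here):

```python
import numpy as np

def preprocess(img, crop=184, out=256, rng=None):
    """Randomly crop a square patch of side `crop`, then resize it to
    `out` x `out` with nearest-neighbor interpolation (an assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    # nearest-neighbor index mapping from the output grid onto the crop grid
    idx = (np.arange(out) * crop / out).astype(int)
    return patch[np.ix_(idx, idx)]

sample = np.arange(200 * 300, dtype=np.float32).reshape(200, 300)
resized = preprocess(sample, rng=np.random.default_rng(0))
print(resized.shape)  # (256, 256)
```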
S102: The image enhancement module is composed of two sub-modules: a mask generation module and an abnormal image generation module. The mask generation module can generate 3 types of mask regions of different shapes and sizes: regular regions, irregular regions, and Perlin noise regions. The random generation algorithm for the regular-region mask randomly generates rectangular regions of different sizes, occupying one tenth to one quarter of the image area depending on the size of the input image. The irregular region generated by the random generation algorithm for the irregular-region mask is composed of three elements: line segments, circles, and squares. The line width, radius, and side length take values in [1, 25] pixels, the line segment length in [10, 70] pixels, and the included angle between two adjacent sides in [0°, 130°]; each connected region is randomly composed of 1 to 5 such elements, and the same image is enhanced at most 10 times. The random generation algorithm for the Perlin noise region mask is based on a Perlin noise image: a Perlin noise image is first generated by a random algorithm, and to further increase the randomness of the region, the noise image is randomly rotated within an angle range of [-45°, 45°]; the rotated image is then thresholded, and regions above the threshold become defect candidate regions. The threshold chosen in this example is 0.5.
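Two of the mask generators can be sketched as follows. This is a simplified illustration: a box-blurred random field stands in for true Perlin noise, and the rectangle-sizing scheme is my assumption rather than the patent's exact algorithm.

```python
import numpy as np

def rectangle_mask(h, w, rng):
    """Regular-region mask: a random rectangle covering roughly 1/10 to 1/4
    of the image area (exact sampling scheme is an assumption)."""
    area = rng.uniform(0.1, 0.25) * h * w
    rh = int(np.clip(rng.uniform(0.3, 1.0) * np.sqrt(area), 1, h))
    rw = int(np.clip(area / rh, 1, w))
    top = rng.integers(0, h - rh + 1)
    left = rng.integers(0, w - rw + 1)
    m = np.zeros((h, w), dtype=np.uint8)
    m[top:top + rh, left:left + rw] = 1
    return m

def noise_mask(h, w, rng, threshold=0.5):
    """Noise-region mask: smoothed random noise, normalized and thresholded.
    A box blur stands in for the Perlin noise the patent uses."""
    noise = rng.random((h + 8, w + 8))
    k = np.ones(9) / 9.0
    noise = np.apply_along_axis(lambda r: np.convolve(r, k, "same"), 1, noise)
    noise = np.apply_along_axis(lambda c: np.convolve(c, k, "same"), 0, noise)
    noise = noise[4:4 + h, 4:4 + w]
    lo, hi = noise.min(), noise.max()
    noise = (noise - lo) / (hi - lo + 1e-8)   # normalize to [0, 1]
    return (noise > threshold).astype(np.uint8)

rng = np.random.default_rng(0)
rect = rectangle_mask(64, 64, rng)
noise = noise_mask(64, 64, rng)
print(rect.shape, rect.sum() > 0)
```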
Then, the abnormal image generation module can generate four types of abnormal images: salt-and-pepper noise images, Gaussian noise images, fading images, and heterogeneous data set images. For salt-and-pepper noise images, the signal-to-noise ratio is a random number drawn uniformly from [0.1, 0.9]. A Gaussian noise image follows a truncated normal distribution with mean 0 and standard deviation 0.5, truncated to an upper limit of 1.0 and a lower limit of 0. For a fading image, the random algorithm first normalizes the corresponding normal sample image to [0, 1] and then adds a random floating-point number to the whole image, so that the overall brightness is offset; when such an image is combined with the original image, a local gray-value change appears. The floating-point number adopted in this embodiment is drawn uniformly from [-0.5, -0.1] ∪ [0.1, 0.5]. For the heterogeneous data set image, the DTD texture data set is chosen in this example.
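Three of the four abnormal-image generators can be sketched directly in NumPy (the salt-and-pepper base value of 0.5 is my assumption; the heterogeneous-dataset case simply loads a DTD texture and is omitted):

```python
import numpy as np

def salt_pepper(shape, rng):
    """Salt-and-pepper noise image; SNR drawn uniformly from [0.1, 0.9].
    Non-noise pixels are left at a mid-gray base value (an assumption)."""
    snr = rng.uniform(0.1, 0.9)
    img = np.full(shape, 0.5)
    noisy = rng.random(shape) > snr                 # positions receiving noise
    img[noisy] = rng.choice([0.0, 1.0], size=int(noisy.sum()))
    return img

def gaussian(shape, rng):
    """Truncated Gaussian noise: N(0, 0.5) clipped to [0, 1]."""
    return np.clip(rng.normal(0.0, 0.5, shape), 0.0, 1.0)

def fading(normal, rng):
    """Fading image: offset the normalized image by a random value drawn
    from [-0.5, -0.1] U [0.1, 0.5], producing an overall brightness shift."""
    x = (normal - normal.min()) / (normal.max() - normal.min() + 1e-8)
    offset = rng.uniform(0.1, 0.5) * rng.choice([-1.0, 1.0])
    return np.clip(x + offset, 0.0, 1.0)

rng = np.random.default_rng(1)
base = rng.random((32, 32))
g = gaussian((32, 32), rng)
f = fading(base, rng)
s = salt_pepper((32, 32), rng)
print(g.min() >= 0.0, g.max() <= 1.0, f.shape, s.shape)
```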
After the mask image and the abnormal image are obtained, image synthesis can be carried out, and finally the artificially synthesized image is obtained. The artificially synthesized image may be represented by the following formula:
$$\tilde{I} = \bar{M} \odot I + \beta\,(M \odot I) + (1-\beta)\,(M \odot I_a)$$

where $\tilde{I}$ is the composite image, $I$ is the normal sample image, $I_a$ is the abnormal image (a salt-and-pepper noise image, Gaussian noise image, fading image, or heterogeneous data set image), $\beta$ is a fusion factor drawn from a uniform distribution over $[0, 0.5]$, $\odot$ denotes pixel-wise multiplication, $M$ denotes the mask image, and $\bar{M}$ is its inverted binary image.
Fig. 2 shows four composite image samples: Fig. 2(a) is a salt-and-pepper noise composite image, Fig. 2(b) a Gaussian noise composite image, Fig. 2(c) a fading composite image, and Fig. 2(d) a heterogeneous data set composite image. The mask types are mixed in the proportion {regular region : line-segment irregular region : circular irregular region : square irregular region : Perlin noise region : defect-free region} = 3. Among the images containing artificial defects, the defect types are mixed in the proportion {salt-and-pepper noise enhancement : Gaussian noise enhancement : fading enhancement : heterogeneous image enhancement} = 1.
Thus, this step yields two images in total: the composite image and the mask image.
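The fusion formula above can be sketched as follows (symbol-to-variable mapping is mine; $\odot$ is element-wise multiplication):

```python
import numpy as np

def synthesize(normal, anomaly, mask, rng):
    """Composite image: I_syn = M_bar*I + beta*(M*I) + (1 - beta)*(M*I_a),
    with beta ~ U[0, 0.5], following the fusion formula in the text."""
    beta = rng.uniform(0.0, 0.5)
    m = mask.astype(normal.dtype)
    m_bar = 1.0 - m                      # inverted binary mask
    return m_bar * normal + beta * (m * normal) + (1.0 - beta) * (m * anomaly)

rng = np.random.default_rng(2)
normal = np.full((8, 8), 0.4)
anomaly = np.ones((8, 8))
mask = np.zeros((8, 8))
mask[2:5, 2:5] = 1
syn = synthesize(normal, anomaly, mask, rng)
print(syn[0, 0])  # 0.4: pixels outside the mask are untouched
```

Outside the mask the composite equals the normal image exactly; inside, it is a convex blend of the normal and abnormal pixels controlled by $\beta$.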
S103: The composite image obtained in the previous step is used as the input of the image reconstruction module. A 3-level Q-Shift dual-tree complex wavelet decomposition is first performed on the image; the first-level mother-wavelet filters are 13/19-tap, those of the second level and above are 10-tap, and the image is downsampled by a factor of 2 at each decomposition level. After the 3-level decomposition, high-frequency components at three scales and a low-frequency component at one scale are obtained.
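The decomposition can be illustrated with a single-level separable Haar transform, a much simpler stand-in for the Q-shift dual-tree complex wavelet the patent uses (which would normally come from a dedicated library):

```python
import numpy as np

def haar_level(img):
    """One level of a separable Haar DWT: returns the low-frequency band LL
    and the three high-frequency bands (LH, HL, HH). Each band is half the
    input resolution, mirroring the x2 downsampling per decomposition level."""
    a = (img[0::2] + img[1::2]) / 2.0     # row average
    d = (img[0::2] - img[1::2]) / 2.0     # row detail
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, (lh, hl, hh)

img = np.random.default_rng(3).random((256, 256))
low, highs = haar_level(img)
for _ in range(2):                        # two more levels on the low band
    low, _ = haar_level(low)
print(low.shape)  # (32, 32): 256 / 2^3 after 3 levels
```

The DTCWT differs in using complex-valued, direction-selective filters (6 orientations per level), but the pyramid structure — one low-frequency band plus per-level high-frequency bands — is the same.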
S104: For the low-frequency component, a reconstruction network is used to reconstruct the wavelet coefficient map; this embodiment adopts a network model with an auto-encoder structure. Fig. 3 shows a schematic diagram of the reconstruction network for a J-level dual-tree complex wavelet decomposition. In the figure, J denotes the number of decomposition levels, h the image height, and w the image width; the number above a feature map denotes its number of channels. "Conv 5 × 5, ReLU" denotes a convolution layer with kernel size 5, stride 1, and ReLU activation; "Conv 3 × 3, ReLU" a convolution layer with kernel size 3, stride 1, and ReLU activation; "Maxpooling, 2 × 2" a max-pooling layer with kernel size 2 and stride 2; "Upsample, 2" bilinear interpolation upsampling with a magnification factor of 2; and "Conv 3 × 3" a convolution layer with kernel size 3, stride 1, and no activation function.
The loss function of the reconstruction network is:

$$L_{low} = \left\| \hat{C} - C \right\|_2^2 + L_{SSIM}\left( \hat{C}, C \right)$$

where $\hat{C}$ is the wavelet coefficient map output by the reconstruction network, and $C$ is the low-frequency wavelet coefficient map obtained by applying the same dual-tree complex wavelet decomposition to the normal image (the image before artificial corruption). $L_{SSIM}$ is the structural similarity loss, whose expression is:
$$L_{SSIM}(x, y) = 1 - \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $L_{SSIM}(x, y)$ denotes the structural similarity loss of the two images within a window; $\mu_x, \mu_y, \sigma_x, \sigma_y$ are the means and standard deviations of images $x$ and $y$ respectively; $\sigma_{xy}$ is the covariance of $x$ and $y$; and the stability factors $C_1$ and $C_2$ are constants, 0.0001 and 0.0009 respectively.
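A minimal sketch of the SSIM loss follows. It uses a single global window for brevity, whereas the patent evaluates SSIM per local window:

```python
import numpy as np

def ssim_loss(x, y, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM loss 1 - SSIM(x, y), with the stability
    constants C1 = 0.0001 and C2 = 0.0009 from the text."""
    mx, my = x.mean(), y.mean()
    sx, sy = x.std(), y.std()
    sxy = ((x - mx) * (y - my)).mean()
    ssim = ((2 * mx * my + c1) * (2 * sxy + c2)) / \
           ((mx**2 + my**2 + c1) * (sx**2 + sy**2 + c2))
    return 1.0 - ssim

a = np.random.default_rng(4).random((64, 64))
print(abs(ssim_loss(a, a)) < 1e-9)  # True: identical images give ~zero loss
```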
S105: For the high-frequency components, each level of wavelet decomposition generates wavelet coefficients in 6 directions (±15°, ±45°, ±75°) at the corresponding scale, and the coefficients of each direction consist of a real part and an imaginary part. In this example, decision modules with the same network structure are used for the high-frequency components of different scales; the decision module of the J-th-level high-frequency component is described below as an example.
The classification network in the decision module decides, by image-block-level classification, whether the high-frequency subband in any small region of each direction is retained. The decision module first performs a modulus operation on the real and imaginary parts:
$$|c| = \sqrt{c_{re}^2 + c_{im}^2}$$

where $|c|$ is the modulus of the high-frequency subband coefficient, $c_{re}$ is its real part, and $c_{im}$ is its imaginary part.
The moduli of the J-th-level high-frequency subband coefficient maps in all six directions are fed into the classification network. The input size of the classification network is $6 \times \frac{h}{2^J} \times \frac{w}{2^J}$ and the output size is $1 \times \frac{h}{2^{J+2}} \times \frac{w}{2^{J+2}}$; each pixel position of the classification result corresponds to a small region of $2^{J+2} \times 2^{J+2}$ pixels of the original image. After the classification scores are obtained, nearest-neighbor interpolation resizes them to the same shape as the input. In the training stage, the resized classification scores are multiplied pixel-wise with the original high-frequency wavelet coefficient map to obtain the reconstructed wavelet coefficient map. In the testing stage, the classification scores are binarized with a threshold of 0.5 (scores greater than 0.5 are set to 1, the rest to 0), which realizes the small-image-block-level decision classification:
$$\hat{c} = \begin{cases} c, & d > 0.5 \\ 0, & d \le 0.5 \end{cases}$$

where $c$ denotes the original high-frequency wavelet coefficient, and $d$ denotes the classification score at the current pixel position.
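The test-time decision on one subband can be sketched as follows (names are mine; nearest-neighbor upsampling is implemented by block repetition):

```python
import numpy as np

def apply_decision(coeff_real, coeff_imag, score, threshold=0.5):
    """Test-time decision on one high-frequency subband: binarize the
    classification score at 0.5, upsample it to the subband shape by
    nearest-neighbor repetition, and keep or zero the local coefficients."""
    keep = (score > threshold).astype(coeff_real.dtype)
    factor = coeff_real.shape[0] // score.shape[0]   # e.g. 4 for two maxpools
    keep = np.repeat(np.repeat(keep, factor, axis=0), factor, axis=1)
    return coeff_real * keep, coeff_imag * keep

rng = np.random.default_rng(5)
re_part = rng.random((16, 16)) + 0.1     # strictly nonzero coefficients
im_part = rng.random((16, 16)) + 0.1
score = np.zeros((4, 4))
score[0, 0] = 0.9                        # keep only the top-left 4x4 block
out_re, out_im = apply_decision(re_part, im_part, score)
print(np.count_nonzero(out_re[4:]) == 0)  # True: all other blocks zeroed
```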
In Fig. 4, the numbers above the feature maps denote their shapes. "Modulo" denotes the modulus operation; "Conv 5 × 5, ReLU" a convolution layer with kernel size 5, stride 1, and ReLU activation; "Conv 3 × 3, ReLU" a convolution layer with kernel size 3, stride 1, and ReLU activation; "Maxpooling, 2 × 2" a max-pooling layer with kernel size 2 and stride 2; "Upsample, 4" nearest-neighbor interpolation upsampling with a magnification factor of 4; and "Conv 3 × 3, Sigmoid" an unbiased convolution layer with kernel size 3, stride 1, and Sigmoid activation.
Performing the 3-level complex wavelet transform generates high-frequency components at 3 different scales and a low-frequency component at 1 scale. Setting the low-frequency component to 0 and then performing the inverse wavelet transform yields an image containing only the high-frequency components. The loss function for the high-frequency components is:
$$L_{high} = \left\| M \odot \left( \hat{I}_h - I_h \right) \right\|_2^2 + \left\| \bar{M} \odot \left( \hat{I}_h - I_h \right) \right\|_2^2$$

where $\hat{I}_h$ is the high-frequency component image "reconstructed" by the decision module, $I_h$ is the high-frequency component image before image corruption, $\odot$ denotes pixel-wise multiplication, $M$ is the mask image obtained in step S2, and $\bar{M}$ is its inverted binary image.
S106: The reconstructed low-frequency and high-frequency components obtained in S104 and S105 are combined by an inverse dual-tree complex wavelet transform to obtain the reconstructed image. The loss function of the pixel domain is calculated as:
$$L_{pixel} = \left\| M \odot \left( \hat{I} - I \right) \right\|_2^2 + \left\| \bar{M} \odot \left( \hat{I} - I \right) \right\|_2^2$$

where $\hat{I}$ is the reconstructed image, $I$ is the normal image, $\odot$ denotes pixel-wise multiplication, $M$ is the mask image, and $\bar{M}$ is its inverted binary image.
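The pixel-domain loss can be sketched as a region-split L2; splitting the error between the mask and its inverse is an assumption here, since the patent does not spell out the relative weighting of the two regions:

```python
import numpy as np

def masked_l2(pred, target, mask):
    """Pixel-domain loss: squared error accumulated separately inside the
    mask and inside its inverse (equal weighting assumed)."""
    m = mask.astype(pred.dtype)
    inside = ((pred - target) * m) ** 2
    outside = ((pred - target) * (1.0 - m)) ** 2
    return float(inside.sum() + outside.sum())

pred = np.ones((4, 4))
target = np.zeros((4, 4))
mask = np.zeros((4, 4))
mask[:2] = 1
print(masked_l2(pred, target, mask))  # 16.0: every pixel contributes 1
```

With equal weights the two terms sum to the plain L2 error; keeping them separate makes it easy to reweight the defect region if desired.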
Fig. 5 shows reconstructed images produced by the reconstruction module for a normal image and a defect image from the KSDD2 data set: Fig. 5(a) is a normal sample, Fig. 5(b) the reconstructed image corresponding to the normal sample, Fig. 5(c) a defect sample (the defect region lies within the red curve), and Fig. 5(d) the reconstructed image corresponding to the defect sample. Comparing the composite image with the reconstructed image yields an L1 residual map.
S107: The composite image, the reconstructed image, and the residual image are stacked as the input of the image prediction module.
The network structure of the image prediction module is shown in Fig. 6. The number above a feature map denotes its number of channels, and the number at its lower-left corner denotes its size. "Conv 5 × 5, ReLU" denotes a convolution layer with kernel size 5, stride 1, and ReLU activation; "Maxpooling, 2 × 2" a max-pooling layer with kernel size 2 and stride 2; "Conv 3 × 3, ReLU" a convolution layer with kernel size 3, stride 1, and ReLU activation; "Conv 1 × 1, Sigmoid" an unbiased convolution layer with kernel size 1, stride 1, and Sigmoid activation; "DConv 3 × 3, ReLU" a deconvolution layer with kernel size 3, stride 2, and ReLU activation; "Concatenation" stacking along the channel dimension; "Skip connection" a skip-connection layer; and "Maximum" the max operation.
The loss function of the image prediction module adopts the Focal loss:
$$L_{focal} = -\frac{1}{N} \sum_{i=1}^{N} \left[ M_i \left( 1 - p_i \right)^{\gamma} \log p_i + \left( 1 - M_i \right) p_i^{\gamma} \log \left( 1 - p_i \right) \right]$$

where $p$ is the anomaly score map output by the image prediction module, $M$ is the mask image, $N$ is the number of pixels, and the Focal loss parameter $\gamma$ takes the value 2.
The maximum anomaly score value in the anomaly score map is taken as the anomaly score of the image. In addition, the Adam optimizer with a learning rate of 0.0001 is selected in this embodiment, and the number of iterations is 100.
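The Focal loss above, with $\gamma = 2$, can be sketched as (the exact formula is reconstructed from the standard Focal-loss definition, so treat the term arrangement as an assumption):

```python
import numpy as np

def focal_loss(pred, mask, gamma=2.0, eps=1e-7):
    """Pixel-wise Focal loss between the anomaly score map `pred` and the
    binary mask `mask`, with gamma = 2 as in the text."""
    p = np.clip(pred, eps, 1.0 - eps)          # avoid log(0)
    pos = mask * (1.0 - p) ** gamma * np.log(p)
    neg = (1.0 - mask) * p ** gamma * np.log(1.0 - p)
    return float(-(pos + neg).mean())

mask = np.zeros((8, 8))
mask[2:4, 2:4] = 1.0
perfect = mask.copy()                      # predictions that match the mask
print(focal_loss(perfect, mask) < 1e-4)    # True: near-zero loss
```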
Fig. 7 shows some detection results in this case, together with results of the method on phosphated panels whose curvature variation is similar to that of body-in-white metal. In the figure, the first row shows the original images, the second row the reconstructed images, the third row the ground-truth (GT) images, and the fourth row the detection results, i.e., the anomaly score maps. The results show that the method achieves good detection performance on metal surfaces.
In summary, compared with the prior art, the method of the embodiment has the following advantages and beneficial effects:
1. The method provided by the invention requires neither the collection of any real defect samples before detection nor a standard image as a reference template. It is particularly suitable for surfaces whose texture features have strong randomness, also performs well on regular texture surfaces, and offers high detection speed, high precision, stable detection results, and good adaptability.
2. In addition, the invention can better guide the network to segment and position the defect region by stacking the test image, the reconstructed image and the residual image between the test image and the reconstructed image as the input of the image prediction module.
3. The image enhancement module provides defect random generation algorithms with various shapes and types for artificial defects on the texture surface, and has certain reference significance for other defect detection algorithms.
The present embodiment also provides a metal surface defect detecting device, including:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of fig. 8.
The metal surface defect detection device of the embodiment can execute the metal surface defect detection method provided by the method embodiment of the invention, can execute any combination of the implementation steps of the method embodiment, and has corresponding functions and beneficial effects of the method.
The embodiment of the application also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor, causing the computer device to perform the method illustrated in fig. 8.
The embodiment also provides a storage medium, which stores instructions or programs capable of executing the metal surface defect detection method provided by the embodiment of the method of the invention, and when the instructions or the programs are run, the instructions or the programs can be used for executing any combination of the embodiment of the method, so that the corresponding functions and beneficial effects of the method are achieved.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk, and various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following technologies, which are well known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A metal surface defect detection method is characterized by comprising the following steps:
acquiring a defect composite image according to the normal sample image; wherein, the defect composite image is provided with a defect area;
performing dual-tree complex wavelet transform on the defect composite image, and transforming image features from a pixel domain to a wavelet domain to obtain low-frequency components and high-frequency components of multiple scales;
modifying the obtained low-frequency component and high-frequency component, and performing inverse dual-tree complex wavelet transform to obtain a reconstructed image;
training an image prediction model by adopting a defect synthetic image and a reconstructed image;
and acquiring an image to be detected, inputting the image to be detected into the trained image prediction model, and outputting a detection result.
2. The method of claim 1, wherein the obtaining a defect composite image from the normal sample image comprises:
acquiring a defect candidate region, and acquiring a mask image according to the defect candidate region;
generating an abnormal image according to the defect candidate area; the abnormal image comprises a salt and pepper noise image, a Gaussian noise image, a fading image or a different source data set image;
and fusing the normal sample image and the abnormal image according to the mask image to generate a defect area and obtain a defect composite image.
3. A method as claimed in claim 2, wherein the defect composite image is expressed by the following formula:
$$\tilde{I} = \bar{M} \odot I + \beta\,(M \odot I) + (1 - \beta)\,(M \odot I_a)$$

where $\tilde{I}$ is the composite image, $I$ is the normal sample image, $I_a$ is the abnormal image, $\beta$ is a fusion factor (a random floating-point number obeying a uniform distribution), $\odot$ denotes pixel-wise multiplication, $M$ represents the mask image, and $\bar{M}$ is the inverted binary image of the mask image $M$.
4. The method for detecting defects on a metal surface according to claim 1, wherein the modifying the obtained low frequency component and high frequency component and performing inverse dual-tree complex wavelet transform to obtain a reconstructed image comprises:
for the low-frequency wavelet coefficient graph, a reconstruction network is adopted to reconstruct a low-frequency component as a modified low-frequency component, and the wavelet coefficient graph obtained by reconstruction is consistent with the wavelet coefficient graph of a normal sample image;
for the high-frequency wavelet coefficient maps, the high-frequency component of each scale corresponds to a decision module; the decision module performs a modulus calculation on the real and imaginary parts of the wavelet coefficient map in each direction and outputs a score value for each local region of the map; if the output score value is greater than a preset threshold, the local wavelet coefficients are retained; otherwise, the local wavelet coefficients are set to 0; the modified high-frequency components are thus obtained;
and performing inverse dual-tree complex wavelet transform on the modified low-frequency component and high-frequency component to obtain a reconstructed image.
5. The method of claim 4, wherein the reconstruction network is a self-encoder network, and the loss function of the reconstruction network is expressed as:
$$L_{low} = \left\| \hat{C} - C \right\|_2^2 + L_{SSIM}\left( \hat{C}, C \right)$$

where $\hat{C}$ is the wavelet coefficient map output by the reconstruction network, $C$ is the low-frequency wavelet coefficient map obtained by applying the same dual-tree complex wavelet decomposition to a normal sample image, and $L_{SSIM}(\hat{C}, C)$ represents the structural similarity loss value of the two images.
6. The method of claim 4, wherein the loss function corresponding to the high frequency component is expressed as:
$$L_{high} = \left\| M \odot \left( \hat{I}_h - I_h \right) \right\|_2^2 + \left\| \bar{M} \odot \left( \hat{I}_h - I_h \right) \right\|_2^2$$

where $\hat{I}_h$ is the high-frequency component image processed by the decision module, $I_h$ is the high-frequency component image of the normal sample image, $\odot$ denotes pixel-wise multiplication, $M$ is the mask image, and $\bar{M}$ is the inverted binary image of the mask image $M$.
7. The method according to claim 4, wherein the expression of the loss function of the pixel domain corresponding to the reconstructed image is as follows:
$$L_{pixel} = \left\| M \odot \left( \hat{I} - I \right) \right\|_2^2 + \left\| \bar{M} \odot \left( \hat{I} - I \right) \right\|_2^2$$

where $\hat{I}$ is the reconstructed image, $I$ is the normal sample image, $M$ is the mask image, and $\bar{M}$ is the inverted binary image of the mask image $M$.
8. The method as claimed in claim 1, wherein the training of the image prediction model by using the defect synthesized image and the reconstructed image comprises:
acquiring a defect synthetic image, a reconstructed image and a residual image between the two images, and stacking the defect synthetic image, the reconstructed image and the residual image along the direction of the number of channels as the input of an image prediction model;
and training the image prediction model by adopting a Focal loss function to enable the defect area detected by the image prediction model to be consistent with the defect candidate area on the defect synthetic image.
9. A metal surface defect detection apparatus, comprising:
at least one processor;
at least one memory for storing at least one program;
the at least one program, when executed by the at least one processor, causes the at least one processor to implement the method of any one of claims 1-8.
10. A computer-readable storage medium, in which a program executable by a processor is stored, wherein the program executable by the processor is adapted to perform the method according to any one of claims 1 to 8 when executed by the processor.
CN202210920393.2A 2022-08-02 2022-08-02 Metal surface defect detection method and device and storage medium Pending CN115439408A (en)


Publications (1)

Publication Number Publication Date
CN115439408A true CN115439408A (en) 2022-12-06

Family

ID=84243113


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115861293A (en) * 2023-02-08 2023-03-28 成都数联云算科技有限公司 Defect contour extraction method, defect contour extraction device, storage medium, defect contour extraction device, and program product
CN116091500A (en) * 2023-04-07 2023-05-09 成都数之联科技股份有限公司 Diffusion plate defect detection method, model training method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN107610194B (en) Magnetic resonance image super-resolution reconstruction method based on multi-scale fusion CNN
CN115439408A (en) Metal surface defect detection method and device and storage medium
CN112215755B (en) Image super-resolution reconstruction method based on back projection attention network
Yoon et al. Surface and normal ensembles for surface reconstruction
CN111325236A (en) Ultrasonic image classification method based on convolutional neural network
CN112381916B (en) Digital rock core three-dimensional structure reconstruction method using two-dimensional slice image
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
JP2022549684A (en) Phase image reconstruction by deep learning
CN113538246A (en) Remote sensing image super-resolution reconstruction method based on unsupervised multi-stage fusion network
CN114429151A (en) Magnetotelluric signal identification and reconstruction method and system based on depth residual error network
Du et al. Blind image denoising via dynamic dual learning
CN113591854B (en) Low-redundancy rapid reconstruction method of plankton hologram
Liu et al. Machine-learning-based prediction of regularization parameters for seismic inverse problems
CN112581626B (en) Complex curved surface measurement system based on non-parametric and multi-attention force mechanism
CN112686822A (en) Image completion method based on stack generation countermeasure network
CN115661340B (en) Three-dimensional point cloud up-sampling method and system based on source information fusion
CN114445273A (en) Magnetic resonance image processing method, computer device, and storage medium
CN116188273A (en) Uncertainty-oriented bimodal separable image super-resolution method
CN117196963A (en) Point cloud denoising method based on noise reduction self-encoder
Vincent et al. Graph morphology in image analysis
CN113409202B (en) Ultrasonic image restoration method based on point spread function parameter optimization
CN114972382A (en) Brain tumor segmentation algorithm based on lightweight UNet + + network
CN114612434A (en) Corrugated pipe surface defect detection method and system
CN112686807A (en) Image super-resolution reconstruction method and system
Liu et al. Interactive view-driven evenly spaced streamline placement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination