CN117649387A

CN117649387A - Defect detection method suitable for object with complex surface texture

Info

Publication number: CN117649387A
Application number: CN202311630765.9A
Authority: CN
Inventors: 胡冰; 程坦; 刘涛
Original assignee: Zhongkehaituo Wuxi Technology Co ltd
Current assignee: Zhongkehaituo Wuxi Technology Co ltd
Priority date: 2023-11-30
Filing date: 2023-11-30
Publication date: 2024-03-05
Anticipated expiration: 2043-11-30
Also published as: CN117649387B

Abstract

The invention relates to the technical field of data processing, and in particular to a defect detection method suitable for objects with complex surface textures. The method comprises the following steps: dividing object surface image data into patches to generate object surface patch image data; constructing an image-feature-decoupling twin (Siamese) network structure based on a variational autoencoder (VAE) to obtain an image feature decoupling network model; constructing a joint loss function for the image feature decoupling network model; optimizing the image feature decoupling network model according to the joint loss function to generate an optimized image feature decoupling network model; and transmitting the object surface patch image data to the optimized image feature decoupling network model for invariant factor prediction, generating object surface image invariant factors. The invention achieves good defect anomaly detection performance on objects with complex surface textures while relying on only a small number of anomalous training samples.

Description

Defect detection method suitable for object with complex surface texture

Technical Field

The invention relates to the technical field of data processing, in particular to a defect detection method suitable for an object with a complex surface texture.

Background

In surface anomaly detection for industrial products with complex textures, the textures themselves do not affect product quality, but they are often mixed with defect features, which makes the defect features difficult to describe accurately. Complex textures may include various colors, patterns and irregular shapes that can be confused with defects and cause standard detection methods to fail, so a dedicated method is required to accurately identify and distinguish the actual surface defects and thereby ensure product quality and safety. However, because of how defects arise in practical applications, collecting and labeling defect data is difficult and costly. When conventional anomaly detection methods based on supervised deep learning are applied to such tasks, the shortage of training samples causes serious overfitting, so the detection task cannot be completed well.

Disclosure of Invention

Based on the above, the present invention provides a defect detection method suitable for an object with a complex texture on a surface, so as to solve at least one of the above technical problems.

To achieve the above object, a defect detection method suitable for an object with a complex surface texture comprises the following steps:

step S1: acquiring images of the object surface using monitoring equipment to generate object surface image data; dividing the object surface image data into patches according to preset patch image size data to generate object surface patch image data;

step S2: constructing an image-feature-decoupling twin (Siamese) network structure based on a variational autoencoder (VAE) to obtain an image feature decoupling network model; designing a VAE overall loss function and an invariant factor similarity loss function according to the image feature decoupling network model;

step S3: constructing a joint loss function of the image feature decoupling network model based on the VAE overall loss function and the invariant factor similarity loss function; optimizing the image feature decoupling network model according to the joint loss function to generate an optimized image feature decoupling network model;

step S4: transmitting the object surface patch image data to the optimized image feature decoupling network model for invariant factor prediction, generating object surface image invariant factors; constructing an optimized anomaly score discriminator model based on a deep neural network model; mapping the object surface image invariant factors into the optimized anomaly score discriminator model to evaluate the object surface anomaly score and generate object surface anomaly score data; when the object surface anomaly score data is greater than a preset anomaly score threshold, marking the corresponding object surface image data as object surface defect image data; and when the object surface anomaly score data is not greater than the preset anomaly score threshold, marking the corresponding object surface image data as object surface conventional image data.

High-precision acquisition by the monitoring equipment ensures that the collected image data is of high quality, which is important for subsequent defect detection: high-quality image data provides richer and more accurate surface texture information and thereby improves detection accuracy. Dividing the image data according to the preset patch image size further improves the efficiency and accuracy of data processing and makes image processing more flexible, so that potential defect areas can be located and identified more precisely, particularly on objects with complex surface textures. Decoupling image features with the twin network structure built on the variational autoencoder effectively separates and identifies the complex texture features of the object surface and improves the accuracy of defect identification; by decomposing and reconstructing image features, the model can understand and process complex textures in greater depth. The VAE overall loss function and the invariant factor similarity loss function designed for the image feature decoupling network model further optimize the learning process. These loss functions improve the model's ability to capture features and enhance its robustness across different textures and defect types. 
The joint loss function combines the VAE overall loss function with the invariant factor similarity loss function, which improves the model's ability to decouple image features and strengthens its ability to identify and learn the invariant factors in an image, so that image features are considered more comprehensively during learning and the accuracy and efficiency of defect detection improve. Optimizing the network model on the basis of the joint loss function further improves its performance, its handling of complex textures, and its adaptability to different types of defects. Transmitting the object surface patch image data to the optimized image feature decoupling network model for invariant factor prediction effectively extracts the image features that are critical to defect detection, making detection finer and more accurate. Evaluating the object surface image invariant factors with the anomaly score discriminator model built on a deep neural network further improves the accuracy and reliability of defect detection. 
The model can effectively distinguish normal image features from abnormal ones, making defect judgment more accurate and sensitive. The corresponding image data is automatically marked as a defect image when the object surface anomaly score exceeds the preset threshold, and as a conventional image when it does not. This automated marking greatly improves the efficiency of defect detection and reduces the need for manual intervention. Therefore, the defect detection method suitable for objects with complex surface textures performs anomaly detection through the feature decoupling model, achieves good anomaly detection performance while relying on only a small number of anomalous samples, has better detection performance than other anomaly detection methods, and completes detection tasks well.

Preferably, step S1 comprises the steps of:

step S11: acquiring images of the object surface using monitoring equipment to generate object surface image data;

step S12: converting the object surface image data to grayscale to generate object surface grayscale image data;

step S13: dividing the object surface grayscale image data into patches according to preset patch image size data to generate object surface patch image data.

The invention uses monitoring equipment to acquire the object surface image data, which ensures high resolution and sharpness of the acquired images. This matters for subsequent image processing and defect detection, because high-definition images provide finer surface texture information and thereby improve detection accuracy. Automated image acquisition also reduces manual intervention and improves the efficiency of data collection, which makes it particularly suitable for large-scale or continuous production-line inspection. Converting the object surface image data to grayscale simplifies the color information so that processing can concentrate on the texture and shape characteristics of the image; grayscale conversion reduces processing complexity while retaining the information critical to defect detection, and it highlights the texture of the object surface more clearly, which makes subsequent texture analysis and defect identification more accurate. Dividing the grayscale image according to the preset patch image size allows the system to process and analyze the characteristics of each small area more precisely, improving its ability to identify local defects.
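Steps S12 and S13 can be sketched in a few lines. This is a minimal illustration, not the patented implementation: the function names `to_gray` and `split_into_patches` are hypothetical, the BT.601 luma weights are an assumed grayscale conversion (the text does not specify one), and dropping incomplete border patches is likewise an assumption.

```python
def to_gray(rgb_image):
    """Convert an RGB image (rows of (r, g, b) tuples) to grayscale.

    Assumption: ITU-R BT.601 luma weights; the source does not say which
    grayscale conversion is used.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def split_into_patches(gray, patch_h, patch_w):
    """Cut a grayscale image into non-overlapping patch_h x patch_w patches.

    Assumption: border pixels that do not fill a whole patch are dropped.
    """
    rows, cols = len(gray), len(gray[0])
    patches = []
    for top in range(0, rows - patch_h + 1, patch_h):
        for left in range(0, cols - patch_w + 1, patch_w):
            patch = [row[left:left + patch_w]
                     for row in gray[top:top + patch_h]]
            patches.append(patch)
    return patches


# A 4 x 6 toy image whose pixel value equals its column index.
rgb = [[(x, x, x) for x in range(6)] for _ in range(4)]
gray = to_gray(rgb)
patches = split_into_patches(gray, 2, 3)
print(len(patches))  # number of 2 x 3 patches cut from the toy image
```

In practice the preset patch size would be chosen so that a patch is large enough to contain a whole defect but small enough to keep the texture statistics local.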

Preferably, step S2 comprises the steps of:

step S21: constructing an image-feature-decoupling twin (Siamese) network structure based on a variational autoencoder (VAE) to obtain an image feature decoupling network model, wherein the image feature decoupling network model comprises a first fully-connected layer and a second fully-connected layer;

step S22: transmitting a preset patch image training set to the encoder of the image feature decoupling network model for latent-space mapping, generating a latent-space image invariant factor and a latent-space image variable factor respectively, wherein the preset patch image training set is divided into first patch image data and second patch image data, and transmitting the preset patch image training set to the encoder of the image feature decoupling network model comprises: transmitting the first patch image data to the first fully-connected layer, and transmitting the second patch image data to the second fully-connected layer;

step S23: designing the VAE overall loss function according to the latent-space image invariant factor and the latent-space image variable factor, generating the VAE overall loss function;

step S24: designing a similarity loss function for the latent-space image invariant factor according to the latent-space image invariant factor, generating the invariant factor similarity loss function.

By using a twin network structure based on the variational autoencoder, the invention decouples image features more effectively, which is particularly helpful for distinguishing normal texture from defects on surfaces with complex textures. Processing different patch image data in the first and second fully-connected layers respectively achieves the decoupling of the two components in the latent representation. Transmitting the preset patch image training set to the encoder for latent-space mapping generates the latent-space image invariant factor and variable factor separately, so that image characteristics can be analyzed in finer detail; processing the first and second patch image data separately also increases the diversity of the training data and improves the generalization ability and accuracy of the model. The VAE overall loss function, designed from the latent-space image invariant and variable factors, helps the model learn and extract key image features accurately and lets it optimize and adjust its parameters better during training, improving the accuracy and reliability of defect detection. The invariant factor similarity loss function helps identify and retain the key invariant features of the image, which is important for identifying defects accurately; it strengthens the model's ability to discriminate between different textures and defect types, so that the model remains efficient and accurate under varied conditions.
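The split of an encoder output into an invariant part and a variable part can be illustrated as follows. This is only a sketch under stated assumptions: the layout of the encoder head output (means first, log-variances second, invariant dimensions first) and the names `split_latent` and `sample` are hypothetical and are not taken from the source; the sampling step is the standard VAE reparameterization.

```python
import math
import random


def split_latent(head_output, d_c):
    """Split an encoder head output into statistics for z_c and z_s.

    Assumed layout: the first half holds means, the second half holds
    log-variances, and the first d_c entries of each half belong to the
    invariant factor z_c; the rest belong to the variable factor z_s.
    """
    half = len(head_output) // 2
    mu, log_var = head_output[:half], head_output[half:]
    sigma = [math.exp(0.5 * v) for v in log_var]  # standard deviations
    return (mu[:d_c], sigma[:d_c]), (mu[d_c:], sigma[d_c:])


def sample(mu, sigma, rng):
    """Reparameterized draw z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


rng = random.Random(0)
(mu_c, sigma_c), (mu_s, sigma_s) = split_latent([1.0, 2.0, 3.0, 0.0, 0.0, 0.0], 1)
z_c = sample(mu_c, sigma_c, rng)  # invariant (structure) component
z_s = sample(mu_s, sigma_s, rng)  # variable (texture) component
print(len(z_c), len(z_s))
```

In a twin arrangement, the same split would be applied to the encoder output of each of the two patch inputs, and only the invariant parts of the two branches would be compared.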

Preferably, the VAE overall loss function in step S23 is as follows:

$$\mathcal{L}_{VAE} = -\mathbb{E}_{q_\phi(z_c, z_s \mid x)}\left[\log p_\theta(x \mid z_c, z_s)\right] + D_{KL}\left(q_\phi(z_c, z_s \mid x) \,\|\, p(z_c)\,p(z_s)\right)$$

where $\mathcal{L}_{VAE}$ is the overall loss function of the variational autoencoder; $\mathbb{E}_{q_\phi(z_c, z_s \mid x)}$ is the expectation with respect to $q_\phi(z_c, z_s \mid x)$; $\phi$ denotes the encoder parameters; $q_\phi(\cdot)$ is the joint posterior probability distribution function of the encoder; $\theta$ denotes the decoder parameters; $p_\theta(\cdot)$ is the probability distribution function of the decoder; $x$ is the input image of the model; $z_c$ is the latent-space image invariant factor; $z_s$ is the latent-space image variable factor; and $D_{KL}(\cdot)$ is the relative entropy between probability distributions.

The present invention uses the overall loss function of the variational autoencoder. When a VAE is used as the generative model, the generated data can be regarded as a joint representation of multiple independent components in the latent space. Assume that the latent representation z can be decoupled into two independent components, corresponding respectively to the varying texture features $z_s$ and the invariant structural features $z_c$. The data generated by the decoder can then be regarded as a joint distribution of the two components, which can be expressed as:

$$p_\theta(x, z_s, z_c) = p_\theta(x \mid z_s, z_c)\, p(z_s)\, p(z_c)$$

where $\theta$ denotes the decoder parameters, and $z_s$ and $z_c$ are the varying component and the common component of the latent representation, respectively. Following the interpretation of the VAE optimization objective, the prior distributions $p(z_c)$ and $p(z_s)$ are set to standard normal distributions with zero mean and unit variance. $p_\theta(x \mid z_c, z_s)$ is also set to a normal distribution whose mean and variance are given by the encoder during the encoding phase. With $\phi$ denoting the encoder parameters, the posterior distributions of $z_c$ and $z_s$ are likewise set to normal distributions with means $\mu_c(x)$, $\mu_s(x)$ and variances $\sigma_c^2(x)$, $\sigma_s^2(x)$, that is, $q_\phi(z_c \mid x) = \mathcal{N}(\mu_c(x), \sigma_c^2(x))$ and $q_\phi(z_s \mid x) = \mathcal{N}(\mu_s(x), \sigma_s^2(x))$. The VAE encoding process is expressed as:

$$q_\phi(z_c, z_s \mid x) = q_\phi(z_c \mid x)\, q_\phi(z_s \mid x) = \mathcal{N}(\mu_c(x), \sigma_c^2(x))\; \mathcal{N}(\mu_s(x), \sigma_s^2(x))$$

The overall loss function of the VAE in this network is designed on this basis, yielding the VAE overall loss function.
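With standard normal priors and diagonal Gaussian posteriors, the relative-entropy part of a VAE loss has a closed form, and the factorized posterior makes it a sum of one term for $z_c$ and one for $z_s$. The sketch below illustrates this; the function names are hypothetical, and the squared-error reconstruction term is an assumption (a Gaussian decoder with fixed unit variance, constants dropped), not something the source specifies.

```python
import math


def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ), summed over dimensions."""
    return sum(0.5 * (m * m + s * s - 1.0) - math.log(s)
               for m, s in zip(mu, sigma))


def vae_loss(x, x_recon, mu_c, sigma_c, mu_s, sigma_s):
    """Reconstruction term plus one KL term per latent component.

    Assumption: squared error stands in for -log p_theta(x | z_c, z_s).
    """
    recon = 0.5 * sum((a - b) ** 2 for a, b in zip(x, x_recon))
    return (recon
            + kl_to_standard_normal(mu_c, sigma_c)
            + kl_to_standard_normal(mu_s, sigma_s))


# A posterior that already equals the prior contributes no KL penalty.
print(kl_to_standard_normal([0.0, 0.0], [1.0, 1.0]))
# Shifting the posterior mean away from zero is penalized.
print(kl_to_standard_normal([1.0], [1.0]))
```

Here `sigma` denotes a standard deviation, so the variance entering the closed form is `s * s`.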

Preferably, the invariant factor similarity loss function in step S24 is as follows:

$$\mathcal{L}_{sim} = D_{sim}\left(q_\phi(z_c \mid x^l),\; q_\phi(z_c \mid x^m)\right)$$

where $\mathcal{L}_{sim}$ is the invariant factor similarity loss function; $D_{sim}(\cdot)$ is the similarity measure function; $q_\phi(\cdot)$ is the joint posterior probability distribution function of the encoder; $z_c$ is the latent-space image invariant factor; $x^l$ is the first patch image data; and $x^m$ is the second patch image data;

wherein a contrastive loss function based on the Mahalanobis distance is used as the similarity measure function, namely:

$$\mathcal{L}_{sim} = \beta\, D_m + (1 - \beta)\, \max\left(0,\; M - D_m\right)$$

where $\mathcal{L}_{sim}$ is the invariant factor similarity loss function; $\beta$ is a binary label indicating whether the input first patch image data and second patch image data belong to the same category; $D_m$ is the Mahalanobis distance between the encoder posterior distributions $q_\phi(z_c \mid x^l)$ and $q_\phi(z_c \mid x^m)$; and $M$ is the preset feature margin;

wherein the Mahalanobis distance $D_m$ between the encoder posterior distributions $q_\phi(z_c \mid x^l)$ and $q_\phi(z_c \mid x^m)$ is specifically expressed as:

$$D_m = \sqrt{\sum_{i=1}^{d} \frac{\left(\mu_i^c(x^l) - \mu_i^c(x^m)\right)^2}{\left(\sigma_i^c\right)^2}}$$

where $D_m$ is the Mahalanobis distance between the encoder posterior distributions $q_\phi(z_c \mid x^l)$ and $q_\phi(z_c \mid x^m)$; $d$ is the dimension of the latent-space image invariant factor; $\mu_i^c$ is the mean of the latent-space image invariant factor in the i-th dimension; $x^l$ is the first patch image data; $x^m$ is the second patch image data; and $\sigma_i^c$ is the standard deviation of the latent-space image invariant factor in the i-th dimension.

The invention designs an invariant factor similarity loss function. The paired image samples $x^l$ and $x^m$ are input simultaneously into the two VAE branch networks, and each input is encoded by an encoder into latent vectors corresponding to different feature factors. The random texture in an image is assumed to be the variable factor, while the common structure that remains after the random texture is removed is the invariant factor. The invariant factor similarity loss function therefore constrains the latent components $z_c$ corresponding to the invariant factors of the two inputs, which decouples the two components in the latent representation, namely:

$$\mathcal{L}_{sim} = D_{sim}\left(q_\phi(z_c \mid x^l),\; q_\phi(z_c \mid x^m)\right)$$

where $D_{sim}(\cdot)$ is a measure of similarity between two latent components. Various measures can be used, for example simply the L1 or L2 distance between the two posterior centroids $\mu^c(x^l)$ and $\mu^c(x^m)$. However, when the variances of the posterior distribution differ across latent dimensions, the distance between centroids does not truly reflect the distance between the two distributions. In addition, to keep partial defect features from interfering with feature separation, a contrastive loss function based on the Mahalanobis distance is used as the similarity loss function, so the function is adjusted to:

$$\mathcal{L}_{sim} = \beta\, D_m + (1 - \beta)\, \max\left(0,\; M - D_m\right)$$

where $\beta$ is a binary label indicating whether the input samples $x^l$ and $x^m$ belong to the same class, $D_m(\cdot)$ is the Mahalanobis distance, and $M > 0$ is the feature margin that applies when the inputs differ. When the input samples belong to the same class, the loss reduces to the Mahalanobis distance between the two distributions. Because the variational approximate posterior is a multivariate Gaussian distribution with a diagonal covariance structure, $D_m$ can be calculated from the means and standard deviations, i.e.:

$$D_m = \sqrt{\sum_{i=1}^{d} \frac{\left(\mu_i^c(x^l) - \mu_i^c(x^m)\right)^2}{\left(\sigma_i^c\right)^2}}$$

The Mahalanobis distance can be regarded as a multivariate generalization of the Euclidean distance and is better suited than the Euclidean distance to measuring the distance between two distributions. The terms $\mu_i^c$ and $\sigma_i^c$ are the mean and standard deviation of the i-th dimension of the common component, respectively, and $d$ is the dimension of the latent representation.
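The contrastive use of a Mahalanobis distance between two diagonal Gaussian posteriors can be sketched as below. Several choices here are assumptions made only for illustration, because the source text does not fix them: the two per-dimension variances are pooled by averaging, the label `beta` is taken to be 1 for a same-class pair and 0 otherwise, and the distance enters the loss unsquared. The function names are hypothetical.

```python
import math


def mahalanobis_diag(mu_l, sigma_l, mu_m, sigma_m):
    """Mahalanobis-style distance between two diagonal Gaussians.

    Assumption: the per-dimension variance is the average of the two
    posterior variances; the source does not state how they are combined.
    """
    total = 0.0
    for ml, sl, mm, sm in zip(mu_l, sigma_l, mu_m, sigma_m):
        pooled_var = 0.5 * (sl * sl + sm * sm)
        total += (ml - mm) ** 2 / pooled_var
    return math.sqrt(total)


def contrastive_loss(d_m, beta, margin):
    """Pull same-class pairs together, push different pairs past a margin.

    Assumption: beta = 1 marks a same-class pair, beta = 0 a different pair.
    """
    return beta * d_m + (1 - beta) * max(0.0, margin - d_m)


d_m = mahalanobis_diag([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
print(d_m)                              # distance with unit variances
print(contrastive_loss(d_m, 1, 2.0))    # same class: loss equals the distance
print(contrastive_loss(d_m, 0, 2.0))    # different class, already past margin
```

With unit variances the distance coincides with the Euclidean distance between the two centroids; a dimension with large posterior variance contributes less, which is the motivation given in the text for preferring it over a plain centroid distance.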

Preferably, step S3 comprises the steps of:

step S31: constructing an initial joint loss function of the image feature decoupling network model from the VAE overall loss function and the invariant factor similarity loss function, and optimizing the initial joint loss function of the image feature decoupling network model with a sparsity loss function to generate the joint loss function of the image feature decoupling network model;

step S32: and performing network model optimization on the image feature decoupling network model according to the joint loss function of the image feature decoupling network model to generate an optimized image feature decoupling network model.

The invention constructs the initial joint loss function of the image feature decoupling network model by combining the VAE overall loss function with the invariant factor similarity loss function, which allows the performance of the model to be evaluated more comprehensively. This combined loss covers several aspects of image feature decoupling, including accurate decoupling of image features and accurate identification of invariant factors, and thereby improves the accuracy and efficiency of model training. The initial joint loss function is then refined with the sparsity loss function, which helps the model concentrate on important image features, reduces interference from unimportant features, and improves the accuracy with which the model identifies defects. Optimizing the image feature decoupling network model on the basis of the resulting joint loss function further improves model performance. This optimization keeps the model efficient and accurate when processing complex texture images, particularly when distinguishing normal texture from defects, and it improves both performance on a specific data set and generalization to the surface textures of different types of objects. The model can therefore accommodate a wider variety of textures and defects in practical applications, which increases its practical value.

Preferably, the sparsity loss function in step S31 is as follows:

$$\mathcal{L}_{sparse} = \sum_{i=1}^{d} D_{KL}\left(\tau \,\|\, \hat{\rho}_i\right)$$

where $\mathcal{L}_{sparse}$ is the sparsity loss function; $d$ is the dimension of the latent-space image invariant factor; $\hat{\rho}_i$ is the average activation of the i-th dimension of the latent-space image invariant factor; and $\tau$ is a constant.

The present invention uses a sparsity loss function. When similarity constraints are used in the VAE, the network may learn a trivial solution: even if all dimensions of the component mean are simply set to 0, all constraints are still fully satisfied, in which case all of the input information is encoded into the specific features. To represent the features more effectively, a penalty term is added to the network loss function to constrain the sparsity of the component mean. A sparsity vector $\hat{\rho}$ is defined whose element $\hat{\rho}_i$ represents the average activation of the i-th dimension of the mean vector, which can be expressed as:

$$\hat{\rho}_i = \frac{1}{|B|} \sum_{k=1}^{|B|} \mu_i^c(x_k)$$

where $\mu_i^c(x_k)$ is the i-th dimension of the mean vector of the k-th input in a training batch $B$, and $\tau$ is a sparsity parameter close to zero. The penalty limits the average activation, in each dimension, of the mean vectors of the different components in the latent space, and it encourages the mean vector to be specific in different dimensions by minimizing the relative entropy so that the average activation is as close as possible to the sparsity constant.
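The penalty on the average activation can be sketched as follows. The relative entropy is written here in the Bernoulli form familiar from sparse autoencoders, which is an assumption: the source only says that a relative entropy pulls the average activation toward the constant. The sketch also assumes the mean activations lie in (0, 1) and clamps them so the logarithms stay finite. All function names are hypothetical.

```python
import math


def average_activation(batch_means):
    """Per-dimension average of the mean vectors over one training batch."""
    n, d = len(batch_means), len(batch_means[0])
    return [sum(mu[i] for mu in batch_means) / n for i in range(d)]


def bernoulli_kl(tau, rho):
    """KL divergence between Bernoulli(tau) and Bernoulli(rho)."""
    return (tau * math.log(tau / rho)
            + (1.0 - tau) * math.log((1.0 - tau) / (1.0 - rho)))


def sparsity_loss(batch_means, tau=0.05, eps=1e-6):
    """Sum over dimensions of KL(tau || average activation).

    Assumptions: Bernoulli-form KL and activations squashed into (0, 1);
    values are clamped to [eps, 1 - eps] to keep log() defined.
    """
    loss = 0.0
    for rho in average_activation(batch_means):
        rho = min(max(rho, eps), 1.0 - eps)
        loss += bernoulli_kl(tau, rho)
    return loss


print(sparsity_loss([[0.05, 0.05], [0.05, 0.05]]))  # at the target: no penalty
print(sparsity_loss([[0.5, 0.5], [0.5, 0.5]]) > 0)  # far from target: penalized
```

The penalty is zero only when the batch-average activation of every dimension equals the target constant, so a batch whose means all sit far from that constant is discouraged.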

Preferably, the joint loss function of the image feature decoupling network model in step S31 is as follows:

$$\mathcal{L} = \mathcal{L}_{VAE}^{l} + \mathcal{L}_{VAE}^{m} + \lambda_1 \mathcal{L}_{sim} + \lambda_2 \mathcal{L}_{sparse}$$

where $\mathcal{L}$ is the joint loss function of the image feature decoupling network model; $\mathcal{L}_{VAE}^{l}$ is the VAE overall loss function of the branch of the first fully-connected layer; $\mathcal{L}_{VAE}^{m}$ is the VAE overall loss function of the branch of the second fully-connected layer; $\lambda_1$ is the balance coefficient of the invariant factor similarity loss function; $\mathcal{L}_{sim}$ is the invariant factor similarity loss function; $\lambda_2$ is the balance coefficient of the sparsity loss function; and $\mathcal{L}_{sparse}$ is the sparsity loss function.

The invention designs a joint loss function for the image feature decoupling network model. The variational autoencoder (VAE) is widely used in current disentangling methods: an encoder based on a deep neural network maps any input data into a latent space as a latent representation, which can be written as $q_\phi(z \mid x)$, where $\phi$ denotes the weight parameters of the encoder. Assume that, after the encoder mapping, the input data x is expressed as a latent vector z of dimension $d_z$. Based on the definition of a separable representation in [6], the latent representation z can be regarded as being composed of a plurality of independent components, $z = (z_1, \ldots, z_{d_z})$, and the joint probability model of z can be expressed as $p(z) = \prod_{i=1}^{d_z} p(z_i)$. When each component contains only one piece of semantic information, a change in one observable change factor in the image corresponds to a change in only one component $z_i$. When the number of independent components in the latent representation is greater than the number $v$ of semantic change factors ($v \le d_z$), there is a subset of z that is independent of the observable change factors; the elements of this subset do not necessarily carry explicit semantic information, but they still satisfy the independence assumption. 
That is, this subset contains latent elements that are unrelated to all observable change factors, in contrast to the set of basic components in z that corresponds to all the change factors. The images $x^l, x^m \in X$ are input to the encoders of the respective VAEs in the network, and after encoding each image is decomposed into $z_c$ and $z_s$, which correspond respectively to the invariant and the varying factors of the given image pair. In order to decouple the latent components $z_c$ and $z_s$, which correspond to the common features and the variable features of the image, a joint loss function consisting of several losses is designed, yielding the joint loss function of the image feature decoupling network model. Here $\mathcal{L}_{VAE}^{l}$ and $\mathcal{L}_{VAE}^{m}$ are the loss functions of the two inputs in their respective VAE branches; $\mathcal{L}_{sim}$ is the similarity loss function, used to constrain the latent components corresponding to the invariant features of the two images; and $\mathcal{L}_{sparse}$ is the penalty given by the sparsity loss function, which avoids overfitting by applying a sparsity constraint in each dimension to the mean of the latent component corresponding to the common features. $\lambda_1$ and $\lambda_2$ are the coefficients that balance the two loss terms, with $\lambda_1$ set empirically. When the training set contains enough defect samples, training can focus on reconstructed image quality and rely on the classifier to improve classification performance. When defect samples are insufficient, the distribution boundary can be emphasized by strengthening the similarity function, which gives a clearer decision space.
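The way the individual terms are combined, and the role of the balance coefficients, can be illustrated with a few lines. The function name `joint_loss`, the default coefficient values and the example loss values are all hypothetical; the source gives no numeric settings.

```python
def joint_loss(vae_l, vae_m, sim, sparse, lambda_1=1.0, lambda_2=0.1):
    """Weighted sum of the two branch VAE losses, the similarity loss
    and the penalty term. Default coefficients are illustrative only."""
    return vae_l + vae_m + lambda_1 * sim + lambda_2 * sparse


# Hypothetical per-batch loss values.
vae_l, vae_m, sim, sparse = 1.0, 2.0, 3.0, 4.0

# Plenty of defect samples: a modest weight on the similarity term.
print(joint_loss(vae_l, vae_m, sim, sparse, lambda_1=0.5, lambda_2=0.25))

# Few defect samples: a larger lambda_1 emphasizes the similarity term.
print(joint_loss(vae_l, vae_m, sim, sparse, lambda_1=2.0, lambda_2=0.25))
```

Raising `lambda_1` makes the similarity term dominate the gradient, which matches the remark in the text that the similarity function can be strengthened when defect samples are scarce.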

Preferably, step S4 comprises the steps of:

step S41: transmitting the object surface patch image data to the optimized image feature decoupling network model for invariant factor prediction, generating object surface image invariant factors;

step S42: constructing an anomaly score discriminator model based on a deep neural network model;

step S43: training and optimizing the anomaly score discriminator model on a preset anomaly score training set to generate an optimized anomaly score discriminator model;

step S44: mapping the object surface image invariant factors into the optimized anomaly score discriminator model to evaluate the object surface anomaly score and generate object surface anomaly score data; when the object surface anomaly score data is greater than a preset anomaly score threshold, marking the corresponding object surface image data as object surface defect image data; and when the object surface anomaly score data is not greater than the preset anomaly score threshold, marking the corresponding object surface image data as object surface conventional image data.

According to the invention, the object surface patch image data is transmitted to the optimized image feature decoupling network model for invariant factor prediction, so that the key image features critical to defect identification are extracted accurately; by focusing on the invariant factors, the model distinguishes normal texture from abnormal defects more effectively. The optimized network model processes image data more efficiently and generates the invariant factors of an image quickly and accurately, providing a reliable basis for subsequent defect discrimination. The anomaly score discriminator model built on a deep neural network model processes and evaluates image data efficiently, analyzes images more accurately, and generalizes well, so that it can adapt to a variety of image textures and defect types. Training and optimizing the anomaly score discriminator model on a preset anomaly score training set adapts the model to the data characteristics found in practical applications and improves recognition accuracy; it improves results on the current data set and also strengthens adaptability and stability on new or unknown types of image data. Mapping the object surface image invariant factors into the optimized anomaly score discriminator model for evaluation generates the object surface anomaly score data accurately. 
According to this data, the system automatically classifies image data as defect images or conventional images, which greatly improves the degree of automation and the efficiency of detection. When the object surface anomaly score exceeds the preset threshold, the defect image is marked quickly and accurately, which ensures high detection sensitivity and accuracy and is vital to efficient product quality control.

Preferably, step S43 comprises the steps of:

transmitting a preset abnormal score training set to an abnormal score discriminator model to perform model training optimization, and generating a trained abnormal score discriminator model;

and performing model optimization on the trained abnormal score discriminator model by using the cross entropy loss function to generate an optimized abnormal score discriminator model.

The invention transmits the preset abnormal score training set to the abnormal score discriminator model for model training, which helps the model learn and adapt to the various image data that may be encountered in practical application. By learning from diversified training data, the model can more accurately identify and evaluate abnormal features of the object surface; the training improves performance on the specific data set and also enhances generalization across the surface textures of different types of objects. After this targeted training, the model can process and judge complex textures and defect conditions more effectively. The trained abnormal score discriminator model is then further optimized with the cross entropy loss function, which adjusts the model parameters to reduce prediction errors. The cross entropy loss function is well suited to classification problems, so this optimization improves the sensitivity and accuracy of the model in distinguishing normal images from defect images and allows more accurate determinations in borderline cases whose scores are close to the threshold.

The beneficial effect of the invention is that the surface defect detection method based on the feature decoupling model achieves good anomaly detection performance while relying on only a few abnormal samples. The key point of the method is to find a latent space in which the random texture features and the stable structural features of the surface are separated, so that abnormality of the surface structure can be judged directly in that space. In the inference stage, the image to be detected is divided into patches, the patches are input into the detection network, and the approximate region of the abnormality in the whole image is returned according to the positions of the patches. The proposed method was verified on a group of public defect data sets and compared with other defect detection methods; the results show that the method can accomplish the surface anomaly detection task for textured industrial products using only a small amount of abnormal data, with better detection performance than the compared anomaly detection methods.
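
The inference-stage localization described above can be illustrated with a minimal sketch. It assumes that each patch is identified by the top-left corner it was cropped from and that the "approximate region" is taken as the bounding box of all flagged patches; the function name `approximate_defect_region`, the tuple layout, and the threshold value are hypothetical and are not specified by the source.

```python
from typing import List, Optional, Tuple

# (top, left, anomaly_score) for one patch; this layout is an assumption.
Patch = Tuple[int, int, float]


def approximate_defect_region(
    patches: List[Patch], patch_size: int, threshold: float
) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right) covering every patch whose score
    exceeds the threshold, or None when no patch is flagged."""
    flagged = [(top, left) for top, left, score in patches if score > threshold]
    if not flagged:
        return None
    top = min(t for t, _ in flagged)
    left = min(l for _, l in flagged)
    bottom = max(t for t, _ in flagged) + patch_size
    right = max(l for _, l in flagged) + patch_size
    return top, left, bottom, right


scored = [(0, 0, 0.10), (0, 38, 0.92), (38, 38, 0.81), (76, 0, 0.05)]
print(approximate_defect_region(scored, patch_size=48, threshold=0.5))
# prints (0, 38, 86, 86)
```

A bounding box is only one possible way to report the region; a per-patch heat map would preserve more detail when several separate defects are present.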

Drawings

FIG. 1 is a schematic flow chart of a defect detecting method suitable for an object with complex texture on the surface;

FIG. 2 is a detailed flowchart illustrating the implementation of step S1 in FIG. 1;

FIG. 3 is a detailed flowchart illustrating the implementation of step S2 in FIG. 1;

FIG. 4 is a flowchart illustrating the detailed implementation of step S4 in FIG. 1;

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

The following is a clear and complete description of the technical method of the present patent in conjunction with the accompanying drawings, and it is evident that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.

Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor methods and/or microcontroller methods.

It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.

In order to achieve the above objective, referring to fig. 1 to 4, the present invention provides a defect detecting method suitable for an object with a complex texture surface, comprising the following steps:

In the embodiment of the present invention, as described with reference to fig. 1, a schematic flow chart of steps of a defect detection method suitable for an object with a complex surface texture according to the present invention is shown, and in the embodiment, the defect detection method suitable for an object with a complex surface texture includes the following steps:

In the embodiment of the invention, high-resolution monitoring equipment (such as an industrial camera) is used to photograph the surface of the target object. The equipment should capture details of the object surface, particularly complex textures and potential defects, and the captured images should have sufficient sharpness and contrast for defects to be identified accurately in subsequent processing. The object surface image data obtained in this way comprises 900 grayscale images of hot-rolled strip surfaces, each 200 × 200 pixels, with 8-bit depth, in BMP format. The acquired images undergo necessary preprocessing, such as brightness or contrast adjustment or denoising, to improve image quality. Each original image is then cut into a plurality of small blocks according to a preset patch image size (e.g., 48 × 48 pixels), which are easier for subsequent algorithms to process. During cropping, a proportion of overlap between adjacent patches (e.g., 20%) can be used to avoid losing defect information at patch borders, and for the edge area the size and shape of the crop can be adjusted according to the actual situation so that important information is not missed.

In an embodiment of the present invention, two variational autoencoders (VAEs) with the same architecture are used as the basis of the network. The two VAE branches share the weights of the encoder and decoder, meaning that both branches update these weights together during learning. The encoder of each VAE branch is made up of multiple convolutional layers, which may use a 3 × 3 convolution kernel and a stride of 2 to gradually reduce the feature map size while increasing the number of channels (e.g., from 128 to 512). Between the convolutional layers, the ReLU activation function and Dropout can be used to avoid overfitting. The output of the encoder is connected to two fully connected layers, which yield the mean and variance of the common component and of the varying component of the latent representation, respectively.
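
To see how stride-2 convolutions shrink the feature map, the standard convolution output-size formula can be applied to a 48 × 48 patch. The sketch below assumes a padding of 1 and three convolutional layers; neither value is stated in the source, and the function names are hypothetical.

```python
from typing import List


def conv_output_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of one convolution: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_feature_sizes(input_size: int, num_layers: int) -> List[int]:
    """Feature-map side length after each stride-2 convolution."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(conv_output_size(sizes[-1]))
    return sizes


print(encoder_feature_sizes(48, 3))
# prints [48, 24, 12, 6]
```

With these assumed settings each layer halves the side length, so a 48 × 48 patch reaches 6 × 6 before the fully connected layers.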

Using the reparameterization trick, a sample is drawn from a standard multivariate normal distribution and combined with the mean and variance output by the encoder to generate the latent-space representation of the image. The decoder architecture is symmetrical to the encoder and reconstructs the input image using 2D deconvolution (transposed convolution) layers with a gradually decreasing number of channels (e.g., from 512 to 128). The overall loss function combines the reconstruction error, which measures the difference between the input image and the reconstructed image, with a regularization term on the latent space, which keeps the latent distribution close to a predefined distribution (typically a normal distribution). The invariant factor similarity loss function is used to ensure that the network learns and extracts the invariant factors effectively; this may be achieved by comparing the latent representations of samples within the same class or by using some measure of similarity (e.g., cosine similarity). The twin network is trained using the preprocessed image dataset. During training, reconstruction quality and feature-extraction accuracy can be balanced by adjusting the weight parameters of the overall loss function and the invariant factor similarity loss function, and the network parameters are adjusted step by step until the loss reaches a minimum or another established stopping criterion is met.
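
The sampling step can be sketched in a few lines. This is an illustration only: it assumes the encoder outputs the logarithm of the variance (a common practice that the source does not state), and the function name `reparameterize` is hypothetical.

```python
import math
import random
from typing import List, Sequence


def reparameterize(mean: Sequence[float], log_var: Sequence[float],
                   rng: random.Random) -> List[float]:
    """Sample z = mean + std * eps with eps ~ N(0, 1) in every dimension."""
    z = []
    for mu, lv in zip(mean, log_var):
        std = math.exp(0.5 * lv)   # standard deviation from log-variance
        eps = rng.gauss(0.0, 1.0)  # noise drawn independently of the encoder
        z.append(mu + std * eps)
    return z


rng = random.Random(0)
z_c = reparameterize([0.2, -1.0, 0.5], [0.0, -2.0, -4.0], rng)
print(len(z_c))
```

Because the randomness enters only through `eps`, the sample is a differentiable function of the mean and variance, which is what allows the encoder to be trained by back-propagation.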

In the embodiment of the invention, the joint loss function of the image feature decoupling network model is constructed by combining the overall loss function of the encoder with the invariant factor similarity loss function. The encoder overall loss function generally comprises a reconstruction loss and a latent-space regularization loss. The reconstruction loss may use a measure such as mean square error (MSE) or cross entropy to quantify the difference between the input image and the reconstructed image, while the regularization loss keeps the distribution of the latent space close to a standard normal distribution. The invariant factor similarity loss function enhances the model's ability to extract invariant features and may be implemented by comparing the representations of images of the same class in the latent space, for example using cosine similarity or other similarity metrics. The optimized model is evaluated on a validation set to check its performance in decoupling image features and extracting invariant factors. If the performance does not meet expectations, the model can be further optimized by adjusting the weight parameters of the loss function or the network architecture (for example, increasing or reducing the number of layers, or adjusting the size of the convolution kernel). After the optimization process is completed, the resulting model has stronger image feature decoupling capability and a stronger ability to extract invariant factors, which provides a more accurate feature representation for subsequent defect detection.

In the embodiment of the invention, patch image data of the object surface is input into the optimized image feature decoupling network model. The model analyzes each patch and extracts the key invariant factors, which capture the image features relevant to defect detection; invariant factor prediction consists of passing the patch image through the encoder to obtain its representation in the latent space, namely the invariant factors. The anomaly score discriminator model is constructed using a deep neural network (such as a multi-layer perceptron). The model consists of multiple fully connected layers, each of which can be followed by a ReLU activation function and Dropout to prevent overfitting, and the last layer uses a sigmoid function to output an anomaly score between 0 and 1 indicating the probability that a patch image belongs to the defect class. The extracted invariant factors are input into the optimized anomaly score discriminator model, each patch image is scored, and the anomaly score data of the object surface is generated from these scores, reflecting the probability that each patch image contains defects. A preset anomaly score threshold is set: when the anomaly score of a patch exceeds the threshold, the original image corresponding to the patch is marked as containing defects, and if the score is lower than or equal to the threshold, the corresponding image is marked as a normal image without defects. For images marked as containing defects, specific classification, localization, or repair advice can be provided; in addition, the marking results can be used for production-line quality control, for example to screen out defective products or raise alarms.

Preferably, step S1 comprises the steps of:

As an example of the present invention, referring to fig. 2, a detailed implementation step flow diagram of step S1 in fig. 1 is shown, where step S1 includes:

In the embodiment of the invention, high-resolution monitoring equipment (such as an industrial camera) is used to photograph the surface of the target object. The equipment should capture details of the object surface, particularly complex textures and potential defects, and the captured images should have sufficient sharpness and contrast for defects to be identified accurately in subsequent processing. The object surface image data obtained in this way comprises 900 grayscale images of hot-rolled strip surfaces, each 200 × 200 pixels, with 8-bit depth, in BMP format.

In the embodiment of the invention, grayscale conversion is performed on the acquired color image by removing the color information, that is, by converting the RGB value of each pixel into a single gray value. Converting to a grayscale image reduces the computational burden of subsequent processing while highlighting the texture and shape features of the object surface, which is particularly important for defect detection. The conversion may be performed using a common image processing library (such as OpenCV or PIL).
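
For illustration, the per-pixel conversion can be written without any library. The sketch below uses the ITU-R BT.601 luminance weights (0.299, 0.587, 0.114), which are the weights commonly used for RGB-to-gray conversion; the source does not specify which weights are applied, and the function names are hypothetical.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def rgb_to_gray(pixel: RGB) -> int:
    """Weighted luminance of one 8-bit RGB pixel, rounded to an 8-bit value."""
    r, g, b = pixel
    return int(round(0.299 * r + 0.587 * g + 0.114 * b))


def image_to_gray(image: List[List[RGB]]) -> List[List[int]]:
    """Convert a row-major RGB image into a grayscale image of the same shape."""
    return [[rgb_to_gray(p) for p in row] for row in image]


tiny = [[(255, 255, 255), (0, 0, 0)], [(255, 0, 0), (0, 255, 0)]]
print(image_to_gray(tiny))
# prints [[255, 0], [76, 150]]
```

In practice a library routine would be used for speed; the point here is only that each pixel collapses from three channels to one.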

In the embodiment of the invention, each grayscale image is cut into a plurality of small blocks according to a preset patch image size (for example, 48 × 48 pixels), which are easier for subsequent algorithms to process. During cropping, a proportion of overlap between adjacent patches (e.g., 20%) can be used to avoid losing defect information at patch borders, and for the edge area the size and shape of the crop can be adjusted according to the actual situation so that important information is not missed.
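
The patch division can be sketched as a sliding window. The example uses the values given in the text (48 × 48 patches, 20% overlap, 200 × 200 images); the edge handling, which adds a final patch flush with the image border when the stride does not divide evenly, is only one possible choice and is an assumption, as are the function names.

```python
from typing import List, Tuple


def patch_origins(image_size: int, patch_size: int, overlap: float) -> List[int]:
    """Top-left coordinates along one axis for overlapping patches."""
    if patch_size > image_size:
        raise ValueError("patch is larger than the image")
    stride = max(1, int(patch_size * (1.0 - overlap)))
    origins = list(range(0, image_size - patch_size + 1, stride))
    # Edge handling (an assumption): cover the border with one extra patch.
    if origins[-1] + patch_size < image_size:
        origins.append(image_size - patch_size)
    return origins


def split_into_patches(
    image: List[List[int]], patch_size: int, overlap: float
) -> List[Tuple[int, int, List[List[int]]]]:
    """Return (top, left, patch) for every patch of a row-major grayscale image."""
    rows = patch_origins(len(image), patch_size, overlap)
    cols = patch_origins(len(image[0]), patch_size, overlap)
    patches = []
    for top in rows:
        for left in cols:
            patch = [row[left:left + patch_size] for row in image[top:top + patch_size]]
            patches.append((top, left, patch))
    return patches


print(patch_origins(200, 48, 0.2))
# prints [0, 38, 76, 114, 152]
```

With these values the stride is 38 pixels, the last patch ends exactly at pixel 200, and each image yields 5 × 5 = 25 patches. Keeping the `(top, left)` origin with each patch is what later allows a flagged patch to be mapped back to its position in the image.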

Preferably, step S2 comprises the steps of:

As an example of the present invention, referring to fig. 3, a detailed implementation step flow diagram of step S2 in fig. 1 is shown, where step S2 includes:

In an embodiment of the invention, a twin network architecture based on the variational autoencoder (VAE) is used to construct the image feature decoupling network model. The model includes two key components. The encoder part of the network consists of a plurality of convolutional layers that extract features of the image and reduce its dimensions. These features are then passed to two different fully connected layers, a first fully connected layer and a second fully connected layer, which extract the invariant factors and the variable factors of the image, respectively.

In the embodiment of the invention, the preprocessed patch image training set is input into an encoder of an image feature decoupling network model. The training set is divided into two parts: first patch image data and second patch image data. The first patch image data is transferred into the first fully-connected layer and the second patch image data is transferred into the second fully-connected layer. This separation ensures that the network can learn the invariant and variant factors of the image separately.

In an embodiment of the invention, the overall loss function of the variational autoencoder is designed based on the latent-space image invariant factor and variable factor obtained from the encoder. This loss function typically includes a reconstruction loss and a regularization loss (e.g., KL divergence) to ensure the validity and consistency of the latent space.
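
As a rough sketch of such a loss, the example below adds a mean-square reconstruction error to the closed-form KL divergence between a diagonal Gaussian posterior and the standard normal prior, once for the invariant component and once for the variable component. The log-variance parameterization, the `kl_weight` parameter, and all function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean square error between an input and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def kl_to_standard_normal(mean: Sequence[float], log_var: Sequence[float]) -> float:
    """KL( N(mean, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + mu * mu - 1.0 - lv
                     for mu, lv in zip(mean, log_var))


def vae_loss(x, x_hat, mean_c, log_var_c, mean_s, log_var_s, kl_weight=1.0):
    """Reconstruction loss plus KL terms for both latent components."""
    kl = kl_to_standard_normal(mean_c, log_var_c) + kl_to_standard_normal(mean_s, log_var_s)
    return mse(x, x_hat) + kl_weight * kl


print(kl_to_standard_normal([1.0], [0.0]))
# prints 0.5
```

The KL term is zero only when the posterior already equals the standard normal prior, so it pulls both latent components toward that prior while the reconstruction term keeps them informative.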

In the embodiment of the invention, the invariant factor similarity loss function is designed according to the latent-space image invariant factor. The purpose of this loss function is to ensure that the network effectively learns and extracts the invariant features that are critical to defect detection. It may be implemented by comparing the latent representations of patches within the same class, or by using a particular similarity measure (e.g., cosine similarity).

The present invention utilizes a variational autoencoder overall loss function. When VAEs are used as the generative model, the generated data can be considered a joint representation of multiple independent components in the latent space. Assume that the latent representation $z$ can be decoupled into two independent components, corresponding to the varying texture features $z_s$ and the unchanged structural features $z_c$; the data generated by the decoder can then be regarded as a joint mixture distribution of the two components. This can be expressed as $p_\theta(x, z_s, z_c) = p_\theta(x \mid z_s, z_c)\,p(z_s)\,p(z_c)$, where $\theta$ denotes the decoder parameters, and $z_s$ and $z_c$ are the varying component and the common component of the latent representation, respectively. Following the interpretation of the VAE optimization objective, the prior distributions $p(z_c)$ and $p(z_s)$ are set to standard normal distributions with a mean of zero and a variance of 1. $p_\theta(x \mid z_c, z_s)$ is also set to a normal distribution whose mean and variance are given by the encoder during the encoding phase. With $\phi$ denoting the encoder parameters, the posterior distributions of $z_c$ and $z_s$ are likewise set to normal distributions with means $\mu_c(x)$, $\mu_s(x)$ and variances $\sigma_c(x)$, $\sigma_s(x)$, that is, $q_\phi(z_c \mid x) = \mathcal{N}(\mu_c(x), \sigma_c(x))$ and $q_\phi(z_s \mid x) = \mathcal{N}(\mu_s(x), \sigma_s(x))$. The VAE encoding process is expressed as $q_\phi(z_c, z_s \mid x) = q_\phi(z_c \mid x)\,q_\phi(z_s \mid x)$. On this basis, the overall loss function of the variational autoencoder in this network is designed, yielding the variational autoencoder overall loss function.
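
For orientation, one standard way to write a VAE loss under the factorized prior and posterior described above is given below. This is a sketch under assumptions, not the formula of the source: the symbol $\mathcal{L}_{\mathrm{VAE}}$ and the sign convention (a loss to be minimized) are chosen here for illustration, and the source may weight or arrange the terms differently.

```latex
\mathcal{L}_{\mathrm{VAE}}(x;\theta,\phi)
  = -\,\mathbb{E}_{q_\phi(z_c, z_s \mid x)}\!\left[\log p_\theta(x \mid z_c, z_s)\right]
  + D_{\mathrm{KL}}\!\left(q_\phi(z_c \mid x)\,\|\,p(z_c)\right)
  + D_{\mathrm{KL}}\!\left(q_\phi(z_s \mid x)\,\|\,p(z_s)\right)
```

The two KL terms appear separately because both the prior $p(z_s)\,p(z_c)$ and the posterior $q_\phi(z_c \mid x)\,q_\phi(z_s \mid x)$ factorize, so the KL divergence between the joint distributions splits into a sum over the two components.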

The invention designs an invariant factor similarity loss function. A pair of image samples $x^l$ and $x^m$ is input into the two VAE branch networks simultaneously, and each input is encoded by the encoder into latent vectors corresponding to the different feature factors. Under the stated assumption, the random texture in the image is regarded as the variable factor, while the common structure that remains after removing the random texture is the invariant factor. Constraining the latent components $z_c$ that correspond to the invariant factors of the two inputs therefore achieves decoupling of the two components in the latent representation. In this constraint, $D_{sim}(\cdot)$ is a measure of similarity between two latent components; various measures can be used, such as simply taking the L1 or L2 distance between the two posterior centroids $\mu_c(x^l)$ and $\mu_c(x^m)$. However, when the variance of the posterior distribution differs across latent dimensions, the distance between centroids does not truly reflect the distance between the two distributions. At the same time, to avoid problems with the separation of partial defect features, a contrastive loss function based on the Mahalanobis distance is used as the similarity loss function. In the adjusted function, $\beta$ is a binary label indicating whether the input samples $x^l$ and $x^m$ belong to the same class, $D_m(\cdot)$ is the Mahalanobis distance, and $m > 0$ is the feature margin applied when the input samples differ. When the input samples are of the same class, the loss reduces to the Mahalanobis distance between the two distributions. Because the variational approximate posterior is a multivariate Gaussian distribution with a diagonal covariance structure, $D_m$ can be calculated from the means and standard deviations. The Mahalanobis distance can be regarded as a multivariate equivalent of the Euclidean distance and is more suitable than the Euclidean distance for measuring the distance between two distributions. In its calculation, $\mu_i$ and $\sigma_i$ denote the mean and standard deviation of the $i$-th dimension of the common component, respectively, and $d$ is the dimension of the latent representation.
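
A contrastive loss of this kind can be sketched as follows. The exact formulas of the source are not reproduced here, so two assumptions are made and should be read as such: the distance between the two diagonal Gaussian posteriors scales each squared mean difference by the average of the two variances, and the loss takes the widely used contrastive form with a squared distance for same-class pairs and a squared hinge on the margin for different-class pairs. The function names are hypothetical.

```python
import math
from typing import Sequence


def diagonal_mahalanobis(mu_l: Sequence[float], sigma_l: Sequence[float],
                         mu_m: Sequence[float], sigma_m: Sequence[float]) -> float:
    """Variance-scaled distance between two diagonal Gaussian posteriors."""
    total = 0.0
    for a, sa, b, sb in zip(mu_l, sigma_l, mu_m, sigma_m):
        pooled_var = 0.5 * (sa * sa + sb * sb)  # assumption: average variance
        total += (a - b) ** 2 / pooled_var
    return math.sqrt(total)


def contrastive_similarity_loss(distance: float, same_class: bool,
                                margin: float = 1.0) -> float:
    """Pull same-class pairs together; push different-class pairs past the margin."""
    beta = 1.0 if same_class else 0.0
    pull = beta * distance ** 2
    push = (1.0 - beta) * max(0.0, margin - distance) ** 2
    return pull + push


d = diagonal_mahalanobis([0.0, 0.0], [1.0, 1.0], [3.0, 4.0], [1.0, 1.0])
print(d, contrastive_similarity_loss(d, same_class=True))
# prints 5.0 25.0
```

Different-class pairs that are already farther apart than the margin contribute nothing, so the loss concentrates on pairs that are still confusable.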

Preferably, step S3 comprises the steps of:

In the embodiment of the invention, an initial joint loss function of the image feature decoupling network model is first constructed by combining the encoder overall loss function and the invariant factor similarity loss function. The encoder overall loss function typically includes a reconstruction loss (e.g., mean square error) and a regularization loss (e.g., KL divergence) to ensure the quality of the latent representation and the regularity of its distribution. The invariant factor similarity loss function aims to enhance the ability of the model to learn important invariant features in the image. A sparsity loss function (e.g., an L1 loss) is then introduced into the optimization, which helps the model focus on important features while suppressing irrelevant or noisy features, thereby enhancing the effectiveness of feature extraction. The image feature decoupling network model is trained and optimized using the constructed joint loss function. This includes using the back-propagation algorithm and gradient descent (or another optimization algorithm) to adjust the model parameters with the goal of minimizing the joint loss function. During optimization, the best balance point can be found by adjusting the weight coefficients of the individual loss terms, so that the model can both reconstruct images effectively and extract key features accurately. The validation dataset is used periodically to evaluate the performance of the model and to check that no overfitting occurs. If the model performance is found to degrade on the validation set, corresponding actions may be taken, such as adjusting the weights of the loss function or stopping training early.

Preferably, the sparsity loss function in step S31 is as follows:

The present invention utilizes a sparsity loss function. When similarity constraints are used in the VAE, the network may learn a trivial solution: even if the values of all dimensions of the component mean are simply set to 0, all constraints are still fully satisfied, in which case all of the input information is encoded into a specific feature. To represent the features more efficiently, a penalty term is added to the network loss function to constrain the degree of sparsity of the component mean. A sparsity vector is defined to represent the average degree of activation of the $i$-th dimension of the mean vector, computed over the $i$-th dimension of the mean vector of each $k$-th input in a training batch $B$. $\tau$ is a sparsity parameter close to zero. Limiting the average activation of the mean vectors of the different components of the latent space in each dimension encourages specificity of the mean vector across dimensions; this is done by minimizing the relative entropy so that the average activation is as close as possible to the sparsity constant.
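
The penalty described above resembles the relative-entropy sparsity term used in sparse autoencoders, and a sketch in that spirit is shown below. Since the source formula is not reproduced, the details are assumptions: the average activation of a dimension is taken as the mean absolute value over the batch, it is clipped into the open interval (0, 1) so that the Bernoulli relative entropy is defined, and the default value of `tau` is hypothetical.

```python
import math
from typing import List, Sequence


def average_activation(batch_means: Sequence[Sequence[float]]) -> List[float]:
    """Mean absolute activation of each latent dimension over a batch."""
    n = len(batch_means)
    dims = len(batch_means[0])
    eps = 1e-6
    result = []
    for i in range(dims):
        rho = sum(abs(m[i]) for m in batch_means) / n
        result.append(min(max(rho, eps), 1.0 - eps))  # keep inside (0, 1)
    return result


def sparsity_penalty(batch_means: Sequence[Sequence[float]], tau: float = 0.05) -> float:
    """Sum over dimensions of KL(tau || rho_i) between Bernoulli distributions."""
    penalty = 0.0
    for rho in average_activation(batch_means):
        penalty += tau * math.log(tau / rho)
        penalty += (1.0 - tau) * math.log((1.0 - tau) / (1.0 - rho))
    return penalty


batch = [[0.9, 0.0, 0.1], [0.7, 0.1, 0.0]]
print(sparsity_penalty(batch) > 0.0)
# prints True
```

The penalty vanishes only when every dimension's average activation equals `tau`, so it discourages both dense codes and the all-zero trivial solution.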

The invention designs a joint loss function of the image feature decoupling network model. The variational autoencoder (VAE) is widely used in current disentangling methods: an encoder based on a deep neural network maps any input data into a latent space as a latent representation, which can be written as $q_\phi(z \mid x)$, where $\phi$ denotes the weight parameters of the encoder. It is assumed that, after the encoder mapping, the input data $x$ is expressed as a latent representation $z$. Based on the definition of separable representation in [6], the latent representation $z$ can be regarded as being composed of a plurality of independent components, and the joint probability model of $z$ can be expressed as $p(z) = \prod_{i=1}^{d_z} p(z_i)$. Then, when each component contains only one piece of semantic information, a change of one observable variation factor in the image corresponds to a change of only one component $z_i$. When the number of independent components in the latent representation is greater than the number of semantic variation factors ($v \le d_z$), there is a subset of $z$ that is independent of the observable variation factors; the elements of this subset do not necessarily carry explicit semantic information but still satisfy the independence assumption. That is, this subset contains latent elements that are unrelated to all observable variation factors, and it is defined relative to the set of basic components in $z$ that correspond to all of the variation factors.

The images $x^l, x^m \in X$ are input into the encoders of the respective VAE branches of the network, and each image is decomposed by the encoder into $z_c$ and $z_s$, the factors corresponding to invariance and variation in the given image pair, respectively. To decouple the latent components $z_s$ and $z_c$, which correspond to the variable features and the common features of the image, a joint loss function consisting of a plurality of losses is designed, thereby obtaining the joint loss function of the image feature decoupling network model. It comprises the loss functions of the two inputs in their respective VAE branches; the similarity loss function, which constrains the latent components corresponding to the invariant features in both images; and the penalty given by the sparsity loss function, which avoids overfitting by applying sparsity constraints to the mean of the latent components corresponding to the common feature in each dimension. $\lambda_1$ and $\lambda_2$ are the coefficients used to balance these two loss terms, respectively, and the empirical parameter $\lambda_1$ serves as the balance coefficient in the loss function. When there are enough defect samples in the training set, training can focus on the quality of the reconstructed image and rely on the classifier to improve classification performance. When defect samples are insufficient, the distribution boundary can be emphasized by strengthening the similarity function so as to obtain a clearer decision space.
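
The combination of terms described above amounts to a weighted sum, sketched below. The function name `joint_loss` and the default coefficient values are hypothetical; the source describes the coefficients as empirical and gives no numeric values.

```python
def joint_loss(vae_loss_l: float, vae_loss_m: float,
               similarity_loss: float, sparsity_loss: float,
               lambda_1: float = 1.0, lambda_2: float = 0.1) -> float:
    """VAE losses of both branches plus weighted similarity and sparsity terms."""
    return (vae_loss_l + vae_loss_m
            + lambda_1 * similarity_loss
            + lambda_2 * sparsity_loss)


# With few defect samples, a larger lambda_1 puts more weight on the
# similarity term; the values below are placeholders, not source values.
plenty = joint_loss(1.0, 2.0, 0.5, 4.0, lambda_1=0.5, lambda_2=0.25)
scarce = joint_loss(1.0, 2.0, 0.5, 4.0, lambda_1=2.0, lambda_2=0.25)
print(plenty, scarce)
# prints 4.25 5.0
```

Raising `lambda_1` does not change the individual loss values; it changes how strongly gradients from the similarity term shape the shared encoder relative to reconstruction quality.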

Preferably, step S4 comprises the steps of:

As an example of the present invention, referring to fig. 4, a detailed implementation step flow diagram of step S4 in fig. 1 is shown, where step S4 includes:

In the embodiment of the invention, patch image data of the object surface is input into the optimized image feature decoupling network model. The model analyzes each patch and extracts the key invariant factors, which capture the image features relevant to defect detection; invariant factor prediction consists of passing the patch image through the encoder to obtain its representation in the latent space, namely the invariant factors.

In the embodiment of the invention, an anomaly score discriminator model is constructed using a deep neural network (such as a multi-layer perceptron). The model consists of multiple fully connected layers, each of which can be followed by a ReLU activation function and Dropout to prevent overfitting, and the last layer uses a sigmoid function to output an anomaly score between 0 and 1 indicating the probability that a patch image belongs to the defect class.

In the embodiment of the invention, a preset anomaly score training set is used for training and optimizing the anomaly score discriminator model. The training set should contain various types of patch images, including normal and abnormal (defect-containing) samples. During training, the parameters are adjusted continuously according to the performance of the model on the training set so as to optimize its anomaly detection capability.

In the embodiment of the invention, the extracted invariant factors are input into an optimized anomaly score discriminator model, the anomaly score evaluation processing is carried out on each patch image, and anomaly score data of the object surface is generated based on the scores, wherein the anomaly score data reflect the probability that each patch image possibly contains defects. A preset anomaly score threshold is set. When the abnormal score of a certain patch exceeds the threshold, the original image corresponding to the patch is marked as containing defects, and if the score is lower than or equal to the threshold, the corresponding image is marked as a normal image without defects. For images marked as containing defects, specific classification, localization or repair advice of the defects can be further performed, and in addition, marking results can be used for quality control of production lines, and products containing the defects can be screened or alarmed.
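
The thresholding rule can be sketched as follows: an image is marked as defective as soon as any one of its patches scores above the threshold, and as normal otherwise. The dictionary layout, the label strings, and the threshold value are assumptions made for illustration.

```python
from typing import Dict, List


def label_images(patch_scores: Dict[str, List[float]], threshold: float) -> Dict[str, str]:
    """Map each image id to 'defect' if any patch score exceeds the threshold,
    otherwise to 'normal' (scores equal to the threshold count as normal)."""
    labels = {}
    for image_id, scores in patch_scores.items():
        is_defect = any(score > threshold for score in scores)
        labels[image_id] = "defect" if is_defect else "normal"
    return labels


scores = {"img_001": [0.10, 0.70, 0.20], "img_002": [0.50, 0.20, 0.05]}
print(label_images(scores, threshold=0.5))
# prints {'img_001': 'defect', 'img_002': 'normal'}
```

The strict comparison mirrors the text: a score lower than or equal to the threshold leaves the image marked as normal.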

Preferably, step S43 comprises the steps of:

In the embodiment of the invention, a discriminator $C_\psi(z_c)$ with parameters $\psi$, constructed from a deep neural network, maps the image invariant factors in the training set to a discriminant space. During training, to avoid the influence of imbalanced training data on the partitioned space, abnormal samples are oversampled to construct mini-batches containing the same number of normal and abnormal samples. A sigmoid function is used as the activation function in the last layer of the discriminator, and cross entropy loss is used as the loss function of the discriminator. This loss function is particularly suited to classification problems, since it measures the difference between the probability distribution output by the model and the probability distribution of the true labels. During training, the gradient of the loss function with respect to the model parameters is calculated using the back-propagation algorithm, and the model parameters are updated using gradient descent (or another optimization algorithm) to reduce the value of the loss function. Model performance is evaluated periodically on a validation set to ensure that the model not only performs well on the training set but also generalizes to unseen data. The resulting anomaly score discriminator model has a stronger capability of distinguishing normal images from abnormal images and can generate the anomaly score of each patch image more accurately in the subsequent step.
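
The balanced mini-batch construction and the cross entropy loss can be sketched as follows. Oversampling is implemented here by drawing abnormal samples with replacement, which is one common way to do it; the source does not state the sampling scheme, and the function names and batch size are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def balanced_batch(normal: Sequence, abnormal: Sequence, half: int,
                   rng: random.Random) -> List[Tuple[object, int]]:
    """Draw `half` normal samples (label 0) and `half` abnormal samples
    (label 1); the scarce abnormal class is sampled with replacement."""
    batch = [(x, 0) for x in rng.sample(list(normal), half)]
    batch += [(x, 1) for x in rng.choices(list(abnormal), k=half)]
    rng.shuffle(batch)
    return batch


def binary_cross_entropy(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross entropy between sigmoid outputs and 0/1 labels."""
    eps = 1e-12  # avoids log(0) for saturated outputs
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


batch = balanced_batch(range(100), ["d1", "d2"], half=8, rng=random.Random(0))
print(len(batch), sum(label for _, label in batch))
# prints 16 8
```

Even with only two distinct defect samples, every mini-batch contains as many abnormal as normal examples, which keeps the decision boundary from collapsing toward the majority class.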

The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A defect detection method suitable for objects with complex textures on surfaces, comprising the steps of:

2. The defect detection method for objects with complex textures on surfaces according to claim 1, wherein step S1 comprises the steps of:

3. The defect detection method for objects with complex textures on surfaces according to claim 1, wherein step S2 comprises the steps of:

4. The defect detection method suitable for objects with complex textures on surfaces according to claim 3, wherein the variational autoencoder overall loss function in step S23 is as follows:

In the formula, the left-hand side denotes the variational autoencoder overall loss function; the formula uses a shorthand for $q_\phi(z_c, z_s \mid x)$; $\phi$ denotes the encoder parameters; $q_\phi(\cdot)$ denotes the joint probability posterior distribution function of the encoder; $\theta$ denotes the decoder parameters; $p_\theta(\cdot)$ denotes the joint probability posterior distribution function of the decoder; $x$ denotes the input image of the model; $z_c$ denotes the latent-space image invariant factor; $z_s$ denotes the latent-space image variable factor; and $D_{KL}(\cdot)$ denotes the relative entropy (Kullback-Leibler divergence) between probability distributions.

5. The defect detection method suitable for objects with complex textures on surfaces according to claim 3, wherein the invariant factor phase loss function in step S24 is as follows:

wherein $D_m$ denotes the Mahalanobis distance between the encoder posterior distributions $q_\phi(z_c \mid x^l)$ and $q_\phi(z_c \mid x^m)$; $d$ denotes the dimensionality of the latent-space image invariant factor; $x^l$ denotes the first patch image data; $x^m$ denotes the second patch image data; and the remaining two symbols denote, respectively, the mean and the standard deviation of the latent-space image invariant factor in the $i$-th dimension.

6. A defect detection method suitable for objects with complex textures on surfaces according to claim 3, wherein step S3 comprises the steps of:

7. The defect detection method suitable for objects with complex textures on surfaces according to claim 6, wherein the loose loss function in step S31 is as follows:

In the formula, the left-hand side denotes the loose loss function; $d$ denotes the dimensionality of the latent-space image invariant factor; the per-dimension symbol denotes the average degree of activation of the latent-space image invariant factor in the $i$-th dimension; and $\tau$ denotes a constant.

8. The method according to claim 6, wherein the joint loss function of the image feature decoupling network model in step S31 is as follows:

9. The defect detection method for objects with complex textures on surfaces according to claim 1, wherein step S4 comprises the steps of:

10. The defect detection method for objects with complex textures on surfaces according to claim 9, wherein step S43 comprises the steps of: