CN115620054A - Defect classification method and device, electronic equipment and storage medium - Google Patents

Defect classification method and device, electronic equipment and storage medium

Info

Publication number
CN115620054A
Authority
CN
China
Prior art keywords
scale
feature
image
channel
weighted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211249790.8A
Other languages
Chinese (zh)
Inventor
李吉林
陈晓炬
于跃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Xurui Software Technology Co ltd
Original Assignee
Nanjing Xurui Software Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Xurui Software Technology Co ltd filed Critical Nanjing Xurui Software Technology Co ltd
Priority to CN202211249790.8A
Publication of CN115620054A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a defect classification method and apparatus, an electronic device and a storage medium. The method comprises the following steps: acquiring a basic feature map of an image to be classified; performing multi-scale feature extraction on the basic feature map to obtain multi-scale features; determining an attention weight for each channel of the feature vector of the multi-scale features; performing weighted fusion of each channel's attention weight with the corresponding multi-scale features to obtain weighted multi-scale features; and determining the defect category of the image to be classified based on the weighted multi-scale features. In other words, in the embodiment of the invention, for images whose defect scales vary widely, the weighted fusion of adaptive multi-scale features strengthens the weights of effective features and suppresses the weights of interfering information while still extracting defect features at different scales, thereby improving the accuracy of image defect classification.

Description

Defect classification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of defect image classification technologies, and in particular, to a defect classification method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Defect classification assigns defect images in the industrial inspection field to categories through an algorithm. In the related art, surface defect detection of workpieces generally adopts one of the following: CNN feature extraction combined with a One-Class SVM model, multi-scale image training, or CNN feature extraction combined with pyramid pooling. For CNN feature extraction plus a One-Class SVM model, computing the SVM kernel function is expensive, so both training and inference are time-consuming, and the approach depends heavily on the initial distribution of the picture data; if the picture features are insufficiently distributed or the picture background changes due to process variations, the data set must be collected again to generate a new feature set. The multi-scale image training method improves the robustness of the model to large scale changes through image augmentation, but since the algorithm model does no processing at the feature extraction level, the improvement is limited. The method of CNN feature extraction plus pyramid pooling only extracts context information of features at different granularities without further semantic extraction on the feature maps, and because pyramid pooling introduces feature information at several granularities it brings in more interference, so under certain conditions pyramid pooling can reduce the precision of model classification.
Therefore, in the related art, classifying industrial defects with large scale differences caused by non-uniform image scales faces the following problems: 1) defect sample data are scarce, and samples of different scales are distributed very unevenly; 2) the defect scale varies greatly, so a single-scale feature extraction network struggles to extract features of large defects and tiny defects at the same time; 3) the sizes of input pictures vary widely, while a deep neural network containing a fully connected layer generally requires a fixed input size, and forcibly scaling pictures to a uniform size may deform objects in the image and hurt detection performance.
Therefore, how to determine the type of the defect image under the scene of non-uniform image size and complex defect scale change is a technical problem to be solved at present.
Disclosure of Invention
The invention provides a defect classification method, a defect classification device, electronic equipment and a computer-readable storage medium, which are used for at least solving the problem that the classification accuracy of a defect image is reduced due to non-uniform image size and complex defect scale change in the related technology. The technical scheme of the invention is as follows:
according to a first aspect of the embodiments of the present invention, there is provided a defect classification method, including:
acquiring a basic feature map of an image to be classified;
performing multi-scale feature extraction on the basic feature map to obtain multi-scale features;
determining attention weight of each channel corresponding to the feature vector of the multi-scale feature;
carrying out weighted fusion on the attention weight of each channel and the corresponding multi-scale features to obtain weighted multi-scale features;
determining the defect category of the image to be classified based on the weighted multi-scale features.
Optionally, the obtaining of the basic feature map of the image to be classified includes:
and carrying out feature extraction on the image to be classified through a trained feature extraction network to obtain a basic feature map.
Optionally, the performing multi-scale feature extraction on the basic feature map to obtain multi-scale features includes:
carrying out feature extraction on the basic feature map through a trained multi-scale feature extraction network to obtain multi-scale features of different sizes;
and concatenating all the multi-scale features of different sizes to obtain multi-scale features of a preset size.
Optionally, the determining the attention weight of each channel corresponding to the feature vector of the multi-scale feature includes:
and inputting the feature vector of the multi-scale feature into a channel attention network for processing to obtain the attention weight of each channel.
Optionally, the feature vector of the multi-scale feature is input to a channel attention network and processed according to the following formula:
w = σ(W2 δ(W1 F_GAP(F_ap)))
where σ and δ are the sigmoid and ReLU activation functions respectively, W1 and W2 are two fully connected (FC) layers, F_GAP is the global average pooling operation, F_ap is the input multi-scale feature, and w is the attention weight. Global average pooling averages all elements of the two-dimensional matrix of each channel in the feature map.
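As a minimal pure-Python sketch of this channel-attention computation (the toy feature map and the weight matrices W1 and W2 below are hypothetical illustrative values, not parameters from the patent):

```python
import math

def global_avg_pool(feature_map):
    # feature_map: list of channels, each a 2-D list (H x W).
    # Returns one scalar per channel: the mean of all its elements.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_map]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def channel_attention(feature_map, W1, W2):
    # w = sigmoid(W2 . relu(W1 . GAP(F_ap)))
    return sigmoid(matvec(W2, relu(matvec(W1, global_avg_pool(feature_map)))))

# Two channels of size 2x2 (toy example).
F_ap = [[[1.0, 3.0], [5.0, 7.0]],   # channel 0: mean 4.0
        [[0.0, 0.0], [0.0, 0.0]]]   # channel 1: mean 0.0
W1 = [[0.5, 0.0], [0.0, 0.5]]       # first FC layer (hypothetical values)
W2 = [[1.0, 0.0], [0.0, 1.0]]       # second FC layer (hypothetical values)
w = channel_attention(F_ap, W1, W2)
# Each weight lies in (0, 1); the more informative channel 0 gets a larger weight.
```

The sigmoid keeps every weight in (0, 1), so the weights can directly rescale channels in the later fusion step.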
Optionally, the performing weighted fusion on the attention weight of each channel and the corresponding multi-scale feature to obtain a weighted multi-scale feature includes:
and performing weighted fusion on the attention weight of each channel and the corresponding multi-scale feature through a self-adaptive feature fusion network according to the following formula to obtain the weighted multi-scale feature, wherein the formula is as follows:
[equation image: weighted-fusion formula]
where w ∈ R^256 is the attention weight learned through the channel attention mechanism network, the channel-compression operator (shown as an image in the original) denotes a set of 1x1 convolution and global average pooling operations for channel compression, F_ap is the input multi-scale feature, and F_r is the weighted multi-scale feature.
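A pure-Python sketch of the per-channel weighting step (toy sizes; the 1x1-convolution channel compression is omitted here for simplicity, which is an assumption for illustration only):

```python
def weighted_fusion(w, feature_map):
    # Scale every element of channel c by its attention weight w[c],
    # emphasising effective channels and suppressing interfering ones.
    return [[[w[c] * x for x in row] for row in feature_map[c]]
            for c in range(len(feature_map))]

w = [0.9, 0.1]                          # attention weights per channel
F_ap = [[[2.0, 4.0]], [[10.0, 20.0]]]   # two channels of size 1x2
F_r = weighted_fusion(w, F_ap)
# Channel 0 is kept almost intact; channel 1 is strongly damped.
```

In the patent's adaptive feature fusion network the weights w come from the channel attention network of the previous step, so the emphasis adapts to each input image.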
Optionally, the determining the defect category of the image to be classified based on the weighted multi-scale features includes:
adjusting the dimensions of the weighted multi-scale features with a convolutional network;
passing the result through the fully connected layer of the convolutional network to obtain the posterior probability of each category;
and selecting the category with the maximum posterior probability as the defect category of the image to be classified.
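The final selection step can be sketched as follows (a softmax over hypothetical fully-connected-layer logits; the class names and logit values are illustrative, not from the patent):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(logits, class_names):
    # Posterior probability per class, then argmax selection.
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return class_names[best], probs

# Hypothetical logits for three defect classes.
names = ["scratch", "dent", "stain"]
label, probs = classify([2.0, 0.5, -1.0], names)
```

The argmax over posterior probabilities is equivalent to an argmax over the raw logits, since softmax is monotonic.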
According to a second aspect of an embodiment of the present invention, there is provided a defect classification apparatus including:
the acquisition module is used for acquiring a basic feature map of the image to be classified;
the extraction module is used for carrying out multi-scale feature extraction on the basic feature map to obtain multi-scale features;
the first determination module is used for determining the attention weight of each channel corresponding to the feature vector of the multi-scale feature;
the weighted fusion module is used for carrying out weighted fusion on the attention weight of each channel and the corresponding multi-scale features to obtain the weighted multi-scale features;
and the second determination module is used for determining the defect category of the image to be classified based on the weighted multi-scale features.
Optionally, the obtaining module is specifically configured to perform feature extraction on the image to be classified through a trained feature extraction network to obtain a basic feature map.
Optionally, the extracting module includes:
the characteristic extraction module is used for extracting the characteristic of the basic characteristic diagram through a trained multi-scale characteristic extraction network to obtain multi-scale characteristics with different sizes;
and the concatenation module is used for concatenating all the multi-scale features with different sizes to obtain the multi-scale features with preset sizes.
Optionally, the first determining module is specifically configured to input the feature vector of the multi-scale feature into a channel attention network for processing, so as to obtain an attention weight of each channel.
Optionally, the first determining module inputs the feature vector of the multi-scale feature into the channel attention network to process according to the following formula:
w = σ(W2 δ(W1 F_GAP(F_ap)))
where σ and δ are the sigmoid and ReLU activation functions respectively, W1 and W2 are two fully connected (FC) layers, F_GAP is the global average pooling operation, F_ap is the input multi-scale feature, and w is the attention weight. Global average pooling averages all elements of the two-dimensional matrix of each channel in the feature map.
Optionally, the weighted fusion module is specifically configured to perform weighted fusion on the attention weight of each channel and the corresponding multi-scale feature according to the following formula through the adaptive feature fusion network to obtain a weighted multi-scale feature, where the formula is:
[equation image: weighted-fusion formula]
where w ∈ R^256 is the attention weight learned through the channel attention mechanism network, the channel-compression operator (shown as an image in the original) denotes a set of 1x1 convolution and global average pooling operations for channel compression, F_ap is the input multi-scale feature, and F_r is the weighted multi-scale feature.
Optionally, the second determining module includes:
the adjusting module is used for adjusting the dimensions of the weighted multi-scale features with a convolutional network;
the connection module is used for passing the result through the fully connected layer of the convolutional network to obtain the posterior probability of each category;
and the selection module is used for selecting the category with the maximum posterior probability as the defect category of the image to be classified.
According to a third aspect of embodiments of the present invention, there is provided an electronic apparatus, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the defect classification method of any of claims 1 to 8.
According to a fourth aspect of embodiments of the present invention, there is provided a computer-readable storage medium, in which instructions, when executed by a processor of an electronic device, enable the electronic device to perform a defect classification method as claimed in any one of claims 1 to 8.
According to a fifth aspect of embodiments of the present invention, there is provided a computer program product comprising a computer program or instructions which, when executed by a processor, implements the defect classification method as described above.
The technical scheme provided by the embodiment of the invention at least has the following beneficial effects:
In the embodiment of the invention, a basic feature map of an image to be classified is obtained; multi-scale feature extraction is performed on the basic feature map to obtain multi-scale features of fixed sizes; the attention weight of each channel of the feature vector of the multi-scale features is determined; the attention weight of each channel is weighted and fused with the corresponding multi-scale features to obtain weighted multi-scale features; and the defect category of the image to be classified is determined based on the weighted multi-scale features. In other words, in the embodiment of the invention, for images whose defect scales vary widely, the weighted fusion of adaptive multi-scale features strengthens the weights of effective features and suppresses the weights of interfering information while still extracting defect features at different scales, thereby improving the accuracy of image defect classification.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention and are not to be construed as limiting the invention.
Fig. 1 is a flowchart of a defect classification method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a multi-scale feature extraction network according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of an adaptive feature fusion network according to an embodiment of the present invention.
Fig. 4 is a block diagram of a defect classification apparatus according to an embodiment of the present invention.
Fig. 5 is a block diagram of an extraction module provided in an embodiment of the present invention.
Fig. 6 is a block diagram of a second determining module provided by the embodiment of the present invention.
Fig. 7 is a block diagram illustrating a defect classification system according to an embodiment of the present invention.
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present invention.
Fig. 9 is a block diagram of an apparatus for defect classification according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has been actively developed. Artificial Intelligence (AI) is an emerging scientific technology for studying and developing theories, methods, techniques and application systems for simulating and extending human Intelligence. The artificial intelligence subject is a comprehensive subject and relates to various technical categories such as chips, big data, cloud computing, internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is used as an important branch of artificial intelligence, specifically, a machine is used for identifying the world, and the computer vision technology generally comprises technologies such as face identification, living body detection, fingerprint identification and anti-counterfeiting verification, biological feature identification, face detection, pedestrian detection, target detection, pedestrian identification, image processing, image identification, image semantic understanding, image retrieval, character identification, video processing, video content identification, behavior identification, three-dimensional reconstruction, virtual reality, augmented reality, synchronous positioning and map construction (SLAM), computational photography, robot navigation and positioning and the like. 
With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
Before describing the embodiment of the present invention, technical terms related to the embodiment are described:
convolutional Neural Networks (CNN), which are a type of feed-forward Neural network that includes convolution calculations and has a deep structure, are a representative algorithm for deep learning.
And defect classification, which is to obtain corresponding classification of images through a certain algorithm.
Pyramid pooling mainly solves the multi-scale problem in a convolutional neural network by a pyramid feature network, and greatly improves the extraction performance of multi-scale features under the condition of basically not increasing the calculation amount of an original model through simple network connection change.
The channel attention mechanism is an attention mechanism that aims to bias computing resources toward the most informative parts of the input signal. Combined with threshold functions (such as softmax and sigmoid), feature weights are learned through back propagation from the final loss, so that effective feature maps receive large weights while ineffective or weakly effective ones receive small weights, producing better results. The channel attention mechanism can dynamically adjust the features of each channel according to the input, enhancing the representation capability of the network.
Fig. 1 is a flowchart of a defect classification method according to an embodiment of the present invention, as shown in fig. 1, the defect classification method includes the following steps:
step 101: acquiring a basic feature map of an image to be classified;
step 102: performing multi-scale feature extraction on the basic feature map to obtain multi-scale features;
step 103: determining attention weight of each channel corresponding to the feature vector of the multi-scale feature;
step 104: performing weighted fusion on the attention weight of each channel and the corresponding multi-scale features to obtain weighted multi-scale features;
step 105: determining the defect category of the image to be classified based on the weighted multi-scale features.
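The five steps above can be sketched as one pipeline in which every stage is a hypothetical stand-in (toy lambdas, not the trained networks described later in the embodiment):

```python
def run_pipeline(image, extract_base, extract_multiscale, attention, fuse, classify_head):
    base = extract_base(image)          # step 101: basic feature map
    ms = extract_multiscale(base)       # step 102: multi-scale features
    w = attention(ms)                   # step 103: per-channel attention weights
    fused = fuse(w, ms)                 # step 104: weighted fusion
    return classify_head(fused)         # step 105: defect category

# Toy stand-ins so the data flow is visible end to end.
result = run_pipeline(
    image=[1, 2, 3],
    extract_base=lambda img: img,
    extract_multiscale=lambda base: base + base,  # "concatenate" two scales
    attention=lambda ms: [1.0] * len(ms),
    fuse=lambda w, ms: [wi * mi for wi, mi in zip(w, ms)],
    classify_head=lambda fused: "scratch" if sum(fused) > 0 else "no-defect",
)
```

The actual embodiment replaces each stand-in with the trained networks described in the following sections (DenseNet backbone, multi-scale pooling, channel attention, adaptive fusion, classification head).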
The defect classification method can be applied to terminals, servers and the like, and is not limited herein, and the terminal implementation equipment can be electronic equipment such as a smart phone, a notebook computer, a tablet computer and the like.
The following describes in detail specific implementation steps of a defect classification method provided in an embodiment of the present invention with reference to fig. 1.
In step 101, a base feature map of an image to be classified is obtained.
In the step, the image to be classified is subjected to feature extraction through a trained feature extraction network to obtain a basic feature map.
In one embodiment, a dense convolutional network, DenseNet161, may be used as the feature extraction network. The feature extraction network is trained on a large data set so that the features of pictures with different backgrounds are fully learned, yielding a trained feature extraction network that can accurately extract the feature information of different pictures.
The basic idea of DenseNet161 is consistent with ResNet, but DenseNet161 establishes dense connections between every layer and all layers before it. Another feature of DenseNet is feature reuse, achieved by concatenating features along the channel dimension. These characteristics allow DenseNet to outperform ResNet with fewer parameters and lower computation cost, so in this embodiment DenseNet is selected as the feature extractor to process the image to be classified and obtain the basic feature map.
In this embodiment, a DenseBlock + Transition structure is used in the DenseNet network. A DenseBlock is a module containing many layers whose feature maps all have the same size and which are densely connected to one another. A Transition module connects two adjacent DenseBlocks and reduces the feature map size by pooling. A DenseNet typically contains four DenseBlocks connected by Transition modules.
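A minimal sketch of the dense-connection idea: each layer consumes the concatenation of all earlier outputs. The per-layer transforms below are hypothetical toys, not DenseNet161 itself:

```python
def dense_block(x, layers):
    # x: initial feature list; each layer receives the channel-wise
    # concatenation of ALL previous feature lists (dense connectivity)
    # and contributes new features, which are reused downstream.
    features = [x]
    for layer in layers:
        concatenated = [v for f in features for v in f]
        features.append(layer(concatenated))
    return [v for f in features for v in f]

# Two toy "layers": one sums everything seen so far, one counts it.
out = dense_block([1.0, 2.0], [lambda c: [sum(c)], lambda c: [len(c)]])
# features grow: [1.0, 2.0] -> append [3.0] -> append [3];
# the final concatenation has 4 entries.
```

The growth of the concatenated input at each step is exactly the feature-reuse property mentioned above: earlier features are fed forward unchanged instead of being recomputed.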
In step 102, multi-scale feature extraction is performed on the basic feature map to obtain multi-scale features.
In the step, the basic feature map is subjected to feature extraction through a trained multi-scale feature extraction network to obtain multi-scale features with different sizes; and all the multi-scale features with different sizes are connected in series to obtain the multi-scale features with the preset sizes.
The multi-scale feature extraction network is usually a convolutional neural network model, which is typically divided into two parts: a feature extraction part consisting of the front convolutional layers, activation function layers and pooling layers, followed by a fully connected part used for classification. A convolutional neural network usually requires input pictures of a consistent size because the subsequent fully connected layers require a fixed input dimension. Specifically, as shown in fig. 2, fig. 2 is a schematic diagram of a multi-scale feature extraction network according to an embodiment of the present invention. In this embodiment, the basic features are input into a multi-scale feature extraction network; the network takes pooling layers with output sizes of 4x4, 8x8 and 24x24 as examples, but in practical applications the invention is not limited thereto.
As shown in fig. 2, in the embodiment of the present invention, first, basic features are input into a multi-scale feature extraction network, and three sets of multi-scale feature maps with fixed output sizes are obtained through a set of pooling layers with fixed output sizes of 4x4, 8x8, and 24x24, so that a network model of a convolutional neural network can process pictures with different sizes. In this embodiment, different convolutional layers are used for feature maps of different scales, and then, the obtained multi-scale features are subjected to feature concatenation to obtain feature vectors of different scales. Specifically, the concatenation may be performed sequentially according to the size of the scale.
Assume the basic feature map is x ∈ R^(M×N×D). Each channel map x^d ∈ R^(M×N) is divided into regions R^d_(m,n), and each region outputs a value y^d_(m,n):
y^d_(m,n) = pool({ x^d_(i,j) : (i,j) ∈ R^d_(m,n) })
where pool(·) denotes the pooling operation applied within the region (the original formula is shown as an image).
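The fixed-output-size region pooling described above can be sketched in pure Python (average pooling over evenly split regions is assumed here for illustration, since the patent shows the exact formula only as an image). Whatever the input height and width, the output is always out_h x out_w, which is what lets the network accept images of different sizes:

```python
def adaptive_avg_pool2d(x, out_h, out_w):
    # x: 2-D list (H x W). Split rows and columns into out_h x out_w
    # regions and average each region, so the output size is fixed.
    h, w = len(x), len(x[0])

    def bounds(size, parts, i):
        # Region i covers indices [i*size//parts, (i+1)*size//parts).
        return (i * size) // parts, ((i + 1) * size) // parts

    out = []
    for i in range(out_h):
        r0, r1 = bounds(h, out_h, i)
        row = []
        for j in range(out_w):
            c0, c1 = bounds(w, out_w, j)
            vals = [x[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out

# An 8x8 input and a 6x6 input both pool down to 2x2.
a = adaptive_avg_pool2d([[1.0] * 8 for _ in range(8)], 2, 2)
b = adaptive_avg_pool2d([[2.0] * 6 for _ in range(6)], 2, 2)
```

Applying this with output sizes 4x4, 8x8 and 24x24 to the same basic feature map, then concatenating, yields the fixed-length multi-scale features regardless of the original picture size.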
in the embodiment, the basic feature map is subjected to feature extraction through a multi-scale feature extraction network to obtain multi-scale features with different sizes; at least two advantages can be brought about: firstly, the size of an input image is allowed to be not unique, and image deformation caused by the fact that the image is zoomed to the same size is avoided; secondly, the multi-scale feature extraction network can extract features of defects with different scales, and the extraction capability of the network on the multi-scale defect features is improved.
The multi-scale feature extraction network needs to be trained on multi-scale defect samples to obtain the trained multi-scale feature extraction network. When training on a local defect sample data set, the last fully connected layer of the multi-scale feature extraction network must first be removed, and the number of classification categories adjusted to that of the local defect sample data set. After extraction by the multi-scale feature extraction network, the feature map has size 1 x 2496; it is reshaped to a 2496-dimensional vector and mapped by a fully connected layer to the number of defect categories. Here reshape is a function that readjusts the number of rows, columns and dimensions of a matrix.
In step 103, an attention weight of each channel corresponding to the feature vector of the multi-scale feature is determined.
In this step, the feature vectors of the multi-scale features are input into a channel attention network for processing to obtain the attention weight of each channel. The channel attention network may include, but is not limited to: a global average pooling layer, two fully connected layers, a ReLU activation function and a sigmoid activation function; it may also include other components, which is not limited in this embodiment.
In the embodiment of the present invention, the feature vector of the multi-scale feature is input to the channel attention network and processed according to the following formula:
w = σ(W_2 δ(W_1 F_GAP(F_ap)))
wherein σ and δ are the sigmoid activation function and the ReLU activation function, respectively; W_1 and W_2 are two fully connected layers (FC); F_GAP is the global average pooling operation; F_ap is the input multi-scale feature; and w is the attention weight value. Here, global average pooling averages all elements in the two-dimensional matrix of each channel of the feature map.
The channel attention network provided in this embodiment employs an attention mechanism: a network computes a weight for each feature, this weight is applied to the high-level feature (feature map), and the feature map is thereby transformed into an attention-enhanced feature map.
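A minimal sketch of the channel attention computation w = σ(W_2 δ(W_1 F_GAP(F_ap))) might look as follows; the channel count of 256, the reduction ratio r = 16 and the random weights are illustrative assumptions, not values taken from the embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f_ap, w1, w2):
    """w = sigmoid(W2 . relu(W1 . GAP(F_ap))): one attention weight per channel.
    f_ap: (C, H, W); w1: (C//r, C); w2: (C, C//r)."""
    gap = f_ap.mean(axis=(1, 2))        # global average pooling -> (C,)
    hidden = np.maximum(0.0, w1 @ gap)  # first fully connected layer + ReLU
    return sigmoid(w2 @ hidden)         # second fully connected layer + sigmoid

C, r = 256, 16                          # channel count and reduction ratio
f_ap = rng.standard_normal((C, 6, 6))   # a stand-in multi-scale feature
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
w = channel_attention(f_ap, w1, w2)     # each weight lies in (0, 1)
```

Because the last activation is a sigmoid, every channel weight falls strictly between 0 and 1, so no channel is amplified — only selectively passed through or suppressed.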
In step 104, the attention weight of each channel and the corresponding multi-scale feature are subjected to weighted fusion to obtain the weighted multi-scale feature.
In this step, the attention weight of each channel and the corresponding multi-scale feature are weight-fused by an adaptive feature fusion module according to the following formula to obtain the weighted multi-scale feature F_r, which may be referred to as a weighted pyramid feature:

F_r = w ⊙ φ(F_ap)

wherein w ∈ R^256 is the attention weight value, a parameter obtained through network learning by the channel attention mechanism; φ(·) represents a set of 1x1 convolution and global average pooling operations for channel compression; ⊙ denotes channel-wise weighting; F_ap is the input multi-scale feature; and F_r is the weighted-fused multi-scale feature.
That is to say, in this embodiment, to address the problem of large differences in defect scale, the obtained basic features are passed through the multi-scale feature extraction network to obtain a group of fixed-size multi-scale features F_ap. To further emphasize features from different scales, this embodiment uses a channel self-attention network to perform weighted fusion on the features of each channel, so as to selectively emphasize effective information in the different-scale features of different channels and suppress interference information, as expressed by the weighted fusion formula for F_r.
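The channel-wise weighting step can be sketched as below. This is a simplified illustration that applies the attention weights directly to F_ap and omits the 1x1-convolution channel compression mentioned in the text; the example weights are hypothetical.

```python
import numpy as np

def weighted_fusion(f_ap, w):
    """Scale each channel of the multi-scale feature by its attention weight,
    emphasizing effective channels and suppressing interference channels."""
    return f_ap * w[:, None, None]      # broadcast (C,) over (C, H, W)

f_ap = np.ones((4, 2, 2))               # toy multi-scale feature, 4 channels
w = np.array([1.0, 0.5, 0.1, 0.0])      # hypothetical learned attention weights
f_r = weighted_fusion(f_ap, w)          # channel 0 kept, channel 3 suppressed
```

Because the operation is a plain elementwise product, gradients flow through both w and F_ap, which is what allows the fusion weights to be learned by back propagation.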
The adaptive feature fusion module is specifically shown in fig. 3, which is a schematic diagram of an adaptive feature fusion network provided in an embodiment of the present invention. Here, the channel attention network is taken, by way of example, to include a global average pooling layer, two fully connected layers, a ReLU activation function and a sigmoid activation function. The mathematical expressions of the ReLU activation function and the sigmoid function are as follows:
ReLU(x)=max(0,x)
σ(x) = 1/(1 + e^(−x))
specifically, the extracted feature map is input into the channel self-attention network, and the operation process thereof can be defined as follows:
w = σ(W_2 δ(W_1 F_GAP(F_ap)))
wherein σ and δ are the sigmoid activation function and the ReLU activation function, respectively; W_1 and W_2 are two fully connected layers (FC); F_GAP is the global average pooling operation; F_ap is the input multi-scale feature; and w is the attention weight value. Global average pooling takes the average of all elements in the two-dimensional matrix of each channel of the feature map, and is defined as follows:

F_GAP(x_c) = (1/(M×N)) Σ_{i=1}^{M} Σ_{j=1}^{N} x_c(i, j)
In application, the adaptive feature fusion module can take different multi-scale features F(x, y), i.e. F_ap, as input to obtain a group of feature-adaptive weights w, and then perform adaptive weighted fusion of the multi-scale features, so as to selectively emphasize the effective information of different channels and suppress interference information. Meanwhile, the parameters of the adaptive feature fusion module can be learned in the training stage: gradient information is obtained by back propagation, and the parameters are updated by gradient descent.
In the embodiment of the invention, in order to alleviate the uneven distribution of the numbers of defect samples at different scales, a sample-adaptive weight adjustment loss function is adopted for network training. The training loss value L(y, f(x; θ)) quantifies the difference between the model prediction f(x; θ) and the true label y. The overall accuracy of the model can be measured by the expected risk R(θ), which is defined as:

R(θ) = E_{(x,y)~p(x,y)}[L(y, f(x; θ))]
the model learns a predictive model by training on the data set such that the predictive model minimizes the expected risk on the data set.
Cross entropy loss is often used in defect classification tasks as a loss function for model training, and the cross entropy loss function LE (y, f (x; θ)) is as follows:
LE(y, f(x; θ)) = −y^T log(f(x; θ))
The cross entropy loss function does not account for the imbalance in the number of samples per class. In the training stage, classes with fewer samples therefore have fewer opportunities to contribute loss values and participate in parameter updates, so the model more easily classifies the majority classes correctly, while the minority classes are harder to classify correctly due to insufficient training.
The Focal Loss function FL(p_t) is used to adjust for the sample number imbalance; its mathematical expression is as follows:

FL(p_t) = −α(1 − p_t)^γ log(p_t)

p_t represents the prediction confidence score of a candidate object, α is a parameter that balances the importance of positive and negative examples, and the modulating factor (1 − p_t)^γ reduces the loss of simple samples through the prediction score p_t and the parameter γ: for samples with higher confidence the modulating factor yields a lower weight, and conversely a higher weight. The larger γ is, the more the network focuses on learning difficult samples, which suits more extreme sample-imbalance problems. Thus, Focal Loss can be viewed as a re-weighting of the training loss of positive and negative examples.
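The effect of the modulating factor can be checked numerically. The sketch below, with the commonly used (but here assumed) values α = 0.25 and γ = 2, shows that a sample's loss is scaled by (1 − p_t)^γ relative to α-weighted cross entropy:

```python
import numpy as np

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """FL(p_t) = -alpha * (1 - p_t)**gamma * log(p_t)."""
    p_t = np.asarray(p_t, dtype=float)
    return -alpha * (1.0 - p_t) ** gamma * np.log(p_t)

def cross_entropy(p_t):
    """Plain cross entropy on the true-class confidence: -log(p_t)."""
    return -np.log(np.asarray(p_t, dtype=float))

easy, hard = 0.9, 0.1                   # confident vs. difficult sample
# The ratio of focal loss to alpha-weighted cross entropy is (1 - p_t)**gamma,
# so the easy sample is down-weighted 81x more strongly than the hard one.
ratio_easy = focal_loss(easy) / (0.25 * cross_entropy(easy))   # 0.01
ratio_hard = focal_loss(hard) / (0.25 * cross_entropy(hard))   # 0.81
```

With γ = 2 the well-classified sample contributes almost nothing to the gradient, which is exactly the mechanism that shifts training effort toward minority and hard classes.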
In step 105, determining a defect category of the image to be classified based on the weighted multi-scale features.
The method specifically comprises the following steps: adjusting the dimension of the weighted multi-scale features with a convolution network; obtaining the posterior probability of each category through a fully connected layer in the convolution network; and selecting the category with the largest posterior probability as the defect category of the image to be classified. For example, the dimension of the weighted multi-scale features (i.e. the pyramid features) is adjusted by a group of 1×1 convolution networks, the posterior probability of each class is obtained through a fully connected layer, and the class corresponding to the maximum posterior probability is taken as the defect category of the image to be classified, completing the image classification.
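The classification head described in this step might be sketched as follows; the channel counts, the five defect classes and the random weights are illustrative stand-ins, not values from the embodiment.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())             # numerically stable softmax
    return e / e.sum()

def classify(f_r, conv1x1, fc):
    """A 1x1 convolution adjusts the channel dimension of the weighted feature,
    a fully connected layer produces class scores, and the class with the
    largest posterior probability is selected."""
    compressed = np.tensordot(conv1x1, f_r, axes=([1], [0]))   # (C', H, W)
    logits = fc @ compressed.reshape(-1)                       # fully connected
    posterior = softmax(logits)
    return int(np.argmax(posterior)), posterior

rng = np.random.default_rng(1)
f_r = rng.standard_normal((8, 4, 4))      # weighted multi-scale feature (toy)
conv1x1 = rng.standard_normal((2, 8))     # 1x1 conv = per-position channel mix
fc = rng.standard_normal((5, 2 * 4 * 4))  # 5 hypothetical defect classes
label, posterior = classify(f_r, conv1x1, fc)
```

Note that a 1×1 convolution is just a channel-mixing matrix applied at every spatial position, which is why `tensordot` over the channel axis implements it exactly.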
A posterior probability is the probability, revised after the outcome is observed, that the outcome was produced by a particular cause. That is, before a fact occurs, the probability that it will occur is a prior probability; after the fact has occurred, the probability that it was caused by a certain factor is a posterior probability. Generally, the posterior probability is calculated from the prior probability and a likelihood function through the Bayesian formula.
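A small numeric illustration of the Bayesian revision mentioned above, with hypothetical prior and likelihood values:

```python
# Bayes' rule: posterior(cause_k | effect) is proportional to
# likelihood(effect | cause_k) * prior(cause_k).
prior = [0.7, 0.3]        # hypothetical prior probabilities of two causes
likelihood = [0.2, 0.9]   # hypothetical P(observed effect | cause)

joint = [p * l for p, l in zip(prior, likelihood)]   # unnormalized products
posterior = [j / sum(joint) for j in joint]          # normalize to sum to 1

# Observing the effect raises the probability of the a-priori less likely
# cause from 0.3 to about 0.66.
```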
In the embodiment of the invention, a basic feature map of the image to be classified is acquired; multi-scale feature extraction is performed on the basic feature map to obtain fixed-size multi-scale features; the attention weight of each channel corresponding to the feature vectors of the multi-scale features is determined; the attention weight of each channel and the corresponding multi-scale feature are weight-fused to obtain the weighted multi-scale features; and the defect category of the image to be classified is determined based on the weighted multi-scale features. In other words, for images whose defect scales vary greatly, the adaptive weighted fusion of multi-scale features both extracts defect features at different scales and pointedly strengthens the weight of effective features while reducing the weight of interference information, thereby improving the accuracy of image defect classification.
Further, the embodiment of the invention uses a convolutional neural network to extract the basic feature map of the image; on the basis of this feature map, a multi-scale feature extraction network extracts multi-scale features; an attention mechanism network obtains the attention weight of each scale feature, and these weights are used to weight the obtained multi-scale features, strengthening the weight of effective features and suppressing the weight of interference information; finally, the Focal Loss function is used to strengthen the training of hard-to-classify samples and improve the network's ability to classify difficult categories. The embodiment of the invention can thus address the problems of large defect scale variation and imbalanced defect classes in industry, and can be widely applied in the field of industrial defect classification.
in the embodiment of the invention, the training and testing speed of the model is improved based on the end-to-end classification of the deep neural network, and the problem of insufficient class training of fewer samples caused by unbalanced distribution of training samples is solved by increasing the training strength of difficult samples in a targeted manner.
In this embodiment, the designed multi-scale feature extraction network has a stronger ability to extract multi-scale defect features. In upstream detection tasks in industrial vision, the detected defect pictures may differ greatly in size, and forcibly scaling them to a uniform size may deform the images. The embodiment of the invention does not force the image sizes to be uniform, which greatly reduces the defect deformation caused by image scaling, improves the model's ability to extract multi-scale features, and improves the model's classification performance when defect sizes vary greatly.
By using the multi-scale self-adaptive feature fusion module provided by the embodiment of the invention, effective features can be selectively enhanced. Different from pyramid pooling, the embodiment of the invention can adopt a multi-scale feature extraction network to extract feature information of different granularities and simultaneously use a convolution network to further extract a feature map, thereby enriching semantic information of multi-scale features. Furthermore, the embodiment of the invention uses an attention mechanism to perform adaptive weighted fusion on the multi-scale features, so that effective features can be selectively emphasized and interference information can be suppressed.
It is noted that for simplicity of description, the method embodiments are shown as a series of acts or combinations, but those skilled in the art will appreciate that the present disclosure is not limited by the order of acts, as some steps may, in accordance with the present invention, occur in other orders and/or concurrently. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Fig. 4 is a block diagram of a defect classification apparatus according to an embodiment of the present invention. Referring to fig. 4, the apparatus includes: an obtaining module 401, an extracting module 402, a first determining module 403, a weighted fusion module 404 and a second determining module 405, wherein,
the obtaining module 401 is configured to obtain a basic feature map of an image to be classified;
the extraction module 402 is configured to perform multi-scale feature extraction on the basic feature map to obtain multi-scale features;
the first determining module 403 is configured to determine an attention weight of each channel corresponding to the feature vector of the multi-scale feature;
the weighted fusion module 404 is configured to perform weighted fusion on the attention weight of each channel and the corresponding multi-scale feature to obtain a weighted multi-scale feature;
the second determining module 405 is configured to determine a defect type of the image to be classified based on the weighted multi-scale features.
Optionally, in another embodiment, on the basis of the above embodiment, the obtaining module 401 is specifically configured to perform feature extraction on the image to be classified through a trained feature extraction network to obtain a basic feature map.
Optionally, in another embodiment, on the basis of the above embodiment, the extracting module 402 includes: the structure block diagram of the feature extraction module 501 and the concatenation module 502 is shown in fig. 5, wherein,
the feature extraction module 501 is configured to perform feature extraction on the basic feature map through a trained multi-scale feature extraction network to obtain multi-scale features of different sizes;
the concatenation module 502 is configured to concatenate the multi-scale features of different sizes to obtain the multi-scale feature of the predetermined size.
Optionally, in another embodiment, on the basis of the above embodiment, the first determining module 403 is specifically configured to input the feature vector of the multi-scale feature into a channel attention network for processing, so as to obtain an attention weight of each channel.
Optionally, in another embodiment, on the basis of the above embodiment, the first determining module 403 inputs the feature vector of the multi-scale feature into the channel attention network to process according to the following formula:
w = σ(W_2 δ(W_1 F_GAP(F_ap)))

wherein σ and δ are the sigmoid activation function and the ReLU activation function, respectively; W_1 and W_2 are two fully connected layers (FC); F_GAP is the global average pooling operation; F_ap is the input multi-scale feature (which may of course also be understood as a feature vector, etc.); and w is the attention weight value. Global average pooling averages all elements in the two-dimensional matrix of each channel of the feature map.
Optionally, in another embodiment, on the basis of the foregoing embodiment, the weighted fusion module 404 is specifically configured to perform weighted fusion on the attention weight of each channel and the corresponding multi-scale feature through an adaptive feature fusion network according to the following formula to obtain the weighted multi-scale feature:

F_r = w ⊙ φ(F_ap)

wherein w ∈ R^256 is the attention weight value, a parameter learned by the channel attention mechanism network; φ(·) represents a set of 1x1 convolution and global average pooling operations for channel compression; F_ap is the input multi-scale feature; and F_r is the weighted-fused multi-scale feature.
Optionally, in another embodiment, on the basis of the foregoing embodiment, the second determining module 405 includes: the structure block diagram of the adjusting module 601, the connecting module 602 and the selecting module 603 is shown in fig. 6, wherein,
the adjusting module 601 is configured to perform dimension adjustment on the weighted multi-scale features by using a convolution network;
the connection module 602 is configured to connect to a full connection layer in a convolutional network to obtain a posterior probability of each category;
the selecting module 603 is configured to select a largest class in the posterior probabilities as a defect class of the image to be classified.
Optionally, referring to fig. 7, a block diagram of a defect classification system according to an embodiment of the present invention is further provided, where the system includes: a convolutional neural network module 701, a pyramid pooling network module 702, an attention mechanism network module 703, a feature adaptive fusion network module 704 and a classification network module 705. Wherein the content of the first and second substances,
the convolutional neural network module 701 is configured to extract basic features of an image to be classified to obtain a basic feature map;
the pyramid pooling network module 702 is configured to perform multi-scale feature extraction on the basic feature map to obtain multi-scale features;
the attention mechanism network module 703 is configured to determine an attention weight of each channel corresponding to the feature vector of the multi-scale feature;
the adaptive fusion network module 704 is configured to perform weighted fusion on the attention weight of each channel and the corresponding multi-scale feature to obtain a weighted multi-scale feature;
the classification network module 705 is configured to determine a defect category of the image to be classified based on the weighted multi-scale features.
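How the five modules of fig. 7 chain together can be sketched as plain function composition; the lambdas below are toy stand-ins for the real networks, used only to show the data flow.

```python
# Hypothetical glue code for the pipeline of Fig. 7: each argument is a
# callable standing in for one of the modules 701-705.
def classify_defect(image, backbone, pyramid_pool, attention, fuse, head):
    f_base = backbone(image)      # 701: basic feature map
    f_ap = pyramid_pool(f_base)   # 702: multi-scale features
    w = attention(f_ap)           # 703: per-channel attention weights
    f_r = fuse(f_ap, w)           # 704: weighted fusion
    return head(f_r)              # 705: defect category

label = classify_defect(
    image=[1, 2, 3],
    backbone=lambda img: [float(v) for v in img],
    pyramid_pool=lambda f: f + [sum(f)],                      # toy "scales"
    attention=lambda f: [1.0] * len(f),                       # uniform weights
    fuse=lambda f, w: [fi * wi for fi, wi in zip(f, w)],
    head=lambda f: f.index(max(f)),                           # argmax class
)
# with these toy stand-ins, the largest fused value is the appended sum
```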
In this application example, the functions and functions of each module are detailed in the functions and functions of the corresponding parts in the above embodiments, and are not described again here.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
In the embodiment of the invention, the training and testing speed of the model is improved based on the end-to-end classification of the deep neural network, and the problem of insufficient class training of fewer samples caused by unbalanced distribution of training samples is solved by increasing the training strength of difficult samples in a targeted manner.
The embodiment of the invention has a stronger ability to extract multi-scale defect features through the designed multi-scale feature extraction network. In upstream detection tasks in industrial vision, the detected defect images may differ greatly in size, and forcibly scaling them to a uniform size may deform the images; since the embodiment does not force the image sizes to be uniform, the defect deformation caused by image scaling is greatly reduced, the model's ability to extract multi-scale features is improved at the feature extraction level, and the model's classification performance is improved when defect scales vary greatly.
In the embodiment of the invention, the multi-scale adaptive feature fusion network module selectively enhances effective features. Unlike pyramid pooling, the multi-scale feature extraction network adopted in the embodiment extracts feature information of different granularities while a convolution network further extracts the feature map, enriching the semantic information of the multi-scale features. Furthermore, the embodiment uses an attention mechanism to perform adaptive weighted fusion on the multi-scale features, so that effective features can be selectively emphasized and interference information suppressed.
In an embodiment, there is also provided an electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the defect classification method as described above.
In an embodiment, there is also provided a computer readable storage medium, the instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the defect classification method as described above. Alternatively, the computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an embodiment, there is also provided a computer program product comprising a computer program or instructions which, when executed by a processor, implements the defect classification method as described above.
Fig. 8 is a block diagram of an electronic device 800 according to an embodiment of the invention. For example, the electronic device 800 may be a mobile terminal or a server, and in the embodiment of the present invention, the electronic device is taken as an example of a mobile terminal. For example, the electronic device 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 8, electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation at the device 800. Examples of such data include instructions for any application or method operating on the electronic device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 806 provides power to the various components of the electronic device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the electronic device 800.
The multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front-facing camera and/or the rear-facing camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device 800. For example, the sensor assembly 814 may detect an open/closed state of the device 800, the relative positioning of components, such as a display and keypad of the electronic device 800, the sensor assembly 814 may also detect a change in the position of the electronic device 800 or a component of the electronic device 800, the presence or absence of user contact with the electronic device 800, orientation or acceleration/deceleration of the electronic device 800, and a change in the temperature of the electronic device 800. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices. The electronic device 800 may access a wireless network based on a communication standard, such as WiFi, a carrier network (such as 2G, 3G, 4G, or 5G), or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an embodiment, the electronic device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the defect classification methods shown above.
In an embodiment, a computer-readable storage medium, such as the memory 804, is also provided that includes instructions executable by the processor 820 of the electronic device 800 to perform the defect classification method illustrated above.
In an embodiment, there is also provided a computer program product, the instructions of which, when executed by the processor 820 of the electronic device 800, cause the electronic device 800 to perform the defect classification method illustrated above.
Fig. 9 is a block diagram of an apparatus 900 for defect classification according to an embodiment of the present invention. For example, the apparatus 900 may be provided as a server. Referring to fig. 9, the apparatus 900 includes a processing component 922, which further includes one or more processors, and memory resources, represented by memory 932, for storing instructions, such as applications, that are executable by the processing component 922. The application programs stored in memory 932 may include one or more modules that each correspond to a set of instructions. Further, the processing component 922 is configured to execute instructions to perform the above-described methods.
The device 900 may also include a power component 926 configured to perform power management of the device 900, a wired or wireless network interface 950 configured to connect the device 900 to a network, and an input/output (I/O) interface 958. The apparatus 900 may operate based on an operating system stored in the memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (10)

1. A method of classifying defects, comprising:
acquiring a basic feature map of an image to be classified;
performing multi-scale feature extraction on the basic feature map to obtain multi-scale features;
determining attention weight of each channel corresponding to the feature vector of the multi-scale feature;
carrying out weighted fusion on the attention weight of each channel and the corresponding multi-scale features to obtain weighted multi-scale features;
and determining the defect category of the image to be classified based on the weighted multi-scale features.
2. The method for classifying defects according to claim 1, wherein the obtaining of the basic feature map of the image to be classified comprises:
and carrying out feature extraction on the image to be classified through a trained feature extraction network to obtain a basic feature map.
3. The defect classification method according to claim 1 or 2, wherein the performing multi-scale feature extraction on the basic feature map to obtain multi-scale features comprises:
performing feature extraction on the basic feature map through a trained multi-scale feature extraction network to obtain multi-scale features of different sizes;
and concatenating all the multi-scale features of different sizes in series to obtain multi-scale features of a preset size.
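The multi-scale extraction and series concatenation of claim 3 can be sketched as follows. This is a minimal NumPy illustration, not the patented network: the choice of average pooling, the scale set (1, 2, 4), and all shapes are assumptions not specified in the claims.

```python
import numpy as np

def avg_pool(x, k):
    """Non-overlapping k x k average pooling over a (C, H, W) feature map."""
    C, H, W = x.shape
    return x.reshape(C, H // k, k, W // k, k).mean(axis=(2, 4))

def upsample(x, k):
    """Nearest-neighbour upsampling by an integer factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def multi_scale_concat(base, scales=(1, 2, 4)):
    """Pool the base feature map at several scales, restore each result to
    the base resolution, and concatenate ('connect in series') along the
    channel axis to reach a preset channel size."""
    feats = [upsample(avg_pool(base, k), k) for k in scales]
    return np.concatenate(feats, axis=0)

rng = np.random.default_rng(0)
base = rng.standard_normal((8, 16, 16))   # basic feature map: 8 channels
ms = multi_scale_concat(base)             # concatenated result: 24 channels
```

With three scales the channel count of the concatenated feature is three times that of the basic feature map, which is the "preset size" the fusion network downstream would expect.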
4. The defect classification method according to claim 1 or 2, wherein the determining the attention weight of each channel corresponding to the feature vector of the multi-scale features comprises:
inputting the feature vector of the multi-scale features into a channel attention network for processing to obtain the attention weight of each channel.
5. The defect classification method according to claim 4, wherein the feature vector of the multi-scale features is processed by the channel attention network according to the following formula:

w = σ(W2 δ(W1 F_GAP(F_ap)))

wherein σ and δ are the sigmoid and ReLU activation functions respectively; W1 and W2 are two fully-connected (FC) layers; F_GAP is the global average pooling operation; F_ap is the input multi-scale feature; and w is the attention weight. Here, global average pooling averages all elements of the two-dimensional matrix of each channel in the feature map.
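The channel-attention formula of claim 5 can be written out concretely in NumPy. This is an illustrative sketch only: W1 and W2 are random stand-ins for the learned FC layers, and the channel count and reduction ratio are arbitrary assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(0.0, x)

def channel_attention(feat, W1, W2):
    """w = sigma(W2 delta(W1 F_GAP(F_ap))) from claim 5.

    feat: (C, H, W) input multi-scale feature F_ap.
    W1:   (C_r, C)  first FC layer (channel reduction).
    W2:   (C, C_r)  second FC layer (channel restoration).
    Returns one attention weight per channel, shape (C,).
    """
    gap = feat.mean(axis=(1, 2))          # F_GAP: global average pooling per channel
    return sigmoid(W2 @ relu(W1 @ gap))   # sigma(W2 delta(W1 ...))

rng = np.random.default_rng(0)
C = 8
feat = rng.standard_normal((C, 4, 4))
W1 = rng.standard_normal((C // 2, C))     # random stand-in for learned weights
W2 = rng.standard_normal((C, C // 2))
w = channel_attention(feat, W1, W2)
weighted = feat * w[:, None, None]        # per-channel weighting as in claim 1
```

Because the sigmoid output lies strictly between 0 and 1, each channel of the feature map is scaled rather than zeroed out, which is what makes this a soft attention mechanism.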
6. The defect classification method according to claim 1 or 2, wherein the performing weighted fusion of the attention weight of each channel with the corresponding multi-scale features to obtain weighted multi-scale features comprises:
performing weighted fusion of the attention weight of each channel with the corresponding multi-scale features through an adaptive feature fusion network according to the following formula to obtain the weighted-fused multi-scale features:

(formula image FDA0003887077890000021 not reproduced)

wherein w ∈ R^256 is the attention weight, a parameter learned by the channel attention network; the operator shown in image FDA0003887077890000022 represents a set of 1×1 convolution and global average pooling operations for channel compression; F_ap is the input multi-scale feature; and F_r is the weighted-fused multi-scale feature.
7. The method according to claim 1 or 2, wherein the determining the defect category of the image to be classified based on the weighted multi-scale features comprises:
adjusting the dimensions of the weighted multi-scale features using a convolutional network;
obtaining the posterior probability of each category through the fully-connected layers of the convolutional network;
and selecting the category with the maximum posterior probability as the defect category of the image to be classified.
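The final classification step of claim 7 amounts to a fully-connected layer followed by a softmax and an argmax. The sketch below assumes the dimension-adjusting convolution has already produced a flat feature vector; the weights, feature dimension, and number of categories are illustrative stand-ins, not values from the patent.

```python
import numpy as np

def classify(feat_vec, W_fc, b_fc):
    """FC layer -> softmax posteriors -> argmax category, per claim 7.

    feat_vec: (D,) flattened weighted multi-scale feature (assumed to be
              the output of the dimension-adjusting convolution).
    W_fc:     (num_classes, D) fully-connected layer weights.
    b_fc:     (num_classes,)   bias.
    """
    logits = W_fc @ feat_vec + b_fc
    logits = logits - logits.max()                 # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()  # posterior probability per category
    return int(np.argmax(probs)), probs            # category with maximum posterior

rng = np.random.default_rng(0)
D, num_classes = 16, 4
vec = rng.standard_normal(D)
W_fc = rng.standard_normal((num_classes, D))       # random stand-in for trained weights
b_fc = np.zeros(num_classes)
cls, probs = classify(vec, W_fc, b_fc)
```

The returned `probs` sum to one, so selecting the maximum entry is equivalent to choosing the most probable defect category.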
8. A defect classification apparatus, comprising:
an acquisition module configured to acquire a basic feature map of an image to be classified;
an extraction module configured to perform multi-scale feature extraction on the basic feature map to obtain multi-scale features;
a first determination module configured to determine an attention weight of each channel corresponding to a feature vector of the multi-scale features;
a weighted fusion module configured to perform weighted fusion of the attention weight of each channel with the corresponding multi-scale features to obtain weighted multi-scale features;
and a second determination module configured to determine the defect category of the image to be classified based on the weighted multi-scale features.
9. An electronic device, comprising:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to execute the instructions to implement the defect classification method of any of claims 1 to 7.
10. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the defect classification method of any of claims 1 to 7.
CN202211249790.8A 2022-10-12 2022-10-12 Defect classification method and device, electronic equipment and storage medium Pending CN115620054A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211249790.8A CN115620054A (en) 2022-10-12 2022-10-12 Defect classification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211249790.8A CN115620054A (en) 2022-10-12 2022-10-12 Defect classification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115620054A true CN115620054A (en) 2023-01-17

Family

ID=84863362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211249790.8A Pending CN115620054A (en) 2022-10-12 2022-10-12 Defect classification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115620054A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117788963A (en) * 2024-02-27 2024-03-29 中科卫创(西安)信息技术有限公司 Remote sensing image data management method and system based on deep learning


Similar Documents

Publication Publication Date Title
CN110210535B (en) Neural network training method and device and image processing method and device
CN109543714B (en) Data feature acquisition method and device, electronic equipment and storage medium
CN108121952B (en) Face key point positioning method, device, equipment and storage medium
CN110602527B (en) Video processing method, device and storage medium
CN111310616B (en) Image processing method and device, electronic equipment and storage medium
CN111178183B (en) Face detection method and related device
CN109871896B (en) Data classification method and device, electronic equipment and storage medium
CN111476306A (en) Object detection method, device, equipment and storage medium based on artificial intelligence
CN111444744A (en) Living body detection method, living body detection device, and storage medium
CN106845398B (en) Face key point positioning method and device
CN111444826B (en) Video detection method, device, storage medium and computer equipment
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN111753895A (en) Data processing method, device and storage medium
CN112036331A (en) Training method, device and equipment of living body detection model and storage medium
CN115035158B (en) Target tracking method and device, electronic equipment and storage medium
CN110889489A (en) Neural network training method, image recognition method and device
CN113065379B (en) Image detection method and device integrating image quality and electronic equipment
CN111539256A (en) Iris feature extraction method and device and storage medium
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN112036307A (en) Image processing method and device, electronic equipment and storage medium
CN115035596B (en) Behavior detection method and device, electronic equipment and storage medium
CN110659726B (en) Image processing method and device, electronic equipment and storage medium
CN115620054A (en) Defect classification method and device, electronic equipment and storage medium
CN113888432A (en) Image enhancement method and device for image enhancement
CN110956576B (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination