CN112070733A

CN112070733A - Defect rough positioning method and device based on weak supervision mode

Info

Publication number: CN112070733A
Application number: CN202010888542.2A
Authority: CN
Inventors: 陈海波; 许伟康
Original assignee: DeepBlue AI Chips Research Institute Jiangsu Co Ltd
Current assignee: Shenlan Artificial Intelligence Application Research Institute (Shandong) Co.,Ltd.
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2020-12-11

Abstract

The invention provides a defect coarse positioning method and device based on a weak supervision mode, the method comprising the following steps: classifying the acquired images into non-defective images and defective images, and weakly labeling the images by category; performing data enhancement processing and normalization processing on an input image; inputting the processed image into a fully convolutional network for training, and using the weak label information to make a ResNet-101 network form an attention mechanism for weakly supervised training, so as to predict the processed image; when the prediction result is a defective image, calculating a semantic feature map from the defect features in the fully trained network and the weights of the output layer, and obtaining the heat map required for coarse positioning from the semantic feature map; and obtaining the rough position of the defect from the heat map and the input image. The method needs only defect category information, can predict the defect category, and marks the approximate position of the defect as a heat map, which simplifies manual labeling and saves the manpower and material resources it requires.

Description

Defect rough positioning method and device based on weak supervision mode

Technical Field

The invention relates to the technical field of image processing, in particular to a defect rough positioning method based on a weak supervision mode, a defect rough positioning device based on the weak supervision mode, computer equipment and a computer program product.

Background

Object detection based on deep learning has been one of the most active research directions in computer vision in recent years, and because of its strong learning capability it is widely applied to defect detection for industrial products. Defects of industrial products are often very complex, their types and forms vary widely, and they are difficult to identify effectively with traditional image processing. By learning from manually labeled defect categories and positions, object detection can identify defects in unlabeled pictures, give the defect category, and mark the position of each defect with a rectangular box. In object detection, strongly supervised learning refers to a learning mode in which both the category and the position of each target must be labeled for training, and both can then be given at prediction time.

Most existing industrial defect detection approaches depend on object detection algorithms based on strongly supervised learning, such as YOLOv3 and Faster R-CNN. Such an algorithm must first be trained on manually labeled defective pictures from a given scene, and the trained model can then be used to predict unlabeled pictures from that scene. Labeling a picture requires two kinds of information: the category of each defect and its position, each defect having exactly one category and one position. For example, fig. 2 shows a labeled picture in which each rectangular box marks the position of a target and boxes of different colors represent different categories. The training data for strongly supervised object detection therefore need two labels, target category and target position, and these labels are produced manually. When a picture contains many targets, labeling is time-consuming and labor-intensive, and if the labeled positions are not accurate enough, the detection performance of the trained model suffers.

Disclosure of Invention

To solve the above technical problems, the invention provides a defect rough positioning method based on a weak supervision mode. Training the defect detection model requires only the category information of the defects and not their position information; at prediction time the method gives the defect category and marks the approximate position of the defect as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

The technical scheme adopted by the invention is as follows:

A defect rough positioning method based on a weak supervision mode comprises the following steps: classifying the acquired images into non-defective images and defective images, and weakly labeling the images by category; performing data enhancement processing and normalization processing on an input image; inputting the processed image into a fully convolutional network for training, and, during training, using the weak label information to make a ResNet-101 network form an attention mechanism for weakly supervised training, so as to predict the processed image; when the prediction result is a defective image, calculating a semantic feature map from the defect features in the fully trained network and the weights of the output layer, and obtaining the heat map required for coarse positioning from the semantic feature map; and obtaining the rough position of the defect from the heat map and the input image.

In an embodiment of the present invention, the weakly labeling refers to weakly labeling the image according to the category without specifically labeling the specific position of the defect in the image, and includes: the category of the non-defective image is labeled as 0, and the category of the defective image is labeled as 1.

In one embodiment of the present invention, performing data enhancement processing and normalization processing on the input image includes: applying data enhancement to the input image by random horizontal flipping, random vertical flipping, random cropping, random contrast variation and random brightness variation, in that order; and scaling the enhanced image to 448 x 448 to complete the normalization of the input image.

In an embodiment of the present invention, inputting the processed image into a fully convolutional network for training, with the ResNet-101 network forming an attention mechanism from the weak label information for weakly supervised training, includes: training the first three network structures of the ResNet-101 network with the labeled category labels to identify the category of the input image and thereby form an attention mechanism; training the first two network structures of the ResNet-101 network to identify whether the processed image has a defect; and, once the network converges, outputting the fully trained network to complete the training process.

In one embodiment of the present invention, the semantic feature map is matrix data of shape (2, 14, 14), and obtaining the heat map required for coarse positioning from the semantic feature map includes: when the prediction result is a non-defective image, taking the first slice along the first dimension of the semantic feature map; when the prediction result is a defective image, taking the second slice along the first dimension of the semantic feature map; upsampling the first slice or the second slice to (448, 448) resolution using bilinear interpolation; and adjusting the range of the interpolated values to the range required for image display, to obtain the heat map required for coarse positioning.

In one embodiment of the invention, obtaining the rough position of the defect from the heat map and the input image comprises: superimposing the heat map on the input image to obtain the rough position of the defect.

In one embodiment of the invention, the heat map is rendered in cool and warm colors: the closer the color of a region is to red, the more the model attends to that region; the closer the color is to blue, the less the model attends to that region.

The invention also provides a defect coarse positioning device based on the weak supervision mode, comprising: a classification labeling module, used for classifying the acquired images into non-defective images and defective images and weakly labeling the images by category; an image processing module, used for performing data enhancement processing and normalization processing on the input image; a training module, used for inputting the processed image into a fully convolutional network for training, during which a ResNet-101 network forms an attention mechanism from the weak label information for weakly supervised training, so as to predict the processed image; and a prediction module, used for calculating, when the prediction result is a defective image, a semantic feature map from the defect features in the fully trained network and the weights of the output layer, obtaining the heat map required for coarse positioning from the semantic feature map, and obtaining the rough position of the defect from the heat map and the input image.

The invention also provides a computer device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein when the processor executes the program, the defect rough positioning method based on the weak supervision mode is realized.

The invention also provides a computer program product, and when instructions in the computer program product are executed by a processor, the defect rough positioning method based on the weak supervision mode is executed.

The invention has the beneficial effects that:

Training the defect detection model requires only the category information of the defects and not their position information; at prediction time the method gives the defect category and marks the approximate position of the defect as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

Drawings

FIG. 1 is a flowchart of a defect coarse localization method based on weak supervision mode according to an embodiment of the present invention;

FIG. 2 is a labeled diagram of a strong supervised learning based target detection algorithm in the related art;

FIG. 3 is a schematic diagram of a model training architecture according to one embodiment of the present invention;

FIG. 4 is a schematic comparison of heat maps with and without defects according to one embodiment of the invention;

FIG. 5 is a schematic comparison of heat maps with and without defects according to another embodiment of the invention;

fig. 6 is a block diagram of a defect coarse positioning apparatus based on weak supervision mode according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a flowchart of a defect coarse positioning method based on a weak supervision mode according to an embodiment of the present invention.

As shown in fig. 1, a defect rough localization method based on weak supervision mode according to an embodiment of the present invention may include the following steps:

S1: classify the acquired images into non-defective images and defective images, and weakly label the images by category.

In an embodiment of the present invention, the weak labeling refers to that a specific position of a defect in an image does not need to be specifically labeled, and the image is weakly labeled according to a category, including: the class of the non-defective image is labeled as 0 and the class of the defective image is labeled as 1.

Specifically, the collected pictures are divided into two categories, non-defective and defective, where the category label of a non-defective picture is 0 and that of a defective picture is 1. Since only the picture category needs to be labeled, no dedicated labeling tool is required: appending ok or ng to the picture name is sufficient, for example "1_ok.jpg". 80% of all pictures are assigned to the training set and the rest form the validation set.
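The naming convention and the 80/20 split described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper names `label_from_name` and `split_dataset`, the shuffling step, and the fixed seed are assumptions made for the example.

```python
import random
from pathlib import Path


def label_from_name(filename: str) -> int:
    """Return 0 for '<id>_ok.<ext>' and 1 for '<id>_ng.<ext>'."""
    marker = Path(filename).stem.rsplit("_", 1)[-1].lower()
    if marker == "ok":
        return 0
    if marker == "ng":
        return 1
    raise ValueError(f"no ok/ng marker in file name: {filename}")


def split_dataset(filenames, train_fraction=0.8, seed=0):
    """Shuffle (an assumption) and split into training and validation lists.

    Each returned list holds (file name, label) pairs.
    """
    samples = [(name, label_from_name(name)) for name in filenames]
    random.Random(seed).shuffle(samples)
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# Hypothetical file names following the "<id>_ok" / "<id>_ng" convention.
names = [f"{i}_ok.jpg" for i in range(1, 6)] + [f"{i}_ng.jpg" for i in range(6, 11)]
train_set, val_set = split_dataset(names)
```

Deriving the label from the file name keeps the annotation effort at one keystroke per picture, which is the point of the weak labeling scheme.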

S2: perform data enhancement processing and normalization processing on the input image.

In one embodiment of the present invention, performing data enhancement processing and normalization processing on the input image includes: applying data enhancement to the input image by random horizontal flipping, random vertical flipping, random cropping, random contrast variation and random brightness variation, in that order; and scaling the enhanced image to 448 x 448 to complete the normalization of the input image.

That is, before the model is trained, each image is preprocessed: it is augmented and then resized to a fixed size.
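The preprocessing order listed above can be sketched in NumPy as follows. The text fixes only the sequence of operations and the 448 x 448 output size; the flip probability, crop fraction, contrast and brightness ranges, and the nearest-neighbour resize used here are illustrative assumptions, and `augment_and_resize` is a hypothetical helper name.

```python
import numpy as np


def augment_and_resize(image, rng, out_size=448, crop_fraction=0.9):
    """Apply the augmentations in the listed order, then scale to out_size.

    image: uint8 array of shape (H, W, 3).
    crop_fraction and the jitter ranges are assumptions for illustration.
    """
    img = image.astype(np.float32)
    if rng.random() < 0.5:                       # random horizontal flip
        img = img[:, ::-1]
    if rng.random() < 0.5:                       # random vertical flip
        img = img[::-1, :]
    h, w = img.shape[:2]                         # random crop
    ch = max(1, int(h * crop_fraction))
    cw = max(1, int(w * crop_fraction))
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    img = img[top:top + ch, left:left + cw]
    mean = img.mean()                            # random contrast change
    img = (img - mean) * rng.uniform(0.8, 1.2) + mean
    img = img + rng.uniform(-25.0, 25.0)         # random brightness change
    img = np.clip(img, 0, 255)
    # Nearest-neighbour resize to out_size x out_size (a simplification;
    # an image library would normally be used for this step).
    rows = np.arange(out_size) * ch // out_size
    cols = np.arange(out_size) * cw // out_size
    return img[rows][:, cols].astype(np.uint8)


rng = np.random.default_rng(0)
src = rng.integers(0, 256, size=(300, 500, 3), dtype=np.uint8)
out = augment_and_resize(src, rng)
```

With a 448 x 448 input and the usual stride of 32 in ResNet-style networks, the final feature map is 14 x 14, which matches the (2, 14, 14) semantic feature map described later.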

S3: input the processed image into a fully convolutional network for training, and, during training, use the weak label information to make the ResNet-101 network form an attention mechanism for weakly supervised training, so as to predict the processed image.

In an embodiment of the present invention, inputting the processed image into a fully convolutional network for training, with the ResNet-101 network forming an attention mechanism from the weak label information for weakly supervised training, includes: training the first three network structures of the ResNet-101 network with the labeled category labels to identify the category of the input image and thereby form an attention mechanism; training the first two network structures of the ResNet-101 network to identify whether the processed image has a defect; and, once the network converges, outputting the fully trained network to complete the training process.

In particular, the model can be built as an improvement on existing deep weakly supervised learning (WSL) models. First, a recent fully convolutional network (FCN), e.g., ResNet-101, is used as the back-end module (left side of fig. 3). FCNs have recently demonstrated outstanding performance in fully supervised object detection and semantic segmentation, and their ability to retain spatial information is adapted here to the WSL setting. Second, a new multi-map WSL transfer layer (middle part of fig. 3) is added, which explicitly learns multiple localized features associated with complementary class modalities. Finally, a new pooling strategy is used (right side of fig. 3).

As shown in fig. 3, the architecture of the whole network is divided into three parts: the left part is the FCN, which extracts the feature map; the middle part is the multi-map transfer layer, which decomposes the feature map into multi-channel features, each channel corresponding to a salient local feature; and the right part is the pooling operation, which aggregates the generated multi-channel feature maps into a single value per class, so that the network can finally be trained with image-level ground truth. These three parts are explained separately below.

Fully convolutional structure: the fully convolutional part uses ResNet-101, whose parameters are pre-trained on ImageNet; its last layer is removed and replaced with the transfer layer and the pooling layer.

Multi-map transfer layer: this layer converts the feature map from w × h × d to w × h × MC using a 1 × 1 convolution, where M is the number of feature maps (channels) per category and C is the number of categories.
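A 1 × 1 convolution is a per-pixel linear map over the channel axis, so the shape change from w × h × d to w × h × MC can be sketched in NumPy with a single `einsum`. The values below are assumptions for illustration: d = 2048 is the usual width of the last ResNet-101 stage, M = 4 is an arbitrary choice, C = 2 matches the two categories of this embodiment, and the weights are random rather than trained.

```python
import numpy as np


def transfer_layer(features, weights, bias):
    """1x1 convolution: (w, h, d) features -> (w, h, M*C) class-specific maps.

    weights has shape (d, M*C) and bias has shape (M*C,).
    """
    return np.einsum("whd,dk->whk", features, weights) + bias


w, h, d = 14, 14, 2048   # assumed backbone output for a 448 x 448 input
M, C = 4, 2              # M maps per category (illustrative), C categories
rng = np.random.default_rng(0)
features = rng.standard_normal((w, h, d)).astype(np.float32)
weights = (0.01 * rng.standard_normal((d, M * C))).astype(np.float32)
bias = np.zeros(M * C, dtype=np.float32)

maps = transfer_layer(features, weights, bias)   # shape (14, 14, 8)
```

Because the kernel is 1 × 1, the spatial layout of the feature map is preserved, which is what later allows the class maps to be read as localization maps.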

WILDCAT pooling: since the ground truth is an image-level label, the information in the transfer layer must be aggregated to the image level. This pooling layer consists of two operations: 1) class-wise pooling; 2) spatial pooling.

Class-wise pooling: for each class, the feature maps of its M channels are merged into a single map, which converts w × h × MC into w × h × C.

Spatial pooling: in a sense, this is a generalized pooling. It is an operation between max pooling and average pooling, and it also incorporates negative evidence, that is, the values with the lowest activations are used as well.

Here k⁺ denotes the k values with the highest activation and k⁻ denotes the k values with the lowest activation. The final score therefore depends on both the k highest and the k lowest activations, each group being averaged. The operation can thus be seen as an averaging over 2k activation values, with α adjusting the relative weight of the highest and lowest activations.
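The two pooling formulas are not reproduced in this text. A plausible reconstruction, assuming the layer follows the published WILDCAT formulation that this section names, is given below; the symbols z, s and the mask set H_k are notation introduced here for illustration, with z^{c,m} the m-th transfer-layer map of class c.

```latex
% Class-wise pooling: average the M maps of each class c at every position (i, j)
\bar{z}^{c}_{i,j} = \frac{1}{M} \sum_{m=1}^{M} z^{c,m}_{i,j}

% Spatial pooling: mean of the k^+ highest activations plus
% alpha times the mean of the k^- lowest activations
s^{c} = \max_{h \in \mathcal{H}_{k^{+}}} \frac{1}{k^{+}} \sum_{i,j} h_{i,j}\, \bar{z}^{c}_{i,j}
      \;+\; \alpha \min_{h \in \mathcal{H}_{k^{-}}} \frac{1}{k^{-}} \sum_{i,j} h_{i,j}\, \bar{z}^{c}_{i,j}

% H_k is the set of binary masks h over the w x h grid with exactly k ones
```

Under this reading, k⁺ = 1 with α = 0 reduces to max pooling, and k⁺ = w·h with α = 0 reduces to average pooling, which is consistent with the description of the operation as lying between the two.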

Finally, the classification task is trained directly with this network. For weakly supervised localization or detection, the class-level feature maps before spatial pooling are used to generate the localization or detection result; for weakly supervised segmentation, either the class with the maximum activation at each position is taken or a CRF is applied to obtain the final prediction.
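The two pooling steps, and the use of the class-level maps before spatial pooling for localization, can be sketched in NumPy as follows. This is a sketch under assumptions rather than the patent's code: class-wise pooling is taken to be an average, the channels are assumed to be grouped class by class, and the function names and default values of `k_plus`, `k_minus` and `alpha` are chosen for illustration.

```python
import numpy as np


def classwise_pool(maps, num_classes):
    """(w, h, M*C) -> (w, h, C): average the M maps belonging to each class.

    Assumes the M*C channels are ordered class by class.
    """
    w, h, mc = maps.shape
    m = mc // num_classes
    return maps.reshape(w, h, num_classes, m).mean(axis=3)


def spatial_pool(class_maps, k_plus=1, k_minus=1, alpha=1.0):
    """(w, h, C) -> (C,): mean of the k_plus highest activations plus
    alpha times the mean of the k_minus lowest ones (both k >= 1)."""
    w, h, c = class_maps.shape
    flat = np.sort(class_maps.reshape(w * h, c), axis=0)  # ascending per class
    top = flat[-k_plus:].mean(axis=0)
    bottom = flat[:k_minus].mean(axis=0)
    return top + alpha * bottom


# Tiny worked example: a 2 x 2 grid, C = 2 classes, M = 2 maps per class.
maps = np.arange(16, dtype=np.float64).reshape(2, 2, 4)
class_maps = classwise_pool(maps, num_classes=2)   # used for localization
scores = spatial_pool(class_maps)                  # used for classification
predicted = int(np.argmax(scores))
localization_map = class_maps[..., predicted]      # map of the predicted class
```

Only `scores` enters the classification loss, so the image-level label is enough for training, while `class_maps` remains available afterwards as a coarse spatial map per class.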

S4: when the prediction result is a defective image, calculate a semantic feature map from the defect features in the fully trained network and the weights of the output layer, and obtain the heat map required for coarse positioning from the semantic feature map.

In one embodiment of the present invention, the semantic feature map is matrix data of shape (2, 14, 14), and obtaining the heat map required for coarse positioning from the semantic feature map includes: when the prediction result is a non-defective image, taking the first slice along the first dimension of the semantic feature map; when the prediction result is a defective image, taking the second slice along the first dimension of the semantic feature map; upsampling the first slice or the second slice to (448, 448) resolution using bilinear interpolation; and adjusting the range of the interpolated values to the range required for image display, to obtain the heat map required for coarse positioning.

Specifically, training runs on a single RTX 2080 Ti GPU with a batch size of 8 and a learning rate of 0.001, iterating for a certain number of epochs with a stochastic gradient descent (SGD) optimizer. The number of epochs depends on the size of the training set: a small training set needs more epochs, and a larger one allows the number to be reduced accordingly. During training, the model is evaluated on the validation set every fixed number of epochs, and the weights of the model with the highest accuracy are retained. After training, the weights with the highest accuracy are loaded for prediction. The model finally gives two results, each marked by a circle in fig. 3: from left to right, the semantic feature map and the classification result. In this embodiment the classification result is 0 or 1, indicating respectively that the picture contains no defect and that it contains a defect. The semantic feature map is matrix data of shape (2, 14, 14). If the predicted category is 0, the first matrix slice along the first dimension of the semantic feature map is taken; if the predicted category is 1, the second matrix slice along the first dimension is taken; in both cases the slice has shape (14, 14). The slice is then upsampled to (448, 448) resolution using bilinear interpolation, and its value range is adjusted to the 0-255 range required for picture display; the adjusted matrix is the heat map required for coarse positioning. Figs. 4 and 5 are comparison diagrams in which the left side shows the original picture of a defect-free product, the middle shows the heat map of a defect-free product, and the right side shows the heat map of a defective product.
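The slice selection, bilinear upsampling and range adjustment described above can be sketched in NumPy as follows. The text says only that the value range is adjusted to 0-255; the min-max rescaling used here is one possible choice and is an assumption, as are the pixel-centre alignment of the interpolation and the helper names `bilinear_resize` and `heat_map`.

```python
import numpy as np


def bilinear_resize(src, out_h, out_w):
    """Bilinear interpolation of a 2-D array (pixel-centre alignment)."""
    in_h, in_w = src.shape
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]          # vertical interpolation weights
    wx = (xs - x0)[None, :]          # horizontal interpolation weights
    top = src[y0][:, x0] * (1 - wx) + src[y0][:, x1] * wx
    bottom = src[y1][:, x0] * (1 - wx) + src[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def heat_map(semantic_map, predicted_class, size=448):
    """Turn a (2, 14, 14) semantic feature map into a uint8 (size, size) map.

    predicted_class selects the slice: 0 = no defect, 1 = defect.
    """
    selected = np.asarray(semantic_map, dtype=np.float64)[predicted_class]
    up = bilinear_resize(selected, size, size)
    lo, hi = up.min(), up.max()
    # Min-max rescaling to 0..255 (an assumption; a constant map becomes 0).
    scaled = (up - lo) / (hi - lo) if hi > lo else np.zeros_like(up)
    return np.round(scaled * 255).astype(np.uint8)


# Synthetic example: one strongly activated cell in the "defect" slice.
sem = np.zeros((2, 14, 14))
sem[1, 3, 10] = 5.0
hm = heat_map(sem, predicted_class=1)
peak_row, peak_col = np.unravel_index(hm.argmax(), hm.shape)
```

Each cell of the 14 x 14 map covers a 32 x 32 block of the 448 x 448 image, so the peak of the upsampled map falls inside the block of the activated cell; this is what makes the positioning coarse rather than pixel-accurate.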

S5: obtain the rough position of the defect from the heat map and the input image.

In one embodiment of the invention, obtaining the rough position of the defect from the heat map and the input image comprises: superimposing the heat map on the input image to obtain the rough position of the defect. The heat map is rendered in cool and warm colors: the closer the color of a region is to red, the more the model attends to that region; the closer the color is to blue, the less the model attends to that region.

That is, regions whose color is closer to red are those the model attends to more, and regions whose color is closer to blue are those it attends to less. Superimposing the heat map on the original picture gives a visual detection result.
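The coloring and superposition can be sketched in NumPy as follows. The text does not specify a color map or a blending weight; the linear blue-to-red ramp and the 50% blend used here are assumptions, and `colorize` and `overlay` are hypothetical helper names.

```python
import numpy as np


def colorize(heat):
    """Map uint8 heat values to RGB with a linear ramp: 0 -> blue, 255 -> red."""
    t = heat.astype(np.float32) / 255.0
    rgb = np.stack([t, np.zeros_like(t), 1.0 - t], axis=-1)
    return (rgb * 255).astype(np.uint8)


def overlay(image, heat, weight=0.5):
    """Blend an RGB image (H, W, 3) with the colored heat map (H, W)."""
    colored = colorize(heat).astype(np.float32)
    blended = (1.0 - weight) * image.astype(np.float32) + weight * colored
    return np.clip(np.round(blended), 0, 255).astype(np.uint8)


# Synthetic example: a uniform grey picture with one hot spot in the heat map.
image = np.full((448, 448, 3), 100, dtype=np.uint8)
heat = np.zeros((448, 448), dtype=np.uint8)
heat[10, 10] = 255
result = overlay(image, heat)
```

In the blended result the hot spot is tinted red and the rest of the picture is tinted blue, while the underlying product image remains visible for inspection.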

It should be noted that algorithms based on generative adversarial networks (GANs) or on reconstruction can also roughly locate defects without position labels, but they differ from the method of the invention in both training data and prediction content. The training data required by the method of the invention include both non-defective and defective images, and each defect needs only a category label, not a position label. The defects can also be divided into categories, for example two categories, "notch" and "burr", and at prediction time the model gives rough positioning information for the specific category of defect. Algorithms based on GANs or reconstruction generally need only non-defective images for training; at prediction time, however many defect types exist, they can only separate images into non-defective and defect-containing images, that is, they treat all defects as a single category and give their rough positions.

In conclusion, training the defect detection model requires only the category information of the defects and not their position information; at prediction time the method gives the defect category and marks the approximate position of the defect as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

As shown in fig. 6, the defect coarse positioning apparatus based on weak supervision mode according to the embodiment of the present invention may include: a classification labeling module 10, an image processing module 20, a training module 30 and a prediction module 40.

The classification labeling module 10 is configured to classify the acquired images into non-defective images and defective images, and to weakly label the images by category. The image processing module 20 is configured to perform data enhancement processing and normalization processing on the input image. The training module 30 is configured to input the processed image into a fully convolutional network for training, during which the ResNet-101 network forms an attention mechanism from the weak label information for weakly supervised training, so as to predict the processed image. The prediction module 40 is configured to calculate, when the prediction result is a defective image, a semantic feature map from the defect features in the fully trained network and the weights of the output layer, to obtain the heat map required for coarse positioning from the semantic feature map, and to obtain the rough position of the defect from the heat map and the input image.

According to an embodiment of the present invention, the weak labeling means that the specific position of the defect in the image does not need to be labeled specifically, and the classification annotation module 10 performs weak labeling on the image according to the category, specifically, the category of the non-defective image is labeled as 0, and the category of the defective image is labeled as 1.

According to an embodiment of the present invention, the image processing module 20 performs data enhancement processing and normalization processing on the input image, specifically, is configured to perform data enhancement processing on the input image sequentially through random horizontal flipping, random vertical flipping, random cropping, random contrast variation, and random brightness variation; the data enhanced image is scaled to 448 x 448 to complete the normalization of the input image.

According to an embodiment of the invention, the training module 30 inputs the processed image into a fully convolutional network for training, during which the ResNet-101 network forms an attention mechanism from the weak label information for weakly supervised training. Specifically, the training module is configured to: train the first three network structures of the ResNet-101 network with the labeled category labels to identify the category of the input image and thereby form an attention mechanism; train the first two network structures of the ResNet-101 network to identify whether the processed image has a defect; and, once the network converges, output the fully trained network to complete the training process.

According to an embodiment of the present invention, the semantic feature map is matrix data of shape (2, 14, 14), and the prediction module 40 obtains the heat map required for coarse positioning from the semantic feature map. Specifically, when the prediction result is a non-defective image, the first slice along the first dimension of the semantic feature map is taken; when the prediction result is a defective image, the second slice along the first dimension of the semantic feature map is taken; the first slice or the second slice is upsampled to (448, 448) resolution using bilinear interpolation; and the range of the interpolated values is adjusted to the range required for image display, to obtain the heat map required for coarse positioning.

According to an embodiment of the present invention, the prediction module 40 obtains the rough position of the defect from the heat map and the input image by superimposing the heat map on the input image.

According to one embodiment of the invention, the heat map is rendered in cool and warm colors: the closer the color of a region is to red, the more the model attends to that region; the closer the color is to blue, the less the model attends to that region.

It should be noted that details not disclosed in the defect rough location apparatus based on the weak supervision mode in the embodiment of the present invention refer to details disclosed in the defect rough location method based on the weak supervision mode in the embodiment of the present invention, and are not described herein again in detail.

With the defect rough positioning device based on the weak supervision mode, training the defect detection model requires only the category information of the defects and not their position information; at prediction time the device gives the defect category and marks the approximate position of the defect as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

The invention further provides a computer device corresponding to the embodiment.

The computer device of the embodiment of the invention comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, and when the processor executes the computer program, the defect rough positioning method based on the weak supervision mode can be realized.

With the computer device of the embodiment of the invention, when the processor executes the computer program stored in the memory, only the category information of the defects is needed and not their position information; the defect category is given and the approximate position of the defect is marked as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

The invention also provides a non-transitory computer readable storage medium corresponding to the above embodiment.

A non-transitory computer readable storage medium of an embodiment of the present invention stores thereon a computer program, which when executed by a processor, can implement the defect rough location method based on the weak supervision mode according to the above embodiment of the present invention.

With the non-transitory computer readable storage medium of the embodiment of the invention, when the processor executes the computer program stored on the storage medium, only the category information of the defects is needed and not their position information; the defect category is given and the approximate position of the defect is marked as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

The present invention also provides a computer program product corresponding to the above embodiments.

When the instructions in the computer program product of the embodiment of the present invention are executed by the processor, the defect rough location method based on the weak supervision mode according to the above-mentioned embodiment of the present invention can be executed.

With the computer program product of the embodiment of the invention, when the processor executes its instructions, only the category information of the defects is needed and not their position information; the defect category is given and the approximate position of the defect is marked as a heat map, which greatly simplifies manual labeling and saves the manpower and material resources it requires.

In the description of the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. "Plurality" means two or more unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be internal to two elements or any other interaction between them. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.

In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely under the second feature, or may simply indicate that the first feature is at a lower level than the second feature.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.

The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units may be integrated into one module. The integrated module can be implemented in hardware or as a software functional module. The integrated module, if implemented as a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A defect rough positioning method based on a weak supervision mode is characterized by comprising the following steps:

classifying the acquired images according to the non-defective images and the defective images, and weakly labeling the images according to the categories;

performing data enhancement processing and normalization processing on an input image;

inputting the processed image into a fully convolutional network for training, and in the training process, utilizing weak labeling information to enable a ResNet-101 network to form an attention mechanism for weak supervision training, so as to produce a prediction for the processed image;

when the prediction result is a defective image, calculating a semantic feature map by using defect features in a fully trained network and the weight of an output layer, and obtaining a thermodynamic diagram required by coarse positioning according to the semantic feature map;

and obtaining the rough position of the defect according to the thermodynamic diagram and the input image.
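
By way of non-limiting illustration, the semantic feature map step can be read as a class-activation-style computation, in which each class map is a weighted sum of the final feature maps using that class's output-layer weights. This reading is an assumption; the claim does not fix the exact formula. The function name `semantic_feature_map` and the plain nested-list data layout are hypothetical and chosen only to keep the sketch dependency-free.

```python
def semantic_feature_map(features, weights):
    """Combine feature maps with output-layer weights, one map per class.

    features: list of C feature maps, each a list of H rows of W floats.
    weights:  list of K rows, each holding the C output-layer weights
              of one class (assumed layout, not taken from the source).
    Returns a list of K maps, each H x W.
    """
    channels = len(features)
    height, width = len(features[0]), len(features[0][0])
    maps = []
    for class_weights in weights:
        if len(class_weights) != channels:
            raise ValueError("each weight row needs one weight per feature map")
        # Weighted sum over channels at every spatial position.
        class_map = [
            [
                sum(class_weights[c] * features[c][i][j] for c in range(channels))
                for j in range(width)
            ]
            for i in range(height)
        ]
        maps.append(class_map)
    return maps
```

With two classes (non-defective and defective) and 14 x 14 feature maps, the result has the (2, 14, 14) shape recited in claim 5.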

2. The defect rough positioning method based on a weak supervision mode according to claim 1, wherein weak labeling means that the specific position of the defect in the image is not labeled, and weakly labeling the images according to the categories comprises:

the category of the non-defective image is labeled as 0, and the category of the defective image is labeled as 1.
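
As a minimal sketch of this labeling scheme, each image can carry only a category value and no position data. The `WeakLabel` record and the `weakly_label` helper are hypothetical names introduced for illustration; only the values 0 and 1 come from the claim.

```python
from dataclasses import dataclass

NON_DEFECTIVE = 0  # category value recited in the claim
DEFECTIVE = 1      # category value recited in the claim


@dataclass(frozen=True)
class WeakLabel:
    """An image reference plus its category; no defect position is stored."""
    image_path: str
    category: int


def weakly_label(non_defective_paths, defective_paths):
    """Attach category 0 or 1 to each image path."""
    labels = [WeakLabel(path, NON_DEFECTIVE) for path in non_defective_paths]
    labels += [WeakLabel(path, DEFECTIVE) for path in defective_paths]
    return labels
```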

3. The defect rough positioning method based on a weak supervision mode according to claim 1, wherein performing data enhancement processing and normalization processing on the input image comprises:

carrying out data enhancement processing on the input image by applying random horizontal flipping, random vertical flipping, random cropping, random contrast variation and random brightness variation in sequence;

and scaling the image after the data enhancement processing to 448 x 448 to complete the normalization processing of the input image.
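
The sequence above can be sketched as follows for a grayscale image held as nested lists with values in [0, 1]. The flip probability, crop ratio, contrast and brightness ranges, and the nearest-neighbour scaling are all assumptions made for illustration; the claim fixes only the order of the operations and the 448 x 448 target size. The function name `augment_and_normalize` is hypothetical.

```python
import random


def clip01(value):
    """Clamp a pixel value to the [0, 1] range."""
    return min(1.0, max(0.0, value))


def augment_and_normalize(image, rng, size=448, crop_ratio=0.9):
    """Apply the five augmentations in sequence, then scale to size x size."""
    # 1. Random horizontal flip (probability 0.5 is an assumption).
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
    # 2. Random vertical flip (probability 0.5 is an assumption).
    if rng.random() < 0.5:
        image = image[::-1]
    # 3. Random crop to crop_ratio of each side (the ratio is an assumption).
    height, width = len(image), len(image[0])
    crop_h = max(1, int(height * crop_ratio))
    crop_w = max(1, int(width * crop_ratio))
    top = rng.randint(0, height - crop_h)
    left = rng.randint(0, width - crop_w)
    image = [row[left:left + crop_w] for row in image[top:top + crop_h]]
    # 4. Random contrast change about the mean intensity (range assumed).
    mean = sum(sum(row) for row in image) / (crop_h * crop_w)
    contrast = rng.uniform(0.8, 1.2)
    image = [[clip01((p - mean) * contrast + mean) for p in row] for row in image]
    # 5. Random brightness change (range assumed).
    shift = rng.uniform(-0.1, 0.1)
    image = [[clip01(p + shift) for p in row] for row in image]
    # Normalization: scale to size x size using nearest-neighbour sampling.
    return [
        [image[i * crop_h // size][j * crop_w // size] for j in range(size)]
        for i in range(size)
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible, which is convenient when comparing training runs.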

4. The defect rough positioning method based on a weak supervision mode according to claim 1, wherein inputting the processed image into a fully convolutional network for training and, in the training process, utilizing weak labeling information to enable the ResNet-101 network to form an attention mechanism for weak supervision training comprises:

training the first three network structures of the ResNet-101 network by using the labeled type labels, identifying the type of the input image so as to form an attention mechanism, training the first two network structures of the ResNet-101 network, and identifying whether the processed image has defects;

and training until the network converges, and then outputting the fully trained network to complete the training process.

5. The defect rough positioning method based on a weak supervision mode according to claim 1, wherein the semantic feature map is matrix data with a shape of (2, 14, 14), and obtaining the thermodynamic diagram required for coarse positioning according to the semantic feature map comprises:

when the prediction result is a non-defective image, taking the first slice along the first dimension of the semantic feature map;

when the prediction result is a defective image, taking the second slice along the first dimension of the semantic feature map;

upsampling the first slice or the second slice to a (448, 448) resolution using bilinear interpolation;

and adjusting the interpolation value range to the range required by image display to obtain the thermodynamic diagram required by the coarse positioning.
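
The steps above can be sketched as follows. The corner-aligned sampling convention and the 0 to 255 display range are assumptions; the claim specifies only bilinear interpolation, the (448, 448) target, and "the range required by image display". The function names are hypothetical.

```python
def bilinear_upsample(grid, size):
    """Upsample a 2-D grid to size x size with corner-aligned bilinear interpolation."""
    height, width = len(grid), len(grid[0])
    out = []
    for i in range(size):
        y = i * (height - 1) / (size - 1) if size > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, height - 1)
        fy = y - y0
        row = []
        for j in range(size):
            x = j * (width - 1) / (size - 1) if size > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, width - 1)
            fx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


def to_display_range(grid):
    """Linearly rescale values to integers in 0..255 (assumed display range)."""
    lo = min(min(row) for row in grid)
    hi = max(max(row) for row in grid)
    span = hi - lo
    if span == 0:
        return [[0 for _ in row] for row in grid]
    return [[round(255 * (v - lo) / span) for v in row] for row in grid]


def heat_map(semantic_map, predicted_class, size=448):
    """Select the slice of the predicted class, upsample it, and rescale it."""
    return to_display_range(bilinear_upsample(semantic_map[predicted_class], size))
```

Here `predicted_class` is 0 for a non-defective prediction and 1 for a defective one, so indexing the first dimension selects the first or second slice.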

6. The defect rough positioning method based on a weak supervision mode according to claim 1, wherein obtaining the rough position of the defect according to the thermodynamic diagram and the input image comprises:

superposing the thermodynamic diagram on the input image to obtain the rough position of the defect.

7. The method of claim 6, wherein the thermodynamic diagram (heat map) is rendered in cold and warm colors, wherein,

the closer the color of a region is to red, the more the model focuses on that region;

the closer the color of a region is to blue, the less the model focuses on that region.
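
A minimal sketch of the superposition and color coding of claims 6 and 7 is given below. The linear blue-to-red color ramp and the blending weight `alpha` are assumptions; the claims state only that the heat map is superposed on the input image and that red marks high attention while blue marks low attention. The names `colorize` and `overlay` are hypothetical.

```python
def colorize(value):
    """Map a heat value in 0..255 to an (R, G, B) color: 0 is blue, 255 is red."""
    return (value, 0, 255 - value)


def overlay(image_rgb, heat, alpha=0.4):
    """Blend the colorized heat map onto an RGB image of the same size.

    image_rgb: rows of (R, G, B) tuples with components in 0..255.
    heat:      rows of heat values in 0..255.
    alpha:     weight of the heat map in the blend (assumed value).
    """
    blended = []
    for image_row, heat_row in zip(image_rgb, heat):
        row = []
        for pixel, value in zip(image_row, heat_row):
            color = colorize(value)
            row.append(tuple(
                round(alpha * c + (1 - alpha) * p) for c, p in zip(color, pixel)
            ))
        blended.append(row)
    return blended
```

Regions that end up reddish in the blended image indicate where the model concentrated, which gives the rough position of the defect.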

8. A defect rough positioning device based on a weak supervision mode is characterized by comprising:

the classification marking module is used for classifying the acquired images according to the non-defective images and the defective images and weakly marking the images according to the categories;

the image processing module is used for performing data enhancement processing and normalization processing on the input image;

the training module is used for inputting the processed images into a fully convolutional network for training, and in the training process, utilizing weak labeling information to enable a ResNet-101 network to form an attention mechanism for weak supervision training, so as to produce a prediction for the processed images;

and the prediction module is used for calculating a semantic feature map by using defect features in a completely trained network and the weight of an output layer when the prediction result is a defective image, obtaining a thermodynamic diagram required by coarse positioning according to the semantic feature map, and obtaining a rough position of the defect according to the thermodynamic diagram and the input image.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for defect coarse localization based on weak supervision mode according to any of claims 1-7 when executing the program.

10. A computer program product, characterized in that instructions in the computer program product, when executed by a processor, perform a method for defect coarse localization based on weak supervision mode according to any of claims 1-7.