CN112348787A - Training method of object defect detection model, object defect detection method and device - Google Patents


Info

Publication number
CN112348787A
CN112348787A
Authority
CN
China
Prior art keywords
defect detection
image
trained
defective
network
Prior art date
Legal status
Pending
Application number
CN202011210102.8A
Other languages
Chinese (zh)
Inventor
杜松
Current Assignee
Quarkdata Software Co ltd
ThunderSoft Co Ltd
Original Assignee
Quarkdata Software Co ltd
Priority date
Filing date
Publication date
Application filed by Quarkdata Software Co ltd
Priority to CN202011210102.8A
Publication of CN112348787A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection
    • G06T2207/30164 Workpiece; Machine component

Abstract

The application discloses a training method of an object defect detection model, and an object defect detection method and device. The training method of the object defect detection model comprises the following steps: acquiring an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of the object defect detection model to obtain a feature map of the image to be trained; classifying the feature map of the image to be trained through a classification sub-network of the object defect detection model to obtain a classification result and a classification loss value of the image; performing defect detection on the defective images to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective images; and optimizing the parameters of the object defect detection model with a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain the trained object defect detection model. The training method shortens the overall time required for model training and improves the detection accuracy of the model.

Description

Training method of object defect detection model, object defect detection method and device
Technical Field
The application relates to the technical field of object detection, in particular to a training method of an object defect detection model, and an object defect detection method and device.
Background
In the technical field of detecting workpieces and other objects, acquiring ultra-high-resolution workpiece images is an important guarantee for improving the accuracy of workpiece detection results. An ultra-high-resolution image generally refers to an image with more than 50 million pixels. However, when ultra-high-resolution images are used to train and test a defect detection model, the following problems arise: 1) because the image resolution is too high, existing defect detection models occupy too much video memory during training and cannot be trained normally on a GPU (Graphics Processing Unit) with only 16 GB of video memory; 2) workpiece defects usually vary widely in scale, and it is difficult for a single detection model to handle both the large and small defects that may exist on a workpiece; 3) the overall running time of the model is too long.
To address these problems, the prior art compresses or splits the ultra-high-resolution image, then feeds it into a neural network model for training, and finally performs defect detection on the workpiece with the trained model.
However, the inventor has found that the above training methods of defect detection models for ultra-high-resolution images still cannot guarantee the training efficiency and the detection accuracy of the model at the same time.
Disclosure of Invention
In view of the above, the present application is proposed to provide a training method of an object defect detection model, and an object defect detection method and apparatus, that overcome or at least partially solve the above problems.
According to a first aspect of the present application, there is provided a training method of an object defect detection model, including:
acquiring an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, wherein the images to be trained comprise a set of defective images to be trained and non-defective images to be trained;
classifying the defective images to be trained and the feature maps of the non-defective images to be trained through a classification sub-network of an object defect detection model to obtain a classification result and a classification loss value of the images to be trained;
performing defect detection on the defective image to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective image;
and optimizing parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and performing object defect detection based on the trained object defect detection model.
Optionally, the performing, by the feature extraction sub-network of the object defect detection model, feature extraction on the image to be trained includes:
inputting the image to be trained into the feature extraction sub-network and sequentially performing a 1x1 convolution, a 3x3 convolution, global maximum pooling, a 3x3 convolution, and global maximum pooling to obtain a sub-feature map;
performing a single 3x3 convolution on the sub-feature map to obtain a first output feature map;
performing a single 3x3 convolution on the first output feature map to obtain a second output feature map;
performing a single 3x3 convolution on the second output feature map to obtain a third output feature map;
and passing the first, second, and third output feature maps through multiple fully connected layers to obtain a fourth output feature map, which is taken as the feature map of the image to be trained.
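The downsampling behaviour of this layer sequence can be traced numerically. Below is a minimal plain-Python sketch; the kernel sizes follow the description above, but the strides and padding (stride-2 2x2 pooling, as suggested by the MaxPool2d layer in the figure legend, and stride-2 "same"-padded pyramid convolutions) are illustrative assumptions, since the patent does not state them:

```python
# Illustrative sketch of the SuperResBackbone spatial dimensions.
# Strides and padding are assumptions, not values from the patent.

def conv_out(size, kernel, stride=1, pad=0):
    """Standard output-size formula for a convolution or pooling layer."""
    return (size + 2 * pad - kernel) // stride + 1

def trace_backbone(size):
    """Trace one spatial dimension through the described layer sequence."""
    size = conv_out(size, 1)            # 1x1 convolution
    size = conv_out(size, 3, pad=1)     # 3x3 convolution
    size = conv_out(size, 2, stride=2)  # max pooling
    size = conv_out(size, 3, pad=1)     # 3x3 convolution
    size = conv_out(size, 2, stride=2)  # max pooling -> sub-feature map
    sizes = [size]
    for _ in range(3):                  # first/second/third output feature maps
        size = conv_out(size, 3, stride=2, pad=1)
        sizes.append(size)
    return sizes

print(trace_backbone(1024))  # [256, 128, 64, 32]
```

With these assumed strides, a 1024-pixel side shrinks to a 256-pixel sub-feature map and then to 128-, 64-, and 32-pixel output maps, giving the multi-scale pyramid that the fully connected fusion step combines.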
Optionally, the classifying the feature maps of the defective image to be trained and the non-defective image to be trained through a classification sub-network of the object defect detection model to obtain a classification result and a classification loss value of the image to be trained includes:
inputting the feature maps of the defective and non-defective images to be trained into a pooling layer of the classification sub-network for global average pooling, outputting a global average pooling feature map, then performing global maximum pooling and outputting a global maximum pooling feature map;
passing the global average pooling feature map and the global maximum pooling feature map through multiple fully connected layers, and outputting a global pooling feature map;
performing a Softmax operation on the global pooling feature map to obtain an output vector of the image to be trained, and obtaining the classification result of the image to be trained from the output vector, wherein the output vector represents the confidence that the image is defective;
and calculating the error between the classification result of the image to be trained and its image label to obtain the classification loss value of the image to be trained.
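The Softmax step above converts the fused feature vector into defect-confidence probabilities. A minimal plain-Python sketch, assuming a two-element logit vector (index 0 = defective, index 1 = non-defective) and a 0.5 decision threshold, both of which are illustrative assumptions:

```python
import math

def softmax(logits):
    """Turn raw scores into a probability vector that sums to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Assumed output convention: index 0 = defective, index 1 = non-defective.
logits = [2.0, 0.5]
probs = softmax(logits)
defect_confidence = probs[0]
classification = "defective" if defect_confidence > 0.5 else "non-defective"
```

The resulting confidence is then compared against the pre-marked defective/non-defective label to compute the classification loss.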
Optionally, the performing, by the defect detection sub-network of the object defect detection model, defect detection on the defective image to be trained to obtain a defect detection result and a defect detection loss value of the defective image includes:
inputting the feature map of the defective image into a region proposal network in the defect detection sub-network, and extracting candidate boxes;
inputting the extracted candidate boxes into a region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the candidate boxes as the defect detection result of the defective image;
and calculating the error between the defect detection result of the defective image and the defect type label and defect position of the defective image to obtain the defect detection loss value of the defective image.
Optionally, the inputting the feature map of the defective image into the region proposal network in the defect detection sub-network and extracting candidate boxes includes:
generating a plurality of anchor boxes of different scales for each feature point on the feature map of the defective image;
classifying each anchor box using a first fully connected layer in the region proposal network to obtain a classification result for each anchor box, and performing position regression on each anchor box using a second fully connected layer in the region proposal network to obtain a position offset for each anchor box;
and outputting the candidate boxes and position coordinates of the defective image according to the classification result and position offset of each anchor box.
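The anchor mechanism above can be sketched as follows. The scales, aspect ratios, and offset convention are illustrative assumptions, not values taken from the patent:

```python
# Sketch: multi-scale anchor boxes at one feature point, plus applying a
# regressed position offset. All numeric values are illustrative.

def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Return (x1, y1, x2, y2) anchor boxes centred on (cx, cy).

    Each anchor keeps the area scale*scale while its width/height ratio varies.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * r ** 0.5   # width grows with the ratio
            h = s / r ** 0.5   # height shrinks so the area stays s*s
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

def apply_offset(box, dx, dy):
    """Shift an anchor box by a regressed (dx, dy) position offset."""
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)

anchors = make_anchors(100, 100)   # 3 scales x 3 ratios = 9 anchors per point
shifted = apply_offset(anchors[0], 4, -2)
```

In practice the classification branch keeps only anchors likely to contain a defect, and the regression branch refines their positions into the output candidate boxes.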
Optionally, the inputting the extracted candidate boxes into the region-of-interest pooling layer in the defect detection sub-network and outputting the classification and coordinate regression of the candidate boxes as the defect detection result of the defective image includes:
mapping the candidate boxes onto the feature map of the defective image to obtain regions of interest;
and inputting the regions of interest into the region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the regions of interest as the defect detection result of the defective image.
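Mapping a candidate box from image coordinates onto the downsampled feature map amounts to dividing its coordinates by the backbone's total stride. The stride of 16 below is an assumption typical of Faster R-CNN-style networks, not a value given in the patent:

```python
def map_to_feature_map(box, stride=16):
    """Project an image-space box (x1, y1, x2, y2) onto the feature map."""
    return tuple(v // stride for v in box)

# A 320x320-pixel candidate box spans 20x20 cells on a stride-16 feature map.
roi = map_to_feature_map((320, 160, 640, 480))  # (20, 10, 40, 30)
```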
According to a second aspect of the present application, there is provided an object defect detection method, comprising:
acquiring a target image of an object to be detected, and performing feature extraction on the target image through a feature extraction sub-network of an object defect detection model to obtain a feature map of the target image;
classifying the feature map through a classification sub-network of an object defect detection model to obtain a classification result, wherein the classification result comprises that the target image is a defective image or a non-defective image;
if the target image is a defective image, performing defect detection on the target image through the defect detection sub-network of the object defect detection model to obtain a defect detection result of the target image of the object to be detected, and feeding back defect information of the object to be detected based on the defect detection result;
if the target image is a non-defective image, directly feeding back a non-defective detection result;
wherein the object defect detection model is trained by the training method of the object defect detection model described above.
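The control flow of this detection method, running the defect detection sub-network only when the classifier flags the image as defective, can be sketched with stub sub-networks; all function names and the result format here are illustrative stand-ins, not the patent's implementation:

```python
def detect_object_defects(image, extract_features, classify, detect_defects):
    """Two-stage inference: classify first, detect only on defective images.

    The three callables stand in for the feature extraction, classification,
    and defect detection sub-networks of the trained model.
    """
    feature_map = extract_features(image)
    if classify(feature_map) == "non-defective":
        # Non-defective images skip the detector; feed back the result directly.
        return {"defective": False, "defects": []}
    return {"defective": True, "defects": detect_defects(feature_map)}

# Stub sub-networks for illustration only.
result = detect_object_defects(
    image="workpiece.png",
    extract_features=lambda img: "feature-map",
    classify=lambda fm: "defective",
    detect_defects=lambda fm: [{"type": "scratch", "box": (4, 4, 20, 20)}],
)
```

Skipping the detector for non-defective images is what saves inference time relative to running a full Faster R-CNN-style pipeline on every image.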
According to a third aspect of the present application, there is provided a training apparatus for an object defect detection model, comprising:
the first feature extraction unit is used for acquiring an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, wherein the image to be trained comprises an image set of a defective image to be trained and a non-defective image to be trained;
the first classification unit is used for classifying the defective images to be trained and the feature maps of the non-defective images to be trained through a classification sub-network of an object defect detection model to obtain a classification result and a classification loss value of the images to be trained;
the first defect detection unit is used for carrying out defect detection on the defective image to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective image;
and the optimization unit is used for optimizing the parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and performing object defect detection based on the trained object defect detection model.
According to a fourth aspect of the present application, there is provided an object defect detecting apparatus comprising:
the second feature extraction unit is used for acquiring a target image of the object to be detected and extracting features of the target image through a feature extraction sub-network of the object defect detection model to obtain a feature map of the target image;
the second classification unit is used for classifying the feature map through a classification sub-network of the object defect detection model to obtain a classification result, wherein the classification result comprises that the target image is a defective image or a non-defective image;
the second defect detection unit is used for carrying out defect detection on the target image through a defect detection sub-network of the object defect detection model to obtain a defect detection result of the target image of the object to be detected and feeding back defect information of the object to be detected based on the defect detection result if the target image is the defective image; if the target image is a non-defective image, directly feeding back the defect detection result as non-defective; and the object defect detection model is obtained by training based on the training device of the object defect detection model.
According to a fifth aspect of the present application, there is provided an electronic device comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a method of training an object defect detection model as described in any one of the above.
According to a sixth aspect of the present application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of training an object defect detection model as described in any one of the above.
According to the technical solution of the present application, an image to be trained of an object is acquired, and feature extraction is performed on it through the feature extraction sub-network of the object defect detection model to obtain its feature map, wherein the images to be trained comprise a set of defective and non-defective images to be trained; the feature maps of the defective and non-defective images to be trained are classified through the classification sub-network of the object defect detection model to obtain a classification result and a classification loss value; defect detection is performed on the defective images to be trained through the defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value; and the parameters of the object defect detection model are optimized with a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain the trained object defect detection model, which is then used for object defect detection. This training method solves the problem that a conventional backbone network occupies too much video memory to be trained, and greatly shortens the overall time required for model training. The feature extraction and classification sub-networks are trained on both defective and non-defective images, while the defect detection sub-network is trained only on defective images, which reduces interference from non-defective images and improves model accuracy.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application clearer and implementable according to the content of this description, and to make the above and other objects, features, and advantages of the present application more comprehensible, the detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic diagram illustrating a defect detection process of an object in the prior art;
FIG. 2 is a schematic diagram of another object defect detection process in the prior art;
FIG. 3 shows a flow diagram of a method of training an object defect detection model according to an embodiment of the present application;
FIG. 4 shows a schematic flow chart of training of an object defect detection model according to an embodiment of the present application;
FIG. 5 illustrates an overall structural view of an object defect detection model according to an embodiment of the present application;
FIG. 6 illustrates an anchor block in accordance with one embodiment of the present application;
FIG. 7 shows a schematic flow diagram of an object defect detection method according to an embodiment of the present application;
FIG. 8 shows a schematic view of an object defect detection flow according to an embodiment of the present application;
FIG. 9 is a schematic diagram of a training apparatus for an object defect detection model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an object defect detection apparatus according to an embodiment of the present application;
FIG. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 12 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
In the figure: conv2d denotes a convolutional layer, MaxPool2d denotes a max pooling layer, AdaptevaVgPool 2d denotes an adaptive average pooling layer, AdaptevaMaxPool 2d denotes an adaptive max pooling layer, Linear denotes a fully connected layer, and Cross Entrophy Loss denotes cross entropy Loss.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As shown in fig. 1, in one prior-art method for training a defect detection model, the resolution of the image is directly reduced by a large factor, for example, an image with a resolution of 10000 x 10000 is directly reduced to a resolution of 2000 x 2000, and then a target detection network such as Faster R-CNN (Faster Region-based Convolutional Neural Network) trains and infers the defect detection model on the reduced image.
This method detects large defects in the image well and runs fast. However, reducing the image resolution by a large factor proportionally destroys the details of small defects, so the trained model's ability to discriminate small defects is correspondingly weakened.
As shown in fig. 2, in another prior-art method for training a defect detection model, the image is first split into a plurality of small blocks that are processed individually, for example, an image with a resolution of 10000 x 10000 is split into 400 small blocks with a resolution of 500 x 500, and then the small blocks obtained after splitting are trained and inferred by a target detection network such as Faster R-CNN.
Although this method solves the problem that an over-large image cannot be trained while retaining the ability to discriminate small defects, it still has the following problems: 1) the overall running time of the model is too long, since splitting and storing the image and merging the results add extra running time; meanwhile, to ensure that defects appearing at the edges of the small blocks are detected reliably, a certain overlapping area must be added during splitting, which brings extra computation. 2) Because the defective part usually occupies only a small proportion of the image, many of the small blocks obtained after splitting are defect-free samples. If all the small blocks are fed into the model for training, the positive and negative samples (positive samples are image blocks containing defects, negative samples are image blocks containing no defects) are extremely unbalanced, which increases the missed detections of the trained model; while if only positive samples are trained, the model cannot accurately distinguish negative samples, which increases over-detection.
In addition, the general target detection model Faster R-CNN used in the above prior art generally comprises two parts, a backbone network (Backbone) and a defect localization network (Detector). The backbone network, used to extract image features, is generally a ResNet18 residual network, a ResNet50 residual network, VGG16 (Visual Geometry Group 16, a convolutional neural network developed by the Visual Geometry Group of Oxford University), or the like. However, the standard ResNet and VGG networks are designed mainly for image classification and detection tasks in natural scenes; images of natural scenes are diverse and full of background interference, and the total information capacity required of the models is high, so the models are often huge and place extremely high demands on GPU performance.
On this basis, the embodiments of the present application provide an object defect detection model and a training method therefor. The object defect detection model of the embodiments of the present application comprises a feature extraction sub-network, a classification sub-network, and a defect detection sub-network, and is well suited to defect detection scenarios with simple environments, such as workpiece defect detection in a workshop.
Specifically, as shown in fig. 3, the object defect detection model is obtained by training through the following steps S310 to S340:
step S310, obtaining an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, wherein the image to be trained comprises an image set of a defective image to be trained and a non-defective image to be trained.
Before training the object defect detection model, an image to be trained of an object can be acquired. If the object defect detection model is applied to a workpiece inspection scenario, professional imaging equipment such as a high-definition camera can be used to scan or photograph the workpiece from all directions to acquire the image to be trained of the workpiece. The acquired image to be trained can be an ultra-high-resolution image with more than 50 million pixels, so as to improve the accuracy of subsequent defect detection, and it can be a color image or a grayscale image. Of course, the specific method used to acquire the image to be trained of the object can be flexibly selected by those skilled in the art according to the actual situation, and is not specifically limited here.
Step S320, classifying the feature maps of the defective images to be trained and the non-defective images to be trained through a classification sub-network of the object defect detection model to obtain a classification result and a classification loss value of the images to be trained.
Specifically, in order to enable the classifying sub-network to accurately distinguish the defective images to be trained from the non-defective images to be trained, the feature maps of the defective images to be trained and the non-defective images to be trained extracted by the feature extraction sub-network may be simultaneously input into the classifying sub-network for training, so as to obtain the classification result output by the classifying sub-network. Because the images to be trained all have defective/non-defective image labels, the classification loss value of the classification sub-network can be obtained by comparing the classification result of the images to be trained output by the classification sub-network with the image labels marked in advance.
Step S330, performing defect detection on the defective images to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective images.
As described above, in the prior-art training method in which the image is split, all the small image blocks obtained after splitting are fed into the model for training. However, since many of these small blocks are defect-free samples, feeding them all into the model makes the positive and negative samples extremely unbalanced and increases the missed detections of the trained model. Therefore, only defective images to be trained are input into the defect detection sub-network for training, which avoids interference from the large number of non-defective images on the training process and improves the accuracy of model training. Here, the defective image to be trained refers to an image whose defect positions and defect types have been marked; the defect detection loss value of the defect detection sub-network can be obtained by comparing the defect detection result output by the sub-network with the marked defect positions and defect types.
Step S340, optimizing parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and performing object defect detection based on the trained object defect detection model.
After the classification loss value and the defect detection loss value are obtained, they can be added to serve as the total loss of the object defect detection model training; the parameters of the corresponding sub-networks in the object defect detection model are then optimized with a gradient descent algorithm to obtain the trained object defect detection model, and subsequent object defect detection can be performed based on the trained model.
The training method of the object defect detection model of the present application solves the problem that a conventional backbone network occupies too much video memory to be trained, and greatly shortens the overall time required for model training. The feature extraction and classification sub-networks are trained on both defective and non-defective images, while the defect detection sub-network is trained only on defective images, which reduces interference from non-defective images and improves model accuracy.
As shown in fig. 4, a schematic diagram of the training process of the object defect detection model is provided. First, the images to be trained, including the marked defective training images and non-defective training images, are acquired; all defective and non-defective training images are fed into the feature extraction sub-network SuperResBackbone for feature extraction; the extracted feature maps are then fed into the classification sub-network Classifier for classification, yielding the classification results and the classification loss Loss_cls of the images to be trained, and the classification loss is used to optimize the parameters of the feature extraction and classification sub-networks.
For the training of the defect detection sub-network Detector, only the labeled defective training images need to be sent into the defect detection sub-network: the feature maps corresponding to the defective training images, extracted by the feature extraction sub-network, are input into the defect detection sub-network for defect detection, obtaining a defect detection result and a defect detection Loss Loss_det. Finally, the total Loss of model training is calculated as Loss = Loss_cls + α·Loss_det, where α can range from 0 to 10 and defaults to α = 1, and the parameters of each sub-network of the object defect detection model are correspondingly optimized with a gradient descent algorithm according to the total Loss until the model achieves the expected effect.
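As a minimal sketch (not the patent's actual implementation), the weighted total loss and a single gradient-descent parameter update can be expressed as follows; the function names, the toy scalar gradient and the learning rate are illustrative assumptions:

```python
def total_loss(loss_cls, loss_det, alpha=1.0):
    """Weighted sum of classification and detection losses (default alpha = 1)."""
    assert 0.0 <= alpha <= 10.0  # range described in the text
    return loss_cls + alpha * loss_det

def sgd_step(param, grad, lr=0.01):
    """One plain gradient-descent update for a single scalar parameter."""
    return param - lr * grad

loss = total_loss(loss_cls=0.3, loss_det=0.5)  # 0.3 + 1.0 * 0.5 = 0.8
w = sgd_step(param=1.0, grad=0.4)              # 1.0 - 0.01 * 0.4
```

In a real training loop the gradient of the total loss would be back-propagated through all three sub-networks at once, which is what makes the joint optimization described above possible.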
Both the classification Loss and the defect detection Loss can be calculated with a cross-entropy loss function (Cross Entropy Loss), whose formula is:

$$\mathrm{Loss} = -\sum_{i} \hat{y}_i \log(y_i)$$

where $y_i$ is the probability that the image to be trained belongs to a defective image or a non-defective image, and $\hat{y}_i$ is the corresponding ground-truth label.
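A minimal NumPy sketch of this cross-entropy computation (the variable names and probability values are illustrative, not from the patent):

```python
import numpy as np

def cross_entropy(y_pred, y_true, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot label."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.sum(y_true * np.log(y_pred))

# two-class case: [defective, non-defective], true class is "defective"
loss = cross_entropy(np.array([0.8, 0.2]), np.array([1.0, 0.0]))
```

The loss collapses to −log of the probability assigned to the true class, so a confident correct prediction gives a loss near zero.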
In an embodiment of the present application, the object defect detection model is mainly applied to scenes such as workpiece detection. Since workpiece surface defect detection is usually performed in environments such as a production workshop, the shooting scene of the image is relatively fixed, the form of the shot workpiece is also limited, and the total information capacity required of the model is relatively small, so a huge backbone network is not needed; a convolutional neural network structure SuperResBackbone with a simpler structure is therefore designed to perform feature extraction, comprising 6 convolutional layers, 2 pooling layers and at least 2 fully connected layers. Specifically, performing feature extraction on the image to be detected through the feature extraction sub-network (SuperResBackbone) of the object defect detection model comprises the following steps: inputting the image to be trained into the feature extraction sub-network, and sequentially performing 1x1 convolution, 3x3 convolution, global maximum pooling, 3x3 convolution and global maximum pooling to obtain a sub-feature map; performing one 3x3 convolution on the sub-feature map to obtain a first output feature map; performing one 3x3 convolution on the first output feature map to obtain a second output feature map; performing one 3x3 convolution on the second output feature map to obtain a third output feature map; and performing multiple full connections on the first, second and third output feature maps to obtain a fourth output feature map, which is taken as the feature map of the image to be trained.
The convolutional neural network structure SuperResBackbone (feature extraction sub-network) of the embodiment can improve the overall operation speed of the model and save a certain information storage space, the feature extraction sub-network does not need to compress or split the image with ultrahigh resolution in advance, and the learning capability of the model on the global information and the local detail information of the image is retained.
As shown in fig. 5, the feature extraction sub-network SuperResBackbone of the embodiment of the present application mainly includes 6 convolutional layers Conv2d and 2 max pooling layers MaxPool2d. If the image to be trained collected in advance is a color image, then considering that the requirement on the color information of the image itself is not high in scenes such as workpiece detection, the whole image to be detected can first undergo a 1x1 convolution, i.e. the color image is changed into a channel-enhanced grayscale image, so that the bandwidth of the whole network can be directly reduced by a factor of 3. The traditional algorithm for converting a color image into a grayscale image is designed for human vision and is not necessarily optimal for machine discrimination, so the embodiment of the application performs a 1x1 convolution to obtain a grayscale image that is optimal for subsequent defect detection. Of course, if the acquired image is already a grayscale image, the first-layer 1x1 convolution step of SuperResBackbone may be omitted.
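A 1x1 convolution over a 3-channel image is simply a learned per-pixel linear combination of the channels; the sketch below (with an arbitrary weight vector, not the patent's learned weights) shows how it collapses RGB to a single channel:

```python
import numpy as np

def conv1x1(image, weights):
    """Per-pixel weighted sum of channels; image is HxWxC, weights has length C."""
    return np.tensordot(image, weights, axes=([2], [0]))  # result is HxW

rgb = np.random.rand(4, 4, 3)                    # toy 4x4 RGB image
gray = conv1x1(rgb, np.array([0.3, 0.5, 0.2]))   # learned weights would replace these
```

Unlike a fixed luminance formula, the weights here are trained with the rest of the network, which is what lets the conversion be optimal for defect detection rather than for human vision.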
Then the grayscale image obtained after the 1x1 convolution sequentially undergoes one 3x3 convolution, one global maximum pooling, one 3x3 convolution and one global maximum pooling to obtain a sub-feature map. After the last max pooling layer, three 3x3 convolutional layers with consistent feature shapes (the shape is NxCxHxW) are connected to represent the expression of defect features under different receptive fields; in a convolutional neural network, the receptive field can be understood as the size of the region on the original image that a pixel on the feature map output by each layer is mapped from. Specifically, the sub-feature map is first convolved once with 3x3 to obtain a first output feature map and a first receptive field; the first output feature map is convolved once with 3x3 to obtain a second output feature map and a second receptive field; and the second output feature map is convolved once with 3x3 to obtain a third output feature map and a third receptive field. After the first, second and third output feature maps are fully connected two or more times, a fourth output feature map and a fourth receptive field are obtained and used as the feature map finally output by the feature extraction sub-network SuperResBackbone (the shape is Nx3CxHxW), where N is the number of images to be trained in one operation, C is the number of features, and HxW is the size of one feature map, i.e. its Height x Width. Of course, besides the feature extraction sub-network with the above structure, those skilled in the art can flexibly design feature extraction sub-networks with other reasonable structures according to the actual situation, which are not listed here.
In an embodiment of the present application, the classifying the feature maps of the defective image to be trained and the non-defective image to be trained through the classification sub-network of the object defect detection model, and obtaining the classification result and the classification loss value of the image to be trained includes: inputting the feature maps of the defective images to be trained and the non-defective images to be trained into a pooling layer of the classification sub-network for global average pooling processing, outputting a global average pooling feature map, performing global maximum pooling processing, and outputting a global maximum pooling feature map; after the global average pooling feature map and the global maximum pooling feature map are subjected to multiple times of full connection, outputting a global pooling feature map; performing Softmax operation on the global pooling feature map to obtain an output vector of the image to be trained, and obtaining a classification result of the image to be trained according to the output vector, wherein the output vector represents a defective confidence coefficient of the image to be trained; and calculating the error between the classification result of the image to be trained and the image label of the image to be trained to obtain the classification loss value of the image to be trained.
As shown in fig. 5, the classification sub-network Classifier here mainly comprises 1 global average pooling layer AdaptiveAvgPool2d, 1 global maximum pooling layer AdaptiveMaxPool2d and 2 fully connected layers Linear. Firstly, the feature maps finally extracted by the feature extraction sub-network for the images to be trained are respectively input into the global average pooling layer for global average pooling and into the global maximum pooling layer for global maximum pooling, obtaining a global average pooled feature map and a global maximum pooled feature map. Global maximum pooling takes the maximum pixel value over the whole feature map of the last convolutional layer, which can extract the feature texture of the image and reduce the influence of useless information. Global average pooling takes the average pixel value over the whole feature map of the last convolutional layer, which can better preserve the background information of the image. The shapes of the feature maps output by global average pooling and global maximum pooling are consistent (the shape is Nx3Cx1x1); the feature maps obtained by the two poolings are then fully connected twice, so that the finally output global pooled feature map (the shape is Nx6Cx1x1) can capture the global information of the image, including its feature texture information and background information, laying a foundation for the subsequent image classification and defect detection.
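A NumPy sketch of the two global poolings and their channel-wise concatenation (the full connections are omitted for brevity; shapes follow the NxCxHxW convention described above, and the tensor values are dummies):

```python
import numpy as np

def global_avg_max_pool(x):
    """x: NxCxHxW -> Nx2Cx1x1 concatenation of global average and max pools."""
    avg = x.mean(axis=(2, 3), keepdims=True)  # background-preserving summary
    mx = x.max(axis=(2, 3), keepdims=True)    # texture-preserving summary
    return np.concatenate([avg, mx], axis=1)

feat = np.random.rand(2, 6, 8, 8)   # N=2 images, 3C=6 channels as in the text
pooled = global_avg_max_pool(feat)  # (2, 12, 1, 1), i.e. Nx6Cx1x1 for C=2
```

Concatenating rather than adding the two pooled maps lets the subsequent fully connected layers weigh texture and background information independently.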
Then a Softmax function operation is performed on the global pooled feature map to obtain an output vector of the image to be trained, where the output vector can represent the defective confidence of the image to be trained, and the classification result of the image to be trained is further obtained according to the output vector. For example, if the confidence threshold is 0.75 and the defective confidence of the image to be trained represented by the output vector is 0.8, which exceeds the confidence threshold, the classification result of the image to be trained is considered to be a defective image. Finally, the error between the classification result finally output by the classification sub-network and the labeled image label of the image to be trained is calculated, thereby obtaining the classification Loss Loss_cls of the image to be trained.
The embodiment of the application can adopt a Softmax function to calculate the output vector, whose formula is:

$$y_c = \frac{e^{z_c}}{\sum_{c'} e^{z_{c'}}}$$

where $y_c$ represents the probability that the image to be trained belongs to a defective image or a non-defective image, with range 0~1, and $z_c$ is the unnormalized score for class $c$.
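A numerically stable NumPy sketch of this Softmax (the score values are illustrative):

```python
import numpy as np

def softmax(z):
    """Stable softmax: subtract the max score before exponentiating."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 0.5]))  # [defective, non-defective] scores
```

The outputs always sum to 1, so the first component can be read directly as the defective confidence compared against the threshold described above.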
In an embodiment of the present application, performing defect detection on the defective image to be trained through the defect detection sub-network of the object defect detection model to obtain the defect detection result and defect detection loss value of the defective image includes: inputting the feature map of the defective image into the region generation network in the defect detection sub-network to extract candidate frames; inputting the extracted candidate frames into the region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the candidate frames as the defect detection result of the defective image; and calculating the error between the defect detection result of the defective image and the defect category label and defect position of the defective image to obtain the defect detection loss value of the defective image.
The defect detection sub-network Detector here mainly includes a region generation network RPN (Region Proposal Network) and a Region of Interest pooling layer ROI-Pooling. The RPN was proposed in the Faster R-CNN network. The traditional R-CNN (Region-based Convolutional Neural Networks) proposes many candidate frames on the original image with the Selective Search algorithm and then sends the candidate frames into a CNN for feature extraction. Fast R-CNN sends the whole image into the CNN for feature extraction and then extracts candidate frames on the feature map with the Selective Search algorithm. However, both of these methods use the offline Selective Search algorithm, which is time-consuming and cannot learn how to extract candidate frames end to end; the RPN incorporates candidate frame extraction into end-to-end learning, thereby improving the extraction efficiency of candidate frames. Therefore, in the embodiment of the present application, for the purpose of improving the detection efficiency of the defect detection model, the RPN is used to extract candidate frames from the feature map of the defective image output from the classification sub-network.
The ROI-Pooling layer is mainly used to obtain a fixed-size output from feature maps with different input sizes, namely the regions of interest corresponding to the extracted candidate frames, through a blocked pooling method; the positions and categories of defects are then obtained through a fully connected layer and a classifier layer and used as the final defect detection result of the defective image. Specifically, the defect detection result output by the defect detection sub-network may be a feature vector of shape JxKx5, where J denotes the J defects detected by the defect detection sub-network, K denotes the K categories of defects, and 5 denotes the four coordinates of each candidate frame plus the confidence of the defect. The error between the defect detection result of the defective image and the defect category label and defect position marked on the defective image is then calculated, thereby obtaining the defect detection Loss Loss_det of the defective image. The ROI-Pooling layer of the embodiment of the application can significantly improve the training speed of the defect detection model and can also improve the detection accuracy of the model.
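A small NumPy sketch of how such a JxKx5 output could be unpacked into boxes and confidences; the layout (four box coordinates followed by one confidence) is the interpretation described above, and the tensor values are dummies:

```python
import numpy as np

J, K = 2, 3                    # 2 detected defects, 3 defect categories
det = np.random.rand(J, K, 5)  # [..., :4] box coordinates, [..., 4] confidence

boxes = det[..., :4]           # shape (J, K, 4)
conf = det[..., 4]             # shape (J, K)
best_cls = conf.argmax(axis=1) # most confident category for each defect
```

Keeping coordinates and confidence in one tensor lets the loss over positions (regression) and categories (classification) be computed from a single network output.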
In an embodiment of the application, inputting the feature map of the defective image into the region generation network in the defect detection sub-network to extract candidate frames includes: generating a plurality of anchor frames with different scales for each feature point on the feature map of the defective image; classifying each anchor frame using a first fully connected layer in the region generation network to obtain a classification result for each anchor frame, and performing position regression on each anchor frame using a second fully connected layer in the region generation network to obtain a position offset for each anchor frame; and outputting the candidate frames and position coordinates of the defective image according to the classification result and position offset of each anchor frame.
When the region generation network is used to extract candidate frames from the feature map of the defective image, the embodiment of the application can proceed as follows. First, for each point on the feature map (which may be called an anchor point), anchor frames with different scales and aspect ratios are generated; in the Faster R-CNN network, anchor frames with 3 scales and 3 aspect ratios (1:1, 1:2, 2:1) are usually used, so there are 9 anchor frames corresponding to each sliding window position. These 9 anchor frames can be understood as 9 possible sizes of the original image area corresponding to each sliding window position when sliding over the feature map in the RPN, which is equivalent to a template; the same 9 templates are used for any image and any sliding window position. As shown in fig. 6, a structural diagram of the 9 anchor frames used in the Faster R-CNN network is given. The anchor frames are then input into two fully connected layers: one is used for classification, i.e. judging whether the feature map inside the anchor frame belongs to the foreground or the background, and the other is used for regression, i.e. outputting the position coordinates of the anchor frame (the offset relative to the real object frame). Finally, the final candidate frames of the defective image and their corresponding position coordinates are output according to the classification result and position offset of each anchor frame.
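A sketch of generating the 9 anchor frames (3 scales x 3 aspect ratios) for one feature-map position; the scale values are illustrative, not the patent's:

```python
def make_anchors(cx, cy, scales=(32, 64, 128), ratios=(1.0, 0.5, 2.0)):
    """Return (x1, y1, x2, y2) anchor frames centered at (cx, cy).

    ratio = width / height; width and height are chosen so the area
    stays at scale**2 for every ratio, as in Faster R-CNN."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = s * (r ** 0.5)
            h = s / (r ** 0.5)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

frames = make_anchors(100, 100)  # the 9 templates for one sliding-window position
```

Because the templates depend only on the center coordinates, the same function is reused at every sliding-window position, matching the "template" intuition above.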
In an embodiment of the present application, the inputting the extracted candidate frames into a region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the candidate frames as the defect detection result of the defective image includes: mapping the candidate frame to a feature map of the defective image to obtain a region of interest; and inputting the region of interest into a region of interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the region of interest as a defect detection result of the defective image.
After the candidate frames are extracted by the region generation network, they are mapped to the corresponding positions of the ROI on the feature map of the image to be detected; the mapped ROI can then be divided into blocks of the same size (the number of blocks matching the output dimension), Max Pooling is performed on each block separately, and finally the defect detection result of the defective image, including the position of the defect and the defect category, is output through the fully connected layer.
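A minimal NumPy sketch of this blocked max pooling over one ROI (a fixed 2x2 output grid and integer ROI coordinates for brevity; a real ROI-Pooling layer also handles batches and channels):

```python
import numpy as np

def roi_max_pool(feat, roi, out=2):
    """feat: HxW feature map; roi: (x1, y1, x2, y2); returns an out x out grid."""
    x1, y1, x2, y2 = roi
    region = feat[y1:y2, x1:x2]
    h, w = region.shape
    pooled = np.empty((out, out))
    ys = np.linspace(0, h, out + 1).astype(int)  # block boundaries along height
    xs = np.linspace(0, w, out + 1).astype(int)  # block boundaries along width
    for i in range(out):
        for j in range(out):
            pooled[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return pooled

fmap = np.arange(64, dtype=float).reshape(8, 8)
p = roi_max_pool(fmap, (0, 0, 4, 4))  # pool the top-left 4x4 ROI into a 2x2 grid
```

Whatever the ROI's size, the output grid is always out x out, which is exactly the fixed-size property that lets candidate frames of different sizes feed the same fully connected layer.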
The embodiment of the present application provides a training method for an object defect detection model, as shown in fig. 7, the method includes the following steps S710 to S730:
step S710, obtaining a target image of an object to be detected, and performing feature extraction on the target image through a feature extraction sub-network of the object defect detection model to obtain a feature map of the target image.
In a scene of detecting defects of objects such as workpieces, an image of the object can be acquired as the image to be detected, and then the feature extraction sub-network, such as a convolutional neural network, is used to perform feature extraction on the image, so as to obtain the feature map of the image to be detected. The feature extraction sub-network of the embodiment of the application is the convolutional neural network structure SuperResBackbone with a simpler structure, which improves the overall operation speed of the model and saves a certain amount of information storage space.
And S720, classifying the feature map through a classification sub-network of the object defect detection model to obtain a classification result, wherein the classification result comprises that the target image is a defective image or a non-defective image.
As described above, in the prior-art defect detection method, after an image is split, all the image blocks are sent into the model for training. However, since many of the small blocks obtained after splitting are samples without defects, if all the small blocks are sent into the model for training, the positive and negative samples are extremely unbalanced, increasing the missed detection rate of the trained model. Therefore, the pre-trained classification sub-network Classifier can be used to preliminarily filter the images to be detected so as to determine which images need subsequent defect detection.
Specifically, the extracted feature map of the image to be detected is classified by using a classification sub-network to preliminarily judge whether the image to be detected has defects, if the classification result is a defect-free image, that is, the workpiece in the image has no defects or the defects are negligible, the subsequent defect detection step is not needed, the system operation time is saved to a certain extent, and the overall defect detection efficiency is improved.
Step S730, if the target image is a defective image, performing defect detection on the target image through a defect detection subnetwork of the object defect detection model to obtain a defect detection result of the target image of the object to be detected, and feeding back defect information of the object to be detected based on the defect detection result; if the target image is a non-defective image, directly feeding back the defect detection result as non-defective; and the object defect detection model is obtained by training based on the object defect detection method.
If the classification result output by the classification sub-network is that the image to be detected is a defective image, the workpiece in the image has non-negligible defects, and further defect detection and positioning are required, wherein the defect detection result of the defective image, including the defect position, the defect category and the like, can be obtained by using the trained defect detection sub-network to perform defect detection on the defective image. And finally, feeding back the defect detection result, thereby completing the whole object defect detection process.
According to the object defect detection method, defect-free samples are filtered out in advance by the classification sub-network; since the defect detection sub-network is computationally expensive while the classification sub-network is lightweight, this greatly speeds up the processing of defect-free samples and improves the overall efficiency of object defect detection. In addition, the method can directly input the acquired ultra-high-resolution image into the model for defect detection without compressing or splitting the image, preserving the global information and local detail information of the image and thus obtaining a more accurate defect detection result.
As shown in fig. 8, an overall flow diagram of object defect detection is provided. Firstly, the image to be detected is obtained; then the features of the image to be detected are extracted by the feature extraction sub-network in the object defect detection model, i.e. the improved backbone network SuperResBackbone; next the extracted features are classified by the classification sub-network Classifier to obtain the classification result of the image to be detected, which covers two cases: the image to be detected is a defective image or a non-defective image. If the classification result is a defective image, the features of the defective image are input into the defect detection sub-network for defect detection to determine the position and category of the defect, and the defect detection result is finally output. If the classification result is a non-defective image, the classification result is returned directly. This object defect detection process adopts the three network structures of the feature extraction sub-network, the classification sub-network and the defect detection sub-network, improving the overall speed of object defect detection while obtaining accurate detection results.
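The classify-then-detect gating of this flow can be sketched as follows; the three stage functions are stand-in stubs (the real sub-networks are neural networks), and the 0.75 threshold is the illustrative value used earlier in the text:

```python
def detect_object_defects(image, extract, classify, detect, threshold=0.75):
    """Run the expensive detector only when the classifier flags a defect."""
    feat = extract(image)
    defect_conf = classify(feat)  # defective confidence in [0, 1]
    if defect_conf < threshold:
        return {"defective": False, "defects": []}
    return {"defective": True, "defects": detect(feat)}

# stand-in stages purely for illustration
result = detect_object_defects(
    image="img",
    extract=lambda im: im,                         # feature extraction stub
    classify=lambda f: 0.8,                        # pretend confidence of 0.8
    detect=lambda f: [("scratch", (10, 10, 40, 40))],
)
```

The early return is where the speed-up comes from: defect-free images never reach the region generation network or ROI pooling at all.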
In order to verify the operation performance of the object defect detection model trained by the application, the application uses the torchsummary tool (a tool for calculating information such as the parameters of a model) to compare the trained model with the traditional ResNet18 model adopted in the prior art. Table 1 shows information such as the parameter count of the ResNet18 model output by the torchsummary tool. It can be seen from Table 1 that when the resolution of the input image is 10000x10000 and the Input Size is 1144.41MB, the total parameter count of the ResNet18 model (i.e. Total params/Trainable params in Table 1) is 11689512, and 126.3G of video memory (i.e. Estimated Total Size in Table 1) is required for normal training, which is obviously a configuration that is difficult to achieve.
TABLE 1
(Table 1 is reproduced as an image in the original publication: the torchsummary output for the ResNet18 model, listing each layer's type, output shape and parameter count.)
Table 2 shows information such as the parameter count of the feature extraction sub-network SuperResBackbone of the present application output by the torchsummary tool. Similarly, with an input resolution of 10000x10000, the feature extraction sub-network of the present application requires only 5.4G of video memory (i.e. Estimated Total Size in Table 2) to be trained normally.
TABLE 2
(Table 2 is reproduced as an image in the original publication: the torchsummary output for the SuperResBackbone feature extraction sub-network.)
It should be noted that the information shown in Table 1 and Table 2 is the conventional information output by the torchsummary tool, where Layer (type) indicates the type of the network layer, including the convolutional layer Conv2d, the batch normalization layer BatchNorm2d, the max pooling layer MaxPool2d, the activation function layer ReLU, the adaptive average pooling layer AdaptiveAvgPool2d and the ResNet basic block layer BasicBlock, etc.; Output Shape indicates the output shape; and Param # indicates the parameter count.
The embodiment of the present application provides a training apparatus 900 for an object defect detection model, as shown in fig. 9, the apparatus 900 includes: a first feature extraction unit 910, a first classification unit 920, a first detection unit 930, and an optimization unit 940.
A first feature extraction unit 910, configured to obtain an image to be trained of an object, and perform feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, where the image to be trained includes an image set of a defective image to be trained and a non-defective image to be trained;
a first classification unit 920, configured to classify the defective images to be trained and the feature maps of the non-defective images to be trained through a classification sub-network of an object defect detection model, so as to obtain a classification result and a classification loss value of the images to be trained;
a first defect detecting unit 930, configured to perform defect detection on the defective image to be trained through a defect detecting subnetwork of the object defect detecting model, so as to obtain a defect detection result and a defect detection loss value of the defective image;
and an optimizing unit 940, configured to optimize parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and perform object defect detection based on the trained object defect detection model.
In an embodiment of the present application, the first feature extraction unit 910 is configured to: inputting the image to be trained into the feature extraction sub-network, and sequentially performing 1x1 convolution, 3x3 convolution, global maximum pooling, 3x3 convolution and global maximum pooling to obtain a sub-feature map; performing 3x3 convolution on the sub-feature map once to obtain a first output feature map; performing one-time 3x3 convolution on the first output characteristic diagram to obtain a second output characteristic diagram; performing one-time convolution of 3x3 on the second output characteristic diagram to obtain a third output characteristic diagram; and performing multiple full connections on the first output characteristic diagram, the second output characteristic diagram and the third output characteristic diagram to obtain a fourth output characteristic diagram, and taking the fourth output characteristic diagram as the characteristic diagram of the image to be trained.
In an embodiment of the present application, the first classification unit 920 is configured to: inputting the feature maps of the defective images to be trained and the non-defective images to be trained into a pooling layer of the classification sub-network for global average pooling processing, outputting a global average pooling feature map, performing global maximum pooling processing, and outputting a global maximum pooling feature map; after the global average pooling feature map and the global maximum pooling feature map are subjected to multiple times of full connection, outputting a global pooling feature map; performing Softmax operation on the global pooling feature map to obtain an output vector of the image to be trained, and obtaining a classification result of the image to be trained according to the output vector, wherein the output vector represents a defective confidence coefficient of the image to be trained; and calculating the error between the classification result of the image to be trained and the image label of the image to be trained to obtain the classification loss value of the image to be trained.
In an embodiment of the present application, the first defect detecting unit 930 is configured to: input the feature map of the defective image into the region generation network in the defect detection sub-network to extract candidate frames; input the extracted candidate frames into the region-of-interest pooling layer in the defect detection sub-network, and output the classification and coordinate regression of the candidate frames as the defect detection result of the defective image; and calculate the error between the defect detection result of the defective image and the defect category label and defect position of the defective image to obtain the defect detection loss value of the defective image.
In an embodiment of the present application, the first defect detecting unit 930 is configured to: generate a plurality of anchor frames with different scales for each feature point on the feature map of the defective image; classify each anchor frame using a first fully connected layer in the region generation network to obtain a classification result for each anchor frame, and perform position regression on each anchor frame using a second fully connected layer in the region generation network to obtain a position offset for each anchor frame; and output the candidate frames and position coordinates of the defective image according to the classification result and position offset of each anchor frame.
In an embodiment of the present application, the first defect detecting unit 930 is configured to: mapping the candidate frame to a feature map of the defective image to obtain a region of interest; and inputting the region of interest into a region of interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the region of interest as a defect detection result of the defective image.
An embodiment of the present application provides an object defect detecting apparatus 1000, as shown in fig. 10, the apparatus 1000 includes: a second feature extraction unit 1010, a second classification unit 1020, and a second detection unit 1030.
In an embodiment of the present application, the second feature extraction unit 1010 is configured to obtain an image to be detected, and perform feature extraction on the image to be detected through the feature extraction sub-network of the object defect detection model to obtain a feature map of the image to be detected.
The second classification unit 1020 is configured to classify the feature map of the image to be detected through the classification sub-network of the object defect detection model to obtain a classification result of the image to be detected, where the classification result indicates whether the image is a defective image or a non-defective image.
The second detecting unit 1030 is configured to perform defect detection on a defective image through the defect detection sub-network of the object defect detection model to obtain the defect detection result of the image to be detected.
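The gating performed by units 1010–1030 — classify first, run the detection sub-network only on images judged defective — can be sketched as follows. This is an illustrative, non-claimed sketch: all three functions are hypothetical stand-ins for the model's sub-networks, not the patent's implementation.

```python
# Sketch of the classify-then-detect gating performed by units 1010-1030.
# All functions here are hypothetical stand-ins for the sub-networks.

def extract_features(image):
    # Stand-in for the feature extraction sub-network (unit 1010).
    return [[sum(row) for row in image]]

def classify(feature_map, threshold=0.5):
    # Stand-in for the classification sub-network (unit 1020):
    # returns True when the image is judged defective.
    score = sum(feature_map[0]) / (len(feature_map[0]) or 1)
    return score > threshold

def detect_defects(feature_map):
    # Stand-in for the defect detection sub-network (unit 1030).
    return [{"class": "scratch", "box": (0, 0, 4, 4)}]

def run_pipeline(image):
    feats = extract_features(image)
    if not classify(feats):
        # Non-defective images skip the costlier detection stage entirely.
        return {"defective": False, "defects": []}
    return {"defective": True, "defects": detect_defects(feats)}
```

The point of the gating is that only images classified as defective pay the cost of region proposal and region-of-interest pooling.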
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general-purpose devices may also be used with the teachings herein; the structure required to construct such devices will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and the descriptions of specific languages above are provided to disclose the best mode of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments but not other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in an object defect detection apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 11 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 1100 comprises a processor 1110 and a memory 1120 arranged to store computer-executable instructions (computer-readable program code). The memory 1120 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. The memory 1120 has a storage space 1130 storing computer-readable program code 1131 for performing any of the method steps described above. For example, the storage space 1130 may include respective pieces of computer-readable program code 1131 for implementing the various steps of the above methods. The computer-readable program code 1131 may be read from or written to one or more computer program products, which comprise a program code carrier such as a hard disk, a compact disc (CD), a memory card, or a floppy disk. Such a computer program product is typically a computer-readable storage medium such as that shown in fig. 12. FIG. 12 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer-readable storage medium 1200 stores computer-readable program code 1131 for performing the steps of the method according to the present application, readable by the processor 1110 of the electronic device 1100. When the computer-readable program code 1131 is executed by the electronic device 1100, it causes the electronic device 1100 to perform the steps of the method described above; in particular, the computer-readable program code 1131 stored on the computer-readable storage medium may perform the method shown in any of the embodiments described above. The computer-readable program code 1131 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names.

Claims (11)

1. A training method of an object defect detection model is characterized by comprising the following steps:
acquiring an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, wherein the images to be trained comprise a set of defective images to be trained and non-defective images to be trained;
classifying the feature maps of the defective images to be trained and the non-defective images to be trained through a classification sub-network of the object defect detection model to obtain a classification result and a classification loss value of the images to be trained;
performing defect detection on the defective image to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective image;
and optimizing parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and performing object defect detection based on the trained object defect detection model.
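The final training step sums the classification loss and the defect detection loss and updates the model parameters by gradient descent. The toy sketch below (not part of the claims; both loss functions are arbitrary quadratic stand-ins and the single parameter `w` replaces the model's weights) illustrates how minimizing the combined loss drives a parameter toward a compromise between the two objectives:

```python
# Toy illustration of joint optimization over a combined loss:
# total loss = classification loss + detection loss, minimized by
# plain gradient descent on a single scalar parameter w.

def classification_loss(w):
    return (w - 1.0) ** 2          # stand-in loss, minimized at w = 1

def detection_loss(w):
    return (w - 3.0) ** 2          # stand-in loss, minimized at w = 3

def grad_total(w, eps=1e-6):
    # Numerical gradient of the combined loss.
    f = lambda x: classification_loss(x) + detection_loss(x)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w, lr = 0.0, 0.1
for _ in range(200):
    w -= lr * grad_total(w)        # gradient descent update

# The combined loss 2(w-2)^2 + 2 is minimized at the compromise w = 2.
```

In the actual model both losses would be backpropagated through the shared feature extraction sub-network, so the same update balances the classification and detection objectives.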
2. The method for training an object defect detection model according to claim 1, wherein the performing feature extraction on the image to be trained through the feature extraction sub-network of the object defect detection model comprises:
inputting the image to be trained into the feature extraction sub-network, and sequentially performing a 1x1 convolution, a 3x3 convolution, global maximum pooling, a 3x3 convolution and global maximum pooling to obtain a sub-feature map;
performing one 3x3 convolution on the sub-feature map to obtain a first output feature map;
performing one 3x3 convolution on the first output feature map to obtain a second output feature map;
performing one 3x3 convolution on the second output feature map to obtain a third output feature map;
and performing multiple full connections on the first output feature map, the second output feature map and the third output feature map to obtain a fourth output feature map, and taking the fourth output feature map as the feature map of the image to be trained.
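The spatial shapes flowing through claim 2's sequence can be traced with a small sketch. This is an illustrative reading, not the claimed implementation: it assumes the "global maximum pooling" steps inside the stack behave as ordinary 2x2 max pooling with stride 2 (otherwise the later convolutions would operate on a 1x1 map), 'same'-padded convolutions, and an arbitrary 224x224 input.

```python
# Shape trace of the claim-2 feature-extraction sequence, under the
# assumptions stated above (2x2/stride-2 pooling, 'same' convolutions,
# illustrative 224x224 input).

def conv_same(h, w):
    return h, w                    # 'same'-padded convolution keeps H x W

def maxpool2(h, w):
    return h // 2, w // 2          # 2x2 max pooling with stride 2

h, w = 224, 224
h, w = conv_same(h, w)             # 1x1 convolution
h, w = conv_same(h, w)             # 3x3 convolution
h, w = maxpool2(h, w)              # pooling
h, w = conv_same(h, w)             # 3x3 convolution
h, w = maxpool2(h, w)              # pooling -> sub-feature map
sub = (h, w)                       # 56 x 56 under these assumptions

first = conv_same(*sub)            # first output feature map
second = conv_same(*first)         # second output feature map
third = conv_same(*second)         # third output feature map
# The three output maps are then fused by fully connected layers into
# the fourth output feature map used as the image's feature map.
```

The overall downsampling factor (4x here) determines the stride used later when candidate boxes are mapped back onto the feature map.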
3. The method for training the object defect detection model according to claim 1, wherein the classifying the feature maps of the defective image to be trained and the non-defective image to be trained through the classification sub-network of the object defect detection model to obtain the classification result and the classification loss value of the image to be trained comprises:
inputting the feature maps of the defective images to be trained and the non-defective images to be trained into a pooling layer of the classification sub-network for global average pooling processing, outputting a global average pooling feature map, performing global maximum pooling processing, and outputting a global maximum pooling feature map;
performing multiple full connections on the global average pooling feature map and the global maximum pooling feature map, and outputting a global pooling feature map;
performing Softmax operation on the global pooling feature map to obtain an output vector of the image to be trained, and obtaining a classification result of the image to be trained according to the output vector, wherein the output vector represents a defective confidence coefficient of the image to be trained;
and calculating the error between the classification result of the image to be trained and the image label of the image to be trained to obtain the classification loss value of the image to be trained.
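The classification head of claim 3 can be sketched numerically on a tiny single-channel "feature map". This is an illustrative sketch only: the patent's learned fully connected fusion is replaced here by simple concatenation of the two pooled values into per-class logits, which is a hypothetical simplification.

```python
import math

# Sketch of claim 3's classification head on a 2x2 toy feature map:
# global average pooling, global maximum pooling, a stand-in fusion
# step, then a two-way Softmax over [non-defective, defective] scores.

feature_map = [[0.2, 0.8],
               [0.4, 0.6]]

flat = [v for row in feature_map for v in row]
gap = sum(flat) / len(flat)        # global average pooling -> 0.5
gmp = max(flat)                    # global maximum pooling -> 0.8

# Hypothetical fusion producing per-class logits; the real model uses
# learned fully connected layers here.
logits = [gap, gmp]

exps = [math.exp(z) for z in logits]
probs = [e / sum(exps) for e in exps]   # Softmax output vector
defective_confidence = probs[1]
```

The output vector sums to 1, and its second component plays the role of the "defective confidence" compared against the image label when computing the classification loss.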
4. The method for training an object defect detection model according to claim 1, wherein the performing defect detection on the defective image to be trained through the defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective image comprises:
inputting the feature map of the defective image into a region proposal network in the defect detection sub-network, and extracting candidate boxes;
inputting the extracted candidate boxes into a region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the candidate boxes as the defect detection result of the defective image;
and calculating the error between the defect detection result of the defective image and the defect type label and defect position of the defective image to obtain the defect detection loss value of the defective image.
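The per-box error of claim 4 combines a classification error against the defect type label with a position error against the ground-truth box. The sketch below uses cross-entropy for the former and smooth L1 for the latter — the forms commonly paired with region-based detectors, chosen here as an assumption since the claim itself does not name the loss functions:

```python
import math

# Sketch of a per-box detection loss: classification error (cross-
# entropy over defect types) plus position error (smooth L1 over the
# four box coordinates). The specific loss forms are assumptions.

def smooth_l1(pred, target):
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic for small errors, linear for large ones.
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def cross_entropy(probs, label):
    return -math.log(probs[label])

pred_box = [10.0, 12.0, 50.0, 48.0]     # predicted x1, y1, x2, y2
gt_box = [10.5, 12.0, 52.0, 48.0]       # ground-truth defect position
class_probs = [0.10, 0.85, 0.05]        # scores over defect types
gt_class = 1                            # defect type label

loss = cross_entropy(class_probs, gt_class) + smooth_l1(pred_box, gt_box)
```

Summed (or weighted-summed) over all candidate boxes, this gives the defect detection loss value fed into the joint optimization of claim 1.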
5. The method for training an object defect detection model according to claim 4, wherein the inputting the feature map of the defective image into the region proposal network in the defect detection sub-network and extracting candidate boxes comprises:
generating a plurality of anchor boxes of different scales for each feature point on the feature map of the defective image;
classifying each anchor box by using a first fully connected layer in the region proposal network to obtain a classification result of each anchor box, and performing position regression on each anchor box by using a second fully connected layer in the region proposal network to obtain a position offset of each anchor box;
and outputting the candidate boxes and their position coordinates for the defective image according to the classification result and position offset of each anchor box.
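The anchor-generation step of claim 5 can be sketched as follows. This is illustrative only: the feature-map stride of 16, the scale set, and the fixed 1:1 aspect ratio are assumptions (region proposal networks typically also vary the aspect ratio).

```python
# Sketch of claim 5's anchor generation: several anchor boxes of
# different scales centered on each feature point. Stride and scales
# are illustrative assumptions.

def anchors_for_point(fx, fy, scales=(8, 16, 32), stride=16):
    cx, cy = fx * stride, fy * stride  # feature point -> image coords
    boxes = []
    for s in scales:
        half = s * stride / 2.0        # half side length in pixels
        boxes.append((cx - half, cy - half, cx + half, cy + half))
    return boxes

# Three anchors of increasing size centered on feature point (2, 3):
anchors = anchors_for_point(2, 3)
```

Each anchor is then scored (defect vs. background) and refined by the predicted position offset; the surviving boxes become the candidate boxes passed to region-of-interest pooling.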
6. The method for training an object defect detection model according to claim 4, wherein the inputting the extracted candidate boxes into the region-of-interest pooling layer in the defect detection sub-network and outputting the classification and coordinate regression of the candidate boxes as the defect detection result of the defective image comprises:
mapping the candidate boxes to the feature map of the defective image to obtain regions of interest;
and inputting the regions of interest into the region-of-interest pooling layer in the defect detection sub-network, and outputting the classification and coordinate regression of the regions of interest as the defect detection result of the defective image.
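The mapping-and-pooling step of claim 6 can be sketched on a toy feature map. The stride of 16 and the fixed 2x2 pooled output are illustrative assumptions, and the pooling here is a deliberately simplified max-pool version of region-of-interest pooling:

```python
# Sketch of claim 6: map a candidate box from image coordinates onto
# the feature map (divide by the cumulative stride), then pool the
# region of interest into a fixed-size grid.

def map_box_to_feature(box, stride=16):
    x1, y1, x2, y2 = box
    return (x1 // stride, y1 // stride, x2 // stride, y2 // stride)

def roi_pool(feature, roi, out=2):
    # Max-pool the ROI into an out x out grid (simplified ROI pooling).
    x1, y1, x2, y2 = [int(v) for v in roi]
    h, w = max(y2 - y1, out), max(x2 - x1, out)
    result = []
    for i in range(out):
        row = []
        for j in range(out):
            ys = range(y1 + i * h // out, y1 + (i + 1) * h // out)
            xs = range(x1 + j * w // out, x1 + (j + 1) * w // out)
            row.append(max(feature[y][x] for y in ys for x in xs))
        result.append(row)
    return result

feature = [[y * 10 + x for x in range(8)] for y in range(8)]
roi = map_box_to_feature((32, 32, 96, 96))  # -> (2, 2, 6, 6) on the map
pooled = roi_pool(feature, roi)             # fixed 2x2 output
```

The fixed-size pooled output is what allows candidate boxes of arbitrary size to feed the subsequent classification and coordinate-regression layers.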
7. A method for detecting defects in an object, comprising:
acquiring a target image of an object to be detected, and performing feature extraction on the target image through a feature extraction sub-network of an object defect detection model to obtain a feature map of the target image;
classifying the feature map through a classification sub-network of an object defect detection model to obtain a classification result, wherein the classification result comprises that the target image is a defective image or a non-defective image;
if the target image is a defective image, performing defect detection on the target image through a defect detection sub-network of the object defect detection model to obtain a defect detection result of the target image of the object to be detected, and feeding back defect information of the object to be detected based on the defect detection result;
if the target image is a non-defective image, directly feeding back the defect detection result as non-defective;
wherein the object defect detection model is trained based on the training method of an object defect detection model according to any one of claims 1 to 6.
8. A training device for an object defect detection model is characterized by comprising:
the first feature extraction unit is used for acquiring an image to be trained of an object, and performing feature extraction on the image to be trained through a feature extraction sub-network of an object defect detection model to obtain a feature map of the image to be trained, wherein the images to be trained comprise a set of defective images to be trained and non-defective images to be trained;
the first classification unit is used for classifying the feature maps of the defective images to be trained and the non-defective images to be trained through a classification sub-network of the object defect detection model to obtain a classification result and a classification loss value of the images to be trained;
the first defect detection unit is used for carrying out defect detection on the defective image to be trained through a defect detection sub-network of the object defect detection model to obtain a defect detection result and a defect detection loss value of the defective image;
and the optimization unit is used for optimizing the parameters of the object defect detection model by using a gradient descent algorithm according to the classification loss value and the defect detection loss value to obtain a trained object defect detection model, and performing object defect detection based on the trained object defect detection model.
9. An object defect detecting apparatus, comprising:
the second feature extraction unit is used for acquiring a target image of the object to be detected and extracting features of the target image through a feature extraction sub-network of the object defect detection model to obtain a feature map of the target image;
the second classification unit is used for classifying the feature map through a classification sub-network of the object defect detection model to obtain a classification result, wherein the classification result comprises that the target image is a defective image or a non-defective image;
the second defect detection unit is used for, if the target image is a defective image, performing defect detection on the target image through a defect detection sub-network of the object defect detection model to obtain a defect detection result of the target image of the object to be detected, and feeding back defect information of the object to be detected based on the defect detection result; and if the target image is a non-defective image, directly feeding back the defect detection result as non-defective; wherein the object defect detection model is trained by the training device for an object defect detection model according to claim 8.
10. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method according to any one of claims 1 to 7.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any one of claims 1 to 7.
CN202011210102.8A 2020-11-03 2020-11-03 Training method of object defect detection model, object defect detection method and device Pending CN112348787A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011210102.8A CN112348787A (en) 2020-11-03 2020-11-03 Training method of object defect detection model, object defect detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011210102.8A CN112348787A (en) 2020-11-03 2020-11-03 Training method of object defect detection model, object defect detection method and device

Publications (1)

Publication Number Publication Date
CN112348787A true CN112348787A (en) 2021-02-09

Family

ID=74356317

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011210102.8A Pending CN112348787A (en) 2020-11-03 2020-11-03 Training method of object defect detection model, object defect detection method and device

Country Status (1)

Country Link
CN (1) CN112348787A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686344A (en) * 2021-03-22 2021-04-20 浙江啄云智能科技有限公司 Detection model for rapidly filtering background picture and training method thereof
CN113096088A (en) * 2021-04-07 2021-07-09 浙江大学 Concrete structure detection method based on deep learning
CN113160200A (en) * 2021-04-30 2021-07-23 聚时科技(上海)有限公司 Industrial image defect detection method and system based on multitask twin network
CN113409290A (en) * 2021-06-29 2021-09-17 北京兆维电子(集团)有限责任公司 Method and device for detecting appearance defects of liquid crystal display and storage medium
CN113469997A (en) * 2021-07-19 2021-10-01 京东科技控股股份有限公司 Method, device, equipment and medium for detecting plane glass
CN113588562A (en) * 2021-09-30 2021-11-02 高视科技(苏州)有限公司 Lithium battery appearance detection method applying multi-axis mechanical arm
CN113781415A (en) * 2021-08-30 2021-12-10 广州大学 Defect detection method, device, equipment and medium for X-ray image
CN115965856A (en) * 2023-02-23 2023-04-14 深圳思谋信息科技有限公司 Image detection model construction method and device, computer equipment and storage medium
CN116071348A (en) * 2023-03-02 2023-05-05 深圳市捷牛智能装备有限公司 Workpiece surface detection method and related device based on visual detection

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686344B (en) * 2021-03-22 2021-07-02 浙江啄云智能科技有限公司 Detection model for rapidly filtering background picture and training method thereof
CN112686344A (en) * 2021-03-22 2021-04-20 浙江啄云智能科技有限公司 Detection model for rapidly filtering background picture and training method thereof
CN113096088A (en) * 2021-04-07 2021-07-09 浙江大学 Concrete structure detection method based on deep learning
CN113160200A (en) * 2021-04-30 2021-07-23 聚时科技(上海)有限公司 Industrial image defect detection method and system based on multitask twin network
CN113160200B (en) * 2021-04-30 2024-04-12 聚时科技(上海)有限公司 Industrial image defect detection method and system based on multi-task twin network
CN113409290B (en) * 2021-06-29 2023-12-15 北京兆维电子(集团)有限责任公司 Method and device for detecting appearance defects of liquid crystal display, and storage medium
CN113409290A (en) * 2021-06-29 2021-09-17 北京兆维电子(集团)有限责任公司 Method and device for detecting appearance defects of liquid crystal display and storage medium
CN113469997A (en) * 2021-07-19 2021-10-01 京东科技控股股份有限公司 Method, device, equipment and medium for detecting plane glass
CN113469997B (en) * 2021-07-19 2024-02-09 京东科技控股股份有限公司 Method, device, equipment and medium for detecting plane glass
CN113781415A (en) * 2021-08-30 2021-12-10 广州大学 Defect detection method, device, equipment and medium for X-ray image
CN113588562A (en) * 2021-09-30 2021-11-02 高视科技(苏州)有限公司 Lithium battery appearance detection method applying multi-axis mechanical arm
CN115965856B (en) * 2023-02-23 2023-05-30 深圳思谋信息科技有限公司 Image detection model construction method, device, computer equipment and storage medium
CN115965856A (en) * 2023-02-23 2023-04-14 深圳思谋信息科技有限公司 Image detection model construction method and device, computer equipment and storage medium
CN116071348A (en) * 2023-03-02 2023-05-05 深圳市捷牛智能装备有限公司 Workpiece surface detection method and related device based on visual detection

Similar Documents

Publication Publication Date Title
CN112348787A (en) Training method of object defect detection model, object defect detection method and device
CN107230203B (en) Casting defect identification method based on human eye visual attention mechanism
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
CN112233067A (en) Hot rolled steel coil end face quality detection method and system
CN111539957B (en) Image sample generation method, system and detection method for target detection
CN109919145B (en) Mine card detection method and system based on 3D point cloud deep learning
CN112766110A (en) Training method of object defect recognition model, object defect recognition method and device
CN106355579A (en) Defect detecting method of cigarette carton surface wrinkles
CN111160301A (en) Tunnel disease target intelligent identification and extraction method based on machine vision
CN112819858B (en) Target tracking method, device, equipment and storage medium based on video enhancement
CN112257711B (en) Method for detecting damage fault of railway wagon floor
CN114140665A (en) Dense small target detection method based on improved YOLOv5
CN114693657A (en) Intelligent detection method and system for multi-size and multi-category defects on surface of large complex structural member based on Faster R-CNN
CN111723814A (en) Cross-image association based weak supervision image semantic segmentation method, system and device
CN115937203A (en) Visual detection method, device, equipment and medium based on template matching
CN111814852A (en) Image detection method, image detection device, electronic equipment and computer-readable storage medium
CN110458019B (en) Water surface target detection method for eliminating reflection interference under scarce cognitive sample condition
CN109829421B (en) Method and device for vehicle detection and computer readable storage medium
CN115170804A (en) Surface defect detection method, device, system and medium based on deep learning
CN111178405A (en) Similar object identification method fusing multiple neural networks
CN115908988B (en) Defect detection model generation method, device, equipment and storage medium
CN111179278B (en) Image detection method, device, equipment and storage medium
CN112257506A (en) Fruit and vegetable size identification method and device, electronic equipment and computer readable medium
CN116205879A (en) Unmanned aerial vehicle image and deep learning-based wheat lodging area estimation method
CN113763384B (en) Defect detection method and defect detection device in industrial quality inspection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination