Disclosure of Invention
The invention aims to solve at least one of the defects of the prior art by providing a detection method and a detection system suitable for highlighting surface defects on various types of ceramic tiles.
To achieve this object, the invention adopts the following technical solution:
Specifically, the detection method suitable for highlighting surface defects on various types of tiles comprises the following steps:
acquiring a first image captured by a first camera and a second image captured by a second camera, wherein the first camera is a color line-scan camera arranged perpendicular to the photographed plane, and the second camera is a black-and-white line-scan camera arranged obliquely to the photographed plane;
performing an image preprocessing operation on each of the first image and the second image to obtain a positioned first image and a positioned second image; and
inputting the positioned first image and second image into a pre-established neural network model, which outputs feature maps at three scales, each feature map containing the offset information of each anchor box and the corresponding predicted confidence, and selecting the bounding boxes whose confidence is greater than a first threshold as the final predicted defects.
Further, specifically, the image preprocessing operation comprises:
filtering and denoising the first image and the second image respectively, and then performing Harris corner detection to locate the tile in each of the two images, thereby obtaining the positioned first image and second image.
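The preprocessing step can be illustrated with a minimal NumPy sketch. It assumes a Gaussian filter as the denoising filter (the text does not name one), the commonly used Harris constant k = 0.04, and a tile that contrasts with its background, so that the bounding box of strong corner responses approximates the tile region. The function names are hypothetical.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalised 1-D Gaussian kernel with a radius of 3*sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()


def smooth(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian filtering (assumed denoising filter), reflected borders."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def harris_response(img: np.ndarray, sigma: float = 1.0, k: float = 0.04) -> np.ndarray:
    """Harris measure R = det(M) - k * trace(M)^2 of the smoothed structure tensor M."""
    gy, gx = np.gradient(img)  # derivatives along rows (y) and columns (x)
    ixx = smooth(gx * gx, sigma)
    iyy = smooth(gy * gy, sigma)
    ixy = smooth(gx * gy, sigma)
    return ixx * iyy - ixy ** 2 - k * (ixx + iyy) ** 2


def locate_tile(img: np.ndarray, rel_thresh: float = 0.1):
    """Return (top, bottom, left, right) of the tile, bounds inclusive, or None."""
    denoised = smooth(img.astype(float), sigma=1.0)
    r = harris_response(denoised)
    # corners give large positive R; straight edges give negative R
    ys, xs = np.nonzero(r > rel_thresh * r.max())
    if ys.size == 0:
        return None
    return int(ys.min()), int(ys.max()), int(xs.min()), int(xs.max())
```

For a color image, the box would be computed on a grayscale version and then used to crop both images, e.g. `img[top:bottom + 1, left:right + 1]`. A production system would more likely use a library routine such as OpenCV's Harris detector; plain NumPy is used here only to keep the sketch self-contained.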
Further, the method may further comprise:
before the positioned first and second images are input into the pre-established neural network model, uniformly resizing them so that their width and height are integer multiples of 416 pixels, and then cutting each of them into a plurality of small 416x416-pixel images.
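The resizing and cutting step can be sketched as follows. The text does not state the interpolation method or how a side is rounded to a multiple of 416, so this sketch assumes rounding to the nearest multiple (at least one) and nearest-neighbour resampling in plain NumPy; the function names are hypothetical.

```python
import numpy as np

TILE = 416  # side length of each small image, as in the description


def nearest_multiple(n: int, base: int = TILE) -> int:
    """Round n to the nearest positive multiple of base (assumed rounding rule)."""
    return max(base, int(round(n / base)) * base)


def resize_nearest(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows[:, None], cols[None, :]]


def cut_into_tiles(img: np.ndarray):
    """Resize to multiples of TILE, then cut into TILE x TILE crops.

    Returns a list of ((top, left), crop) pairs in row-major order; the
    offsets are kept only for convenience, e.g. to locate a crop later.
    """
    h, w = img.shape[:2]
    big_h, big_w = nearest_multiple(h), nearest_multiple(w)
    resized = resize_nearest(img, big_h, big_w)
    return [
        ((top, left), resized[top:top + TILE, left:left + TILE])
        for top in range(0, big_h, TILE)
        for left in range(0, big_w, TILE)
    ]
```

For a 1000 x 1500 image, this rule gives an 832 x 1664 resized image and eight 416 x 416 crops.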
Further, specifically, the pre-established neural network model comprises a feature extraction module for extracting feature information from the positioned first and second images, the modules being connected as follows:
the output of the feature extraction module is connected to the input of a first RESn module;
the output of the first RESn module is connected to the input of a second RESn module and to an input of a second concat module;
the output of the second RESn module is connected to the input of a first DBL module and to an input of a first concat module;
the output of the first DBL module is connected to the input of a first output convolution layer conv and to the input of a second DBL module, whose output is up-sampled;
the output of the second DBL module is connected to an input of the first concat module;
the output of the first concat module is connected to the input of a third DBL module;
the output of the third DBL module is connected to the input of a second output convolution layer conv and to the input of a fourth DBL module, whose output is up-sampled;
the output of the fourth DBL module is connected to an input of the second concat module;
the output of the second concat module is connected to the input of a fifth DBL module;
the output of the fifth DBL module is connected to the input of a third output convolution layer conv;
and finally, the first, second and third output convolution layers conv each output one feature map, giving feature maps at three scales.
The DBL module consists of a convolution layer, a batch normalization (BN) layer and a Leaky ReLU activation layer; the concat module concatenates (splices) feature maps; RESn is a residual structure with skip connections, where n is the number of res_unit structures contained in the res_block; and each res_unit consists of two DBL modules with a skip connection around them, the first DBL module using a 1x1 convolution kernel and the second a 3x3 convolution kernel.
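To make these building blocks concrete, the following NumPy sketch implements a DBL module, a res_unit (1x1 DBL, then 3x3 DBL, plus the skip connection), a RESn block as n stacked res_units, and the up-sampling and concat operations. It is an illustrative forward pass only, under stated assumptions: a Leaky ReLU slope of 0.1 and a 1x1 layer that halves the channel count (Darknet conventions, not stated in the text), batch normalization using the statistics of the single input with unit scale and zero shift, and no stride-2 down-sampling layer at the start of each block.

```python
import numpy as np


def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1 'same' convolution (cross-correlation, as in CNN libraries).

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    k = w.shape[2]
    p = k // 2
    h, wd = x.shape[1:]
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding keeps H and W
    out = np.zeros((w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            # add the contribution of kernel tap (i, j) at every output position
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel normalisation over H and W (scale 1, shift 0 for simplicity)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def leaky_relu(x: np.ndarray, slope: float = 0.1) -> np.ndarray:
    return np.where(x > 0, x, slope * x)


def dbl(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """DBL module: convolution -> batch normalisation -> Leaky ReLU."""
    return leaky_relu(batch_norm(conv2d(x, w)))


def res_unit(x: np.ndarray, w1x1: np.ndarray, w3x3: np.ndarray) -> np.ndarray:
    """Two DBL modules (1x1 then 3x3) plus the skip connection."""
    return x + dbl(dbl(x, w1x1), w3x3)


def res_block(x: np.ndarray, weights) -> np.ndarray:
    """RESn: n res_units in sequence; weights is a list of n (w1x1, w3x3) pairs."""
    for w1x1, w3x3 in weights:
        x = res_unit(x, w1x1, w3x3)
    return x


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbour 2x up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def concat(*maps: np.ndarray) -> np.ndarray:
    """concat module: join feature maps of equal H and W along the channel axis."""
    return np.concatenate(maps, axis=0)
```

For example, calling `res_block(x, weights)` with eight `(w1x1, w3x3)` pairs corresponds to a RES8 module, and `concat(upsample2x(deep), shallow)` joins an up-sampled deep map with a shallower map of twice its resolution.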
Further, specifically, the neural network model is obtained by training as follows:
the loss function is the same as that of the YOLOv3 network; training uses the Adam optimizer (a gradient-descent method) with a learning rate of 0.001 for 10 epochs; the validation accuracy is recorded, and training is stopped once the recall on the tile validation set exceeds 97%; the model is then tested, and if the recall on the tile test set exceeds 95%, model training is complete.
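The stopping rule can be expressed as a small control loop. In this sketch, the training step (for example one Adam epoch at a learning rate of 0.001 with the YOLOv3 loss) and the two recall evaluations are passed in as hypothetical callables, because the text does not describe a concrete API; recall is taken as TP / (TP + FN). The text does not say what happens if the 97% target is not reached within 10 epochs, so the sketch simply tests the model after the last epoch.

```python
from typing import Callable, Tuple


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of real defects that were detected: TP / (TP + FN)."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def train_with_recall_stop(
    train_one_epoch: Callable[[int], None],
    validation_recall: Callable[[], float],
    test_recall: Callable[[], float],
    max_epochs: int = 10,
    stop_at: float = 0.97,
    accept_at: float = 0.95,
) -> Tuple[bool, int]:
    """Return (model accepted?, number of epochs run)."""
    epochs_run = 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        epochs_run = epoch + 1
        if validation_recall() >= stop_at:
            break  # validation recall target reached: stop training
    # final acceptance check on held-out test data
    return test_recall() >= accept_at, epochs_run
```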
Further, specifically, the anchor boxes are obtained a priori by k-means clustering on the ceramic tile training data set, and the position transformation is as follows:
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}$$
$$b_h = p_h e^{t_h}$$
where $t_x$, $t_y$ are the predicted coordinate offsets; $t_w$, $t_h$ are the predicted scaling factors; $p_w$, $p_h$ are the width and height of the preset anchor box mapped onto the feature map; $c_x$, $c_y$ are the offsets of the output grid cell from the upper-left corner of the image, measured in grid cells; $b_x$, $b_y$, $b_w$, $b_h$ are the centre coordinates, width and height of the predicted box; and $\sigma(\cdot)$ is the sigmoid function.
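Anchor clustering can be sketched as below. The text only says that k-means is used; this sketch assumes the 1 - IoU distance between (width, height) pairs that is customary for YOLO anchors, k = 9 anchors (three per output scale, as in YOLOv3), cluster means as centroids, and a deterministic initialization at evenly spaced area quantiles. The function names are hypothetical.

```python
import numpy as np


def iou_wh(boxes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """IoU of (w, h) pairs as if all boxes shared the same centre. Shape (N, K)."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0])
             * np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_c = centroids[:, 0] * centroids[:, 1]
    return inter / (area_b[:, None] + area_c[None, :] - inter)


def kmeans_anchors(wh, k: int = 9, iters: int = 100) -> np.ndarray:
    """Cluster labelled defect box sizes (w, h) into k anchors, sorted by area."""
    wh = np.asarray(wh, dtype=float)
    order = np.argsort(wh[:, 0] * wh[:, 1])
    # deterministic start: boxes at evenly spaced area quantiles
    centroids = wh[order[np.linspace(0, len(wh) - 1, k).astype(int)]]
    for _ in range(iters):
        # smallest distance 1 - IoU is the same as largest IoU
        assign = np.argmax(iou_wh(wh, centroids), axis=1)
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids[:, 0] * centroids[:, 1])]
```

Under the YOLOv3 convention, the three smallest anchors would be assigned to the 52x52 map and the three largest to the 13x13 map.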
Further, the three output feature maps have sizes of 13x13, 26x26 and 52x52 respectively: the 13x13 feature map has the largest receptive field and is used to predict large defects; the 26x26 feature map has a medium receptive field and is used to predict medium-sized defects; and the 52x52 feature map has the smallest receptive field and is used to predict small defects.
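The three grid sizes follow from the 416x416 input and the YOLOv3 down-sampling factors of 32, 16 and 8. The sketch below computes them, together with the channel depth of each output under the YOLOv3 convention of three anchors per scale and 5 + C values per anchor (four box offsets, one confidence and C class scores); the number of defect classes is a hypothetical parameter, since the text does not give it.

```python
def output_shapes(input_size: int = 416, num_classes: int = 1, anchors_per_scale: int = 3):
    """Map each stride to the (grid, grid, channels) shape of its output map."""
    channels = anchors_per_scale * (5 + num_classes)
    return {stride: (input_size // stride, input_size // stride, channels)
            for stride in (32, 16, 8)}


# Grid spacing in input pixels equals the stride: 32 px for 13x13, 8 px for 52x52.
print(output_shapes(num_classes=4))
# {32: (13, 13, 27), 16: (26, 26, 27), 8: (52, 52, 27)}
```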
The invention also provides a detection system suitable for highlighting surface defects on various types of ceramic tiles, comprising:
an image acquisition module for acquiring a first image captured by a first camera and a second image captured by a second camera, wherein the first camera is a color line-scan camera arranged perpendicular to the photographed plane, and the second camera is a black-and-white line-scan camera arranged obliquely to the photographed plane;
an image preprocessing module for performing an image preprocessing operation on each of the first image and the second image to obtain a positioned first image and a positioned second image; and
a detection module for inputting the positioned first image and second image into a pre-established neural network model, which outputs feature maps at three scales, each feature map containing the offset information of each anchor box and the corresponding predicted confidence, and for selecting the bounding boxes whose confidence is greater than a first threshold as the final predicted defects.
Further, specifically, the pre-established neural network model used by the detection module has the same structure as that described above for the detection method: a feature extraction module, a first and a second RESn module, first to fifth DBL modules, a first and a second concat module and three output convolution layers conv, connected so that the three output convolution layers conv output feature maps at three scales, where each DBL module consists of a convolution layer, a BN layer and a Leaky ReLU activation layer, and each res_unit consists of a 1x1 DBL module and a 3x3 DBL module joined by a skip connection.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method described above.
The beneficial effects of the invention are as follows:
The method of the invention acquires tile images with two cameras; fusing the two images makes the method suitable for tiles with various complex textures and highlights inconspicuous defects on the tile surface. A convolutional neural network is trained on samples so that it learns to recognize defects, enabling high-precision detection of ceramic tile defects. The detection system is applicable to many kinds of tiles, which gives it broad detection versatility. Using deep learning for defect detection reduces labor costs and improves detection efficiency.
Detailed Description
The conception, specific structure and technical effects of the present invention are described clearly and completely below in conjunction with the embodiments and the accompanying drawings, so that the objects, solutions and effects of the present invention can be fully understood. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with each other. The same reference numbers are used throughout the drawings to refer to the same or like parts.
Referring to Figs. 1 and 5, in Embodiment 1 the present invention provides a detection method for highlighting surface defects on various types of tiles, comprising the following steps:
step 110, acquiring a first image captured by a first camera and a second image captured by a second camera, wherein the first camera is a color line-scan camera arranged perpendicular to the photographed plane, and the second camera is a black-and-white line-scan camera arranged obliquely to the photographed plane;
step 120, performing an image preprocessing operation on each of the first image and the second image to obtain a positioned first image and a positioned second image;
step 130, inputting the positioned first image and second image into a pre-established neural network model, which outputs feature maps at three scales, each feature map containing the offset information of each anchor box and the corresponding predicted confidence, and selecting the bounding boxes whose confidence is greater than a first threshold as the final predicted defects.
The first threshold is set manually, and is set to 0.6 in the present embodiment.
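Selecting the final defects from the three output maps can be sketched as a simple filter. The `Detection` record and its fields are hypothetical, and boxes from the 13x13, 26x26 and 52x52 maps are assumed to have already been decoded into a common coordinate frame. Only the strict comparison with the first threshold (0.6 in this embodiment) comes from the text; further steps such as non-maximum suppression are not described and are left out.

```python
from typing import Iterable, List, NamedTuple

FIRST_THRESHOLD = 0.6  # value used in this embodiment


class Detection(NamedTuple):
    x: float           # box centre, pixels
    y: float
    w: float           # box width and height, pixels
    h: float
    confidence: float
    label: str         # defect category


def select_defects(per_scale: Iterable[List[Detection]],
                   threshold: float = FIRST_THRESHOLD) -> List[Detection]:
    """Keep boxes from all output scales whose confidence exceeds the threshold."""
    kept = [d for scale in per_scale for d in scale if d.confidence > threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)
```

Note that a box scored exactly 0.6 is discarded, because the text requires the confidence to be greater than the threshold.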
Specifically, two cameras are used to photograph various tiles: the color camera captures color defects such as stains and glaze drips, while the black-and-white camera captures relief (concave-convex) defects such as shallow cracks and small pinholes. The two images are preprocessed and divided into small images, which are input into the neural network architecture for training; by exploiting the contrasting characteristics of the two images, the network can quickly locate tile defects. Finally, the trained network is used to detect defects on new tiles.
As a preferred embodiment of the present invention, specifically, the image preprocessing operation comprises:
filtering and denoising the first image and the second image respectively, and then performing Harris corner detection to locate the tile in each of the two images, thereby obtaining the positioned first image and second image.
As a preferred embodiment of the present invention, the method further comprises:
before the positioned first and second images are input into the pre-established neural network model, uniformly resizing them so that their width and height are integer multiples of 416 pixels, and then cutting each of them into a plurality of small 416x416-pixel images.
Referring to Figs. 2, 3 and 4, as a preferred embodiment of the present invention, specifically, the pre-established neural network model comprises a feature extraction module for extracting feature information from the positioned first and second images, the modules being connected as follows:
the output of the feature extraction module is connected to the input of a first RESn module;
the output of the first RESn module is connected to the input of a second RESn module and to an input of a second concat module;
the output of the second RESn module is connected to the input of a first DBL module and to an input of a first concat module;
the output of the first DBL module is connected to the input of a first output convolution layer conv and to the input of a second DBL module, whose output is up-sampled;
the output of the second DBL module is connected to an input of the first concat module;
the output of the first concat module is connected to the input of a third DBL module;
the output of the third DBL module is connected to the input of a second output convolution layer conv and to the input of a fourth DBL module, whose output is up-sampled;
the output of the fourth DBL module is connected to an input of the second concat module;
the output of the second concat module is connected to the input of a fifth DBL module;
the output of the fifth DBL module is connected to the input of a third output convolution layer conv;
and finally, the first, second and third output convolution layers conv each output one feature map, giving feature maps at three scales.
The DBL module consists of a convolution layer, a batch normalization (BN) layer and a Leaky ReLU activation layer; the concat module concatenates (splices) feature maps; RESn is a residual structure with skip connections, where n is the number of res_unit structures contained in the res_block; and each res_unit consists of two DBL modules with a skip connection around them, the first DBL module using a 1x1 convolution kernel and the second a 3x3 convolution kernel.
In this preferred embodiment, n is 8 in the first RESn module and 4 in the second RESn module.
In Fig. 2, RES8 is the first RESn module and RES4 is the second RESn module; the DBL and conv in the first row form the first output convolution layer conv, those in the second row form the second output convolution layer conv, and those in the third row form the third output convolution layer conv. The up-sampling in the first row is the up-sampling performed after the second DBL module, and the up-sampling in the second row is the up-sampling performed after the fourth DBL module.
As a preferred embodiment of the present invention, specifically, the neural network model is obtained by training as follows:
the loss function is the same as that of the YOLOv3 network; training uses the Adam optimizer (a gradient-descent method) with a learning rate of 0.001 for 10 epochs; the validation accuracy is recorded, and training is stopped once the recall on the tile validation set exceeds 97%; the model is then tested, and if the recall on the tile test set exceeds 95%, model training is complete.
Specifically, the anchor boxes are obtained a priori by k-means clustering on the ceramic tile training data set, and the position transformation is as follows:
$$b_x = \sigma(t_x) + c_x$$
$$b_y = \sigma(t_y) + c_y$$
$$b_w = p_w e^{t_w}$$
$$b_h = p_h e^{t_h}$$
where $t_x$, $t_y$ are the predicted coordinate offsets; $t_w$, $t_h$ are the predicted scaling factors; $p_w$, $p_h$ are the width and height of the preset anchor box mapped onto the feature map; $c_x$, $c_y$ are the offsets of the output grid cell from the upper-left corner of the image, measured in grid cells; $b_x$, $b_y$, $b_w$, $b_h$ are the centre coordinates, width and height of the predicted box; and $\sigma(\cdot)$ is the sigmoid function.
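The position transformation can be sketched for a single anchor as follows. Besides the two centre equations, it uses the standard YOLOv3 size equations $b_w = p_w e^{t_w}$ and $b_h = p_h e^{t_h}$, which the definitions of $t_w$, $t_h$, $p_w$ and $p_h$ imply. Converting from grid units to input pixels by multiplying by the stride (32 for the 13x13 map on a 416x416 input) is an added assumption, and the function name is hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride):
    """Map raw outputs (tx, ty, tw, th) for grid cell (cx, cy) and anchor (pw, ph),
    both in feature-map units, to a box (centre x, centre y, width, height) in pixels."""
    bx = sigmoid(tx) + cx      # sigmoid keeps the centre inside cell (cx, cy)
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # anchor width scaled by exp(tw)
    bh = ph * math.exp(th)
    return bx * stride, by * stride, bw * stride, bh * stride
```

For example, with all raw outputs equal to zero, cell (3, 5), an anchor of 2.0 x 1.5 grid units and stride 32, the box centre lies in the middle of the cell: `decode_box(0, 0, 0, 0, 3, 5, 2.0, 1.5, 32)` gives `(112.0, 176.0, 64.0, 48.0)`.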
As a preferred embodiment of the present invention, the three output feature maps have sizes of 13x13, 26x26 and 52x52 respectively: the 13x13 feature map has the largest receptive field and is used to predict large defects; the 26x26 feature map has a medium receptive field and is used to predict medium-sized defects; and the 52x52 feature map has the smallest receptive field and is used to predict small defects.
The invention also provides a detection system suitable for highlighting surface defects on various types of ceramic tiles, comprising:
an image acquisition module for acquiring a first image captured by a first camera and a second image captured by a second camera, wherein the first camera is a color line-scan camera arranged perpendicular to the photographed plane, and the second camera is a black-and-white line-scan camera arranged obliquely to the photographed plane;
an image preprocessing module for performing an image preprocessing operation on each of the first image and the second image to obtain a positioned first image and a positioned second image; and
a detection module for inputting the positioned first image and second image into a pre-established neural network model, which outputs feature maps at three scales, each feature map containing the offset information of each anchor box and the corresponding predicted confidence, and for selecting the bounding boxes whose confidence is greater than a first threshold as the final predicted defects.
Specifically, the above system operates as follows:
the color camera and the black-and-white camera, together with a line light source that illuminates the tiles, photograph the tiles; when the conveyor belt carries a tile into the imaging area, the two line-scan cameras start capturing, yielding multi-view images of the tile.
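Because both cameras are line-scan cameras, each exposure yields a single row of pixels, and a two-dimensional tile image is formed by stacking successive rows while the conveyor belt moves the tile past the camera. A minimal sketch of that accumulation, assuming one row per trigger and a fixed sensor width, is shown below; the class and method names are hypothetical and do not correspond to any camera SDK.

```python
import numpy as np


class LineScanAccumulator:
    """Stack successive line-scan readouts into a 2-D image (hypothetical helper)."""

    def __init__(self, width: int):
        self.width = width
        self._rows = []

    def add_line(self, line) -> None:
        """Append one readout: shape (width,) for monochrome, (width, 3) for color."""
        line = np.asarray(line)
        if line.shape[0] != self.width:
            raise ValueError(f"expected {self.width} pixels, got {line.shape[0]}")
        self._rows.append(line)

    def image(self) -> np.ndarray:
        """Rows in acquisition order: (rows, width) or (rows, width, 3)."""
        return np.stack(self._rows, axis=0)
```

In a real installation, the row trigger would typically be synchronized with the conveyor speed so that the pixel pitch along the direction of travel matches the pitch across it; that synchronization is outside the scope of this sketch.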
A series of preprocessing steps, such as filtering and denoising, corner detection and image segmentation, is applied separately to the black-and-white and color tile images captured by the two cameras.
During training, the black-and-white and color tile images in the training set are input into the improved YOLOv3 network, which is trained by gradient descent; training stops when the recall on the test set reaches a set value.
In the testing stage, the tile image to be inspected is input into the network model, and the last layer outputs the coordinates, length, width, category and confidence of each candidate box for tile defects, thereby realizing defect detection.
As a preferred embodiment of the present invention, specifically, the pre-established neural network model used by the detection module has the same structure as that described above for the detection method: a feature extraction module, a first and a second RESn module, first to fifth DBL modules, a first and a second concat module and three output convolution layers conv, connected so that the three output convolution layers conv output feature maps at three scales, where each DBL module consists of a convolution layer, a BN layer and a Leaky ReLU activation layer, and each res_unit consists of a 1x1 DBL module and a 3x3 DBL module joined by a skip connection.
The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, carries out the steps of the method described above.
The modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module.
If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. On this understanding, all or part of the flow of the method in the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the above method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be appropriately expanded or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
While the present invention has been described in considerable detail and with particular reference to a few illustrative embodiments thereof, it is not intended to be limited to any such details or embodiments or any particular embodiments, but it is to be construed as effectively covering the intended scope of the invention by providing a broad, potential interpretation of such claims in view of the prior art with reference to the appended claims. Furthermore, the foregoing describes the invention in terms of embodiments foreseen by the inventor for which an enabling description was available, notwithstanding that insubstantial modifications of the invention, not presently foreseen, may nonetheless represent equivalent modifications thereto.
The above description is only a preferred embodiment of the present invention, and the present invention is not limited to it; any solution that achieves the technical effects of the present invention by the same means shall fall within the protection scope of the present invention. Other modifications and variations of the technical solution and/or its implementation are possible within the protection scope of the invention.