CN111179278A - Image detection method, device, equipment and storage medium - Google Patents
Image detection method, device, equipment and storage medium Download PDFInfo
- Publication number
- CN111179278A (application number CN201911295863.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- detected
- output image
- semantic segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The present invention relates to the field of image detection, and in particular to an image detection method, apparatus, device, and storage medium. The image detection method comprises: acquiring an image to be detected; inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image; and inputting the semantic segmentation image into a detection network for detection to obtain a detection result. According to the invention, the pre-trained semantic segmentation network first performs semantic segmentation on the image to be detected, and the trained detection network then performs detection. The method can identify defects in an image using only a small data set, has high universality, and is applicable to image defect detection for a wide variety of objects.
Description
Technical Field
The present invention relates to the field of image detection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for image detection.
Background
With the development of the economy and the continuous growth of comprehensive national strength in China, highway mileage keeps increasing, which in turn raises the requirements on road quality; accordingly, road surface maintenance is receiving more and more attention. Road defects are particularly difficult to detect during field construction, not only because of the wide variety of defect types, such as pits, cracks, fissures and spalling, but also because of many uncontrollable factors, such as oil shadows, road markings and oil stains. Among prior-art techniques for detecting road surface defects, one approach manually extracts image features and then performs cluster analysis; because the features are extracted by hand, efficiency is low, and the cluster analysis needs a large amount of sample data to reach a reasonably accurate result. Another approach extracts and computes image features with a multilayer neural network; however, neural networks require a large number of samples for training, and whenever the detection object changes, samples of the new object must be collected and the network retrained, so universality is poor.
Disclosure of Invention
Therefore, embodiments of the present invention provide an image detection method, apparatus, device, and storage medium to solve the above problems.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to a first aspect of embodiments of the present invention, a method of image detection includes:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
In one possible embodiment, the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; the (n-1)th and nth convolutional layers are arranged in sequence after the (n-2)th max-pooling layer; n is greater than or equal to 3.
In one possible embodiment, the output image of the nth convolutional layer is recorded as a first output image; the output image of the (n-1) th convolution layer is recorded as a second output image;
the detection network comprises: the global pooling layer comprises a global pooling layer and a convolution layer corresponding to the global pooling layer; a fully-connected layer;
the global pooling layers comprise a first global max-pooling layer and a first global average-pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolution layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average-pooling layer;
inputting the semantic segmentation image into a detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image into a first global maximum pooling layer and a first convolution layer in sequence to obtain a first characteristic diagram of the first output image;
inputting the first output image into a first global average pooling layer and a second convolution layer in sequence to obtain a second characteristic diagram of the first output image;
the detection network also comprises M maximum pooling layers and convolution layers which are alternately arranged; wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
sequentially passing the third output image through the M alternately arranged maximum pooling layers and convolution layers to obtain a first characteristic diagram of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
sequentially passing the first feature map of the third output image through a second global maximum pooling layer and a third convolution layer to obtain a second feature map of the third output image;
inputting the first feature map of the third output image into a second global average pooling layer and a fourth convolution layer in sequence to obtain a third feature map of the third output image;
and inputting the first characteristic diagram of the first output image, the second characteristic diagram of the third output image and the third characteristic diagram of the third output image into a full-connection layer to obtain an output result.
In a possible implementation, before inputting the image to be detected into the semantic segmentation network, the method comprises: binarizing the image to be detected to obtain a black-and-white image.
In a possible implementation, before the image to be detected is input into the semantic segmentation network, the method further comprises one or more of the following preprocessing steps: performing dilation preprocessing on the image to be detected;
resizing the image to be detected;
and rotating the image to be detected.
In one possible embodiment, the image to be detected is a road surface image; and the detection result is the probability value of the defect of the road surface image.
According to a second aspect of the embodiments of the present invention, the present application further proposes an apparatus for image detection, comprising:
the image acquisition module is used for acquiring an image to be detected;
the processing module is used for inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image; and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
In a possible implementation, the apparatus further includes a preprocessing module, configured to perform one or more of the following preprocessing operations on the image to be detected before the processing module inputs it into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
resizing the image to be detected;
and rotating the image to be detected.
According to a third aspect of embodiments of the present invention, an apparatus for image detection includes: at least one processor and at least one memory;
the memory to store one or more program instructions;
the processor is configured to execute one or more program instructions to perform the method of any one of the above.
According to a fourth aspect of embodiments of the present invention, the present application further proposes a computer-readable storage medium containing one or more program instructions, the one or more program instructions being executed by a processor to perform the method of any one of the above.
The embodiments of the invention have the following advantages: an image to be detected is input into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and the semantic segmentation image is input into a detection network for detection to obtain a detection result. The method is suitable for detecting various defects in road surface images, including pits, cracks, fissures and spalling; it is also suitable for detecting image defects of other objects and has high universality; moreover, defect detection can be accomplished with only a few samples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary; other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes, and the like shown in this specification are only used to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they are not intended to limit the conditions under which the present invention can be implemented and thus carry no essential technical significance. Any structural modification, change of ratio, or adjustment of size that does not affect the functions and purposes of the present invention still falls within the scope of the present invention.
Fig. 1 is a flowchart of an image detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a semantic segmentation network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a semantic segmentation network and a detection network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an image after performing a dilation operation according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a pre-process provided by an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating comparison between detection results of a network architecture of the present application and other types of networks provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention.
In the figure: 71-an image acquisition module; 72-a processing module; 81-a processor; 82-memory.
Detailed Description
The present invention is described below by way of particular embodiments, and other advantages and effects of the invention will become readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely a part of the embodiments of the invention, not all of them, and are not intended to limit the invention to the particular forms disclosed. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort fall within the protection scope of the present invention.
With the rapid development of the national economy, road construction is advancing quickly, yet for a finished road, defects are still detected and identified only by the naked eye. These defects include pits, cracks, fissures, spalling, and the like. Road defect detection is particularly difficult during on-site construction of a road surface, not only because of the wide variety of defect types, but also because of uncontrollable factors such as oil shadows, road markings and oil stains. In the prior art, both conventional clustering algorithms and neural network algorithms need large data sets to obtain reasonably accurate results, so a large number of samples must be prepared in advance for detection. Collecting data samples takes considerable effort, and without a large number of samples, normal detection cannot be performed.
Based on this, the present application proposes a method for detecting an image, see the method flowchart of image detection shown in fig. 1; the method comprises the following steps:
step S101, obtaining an image to be detected;
the image can be any image needing defect detection; including images of roads, images in mines, images of cultural relics, and the like; defects in the image are mainly manifested as cracks, fissures, and the like.
The images may be taken in real time, such as photographs of a road taken while an automobile is in motion; or may be pre-stored.
Step S102, inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
the semantic segmentation is to segment different objects in a picture separately; and are divided according to semantics;
the above-described images are not limited to only road images; or other images needing to be detected; including medical images, etc.
The semantic segmentation network is obtained by training in advance. In the image obtained after the image with the defects is processed by the semantic segmentation network, the defects are segmented, so that the defects are easier to detect and identify.
And step S103, inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
Wherein the image to be detected is a road surface image; the detection result is the probability value that the image to be detected contains a defect, and the defects include pits, cracks, fissures and spalling.
After the segmentation network finishes its work, the detection network classifies the segmented image and determines the probability that it contains a defect. For example, a defective region in the image is classified and marked as defective to distinguish it from normal regions, and a final probability value is output indicating the probability that the picture contains a defect.
The method for detecting the image comprises the steps of carrying out semantic segmentation on the image by using a semantic segmentation network, then realizing detection on the defect by using a detection network, and determining the probability of the defect.
In a possible embodiment, refer to the structural diagram of the semantic segmentation network shown in fig. 2;
the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; the (n-1)th and nth convolutional layers are arranged in sequence after the (n-2)th max-pooling layer; n is greater than or equal to 3. The output of the nth convolutional layer serves as the semantic segmentation image.
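The layer arrangement above can be illustrated with a small shape-tracing sketch. This is a hypothetical illustration, not code from the patent: it assumes same-padded convolutions (which preserve spatial size) and 2×2 max-pooling with stride 2 (which halves it), and uses n = 5 as an example value.

```python
def seg_net_shapes(h, w, n):
    """Trace spatial sizes through the segmentation network described above:
    (n-2) alternating same-padded conv + 2x2 max-pool stages, followed by
    the (n-1)th and nth convolutional layers, which keep the spatial size.
    """
    assert n >= 3
    shapes = [(h, w)]
    for _ in range(n - 2):       # each conv keeps the size; each 2x2 pool halves it
        h, w = h // 2, w // 2
        shapes.append((h, w))
    shapes.append((h, w))        # (n-1)th convolutional layer
    shapes.append((h, w))        # nth convolutional layer -> semantic segmentation image
    return shapes
```

For a 512×512 input and n = 5, the three pooling stages reduce the map to 64×64, which the last two convolutions preserve.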
To improve generalization capability, in one embodiment, see the architectural diagram shown in FIG. 3;
the output image of the nth convolution layer is recorded as a first output image; the output image of the (n-1) th convolution layer is recorded as a second output image;
the detection network comprises: the global pooling layer comprises a global pooling layer and a convolution layer corresponding to the global pooling layer; a fully-connected layer;
the global pooling layers comprise a first global max-pooling layer and a first global average-pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolution layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average-pooling layer;
inputting the semantic segmentation image into a detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image into a first global maximum pooling layer and a first convolution layer in sequence to obtain a first characteristic diagram of the first output image;
among them, the first convolutional layer is preferably a 1 × 1 convolutional layer.
Inputting the first output image into a first global average pooling layer and a second convolution layer in sequence to obtain a second characteristic diagram of the first output image;
among them, the second convolutional layer is preferably a 1 × 1 convolutional layer.
The detection network also comprises M maximum pooling layers and convolution layers which are alternately arranged; wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
sequentially passing the third output image through the M alternately arranged maximum pooling layers and convolution layers to obtain a first characteristic diagram of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
sequentially passing the first feature map of the third output image through a second global maximum pooling layer and a third convolution layer to obtain a second feature map of the third output image;
among them, the third convolutional layer is preferably a 32 × 1 convolutional layer.
Inputting the first feature map of the third output image into a second global average pooling layer and a fourth convolution layer in sequence to obtain a third feature map of the third output image;
among them, the fourth convolutional layer is preferably a 32 × 1 convolutional layer.
It is worth emphasizing that the first, second, third and fourth convolutional layers may also adopt other kernel shapes of the form N × 1, where N is an integer greater than 0.
And inputting the first characteristic diagram of the first output image, the second characteristic diagram of the third output image and the third characteristic diagram of the third output image into a full-connection layer to obtain an output result.
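The detection head described above can be sketched numerically. The sketch below is a hedged illustration, not the patent's implementation: the fully-connected layer is reduced to a single random linear map, the 1×1 convolutions on pooled vectors are omitted (on a 1×1 map they are themselves just per-channel linear maps), and all weights are hypothetical stand-ins.

```python
import numpy as np

def detection_head(first_out, third_feat, rng=np.random.default_rng(0)):
    """Global max/avg pooling of the nth-conv output and of the feature map
    of the third output image, concatenation, and a final fully-connected
    layer with a sigmoid producing a defect probability (weights are random
    placeholders for illustration)."""
    # first_out: (C1, H, W) output of the nth convolutional layer
    # third_feat: (C2, H2, W2) first feature map of the third output image
    f1 = first_out.max(axis=(1, 2))      # first global max pooling
    f2 = first_out.mean(axis=(1, 2))     # first global average pooling
    f3 = third_feat.max(axis=(1, 2))     # second global max pooling
    f4 = third_feat.mean(axis=(1, 2))    # second global average pooling
    feats = np.concatenate([f1, f2, f3, f4])
    w = rng.normal(0.0, 0.01, size=feats.shape[0])   # stand-in FC weights
    logit = feats @ w
    return 1.0 / (1.0 + np.exp(-logit))  # probability that a defect is present
```

The returned value lies in (0, 1) and plays the role of the probability value mentioned in the detection result.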
In a possible implementation, before the image to be detected is input to the semantic segmentation network, preprocessing is required, including: and carrying out binarization processing on the image to be detected to obtain a black-and-white image.
The image binarization is to set the gray value of a pixel point on an image to be 0 or 255, that is, the whole image presents an obvious black-and-white effect.
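The binarization step above is a simple thresholding; a minimal sketch follows. The threshold value of 128 is an assumption for illustration, since the patent does not specify one.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Set every pixel of a grayscale image to 0 or 255, giving the
    black-and-white effect described above. The threshold is hypothetical."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)
```

For example, a pixel of intensity 200 maps to 255 (white) and a pixel of intensity 0 stays 0 (black).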
In a possible implementation manner, before the image to be detected is input to the semantic segmentation network, one or more of the following preprocessing steps are further included:
performing expansion pretreatment on the image to be detected;
adjusting and preprocessing the size of the picture to be detected;
and carrying out rotation preprocessing on the picture to be detected.
For details of specific implementation, refer to the schematic structural diagram of the semantic segmentation network and the detection network shown in fig. 3;
firstly, the input image passes through a first convolutional layer and a first max-pooling layer to obtain feature map 1;
wherein the first convolutional layer has two 32×5×5 convolution kernels, and the kernel of the first max-pooling layer is 2×2;
feature map 1 passes through a second convolutional layer and a second max-pooling layer to obtain feature map 2;
wherein the second convolutional layer has three 64×5×5 convolution kernels, and the kernel of the second max-pooling layer is 2×2;
feature map 2 passes through a third convolutional layer and a third max-pooling layer to obtain feature map 3; wherein the third convolutional layer has four 64×5×5 convolution kernels, and the kernel of the third max-pooling layer is 2×2;
feature map 3 passes through a fourth convolutional layer to obtain feature map 4;
wherein the fourth convolutional layer has one 1024×15×15 convolution kernel; feature map 4 passes through a fifth convolutional layer to obtain feature map 5;
wherein the fifth convolutional layer has one 1×1×1 convolution kernel; feature map 5 is the output of the semantic segmentation network;
the details of the detection network for implementing defect detection are as follows:
firstly, feature map 4 and feature map 5 are combined to obtain feature map 6;
feature map 5 is subjected to global max pooling and global average pooling respectively, and each result is passed through a 1×1 convolutional layer, yielding the 1×1 feature maps 7 and 8;
feature map 6 passes through a 2×2 max-pooling layer to obtain feature map 9;
feature map 9 passes through a convolutional layer with an 8×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 10;
feature map 10 passes through a convolutional layer with a 16×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 11;
feature map 11 passes through a convolutional layer with a 32×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 12;
feature map 12 is passed through a global max-pooling layer and a global average-pooling layer respectively, and each result through a 32×1 convolutional layer, yielding the 32×1 feature maps 13 and 14;
feature maps 7, 8, 13 and 14 are combined and input into the fully-connected layer.
And outputting a final defect detection result by the full connection layer. For example, the detection result may be 0.996, which indicates that the probability of the defect existing in the image is 0.996.
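Assuming same-padded convolutions (so only the 2×2 poolings change the spatial size), the sizes of the numbered feature maps above can be traced as follows. The 256×256 input size is an assumption chosen for illustration; the patent does not fix an input resolution.

```python
def trace_shapes(size=256):
    """Trace the spatial sizes of the feature maps in the concrete network
    above: three conv + 2x2 max-pool stages in the segmentation part, then
    one extra 2x2 pool and three conv + 2x2 max-pool stages in the
    detection part. Convolutions are assumed size-preserving."""
    s = size
    seg = []
    for _ in range(3):   # feature maps 1, 2, 3 (each 2x2 pool halves the size)
        s //= 2
        seg.append(s)
    # feature maps 4, 5, 6: convolutions/combination only, size unchanged
    det = []
    s //= 2              # feature map 9: 2x2 max-pool of feature map 6
    det.append(s)
    for _ in range(3):   # feature maps 10, 11, 12
        s //= 2
        det.append(s)
    return seg, det
```

With a 256×256 input, feature maps 1–3 are 128, 64 and 32 pixels on a side, and feature maps 9–12 shrink from 16 down to 2.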
After the semantic segmentation network processing, each semantic object in the image can be obviously distinguished; defects can be segmented from the image.
The training process for the semantic segmentation network comprises the following steps:
firstly, obtaining a plurality of sample images;
wherein the sample images and the target image should belong to the same type; if the target image is an image of a defective road, the sample images should also be images of defective roads;
then, the sample images are manually annotated to obtain a semantic label image for each sample image; the semantic label image can be a black-and-white image, in which white represents cracks and defects and black represents normal pavement; each training sample consists of a sample image and its corresponding semantic label image;
then, the sample images are input into the semantic segmentation network model to be trained,
and the semantic segmentation network is trained as follows: the network classifies each pixel in every image to produce a prediction; each predicted pixel is compared with the pixel at the same position in the label image and the error is computed; the errors of all pixel points are then accumulated. Preferably, the cross-entropy loss function is adopted as the loss function. If the computed loss value has not reached the preset threshold, the weights in each convolution kernel are updated and a new loss value is computed; after multiple loop iterations, iteration stops when the computed loss value falls below the preset threshold, and the semantic segmentation network model is trained successfully.
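The accumulated per-pixel cross-entropy described above can be sketched directly. This is a minimal illustration of the loss term, assuming binary labels (white = defect = 1, black = normal = 0) and sigmoid outputs; it is not the patent's training code.

```python
import numpy as np

def pixel_cross_entropy(pred, target, eps=1e-7):
    """Binary cross-entropy summed over all pixel points, as in the
    training procedure above. pred holds per-pixel defect probabilities
    in (0, 1); target holds 0/1 labels from the semantic label image."""
    pred = np.clip(pred, eps, 1 - eps)   # numerical safety near 0 and 1
    return float(-np.sum(target * np.log(pred)
                         + (1 - target) * np.log(1 - pred)))
```

Training would repeat: compute this loss, update the convolution-kernel weights, and stop once the loss drops below the preset threshold.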
And after the training of the semantic segmentation network is finished, training the detection network.
In one embodiment, before the image is input into the semantic segmentation network, dilation processing is applied to the image to obtain a dilated image;
wherein dilation processing refers to convolving the original image with a preset convolution kernel; in the convolved image the cracks become larger, producing a dilation effect. Referring to FIG. 4, a schematic illustration of images after the dilation operation: processing the image with 5 different convolution kernels yields 5 different dilated images, where the convolution kernels of images (a), (b), (c), (d) and (e) increase in size in that order. Tests conducted in the present invention show that the highest final recognition accuracy is achieved not with image (e), whose dilation is the most pronounced, but with image (b); the convolution kernel used for image (b) is 5×5.
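On a binarized (0/1) image, dilating with an all-ones kernel is equivalent to taking the local maximum over the kernel window, which is how the sketch below implements it. This is a hedged illustration of the operation, using the 5×5 kernel size that image (b) above is said to use; it is not the patent's code.

```python
import numpy as np

def dilate(img, k=5):
    """Morphological dilation of a 0/1 image with a k x k all-ones kernel,
    implemented as a sliding local maximum (equivalent, for binary images,
    to convolving and thresholding). Cracks grow thicker as described above."""
    h, w = img.shape
    p = k // 2
    padded = np.pad(img, p, mode="constant")   # zero-pad the border
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out
```

A single defect pixel in the middle of a blank image grows into a 5×5 block, illustrating how thin cracks become easier to segment.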
In addition to the dilation convolution kernel, the following hyper-parameters need to be determined; see the preprocessing diagram shown in FIG. 5. The hyper-parameters further include:
1) The type of loss function of the first part, the semantic segmentation network: there are various loss functions, such as the mean-square-error loss function and the cross-entropy loss function; preferably, the present application adopts the cross-entropy loss function.
2) The image size: the options include a full-size image and a half-size image; the present application preferably uses the full-size image. A half-size image is an image whose height is half that of the original while the width is unchanged. The image size can also be entered by the user so that the image is cropped accordingly; for example, entering a size of 100×200 automatically completes the image resizing.
3) The rotation angle of the picture: the rotation angle may be 90 degrees or any other angle; the present application preferably applies no rotation. If 0 degrees is entered, the picture is not rotated.
The preprocessing mainly comprises the above four aspects; after preprocessing is finished, the preprocessed image is input into the model architecture. To demonstrate the performance of the present application, three model architectures other than that of the present application are adopted for comparison. During training, the models are trained with stochastic gradient descent (SGD); the number of iterations over the whole data set (epochs) is 100, the learning rate is 0.1, and the model parameters are initialized with random numbers drawn from a normal distribution with mean 0 and variance 0.01. To illustrate the superiority of the network architecture proposed by the present application, three other neural network architectures effective in the anomaly detection field are used for comparison: U-Net, DeepLabv3+, and the commercial software Cognex ViDi Suite. Referring to FIG. 6, which compares the detection results of the network architecture of the present application with those of the other networks, it can be seen that the detection accuracy of the present application is higher than that of the other architectures.
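The stated training hyper-parameters (learning rate 0.1, initialization from a normal distribution with mean 0 and variance 0.01) can be sketched as follows. Note that variance 0.01 corresponds to standard deviation 0.1; the function names are hypothetical, and this is an illustration of the update rule, not the patent's training code.

```python
import numpy as np

def init_params(shape, rng=np.random.default_rng(0)):
    """Initialize parameters from N(0, 0.01), i.e. standard deviation
    sqrt(0.01) = 0.1, as stated in the training setup above."""
    return rng.normal(0.0, np.sqrt(0.01), size=shape)

def sgd_step(w, grad, lr=0.1):
    """One stochastic-gradient-descent update with the stated learning
    rate of 0.1: w <- w - lr * grad."""
    return w - lr * grad
```

For instance, a weight of 1.0 with gradient 0.5 moves to 1.0 - 0.1 × 0.5 = 0.95 in one step; this update is repeated for 100 epochs over the data.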
Corresponding to the method above, the present application also proposes an image detection device; refer to the schematic structural diagram of an image detection device shown in fig. 7. The device includes:
an image acquisition module 71, configured to acquire an image to be detected;
the processing module 72 is configured to input the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and to input the semantic segmentation image into a detection network for detection to obtain a detection result.
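The cooperation of these two modules can be illustrated with a minimal sketch (the class and attribute names are illustrative placeholders, not identifiers from the application; the two networks are passed in as callables):

```python
class ImageDetector:
    """Two-stage pipeline: a semantic segmentation network followed by a
    detection network, mirroring the modules described above."""

    def __init__(self, seg_net, det_net):
        self.seg_net = seg_net  # stands in for the pre-trained segmentation network
        self.det_net = det_net  # stands in for the detection network

    def detect(self, image):
        seg_map = self.seg_net(image)  # semantic segmentation image
        return self.det_net(seg_map)   # detection result (e.g. a defect probability)
```

The point of the sketch is only the data flow: the detection network never sees the raw image, only the semantic segmentation output.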
In one embodiment, the detection network comprises: global pooling layers, a convolution layer corresponding to each global pooling layer, and a fully-connected layer;
the global pooling layers comprise a first global maximum pooling layer and a first global average pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolutional layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average pooling layer;
the processing module 72 is further configured to input the first output image sequentially into the first global maximum pooling layer and the first convolution layer to obtain a first feature map of the first output image;
and to input the first output image sequentially into the first global average pooling layer and the second convolution layer to obtain a second feature map of the first output image;
the detection network further comprises M alternately arranged max-pooling layers and convolution layers, where M is greater than 1;
the first output image and the second output image are combined to obtain a third output image;
the third output image is passed sequentially through the M alternately arranged max-pooling layers and convolution layers to obtain a first feature map of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
the processing module 72 is further configured to pass the first feature map of the third output image sequentially through the second global maximum pooling layer and the third convolution layer to obtain a second feature map of the third output image;
to input the first feature map of the third output image sequentially into the second global average pooling layer and the fourth convolution layer to obtain a third feature map of the third output image;
and to input the first feature map of the first output image, the second feature map of the third output image and the third feature map of the third output image into the fully-connected layer to obtain an output result.
In an embodiment, the device further includes a preprocessing module, configured to perform one or more of the following preprocessing steps on the image to be detected before inputting the image to be detected into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
The preprocessing module is further configured to perform binarization processing on the image to be detected, obtaining a black-and-white image, before the image to be detected is input into the semantic segmentation network.
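The detection head described above (a global max-pooling branch and a global average-pooling branch for each feature map, with the pooled results fed to a fully-connected layer) can be sketched as follows. This is a simplified NumPy illustration: the function names are invented for the sketch, and the 1x1 convolutions that follow each pooling layer in the text are folded into the fully-connected weights for brevity.

```python
import numpy as np

def global_max_pool(x):
    # x has shape (C, H, W); returns the per-channel maximum, shape (C,).
    return x.max(axis=(1, 2))

def global_avg_pool(x):
    # Per-channel mean over the spatial dimensions, shape (C,).
    return x.mean(axis=(1, 2))

def detection_head(first_out, third_out, fc_w, fc_b):
    """Each feature map passes through a global max-pooling branch and a
    global average-pooling branch; the pooled vectors are concatenated
    and fed to a fully-connected layer (fc_w, fc_b)."""
    feats = np.concatenate([
        global_max_pool(first_out),
        global_avg_pool(first_out),
        global_max_pool(third_out),
        global_avg_pool(third_out),
    ])
    logits = fc_w @ feats + fc_b
    # A sigmoid maps the score to a defect probability in [0, 1].
    return 1.0 / (1.0 + np.exp(-logits))
```

With zeroed inputs and weights, the head outputs 0.5, i.e. a maximally uncertain defect probability, which is the expected behaviour of an untrained sigmoid classifier.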
The present application further provides an apparatus for image detection, comprising: at least one processor 81 and at least one memory 82;
the memory 82 for storing one or more program instructions;
the processor 81 is configured to execute one or more program instructions to perform the method according to any one of the above-mentioned embodiments.
In a fourth aspect, the present application further proposes a computer-readable storage medium in which one or more program instructions are embodied, the one or more program instructions being executable by a processor to perform the method according to any one of the embodiments above.
Although the invention has been described in detail above with reference to a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements may be made on the basis of the invention. Accordingly, such modifications and improvements are intended to fall within the scope of the claimed invention.
Claims (10)
1. A method of image detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
2. The method of claim 1, wherein the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; an (n-1)th convolutional layer and an nth convolutional layer are further arranged, in sequence, after the (n-2)th max-pooling layer; and n is greater than or equal to 3.
3. The method of claim 2, wherein the output image of the nth convolutional layer is denoted as a first output image, and the output image of the (n-1)th convolutional layer is denoted as a second output image;
the detection network comprises: global pooling layers, a convolution layer corresponding to each global pooling layer, and a fully-connected layer;
the global pooling layers comprise a first global maximum pooling layer and a first global average pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolutional layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average pooling layer;
inputting the semantic segmentation image into the detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image sequentially into the first global maximum pooling layer and the first convolution layer to obtain a first feature map of the first output image;
inputting the first output image sequentially into the first global average pooling layer and the second convolution layer to obtain a second feature map of the first output image;
the detection network further comprises M alternately arranged max-pooling layers and convolution layers, wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
passing the third output image sequentially through the M alternately arranged max-pooling layers and convolution layers to obtain a first feature map of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
passing the first feature map of the third output image sequentially through the second global maximum pooling layer and the third convolution layer to obtain a second feature map of the third output image;
inputting the first feature map of the third output image sequentially into the second global average pooling layer and the fourth convolution layer to obtain a third feature map of the third output image;
and inputting the first feature map of the first output image, the second feature map of the third output image and the third feature map of the third output image into the fully-connected layer to obtain an output result.
4. The method of claim 1, wherein inputting the image to be detected into the semantic segmentation network comprises: performing binarization processing on the image to be detected to obtain a black-and-white image.
5. The method of claim 1, wherein before the image to be detected is input into the semantic segmentation network, one or more of the following preprocessing steps are performed on the image to be detected:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
6. The method according to claim 1, wherein the image to be detected is a road surface image, and the detection result is a probability value that the road surface image contains a defect.
7. An apparatus for image inspection, comprising:
the image acquisition module is used for acquiring an image to be detected;
the processing module is configured to input the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and to input the semantic segmentation image into a detection network for detection to obtain a detection result.
8. The apparatus of claim 7, further comprising a preprocessing module configured to perform one or more of the following preprocessing steps on the image to be detected before inputting the image to be detected into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
9. An apparatus for image inspection, comprising: at least one processor and at least one memory;
the memory to store one or more program instructions;
the processor, configured to execute one or more program instructions to perform the method of any of claims 1-6.
10. A computer-readable storage medium having one or more program instructions embodied therein, the one or more program instructions being executable to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911295863.5A CN111179278B (en) | 2019-12-16 | 2019-12-16 | Image detection method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111179278A true CN111179278A (en) | 2020-05-19 |
CN111179278B CN111179278B (en) | 2021-02-05 |
Family
ID=70648879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911295863.5A Active CN111179278B (en) | 2019-12-16 | 2019-12-16 | Image detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111179278B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112706764A (en) * | 2020-12-30 | 2021-04-27 | 潍柴动力股份有限公司 | Active anti-collision early warning method, device, equipment and storage medium |
WO2022121531A1 (en) * | 2020-12-09 | 2022-06-16 | 歌尔股份有限公司 | Product defect detection method and apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019104767A1 (en) * | 2017-11-28 | 2019-06-06 | 河海大学常州校区 | Fabric defect detection method based on deep convolutional neural network and visual saliency |
CN109886964A (en) * | 2019-03-29 | 2019-06-14 | 北京百度网讯科技有限公司 | Circuit board defect detection method, device and equipment |
CN110473173A (en) * | 2019-07-24 | 2019-11-19 | 熵智科技(深圳)有限公司 | A kind of defect inspection method based on deep learning semantic segmentation |
Also Published As
Publication number | Publication date |
---|---|
CN111179278B (en) | 2021-02-05 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |