CN111179278A - Image detection method, device, equipment and storage medium - Google Patents
Image detection method, device, equipment and storage medium Download PDFInfo
- Publication number
- CN111179278A (application number CN201911295863.5A)
- Authority
- CN
- China
- Prior art keywords
- image
- layer
- detected
- output image
- semantic segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/20—Image enhancement or restoration using local operators
- G06T5/30—Erosion or dilatation, e.g. thinning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The present invention relates to the field of image detection, and in particular to an image detection method, apparatus, device, and storage medium. The image detection method comprises: acquiring an image to be detected; inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image; and inputting the semantic segmentation image into a detection network for detection to obtain a detection result. According to the invention, the pre-trained semantic segmentation network first performs semantic segmentation on the image to be detected, and the trained detection network then performs detection. The method can identify defects in an image using only a small data set, has high universality, and is applicable to image defect detection for a wide variety of objects.
Description
Technical Field
The present invention relates to the field of image detection technologies, and in particular, to a method, an apparatus, a device, and a storage medium for image detection.
Background
With the development of the economy and the continuous growth of comprehensive national strength in China, highway mileage keeps increasing, which in turn raises the requirements on road quality; accordingly, road surface maintenance is receiving more and more attention. Road defects are particularly difficult to detect during field construction, not only because of the wide variety of defect types, such as pits, cracks, fissures and spalling, but also because of many uncontrollable factors, such as oil shadows, road markings and oil stains. Among prior-art techniques for detecting road surface defects, one approach manually extracts image features and then performs cluster analysis; because the features are extracted by hand, efficiency is low, and the cluster analysis needs a large amount of sample data to reach a reasonably accurate result. Another approach extracts and computes image features with a multilayer neural network; however, neural networks require a large number of samples for training, and whenever the detection object changes, samples of the new object must be collected and the network retrained, so universality is poor.
Disclosure of Invention
Therefore, embodiments of the present invention provide an image detection method, apparatus, device, and storage medium to solve the above problems.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
according to a first aspect of embodiments of the present invention, a method of image detection includes:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
In one possible embodiment, the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; the (n-1)th and nth convolutional layers are arranged in sequence after the (n-2)th max-pooling layer; n is greater than or equal to 3.
In one possible embodiment, the output image of the nth convolutional layer is recorded as a first output image; the output image of the (n-1) th convolution layer is recorded as a second output image;
the detection network comprises: the global pooling layer comprises a global pooling layer and a convolution layer corresponding to the global pooling layer; a fully-connected layer;
the global pooling layers comprise a first global max-pooling layer and a first global average-pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolution layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average-pooling layer;
inputting the semantic segmentation image into a detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image into a first global maximum pooling layer and a first convolution layer in sequence to obtain a first characteristic diagram of the first output image;
inputting the first output image into a first global average pooling layer and a second convolution layer in sequence to obtain a second characteristic diagram of the first output image;
the detection network also comprises M maximum pooling layers and convolution layers which are alternately arranged; wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
sequentially passing the third output image through the M alternately arranged maximum pooling layers and convolution layers to obtain a first characteristic diagram of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
sequentially passing the first feature map of the third output image through a second global maximum pooling layer and a third convolution layer to obtain a second feature map of the third output image;
inputting the first feature map of the third output image into a second global average pooling layer and a fourth convolution layer in sequence to obtain a third feature map of the third output image;
and inputting the first characteristic diagram of the first output image, the second characteristic diagram of the third output image and the third characteristic diagram of the third output image into a full-connection layer to obtain an output result.
In a possible implementation, before inputting the image to be detected into the semantic segmentation network, the method comprises: binarizing the image to be detected to obtain a black-and-white image.
In a possible implementation, before the image to be detected is input into the semantic segmentation network, the method further comprises one or more of the following preprocessing steps: performing dilation preprocessing on the image to be detected;
resizing the image to be detected;
and rotating the image to be detected.
In one possible embodiment, the image to be detected is a road surface image; and the detection result is the probability value of the defect of the road surface image.
According to a second aspect of the embodiments of the present invention, the present application further proposes an apparatus for image detection, comprising:
the image acquisition module is used for acquiring an image to be detected;
the processing module is used for inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image; and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
In a possible implementation, the apparatus further includes a preprocessing module, configured to perform one or more of the following preprocessing operations on the image to be detected before the processing module inputs it into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
resizing the image to be detected;
and rotating the image to be detected.
According to a third aspect of embodiments of the present invention, an apparatus for image detection includes: at least one processor and at least one memory;
the memory to store one or more program instructions;
the processor is configured to execute one or more program instructions to perform the method of any one of the above.
According to a fourth aspect of embodiments of the present invention, the present application further proposes a computer-readable storage medium containing one or more program instructions, the one or more program instructions being executed by a processor to perform the method of any one of the above.
The embodiments of the invention have the following advantages: an image to be detected is input into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and the semantic segmentation image is input into a detection network for detection to obtain a detection result. The method is suitable for detecting various defects in road surface images, including pits, cracks, fissures and spalling; it is also suitable for detecting image defects of other objects and has high universality; moreover, defect detection can be accomplished with only a few samples.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary; other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes, and the like shown in this specification are only used to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they are not intended to limit the conditions under which the present invention can be implemented and thus carry no essential technical significance. Any structural modification, change of ratio, or adjustment of size that does not affect the functions and purposes of the present invention still falls within the scope of the present invention.
Fig. 1 is a flowchart of an image detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a semantic segmentation network according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a semantic segmentation network and a detection network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating an image after performing a dilation operation according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a pre-process provided by an embodiment of the present invention;
fig. 6 is a schematic diagram illustrating comparison between detection results of a network architecture of the present application and other types of networks provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present invention.
In the figure: 71-an image acquisition module; 72-a processing module; 81-a processor; 82-memory.
Detailed Description
The present invention is described below by way of particular embodiments, and other advantages and effects of the invention will become readily apparent to those skilled in the art from this disclosure. It should be understood that the described embodiments are merely a part of the embodiments of the invention, not all of them, and are not intended to limit the invention to the particular forms disclosed. All other embodiments obtained by a person skilled in the art based on the embodiments herein without creative effort fall within the protection scope of the present invention.
With the rapid development of the national economy, road construction is advancing quickly, yet for a finished road, defects are still detected and identified only by the naked eye. These defects include pits, cracks, fissures, spalling, and the like. Road defect detection is particularly difficult during on-site construction of a road surface, not only because of the wide variety of defect types, but also because of uncontrollable factors such as oil shadows, road markings and oil stains. In the prior art, both conventional clustering algorithms and neural network algorithms need large data sets to obtain reasonably accurate results, so a large number of samples must be prepared in advance for detection. Collecting data samples takes considerable effort, and without a large number of samples, normal detection cannot be performed.
Based on this, the present application proposes a method for detecting an image, see the method flowchart of image detection shown in fig. 1; the method comprises the following steps:
step S101, obtaining an image to be detected;
the image can be any image needing defect detection; including images of roads, images in mines, images of cultural relics, and the like; defects in the image are mainly manifested as cracks, fissures, and the like.
The images may be taken in real time, such as photographs of a road taken while an automobile is in motion; or may be pre-stored.
Step S102, inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
the semantic segmentation is to segment different objects in a picture separately; and are divided according to semantics;
the above-described images are not limited to only road images; or other images needing to be detected; including medical images, etc.
The semantic segmentation network is obtained by training in advance. In the image obtained after the image with the defects is processed by the semantic segmentation network, the defects are segmented, so that the defects are easier to detect and identify.
And step S103, inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
Wherein the image to be detected is a road surface image; the detection result is the probability value that the image to be detected contains a defect, and the defects include pits, cracks, fissures and spalling.
After the segmentation network finishes its work, the detection network classifies the segmented image and determines the probability that it contains a defect. For example, a defective region in the image is classified and marked as defective to distinguish it from normal regions, and a final probability value is output indicating the probability that the picture contains a defect.
The method for detecting the image comprises the steps of carrying out semantic segmentation on the image by using a semantic segmentation network, then realizing detection on the defect by using a detection network, and determining the probability of the defect.
In a possible embodiment, refer to the structural diagram of the semantic segmentation network shown in fig. 2;
the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; the (n-1)th and nth convolutional layers are arranged in sequence after the (n-2)th max-pooling layer; n is greater than or equal to 3. The output of the nth convolutional layer serves as the semantic segmentation image.
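The layer arrangement above can be illustrated with a small shape-tracing sketch. This is a hypothetical illustration, not code from the patent: it assumes same-padded convolutions (which preserve spatial size) and 2×2 max-pooling with stride 2 (which halves it), and uses n = 5 as an example value.

```python
def seg_net_shapes(h, w, n):
    """Trace spatial sizes through the segmentation network described above:
    (n-2) alternating same-padded conv + 2x2 max-pool stages, followed by
    the (n-1)th and nth convolutional layers, which keep the spatial size.
    """
    assert n >= 3
    shapes = [(h, w)]
    for _ in range(n - 2):       # each conv keeps the size; each 2x2 pool halves it
        h, w = h // 2, w // 2
        shapes.append((h, w))
    shapes.append((h, w))        # (n-1)th convolutional layer
    shapes.append((h, w))        # nth convolutional layer -> semantic segmentation image
    return shapes
```

For a 512×512 input and n = 5, the three pooling stages reduce the map to 64×64, which the last two convolutions preserve.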
To improve generalization capability, in one embodiment, see the architectural diagram shown in FIG. 3;
the output image of the nth convolution layer is recorded as a first output image; the output image of the (n-1) th convolution layer is recorded as a second output image;
the detection network comprises: the global pooling layer comprises a global pooling layer and a convolution layer corresponding to the global pooling layer; a fully-connected layer;
the global pooling layers comprise a first global max-pooling layer and a first global average-pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolution layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average-pooling layer;
inputting the semantic segmentation image into a detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image into a first global maximum pooling layer and a first convolution layer in sequence to obtain a first characteristic diagram of the first output image;
among them, the first convolutional layer is preferably a 1 × 1 convolutional layer.
Inputting the first output image into a first global average pooling layer and a second convolution layer in sequence to obtain a second characteristic diagram of the first output image;
among them, the second convolutional layer is preferably a 1 × 1 convolutional layer.
The detection network also comprises M maximum pooling layers and convolution layers which are alternately arranged; wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
sequentially passing the third output image through the M alternately arranged maximum pooling layers and convolution layers to obtain a first characteristic diagram of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
sequentially passing the first feature map of the third output image through a second global maximum pooling layer and a third convolution layer to obtain a second feature map of the third output image;
among them, the third convolutional layer is preferably a 32 × 1 convolutional layer.
Inputting the first feature map of the third output image into a second global average pooling layer and a fourth convolution layer in sequence to obtain a third feature map of the third output image;
among them, the fourth convolutional layer is preferably a 32 × 1 convolutional layer.
It is worth emphasizing that the first, second, third and fourth convolutional layers may also adopt other kernel shapes of the form N × 1, where N is an integer greater than 0.
And inputting the first characteristic diagram of the first output image, the second characteristic diagram of the third output image and the third characteristic diagram of the third output image into a full-connection layer to obtain an output result.
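The detection head described above can be sketched numerically. The sketch below is a hedged illustration, not the patent's implementation: the fully-connected layer is reduced to a single random linear map, the 1×1 convolutions on pooled vectors are omitted (on a 1×1 map they are themselves just per-channel linear maps), and all weights are hypothetical stand-ins.

```python
import numpy as np

def detection_head(first_out, third_feat, rng=np.random.default_rng(0)):
    """Global max/avg pooling of the nth-conv output and of the feature map
    of the third output image, concatenation, and a final fully-connected
    layer with a sigmoid producing a defect probability (weights are random
    placeholders for illustration)."""
    # first_out: (C1, H, W) output of the nth convolutional layer
    # third_feat: (C2, H2, W2) first feature map of the third output image
    f1 = first_out.max(axis=(1, 2))      # first global max pooling
    f2 = first_out.mean(axis=(1, 2))     # first global average pooling
    f3 = third_feat.max(axis=(1, 2))     # second global max pooling
    f4 = third_feat.mean(axis=(1, 2))    # second global average pooling
    feats = np.concatenate([f1, f2, f3, f4])
    w = rng.normal(0.0, 0.01, size=feats.shape[0])   # stand-in FC weights
    logit = feats @ w
    return 1.0 / (1.0 + np.exp(-logit))  # probability that a defect is present
```

The returned value lies in (0, 1) and plays the role of the probability value mentioned in the detection result.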
In a possible implementation, before the image to be detected is input to the semantic segmentation network, preprocessing is required, including: and carrying out binarization processing on the image to be detected to obtain a black-and-white image.
The image binarization is to set the gray value of a pixel point on an image to be 0 or 255, that is, the whole image presents an obvious black-and-white effect.
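The binarization step above is a simple thresholding; a minimal sketch follows. The threshold value of 128 is an assumption for illustration, since the patent does not specify one.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Set every pixel of a grayscale image to 0 or 255, giving the
    black-and-white effect described above. The threshold is hypothetical."""
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)
```

For example, a pixel of intensity 200 maps to 255 (white) and a pixel of intensity 0 stays 0 (black).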
In a possible implementation manner, before the image to be detected is input to the semantic segmentation network, one or more of the following preprocessing steps are further included:
performing expansion pretreatment on the image to be detected;
adjusting and preprocessing the size of the picture to be detected;
and carrying out rotation preprocessing on the picture to be detected.
For details of specific implementation, refer to the schematic structural diagram of the semantic segmentation network and the detection network shown in fig. 3;
firstly, the input image passes through a first convolutional layer and a first max-pooling layer to obtain feature map 1;
wherein the first convolutional layer has two 32×5×5 convolution kernels, and the kernel of the first max-pooling layer is 2×2;
feature map 1 passes through a second convolutional layer and a second max-pooling layer to obtain feature map 2;
wherein the second convolutional layer has three 64×5×5 convolution kernels, and the kernel of the second max-pooling layer is 2×2;
feature map 2 passes through a third convolutional layer and a third max-pooling layer to obtain feature map 3; wherein the third convolutional layer has four 64×5×5 convolution kernels, and the kernel of the third max-pooling layer is 2×2;
feature map 3 passes through a fourth convolutional layer to obtain feature map 4;
wherein the fourth convolutional layer has one 1024×15×15 convolution kernel; feature map 4 passes through a fifth convolutional layer to obtain feature map 5;
wherein the fifth convolutional layer has one 1×1×1 convolution kernel; feature map 5 is the output of the semantic segmentation network;
the details of the detection network for implementing defect detection are as follows:
firstly, feature map 4 and feature map 5 are combined to obtain feature map 6;
feature map 5 is subjected to global max pooling and global average pooling respectively, and each result is passed through a 1×1 convolutional layer, yielding the 1×1 feature maps 7 and 8;
feature map 6 passes through a 2×2 max-pooling layer to obtain feature map 9;
feature map 9 passes through a convolutional layer with an 8×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 10;
feature map 10 passes through a convolutional layer with a 16×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 11;
feature map 11 passes through a convolutional layer with a 32×5×5 kernel and a 2×2 max-pooling layer to obtain feature map 12;
feature map 12 is passed through a global max-pooling layer and a global average-pooling layer respectively, and each result through a 32×1 convolutional layer, yielding the 32×1 feature maps 13 and 14;
feature maps 7, 8, 13 and 14 are combined and input into the fully-connected layer.
And outputting a final defect detection result by the full connection layer. For example, the detection result may be 0.996, which indicates that the probability of the defect existing in the image is 0.996.
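Assuming same-padded convolutions (so only the 2×2 poolings change the spatial size), the sizes of the numbered feature maps above can be traced as follows. The 256×256 input size is an assumption chosen for illustration; the patent does not fix an input resolution.

```python
def trace_shapes(size=256):
    """Trace the spatial sizes of the feature maps in the concrete network
    above: three conv + 2x2 max-pool stages in the segmentation part, then
    one extra 2x2 pool and three conv + 2x2 max-pool stages in the
    detection part. Convolutions are assumed size-preserving."""
    s = size
    seg = []
    for _ in range(3):   # feature maps 1, 2, 3 (each 2x2 pool halves the size)
        s //= 2
        seg.append(s)
    # feature maps 4, 5, 6: convolutions/combination only, size unchanged
    det = []
    s //= 2              # feature map 9: 2x2 max-pool of feature map 6
    det.append(s)
    for _ in range(3):   # feature maps 10, 11, 12
        s //= 2
        det.append(s)
    return seg, det
```

With a 256×256 input, feature maps 1–3 are 128, 64 and 32 pixels on a side, and feature maps 9–12 shrink from 16 down to 2.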
After the semantic segmentation network processing, each semantic object in the image can be obviously distinguished; defects can be segmented from the image.
The training process for the semantic segmentation network comprises the following steps:
firstly, obtaining a plurality of sample images;
wherein the sample images and the target image should belong to the same type; if the target image is an image of a defective road, the sample images should also be images of defective roads;
then, the sample images are manually annotated to obtain a semantic label image for each sample image; the semantic label image can be a black-and-white image, in which white represents cracks and defects and black represents normal pavement; each training sample consists of a sample image and its corresponding semantic label image;
then, the sample images are input into the semantic segmentation network model to be trained,
and the semantic segmentation network is trained as follows: the network classifies each pixel in every image to produce a prediction; each predicted pixel is compared with the pixel at the same position in the label image and the error is computed; the errors of all pixel points are then accumulated. Preferably, the cross-entropy loss function is adopted as the loss function. If the computed loss value has not reached the preset threshold, the weights in each convolution kernel are updated and a new loss value is computed; after multiple loop iterations, iteration stops when the computed loss value falls below the preset threshold, and the semantic segmentation network model is trained successfully.
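The accumulated per-pixel cross-entropy described above can be sketched directly. This is a minimal illustration of the loss term, assuming binary labels (white = defect = 1, black = normal = 0) and sigmoid outputs; it is not the patent's training code.

```python
import numpy as np

def pixel_cross_entropy(pred, target, eps=1e-7):
    """Binary cross-entropy summed over all pixel points, as in the
    training procedure above. pred holds per-pixel defect probabilities
    in (0, 1); target holds 0/1 labels from the semantic label image."""
    pred = np.clip(pred, eps, 1 - eps)   # numerical safety near 0 and 1
    return float(-np.sum(target * np.log(pred)
                         + (1 - target) * np.log(1 - pred)))
```

Training would repeat: compute this loss, update the convolution-kernel weights, and stop once the loss drops below the preset threshold.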
And after the training of the semantic segmentation network is finished, training the detection network.
In one embodiment, before the image is input into the semantic segmentation network, dilation processing is applied to the image to obtain a dilated image;
wherein dilation processing refers to convolving the original image with a preset convolution kernel; in the convolved image the cracks become larger, producing a dilation effect. Referring to FIG. 4, a schematic illustration of images after the dilation operation: processing the image with 5 different convolution kernels yields 5 different dilated images, where the convolution kernels of images (a), (b), (c), (d) and (e) increase in size in that order. Tests conducted in the present invention show that the highest final recognition accuracy is achieved not with image (e), whose dilation is the most pronounced, but with image (b); the convolution kernel used for image (b) is 5×5.
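On a binarized (0/1) image, dilating with an all-ones kernel is equivalent to taking the local maximum over the kernel window, which is how the sketch below implements it. This is a hedged illustration of the operation, using the 5×5 kernel size that image (b) above is said to use; it is not the patent's code.

```python
import numpy as np

def dilate(img, k=5):
    """Morphological dilation of a 0/1 image with a k x k all-ones kernel,
    implemented as a sliding local maximum (equivalent, for binary images,
    to convolving and thresholding). Cracks grow thicker as described above."""
    h, w = img.shape
    p = k // 2
    padded = np.pad(img, p, mode="constant")   # zero-pad the border
    out = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].max()
    return out
```

A single defect pixel in the middle of a blank image grows into a 5×5 block, illustrating how thin cracks become easier to segment.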
In addition to the dilation convolution kernel, the following hyper-parameters need to be determined; see the preprocessing diagram shown in FIG. 5. The hyper-parameters further include:
1) The type of loss function of the first part, the semantic segmentation network: there are various loss functions, such as the mean-square-error loss function and the cross-entropy loss function; preferably, the present application adopts the cross-entropy loss function.
2) The image size: the options include a full-size image and a half-size image; the present application preferably uses the full-size image. A half-size image is an image whose height is half that of the original while the width is unchanged. The image size can also be entered by the user so that the image is cropped accordingly; for example, entering a size of 100×200 automatically completes the image resizing.
3) The rotation angle of the picture: the rotation angle may be 90 degrees or any other angle; the present application preferably applies no rotation. If 0 degrees is entered, the picture is not rotated.
The preprocessing mainly comprises the above four aspects; after preprocessing is finished, the preprocessed image is input into the model architecture. To demonstrate the performance of the present application, three model architectures other than that of the present application are adopted for comparison. During training, the models are trained with stochastic gradient descent (SGD); the number of iterations over the whole data set (epochs) is 100, the learning rate is 0.1, and the model parameters are initialized with random numbers drawn from a normal distribution with mean 0 and variance 0.01. To illustrate the superiority of the network architecture proposed by the present application, three other neural network architectures effective in the anomaly detection field are used for comparison: U-Net, DeepLabv3+, and the commercial software Cognex ViDi Suite. Referring to FIG. 6, which compares the detection results of the network architecture of the present application with those of the other networks, it can be seen that the detection accuracy of the present application is higher than that of the other architectures.
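The stated training hyper-parameters (learning rate 0.1, initialization from a normal distribution with mean 0 and variance 0.01) can be sketched as follows. Note that variance 0.01 corresponds to standard deviation 0.1; the function names are hypothetical, and this is an illustration of the update rule, not the patent's training code.

```python
import numpy as np

def init_params(shape, rng=np.random.default_rng(0)):
    """Initialize parameters from N(0, 0.01), i.e. standard deviation
    sqrt(0.01) = 0.1, as stated in the training setup above."""
    return rng.normal(0.0, np.sqrt(0.01), size=shape)

def sgd_step(w, grad, lr=0.1):
    """One stochastic-gradient-descent update with the stated learning
    rate of 0.1: w <- w - lr * grad."""
    return w - lr * grad
```

For instance, a weight of 1.0 with gradient 0.5 moves to 1.0 - 0.1 × 0.5 = 0.95 in one step; this update is repeated for 100 epochs over the data.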
Corresponding to the method above, the present application also proposes an image detection device; refer to the schematic structural diagram of an image detection device shown in fig. 7. The device includes:
an image acquisition module 71, configured to acquire an image to be detected;
the processing module 72 is configured to input the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and to input the semantic segmentation image into a detection network for detection to obtain a detection result.
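The cooperation of these two modules can be illustrated with a minimal sketch (the class and attribute names are illustrative placeholders, not identifiers from the application; the two networks are passed in as callables):

```python
class ImageDetector:
    """Two-stage pipeline: a semantic segmentation network followed by a
    detection network, mirroring the modules described above."""

    def __init__(self, seg_net, det_net):
        self.seg_net = seg_net  # stands in for the pre-trained segmentation network
        self.det_net = det_net  # stands in for the detection network

    def detect(self, image):
        seg_map = self.seg_net(image)  # semantic segmentation image
        return self.det_net(seg_map)   # detection result (e.g. a defect probability)
```

The point of the sketch is only the data flow: the detection network never sees the raw image, only the semantic segmentation output.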
In one embodiment, the detection network comprises: global pooling layers, a convolution layer corresponding to each global pooling layer, and a fully-connected layer;
the global pooling layers comprise a first global maximum pooling layer and a first global average pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolutional layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average pooling layer;
the processing module 72 is further configured to input the first output image sequentially into the first global maximum pooling layer and the first convolution layer to obtain a first feature map of the first output image;
and to input the first output image sequentially into the first global average pooling layer and the second convolution layer to obtain a second feature map of the first output image;
the detection network further comprises M alternately arranged max-pooling layers and convolution layers, where M is greater than 1;
the first output image and the second output image are combined to obtain a third output image;
the third output image is passed sequentially through the M alternately arranged max-pooling layers and convolution layers to obtain a first feature map of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
the processing module 72 is further configured to pass the first feature map of the third output image sequentially through the second global maximum pooling layer and the third convolution layer to obtain a second feature map of the third output image;
to input the first feature map of the third output image sequentially into the second global average pooling layer and the fourth convolution layer to obtain a third feature map of the third output image;
and to input the first feature map of the first output image, the second feature map of the third output image and the third feature map of the third output image into the fully-connected layer to obtain an output result.
In an embodiment, the device further includes a preprocessing module, configured to perform one or more of the following preprocessing steps on the image to be detected before inputting the image to be detected into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
The preprocessing module is further configured to perform binarization processing on the image to be detected, obtaining a black-and-white image, before the image to be detected is input into the semantic segmentation network.
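The detection head described above (a global max-pooling branch and a global average-pooling branch for each feature map, with the pooled results fed to a fully-connected layer) can be sketched as follows. This is a simplified NumPy illustration: the function names are invented for the sketch, and the 1x1 convolutions that follow each pooling layer in the text are folded into the fully-connected weights for brevity.

```python
import numpy as np

def global_max_pool(x):
    # x has shape (C, H, W); returns the per-channel maximum, shape (C,).
    return x.max(axis=(1, 2))

def global_avg_pool(x):
    # Per-channel mean over the spatial dimensions, shape (C,).
    return x.mean(axis=(1, 2))

def detection_head(first_out, third_out, fc_w, fc_b):
    """Each feature map passes through a global max-pooling branch and a
    global average-pooling branch; the pooled vectors are concatenated
    and fed to a fully-connected layer (fc_w, fc_b)."""
    feats = np.concatenate([
        global_max_pool(first_out),
        global_avg_pool(first_out),
        global_max_pool(third_out),
        global_avg_pool(third_out),
    ])
    logits = fc_w @ feats + fc_b
    # A sigmoid maps the score to a defect probability in [0, 1].
    return 1.0 / (1.0 + np.exp(-logits))
```

With zeroed inputs and weights, the head outputs 0.5, i.e. a maximally uncertain defect probability, which is the expected behaviour of an untrained sigmoid classifier.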
The present application further provides an apparatus for image detection, comprising: at least one processor 81 and at least one memory 82;
the memory 82 for storing one or more program instructions;
the processor 81 is configured to execute one or more program instructions to perform the method according to any one of the above-mentioned embodiments.
In a fourth aspect, the present application further proposes a computer-readable storage medium in which one or more program instructions are embodied, the one or more program instructions being executable by a processor to perform the method according to any one of the embodiments above.
Although the invention has been described in detail above with reference to a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements may be made on the basis of the invention. Accordingly, such modifications and improvements are intended to fall within the scope of the claimed invention.
Claims (10)
1. A method of image detection, comprising:
acquiring an image to be detected;
inputting the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing, and obtaining a semantic segmentation image;
and inputting the semantic segmentation image into a detection network for detection to obtain a detection result.
2. The method of claim 1, wherein the semantic segmentation network comprises n-2 alternately arranged convolutional layers and max-pooling layers; an (n-1)th convolutional layer and an nth convolutional layer are further arranged, in sequence, after the (n-2)th max-pooling layer; and n is greater than or equal to 3.
3. The method of claim 2, wherein the output image of the nth convolutional layer is denoted as a first output image, and the output image of the (n-1)th convolutional layer is denoted as a second output image;
the detection network comprises: global pooling layers, a convolution layer corresponding to each global pooling layer, and a fully-connected layer;
the global pooling layers comprise a first global maximum pooling layer and a first global average pooling layer;
the convolutional layers comprise a first convolutional layer and a second convolutional layer;
the first convolutional layer is correspondingly connected with the first global maximum pooling layer;
the second convolutional layer is correspondingly connected with the first global average pooling layer;
inputting the semantic segmentation image into the detection network for detection to obtain a detection result comprises the following steps:
inputting the first output image sequentially into the first global maximum pooling layer and the first convolution layer to obtain a first feature map of the first output image;
inputting the first output image sequentially into the first global average pooling layer and the second convolution layer to obtain a second feature map of the first output image;
the detection network further comprises M alternately arranged max-pooling layers and convolution layers, wherein M is greater than 1;
combining the first output image and the second output image to obtain a third output image;
passing the third output image sequentially through the M alternately arranged max-pooling layers and convolution layers to obtain a first feature map of the third output image;
the global pooling layers further comprise a second global maximum pooling layer and a second global average pooling layer;
the convolutional layers further comprise a third convolutional layer and a fourth convolutional layer;
the third convolutional layer is correspondingly connected with the second global maximum pooling layer;
the fourth convolutional layer is correspondingly connected with the second global average pooling layer;
passing the first feature map of the third output image sequentially through the second global maximum pooling layer and the third convolution layer to obtain a second feature map of the third output image;
inputting the first feature map of the third output image sequentially into the second global average pooling layer and the fourth convolution layer to obtain a third feature map of the third output image;
and inputting the first feature map of the first output image, the second feature map of the third output image and the third feature map of the third output image into the fully-connected layer to obtain an output result.
4. The method of claim 1, wherein inputting the image to be detected into the semantic segmentation network comprises: performing binarization processing on the image to be detected to obtain a black-and-white image.
5. The method of claim 1, wherein before the image to be detected is input into the semantic segmentation network, one or more of the following preprocessing steps are performed on the image to be detected:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
6. The method according to claim 1, wherein the image to be detected is a road surface image, and the detection result is a probability value that the road surface image contains a defect.
7. An apparatus for image inspection, comprising:
the image acquisition module is used for acquiring an image to be detected;
the processing module is configured to input the image to be detected into a pre-trained semantic segmentation network for semantic segmentation processing to obtain a semantic segmentation image, and to input the semantic segmentation image into a detection network for detection to obtain a detection result.
8. The apparatus of claim 7, further comprising a preprocessing module configured to perform one or more of the following preprocessing steps on the image to be detected before inputting the image to be detected into the semantic segmentation network:
performing dilation preprocessing on the image to be detected;
performing size-adjustment preprocessing on the image to be detected;
and performing rotation preprocessing on the image to be detected.
9. An apparatus for image inspection, comprising: at least one processor and at least one memory;
the memory to store one or more program instructions;
the processor, configured to execute one or more program instructions to perform the method of any of claims 1-6.
10. A computer-readable storage medium having one or more program instructions embodied therein, the one or more program instructions being executable to perform the method of any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911295863.5A CN111179278B (en) | 2019-12-16 | 2019-12-16 | Image detection method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111179278A true CN111179278A (en) | 2020-05-19 |
CN111179278B CN111179278B (en) | 2021-02-05 |
Family
ID=70648879
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911295863.5A Active CN111179278B (en) | 2019-12-16 | 2019-12-16 | Image detection method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111179278B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112706764A (en) * | 2020-12-30 | 2021-04-27 | 潍柴动力股份有限公司 | Active anti-collision early warning method, device, equipment and storage medium |
WO2022121531A1 (en) * | 2020-12-09 | 2022-06-16 | 歌尔股份有限公司 | Product defect detection method and apparatus |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019104767A1 (en) * | 2017-11-28 | 2019-06-06 | 河海大学常州校区 | Fabric defect detection method based on deep convolutional neural network and visual saliency |
CN109886964A (en) * | 2019-03-29 | 2019-06-14 | 北京百度网讯科技有限公司 | Circuit board defect detection method, device and equipment |
CN110473173A (en) * | 2019-07-24 | 2019-11-19 | 熵智科技(深圳)有限公司 | A kind of defect inspection method based on deep learning semantic segmentation |
Also Published As
Publication number | Publication date |
---|---|
CN111179278B (en) | 2021-02-05 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |