CN113808103A - Automatic road surface depression detection method and device based on image processing and storage medium - Google Patents

Automatic road surface depression detection method and device based on image processing and storage medium Download PDF

Info

Publication number
CN113808103A
Authority
CN
China
Prior art keywords
image
road surface
point cloud
orthoimage
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111086027.3A
Other languages
Chinese (zh)
Inventor
蔡长青 (Cai Changqing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University filed Critical Guangzhou University
Priority to CN202111086027.3A priority Critical patent/CN113808103A/en
Publication of CN113808103A publication Critical patent/CN113808103A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G06T 7/0002 Image analysis: inspection of images, e.g. flaw detection
    • G06T 7/11 Segmentation: region-based segmentation
    • G06T 17/05 Three-dimensional [3D] modelling: geographic models
    • G06T 17/20 Three-dimensional [3D] modelling: finite element generation, e.g. wire-frame surface description, tesselation
    • G06F 18/2135 Pattern recognition, feature extraction: transformation of the feature space based on approximation criteria, e.g. principal component analysis
    • G06F 18/2415 Pattern recognition, classification techniques: parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 Neural networks: combinations of networks
    • G06N 3/047 Neural networks: probabilistic or stochastic networks
    • G06N 3/08 Neural networks: learning methods
    • G06T 2207/10028 Image acquisition modality: range image; depth image; 3D point clouds
    • G06T 2207/20021 Special algorithmic details: dividing image into blocks, subimages or windows
    • G06T 2207/20081 Special algorithmic details: training; learning
    • G06T 2207/20084 Special algorithmic details: artificial neural networks [ANN]

Abstract

The invention discloses an automatic road surface pothole detection method, device, and storage medium based on image processing. The method comprises the following steps: collecting images of a road surface; performing three-dimensional reconstruction of the road surface images through SFM to obtain an orthoimage of the road surface; segmenting road surface pothole features in the image through a depthwise separable convolution network to obtain a three-dimensional road surface pothole segmentation model; and measuring the diameter and volume of the potholes in the three-dimensional segmentation model. The invention introduces advanced machine learning techniques into the field of road surface depression detection: SFM is used for three-dimensional reconstruction of the pothole images, bringing the modeling precision to the millimeter level, while a depthwise separable convolution network is applied to feature extraction and segmentation, reducing the number of model parameters and improving computational efficiency. The method enables those skilled in the art to realize high-precision automatic measurement of pothole volume and provides a new starting point for further optimization of the measurement method.

Description

Automatic road surface depression detection method and device based on image processing and storage medium
Technical Field
The invention relates to the field of image processing, and in particular to an automatic road surface depression detection method, device, and storage medium based on image processing.
Background
With the steady advance of urbanization, large numbers of asphalt roads have been paved nationwide for vehicle traffic. As their service life increases, roads gradually age and become prone to cracking, collapse, and other defects that hinder the normal passage of vehicles; transportation departments therefore need to inspect and maintain road surfaces regularly.
At present, road surface inspection still commonly relies on traditional manual methods, with dedicated personnel recording and measuring depressed road sections in the field. This is time-consuming, labor-intensive, and inefficient, and as the mileage of paved roads keeps growing, inspection costs rise accordingly.
Disclosure of Invention
In view of the above, the present invention introduces advanced machine learning methods into road surface pothole detection, delegating the tedious work of pothole measurement and recording to AI.
The invention provides an automatic road surface depression detection method based on image processing, comprising the following steps:
acquiring an image of a road surface to obtain a road surface image;
performing three-dimensional reconstruction on the road surface image through SFM to obtain an orthoimage of the road surface;
carrying out road surface pothole feature segmentation on the road surface image through a depthwise separable convolution network to obtain a segmentation result;
and measuring the diameter and volume of the potholes in the segmentation result.
Further, the image acquisition of the road surface specifically includes:
setting the height, spacing and angle of image acquisition so as to balance field of view against ground sampling distance;
and carrying out image acquisition on the road surface by using a movable carrier carrying camera.
Further, the three-dimensional reconstruction of the road surface image by the SFM specifically includes:
calibrating a camera, eliminating radial distortion and tangential distortion generated by a camera lens in a road surface image, and acquiring the internal physical characteristics of the camera;
generating a point cloud model of the road surface image according to the road surface image and the internal physical characteristics of the camera;
calibrating the generated point cloud model;
and carrying out space segmentation on the calibrated point cloud model to generate an orthoimage of the road surface.
Further, the calibrating the generated point cloud model specifically includes the following steps:
analyzing the point cloud model by using a PCA algorithm, and extracting main characteristic components of the point cloud model to obtain a first characteristic component, a second characteristic component and a third characteristic component;
establishing a space coordinate system by taking the first characteristic component, the second characteristic component and the third characteristic component as three coordinate directions, and constructing a point cloud plane in the coordinate system;
taking the characteristic component with the minimum characteristic value in the first characteristic component, the second characteristic component and the third characteristic component as the normal characteristic component of the point cloud plane;
multiplying the coordinates of each point in the point cloud model by the normal characteristic component to generate a calibration point cloud;
and carrying out rotation correction on the original point cloud in the point cloud model through the calibration point cloud.
Further, the orthoimage includes:
a color orthoimage, wherein each pixel of the color orthoimage records the average RGB value of the point cloud at the pixel position;
a depth orthoimage, wherein each pixel of the depth orthoimage records the average height of the point cloud at the pixel position;
and an overlaid orthoimage, which is an overlay of the color orthoimage and the depth orthoimage.
Further, the step of overlaying the orthoimages comprises:
performing RGB separation on the color orthoimage, wherein the separated R channel image is a first channel image, the separated G channel image is a second channel image, and the separated B channel image is a third channel image;
averaging the pixel value of the first channel image, the pixel value of the second channel image and the pixel value of the third channel image with the pixel value of the depth orthoimage respectively to obtain an average first channel image, an average second channel image and an average third channel image;
and recombining the average first channel image, the average second channel image and the average third channel image to form the overlaid orthoimage.
Further, the segmentation of the road surface pothole features of the image through the depthwise separable convolution network specifically includes:
performing convolution operations on the orthoimage through a plurality of downsampling layers to obtain a first feature image;
performing transposed convolution operations on the first feature image through a plurality of upsampling layers corresponding to the downsampling layers to obtain a second feature image of the same size as the orthoimage;
and determining, through a softmax layer, the probability of a road surface depression at each pixel of the second feature image.
Further, the downsampling layer specifically includes a convolution layer with a kernel size of 3 × 3, a batch normalization layer, and a rectified linear unit activation layer.
The invention also discloses a device, which comprises a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the aforementioned automatic road surface depression detection method based on image processing.
The invention also discloses a computer-readable storage medium storing a program which, when executed by a processor, implements the aforementioned automatic road surface depression detection method based on image processing.
The beneficial effects of the invention are as follows: SFM is introduced for 3D reconstruction of the road surface images, and depthwise separable convolutions are used to segment road surface potholes, reducing the computational load. The invention maintains computational speed while bringing the detection results to millimeter-level precision, realizing high-precision automatic measurement of road surface potholes.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a flowchart of the automatic detection method for road depressions based on image processing.
FIG. 2 is a diagram illustrating a road image acquisition procedure according to the present invention.
FIG. 3 is a diagram of the deep separable convolutional network construction of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
This example illustrates the road surface image acquisition process. Image acquisition strongly influences the quality of the three-dimensional reconstruction of the road surface and must be designed in terms of photographic height, spacing, and angle. To balance field of view against Ground Sampling Distance (GSD), the vertical distance between the cameras and the road surface is set to 0.8 m and the horizontal distance between cameras to 0.6 m. Under this setup, each image covers about 2 m², and the smallest detail that can be accurately observed on the image is 0.27 mm/pixel. According to previous studies, the overlap between images should be between 70% and 80% to guarantee a high level of detail. In addition, different capture angles record different surface micro-features, because raised surface texture can occlude the surrounding surface. It is therefore worthwhile to compare the three-dimensional reconstruction quality at different photographing angles.
For the field investigation, two camera-angle schemes were compared in terms of the quality of the resulting three-dimensional point clouds (Fig. 2). In the first scheme, all camera directions are perpendicular to the surface, as shown in Fig. 2(a). Each camera then captures surface texture in all directions, but because of the small parallax between adjacent shots, the features captured on the images may be weak. In the second scheme, the central camera remains perpendicular to the ground while the side cameras are tilted toward the center at a fixed angle of 30°, as shown in Fig. 2(b). Each camera can then better capture the surface details it faces; the drawback of this strategy is that surface features in other directions may be missed.
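For a concrete sense of how these quantities trade off, the sketch below (not part of the original disclosure) computes the footprint, GSD, and camera spacing from an assumed focal length and sensor size; the patent's figures of 2 m² and 0.27 mm/pixel follow from its actual camera, which is not specified.

    # A back-of-envelope helper for balancing field of view against GSD.
    # The focal length, sensor width, and resolution below are placeholders.
    def acquisition_plan(height_m=0.8, focal_mm=24.0, sensor_w_mm=36.0,
                         image_w_px=6000, target_overlap=0.75):
        footprint_w = height_m * sensor_w_mm / focal_mm      # ground width in m
        gsd_mm = footprint_w * 1000.0 / image_w_px           # mm per pixel
        spacing_m = footprint_w * (1.0 - target_overlap)     # camera interval in m
        return footprint_w, gsd_mm, spacing_m

    width, gsd, step = acquisition_plan()
    print(f"footprint {width:.2f} m wide, GSD {gsd:.2f} mm/px, camera every {step:.2f} m")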
This embodiment describes the three-dimensional reconstruction of the images using SfM (Structure from Motion). The main process comprises two steps: first, SfM accurately estimates the spatial position of each valid pixel in the images from the spatial relationships between the cameras; second, the point cloud model generated by SfM is converted into an orthoimage through calibration and formatting. The specific steps are as follows:
(1) camera calibration
The camera intrinsic matrix K should be obtained first in the three-dimensional reconstruction. It describes the relationship between spatial points and image points in the camera coordinate system, so the purpose of camera calibration is to obtain the internal physical characteristics of the camera. At the same time, lens distortion, including radial and tangential distortion, must be eliminated through calibration. Given a set of chessboard images as input, the camera calibration parameters are calculated automatically with MATLAB or Python-OpenCV.
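As an illustrative sketch of this step, the following Python-OpenCV code (one of the two tools the text names) calibrates from chessboard images and undistorts a road image; the board geometry and file paths are assumptions:

    import glob
    import cv2
    import numpy as np

    # Assumed 9x6 inner-corner chessboard with 25 mm squares; adjust to the
    # board actually used.
    pattern, square_mm = (9, 6), 25.0
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

    obj_pts, img_pts, size = [], [], None
    for path in glob.glob("chessboard/*.jpg"):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        size = gray.shape[::-1]
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_pts.append(objp)
            img_pts.append(corners)

    # K is the intrinsic matrix; dist holds the radial (k1, k2, k3) and
    # tangential (p1, p2) distortion coefficients that cv2.undistort removes.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)
    undistorted = cv2.undistort(cv2.imread("road/0001.jpg"), K, dist)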
(2) Point cloud reconstruction
Once road photogrammetry and camera calibration are complete, the road surface images are divided into several groups of 20-30 images each. A Python script is then written to call PhotoScan's functions and batch-process the image groups automatically.
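A minimal sketch of the grouping logic is below; the actual reconstruction call into PhotoScan is abstracted behind a hypothetical wrapper, since the patent does not give the API invocation:

    import glob

    def batches(paths, group_size=25):        # 20-30 images per group, per the text
        for start in range(0, len(paths), group_size):
            yield paths[start:start + group_size]

    for group in batches(sorted(glob.glob("road/*.jpg"))):
        reconstruct_with_photoscan(group)     # hypothetical wrapper around PhotoScan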
SfM-based three-dimensional scene reconstruction relies on correspondence search, which extracts and matches interest points between multiple views. The purpose of the correspondence search is to identify corresponding points across the input images I = {I_i | i = 1…N}. For each image I_i, a set of features F_i = {(p_j, f_j) | j = 1…M} is identified, where f_j is a local feature descriptor for each location p_j. These local features are stable under changes of position, orientation, and scale, so feature points can be matched uniquely across multiple images. Several feature extraction algorithms are available, such as Speeded-Up Robust Features (SURF), Features from Accelerated Segment Test (FAST), and the Scale-Invariant Feature Transform (SIFT); among them, SIFT is widely used in SfM for its robustness and efficiency.
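A minimal sketch of this correspondence search with OpenCV's SIFT implementation might look as follows; the file paths are assumed, and the 0.75 ratio test is a common choice rather than a value taken from the patent:

    import cv2

    img1 = cv2.imread("road/0001.jpg", cv2.IMREAD_GRAYSCALE)
    img2 = cv2.imread("road/0002.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)   # (p_j, f_j) pairs per image
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Match descriptors between the two views and keep unambiguous matches
    # via Lowe's ratio test.
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des1, des2, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]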
SfM recovers the three-dimensional information of feature points from the geometric relationship between the images and the scene. To obtain the spatial coordinates of points on the images, the relative positions of the cameras are estimated first. The positional relationship between the two cameras can be described by epipolar geometry and the essential matrix E. Assume that the first camera coordinate system O1-xyz coincides with the world coordinate system, where O1 is the optical center and p1 = [x1, y1]^T is the image coordinate of the target point P on the first image. The second camera coordinate system is O2-xyz, where O2 is the optical center and p2 = [x2, y2]^T is the image coordinate of P on the second image. The feature point pairs (p1, p2) are identified by feature extraction and matching. The two images share the same camera parameters, with focal lengths fx and fy and optical center coordinates [cx, cy]^T on the image.
Projection in the first camera coordinate system O1-xyz:

d1 [x1, y1, 1]^T = K [X, Y, Z]^T (1)

Projection in the second camera coordinate system O2-xyz:

d2 [x2, y2, 1]^T = K [X', Y', Z']^T (2)

Camera intrinsic matrix:

K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] (3)
In equations (1)-(3), K is the camera intrinsic matrix, which can be obtained by camera calibration; [X, Y, Z]^T are the spatial coordinates of the target point P in the O1-xyz camera coordinate system, and [X', Y', Z']^T are its spatial coordinates in the O2-xyz camera coordinate system; d1 and d2 are the vertical distances from the target point to the optical centers of the two cameras.
After the coordinate transformation relationships of the two camera systems are obtained from equations (1)-(2), the mutual positional relationship between the two camera coordinate systems can be estimated by the following equations, where the rotation matrix R is 3 × 3 and the translation vector T is 3 × 1:

[X', Y', Z']^T = R [X, Y, Z]^T + T (4)

d2 K^-1 [x2, y2, 1]^T = d1 R K^-1 [x1, y1, 1]^T + T (5)
Equation (5) can be further simplified by means of the 3 × 1 vector perpendicular to both T and K^-1[x2, y2, 1]^T, namely [T]× K^-1[x2, y2, 1]^T = T × (K^-1[x2, y2, 1]^T), where [T]× is the 3 × 3 antisymmetric matrix of T. Left-multiplying both sides of equation (5) by the transpose of this vector eliminates d2 and T, and equation (5) becomes:

(K^-1 [x2, y2, 1]^T)^T [T]× R K^-1 [x1, y1, 1]^T = 0 (6)
In equation (6), [T]×R is the essential matrix E, which can be solved from a number of known feature point pairs selected with the random sample consensus (RANSAC) algorithm. The rotation matrix R and the translation vector T are then extracted from the essential matrix with the singular value decomposition (SVD) algorithm. Once the relationship between the cameras is determined, triangulation can be used to calculate the spatial coordinates of the target points. The spatial coordinates [X, Y, Z]^T are computed via SVD from the following equations:
d2 [x2, y2, 1]^T = K (R [X, Y, Z]^T + T) (7)

[[x2, y2, 1]^T]× K (R [X, Y, Z]^T + T) = 0 (8)

where [[x2, y2, 1]^T]× is the 3 × 3 antisymmetric matrix of [x2, y2, 1]^T, whose product with any vector is perpendicular to [x2, y2, 1]^T.
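Continuing the sketch, equations (4)-(8) map onto OpenCV as follows, reusing K from the calibration sketch and the SIFT matches from the previous sketch; this is an illustration of the pipeline, not the patent's own code:

    import cv2
    import numpy as np

    # Matched pixel coordinates from the ratio-tested SIFT matches.
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

    # RANSAC selection of feature pairs and essential matrix estimation (6).
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
    # R and T are extracted from E; the SVD happens inside recoverPose.
    _, R, T, _ = cv2.recoverPose(E, pts1, pts2, K)

    # Triangulation (7)-(8): P1 = K[I|0] for O1-xyz, P2 = K[R|T] for O2-xyz.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, T])
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    X = (X_h[:3] / X_h[3]).T              # [X, Y, Z] per feature point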
(3) Point cloud calibration
The surface of the point cloud model generated by SfM is not parallel to the road surface, because the images are taken from different angles; an orthoimage of the road surface therefore cannot be obtained directly from the original point cloud model. Several methods exist to find the plane of the original point cloud and calibrate it to be parallel to the road surface. A common method fits a plane to the points and computes the perpendicular distance between each original point and the fitted plane. However, this method corrects only the vertical offset of the point cloud by projection and neglects its horizontal offset.
To overcome this problem, this embodiment estimates the normal vector of the point cloud plane and calibrates the plane by rotation, as follows. First, the three eigenvectors of the point cloud data are computed with the principal component analysis (PCA) algorithm; intuitively, they are the three main directions of the point cloud model. Among them, the eigenvector with the smallest eigenvalue represents the normal direction of the point cloud plane. The original coordinates of each point are then multiplied by the normal eigenvector to generate the calibrated point cloud; geometrically, the original point cloud is rotated into a horizontal surface. In addition, the coordinates of the original point cloud are relative coordinates and can be converted to absolute coordinates through a scale factor, which is estimated as the ratio of an actual distance to the corresponding distance in the point cloud.
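A compact numpy sketch of this PCA-based calibration, under the assumption that rotating the cloud onto its principal axes realizes the multiplication described above:

    import numpy as np

    def calibrate_point_cloud(points, actual_dist=None, cloud_dist=None):
        """Rotate an SfM point cloud so its fitted plane becomes horizontal.
        `points` is an (N, 3) array of relative coordinates."""
        centered = points - points.mean(axis=0)
        # PCA: the eigenvectors of the covariance matrix are the three main
        # directions; the smallest-eigenvalue one is the plane normal.
        eigvals, eigvecs = np.linalg.eigh(np.cov(centered.T))
        order = np.argsort(eigvals)[::-1]        # 1st, 2nd, 3rd components
        rotation = eigvecs[:, order]             # normal direction ends up as z
        calibrated = centered @ rotation         # rotate onto the principal axes
        if actual_dist is not None and cloud_dist is not None:
            calibrated *= actual_dist / cloud_dist   # relative -> absolute scale
        return calibrated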
(4) Ortho image generation
After the corrected point cloud model is obtained, orthoimages for deep-learning-based road surface damage detection can be generated. Three types of orthoimage were produced in this study: color orthoimages, depth orthoimages, and color-depth overlay orthoimages.
Unlike point clouds generated by laser scanning, point clouds generated by stereoscopic vision are unordered and randomly distributed over the object surface. Therefore, the calibrated point cloud is spatially partitioned to generate the orthoimage: each pixel of the orthoimage represents the point cloud information within a certain spatial range. For a color orthoimage, each pixel is the average RGB value of the point cloud in the corresponding area; similarly, each pixel of the depth orthoimage is the average height of the point cloud in that area. Since they are generated over the same area, the two orthoimages align exactly, so an overlay image can be produced by an image overlay operation.
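A simplified rasterization sketch of this spatial binning is below; the 1 mm grid size is an assumption chosen to match the stated millimeter-level precision, not a value given in the patent:

    import numpy as np

    def rasterize(cloud, pixel_size=0.001):
        """Bin an unordered point cloud into color and depth orthoimages.
        `cloud` is an (N, 6) array of x, y, z, r, g, b rows."""
        ij = np.floor((cloud[:, :2] - cloud[:, :2].min(axis=0)) / pixel_size).astype(int)
        w, h = ij[:, 0].max() + 1, ij[:, 1].max() + 1
        color = np.zeros((h, w, 3))
        depth = np.zeros((h, w))
        count = np.zeros((h, w))
        for (i, j), point in zip(ij, cloud):
            count[j, i] += 1
            depth[j, i] += point[2]      # accumulate height
            color[j, i] += point[3:6]    # accumulate RGB
        valid = count > 0
        depth[valid] /= count[valid]                 # average height per cell
        color[valid] /= count[valid][:, None]        # average RGB per cell
        return color, depth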
The color and depth images are overlaid as follows: first, the three-channel color image is split into three single-channel images; next, the pixel values of each single-channel image are averaged with the pixel values of the depth image; finally, the three new single-channel images are recombined to form the overlay image.
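Vectorized with numpy, the three split-average-merge steps collapse to one broadcast; this sketch assumes the two orthoimages share the same grid, as stated above:

    import numpy as np

    def overlay(color_ortho, depth_ortho):
        """Average each RGB channel with the depth channel and recombine;
        equivalent to split -> per-channel average -> merge."""
        return (color_ortho + depth_ortho[..., None]) / 2.0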
This embodiment describes the segmentation of road surface potholes with a U-Net network; the basic framework is shown in Fig. 3. The input to the U-Net-based convolutional network is a three-channel image, which may be a color image, a depth image, or a color-depth overlay image; a depth image is converted into a three-channel image by channel replication. The network output is a single-channel image. In the encoder, features of the input image are first extracted by a standard convolution block comprising a convolution layer with a kernel size of 3 × 3, a batch normalization (BN) layer, and a rectified linear unit (ReLU) activation layer. A series of depthwise separable convolution blocks then completes the remaining downsampling. After four convolution operations with a stride of 2, the feature map is reduced to one sixteenth the size of the input image. The decoder is based mainly on transposed convolution layers with a kernel size of 2 × 2 and a stride of 2; after each transposed convolution, a concatenation operation connects the corresponding encoder block to the decoder block. Each upsampling step doubles the feature map size and halves the number of maps. After four transposed convolutions, the network returns a feature map of the same size as the input image. The last layer of the network is a softmax layer, which returns a probability map giving the probability of road damage for each pixel.
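The patent names no framework; the following compressed PyTorch sketch illustrates the described architecture, with one standard stride-2 block, three depthwise separable stride-2 blocks (1/16 total reduction), 2 × 2 stride-2 transposed convolutions with skip concatenations, and a final softmax. The channel widths are assumptions:

    import torch
    import torch.nn as nn

    class DSConv(nn.Module):
        """Depthwise separable block: depthwise 3x3 + pointwise 1x1, BN, ReLU."""
        def __init__(self, cin, cout, stride=1):
            super().__init__()
            self.block = nn.Sequential(
                nn.Conv2d(cin, cin, 3, stride, 1, groups=cin, bias=False),  # depthwise
                nn.Conv2d(cin, cout, 1, bias=False),                        # pointwise
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        def forward(self, x):
            return self.block(x)

    class MiniUNet(nn.Module):
        def __init__(self, ch=(16, 32, 64, 128)):
            super().__init__()
            # Standard 3x3 conv block performs the first stride-2 downsampling.
            self.stem = nn.Sequential(
                nn.Conv2d(3, ch[0], 3, 2, 1, bias=False),
                nn.BatchNorm2d(ch[0]),
                nn.ReLU(inplace=True),
            )
            # Three more stride-2 steps -> 1/16 of the input size.
            self.downs = nn.ModuleList([DSConv(ch[i], ch[i + 1], stride=2) for i in range(3)])
            # 2x2 stride-2 transposed convs double the map size each step.
            self.ups = nn.ModuleList([nn.ConvTranspose2d(ch[i + 1], ch[i], 2, 2)
                                      for i in reversed(range(3))])
            self.fuse = nn.ModuleList([DSConv(2 * ch[i], ch[i]) for i in reversed(range(3))])
            self.head = nn.Sequential(nn.ConvTranspose2d(ch[0], 2, 2, 2), nn.Softmax(dim=1))

        def forward(self, x):
            out = self.stem(x)
            skips = [out]
            for down in self.downs:
                out = down(out)
                skips.append(out)
            skips.pop()  # the deepest map is already `out`
            for up, fuse in zip(self.ups, self.fuse):
                out = up(out)
                out = fuse(torch.cat([out, skips.pop()], dim=1))  # skip concatenation
            # Output has the input's H x W; channel 1 is the per-pixel
            # damage probability from the softmax.
            return self.head(out)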
The depthwise separable convolution splits a standard convolution into two steps, a depthwise convolution and a pointwise convolution, greatly reducing the number of model parameters. In this embodiment, assume the input image has M channels and the output feature map has N channels; for one convolution operation, the number of parameters required by a standard convolution with a kernel size of k × k is:
P_conv = M × N × k × k (9)
In the depthwise convolution, the feature maps of the M channels are convolved by M separate convolution kernels. The number of parameters required by the depthwise convolution is therefore:
P_Dconv = M × k × k (10)
The M feature maps generated by the depthwise convolution are then combined by a 1 × 1 convolution to produce the feature maps of the N output channels. The number of parameters in the pointwise convolution operation is:
P_Pconv = M × N × 1 × 1 (11)
Thus, the total number of parameters of the depthwise separable convolution is:
P_DSconv = M × k × k + M × N × 1 × 1 (12)
Compared with the standard convolution, the parameter count of the depthwise separable convolution is reduced by the ratio:

P_DSconv / P_conv = (M × k × k + M × N) / (M × N × k × k) = 1/N + 1/k² (13)
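A quick numerical check of equations (9)-(13), with layer sizes chosen purely for illustration:

    M, N, k = 64, 128, 3
    standard = M * N * k * k                 # (9):  73,728 parameters
    separable = M * k * k + M * N            # (12):  8,768 parameters
    print(separable / standard)              # 0.1189...
    print(1 / N + 1 / k**2)                  # (13): 1/N + 1/k^2 = 0.1189...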
In this embodiment, the deep learning network is trained with a cross-entropy loss function and a Dice coefficient loss function. In the initial training phase, the cross-entropy loss measures the performance of the pixel-level classifier, whose output probabilities lie between 0 and 1. The cross-entropy loss L_CE is defined as follows:
L_CE = -(1/K) Σ_{i=1..K} [ y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i) ] (14)

where K is the number of pixels; y_i is the ground truth of pixel i, y_i ∈ {0, 1}; and ŷ_i is the prediction for pixel i, ŷ_i ∈ (0, 1).
However, in the segmentation of road surface cracks, the number of non-crack pixels far exceeds the number of crack pixels. In view of this imbalance, a Dice coefficient loss function is used to evaluate the similarity between the ground truth image and the predicted image. The Dice coefficient loss L_DC is determined by the following equation:
L_DC = 1 - (2 Σ_i y_i ŷ_i + s) / (Σ_i y_i + Σ_i ŷ_i + s) (15)

where s is a constant that prevents the denominator from being zero; here, s = 1.
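Both losses are straightforward to express in PyTorch; this sketch assumes `pred` holds per-pixel probabilities in (0, 1) (the softmax output reduced to a single map) and `target` holds {0, 1} labels:

    import torch

    def cross_entropy_loss(pred, target, eps=1e-7):
        """Pixel-wise binary cross entropy, equation (14)."""
        pred = pred.clamp(eps, 1 - eps)          # keep log() finite
        return -(target * pred.log() + (1 - target) * (1 - pred).log()).mean()

    def dice_loss(pred, target, s=1.0):
        """Dice coefficient loss, equation (15); s keeps the denominator nonzero."""
        inter = (pred * target).sum()
        return 1 - (2 * inter + s) / (pred.sum() + target.sum() + s)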
The purpose of model training is to adjust the weights of the layers in the structure to achieve the best segmentation. After weight initialization, the weights are optimized with the Adam algorithm so as to minimize the loss value. Adam is widely used in model training for its outstanding computational efficiency and robustness. Meanwhile, the learning rate is not fixed in this embodiment but changes dynamically over the course of training: if the validation loss does not drop over several iterations, the learning rate is halved.
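A training-loop skeleton matching this schedule, reusing the MiniUNet and loss sketches above; the initial rate, patience, and epoch count are assumptions, and `evaluate` stands in for a validation routine the patent does not spell out:

    import torch

    model = MiniUNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the learning rate when the validation loss plateaus.
    sched = torch.optim.lr_scheduler.ReduceLROnPlateau(opt, factor=0.5, patience=5)

    for epoch in range(100):
        # ... per-batch: forward pass, loss = cross_entropy_loss(...) or
        # dice_loss(...), loss.backward(), opt.step(), opt.zero_grad() ...
        val_loss = evaluate(model)       # hypothetical validation routine
        sched.step(val_loss)             # triggers the halving on plateau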
The embodiment of the invention also discloses a computer program product or a computer program, which comprises computer instructions, and the computer instructions are stored in a computer readable storage medium. The computer instructions may be read by a processor of a computer device from a computer-readable storage medium, and executed by the processor to cause the computer device to perform the method illustrated in fig. 1.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flow charts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present invention is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the described functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in a separate physical device or software module. It will also be appreciated that a detailed discussion of the actual implementation of each module is not necessary for an understanding of the present invention. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the invention as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the invention, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An automatic road surface depression detection method based on image processing, characterized by comprising the following steps:
acquiring an image of a road surface to obtain a road surface image;
performing three-dimensional reconstruction on the road surface image through SFM to obtain an orthoimage of the road surface;
carrying out road surface pothole feature segmentation on the road surface image through a depthwise separable convolution network to obtain a segmentation result;
and measuring the diameter and volume of the potholes in the segmentation result.
2. The method for automatically detecting the potholes on the road surface based on the image processing as claimed in claim 1, wherein the image acquisition of the road surface specifically comprises:
setting the height, spacing and angle of image acquisition so as to balance field of view against ground sampling distance;
and carrying out image acquisition on the road surface by using a movable carrier carrying camera.
3. The method for automatically detecting the road depressions based on the image processing as claimed in claim 1, wherein the three-dimensional reconstruction of the road image by SFM specifically comprises:
calibrating a camera, eliminating radial distortion and tangential distortion generated by a camera lens in a road surface image, and acquiring the internal physical characteristics of the camera;
generating a point cloud model of the road surface image according to the road surface image and the internal physical characteristics of the camera;
calibrating the generated point cloud model;
and carrying out space segmentation on the calibrated point cloud model to generate an orthoimage of the road surface.
4. The method for automatically detecting the potholes on the road surface based on the image processing as claimed in claim 3, wherein the step of calibrating the generated point cloud model specifically comprises the following steps:
analyzing the point cloud model by using a PCA algorithm, and extracting main characteristic components of the point cloud model to obtain a first characteristic component, a second characteristic component and a third characteristic component;
establishing a space coordinate system by taking the first characteristic component, the second characteristic component and the third characteristic component as three coordinate directions, and constructing a point cloud plane in the coordinate system;
taking the characteristic component with the minimum characteristic value in the first characteristic component, the second characteristic component and the third characteristic component as the normal characteristic component of the point cloud plane;
multiplying the coordinates of each point in the point cloud model by the normal characteristic component to generate a calibration point cloud;
and carrying out rotation correction on the original point cloud in the point cloud model through the calibration point cloud.
5. The automatic detection method for road depressions based on image processing as claimed in claim 3, wherein the orthoimage comprises:
a color orthoimage, wherein each pixel of the color orthoimage records the average RGB value of the point cloud at the pixel position;
a depth orthoimage, wherein each pixel of the depth orthoimage records the average height of the point cloud at the pixel position;
and an overlaid orthoimage, which is an overlay of the color orthoimage and the depth orthoimage.
6. The automatic road surface depression detection method based on image processing as claimed in claim 5, wherein the step of overlaying the orthoimages comprises:
performing RGB separation on the color orthoimage, wherein the separated R channel image is a first channel image, the separated G channel image is a second channel image, and the separated B channel image is a third channel image;
averaging the pixel value of the first channel image, the pixel value of the second channel image and the pixel value of the third channel image with the pixel value of the depth orthoimage respectively to obtain an average first channel image, an average second channel image and an average third channel image;
and recombining the average first channel image, the average second channel image and the average third channel image to form the overlaid orthoimage.
7. The automatic road surface pothole detection method based on image processing as claimed in claim 1, wherein the road surface pothole feature segmentation of the image through the depthwise separable convolution network specifically comprises:
performing convolution operations on the orthoimage through a plurality of downsampling layers to obtain a first feature image;
performing transposed convolution operations on the first feature image through a plurality of upsampling layers corresponding to the downsampling layers to obtain a second feature image of the same size as the orthoimage;
and determining, through a softmax layer, the probability of a road surface depression at each pixel of the second feature image.
8. The automatic road surface pothole detection method based on image processing as claimed in claim 7, wherein the downsampling layers specifically comprise a convolution layer with a kernel size of 3 × 3, a batch normalization layer, and a rectified linear unit activation layer.
9. An apparatus comprising a processor and a memory;
the memory is used for storing programs;
the processor executes the program to implement the method according to any one of claims 1-8.
10. A computer-readable storage medium, characterized in that the storage medium stores a program, which is executed by a processor to implement the method according to any one of claims 1-8.
CN202111086027.3A 2021-09-16 2021-09-16 Automatic road surface depression detection method and device based on image processing and storage medium Pending CN113808103A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111086027.3A CN113808103A (en) 2021-09-16 2021-09-16 Automatic road surface depression detection method and device based on image processing and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111086027.3A CN113808103A (en) 2021-09-16 2021-09-16 Automatic road surface depression detection method and device based on image processing and storage medium

Publications (1)

Publication Number Publication Date
CN113808103A true CN113808103A (en) 2021-12-17

Family

ID=78941241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111086027.3A Pending CN113808103A (en) 2021-09-16 2021-09-16 Automatic road surface depression detection method and device based on image processing and storage medium

Country Status (1)

Country Link
CN (1) CN113808103A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101549683A (en) * 2009-04-23 2009-10-07 上海交通大学 Vehicle intelligent method for automatically identifying road pit or obstruction
CN108765298A (en) * 2018-06-15 2018-11-06 中国科学院遥感与数字地球研究所 Unmanned plane image split-joint method based on three-dimensional reconstruction and system
CN112906449A (en) * 2020-12-02 2021-06-04 北京中科慧眼科技有限公司 Dense disparity map-based road surface pothole detection method, system and equipment
CN112465977A (en) * 2020-12-14 2021-03-09 埃洛克航空科技(北京)有限公司 Method for repairing three-dimensional model water surface loophole based on dense point cloud
CN112857329A (en) * 2021-02-02 2021-05-28 中国铁路设计集团有限公司 Existing railway turnout center measuring method and system, storage medium and electronic equipment
CN113128405A (en) * 2021-04-20 2021-07-16 北京航空航天大学 Plant identification and model construction method combining semantic segmentation and point cloud processing

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINCHAO GUAN et al.: "Automated pixel-level pavement distress detection based on stereo vision and deep learning", Automation in Construction, vol. 129, pages 2-3 *
ZHANG Ruirui; XIA Lang; CHEN Liping; XIE Chunchun; CHEN Meixiang; WANG Weijia: "Identification of pine wilt disease discolored trees based on the U-Net network and UAV imagery", Transactions of the Chinese Society of Agricultural Engineering, no. 12, pages 69-76 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649633A (en) * 2024-01-30 2024-03-05 武汉纺织大学 Pavement pothole detection method for highway inspection
CN117649633B (en) * 2024-01-30 2024-04-26 武汉纺织大学 Pavement pothole detection method for highway inspection

Similar Documents

Publication Publication Date Title
CN109801333B (en) Volume measurement method, device and system and computing equipment
Kurka et al. Applications of image processing in robotics and instrumentation
CN111260615B (en) Laser and machine vision fusion-based method for detecting apparent diseases of unmanned aerial vehicle bridge
CN111123242B (en) Combined calibration method based on laser radar and camera and computer readable storage medium
CN113920205B (en) Calibration method of non-coaxial camera
CN113393439A (en) Forging defect detection method based on deep learning
CN110598795A (en) Image difference detection method and device, storage medium and terminal
CN116778288A (en) Multi-mode fusion target detection system and method
CN114897705A (en) Unmanned aerial vehicle remote sensing image splicing method based on feature optimization
CN111798453A (en) Point cloud registration method and system for unmanned auxiliary positioning
CN109671109B (en) Dense point cloud generation method and system
CN107941241B (en) Resolution board for aerial photogrammetry quality evaluation and use method thereof
CN114612412A (en) Processing method of three-dimensional point cloud data, application of processing method, electronic device and storage medium
CN116935013B (en) Circuit board point cloud large-scale splicing method and system based on three-dimensional reconstruction
CN109829951B (en) Parallel equipotential detection method and device and automatic driving system
CN112991372B (en) 2D-3D camera external parameter calibration method based on polygon matching
CN116205993A (en) Double-telecentric lens high-precision calibration method for 3D AOI
CN113808103A (en) Automatic road surface depression detection method and device based on image processing and storage medium
Atik et al. An automatic image matching algorithm based on thin plate splines
CN112630469B (en) Three-dimensional detection method based on structured light and multiple light field cameras
CN115409898A (en) High-precision camera calibration method and device based on special annular calibration plate
EP4107699A1 (en) A method for generating a dataset, a method for generating a neural network, and a method for constructing a model of a scene
CN114399540A (en) Heterogeneous image registration method and system based on two-dimensional iteration closest point
JP2014032628A (en) Corresponding point search device, program thereof, and camera parameter estimation device
Brunken et al. Incorporating Plane-Sweep in Convolutional Neural Network Stereo Imaging for Road Surface Reconstruction.

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination