WO2023286917A1

WO2023286917A1 - System and method for detecting both road surface damage and obstacles by using deep neural network, and recording medium in which computer-readable program for executing same method is recorded

Info

Publication number: WO2023286917A1
Application number: PCT/KR2021/013899
Authority: WO
Inventors: 심승보; 최상일; 공석민; 이성원
Original assignee: 한국건설기술연구원
Priority date: 2021-07-16
Filing date: 2021-10-08
Publication date: 2023-01-19
Also published as: KR102423218B1

Abstract

Disclosed are a system and a method for detecting both road surface damage and obstacles by using a deep neural network, and a recording medium in which a computer-readable program for executing the method is recorded. The system for detecting both road surface damage and obstacles comprises an image acquisition unit, a backbone network unit, an object detection unit, a segmentation unit, and an output unit. The image acquisition unit acquires an image of a road on which a vehicle is driving, the backbone network unit extracts image features from the image to generate a plurality of backbone blocks having different sizes, the object detection unit detects obstacles on the road by using the plurality of backbone blocks, the segmentation unit detects surface damage of the road by using the plurality of backbone blocks, and the output unit outputs the obstacles and road surface damage on the image.

Description

A system and method for simultaneously detecting road surface damage and obstacles using a deep neural network, and a recording medium recording a computer readable program for executing the method.

The present invention relates to vehicle-related image processing technology, and more particularly, to a system and method for simultaneously detecting road surface damage and moving obstacles using artificial intelligence.

Transportation in the future is expected to take many forms. Among them, it is expected that the personal mobile vehicle, which has recently been actively supplied, will take its place as a new means of transportation.

However, such a personal mobile vehicle acts as a great risk factor for those who have difficulty in quick steering control and accurate judgment, such as the elderly. In order to mitigate these risk factors, development of automated technology for recognizing and controlling the environment with various sensors being installed continues.

Accordingly, in recent years, like other technologies in the field of autonomous driving, technology in the field of sensors has made remarkable progress, and this is because technology for recognizing the vehicle's surroundings for safe driving is so important and necessary.

The objects that can be seen as the most representative obstacles on the road are vehicles, people, bicycles, and motorcycles. In addition, vehicle accidents can be prevented by recognizing such objects and performing corresponding control.

In addition, there are obstacles to be avoided on the road surface unlike dynamic obstacles with active movement. It is a static obstacle associated with road infrastructure, mainly for targets such as falling objects on the road, road surface damage, etc. In particular, road surface damage such as potholes and turtle cracks affect the driving of personal mobile vehicles. This is because the diameter of the wheel is smaller than that of a general vehicle, and the driver is greatly affected by the condition of the road surface, which can be severe when the elderly and the disabled drive.

Therefore, in a situation where accidents occur according to road surface conditions even in general vehicles, small personal mobile vehicles are exposed to a greater risk of accidents, so technology capable of recognizing road surface conditions in real time is more urgently needed.

The present invention has been made to solve the above-mentioned conventional problems, and provides a system and method capable of simultaneously detecting not only dynamic obstacles that an autonomous personal mobile vehicle may encounter while driving on a road, but also road surface defects. aims to

In order to achieve the above object, a road surface damage and obstacle simultaneous detection system using a deep neural network according to the present invention includes an image acquisition unit, a backbone network unit, an object detection unit, a segmentation unit, and an output unit.

The image acquisition unit acquires an image of the road on which you are driving, the backbone network unit extracts features of the image from the image and creates a plurality of backbone blocks having different sizes, and the object detection unit uses the generated plurality of backbone blocks on the road. The obstacle is detected, the segmentation unit detects road surface damage of the road using a plurality of backbone blocks, and the output unit outputs the obstacle and road surface damage to an image.

According to this configuration, it is possible to simultaneously extract various road obstacles such as moving obstacles or road surface damage by connecting a segmentation deep neural network for extracting an object with an irregular shape to an object recognition deep neural network for extracting a compact object.

At this time, the segmentation unit includes a plurality of auto-encoder block units that process and output each of the plurality of backbone blocks, an up-sampling unit that upsamples the outputs of the plurality of auto-encoder block units to the same size and generates a plurality of sub-outputs, and and an averaging unit for averaging, normalizing, and outputting a plurality of sub-outputs.

In addition, the segmentation unit detects damage to the road surface by using a deep neural network, and the number of the last output channel may be 2 to distinguish between a damaged area and a normal area of the road surface.

In addition, the object detection unit includes an FFM unit that changes a plurality of backbone blocks to the same size and integrates them into basic features, a TUM unit that generates features for recognizing objects of multiple sizes from the basic features, and features generated by the TUM unit. It may include a SFAM unit that integrates and generates a multi-level feature pyramid.

Also, the object detector may further include a predictor that predicts an obstacle using a multi-level feature pyramid.

Also, the output unit may output an outline corresponding to the obstacle to the image, and the output unit may output a pixel area corresponding to the road surface damage to the image.

In addition, the method for simultaneously detecting road surface damage and obstacles using a deep neural network according to the present invention includes an image acquisition step of obtaining an image of a road being driven, and extracting features of the image from the acquired image to obtain different sizes. A backbone block generation step of generating a plurality of backbone blocks having a plurality of backbone blocks, an object detection step of detecting an obstacle on the road using a plurality of backbone blocks, a segmentation step of detecting damage to a road surface using a plurality of backbone blocks, and an image and an output step of outputting obstacles and road surface damage.

In addition, a recording medium recording a computer readable program for executing the method is disclosed.

According to the present invention, it is possible to simultaneously extract various road obstacles, such as moving obstacles or road surface damage, by connecting an object recognition deep neural network for extracting a compact object with a segmentation deep neural network for extracting an object with an irregular shape.

1 is a schematic block diagram of a road surface damage and obstacle simultaneous detection system using a deep neural network according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating a network structure proposed to implement the system for simultaneously detecting road surface damage and obstacles of FIG. 1; FIG.

Fig. 3 is a diagram showing the structure of the auto encoder block of Fig. 2;

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

1 is a schematic block diagram of a road surface damage and obstacle simultaneous detection system using a deep neural network according to an embodiment of the present invention. 1, the road surface damage and obstacle simultaneous detection system includes an image acquisition unit 110, a backbone network unit 120, an object detection unit 130, a segmentation unit 140, and an output unit 150, and an object The detection unit 130 includes an FFM unit 132, a TUM unit 134, a SFAM unit 136, and a prediction unit 138, and the segmentation unit 140 includes a plurality of auto encoder block units 142, a plurality of An upsampling unit 144 and an averaging unit 146 of , respectively, are included again.

First, the image acquisition unit 110 acquires an image of the road being driven, and the backbone network unit 120 extracts features of the image from the acquired image and generates a plurality of backbone blocks having different sizes.

The object detector 130 detects an obstacle on the road using the generated plurality of backbone blocks. To this end, the FFM unit 132 changes a plurality of backbone blocks to the same size and integrates them into basic features, and the TUM unit 134 generates features for recognizing objects of multiple sizes from the basic features, and the SFAM 136 The unit integrates the features generated by the TUM unit 134 to generate a multi-level feature pyramid, and the prediction unit 138 predicts an obstacle using the multi-level feature pyramid.

The object detection unit 130 is a component for detecting a dynamic obstacle, and may be implemented with an algorithm capable of detecting 7 types of dynamic obstacles. To this end, it can self-procur a data set of 1,418 sheets in which dynamic obstacles such as vehicles, trucks, buses, motorcycles, bicycles, traffic lights, and people are displayed in the form of bounding boxes. In addition, by securing a deep neural network structure for developing an algorithm capable of detecting dynamic obstacles in the form of a bounding box, it can be implemented as an object recognition technology based on multiscale and multilevel features.

The segmentation unit 140 detects damage to the road surface by using a plurality of backbone blocks. To this end, the plurality of auto encoder block units 142 process and output each of the plurality of backbone blocks, and the plurality of upsampling units 144 upsamples the outputs of the plurality of auto encoder blocks 142 to the same size, respectively. to generate a plurality of sub-outputs, and the averaging unit 146 averages and normalizes the plurality of sub-outputs and outputs them. At this time, the segmentation unit 140 detects damage to the road surface by using a deep neural network, and the number of the last output channel may be 2 to distinguish between a damaged area and a normal area of the road surface.

The segmentation unit 140 is a component for detecting road surface damage and can detect road surface damage areas such as linear cracks, alligator cracks, and potholes. This is a technique that shares the backbone network used in deep neural networks for object recognition for road surface damage detection. It detects road surface damage areas by connecting four lightweight auto-encoder neural networks from multilevel features, and uses four lightweight auto-encoders. The final damage area can be determined as the average of the road surface damage areas obtained from the neural network.

The output unit 150 outputs obstacles and road surface damage to images. At this time, the output unit 150 may output an outline corresponding to the obstacle to the image, and may output a pixel area corresponding to the road surface damage to the image.

The present invention is a configuration for real-time deep learning detection based on multi-task, and as a result of actual experiments, the time required to process one input image is 89 ms, and image processing can be performed based on real-time multi-task.

An autonomous vehicle typically has a sensor measurement period of 50 to 200 ms, but a personal mobile vehicle may have a longer measurement period due to a slower driving speed. Therefore, the measurement period of 89 ms is a period capable of real-time object recognition and road surface damage detection by being mounted on a personal mobile vehicle.

FIG. 2 is a diagram illustrating a network structure proposed to implement the system for simultaneously detecting road surface damage and obstacles of FIG. 1 . The present invention will be described in more detail with reference to FIG. 2 as follows.

1 Object recognition deep neural network using M2Det

M2Det stands for multilevel and multiscale detection, which means that various levels of features and images of various sizes are used for object recognition. The object to be measured varies in size within the image according to the shooting distance of the object. To solve this problem, learning is performed by varying the size of the input image. In addition, the complexity of the feature also varies according to the shape and context of the measurement target. For example, in the case of traffic lights, most of the shapes are similar to each other, but in the case of people, the shapes and contexts vary greatly depending on age or clothes worn. To solve this, various levels of features are needed. To satisfy these two requirements, MD2Det with both multiscale and multilevel methods applied is used.

The Multilevel Feature Pyramid Network (MLFPN) used in M2Det is shown in FIG. 2. It consists of three modules: Feature Fusion Module (FFM), Thinned U-shape Module (TUM), and Scale-wise Feature Aggregation Module (SFAM). FFM is a function of integrating features. For example, it serves to change the features extracted from the backbone network to the same size and integrate them into base features. TUM plays a role in generating meaningful features for recognizing objects of various sizes in the form of an auto-encoder. Finally, SFAM provides information for classification and localization after creating a feature pyramid of various levels by integrating the features generated by TUM.

2 Semantic segmentation network structure

In the present invention, as shown in FIG. 2, hierarchical features are used for segmentation. This is a method that uses features of various sizes that are generated step by step. When an image with road cracks is input, it passes through the backbone network. M2Det uses VGG16 as the backbone network. As a result, it is composed of backbone blocks (B blocks) through a total of 4 steps, and each is connected to an auto-encoder (AE) block. One B block is reduced in size by half, leading to the next B block. Therefore, the size of the output of the B block changes as [576×576, 288×288, 144×144, 72×72] as the steps pass.

Next, when used as an input of an AE block, the size of the output is the same as the size of the input. However, regardless of the number of channels in the input, the number of channels in the final output is 2, so that it can be divided into a damaged area and a normal area. Finally, the output of the AE block becomes a 576×576 up-sample and becomes a sub-output. A total of 4 sub-outputs are normalized to 0 and 1 after applying the softmax function through average operation. In addition, a location where the value of the second channel is greater than 0.5 is regarded as having road surface damage.

In the present invention, the features generated by the backbone network are used as inputs to the AE block. When an input is received, the AE block first undergoes a convolution-batch normalization-rectifier linear unit (Conv-BN-ReLU) operation process as shown in FIG. At this time, the kernel size is set to 7×7 and the padding is set to 3 to keep the size the same during the operation process. FIG. 3 is a diagram showing the structure of the auto encoder block of FIG. 2;

Next, we use a deep neural network consisting of two encoder blocks and a decoder block. In the case of the encoder block, it is divided into a residual network including skip connection and a residual network including down sample convolution operation.

The former is a deep neural network using skip connection that adds the weight value of input information before the last activation function to reduce loss of input information after convolution operation. The latter reduces the size by half by setting the stride to 2 in the first convolution operation, and the input information is also a deep neural network that reduces the size through another convolution operation with a kernel size of 1×1 and adds weight values. Each time it passes through this encoder block, the size of the feature is reduced by half.

And repeat this twice, so the size becomes 1/4. To restore these reduced features, a decoder block is used. A decoder block consists of two decoder networks. In this network, the process of restoring the size reduced in the previous block to its original role is the main role, and the present invention uses the transposed convolution operation. Performing this operation doubles the size of the feature. And repeat this twice to make it the original input size. Finally, add Conv-BN-ReLU as an operation to the last neural network so that the number of final output channels is 2. As a result, the number of feature channels finally created through the five-step operation is [64, 128, 64, 32, 2].

3 loss function

In order to update the weights of the deep neural network of segmentation proposed in the present invention, a loss function such as Equation 1 is used. i indicates the position of the pixel and s is the output stage of the deep neural network, expressed as 1, 2, 3, 4. N represents the size of the sub-output, which is 576×576. y _si becomes 0 or 1 as the label value of position i in step s, and P(y _si ) becomes the prediction probability value at the same position. After applying the loss function of cross-entropy to several sub-outputs obtained from hierarchical features, the method of summation is used.

Next, a loss function is used to update the weights of the deep neural network for object detection used in the present invention. The loss function aims to select the bounding box containing the object to be detected from the candidate bounding boxes with various ratios. Therefore, M2Det also creates tens of thousands of candidate bounding boxes, compares them with the ground truth bounding box, and learns to select the smallest bounding box with the smallest difference. At this time, an indicator representing the difference is used as a loss function, which is defined as in Equation 2.

In this equation, L _conf plays a role in determining the type of object to be detected, and L _loc plays a role in determining the location of an object in the image. x ^Pi,j is set to 1 when the degree of overlap between the i-th candidate bounding box and the j-th ground truth bounding box representing the p category is 50% or more, and is 0 otherwise. Among the candidate bounding boxes, boxes with an overlap of 50% or more are called Pos, and the other boxes are called Neg. And let N be the number of boxes included in Pos. c is a probability value that estimates the type of object. l denotes a value obtained by estimating the position of an object in the image, and g represents the actual position of the object in the image.

If L _loc is described in detail, the expression is defined as in Equation 3. cx,cy are the coordinates of the center pixel of the bounding box (d), and w and h represent the width and height. And g means ground truth bounding box. As a result, the value obtained by summing the differences between l ^mi and g ^mj is used as the value of the loss function.

L _conf is defined as in Equation 4. Indicates the probability values for the bounding boxes corresponding to Pos and the probability values for the bounding boxes corresponding to Neg. The term corresponding to the former shows a high probability value when the target to be detected is included in the bounding box. On the other hand, the term corresponding to the latter can obtain a high probability value when there is no object to be detected in the bounding box. In conclusion, the loss value for estimating the type of object can be obtained by summing these two terms.

4 learning conditions

In order to learn a deep neural network to detect dynamic or static obstacles that can be encountered on the road, 1,218 of the total data were used for learning and 200 were used for verification. And the optimization function applied the same ADAM. And as parameters used in this function, the learning rate is set to 0001, beta-1 to 09, and beta-2 to 0999.

Prior to learning, the initial values of the weights were all set to Xavier. The batch size was set to 14, and the model with the best performance was selected during a total of 2,000 epochs. The implementation of the algorithm used Pytorch based on Ubuntu 1804, and the specifications of the development PC would be Intel Xeon Gold 6226R, 128GB RAM, NVIDIA Quadro RTX 8000.

In summary, conventional object recognition technology is a technology that detects dynamic obstacles mainly in the form of bounding boxes with fast computation time, but is very unsuitable for detecting road surface damage. In addition, road surface damage is inefficient and inaccurate because its shape is not constant and includes many normal areas when used as a bounding box.

On the other hand, road surface damage recognition technology has been developed as an infrastructure maintenance technology so far and can accurately detect it, but requires a long calculation time and a large memory to use as a real-time detection technology. In addition, a lot of cost is required to mount an algorithm that takes a long calculation time and requires a large memory capacity to a personal mobile vehicle. Therefore, a new deep learning technology capable of quickly and accurately detecting not only dynamic obstacles but also road surface damage was needed by supplementing these mutual disadvantages.

In contrast, the present invention proposes an image-based artificial intelligence technology capable of simultaneously detecting not only dynamic obstacles that an autonomous personal mobile vehicle may encounter while driving, but also road surface defects.

The contribution to the technology proposed in the present invention is as follows. First, it proposes a structure that can simultaneously perform road surface damage detection in connection with an object recognition algorithm used in the field of autonomous driving. We propose a multi-tasking based algorithm by combining a simple structure with the existing object recognition deep neural network structure.

Second, the algorithm is performed in real time despite the simultaneous detection of object recognition and road surface damage. Considering these two items, we present joint deep learning that can learn and perform both functions simultaneously.

The present invention is expected to contribute to improving the driving safety of personal mobile vehicles by developing and utilizing new technologies such as image sensors, and increasing the utilization of the transportation vulnerable as a driving assistance technology for personal mobility vehicles that can be used as future transportation vehicles. expected to do In addition, the present invention is expected to accelerate the spread of next-generation means of transportation in neglected public transportation infrastructure areas by contributing to improved driving safety.

Although the present invention has been described by some preferred embodiments, the scope of the present invention should not be limited thereto, but should also extend to modifications or improvements of the above embodiments supported by the claims.

Claims

An image acquisition unit that acquires an image of a driving road;

a backbone network unit extracting features of the image from the image and generating a plurality of backbone blocks having different sizes;

an object detector detecting obstacles on the road using the plurality of backbone blocks;

a segmentation unit detecting road surface damage of the road using the plurality of backbone blocks; and

A system for simultaneously detecting road surface damage and obstacles, characterized in that it comprises an output unit for outputting the obstacle and road surface damage in the image.
The method according to claim 1, wherein the segmentation unit,

a plurality of auto encoder block units for processing and outputting each of the plurality of backbone blocks;

a plurality of upsampling units generating a plurality of sub-outputs by upsampling outputs of the plurality of auto encoder block units to a predetermined size; and

and an averaging unit that averages and normalizes the plurality of sub-outputs and outputs them.
The method of claim 2,

The segmentation unit detects road surface damage of the road using a deep neural network, and the number of the last output channel is 2 to distinguish between a damaged area and a normal area of the road surface Damage and obstacle simultaneous detection system.
The method according to claim 3, wherein the object detection unit,

an FFM unit that changes the plurality of backbone blocks to the same size and integrates them into basic features;

a TUM unit generating features for object recognition of multiple sizes from the basic features; and

and a SFAM unit generating a multi-level feature pyramid by integrating the features generated by the TUM unit.
The method according to claim 4, wherein the object detection unit,

The road surface damage and obstacle simultaneous detection system, characterized in that it further comprises a prediction unit for predicting the obstacle using the multi-level feature pyramid.
The method of claim 5,

The simultaneous detection system for road surface damage and obstacles, characterized in that the output unit outputs an outline corresponding to the obstacle on the image.
The method of claim 6,

The simultaneous detection system for road surface damage and obstacles, characterized in that the output unit outputs a pixel area corresponding to the road surface damage on the image.
An image acquisition step of acquiring an image of the road being driven;

a backbone block generation step of extracting features of the image from the image and generating a plurality of backbone blocks having different sizes;

an object detection step of detecting an obstacle on the road using the plurality of backbone blocks;

a segmentation step of detecting road surface damage of the road using the plurality of backbone blocks; and

A method for simultaneously detecting road surface damage and obstacles, characterized in that it comprises an output step of outputting the obstacle and road surface damage to the image.
A recording medium recording a computer readable program for executing the method of claim 8.