WO2023286917A1 - System and method for detecting both road surface damage and obstacles by using deep neural network, and recording medium in which computer-readable program for executing same method is recorded - Google Patents


Info

Publication number
WO2023286917A1
Authority
WO
WIPO (PCT)
Prior art keywords
road surface
unit
surface damage
image
road
Prior art date
Application number
PCT/KR2021/013899
Other languages
French (fr)
Korean (ko)
Inventor
심승보
최상일
공석민
이성원
Original Assignee
한국건설기술연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국건설기술연구원 filed Critical 한국건설기술연구원
Publication of WO2023286917A1 publication Critical patent/WO2023286917A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features

Definitions

  • the present invention relates to vehicle-related image processing technology, and more particularly, to a system and method for simultaneously detecting road surface damage and moving obstacles using artificial intelligence.
  • Transportation in the future is expected to take many forms. Among them, the personal mobility vehicle, which has recently been spreading rapidly, is expected to establish itself as a new means of transportation.
  • the most representative obstacles encountered while driving on the road are vehicles, people, bicycles, and motorcycles.
  • vehicle accidents can be prevented by recognizing such objects and performing corresponding control.
  • the present invention has been made to solve the above-mentioned conventional problems, and aims to provide a system and method capable of simultaneously detecting not only the dynamic obstacles that an autonomous personal mobility vehicle may encounter while driving on a road, but also road surface defects.
  • a road surface damage and obstacle simultaneous detection system using a deep neural network includes an image acquisition unit, a backbone network unit, an object detection unit, a segmentation unit, and an output unit.
  • the image acquisition unit acquires an image of the road being driven
  • the backbone network unit extracts features of the image from the image and creates a plurality of backbone blocks having different sizes
  • the object detection unit detects obstacles on the road using the generated plurality of backbone blocks
  • the segmentation unit detects road surface damage of the road using a plurality of backbone blocks
  • the output unit outputs the obstacle and road surface damage to an image.
  • the segmentation unit includes a plurality of auto-encoder block units that process and output each of the plurality of backbone blocks, an up-sampling unit that upsamples the outputs of the auto-encoder block units to the same size to generate a plurality of sub-outputs, and an averaging unit that averages, normalizes, and outputs the sub-outputs.
  • the segmentation unit detects damage to the road surface by using a deep neural network, and the number of the last output channel may be 2 to distinguish between a damaged area and a normal area of the road surface.
  • the object detection unit includes an FFM unit that resizes the plurality of backbone blocks to the same size and integrates them into base features, a TUM unit that generates features for recognizing objects of multiple sizes from the base features, and an SFAM unit that integrates the features generated by the TUM unit into a multi-level feature pyramid.
  • the object detector may further include a predictor that predicts an obstacle using a multi-level feature pyramid.
  • the output unit may output an outline corresponding to the obstacle to the image, and the output unit may output a pixel area corresponding to the road surface damage to the image.
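The patent does not give rendering code for the output unit; a minimal numpy sketch of the described behavior (drawing a bounding-box outline per obstacle and painting damage pixels) might look as follows. Function name and colors are illustrative, not from the source.

```python
import numpy as np

def render_detections(image, boxes, damage_mask):
    """Draw obstacle outlines (bounding-box borders) and road-damage
    pixels onto an RGB image, as the output unit is described to do.
    `boxes` are (x1, y1, x2, y2) tuples; `damage_mask` is a boolean
    array with the image's height and width."""
    out = image.copy()
    for x1, y1, x2, y2 in boxes:
        out[y1, x1:x2 + 1] = (0, 255, 0)   # top edge
        out[y2, x1:x2 + 1] = (0, 255, 0)   # bottom edge
        out[y1:y2 + 1, x1] = (0, 255, 0)   # left edge
        out[y1:y2 + 1, x2] = (0, 255, 0)   # right edge
    out[damage_mask] = (255, 0, 0)         # damaged pixels painted red
    return out
```

Painting only the box border (not the fill) matches the "outline" output for obstacles, while damage is output as a per-pixel area, consistent with the segmentation result.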
  • the method for simultaneously detecting road surface damage and obstacles using a deep neural network includes an image acquisition step of acquiring an image of the road being driven, a backbone block generation step of extracting features from the acquired image to generate a plurality of backbone blocks of different sizes, an object detection step of detecting obstacles on the road using the backbone blocks, a segmentation step of detecting road surface damage using the backbone blocks, and an output step of outputting the obstacles and road surface damage on the image.
  • with this configuration, various road obstacles such as moving obstacles and road surface damage can be extracted simultaneously by connecting a segmentation deep neural network, which extracts objects of irregular shape, to an object recognition deep neural network, which extracts compact objects.
  • FIG. 1 is a schematic block diagram of a road surface damage and obstacle simultaneous detection system using a deep neural network according to an embodiment of the present invention.
  • FIG. 2 is a diagram illustrating a network structure proposed to implement the system for simultaneously detecting road surface damage and obstacles of FIG. 1;
  • Fig. 3 is a diagram showing the structure of the auto encoder block of Fig. 2;
  • the road surface damage and obstacle simultaneous detection system includes an image acquisition unit 110, a backbone network unit 120, an object detection unit 130, a segmentation unit 140, and an output unit 150.
  • the object detection unit 130 in turn includes an FFM unit 132, a TUM unit 134, an SFAM unit 136, and a prediction unit 138.
  • the segmentation unit 140 in turn includes a plurality of auto encoder block units 142, a plurality of upsampling units 144, and an averaging unit 146.
  • the image acquisition unit 110 acquires an image of the road being driven, and the backbone network unit 120 extracts features of the image from the acquired image and generates a plurality of backbone blocks having different sizes.
  • the object detector 130 detects an obstacle on the road using the generated plurality of backbone blocks.
  • the FFM unit 132 changes a plurality of backbone blocks to the same size and integrates them into basic features
  • the TUM unit 134 generates features for recognizing objects of multiple sizes from the basic features
  • the SFAM unit 136 integrates the features generated by the TUM unit 134 to generate a multi-level feature pyramid
  • the prediction unit 138 predicts an obstacle using the multi-level feature pyramid.
  • the object detection unit 130 is a component for detecting dynamic obstacles and may be implemented with an algorithm capable of detecting seven types of dynamic obstacles. For this purpose, a data set of 1,418 images in which dynamic obstacles such as vehicles, trucks, buses, motorcycles, bicycles, traffic lights, and people are annotated in the form of bounding boxes can be procured in-house. In addition, by securing a deep neural network structure for an algorithm capable of detecting dynamic obstacles in bounding-box form, it can be implemented as an object recognition technology based on multiscale and multilevel features.
  • the segmentation unit 140 detects damage to the road surface by using a plurality of backbone blocks.
  • the plurality of auto encoder block units 142 process and output each of the plurality of backbone blocks, the plurality of upsampling units 144 upsample the outputs of the auto encoder block units 142 to the same size to generate a plurality of sub-outputs, and the averaging unit 146 averages and normalizes the sub-outputs and outputs the result.
  • the segmentation unit 140 detects damage to the road surface by using a deep neural network, and the number of the last output channel may be 2 to distinguish between a damaged area and a normal area of the road surface.
  • the segmentation unit 140 is a component for detecting road surface damage and can detect damage areas such as linear cracks, alligator cracks, and potholes. This technique shares the backbone network used by the deep neural network for object recognition. It detects road surface damage areas by connecting four lightweight auto-encoder neural networks to the multilevel features, and the final damage area is determined as the average of the damage areas obtained from the four auto-encoder neural networks.
  • the output unit 150 outputs obstacles and road surface damage to images. At this time, the output unit 150 may output an outline corresponding to the obstacle to the image, and may output a pixel area corresponding to the road surface damage to the image.
  • the present invention is configured for real-time multi-task deep learning detection; in actual experiments, the time required to process one input image was 89 ms, so image processing can be performed in real time on a multi-task basis.
  • an autonomous vehicle typically has a sensor measurement period of 50 to 200 ms, and a personal mobility vehicle may tolerate a longer period because of its slower driving speed. A processing time of 89 ms therefore allows real-time object recognition and road surface damage detection when mounted on a personal mobility vehicle.
  • FIG. 2 is a diagram illustrating a network structure proposed to implement the system for simultaneously detecting road surface damage and obstacles of FIG. 1 .
  • the present invention will be described in more detail with reference to FIG. 2 as follows.
  • M2Det stands for multilevel and multiscale detection, which means that various levels of features and images of various sizes are used for object recognition.
  • the object to be measured varies in size within the image according to the shooting distance of the object.
  • learning is performed by varying the size of the input image.
  • the complexity of the feature also varies according to the shape and context of the measurement target. For example, most traffic lights have similar shapes, but the shapes and contexts of people vary greatly depending on age or clothing. Solving this requires features at various levels. To satisfy these two requirements, M2Det, which applies both multiscale and multilevel methods, is used.
  • the Multilevel Feature Pyramid Network (MLFPN) used in M2Det is shown in FIG. 2. It consists of three modules: the Feature Fusion Module (FFM), the Thinned U-shape Module (TUM), and the Scale-wise Feature Aggregation Module (SFAM).
  • FFM serves to integrate features: for example, it resizes the features extracted from the backbone network to the same size and integrates them into base features.
  • TUM plays a role in generating meaningful features for recognizing objects of various sizes in the form of an auto-encoder.
  • SFAM provides information for classification and localization after creating a feature pyramid of various levels by integrating the features generated by TUM.
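The three modules above can be sketched as a data-flow skeleton. This is a schematic numpy illustration of the shapes only: the real M2Det modules use learned convolutions and an encoder-decoder TUM, whereas here nearest-neighbour rescaling and average pooling stand in for them, and all function names are illustrative.

```python
import numpy as np

def ffm(features, size=64):
    # Resize each backbone feature map (C, H, W) to a common spatial
    # size by nearest-neighbour index scaling, then stack channels.
    resized = []
    for f in features:
        c, h, w = f.shape
        ys = np.arange(size) * h // size
        xs = np.arange(size) * w // size
        resized.append(f[:, ys][:, :, xs])
    return np.concatenate(resized, axis=0)          # base feature

def tum(base, levels=4):
    # U-shape module stand-in: produce features at several scales by
    # repeated 2x2 average pooling of the base feature.
    out, f = [base], base
    for _ in range(levels - 1):
        c, h, w = f.shape
        f = f.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))
        out.append(f)
    return out                                      # multi-scale features

def sfam(pyramids):
    # Aggregate features of the same scale from every TUM output
    # into one multi-level feature pyramid (channel concatenation).
    return [np.concatenate(scale, axis=0) for scale in zip(*pyramids)]
```

The returned pyramid levels would then feed the prediction stage for classification and localization.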
  • hierarchical features are used for segmentation.
  • This is a method that uses features of various sizes that are generated step by step.
  • M2Det uses VGG16 as the backbone network.
  • AE stands for auto-encoder.
  • each B block halves the spatial size before passing its output to the next B block; the output sizes of the B blocks therefore change as [576×576, 288×288, 144×144, 72×72] through the stages.
  • the size of the output is the same as the size of the input. However, regardless of the number of channels in the input, the number of channels in the final output is 2, so that it can be divided into a damaged area and a normal area.
  • the output of each AE block is up-sampled to 576×576 to become a sub-output. The four sub-outputs are averaged and the softmax function is applied, normalizing the values between 0 and 1. A location where the value of the second channel exceeds 0.5 is regarded as road surface damage.
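This fusion step can be sketched in numpy. Nearest-neighbour upsampling stands in for whatever interpolation the implementation actually uses (the patent does not specify it), and the function name is illustrative.

```python
import numpy as np

def fuse_sub_outputs(ae_outputs, size=576):
    """Upsample each 2-channel AE-block output to `size` x `size`,
    average the four sub-outputs, apply softmax over the two
    channels, and threshold channel 2 at 0.5 to get a damage mask."""
    subs = []
    for f in ae_outputs:                 # f: (2, h, h), h divides size
        scale = size // f.shape[1]
        subs.append(f.repeat(scale, axis=1).repeat(scale, axis=2))
    avg = np.mean(subs, axis=0)          # (2, size, size)
    e = np.exp(avg - avg.max(axis=0))    # numerically stable softmax
    prob = e / e.sum(axis=0)
    return prob[1] > 0.5                 # True where damage predicted
```

Note that averaging happens before the softmax, matching the description of normalizing the averaged sub-outputs to values between 0 and 1.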
  • the features generated by the backbone network are used as inputs to the AE block.
  • the AE block first performs a convolution-batch normalization-rectified linear unit (Conv-BN-ReLU) operation, as shown in FIG. 3.
  • the kernel size is set to 7×7 and the padding to 3 so that the size is kept the same during the operation.
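The size-preserving choice follows from the standard convolution output-size formula; a quick check (reusing the 7×7 kernel for the stride-2 case purely for illustration, since the patent does not state that layer's kernel size):

```python
def conv_out_size(n, kernel, padding, stride=1):
    # Standard formula: floor((n + 2*padding - kernel) / stride) + 1
    return (n + 2 * padding - kernel) // stride + 1

# A 7x7 kernel with padding 3 and stride 1 preserves the size:
assert conv_out_size(576, kernel=7, padding=3) == 576
# A stride of 2, as in the down-sampling encoder path, halves it:
assert conv_out_size(576, kernel=7, padding=3, stride=2) == 288
```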
  • FIG. 3 is a diagram showing the structure of the auto encoder block of FIG. 2;
  • the AE block is a deep neural network consisting of two encoder blocks and a decoder block.
  • the encoder block is divided into a residual network with a skip connection and a residual network with a down-sampling convolution operation.
  • the former is a deep neural network whose skip connection adds the input to the output before the last activation function, reducing the loss of input information after the convolution operation.
  • the latter halves the spatial size by setting the stride to 2 in the first convolution; the input is likewise reduced through a separate convolution with a kernel size of 1×1 before the weight values are added. Each pass through this encoder block halves the feature size.
  • the decoder block consists of two decoder networks whose main role is to restore the size reduced in the previous blocks; the present invention uses the transposed convolution operation, which doubles the feature size, and repeats it twice to recover the original input size. Finally, a Conv-BN-ReLU operation is appended to the last neural network so that the number of final output channels is 2. As a result, the feature channel counts produced through the five-step operation are [64, 128, 64, 32, 2].
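The shape bookkeeping of the five stages can be made explicit. This sketch only tracks (channels, spatial size) through the AE block as described above; the input channel count passed in is an arbitrary example, since it varies per backbone block.

```python
def ae_block_shapes(in_channels, in_size):
    """Track (channels, size) through the five stages of the AE block:
    two halving encoder blocks, two size-doubling transposed-convolution
    decoder stages, and a final Conv-BN-ReLU producing 2 output
    channels. Channel counts follow the stated [64, 128, 64, 32, 2]."""
    shapes = [(in_channels, in_size)]
    size = in_size
    for ch in (64, 128):          # encoder: size halves each block
        size //= 2
        shapes.append((ch, size))
    for ch in (64, 32):           # decoder: transposed conv doubles size
        size *= 2
        shapes.append((ch, size))
    shapes.append((2, size))      # final Conv-BN-ReLU, 2 channels
    return shapes
```

The final entry confirms the stated property that the output size equals the input size while the channel count collapses to 2.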
  • a loss function such as Equation 1 is used.
  • i indicates the position of the pixel and s is the output stage of the deep neural network, expressed as 1, 2, 3, 4.
  • N represents the size of the sub-output, which is 576 ⁇ 576.
  • y_si is the label value (0 or 1) at position i in stage s, and P(y_si) is the predicted probability at the same position.
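Equation 1 itself does not survive in this extraction. Given the surrounding definitions (per-pixel labels y_si ∈ {0, 1}, predicted probabilities P(y_si), four output stages s, and N = 576×576 pixels per sub-output), a standard stage-averaged binary cross-entropy consistent with those symbols would be; this is a reconstruction, not the patent's verbatim equation:

```latex
L_{seg} = -\frac{1}{4N}\sum_{s=1}^{4}\sum_{i=1}^{N}
  \Big[\, y_{si}\log P(y_{si}) + (1 - y_{si})\log\big(1 - P(y_{si})\big) \Big]
```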
  • a loss function is used to update the weights of the deep neural network for object detection used in the present invention.
  • the loss function aims to select, from candidate bounding boxes of various ratios, the bounding box containing the object to be detected. M2Det likewise creates tens of thousands of candidate bounding boxes, compares them with the ground truth bounding boxes, and learns to select the bounding box with the smallest difference. The indicator representing this difference is used as the loss function, defined as in Equation 2.
  • L conf plays a role in determining the type of object to be detected
  • L loc plays a role in determining the location of an object in the image.
  • x^p_ij is set to 1 when the overlap between the i-th candidate bounding box and the j-th ground truth bounding box of category p is 50% or more, and 0 otherwise.
  • boxes with an overlap of 50% or more are called Pos, and the remaining boxes are called Neg
  • N is the number of boxes included in Pos
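Assuming the 50%-overlap criterion is the usual intersection-over-union (IoU) measure, the Pos/Neg assignment can be sketched as follows; function names are illustrative.

```python
def iou(a, b):
    # a, b: (x1, y1, x2, y2) boxes; intersection over union.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def match_indicator(candidate, ground_truth):
    # The overlap indicator: 1 when IoU is 50% or more, else 0,
    # i.e. the candidate counts as Pos for this ground-truth box.
    return 1 if iou(candidate, ground_truth) >= 0.5 else 0
```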
  • c is a probability value that estimates the type of object.
  • l denotes a value obtained by estimating the position of an object in the image
  • g represents the actual position of the object in the image.
  • in Equation 3, cx and cy are the coordinates of the center pixel of the candidate bounding box d, and w and h are its width and height; g denotes the ground truth bounding box. The sum of the differences between l^m_i and g^m_j is used as the value of the loss function.
  • L_conf is defined as in Equation 4. It consists of the probability values for the bounding boxes in Pos and the probability values for the bounding boxes in Neg.
  • the term corresponding to the former shows a high probability value when the target to be detected is included in the bounding box.
  • the term corresponding to the latter can obtain a high probability value when there is no object to be detected in the bounding box.
  • the loss value for estimating the type of object can be obtained by summing these two terms.
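Equations 2 through 4 are likewise not reproduced in this extraction. The description (Pos/Neg box sets, offsets cx, cy, w, h, confidence and localization terms) matches the standard SSD-style detection loss that M2Det inherits; under that assumption, the equations would take roughly this reconstructed form:

```latex
% Equation 2: total detection loss over N matched (Pos) boxes
L(x, c, l, g) = \frac{1}{N}\big( L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g) \big)

% Equation 3: localization loss over box offsets m \in \{cx, cy, w, h\}
L_{loc}(x, l, g) = \sum_{i \in Pos}\ \sum_{m \in \{cx, cy, w, h\}}
  x^{p}_{ij}\, \mathrm{smooth}_{L1}\big( l^{m}_{i} - \hat{g}^{m}_{j} \big)

% Equation 4: confidence loss, summing Pos and Neg terms
L_{conf}(x, c) = -\sum_{i \in Pos} x^{p}_{ij} \log \hat{c}^{\,p}_{i}
                 \;-\; \sum_{i \in Neg} \log \hat{c}^{\,0}_{i}
```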
  • the learning rate is set to 0.0001, beta-1 to 0.9, and beta-2 to 0.999.
  • the initial values of the weights were all set with Xavier initialization.
  • the batch size was set to 14, and the model with the best performance was selected during a total of 2,000 epochs.
  • the algorithm was implemented in PyTorch on Ubuntu 18.04; the development PC was equipped with an Intel Xeon Gold 6226R, 128 GB RAM, and an NVIDIA Quadro RTX 8000.
  • conventional object recognition technology detects dynamic obstacles, mainly in the form of bounding boxes, with a fast computation time, but is ill-suited to detecting road surface damage.
  • representing road surface damage as a bounding box is inefficient and inaccurate because its shape is not constant and the box includes many normal areas.
  • road surface damage recognition technology has so far been developed as an infrastructure maintenance technology and can detect damage accurately, but it requires a long computation time and a large memory, which hinders its use as a real-time detection technology.
  • mounting an algorithm with a long computation time and large memory requirements on a personal mobility vehicle is costly. Therefore, a new deep learning technology was needed that compensates for these mutual disadvantages and can quickly and accurately detect not only dynamic obstacles but also road surface damage.
  • the present invention proposes an image-based artificial intelligence technology capable of simultaneously detecting not only dynamic obstacles that an autonomous personal mobile vehicle may encounter while driving, but also road surface defects.
  • the contributions of the technology proposed in the present invention are as follows. First, a structure is proposed that can perform road surface damage detection simultaneously, in connection with an object recognition algorithm used in the field of autonomous driving: a multi-tasking algorithm obtained by combining a simple structure with an existing object recognition deep neural network.
  • second, the algorithm runs in real time despite performing object recognition and road surface damage detection simultaneously. Considering these two items, a joint deep learning approach is presented that can learn and perform both functions at once.
  • the present invention is expected to contribute to improving the driving safety of personal mobility vehicles through new technologies such as image sensors, and to increase mobility for the transportation-disadvantaged as a driving assistance technology for personal mobility vehicles usable as future means of transportation. In addition, by contributing to improved driving safety, the present invention is expected to accelerate the spread of next-generation means of transportation in areas with neglected public transportation infrastructure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a system and a method for detecting both road surface damage and obstacles by using a deep neural network, and a recording medium in which a computer-readable program for executing the method is recorded. The system for detecting both road surface damage and obstacles comprises an image acquisition unit, a backbone network unit, an object detection unit, a segmentation unit, and an output unit. The image acquisition unit acquires an image of a road on which a vehicle is driving, the backbone network unit extracts image features from the image to generate a plurality of backbone blocks having different sizes, the object detection unit detects obstacles on the road by using the plurality of backbone blocks, the segmentation unit detects surface damage of the road by using the plurality of backbone blocks, and the output unit outputs the obstacles and road surface damage on the image.

Description

System and method for simultaneously detecting road surface damage and obstacles using a deep neural network, and a recording medium recording a computer-readable program for executing the method.
The present invention relates to vehicle-related image processing technology, and more particularly, to a system and method for simultaneously detecting road surface damage and moving obstacles using artificial intelligence.
Transportation in the future is expected to take many forms. Among them, the personal mobility vehicle, which has recently been spreading rapidly, is expected to establish itself as a new means of transportation.
However, such personal mobility vehicles pose a great risk to people for whom quick steering control and accurate judgment are difficult, such as the elderly. To mitigate these risks, development continues on automated technologies that recognize and control the environment through various on-board sensors.
Accordingly, like other technologies in the field of autonomous driving, sensor technology has recently made remarkable progress, because technology for recognizing the vehicle's surroundings is that important and necessary for safe driving.
The most representative obstacles encountered on the road are vehicles, people, bicycles, and motorcycles. By recognizing such objects and performing the corresponding control, vehicle accidents can be prevented.
In addition, unlike actively moving dynamic obstacles, there are obstacles on the road surface that must be avoided: static obstacles associated with the road infrastructure, mainly fallen objects and road surface damage. In particular, road surface damage such as potholes and alligator cracks affects the driving of personal mobility vehicles. Because their wheel diameter is smaller than that of ordinary vehicles, the driver is strongly affected by the road surface condition, and the effect can be severe when the elderly or the disabled are driving.
Therefore, while even ordinary vehicles suffer accidents depending on road surface conditions, small personal mobility vehicles are exposed to an even greater risk, so technology capable of recognizing road surface conditions in real time is all the more urgently needed.
The present invention has been made to solve the above-mentioned conventional problems, and aims to provide a system and method capable of simultaneously detecting not only the dynamic obstacles that an autonomous personal mobility vehicle may encounter while driving on a road, but also road surface defects.
To achieve this object, a system for simultaneously detecting road surface damage and obstacles using a deep neural network according to the present invention includes an image acquisition unit, a backbone network unit, an object detection unit, a segmentation unit, and an output unit.
The image acquisition unit acquires an image of the road being driven; the backbone network unit extracts features from the image and generates a plurality of backbone blocks of different sizes; the object detection unit detects obstacles on the road using the generated backbone blocks; the segmentation unit detects road surface damage using the backbone blocks; and the output unit outputs the obstacles and road surface damage on the image.
With this configuration, various road obstacles such as moving obstacles and road surface damage can be extracted simultaneously by connecting a segmentation deep neural network, which extracts objects of irregular shape, to an object recognition deep neural network, which extracts compact objects.
Here, the segmentation unit may include a plurality of auto-encoder block units that process and output each of the backbone blocks, upsampling units that upsample the outputs of the auto-encoder block units to the same size to generate a plurality of sub-outputs, and an averaging unit that averages, normalizes, and outputs the sub-outputs.
The segmentation unit detects road surface damage using a deep neural network, and the number of final output channels may be 2 to distinguish damaged from normal areas of the road surface.
The object detection unit may include an FFM unit that resizes the backbone blocks to the same size and integrates them into base features, a TUM unit that generates features for recognizing objects of multiple sizes from the base features, and an SFAM unit that integrates the features generated by the TUM unit into a multi-level feature pyramid.
The object detection unit may further include a prediction unit that predicts obstacles using the multi-level feature pyramid.
The output unit may output an outline corresponding to each obstacle on the image, and may output the pixel area corresponding to the road surface damage on the image.
The method for simultaneously detecting road surface damage and obstacles using a deep neural network according to the present invention includes an image acquisition step of acquiring an image of the road being driven, a backbone block generation step of extracting features from the acquired image to generate a plurality of backbone blocks of different sizes, an object detection step of detecting obstacles on the road using the backbone blocks, a segmentation step of detecting road surface damage using the backbone blocks, and an output step of outputting the obstacles and road surface damage on the image.
In addition, a recording medium recording a computer-readable program for executing the method is disclosed.
According to the present invention, various road obstacles such as moving obstacles and road surface damage can be extracted simultaneously by connecting a segmentation deep neural network, which extracts objects of irregular shape, to an object recognition deep neural network, which extracts compact objects.
FIG. 1 is a schematic block diagram of a system for simultaneously detecting road surface damage and obstacles using a deep neural network according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the network structure proposed to implement the system of FIG. 1.
FIG. 3 is a diagram showing the structure of the auto-encoder block of FIG. 2.
Hereinafter, preferred embodiments of the present invention are described with reference to the accompanying drawings.
도 1은 본 발명의 일 실시예에 따른 심층 신경망을 이용한 도로 노면 손상 및 장애물 동시 탐지 시스템의 개략적인 블록도이다. 도 1에서, 도로 노면 손상 및 장애물 동시 탐지 시스템은 영상 획득부(110), 백본 네트워크부(120), 객체 검출부(130), 세그멘테이션부(140), 및 출력부(150)를 포함하고, 객체 검출부(130)는 FFM부(132), TUM부(134), SFAM부(136), 및 예측부(138)를, 세스멘테이션부(140)는 복수의 오토 인코더 블록부(142), 복수의 업샘플링부(144), 및 평균화부(146)를 각각 다시 포함한다.1 is a schematic block diagram of a road surface damage and obstacle simultaneous detection system using a deep neural network according to an embodiment of the present invention. 1, the road surface damage and obstacle simultaneous detection system includes an image acquisition unit 110, a backbone network unit 120, an object detection unit 130, a segmentation unit 140, and an output unit 150, and an object The detection unit 130 includes an FFM unit 132, a TUM unit 134, a SFAM unit 136, and a prediction unit 138, and the segmentation unit 140 includes a plurality of auto encoder block units 142, a plurality of An upsampling unit 144 and an averaging unit 146 of , respectively, are included again.
First, the image acquisition unit 110 acquires an image of the road being driven on, and the backbone network unit 120 extracts features from the acquired image to generate a plurality of backbone blocks having different sizes.
The object detection unit 130 detects obstacles on the road using the generated backbone blocks. To this end, the FFM unit 132 resizes the backbone blocks to the same size and merges them into a base feature; the TUM unit 134 generates, from the base feature, features for recognizing objects of multiple sizes; the SFAM unit 136 aggregates the features generated by the TUM unit 134 into a multi-level feature pyramid; and the prediction unit 138 predicts obstacles using the multi-level feature pyramid.
The object detection unit 130 is a component for detecting dynamic obstacles and may be implemented, as a concrete example, as an algorithm capable of detecting seven types of dynamic obstacle. For this purpose, a data set of 1,418 images, in which dynamic obstacles such as cars, trucks, buses, motorcycles, bicycles, traffic lights, and people are annotated with bounding boxes, can be prepared in-house. In addition, by adopting a deep neural network structure for detecting dynamic obstacles in bounding-box form, the unit can be implemented as an object recognition technique based on multiscale and multilevel features.
The segmentation unit 140 detects road surface damage using the backbone blocks. To this end, the plurality of auto-encoder block units 142 each process and output one of the backbone blocks; the plurality of upsampling units 144 upsample the outputs of the auto-encoder block units 142 to the same size to generate a plurality of sub-outputs; and the averaging unit 146 averages and normalizes the sub-outputs and outputs the result. The segmentation unit 140 detects road surface damage with a deep neural network whose final output may have two channels, so that damaged regions of the road surface can be distinguished from normal regions.
The segmentation unit 140 is a component for detecting road surface damage and can detect damaged regions such as linear cracks, alligator cracks, and potholes. It shares the backbone network used by the object recognition deep neural network: four lightweight auto-encoder networks are attached to the multilevel features to detect damaged regions, and the final damaged region can be determined as the average of the regions obtained from the four auto-encoder networks.
The output unit 150 overlays the detected obstacles and road surface damage on the image. The output unit 150 may output an outline corresponding to each obstacle, and may output the pixel region corresponding to the road surface damage.
The present invention is configured for real-time, multi-task deep learning detection. In actual experiments, the time required to process a single input image was 89 ms, so multi-task image processing can be performed in real time.
An autonomous vehicle typically has a sensor measurement period of 50 to 200 ms, but a personal mobility vehicle travels more slowly and can therefore tolerate a longer period. A processing time of 89 ms is thus short enough for real-time object recognition and road surface damage detection on board a personal mobility vehicle.
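The timing argument above comes down to simple arithmetic; as a sketch, the 89 ms per-frame figure can be checked against the quoted 50 to 200 ms measurement periods:

```python
# Sketch: check whether a per-frame processing time keeps up with a sensor
# measurement period. The 89 ms figure and the 50-200 ms range are the
# values stated in the text.

def fits_measurement_period(processing_ms: float, period_ms: float) -> bool:
    """A detector runs in real time if one frame is fully processed
    before the next measurement arrives."""
    return processing_ms <= period_ms

PROCESSING_MS = 89.0

# Fastest autonomous-vehicle period quoted: 89 ms would lag behind.
print(fits_measurement_period(PROCESSING_MS, 50.0))   # False
# Slow end of the range (personal mobility vehicles are slower still).
print(fits_measurement_period(PROCESSING_MS, 200.0))  # True
# Approximate frame rate achievable at 89 ms per frame.
print(round(1000.0 / PROCESSING_MS, 1))               # 11.2
```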
FIG. 2 is a diagram illustrating the network structure proposed to implement the road surface damage and obstacle simultaneous detection system of FIG. 1. The present invention is described in more detail below with reference to FIG. 2.
1 Object recognition deep neural network using M2Det
M2Det stands for multilevel and multiscale detection, meaning that features of various levels and images of various sizes are used for object recognition. The size of a target within the image varies with its distance from the camera; to handle this, training is performed with input images of different sizes. The complexity of the features also varies with the shape and context of the target: most traffic lights, for example, look alike, whereas people vary greatly in shape and context depending on age, clothing, and so on, so features of various levels are needed. To satisfy both requirements, M2Det, which applies both the multiscale and the multilevel approach, is used.
The Multilevel Feature Pyramid Network (MLFPN) used in M2Det is shown in FIG. 2. It consists of three modules: the Feature Fusion Module (FFM), the Thinned U-shape Module (TUM), and the Scale-wise Feature Aggregation Module (SFAM). The FFM fuses features; for example, it resizes the features extracted by the backbone network to the same size and merges them into a base feature. The TUM, an auto-encoder-like structure, generates features that are meaningful for recognizing objects of various sizes. Finally, the SFAM aggregates the features generated by the TUMs into feature pyramids of various levels and then provides the information used for classification and localization.
2 Semantic segmentation network structure
In the present invention, hierarchical features are used for segmentation, as shown in FIG. 2; that is, features of several sizes generated stage by stage are used. When an image containing road cracks is input, it passes through the backbone network; M2Det uses VGG16 as its backbone. The backbone is organized into four backbone blocks (B blocks), each of which is connected to an auto-encoder (AE) block. Each B block halves the spatial size before passing its output to the next, so the output sizes of the successive B blocks are 576×576, 288×288, 144×144, and 72×72.
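The halving of spatial size across the four B blocks can be sketched as follows (assuming, as the block sizes above imply, an input resolution of 576×576):

```python
# Sketch: spatial sizes of the four backbone (B) block outputs, each half
# the size of the previous one, starting from the 576x576 stated in the text.

def backbone_sizes(input_size: int, num_blocks: int) -> list:
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        sizes.append(size)
        size //= 2  # each B block halves the spatial size
    return sizes

print(backbone_sizes(576, 4))  # [576, 288, 144, 72]
```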
When a B block output is used as the input of an AE block, the AE block's output has the same spatial size as its input. Regardless of the number of input channels, however, the final output has two channels so that damaged and normal regions can be distinguished. The output of each AE block is then up-sampled to 576×576 to form a sub-output. The four sub-outputs are averaged and passed through a softmax function so that all values are normalized to between 0 and 1, and any location where the value of the second channel exceeds 0.5 is regarded as containing road surface damage.
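The averaging stage just described can be sketched in plain Python; a 2×2 grid stands in for the 576×576 sub-outputs, and the average-then-softmax ordering follows the text:

```python
import math

# Sketch of the averaging stage: the up-sampled two-channel sub-outputs are
# averaged element-wise, a softmax is applied across the two channels, and
# pixels whose second-channel (damage) probability exceeds 0.5 are marked
# as road surface damage.

def softmax2(a: float, b: float):
    ea, eb = math.exp(a), math.exp(b)
    return ea / (ea + eb), eb / (ea + eb)

def damage_mask(sub_outputs, threshold=0.5):
    # sub_outputs: list of tensors shaped [2][H][W] (channel, row, col)
    h, w = len(sub_outputs[0][0]), len(sub_outputs[0][0][0])
    s = len(sub_outputs)
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            normal = sum(o[0][y][x] for o in sub_outputs) / s  # average
            damage = sum(o[1][y][x] for o in sub_outputs) / s
            _, p_damage = softmax2(normal, damage)             # normalize
            mask[y][x] = p_damage > threshold
    return mask

# Two toy sub-outputs agreeing that only the top-left pixel is damaged.
a = [[[-1.0, 2.0], [3.0, 1.0]], [[4.0, 0.0], [0.0, 0.0]]]
b = [[[-2.0, 2.0], [2.0, 2.0]], [[5.0, 1.0], [1.0, 0.0]]]
print(damage_mask([a, b]))  # [[True, False], [False, False]]
```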
In the present invention, the features generated by the backbone network are used as the inputs to the AE blocks. Given an input, an AE block first applies a convolution - batch normalization - rectified linear unit (Conv-BN-ReLU) operation, as shown in FIG. 3. The kernel size is 7×7 and the padding is 3, so the spatial size is preserved through the operation. FIG. 3 is a diagram illustrating the structure of the auto-encoder block of FIG. 2.
Next, a deep neural network composed of two encoder blocks and a decoder block is used. The encoder blocks come in two forms: a residual network with a skip connection, and a residual network with a down-sampling convolution.
The former is a deep neural network that uses a skip connection, adding the weighted input just before the final activation function to reduce the loss of input information caused by the convolution operations. The latter halves the spatial size by setting the stride of its first convolution to 2; the input is likewise reduced in size by another convolution with a 1×1 kernel before its weighted values are added. Each pass through an encoder block halves the size of the feature.
Repeating this twice reduces the size to one quarter. A decoder block is used to restore the reduced features. It consists of two decoder networks whose main role is to restore the size reduced in the preceding blocks, for which the present invention uses the transposed convolution operation. This operation doubles the size of the feature, and repeating it twice restores the original input size. Finally, a Conv-BN-ReLU operation is appended to the last network so that the number of final output channels is 2. As a result, the numbers of channels of the features generated through the five stages are 64, 128, 64, 32, and 2.
3 Loss function
To update the weights of the segmentation deep neural network proposed in the present invention, the loss function of Equation 1 is used. Here, i denotes the position of a pixel, and s denotes the output stage of the network (1, 2, 3, or 4). N is the size of a sub-output, 576×576. y_si is the label value (0 or 1) at position i in stage s, and P(y_si) is the predicted probability at the same position. The cross-entropy loss is applied to each of the sub-outputs obtained from the hierarchical features, and the resulting values are summed.
[Equation 1]

L_{seg} = \sum_{s=1}^{4} \left\{ -\frac{1}{N} \sum_{i} \left[ y_{si} \log P(y_{si}) + (1 - y_{si}) \log \left( 1 - P(y_{si}) \right) \right] \right\}
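As a sketch of Equation 1 (assuming, per the symbol definitions above, a binary cross-entropy per sub-output normalized by the sub-output size N, summed over the four stages), with tiny 4-pixel sub-outputs standing in for the 576×576 ones:

```python
import math

# Sketch of the segmentation loss: per-stage binary cross-entropy,
# normalized by the sub-output size N, summed over the stages.

def stage_loss(labels, probs):
    n = len(labels)
    total = 0.0
    for y, p in zip(labels, probs):
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / n  # normalized by the sub-output size N

def segmentation_loss(stages):
    # stages: list of (labels, predicted probabilities), one per sub-output
    return sum(stage_loss(y, p) for y, p in stages)

confident = ([1, 0, 0, 1], [0.9, 0.1, 0.2, 0.8])
uncertain = ([1, 0, 0, 1], [0.4, 0.6, 0.5, 0.3])
# Confident, correct predictions yield a smaller summed loss.
print(segmentation_loss([uncertain, uncertain])
      > segmentation_loss([confident, confident]))  # True
```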
Next, a loss function is used to update the weights of the object detection deep neural network used in the present invention. The aim of this loss function is to select, from candidate bounding boxes of various aspect ratios, the bounding box that contains the object to be detected. Accordingly, M2Det generates tens of thousands of candidate bounding boxes, compares them with the ground truth bounding boxes, and learns to select the box with the smallest difference. The measure of this difference is used as the loss function, defined as in Equation 2.
In this equation, L_conf determines the class of the object to be detected, and L_loc determines the position of the object in the image. x^p_{i,j} is 1 when the overlap between the i-th candidate bounding box and the j-th ground truth bounding box of class p is 50% or more, and 0 otherwise. Candidate bounding boxes with an overlap of 50% or more are called Pos, and the others Neg; N is the number of boxes in Pos. c is the estimated class probability, l is the estimated position of the object in the image, and g is the actual position of the object in the image.
[Equation 2]

L(x, c, l, g) = \frac{1}{N} \left( L_{conf}(x, c) + \alpha L_{loc}(x, l, g) \right)
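The Pos/Neg assignment just described can be sketched as follows, assuming the 50% "overlap" is intersection-over-union (IoU), as is standard in SSD-style detectors; boxes are given as (x1, y1, x2, y2):

```python
# Sketch of the Pos/Neg matching step, assuming the overlap measure is IoU.

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def split_pos_neg(candidates, ground_truth, threshold=0.5):
    pos, neg = [], []
    for c in candidates:
        (pos if any(iou(c, g) >= threshold for g in ground_truth) else neg).append(c)
    return pos, neg

gt = [(0, 0, 10, 10)]
cands = [(0, 0, 10, 10),   # IoU 1.0      -> Pos
         (5, 0, 15, 10),   # IoU 50/150   -> Neg
         (1, 1, 10, 10)]   # IoU 81/100   -> Pos
pos, neg = split_pos_neg(cands, gt)
print(len(pos), len(neg))  # 2 1
```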
Describing L_loc in detail, it is defined as in Equation 3. cx and cy are the center pixel coordinates of a bounding box d, and w and h are its width and height; g denotes a ground truth bounding box. The value obtained by summing the differences between l_i^m and ĝ_j^m is used as the loss value.
[Equation 3]

L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k} \, \mathrm{smooth}_{L1} \left( l_i^m - \hat{g}_j^m \right)

\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}

\hat{g}_j^{w} = \log \frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log \frac{g_j^{h}}{d_i^{h}}
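As a sketch of Equation 3 under the standard SSD-style formulation the description matches (ground-truth box g encoded relative to a default box d before the smooth L1 comparison):

```python
import math

# Sketch of the localization loss: predicted offsets l are compared with
# the ground-truth box g encoded relative to a default box d, using the
# smooth L1 distance, summed over the four coordinates.

def smooth_l1(x: float) -> float:
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def encode(g, d):
    # g, d: dicts with keys cx, cy, w, h
    return {"cx": (g["cx"] - d["cx"]) / d["w"],
            "cy": (g["cy"] - d["cy"]) / d["h"],
            "w": math.log(g["w"] / d["w"]),
            "h": math.log(g["h"] / d["h"])}

def loc_loss(l, g, d):
    g_hat = encode(g, d)
    return sum(smooth_l1(l[m] - g_hat[m]) for m in ("cx", "cy", "w", "h"))

d = {"cx": 5.0, "cy": 5.0, "w": 10.0, "h": 10.0}
g = {"cx": 6.0, "cy": 5.0, "w": 10.0, "h": 10.0}
exact = {"cx": 0.1, "cy": 0.0, "w": 0.0, "h": 0.0}  # exactly encodes g
print(loc_loss(exact, g, d))  # 0.0
```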
L_conf is defined as in Equation 4 and comprises probability terms for the bounding boxes in Pos and for those in Neg. The former term yields a high probability value when a box contains the target to be detected, whereas the latter yields a high probability value when a box contains no target. The sum of the two terms gives the loss value for estimating the class of the object.
[Equation 4]

L_{conf}(x, c) = - \sum_{i \in Pos}^{N} x_{ij}^{p} \log \hat{c}_i^{p} - \sum_{i \in Neg} \log \hat{c}_i^{0}, \quad \hat{c}_i^{p} = \frac{\exp(c_i^p)}{\sum_{p} \exp(c_i^p)}
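A sketch of Equation 4 as described above: class scores are turned into probabilities with a softmax, Pos boxes contribute the negative log-probability of their matched class, and Neg boxes that of the background class (taken here, as in SSD-style detectors, to be index 0):

```python
import math

# Sketch of the confidence loss: Pos boxes are penalized for low
# probability on their matched class; Neg boxes for low probability
# on the background class (index 0).

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def conf_loss(pos_boxes, neg_boxes):
    # pos_boxes: list of (class_scores, matched_class_index)
    # neg_boxes: list of class_scores
    loss = 0.0
    for scores, p in pos_boxes:
        loss -= math.log(softmax(scores)[p])
    for scores in neg_boxes:
        loss -= math.log(softmax(scores)[0])  # background class
    return loss

# A confident Pos box (class 2) plus a confident background Neg box
# yields a smaller loss than unconfident, wrong predictions.
good = conf_loss([([0.0, 0.0, 5.0], 2)], [[5.0, 0.0, 0.0]])
bad = conf_loss([([0.0, 5.0, 0.0], 2)], [[0.0, 0.0, 5.0]])
print(good < bad)  # True
```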
4 Training conditions
To train the deep neural network for detecting the dynamic and static obstacles that may be encountered on the road, 1,218 of the images were used for training and 200 for validation. ADAM was used as the optimization function, with a learning rate of 0.0001, beta-1 of 0.9, and beta-2 of 0.999.
Before training, all weights were initialized with the Xavier method. The batch size was set to 14, and the best-performing model over a total of 2,000 epochs was selected. The algorithm was implemented in PyTorch on Ubuntu 18.04, and the development PC was equipped with an Intel Xeon Gold 6226R, 128 GB of RAM, and an NVIDIA Quadro RTX 8000.
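The training setup above can be summarized as a configuration sketch; the key names follow PyTorch's Adam conventions (an assumption, since the text names only the values), and the data split is checked against the 1,418-image total stated earlier:

```python
# Sketch of the training configuration stated in the text. The key names
# follow PyTorch's Adam conventions; the values are those given above.

train_config = {
    "optimizer": "Adam",
    "learning_rate": 0.0001,
    "betas": (0.9, 0.999),   # beta-1, beta-2
    "weight_init": "Xavier",
    "batch_size": 14,
    "epochs": 2000,
}

total_images, train_images, val_images = 1418, 1218, 200
# The train/validation split accounts for the whole annotated data set.
print(train_images + val_images == total_images)  # True
```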
To summarize, conventional object recognition technology detects dynamic obstacles quickly, mainly in the form of bounding boxes, but is poorly suited to detecting road surface damage: because damaged regions have irregular shapes, a bounding box around them encloses many normal pixels, which is inefficient and inaccurate.
Road surface damage recognition technology, by contrast, has so far been developed as an infrastructure maintenance technology and can detect damage accurately, but it requires long computation times and large memory to be used for real-time detection, and mounting such an algorithm on a personal mobility vehicle would be costly. A new deep learning technology was therefore needed that compensates for these mutual shortcomings and detects road surface damage as well as dynamic obstacles quickly and accurately.
To this end, the present invention proposes an image-based artificial intelligence technology by which an autonomous personal mobility vehicle can simultaneously detect both the dynamic obstacles it may encounter while driving and defective road surface conditions.
The contributions of the proposed technology are as follows. First, it proposes a structure that connects to an object recognition algorithm used in the autonomous driving field so that road surface damage detection can be performed at the same time; a simple structure is combined with an existing object recognition deep neural network to form a multi-tasking algorithm.
Second, the algorithm runs in real time even though it performs object recognition and road surface damage detection simultaneously. Considering these two points, joint deep learning that can learn and perform both functions at once is presented.
The present invention is expected to contribute to improving the driving safety of personal mobility vehicles through the development and use of new technologies such as image sensors, and, as a driving assistance technology for personal mobility vehicles that may serve as a future means of transportation, to increase their usability for the mobility-impaired. By contributing to driving safety, it is also expected to accelerate the spread of next-generation means of transportation in areas underserved by public transportation infrastructure.
Although the present invention has been described by way of some preferred embodiments, its scope should not be limited thereby, but should extend to modifications and improvements of the above embodiments supported by the claims.

Claims (9)

  1. A system for simultaneously detecting road surface damage and obstacles, characterized by comprising:
    an image acquisition unit that acquires an image of a road being driven on;
    a backbone network unit that extracts features of the image from the image and generates a plurality of backbone blocks having different sizes;
    an object detection unit that detects obstacles on the road using the plurality of backbone blocks;
    a segmentation unit that detects road surface damage of the road using the plurality of backbone blocks; and
    an output unit that outputs the obstacles and the road surface damage on the image.
  2. The system of claim 1, wherein the segmentation unit comprises:
    a plurality of auto-encoder block units that process and output the plurality of backbone blocks, respectively;
    a plurality of upsampling units that upsample the outputs of the plurality of auto-encoder block units to a preset size, respectively, to generate a plurality of sub-outputs; and
    an averaging unit that averages and normalizes the plurality of sub-outputs and outputs the result.
  3. The system of claim 2,
    wherein the segmentation unit detects the road surface damage using a deep neural network whose number of final output channels is 2, so as to distinguish damaged regions of the road surface from normal regions.
  4. The system of claim 3, wherein the object detection unit comprises:
    an FFM unit that changes the plurality of backbone blocks to the same size and merges them into a base feature;
    a TUM unit that generates, from the base feature, features for recognizing objects of multiple sizes; and
    an SFAM unit that generates a multi-level feature pyramid by aggregating the features generated by the TUM unit.
  5. The system of claim 4, wherein the object detection unit further comprises:
    a prediction unit that predicts the obstacles using the multi-level feature pyramid.
  6. The system of claim 5,
    wherein the output unit outputs, on the image, an outline corresponding to each obstacle.
  7. The system of claim 6,
    wherein the output unit outputs, on the image, a pixel region corresponding to the road surface damage.
  8. A method for simultaneously detecting road surface damage and obstacles, characterized by comprising:
    an image acquisition step of acquiring an image of a road being driven on;
    a backbone block generation step of extracting features of the image from the image and generating a plurality of backbone blocks having different sizes;
    an object detection step of detecting obstacles on the road using the plurality of backbone blocks;
    a segmentation step of detecting road surface damage of the road using the plurality of backbone blocks; and
    an output step of outputting the obstacles and the road surface damage on the image.
  9. A recording medium on which a computer-readable program for executing the method of claim 8 is recorded.
PCT/KR2021/013899 2021-07-16 2021-10-08 System and method for detecting both road surface damage and obstacles by using deep neural network, and recording medium in which computer-readable program for executing same method is recorded WO2023286917A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2021-0093243 2021-07-16
KR1020210093243A KR102423218B1 (en) 2021-07-16 2021-07-16 System and method for simultaneously detecting road damage and moving obstacles using deep neural network, and a recording medium recording a computer readable program for executing the method.

Publications (1)

Publication Number Publication Date
WO2023286917A1 true WO2023286917A1 (en) 2023-01-19

Family

ID=82609305

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/013899 WO2023286917A1 (en) 2021-07-16 2021-10-08 System and method for detecting both road surface damage and obstacles by using deep neural network, and recording medium in which computer-readable program for executing same method is recorded

Country Status (2)

Country Link
KR (1) KR102423218B1 (en)
WO (1) WO2023286917A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116985803A (en) * 2023-09-26 2023-11-03 赛奎鹰智能装备(威海)有限责任公司 Self-adaptive speed control system and method for electric scooter

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726324B (en) * 2024-02-07 2024-04-30 中国水利水电第九工程局有限公司 Road traffic construction inspection method and system based on data identification

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190040550A (en) * 2017-10-11 2019-04-19 현대모비스 주식회사 Apparatus for detecting obstacle in vehicle and control method thereof
KR20200023692A (en) * 2018-08-20 2020-03-06 현대자동차주식회사 Appratus and method for detecting road surface

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20170132539A (en) 2016-05-24 2017-12-04 영남대학교 산학협력단 Apparatus and method for classificating Road marking

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SEUNGBO SHIM, YOUNG EUN SONG: "Encoder Type Semantic Segmentation Algorithm Using Multi-scale Learning Type for Road Surface Damage Recognition", THE JOURNAL OF THE KOREA INSTITUTE OF INTELLIGENT TRANSPORTATION SYSTEMS, vol. 19, 2 April 2020 (2020-04-02), pages 89 - 103, XP093024789, ISSN: 1738-0774, DOI: 10.12815/kits.2020.19.2.89 *
SHIM SEUNGBO , JEONG, JAE-JIN: "Detection Algorithm of Road Damage and Obstacle Based on Joint Deep Learning for Driving Safety", THE JOURNAL OF THE KOREA INSTITUTE OF INTELLIGENT TRANSPORT SYSTEMS, vol. 20, no. 2, 30 April 2021 (2021-04-30), pages 95 - 111, XP093024796, ISSN: 1738-0774, DOI: 10.12815/kits.2021.20.2.95 *
SHIM SEUNGBO, SONG YOUNG EUN: "A Selection Method of Backbone Network through Multi-Classification Deep Neural Network Evaluation of Road Surface Damage Images", THE JOURNAL OF THE KOREA INSTITUTE OF INTELLIGENT TRANSPORT SYSTEMS, vol. 18, no. 3, 30 June 2019 (2019-06-30), pages 106 - 118, XP093024794, ISSN: 1738-0774, DOI: 10.12815/kits.2019.18.3.106 *
ZHAO QIJIE, SHENG TAO, WANG YONGTAO, TANG ZHI, CHEN YING, CAI LING, LING HAIBIN: "M2Det: A Single-Shot Object Detector Based on Multi-Level Feature Pyramid Network", PROCEEDINGS OF THE AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, vol. 33, no. 1, 17 July 2019 (2019-07-17), pages 9259 - 9266, XP093024795, ISSN: 2159-5399, DOI: 10.1609/aaai.v33i01.33019259 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116985803A (en) * 2023-09-26 2023-11-03 赛奎鹰智能装备(威海)有限责任公司 Self-adaptive speed control system and method for electric scooter
CN116985803B (en) * 2023-09-26 2023-12-29 赛奎鹰智能装备(威海)有限责任公司 Self-adaptive speed control system and method for electric scooter

Also Published As

Publication number Publication date
KR102423218B1 (en) 2022-07-20

Similar Documents

Publication Publication Date Title
WO2023286917A1 (en) System and method for detecting both road surface damage and obstacles by using deep neural network, and recording medium in which computer-readable program for executing same method is recorded
CN110020651B (en) License plate detection and positioning method based on deep learning network
CN107886073B (en) Fine-grained vehicle multi-attribute identification method based on convolutional neural network
WO2021002549A1 (en) Deep learning-based system and method for automatically determining degree of damage to each area of vehicle
US5448484A (en) Neural network-based vehicle detection system and method
CN106841216A (en) Tunnel defect automatic identification equipment based on panoramic picture CNN
CN110910378B (en) Bimodal image visibility detection method based on depth fusion network
CN111967498A (en) Night target detection and tracking method based on millimeter wave radar and vision fusion
CN112329776B (en) License plate detection method and device based on improved CenterNet network
WO2020105780A1 (en) Multiple-object detection system and method
CN110348396B (en) Deep learning-based method and device for recognizing character traffic signs above roads
JP2000048209A (en) Method and device for image processing
CN116342894B (en) GIS infrared feature recognition system and method based on improved YOLOv5
WO2019124668A1 (en) Artificial intelligence system for providing road surface danger information and method therefor
WO2019088333A1 (en) Method for recognizing human body activity on basis of depth map information and apparatus therefor
Naik et al. Implementation of YOLOv4 algorithm for multiple object detection in image and video dataset using deep learning and artificial intelligence for urban traffic video surveillance application
CN115376082A (en) Lane line detection method integrating traditional feature extraction and deep neural network
WO2021215740A1 (en) Method and device for on-vehicle active learning to be used for training perception network of autonomous vehicle
WO2021225296A1 (en) Method for explainable active learning, to be used for object detector, by using deep encoder and active learning device using the same
CN114596548A (en) Target detection method, target detection device, computer equipment and computer-readable storage medium
Aghdasi et al. Automatic licence plate recognition system
CN102043941A (en) Dynamic real-time relative relationship identification method and system
CN113239725B (en) Pedestrian waiting for crossing and crossing direction recognition method and system
WO2017155315A1 (en) Size-specific vehicle classification method for local area, and vehicle detection method using same
Pramanik et al. Detection of Potholes using Convolutional Neural Network Models: A Transfer Learning Approach

Legal Events

Date Code Title Description
NENP Non-entry into the national phase Ref country code: DE