KR102469815B1

KR102469815B1 - 3D AVM system by use of Deep Learning

Info

Publication number: KR102469815B1
Application number: KR1020220019197A
Authority: KR
Inventors: 이윤희; 김진복; 이인섭
Original assignee: 주식회사 리트빅
Priority date: 2022-02-14
Filing date: 2022-02-14
Publication date: 2022-11-23

Abstract

The present invention relates to a technology capable of configuring a distortion-free 3D AVM screen by estimating a separation distance for a camera image using a deep learning neural network model and constructing a 3D map. The 3D around view monitoring system based on deep learning according to the present invention comprises: a plurality of cameras (11 to 14) installed in the vehicle in different directions to generate a plurality of camera images (15 to 18); a deep learning neural network model unit (100) which generates a plurality of depth maps (105 to 108) having separation distance information for each unit image of the plurality of camera images (15 to 18) with a deep learning neural network model which receives a plurality of camera images (15 to 18) and estimates and outputs separation distance information for each unit image for a series of image frames constituting the corresponding camera images (15 to 18); and a 3D AVM image synthesis unit (200) for generating a 3D AVM image (209) by applying a projection according to separation distance information to the plurality of camera images (15 to 18) with reference to the plurality of depth maps (105 to 108) and synthesizing the images. According to the present invention, there is an advantage in that the convenience of driving a vehicle can be increased by removing distortion from a 3D AVM screen.

Description

3D around view monitoring system based on deep learning {3D AVM system by use of Deep Learning}

The present invention relates generally to 3D around view monitoring (3D AVM) technology for vehicles.

In particular, the present invention relates to a technology that can produce a distortion-free 3D AVM screen by using a deep learning neural network model to estimate separation distances for camera images and constructing a 3D map from them.

Around view monitoring (AVM) systems have recently been introduced into automobiles. Around view monitoring (AVM), also called surround view monitoring (SVM), is a technology in which cameras installed on the front, rear, left, and right of a vehicle capture the surroundings in those directions, and the camera images are then synthesized to provide a top view. The driver can grasp the situation around the vehicle as if looking down on it from above, which makes driving and parking more convenient.

[Fig. 1] is a diagram showing the concept of AVM image processing. As shown in [Fig. 1], cameras (11 to 14) are mounted on the front and rear of the vehicle and on both side mirrors. Image enhancement and distortion correction are applied to the camera images (15 to 18) provided by these cameras (11 to 14) to produce flat images, and a top-view around-view image (19) is then obtained through stitching (image registration and compositing). This around-view image (19) is presented to the driver on a monitor.

Whereas AVM provides a top view, 3D AVM presents the screen from a three-dimensional (3D) viewpoint. Because the viewpoint is converted to 3D, the driver can see the surroundings of the vehicle in more detail than with conventional AVM. [Fig. 2] is a diagram conceptually showing 3D AVM. In 3D AVM, the driver is shown an image obtained by projecting the camera images (15 to 18) onto a cylinder-shaped or bowl-shaped 3D projection surface; [Fig. 2] shows an example using a bowl-shaped 3D projection surface.

A drawback of 3D AVM is that three-dimensional objects (objects with height) appear distorted. Because the camera images (15 to 18) are projected onto a 3D projection surface with a preset shape (e.g., cylinder, bowl), the shapes of objects are distorted. Distortion of object shapes also occurs in areas where the camera images (15 to 18) overlap. [Fig. 3] is a diagram showing an example of a 3D AVM screen. Referring to [Fig. 3], flat objects such as lane markings and parking lines show little distortion, whereas three-dimensional objects appear with shapes that differ from reality. The 3D AVM screen is also visibly distorted in areas where the camera images (15 to 18) overlap.

Republic of Korea Patent Publication No. 10-2018-0001869, "Image enhancement device and method of AVM system" (published January 5, 2018)
Republic of Korea Patent Publication No. 10-2017-0111504, "AVM system around image consistency evaluation method and apparatus" (published October 12, 2017)

An object of the present invention is, in general, to provide 3D around view monitoring (3D AVM) technology for vehicles.

In particular, an object of the present invention is to provide a technology that can produce a distortion-free 3D AVM screen by using a deep learning neural network model to estimate separation distances for camera images and constructing a 3D map from them.

The problems to be solved by the present invention are not limited to these matters, and other problems can be understood from the description in this specification.

To achieve the above object, the deep learning-based 3D around view monitoring system according to the present invention comprises: a plurality of cameras (11 to 14) installed on a vehicle in different directions to generate a plurality of camera images (15 to 18); a deep learning neural network model unit (100) that has a deep learning neural network model which receives the plurality of camera images (15 to 18) and estimates and outputs separation distance information for each unit image over the series of image frames constituting the camera images (15 to 18), and that thereby generates a plurality of depth maps (105 to 108) holding per-unit-image separation distance information for the plurality of camera images (15 to 18); and a 3D AVM image synthesis unit (200) that, with reference to the plurality of depth maps (105 to 108), applies projection to the plurality of camera images (15 to 18) according to the separation distance information and synthesizes them to generate a 3D AVM image (209).

The deep learning-based 3D around view monitoring system according to the present invention may further comprise an occlusion processing unit (220) that estimates separation distance information for occlusion areas of the plurality of camera images (15 to 18) through bilinear interpolation of the corresponding depth maps (105 to 108). In this case, when the 3D AVM image synthesis unit (200) identifies an occlusion area while synthesizing the plurality of camera images (15 to 18), it is configured to receive separation distance information for that occlusion area from the occlusion processing unit (220) and carry out the image synthesis.

In this case, the 3D AVM image synthesis unit (200) may be configured to set an irregularly shaped 3D projection surface by reflecting the per-unit-image separation distance information stored in the depth map (105 to 108) corresponding to the given image frame of the camera image (15 to 18), and to project the camera images (15 to 18) onto that irregularly shaped 3D projection surface.

In addition, the deep learning-based 3D around view monitoring system according to the present invention may further comprise a camera correction processing unit (210) that uses intrinsic parameters and extrinsic parameters stored in advance for the plurality of cameras (11 to 14) to remove, from the plurality of camera images (15 to 18), distortion caused by the camera lenses and distortion caused by the camera mounting positions and angles.

In addition, the deep learning-based 3D around view monitoring system according to the present invention may further comprise: a vehicle camera (310) installed on a vehicle to generate training camera images (311) at a first time interval; a distance measurement sensor (320) installed on the vehicle facing the same direction as the vehicle camera (310) to generate distance measurement sensor data at a second time interval; and a deep learning training unit (110) that receives the training camera images (311) and the distance measurement sensor data, obtains training separation distance information (321) by interpolating the distance measurement sensor data so that it is synchronized with the acquisition times of the training camera images (311), builds a training dataset from combinations of the training camera images (311) and the training separation distance information (321), and performs deep learning training of the deep learning neural network model.

In this case, the distance measurement sensor (320) may include a LiDAR sensor.

Meanwhile, the computer program according to the present invention is stored in a non-volatile storage medium in order to operate the deep learning-based 3D around view monitoring system described above on a computer.

According to the present invention, the convenience of driving a vehicle can be increased by removing distortion from the 3D AVM screen.

[Fig. 1] is a diagram showing the concept of AVM image processing.
[Fig. 2] is a diagram conceptually showing 3D AVM.
[Fig. 3] is a diagram showing an example of a 3D AVM screen.
[Fig. 4] is a block diagram showing the overall configuration of the 3D AVM system according to the present invention.
[Fig. 5] is a block diagram showing the configuration for training the deep learning neural network model in the present invention.
[Fig. 6] is an example diagram of obtaining camera images and LiDAR sensor output to build the deep learning training data in the present invention.
[Fig. 7] is a diagram showing the concept of occlusion in a stereoscopic image.
[Fig. 8] is a diagram showing an example in which an empty area appears in a stereoscopic image due to occlusion.

The present invention relates to 3D around view monitoring (3D AVM) technology. A general drawback of 3D AVM is that three-dimensional objects appear distorted. The present invention removes this distortion by estimating the distance between the unit images of a camera image (e.g., pixels, pixel blocks) and the corresponding camera (11 to 14), and constructing the 3D AVM image so that it reflects that separation distance (the distance between the camera and its subject). Distinguishing an object's boundary from the background is preferable here, because it can improve the accuracy of distance estimation for the object.

In the present invention, the separation distance from the corresponding camera is estimated for the unit images in a camera image. The region of a camera image that serves as the unit for estimating separation distance is called a 'unit image' in this specification. The unit image may be set in advance, and may be an individual object, a single pixel, or a block of pixels of a fixed size (e.g., 8x8 pixels). The present invention also uses a deep learning neural network model to estimate the separation distance. Because a deep learning neural network model can estimate separation distance from a single image without analyzing a series of image frames, distance estimation is possible even when the vehicle is stationary.

In the prior art, distortion arises in the 3D AVM image because the camera images are projected uniformly onto a 3D projection surface of a preset shape (e.g., bowl, cylinder). In the present invention, by contrast, the separation distance from the corresponding camera is estimated for each unit image in the camera image, and the estimated separation distances are mapped to the unit images to construct a 3D map. Projection is then applied to the camera image unit image by unit image according to the separation distance rather than onto a preset 3D projection surface, which yields a distortion-free 3D AVM image.

Additionally, the present invention can produce a distortion-free screen in areas where the views of the cameras (11 to 14) overlap.

Hereinafter, the present invention will be described in detail with reference to the drawings.

[Fig. 4] is a block diagram showing the overall configuration of the 3D AVM system according to the present invention. Referring to [Fig. 4], camera images (15 to 18) are obtained from a plurality of cameras (11 to 14) installed on the vehicle in different directions, and these camera images (15 to 18) are passed to the deep learning neural network model unit (100). The deep learning neural network model unit (100) uses the deep learning neural network model to generate depth maps (105 to 108) from the camera images (15 to 18). The depth maps (105 to 108) hold separation distance information for each unit image over the series of image frames constituting the camera images (15 to 18). The depth maps (105 to 108) may hold separation distance information for all or only some of the image frames, and for all or only some of the unit images.

The 3D AVM image synthesis unit (200) generates a 3D AVM image by synthesizing (stitching) the camera images (15 to 18) from the multiple directions around the vehicle, using the separation distance information of the depth maps (105 to 108) and the camera calibration information.

Here, the 3D projection surface does not have a preset shape (e.g., cylinder, bowl); instead, it is constructed from the separation distance information of the depth maps (105 to 108). That is, when the camera images (15 to 18) are projected, the 3D projection surface is set by reflecting the per-unit-image separation distance information stored in the depth map (105 to 108) corresponding to the given image frame of the camera image (15 to 18). As a result, this 3D projection surface does not have a regular shape such as a cylinder or bowl but an irregular shape determined by the per-unit-image separation distance information. The camera images (15 to 18) are then projected onto that irregular 3D projection surface.
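The idea of a projection surface shaped by the depth map can be illustrated with a minimal sketch. It assumes an already undistorted pinhole camera, treats every pixel as a unit image, and interprets each depth value as distance along the optical axis; the function name `backproject_depth_map` and the parameters `fx`, `fy`, `cx`, `cy` are hypothetical and do not come from the patent.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject_depth_map(depth: List[List[float]],
                          fx: float, fy: float,
                          cx: float, cy: float) -> List[List[Point3D]]:
    """Turn a per-pixel depth map into a grid of 3D points in the camera frame.

    Assumption: pinhole camera without lens distortion, and each depth value
    is the distance along the optical axis (Z), not the length of the ray.
    """
    surface = []
    for v, row in enumerate(depth):
        surface_row = []
        for u, z in enumerate(row):
            # Scale the normalized image coordinates by the depth of the pixel.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            surface_row.append((x, y, z))
        surface.append(surface_row)
    return surface


# Toy 2x3 depth map: the right-hand column is farther away than the rest.
depth = [[2.0, 2.0, 4.0],
         [2.0, 2.0, 4.0]]
surface = backproject_depth_map(depth, fx=1.0, fy=1.0, cx=1.0, cy=0.5)
```

Each entry of `surface` is a vertex of the irregular surface; texturing those vertices with the corresponding camera pixels is what replaces the fixed bowl or cylinder in this sketch.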

The concept of using camera calibration information in synthesizing the camera images (15 to 18) is described next. In general, after the front, rear, left, and right cameras (11 to 14) are installed on a vehicle to implement the AVM function, a camera calibration process is performed. In this process, the AVM system obtains intrinsic camera parameters and extrinsic camera parameters for each of the cameras (11 to 14) installed on the vehicle; these are collectively called camera calibration information. The intrinsic parameters describe the distortion characteristics caused by the camera lens, and the extrinsic parameters describe the distortion characteristics caused by the position and angle at which the camera is mounted on the vehicle.

Because the camera images (15 to 18) contain distortion that depends on the intrinsic and extrinsic parameters of the corresponding cameras (11 to 14), the camera correction processing unit (210) of the 3D AVM image synthesis unit (200) removes this distortion, namely the distortion caused by the camera lens and the distortion caused by the camera mounting position and angle, while the camera images (15 to 18) are being synthesized. The process of removing distortion from the camera images (15 to 18) using camera calibration information generally follows [Equation 1]; since this is a known technique in the field of image synthesis, a detailed description is omitted in this specification.

Here, the two matrices in [Equation 1] denote the camera intrinsic parameters and the camera extrinsic parameters, respectively.
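For reference, the pinhole camera model commonly used in camera calibration relates a 3D point to its image position through an intrinsic matrix and an extrinsic matrix, as sketched below. The symbols K, R, t, s, (u, v), and (X, Y, Z) are assumptions chosen for illustration and may differ from the notation of [Equation 1].

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [\, R \mid t \,]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

In this form, K plays the role of the intrinsic matrix (focal lengths and principal point) and [R | t] the role of the extrinsic matrix (rotation and translation of the camera relative to the vehicle). Lens distortion coefficients are usually handled as additional intrinsic parameters applied alongside K rather than as entries of K itself.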

[Fig. 5] is a block diagram showing the configuration for training the deep learning neural network model in the present invention.

In the present invention, a deep learning neural network model is used to estimate separation distance information for each camera image (15 to 18). For this purpose, the deep learning training unit (110) must train the deep learning neural network model of the deep learning neural network model unit (100) in advance. In deep learning, training refers to the process of fitting a neural network model using a training dataset. The present invention uses many combinations of training camera images (311) and training separation distance information (321) as the training dataset. Once trained on this dataset, the deep learning neural network model can, given a camera image, estimate the separation distance for each unit image over a series of image frames.

Collecting training camera images (311) and training separation distance information (321) in bulk by hand to build the training dataset for the deep learning neural network model would be very inefficient. The present invention therefore proposes using a vehicle camera (310) and a distance measurement sensor (320). That is, the camera (310) and the distance measurement sensor (320) are installed on a vehicle so that they face the same direction, and the vehicle is driven through various locations.

The vehicle camera (310) generates training camera images (311), and the distance measurement sensor (320) generates distance measurement sensor data. The deep learning training unit (110) obtains training separation distance information (321) from the distance measurement sensor data, synchronized with the acquisition times of the training camera images (311), and builds a training dataset from combinations of the training camera images (311) and the training separation distance information (321). In this way, a large training dataset for training the deep learning neural network model can be obtained efficiently.

A LiDAR sensor is suitable as a preferred embodiment of the distance measurement sensor (320). LiDAR stands for 'Light Detection And Ranging' or 'Laser Imaging, Detection and Ranging'. It is a technology that acquires 3D spatial information by emitting a high-power pulsed laser and measuring the time taken for the laser beam to be reflected from an object and return. By synchronizing the vehicle camera (310) and the LiDAR sensor (320), the separation distance can be estimated for each pixel of the training camera image (311), which makes it possible to obtain combinations of training camera images (311) and training separation distance information (321).

Here, the vehicle camera (310) and the LiDAR sensor (320) are installed facing the same direction, and they must also be synchronized with each other in terms of information acquisition time. In general, however, the vehicle camera (310) and the LiDAR sensor (320) acquire information at different times. The interval at which the vehicle camera (310) generates training camera images (311) is about 33 msec at 30 frames per second, whereas the interval at which the LiDAR sensor (320) generates training separation distance information (321) is much longer than 33 msec.

[Fig. 6] is an example diagram of obtaining camera images and LiDAR sensor output to build the deep learning training data in the present invention. Building a training dataset for the deep learning neural network model requires separation distance information synchronized with the acquisition times of the training camera images (311). That is, separation distance information is needed for the acquisition time points (t1 to t5) of the training camera images (311) in [Fig. 6].

In a first embodiment, the LiDAR sensor output closest to each information acquisition time is used as the separation distance information. In [Fig. 6], the LiDAR sensor output at time point (t6) is set as the separation distance information for time points (t2, t3), and the LiDAR sensor output at time point (t7) is set as the separation distance information for time points (t4, t5).
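The first embodiment can be sketched as a nearest-timestamp lookup. This is a minimal illustration under stated assumptions: each LiDAR output is reduced to a single range value, timestamps are in milliseconds and sorted, and the names `nearest_lidar_index` and `match_nearest` are hypothetical. The toy timestamps are not the t1 to t7 of [Fig. 6].

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_lidar_index(lidar_times: Sequence[float], t: float) -> int:
    """Return the index of the LiDAR sample closest in time to t.

    lidar_times must be non-empty and sorted in ascending order.
    """
    pos = bisect_left(lidar_times, t)
    if pos == 0:
        return 0
    if pos == len(lidar_times):
        return len(lidar_times) - 1
    before, after = lidar_times[pos - 1], lidar_times[pos]
    # On a tie, prefer the earlier sample.
    return pos - 1 if (t - before) <= (after - t) else pos


def match_nearest(camera_times: Sequence[float],
                  lidar_times: Sequence[float],
                  lidar_ranges: Sequence[float]) -> List[float]:
    """Assign to every camera frame the range of the nearest LiDAR sample."""
    return [lidar_ranges[nearest_lidar_index(lidar_times, t)]
            for t in camera_times]


camera_times = [0.0, 33.0, 66.0, 99.0, 132.0]  # msec, roughly 30 frames/s
lidar_times = [0.0, 100.0]                      # msec, slower sensor
lidar_ranges = [5.0, 4.0]                       # metres (toy values)
matched = match_nearest(camera_times, lidar_times, lidar_ranges)
```

Several consecutive camera frames end up sharing the same range value, which is the price of this simple matching when the sensor is much slower than the camera.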

In a second embodiment, the separation distance information is obtained by applying interpolation along the time axis to the LiDAR sensor output. In [Fig. 6], the information generation period of the LiDAR sensor (320) is longer than that of the vehicle camera (310), so linear interpolation such as [Equation 2] is applied to the LiDAR sensor output to obtain (estimate) separation distance information for the acquisition time points (t2 to t5) of the training camera images (311). Linear interpolation is a widely known technique, so a detailed description is omitted in this specification.

Here, x is a time value and y is a separation distance value. (x0, y0) and (x1, y1) are the time values and LiDAR sensor output values at the neighboring points on either side where LiDAR sensor output exists.
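With the symbols defined above, standard linear interpolation takes the form y = y0 + (x - x0)(y1 - y0) / (x1 - x0). The sketch below applies it to bring LiDAR ranges onto camera timestamps. It simplifies each LiDAR output to one range value per timestamp, and it clamps to the nearest sample outside the LiDAR time span; both choices, along with the names `lerp_range` and `interpolate_ranges`, are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def lerp_range(x: float, x0: float, y0: float, x1: float, y1: float) -> float:
    """Linear interpolation: y = y0 + (x - x0) * (y1 - y0) / (x1 - x0)."""
    if x1 == x0:
        raise ValueError("x0 and x1 must differ")
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)


def interpolate_ranges(camera_times: Sequence[float],
                       lidar_times: Sequence[float],
                       lidar_ranges: Sequence[float]) -> List[float]:
    """Estimate a range for every camera timestamp from sparse LiDAR samples.

    lidar_times must be sorted in ascending order. Camera timestamps outside
    the LiDAR time span are clamped to the first or last sample (assumption).
    """
    result = []
    for t in camera_times:
        if t <= lidar_times[0]:
            result.append(lidar_ranges[0])
        elif t >= lidar_times[-1]:
            result.append(lidar_ranges[-1])
        else:
            hi = bisect_right(lidar_times, t)  # first LiDAR sample after t
            lo = hi - 1                        # last LiDAR sample at or before t
            result.append(lerp_range(t, lidar_times[lo], lidar_ranges[lo],
                                     lidar_times[hi], lidar_ranges[hi]))
    return result


camera_times = [0.0, 25.0, 50.0, 75.0, 100.0]  # msec (toy values)
lidar_times = [0.0, 100.0]                      # msec
lidar_ranges = [5.0, 4.0]                       # metres
interpolated = interpolate_ranges(camera_times, lidar_times, lidar_ranges)
```

Compared with nearest-timestamp matching, every camera frame now receives its own estimate, at the cost of assuming the range changes linearly between two LiDAR samples.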

As described above, the deep learning training unit (110) builds a training dataset of at least a certain size from the training camera images (311) and training separation distance information (321) obtained from the vehicle camera (310) and the LiDAR sensor (320), and uses this training dataset to perform deep learning training of the deep learning neural network model of the deep learning neural network model unit (100).

When the camera images (15 to 18) are input to the deep learning neural network model trained in this way, the deep learning neural network model unit (100) can estimate the separation distance for each unit image (e.g., pixel, pixel block) over the series of image frames constituting the camera images (15 to 18). This separation distance is also called depth. Accordingly, in this specification the set of separation distance information that the deep learning neural network model unit (100) generates for the camera images (15 to 18) is called a depth map (105 to 108).

The 3D AVM image synthesis unit (200) synthesizes the camera images (15 to 18) according to the depth maps (105 to 108) to generate the 3D AVM image (209).

Here, the occlusion processing unit (220) of the 3D AVM image synthesis unit (200) removes errors caused by occlusion when the camera images (15 to 18) from multiple directions are synthesized. When the camera images (15 to 18) are projected uniformly onto a cylinder or bowl shape, as in the prior art, the synthesized result (the composite image) has no empty spots. By contrast, when separation distance information exists for each unit image and the camera images (15 to 18) are synthesized so as to reflect it, as in the present invention, empty spots can appear because of occlusion. Leaving such occlusion unhandled lowers the quality of the 3D AVM image (209), so the occlusion processing unit (220) of the 3D AVM image synthesis unit (200) removes the errors it causes.

With reference to [Fig. 7] and [Fig. 8], the principle by which occlusion creates empty areas in a 3D image is described. [Fig. 7] illustrates the concept of occlusion in a stereoscopic image, and [Fig. 8] shows an example in which empty areas appear in a stereoscopic image because of occlusion.

Part of a three-dimensional object may be visible to one camera but not to the other. The area marked 'occlusion' in [Fig. 7] is visible to the left camera but not to the right camera. For such an area, separation distance information cannot be obtained from one of the two sides, so 3D matching cannot be performed, and the area appears as an empty area when the 3D AVM image (209) is constructed.

[Fig. 8] (a) is an image of an ordinary cafe with desks and chairs, and [Fig. 8] (b) shows the result of 3D synthesis after the scene was photographed with left and right cameras. In [Fig. 8] (b), areas that are hidden by an object and therefore invisible to one of the cameras yield no separation distance information from that camera's image and are rendered black in the 3D image. In [Fig. 8] (b), red indicates near areas, blue indicates far areas, and black indicates areas with no separation distance information.
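The color convention described for [Fig. 8] (b) can be illustrated with a short sketch. The Python function below is hypothetical and not part of the patent: it assumes the depth map is a NumPy array in which missing separation distance is encoded as NaN, and that `near` and `far` are display limits chosen by the caller. The linear red-to-blue blend is also an assumption, since the text only fixes the meaning of red, blue, and black.

```python
import numpy as np


def colorize_depth(depth, near, far):
    """Map a depth map to RGB: red = near, blue = far, black = no depth.

    Assumption: cells without separation distance information are NaN.
    """
    depth = np.asarray(depth, dtype=float)
    valid = ~np.isnan(depth)

    # Normalize valid depths to [0, 1]; 0 means near, 1 means far.
    t = np.zeros_like(depth)
    t[valid] = np.clip((depth[valid] - near) / (far - near), 0.0, 1.0)

    rgb = np.zeros(depth.shape + (3,), dtype=np.uint8)
    # Red channel fades out with distance, blue channel fades in.
    rgb[..., 0] = np.where(valid, np.round(255 * (1.0 - t)), 0).astype(np.uint8)
    rgb[..., 2] = np.where(valid, np.round(255 * t), 0).astype(np.uint8)
    # Cells with no depth keep the initial value (0, 0, 0), i.e. black.
    return rgb
```

In such a rendering, the black cells are the candidates that an occlusion-handling step would need to fill.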

In a 3D AVM system, occlusion regions can arise in the areas where the images of the cameras (11 to 14) overlap. If the 3D AVM image (209) were constructed with the errors in the occlusion regions left as they are, its quality would be rated low; in the present invention, therefore, the occlusion processing unit (220) of the 3D AVM image synthesis unit (200) removes the errors in the occlusion regions.

The 3D AVM image synthesis unit (200) identifies occlusion regions in the course of synthesizing the camera images (15 to 18). Identification of occlusion regions is known technology in the field of image synthesis and is therefore not described in detail in this specification. For an occlusion region, separation distance information is missing from one of the depth maps (105 to 108). In this case, the occlusion processing unit (220) of the 3D AVM image synthesis unit (200) estimates the separation distance information for the occlusion region by bilinear interpolation. Bilinear interpolation generates the value at a given point from surrounding values, and bilinear interpolation for the camera images (15 to 18) and the corresponding depth maps (105 to 108) can be performed according to [Equation 3]. Because bilinear interpolation is a widely known technique, a detailed description is omitted in this specification.

Here, (x, y) is the coordinate of a point that is hidden by an object in a particular camera image (15 to 18) and therefore has no separation distance information in the depth map (105 to 108), that is, the point whose separation distance value is to be obtained by bilinear interpolation. Referring to [Fig. 7], it is preferable to use coordinate values converted to the global coordinate system rather than the coordinate system of the camera images (15 to 18). A rectangle is defined near the point whose separation distance value is sought, using points at which separation distance values exist in the depth maps (105 to 108), and its vertices are denoted Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2). f(Q11), f(Q12), f(Q21), and f(Q22) are the separation distance values at the points Q11, Q12, Q21, and Q22.
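The body of [Equation 3] is not reproduced in this text. For reference, the standard textbook form of bilinear interpolation, written with the symbols defined above, is given below. It is offered as an assumption about what [Equation 3] expresses, not as a copy of it.

```latex
f(x, y) \approx \frac{1}{(x_2 - x_1)(y_2 - y_1)}
\Bigl[
    f(Q_{11})\,(x_2 - x)(y_2 - y)
  + f(Q_{21})\,(x - x_1)(y_2 - y)
  + f(Q_{12})\,(x_2 - x)(y - y_1)
  + f(Q_{22})\,(x - x_1)(y - y_1)
\Bigr]
```

Each vertex value is weighted by the area of the sub-rectangle opposite that vertex, so the expression returns f(Q11) at (x1, y1), f(Q22) at (x2, y2), and so on.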

Through bilinear interpolation, the occlusion processing unit (220) can estimate separation distance information for the occlusion region, and the 3D AVM image synthesis unit (200) can accordingly remove part of the error caused by occlusion and generate a high-quality 3D AVM image (209). Processing occlusion regions in this way effectively removes distortion in the areas where the cameras (11 to 14) overlap.
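The estimation step can be sketched in a few lines of Python. This is a minimal illustration of standard bilinear interpolation over a rectangle with vertices Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2), not the patented implementation; the function name `bilinear_depth` and its argument order are hypothetical.

```python
def bilinear_depth(x, y, x1, y1, x2, y2, f11, f12, f21, f22):
    """Estimate the separation distance at (x, y) from four vertex values.

    f11, f12, f21, f22 are the known separation distances at
    Q11(x1, y1), Q12(x1, y2), Q21(x2, y1), and Q22(x2, y2).
    The function name and signature are illustrative assumptions.
    """
    area = (x2 - x1) * (y2 - y1)
    if area == 0:
        raise ValueError("the four points must span a non-degenerate rectangle")
    # Each vertex value is weighted by the area of the opposite sub-rectangle.
    return (
        f11 * (x2 - x) * (y2 - y)
        + f21 * (x - x1) * (y2 - y)
        + f12 * (x2 - x) * (y - y1)
        + f22 * (x - x1) * (y - y1)
    ) / area


# Example: a missing point at the center of a 2 x 2 rectangle.
estimate = bilinear_depth(1, 1, 0, 0, 2, 2, 1.0, 3.0, 2.0, 4.0)
print(estimate)  # 2.5, the mean of the four vertex values
```

At the center of the rectangle all four weights are equal, so the estimate is the average of the vertex values; closer to a vertex, that vertex's value dominates.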

Meanwhile, the present invention can be implemented as computer-readable code on a computer-readable non-volatile recording medium. Such non-volatile recording media include various types of storage devices, for example hard disks, SSDs, CD-ROMs, NAS, magnetic tape, web disks, and cloud disks, and an implementation in which the code is stored and executed in a distributed manner across multiple network-connected storage devices is also possible. The present invention may also be implemented as a computer program stored on a medium so as to execute a specific procedure in combination with hardware.

11 to 14: camera
15 to 18: camera image
100: deep learning neural network model unit
105 to 108: depth map
110: deep learning training unit
200: 3D AVM image synthesis unit
209: 3D AVM image
210: camera calibration processing unit
220: occlusion processing unit
310: vehicle camera
311: training camera image
320: distance measurement sensor
321: training separation distance information

Claims

A deep learning-based 3D around view monitoring system comprising:
a plurality of cameras (11 to 14) that generate a plurality of camera images (15 to 18) for the front, rear, left, and right directions of a vehicle;
a deep learning neural network model unit (100) that has a deep learning neural network model for estimating separation distance information for each preset unit image over a series of image frames, and that generates, by means of the deep learning neural network model, a plurality of depth maps (105 to 108) holding separation distance information for each unit image over the series of image frames of the plurality of camera images (15 to 18) for the front, rear, left, and right directions of the vehicle;
an occlusion processing unit (220) that estimates separation distance information for an occlusion region by defining a quadrangle from points at which separation distance values exist in the depth maps (105 to 108) and applying bilinear interpolation to the separation distance values at the vertices of the quadrangle; and
a 3D AVM image synthesis unit (200) that sets an irregularly shaped 3D projection surface for the front, rear, left, and right directions of the vehicle by reflecting, for the plurality of camera images (15 to 18), the separation distance information for each unit image stored in the depth maps (105 to 108) corresponding to the relevant image frames, that projects the plurality of camera images (15 to 18) onto the irregularly shaped 3D projection surface and stitches them to generate a 3D AVM image (209), and that, when an occlusion region for which separation distance information does not exist in one of the depth maps is identified during the image stitching in an area where the plurality of camera images (15 to 18) mutually overlap, receives separation distance information for the identified occlusion region from the occlusion processing unit (220) and supplements the depth map with it, thereby performing the image stitching while removing the error caused by the occlusion.

delete

The system of claim 1, further comprising:
a camera calibration processing unit (210) that removes, from the plurality of camera images (15 to 18), distortion due to the camera lenses and distortion due to the camera mounting positions and angles, using intrinsic parameters and extrinsic parameters stored in advance for the plurality of cameras (11 to 14).

The system of claim 1, further comprising:
a vehicle camera (310) installed in a vehicle to generate training camera images (311) at a first time interval;
a distance measurement sensor (320) installed in the vehicle to face the same direction as the vehicle camera (310) and generating distance measurement sensor data at a second time interval; and
a deep learning training unit (110) that is provided with the training camera images (311) and the distance measurement sensor data and applies interpolation to the distance measurement sensor data to obtain training separation distance information (321) synchronized with the information acquisition time of the training camera images (311).

The system of claim 5,
wherein the distance measurement sensor (320) includes a LiDAR sensor.