
KR101907883B1 - Object detection and classification method

Info

Publication number: KR101907883B1
Application number: KR1020170058182A
Authority: KR
Inventors: 민지홍; 강행봉; 오상일
Original assignee: 국방과학연구소
Priority date: 2017-05-10
Filing date: 2017-05-10
Publication date: 2018-10-16
Also published as: WO2018207969A1

Abstract

The present invention relates to a method for extracting and classifying objects in an image. It includes: a two-dimensional image sensor that captures a two-dimensional image; a two-dimensional image processing unit that extracts a specific object from the captured two-dimensional image and classifies the class of the specific object; a three-dimensional image sensor that captures a three-dimensional image; a three-dimensional image processing unit that extracts the specific object from the captured three-dimensional image and classifies the class of the specific object; and a fusion processing unit that calculates a final class classification result for the specific object using the class classification result produced by the two-dimensional image processing unit and the class classification result produced by the three-dimensional image processing unit.

Description

{OBJECT DETECTION AND CLASSIFICATION METHOD}

The present invention relates to a method for extracting and classifying objects in an image.

An advanced driver assistance system (ADAS) helps keep the driver safe in unpredictable driving situations. An ADAS can be divided into two parts: a perception system and a warning notification interface. Examples of the first part include collision prediction and detection of driver inattention; the second part provides the driver with information about events. Both parts, however, require reliable object and event detection, such as continuous localization, mapping, and the detection and tracking of moving objects.

Object detection and recognition methods, which identify and localize objects of interest, are widely applied in many fields. Object detection is performed on a real-time map while driving, and object classification is performed using a classification model trained on an offline database. An object detection and classification system is divided into a candidate detector, which detects object candidates, and a classification model, which classifies the detected object candidate regions. A region of interest is usually represented as a single feature vector and is traditionally classified by a machine learning model such as a support vector machine (SVM) or AdaBoost.

In intelligent vehicle systems, one way to improve object detection and classification performance is to fuse the measurements of several sensors. Managing the imperfect measurements of the different sensors is very important when building such a system. Methods for fusing different sensors fall into two broad groups: feature fusion and decision fusion. Feature fusion selectively fuses raw data or per-sensor features. Although many feature fusion methods have been proposed, a problem in any one sensor of the multi-sensor modality can negatively affect the whole system. Decision fusion, by contrast, performs object detection and classification independently for each sensor and fuses the per-sensor results to derive the final result.

The present invention proposes a method of detecting object candidate regions for each sensor. For effective object candidate detection, the goal is to find a small number of meaningful object candidate regions. For image data measured by a CCD sensor, object candidate regions can be detected effectively through image segmentation with color smoothing applied and grouping of the semantically segmented image. For three-dimensional point cloud data measured by a LIDAR sensor, object candidate regions are detected by applying supervoxel segmentation and a region growing method. The invention also proposes an object classification method that fuses a multi-layer laser scanner (3D LIDAR) with a CCD sensor. To this end, the object candidate regions detected by each sensor are classified using a convolutional neural network (CNN), and the final fusion classification is performed by another CNN.

One object of the present invention is to improve the accuracy of object detection by an object detection apparatus.

Another object of the present invention is to improve the accuracy of object detection by fusing the sensor information of a plurality of sensors installed in the object detection apparatus.

The present invention includes: a two-dimensional image sensor that captures a two-dimensional image; a two-dimensional image processing unit that extracts a specific object from the captured two-dimensional image and classifies the class of the specific object; a three-dimensional image sensor that captures a three-dimensional image; a three-dimensional image processing unit that extracts the specific object from the captured three-dimensional image and classifies the class of the specific object; and a fusion processing unit that calculates a final class classification result for the specific object using the class classification result produced by the two-dimensional image processing unit and the class classification result produced by the three-dimensional image processing unit.

In one embodiment, the two-dimensional image processing unit performs color smoothing on the two-dimensional image and detects an object candidate region from the color-smoothed two-dimensional image.

In one embodiment, the two-dimensional image processing unit divides the color-smoothed two-dimensional image into a plurality of regions and, based on dissimilarity in color and texture, extracts at least some of the plurality of regions as object candidate regions.

In one embodiment, the three-dimensional image consists of point cloud data, and the three-dimensional image processing unit converts the point cloud data constituting the three-dimensional image into a voxel space, divides the converted voxel space into supervoxels of unit size, and groups the supervoxels based on the height difference of the point cloud data contained in each space, thereby detecting an object candidate region.
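The voxel step in this embodiment can be illustrated with a minimal sketch. It is not the patented procedure: a plain regular grid stands in for supervoxel segmentation, and the function names, the `voxel_size` value, and the `min_height_diff` threshold are all hypothetical.

```python
import numpy as np

def voxelize(points, voxel_size):
    """Collect (x, y, z) points per integer voxel index on a regular grid."""
    points = np.asarray(points, dtype=float)
    indices = np.floor(points / voxel_size).astype(int).tolist()
    voxels = {}
    for idx, point in zip(map(tuple, indices), points):
        voxels.setdefault(idx, []).append(point)
    return {idx: np.array(pts) for idx, pts in voxels.items()}

def height_span(voxel_points):
    """Height difference (max z minus min z) of the points in one voxel."""
    return float(voxel_points[:, 2].max() - voxel_points[:, 2].min())

def object_voxels(voxels, min_height_diff):
    """Keep voxels whose height span reaches the (assumed) threshold."""
    return sorted(idx for idx, pts in voxels.items()
                  if height_span(pts) >= min_height_diff)
```

Flat ground patches have a small height span and are dropped, while voxels containing upright structures are kept as candidates for later grouping.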

In one embodiment, the voxel space has a preset size chosen to minimize the effect of noise.

In one embodiment, the fusion processing unit calculates an association between the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit, and calculates the final class classification result for the specific object based on that association.

In one embodiment, when the fusion processing unit determines that the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit are associated with each other, it combines the two results to calculate the final class classification result. When it determines that the two results are not associated with each other, it does not calculate a final class classification result.
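The gating behavior of this embodiment can be sketched as follows. The text here does not state how the association is measured or how the results are combined, so the cosine-similarity measure, the averaging rule, the `min_association` threshold, and the function name are all assumptions made for illustration. The background section indicates that the invention performs the final fusion with a CNN, which this sketch does not attempt to reproduce.

```python
import numpy as np

def fuse_decisions(scores_2d, scores_3d, min_association=0.5):
    """Return the fused class index, or None when the results are unrelated."""
    p2d = np.asarray(scores_2d, dtype=float)
    p3d = np.asarray(scores_3d, dtype=float)
    # Assumed association measure: cosine similarity of the class-score vectors.
    association = float(p2d @ p3d / (np.linalg.norm(p2d) * np.linalg.norm(p3d)))
    if association < min_association:
        return None  # not associated: no final classification is produced
    fused = (p2d + p3d) / 2.0  # assumed combination rule: plain average
    return int(np.argmax(fused))
```

Returning `None` mirrors the case in which no final class classification result is calculated.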

The present invention detects and classifies a specific object separately in a two-dimensional image and a three-dimensional image, and decides whether to fuse the two results according to whether the detection and classification results are associated with each other, thereby improving the accuracy of extracting and classifying the specific object.

FIG. 1 is a conceptual diagram showing the configuration of an object detection apparatus.
FIG. 2 is a conceptual diagram showing an object detection apparatus 1000 installed in a vehicle.
FIG. 3 is a flowchart showing a method by which the object detection apparatus according to the present invention detects a specific object.
FIG. 4 is a flowchart showing a method by which the object detection apparatus according to the present invention extracts a specific object from a three-dimensional image.
FIG. 5 is a flowchart showing a method by which the object detection apparatus according to the present invention combines extraction results obtained through a plurality of sensors.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the invention pertains can readily practice it. The present invention is not limited to the following embodiments, which serve only as a means of explaining the invention efficiently to a person of ordinary skill in the art.

To describe the present invention clearly, detailed descriptions of related known techniques are omitted where they could unnecessarily obscure the gist of the invention. In the drawings, the same reference numerals denote the same components throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. When a part is said to "include" a component, this means that it may further include other components, not that other components are excluded, unless specifically stated otherwise.

In this specification, a "unit" or "module" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware.

The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

An autonomous vehicle requires technology for detecting and classifying objects in its vicinity in order to drive autonomously. The present invention proposes a method for an autonomous vehicle to detect and classify objects around the vehicle using multiple measurement results from multiple sensors.

The object detection apparatus 1000 according to the present invention can detect and classify objects using multiple measurement results from multiple sensors.

The object detection apparatus 1000 may be an internal component of the vehicle, or it may be an external component configured to communicate with other components of the vehicle over a wired or wireless connection.

Hereinafter, the configuration of the object detection apparatus 1000 is described with reference to FIGS. 1 and 2.

Referring to FIG. 1, the object detection apparatus 1000 may include a two-dimensional image sensor 110, a two-dimensional image processing unit 120, a three-dimensional image sensor 210, a three-dimensional image processing unit 220, and a fusion processing unit 310.

The two-dimensional image sensor 110 is an image sensor that captures a two-dimensional image. One example of such an image sensor is a charge-coupled device (CCD) sensor. The two-dimensional image sensor 110 may be included in a stereo (binocular) camera. Accordingly, the two-dimensional image sensor 110 can generate a two-dimensional image covering 360 degrees around the vehicle.

The two-dimensional image sensor 110 may be installed on the exterior of the vehicle so as to capture the vehicle's surroundings. In particular, referring to FIG. 2, the two-dimensional image sensor 110 may be installed on the roof of the vehicle.

The two-dimensional image processing unit 120 may detect a specific object from the two-dimensional image captured by the two-dimensional image sensor 110. The specific object is an object located around the vehicle, for example a pedestrian, a nearby vehicle, a bicycle, or a streetlight.

The three-dimensional image sensor 210 is an image sensor that captures a three-dimensional image. One example of the three-dimensional image sensor 210 is a three-dimensional laser scanner sensor. The three-dimensional image sensor 210 can generate a three-dimensional image covering 360 degrees around the vehicle. As shown in FIG. 2, the three-dimensional image sensor 210 may be installed on the roof of the vehicle.

The three-dimensional image processing unit 220 may detect a specific object from the three-dimensional image captured by the three-dimensional image sensor 210.

The fusion processing unit 310 may combine the specific object detected by the two-dimensional image processing unit 120 with the specific object detected by the three-dimensional image processing unit 220 to detect and classify a specific object located around the vehicle.

The object detection apparatus 1000, which detects and classifies a specific object around the vehicle, has been described above.

Hereinafter, a method by which the object detection apparatus according to the present invention detects an object from a two-dimensional image is described. FIG. 3 is a flowchart showing a method by which the object detection apparatus according to the present invention detects a specific object.

First, the two-dimensional image processing unit 120 may perform color smoothing on the two-dimensional image data (S310).

To improve the efficiency of detecting objects in a two-dimensional image, the two-dimensional image processing unit 120 may apply color smoothing to the image. Color smoothing is an algorithm that makes the colors of a two-dimensional image more uniform. That is, by simplifying the colors of the two-dimensional image, the two-dimensional image processing unit 120 can detect a specific object in the image more efficiently.

The color smoothing is based on the L1 image color transform technique. The two-dimensional image processing unit 120 may apply the L1 image color transform to the two-dimensional image to generate a color-smoothed transformed image. Hereinafter, the two-dimensional image before the L1 image color transform is referred to as the original image, and the two-dimensional image after the transform is referred to as the transformed image.

The two-dimensional image processing unit 120 may convert the original image into the transformed image using the energy function defined by Equation 1 below.

[Equation 1]

(Terms of Equation 1: the energy function; the internal similarity between the pixels of the original image and the pixels of the transformed image; the similarity between two neighboring pixels)

More specifically, the two-dimensional image processing unit 120 may calculate the internal similarity between the pixels of the original image and the pixels of the transformed image. This internal similarity may be defined by Equation 2 below.

[Equation 2]

(Terms of Equation 2: the internal similarity between the pixels of the original image and the pixels of the transformed image; a vector concatenating all pixel values of the transformed image; a vector concatenating all pixel values of the original image)

That is, the internal similarity term takes a larger value as the difference between the pixels of the original image and the pixels of the transformed image grows. A large value therefore means that the similarity between the pixels of the original image and those of the transformed image is low, and a small value means that the similarity is high.
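As a reading aid, Equations 1 and 2 can be sketched in the following form. The symbols $E$, $E_d$, $E_s$, $\mathbf{z}$, and $\mathbf{z}^{\mathrm{in}}$, the squared L2 norm, and the unweighted sum of the two terms are assumptions chosen to match the term definitions given for those equations; the exact notation and relative weighting used in the patent may differ.

```latex
% Assumed form of the energy function (Equation 1) and of the
% internal similarity term (Equation 2).
\begin{align}
  E(\mathbf{z})   &= E_d(\mathbf{z}) + E_s(\mathbf{z}), \\
  E_d(\mathbf{z}) &= \lVert \mathbf{z} - \mathbf{z}^{\mathrm{in}} \rVert_2^2 ,
\end{align}
% z      : vector concatenating all pixel values of the transformed image
% z^{in} : vector concatenating all pixel values of the original image
% E_s    : similarity term between neighboring pixels
```

Under this form, $E_d$ grows with the difference between the two images, which agrees with the statement that a large value indicates low similarity.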

By performing color smoothing based on this internal similarity term, the two-dimensional image processing unit 120 can prevent all pixels in the two-dimensional image from being converted to the same brightness.

More specifically, the two-dimensional image processing unit 120 performs a minimization: it repeatedly updates the transformed image a predetermined number of times, searching for the value at which the quantity being minimized is smallest. The update direction may be determined by the split Bregman method.

The two-dimensional image processing unit 120 may also calculate the similarity between neighboring pixels. This similarity between neighboring pixels may also be called local smoothness. Hereinafter, for convenience of explanation, two neighboring pixels are referred to as a pixel pair.

The similarity between neighboring pixels may be calculated by Equation 3 below.

[Equation 3]

(Terms of Equation 3: the similarity between neighboring pixels; the RGB vector at a pixel of the transformed image; the weight of a pixel pair; the number of pixels in the transformed image; the m×m pixel neighborhood around a pixel of the transformed image)
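The neighbor-similarity term described by Equation 3 can be sketched as a weighted sum of color differences over pixel pairs. The symbols $E_s$, $\mathbf{c}_i$, $w_{ij}$, $N$, and $\mathcal{N}_m(i)$, and the choice of the L1 norm (suggested by the name of the L1 image color transform), are assumptions; the patent's exact expression may differ.

```latex
% Assumed form of the local smoothness term (Equation 3).
\begin{equation}
  E_s(\mathbf{z}) = \sum_{i=1}^{N} \sum_{j \in \mathcal{N}_m(i)}
      w_{ij} \, \lVert \mathbf{c}_i - \mathbf{c}_j \rVert_1
\end{equation}
% c_i    : RGB vector at pixel p_i of the transformed image
% w_ij   : weight of the pixel pair (p_i, p_j)
% N      : number of pixels in the transformed image
% N_m(i) : m x m pixel neighborhood around pixel p_i
```

A large weight $w_{ij}$ penalizes any remaining color difference within that pixel pair more strongly, which is what drives neighboring pixels toward the same color.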

A high weight is assigned to pixel pairs that have similar color values in the original image. By giving a high weight to such pixel pairs, the present invention minimizes the difference in color values between neighboring pixels of the transformed image and thereby performs color smoothing.

The weight may be calculated by Equation 4 below.

[Equation 4]

(Terms of Equation 4: the weight of a pixel pair; a pixel expressed in the CIELab color space; a constant related to illumination change; the variance; the l, a, b values of the i-th pixel in the CIELab color space)

Here, the constant related to illumination change can be adjusted to minimize the effect of illumination changes. More specifically, when this constant is less than 1, the pixel pair becomes less sensitive to illumination changes.

The values of these two constants, the illumination constant and the variance, that give the best performance can be determined through repeated experiments. Preferably, based on experimental results, they may be set to 0.3 and 1.0, respectively.
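A minimal sketch of such a pair weight is given below. The Gaussian form and the scaling of the lightness channel by the illumination constant are assumptions consistent with the term definitions of Equation 4, not a transcription of it. The names `pair_weight`, `kappa`, and `sigma` are hypothetical, with defaults taken from the values 0.3 and 1.0 stated above.

```python
import numpy as np

def pair_weight(lab_i, lab_j, kappa=0.3, sigma=1.0):
    """Weight of a pixel pair from the pixels' CIELab (l, a, b) values.

    Assumed form: Gaussian of the distance between feature vectors
    [kappa * l, a, b], so kappa < 1 down-weights lightness differences.
    """
    f_i = np.array([kappa * lab_i[0], lab_i[1], lab_i[2]], dtype=float)
    f_j = np.array([kappa * lab_j[0], lab_j[1], lab_j[2]], dtype=float)
    return float(np.exp(-np.sum((f_i - f_j) ** 2) / (2.0 * sigma ** 2)))
```

With `kappa` below 1, a one-unit lightness difference lowers the weight less than a one-unit difference in the a or b channel, which matches the stated goal of making pixel pairs less sensitive to illumination changes.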

Meanwhile, to find the optimal transform vector, the two-dimensional image processing unit 120 may optimize Equations 3 and 4 through Equation 5 below.

[Equation 5]

(Terms of Equation 5: z, the transform vector; a constant; a constant controlling the weight of the L1 energy term in least-squares form; two intermediate variables of the split Bregman method)

The matrix appearing in Equation 5 may be a matrix that represents which pixels are neighbors of one another. More specifically, it is an m×n matrix whose entry for a pixel pair takes one value when pixel pi belongs to the m×m neighborhood of pixel pj and a different value otherwise.

The two-dimensional image processing unit 120 may determine object candidate regions using color and texture similarity (S320).

The two-dimensional image processing unit 120 may divide the color-smoothed transformed image into a plurality of regions. Each such region is referred to as a partition.

The two-dimensional image processing unit 120 may group adjacent partitions into one group according to the similarity between them. The similarity between partitions is their similarity in color and texture.

More specifically, the two-dimensional image processing unit 120 may calculate the dissimilarity between adjacent partitions using Equation 7 below.

[Equation 7]

(Terms of Equation 7: the dissimilarity between adjacent partitions; the color dissimilarity between adjacent partitions i and j; the texture dissimilarity between adjacent partitions i and j; weight constants for color and texture, respectively)
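Read from its term definitions, Equation 7 is plausibly a weighted sum of the two dissimilarities. The symbols $D$, $D_c$, $D_t$, $w_c$, and $w_t$ below are assumed names, and the linear form is an assumption rather than a transcription of the patent's equation.

```latex
% Assumed form of the partition dissimilarity (Equation 7).
\begin{equation}
  D(p_i, p_j) = w_c \, D_c(p_i, p_j) + w_t \, D_t(p_i, p_j)
\end{equation}
% D_c, D_t : color and texture dissimilarity between adjacent partitions
% w_c, w_t : weight constants for color and texture
```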

The two-dimensional image processing unit 120 may calculate the color dissimilarity using histograms in the HSV (Hue, Saturation, Value) color space. More specifically, the two-dimensional image processing unit 120 may convert each color channel of each partition into a 25-bin histogram. It may then concatenate the 25-bin histograms of the H, S, and V channels to obtain a histogram with 75 bins in total.

The two-dimensional image processing unit 120 may calculate the color dissimilarity by computing the distance between the histograms corresponding to adjacent partitions.
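The 75-bin color histogram and a distance between two such histograms can be sketched as follows. The input is assumed to be an array of HSV pixels already scaled to [0, 1], and the distance (one minus histogram intersection) is an assumed choice, since the text does not name the metric. Function names are hypothetical.

```python
import numpy as np

BINS_PER_CHANNEL = 25  # 25 bins for each of H, S, V -> 75 bins in total

def hsv_histogram(hsv_pixels):
    """Concatenated, normalized 75-bin histogram of an (n, 3) HSV pixel array."""
    hsv_pixels = np.asarray(hsv_pixels, dtype=float)
    parts = []
    for channel in range(3):
        counts, _ = np.histogram(hsv_pixels[:, channel],
                                 bins=BINS_PER_CHANNEL, range=(0.0, 1.0))
        parts.append(counts)
    hist = np.concatenate(parts).astype(float)
    return hist / hist.sum()

def color_dissimilarity(hist_i, hist_j):
    """Assumed distance: 1 minus the histogram intersection."""
    return float(1.0 - np.minimum(hist_i, hist_j).sum())
```

Two partitions with identical color distributions get a dissimilarity near 0, and partitions whose histograms do not overlap get 1.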

The two-dimensional image processing unit 120 may also calculate the texture dissimilarity using a SIFT (Scale Invariant Feature Transform) histogram computed on the original image. Here, the SIFT histogram is obtained by applying Gaussian derivatives in 8 directions to each of the RGB channels. Because the texture of the transformed image has been heavily compressed, the present invention computes the texture dissimilarity from the original image instead, which improves the accuracy of the texture dissimilarity calculation.

More specifically, the two-dimensional image processing unit 120 may generate a 10-bin histogram for each direction. The SIFT histogram therefore has 240 bins (8 directions × 3 color channels × 10 bins).
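The bin count can be checked with a simplified texture descriptor. This sketch is an approximation under stated assumptions: plain finite differences (`numpy.gradient`) stand in for Gaussian derivatives, the image is assumed to be RGB scaled to [0, 1], and the function name and histogram range are hypothetical.

```python
import numpy as np

def texture_histogram(rgb, bins=10, directions=8):
    """Directional-derivative histogram: 3 channels x 8 directions x 10 bins."""
    rgb = np.asarray(rgb, dtype=float)
    parts = []
    for channel in range(3):
        # Finite differences along rows (gy) and columns (gx).
        gy, gx = np.gradient(rgb[:, :, channel])
        for k in range(directions):
            theta = 2.0 * np.pi * k / directions
            response = np.cos(theta) * gx + np.sin(theta) * gy
            counts, _ = np.histogram(response, bins=bins, range=(-1.5, 1.5))
            parts.append(counts)
    hist = np.concatenate(parts).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

For any RGB image of at least 2×2 pixels, the returned vector has 240 entries, matching the count given above.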

The two-dimensional image processing unit 120 may calculate the texture dissimilarity using Equation 8 below.

[Equation 8]

(Terms of Equation 8: the texture dissimilarity; the SIFT histogram of the i-th partition; the SIFT histogram of the j-th partition)

The two-dimensional image processing unit 120 may optimize the dissimilarity of Equation 7 using Equation 9 below.

[Equation 9]

(Terms of Equation 9: the image segmentation result computed with the dissimilarity function; the ground truth segmentation, that is, a segmentation whose result is already known in an existing dataset; a regularization parameter predefined through a linear SVM; a slack variable)

When the calculated dissimilarity is less than a preset value, the two-dimensional image processing unit 120 may group at least two partitions to create a group. For example, when the dissimilarity between a first partition and an adjacent second partition is less than the preset value, the two-dimensional image processing unit 120 may set the first partition and the second partition as one group. The preset value is a constant determined through repeated experiments.

The two-dimensional image processing unit 120 may then determine a group of highly similar partitions as an object candidate region.
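
The grouping rule described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `group_partitions`, the union-find merging strategy, and the example dissimilarity values and threshold are all assumptions introduced here.

```python
def group_partitions(num_partitions, adjacent_dissimilarity, threshold):
    """Group adjacent partitions whose dissimilarity is below `threshold`.

    adjacent_dissimilarity maps a pair (i, j) of adjacent partition indices
    to their dissimilarity. Returns a sorted list of groups of indices.
    """
    parent = list(range(num_partitions))

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (i, j), dissimilarity in adjacent_dissimilarity.items():
        if dissimilarity < threshold:  # "less than the preset value"
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups = {}
    for p in range(num_partitions):
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


# Hypothetical example: four partitions laid out in a row.
pairs = {(0, 1): 0.12, (1, 2): 0.80, (2, 3): 0.05}
candidate_regions = group_partitions(4, pairs, threshold=0.3)
```

Each returned group corresponds to one object candidate region in the sense of the paragraph above; partitions 1 and 2 stay in separate groups because their dissimilarity exceeds the assumed threshold.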

The two-dimensional image processing unit 120 may extract a specific object from the object candidate region (S330).

The two-dimensional image processing unit 120 may extract the specific object from the determined object candidate region using a CNN model structure.

To improve the extraction accuracy for the specific object, the two-dimensional image processing unit 120 may construct, for the determined object candidate region, a convolution cube (ConvCube) that uses the outputs of a plurality of convolution layers. Here, a convolution layer is a layer that applies a convolution operation to its input data, and the convolution cube is formed by concatenating the outputs of two or more layers into a three-dimensional image.

Meanwhile, because the outputs of the convolution layers used in the convolution cube have different sizes, their sizes may be normalized by applying different sampling methods depending on the size of the specific object. By varying the sampling method according to the size of the specific object, the two-dimensional image processing unit 120 can minimize feature loss and can therefore detect the specific object even when it is small.

More specifically, max pooling may be applied to a convolution layer whose output is larger than the convolution cube, and deconvolution may be applied in the opposite case.
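
The size normalization described above can be sketched as follows. This is only an illustration under stated assumptions: feature maps are square and related to the cube size by powers of two, the helper names are hypothetical, and nearest-neighbour upsampling stands in for a learned deconvolution (transposed convolution) layer, which the patent does not specify in detail.

```python
def max_pool_2x2(fmap):
    """Downsample a 2D feature map by taking the max of each 2x2 block."""
    rows, cols = len(fmap), len(fmap[0])
    return [
        [max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
         for c in range(0, cols - 1, 2)]
        for r in range(0, rows - 1, 2)
    ]


def upsample_2x(fmap):
    """Upsample by a factor of 2 by copying each value into a 2x2 block.

    Assumed stand-in for a learned deconvolution layer.
    """
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resize_to_cube(fmap, cube_size):
    """Bring a square feature map to cube_size x cube_size.

    Larger maps are max-pooled; smaller maps are upsampled.
    """
    while len(fmap) > cube_size:
        fmap = max_pool_2x2(fmap)
    while len(fmap) < cube_size:
        fmap = upsample_2x(fmap)
    return fmap


large = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
small = [[1]]
pooled = resize_to_cube(large, 2)    # max pooling path
upsampled = resize_to_cube(small, 2)  # upsampling path
```

After this step every layer output has the same spatial size, so the outputs can be stacked along the channel axis to form the cube.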

The two-dimensional image processing unit 120 may normalize the values of the size-normalized convolution layer outputs through local response normalization (LRN).
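
For reference, a minimal sketch of local response normalization across channels at a single spatial position is given below. The patent does not state the LRN formula or its hyperparameters; the AlexNet-style form and the default values used here (`size=5`, `alpha=1e-4`, `beta=0.75`, `k=2.0`) are assumptions.

```python
def local_response_norm(channels, size=5, alpha=1e-4, beta=0.75, k=2.0):
    """Normalize each channel activation by its neighbouring channels.

    channels: activations of all channels at one spatial position.
    Each value a_c is divided by (k + alpha * sum of squares of the
    activations in a window of `size` channels centred on c) ** beta.
    """
    half = size // 2
    out = []
    for c, a in enumerate(channels):
        lo = max(0, c - half)
        hi = min(len(channels), c + half + 1)
        sq_sum = sum(v * v for v in channels[lo:hi])
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


normalized = local_response_norm([0.5, 1.0, 2.0, 4.0])
```

Normalizing in this way puts the outputs of different layers on a comparable scale before they are combined into one cube.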

Meanwhile, rather than applying the CNN model structure to the entire object candidate region, the two-dimensional image processing unit 120 may construct a convolution cube for one frame constituting the object candidate region and then apply ROI pooling to construct a convolution cube for the entire object candidate region.

The two-dimensional image processing unit 120 may obtain a final output by sequentially applying two convolution layers and two fully-connected layers to the convolution cube constructed for the entire object candidate region.

After the two convolution layers and the two fully-connected layers have been applied in sequence, the two-dimensional image processing unit 120 may extract the specific object and classify its class through a softmax classification layer. Here, a class indicates the type of an object and may be defined as, for example, a car, a person, or a person riding a two-wheeled vehicle.

The foregoing has described how the object detection apparatus according to the present invention extracts a specific object from a two-dimensional image.

The following describes how the object detection apparatus according to the present invention extracts a specific object from a three-dimensional image. FIG. 4 is a flowchart illustrating a method by which the object detection apparatus according to the present invention extracts a specific object from a three-dimensional image.

Referring to FIG. 4, the three-dimensional image processing unit 220 may convert the three-dimensional point cloud data constituting a three-dimensional image into voxels (S410).

The three-dimensional image sensor 210 may capture the surroundings of the vehicle as a three-dimensional image. The three-dimensional image may consist of three-dimensional point cloud data.

The three-dimensional image processing unit 220 may convert the point cloud data constituting the three-dimensional image into a three-dimensional activated voxel space. A three-dimensional activated voxel space is a space that contains three-dimensional point cloud data. For example, the voxel space corresponding to three-dimensional point cloud data i is defined in terms of the coordinates of that data. The voxel space corresponding to three-dimensional point cloud data i may contain data i and the three-dimensional point cloud data surrounding it.

The voxel space may have a preset size. If the voxel space is too small, it is difficult to reduce noise data; if it is too large, meaningful object shapes are compressed. The voxel space should therefore have a size that minimizes noise data without compressing the shape of meaningful objects, and this size may be determined experimentally. Preferably, the size of the voxel space may be set to 0.1 × 0.1 × 0.1. In this way, the three-dimensional image processing unit 220 can group the three-dimensional point cloud data into a plurality of voxel spaces of a specific size and thereby minimize the influence of noise data.
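
The grouping of points into fixed-size voxel spaces can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `voxelize`, the use of floor division to index voxels, and the sample points are assumptions; only the 0.1 edge length comes from the text.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size=0.1):
    """Group 3D points into cubic voxels with edge length `voxel_size`.

    Returns a dict mapping an integer voxel index (ix, iy, iz) to the list
    of points that fall inside that voxel. Only voxels that contain at
    least one point (activated voxels) appear in the result.
    """
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        voxels[key].append((x, y, z))
    return dict(voxels)


# Hypothetical sample: two nearby points and one point farther along X.
sample_points = [(0.02, 0.03, 0.04), (0.05, 0.06, 0.07), (0.25, 0.0, 0.0)]
active_voxels = voxelize(sample_points)
```

Nearby points share a voxel, so an isolated noisy return contributes one sparsely populated voxel rather than distorting its neighbours.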

The three-dimensional image processing unit 220 may calculate the activation probability of each voxel space using Equation 10 below.

[Equation 10]

(Terms of Equation 10, in order: the activation probability of a voxel; the number of voxels; the number of three-dimensional point cloud data constituting the voxel; the j-th three-dimensional point cloud data of the voxel)

The term for the j-th three-dimensional point cloud data may have the value 1 if the j-th laser is reflected by an obstacle, and 0 if it is not.

The three-dimensional image processing unit 220 may convert the plurality of voxel spaces into a plurality of super voxels and determine an object candidate region using the height differences between the super voxels (S420).

To accurately extract the boundaries of the objects included in the three-dimensional image, the three-dimensional image processing unit 220 may divide the voxel space into super voxels having a unit size. More specifically, the three-dimensional image processing unit 220 may divide the voxel space into a plurality of super voxels through voxel cloud connectivity segmentation (VCCS). VCCS performs spatial segmentation in the three-dimensional data space using a seeding methodology.

The three-dimensional image processing unit 220 may project the super voxels in three-dimensional space onto a 0.1 m × 0.1 m grid in the two-dimensional [X, Z] space so that the super voxels can contain object-level partitions for object extraction.

The three-dimensional image processing unit 220 may then decide whether grid cells are connected using the height differences of the super voxels in each cell. That is, when the height difference between the super voxels in the cells is 0.1 m or less, the three-dimensional image processing unit 220 may group the super voxels in those cells. Conversely, when the height difference between the super voxels in the cells exceeds 0.1 m, the three-dimensional image processing unit 220 may leave them ungrouped.
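
The height-based grouping on the [X, Z] grid can be sketched as follows. This is an illustration under assumptions that the text does not state: each grid cell is summarized by a single representative super voxel height, cells are compared with their 4-connected neighbours, and the function name and sample heights are hypothetical. Only the 0.1 m limit comes from the description.

```python
from collections import deque


def group_cells_by_height(cell_heights, max_diff=0.1):
    """Group neighbouring grid cells whose height difference is <= max_diff.

    cell_heights maps a grid index (ix, iz) to a representative height in
    metres. Returns a list of groups, each a sorted list of grid indices.
    """
    seen = set()
    groups = []
    for start in sorted(cell_heights):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group = []
        while queue:  # breadth-first flood fill over connected cells
            cell = queue.popleft()
            group.append(cell)
            ix, iz = cell
            for nb in ((ix + 1, iz), (ix - 1, iz), (ix, iz + 1), (ix, iz - 1)):
                if nb in cell_heights and nb not in seen:
                    if abs(cell_heights[nb] - cell_heights[cell]) <= max_diff:
                        seen.add(nb)
                        queue.append(nb)
        groups.append(sorted(group))
    return groups


# Hypothetical heights: a low surface next to a taller object.
heights = {(0, 0): 0.50, (1, 0): 0.55, (2, 0): 1.40, (2, 1): 1.45}
candidate_groups = group_cells_by_height(heights)
```

The large step between cells (1, 0) and (2, 0) keeps the two surfaces in separate groups, and each group would then be treated as one candidate region.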

The three-dimensional image processing unit 220 may set the region corresponding to the grouped super voxels as an object candidate region.

The three-dimensional image processing unit 220 may extract a specific object from the determined object candidate region (S430).

The three-dimensional image processing unit 220 may convert the three-dimensional point cloud data into two-dimensional depth image data and then extract the specific object and classify its class in the same manner as the two-dimensional image processing unit 120. That is, the three-dimensional image processing unit 220 may extract the specific object using a CNN model structure. A detailed description is therefore omitted and replaced by the description of S330.

The foregoing has described a method of extracting a specific object from a three-dimensional image.

The following describes a method of improving the extraction accuracy for a specific object by using both the result of extracting the specific object from a two-dimensional image and the result of extracting it from a three-dimensional image. FIG. 5 illustrates a method by which the object detection apparatus according to the present invention combines the extraction results obtained through a plurality of sensors.

The fusion processing unit 310 of the object detection apparatus 1000 according to the present invention may receive, from the two-dimensional image processing unit 120 and the three-dimensional image processing unit 220, the respective results of extracting the specific object. The fusion processing unit 310 may then fuse the two extraction results to detect the specific object. The control performed by the fusion processing unit 310 is described in more detail below.

Referring to FIG. 5, the fusion processing unit 310 may calculate the similarity between the different pieces of object information extracted from image data measured by different sensors (S510).

The fusion processing unit 310 may calculate the similarity between the extraction results that represent the specific object extracted from the two-dimensional image sensor 120 and the three-dimensional image sensor 210. This similarity calculation may use the basic belief assignment (BBA) method.

More specifically, the fusion processing unit 310 may detect, from the object candidate regions of the two-dimensional image and the three-dimensional image, a plurality of boundary regions that contain the contour of the specific object. That is, the fusion processing unit 310 may detect a first boundary region containing the contour of the specific object from the object candidate region of the two-dimensional image, and a second boundary region containing the contour of the specific object from the object candidate region of the three-dimensional image.

The fusion processing unit 310 may obtain relation information between the boundary regions based on the distance of the class classification results and the class dissimilarity. Here, Yager's combination rule may be used.

First, to calculate the distance of the class classification results, the fusion processing unit 310 may compute a relation matrix between the first boundary region and the second boundary region.

The relation matrix may be expressed as an n × m matrix, where n is the number of partitions included in the first boundary region and m is the number of super voxels included in the second boundary region. The relation matrix may consist of relation components, each defined between a partition included in the first boundary region and a super voxel included in the second boundary region.

Using the relation matrix, the fusion processing unit 310 may express the hypothesis set between the first boundary region and the second boundary region through Equation 11 below.

[Equation 11]

Here, each of the two terms of Equation 11 is a relation probability.

The fusion processing unit 310 may calculate the basic belief assignment for the first boundary region and the distance of the class classification results using Equation 12 below.

[Equation 12]

(Terms of Equation 12, in order: the evidence discounting factor; the Mahalanobis distance between the two compared elements)

Meanwhile, the fusion processing unit 310 may use Equation 13 below so that a larger value is returned when the two are closer to each other.

[Equation 13]

(Term of Equation 13: a constant indicating the closeness of the distance)

In addition, the fusion processing unit 310 may calculate the class dissimilarity, which represents the class relation between a partition included in the first boundary region and a super voxel included in the second boundary region.

The class relation indicates whether the partition included in the first boundary region and the super voxel included in the second boundary region belong to the same class or to different classes.

When the partition included in the first boundary region and the super voxel included in the second boundary region belong to the same class, they may contain the same object or different objects. In contrast, when they belong to different classes, they may contain different objects. The present invention therefore measures the class dissimilarity rather than the class similarity, and computes the case in which the partition included in the first boundary region and the super voxel included in the second boundary region contain different objects.

To calculate the class dissimilarity, the fusion processing unit 310 may define a hypothesis set for the class relation. The fusion processing unit 310 may convert the probability ratio for each class into a BBA density function using the pignistic transformation. The BBA density function is defined in terms of the class density of the k-th bounding box provided by sensor S.

The fusion processing unit 310 may calculate the class dissimilarity using Equation 14 below.

[Equation 14]

Using the calculated distance of the class classification results and the class dissimilarity, the fusion processing unit 310 may calculate the final relation component.

The final relation component may be calculated using Equation 15 below.

[Equation 15]

(D, C: the spaces of the boundary regions of the respective sensors)

The fusion processing unit 310 may determine from the final relation component whether the two boundary regions are associated (S520). That is, the fusion processing unit 310 may determine that they are associated when the final relation component is equal to or greater than a preset value, and that they are not associated otherwise.

More specifically, when the final relation component indicates that the two boundary regions are associated with each other, the fusion processing unit 310 may concatenate the convolution cubes and class classification results corresponding to the two boundary regions (S530). That is, the fusion processing unit 310 may pass the concatenated data through two convolution layers and two fully-connected layers to output, finally, a 2048-dimensional vector.

In addition, the fusion processing unit 310 may concatenate the 2048-dimensional vector with the three-dimensional vector from each sensor, six additional dimensions in total, to reconstruct it as a 2054-dimensional vector. The fusion processing unit 310 may then pass the reconstructed vector through a fully-connected layer and the final previous SVM to determine the final object candidate region and to calculate the final extraction of the specific object and the final class classification result for it.

In contrast, when the final relation component indicates that the two boundary regions are not associated with each other, the fusion processing unit 310 may exclude them from the final detection and classification result. By not including information from two unassociated boundary regions in the final extraction of the specific object or in the calculation of its final class classification result, the present invention can improve the accuracy of object extraction.
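
The association test and the 2054-dimensional reconstruction can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_fusion_vector`, the threshold value, and the sample vectors are assumptions, and the later fully-connected layer and SVM stages are not modelled.

```python
def build_fusion_vector(relation, threshold, fused_features, vec_2d, vec_3d):
    """Return the 2054-dimensional fusion vector, or None if not associated.

    relation:       final relation component of the two boundary regions
    threshold:      preset value used for the association decision
    fused_features: 2048-dimensional vector from the fused network
    vec_2d, vec_3d: 3-dimensional vectors from the 2D and 3D branches
    """
    if relation < threshold:
        # Unassociated boundary regions are left out of the final result.
        return None
    if len(fused_features) != 2048 or len(vec_2d) != 3 or len(vec_3d) != 3:
        raise ValueError("expected vectors of length 2048, 3 and 3")
    return list(fused_features) + list(vec_2d) + list(vec_3d)


# Hypothetical inputs for illustration only.
features = [0.0] * 2048
associated = build_fusion_vector(0.8, 0.5, features, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
rejected = build_fusion_vector(0.2, 0.5, features, [0.7, 0.2, 0.1], [0.6, 0.3, 0.1])
```

Returning `None` for an unassociated pair mirrors the behaviour described above, in which such pairs do not contribute to the final detection and classification result.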

As described above, the present invention detects and classifies a specific object separately in a two-dimensional image and in a three-dimensional image, and decides whether to fuse the two results according to whether the detected and classified results are associated, thereby improving the accuracy of extracting and classifying the specific object.

One embodiment of the present invention may also be implemented in the form of a recording medium containing computer-executable instructions, such as program modules executed by a computer. Computer-readable media may be any available media that can be accessed by a computer, and include volatile and nonvolatile media as well as removable and non-removable media. Computer-readable media may also include both computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Communication media typically include computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave, or other transport mechanism, and include any information delivery media.

Although the methods and systems of the present invention have been described in connection with specific embodiments, some or all of their components or operations may be implemented using a computer system having a general-purpose hardware architecture.

The foregoing description of the present invention is illustrative, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical spirit or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Claims

1. An object detection apparatus comprising:
a two-dimensional image sensor for capturing a two-dimensional image;
a two-dimensional image processing unit for extracting a specific object from the captured two-dimensional image and classifying a class of the specific object;
a three-dimensional image sensor for capturing a three-dimensional image;
a three-dimensional image processing unit for extracting the specific object from the captured three-dimensional image and classifying a class of the specific object; and
a fusion processing unit for calculating a final class classification result for the specific object using the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit,
wherein the fusion processing unit
calculates an association between the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit,
and calculates the final class classification result for the specific object based on the association.

2. The object detection apparatus according to claim 1,
wherein the two-dimensional image processing unit
performs color smoothing on the two-dimensional image,
and detects an object candidate region from the color-smoothed two-dimensional image.

3. The object detection apparatus according to claim 2,
wherein the two-dimensional image processing unit
divides the color-smoothed two-dimensional image into a plurality of regions,
and extracts at least some of the plurality of regions as an object candidate region based on the dissimilarity of color and texture.

4. The object detection apparatus according to claim 1,
wherein the three-dimensional image consists of point cloud data,
and the three-dimensional image processing unit
converts the point cloud data constituting the three-dimensional image into a voxel space,
divides the converted voxel space into super voxels having a unit size,
and groups the super voxels based on a height difference of the point cloud data included in each space to detect an object candidate region.

5. The object detection apparatus according to claim 4,
wherein the voxel space
has a preset size for minimizing the effect of noise.

6. (Deleted)

7. The object detection apparatus according to claim 1,
wherein the fusion processing unit,
when the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit are associated with each other, calculates the final class classification result using the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit,
and, when the class classification result of the specific object extracted by the two-dimensional image processing unit and the class classification result of the specific object extracted by the three-dimensional image processing unit are not associated with each other, does not calculate the final class classification result.