KR101921071B1

KR101921071B1 - Method of estimating pose of three-dimensional object with sensor fusion in multi-frame and apparatus thereof

Info

Publication number: KR101921071B1
Application number: KR1020170112810A
Authority: KR
Inventors: 심성대; 강행봉; 김정언
Original assignee: 국방과학연구소
Priority date: 2017-09-04
Filing date: 2017-09-04
Publication date: 2018-11-22
Anticipated expiration: 2037-09-04

Abstract

A method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames according to the present invention comprises: increasing point density by reducing the LiDAR data from three dimensions to two, so that object regions can be tracked accurately from non-uniform, widely distributed LiDAR information; transforming the dimension-reduced LiDAR information into the camera coordinate system through a coordinate transformation and removing noise; segmenting regions based on the continuity of the denoised LiDAR information, which is represented as straight-line segments, and estimating the position and direction of the object; and estimating the object in three-dimensional space by combining the estimated object region and direction with a two-dimensional object classifier. The method overcomes problems of existing three-dimensional object pose estimation methods, such as manual generation of training data and the linear growth in computation as classes are added.

Description

The present invention relates to a method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames, and to an apparatus equipped with the method.

Detecting 3D objects is particularly important in robotics, where real-time interaction with nearby objects is required in autonomous environments. Early research focused on detecting and classifying objects in 2D images, whereas recent research aims to recover the 3D pose and size of objects and thereby restore information lost to occlusion or truncation. One important recent approach uses a two-dimensional object classifier to classify the object's class category and then chains a subcategory classifier for each category to refine the classification. In this approach, objects are sampled under many hypotheses and grouped and classified by class category. For vehicles, for example, training models corresponding to many viewing angles are required, together with their detailed labels. Consequently, combining CCD images with a sub-pose classifier requires a considerable amount of data and manual work to generate the training data.

Moreover, whenever a subclass with a new shape is added, the corresponding subclass training data must be created manually, so new subclasses are difficult to add. Another approach uses stereo images: a depth map is built from the stereo pair, and each pixel of the RGB image is projected into 3D space to generate 3D spatial information. The resulting 3D model is clustered using an MRF energy function, and objects are classified with an SVM. The drawback of this approach is that the 3D spatial model generated from stereo images is highly vulnerable to noise and computationally expensive, which makes it slow and resource-intensive. A further line of work matches arbitrarily generated 3D models, such as CAD models, with CCD images captured by a camera in order to learn object states such as occlusion and truncation jointly. However, this requires learning far too many cases and consumes considerable labor in generating training data.

Accordingly, the present invention has been devised to solve the problems described above, and its object is to propose an object classifier that recovers the pose of three-dimensional objects around a vehicle using multi-frame information containing data from multiple sensors in a driving environment.

To solve the above problem, a method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames according to the present invention comprises: increasing point density by reducing the LiDAR data from three dimensions to two, so that object regions can be tracked accurately from non-uniform, widely distributed LiDAR information; transforming the dimension-reduced LiDAR information into the camera coordinate system through a coordinate transformation and removing noise; segmenting regions based on the continuity of the denoised LiDAR information, which is represented as straight-line segments, and estimating the position and direction of the object; and estimating the object in three-dimensional space by combining the estimated position and direction with a two-dimensional object classifier. The method overcomes problems of existing three-dimensional object pose estimation methods, such as manual generation of training data and the linear growth in computation as classes are added.

In one embodiment, estimating the object in three-dimensional space may comprise: obtaining the average direction of the straight-line LiDAR information adjacent to the boundary of the segmented LiDAR information; and generating a three-dimensional bounding box, with reference to the average direction and the boundary, using sub-information on the classification class of the object in the region.

In one embodiment, the method may further comprise integrating the LiDAR information of adjacent frames to make the LiDAR information more robust.

In one embodiment, integrating the LiDAR information may comprise: linking similar objects by comparing the two-dimensional object information obtained in each of the adjacent frames; and accumulating the three-dimensional LiDAR information that makes up the linked objects into a single frame, with reference to the two-dimensional object boundary and the three-dimensional direction.

In one embodiment, estimating the position and direction of the object may include computing an affinity score for two groups i and j (the formula is not reproduced in this text). Here, θ_ij is the average angle of the two groups i and j, and m_p is the Euclidean distance of the points located at the group boundary. The affinity score may be further graded in consideration of the processor's speed and performance, the required processing speed and resolution, and prior information about the object obtained through the camera. The regions may then be segmented according to the continuity of the LiDAR information, based on the affinity scores of the two groups i and j.

According to the present invention, the method of estimating three-dimensional object pose through sensor fusion across multiple frames has the advantage of overcoming problems of existing three-dimensional object pose estimation methods, such as manual generation of training data and the linear growth in computation as classes are added.

The following drawings attached to this specification illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further the understanding of its technical idea. The invention should therefore not be construed as limited to the matters shown in the drawings.
FIG. 1 shows a configuration in which the sensors used in the present invention are mounted.
FIG. 2 shows a top view and the sensing area of the estimation apparatus according to the present invention.
FIG. 3 shows the flow between the modules of the present invention.
FIG. 4 shows a flowchart of a method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames according to another aspect of the present invention.

The features and effects of the present invention described above will become clearer from the following detailed description taken with the accompanying drawings, so that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. The present invention may be modified in various ways and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the text. This is not intended to limit the invention to a particular disclosed form, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. The terms used in this specification serve only to describe particular embodiments and are not intended to limit the invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described concretely in the detailed description. This is not intended to limit the invention to a particular embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

In describing the drawings, like reference numerals are used for like components.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another.

For example, a first component may be called a second component, and similarly a second component may be called a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items, or any one of them.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains.

Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and should not be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

The suffixes "module", "block", and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification, and do not in themselves have distinct meanings or roles.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings so that a person of ordinary skill in the art can readily practice them. In describing the embodiments, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention.

The method and apparatus for estimating the pose of a three-dimensional object through sensor fusion across multiple frames according to the present invention are described below.

The present invention proposes an object classifier that recovers the pose of three-dimensional objects around a vehicle using multi-frame information containing data from multiple sensors in a driving environment. FIG. 1 shows the overall configuration of the proposed invention, namely the arrangement in which the sensors are mounted. FIG. 2 shows a top view and the sensing area of the estimation apparatus, and FIG. 3 shows the flow between its modules.

Referring first to FIG. 1, the estimation apparatus (100, 100') according to the present invention is connected to a LiDAR sensor (10) and a CCD camera (20). The estimation apparatus (100, 100') may consist of one or more entities.

Referring to FIG. 2, objects can be estimated using the LiDAR sensor (10) and the CCD camera (20). The regions marked with circles indicate where an object is estimated to be present.

Referring to FIG. 3, the estimation apparatus (or classifier) (100) proposed in the present invention first simplifies the three-dimensional spatial information of the LiDAR sensor efficiently and removes noise to generate region proposals that partition the two-dimensional image. The generated proposals are classified by an R-FCN (Region-based Fully Convolutional Network) classifier (110, 110'), and the three-dimensional position and pose of each object are recovered from the average direction and corner coordinates of the three-dimensional LiDAR points belonging to the classified region. To make the recovered pose more robust, the convolutional features of the proposals in frame t-1 and in frame t are matched through optical flow (120). The LiDAR points within the matched regions are then aligned to frame t, with reference to their average direction and corner coordinates, to refine the pose of the three-dimensional object.

1) This part describes the process of organizing the sparse, widely distributed LiDAR point cloud (130, 130', 140, 140') to generate region proposals for classifying two-dimensional objects. Seen from above, the LiDAR points of an object with a given height form a complicated pattern that depends on the object's shape, but the object's outline appears clearly. Therefore, by removing every point except those on the outline and projecting only the outline into two-dimensional space, a proposal that accurately indicates the object's position can be generated. To remove only the ground without missing objects encountered in the driving environment, the height of the LiDAR points is unified to 40 cm, as in Equation 1 below.
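The height unification described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's Equation 1: the function name `flatten_heights` and the `(x, y, z)` tuple layout are hypothetical, and the way ground returns are excluded is not specified in this text.

```python
from typing import Iterable, List, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in metres, sensor frame (assumed)


def flatten_heights(points: Iterable[Point3D], height_m: float = 0.40) -> List[Point3D]:
    """Give every LiDAR point the same height, collapsing the cloud onto one plane.

    Only x and y survive, so points stacked vertically on the same surface
    land on top of each other, which raises the point density in the plane.
    """
    return [(x, y, height_m) for x, y, _ in points]
```

The default of 0.40 m follows the 40 cm stated in the text; everything else is illustrative.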

Here, p^z_xy denotes the coordinates of each point in the three-dimensional space spanned by the x, y, and z axes. Compressing the point cloud in this way removes the ground and increases the density of the points. Points other than the object's outline can also be removed easily once projected into two-dimensional space: when p^z_xy is projected into the two-dimensional CCD image space, the bottom-most edge of the projected LiDAR points represents the object's outline. Accordingly, among the LiDAR points projected into the CCD space, only the bottom-most point at each x-axis position is kept, and the three-dimensional points corresponding to the discarded ones are removed as well. Next, a median filter is applied so that the remaining noise points are absorbed into the bottom edge, producing a single line that represents the object's outline. The three-dimensional points corresponding to noise are removed by merging them into the point that the median filter selects as the median value.
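A rough Python sketch of the two filtering steps above (bottom-most point per image column, then a median filter) is given here. It assumes integer pixel coordinates with the row index growing downward, and a window size of 3; both are assumptions, and the helper names are hypothetical.

```python
from statistics import median_low
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (u, v): image column and row; v grows downward (assumed)


def lowest_per_column(pixels: List[Pixel]) -> Dict[int, int]:
    """For each column u, keep only the largest v, i.e. the bottom-most point."""
    lowest: Dict[int, int] = {}
    for u, v in pixels:
        if u not in lowest or v > lowest[u]:
            lowest[u] = v
    return lowest


def median_smooth(rows: List[int], window: int = 3) -> List[int]:
    """Sliding median over rows ordered by column; the window is truncated at the ends.

    median_low is used so that every output is one of the input values.
    """
    half = window // 2
    smoothed = []
    for i in range(len(rows)):
        lo, hi = max(0, i - half), min(len(rows), i + half + 1)
        smoothed.append(median_low(rows[lo:hi]))
    return smoothed
```

An isolated outlier row, such as the 50 in `[10, 10, 50, 10, 10]`, is pulled back onto the line formed by its neighbors.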

The denoised three-dimensional points are sorted along the x-axis of the two-dimensional space and bundled into linear groups, each group growing until the sum of the angles between its points reaches 90 degrees. The average angle and average coordinates of each group are then computed, and an affinity score with the neighboring group is calculated and used as the criterion for splitting or merging regions. The affinity score of two groups i and j is calculated according to Equation 2 below.
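One possible reading of the grouping rule above is sketched below in Python. It interprets "the sum of the angles between the points" as the accumulated change in heading between consecutive segments; that interpretation, the function names, and the 2D point layout are assumptions rather than the patent's definition.

```python
import math
from typing import List, Tuple

Point2D = Tuple[float, float]


def segment_angle(a: Point2D, b: Point2D) -> float:
    """Heading of the segment a -> b in degrees."""
    return math.degrees(math.atan2(b[1] - a[1], b[0] - a[0]))


def group_by_turning(points: List[Point2D], limit_deg: float = 90.0) -> List[List[Point2D]]:
    """Sort points by x and start a new group once the accumulated turning reaches the limit."""
    pts = sorted(points)
    if not pts:
        return []
    groups = [[pts[0]]]
    total = 0.0
    prev_angle = None
    for prev_pt, pt in zip(pts, pts[1:]):
        angle = segment_angle(prev_pt, pt)
        if prev_angle is not None:
            total += abs(angle - prev_angle)
        if total >= limit_deg:
            groups.append([pt])  # the corner point opens a new linear group
            total, prev_angle = 0.0, None
        else:
            groups[-1].append(pt)
            prev_angle = angle
    return groups
```

With an L-shaped outline, the points along each arm end up in separate groups, which matches the idea of one linear group per visible face of an object.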

Here, θ_ij is the average angle of the two groups i and j, and m_p is the Euclidean distance of the points located at the group boundary. The affinity score defined in this way may additionally be graded according to specific conditions. In one embodiment, it may be graded in consideration of the processor's speed and performance, the required processing speed and resolution, and prior information about the object obtained through the camera.

After the affinity score a has been computed for all groups, region proposals are generated using the intervals where a changes sharply as boundaries. The y-axis size of a proposal is determined by the relationship in Equation 3 below, based on the KITTI dataset, whose images have a resolution of 376x1241.
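The boundary selection described above can be illustrated with a short Python sketch. The text only says that boundaries fall where the score changes sharply, so the explicit threshold `jump` and the function name are assumptions made for illustration.

```python
from typing import List


def split_indices(scores: List[float], jump: float) -> List[int]:
    """Return the indices at which a new region starts.

    A boundary is placed before element i + 1 whenever the affinity score
    changes by more than `jump` between neighbours i and i + 1.
    """
    return [
        i + 1
        for i in range(len(scores) - 1)
        if abs(scores[i + 1] - scores[i]) > jump
    ]
```

For scores `[0.9, 0.85, 0.2, 0.25]` and a threshold of 0.5, the only boundary is placed before index 2.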

That is, when the average height (in meters) of the class to be classified is C^h_real_i and the depth of the proposal is d_i, the height in the CCD image is C^h_pixel_i. Because the LiDAR points projected onto the image lie 40 cm above the ground, the C^h_pixel_i values computed for C^h_real_i - 0.40 and for -0.40 are added to the LiDAR coordinates belonging to the proposal, generating as many proposals as there are classes.
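Equation 3 is not available in this text, so the sketch below substitutes a plain pinhole-camera relation (pixel height = focal length x real height / depth) to show how the vertical extent of a proposal could be derived. The focal length parameter, the function names, and the pinhole model itself are assumptions and may differ from the patent's formula.

```python
from typing import Tuple


def pixel_offset(height_m: float, depth_m: float, focal_px: float) -> float:
    """Pinhole projection of a vertical extent at a given depth (assumed model)."""
    return focal_px * height_m / depth_m


def proposal_rows(
    v_lidar: float,
    depth_m: float,
    class_height_m: float,
    focal_px: float,
    lidar_height_m: float = 0.40,
) -> Tuple[float, float]:
    """Top and bottom image rows of a proposal around a projected LiDAR point.

    The point sits lidar_height_m above the ground, so the object extends
    (class_height_m - lidar_height_m) above it and lidar_height_m below it.
    Image rows grow downward, so "above" means a smaller row index.
    """
    top = v_lidar - pixel_offset(class_height_m - lidar_height_m, depth_m, focal_px)
    bottom = v_lidar + pixel_offset(lidar_height_m, depth_m, focal_px)
    return top, bottom
```

Calling `proposal_rows` once per class height yields one candidate region per class, in line with the statement that as many proposals are generated as there are classes.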

The generated proposals are used as the ROI input of the R-FCN classifier. Referring to FIG. 3, the R-FCN (110, 110') is an R-CNN-based classifier designed to give a high score when the object lies at the center of the ROI. This property reduces the probability of misclassification when a LiDAR-based proposal, because of noise or similar causes, contains only part of the object to be classified together with a large portion of background.

Once the classifier has assigned a class to a proposal, a three-dimensional bounding box is generated with the average proportions and size known in advance for that class, using two quantities: the average angle θ^o_xy of the three-dimensional LiDAR points belonging to the proposal, and the x-y reference point p^o_xy in three-dimensional space, which is determined from the relationship between the two points on the proposal's x-axis boundaries. These two quantities are determined by Equation 4 below.
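As an illustration of the box construction, the Python sketch below builds the eight corners of an oriented box from a reference point, a yaw angle, and class dimensions. Treating the reference point as the centre of the footprint and placing the base at z = 0 are assumptions; the patent's Equation 4, which defines θ^o_xy and p^o_xy, is not reproduced in this text.

```python
import math
from typing import List, Tuple

Corner = Tuple[float, float, float]


def box_corners(
    ref_xy: Tuple[float, float],
    yaw_deg: float,
    length: float,
    width: float,
    height: float,
) -> List[Corner]:
    """Eight corners of a box centred on ref_xy (assumed), rotated by yaw about z."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    half = [
        (length / 2, width / 2),
        (length / 2, -width / 2),
        (-length / 2, -width / 2),
        (-length / 2, width / 2),
    ]
    # Rotate each half-extent by the yaw angle, then shift to the reference point.
    footprint = [
        (ref_xy[0] + dx * c - dy * s, ref_xy[1] + dx * s + dy * c) for dx, dy in half
    ]
    # Bottom face first (z = 0), then top face (z = height).
    return [(x, y, z) for z in (0.0, height) for x, y in footprint]
```

The length, width, and height would come from the per-class average dimensions mentioned in the text.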

2) Because LiDAR beams spread radially with distance, factors such as range and occlusion by other objects introduce errors into the proposals, and the recovered object pose can be inaccurate. To increase the density of LiDAR points, the present invention proposes accumulating the LiDAR information of the adjacent frame t-1 into the current frame t to produce denser LiDAR information.

Proposals are matched through optical flow. Using the convolutional feature map produced during R-FCN classification, the correspondence of feature points within the proposals is tracked between the two frames, and IDs are assigned to proposal b^(t-1)_i of frame t-1 and proposal b^t_i of frame t through the following procedure.

The proposal b^t_i of frame t that contains the largest number of points to which the feature points of b^(t-1)_i have been moved by optical flow is given the same ID. The average angle θ^o_xy and reference point p^o_xy of the linear groups contained in the regions that share an ID are then accumulated in frame t to form a supplemented LiDAR set.
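The ID assignment rule above amounts to a vote: each flowed feature point votes for the frame-t proposal it falls into. A minimal Python sketch follows, assuming axis-aligned proposals given as `(x_min, y_min, x_max, y_max)` and feature points already displaced by optical flow; the optical flow computation itself is outside the sketch, and all names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout
Point = Tuple[float, float]


def inside(box: Box, pt: Point) -> bool:
    """True if the point lies within the box, borders included."""
    x_min, y_min, x_max, y_max = box
    return x_min <= pt[0] <= x_max and y_min <= pt[1] <= y_max


def match_proposal(moved_points: List[Point], proposals_t: List[Box]) -> Optional[int]:
    """Index of the frame-t proposal holding the most flowed points, or None if none does."""
    counts = [sum(inside(box, pt) for pt in moved_points) for box in proposals_t]
    if not counts or max(counts) == 0:
        return None
    return counts.index(max(counts))
```

The returned index identifies the proposal in frame t that would inherit the ID of the frame t-1 proposal; ties are resolved in favor of the first proposal, which is an arbitrary choice.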

FIG. 4 shows a flowchart of a method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames according to another aspect of the present invention. As shown in FIG. 4, the pose estimation method comprises a density enhancement step (S410), a coordinate transformation step (S420), a position/direction estimation step (S430), and an object estimation step (S440).

In the density enhancement step (S410), the LiDAR data may be reduced from three dimensions to two to increase point density, so that object regions can be tracked accurately from non-uniform, widely distributed LiDAR information. The density enhancement step (S410) may further comprise integrating the LiDAR information of adjacent frames to make the LiDAR information more robust. Integrating the LiDAR information may comprise: linking similar objects by comparing the two-dimensional object information obtained in each of the adjacent frames; and accumulating the three-dimensional LiDAR information that makes up the linked objects into a single frame, with reference to the two-dimensional object boundary and the three-dimensional direction.

In the coordinate transformation step (S420), the dimension-reduced LiDAR information may be transformed into the camera coordinate system through a coordinate transformation, and noise may be removed.

In the position/direction estimation step (S430), regions may be segmented based on the continuity of the denoised LiDAR information, which is represented as straight-line segments, and the position and direction of the object may be estimated. As described above, the affinity score of two groups i and j may be computed in this step and further graded in consideration of the processor's speed and performance, the required processing speed and resolution, and prior information about the object obtained through the camera. The regions may then be segmented according to the continuity of the LiDAR information, based on the affinity scores of the two groups i and j.

In the object estimation step (S440), the object in three-dimensional space may be estimated by combining the estimated position and direction with a two-dimensional object classifier. The object estimation step (S440) may comprise: obtaining the average direction of the straight-line LiDAR information adjacent to the boundary of the segmented LiDAR information; and generating a three-dimensional bounding box, with reference to the average direction and the boundary, using sub-information on the classification class of the object in the region.

In a software implementation, the procedures and functions described in this specification, as well as the design and parameter optimization of each component, may be implemented as separate software modules. The software code may be implemented as a software application written in a suitable programming language. The software code may be stored in memory and executed by a controller or processor.

Claims

1. A method of estimating the pose of a three-dimensional object through sensor fusion across multiple frames, the method comprising:
increasing point density by reducing the LiDAR data from three dimensions to two, so that object regions can be tracked accurately from non-uniform, widely distributed LiDAR information;
transforming the dimension-reduced LiDAR information into the camera coordinate system through a coordinate transformation, and removing noise;
segmenting regions based on the continuity of the denoised LiDAR information, which is represented as straight-line segments, and estimating the position and direction of the object; and
estimating the object in three-dimensional space by combining the estimated position and direction with a two-dimensional object classifier.

2. The method according to claim 1,
wherein estimating the object in three-dimensional space comprises:
obtaining the average direction of the straight-line LiDAR information adjacent to the boundary of the segmented LiDAR information; and
generating a three-dimensional bounding box, with reference to the average direction and the boundary, using sub-information on the classification class of the object in the region.

3. The method according to claim 1,
further comprising integrating the LiDAR information of adjacent frames to make the LiDAR information more robust.

4. The method according to claim 3,
wherein integrating the LiDAR information comprises:
linking similar objects by comparing the two-dimensional object information obtained in each of the adjacent frames; and
accumulating the three-dimensional LiDAR information that makes up the linked objects into a single frame, with reference to the two-dimensional object boundary and the three-dimensional direction.

5. The method according to claim 1,
wherein estimating the position and direction of the object comprises computing an affinity score for two groups i and j (the formula is not reproduced in this text),
where θ_ij is the average angle of the two groups i and j, and m_p is the Euclidean distance of the points located at the group boundary,
wherein the affinity score is further graded in consideration of the processor's speed and performance, the required processing speed and resolution, and prior information about the object obtained through the camera, and
wherein the regions are segmented according to the continuity of the LiDAR information, based on the affinity scores of the two groups i and j.