KR20200056905A

KR20200056905A - Method and apparatus for aligning 3d model

Info

Publication number: KR20200056905A
Application number: KR1020190087023A
Authority: KR
Inventors: 리 웨이밍; 이형욱; 왕 하오; 박승인; 왕 치앙; 리우 양; 카오 유잉
Original assignee: 삼성전자주식회사
Priority date: 2018-11-15
Filing date: 2019-07-18
Publication date: 2020-05-25
Also published as: CN111191492B; KR102608473B1; CN111191492A

Abstract

Disclosed are a 3D model alignment method and a device thereof. The 3D model alignment method comprises the steps of: acquiring, by a transceiving part, at least one 2D input image including an object; detecting, by a processor, a feature point of the object from the 2D input image by using a neural network; estimating, by the processor, a 3D pose of the object in the 2D input image by using the neural network; searching, by the processor, a target 3D model based on the estimated 3D pose; and aligning, by the processor, the target 3D model and the object based on the feature point.

Description

METHOD AND APPARATUS FOR ALIGNING 3D MODEL

The description below relates to a technique for aligning a 3D model with an object in a 2D input image.

Augmented reality (AR) is a branch of virtual reality (VR): a computer graphics technique that composites virtual objects or information into a real environment so that they appear to belong to that environment. Interaction with the things and objects displayed in augmented reality can improve the user experience. This requires an object recognition technique, and a neural network can be used for object recognition. With a neural network, objects included in an input image can be recognized faster and more accurately.

A 3D model alignment method according to an embodiment includes: obtaining, by a transceiver, at least one 2D input image including an object; detecting, by a processor, feature points of the object from the 2D input image using a neural network; estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network; retrieving, by the processor, a target 3D model based on the estimated 3D pose; and aligning, by the processor, the target 3D model and the object based on the feature points.

The obtaining of the 2D input image may include receiving a first 2D input image including the object in a first pose and a second 2D input image including the object in a second pose different from the first pose.

The obtaining of the 2D input image may include receiving a first 2D input image including the object in a first pose, and generating a third 2D input image including the object in a second pose different from the first pose.

The 3D model alignment method may further include detecting the object in the 2D input image.

The estimating of the 3D pose may include classifying the type of the object using the neural network, and estimating the 3D pose of the object based on the classification result using the neural network.

The retrieving of the target 3D model may include obtaining a first feature of the object in the 2D input image, obtaining a second feature of one candidate 3D model among a plurality of candidate 3D models, and determining whether the candidate 3D model is the target 3D model based on the first feature and the second feature.

The determining of whether the candidate 3D model is the target 3D model may include calculating a similarity between the first feature and the second feature, and determining whether the candidate 3D model is the target 3D model based on the similarity.
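The similarity-based decision described above can be sketched as follows. This is only an illustration: the text does not specify the similarity measure or the decision rule, so cosine similarity, the function names, and the default threshold of 0.8 are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_target_model(first_feature, second_feature, threshold=0.8):
    """Accept the candidate when its similarity reaches a hypothetical threshold."""
    return cosine_similarity(first_feature, second_feature) >= threshold
```

Here `first_feature` stands for the feature of the object in the 2D input image and `second_feature` for the feature of one candidate 3D model.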

The method may further include adjusting the object or the target 3D model based on the estimated 3D pose, the feature points of the object, and the feature points of the target 3D model.

The adjusting may include adjusting the target 3D model or the object using the estimated 3D pose, and readjusting the adjusted object or the adjusted target 3D model based on the feature points of the object and the feature points of the target 3D model.

The adjusting may include adjusting the object or the target 3D model based on the feature points of the object and the feature points of the target 3D model, and readjusting the adjusted target 3D model or the adjusted object using the estimated 3D pose.

A method of predicting the motion of an object according to an embodiment includes: obtaining, by a transceiver, at least one 2D input image including an object; detecting, by a processor, feature points of the object from the 2D input image using a neural network; estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network; retrieving, by the processor, a target 3D model based on the estimated 3D pose; aligning, by the processor, the target 3D model and the object based on the feature points; and predicting, by the processor, the motion of the object based on the aligned 3D model.

A method of displaying a texture on an object according to an embodiment includes: obtaining, by a transceiver, at least one 2D input image including an object; detecting, by a processor, feature points of the object from the 2D input image using a neural network; estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network; retrieving, by the processor, a target 3D model based on the estimated 3D pose; aligning, by the processor, the target 3D model and the object based on the feature points; and displaying, by the processor, a texture on the surface of the object based on the aligned 3D model.

A method of displaying a virtual 3D image according to an embodiment includes: obtaining, by a processor, at least one 2D input image including an object; detecting, by the processor, feature points of the object from the 2D input image using a neural network; estimating, by a display, a 3D pose of the object in the 2D input image using the neural network; retrieving, by the processor, a target 3D model based on the estimated 3D pose; aligning, by the processor, the target 3D model and the object based on the feature points; and displaying, by the display, a virtual 3D image based on the aligned 3D model.

A neural network training method according to an embodiment includes: obtaining, by a transceiver, at least one training 2D input image including an object; estimating, by a processor, a 3D pose of the object in the training 2D input image using a neural network; retrieving, by the processor, a target 3D model based on the estimated 3D pose; detecting, by the processor, feature points of the object from the training 2D input image using the neural network; and training, by the processor, the neural network based on the estimated 3D pose or the detected feature points.

The estimating of the 3D pose may include classifying the type of the object using the neural network, and estimating the 3D pose of the object based on the classification result using the neural network. The training of the neural network may include training the neural network based on the classified type.

The neural network training method may further include obtaining a synthetic image of at least one candidate 3D model in the estimated 3D pose, and classifying the domains of the training 2D input image and the synthetic image using the neural network. The training of the neural network may include training the neural network based on the classified domains.

In the obtaining of the synthetic image, the at least one candidate 3D model may include a first candidate 3D model and a second candidate 3D model, and the following may be obtained: a first synthetic image of the first candidate 3D model in the estimated 3D pose, a second synthetic image of the first candidate 3D model in the second pose, a third synthetic image of the second candidate 3D model in the estimated 3D pose, and a fourth synthetic image of the second candidate 3D model in the second pose.

The similarity between the first candidate 3D model and the object may be greater than or equal to a threshold, and the similarity between the second candidate 3D model and the object may be less than the threshold.

A 3D model alignment device according to an embodiment includes at least one processor and a memory storing a neural network. The processor obtains at least one 2D input image including an object, detects feature points of the object from the 2D input image using the neural network, estimates a 3D pose of the object in the 2D input image using the neural network, retrieves a target 3D model based on the estimated 3D pose, and aligns the target 3D model and the object based on the feature points.

FIG. 1 is a diagram illustrating the overall configuration of a 3D model alignment device according to an embodiment.
FIG. 2 is a flowchart illustrating the operations of a 3D model alignment method according to an embodiment.
FIG. 3 is a flowchart illustrating the overall operation of a 3D model alignment method according to an embodiment.
FIG. 4 is a flowchart illustrating the detailed operations of a 3D model alignment method according to an embodiment.
FIG. 5 is a flowchart illustrating the operations of a method of training a neural network for 3D model alignment according to an embodiment.
FIG. 6 is a diagram illustrating the structure of the neural network in the training stage of a neural network for 3D model alignment according to an embodiment.
FIG. 7 is a flowchart illustrating how images of different poses are input to and processed by the neural network in the training stage of a neural network for 3D model alignment according to an embodiment.
FIG. 8 is a diagram illustrating a first embodiment in which the 3D model alignment device is applied.
FIG. 9 is a diagram illustrating a second embodiment in which the 3D model alignment device is applied.
FIG. 10 is a diagram illustrating a third embodiment in which the 3D model alignment device is applied.
FIG. 11 is a diagram illustrating a fourth embodiment in which the 3D model alignment device is applied.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. However, various changes may be made to the embodiments, so the scope of the patent application is not restricted or limited by these embodiments. All modifications, equivalents, and substitutes of the embodiments should be understood to fall within the scope of rights.

The terms used in the embodiments are for the purpose of description only and should not be construed as limiting. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

In addition, in the description with reference to the accompanying drawings, the same components are given the same reference numerals regardless of the figure in which they appear, and redundant descriptions of them are omitted. In describing the embodiments, detailed descriptions of related known technologies are omitted when they are judged likely to obscure the gist of the embodiments unnecessarily.

FIG. 1 is a diagram illustrating the overall configuration of a 3D model alignment device according to an embodiment.

According to an embodiment, the 3D model alignment device 100 may retrieve a 3D model corresponding to a 2D image and align the 3D model with an object in the 2D image. By aligning the object in the 2D image with its corresponding 3D model, the 3D model alignment device 100 may provide an improved augmented reality (AR) user experience.

According to an embodiment, the 3D model alignment device 100 may receive a 2D input image 121, retrieve a target 3D model 125 corresponding to the 2D input image 121, and align the object in the 2D input image 121 with the target 3D model 125. The 3D model alignment device 100 may use a neural network to align the object in the 2D input image 121 with the target 3D model 125.

The 3D model alignment device 100 may retrieve the target 3D model 125 more accurately by using 2D images from various viewpoints. Because 2D images from different viewpoints provide different information, the 3D model alignment device 100 may use that information to retrieve a target 3D model 125 that more accurately matches the 2D input image 121.

The 3D model alignment device 100 may improve user convenience by generating 2D images of various viewpoints from a 2D input image 121 of only a single viewpoint. The 3D model alignment device 100 may receive 2D input images 121 of different viewpoints, but it may also generate 2D images of different viewpoints based on a 2D input image 121 that it has already received. Since this reduces the number of 2D input images 121 that must be supplied, the usability of the 3D model alignment device 100 may be improved.

The 3D model alignment device 100 may increase alignment accuracy through a stepwise approach: estimating the 3D pose of the object in the 2D input image 121, retrieving the target 3D model 125, and aligning the 2D input image 121 with the target 3D model 125. The 3D model alignment device 100 may derive a more accurate alignment result by adjusting the object or the 3D model using the estimated 3D pose or the detected feature points. In addition, by using a neural network with an improved structure, the 3D model alignment device 100 may align the object in the 2D input image 121 with the target 3D model 125 more accurately.

To this end, the 3D model alignment device 100 includes at least one processor 101 and a memory 103 storing a neural network. The 3D model alignment device 100 may further include a database 110. The database 110 may be included inside the 3D model alignment device 100 or may exist as an external device. The database 110 may contain one or more 3D models.

The 3D model alignment device 100 obtains at least one 2D input image including an object. For example, the 2D input image may be a 2D image represented in RGB channels. The 2D input image may include one or more objects. When a plurality of objects are included in the 2D input image, the 3D model alignment device 100 may process the 2D input image containing the plurality of objects all at once, or may split the 2D input image per object and process each part separately.
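Per-object splitting can be sketched as cropping one sub-image per detected object. This is a minimal sketch that assumes the detector returns axis-aligned bounding boxes; the box format `(left, top, right, bottom)` and the function name `crop_objects` are hypothetical and not taken from the text.

```python
def crop_objects(image, boxes):
    """Split an image into one crop per bounding box.

    image: list of rows, each row a list of pixels (for example RGB tuples).
    boxes: iterable of (left, top, right, bottom), right and bottom exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for left, top, right, bottom in boxes:
        # Clamp the box to the image so out-of-range detections do not fail.
        left, right = max(0, left), min(width, right)
        top, bottom = max(0, top), min(height, bottom)
        crops.append([row[left:right] for row in image[top:bottom]])
    return crops
```

Each crop can then be passed through the same pipeline as a single-object 2D input image.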

The processor 101 detects feature points of the object from the 2D input image using the neural network. Here, a feature point may be referred to as a key point or a landmark, but is not limited to these terms; whatever it is called, it may include any marker that makes it easier to distinguish the object from the background. Feature points may be detected by the same criteria regardless of the type of object or background, or by different criteria depending on the type of object or background.

The processor 101 estimates the 3D pose of the object in the 2D input image using the neural network. The processor 101 may classify the type of the object using the neural network. Based on the classification result, the processor 101 may estimate the 3D pose of the object using the neural network.

The processor 101 retrieves a target 3D model based on the estimated 3D pose. Based on the estimated 3D pose, the processor 101 may compare each of a plurality of candidate 3D models from the database 110 with the object. The processor 101 may determine the candidate 3D model most similar to the object to be the target 3D model.
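Selecting the most similar candidate amounts to an argmax over the database. The sketch below makes that explicit; `retrieve_target_model`, `describe` (which turns a candidate model and a pose into a feature) and `similarity` are hypothetical names, and how the features are actually computed is left to the caller because the text does not fix it here.

```python
def retrieve_target_model(object_feature, candidates, pose, describe, similarity):
    """Return (model_id, score) of the candidate most similar to the object.

    candidates: dict mapping a model id to a candidate 3D model.
    describe(model, pose): feature of the candidate viewed in the estimated pose.
    similarity(a, b): larger means more similar.
    """
    best_id, best_score = None, float("-inf")
    for model_id, model in candidates.items():
        score = similarity(object_feature, describe(model, pose))
        if score > best_score:
            best_id, best_score = model_id, score
    return best_id, best_score
```

Passing `describe` and `similarity` as arguments keeps the sketch independent of any particular network or rendering method.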

The processor 101 aligns the target 3D model and the object based on the feature points. The processor 101 may align the object and the target 3D model based on the 3D pose. Since the alignment result may contain errors, the processor 101 may adjust the object or the target 3D model based on the 3D pose and the feature points. In this way, the processor 101 may derive a more accurate alignment result.

According to an embodiment, the 3D model alignment device 100 may be applied in various ways in the field of augmented reality. The 3D model alignment device 100 may predict the motion of an object. The 3D model alignment device 100 may be used to display, draw, or render a predetermined texture on the surface of an object. The 3D model alignment device 100 may be used to display a virtual 3D image in association with an object. For example, the 3D model alignment device 100 may display a virtual 3D image in consideration of the position of the object. The 3D model alignment device 100 may be used to adjust a virtual 3D image displayed in association with an object, and to update or control the pose of such a virtual 3D image.

For example, the 3D information of an object provided by the 3D model alignment device 100 may assist interaction among a plurality of objects in augmented reality. Based on the 3D information of an object, the motion or intention of a vehicle present in augmented reality may be predicted. This capability may be applied to autonomous driving. When an object moves, a 3D effect may be displayed based on the 3D information of the object. These are only examples, however, and the 3D model alignment device 100 is not limited to them and may be applied to various fields. Beyond augmented reality, the 3D model and pose information of an object may be applied to other technical fields such as autonomous driving and robotics.

FIG. 2 is a flowchart illustrating the operations of a 3D model alignment method according to an embodiment.

According to an embodiment, in step 201, the 3D model alignment device 100 obtains at least one 2D input image including an object. For example, the 2D input image may be an image represented in RGB channels. The 3D model alignment device 100 may generate a 2D input image including the object in a second pose from a 2D input image including the object in a first pose. In this way, the 3D model alignment device 100 may reduce the number of required 2D input images while still obtaining information about the object from various viewpoints or 3D poses, improving usability and accuracy at the same time.

According to an embodiment, in step 203, the 3D model alignment device 100 detects feature points of the object from the 2D input image using the neural network. The 3D model alignment device 100 may identify the positions of the feature points of the object. The neural network may be trained in advance to detect the feature points of an object. In contrast, the feature points of the candidate 3D models may be determined in advance. The feature points of the object may be used to align the target 3D model with the object. They may also be used to adjust the 2D input image including the object or the target 3D model.

According to an embodiment, in step 205, the 3D model alignment device 100 estimates the 3D pose of the object in the 2D input image using the neural network. The 3D model alignment device 100 may identify the type of the object using the neural network and may estimate the 3D pose of the object based on the identified type. The 3D model alignment device 100 may estimate three degrees of freedom of the object (distance and center point) and the remaining three degrees of freedom (azimuth, elevation, and rotation angle), thereby obtaining a 3D pose expressed in six degrees of freedom. The center point has two coordinates and therefore accounts for two degrees of freedom.
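The six degrees of freedom listed above can be held in a small record, as sketched below. The class name `Pose6DoF`, the field names, the use of radians, and the z-up spherical convention in `camera_position` are assumptions made for illustration; the text does not specify a coordinate convention.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose6DoF:
    distance: float   # camera-to-object distance (1 DoF)
    center_u: float   # object center in the image, horizontal (1 DoF)
    center_v: float   # object center in the image, vertical (1 DoF)
    azimuth: float    # radians (1 DoF)
    elevation: float  # radians (1 DoF)
    rotation: float   # in-plane rotation angle, radians (1 DoF)

    def as_vector(self):
        """All six degrees of freedom as a flat list."""
        return [self.distance, self.center_u, self.center_v,
                self.azimuth, self.elevation, self.rotation]

    def camera_position(self):
        """Camera position on a sphere around the object (assumed z-up convention)."""
        ce = math.cos(self.elevation)
        return (self.distance * ce * math.cos(self.azimuth),
                self.distance * ce * math.sin(self.azimuth),
                self.distance * math.sin(self.elevation))
```

Distance, azimuth, and elevation place the camera on a sphere around the object, while the center point and the rotation angle describe where and how the object appears in the image plane.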

According to an embodiment, in step 207, the 3D model alignment device 100 retrieves a target 3D model based on the estimated 3D pose. The 3D model alignment device 100 may compare the feature points of the object in each 2D input image with the feature points of the corresponding candidate 3D models and determine a candidate 3D model with high similarity to be the target 3D model.

According to an embodiment, in step 209, the 3D model alignment device 100 aligns the target 3D model and the object based on the feature points. The 3D model alignment device 100 may align the object and the target 3D model based on the feature points of the object detected using the neural network and the feature points of the target 3D model. Here, alignment may also be referred to as matching, mapping, or registration.
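One simple way to align two sets of corresponding feature points is a least-squares 2D similarity transform (scale, rotation, translation). The sketch below is not the method claimed in the text, which does not specify the alignment procedure; it assumes the target 3D model's feature points have already been projected into the image, and the function names are hypothetical. Points are treated as complex numbers so that scale and rotation collapse into a single complex factor `a`.

```python
def fit_similarity_2d(model_pts, image_pts):
    """Least-squares fit of q = a * p + b over corresponding 2D points.

    Returns (a, b) as complex numbers: abs(a) is the scale, the argument of a
    is the rotation angle, and b is the translation.
    """
    if len(model_pts) != len(image_pts) or len(model_pts) < 2:
        raise ValueError("need at least two corresponding point pairs")
    p = [complex(x, y) for x, y in model_pts]
    q = [complex(x, y) for x, y in image_pts]
    p_mean = sum(p) / len(p)
    q_mean = sum(q) / len(q)
    denom = sum(abs(pi - p_mean) ** 2 for pi in p)
    if denom == 0:
        raise ValueError("model feature points must not all coincide")
    # Closed-form minimizer of sum |a * (p - p_mean) - (q - q_mean)|^2.
    a = sum((pi - p_mean).conjugate() * (qi - q_mean)
            for pi, qi in zip(p, q)) / denom
    b = q_mean - a * p_mean
    return a, b


def apply_similarity_2d(a, b, pts):
    """Map points through the fitted transform."""
    mapped = [a * complex(x, y) + b for x, y in pts]
    return [(z.real, z.imag) for z in mapped]
```

Applying the fitted transform to the projected model feature points moves them onto the detected object feature points in the least-squares sense.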

According to another embodiment, the 3D model alignment device 100 may adjust the object or the target 3D model based on the estimated 3D pose, the feature points of the object, and the feature points of the target 3D model. The estimated 3D pose, the target 3D model, the feature points, and so on may be inaccurate. Because errors may exist in the detected object and the estimated 3D pose, the 3D model alignment device 100 may additionally adjust the object in the 2D input image and the target 3D model based on the estimated 3D pose, the feature points of the object in the 2D input image, and the feature points of the target 3D model. Here, adjustment may be referred to as calibration or correction. In this way, the 3D model alignment device 100 may further improve the accuracy of the estimated 3D pose, the target 3D model, and the object in the 2D input image.

The 3D model alignment device 100 may adjust the target 3D model or the object using the estimated 3D pose. The 3D model alignment device 100 may then readjust the adjusted object or target 3D model based on the feature points of the object and the feature points of the target 3D model.

For example, the 3D model alignment device 100 may calculate three degrees of freedom (the distance and the center point) of an object containing an error using an error detection frame, and estimate the remaining three degrees of freedom (azimuth, elevation, and in-plane rotation angle) using the neural network. The 3D model alignment device 100 may thereby obtain a 3D pose expressed in six degrees of freedom. The 3D model alignment device 100 may render the retrieved target 3D model as a 2D image in that 3D pose, and may adjust the initial 2D input image based on the rendered 2D image. In this way, the 3D model alignment device 100 may adjust the object in the 2D input image and the target 3D model.
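
The split of the six degrees of freedom described above can be sketched as follows. This is only an illustration under stated assumptions: the description does not say how the distance is derived from the detection frame, so the sketch assumes a pinhole camera with a known focal length and a known real object height. The names `Pose6DoF`, `pose_from_box_and_angles`, `focal_px`, and `object_height` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose6DoF:
    """3D pose with six degrees of freedom (field names are illustrative)."""
    azimuth: float
    elevation: float
    in_plane_rotation: float
    distance: float
    u: float  # center point, horizontal pixel coordinate
    v: float  # center point, vertical pixel coordinate


def pose_from_box_and_angles(box, angles, focal_px, object_height):
    """Combine a detection box with network-estimated angles.

    box: (x_min, y_min, x_max, y_max) in pixels.
    angles: (azimuth, elevation, in_plane_rotation) from the neural network.
    focal_px, object_height: ASSUMED camera focal length (pixels) and real
    object height; the patent text does not specify how distance is computed.
    """
    x_min, y_min, x_max, y_max = box
    box_height = y_max - y_min
    if box_height <= 0:
        raise ValueError("detection box must have positive height")
    # Center point: middle of the detection box.
    u = (x_min + x_max) / 2.0
    v = (y_min + y_max) / 2.0
    # Pinhole approximation: apparent height shrinks linearly with distance.
    distance = focal_px * object_height / box_height
    azimuth, elevation, in_plane_rotation = angles
    return Pose6DoF(azimuth, elevation, in_plane_rotation, distance, u, v)
```

The resulting pose could then be handed to a renderer to produce the 2D image of the target 3D model used for the adjustment.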

According to an embodiment, the 3D model alignment device 100 may adjust the target 3D model or the object using the estimated 3D pose. The 3D model alignment device 100 may then readjust the adjusted object or the adjusted target 3D model based on the feature points of the object and the feature points of the target 3D model.

According to another embodiment, the 3D model alignment device 100 may adjust the object or the target 3D model based on the feature points of the object and the feature points of the target 3D model. The 3D model alignment device 100 may then readjust the adjusted target 3D model or the adjusted object using the estimated 3D pose.

FIG. 3 is a flowchart illustrating the overall operation of a 3D model alignment method according to an embodiment.

According to an embodiment, in step 301, the 3D model alignment device 100 may receive a 2D input image. At least one 2D input image is received, and another 2D input image may be generated from the received 2D input image. Each 2D input image may include the object in a different pose.

According to an embodiment, in step 303, the 3D model alignment device 100 may detect an object from the 2D input image. Various image processing methods in common use may be applied to detect the object.

According to an embodiment, in step 304, the 3D model alignment device 100 may estimate the 3D pose of the object in the 2D input image. For example, the 3D pose may be expressed in six degrees of freedom. Referring to FIG. 3, the 3D pose of the object is expressed by an azimuth, an elevation, and an in-plane rotation angle. The 3D pose may further include a distance and a center point.

According to an embodiment, in step 305, the 3D model alignment device 100 may search for a target 3D model. The 3D model alignment device 100 may search for the target 3D model based on the estimated 3D pose. As another example, the 3D model alignment device 100 may search for the target 3D model based on the feature points detected in step 307. The 3D model alignment device 100 may also search for the target 3D model based on both the 3D pose and the feature points.

According to an embodiment, in step 307, the 3D model alignment device 100 may detect feature points of the object in the 2D input image. The 3D model alignment device 100 may use the feature points to obtain features of the object that distinguish it from the background. The feature points of the target 3D model may be stored in a database in advance together with the target 3D model.

According to an embodiment, in step 309, the 3D model alignment device 100 may align the target 3D model with the object. The 3D model alignment device 100 may align the two based on the feature points of the target 3D model and the feature points of the object.

FIG. 4 is a flowchart illustrating detailed operations of a 3D model alignment method according to an embodiment.

According to an embodiment, in step 401, the 3D model alignment device 100 may receive a 2D input image. According to an embodiment, the 3D model alignment device 100 may receive a plurality of 2D input images in different poses. The 3D model alignment device 100 may receive a first 2D input image including the object in a first pose and a second 2D input image including the object in a second pose different from the first pose. Because the process of generating 2D input images in different poses is omitted, the 3D model alignment device 100 can save the resources that generating a 2D input image would require.

According to another embodiment, the 3D model alignment device 100 may generate, based on the received 2D input image, a 2D input image including the object in a different pose. The 3D model alignment device 100 may receive a first 2D input image including the object in a first pose and generate a third 2D input image including the object in a second pose different from the first pose. For example, the 3D model alignment device 100 may generate the 2D input images in different poses using a generative adversarial network (GAN).

According to an embodiment, in step 403, the 3D model alignment device 100 may detect an object in the 2D input image. Various image detection methods in common use may be used to detect the object.

The 3D model alignment device 100 may estimate the 3D pose of the object. Here, the 3D pose may be expressed, for example, in six degrees of freedom. The degree-of-freedom information may include an azimuth a, an elevation e, an in-plane rotation, a distance d, and a principal point (u, v). The 3D model alignment device 100 may estimate the distance and the center point of the object while detecting the object, or may estimate the 3D pose through a separate process.
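
As an illustration of how the three angular degrees of freedom can define an orientation, the sketch below composes a rotation matrix from the azimuth, the elevation, and the in-plane rotation. The description does not fix an axis convention, so the ordering used here (azimuth about z, then elevation about x, then in-plane rotation about the viewing axis z) is an assumption, and `rotation_from_angles` is a hypothetical name.

```python
import math


def _matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def _rot_z(angle):
    """Rotation about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def _rot_x(angle):
    """Rotation about the x axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rotation_from_angles(azimuth, elevation, in_plane_rotation):
    """Compose a rotation matrix from three angles in radians.

    ASSUMED convention (not taken from the patent): apply the azimuth about
    z first, then the elevation about x, then the in-plane rotation about z.
    """
    return _matmul(_rot_z(in_plane_rotation),
                   _matmul(_rot_x(elevation), _rot_z(azimuth)))
```

Together with the distance d and the principal point (u, v), such a matrix would give the full six-degree-of-freedom pose used when rendering a candidate 3D model.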

According to an embodiment, in step 405, the 3D model alignment device 100 may search for a target 3D model based on the 3D pose. The 3D model alignment device 100 may classify the type of the object using the neural network. The 3D model alignment device 100 may estimate the 3D pose of the object based on the type of the object, and may search for the target 3D model based on the estimated 3D pose. Here, the 3D model alignment device 100 may determine the azimuth, the elevation, and the in-plane rotation angle more accurately based on the target 3D model.

According to an embodiment, in step 407, the 3D model alignment device 100 may detect feature points of the object in the 2D input image. The 3D model alignment device 100 may detect the feature points using the neural network. The neural network may be trained in advance to detect the feature points of the object.

According to an embodiment, in step 409, the 3D model alignment device 100 may align the detected object with the target 3D model based on the 3D pose. The 3D model alignment device 100 may align the object with the target 3D model based on the distance, the center point, the azimuth, the elevation, and the in-plane rotation angle. For example, the 3D model alignment device 100 may search for a candidate 3D model for each of the 2D input images that include the object in different poses. The 3D model alignment device 100 may compare the feature points of the object in each 2D input image with the feature points of the corresponding candidate 3D model, and determine a candidate 3D model with high similarity to be the target 3D model.

According to an embodiment, in step 411, the 3D model alignment device 100 may search for the target 3D model again based on the 3D pose and the feature points of the object. Because this search additionally uses the feature points of the object, unlike step 405, the retrieved target 3D model may be more accurate.

The 3D model alignment device 100 may obtain a first feature of the object in the 2D input image. The 3D model alignment device 100 may obtain a second feature of one candidate 3D model among a plurality of candidate 3D models. The 3D model alignment device 100 may determine whether the candidate 3D model is the target 3D model based on the first feature and the second feature. Specifically, the 3D model alignment device 100 may calculate the similarity between the first feature and the second feature, and may determine whether the candidate 3D model is the target 3D model based on that similarity.
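
The similarity-based decision above can be sketched as follows. The description does not name a similarity measure or a decision rule, so cosine similarity over feature vectors and a fixed threshold are assumptions made for illustration; `cosine_similarity`, `select_target_model`, and `threshold` are hypothetical names.

```python
import math


def cosine_similarity(first_feature, second_feature):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(first_feature, second_feature))
    norm_a = math.sqrt(sum(x * x for x in first_feature))
    norm_b = math.sqrt(sum(y * y for y in second_feature))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero feature as dissimilar to everything
    return dot / (norm_a * norm_b)


def select_target_model(object_feature, candidate_features, threshold=0.5):
    """Pick the candidate 3D model whose feature best matches the object.

    candidate_features: dict mapping a model id to its feature vector.
    Returns (model_id, similarity); model_id is None when no candidate
    reaches the ASSUMED similarity threshold.
    """
    best_id, best_score = None, float("-inf")
    for model_id, feature in candidate_features.items():
        score = cosine_similarity(object_feature, feature)
        if score > best_score:
            best_id, best_score = model_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score
```

For example, `select_target_model([1.0, 0.0], {"chair_a": [0.9, 0.1], "chair_b": [0.0, 1.0]})` selects `"chair_a"`, because its feature points almost in the same direction as the object feature.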

According to an embodiment, in step 413, the 3D model alignment device 100 may align the target 3D model, retrieved more accurately based on the feature points, with the object in the 2D input image. The 3D model alignment device 100 may align the object with the target 3D model more accurately based on the 3D pose and the feature points.

FIG. 5 is a flowchart illustrating the operation of a method of training a neural network for 3D model alignment according to an embodiment.

According to an embodiment, in step 501, the learning device may obtain at least one learning 2D input image including an object. The training data used to train the neural network may include not only learning 2D input images, which are real images, but also synthetic images rendered by a 3D modeling program. The learning device may obtain a synthetic image of at least one candidate 3D model in the estimated 3D pose. The learning device may classify the domains of the learning 2D input image and the synthetic image using the neural network. The neural network may process the training data according to the classified domain.

According to an embodiment, in step 503, the learning device may estimate the 3D pose of the object in the 2D input image using the neural network. The learning device may classify the type of the object using the neural network, and may estimate the 3D pose of the object based on the classification result. The estimation of the 3D pose may be modeled as a regression problem or as a classification problem. The neural network may estimate the 3D pose according to the modeled structure.

According to an embodiment, in step 505, the learning device may search for a target 3D model based on the estimated 3D pose. According to another embodiment, the learning device may search for the target 3D model based on the 3D pose and the feature points. According to yet another embodiment, the learning device may first search for a target 3D model based on the 3D pose, and then adjust the target 3D model or search for a more accurate one based on the 3D pose and the feature points.

According to an embodiment, in step 507, the learning device may detect feature points of the object from the learning 2D input image using the neural network. The learning device may detect feature points for each of the different object types, and may identify the feature points of objects of different types.

According to an embodiment, in step 509, the learning device may train the neural network based on the estimated 3D pose or the detected feature points. The target 3D model used when training the neural network based on the estimated 3D pose or the detected feature points may be an estimated model or a predetermined model. The predetermined target 3D model may be a model set in advance as the correct answer. The target 3D model may be stored in a database together with information about the target 3D model. For example, the predetermined target 3D model may be a manually annotated model. The annotation may indicate that the target 3D model is the correct answer, or may indicate information about the target 3D model.

The learning device may train the neural network based on the classified type. The 3D pose, feature points, or target 3D model used when training the neural network based on the classified type may be derived afterwards or may be predetermined. The predetermined 3D pose, feature points, or target 3D model may be set in advance as the correct answer. For example, the predetermined 3D pose, feature points, or target 3D model may be manually annotated. The annotation may indicate that the 3D pose, feature points, or target 3D model is the correct answer, or may indicate information about the 3D pose, feature points, or target 3D model.

The learning device may train the neural network based on the classified domain. The 3D pose, feature points, or target 3D model used when training the neural network based on the classified type may be derived afterwards or may be predetermined.

FIG. 6 is a diagram illustrating the structure of a neural network in the training stage of a neural network for 3D model alignment according to an embodiment.

According to an embodiment, the neural network 600 may be used in various ways in the process of aligning the object with the target 3D model. The neural network 600 may identify the type of the object, estimate the 3D pose of the object, detect the feature points of the object, and classify the synthetic domain and the real domain. Although FIG. 6 shows the neural network 600 performing these functions simultaneously, the structure of the neural network is not limited thereto, and a separate neural network may be provided for each function.

A large amount of training data may be used to train the neural network 600. The training data may include learning 2D input images. A learning 2D input image may be a real image. However, real images are limited in number, and analyzing the information corresponding to a real image is costly. In contrast, synthetic images that already carry the analyzed information can be produced automatically and in unlimited quantity by a 3D modeling program. Accordingly, the training data used to train the neural network 600 may include not only learning 2D input images, which are real images, but also synthetic images rendered by a 3D modeling program. Here, a real image is an actual image that includes information about the type of the object, the 3D pose of the object, and the feature points. A synthetic image is an image rendered using a 3D CAD model that includes information about the type of the object, the 3D pose of the object, and the feature points.

There are differences between synthetic images and real images, and domains can be distinguished on the basis of these differences. Here, a synthetic image is said to belong to the synthetic domain, and a real image to the real domain. When synthetic images and real images are input to a single neural network 600, the differences between them may make it difficult for the neural network 600 to handle both. If these differences are not reduced, a neural network trained on synthetic images becomes biased toward the synthetic domain, and 3D pose estimation, feature point detection, and 3D model alignment on real images may produce inaccurate results.

To remove the difference between synthetic images and real images, the neural network 600 may include a domain classifier 629. A gradient reversal layer may further be added in front of the domain classifier 629. By including the domain classifier 629, the neural network 600 may reduce the difference between synthetic images and real images and process both through a single network structure.

The input image of the neural network 600 may be, for example, a 224×224 image with three RGB channels. Various structures may be used for the basic network (Base Net) of the neural network 600. For example, the basic network may use the layers of VGG16 up to FC7 (fully connected layer 7), and may include 13 convolutional layers and 2 fully connected layers (FC6 and FC7). Besides VGG16, AlexNet or ResNet, for example, may be used as the basic network.

Referring to FIG. 6, the neural network 600 may include an object type classifier 621, a 3D pose classifier 623, a feature point detector 625, and a domain classifier 629. The neural network may include a convolutional layer 610, a first fully connected layer 627, and a second fully connected layer 628. The object type classifier 621, the 3D pose classifier 623, the feature point detector 625, and the domain classifier 629 each have a corresponding loss function.

The object type classifier 621 may use, for example, a softmax loss function or a hinge loss function.
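
For reference, the two classification losses named above can be written out for a single sample as in the sketch below. The description does not specify which multi-class hinge variant is meant; the summed-margin form with a margin of 1 used here is an assumption, and the function names are hypothetical.

```python
import math


def softmax_loss(logits, label):
    """Softmax (cross-entropy) loss for one sample.

    logits: raw class scores; label: index of the correct class.
    Uses the log-sum-exp trick for numerical stability.
    """
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum_exp - logits[label]


def hinge_loss(logits, label, margin=1.0):
    """Multi-class hinge loss for one sample (ASSUMED summed-margin variant).

    Every wrong class whose score comes within `margin` of the correct
    class score contributes to the loss.
    """
    correct = logits[label]
    return sum(max(0.0, margin + z - correct)
               for i, z in enumerate(logits) if i != label)
```

With ten object types, `logits` would hold the ten scores produced for one image.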

The 3D pose classifier 623 may be modeled, for example, as a regression problem or as a classification problem. Here, the regression problem estimates the continuous values of the pose, and the classification problem estimates a pose class. The 3D pose classifier 623 may use either of the two kinds of modeling. When regression modeling is applied, the 3D pose classifier 623 may use a smooth_L1 loss function. When classification modeling is applied, the 3D pose classifier 623 may use a softmax loss function or a hinge loss function. The following description assumes that regression modeling is applied.

The feature point detector 625 may use, for example, a cross-entropy loss function. The cross-entropy loss function and the softmax loss function are applied to different problems. When setting the ground-truth values for feature point detection, two-dimensional channels covering the set of feature points of the different object types may be used. Through these two-dimensional channels, the neural network can identify feature points at different positions on objects of different types. If a channel has no corresponding coordinates, it is assigned a value of 0.
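
One way to picture these ground-truth channels is the sketch below, which allocates one 2D map per feature point over all object types and leaves the maps of absent feature points at 0. The 224-pixel image size and the 7×7 grid are taken as assumptions for illustration, as is marking a single grid cell per feature point; `build_keypoint_targets` is a hypothetical name.

```python
def build_keypoint_targets(keypoints, num_channels, image_size=224, grid_size=7):
    """Build ground-truth maps for feature point detection.

    keypoints: dict mapping a channel index to the (x, y) pixel position of
    the feature point assigned to that channel. Channels without an entry
    (feature points of other object types) stay all zero.
    Returns a nested list of shape num_channels x grid_size x grid_size.
    """
    targets = [[[0.0] * grid_size for _ in range(grid_size)]
               for _ in range(num_channels)]
    cell = image_size / grid_size  # pixels covered by one grid cell
    for channel, (x, y) in keypoints.items():
        col = min(int(x / cell), grid_size - 1)
        row = min(int(y / cell), grid_size - 1)
        targets[channel][row][col] = 1.0  # ASSUMED one-hot cell encoding
    return targets
```

A per-cell cross-entropy loss can then compare each predicted channel with its target map.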

Classifying the synthetic domain and the real domain is a binary classification problem. The neural network 600 may be placed in a state in which it cannot tell which domain a 2D input image belongs to. The neural network 600 may be designed differently depending on the network structure. In this way, the difference between the synthetic domain and the real domain can be weakened. The domain difference may be reduced using any of several methods or network structures, and is not limited to a particular one.

For example, the neural network 600 may have the following structure. The neural network 600 may include a basic network (Base Net), to which different fully connected layers (FC_C and FC_P) may be connected. The fully connected layers FC_C and FC_P may be connected at fully connected layer 8 of the basic network. FC_C corresponds to the object type, and the number of nodes of FC_C equals the total number of object types. For 10 types of objects, FC_C has 10 nodes. The softmax loss function L1, which is the loss function of the object type classifier 621, may be connected to the output of FC_C. This Base Net-FC_C-softmax loss function line corresponds to the object type, and the type of the object can be identified through it.

Each node of FC_P corresponds to one degree of freedom of the 3D pose of the object; when a pose with six degrees of freedom is estimated, the number of nodes of FC_P is set to 6. The loss function of FC_P is the smooth_L1 loss function L2, and this Base Net-FC_P-smooth_L1 loss function line corresponds to the 3D pose of the object.
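
A minimal sketch of a smooth L1 loss over the six pose outputs is given below. The threshold `beta` between the quadratic and linear regions and the choice to sum rather than average over the six degrees of freedom are assumptions; the description only names the loss.

```python
def smooth_l1_loss(predicted, target, beta=1.0):
    """Smooth L1 loss summed over the pose components.

    predicted, target: sequences of equal length, e.g. six values ordered
    as (azimuth, elevation, in-plane rotation, distance, u, v).
    The loss is quadratic for errors below `beta` and linear above it.
    """
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    total = 0.0
    for p, t in zip(predicted, target):
        diff = abs(p - t)
        if diff < beta:
            total += 0.5 * diff * diff / beta
        else:
            total += diff - 0.5 * beta
    return total
```

The linear region keeps a single badly estimated degree of freedom from dominating the gradient, which is the usual motivation for preferring this loss over a plain squared error.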

The output of pooling layer 5 (pool5) of the Base Net is connected to a convolutional layer Conv_K, which corresponds to feature point detection. The number of channels of Conv_K equals the total number of feature points across all object types. For example, when 10 types of objects have 100 feature points in total, the number of channels of Conv_K may be set to 100. After a 3×3 convolution kernel, each channel has a size of 7×7. The output of Conv_K is therefore 100×7×7, and it is connected to the cross-entropy loss function L3, which is the loss function of the feature point detector 625. This Base Net (pool5)-Conv6-cross-entropy loss function line corresponds to feature point detection.
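
The 100×7×7 output shape can be checked with a short calculation. In VGG16, five 2×2 poolings reduce a 224×224 input to 7×7 at pool5; for the 3×3 convolution to keep that 7×7 size, the sketch assumes stride 1 and padding 1, which the description does not state. The function names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_k_output_shape(num_keypoints, input_size=224, num_pools=5):
    """Shape (channels, height, width) of the Conv_K output.

    ASSUMES each of the `num_pools` pooling layers halves the resolution
    and that Conv_K uses a 3x3 kernel with stride 1 and padding 1.
    """
    pool5_size = input_size // (2 ** num_pools)
    side = conv_output_size(pool5_size, kernel=3, stride=1, padding=1)
    return (num_keypoints, side, side)
```

Without padding, the same 3×3 kernel would shrink each map to 5×5, so the stated 7×7 size suggests a padded convolution.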

To implement the domain classifier 629, a gradient reversal layer (GRL) may be connected to the basic network, followed by fully connected (FC) layers, and then by a fully connected layer FC_D. FC_D corresponds to the two domains, and the number of nodes of FC_D may be set to 2. The domain classifier 629 is connected next, followed by its loss function. The loss function L4 of the domain classifier 629 may be a softmax loss function or a hinge loss function. This Base Net-GRL-FC layers-FC_D-softmax loss function line is the network module that weakens the influence of the different domains. However, this structure is only an example, and other approaches, such as the adversarial process of a generative adversarial network (GAN), may also be used.
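
The behaviour of a gradient reversal layer can be illustrated without any deep learning framework: the forward pass is the identity, and the backward pass flips the sign of the incoming gradient. The scaling factor `lam` is an assumption (a common addition that is not mentioned in the description), and the class below is a hypothetical stand-in rather than a framework layer.

```python
class GradientReversal:
    """Framework-free illustration of a gradient reversal layer (GRL)."""

    def __init__(self, lam=1.0):
        # ASSUMED scaling factor applied to the reversed gradient.
        self.lam = lam

    def forward(self, features):
        """Pass the features through unchanged."""
        return list(features)

    def backward(self, grad_output):
        """Return the gradient for the layers before the GRL.

        The sign flip means the shared layers are updated to make the
        domain classifier's task harder, while the classifier itself,
        located after the GRL, still learns to separate the domains.
        """
        return [-self.lam * g for g in grad_output]
```

Because the shared features are pushed toward being indistinguishable across domains, a single network can process both synthetic and real images.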

In the training process, synthetic images and real images are input to the neural network 600. The final loss function is calculated as the weighted sum of the outputs of the loss functions of the object type classifier 621, the 3D pose classifier 623, the feature point detector 625, and the domain classifier 629: L = a×L1 + b×L2 + c×L3 + d×L4. Here, a, b, c, and d are weights. Training may be completed when the loss function L converges.
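
The weighted sum and a possible stopping test can be sketched as follows. The weights default to 1 only for illustration, and the convergence criterion (the loss varying by less than a tolerance over a recent window) is an assumption, since the description only states that training ends when L converges.

```python
def total_loss(l1, l2, l3, l4, a=1.0, b=1.0, c=1.0, d=1.0):
    """Final loss L = a*L1 + b*L2 + c*L3 + d*L4.

    l1: object type loss, l2: 3D pose loss,
    l3: feature point loss, l4: domain classification loss.
    """
    return a * l1 + b * l2 + c * l3 + d * l4


def has_converged(loss_history, window=5, tolerance=1e-4):
    """ASSUMED convergence test on the recorded values of L.

    Returns True when the last `window + 1` loss values differ by less
    than `tolerance`.
    """
    if len(loss_history) < window + 1:
        return False
    recent = loss_history[-(window + 1):]
    return max(recent) - min(recent) < tolerance
```

In practice the weights a, b, c, and d would be tuned so that no single task dominates the shared layers.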

Referring to FIG. 6, input images from different domains may be processed along different paths. A synthetic image, which belongs to the synthetic domain, may be processed along the path drawn with a dotted line, and a real image, which belongs to the real domain, may be processed along the path drawn with a dash-dotted line. The path drawn with a solid line is processed in common regardless of the domain.

FIG. 7 is a flowchart illustrating how images of different poses are input to and processed by a neural network in a training stage of the neural network for 3D model alignment according to an embodiment.

The training data may include a large number of triads, each including a training 2D input image, a positive sample, and a negative sample. Here, the estimated 3D pose of the training 2D input image may be referred to as a first pose, and another 3D pose may be referred to as a second pose. The positive sample and the negative sample may each include two synthetic images rendered in the first pose and the second pose, respectively. The positive sample and the negative sample may be rendered from one or more candidate 3D models, for example, CAD 3D models. Here, the positive sample refers to an image similar to the training 2D input image, and the negative sample refers to an image dissimilar to the training 2D input image.

The candidate 3D models may include a first candidate 3D model and a second candidate 3D model. The synthesis device may obtain a first synthetic image of the first candidate 3D model in the estimated 3D pose, a second synthetic image of the first candidate 3D model in the second pose, a third synthetic image of the second candidate 3D model in the estimated 3D pose, and a fourth synthetic image of the second candidate 3D model in the second pose. For example, the similarity between the first candidate 3D model and the object may be greater than or equal to a threshold, and the similarity between the second candidate 3D model and the object may be less than the threshold.
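The structure of one triad can be sketched as follows. The names `Triad`, `build_triad`, and `fake_render`, the string image identifiers, and the three-angle pose representation are all hypothetical; a real system would render pixel images from CAD models.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical pose representation: (azimuth, elevation, in-plane rotation) in degrees.
Pose = Tuple[float, float, float]


@dataclass(frozen=True)
class Triad:
    anchor: Tuple[str, str]    # training 2D input image in the first and second pose
    positive: Tuple[str, str]  # similar (first) candidate model rendered in both poses
    negative: Tuple[str, str]  # dissimilar (second) candidate model rendered in both poses


def build_triad(anchor_first: str, anchor_second: str,
                similar_model: str, dissimilar_model: str,
                first_pose: Pose, second_pose: Pose,
                render: Callable[[str, Pose], str]) -> Triad:
    # Four synthetic images: each candidate model rendered in each pose.
    positive = (render(similar_model, first_pose), render(similar_model, second_pose))
    negative = (render(dissimilar_model, first_pose), render(dissimilar_model, second_pose))
    return Triad((anchor_first, anchor_second), positive, negative)


def fake_render(model_id: str, pose: Pose) -> str:
    # Stand-in for a real renderer: returns a label instead of pixels.
    return f"{model_id}@{pose}"


triad = build_triad("img_711", "img_712", "cad_a", "cad_b",
                    (30.0, 10.0, 0.0), (120.0, 10.0, 0.0), fake_render)
```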

Referring to FIG. 7, a training 2D input image 711 may be input to the training device. The training device may estimate the first pose of the training 2D input image 711. The training device may generate a training 2D input image 712 in the second pose from the training 2D input image 711. For example, the training device may generate the training 2D input image 712 using a GAN. The training device may prepare one or more sample images 721, 722, 731, and 732 corresponding to the first pose and the second pose, respectively. The training device may input the prepared training data to the neural network.

The training device may normalize each image 711, 712, 721, 722, 731, and 732 of the triad to 224 × 224. The training device may input the normalized images of the triad to the basic network 740 of the neural network. For each image of the triad, fully connected layers FC8_1 (713, 723, 733) and FC8_2 (714, 724, 734), which correspond to different poses, are connected to the basic network 740, and the number of nodes of each of FC8_1 (713, 723, 733) and FC8_2 (714, 724, 734) may be set to 4096.

For each pair of images with different poses, the neural network may output the features of each image. For example, a feature vector corresponding to the training 2D input image 711, a feature vector corresponding to the generated training 2D input image 712, a feature vector corresponding to the positive sample 721, a feature vector corresponding to the positive sample 722, a feature vector corresponding to the negative sample 731, and a feature vector corresponding to the negative sample 732 may be output.

The features of each image pair may be fused with each other. The fused feature of the training 2D input image 711, which corresponds to a real image, and the generated training 2D input image 712 may be referred to as a first feature. The fused features of the sample images 721 and 722, or of the sample images 731 and 732, which correspond to synthetic images, may be referred to as second features. Here, the sample images 721 and 722 and the sample images 731 and 732 may be images rendered from the respective candidate 3D models.

For each image of the triad, the features corresponding to different poses that have passed through FC8_1 or FC8_2 may be fused. Features corresponding to different poses may be concatenated, convolved through a convolutional layer, or fused through another network structure (for example, an LSTM). Referring to FIG. 7, the output of FC8_1 (713) and the output of FC8_2 (714) corresponding to the training 2D image may be fused by the fusion structure 715. The output of FC8_1 (723) and the output of FC8_2 (724) corresponding to the positive sample may be fused by the fusion structure 725. The output of FC8_1 (733) and the output of FC8_2 (734) corresponding to the negative sample may be fused by the fusion structure 735.
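Of the fusion options listed, concatenation is the simplest to illustrate. The sketch below assumes plain Python lists as feature vectors and uses the hypothetical name `fuse_by_concatenation`; with two 4096-node layers the fused feature has 8192 elements.

```python
from typing import List


def fuse_by_concatenation(feat_pose1: List[float],
                          feat_pose2: List[float]) -> List[float]:
    """Fuse the FC8_1 and FC8_2 outputs of one image pair by concatenation."""
    return list(feat_pose1) + list(feat_pose2)


# Two 4096-dimensional outputs give one 8192-dimensional fused feature.
fused = fuse_by_concatenation([0.1] * 4096, [0.2] * 4096)
```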

The training device may calculate the similarity between the first feature and a second feature. After the features of the images of different poses are fused for each pair, the three fused features may be input to a single loss function that evaluates similarity. The loss function may include, for example, a triplet loss function. For example, the training device may calculate the Euclidean distance between the first feature and the second feature of the sample images 721 and 722. A smaller Euclidean distance means a greater similarity, and the candidate 3D model with the greatest similarity may be determined to be the target 3D model.

Through this training process, the parameters of the neural network may be learned such that the features of the training 2D input image and the positive sample move closer together and the features of the training 2D input image and the negative sample move farther apart.
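The distance, the loss, and the model selection described above can be sketched in a few lines of Python. The margin value, the use of unsquared distances in the loss, and the function names are assumptions; the description names only a triplet loss and the Euclidean distance.

```python
import math
from typing import Dict, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """One common form: max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def select_target_model(first_feature: Sequence[float],
                        candidates: Dict[str, Sequence[float]]) -> str:
    """Return the candidate whose second feature is closest to the first feature."""
    return min(candidates, key=lambda name: euclidean(first_feature, candidates[name]))


well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])  # positive is closer
badly_ordered = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])   # negative is closer
best = select_target_model([0.0, 0.0], {"cad_a": [0.0, 1.0], "cad_b": [3.0, 4.0]})
```

The loss is zero once the positive sample is closer than the negative sample by at least the margin, which matches the stated goal of pulling positive features together and pushing negative features apart.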

FIG. 8 is a diagram illustrating a first embodiment to which the 3D model alignment device is applied.

An object motion prediction apparatus may predict the motion of an object using the result of the 3D model alignment device. The object motion prediction apparatus may acquire at least one 2D input image including the object. The object motion prediction apparatus may detect a feature point of the object from the 2D input image using a neural network. The object motion prediction apparatus may estimate the 3D pose of the object in the 2D input image using the neural network. The object motion prediction apparatus may search for a target 3D model based on the estimated 3D pose. The object motion prediction apparatus may align the target 3D model and the object based on the feature point. The object motion prediction apparatus may predict the motion of the object based on the aligned 3D model.

Referring to FIG. 8, the object motion prediction apparatus installed in an autonomous vehicle 802 may detect feature points from a 2D input image of an object 801 detected in the driving scene and estimate the 3D pose of the object. The object motion prediction apparatus may capture an image of a vehicle entering from the road on the left with a camera installed in the autonomous vehicle, and may estimate the 3D pose or feature points of the entering vehicle from the 2D input image to determine a target 3D model that matches the entering vehicle. Using the 3D information of the target 3D model, the object motion prediction apparatus may estimate the position of the object on a 3D map, its driving direction, the vehicle size, and its traveling direction and speed.
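How a traveling direction and speed might be derived from aligned positions is not specified in the description. The following Python sketch is one simple possibility under stated assumptions: the object's map position is available at two timestamps as ground-plane (x, y) coordinates in metres, and the object moves at constant velocity. All names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # assumed ground-plane map coordinates in metres


def estimate_motion(p0: Point, p1: Point, dt: float) -> Tuple[float, float]:
    """Return (speed in m/s, heading in degrees) between two map positions."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    speed = math.hypot(dx, dy) / dt
    heading = math.degrees(math.atan2(dy, dx))
    return speed, heading


def predict_position(p1: Point, speed: float, heading_deg: float,
                     horizon: float) -> Point:
    """Constant-velocity extrapolation `horizon` seconds ahead."""
    rad = math.radians(heading_deg)
    return (p1[0] + speed * math.cos(rad) * horizon,
            p1[1] + speed * math.sin(rad) * horizon)


speed, heading = estimate_motion((0.0, 0.0), (3.0, 4.0), dt=0.5)
future = predict_position((3.0, 4.0), speed, heading, horizon=1.0)
```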

FIG. 9 is a diagram illustrating a second embodiment to which the 3D model alignment device is applied.

An object texture display device may display a texture on the surface of an object using the result of the 3D model alignment device. The object texture display device may acquire at least one 2D input image including the object. The object texture display device may detect a feature point of the object from the 2D input image using a neural network. The object texture display device may estimate the 3D pose of the object in the 2D input image using the neural network. The object texture display device may search for a target 3D model based on the estimated 3D pose. The object texture display device may align the target 3D model and the object based on the feature point. The object texture display device may display a texture on the surface of the object based on the aligned 3D model.

Referring to FIG. 9, the object texture display device may display an arbitrary texture on the surface of an object in augmented reality based on an accurate alignment between the target 3D model and the 2D input image. The object texture display device may determine on which face of the target 3D model to display the texture based on the 3D pose estimated from the 2D input image. By aligning the target 3D model with the object, displaying the texture on the surface of the object, and then removing the target 3D model, the object texture display device may produce a result 901 in which the texture appears to be displayed on the surface of the object.

FIG. 10 is a diagram illustrating a third embodiment to which the 3D model alignment device is applied.

A virtual 3D image display device may display a virtual 3D image in augmented reality using the result of the 3D model alignment device. The virtual 3D image display device may acquire at least one 2D input image including an object. The virtual 3D image display device may detect a feature point of the object from the 2D input image using a neural network. The virtual 3D image display device may estimate the 3D pose of the object in the 2D input image using the neural network. The virtual 3D image display device may search for a target 3D model based on the estimated 3D pose. The virtual 3D image display device may align the target 3D model and the object based on the feature point. The virtual 3D image display device may display a virtual 3D image based on the aligned 3D model.

Referring to FIG. 10, the virtual 3D image display device may display, in augmented reality, a virtual cylinder 1003 placed on a real table 1001. The virtual 3D image display device may estimate the 3D pose of the table 1001 and place the virtual cylinder 1003 based on the 3D pose. The virtual 3D image display device may obtain 3D information of the table 1001 by estimating the 3D pose of the table 1001 and obtaining a target 3D model of the table 1001. Using the 3D information, the virtual 3D image display device may display the virtual cylinder 1003 such that the bottom of the virtual cylinder 1003 rests flush against the top of the table 1001.
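The placement step can be sketched geometrically. The example assumes, for illustration only, that the aligned table model yields the centre of the tabletop and an "up" direction for its surface; the function name `place_on_top` and the coordinate conventions are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def place_on_top(table_top_center: Vec3, table_up: Vec3,
                 cylinder_height: float) -> Vec3:
    """Return the cylinder centre so that its base rests on the tabletop."""
    norm = math.sqrt(sum(c * c for c in table_up))
    if norm == 0.0:
        raise ValueError("table_up must be a non-zero vector")
    # Move half the cylinder height along the unit normal of the tabletop.
    half = cylinder_height / 2.0
    x, y, z = (p + half * (u / norm) for p, u in zip(table_top_center, table_up))
    return (x, y, z)


center = place_on_top((1.0, 2.0, 0.75), (0.0, 0.0, 2.0), cylinder_height=0.2)
```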

FIG. 11 is a diagram illustrating a fourth embodiment to which the 3D model alignment device is applied.

A virtual 3D image control device may control a virtual 3D image using the result of the 3D model alignment device. Referring to FIG. 11, the virtual 3D image control device may control the virtual 3D image using the 3D pose, the feature point information, or the target 3D model of a real object. For example, referring to the screen 1101, a sheet of paper held by a person is two-dimensional and may function as a 2D AR marker (an augmented reality display tool). Referring to the screen 1103, a cup held by a person may function as a 3D AR marker. The virtual 3D image control device may estimate the 3D pose of the real object and set the target 3D model to coincide with the real object. When the real object moves, the target 3D model may be changed accordingly.

For example, the virtual 3D image control device may be applied to controlling a robot arm. The virtual 3D image control device may receive a 2D input image of the robot arm through a camera, estimate the 3D pose of the robot arm from the 2D input image, and detect feature points. The virtual 3D image control device may search for a target 3D model corresponding to the robot arm. The virtual 3D image control device may then obtain 3D information of the robot arm using the target 3D model, and recognize and control the gripping position, hand motion, and the like of the robot arm.

FIG. 12 is a diagram illustrating a detailed configuration of the 3D model alignment device.

The 3D model alignment device 1200 includes at least one processor 1201 and a memory 1203 that stores a neural network. The 3D model alignment device 1200 may further include a database. The 3D model alignment device 1200 may further include a transceiver. The memory may store the neural network or one or more candidate 3D models. The database may store the candidate 3D models.

The 3D model alignment device 1200 acquires at least one 2D input image including an object. The processor 1201 detects a feature point of the object from the 2D input image using the neural network. The processor 1201 estimates the 3D pose of the object in the 2D input image using the neural network. The processor 1201 searches for a target 3D model based on the estimated 3D pose. The processor 1201 aligns the target 3D model and the object based on the feature point.
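The order of operations performed by the processor can be summarized as a small Python skeleton. Each stage is passed in as a callable so that the sketch stays independent of any particular network; the names `align_3d_model` and `AlignmentResult`, the callable signatures, and the toy stand-ins are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class AlignmentResult:
    keypoints: Any
    pose: Any
    target_model: Any
    aligned_model: Any


def align_3d_model(image: Any,
                   detect_keypoints: Callable[[Any], Any],
                   estimate_pose: Callable[[Any], Any],
                   search_target_model: Callable[[Any, Any], Any],
                   align: Callable[[Any, Any, Any], Any]) -> AlignmentResult:
    keypoints = detect_keypoints(image)              # feature point detection
    pose = estimate_pose(image)                      # 3D pose estimation
    target_model = search_target_model(image, pose)  # retrieval based on the pose
    aligned = align(target_model, keypoints, pose)   # alignment based on feature points
    return AlignmentResult(keypoints, pose, target_model, aligned)


# Toy stand-ins so the sketch runs end to end; real stages would use the neural network.
result = align_3d_model(
    image="frame_0",
    detect_keypoints=lambda img: [(10, 20), (30, 40)],
    estimate_pose=lambda img: (30.0, 10.0, 0.0),
    search_target_model=lambda img, pose: "cad_chair_07",
    align=lambda model, kps, pose: {"model": model, "pose": pose, "n_keypoints": len(kps)},
)
```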

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on one or more computer-readable recording media.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on the above. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A 3D model alignment method comprising:
obtaining, by a transceiver, at least one 2D input image including an object;
detecting, by a processor, a feature point of the object from the 2D input image using a neural network;
estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network;
searching, by the processor, for a target 3D model based on the estimated 3D pose; and
aligning, by the processor, the target 3D model and the object based on the feature point.

2. The method of claim 1, wherein the obtaining of the 2D input image comprises
receiving a first 2D input image including the object in a first pose and a second 2D input image including the object in a second pose different from the first pose.

3. The method of claim 1, wherein the obtaining of the 2D input image comprises:
receiving a first 2D input image including the object in a first pose; and
generating a second 2D input image including the object in a second pose different from the first pose.

4. The method of claim 1, further comprising
detecting the object in the 2D input image.

5. The method of claim 1, wherein the estimating of the 3D pose comprises:
classifying a type of the object using the neural network; and
estimating the 3D pose of the object based on a result of the classifying using the neural network.

6. The method of claim 1, wherein the searching for the target 3D model comprises:
obtaining a first feature of the object in the 2D input image;
obtaining a second feature of one candidate 3D model among a plurality of candidate 3D models; and
determining whether the candidate 3D model is the target 3D model based on the first feature and the second feature.

7. The method of claim 6, wherein the determining of whether the candidate 3D model is the target 3D model comprises:
calculating a similarity between the first feature and the second feature; and
determining whether the candidate 3D model is the target 3D model based on the similarity.

8. The method of claim 1, further comprising
adjusting the object or the target 3D model based on the estimated 3D pose, the feature point of the object, and a feature point of the target 3D model.

9. The method of claim 8, wherein the adjusting comprises:
adjusting the target 3D model or the object using the estimated 3D pose; and
re-adjusting the adjusted object or the adjusted target 3D model based on the feature point of the object and the feature point of the target 3D model.

10. The method of claim 8, wherein the adjusting comprises:
adjusting the object or the target 3D model based on the feature point of the object and the feature point of the target 3D model; and
re-adjusting the adjusted target 3D model or the adjusted object using the estimated 3D pose.

11. A method of predicting motion of an object, the method comprising:
obtaining, by a transceiver, at least one 2D input image including the object;
detecting, by a processor, a feature point of the object from the 2D input image using a neural network;
estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network;
searching, by the processor, for a target 3D model based on the estimated 3D pose;
aligning, by the processor, the target 3D model and the object based on the feature point; and
predicting, by the processor, the motion of the object based on the aligned 3D model.

12. A method of displaying a texture of an object, the method comprising:
obtaining, by a transceiver, at least one 2D input image including the object;
detecting, by a processor, a feature point of the object from the 2D input image using a neural network;
estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network;
searching, by the processor, for a target 3D model based on the estimated 3D pose;
aligning, by the processor, the target 3D model and the object based on the feature point; and
displaying, by the processor, a texture on a surface of the object based on the aligned 3D model.

13. A virtual 3D image display method comprising:
obtaining, by a processor, at least one 2D input image including an object;
detecting, by the processor, a feature point of the object from the 2D input image using a neural network;
estimating, by the processor, a 3D pose of the object in the 2D input image using the neural network;
searching, by the processor, for a target 3D model based on the estimated 3D pose;
aligning, by the processor, the target 3D model and the object based on the feature point; and
displaying, by a display, a virtual 3D image based on the aligned 3D model.

14. A neural network training method comprising:
obtaining, by a transceiver, at least one training 2D input image including an object;
estimating, by a processor, a 3D pose of the object in the training 2D input image using a neural network;
searching, by the processor, for a target 3D model based on the estimated 3D pose;
detecting, by the processor, a feature point of the object from the training 2D input image using the neural network; and
training, by the processor, the neural network based on the estimated 3D pose or the detected feature point.

15. The method of claim 14, wherein the estimating of the 3D pose comprises:
classifying a type of the object using the neural network; and
estimating the 3D pose of the object based on a result of the classifying using the neural network,
and wherein the training of the neural network comprises training the neural network based on the classified type.

16. The method of claim 14, further comprising:
obtaining a synthetic image of at least one candidate 3D model in the estimated 3D pose; and
classifying domains of the training 2D input image and the synthetic image using the neural network,
wherein the training of the neural network comprises training the neural network based on the classified domains.

17. The method of claim 16, wherein the at least one candidate 3D model includes a first candidate 3D model and a second candidate 3D model, and
the obtaining of the synthetic image comprises obtaining a first synthetic image of the first candidate 3D model in the estimated 3D pose, a second synthetic image of the first candidate 3D model in a second pose, a third synthetic image of the second candidate 3D model in the estimated 3D pose, and a fourth synthetic image of the second candidate 3D model in the second pose.

18. The method of claim 17, wherein a similarity between the first candidate 3D model and the object is greater than or equal to a threshold, and a similarity between the second candidate 3D model and the object is less than the threshold.

19. A 3D model alignment device comprising:
at least one processor; and
a memory configured to store a neural network,
wherein the processor is configured to:
acquire at least one 2D input image including an object;
detect a feature point of the object from the 2D input image using the neural network;
estimate a 3D pose of the object in the 2D input image using the neural network;
search for a target 3D model based on the estimated 3D pose; and
align the target 3D model and the object based on the feature point.