KR102299902B1

KR102299902B1 - Apparatus for providing augmented reality and method therefor

Info

Publication number: KR102299902B1
Application number: KR1020200136716A
Authority: KR
Inventors: 임지숙; 하태원
Original assignee: (주)스마트큐브
Priority date: 2020-07-17
Filing date: 2020-10-21
Publication date: 2021-09-09

Abstract

Provided is a device for providing augmented reality. The device includes:

- a communication module that receives, from a user device, a reference image and a contrast image photographed in different poses;
- an estimator that generates a point cloud representing the 3D coordinates of the reference image, by performing on the reference image and the contrast image, through an estimation model, a plurality of operations to which weights learned between a plurality of layers are applied; and
- an object augmentation unit that matches virtual objects based on the point cloud.

Accordingly, an augmented reality image service can be provided to the public regardless of the type of device.

Description

Apparatus for providing augmented reality and method therefor

The present invention relates to augmented reality (AR) technology and, more particularly, to an apparatus for providing augmented reality and a method therefor.

Augmented reality (AR) refers to technology that supplements the real world by fusing it with virtual objects and information created by computer technology. It adds a virtual world carrying additional information to the real world in real time and displays the two as a single image.

Korean Patent Application Laid-Open No. 2012-0122512, published November 7, 2012 (Title: 3D augmented reality photo service system and service method using the same)

An object of the present invention is to provide an apparatus for providing augmented reality and a method therefor.

To achieve the above object, an apparatus for providing augmented reality according to a preferred embodiment of the present invention includes:

- a communication module that receives, from a user device, a reference image and a contrast image photographed in different poses;
- an estimator that generates an artificial point cloud representing the three-dimensional coordinates of the reference image, by performing on the reference image and the contrast image, through a trained estimation model, a plurality of operations to which weights learned between a plurality of layers are applied; and
- an object augmentation unit that matches a virtual object based on the artificial point cloud.

The object augmentation unit transmits one of the following to the user device:

- an augmented reality image in which the virtual object has been matched; or
- matching coordinates indicating where the virtual object has been matched.

The apparatus further includes a training data generator and a model generator.

The training data generator:

- collects a reference image for learning, a contrast image for learning, and camera parameters, all captured in different poses;
- calculates a disparity by using a calculated average vector to detect the position at which the matching cost between a reference window of the reference image for learning and a contrast window of the contrast image for learning is minimum;
- derives, from the camera parameters and the disparity, a depth map for verification indicating the depths of all pixels of the reference image; and
- generates, based on the depth map for verification, a point cloud for verification indicating the three-dimensional coordinates of the reference image.

The model generator trains the estimation model to generate an artificial point cloud. It uses training data that includes the reference image for learning, the contrast image for learning, the depth map for verification, and the point cloud for verification.
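The disparity step can be illustrated with a window-based block-matching sketch. It makes the following assumptions:

- The patent does not define the "calculated average vector", so this example uses a zero-mean sum of absolute differences as a stand-in matching cost. Each window is reduced by its own mean before comparison.
- The input images are rectified grayscale images.
- The function names and the `half` and `max_disp` parameters are illustrative only.

```python
from typing import List

Image = List[List[float]]  # grayscale image stored as rows of intensities


def window(img: Image, r: int, c: int, half: int) -> List[float]:
    """Flatten the (2*half+1) x (2*half+1) window centred at row r, column c."""
    return [img[i][j]
            for i in range(r - half, r + half + 1)
            for j in range(c - half, c + half + 1)]


def matching_cost(a: List[float], b: List[float]) -> float:
    """Assumed cost: zero-mean SAD (each window minus its own mean)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum(abs((x - ma) - (y - mb)) for x, y in zip(a, b))


def disparity_map(ref: Image, contrast: Image, half: int = 1,
                  max_disp: int = 4) -> List[List[int]]:
    """For each interior reference pixel, pick the shift d that minimises the cost
    between the reference window at (r, c) and the contrast window at (r, c - d).
    Border pixels, whose windows would leave the image, keep disparity 0."""
    h, w = len(ref), len(ref[0])
    disp = [[0] * w for _ in range(h)]
    for r in range(half, h - half):
        for c in range(half, w - half):
            ref_win = window(ref, r, c, half)
            best_d, best_cost = 0, float("inf")
            for d in range(max_disp + 1):
                if c - d - half < 0:  # contrast window would leave the image
                    break
                cost = matching_cost(ref_win, window(contrast, r, c - d, half))
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[r][c] = best_d
    return disp
```

As a usage example, suppose the contrast image is the reference image shifted two columns to the left. The search then finds a disparity of 2 for interior pixels with distinctive texture. The strict `<` comparison keeps the smallest shift when costs tie.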

The estimation model includes four networks:

- A depth generation network. When the reference image for learning and the contrast image for learning are input, it generates an artificial depth map through a plurality of operations that apply inter-layer weights to those images.
- A depth discrimination network. When either the depth map for verification or the artificial depth map is input, it outputs, as a probability, whether the input depth map is real or fake.
- A coordinate generation network. When the reference image for learning and the artificial depth map are input, it generates an artificial point cloud through a plurality of operations that apply inter-layer weights to those inputs.
- A coordinate discrimination network. When either the point cloud for verification or the artificial point cloud is input, it outputs, as a probability, whether the input point cloud is real or fake.
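The data flow among the four networks can be summarized in a small structural sketch. The class and field names are hypothetical. Each network is represented by an arbitrary callable rather than by a particular neural-network library.

At inference time the estimator would need only the two generators. The discriminators serve training.

```python
from dataclasses import dataclass
from typing import Any, Callable

DepthMap = Any    # placeholder for a depth-map tensor
PointCloud = Any  # placeholder for a point-cloud tensor


@dataclass
class EstimationModel:
    # (reference image, contrast image) -> artificial depth map
    depth_generator: Callable[[Any, Any], DepthMap]
    # depth map -> probability that it is real (a verification depth map)
    depth_discriminator: Callable[[DepthMap], float]
    # (reference image, depth map) -> artificial point cloud
    coord_generator: Callable[[Any, DepthMap], PointCloud]
    # point cloud -> probability that it is real (a verification point cloud)
    coord_discriminator: Callable[[PointCloud], float]

    def estimate(self, ref: Any, contrast: Any) -> PointCloud:
        """Inference path: only the two generators run, chained in order."""
        depth = self.depth_generator(ref, contrast)
        return self.coord_generator(ref, depth)
```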

The model generator divides the prototype of the estimation model into two groups and performs group learning for each group:

- a depth group, comprising the depth generation network and the depth discrimination network; and
- a coordinate group, comprising the coordinate generation network and the coordinate discrimination network.

Once training of the depth group satisfies a preset accuracy according to an evaluation index, the model generator fixes the weights of the depth group. It then performs in-depth learning, in which only the coordinate group is trained, using the artificial depth maps generated by the depth generation network.
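The two-phase schedule can be outlined as a training driver. This is a sketch under two assumptions:

- `run_round` is a hypothetical callback that performs one round of adversarial updates on the listed networks.
- A single accuracy `target` is used for both phases. The patent says only that each phase continues until a preset accuracy is met.

```python
from typing import Callable, Dict, List

GROUPS: Dict[str, List[str]] = {
    "depth": ["depth_generator", "depth_discriminator"],
    "coord": ["coord_generator", "coord_discriminator"],
}


def train_estimation_model(run_round: Callable[[List[str]], None],
                           depth_accuracy: Callable[[], float],
                           coord_accuracy: Callable[[], float],
                           target: float,
                           max_rounds: int = 10_000) -> Dict[str, int]:
    """Phase 1 (group learning): both groups train until the depth group meets
    `target`. Phase 2 (in-depth learning): the depth group is frozen and only
    the coordinate group trains until it meets `target`."""
    rounds = {"group": 0, "in_depth": 0}
    trainable = GROUPS["depth"] + GROUPS["coord"]
    while depth_accuracy() < target and rounds["group"] < max_rounds:
        run_round(trainable)
        rounds["group"] += 1
    trainable = list(GROUPS["coord"])  # depth-group weights are now fixed
    while coord_accuracy() < target and rounds["in_depth"] < max_rounds:
        run_round(trainable)
        rounds["in_depth"] += 1
    return rounds
```

Freezing is modelled simply by leaving the depth networks out of the trainable list in the second phase. A framework implementation would typically achieve the same effect by disabling gradient updates for those parameters.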

In group learning, the model generator alternately performs two optimizations on the depth group:

- (i) Depth discrimination network weight optimization. The weights of the depth generation network are fixed, and the weights of the depth discrimination network are corrected so that it judges the depth map for verification as real and the artificial depth map as fake.
- (ii) Depth generation network weight optimization. The weights of the depth discrimination network are fixed, and the weights of the depth generation network are corrected so that the depth discrimination network judges the artificial depth map as real.

At the same time, it alternately performs two optimizations on the coordinate group:

- (iii) Coordinate discrimination network weight optimization. The weights of the coordinate generation network are fixed, and the weights of the coordinate discrimination network are corrected so that it judges the point cloud for verification as real and the artificial point cloud as fake.
- (iv) Coordinate generation network weight optimization. The weights of the coordinate discrimination network are fixed, and the weights of the coordinate generation network are corrected so that the coordinate discrimination network judges the artificial point cloud as real.

These four optimizations are repeated until the accuracy of the depth generation network and the depth discrimination network, measured by a predetermined evaluation index, satisfies the preset accuracy.
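The alternating optimizations follow the usual adversarial (GAN) pattern. The sketch below assumes two things the patent does not specify:

- binary cross-entropy losses, since no loss function is named; and
- a hypothetical `step` callback that updates one named network while its partner is frozen.

Under those assumptions, the loss targets and one group-learning loop might look like this:

```python
import math
from typing import Callable


def bce(p: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single discriminator output p = D(x)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(p_real: float, p_fake: float) -> float:
    """Discriminator step (generator frozen): real -> 1, artificial -> 0."""
    return bce(p_real, 1.0) + bce(p_fake, 0.0)


def generator_loss(p_fake: float) -> float:
    """Generator step (discriminator frozen): push D(artificial) toward 1."""
    return bce(p_fake, 1.0)


def group_learning(step: Callable[[str], None],
                   depth_accuracy: Callable[[], float],
                   target: float, max_rounds: int = 10_000) -> int:
    """Alternate the four optimisations until the depth pair meets `target`;
    returns the number of rounds run."""
    for rounds in range(1, max_rounds + 1):
        step("depth_discriminator")  # depth generator frozen
        step("depth_generator")      # depth discriminator frozen
        step("coord_discriminator")  # coordinate generator frozen
        step("coord_generator")      # coordinate discriminator frozen
        if depth_accuracy() >= target:
            return rounds
    return max_rounds
```

Note that the stopping test looks only at the depth pair, which mirrors the text: the coordinate group keeps training alongside it and is refined further during in-depth learning.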

In in-depth learning, the model generator proceeds as follows:

- It takes the reference image for learning and the contrast image for learning as inputs and generates an artificial depth map through the depth generation network.
- It derives, through the depth discrimination network, a flag indicating whether that artificial depth map was judged real or fake.
- It generates an artificial point cloud through the coordinate generation network, based on the reference image for learning and the artificial depth map.

If the derived flag indicates that the artificial depth map is real, the model generator alternately performs two optimizations:

- Coordinate discrimination network weight optimization. The weights of the coordinate generation network are fixed, and the weights of the coordinate discrimination network are corrected so that it judges the point cloud for verification as real and the artificial point cloud as fake.
- Coordinate generation network weight optimization. The weights of the coordinate discrimination network are fixed, and the weights of the coordinate generation network are corrected so that the coordinate discrimination network judges the artificial point cloud as real.

These two optimizations are repeated until the accuracy of the coordinate generation network and the coordinate discrimination network, measured by a predetermined evaluation index, satisfies the preset accuracy.
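The flag gating can be sketched as a single step function. The following are assumptions:

- The 0.5 probability threshold that turns the depth discriminator's output into a real/fake flag.
- All function names.
- `update_coord_group`, which stands in for the alternating coordinate-discriminator and coordinate-generator updates.

```python
from typing import Any, Callable


def in_depth_step(ref: Any, contrast: Any, gt_cloud: Any,
                  depth_generator: Callable[[Any, Any], Any],
                  depth_discriminator: Callable[[Any], float],
                  coord_generator: Callable[[Any, Any], Any],
                  update_coord_group: Callable[[Any, Any], None],
                  threshold: float = 0.5) -> bool:
    """One in-depth learning step with the depth group frozen.
    Returns the flag; the coordinate group is updated only when it is True."""
    depth = depth_generator(ref, contrast)               # frozen generator
    flag_real = depth_discriminator(depth) >= threshold  # assumed 0.5 cut-off
    cloud = coord_generator(ref, depth)
    if flag_real:
        # alternate coordinate-discriminator / coordinate-generator updates,
        # using the verification point cloud as the "real" sample
        update_coord_group(gt_cloud, cloud)
    return flag_real
```

Skipping depth maps judged fake keeps the coordinate group from learning on depth estimates that the frozen depth discriminator itself rejects.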

To achieve the above object, a method for providing augmented reality according to a preferred embodiment of the present invention includes the following steps:

- receiving, by a communication module, a reference image and a contrast image photographed in different poses from a user device;
- generating, by an estimator, an artificial point cloud representing the three-dimensional coordinates of the reference image, by performing on the reference image and the contrast image, through a trained estimation model, a plurality of operations to which weights learned between a plurality of layers are applied; and
- generating, by an object augmentation unit, an augmented reality image by matching a virtual object to the reference image based on the artificial point cloud.

Before the step of receiving the reference image and the contrast image photographed in different poses, the method further includes:

- collecting, by a training data generator, a reference image for learning, a contrast image for learning, and camera parameters captured in different poses;
- calculating, by the training data generator, a disparity by using a calculated average vector to detect the position at which the matching cost between a reference window of the reference image for learning and a contrast window of the contrast image for learning is minimum;
- deriving, by the training data generator, a depth map for verification indicating the depths of all pixels of the reference image, from the camera parameters and the disparity;
- generating, by the training data generator, a point cloud for verification indicating the three-dimensional coordinates of the reference image, based on the depth map for verification; and
- training, by a model generator, the estimation model to generate an artificial point cloud, using training data that includes the reference image for learning, the contrast image for learning, the depth map for verification, and the point cloud for verification.
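The depth-map and point-cloud steps of the method above can be sketched with the standard rectified-stereo and pinhole-camera relations. The sketch assumes the camera parameters are:

- focal lengths `fx` and `fy`, in pixels;
- principal point `cx`, `cy`; and
- stereo baseline `baseline`.

The patent does not list which parameters are actually used. Depth follows Z = fx * baseline / d. Each pixel (u, v) with valid depth is then back-projected to X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def depth_from_disparity(disp: List[List[float]], fx: float,
                         baseline: float) -> List[List[Optional[float]]]:
    """Z = fx * baseline / d; pixels with d <= 0 get no depth (None)."""
    return [[fx * baseline / d if d > 0 else None for d in row] for row in disp]


def depth_to_point_cloud(depth: List[List[Optional[float]]], fx: float, fy: float,
                         cx: float, cy: float) -> List[Point3D]:
    """Back-project every pixel with a valid depth through the pinhole model."""
    cloud: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:
                continue  # no disparity match, so no 3D point
            cloud.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return cloud
```

For example, with fx = 500 px and a 0.1 m baseline, a disparity of 10 px gives a depth of 5 m. Zero-disparity pixels are dropped rather than assigned infinite depth.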

Training the estimation model to generate an artificial point cloud using the training data includes:

- dividing, by the model generator, the prototype of the estimation model into a depth group and a coordinate group, and performing group learning for each group. The depth group comprises the depth generation network and the depth discrimination network; the coordinate group comprises the coordinate generation network and the coordinate discrimination network.
- once training of the depth group satisfies a preset accuracy according to an evaluation index, performing in-depth learning by the model generator. In this phase the weights of the depth group are fixed, and only the coordinate group is trained, using the artificial depth maps generated by the depth generation network.

According to the present invention, an augmented reality image service can be provided even through a user device that has only an ordinary camera with a general image sensor and no 3D sensor, stereo camera, or depth camera. Therefore, an augmented reality image service can be offered to the public regardless of the type of device.

FIG. 1 is a diagram illustrating the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the configuration of a lightweight augmented reality device according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of an augmentation server for providing an augmented reality image according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the detailed configuration of an apparatus for providing augmented reality according to an embodiment of the present invention.
FIGS. 5 to 7 are diagrams illustrating a method for generating training data according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the internal configuration of an estimation model according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating a method of matching a virtual object according to an embodiment of the present invention.
FIG. 10 is a flowchart illustrating a method of preparing training data according to an embodiment of the present invention.
FIG. 11 is a flowchart illustrating a method of generating an estimation model through training according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating a method of performing training separately for a depth group and a coordinate group according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating an in-depth learning method that trains the coordinate group in depth according to an embodiment of the present invention.
FIG. 14 is a flowchart illustrating a method for providing augmented reality according to an embodiment of the present invention.

Before the detailed description of the present invention, note that the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. An inventor may appropriately define the concepts of terms in order to describe his or her invention in the best way. On that principle, the terms should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiments of the present invention. They do not represent the entire technical idea of the present invention. It should therefore be understood that, at the time of filing this application, various equivalents and modifications capable of replacing them may exist.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Identical components in the accompanying drawings are denoted by the same reference numerals wherever possible. Detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention are omitted. For the same reason, some components in the accompanying drawings are exaggerated, omitted, or shown schematically, and the size of each component does not fully reflect its actual size.

First, a system for providing augmented reality (AR) according to an embodiment of the present invention is described. FIG. 1 is a diagram illustrating the configuration of a system for providing an augmented reality image according to an embodiment of the present invention.

Referring to FIG. 1, a system for providing an augmented reality image according to an embodiment of the present invention includes a user device 10 and an augmentation server 20.

The user device 10 is a device having a camera function and a communication function. The user device 10 may transmit to the augmentation server 20 a plurality of images of the real world captured in different poses.

The augmentation server 20 may derive a point cloud containing three-dimensional coordinates from the plurality of images received from the user device 10. It does so using an estimation model, which is a trained artificial neural network (ANN).

The augmentation server 20 then matches a virtual object (VO) selected by the user to the image captured by the user, in accordance with the three-dimensional coordinates of the derived point cloud. This produces an augmented reality image, which the server provides to the user device 10.
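One simple way to illustrate matching a virtual object to the captured image using point-cloud coordinates is the sketch below. It is not the server's documented method, and it assumes:

- a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical names);
- an object anchored at a chosen 3D point from the point cloud; and
- object vertices given as offsets from that anchor in camera coordinates.

The vertices are projected back to the pixel positions where they could be drawn.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project(p: Point3D, fx: float, fy: float, cx: float, cy: float) -> Optional[Pixel]:
    """Pinhole projection u = fx*X/Z + cx, v = fy*Y/Z + cy; None if behind the camera."""
    x, y, z = p
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def place_virtual_object(anchor: Point3D, vertices: List[Point3D],
                         fx: float, fy: float, cx: float,
                         cy: float) -> List[Optional[Pixel]]:
    """Translate object vertices (offsets from the anchor) and project each one."""
    ax, ay, az = anchor
    return [project((ax + dx, ay + dy, az + dz), fx, fy, cx, cy)
            for dx, dy, dz in vertices]
```

Vertices that end up behind the camera return `None`, so a renderer would skip or clip them.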

Next, the user device 10 according to an embodiment of the present invention is described. FIG. 2 is a diagram illustrating the configuration of a lightweight augmented reality device according to an embodiment of the present invention.

Referring to FIG. 2, the user device 10 according to an embodiment of the present invention includes a communication unit 11, a camera unit 12, a sensor unit 13, an audio unit 14, an input unit 14, a display unit 15, a storage unit 16, and a control unit 17.

The communication unit 11 communicates with the augmentation server 20. It may include:

- a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal;
- an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency; and
- a modem that modulates transmitted signals and demodulates received signals.

Under the control of the control unit 17, the communication unit 11 transmits to the augmentation server 20 a plurality of images of the same real-world object captured in different poses. The communication unit 11 may also receive a virtual object, a point cloud, an augmented reality image, and the like from the augmentation server 20.

The camera unit 12 captures images. It may include a lens and an image sensor. Each image sensor receives light reflected from a subject and converts it into an electrical signal. The image sensor may be implemented based on a charge-coupled device (CCD), a complementary metal-oxide-semiconductor (CMOS) sensor, or the like. The camera unit 12 may further include one or more analog-to-digital converters that convert the electrical signal output from the image sensor into a digital sequence and output it to the control unit 17.

The sensor unit 13 measures inertia. It includes an inertial measurement unit (IMU), a Doppler velocity log (DVL), an attitude and heading reference system (AHRS), and the like. The sensor unit 13 measures inertial information about the camera unit 12 of the user device 10, namely its position in three-dimensional coordinates and its Euler angles. It provides the measured inertial information of the user device 10 to the control unit 17.
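The Euler angles reported by the sensor unit describe the camera's orientation, which together with its position forms the pose. The patent does not specify an angle convention. This minimal sketch assumes the common Z-Y-X (yaw-pitch-roll) convention with angles in radians. It converts the angles to a 3x3 rotation matrix R = Rz(yaw) Ry(pitch) Rx(roll).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyx_to_matrix(roll: float, pitch: float, yaw: float) -> Matrix:
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll); angles in radians."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]]
    ry = [[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]]
    rz = [[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]]
    return matmul(rz, matmul(ry, rx))


def rotate(r: Matrix, v: Sequence[float]) -> List[float]:
    """Apply rotation matrix r to a 3-vector v."""
    return [sum(r[i][k] * v[k] for k in range(3)) for i in range(3)]
```

For example, a yaw of 90 degrees turns the camera's x-axis (1, 0, 0) into (0, 1, 0), up to floating-point rounding.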

The input unit 14 receives the user's key operations for controlling the user device 10, generates an input signal, and transmits it to the control unit 17. The input unit 14 may include various keys for controlling the user device 10. When the display unit 15 is a touch screen, the functions of these keys may be performed on the display unit 15. If all functions can be performed through the touch screen alone, the input unit 14 may be omitted.

The display unit 15 visually provides the user with menus of the user device 10, input data, function setting information, and various other information. It outputs screens such as the boot screen, standby screen, and menu screen of the user device 10. In particular, the display unit 15 outputs the augmented reality image according to an embodiment of the present invention. The display unit 15 may be formed of a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), or the like.

The display unit 15 may also be implemented as a touch screen, in which case it includes a touch sensor that detects the user's touch input. The touch sensor may be a capacitive overlay, pressure, resistive overlay, or infrared beam type touch sensor, or it may be a pressure sensor. Besides these sensors, any type of sensor device capable of sensing the contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor may detect the user's touch input, generate a detection signal including input coordinates indicating the touched position, and transmit the signal to the control unit 17. In particular, when the display unit 15 is a touch screen, some or all of the functions of the input unit 14 may be performed through the display unit 15.

The storage unit 16 stores programs and data required for the operation of the user device 10. In particular, the storage unit 16 may store camera parameters and the like. The various data stored in the storage unit 16 may be deleted, changed, or added according to operations by the user of the user device 10.

The control unit 17 may control the overall operation of the user device 10 and the signal flow between its internal blocks, and may perform a data processing function. The control unit 17 basically controls the various functions of the user device 10. Examples of the control unit 17 include a central processing unit (CPU), a baseband processor (BP), an application processor (AP), a graphics processing unit (GPU), and a digital signal processor (DSP).

The control unit 17 may transmit a plurality of images, captured by the camera unit 12 in different poses, to the augmentation server 20 through the communication unit 11. The control unit 17 may also receive an augmented reality image from the augmentation server 20 and display it through the display unit 15.

Next, the augmentation server 20 for providing augmented reality according to an embodiment of the present invention is described. FIG. 3 is a diagram illustrating the configuration of an augmentation server for providing an augmented reality image according to an embodiment of the present invention. Referring to FIG. 3, the augmentation server 20 according to an embodiment of the present invention includes a communication module 21, a storage module 22, and a control module 23.

The communication module 21 communicates with the user device 10 over a network and may transmit data to and receive data from it. The communication module 21 may include:

- a radio frequency (RF) transmitter (Tx) that up-converts and amplifies the frequency of a transmitted signal;
- an RF receiver (Rx) that low-noise amplifies a received signal and down-converts its frequency; and
- a modem that modulates transmitted signals and demodulates received signals.

The communication module 21 may transmit data received from the control module 23, such as an augmented reality image, to the user device 10. It may also pass data received from the user device 10, such as a plurality of images captured in different poses, to the control module 23.

The storage module 22 stores the programs and data required for the operation of the augmentation server 20. It may store training data comprising the training reference image TP, the training contrast image CP, the verification depth map GTD, and the verification point cloud GTP. Data stored in the storage module 22 may be registered, deleted, changed, or added by the administrator of the augmentation server 20.

The control module 23 controls the overall operation of the augmentation server 20 and the flow of signals between its internal blocks, and performs data processing. The control module 23 may be a central processing unit (CPU), a digital signal processor (DSP), or the like, and may additionally include an image processor or a graphics processing unit (GPU).

The detailed configuration of the control module 23 for providing augmented reality will now be described in more detail. FIG. 4 is a block diagram illustrating the detailed configuration of an apparatus for providing augmented reality according to an embodiment of the present invention. FIGS. 5 to 7 illustrate a method for generating training data according to an embodiment of the present invention. FIG. 8 illustrates the internal configuration of an estimation model according to an embodiment of the present invention. FIG. 9 illustrates a method of registering a virtual object according to an embodiment of the present invention.

First, referring to FIG. 4, the control module 23 according to an embodiment of the present invention includes a training data generator 110, a model generator 120, an estimator 130, an object augmentation unit 140, and a virtual object processor 150.

The training data generator 110 generates the training data used to build the estimation model EM according to an embodiment of the present invention. The training data includes a training reference image TP (conTrol Picture) and a training contrast image CP (Comparison Picture), captured by the camera unit 12 of the user device 10 in different poses, together with a verification depth map GTD (Ground Truth Depth Map) and a verification point cloud GTP (Ground Truth Point Cloud) calculated from the disparity between TP and CP. That is, once TP and CP are available, the training data generator 110 derives GTD and GTP from them to complete the training data.

In more detail, the training data generator 110 first receives the training reference image TP and the training contrast image CP. For example, the training reference image TP shown in FIG. 6 may be an image of the object obj of FIG. 5 captured with the camera unit 12 in a first pose PS1, and the training contrast image CP of FIG. 6 may be an image of the same object obj captured with the camera unit 12 in a second pose PS2.

The training data generator 110 creates a square reference window TW (conTrol Window) centered on a feature point FP in the training reference image TP and divided into 3×3 unit blocks of a predetermined size, and calculates the census means vector of TW. The census means vector is obtained by comparing the average pixel value of the central block of TW with that of each neighboring block, assigning the result of each comparison to the corresponding neighboring block as its element value, and listing these element values in order. A neighboring block's element value is 0 if its average pixel value is greater than that of the central block, and 1 if it is smaller.
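The census means vector described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the neighbor reading order (row by row, skipping the center) is inferred from Tables 1 and 2, and a tie (neighbor equal to the center) is assumed to yield 1, since the text only specifies the "greater" and "smaller" cases.

```python
from typing import List, Sequence

Grid = Sequence[Sequence[float]]


def block_means(patch: Grid, block: int) -> List[List[float]]:
    """Average a (3*block) x (3*block) pixel patch into a 3x3 grid of unit-block means."""
    means = []
    for br in range(3):
        row = []
        for bc in range(3):
            vals = [patch[r][c]
                    for r in range(br * block, (br + 1) * block)
                    for c in range(bc * block, (bc + 1) * block)]
            row.append(sum(vals) / len(vals))
        means.append(row)
    return means


def census_means_vector(means: Grid) -> List[int]:
    """Compare each of the 8 neighbouring block means with the centre block mean.

    0 if the neighbour is greater than the centre, 1 otherwise (ties -> 1 is an assumption).
    Neighbours are read row by row, skipping the centre block.
    """
    centre = means[1][1]
    return [0 if means[r][c] > centre else 1
            for r in range(3) for c in range(3) if (r, c) != (1, 1)]
```

Applied to the block means of Table 1, `census_means_vector([[52, 48, 32], [35, 31, 15], [10, 22, 10]])` yields `[0, 0, 0, 0, 1, 1, 1, 1]`, the vector given for TW.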

For example, assume that the average pixel values of the blocks of the reference window TW in the training reference image TP of FIG. 6 are as shown in Table 1 below.

Table 1 (average pixel value of each block of TW; the center value belongs to the central block)
52  48  32
35  31  15
10  22  10

Relative to the central block of TW, the element values of the neighboring blocks are then as shown in Table 2 below.

Table 2 (element values of the eight neighboring blocks of TW, in row order, center excluded)
0  0  0  0  1  1  1  1

According to Table 2, the census means vector of the reference window TW is [0, 0, 0, 0, 1, 1, 1, 1]. Next, the training data generator 110 creates a comparison window CW in the training contrast image CP. CW is a square window divided into 3×3 unit blocks of the same size as TW, and its census means vector is calculated in the same way as that of TW: the average pixel value of its central block is compared with that of each neighboring block to obtain the neighboring blocks' element values, which are then listed in order. As with TW, a neighboring block's element value is 0 if its average pixel value is greater than that of the central block, and 1 if it is smaller.

The training data generator 110 sets the initial position L0 of CW in the contrast image CP to the same position that TW occupies in the training reference image TP. It then moves CW sequentially, a predetermined number of pixels at a time, from L0 to an end position Lmax within a predetermined search range R, and calculates the element values of CW's neighboring blocks at each position.

For example, assume that the average pixel values of the blocks of the comparison window CW(0) at the initial position L0 in the training contrast image CP of FIG. 6 are as shown in Table 3 below.

Table 3 (average pixel value of each block of CW(0); the center value belongs to the central block)
25  21  10
10  15  10
10  16  10

From Table 3, the element values of the neighboring blocks of CW(0) at the initial position L0 are as shown in Table 4 below.

Table 4 (element values of the eight neighboring blocks of CW(0), in row order, center excluded)
0  0  1  1  1  1  0  1

According to Table 4, the census means vector of CW(0) at the initial position L0 is [0, 0, 1, 1, 1, 1, 0, 1]. In this way, the training data generator 110 calculates the census means vector of CW as CW moves. At each position, it compares the census means vector of CW with that of TW and finds the position of CW at which the matching cost is minimum. The matching cost is the number of elements whose values differ between the census means vectors of the two windows (that is, their Hamming distance). For example, in the embodiment of FIG. 6, the census means vector of TW is [0, 0, 0, 0, 1, 1, 1, 1] and that of CW(0) is [0, 0, 1, 1, 1, 1, 0, 1]; the 3rd, 4th, and 7th elements differ, so the matching cost is 3.
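Because the matching cost is simply a count of differing elements, it can be sketched in a few lines. The function name `matching_cost` is illustrative only.

```python
from typing import Sequence


def matching_cost(v1: Sequence[int], v2: Sequence[int]) -> int:
    """Number of positions at which two census means vectors differ (Hamming distance)."""
    if len(v1) != len(v2):
        raise ValueError("census means vectors must have the same length")
    return sum(a != b for a, b in zip(v1, v2))
```

For the TW and CW(0) vectors above, `matching_cost([0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0, 1])` returns 3, matching the worked example.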

Having computed the matching cost as described above and found the position of CW with the minimum matching cost, the training data generator 110 calculates the disparity d as the difference between that position and the position of TW (that is, the initial position L0 of CW).
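The window search and disparity calculation can be sketched as below. Several details are assumptions not fixed by the text: the search is horizontal toward smaller column indices, the unit-block size `block` is odd, positions whose window would leave the image are skipped, and on a tie the smallest shift wins. The names `census_at` and `find_disparity` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = Sequence[Sequence[float]]


def census_at(img: Image, row: int, col: int, block: int) -> List[int]:
    """Census means vector of the 3x3-block window centred on pixel (row, col)."""
    half = block // 2
    means = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r0 = row + dr * block - half  # top-left corner of this unit block
            c0 = col + dc * block - half
            vals = [img[r][c] for r in range(r0, r0 + block) for c in range(c0, c0 + block)]
            means.append(sum(vals) / len(vals))
    centre = means[4]
    # 0 if a neighbour block is greater than the centre, 1 otherwise (ties -> 1, assumed)
    return [0 if m > centre else 1 for i, m in enumerate(means) if i != 4]


def find_disparity(reference: Image, contrast: Image, row: int, col: int, block: int = 1,
                   search_range: int = 16, step: int = 1) -> Tuple[Optional[int], Optional[int]]:
    """Slide CW from L0 (the position of TW) and return (disparity d, minimum matching cost)."""
    reach = block + block // 2  # pixels from the window centre to its outer edge
    width = len(contrast[0])
    target = census_at(reference, row, col, block)  # census means vector of TW
    best_d, best_cost = None, None
    for d in range(0, search_range + 1, step):
        c = col - d  # assumed search direction
        if c - reach < 0 or c + reach >= width:
            continue  # window would leave the image
        cost = sum(a != b for a, b in zip(target, census_at(contrast, row, c, block)))
        if best_cost is None or cost < best_cost:  # keep the first (smallest-d) minimum
            best_d, best_cost = d, cost
    return best_d, best_cost
```

As a usage example, if a bright pixel sits at column 8 of the reference image and at column 5 of the contrast image, searching from column 7 with `block=1` and `search_range=5` returns `(3, 0)`: a disparity of 3 pixels with zero matching cost.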

The training data generator 110 can then obtain the depth from the calculated disparity d according to Equation 1 below.

$$Z_d = \frac{b_d \, f_l}{d} \qquad \text{(Equation 1)}$$

Here, Zd is the depth, that is, the three-dimensional distance from the camera unit 12 of the user device 10 to the corresponding feature point FP in the training reference image TP. d is the disparity, that is, the positional difference between the training reference image TP and the training contrast image CP, calculated by the training data generator 110 as described above. bd is the reference (baseline) distance between the position of a reference point of the camera unit 12 (e.g., the center of the image sensor) in the pose used to capture TP, e.g., the first pose PS1, and the position of that reference point in the pose used to capture CP, e.g., the second pose PS2. For example, referring to FIG. 5, bd is the distance between the center (x1, y1, z1) of the image sensor IS(1) in the first pose PS1 and the center (x2, y2, z2) of the image sensor IS(2) in the second pose PS2. The position of the reference point in each pose of the camera unit 12 can be measured by the plurality of sensors of the sensor unit 13 of the user device 10. fl is the focal length of the camera unit 12. For example, referring to FIG. 5, because the same camera unit 12 is used for both images, fl is the distance from the focal point of the lens LS(1) to the center of the image sensor IS(1) in the first pose PS1, or equivalently the distance from the focal point of the lens LS(2) to the center of the image sensor IS(2) in the second pose PS2.
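A small numeric sketch of the depth calculation, assuming the standard rectified-stereo relation Z_d = b_d · f_l / d. That relation strictly holds when the two poses differ only by a translation along the image x-axis; the disparity d and focal length f_l must be in the same unit (e.g., pixels), and Z_d then comes out in the unit of b_d. The helper names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Sequence


def baseline(p1: Sequence[float], p2: Sequence[float]) -> float:
    """b_d: Euclidean distance between the camera reference points in the two poses."""
    return math.dist(p1, p2)


def depth_from_disparity(d: float, b_d: float, f_l: float) -> float:
    """Z_d = b_d * f_l / d (rectified-stereo assumption)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return b_d * f_l / d
```

For instance, reference points 0.1 m apart, a 500-pixel focal length, and a 25-pixel disparity give a depth of 0.1 × 500 / 25 = 2.0 m.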

By obtaining the depth of every feature point of the training reference image TP in the manner described above, the training data generator 110 derives a depth map. This depth map is used for verification during training and is called the verification depth map GTD. An example of the verification depth map GTD is shown in FIG. 7(a).

The training data generator 110 also knows the two-dimensional coordinates of every pixel of the reference image TP and, through the verification depth map GTD, the depth of every pixel. The intrinsic parameter of the camera unit 12, namely the focal length fl, can be received from the user device 10, and the extrinsic parameters of the camera unit 12, namely its pose (position in three-dimensional coordinates and Euler angles), can be received as measured by the sensor unit 13.

Accordingly, the training data generator 110 obtains a transformation matrix from the intrinsic parameter (the focal length fl) and the extrinsic parameters (the position and Euler angles) of the camera unit 12, and uses it to obtain three-dimensional coordinates from the two-dimensional coordinates and depth of every pixel of the training reference image TP according to Equation 2 below.

In Equation 2, the terms denote, respectively, the three-dimensional coordinates, the depth map, the pixel coordinates, and the transpose of the transformation matrix.
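As an illustration only, the back-projection from pixel coordinates and depth to three-dimensional coordinates can be sketched with the conventional pinhole camera model. This is an assumed formulation that may differ from the patent's Equation 2 in its conventions: square pixels, a principal point (u0, v0), a camera-to-world mapping X_world = R · X_cam + t, and a ZYX (yaw-pitch-roll) Euler-angle convention are all assumptions, as are the function names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def euler_to_matrix(roll: float, pitch: float, yaw: float) -> Mat3:
    """Rotation R = Rz(yaw) @ Ry(pitch) @ Rx(roll) (ZYX convention, assumed)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cw, sw = math.cos(yaw), math.sin(yaw)
    return [
        [cw * cp, cw * sp * sr - sw * cr, cw * sp * cr + sw * sr],
        [sw * cp, sw * sp * sr + cw * cr, sw * sp * cr - cw * sr],
        [-sp, cp * sr, cp * cr],
    ]


def back_project(u: float, v: float, depth: float, f: float, u0: float, v0: float,
                 R: Mat3, t: Sequence[float]) -> Vec3:
    """Pixel (u, v) with depth Z -> world point, assuming X_world = R @ X_cam + t."""
    x_cam = (u - u0) * depth / f  # pinhole model with square pixels
    y_cam = (v - v0) * depth / f
    z_cam = depth
    return (
        R[0][0] * x_cam + R[0][1] * y_cam + R[0][2] * z_cam + t[0],
        R[1][0] * x_cam + R[1][1] * y_cam + R[1][2] * z_cam + t[1],
        R[2][0] * x_cam + R[2][1] * y_cam + R[2][2] * z_cam + t[2],
    )


def depth_map_to_point_cloud(depth_map: Sequence[Sequence[float]], f: float, u0: float,
                             v0: float, R: Mat3, t: Sequence[float]) -> List[Vec3]:
    """Back-project every pixel with a positive depth into a list of 3-D points."""
    return [back_project(u, v, z, f, u0, v0, R, t)
            for v, row in enumerate(depth_map)
            for u, z in enumerate(row)
            if z > 0]
```

With zero Euler angles and zero translation, `R` is the identity, so the point cloud is expressed in the camera frame of the reference pose.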

Once the three-dimensional coordinates are obtained, a point cloud can be generated from them. This point cloud is used for verification during training and is called the verification point cloud GTP; it contains the three-dimensional coordinates. An example of the verification point cloud GTP is shown in FIG. 7(b).

Next, referring to FIGS. 4 and 8, the model generator 120 generates the estimation model EM through machine learning/deep learning. In other words, using the training data, the model generator 120 trains the estimation model EM to generate, in sequence, a depth map and then a point cloud from a reference image TP and a contrast image CP captured in different poses. To distinguish them from the verification depth map GTD and the verification point cloud GTP, the depth map and point cloud produced by the estimation model EM are called the artificial depth map ADM and the artificial point cloud APC. Completing the training of the estimation model EM is referred to as generating the estimation model EM, and the generated estimation model EM is provided to the estimator 130. The training method of the model generator 120 is described in more detail below.

Referring to FIG. 8, the estimation model EM includes a generative network GN and a discriminative network DN. The generative network GN includes a depth generating network DGN and a coordinate generating network PGN, and the discriminative network DN includes a depth discriminating network DDN and a coordinate discriminating network PDN.

Each of the depth generating network DGN and the coordinate generating network PGN of the generative network GN includes a plurality of layers that perform a plurality of weighted operations. These layers include at least one each of a convolution layer CL that performs convolution, a pooling layer PL that performs down-sampling, an unpooling layer UL that performs up-sampling, and a deconvolution layer DL that performs deconvolution. Each of the convolution, down-sampling, up-sampling, and deconvolution operations uses a filter (kernel) consisting of a predetermined matrix, and the values of the matrix elements are the weights.

When a reference image TP and a contrast image CP captured in different poses are input, the generative network GN generates, in sequence, an artificial depth map ADM and an artificial point cloud APC based on the input images TP and CP. The artificial depth map ADM and artificial point cloud APC generated by GN are stored in the storage module 22 and imitate the verification depth map GTD and verification point cloud GTP used as training data. More specifically, the depth generating network DGN receives TP and CP captured in different poses and performs on them a plurality of operations with weights applied between the layers, generating an artificial depth map ADM that imitates the verification depth map GTD. The coordinate generating network PGN then receives TP and the artificial depth map ADM and performs on them a plurality of operations with weights applied between the layers, generating an artificial point cloud APC that imitates the verification point cloud GTP.

Each of the depth discriminating network DDN and the coordinate discriminating network PDN of the discriminative network DN includes a plurality of layers that perform a plurality of weighted operations. These layers include an input layer IL; a convolution layer CL that performs a convolution operation followed by an activation function; a pooling layer PL that performs pooling (sub-sampling); a fully connected layer FL that performs an operation with an activation function; and an output layer OL that performs an operation with an activation function. There may be two or more each of the convolution layer CL, the pooling layer PL, and the fully connected layer FL. The convolution layer CL and the pooling layer PL each consist of at least one feature map FM. A feature map FM is obtained by receiving the result of the previous layer's operation with weights W applied and performing the layer's operation on that input. The weights W are applied through a filter, or kernel, W, which is a weight matrix of a predetermined size. Activation functions usable in the convolution layer CL, the fully connected layer FL, and the output layer OL include sigmoid, hyperbolic tangent (tanh), ELU (exponential linear unit), ReLU (rectified linear unit), Leaky ReLU, Maxout, Minout, and Softmax; any one of these may be selected and applied to each of CL, FL, and OL.
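For reference, the scalar forms of the listed activation functions can be sketched as follows. The default slopes (`slope=0.01` for Leaky ReLU, `alpha=1.0` for ELU) are common choices rather than values stated here, and Minout is assumed to be the minimum counterpart of Maxout; both Maxout and Minout take the k already-computed linear pre-activations of one unit.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


tanh = math.tanh


def relu(x: float) -> float:
    return max(0.0, x)


def leaky_relu(x: float, slope: float = 0.01) -> float:
    return x if x > 0 else slope * x


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def maxout(pieces: Sequence[float]) -> float:
    """Maximum over k linear pre-activations (w_i . x + b_i) of one unit."""
    return max(pieces)


def minout(pieces: Sequence[float]) -> float:
    """Assumed minimum counterpart of maxout."""
    return min(pieces)


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtracting the max keeps exp() in range
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

In a discriminator such as DDN or PDN, a sigmoid at the output layer is a natural fit because the output is interpreted as a probability that the input is real.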

When either a verification depth map GTD or an artificial depth map ADM is input, the depth discriminating network DDN performs on it a plurality of operations with weights applied between the layers and outputs, as a probability, whether the input depth map (GTD/ADM) is real or fake. Here, "real" refers to the verification depth map GTD, which is the target of imitation, and "fake" refers to the artificial depth map ADM generated by the depth generating network DGN. An output indicating real means that DDN has judged the input depth map to be the verification depth map GTD; an output indicating fake means that DDN has judged it to be an artificial depth map ADM.
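The text does not specify the training objective. A common choice for a generator/discriminator pair whose discriminator outputs a real/fake probability is the binary cross-entropy adversarial loss, sketched below as an assumption: the discriminator (DDN or PDN) is pushed toward 1 on GTD/GTP and toward 0 on ADM/APC, while the generator (DGN or PGN) is rewarded when its output is scored as real. The function names and the clamping constant `EPS` are illustrative.

```python
import math
from typing import Sequence

EPS = 1e-12  # keeps log() finite when a probability is exactly 0 or 1


def bce(p: float, label: int) -> float:
    """Binary cross-entropy of a predicted 'real' probability p against label 1 (real) or 0 (fake)."""
    p = min(max(p, EPS), 1.0 - EPS)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def discriminator_loss(p_real: Sequence[float], p_fake: Sequence[float]) -> float:
    """Mean BCE over outputs on ground truth (label 1) and generated samples (label 0)."""
    losses = [bce(p, 1) for p in p_real] + [bce(p, 0) for p in p_fake]
    return sum(losses) / len(losses)


def generator_adversarial_loss(p_fake: Sequence[float]) -> float:
    """Non-saturating generator loss: low when generated samples are scored as real."""
    return sum(bce(p, 1) for p in p_fake) / len(p_fake)
```

A discriminator that cannot tell the two apart (outputting 0.5 for both) incurs a loss of ln 2; a confident, correct one incurs a much smaller loss.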

When either a verification point cloud GTP or an artificial point cloud APC is input, the coordinate discriminating network PDN performs on it a plurality of operations with weights applied between the layers and outputs, as a probability, whether the input point cloud (GTP/APC) is real or fake. Here, "real" refers to the verification point cloud GTP, which is the target of imitation, and "fake" refers to the artificial point cloud APC generated by the coordinate generating network PGN. An output indicating real means that PDN has judged the input point cloud to be the verification point cloud GTP; an output indicating fake means that PDN has judged it to be an artificial point cloud APC.

The estimator 130 uses the estimation model EM to derive, in sequence, an artificial depth map ADM and an artificial point cloud APC. That is, when a reference image TP and a contrast image CP captured in different poses are input, the estimator 130 feeds them to the depth generating network DGN of the estimation model EM. DGN performs on TP and CP a plurality of operations with learned weights applied between the layers and generates an artificial depth map ADM. The estimator 130 then inputs TP and the ADM generated by DGN to the coordinate generating network PGN, which performs on them a plurality of operations with learned weights applied between the layers and generates an artificial point cloud APC. The estimator 130 provides the generated APC to the object augmentation unit 140.
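The order of operations in the estimator can be sketched with the two trained generators passed in as plain callables. This only shows the data flow: `dgn` and `pgn` are hypothetical stand-ins for the trained networks, not a real model API. Note that CP is consumed only by DGN, while PGN receives TP and the ADM.

```python
from typing import Callable, Tuple, TypeVar

Img = TypeVar("Img")
Depth = TypeVar("Depth")
Cloud = TypeVar("Cloud")


def estimate(tp: Img, cp: Img,
             dgn: Callable[[Img, Img], Depth],
             pgn: Callable[[Img, Depth], Cloud]) -> Tuple[Depth, Cloud]:
    """Run DGN first, then PGN, and return (ADM, APC)."""
    adm = dgn(tp, cp)   # artificial depth map from the image pair
    apc = pgn(tp, adm)  # artificial point cloud from TP and the ADM
    return adm, apc
```

The returned APC is what would be handed to the object augmentation unit 140.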

The object augmentation unit 140 registers a virtual object based on the artificial point cloud APC received from the estimator 130. That is, by registering the virtual object to the APC with reference to the three-dimensional coordinates contained in the APC corresponding to the reference image TP, the object augmentation unit 140 registers the virtual object VO to the reference image TP corresponding to that APC. FIG. 9(c) shows an example screen in which the virtual object VO is registered on the artificial point cloud APC. Because the APC is a three-dimensional coordinate representation of the reference image TP, the virtual object VO registered on the APC is registered directly onto the reference image TP, as shown in FIG. 9(d).
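Why a virtual object placed in the point cloud lands correctly on the reference image can be illustrated with a small sketch. It assumes that the APC is stored per pixel (`apc[v][u]` holds the 3-D point of pixel (u, v)) in the reference camera frame, that the reference camera follows a pinhole model with focal length `f` and principal point (`u0`, `v0`), and that the object's own rotation (the pose sent from the user device) is ignored. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def anchor_object(apc: Sequence[Sequence[Vec3]], u: int, v: int,
                  vertices: Sequence[Vec3]) -> List[Vec3]:
    """Translate object-local vertices so the object origin sits on the APC point of pixel (u, v)."""
    ax, ay, az = apc[v][u]
    return [(ax + x, ay + y, az + z) for x, y, z in vertices]


def project(point: Vec3, f: float, u0: float, v0: float) -> Tuple[float, float]:
    """Pinhole projection of a reference-camera-frame point back onto the reference image."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (f * x / z + u0, f * y / z + v0)
```

Because each APC point was obtained by back-projecting its pixel, projecting the anchored object origin returns the selected pixel, so the object appears at that spot in TP.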

When the virtual object processor 150 receives a request for the virtual object list from the user device 10, it loads the virtual object list stored in the storage module 22 and provides it to the user device 10 in thumbnail form via the communication module 21. When it receives from the user device 10 identification information for a virtual object VO selected by the user, the virtual object processor 150 provides the corresponding virtual object VO stored in the storage module 22 to the user device 10 via the communication module 21. The virtual object processor 150 may also transmit, via the communication module 21, a guide message prompting the user to photograph the surface on which the virtual object is to be composited.

The virtual object processing unit 150 may also receive a pose of the virtual object from the user device 10. In this case, the virtual object processing unit 150 provides the virtual object and its received pose to the object augmentation unit 140.
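The request/response behaviour of the virtual object processing unit can be sketched as a small in-memory service. This is a hypothetical illustration: the class and method names, the in-memory dictionary standing in for the storage module 22, the guide-message wording, and returning the guide message together with the selected object are all assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical guide text; the actual wording is not given in the source.
GUIDE_MESSAGE = "Photograph the surface for the virtual object from at least two different poses."


@dataclass
class VirtualObject:
    object_id: str
    thumbnail: bytes  # small preview sent in the list response
    asset: bytes      # full model sent after selection


@dataclass
class VirtualObjectService:
    """Sketch of the virtual object processing unit 150 backed by an in-memory store."""
    store: Dict[str, VirtualObject] = field(default_factory=dict)
    poses: Dict[str, Tuple[float, ...]] = field(default_factory=dict)

    def list_thumbnails(self) -> List[Tuple[str, bytes]]:
        """Answer a virtual-object-list request with (id, thumbnail) pairs."""
        return [(o.object_id, o.thumbnail) for o in self.store.values()]

    def select(self, object_id: str) -> Tuple[Optional[VirtualObject], str]:
        """Return the chosen object (None if unknown) plus the photographing guide message."""
        return self.store.get(object_id), GUIDE_MESSAGE

    def receive_pose(self, object_id: str,
                     pose: Tuple[float, ...]) -> Tuple[VirtualObject, Tuple[float, ...]]:
        """Record a pose from the user device and hand object + pose on for augmentation."""
        self.poses[object_id] = tuple(pose)
        return self.store[object_id], self.poses[object_id]
```

Keeping thumbnails separate from the full asset mirrors the two-stage exchange in the text: a lightweight list first, the selected object only after the user chooses.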

Next, a method of generating the estimation model through learning according to an embodiment of the present invention will be described. Because learning requires training data, the procedure for preparing the training data is described first. FIG. 10 is a flowchart illustrating a method of preparing training data according to an embodiment of the present invention.

Referring to FIG. 10, in step S110 the learning data generation unit 110 collects a training reference image (TP) and a training contrast image (CP) captured in different poses, together with camera parameters. Here, the camera parameters include the intrinsic parameter, i.e., the focal length (fl), and the extrinsic parameters, i.e., the position and Euler angles, of the camera or camera unit 12 that captured the training reference image (TP) and the training contrast image (CP).

In step S120, the learning data generation unit 110 compares the census mean vector of a reference window (TW) centered on a feature point (FP) of the training reference image (TP) with the census mean vector of a contrast window (CW) that moves within a search range (R) of the training contrast image (CP). It thereby detects the position of the contrast window (CW) at which the matching cost between the training reference image (TP) and the training contrast image (CP) is minimal. In step S130, it calculates the disparity (d), which is the difference between the position of this minimum-cost contrast window (CW) and the position of the reference window (TW), i.e., the initial position L0 of the contrast window.
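Steps S120 and S130 can be sketched in pure Python. Several details are assumptions, since the text does not define them: the "census mean vector" is read as a modified census transform (each window pixel is compared with the window mean), the matching cost is the Hamming distance between bit vectors, and the image pair is rectified so the contrast window slides horizontally to the left of L0. Function names are illustrative.

```python
from typing import List

Image = List[List[int]]


def census_mean_vector(img: Image, cy: int, cx: int, r: int) -> List[int]:
    """Bit vector of a (2r+1)x(2r+1) window: 1 where a pixel exceeds the window mean."""
    pixels = [img[y][x] for y in range(cy - r, cy + r + 1) for x in range(cx - r, cx + r + 1)]
    mean = sum(pixels) / len(pixels)
    return [1 if p > mean else 0 for p in pixels]


def matching_cost(a: List[int], b: List[int]) -> int:
    """Hamming distance between two census vectors."""
    return sum(x != y for x, y in zip(a, b))


def disparity_at(tp: Image, cp: Image, fy: int, fx: int, r: int, search: int) -> int:
    """Slide the contrast window from L0 = fx leftwards over the search range and
    return the shift d with the minimum matching cost (the first one wins on ties)."""
    ref = census_mean_vector(tp, fy, fx, r)
    best_d, best_cost = 0, None
    for d in range(search + 1):
        x = fx - d
        if x - r < 0:  # the window would leave the image
            break
        cost = matching_cost(ref, census_mean_vector(cp, fy, x, r))
        if best_cost is None or cost < best_cost:
            best_d, best_cost = d, cost
    return best_d
```

Because each bit only records "above or below the local mean", this kind of cost is insensitive to brightness differences between the two exposures, which is a common reason for choosing census-style matching.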

In step S140, the learning data generation unit 110 derives the verification depth map (GTD) of the training reference image (TP) by computing each depth according to Equation 1 as the ratio of the product of the baseline distance (bd) and the focal length (fl) to the disparity (d), i.e., depth = (bd × fl) / d.
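The depth relation of step S140 is the standard stereo triangulation formula. A minimal Python sketch follows, assuming bd in metres and fl and d in pixels so that depth comes out in metres. Returning `None` for non-positive disparity is an illustrative choice for pixels with no valid match.

```python
from typing import List, Optional


def depth_from_disparity(d: float, bd: float, fl: float) -> Optional[float]:
    """depth = (bd * fl) / d; zero or negative disparity yields no valid depth."""
    if d <= 0:
        return None
    return bd * fl / d


def depth_map(disparities: List[List[float]], bd: float, fl: float) -> List[List[Optional[float]]]:
    """Apply the relation pixel-wise to build a verification depth map (GTD)."""
    return [[depth_from_disparity(d, bd, fl) for d in row] for row in disparities]


if __name__ == "__main__":
    # 0.1 m baseline, 500 px focal length, 25 px disparity -> 2.0 m
    print(depth_from_disparity(25, 0.1, 500))
```

Depth is inversely proportional to disparity, so a one-pixel disparity error costs far more accuracy for distant points than for near ones.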

Then, in step S150, the learning data generation unit 110 obtains a transformation matrix from the camera parameters and, according to Equation 2, uses it to obtain three-dimensional coordinates from the two-dimensional coordinates and depth of every pixel of the training reference image (TP). In step S160, the learning data generation unit 110 generates a verification point cloud (GTP) whose points are the obtained three-dimensional coordinates. The verification depth map (GTD) and verification point cloud (GTP) derived in this way are mapped to the training reference image (TP) and the training contrast image (CP) and stored together in the storage module 22.
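Steps S150 and S160 amount to back-projecting each pixel and applying the camera pose. Equation 2 itself is not reproduced here, so the following Python sketch is one plausible form under stated assumptions: a pinhole camera with focal length `fl` and a hypothetical principal point (`cx`, `cy`), Euler angles in radians composed in ZYX (yaw, pitch, roll) order, and a camera-to-world pose made of that rotation plus the camera position `t`.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def euler_to_matrix(roll: float, pitch: float, yaw: float) -> Mat3:
    """R = Rz(yaw) @ Ry(pitch) @ Rx(roll), radians (ZYX order is an assumption)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def pixel_to_world(u: float, v: float, z: float, fl: float, cx: float, cy: float,
                   R: Mat3, t: Vec3) -> Vec3:
    """Back-project pixel (u, v) at depth z, then apply the camera-to-world pose [R | t]."""
    xc, yc, zc = (u - cx) * z / fl, (v - cy) * z / fl, z
    return tuple(R[i][0] * xc + R[i][1] * yc + R[i][2] * zc + t[i] for i in range(3))


def point_cloud(depth: List[List[Optional[float]]], fl: float, cx: float, cy: float,
                R: Mat3, t: Vec3) -> List[Vec3]:
    """Verification point cloud (GTP): one 3D point per pixel with a valid (non-zero) depth."""
    return [pixel_to_world(u, v, z, fl, cx, cy, R, t)
            for v, row in enumerate(depth) for u, z in enumerate(row) if z]
```

Expressing the points in a common world frame, rather than per-camera coordinates, is what lets point clouds from different poses be compared; if Equation 2 keeps camera coordinates, `R` would simply be the identity and `t` zero.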

Once the training reference image (TP), training contrast image (CP), verification depth map (GTD), and verification point cloud (GTP) are prepared, the estimation model can be generated through learning. A method of generating the estimation model through learning according to an embodiment of the present invention is now described. FIG. 11 is a flowchart illustrating a method of generating an estimation model through learning according to an embodiment of the present invention.

Referring to FIG. 11, the prototype of the estimation model (EM) consists of a generation network (GN), comprising a depth generation network (DGN) and a coordinate generation network (PGN), and a discrimination network (DN), comprising a depth discrimination network (DDN) and a coordinate discrimination network (PDN). In step S210, the model generation unit 120 divides this prototype into a depth group, comprising the DGN and the DDN, and a coordinate group, comprising the PGN and the PDN, and performs group learning in which each group is trained separately. During this stage, the verification depth map (GTD), rather than the artificial depth map (ADM), is used as the input to the coordinate generation network (PGN).

When training of the depth group reaches a preset accuracy according to an evaluation metric, in step S220 the model generation unit 120 ends training of the depth group (DGN and DDN). Then, with the weights of the depth group fixed, it performs deep learning in which only the coordinate group (PGN and PDN) is trained, using the artificial depth map (ADM) generated by the depth generation network (DGN).

In this way, the verification depth map (GTD) is used to train the coordinate group while the depth group is being trained, and after training of the depth group ends, the coordinate group (PGN and PDN) is trained using the artificial depth map (ADM) generated by the depth generation network (DGN). This can significantly reduce the time required for training.

Step S210 described above is now explained in more detail. FIG. 12 is a flowchart illustrating a method of performing learning separately for the depth group and the coordinate group according to an embodiment of the present invention; it describes step S210 of FIG. 11 in more detail.

Referring to FIG. 12, in step S310 the model generation unit 120 prepares the training data by loading the training data stored in the storage module 22. The training data includes the training reference image (TP), training contrast image (CP), verification depth map (GTD), and verification point cloud (GTP). The method of generating this training data was described above with reference to FIG. 10.

Once the training data is prepared, the model generation unit 120 optimizes the weights of the depth discrimination network (DDN) in step S320. This optimization modifies the weights of the DDN, with the weights of the depth generation network (DGN) fixed, so that the DDN judges the verification depth map (GTD) as real and the artificial depth map (ADM) as fake. Step S320 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the verification depth map (GTD) to real (P(gtd) ≥ 0.5) and the label of the artificial depth map (ADM) to fake (P(adm) < 0.5). The model generation unit 120 then inputs the training reference image (TP) and the training contrast image (CP) to the DGN so that the DGN generates an artificial depth map (ADM). Next, one of the two depth maps, either the artificial depth map (ADM) or the verification depth map (GTD), is input to the DDN. The DDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input depth map (ADM/GTD) is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses a loss function to calculate the loss, i.e., the difference between the discrimination value and the previously set label, and optimizes the weights of the DDN so that this loss is minimized while the weights of the DGN remain fixed.

The model generation unit 120 also optimizes the weights of the depth generation network (DGN) in step S330. This optimization modifies the weights of the DGN, with the weights of the depth discrimination network (DDN) fixed, so that the DDN judges the artificial depth map (ADM) as real. Step S330 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the artificial depth map (ADM) to real (P(adm) ≥ 0.5). It then inputs the training reference image (TP) and the training contrast image (CP) to the DGN so that the DGN generates an artificial depth map (ADM), and inputs the ADM to the DDN. The DDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input ADM is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses the loss function to calculate the loss between the discrimination value and the label set for the ADM, i.e., real (P(adm) ≥ 0.5), and performs optimization that modifies the weights of the DGN so that this loss is minimized while the weights of the DDN remain fixed.
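The label scheme of steps S320 and S330 (and, identically, S340 and S350) is the standard alternating GAN update. The sketch below is a deliberately tiny stand-in, not the networks of the embodiment: a 1-D generator G(z) = a·z + c and a logistic discriminator D(x) = sigmoid(w·x + b), with binary cross-entropy assumed as the loss function, which the source leaves unspecified. The discriminator step labels real samples 1 and generated samples 0 and updates only w, b; the generator step relabels generated samples as 1 and updates only a, c.

```python
import math
from typing import List


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def bce(p: float, label: float, eps: float = 1e-12) -> float:
    """Binary cross-entropy between a predicted probability and a 0/1 label."""
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


class ToyGAN:
    """1-D stand-in: generator G(z) = a*z + c, discriminator D(x) = sigmoid(w*x + b)."""

    def __init__(self, a: float = 1.0, c: float = 0.0, w: float = 0.5, b: float = 0.0):
        self.a, self.c, self.w, self.b = a, c, w, b

    def G(self, z: float) -> float:
        return self.a * z + self.c

    def D(self, x: float) -> float:
        return sigmoid(self.w * x + self.b)

    def d_loss(self, real: List[float], zs: List[float]) -> float:
        # Real samples labelled 1 (real), generated samples labelled 0 (fake), as in S320.
        terms = [bce(self.D(x), 1.0) for x in real] + [bce(self.D(self.G(z)), 0.0) for z in zs]
        return sum(terms) / len(terms)

    def g_loss(self, zs: List[float]) -> float:
        # Generated samples relabelled 1 (real), as in S330.
        return sum(bce(self.D(self.G(z)), 1.0) for z in zs) / len(zs)

    def d_step(self, real: List[float], zs: List[float], lr: float = 0.01) -> None:
        """Gradient step on w, b only; generator parameters a, c stay fixed."""
        samples = [(x, 1.0) for x in real] + [(self.G(z), 0.0) for z in zs]
        # d(BCE)/d(logit) = D(x) - label for a sigmoid output (eps ignored).
        gw = sum((self.D(x) - y) * x for x, y in samples) / len(samples)
        gb = sum(self.D(x) - y for x, y in samples) / len(samples)
        self.w -= lr * gw
        self.b -= lr * gb

    def g_step(self, zs: List[float], lr: float = 0.01) -> None:
        """Gradient step on a, c only; discriminator parameters w, b stay fixed."""
        ga = gc = 0.0
        for z in zs:
            dlogit = self.D(self.G(z)) - 1.0  # d(-log D)/d(logit)
            ga += dlogit * self.w * z          # chain rule through x = a*z + c
            gc += dlogit * self.w
        self.a -= lr * ga / len(zs)
        self.c -= lr * gc / len(zs)


if __name__ == "__main__":
    gan = ToyGAN()
    real, zs = [2.0, 2.2, 1.8], [0.0, 0.5, -0.5]
    for epoch in range(200):
        gan.d_step(real, zs)  # S320-style step
        gan.g_step(zs)        # S330-style step
    print(round(gan.D(gan.G(0.0)), 3))
```

Freezing the other network's parameters in each step is what keeps the two objectives from fighting within a single update.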

According to the present invention, steps S340 and S350 are performed as follows, concurrently with steps S320 and S330 described above.

In step S340, the model generation unit 120 optimizes the weights of the coordinate discrimination network (PDN). This optimization modifies the weights of the PDN, with the weights of the coordinate generation network (PGN) fixed, so that the PDN judges the verification point cloud (GTP) as real and the artificial point cloud (APC) as fake. Step S340 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the verification point cloud (GTP) to real (P(gtp) ≥ 0.5) and the label of the artificial point cloud (APC) to fake (P(apc) < 0.5). It then inputs the training reference image (TP) and the verification depth map (GTD) from the training data to the PGN so that the PGN generates an artificial point cloud (APC). Next, one of the two point clouds, either the APC or the GTP, is input to the PDN. The PDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input point cloud (APC/GTP) is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses the loss function to calculate the loss between the discrimination value and the previously set label, and performs optimization that modifies the weights of the PDN so that this loss is minimized while the weights of the PGN remain fixed.

The model generation unit 120 also optimizes the weights of the coordinate generation network (PGN) in step S350. This optimization modifies the weights of the PGN, with the weights of the coordinate discrimination network (PDN) fixed, so that the PDN judges the artificial point cloud (APC) as real. Step S350 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the artificial point cloud (APC) to real (P(apc) ≥ 0.5). It then inputs the training reference image (TP) and the verification depth map (GTD) from the training data to the PGN so that the PGN generates an artificial point cloud (APC), and inputs the APC to the PDN. The PDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input APC is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses the loss function to calculate the loss between the discrimination value and the label set for the APC, i.e., real (P(apc) ≥ 0.5), and performs optimization that modifies the weights of the PGN so that this loss is minimized while the weights of the PDN remain fixed.

Meanwhile, in step S360 the model generation unit 120 determines whether the training result satisfies the training termination condition. The model generation unit 120 may determine that the termination condition is satisfied when the accuracy of the depth generation network (DGN) and the depth discrimination network (DDN), measured by a predetermined evaluation metric, reaches a preset accuracy, regardless of the accuracy of the coordinate generation network (PGN) and the coordinate discrimination network (PDN). If the determination in step S360 is that the termination condition is not satisfied, steps S310 to S360 are repeated. If it is satisfied, the model generation unit 120 ends group learning in step S370.

Next, step S220 described above is explained in more detail. FIG. 13 is a flowchart illustrating a deep learning method in which the coordinate group is trained in depth according to an embodiment of the present invention; it describes step S220 of FIG. 11 in more detail.

Referring to FIG. 13, in step S410 the model generation unit 120 prepares the training data by loading the training data stored in the storage module 22. The training data includes the training reference image (TP), training contrast image (CP), verification depth map (GTD), and verification point cloud (GTP). The method of generating this training data was described above with reference to FIG. 10. Once this training data is prepared, in step S420 the model generation unit 120 inputs the training reference image (TP) and the training contrast image (CP) to the depth generation network (DGN) so that the DGN generates an artificial depth map (ADM).

Then, in step S430, the model generation unit 120 derives a flag indicating whether the artificial depth map (ADM) was judged real or fake, based on the discrimination value of the depth discrimination network (DDN) for the ADM. That is, in step S430 the model generation unit 120 inputs the ADM to the DDN so that the DDN calculates a discrimination value indicating the probability that the ADM is real rather than fake. The model generation unit 120 then converts this discrimination value into a flag [1/0] according to a predetermined reference value, which may be 0.5. For example, the flag may be 1 if the discrimination value is 0.5 or more, and 0 if it is less than 0.5. This flag [1/0] is then input to the coordinate discrimination network (PDN).
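Step S430 is a simple threshold on the DDN output. A minimal Python sketch follows, with the reference value 0.5 taken from the text. The range check and the batch helper are illustrative additions, not part of the described method.

```python
from typing import Iterable, List


def to_flag(discrimination_value: float, reference: float = 0.5) -> int:
    """Convert the DDN's probability-of-real into the flag [1/0] of step S430."""
    if not 0.0 <= discrimination_value <= 1.0:
        raise ValueError("discrimination value must be a probability")
    return 1 if discrimination_value >= reference else 0


def flags_for_batch(values: Iterable[float], reference: float = 0.5) -> List[int]:
    """Flags for a batch of artificial depth maps."""
    return [to_flag(v, reference) for v in values]
```

Note that a value exactly at the reference maps to 1, matching "0.5 or more" in the text.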

Next, in step S440, the model generation unit 120 optimizes the weights of the coordinate discrimination network (PDN). Using the training reference image (TP), the artificial depth map (ADM), and the flag [1/0], this optimization modifies the weights of the PDN, with the weights of the coordinate generation network (PGN) fixed, so that the PDN judges the verification point cloud (GTP) as real and the artificial point cloud (APC) as fake. Step S440 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the verification point cloud (GTP) to real (P(gtp) ≥ 0.5) and the label of the artificial point cloud (APC) to fake (P(apc) < 0.5). It then inputs the training reference image (TP) and the artificial depth map (ADM) to the PGN so that the PGN generates an artificial point cloud (APC). Next, one of the two point clouds, either the APC or the GTP, is input to the PDN together with the flag [1/0]. The PDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input point cloud (APC/GTP) is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses the loss function to calculate the loss between the discrimination value and the previously set label, and performs optimization that modifies the weights of the PDN so that this loss is minimized while the weights of the PGN remain fixed.
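The source does not say how the flag [1/0] is fed to the coordinate discrimination network (PDN). One plausible reading, offered here only as an assumption in the style of a conditional GAN, is to append it as an extra input channel to every point. The helper names below are hypothetical; the labels follow step S440 (GTP real, APC fake).

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def build_pdn_input(points: Sequence[Point3], flag: int) -> List[Tuple[float, ...]]:
    """Append the DDN flag as a fourth channel to every point (assumed conditioning scheme)."""
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    return [(x, y, z, float(flag)) for x, y, z in points]


def pdn_batch(apc: Sequence[Point3], gtp: Sequence[Point3],
              flag: int) -> List[Tuple[List[Tuple[float, ...]], float]]:
    """Pair each conditioned cloud with its S440 label: GTP -> 1.0 (real), APC -> 0.0 (fake)."""
    return [(build_pdn_input(gtp, flag), 1.0), (build_pdn_input(apc, flag), 0.0)]
```

Conditioning on the flag gives the PDN information about how convincing the upstream depth map was, so it can weigh coordinate errors that stem from a poor ADM differently.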

The model generation unit 120 also optimizes the weights of the coordinate generation network (PGN) in step S450. Using the training reference image (TP), the artificial depth map (ADM), and the flag [1/0], this optimization modifies the weights of the PGN, with the weights of the coordinate discrimination network (PDN) fixed, so that the PDN judges the artificial point cloud (APC) as real. Step S450 proceeds in more detail as follows. First, the model generation unit 120 sets the label of the artificial point cloud (APC) to real (P(apc) ≥ 0.5). It then inputs the training reference image (TP) and the artificial depth map (ADM) derived earlier in step S420 to the PGN so that the PGN generates an artificial point cloud (APC), and inputs the APC to the PDN. The PDN performs a plurality of operations in which weights between a plurality of layers are applied, and calculates a discrimination value indicating the probability that the input APC is real rather than fake. Once the discrimination value is calculated, the model generation unit 120 uses the loss function to calculate the loss between the discrimination value and the label set for the APC, i.e., real (P(apc) ≥ 0.5), and performs optimization that modifies the weights of the PGN so that this loss is minimized while the weights of the PDN remain fixed.

Meanwhile, in step S460 the model generation unit 120 determines whether the training result satisfies the training termination condition. The model generation unit 120 may determine that the termination condition is satisfied when the accuracy of the coordinate generation network (PGN) and the coordinate discrimination network (PDN), measured by a predetermined evaluation metric, reaches a preset accuracy. If the determination in step S460 is that the termination condition is not satisfied, steps S410 to S460 are repeated. If it is satisfied, the model generation unit 120 ends deep learning in step S470. The estimation model (EM) is thereby generated.
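The overall two-phase schedule of FIG. 11, with the termination checks of steps S360 and S460, can be sketched as a driver loop. The per-epoch work and the accuracy metrics are passed in as callables because the source does not specify them. The `max_epochs` safeguard is a hypothetical addition. Each phase runs an epoch first and then checks its condition, following the flowcharts.

```python
from typing import Callable, Tuple


def train_estimation_model(group_epoch: Callable[[], None],
                           deep_epoch: Callable[[], None],
                           depth_accuracy: Callable[[], float],
                           coord_accuracy: Callable[[], float],
                           depth_target: float,
                           coord_target: float,
                           max_epochs: int = 10_000) -> Tuple[int, int]:
    """Two-phase schedule; returns the number of epochs spent in each phase."""
    # Phase 1, group learning (S310-S370): both groups updated, PGN fed the GTD.
    # Termination looks only at the depth group (S360).
    n_group = 0
    while True:
        group_epoch()
        n_group += 1
        if depth_accuracy() >= depth_target or n_group >= max_epochs:
            break
    # Phase 2, deep learning (S410-S470): depth-group weights frozen, PGN fed the ADM.
    # Termination looks at the coordinate group (S460).
    n_deep = 0
    while True:
        deep_epoch()
        n_deep += 1
        if coord_accuracy() >= coord_target or n_deep >= max_epochs:
            break
    return n_group, n_deep
```

Checking only the depth group in phase 1 reflects the text: the coordinate group's accuracy at that stage does not matter, because it is retrained on ADM inputs in phase 2.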

As described above, in the learning procedure according to the embodiment, i.e., the optimization procedure, optimization of the depth discrimination network (DDN) and optimization of the depth generation network (DGN) are repeated alternately, and optimization of the coordinate discrimination network (PDN) and optimization of the coordinate generation network (PGN) are likewise repeated alternately. Here, the model generation unit 120 may, depending on the gradient, use different numbers of training samples for optimizing the discrimination networks (DDN, PDN) and for optimizing the generation networks (DGN, PGN). The ultimate goal of the model generation unit 120 is for the PDN to output a probability of 0.5 (50%) as to whether the artificial point cloud (APC) generated by the PGN is real or fake, i.e., for the PDN to be unable to distinguish it from real data. If the gradient is small, the amount of training data required becomes too large and training slows down, so it is desirable to increase the gradient. Accordingly, the model generation unit 120 calculates the gradient each time an epoch, comprising optimization of the discrimination networks (DDN, PDN) and optimization of the generation networks (DGN, PGN), ends, and when repeating the optimization procedure it may use a number of training samples inversely proportional to the gradient. Thus, for example, each epoch may be changed to perform one optimization of the discrimination networks (DDN, PDN) and two optimizations of the generation networks (DGN, PGN).
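The gradient-dependent sample count can be sketched as follows. The source does not define which gradient is measured or the proportionality constant, so this Python sketch assumes the L2 norm of the gradient collected at the end of an epoch, and `base`, `min_n` and `max_n` are hypothetical tuning values. The 1:2 discriminator/generator split per epoch follows the example in the text.

```python
import math
from typing import List, Sequence


def gradient_norm(grads: Sequence[float]) -> float:
    """L2 norm of the gradient collected at the end of an epoch."""
    return math.sqrt(sum(g * g for g in grads))


def samples_for_next_epoch(norm: float, base: int = 64,
                           min_n: int = 8, max_n: int = 1024) -> int:
    """Number of training samples inversely proportional to the gradient norm, clamped."""
    if norm <= 0.0:
        return max_n
    return max(min_n, min(max_n, round(base / norm)))


def epoch_plan(d_steps: int = 1, g_steps: int = 2) -> List[str]:
    """Per-epoch update order, e.g. one discriminator and two generator optimizations."""
    return ["D"] * d_steps + ["G"] * g_steps
```

Clamping keeps a vanishing gradient from requesting an unbounded batch, and keeps a large gradient from shrinking the batch to a few noisy samples.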

Next, a method for providing augmented reality according to the embodiment of the present invention described above will be explained. FIG. 14 is a flowchart illustrating a method for providing augmented reality according to an embodiment of the present invention.

Referring to FIG. 14, in step S510 the control unit 17 of the user device 10 selects, according to the user's selection, a virtual object (VO) to be augmented. In step S510, when the control unit 17 of the user device 10 accesses the augmentation server 20 through the communication unit 11 and requests a list of virtual objects, the virtual object processing unit 150 of the augmentation server 20 loads the virtual object list stored in the storage unit 16 and provides it to the user device 10 in thumbnail form through the communication module 21. The control unit 17 of the user device 10 receives the virtual object list through the communication unit 11 and displays it on the display unit 15. The user may browse the displayed list and select a desired virtual object (VO).
In response to the user's input selecting the virtual object (VO), the control unit 17 transmits information identifying that virtual object (VO) to the augmentation server 20 through the communication unit 11, thereby selecting it. Once the virtual object (VO) is selected, in step S520 the virtual object processing unit 150 of the augmentation server 20 transmits, through the communication module 21, a guide message instructing the user to photograph the surface on which the virtual object is to be synthesized. The guide message may instruct the user to photograph that surface in at least two different poses. Also in step S520, the virtual object processing unit 150 transmits, through the communication module 21, a guide message instructing the user to move the virtual object onto the surface on which it is to be synthesized, using an input means provided by the user device 10, for example the input unit 14 or the display unit 15 serving as a touch screen. Accordingly, the user moves the user device 10, and with it the direction in which the lens of the camera unit 12 faces, while photographing the surface on which the virtual object (VO) is to be synthesized. The user also moves the virtual object (VO) on the screen by touch input or the like.
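The S510 to S520 exchange can be pictured with a small in-memory stand-in for the virtual object processing unit 150. The class and method names, the catalog structure, and the wording of the guide message are all hypothetical; the patent does not define a message format.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class VirtualObject:
    object_id: str
    thumbnail: bytes  # hypothetical thumbnail payload


GUIDE_MESSAGE = ("Photograph the surface from at least two different poses, "
                 "then move the virtual object onto it.")


class VirtualObjectService:
    """Hypothetical stand-in for the virtual object processing unit 150."""

    def __init__(self, catalog: List[VirtualObject]) -> None:
        self._catalog: Dict[str, VirtualObject] = {o.object_id: o for o in catalog}
        self.selected: Optional[VirtualObject] = None

    def list_objects(self) -> List[Tuple[str, bytes]]:
        # S510: (id, thumbnail) pairs for the user device to display
        return [(o.object_id, o.thumbnail) for o in self._catalog.values()]

    def select(self, object_id: str) -> str:
        # S510 -> S520: record the selection, reply with the guide message
        if object_id not in self._catalog:
            raise KeyError(f"unknown virtual object: {object_id}")
        self.selected = self._catalog[object_id]
        return GUIDE_MESSAGE
```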

Before the direction of the lens of the camera unit 12 is fixed, a reference image and a contrast image will have been captured in at least two different poses. Accordingly, in step S530 the control unit 17 transmits the reference image and the contrast image, captured by the camera unit 12 in different poses, to the augmentation server 20 through the communication unit 11. In step S540, the control unit 17 also transmits the pose of the virtual object on the screen, set according to the user's input, to the augmentation server 20 through the communication unit 11.
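The data sent in steps S530 and S540 might be bundled as below. This sketch assumes, purely for illustration, that the user device attaches its own camera pose estimate to each image so the server can check that the two shots really differ; the patent only states that the images are taken in different poses, and the field names and 6-tuple pose encoding are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical pose encoding: (x, y, z, roll, pitch, yaw)
Pose = Tuple[float, float, float, float, float, float]


@dataclass
class CapturedFrame:
    image: bytes       # encoded image from the camera unit 12
    camera_pose: Pose  # assumed to be attached by the user device


@dataclass
class AugmentRequest:
    reference: CapturedFrame  # S530
    contrast: CapturedFrame   # S530
    object_pose: Pose         # S540: virtual object pose set on screen

    def validate(self) -> None:
        if not self.reference.image or not self.contrast.image:
            raise ValueError("both images are required")
        if self.reference.camera_pose == self.contrast.camera_pose:
            raise ValueError("images must be taken from different poses")
```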

The estimator 130 of the augmentation server 20 then receives, through the communication module 21, the reference image and the contrast image captured in different poses from the user device 10. In step S550, the estimator 130 performs, through the estimation model (EM), a plurality of operations to which weights learned between a plurality of layers are applied on the reference image and the contrast image, thereby generating a point cloud representing the three-dimensional coordinates of the reference image, that is, an artificial point cloud (APC).
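Per claim 1, the estimation model (EM) chains a depth generation network (DGN), which maps the reference and contrast images to an artificial depth map, with a coordinate generation network (PGN), which maps the reference image and that depth map to the artificial point cloud; the discrimination networks are only needed during training. A minimal sketch of that inference path, with the two networks passed in as plain callables (the list-of-lists image types are an assumption):

```python
from typing import Callable, List, Tuple

Image = List[List[float]]      # H x W grayscale (illustrative)
DepthMap = List[List[float]]   # H x W depth values
PointCloud = List[Tuple[float, float, float]]


def estimate_point_cloud(reference: Image, contrast: Image,
                         dgn: Callable[[Image, Image], DepthMap],
                         pgn: Callable[[Image, DepthMap], PointCloud]) -> PointCloud:
    """Inference path of the estimation model (EM): DGN, then PGN.
    DDN and PDN are training-time components and are not called here."""
    artificial_depth = dgn(reference, contrast)
    return pgn(reference, artificial_depth)
```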

Next, in step S560 the object augmentation unit 140 of the augmentation server 20 receives the virtual object (VO) and its pose through the virtual object processing unit 150, and generates an augmented reality image by matching the virtual object (VO) onto the coordinates of the artificial point cloud (APC) according to that pose.
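The patent does not detail how the pose is applied in step S560. One plausible reading, sketched below under stated assumptions, anchors the object at the APC point under the screen pixel the user chose and rotates it about the vertical axis; the row-major APC layout, the `(u, v)` pixel plus yaw pose encoding, and the function name are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def place_virtual_object(vertices: Sequence[Point], apc: Sequence[Point],
                         width: int, pixel: Tuple[int, int],
                         yaw_deg: float = 0.0) -> List[Point]:
    """Return the object's vertices expressed in APC coordinates.

    apc holds one (X, Y, Z) point per reference-image pixel, row-major;
    pixel is the (u, v) screen position chosen by the user.
    """
    u, v = pixel
    ax, ay, az = apc[v * width + u]        # anchor: surface point under the pixel
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    placed = []
    for x, y, z in vertices:
        rx, rz = c * x + s * z, -s * x + c * z  # rotate about the vertical (Y) axis
        placed.append((rx + ax, y + ay, rz + az))
    return placed
```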

Then, in step S570 the object augmentation unit 140 of the augmentation server 20 transmits the augmented reality image to the user device 10. According to an alternative embodiment, instead of transmitting the augmented reality image itself, the object augmentation unit 140 may transmit only the matching coordinates, that is, the coordinates in the artificial point cloud (APC) at which the virtual object (VO) has been matched.

When the augmented reality image is received, the control unit 17 of the user device 10 may display it on the display unit 15 in step S580. According to the alternative embodiment, when the matching coordinates are received instead, the control unit 17 of the user device 10 may generate an augmented reality image based on the artificial point cloud (APC) and the matching coordinates, and display the generated image on the display unit 15.
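In the alternative embodiment, the user device has to render the object itself from the matching coordinates. A common way to do this, assumed here rather than taken from the patent, is to project each matched 3-D coordinate into the reference image with a pinhole camera model; the intrinsics `fx`, `fy`, `cx`, `cy` and the assumption that the APC is expressed in the reference camera's coordinate frame are illustrative.

```python
from typing import List, Sequence, Tuple


def project_to_pixel(point: Tuple[float, float, float], fx: float, fy: float,
                     cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a point in reference-camera coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return fx * x / z + cx, fy * y / z + cy


def project_matching_coordinates(points: Sequence[Tuple[float, float, float]],
                                 fx: float, fy: float, cx: float, cy: float
                                 ) -> List[Tuple[int, int]]:
    """Integer pixel positions at which to draw the matched virtual object."""
    pixels = []
    for p in points:
        u, v = project_to_pixel(p, fx, fy, cx, cy)
        pixels.append((round(u), round(v)))
    return pixels
```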

Meanwhile, the method according to the embodiment of the present invention described above may be implemented as a program readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above using several preferred embodiments, these embodiments are illustrative and not restrictive. Those of ordinary skill in the art to which the present invention pertains will understand that various changes and modifications may be made in accordance with the doctrine of equivalents without departing from the spirit of the present invention and the scope of rights set forth in the appended claims.

10: user device
20: augmentation server
110: learning data generation unit
120: model generation unit
130: estimator
140: object augmentation unit
150: virtual object processing unit

Claims

A device for providing augmented reality, comprising:
a communication module for receiving, from a user device, a reference image and a contrast image taken in different poses;
an estimator configured to generate an artificial point cloud representing three-dimensional coordinates of the reference image by performing, through a learned estimation model, a plurality of operations to which weights learned between a plurality of layers are applied on the reference image and the contrast image; and
an object augmentation unit for matching a virtual object based on the artificial point cloud,
wherein the estimation model comprises:
a depth generation network which, when a reference image for learning and a contrast image for learning are input, generates an artificial depth map through a plurality of operations to which a plurality of inter-layer weights are applied on the reference image for learning and the contrast image for learning;
a depth discrimination network which, when either a depth map for verification or the artificial depth map is input, outputs as a probability whether the input depth map is real or fake;
a coordinate generation network which, when the reference image for learning and the artificial depth map are input, generates an artificial point cloud through a plurality of operations to which a plurality of inter-layer weights are applied on the reference image for learning and the artificial depth map; and
a coordinate discrimination network which, when either a point cloud for verification or the artificial point cloud is input, outputs as a probability whether the input point cloud is real or fake.

The device of claim 1, wherein
the object augmentation unit transmits, to the user device, either an augmented reality image in which the virtual object is matched or matching coordinates indicating the coordinates at which the virtual object is matched.

The device of claim 1, further comprising:
a learning data generation unit which
collects a reference image for learning, a contrast image for learning, and camera parameters taken in different poses,
calculates a disparity by detecting, using the calculated average vector, the position at which the matching cost between a reference window of the reference image for learning and a contrast window of the contrast image for learning is minimum,
derives, using the camera parameters and the disparity, a depth map for verification indicating the depths of all pixels of the reference image, and
generates, based on the depth map for verification, a point cloud for verification indicating the three-dimensional coordinates of the reference image; and
a model generation unit which trains the estimation model to generate an artificial point cloud using training data including the reference image for learning, the contrast image for learning, the depth map for verification, and the point cloud for verification.
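The learning data generation steps of claim 3 follow a classic stereo pipeline. The sketch below is a simplification under stated assumptions: the images are rectified so matching pixels share a row, the matching cost is a sum of absolute differences (SAD) rather than the cost built from the "average vector", which this excerpt does not define, and depth and back-projection use the standard pinhole relations Z = f*B/d, X = (u - cx)*Z/f, Y = (v - cy)*Z/f with a single focal length f in pixels and baseline B. Border pixels and pixels whose best disparity is zero are left at depth 0.0, and near the left edge the disparity search range is truncated.

```python
from typing import List, Tuple

Image = List[List[float]]


def disparity_at(ref: Image, con: Image, u: int, v: int,
                 half: int, max_disp: int) -> int:
    """Shift d minimising the SAD cost between the (2*half+1)^2 window at
    (u, v) in the reference image and the window at (u - d, v) in the
    contrast image. Assumes rectified images and an interior pixel."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if u - half - d < 0:
            break  # window would leave the contrast image
        cost = 0.0
        for dv in range(-half, half + 1):
            for du in range(-half, half + 1):
                cost += abs(ref[v + dv][u + du] - con[v + dv][u + du - d])
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def verification_data(ref: Image, con: Image, focal: float, baseline: float,
                      cx: float, cy: float, half: int = 1, max_disp: int = 4
                      ) -> Tuple[List[List[float]], List[Tuple[float, float, float]]]:
    """Depth map for verification (Z = f*B/d) and the matching point cloud."""
    h, w = len(ref), len(ref[0])
    depth = [[0.0] * w for _ in range(h)]  # 0.0 marks "no depth estimated"
    points = []
    for v in range(half, h - half):
        for u in range(half, w - half):
            d = disparity_at(ref, con, u, v, half, max_disp)
            if d == 0:
                continue  # zero disparity: point at infinity, depth undefined
            z = focal * baseline / d
            depth[v][u] = z
            points.append(((u - cx) * z / focal, (v - cy) * z / focal, z))
    return depth, points
```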

(Deleted)

The device of claim 3, wherein
the model generation unit
divides a prototype of the estimation model into a depth group including the depth generation network and the depth discrimination network and a coordinate group including the coordinate generation network and the coordinate discrimination network, and performs group learning for each group, and
when the learning of the depth group satisfies a preset accuracy according to an evaluation index, performs deep learning that trains only the coordinate group, including the coordinate generation network and the coordinate discrimination network, using the artificial depth map generated by the depth generation network while the weights of the depth group are fixed.
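The two-stage schedule of claims 5 and 6 can be outlined as control flow. Each callable is a hypothetical hook: `group_round` stands for one round of the alternating discriminator/generator updates on both groups (claim 6), `coord_round` for one round that updates only PDN and PGN with the depth group frozen, and the accuracy callables for the unspecified evaluation index. The target value and round limit are assumptions.

```python
from typing import Callable, List, Tuple


def train_in_groups(group_round: Callable[[], None],
                    depth_accuracy: Callable[[], float],
                    coord_round: Callable[[], None],
                    coord_accuracy: Callable[[], float],
                    target: float = 0.9,
                    max_rounds: int = 100) -> List[Tuple[str, int]]:
    """Stage 1 (group learning): alternate D/G updates in both groups until
    the depth group reaches the target. Stage 2 (deep learning): freeze the
    depth group and train only the coordinate group."""
    log: List[Tuple[str, int]] = []
    for r in range(max_rounds):
        group_round()            # DDN/DGN and PDN/PGN alternation (claim 6)
        if depth_accuracy() >= target:
            log.append(("depth_group_done", r))
            break
    else:
        raise RuntimeError("depth group did not reach the target accuracy")
    for r in range(max_rounds):
        coord_round()            # DGN/DDN weights stay fixed here
        if coord_accuracy() >= target:
            log.append(("coordinate_group_done", r))
            return log
    raise RuntimeError("coordinate group did not reach the target accuracy")
```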

The device of claim 5, wherein
the model generation unit performs the group learning by:
alternately performing depth discrimination network weight optimization, in which the weights of the depth discrimination network are corrected while the weights of the depth generation network are fixed so that the depth discrimination network determines the depth map for verification to be real and the artificial depth map to be fake, and
depth generation network weight optimization, in which the weights of the depth generation network are corrected while the weights of the depth discrimination network are fixed so that the depth discrimination network determines the artificial depth map to be real, while at the same time
alternately performing coordinate discrimination network weight optimization, in which the weights of the coordinate discrimination network are corrected while the weights of the coordinate generation network are fixed so that the coordinate discrimination network determines the point cloud for verification to be real and the artificial point cloud to be fake, and
coordinate generation network weight optimization, in which the weights of the coordinate generation network are corrected while the weights of the coordinate discrimination network are fixed so that the coordinate discrimination network determines the artificial point cloud to be real; and
repeating the depth discrimination network weight optimization, the depth generation network weight optimization, the coordinate discrimination network weight optimization, and the coordinate generation network weight optimization until the accuracy of the depth generation network and the depth discrimination network satisfies a preset accuracy according to a predetermined evaluation index.

The device of claim 6, wherein
the model generation unit performs the deep learning by:
generating an artificial depth map through the depth generation network from the reference image for learning and the contrast image for learning;
deriving, through the depth discrimination network, a flag indicating whether the artificial depth map generated by the depth generation network is determined to be real or fake;
generating an artificial point cloud through the coordinate generation network based on the reference image for learning and the artificial depth map generated by the depth generation network; and,
if the derived flag indicates that the artificial depth map is real,
alternately performing coordinate discrimination network weight optimization, in which the weights of the coordinate discrimination network are corrected while the weights of the coordinate generation network are fixed so that the coordinate discrimination network determines the point cloud for verification to be real and the artificial point cloud to be fake, and
coordinate generation network weight optimization, in which the weights of the coordinate generation network are corrected while the weights of the coordinate discrimination network are fixed so that the coordinate discrimination network determines the artificial point cloud to be real, and
repeating the coordinate discrimination network weight optimization and the coordinate generation network weight optimization until the accuracy of the coordinate generation network and the coordinate discrimination network satisfies a preset accuracy according to a predetermined evaluation index.
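Claim 7 gates the coordinate-group updates on the depth discrimination network's flag. A minimal sketch of that filtering step, assuming the flag is obtained by thresholding DDN's real-probability at 0.5 (the patent does not state a threshold); the networks are passed in as plain callables and the returned triples would feed the PDN/PGN alternation:

```python
from typing import Any, Callable, Iterable, List, Tuple


def flagged_coordinate_samples(samples: Iterable[Tuple[Any, Any]],
                               dgn: Callable[[Any, Any], Any],
                               ddn: Callable[[Any], float],
                               pgn: Callable[[Any, Any], Any],
                               threshold: float = 0.5) -> List[Tuple[Any, Any, Any]]:
    """Keep (reference, artificial depth map, artificial point cloud) triples
    whose artificial depth map DDN judges real; only these would drive the
    PDN/PGN weight optimization of claim 7."""
    kept = []
    for reference, contrast in samples:
        depth = dgn(reference, contrast)        # artificial depth map
        is_real = ddn(depth) >= threshold       # flag derived from DDN
        cloud = pgn(reference, depth)           # artificial point cloud
        if is_real:
            kept.append((reference, depth, cloud))
    return kept
```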

A method for providing augmented reality performed by a computing device, the method comprising:
receiving, by a communication module, a reference image and a contrast image taken in different poses from a user device;
generating, by an estimator, an artificial point cloud representing three-dimensional coordinates of the reference image by performing, through a learned estimation model, a plurality of operations to which weights learned between a plurality of layers are applied on the reference image and the contrast image; and
generating, by an object augmentation unit, an augmented reality image by matching a virtual object to the reference image based on the artificial point cloud,
the method further comprising, before receiving the reference image and the contrast image taken in the different poses:
performing, by a model generation unit, group learning for each group by dividing a prototype of the estimation model into a depth group including a depth generation network and a depth discrimination network and a coordinate group including a coordinate generation network and a coordinate discrimination network; and
when the learning of the depth group satisfies a preset accuracy according to an evaluation index, performing, by the model generation unit, deep learning that trains only the coordinate group, including the coordinate generation network and the coordinate discrimination network, using the artificial depth map generated by the depth generation network while the weights of the depth group are fixed.

The method of claim 8, further comprising, before performing the group learning and before receiving the reference image and the contrast image taken in the different poses:
collecting, by a learning data generation unit, a reference image for learning, a contrast image for learning, and camera parameters taken in different poses;
calculating, by the learning data generation unit, a disparity by detecting, using the calculated average vector, the position at which the matching cost between a reference window of the reference image for learning and a contrast window of the contrast image for learning is minimum;
deriving, by the learning data generation unit, a depth map for verification indicating the depths of all pixels of the reference image using the camera parameters and the disparity; and
generating, by the learning data generation unit, a point cloud for verification indicating the three-dimensional coordinates of the reference image based on the depth map for verification.

(Deleted)