KR20230114949A

KR20230114949A - Learning apparatus and method for three-dimensional object model generation, and method for generating three-dimensional object model

Info

Publication number: KR20230114949A
Application number: KR1020220011327A
Authority: KR
Inventors: 김재헌; 구본기
Original assignee: 한국전자통신연구원
Priority date: 2022-01-26
Filing date: 2022-01-26
Publication date: 2023-08-02

Abstract

A learning method for generating a 3D object model using images in a learning apparatus is provided. The learning method includes the steps of: learning the parameters of an encoder and occupancy network for each category using a plurality of models for each category stored in a 3D model DB; learning a latent vector for each model within each category using the parameters of the encoder and occupancy network for each category; and storing a latent vector for each model according to each category in a latent vector DB. Therefore, it is possible to secure real-time performance, eliminate missing areas, and secure a watertight model.

Description

Learning apparatus and method for generating 3D object model and method for generating 3D object model

The present disclosure relates to a learning apparatus and method for generating a 3D object model and to a method for generating a 3D object model, and more particularly to a learning apparatus and method, and a generation method, capable of generating a 3D object model from an image using an occupancy network.

Acquiring a model of a 3D object in real space has applications in fields such as virtual reality, augmented reality, and extended reality. Among the available approaches, obtaining a 3D model using only image information captured by a camera is simpler, and cheaper to set up, than using a 3D scanner, which must additionally project light.

However, obtaining a 3D model from image information alone takes a long time to produce the final 3D information, and requires first removing noise from the acquired 3D point cloud and then modeling it as a mesh model. The approach is therefore reasonable for applications that do not need real-time operation, but it is difficult to apply to fields such as virtual reality, augmented reality, and extended reality, where real-time performance is required. Furthermore, regions that are not visible are missing when the 3D point cloud is generated, and watertightness may not be secured when the mesh model is generated.

Accordingly, generating a model of a 3D object from images calls for a technique that simultaneously secures real-time performance, eliminates missing regions, and secures a watertight model.

The problem to be solved by the present disclosure is to provide a learning apparatus and method for generating a 3D object model, and a method for generating a 3D object model, that can simultaneously secure real-time performance, eliminate missing regions, and secure a watertight model.

According to one embodiment, a learning method for generating a 3D object model using images in a learning apparatus is provided. The learning method includes: learning parameters of an encoder and an occupancy network for each category using a plurality of models for each category stored in a 3D model DB; learning a latent vector for each model within each category using the parameters of the encoder and the occupancy network of that category; and storing the latent vector for each model in a latent vector DB according to each category.

Learning the parameters of the encoder and the occupancy network may include: generating, in the encoder, a latent vector from a 2D image of each model of any one category; outputting, in the occupancy network, an occupancy status and an RGB color for each point from the latent vector and the 3D coordinates of each point; calculating, as a loss, the difference between a plurality of 2D rendered images generated from the occupancy status and RGB color of each point and the multi-view rendered images of the corresponding model; and learning the parameters of the encoder and the occupancy network based on the loss.

The difference may include a difference in RGB color or in depth value at each pixel between the two images.

Alternatively, learning the parameters of the encoder and the occupancy network may include: generating, in the encoder, a latent vector from a 2D image of each model of any one category; outputting, in the occupancy network, an occupancy status and an RGB color for each point from the latent vector and the 3D coordinates of each point; calculating, as a loss, the 3D difference between a 3D mesh model generated from the occupancy status and RGB color of each point and the 3D mesh model of the corresponding model itself; and learning the parameters of the encoder and the occupancy network based on the loss.

The difference may include a coordinate difference between corresponding 3D points of the two models.

Learning the latent vector for each model within each category may include: setting the parameters of the encoder and the occupancy network corresponding to the category; generating, in the encoder, a latent vector of each model from a 2D image of each model in the category; outputting, in the occupancy network, an occupancy status and an RGB color for each point from the latent vector and the 3D coordinates of each point; calculating, as a loss, the difference between a plurality of 2D rendered images generated from the occupancy status and RGB color of each point and the multi-view rendered images of the corresponding model; and learning the parameters of the latent vector of each model generated by the encoder based on the loss.

Alternatively, learning the latent vector for each model within each category may include: setting the parameters of the encoder and the occupancy network corresponding to the category; generating, in the encoder, a latent vector of each model from a 2D image of each model in the category; outputting, in the occupancy network, an occupancy status and an RGB color for each point from the latent vector and the 3D coordinates of each point; calculating, as a loss, the 3D difference between a 3D mesh model generated from the occupancy status and RGB color of each point and the 3D mesh model of the corresponding model itself; and learning the parameters of the latent vector of each model generated by the encoder based on the loss.

According to another embodiment, a method is provided by which a 3D model generating apparatus generates, from an image captured by a camera, a 3D object model of an object in the image. The method includes: recognizing a model corresponding to each object in the image captured by the camera, using a plurality of models for each category stored in a 3D model DB; obtaining the latent vector of the recognized model from a latent vector DB that stores a latent vector for each model according to each category; and changing the parameters of the latent vector using the latent vector and the parameters of the occupancy network corresponding to the category of the recognized model.

The changing may include: outputting, in the occupancy network, an occupancy status and an RGB color for each point from the latent vector and the 3D coordinates of each point; calculating, as a loss, the difference between a plurality of 2D rendered images generated from the occupancy status and RGB color of each point and multi-view 2D images of the actual model in the images captured by the camera; and learning the parameters of the latent vector based on the loss.

The loss may include a difference in RGB color or in depth value at each pixel between the two images.

The changing may include retrieving, from a network parameter DB that stores the parameters of the occupancy network for each category, the parameters of the occupancy network corresponding to the category of the recognized model, and setting them.
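The generation steps summarized above (look up the stored latent vector, set the per-category occupancy network parameters, then change the latent vector) can be sketched as follows. This is only an illustration under stated assumptions: the dictionary-based databases, the function name `generate_object_model`, and the `toy_refine` callback are hypothetical and do not come from the patent.

```python
def generate_object_model(category, model_id, network_parameter_db, latent_vector_db, refine):
    """Hypothetical orchestration of the generation steps for one recognized model."""
    # Retrieve and set the occupancy network parameters for the recognized category.
    occupancy_params = network_parameter_db[category]
    # Obtain the stored latent vector of the recognized model.
    initial_latent = latent_vector_db[category][model_id]
    # Change (refine) the latent vector while the network parameters stay fixed.
    return refine(initial_latent, occupancy_params)


# Toy databases keyed by category, and by model within a category (assumed layout).
network_parameter_db = {"chair": {"name": "chair-occupancy-params"}}
latent_vector_db = {"chair": {"Model 0": [0.0, 0.0]}}


def toy_refine(latent, params):
    # Placeholder update: a real refinement would minimize a rendering loss.
    return [value + 0.1 for value in latent]


refined = generate_object_model(
    "chair", "Model 0", network_parameter_db, latent_vector_db, toy_refine
)
```

The stored latent vector is left untouched in this sketch; only the returned copy is changed.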

According to yet another embodiment, a learning apparatus for generating a 3D object model using images is provided. The learning apparatus includes: a 3D model DB that stores a plurality of models for each category; an encoder that generates a latent vector for each model from a 2D image of each model in each category; an occupancy network that outputs an occupancy status and an RGB color for each point from the latent vector of each model and the 3D coordinates of each point; a loss calculator that calculates, as a loss, the difference between the result for each model generated from the output of the occupancy network and the result for each model generated by rendering the corresponding model, and learns the parameters of the encoder and the occupancy network based on the loss; and a network parameter DB that stores the learned parameters of the encoder and the occupancy network for each category.

The learning apparatus may further include a neural renderer that generates a plurality of 2D images of each model from the output of the occupancy network and passes them to the loss calculator, and the loss calculator may calculate, as the loss, the color difference or depth difference between the plurality of 2D images of each model and the multi-view rendered images of the corresponding model.

The learning apparatus may further include a mesh model generator that generates a 3D mesh model of each model from the output of the occupancy network and passes it to the loss calculator, and the loss calculator may calculate, as the loss, the coordinate difference between 3D points of the generated 3D mesh model of each model and of the 3D mesh model of the corresponding model itself.

With the parameters of the occupancy network for each category held fixed, the loss calculator may learn the parameters of the latent vector output from the encoder for each model within each category.

The learning apparatus may further include a neural renderer that generates a plurality of 2D images of each model from the output of the occupancy network and passes them to the loss calculator, and the loss calculator may calculate, as a loss, the color difference or depth difference between the plurality of 2D images of each model and the multi-view rendered images of the corresponding model, and learn the parameters of the latent vector for each model based on the loss.

The learning apparatus may further include a mesh model generator that generates a 3D mesh model of each model from the output of the occupancy network and passes it to the loss calculator, and the loss calculator may calculate, as a loss, the coordinate difference between 3D points of the generated 3D mesh model of each model and of the 3D mesh model of the corresponding model itself, and learn the parameters of the latent vector for each model based on the loss.

The learning apparatus may further include a latent vector DB that stores the learned latent vector for each model within each category.

According to the embodiments, a model similar to the object observed in the image is retrieved from the DB, and the shape of that model is then changed to match the shape of the object, which makes it possible to build a 3D object model without being restricted by the appearance or type of the model. In addition, when generating 3D object models in fields such as virtual reality, augmented reality, and extended reality, this method can simultaneously secure real-time performance, eliminate missing regions, and secure a watertight model.

FIG. 1 is a diagram illustrating an example of a process of learning a neural network for each object category in a learning apparatus according to an embodiment.
FIG. 2 is a diagram illustrating another example of a process of learning a neural network for each object category in a learning apparatus according to an embodiment.
FIG. 3 is a diagram illustrating an example of a process of learning a latent vector for a specific model within a specific category in a learning apparatus according to an embodiment.
FIG. 4 is a diagram illustrating another example of a process of learning a latent vector for a specific model within a specific category in a learning apparatus according to an embodiment.
FIG. 5 is a diagram illustrating a process in which a 3D model generating apparatus according to an embodiment generates a final 3D model using the occupancy network learned through the learning processes shown in FIGS. 1 to 4 and the latent vector assigned to each model.
FIG. 6 is a flowchart illustrating the overall learning method of a learning apparatus according to an embodiment.
FIG. 7 is a flowchart illustrating a method of generating a 3D model in a 3D model generating apparatus according to an embodiment.
FIG. 8 is a diagram illustrating a computing device for generating a 3D object model according to an embodiment.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can readily practice them. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described herein. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification and claims, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

A learning apparatus and method for generating a 3D object model and a method for generating a 3D object model according to embodiments will now be described in detail with reference to the drawings.

The present disclosure starts from recognizing the category of an object in an image. Object recognition in images has been studied extensively, and with advances in deep learning the recognition rate and recognition speed for object categories in images have improved dramatically.

To represent 3D shape, the present disclosure uses an occupancy network, which is known in the field of deep learning. The occupancy network receives the 3D coordinates (x, y, z) of a point in space as input and outputs a value less than 0 if the point lies outside the object and is therefore unoccupied, and a value greater than 0 if the point lies inside the object and is therefore occupied. The set of 3D coordinates (x, y, z) at which this value is 0 can thus be regarded as the surface of the object. Generating a 3D mesh model from such an occupancy network is straightforward, and methods for doing so are widely known. In addition, given the 3D coordinates (x, y, z) as input, the occupancy network also outputs the RGB color of the corresponding point.
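The sign convention described above can be illustrated with a toy stand-in for an occupancy network. The sketch below is not a learned network: it assumes a unit sphere as the object and a constant color, and the function name `sphere_occupancy` is hypothetical.

```python
import math


def sphere_occupancy(x, y, z, radius=1.0):
    """Toy stand-in for an occupancy network: returns (signed value, RGB color).

    The value is positive inside the sphere, negative outside,
    and zero on the surface, matching the convention in the text.
    """
    value = radius - math.sqrt(x * x + y * y + z * z)
    rgb = (0.8, 0.2, 0.2)  # assumed constant color; a real network predicts it per point
    return value, rgb


inside, _ = sphere_occupancy(0.0, 0.0, 0.0)      # center of the object
outside, _ = sphere_occupancy(2.0, 0.0, 0.0)     # well outside the object
on_surface, _ = sphere_occupancy(1.0, 0.0, 0.0)  # exactly on the surface
```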

FIG. 1 is a diagram illustrating an example of a process of learning a neural network for each object category in a learning apparatus according to an embodiment.

Referring to FIG. 1, the learning apparatus includes a 3D model database (DB) 10, an encoder 20, an occupancy network 30, a neural renderer 40, a loss calculator 50, and a network parameter DB 60.

The 3D model DB 10 stores a plurality of 3D models for each category. For example, various 3D models (Model 0, Model 1, Model 2, ...) belonging to the category "chair" are stored, various 3D models belonging to the category "desk" are stored, and various 3D models belonging to the category "TV" are stored.

The learning apparatus performs learning on all models within a specific category stored in the 3D model DB 10 to determine parameters of the encoder 20 and the occupancy network 30 that are specialized for that category.

First, a 2D image 11 of each model in a specific category is input to the encoder 20. The encoder 20 generates a latent vector from the input 2D image 11 and outputs it. The latent vector for the 2D image 11 output from the encoder 20 forms part of the input to the occupancy network 30.

The occupancy network 30 receives as input the 3D coordinates (x, y, z) of an arbitrary point and the latent vector for the 2D image 11 output from the encoder 20, and outputs the occupancy status (+ or -) and RGB color of that point. As described above, '+' means that the point lies inside the object and is occupied, and '-' means that the point lies outside the object and is not occupied. A value of 0 therefore indicates that the point lies on the surface of the object.
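As a minimal sketch of this input and output structure, the following pure-Python network concatenates a point with a latent vector and returns a signed occupancy value plus an RGB color. The layer sizes, the tanh and sigmoid activations, the random weights, and all names are assumptions made for illustration; the patent does not specify the architecture.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    # Random weight matrix with n_out rows and n_in columns (no bias, for brevity).
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]


def apply_layer(weights, inputs):
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def occupancy_network(point, latent, params):
    """Maps (x, y, z) plus a latent vector to (signed occupancy, RGB)."""
    features = list(point) + list(latent)  # the latent vector is one part of the input
    hidden = [math.tanh(h) for h in apply_layer(params["w1"], features)]
    out = apply_layer(params["w2"], hidden)
    occupancy = out[0]  # sign encodes inside (+) or outside (-)
    rgb = tuple(1.0 / (1.0 + math.exp(-v)) for v in out[1:4])  # squashed to (0, 1)
    return occupancy, rgb


rng = random.Random(0)
latent_dim, hidden_dim = 4, 8
params = {
    "w1": make_layer(3 + latent_dim, hidden_dim, rng),
    "w2": make_layer(hidden_dim, 4, rng),
}
latent = [0.1, -0.2, 0.3, 0.0]  # would come from the encoder in the described apparatus
occupancy, rgb = occupancy_network((0.0, 0.5, -0.5), latent, params)
```

With untrained random weights the outputs are meaningless; the sketch only shows how the point and the latent vector enter the network together.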

The neural renderer 40 receives as input the occupancy status (+ or -) and RGB color of each point output from the occupancy network 30, and uses a neural rendering technique to generate and output a plurality of 2D rendered images 41.

The loss calculator 50 calculates the difference between the plurality of 2D rendered images 41 output from the neural renderer 40 and the multi-view rendered images 12 of the corresponding model as a loss, as in Equation 1, and learns the parameters of the encoder 20 and the occupancy network 30 based on the calculated loss.
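Equation 1 is not reproduced in this text. One form consistent with the symbol definitions that follow would be the sum below; the squared norm and the placeholder symbol \hat{I}_p for the value rendered by the neural renderer 40 are assumptions, not taken from the original equation.

```latex
\mathrm{Loss}(\varepsilon_C, \varphi_C)
  = \sum_{m=1}^{M(C)} \sum_{v=1}^{V} \sum_{p \in \mathrm{Mask}(m,v)}
    \left\| I_p(m,v) - \hat{I}_p(m,v;\, \varepsilon_C, \varphi_C) \right\|^2
```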

Here, C denotes a specific category, M(C) is the number of models in that category, and V is the number of multi-camera rendering views. Mask(m,v) is the set of image pixels occupied by model m when it is rendered in view v. I_p(m,v) is the RGB value or depth value of pixel p of the image rendered from model m in view v. Its rendered counterpart is the RGB value or depth value at pixel p of the 2D image generated by the neural renderer 40 in view v when the parameters of the encoder 20 are ε_C and the parameters of the occupancy network 30 are φ_C.
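The masked multi-view comparison defined above can be sketched numerically as follows. The dictionary layout keyed by (model, view), the squared RGB difference, and the name `masked_render_loss` are assumptions for illustration; the rendered images would come from a neural renderer, which is replaced here by hand-written values.

```python
def masked_render_loss(targets, renders, masks):
    """Sums squared RGB differences over the pixels in Mask(m, v) for all models and views.

    targets, renders: {(m, v): {pixel: (r, g, b)}}
    masks:            {(m, v): set of pixels occupied by model m in view v}
    """
    total = 0.0
    for key, mask in masks.items():
        for pixel in mask:
            target_rgb = targets[key][pixel]
            render_rgb = renders[key][pixel]
            total += sum((a - b) ** 2 for a, b in zip(target_rgb, render_rgb))
    return total


# One model (m = 0) seen from two views; pixel (9, 9) lies outside the mask.
targets = {
    (0, 0): {(0, 0): (1.0, 0.0, 0.0), (9, 9): (1.0, 1.0, 1.0)},
    (0, 1): {(0, 0): (0.0, 1.0, 0.0)},
}
renders = {
    (0, 0): {(0, 0): (0.5, 0.0, 0.0), (9, 9): (0.0, 0.0, 0.0)},
    (0, 1): {(0, 0): (0.0, 1.0, 0.0)},
}
masks = {(0, 0): {(0, 0)}, (0, 1): {(0, 0)}}

loss = masked_render_loss(targets, renders, masks)
```

Pixel (9, 9) differs strongly between the two images but contributes nothing, because it is not in the mask.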

In this way, the learning apparatus performs learning on all models within a specific category to determine parameters of the encoder 20 and the occupancy network 30 specialized for that category, and the learned parameters of the encoder 20 and the occupancy network 30 are assigned to the category and stored in the network parameter DB 60.

FIG. 2 is a diagram illustrating another example of a process of learning a neural network for each object category in a learning apparatus according to an embodiment.

Referring to FIG. 2, the learning apparatus, unlike in FIG. 1, does not use a neural rendering technique but instead uses a mesh model generated from the output of the occupancy network 30.

The learning apparatus may include a mesh model generator 70 instead of the neural renderer 40 shown in FIG. 1.

The mesh model generator 70 receives as input the occupancy status (+ or -) and RGB color of each point output from the occupancy network 30, and generates a 3D mesh model 71 for the 2D image 11.
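Mesh extraction from an occupancy function is usually done with an isosurface algorithm such as marching cubes, which the text does not detail. As a simpler illustration of one underlying step, locating a point on the zero level between an occupied point and an unoccupied point, the sketch below uses bisection. The sphere occupancy function and all names are assumptions made for the example.

```python
import math


def occupancy(point):
    # Toy occupancy: positive inside a unit sphere, negative outside.
    x, y, z = point
    return 1.0 - math.sqrt(x * x + y * y + z * z)


def find_surface_point(inside, outside, steps=40):
    """Bisects the segment between an occupied and an unoccupied point."""
    a, b = inside, outside
    for _ in range(steps):
        mid = tuple((u + v) / 2.0 for u, v in zip(a, b))
        if occupancy(mid) > 0.0:
            a = mid  # midpoint is still inside: move the inner end outward
        else:
            b = mid  # midpoint is outside or on the surface: move the outer end inward
    return tuple((u + v) / 2.0 for u, v in zip(a, b))


surface_point = find_surface_point((0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Repeating such a search along many grid edges yields surface vertices, which is the idea an isosurface extractor builds on.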

The loss calculator 50a calculates the 3D difference between the 3D mesh model 71 output from the mesh model generator 70 and the 3D mesh model 42 of the corresponding model itself as a loss, as in Equation 2. The loss calculator 50a learns the parameters of the encoder 20 and the occupancy network 30 based on the calculated loss.
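Equation 2 is not reproduced in this text. One form consistent with the symbol definitions that follow would be the sum below; the squared norm and the placeholder symbol \hat{M}_p for the closest point on the generated mesh are assumptions, not taken from the original equation.

```latex
\mathrm{Loss}(\varepsilon_C, \varphi_C)
  = \sum_{m=1}^{M(C)} \sum_{p \in \mathrm{Surface}(m)}
    \left\| M_p(m) - \hat{M}_p(m;\, \varepsilon_C, \varphi_C) \right\|^2
```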

Here, Surface(m) is the set of 3D points that make up the surface of model m. M_p(m) is the (x, y, z) coordinate of 3D point p of model m. Its generated counterpart is the (x, y, z) coordinate of the point closest to p in the generated 3D mesh model when the parameters of the encoder 20 are ε_C and the parameters of the occupancy network 30 are φ_C.
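The closest-point comparison defined above can be sketched for a single model as follows. The brute-force nearest-neighbor search, the squared distance, and the function names are assumptions for illustration; a practical implementation would likely use a spatial index.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_point(p, points):
    # Brute-force search for the generated point closest to reference point p.
    return min(points, key=lambda q: squared_distance(p, q))


def surface_loss(reference_points, generated_points):
    """Sums, over the reference surface points, the squared distance to the closest generated point."""
    return sum(
        squared_distance(p, nearest_point(p, generated_points))
        for p in reference_points
    )


reference_surface = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]                    # Surface(m)
generated_surface = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]  # generated mesh vertices

loss = surface_loss(reference_surface, generated_surface)
```

The far-away generated point (5, 5, 5) does not affect the result, since the sum runs only over the reference surface points.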

FIG. 3 is a diagram illustrating an example of a process of learning a latent vector for a specific model within a specific category in a learning apparatus according to an embodiment.

Referring to FIG. 3, the encoder 20 and the occupancy network 30 learned through FIGS. 1 and 2 have parameters that apply in common to all models of the corresponding category, but because of this generality the accuracy of an individual 3D model may be limited. The learning apparatus therefore fixes the parameters of the occupancy network 30 and additionally learns a latent vector specialized for each model within the category. The encoder 20 does not take part in this learning itself; as described below, it only supplies the initial value of the latent vector.

The process of learning a latent vector specialized for each model within one category is described below on the basis of the learning apparatus shown in FIG. 1.

To learn a latent vector specialized for each model within one category, the encoder 20 and the occupancy network 30 retrieve and use the parameters assigned to that category from the network parameter DB 60 described with reference to FIGS. 1 and 2.

Next, a 2D image 14 of a specific model within the category is input to the encoder 20.

The encoder 20 outputs a latent vector for the 2D image 14, and this latent vector is input to the occupancy network 30.

The occupancy network 30 receives as input the 3D coordinates (x, y, z) of an arbitrary point and the latent vector for the 2D image 14 output from the encoder 20, and outputs the occupancy status (+ or -) and RGB color of that point.

The neural renderer 40 receives as input the occupancy status (+ or -) and RGB color of each point output from the occupancy network 30, and uses a neural rendering technique to generate and output a plurality of 2D rendered images 42.

The loss calculator 50 calculates the difference between the 2D images 42 from the neural renderer 40 and the multi-view rendered images 15 of the corresponding model as a loss, as in Equation 3. The loss calculator 50 learns the parameters of the latent vector based on the calculated loss value.
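Equation 3 is not reproduced in this text. One form consistent with the symbol definitions that follow, and with the per-category loss described for FIG. 1, would be the sum below for a single model m; the squared norm and the placeholder symbol \hat{I}_p are assumptions, not taken from the original equation.

```latex
\mathrm{Loss}(z_m)
  = \sum_{v=1}^{V} \sum_{p \in \mathrm{Mask}(m,v)}
    \left\| I_p(m,v) - \hat{I}_p(m,v;\, \varphi_C, z_m) \right\|^2
```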

Here, z_m denotes the parameters of the latent vector for model m. The rendered counterpart of I_p(m,v) is the RGB value or depth value at pixel p of the 2D image generated by the neural rendering technique in view v when the fixed parameters of the occupancy network 30 are φ_C and the parameters of the latent vector are z_m.

In this way, the latent vector to be refined for a given model within one category takes the latent vector output from the encoder 20 as its initial value. The initial latent vector is updated through learning based on the loss value calculated by the loss calculation unit 50, and as a result a refined latent vector that is more accurate than the initial one is determined.
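
The refinement described above, in which the network parameters stay fixed and only the latent vector is updated from an encoder-provided starting point, can be sketched as plain gradient descent. The linear `render` function is a toy stand-in for the occupancy network plus renderer, and the squared-error loss, learning rate, and all names here are assumptions for illustration.

```python
def render(phi, z):
    """Toy stand-in for 'occupancy network + renderer': a fixed linear map."""
    return [sum(w * zi for w, zi in zip(row, z)) for row in phi]

def loss(phi, z, target):
    """Sum of squared differences between rendered and reference values."""
    return sum((r - t) ** 2 for r, t in zip(render(phi, z), target))

def refine_latent(phi, z_init, target, lr=0.05, steps=200):
    """Gradient descent on z only; phi is read but never modified."""
    z = list(z_init)
    for _ in range(steps):
        residual = [r - t for r, t in zip(render(phi, z), target)]
        grad = [2.0 * sum(phi[i][j] * residual[i] for i in range(len(phi)))
                for j in range(len(z))]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z

phi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # fixed category parameters (hypothetical)
target = [0.5, -1.0, -0.5]                   # stand-in for reference pixel values
z0 = [0.0, 0.0]                              # stand-in for the encoder's output
z_refined = refine_latent(phi, z0, target)
```

Starting from the encoder output rather than a random vector is what makes this a refinement: the optimizer only has to close the remaining gap to the reference images.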

In this manner, the latent vectors refined through learning for all models within one category are each assigned to the corresponding model and stored in the latent vector DB 80.

Based on the method described above, the learning apparatus generates a latent vector DB 80 for each category, and the latent vector DB 80 for each category stores the latent vector of each model within that category.
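
The per-category organization of the latent vector DB can be pictured as a two-level lookup from category to model to latent vector. The class below is an in-memory sketch under that assumption; the class name, method names, and the example category "chair" are hypothetical, and the disclosure does not state how the latent vector DB 80 is actually stored.

```python
class LatentVectorDB:
    """In-memory stand-in: category -> model id -> latent vector."""

    def __init__(self):
        self._store = {}

    def put(self, category, model_id, latent):
        """Assign a refined latent vector to a model within a category."""
        self._store.setdefault(category, {})[model_id] = list(latent)

    def get(self, category, model_id):
        """Return a copy of the latent vector assigned to the model."""
        return list(self._store[category][model_id])

    def models(self, category):
        """List the model ids stored for a category."""
        return sorted(self._store.get(category, {}))

db = LatentVectorDB()
db.put("chair", "Model 1", [0.1, 0.2])
db.put("chair", "Model 2", [0.3, -0.4])
initial_latent = db.get("chair", "Model 2")
```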

FIG. 4 is a diagram illustrating another example of a process of learning a latent vector for a specific model within a specific category in a learning apparatus according to an embodiment.

Referring to FIG. 4, a process of learning a latent vector specialized for each model within one category, based on the learning apparatus shown in FIG. 2, is described.

The learning apparatus fixes the parameters of the occupancy network 30 and, instead of using a neural rendering technique as in FIG. 3, uses a mesh model created from the output of the occupancy network 30 to additionally learn a latent vector specialized for each model.

The mesh model generation unit 70 receives the occupancy (+ or -) and RGB color of each point output from the occupancy network 30, and generates a 3D mesh model 72 for the 2D image 14.

The loss calculation unit 50a calculates, as a loss, the 3D difference between the 3D mesh model 72 output from the mesh model generation unit 70 and the 3D mesh model 16 of the corresponding model itself, as in Equation 4. The loss calculation unit 50a then learns the parameters of the latent vector based on the calculated loss.

Here, the generated-mesh term of Equation 4 is the (x, y, z) coordinate of the point on the generated 3D mesh model that is closest to point p, when the fixed parameter of the occupancy network 30 is φ_C and the parameter of the latent vector is z_m.
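
The closest-point term described above suggests a one-sided nearest-point distance between the reference mesh and the generated mesh. The sketch below assumes both meshes are represented by sampled points and that the loss sums squared distances; the exact form of Equation 4 is not reproduced in this text, so these choices and the function names are assumptions.

```python
import math

def nearest_point(p, points):
    """Return the point in `points` closest to p (Euclidean distance)."""
    return min(points, key=lambda q: math.dist(p, q))

def mesh_loss(reference_points, generated_points):
    """Sum of squared distances from each reference point to its nearest generated point."""
    total = 0.0
    for p in reference_points:
        q = nearest_point(p, generated_points)
        total += math.dist(p, q) ** 2
    return total

reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
generated = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
value = mesh_loss(reference, generated)  # 0.1**2 + 0.0, about 0.01
```

Because the sum runs over reference points only, the far-away generated point (5, 5, 5) does not contribute in this one-sided variant.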

In this way, the latent vector to be refined for a given model within one category takes the latent vector output from the encoder 20 as its initial value, and a refined latent vector is determined as the initial latent vector is updated based on the loss value calculated by the loss calculation unit 50a.

In this manner, the latent vectors refined for all models within one category are each assigned to the corresponding model and stored in the latent vector DB 80.

FIG. 5 is a diagram illustrating a process in which a 3D model generating apparatus according to an embodiment generates a final 3D model using the occupancy network learned through the learning process shown in FIGS. 1 to 4 and the latent vector assigned to each model.

Referring to FIG. 5, the 3D model generating apparatus may include a model recognition unit 90, a latent vector acquisition unit 22, an occupancy network 30, a neural renderer 40, and a loss calculation unit 50. The 3D model generating apparatus may further include a 3D model DB 10, a network parameter DB 60, and a latent vector DB 80.

First, a user captures images using a camera.

The model recognition unit 90 receives the images captured by the camera, and recognizes and searches for the model corresponding to each object in the captured images. The model recognition unit 90 may use any method from the field of computer vision for model recognition. If the search result for one of the recognized models is, for example, "Model 2" (1), this model 1 is in practice more likely to be a model similar to the actual object than an identical one.

The latent vector acquisition unit 22 retrieves the latent vector assigned to the retrieved model 1 from the latent vector DB 80 generated through the processes of FIGS. 3 and 4, and obtains it as the initial latent vector for learning. The obtained initial latent vector is then input to the occupancy network 30.

The occupancy network 30 loads and uses the parameters assigned to the category to which the recognized model 1 belongs from the network parameter DB 60 described with reference to FIGS. 1 and 2. The occupancy network 30 receives as input the 3D coordinates (x, y, z) of an arbitrary point and the latent vector, and outputs whether the point is occupied (+ or -) and its RGB color.

Next, the neural renderer 40 receives the occupancy (+ or -) and RGB color of each point output from the occupancy network 30, and generates and outputs a plurality of 2D rendered images 3 using a neural rendering technique.

The loss calculation unit 50 calculates, as a loss, the difference between the 2D rendered images 3 from the neural renderer 40 and the 2D images 2 of the actual object obtained from various views in the images captured by the user, as in Equation 5. The loss calculation unit 50 then learns the parameters of the latent vector based on the calculated loss value.

Here, U is the number of images captured by the user. Mask_u is the set of image pixels occupied by the recognized model in the captured image u. S_p(u) is the RGB value or depth value of pixel p in the captured image u. The rendered-pixel term of Equation 5 is the RGB value or depth value at pixel p of the 2D image generated by the neural rendering technique in view u, when the parameter of the occupancy network 30 is φ_C and the parameter of the latent vector is z_m.
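
The role of Mask_u in the loss can be illustrated with a small sketch that compares rendered and captured values only at pixels inside each mask. The squared difference, the single-channel images stored as nested lists, and the function name `masked_loss` are assumptions for illustration; the exact form of Equation 5 is not reproduced in this text.

```python
def masked_loss(rendered, captured, masks):
    """Sum over images u and pixels p in Mask_u of squared differences."""
    total = 0.0
    for r_img, s_img, mask in zip(rendered, captured, masks):
        for (row, col) in mask:
            total += (r_img[row][col] - s_img[row][col]) ** 2
    return total

# Two 2x2 single-channel images (for example depth values); U = 2.
rendered = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 0.0]]]
captured = [[[1.0, 0.0], [3.0, 9.0]], [[1.0, 1.0], [1.0, 1.0]]]
masks = [{(0, 0), (0, 1)}, {(1, 1)}]  # pixels occupied by the recognized model

value = masked_loss(rendered, captured, masks)  # 0 + 4 + 1 = 5.0
```

Pixel (1, 1) of the first image differs by 5 but lies outside its mask, so background clutter in the user's photographs does not affect the loss in this sketch.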

The parameters of the latent vector are learned so that there is no difference between the 2D rendered images 3 from the neural renderer 40 and the 2D images 2 of the actual object in the images captured by the user. A 3D object model is then generated and output using the learned latent vector parameters and the occupancy network.

FIG. 6 is a flowchart illustrating the overall learning method of a learning apparatus according to an embodiment.

Referring to FIG. 6, the learning apparatus learns the parameters of the encoder 20 and the occupancy network 30 for each category, using the per-category models in the 3D model DB 10, based on the method described with reference to FIG. 1 or FIG. 2 (S610).

The learning apparatus stores the parameters of the encoder 20 and the occupancy network 30 learned for each category in the network parameter DB 60 (S620).

Once the parameters of the encoder 20 and the occupancy network 30 have been learned for each category, the learning apparatus additionally learns a latent vector for each model within a specific category. To do so, the learning apparatus loads the parameters of the encoder 20 and the occupancy network 30 for that category from the network parameter DB 60 (S630). Next, the learning apparatus uses the encoder 20 and the occupancy network 30 to learn the latent vector of each model within the category, based on the method described with reference to FIG. 3 or FIG. 4 (S640).

The learning apparatus assigns each learned latent vector to the corresponding model and stores it in the latent vector DB 80 (S650).

In this way, a final 3D model can be generated using the learned occupancy network and the latent vector assigned to each model.
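
The two-stage flow of steps S610 to S650 can be summarized as a short orchestration sketch. The callables `train_category` and `refine_latent` are hypothetical hooks standing in for the category-level training and the per-model latent refinement; the stub implementations exist only so the sketch runs and do not perform any real learning.

```python
def run_training(model_db, train_category, refine_latent):
    """model_db maps category -> list of model ids; returns both DBs as dicts."""
    network_param_db = {}
    latent_db = {}
    # Stage 1: learn and store encoder/occupancy parameters per category (S610, S620).
    for category, models in model_db.items():
        network_param_db[category] = train_category(category, models)
    # Stage 2: load the category parameters (S630), learn one latent vector
    # per model (S640), and store it under that model (S650).
    for category, models in model_db.items():
        params = network_param_db[category]
        latent_db[category] = {m: refine_latent(params, m) for m in models}
    return network_param_db, latent_db

# Stub hooks so the sketch runs; real training would replace them.
def fake_train_category(category, models):
    return {"category": category, "num_models": len(models)}

def fake_refine_latent(params, model_id):
    return [float(len(model_id)), float(params["num_models"])]

model_db = {"chair": ["Model 1", "Model 2"], "table": ["Model 3"]}
param_db, latent_db = run_training(model_db, fake_train_category, fake_refine_latent)
```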

FIG. 7 is a flowchart illustrating a method of generating a 3D model in a 3D model generating apparatus according to an embodiment.

Referring to FIG. 7, the 3D model generating apparatus receives images captured by a camera (S710), and recognizes and searches for the model corresponding to each object in the captured images (S720).

The 3D model generating apparatus retrieves the latent vector assigned to the recognized and retrieved model from the latent vector DB 80 and obtains it as the initial latent vector for learning (S730).

The 3D model generating apparatus obtains whether an arbitrary point is occupied (+ or -) and its RGB color through the occupancy network 30, which receives the 3D coordinates (x, y, z) of the point and the latent vector as input (S740), and generates a plurality of 2D rendered images by applying a neural rendering technique to the occupancy (+ or -) and RGB color of each point (S750).

Next, the 3D model generating apparatus calculates, as a loss, the difference between the plurality of 2D images generated using the neural rendering technique and the multi-view 2D images of the actual object in the images captured by the user, as in Equation 5 (S760).

Next, the 3D model generating apparatus learns the parameters of the latent vector based on the calculated loss value (S770). The parameters are learned so that there is no difference between the plurality of 2D images generated using the neural rendering technique and the multi-view 2D images of the actual object in the images captured by the user.

A 3D object model is generated using the latent vector parameters learned in this way and the occupancy network.

FIG. 8 is a diagram illustrating a computing device for generating a 3D object model according to an embodiment.

Referring to FIG. 8, the computing device 800 for generating a 3D object model may represent a device in which the learning method for generating a 3D object model is implemented. The computing device 800 may also represent a device in which the method for generating a 3D object model is implemented.

The computing device 800 may include at least one of a processor 810, a memory 820, an input interface device 830, an output interface device 840, and a storage device 850. The components may be connected by a common bus 860 to communicate with one another. Alternatively, the components may be connected through individual interfaces or individual buses centered on the processor 810 rather than through the common bus 860.

The processor 810 may be implemented as any of various types, such as an application processor (AP), a central processing unit (CPU), or a graphics processing unit (GPU), and may be any semiconductor device that executes instructions stored in the memory 820 or the storage device 850. The processor 810 may execute program commands stored in at least one of the memory 820 and the storage device 850.

The processor 810 may store in the memory 820 program instructions for implementing at least some functions of the learning method described with reference to FIGS. 1 to 4 and 6, and may control the operations described with reference to FIGS. 1 to 4 and 6 to be performed.

The processor 810 may also store in the memory 820 program instructions for implementing at least some functions of the 3D object model generation method described with reference to FIGS. 5 and 7, and may control the operations described with reference to FIGS. 5 and 7 to be performed.

The memory 820 and the storage device 850 may include various types of volatile or non-volatile storage media. For example, the memory 820 may include a read-only memory (ROM) 821 and a random access memory (RAM) 822. The memory 820 may be located inside or outside the processor 810, and may be connected to the processor 810 through various known means.

The input interface device 830 is configured to provide data to the processor 810.

The output interface device 840 is configured to output data from the processor 810.

At least part of the learning method for generating a 3D object model and the method for generating a 3D object model according to the embodiments may be implemented as a program or software executed on a computing device, and the program or software may be stored in a computer-readable medium.

In addition, at least part of the learning method for generating a 3D object model and the method for generating a 3D object model according to the embodiments may be implemented as hardware that can be electrically connected to a computing device.

Although the embodiments of the present invention have been described in detail above, the scope of the present invention is not limited thereto. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention defined in the following claims also fall within the scope of the present invention.

Claims

A learning method for generating a 3D object model using an image in a learning apparatus, the learning method comprising:
learning parameters of an encoder and an occupancy network for each category using a plurality of models for each category stored in a 3D model DB;
learning a latent vector for each model within each category using the parameters of the encoder and the occupancy network of each category; and
storing the latent vector for each model of each category in a latent vector DB.

The learning method of claim 1,
wherein the learning of the parameters of the encoder and the occupancy network comprises:
generating, in the encoder, a latent vector from a 2D image of each model of any one category;
outputting, in the occupancy network, whether each point is occupied and an RGB color from the latent vector and the 3D coordinates of each point;
calculating, as a loss, a difference between a plurality of 2D rendered images generated from the occupancy and RGB color of each point and multi-view rendered images of the model; and
learning the parameters of the encoder and the occupancy network based on the loss.

The learning method of claim 2,
wherein the difference includes an RGB color difference or a depth value difference at each pixel between the two images.

The learning method of claim 1,
wherein the learning of the parameters of the encoder and the occupancy network comprises:
generating, in the encoder, a latent vector from a 2D image of each model of any one category;
outputting, in the occupancy network, whether each point is occupied and an RGB color from the latent vector and the 3D coordinates of each point;
calculating, as a loss, a 3D difference between a 3D mesh model generated from the occupancy and RGB color of each point and the 3D mesh model of the model itself; and
learning the parameters of the encoder and the occupancy network based on the loss.

The learning method of claim 4,
wherein the difference includes a difference in coordinates between corresponding 3D points of the two models.

The learning method of claim 1,
wherein the learning of the latent vector for each model within each category comprises:
setting the parameters of the encoder and the occupancy network corresponding to the category;
generating, in the encoder, a latent vector of each model from a 2D image of each model of the category;
outputting, in the occupancy network, whether each point is occupied and an RGB color from the latent vector and the 3D coordinates of each point;
calculating, as a loss, a difference between a plurality of 2D rendered images generated from the occupancy and RGB color of each point and multi-view rendered images of the model; and
learning the parameters of the latent vector of each model generated by the encoder based on the loss.

The learning method of claim 1,
wherein the learning of the latent vector for each model within each category comprises:
setting the parameters of the encoder and the occupancy network corresponding to the category;
generating, in the encoder, a latent vector of each model from a 2D image of each model of the category;
outputting, in the occupancy network, whether each point is occupied and an RGB color from the latent vector and the 3D coordinates of each point;
calculating, as a loss, a 3D difference between a 3D mesh model generated from the occupancy and RGB color of each point and the 3D mesh model of the model itself; and
learning the parameters of the latent vector of each model generated by the encoder based on the loss.

A method for generating, in a 3D model generating apparatus, a 3D object model of an object in an image captured by a camera, the method comprising:
recognizing a model corresponding to each object in the image captured by the camera using a plurality of models for each category stored in a 3D model DB;
acquiring a latent vector of the recognized model from a latent vector DB that stores a latent vector for each model of each category; and
changing parameters of the latent vector by using the latent vector and parameters of an occupancy network corresponding to the category of the recognized model.

The method of claim 8,
wherein the changing comprises:
outputting, in the occupancy network, whether each point is occupied and an RGB color from the latent vector and the 3D coordinates of each point;
calculating, as a loss, a difference between a plurality of 2D rendered images generated from the occupancy and RGB color of each point and multi-view 2D images of the actual object in the image captured by the camera; and
learning the parameters of the latent vector based on the loss.

The method of claim 9,
wherein the loss includes a difference in RGB color or depth value at each pixel between the two images.

The method of claim 8,
wherein the changing comprises loading and setting the parameters of the occupancy network corresponding to the category of the recognized model from a network parameter DB in which parameters of an occupancy network for each category are stored.

A learning apparatus for generating a 3D object model using an image, the learning apparatus comprising:
a 3D model DB that stores a plurality of models for each category;
an encoder that generates a latent vector for each model from a 2D image of each model of each category;
an occupancy network that outputs whether each point is occupied and an RGB color from the latent vector for each model and the 3D coordinates of each point;
a loss calculation unit that calculates, as a loss, a difference between a result for each model generated from the output of the occupancy network and a result for each model generated by rendering the corresponding model, and learns parameters of the encoder and the occupancy network based on the loss; and
a network parameter DB that stores the parameters of the encoder and the occupancy network learned for each category.

The learning apparatus of claim 12,
further comprising a neural renderer that generates a plurality of 2D images for each model from the output of the occupancy network and passes them to the loss calculation unit,
wherein the loss calculation unit calculates, as the loss, a color difference or a depth difference between the plurality of 2D images of each model and multi-view rendered images of the corresponding model.

The learning apparatus of claim 12,
further comprising a mesh model generation unit that generates a 3D mesh model of each model from the output of the occupancy network and passes it to the loss calculation unit,
wherein the loss calculation unit calculates, as the loss, a coordinate difference between 3D points of the generated 3D mesh model of each model and of the 3D mesh model of the corresponding model itself.

The learning apparatus of claim 12,
wherein the loss calculation unit learns parameters of the latent vector output from the encoder for each model in each category, with the parameters of the occupancy network for each category fixed.

The learning apparatus of claim 15,
further comprising a neural renderer that generates a plurality of 2D images for each model from the output of the occupancy network and passes them to the loss calculation unit,
wherein the loss calculation unit calculates, as a loss, a color difference or a depth difference between the plurality of 2D images of each model and multi-view rendered images of the corresponding model, and learns the parameters of the latent vector for each model based on the loss.

The learning apparatus of claim 15,
further comprising a mesh model generation unit that generates a 3D mesh model of each model from the output of the occupancy network and passes it to the loss calculation unit,
wherein the loss calculation unit calculates, as a loss, a coordinate difference between 3D points of the generated 3D mesh model of each model and of the 3D mesh model of the corresponding model, and learns the parameters of the latent vector for each model based on the loss.

The learning apparatus of claim 15,
further comprising a latent vector DB that stores the learned latent vector for each model in each category.