KR102620475B1

KR102620475B1 - Method and apparatus for performing a re-topology using deep learning algorithms based on images

Info

Publication number: KR102620475B1
Application number: KR1020230090970A
Authority: KR
Inventors: 황용태; 신영하; 최정필; 김윤선
Original assignee: 주식회사 메드픽쳐스
Priority date: 2023-07-13
Filing date: 2023-07-13
Publication date: 2024-01-03

Abstract

The present invention relates to a method and apparatus for performing retopology (re-topology) by quadrangulating a plurality of sub-polygons in an image, based on a topology prediction result obtained from the image using a deep learning algorithm. Specifically, the present invention includes a method comprising: a configuration in which, when a deep learning model is trained on training images whose topology has been predicted as a plurality of sub-polygons based on edges and vertices, topology prediction is learned from partial images cropped at random from the plurality of training images; a configuration in which the decoder is removed from the trained topology prediction model and a small number of randomly initialized layers are attached after the convolutional layers of the encoder, generating a fine-tuned topology-prediction-based retopology prediction model; a configuration in which the topology-prediction-based retopology prediction model is trained on training images for which retopology has been completed with the topology prediction results recorded; and a configuration in which the trained model receives an image and performs topology-prediction-based retopology prediction on partial images cropped in a set order. The invention also includes an electronic device configured to perform the method and a computer program configured to perform the method.

Description

Method and device for performing retopology using deep learning algorithm based on images {METHOD AND APPARATUS FOR PERFORMING A RE-TOPOLOGY USING DEEP LEARNING ALGORITHMS BASED ON IMAGES}

The present invention relates to a method and apparatus for performing retopology (re-topology) by quadrangulating a plurality of sub-polygons in an image. Specifically, the present invention relates to a method and apparatus for performing retopology by quadrangulating a plurality of sub-polygons in an image, based on a topology prediction result obtained from the image using a deep learning algorithm.

Retopology (re-topology) is an important step in 3D (three-dimensional) modeling: the process of rebuilding an existing, complex mesh structure into a simpler and more efficient one. It improves the quality of a 3D model and helps optimize performance in animation, rendering, game engines, and similar uses.

To turn an image into 3D imagery, the image must first be converted into a 3D model. The 3D model produced by this conversion usually carries a high level of detail, but its structure can be complex and inefficient, which makes it hard to use for rendering or animation. Retopology is needed to solve this problem.

However, retopology is very time-consuming. In particular, manually converting each sub-polygon of a model into a quadrilateral is complicated and difficult. For this reason, a deep learning algorithm is needed to automate the process.

A deep learning algorithm can learn to predict an existing 3D model and to convert each sub-polygon in an image into an efficient quadrilateral structure. Once trained, it can carry out the retopology process automatically when a new image is converted into a 3D model, which greatly simplifies 3D modeling work and saves time and effort.

To address this problem, techniques that use a deep learning algorithm to learn retopology from an image to a quad mesh have already been developed. These techniques, however, are built from convolution operations, based on a CNN (convolutional neural network), U-Net, and the like. Convolution is good at extracting local features but poor at extracting global ones, so when an image with too large a field of view (FOV) is input, the performance of feature extraction and segmentation degrades.
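The locality of convolution mentioned above can be made concrete with the standard receptive-field recurrence for stacked layers. The sketch below is illustrative only: the function name and the layer stack are hypothetical and are not taken from the patent, which does not specify kernel sizes or strides.

```python
def receptive_field(layers):
    """Return the theoretical receptive field (input pixels along one axis)
    of a single output unit after a stack of convolution/pooling layers.

    layers: list of (kernel_size, stride) tuples, ordered from input to output.
    """
    field, jump = 1, 1  # jump: spacing, in input pixels, between adjacent units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


# Hypothetical encoder (an assumption, not the patent's architecture):
# four stages, each a 3x3 convolution followed by 2x2 stride-2 pooling.
hypothetical_encoder = [(3, 1), (2, 2)] * 4
window = receptive_field(hypothetical_encoder)
```

For this assumed stack, each output unit depends on a 46-pixel-wide window of the input regardless of how large the image is, which is one way to read the statement that a very large field of view hurts feature extraction.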

To train a deep learning model that generates topology-prediction-based retopology information for an image, many training images for which topology-prediction-based retopology has been completed must be secured. In practice, however, training images in which retopology has been completed with the topology prediction results still on record are not easy to obtain. Sufficient training data can nevertheless be secured as follows: train the deep learning model on topology prediction results made up of a plurality of sub-polygons in an image; use the resulting values as the initial values of the layers that predict topology for the sub-polygons; add layers that perform retopology based on the identified topology prediction result; and then train with training images that contain only retopology results, with no record of the topology analysis. This works because a considerable number of training images whose topology has been predicted as a plurality of sub-polygons based on edges and vertices can be secured, and so can a considerable number of training images for which retopology has been completed without any topology prediction result remaining.

Korean Laid-open Patent Publication No. 10-2023-0047042 (Retopology method and device)

Based on the discussion above, various embodiments of the present invention provide a method and apparatus for performing retopology (re-topology) by quadrangulating a plurality of sub-polygons in an image, based on a topology prediction result obtained from the image using a deep learning algorithm.

In addition, various embodiments of the present invention provide a method and apparatus for performing topology-prediction-based retopology using a deep learning model, by training the model on topology prediction for a plurality of sub-polygons in an image, using the resulting values as initial values for the layers that predict topology for the sub-polygons, adding layers that perform retopology based on the identified topology prediction result, and then training with training images that contain only retopology results, with no record of the topology analysis.

In addition, various embodiments of the present invention provide a method and apparatus for performing topology-prediction-based retopology prediction using a deep learning model, by having the model learn topology prediction from training images whose topology has been predicted, image by image, as a plurality of sub-polygons based on edges and vertices, and learn retopology prediction to a quad mesh from training images that contain only retopology results, with no record of the topology analysis.

In addition, various embodiments of the present invention provide a method comprising: a configuration in which, when a deep learning model is trained on training images whose topology has been predicted as a plurality of sub-polygons based on edges and vertices, topology prediction is learned from partial images cropped at random from the plurality of training images; a configuration in which the decoder is removed from the trained topology prediction model and a small number of randomly initialized layers are attached after the convolutional layers of the encoder, generating a fine-tuned topology-prediction-based retopology prediction model; a configuration in which that model is trained on training images for which final retopology to a quad mesh has been completed without the topology prediction results being recorded; and a configuration in which the trained model receives an image and performs topology-prediction-based retopology prediction on partial images cropped in a set order. Various embodiments also provide an electronic device configured to perform the method and a computer program configured to perform the method.

In addition, various embodiments of the present invention provide a method and apparatus for performing retopology prediction after accurately identifying the topology, by training a deep learning model whose feature extraction performs well even when an image with too large a field of view (FOV) is input.

According to various embodiments of the present invention, there is provided a method of operating an electronic device that includes an input device, a memory, a processor, and an output device, the method comprising: performing, by the processor, topology prediction and retopology learning on a learning model stored in the memory based on a plurality of test images; receiving information on an input image through the input device; extracting, by the processor, edges and vertices from the input image; generating, by the processor, a prediction result of a topology consisting of a plurality of sub-polygons in the input image, using the learning model, based on information on the edges and vertices of the input image; performing, by the processor, retopology (re-topology) by quadrangulating the plurality of sub-polygons in the input image using the learning model; and outputting, by the output device, a visualized wireframe for the input image based on the result of the retopology.

According to various embodiments of the present invention, an electronic device is provided that includes an input device, a memory, a processor, and an output device, and that is configured to perform the method of operating an electronic device according to various embodiments of the present invention.

According to various embodiments of the present invention, a computer program is provided that is configured to perform the method of operating an electronic device according to various embodiments of the present invention and that is recorded on a computer-readable storage medium.

Various embodiments of the present invention can provide a method and apparatus for performing retopology (re-topology) by quadrangulating a plurality of sub-polygons in an image, based on a topology prediction result obtained from the image using a deep learning algorithm.

In addition, various embodiments of the present invention can provide a method and apparatus for performing topology-prediction-based retopology using a deep learning model, by training the model on topology prediction for a plurality of sub-polygons in an image, using the resulting values as initial values for the layers that predict topology for the sub-polygons, adding layers that perform retopology based on the identified topology prediction result, and then training with training images that contain only the final retopology result to a quad mesh, with no topology prediction result.

In addition, various embodiments of the present invention can provide a method and apparatus for performing topology-prediction-based retopology prediction using a deep learning model, by having the model learn topology prediction from training images whose topology has been predicted, image by image, as a plurality of sub-polygons based on edges and vertices, and learn retopology prediction to a quad mesh from training images that contain only retopology results, with no record of the topology analysis.

In addition, various embodiments of the present invention can provide a method comprising: a configuration in which, when a deep learning model is trained on training images whose topology has been predicted as a plurality of sub-polygons based on edges and vertices, topology prediction is learned from partial images cropped at random from the plurality of training images; a configuration in which the decoder is removed from the trained topology prediction model and a small number of randomly initialized layers are attached after the convolutional layers of the encoder, generating a fine-tuned topology-prediction-based retopology prediction model; a configuration in which that model is trained on training images for which retopology has been completed with the topology prediction results recorded; and a configuration in which the trained model receives an image and performs topology-prediction-based retopology prediction on partial images cropped in a set order. Various embodiments can also provide an apparatus configured to perform the method.

In addition, various embodiments of the present invention can provide a method and apparatus for performing retopology prediction after accurately identifying the topology, by training a deep learning model whose feature extraction performs well even when an image with too large a field of view (FOV) is input.

The effects obtainable from the present invention are not limited to those mentioned above; other effects not mentioned will be clearly understood from the description below by those of ordinary skill in the technical field to which the present disclosure belongs.

FIG. 1 shows an overview of the operation of an electronic device according to various embodiments of the present invention.
FIG. 2 shows the configuration of an electronic device according to various embodiments of the present invention.
FIG. 3 shows a method of operating an electronic device according to various embodiments of the present invention.
FIG. 4 shows the structure of a multi-layer perceptron (MLP) for implementing a deep learning algorithm according to various embodiments of the present invention.
FIG. 5 shows a training process for predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.
FIG. 6 shows a process of predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.
FIG. 7 shows a process of predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.
FIG. 8 shows a process of generating a topology-prediction-based retopology prediction model by adding a plurality of layers to the layers corresponding to the encoder of a model that has finished learning to identify a topology consisting of a plurality of sub-polygons in an input image, according to various embodiments of the present invention.
FIG. 9 shows an example of a topology-prediction-based retopology prediction model based on a convolutional neural network (CNN) according to various embodiments of the present invention.
FIG. 10 shows an example of an image in which a topology consisting of a plurality of sub-polygons in an input image has been analyzed and retopology based on that topology analysis has been performed, according to various embodiments of the present invention.
FIGS. 11 to 13 show examples of wireframe output images visualized after topology-prediction-based retopology has been performed according to various embodiments of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the technical field to which the present invention belongs can readily practice them. The present invention may be implemented in many different forms and is not limited to the embodiments described herein.

FIG. 1 shows an overview of the operation of an electronic device according to various embodiments of the present invention.

Referring to FIG. 1, the electronic device 120 according to various embodiments of the present invention receives an image 110 on which topology analysis has not been performed and outputs a retopology-based wireframe image 130.

In this specification, "topology analysis" and "topology prediction" may be used interchangeably. Likewise, "retopology analysis" and "retopology prediction" may be used interchangeably.

FIG. 2 shows the configuration of an electronic device according to various embodiments of the present invention.

Referring to FIG. 2, the electronic device 120 includes a memory 121, a processor 122, an input device 123, and an output device 124.

The memory 121 is connected to the processor 122, the input device 123, and the output device 124, and can store information input through the input device 123 and the like. The memory 121 can also store data such as basic programs for the operation of the processor 122, application programs, configuration information, and information generated by the operations of the processor 122. The memory 121 may consist of volatile memory, non-volatile memory, or a combination of the two. The memory 121 provides stored data at the request of the processor 122.

The processor 122 may be configured to implement the procedures and/or methods proposed in the present invention. The processor 122 controls the overall operation of the electronic device 120. For example, the processor 122 writes data to and reads data from the memory 121, receives information through the input device 123, and outputs information through the output device 124. The processor 122 may include at least one processor.

The input device 123 is connected to the processor 122 and can input information and the like. The input device 123 may include a touch display, a keypad, a keyboard, an information input module, and the like. According to one embodiment, the electronic device 120 may further include a transceiver, and the input device 123 may input information received through the transceiver from another device connected over a wired or wireless communication network.

The output device 124 is connected to the processor 122 and can output information and the like in the form of video, audio, and so on. The output device 124 may include a display, a speaker, an information output module, and the like. According to one embodiment, the electronic device 120 may further include a transceiver, and the output device 124 may transmit information through the transceiver to another device connected over a wired or wireless communication network.

Although not shown in FIG. 2, according to one embodiment the electronic device 120 may further include a transceiver. The transceiver is connected to the processor 122 and transmits and/or receives signals. All or part of the transceiver may be referred to as a transmitter, a receiver, or a transceiver. The transceiver may support at least one of various communication standards for wired and wireless access systems, such as IEEE (Institute of Electrical and Electronics Engineers) 802.xx systems, IEEE Wi-Fi systems, 3GPP (3rd Generation Partnership Project) systems, 3GPP LTE (Long Term Evolution) systems, 3GPP 5G NR (New Radio) systems, 3GPP2 systems, and Bluetooth.

FIG. 3 shows a method of operating an electronic device according to various embodiments of the present invention.

In the embodiment of FIG. 3, the electronic device includes an input device, a memory, at least one processor, and an output device.

Referring to FIG. 3, in step S301 the electronic device performs, by the processor, topology prediction and retopology learning on the learning model stored in the memory, based on a plurality of test images on which topology analysis has not been performed.

In step S302, the electronic device receives information on an input image through the input device.

In step S303, the electronic device generates, by the processor, a prediction result of a topology consisting of a plurality of sub-polygons in the input image, using the learning model on the input image.

In step S304, the electronic device performs, by the processor, retopology (re-topology) by quadrangulating the plurality of sub-polygons in the input image using the learning model.

In step S305, the electronic device outputs, by the output device, a visualized wireframe for the input image based on the result of the retopology.

According to various embodiments of the present invention, step S301, in which the processor trains the learning model stored in the memory on topology prediction and retopology based on the plurality of test images, may include: receiving, through the input device, a plurality of first test images on which no topology analysis has been performed; receiving, through the input device, a plurality of second test images in which a topology consisting of a plurality of sub-polygons, based on edges and vertices, has been predicted for the plurality of first test images; determining, by the processor, a plurality of first partial test images at a plurality of random locations within the first test image; determining, by the processor, a plurality of second partial test images at the locations within the second test image that correspond to the plurality of first partial test images; training, by the processor, a first learning model stored in the memory to predict topology based on the plurality of first partial test images and the plurality of second partial test images; generating, by the processor, a second learning model by removing the decoder of the trained first learning model and attaching a plurality of predetermined layers to the encoder of the first learning model; receiving, through the input device, a plurality of third test images on which neither topology prediction nor retopology has been performed, and a plurality of fourth test images on which retopology has been performed by quadrangulating the plurality of sub-polygons in the plurality of test images; and training, by the processor, the second learning model on topology prediction and retopology based on the plurality of third test images and the plurality of fourth test images. The learning model may correspond to the second learning model.
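The pairing of first and second partial test images described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes 2D images stored as lists of rows, and the names `crop_at` and `paired_random_crops` and the `seed` parameter are hypothetical.

```python
import random


def crop_at(image, top, left, crop_h, crop_w):
    """Return the crop_h x crop_w window of a 2D image (list of rows)."""
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def paired_random_crops(first_image, second_image, crop_h, crop_w, count, seed=None):
    """Cut `count` windows at random positions from `first_image` and the
    windows at the same positions from `second_image`."""
    height, width = len(first_image), len(first_image[0])
    if (len(second_image), len(second_image[0])) != (height, width):
        raise ValueError("both images must have the same size")
    if crop_h > height or crop_w > width:
        raise ValueError("crop is larger than the image")
    rng = random.Random(seed)  # the seed only makes the illustration reproducible
    pairs = []
    for _ in range(count):
        top = rng.randint(0, height - crop_h)
        left = rng.randint(0, width - crop_w)
        pairs.append((
            (top, left),
            crop_at(first_image, top, left, crop_h, crop_w),
            crop_at(second_image, top, left, crop_h, crop_w),
        ))
    return pairs
```

Because the positions are drawn independently, neighboring crops may overlap or leave gaps between them.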

According to various embodiments of the present invention, the second learning model may be configured to divide the input image into a plurality of partial images, predict a topology made up of the plurality of sub-polygons, and generate a retopology image in which retopology has been performed by quadrangulating on the basis of the plurality of sub-polygons.

According to various embodiments of the present invention, the encoder of the first learning model may consist of N first parameters and the plurality of layers may consist of M second parameters, where N and M are integers greater than 0. The second learning model may consist of the N first parameters and the M second parameters. The N first parameters may take, as initial values, values resulting from training of the second learning model. The M second parameters may take random values close to 0 as initial values.

According to various embodiments of the present invention, N, the number of first parameters, may be at least nine times M, the number of second parameters.
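The parameter bookkeeping for the second learning model might be sketched as below. The sketch reads the N first parameters as starting from the values the encoder learned when the first learning model was trained, which is how the abstract describes the fine-tuning step; that reading, the Gaussian initialization with a small `scale`, and the function names are all assumptions made for illustration.

```python
import random


def build_second_model_params(trained_encoder_params, num_new_params, scale=0.01, seed=None):
    """Return (first_params, second_params) for the second learning model.

    first_params: the N encoder values, carried over unchanged as initial values.
    second_params: M new values drawn at random close to 0 (assumed Gaussian).
    """
    rng = random.Random(seed)
    first_params = list(trained_encoder_params)
    second_params = [rng.gauss(0.0, scale) for _ in range(num_new_params)]
    return first_params, second_params


def satisfies_ratio(n, m, ratio=9):
    """Check that N and M are positive and that N is at least `ratio` times M."""
    return n > 0 and m > 0 and n >= ratio * m
```

With most parameters inherited from the encoder, only the comparatively few new parameters start from scratch during fine-tuning.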

According to various embodiments of the present invention, the plurality of first 3D partial test images and the plurality of second partial test images may have the same set size and the same set structure. The number of first partial test images and of second partial test images may be a random number no smaller than a set number. The plurality of first test images and the plurality of second test images may be identical to each other in size and structure.

According to various embodiments of the present invention, neighboring second partial test images among the plurality of second partial test images may overlap each other or may be separated by a gap.

Neighboring second partial test images overlap or are separated by a gap for the following reasons.

An overlap or gap may be intentional or may result from an error.

Cases in which overlaps and gaps arise from errors:

i) Modeling errors: A mistake by the software user may create gaps or overlaps between meshes, for example through an incomplete merge, incorrect scaling, or inaccurate rotation.

ii) Data loss or errors: File conversion, corruption, or data loss can damage a mesh.

iii) Complex geometry: When complex shapes are modeled, the limitations of automatic generation tools can produce gaps or overlaps.

iv) Scan errors: 3D scan data are often incomplete or inaccurate, which can lead to overlaps or gaps.

Cases in which overlaps and gaps are created intentionally:

i) Composite objects: In an object composed of several independent parts, there may be gaps between the parts. For example, there may be a gap between a robot's arm and its body.

ii) Matrices and arrays: When the same mesh is copied and arranged multiple times during modeling, spatial gaps may exist between the copies.

iii) Temporary overlaps: Specific parts of a model in progress may be overlapped temporarily, mainly to visualize how two parts should be connected. The finished model contains no such overlap.

iv) Game development: Meshes are often overlapped in game development. For example, when a character wears clothes, the character's body and the clothes actually overlap. This is a normal overlap that lets the two meshes move and interact independently of each other.

By distinguishing overlaps and gaps caused by errors from those created intentionally, it becomes possible to restore the damaged portions of a mesh model in which an error produced an overlap or gap, to fill gaps between meshes, and to perform similar repairs.

According to various embodiments of the present invention, training the first learning model stored in the memory based on the plurality of first partial test images and the plurality of second partial test images may include repeatedly performing a training step on the first learning model based on one first partial test image among the plurality of first partial test images and the one second partial test image, among the plurality of second partial test images, that corresponds to it. Each time the first learning model is trained, the training may use a different first partial test image and a different second partial test image.

According to various embodiments of the present invention, an electronic device is provided in the form of a terminal that includes an input device, a memory, a processor, and an output device, the processor being configured to perform the method of operating the electronic device according to the embodiment of FIG. 3.

According to various embodiments of the present invention, a computer program is provided that is configured to perform the method of operating the electronic device according to the embodiment of FIG. 3 and is recorded on a computer-readable storage medium.

FIG. 4 shows the structure of a multi-layer perceptron (MLP) for implementing a deep learning algorithm according to various embodiments of the present invention.

The first learning model and the second learning model according to various embodiments of the present invention may be structured as an MLP. They may also be implemented with various architectures such as a convolutional neural network (CNN), DenseNet, U-net, GoogLeNet, or a generative adversarial network. For example, when the first learning model and the second learning model are implemented as CNNs, they perform a learning process that adjusts the connection weights of the artificial neural network using training data.

Deep learning is a technology that has recently come to the fore in machine learning. It uses a neural network made up of a plurality of hidden layers and the plurality of hidden units they contain. When low-level features are input to a deep learning model, they pass through the hidden layers and are transformed into high-level features that better explain the problem to be predicted. Because this process requires no prior expert knowledge or intuition, subjective factors in feature extraction can be removed and a model with better generalization can be developed. Furthermore, because feature extraction and model construction form a single unit in deep learning, the final model can be built through a simpler process than with conventional machine learning approaches.

A multi-layer perceptron (MLP) is a type of artificial neural network (ANN) with multiple nodes, used in deep learning. Each node is a neuron, loosely modeled on the connection patterns found in animals, that applies a non-linear activation function. This non-linearity lets the network distinguish data that are not linearly separable.

Referring to FIG. 4, the artificial neural network 400 of the MLP model according to various embodiments of the present invention consists of an input layer 410, a plurality of hidden layers 430, and an output layer 450.

Input data, such as a plurality of factors related to an image, are input to the nodes of the input layer 410. Here, the plurality of factors 411 related to the image correspond to the low-level features of the deep learning model.

Computation based on the input factors takes place at the nodes of the hidden layers 430. The hidden layers 430 store units defined as a plurality of nodes formed by combining the plurality of factors 411 related to the image. As shown in FIG. 4, the hidden layers 430 may consist of a plurality of hidden layers.

For example, when the hidden layers 430 consist of a first hidden layer 431 and a second hidden layer 433, the first hidden layer 431 stores first units 432, defined as a plurality of nodes formed by combining the plurality of factors 411 related to the image, which are the lowest-level features; the first units 432 are higher-level features of those factors. The second hidden layer 433 stores second units 434, defined as a plurality of nodes formed by combining the first units of the first hidden layer 431; the second units 434 are higher-level features of the first units 432.

The nodes of the output layer 450 represent the computed prediction results. The output layer 450 may include a plurality of prediction result units 451. Specifically, the prediction result units 451 may consist of two units, a True unit and a False unit. The True unit is a prediction result unit meaning that the image is related to a specific feature, and the False unit is a prediction result unit meaning that the image is not related to the specific feature.

A weight is assigned to each connection between the second units 434 in the second hidden layer 433, which is the last of the hidden layers 430, and the prediction result units 451. Whether the image is related to the specific feature is predicted on the basis of these weights.

For example, if one of the second units 434 predicts that the image is related to the specific feature, it is connected to both the True unit and the False unit; its connection to the True unit is given a positive weight and its connection to the False unit a negative weight. Conversely, if one of the second units 434 predicts that the image is not related to the specific feature, it is likewise connected to both units, but its connection to the True unit is given a negative weight and its connection to the False unit a positive weight.

A plurality of connections will be formed between the plurality of second units 434 and the True unit. If the total over these connections is positive, the plurality of factors 411 related to the image at the input layer 410 will be predicted to be factors of an image related to the specific feature. According to one embodiment, whether the image is related to the specific feature may instead be predicted by comparing the total over the connections with a preset value.
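The forward pass through the layers of FIG. 4 can be illustrated with a small sketch. It assumes that "the total over the connections" means the sum of each second unit's value multiplied by its connection weight to the True unit, and it uses ReLU as the non-linear activation; neither detail is stated in the source, and all function names and weights are hypothetical.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: each row of `weights` feeds one output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Non-linear activation applied to each unit (assumed choice)."""
    return [max(0.0, v) for v in values]


def predict_related(second_units, true_unit_weights, threshold=0.0):
    """Sum the weighted connections into the True unit and compare the
    total with a preset value (0.0 reproduces the 'positive total' rule)."""
    total = sum(w * u for w, u in zip(true_unit_weights, second_units))
    return total > threshold, total


# Illustrative values only: two input factors, two units per hidden layer.
factors = [1.0, 2.0]
first_units = relu(dense(factors, [[0.5, -0.25], [1.0, 1.0]], [0.0, -1.0]))
second_units = relu(dense(first_units, [[1.0, 0.5], [-1.0, 0.25]], [0.0, 0.0]))
related, total = predict_related(second_units, [2.0, -1.0])
```

Raising `threshold` above 0 corresponds to the embodiment in which the total is compared with a preset value.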

The artificial neural network 400 of the MLP model learns by adjusting its learning parameters. According to one embodiment, the learning parameters include at least one of weights and biases. The learning parameters are adjusted iteratively by an optimization algorithm called gradient descent. Each time a prediction is computed from a given data sample (forward propagation), the performance of the network is evaluated with a loss function that measures the prediction error. Each learning parameter of the artificial neural network 400 is then adjusted in small steps in the direction that minimizes the value of the loss function; this process is called back-propagation.
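As a rough illustration of this loop, the sketch below fits a single weight and bias by gradient descent on a mean squared error loss. The model, the loss, the learning rate, and the sample data are assumptions chosen for brevity; the gradients are written out by hand here, whereas back-propagation computes them layer by layer in a deeper network.

```python
def mean_squared_error(samples, w, b):
    """Loss function: average squared prediction error over the samples."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train_linear(samples, learning_rate=0.1, epochs=500):
    """Fit y = w * x + b by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            error = (w * x + b) - y          # forward propagation, then error
            grad_w += 2.0 * error * x / n    # d(loss)/dw
            grad_b += 2.0 * error / n        # d(loss)/db
        w -= learning_rate * grad_w          # small step that lowers the loss
        b -= learning_rate * grad_b
    return w, b


samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]  # points on y = 2x + 1
w, b = train_linear(samples)
```

A learning rate that is too large would make the updates diverge, which is why the adjustment is made in small steps.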

FIG. 5 illustrates a learning process for predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.

FIG. 5 is drawn using a U-net as an example, but the first learning model and the second learning model according to various embodiments of the present invention may be implemented with various architectures other than a U-net, such as a convolutional neural network (CNN), DenseNet, GoogLeNet, or a generative adversarial network.

Referring to FIG. 5, the upper part shows the process of finding the parameters of the U-net using training data, and the lower part shows the process of using the learned U-net parameters to display and provide the topology prediction result as a plurality of sub-polygons.

The first learning model can be applied whether the input image is two-dimensional or three-dimensional. The operating principle is the same in both cases: the input image is divided into partial images of equal size, topology prediction into a plurality of sub-polygons is performed on them, and a single overall output image is generated from the partial images marked with the sub-polygons.

Referring to the upper part of FIG. 5, the U-net parameters, for example N parameters w_1, ..., w_N, can be determined by training on data with correct answers. During training, the parameters of the U-net are determined so that, when an image patch cropped at a random position is input, the 3D segmentation patch at the same position is output. The random crop is repeated many times (at least several hundred times) so that the regions of space missed during training are minimized. If the crop size is 10x10x10, the U-net obtained after training takes a 10x10x10 image as input and outputs a 10x10x10 segmentation.
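Why the random crop is repeated so many times can be seen by measuring how much of an image no crop has touched. The sketch below is a simplified 2D illustration with hypothetical names and sizes (a 40x40 image and 10x10 crops); it is not taken from the source.

```python
import random


def uncovered_fraction(height, width, crop, count, seed=0):
    """Fraction of pixels that none of `count` random crop x crop windows covers."""
    rng = random.Random(seed)
    covered = set()
    for _ in range(count):
        top = rng.randint(0, height - crop)
        left = rng.randint(0, width - crop)
        for r in range(top, top + crop):
            for c in range(left, left + crop):
                covered.add((r, c))
    return 1.0 - len(covered) / (height * width)


few = uncovered_fraction(40, 40, 10, 10)    # ten crops leave much untouched
many = uncovered_fraction(40, 40, 10, 300)  # several hundred crops miss far less
```

With the same seed, the 300-crop run begins with the same ten crops as the 10-crop run, so its uncovered fraction can only be equal or smaller.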

Referring to the lower part of FIG. 5, when actual prediction results are provided, an image with no topology analysis into sub-polygons, for example an image of size 200x200x100, is divided regularly into 10x10x10 partial images (20x20x10 = 4,000 in total). Each 10x10x10 partial image is input to the trained U-net model to obtain a 10x10x10 segmentation patch, and the 4,000 segmentation patches obtained in this way are joined to produce a segmentation result of size 200x200x100, the same as the original image.
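The regular tiling and stitching step can be sketched as follows. The volume is represented as nested lists indexed `[x][y][z]`, and `model` stands in for the trained U-net as any function that maps a patch to a segmentation patch of the same shape; both are assumptions made for illustration.

```python
from itertools import product


def patch_origins(volume_shape, patch_shape):
    """Corner coordinates of the regular, non-overlapping patches."""
    for size, patch in zip(volume_shape, patch_shape):
        if size % patch != 0:
            raise ValueError("volume size must be a multiple of the patch size")
    ranges = [range(0, size, patch) for size, patch in zip(volume_shape, patch_shape)]
    return list(product(*ranges))


def segment_by_patches(volume, patch_shape, model):
    """Run `model` on every patch and join the results into one volume."""
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    px, py, pz = patch_shape
    output = [[[None] * shape[2] for _ in range(shape[1])] for _ in range(shape[0])]
    for x0, y0, z0 in patch_origins(shape, patch_shape):
        patch = [[volume[x][y][z0:z0 + pz] for y in range(y0, y0 + py)]
                 for x in range(x0, x0 + px)]
        result = model(patch)
        for dx in range(px):
            for dy in range(py):
                output[x0 + dx][y0 + dy][z0:z0 + pz] = result[dx][dy]
    return output


# A 200x200x100 image split into 10x10x10 patches gives 20 * 20 * 10 origins.
print(len(patch_origins((200, 200, 100), (10, 10, 10))))  # 4000
```

The stitched output has the same size as the input because the patches tile the volume exactly, with no overlap and no gap.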

A U-net consists of an encoder and a decoder. Convolution, the core operation of the U-net encoder (the left wing), is good at extracting local features but poor at extracting global features. If, unlike the cropping shown in FIG. 5, a test image were learned in a single pass, for example by feeding the whole 200x200 original image to the U-net without cropping, the resulting U-net could segment poorly when an image with too large a field of view (FOV) is input at prediction time.

FIG. 6 shows a process of predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.

Specifically, FIG. 6 shows how the first learning model operates to identify, from an image, a topology made up of a plurality of sub-polygons in various embodiments of the present invention.

The first learning model can be trained with training data in which the topology of images of various kinds has been identified as a plurality of sub-polygons. For example, to create an artificial intelligence model that divides an image into a plurality of sub-polygons, the first learning model can be trained with training data in which each part of a training image has been divided into a plurality of sub-polygons.

When the trained first learning model receives an image on which no topology analysis has been performed, it outputs the result of identifying the topology of that image as a plurality of sub-polygons.

FIG. 7 shows a process of predicting a topology consisting of a plurality of sub-polygons in an input image based on a deep learning algorithm according to various embodiments of the present invention.

Specifically, FIG. 7 shows the structure of the first learning model for identifying, from an image, a topology consisting of a plurality of sub-polygons in various embodiments of the present invention.

As an example, FIG. 7 shows the structure of a learning process that uses a Vision Transformer to identify, from an image, a topology consisting of a plurality of sub-polygons.

In FIG. 7, H, W, and D denote the horizontal, vertical, and depth sizes of the input image.

In FIG. 7, Z3, Z6, Z9, and Z12 denote the output tensors of the respective layers of the convolution-based encoder. For example, Z3 is a 4-dimensional tensor of size H/16 x W/16 x D/16 x 768.
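The size of such a tensor follows directly from the input size. The helper below is a hypothetical illustration that takes the divisor 16 and the channel width 768 from the Z3 example; the 96x96x96 input is an arbitrary choice.

```python
def encoder_output_shape(h, w, d, divisor=16, channels=768):
    """Shape of a tensor of size H/16 x W/16 x D/16 x 768 for a given input."""
    if h % divisor or w % divisor or d % divisor:
        raise ValueError("each input dimension must be a multiple of the divisor")
    return (h // divisor, w // divisor, d // divisor, channels)


shape = encoder_output_shape(96, 96, 96)
print(shape)  # (6, 6, 6, 768)
```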

In machine learning, a tensor is a mathematical object that can be represented as a multidimensional array of numeric values. It is the basic data structure used to represent inputs, outputs, and intermediate computations in machine learning algorithms. A tensor can have any number of dimensions, called its rank. For example, a scalar (a single value) is a rank-0 tensor, a vector (an array of values) is a rank-1 tensor, and a matrix (a 2D array of values) is a rank-2 tensor. Tensors are used in many kinds of machine learning algorithms, including deep learning, convolutional neural networks, and recurrent neural networks. They are also used to represent data in many other scientific fields, such as physics and engineering.

Referring to FIG. 7, the first learning model divides the input image into pieces of a specific set size, extracts features from each partial image, and performs topology prediction that divides each partial image into a plurality of sub-polygons. It marks each divided region with its sub-polygons, then combines the partial images marked with the topology prediction to generate and output a single output image on which topology prediction has been performed.

In the structure of the first learning model in FIG. 7, the left side is the encoder and the right side is the decoder.

The encoder of the exemplary first learning model in FIG. 7 is shown as a Vision Transformer (ViT), which is a transformer-based architecture. However, the ViT is only an example, and various architectures may be applied to the encoder of the first learning model. For example, a U-Net, which belongs to the CNN family, may be used as the encoder of the first learning model according to various embodiments of the present invention.

In an encoder-decoder architecture, the encoder processes the input data and produces a compressed representation, which is passed to the decoder to generate the output. The choice of encoder architecture depends on the nature of the input data and the task at hand.

The following are some examples of encoder architectures that can be used in an encoder-decoder architecture.

Convolutional neural networks (CNN): CNNs are commonly used as encoders in image and video processing tasks. They are designed to process input data with a grid-like structure, such as images, and can learn a hierarchical representation of the input by applying convolution and pooling operations in succession.

Recurrent neural networks (RNN): RNNs are commonly used as encoders in natural language processing tasks, where the input data is a sequence of words. RNNs can process sequences of variable length and learn to capture dependencies between different parts of a sequence.

Transformer-based architectures: Transformer-based architectures, such as the original Transformer developed for natural language processing, can also be used as encoders. They process the input with a self-attention mechanism and can capture long-range dependencies between different parts of the input.

Autoencoders: An autoencoder is a type of neural network that can be used as an encoder-decoder architecture in which the encoder and decoder are trained together to reconstruct the input data. Autoencoders can learn compressed representations of the input, which is useful for tasks such as image compression and anomaly detection.

Overall, the choice of encoder architecture depends on the nature of the input data and the task at hand, and different encoder architectures may suit different kinds of input data and tasks.

The encoder components of the exemplary first learning model of FIG. 7 are outlined below. The Vision Transformer (ViT) is a popular deep learning architecture for computer vision tasks that processes input images with a self-attention mechanism. Its core components are as follows.

Embedded patches: The input image is divided into non-overlapping patches, which are then linearly projected into a low-dimensional feature space by a learnable linear projection layer. This transformation lets the network operate on local features of the image while preserving global context.

Multi-head attention: The embedded patches are processed by a self-attention mechanism that learns to attend to the important regions of the image. The attention mechanism is applied several times with different sets of learned weights (that is, heads), which lets the network capture different patterns and dependencies in the input.

Normalization: Layer normalization is applied to the output of each attention layer and of each MLP to stabilize training and improve the generalization performance of the network.

Multi-layer perceptron (MLP): The output of the attention mechanism is passed through a multi-layer perceptron, which applies a non-linear transformation to each patch representation individually.

Overall, the ViT architecture uses these components to convert an input image into a sequence of patch representations, which are then processed by a stack of attention and MLP layers to produce the final output.
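The embedded-patches step described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `embed_patches`, the 8x8 image, the patch size of 4, and the randomly drawn projection matrix are all hypothetical and are not taken from the embodiment of FIG. 7, where the projection would be learned during training.

```python
import numpy as np

def embed_patches(image, patch_size, projection):
    """Split an (H, W, C) image into non-overlapping patches and project them.

    Returns an array of shape (num_patches, embed_dim).
    """
    h, w, c = image.shape
    p = patch_size
    if h % p != 0 or w % p != 0:
        raise ValueError("image size must be a multiple of the patch size")
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C): group pixels by patch.
    patches = image.reshape(h // p, p, w // p, p, c).transpose(0, 2, 1, 3, 4)
    # Flatten every patch into one row vector of length p * p * C.
    patches = patches.reshape(-1, p * p * c)
    # Linear projection (learnable in a real model, random here).
    return patches @ projection

# Hypothetical sizes: an 8x8 RGB image, 4x4 patches, 16-dimensional embedding.
image = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
projection = np.random.default_rng(0).normal(size=(4 * 4 * 3, 16))
tokens = embed_patches(image, 4, projection)
print(tokens.shape)  # (4, 16)
```

Each row of `tokens` is one patch representation; in a full ViT these rows would next pass through the attention and MLP layers.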

FIG. 8 illustrates a process of generating a topology-prediction-based retopology prediction model by adding a plurality of layers to the layers corresponding to the encoder of a model that has been trained to identify a topology consisting of a plurality of subpolygons in an input image, according to various embodiments of the present invention.

Referring to FIG. 8, the decoder is discarded from the first learning model trained, as in FIG. 7, to identify a topology consisting of a plurality of subpolygons in an input image, so that only the encoder remains, and a plurality of fully connected layers (FC layers), called a classification head (c-head), are attached after the encoder.

If the encoder separated from the first learning model is the ViT of FIG. 8, the MLP and softmax at the bottom of FIG. 8 correspond to the c-head.

Softmax is a mathematical function used in the classification head of a neural network to convert a vector of real numbers into a probability distribution. It is commonly used as the final activation function in the output layer of a neural network when the task is multi-class classification. The softmax function takes a vector of real numbers as input and normalizes it so that the output is a probability distribution over the possible classes. Specifically, it exponentiates each input value and divides by the sum of the exponentials, so that each output lies between 0 and 1 and the outputs sum to 1. Each element of the output vector represents the probability that the input belongs to the corresponding class. In a neural network, the softmax function is typically applied to the output of the last layer to produce a probability distribution over the classes the network is trying to predict. The class with the highest probability is selected as the predicted class.
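As a small illustration of the softmax function described above, the following NumPy sketch converts a vector of real numbers into a probability distribution and picks the most probable class. The function name and the example scores are assumptions made for illustration; subtracting the maximum before exponentiating is a common numerical-stability step and does not change the result.

```python
import numpy as np

def softmax(logits):
    """Map a 1-D vector of real numbers to a probability distribution."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - np.max(logits)   # avoids overflow in exp
    exps = np.exp(shifted)
    return exps / exps.sum()

# Hypothetical scores for three classes produced by the last FC layer.
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = int(np.argmax(probs))  # class with the highest probability
```

Here `probs` sums to 1 and `predicted_class` is the index of the largest score, since softmax preserves the ordering of its inputs.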

FIG. 9 illustrates an example of a topology-prediction-based retopology prediction model based on a convolutional neural network (CNN), according to various embodiments of the present invention.

Specifically, FIG. 9 illustrates an example of the second learning model.

Referring to FIG. 9, the decoder is discarded from the first learning model trained, as in FIG. 7, to identify a topology consisting of a plurality of subpolygons in an input image, so that only the encoder remains, and a plurality of fully connected layers (FC layers), called a classification head (c-head), are attached after the encoder.

If the encoder separated from the first learning model is not the ViT of FIG. 8 but the CNN of a U-Net as in FIG. 9, convolution layer 1 (CONV 1) through convolution layer 5 (CONV 5) correspond to the encoder separated from the U-Net, and FC layer 6, FC layer 7, and softmax correspond to the c-head.

According to various embodiments of the present invention:

(1) In the embodiment of FIG. 5, an encoder-decoder model for predicting the subpolygon topology is trained using the subpolygon topology corresponding to each original image. The final parameter values of the encoder of the trained first learning model may be denoted w1, w2, ..., wN.

(2) Only the encoder is detached from the trained first learning model, and a few layers are attached to it to form the new structure of the second learning model. Let M be the number of parameters of the added layers, where N >> M. According to one embodiment, N may be at least nine times M. At this point the second learning model has not yet been trained; that is, the values of the N+M parameters that make up the second learning model have not yet been determined.

(3) To train the second learning model (i.e., to determine the N+M parameter values), gradient descent is applied using the original images and the retopology information corresponding to each image. Gradient descent sets the model parameters to suitable initial values and moves them closer to the correct values with each iteration. Ordinarily, the initial values of all parameters of a model trained by gradient descent are set to random values near 0. In various embodiments of the present invention, however, N of the N+M parameters of the second learning model are initialized to the values w1, w2, ..., wN obtained from training the first learning model in step (1). The remaining M parameters are initialized, as in the ordinary case, to values near 0. Each such value may be a random value drawn from a preset range around 0.

Thereafter, the second learning model is finalized by training with gradient descent so that the parameters converge from these initial values toward the correct values.

Various embodiments of the present invention apply fine-tuning when training a deep learning model that obtains quad-mesh retopology information from an image, based on the result of training to identify a topology of a plurality of subpolygons from an image. Fine-tuning here means that the values w1, w2, ..., wN obtained by training for the task of identifying a topology of a plurality of subpolygons (task 1) are used as initial values when training for the task of predicting retopology into a quad mesh (task 2).

In the encoder + c-head model, the encoder parameters to be determined are denoted e1, e2, ..., eN (N in total) for convenience, and the c-head parameters are denoted c1, c2, ..., cM (M in total). Because the encoder is much larger than the c-head, N >> M. According to one embodiment, N may be at least nine times M.
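The condition that N is at least nine times M can be restated as a share of the total parameter count. The short derivation below is only an arithmetic illustration: it shows that, under this condition, the encoder holds at least 90% of the N+M parameters of the encoder + c-head model.

```latex
N \ge 9M \;\Longrightarrow\; \frac{M}{N} \le \frac{1}{9}
\;\Longrightarrow\;
\frac{N}{N+M} = \frac{1}{1 + M/N} \ge \frac{1}{1 + \tfrac{1}{9}} = \frac{9}{10}.
```

For example, with N = 9,000,000 and M = 1,000,000 (hypothetical values), the encoder accounts for exactly 90% of the parameters.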

If fine-tuning is not used, and the second learning model (the encoder + c-head model) with all N+M parameters initialized near 0 is trained to obtain quad-mesh retopology information from an image, training proceeds as follows.

[Training method without fine-tuning]

(1) All N+M parameter values (N for the encoder and M for the c-head) are first randomly initialized to values near 0 (for example, e1 = 0.01, e2 = -0.001, ..., c1 = 0.002, c2 = 0.012, ...).

(2) The parameter values are updated repeatedly by the iterative algorithm known as gradient descent. The data used here are unrelated to topology prediction; only training data labeled (classified) with respect to retopology into a quad mesh are used. For example, the N+M parameters are updated so that, when an image is input, the correct retopology prediction is output.

In contrast, the training process proposed in the present invention is as follows.

[Training method proposed in the present invention]

(1) The M c-head parameters, which are far fewer than the N encoder parameters, are randomly initialized as above. The N encoder parameters, which far outnumber the M c-head parameters, are initialized to the values learned for identifying a topology consisting of a plurality of subpolygons.

(2) The N+M parameter values of the second learning model are updated by gradient descent using training data labeled (classified) with respect to retopology results. Because the N encoder parameters start from values obtained by training on a different task (i.e., identifying a topology consisting of a plurality of subpolygons) and are then updated further, this is called fine-tuning rather than training from scratch, and the general practice of reusing the result of training on another task is called transfer learning.
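The two steps of the proposed method can be sketched with a toy NumPy example. Everything named here is a hypothetical stand-in: `init_second_model`, `gradient_descent`, the nine "pretrained" values, and the quadratic loss are assumptions chosen so the example runs on its own, and they do not reproduce the actual retopology model or its loss.

```python
import numpy as np

def init_second_model(encoder_weights, num_head_params, scale=0.01, seed=0):
    """Build the N+M initial parameters: N copied from the first model,
    M drawn at random from a small range around 0."""
    rng = np.random.default_rng(seed)
    head = rng.uniform(-scale, scale, size=num_head_params)
    return np.concatenate([np.asarray(encoder_weights, dtype=float), head])

def gradient_descent(params, grad_fn, lr=0.1, steps=200):
    """Iteratively move the parameters against the gradient of the loss."""
    params = np.array(params, dtype=float)
    for _ in range(steps):
        params = params - lr * grad_fn(params)
    return params

# Hypothetical w1..wN from the first learning model (N = 9), with M = 1.
w = np.array([0.8, -1.2, 0.3, 0.5, -0.7, 1.1, 0.2, -0.4, 0.9])
init = init_second_model(w, num_head_params=1)

# Toy stand-in for the retopology loss: 0.5 * ||p - target||^2,
# whose gradient with respect to p is simply (p - target).
target = np.concatenate([w + 0.05, [0.6]])
final = gradient_descent(init, lambda p: p - target)
```

In this toy setting the first N entries of `init` already lie close to `target`, which mirrors the idea that the encoder parameters start from reasonably good values and need only small further updates.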

[Advantages of the training method proposed in the present invention]

(1) With a training method that does not use fine-tuning, every parameter must be updated from a random initial value, so a correspondingly large amount of labeled data is required. With the proposed method, the N encoder parameters, which make up 90% or more of all parameters, start from reasonably good values, so training is possible with less training data. This matters because training data for the second learning model must come from training images whose retopology into a quad mesh has been completed, and such images are hard to obtain in quantity. The values are called "reasonably good" because, if the topology of each image is predicted well as a plurality of subpolygons based on edges and vertices, that is evidence that the encoder extracts the image features well.

(2) The training data for the source task of the transfer learning, namely predicting for each image a topology of a plurality of subpolygons based on edges and vertices, can be taken from images on which retopology into a quad mesh has not been performed, so a large amount is easy to secure. If transfer learning is compared to applying knowledge gained from studying mathematics to studying physics, this is the case in which mathematics study material is plentiful while physics study material is limited.

FIG. 10 illustrates an example of images in which a topology consisting of a plurality of subpolygons in an input image has been analyzed and retopology based on that analysis has been performed, according to various embodiments of the present invention.

Referring to FIG. 10, the rows show the mesh type: (1) solid and (2) surface. The columns show (a) the original object, (b) all triangles, (c) triangles only at the edges, (d) no triangles, full grid, and (e) no triangles, quadtree.

The state identified as a plurality of subpolygons before retopology into a quad mesh may correspond to a state such as (b) or (c).

Relative to the triangle meshes of (b) or (c), (d) and (e) are the forms obtained after retopology into a quadrilateral mesh (quad mesh).
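To make the relationship between a triangle mesh and a quad mesh concrete, the sketch below merges pairs of triangles that share an edge into quadrilaterals. This is a naive greedy pairing written purely for illustration; it is not the learning-based retopology of the present invention, and the function name and vertex indices are hypothetical.

```python
def pair_triangles_into_quads(triangles):
    """Greedily merge triangles that share an edge into quads.

    triangles: list of 3-tuples of vertex indices.
    Returns (quads, leftover_triangles).
    """
    # Map every undirected edge to the triangles that contain it.
    edge_to_tris = {}
    for idx, tri in enumerate(triangles):
        for k in range(3):
            edge = frozenset((tri[k], tri[(k + 1) % 3]))
            edge_to_tris.setdefault(edge, []).append(idx)

    used, quads = set(), []
    for edge, tris in edge_to_tris.items():
        if len(tris) != 2 or tris[0] in used or tris[1] in used:
            continue
        t1, t2 = tris
        tri1 = triangles[t1]
        # Rotate tri1 so the vertex opposite the shared edge comes first.
        k = next(i for i, v in enumerate(tri1) if v not in edge)
        p, u, v = tri1[k], tri1[(k + 1) % 3], tri1[(k + 2) % 3]
        # Vertex of the second triangle opposite the shared edge.
        q = next(x for x in triangles[t2] if x not in edge)
        quads.append((p, u, q, v))  # keeps the winding order of tri1
        used.update((t1, t2))

    leftovers = [t for i, t in enumerate(triangles) if i not in used]
    return quads, leftovers

# Two triangles sharing edge (1, 2), plus one that cannot be paired.
quads, leftovers = pair_triangles_into_quads([(0, 1, 2), (2, 1, 3), (2, 3, 4)])
```

With this input the first two triangles become the quad `(0, 1, 3, 2)` and the third triangle is left over, which shows why a simple pairing rule alone does not guarantee an all-quad result.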

Retopology into a quad mesh offers several advantages:

(1) Ease of animation: Quad meshes are well suited to animation. Quads deform more predictably and give more natural results, especially on complex surfaces such as skin. Quad meshes also allow the flow of the model to be controlled effectively through edge loops and edge rings.

(2) Ease of UV mapping: Quad meshes are better suited to UV mapping. Quads map naturally onto rectangular texture pixels, which minimizes texture distortion and makes texture painting easier. UV is a term used in 3D modeling for the coordinate system that refers to the 2D texture map of a 3D object. UV is not actually an abbreviation; the letters U and V denote the axes of the 2D texture space because X, Y, and Z are already used for 3D space. UV mapping is the process of applying a 2D image (texture) to the surface of a 3D model. It is used to enhance the detail of a 3D model by adding color, pattern, texture, and the like. The UV coordinate system defines which part of the 3D model each part of the 2D image is applied to.

(3) Ease of modeling: Quad meshes are better suited to modeling. With quads, shape detail can easily be added to or removed from the model through edge loops. Quads also help make the surface of the model smoother.

(4) Subdivision surfaces: Quad meshes are ideal for subdivision surface modeling, a technique that generates additional geometry to increase the detail of a model. Quads give more uniform and predictable results in this process. In 3D computer graphics, "geometry" refers to the basic shapes or structure that make up a 3D model or scene. It generally consists of basic elements in 3D space such as points (vertices), lines (edges), and planes (faces), which are connected to form complex 3D shapes. For example, the geometry of a 3D model includes all of the model's vertices, edges, and faces. These elements determine the shape, size, and position of the model. They may also carry additional information, such as the UV mapping used to determine the model's surface texture, color, and material. In 3D modeling, "geometry" can therefore be thought of as the term for the model's "skeleton" or "structure".

(5) Tool compatibility: Most 3D modeling and animation tools prefer quad meshes, because the advantages listed above make quads more flexible and their results more predictable.

Because of these advantages, many 3D artists and engineers retopologize complex triangle meshes into quad meshes. This work is time-consuming, however, and can be very difficult for complex models. For this reason, various research efforts and techniques are being developed to automate or simplify the process.

FIGS. 11 to 13 illustrate examples of output wireframe images visualized after topology-prediction-based retopology has been performed, according to various embodiments of the present invention.

Referring to FIGS. 11 to 13, an original image, a retopology wireframe image, and a retopology rendering image are shown. From the original image, an output image of the visualized wireframe is generated after topology-prediction-based quad-mesh retopology is performed according to various embodiments of the present invention. A retopology rendering image is then generated by rendering, as a post-process, based on the retopology wireframe.

Creating a wireframe from a quad mesh is typically done with 3D modeling software. The process works similarly in most 3D modeling software; the general steps are as follows:

(1) Load the 3D model: First, the 3D model composed of a quad mesh is loaded into the 3D modeling software.

(2) Change the view mode: Most 3D modeling software provides several view modes. The 'Wireframe' or 'Wireframe View' mode is selected. This mode displays every polygon of the 3D model as lines only, so that the wireframe of the model can be seen.

(3) Extract the wireframe (optional): Some 3D modeling software can extract the wireframe as a separate object. With this function, the wireframe alone can be manipulated independently or exported to another format.
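Outside a 3D modeling package, the wireframe extraction in step (3) amounts to collecting the unique edges of the mesh faces. The following is a minimal sketch assuming faces are given as tuples of vertex indices; the function name `wireframe_edges` and the two-quad example mesh are hypothetical.

```python
def wireframe_edges(faces):
    """Collect the unique undirected edges of a polygon mesh.

    faces: iterable of tuples of vertex indices (quads have four entries).
    Returns a sorted list of (low_index, high_index) edge pairs.
    """
    edges = set()
    for face in faces:
        n = len(face)
        for k in range(n):
            a, b = face[k], face[(k + 1) % n]
            # Store each edge once, regardless of traversal direction.
            edges.add((min(a, b), max(a, b)))
    return sorted(edges)

# Two quads sharing the edge between vertices 1 and 4.
edges = wireframe_edges([(0, 1, 4, 3), (1, 2, 5, 4)])
```

The shared edge appears only once in `edges`, so two adjacent quads yield seven line segments rather than eight.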

A wireframe represents the basic structure of a 3D model and helps in understanding the flow and topology of the design. The wireframe itself does not contain surface detail or texture, however, so additional work is needed to represent such information. According to various embodiments of the present invention, the memory may store 3D modeling software, and a wireframe may be generated based on the result of retopology into a quad mesh.

Quad meshes are very useful for wireframe generation, because they offer several important advantages:

(1) Predictable structure: In a quad mesh every polygon has four vertices, so the structure is predictable and consistent. This simplifies wireframe generation and produces cleaner results.

(2) Edge loops and edge rings: Edge loops and edge rings are easy to create in a quad mesh. They are very useful for expressing the flow and structure of a 3D model.

(3) Subdivision surfaces: Quad meshes are ideal for subdivision surface modeling, a technique that generates additional geometry to increase the detail of a model. Quads give more uniform and predictable results in this process.

Edge flow is a concept used in 3D modeling and digital sculpting. It refers to the flow or arrangement of the polygons that make up the surface of a 3D model. Edge flow plays an important role in determining the appearance and detail of the model. A well-designed edge flow helps the model's shape to be expressed naturally and accurately, and is useful when animating or texturing the model. For a human body model, for example, edge flow helps express the movement of joints and muscles naturally. Edge flow is also used in animation to produce detailed features such as skin folds, wrinkles, and curves. When designing the edge flow, the geometric structure of the model and the connectivity between its parts must be considered; that is, all polygons should connect naturally and form a smooth surface. This allows natural movement when the model is deformed or animated. Edge flow is used mainly for visual representation in 3D modeling software and is important to the designer or artist during modeling. Because edge flow determines the detail and expressiveness of the model, a well-designed edge flow can improve the quality and appearance of the final result.

When an embodiment of the present invention is implemented in hardware, the processor of the present invention may include application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), or the like configured to carry out the present invention.

Meanwhile, the method described above can be written as a program executable on a computer and can be implemented on a general-purpose digital computer that runs the program using a computer-readable medium. The structure of the data used in the method described above can be recorded on a computer-readable storage medium by various means. Program storage devices, a term that may be used to describe storage devices containing executable computer code for performing the various methods of the present invention, should not be understood to include transitory objects such as carrier waves or signals. The computer-readable storage medium includes storage media such as magnetic storage media (e.g., ROM, floppy disk, hard disk) and optically readable media (e.g., CD-ROM, DVD).

The embodiments described above combine the components and features of the present invention in predetermined forms. Each component or feature should be considered optional unless explicitly stated otherwise. Each component or feature may be implemented without being combined with other components or features. An embodiment of the present invention may also be formed by combining some of the components and/or features. The order of the operations described in the embodiments of the invention may be changed. Some components or features of one embodiment may be included in another embodiment, or may be replaced with corresponding components or features of another embodiment. It is evident that claims with no explicit citation relationship in the claims may be combined to form an embodiment, or may be included as new claims by amendment after filing.

It will be apparent to those skilled in the art that the present invention can be embodied in other forms without departing from its technical spirit and essential features. The above embodiments should therefore be considered illustrative in all respects and not restrictive. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention fall within its scope.

110: Image 120: Electronic device
121: Memory 122: Processor
123: Input device 124: Output device
130: Wireframe image 400: Artificial neural network
410: Input layer 411: Plurality of factors
430: Hidden layer 431: First hidden layer
432: First unit 433: Second hidden layer
434: Second unit 450: Output layer
451: Prediction result unit

Claims

In a method of operating an electronic device including an input device, memory, processor, and output device,
performing, by the processor, topology prediction and retopology learning for a learning model stored in the memory based on a plurality of test images for which topology analysis has not been performed;
Receiving information about an input image through the input device;
generating, by the processor, a prediction result of a topology consisting of a plurality of subpolygons in the input image using the learning model from the input image;
performing, by the processor, re-topology by quadrangulating the plurality of subpolygons in the input image using the learning model;
Outputting a visualized wireframe for the input image based on the result of the retopology by the output device,
The step of performing topology prediction and retopology learning on the learning model stored in the memory based on the plurality of test images by the processor,
Receiving a plurality of first test images on which topology analysis has not been performed by the input device;
receiving, by the input device, a plurality of second test images in which a topology prediction consisting of a plurality of sub-polygons based on corners and vertices has been performed on the plurality of first test images;
determining, by the processor, a plurality of first partial test images at a plurality of random locations within the first test image;
determining, by the processor, a plurality of second partial test images at a plurality of locations within the second test image corresponding to the plurality of first partial test images;
performing learning to predict a topology of a first learning model stored in the memory based on the plurality of first partial test images and the plurality of second test images by the processor;
generating a second learning model by the processor by removing a decoder of the first learning model on which learning was performed and combining a plurality of predetermined layers with an encoder of the first learning model;
receiving, by the input device, a plurality of third test images on which topology prediction and retopology have not been performed, and a plurality of fourth test images on which retopology has been performed by quadrangulating a plurality of subpolygons in the plurality of test images;
performing, by the processor, topology prediction and retopology learning of the second learning model based on the plurality of third test images and the plurality of fourth test images,
wherein neighboring second partial test images among the plurality of second partial test images overlap each other or have a gap between them,
and the learning model corresponds to the second learning model and is characterized in that it restores damaged portions of a mesh model in which meshes erroneously overlap or a gap has formed, or fills the gap between meshes,
method.
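
As an illustration of the partial-image steps in claim 1, the sketch below cuts patches at random positions from a first test image and cuts patches at the same positions from the corresponding second test image. It is a minimal sketch in plain Python, not the claimed implementation: the function name `paired_random_crops`, the square patch shape, and the toy 6x6 data are assumptions made for the example.

```python
import random


def paired_random_crops(image, label, patch, count, rng):
    """Cut `count` square patches from `image` and from `label` at
    identical, randomly chosen top-left positions (hypothetical helper)."""
    height, width = len(image), len(image[0])
    if len(label) != height or len(label[0]) != width:
        raise ValueError("image and label must have the same size")
    if patch > height or patch > width:
        raise ValueError("patch is larger than the image")
    pairs = []
    for _ in range(count):
        # randint is inclusive on both ends, so the patch always fits.
        top = rng.randint(0, height - patch)
        left = rng.randint(0, width - patch)
        img_patch = [row[left:left + patch] for row in image[top:top + patch]]
        lbl_patch = [row[left:left + patch] for row in label[top:top + patch]]
        pairs.append(((top, left), img_patch, lbl_patch))
    return pairs


# Toy stand-ins for a first test image and its second (topology-labelled) image.
image = [[r * 6 + c for c in range(6)] for r in range(6)]
label = [[(r + c) % 2 for c in range(6)] for r in range(6)]
pairs = paired_random_crops(image, label, patch=3, count=4, rng=random.Random(0))
```

Because the positions are drawn independently, neighbouring patches may overlap or leave gaps, which matches the situation the claim describes for the second partial test images.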

(Deleted)

According to claim 1,
The second learning model is
configured to divide the input image into a plurality of partial images, perform topology prediction with the plurality of subpolygons,
and generate a retopology image in which retopology has been performed by quadrangulation based on the plurality of subpolygons,
method.
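
The division of an input image into partial images can be sketched as a raster-order tiling. This is only one possible reading: the claim does not fix the order or the spacing, so the names `tile_origins` and `split_image`, the square tile, and the `stride` parameter are assumptions of this example.

```python
def tile_origins(height, width, patch, stride):
    """Top-left corners of square tiles in raster order
    (left to right, then top to bottom)."""
    if patch > height or patch > width:
        raise ValueError("patch is larger than the image")
    tops = range(0, height - patch + 1, stride)
    lefts = range(0, width - patch + 1, stride)
    return [(top, left) for top in tops for left in lefts]


def split_image(image, patch, stride):
    """Return the list of partial images for a 2D list-of-lists image."""
    height, width = len(image), len(image[0])
    return [
        [row[left:left + patch] for row in image[top:top + patch]]
        for top, left in tile_origins(height, width, patch, stride)
    ]


# Toy 4x4 image split into non-overlapping 2x2 partial images.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
tiles = split_image(image, patch=2, stride=2)
```

With `stride` smaller than `patch` the tiles overlap, and with `stride` larger than `patch` they leave gaps. Pixels beyond the last full tile are not covered in this sketch.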

According to claim 1,
The encoder of the first learning model consists of N first parameters, the plurality of layers consists of M second parameters, N and M are integers greater than 0,
The second learning model is composed of N first parameters and M second parameters,
The N first parameters have values according to the results of learning of the second learning model as initial values,
The M second parameters have random values within a neighborhood of 0 specified by a preset peripheral range value,
method.
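
The parameter initialization in this claim can be illustrated as follows. The sketch assumes that the N first parameters are copied from already-learned encoder values and that the M second parameters are drawn uniformly from the interval [-r, r] around 0. The uniform distribution, the function name `init_second_model`, and the numeric values are assumptions; the claim only requires random values within a preset range around 0.

```python
import random


def init_second_model(encoder_params, m, peripheral_range, rng):
    """Return (first_params, second_params) for a hypothetical second model.

    first_params: the N encoder values carried over as initial values.
    second_params: M values drawn at random from [-peripheral_range,
    peripheral_range] (uniform sampling is an assumption).
    """
    first = list(encoder_params)
    second = [rng.uniform(-peripheral_range, peripheral_range) for _ in range(m)]
    return first, second


# Made-up learned encoder values (N = 9) and a single new parameter (M = 1).
learned_encoder = [0.8, -1.2, 0.05, 2.1, -0.3, 0.9, 1.4, -0.7, 0.2]
first, second = init_second_model(
    learned_encoder, m=1, peripheral_range=0.01, rng=random.Random(0)
)
```

Starting the new layers near 0 while keeping the learned encoder values means that early fine-tuning steps are dominated by what the encoder has already learned.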

According to claim 4,
N, the number of the first parameters, is 9 times more than M, the number of the second parameters,
method.

According to claim 1,
The plurality of first 3D partial test images and the plurality of second partial test images are configured with the same set size and structure,
The number of the plurality of first partial test images and the plurality of second partial test images corresponds to a random number greater than or equal to a set number,
The plurality of first test images and the plurality of second test images have the same size and structure,
method.

(Deleted)

According to claim 1,
The step of performing training of the first learning model stored in the memory based on the plurality of first partial test images and the plurality of second partial test images includes:
repeatedly performing a step of training the first learning model based on one first partial test image among the plurality of first partial test images and the one second partial test image, among the plurality of second partial test images, that corresponds to the one first partial test image,
wherein, each time training of the first learning model is performed, the training is based on a different first partial test image and a different second partial test image among the plurality of first partial test images,
method.
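
One simple way to train on a different pair of partial images at every step is to visit the pairs in a shuffled order without repetition. The sketch below shows only that scheduling idea. The `update` callback stands in for whatever training step the first learning model uses, and all names here are hypothetical.

```python
import random


def pair_schedule(num_pairs, rng):
    """Return every pair index exactly once, in random order."""
    order = list(range(num_pairs))
    rng.shuffle(order)
    return order


def train(pairs, update, rng):
    """Call `update` once per (first_patch, second_patch) pair, never
    reusing a pair within the pass. Returns the number of steps taken."""
    steps = 0
    for index in pair_schedule(len(pairs), rng):
        first_patch, second_patch = pairs[index]
        update(first_patch, second_patch)
        steps += 1
    return steps


# Placeholder patches: strings stand in for image data.
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
seen = []
steps = train(pairs, lambda x, y: seen.append((x, y)), random.Random(0))
```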

In an electronic device,
The electronic device includes an input device, a memory, a processor, and an output device,
The processor is configured to perform the method according to any one of claims 1, 3 to 6, and 8,
electronic device.

A computer program configured to perform the method according to any one of claims 1, 3 to 6, and 8, and recorded on a computer-readable storage medium.