KR20220116940A

KR20220116940A - Learning data collection system and method

Info

Publication number: KR20220116940A
Application number: KR1020210020419A
Authority: KR
Inventors: 박순용; 김우영; 한철호; 이승현; 이동환
Original assignee: 네이버랩스 주식회사
Priority date: 2021-02-16
Filing date: 2021-02-16
Publication date: 2022-08-23
Also published as: KR102590730B1

Abstract

The present invention relates to a system and method for collecting learning data, which is the subject of learning in artificial intelligence. The learning data collection method according to the present invention comprises the steps of: generating a 3D modeling object obtained by modeling an object, and collecting a plurality of first images each including the 3D modeling object in a different posture in a first space having a first reference coordinate system; collecting a second image of the object disposed in a second space, using a camera disposed in the second space, the second space having a second reference coordinate system different from the first reference coordinate system; and generating learning data for the object using the degree-of-freedom information of the camera and the 3D modeling object included in the first images. Accordingly, it is possible to collect learning data closer to a real environment.

Description

LEARNING DATA COLLECTION SYSTEM AND METHOD

The present invention relates to a system for collecting learning data, which is the subject of learning in artificial intelligence, and to a learning data collection method using the same.

By its dictionary definition, artificial intelligence is a technology that realizes human learning, reasoning, perception, and natural language understanding abilities through computer programs. Artificial intelligence has advanced dramatically thanks to deep learning, which adds neural networks modeled on the human brain to machine learning.

Deep learning is a technology that enables a computer to judge and learn like a human and thereby cluster or classify objects or data. Recently it has become possible to analyze not only text data but also image data, and deep learning is now actively used in a wide variety of industries.

For example, in various industries such as robotics, autonomous driving, and medicine, a deep learning-based learning network (hereinafter referred to as a “deep learning network”) is trained on the data to be learned and yields meaningful learning results that are put to use in each field.

As one example, in the field of robotics, understanding the task a robot performs requires accurately assessing the situation around the robot or the work object placed near it. For this purpose, deep learning-based image recognition technology (for example, robot vision technology) is actively used.

Meanwhile, in artificial intelligence fields such as machine learning as well as deep learning, training on larger amounts of data increases accuracy and makes it possible to derive higher-quality results. Collecting the data to be learned is therefore essential in the field of artificial intelligence.

In particular, a deep learning network or machine learning network based on image data can estimate the position or posture of an object appearing in the image data. For such estimation, the degree-of-freedom information of the object (position information and posture information) must be secured as learning data together with the image data.

Conventionally, collecting image data and the corresponding degree-of-freedom information as learning data required labeling the image data (for example, identifying the specific image object that corresponds to the object in the image data) and then manually mapping each specific image object to its degree-of-freedom information. Securing learning data therefore demanded an enormous amount of labor.

For example, Korean Patent Registration No. 10-2010085 discloses a method and apparatus for generating a labeled image of a microstructure using superpixels. However, it merely simplifies the labeling of a specific image object corresponding to an object, and manual work is still required to map the specific image object to degree-of-freedom information.

Accordingly, there is an urgent need for an improved method of collecting learning data that includes degree-of-freedom information in an automated manner.

The present invention relates to a learning data collection system and method for collecting learning data for training an artificial intelligence network.

More specifically, the present invention relates to a learning data collection system and method for collecting learning data that includes degree-of-freedom information.

Furthermore, the present invention relates to a learning data collection system and method capable of automatically collecting learning data that includes degree-of-freedom information.

Furthermore, the present invention relates to a learning data collection system and method for collecting learning data on objects having various postures.

Furthermore, the present invention relates to a learning data collection system and method capable of minimizing the time and labor required to collect learning data.

To solve the problems described above, the learning data collection method according to the present invention may include: generating a three-dimensional (3D) modeling object obtained by modeling an object, and collecting a plurality of first images each including the 3D modeling object in a different posture in a first space having a first reference coordinate system; collecting a second image of the object disposed in a second space, using a camera disposed in the second space, the second space having a second reference coordinate system different from the first reference coordinate system; and generating learning data for the object using the degree-of-freedom information of the camera and the 3D modeling object included in the plurality of first images.

Furthermore, the learning data collection system according to the present invention may include: a modeling unit that generates a 3D modeling object obtained by modeling an object in a first space; a communication unit that receives an image of the object from a camera disposed in a second space different from the first space; and a control unit that collects a plurality of images each including the 3D modeling object in a different posture in the first space, which has a first reference coordinate system.

Furthermore, the control unit may generate learning data for the object using the degree-of-freedom information of the camera and the 3D modeling object included in the plurality of images.

Based on the relationship between the graphic object corresponding to the object in the first image and the 3D modeling object included in the second image, the control unit may extract, for each of the 3D modeling objects having the different postures, the degree-of-freedom information of the camera disposed in the first space.

Furthermore, the degree-of-freedom posture extraction method according to the present invention includes: receiving a captured image of an object from a camera; searching a plurality of reference images included in a preset data set for a specific reference image corresponding to the captured image; and extracting the degree-of-freedom posture of the object corresponding to the captured image using the degree-of-freedom information matched to the specific reference image. The plurality of reference images may each include a 3D modeling object having a different posture with respect to the object.
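The search-and-lookup flow of this extraction method can be pictured as a nearest-neighbor lookup. The sketch below is only an illustration under stated assumptions: each reference image is assumed to have been reduced to a feature vector beforehand, and the names `ReferenceEntry` and `extract_dof`, as well as the use of Euclidean distance, are hypothetical. This passage does not specify how the matching reference image is found.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class ReferenceEntry:
    """One reference image, reduced to features, with its matched DoF label."""
    features: Tuple[float, ...]  # hypothetical image descriptor (assumption)
    dof: Tuple[float, float, float, float, float, float]  # x, y, z, roll, pitch, yaw


def extract_dof(query: Sequence[float], dataset: Sequence[ReferenceEntry]):
    """Return the DoF label of the reference entry closest to the query features."""
    if not dataset:
        raise ValueError("dataset must contain at least one reference entry")
    # Euclidean distance is an illustrative similarity measure, not the source's.
    best = min(dataset, key=lambda entry: math.dist(query, entry.features))
    return best.dof


# Two reference renderings of the same object in different postures.
dataset = [
    ReferenceEntry((0.0, 1.0), (0.0, 0.0, 0.5, 0.0, 0.0, 0.0)),
    ReferenceEntry((1.0, 0.0), (0.0, 0.0, 0.5, 0.0, 0.0, math.pi / 2)),
]
estimated = extract_dof((0.9, 0.1), dataset)
```

Here the query features lie closest to the second entry, so its degree-of-freedom label is returned. In practice the descriptor and the distance would be replaced by whatever matching the trained network provides.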

As described above, the learning data collection system and method according to the present invention can generate a 3D modeling object for an object and extract the degree-of-freedom information of the 3D modeling object using the relationship between the graphic object corresponding to the object in a captured image and the generated 3D modeling object. By reflecting the 3D modeling object in an image captured in a real environment, the present invention can generate learning data that reflects the lighting, shadows, and other conditions of the real environment. As a result, the present invention makes it possible to collect learning data closer to the real environment.

FIG. 1 is a conceptual diagram illustrating an example in which learning data collected according to the present invention is utilized.
FIG. 2 is a conceptual diagram illustrating a method of generating a 3D modeling object.
FIG. 3 is a conceptual diagram illustrating a learning data collection system according to the present invention.
FIG. 4 is a flowchart illustrating a learning data collection method according to the present invention.
FIGS. 5, 6, 7, 8, 9 and 10 are conceptual diagrams illustrating a method of collecting learning data.
FIGS. 11 and 12 are conceptual diagrams illustrating a method of using the collected learning data.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. Identical or similar components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not in themselves have distinct meanings or roles. In describing the embodiments disclosed in this specification, detailed descriptions of related known technologies are omitted where they could obscure the gist of the embodiments. The accompanying drawings are provided only to aid understanding of the embodiments disclosed in this specification; they do not limit the technical idea disclosed herein, which should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present. In contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no intervening component is present.

A singular expression includes the plural unless the context clearly indicates otherwise.

In the present application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The present invention relates to a learning data collection system and method for collecting learning data for training an artificial intelligence network, and in particular to a learning data collection method and system capable of automatically collecting learning data that includes degree-of-freedom information (or a degree-of-freedom posture).

As discussed above, thanks to advances in artificial intelligence, image recognition technology is being used in a variety of industries. In robotics in particular, artificial intelligence-based image recognition technology (for example, deep learning-based image recognition technology) is used to analyze and understand the working environment of a robot, and the robot performs its intended task on that basis.

For example, as shown in FIG. 1, when the robot R is given a specific task (for example, dish-washing), the robot R or a camera (not shown) disposed near the robot R may capture an image of the working environment of the robot R. Based on the captured image, the control unit of the robot R may then determine how the robot R should operate to perform the specific task, and control the robot R to operate accordingly.

In this case, the control unit of the robot R must recognize, in the captured image, the object A that is the target of the task (for example, the bowls a1 and a2), analyze the position and posture (or pose) of the object A, and control the robot R so that it can perform the intended task on the object.

To this end, the control unit of the robot R must collect various information from the captured image, and can control the robot R accurately using several of the following: i) the type of the object to be worked on; ii) the size of the object; iii) the shape of the object; iv) the position of the object (for example, where in the sink the bowl a1 is placed, as shown in FIG. 1); v) the posture of the object (for example, the posture in which the bowl a1 rests in the sink, such as whether it is tilted, as shown in FIG. 1); and vi) the posture of the camera capturing the object.

Here, the position and posture of the object to be worked on, or of the camera capturing the object, may also be expressed as “degree of freedom,” “degree-of-freedom posture,” or “degree-of-freedom information.” For convenience of description, this specification uses the unified term “degree-of-freedom information.”

Meanwhile, degree-of-freedom information may be understood as a concept that includes position information and posture information. It may include position information (or 3D position information) corresponding to a 3D position (x, y, z) and posture information (or 3D posture information) corresponding to a 3D posture (r (roll), θ (pitch), Φ (yaw)).
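As an illustration of this six-value representation, the following sketch stores a position and a roll/pitch/yaw posture and converts them to a 4x4 homogeneous transform. The class name `Pose6DoF` and the rotation order R = Rz(yaw) · Ry(pitch) · Rx(roll) are assumptions made for the example; the specification does not state a rotation convention.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose6DoF:
    """Position (x, y, z) plus posture (roll, pitch, yaw), angles in radians."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

    def rotation(self):
        """3x3 rotation matrix, assuming R = Rz(yaw) * Ry(pitch) * Rx(roll)."""
        cr, sr = math.cos(self.roll), math.sin(self.roll)
        cp, sp = math.cos(self.pitch), math.sin(self.pitch)
        cy, sy = math.cos(self.yaw), math.sin(self.yaw)
        return [
            [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
            [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
            [-sp, cp * sr, cp * cr],
        ]

    def matrix(self):
        """4x4 homogeneous transform [R | t] with bottom row [0, 0, 0, 1]."""
        r = self.rotation()
        t = (self.x, self.y, self.z)
        return [r[i] + [t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# A bowl placed at (1, 2, 3) and turned a quarter turn about the vertical axis.
bowl = Pose6DoF(1.0, 2.0, 3.0, roll=0.0, pitch=0.0, yaw=math.pi / 2)
T = bowl.matrix()
```

Keeping the six values together in one record makes it straightforward to attach them to an image as a label, while the matrix form is convenient when poses have to be inverted or chained.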

Meanwhile, for the robot R to perform a task accurately on the target object, it is very important to determine the degree-of-freedom information.

This is because, to grasp the objects a1 and a2 to be worked on, the control unit of the robot R must determine at what angle to control the robot arms R1 and R2 and in what posture to grip, and this is determined based on at least one of the posture and the position of the object (or of the camera capturing the object).

If the degree-of-freedom information of the object (or of the camera that captured it) can be obtained merely by recognizing the object (for example, a1, a2) in the captured image, both the accuracy and the efficiency of the task can be secured.

To this end, an image (or mask) of an object having a specific shape (or specific posture), obtained from a captured image, may be matched with the posture information on the object and used as learning data.

Meanwhile, the posture information on the object may include at least one of: i) object-referenced degree-of-freedom information indicating what position or posture the object has with respect to the reference coordinate system of the object when the object has a specific shape; and ii) camera-referenced degree-of-freedom information indicating what position or posture the camera that captured the object has with respect to the reference coordinate system of the camera when the object has the specific shape.

The degree-of-freedom information described in the present invention may be understood as covering both object-referenced and camera-referenced degree-of-freedom information, which are used interchangeably.

That is, the degree-of-freedom information of the object may be understood as the degree-of-freedom information of the camera that captured (or is viewing) the object. Conversely, the degree-of-freedom information of the camera that captured the object may of course be understood as the degree-of-freedom information of the object.

This is because the reference coordinate system of the object and the reference coordinate system of the camera have a relative positional relationship to each other.

For example, applying an inverse transformation to the degree-of-freedom information of the object with respect to the reference coordinate system of the object yields the degree-of-freedom information of the camera with respect to the reference coordinate system of the object.

Conversely, applying an inverse transformation to the degree-of-freedom information of the camera with respect to the reference coordinate system of the camera yields the degree-of-freedom information of the object with respect to the reference coordinate system of the camera.
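The inverse transformation mentioned here can be sketched as follows, assuming that degree-of-freedom information is represented as a 4x4 rigid transform [R | t]. That representation, and the helper names `invert_rigid` and `matmul4`, are assumptions for illustration and are not taken from the specification.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Invert a 4x4 rigid transform [R | t] as [R^T | -R^T t]."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    ti = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [ti[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Hypothetical pose of frame B seen from frame A:
# a 90-degree rotation about z and a translation of (1, 2, 3).
T_a_b = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
T_b_a = invert_rigid(T_a_b)        # pose of frame A seen from frame B
identity = matmul4(T_a_b, T_b_a)   # composing a pose with its inverse
```

Because the rotation block of a rigid transform is orthonormal, the inverse needs only a transpose and one matrix-vector product, so no general matrix inversion is required.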

Furthermore, when the relative positional relationship between the reference coordinate system of the object and the reference coordinate system of the camera is defined, the degree-of-freedom information of the camera with respect to the reference coordinate system of the camera can be obtained from the degree-of-freedom information of the object with respect to the reference coordinate system of the object.

Conversely, the degree-of-freedom information of the object with respect to the reference coordinate system of the object can of course be obtained from the degree-of-freedom information of the camera with respect to the reference coordinate system of the camera.

For example, applying the relative positional relationship between the reference coordinate system of the object and the reference coordinate system of the camera to the degree-of-freedom information of the object yields the degree-of-freedom information of the camera. Conversely, applying the relative positional relationship between the reference coordinate system of the camera and the reference coordinate system of the object to the degree-of-freedom information of the camera yields the degree-of-freedom information of the object.

Here, the relative positional relationship may mean the degree to which one reference coordinate system is rotated and translated with respect to the other reference coordinate system.
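One common way to write such a rotation-plus-translation relationship is as a homogeneous transform. The notation below is an assumption introduced for illustration: T_{A←B} denotes the pose of frame B expressed in frame A, W is a shared reference frame, C is the camera frame, and O is the object frame. None of these symbols appear in the specification.

```latex
% Relative positional relationship between two frames: rotation R, translation t
T_{A \leftarrow B} =
\begin{bmatrix} R & t \\ 0^{\top} & 1 \end{bmatrix},
\qquad
T_{B \leftarrow A} = T_{A \leftarrow B}^{-1} =
\begin{bmatrix} R^{\top} & -R^{\top} t \\ 0^{\top} & 1 \end{bmatrix}

% Object pose seen from the camera, obtained by chaining through the shared frame W
T_{C \leftarrow O} = T_{C \leftarrow W} \, T_{W \leftarrow O}
                   = T_{W \leftarrow C}^{-1} \, T_{W \leftarrow O}
```

Under this notation, knowing the relative relationship between the two reference coordinate systems is enough to convert object-referenced information into camera-referenced information and back.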

Meanwhile, for the robot R to perform tasks accurately, an artificial intelligence algorithm (for example, a deep learning algorithm or a deep learning network) trained on a large amount of learning data is required. The present invention therefore describes a method of collecting learning data in more detail with reference to the accompanying drawings. FIG. 2 is a conceptual diagram illustrating a method of generating a 3D modeling object, and FIG. 3 is a conceptual diagram illustrating a learning data collection system according to the present invention. FIG. 4 is a flowchart illustrating a learning data collection method according to the present invention, and FIGS. 5, 6, 7, 8, 9 and 10 are conceptual diagrams illustrating a method of collecting learning data. FIGS. 11 and 12 are conceptual diagrams illustrating a method of using the collected learning data.

Before describing the present invention, note that the “object” referred to in this specification is not limited in type and may be interpreted as any of a wide variety of things. An object has a concrete form that can be distinguished visually or physically, and may be understood to include not only things (or articles) but also people and animals.

As discussed above, achieving higher performance in a robot or an autonomous vehicle requires training on as much learning data as possible. Securing learning data is therefore very important, and the present invention proposes a method of securing learning data using a 3D modeling object.

As shown in FIG. 2, in the present invention, a 3D modeling object 610 may be generated by modeling an object (for example, the cup shown in FIG. 2). The 3D modeling object 610 may be modeled to have, as closely as possible, the same shape and size (or size ratio) as the real object. The 3D modeling object 610 can be modeled in a wide variety of ways; for example, it may be created using 3D CAD. The 3D modeling object 610 may be a textured mesh model, and its texture may be the same as or similar to that of the real object.

Furthermore, in the present invention, a plurality of first images (see 611 to 616), each including the 3D modeling object 610 in a different posture in a first space having a first reference coordinate system W1, may be collected.

Here, the first reference coordinate system W1 may mean the reference coordinate system of the virtual environment (modeling environment) that contains the 3D modeling object 610.

Meanwhile, the generation of the modeling object and the collection of the images may be performed by the modeling unit 131 or the control unit 130, which are described with reference to FIG. 3. For convenience of description, both are attributed to the control unit 130 below.

As shown in FIG. 2, the plurality of first images (see 611 to 616) may include the 3D modeling object 610 in different postures.

The 3D modeling object 610 included in the plurality of first images (see 611 to 616) may be located at a given position and in a given posture within the first reference coordinate system W1. This position and posture may constitute the degree-of-freedom information of the 3D modeling object 610 in each image.

That is, the control unit 130 may generate the plurality of first images 611 to 616 so that they include the 3D modeling object 610 with different degree-of-freedom information. Furthermore, the control unit 130 may generate a 3D point cloud for the 3D modeling object 610 included in the plurality of first images 611 to 616 and, based on it, obtain depth information of the 3D modeling object 610 from each image.
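One way to picture how depth information could follow from a point cloud is to project the points into the image with a pinhole camera model and keep the nearest depth per pixel. The sketch below is only an illustration: the pinhole model, the intrinsics `fx`, `fy`, `cx`, `cy`, and the function name `depth_map_from_points` are assumptions, since the specification does not describe how the depth is computed.

```python
def depth_map_from_points(points_cam, fx, fy, cx, cy, width, height):
    """Project 3D points (camera frame, z pointing forward) to a depth map.

    Pixels that no point projects onto stay None. When several points land
    on the same pixel, the nearest one wins (a simple z-buffer).
    """
    depth = [[None] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0:
            continue  # point is behind the camera
        u = int(round(fx * x / z + cx))  # pixel column
        v = int(round(fy * y / z + cy))  # pixel row
        if 0 <= u < width and 0 <= v < height:
            if depth[v][u] is None or z < depth[v][u]:
                depth[v][u] = z
    return depth


# Three points of a hypothetical model, already expressed in the camera frame.
points = [(0.0, 0.0, 2.0), (0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]
depth = depth_map_from_points(points, fx=2.0, fy=2.0, cx=2.0, cy=2.0,
                              width=4, height=4)
```

The first two points fall on the same pixel, and the closer one determines the stored depth, which mirrors how only the visible surface of the model contributes depth to a rendered image.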

Meanwhile, the plurality of first images 611 to 616 may be images generated on the assumption that a virtual camera 620 having a first camera coordinate system C1 has photographed the 3D modeling object 610.

That is, the control unit 130 may generate the plurality of first images 611 to 616 on the assumption that the virtual camera 620 has photographed the 3D modeling object 610.

In this case, the plurality of first images 611 to 616 may be i) images generated on the premise that the virtual camera 620 photographed the 3D modeling object 610 from various postures while the 3D modeling object 610 remained fixed, or ii) images generated on the premise that the 3D modeling object 610 moved through various postures while the virtual camera 620 remained fixed.

As described above, in the present invention the control unit 130 itself generates the plurality of first images 611 to 616 using the 3D modeling object 610, so that various information about the plurality of first images 611 to 616 can be obtained.

The various information may include at least one of: i) degree-of-freedom information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616, with respect to the first reference coordinate system W1; ii) degree-of-freedom information of the camera 620 that photographed the plurality of first images 611 to 616, with respect to the first camera coordinate system C1; iii) information on the relative positional relationship between the first reference coordinate system W1 and the first camera coordinate system C1; iv) degree-of-freedom information of the camera 620 that photographed the plurality of first images 611 to 616, with respect to the first reference coordinate system W1; v) degree-of-freedom information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616, with respect to the first camera coordinate system C1; and vi) depth information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616.

Meanwhile, the various information listed above may be stored in the storage unit 120 (see FIG. 3). Furthermore, at least one item of the information listed above may be stored in the storage unit 120 in a manner matched with the corresponding first image (or with the image corresponding to the modeling object 610 included in the first image).

That is, each of the plurality of first images 611 to 616 may be stored in a manner matched with information related to the posture of the 3D modeling object 610 included in that first image.
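The matching of each first image to its posture-related information can be pictured as a simple keyed record. The sketch below is illustrative only: the class `FirstImageRecord`, its field names, the pose tuple layout and all numeric values are hypothetical, and the specification does not prescribe any storage schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical pose layout: (x, y, z, roll, pitch, yaw)
Pose = Tuple[float, float, float, float, float, float]


@dataclass(frozen=True)
class FirstImageRecord:
    image_id: str          # e.g. the reference numeral of a first image
    object_pose_w1: Pose   # pose of the 3D modeling object w.r.t. W1
    camera_pose_w1: Pose   # pose of the virtual camera w.r.t. W1


def build_index(records: List[FirstImageRecord]) -> Dict[str, FirstImageRecord]:
    """Match every first image to its degree-of-freedom information by id."""
    return {record.image_id: record for record in records}


# Made-up values for two of the first images.
records = [
    FirstImageRecord("611", (0.0, 0.0, 0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0, 0.0, 0.0)),
    FirstImageRecord("612", (0.0, 0.0, 0.0, 0.0, 0.0, 0.5), (0.0, 0.0, 1.0, 0.0, 0.0, 0.0)),
]
index = build_index(records)
```

Looking up `index["612"]` then returns the pose information stored together with that image.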

Next, the learning data collection system 100 according to the present invention is described in more detail with reference to FIG. 3. The learning data collection system 100 according to the present invention may include at least one of a communication unit 110, a storage unit 120, and a control unit 130.

The communication unit 110 is a means for receiving images captured by the camera 200, and no particular limitation is placed on the communication method.

The communication unit 110 may be configured to perform at least one of wired and wireless communication. The communication unit 110 may be configured to communicate with any of various targets capable of communication.

Meanwhile, the communication unit 110 may be configured to communicate with at least one external server. Here, the external server may include at least one of a cloud server and a database that constitutes at least part of the storage unit 120. The external server may also be configured to perform at least part of the role of the control unit 130. That is, data processing or data computation may be performed on the external server, and the present invention places no particular limitation on this approach.

Meanwhile, the communication unit 110 may support various communication methods according to the communication standard of the target with which it communicates.

For example, the communication unit 110 may be configured to perform communication using at least one of the following technologies: WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (World Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), 5G (5th Generation Mobile Telecommunication), Bluetooth™, RFID (Radio Frequency Identification), infrared communication (Infrared Data Association; IrDA), UWB (Ultra-Wideband), ZigBee, NFC (Near Field Communication), and Wireless USB (Wireless Universal Serial Bus).

Meanwhile, the camera 200 is a means for capturing images and may be included in the system 100 according to the present invention or provided separately. In the present invention, the camera 200 may also be referred to as an "image sensor".

The camera 200 may be configured to capture at least one of still images and moving images, and one or more cameras may be provided.

The camera 200 may be a 3D depth camera, an RGB-depth camera, or the like, capable of acquiring depth information of an object (or subject, see reference numeral 300). When the camera 200 is a 3D depth camera, the depth value of each pixel constituting the captured image is known, and the depth information of the object can be obtained from these values.

As shown in FIG. 3, the camera 200 may be configured to photograph the object 300 located in the second space 400 having the second reference coordinate system W2. The camera 200 may refer to a camera that exists in a real environment (real space).

Furthermore, the camera 200 may have a second camera coordinate system C2.

That is, in the present invention, the first space containing the 3D modeling object (610, see FIG. 2) has the first reference coordinate system W1, and the virtual camera 620, which is defined as photographing the 3D modeling object (610, see FIG. 2), may have the first camera coordinate system C1.

Furthermore, the camera 200, which exists in the real environment (real space, 400), has the second camera coordinate system C2, and the camera 200 may be configured to photograph the object 300 placed in the second reference coordinate system W2 of the second space (real environment or real space, 400).

Meanwhile, the storage unit 120 may be configured to store various information according to the present invention. The storage unit 120 may take many forms, and at least part of it may be an external server (at least one of a cloud server and a database (DB)). That is, any space in which the relevant information is stored suffices as the storage unit 120, and it may be understood that there is no restriction on its physical location.

The storage unit 120 may store at least one of: i) the various information related to the 3D modeling object described above with reference to FIG. 2; ii) learning data collected by the data collection system according to the present invention; iii) images captured by the camera 200; iv) degree-of-freedom information related to at least one of the object and the camera associated with a captured image; and v) information on the relative positional relationship between at least two of the first reference coordinate system W1, the first camera coordinate system C1, the second reference coordinate system W2, and the second camera coordinate system C2.

Next, the control unit 130 may be configured to control the overall operation of the learning data collection system 100 according to the present invention. The control unit 130 may include a processor capable of processing artificial intelligence algorithms (or an artificial intelligence processor).

The control unit 130 may further include a modeling unit 131 that generates the 3D modeling object 610 described with reference to FIG. 2 and generates the various first images related to it.

Furthermore, the control unit 130 may further include a learning unit 132 that performs learning based on the collected images and degree-of-freedom information. The learning unit 132 may have a neural network structure.

Meanwhile, based on a deep learning algorithm, the control unit 130 may recognize and track the object 300 photographed by the camera 200 in the images captured by the camera 200. This operation may also be referred to as tracking.

Furthermore, the control unit 130 may collect (or generate) various learning data using the image captured by the camera 200 (hereinafter, the second image) and the first images of the 3D modeling object.

In the present invention, an image of the 3D modeling object is referred to as a "first image", and an image captured by the camera 200 existing in the real environment is referred to as a "second image".

Meanwhile, the control unit 130 may extract the degree-of-freedom information of the camera 200 that photographed the object 300, based on a marker board 410 included in the second image or on the texture of the background of the second image excluding the graphic object corresponding to the object 300.

The control unit 130 may extract the degree-of-freedom information of the camera 200 based on a visual feature included in the second image.

The visual feature may be defined based on the marker board 410 included in the second image or on the texture of the background of the second image excluding the graphic object corresponding to the object 300.

Meanwhile, the degree-of-freedom information of the camera 200 may include position information corresponding to the three-dimensional position (x, y, z) of the camera 200 that photographed the object 300 (three-dimensional position information, corresponding to the degrees of freedom of translational motion) and posture information corresponding to its three-dimensional posture (r (roll), θ (pitch), Φ (yaw)) (three-dimensional posture information, corresponding to the degrees of freedom of rotational motion).
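As an illustration, six degrees of freedom of this kind are often packed into a single 4x4 homogeneous transform. The sketch below assumes radians and the rotation order R = Rz(yaw) · Ry(pitch) · Rx(roll); the specification does not state an angle convention, and the function name `pose_to_matrix` is hypothetical.

```python
import math


def pose_to_matrix(x, y, z, roll, pitch, yaw):
    """Build a 4x4 transform from a position and roll/pitch/yaw angles (radians).

    Assumed convention: R = Rz(yaw) @ Ry(pitch) @ Rx(roll).
    """
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr, x],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr, y],
        [-sp, cp * sr, cp * cr, z],
        [0.0, 0.0, 0.0, 1.0],
    ]


# A camera one unit along x, turned a quarter turn about the vertical axis.
T_camera = pose_to_matrix(1.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2)
```

The upper-left 3x3 block carries the three rotational degrees of freedom and the last column carries the three translational ones.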

The degree-of-freedom information of the camera 200 may include at least one of degree-of-freedom information with respect to the second reference coordinate system W2 of the second space 400 in which the object 300 is located, and degree-of-freedom information with respect to the second camera coordinate system C2 of the camera 200.

Furthermore, the control unit 130 may of course acquire the degree-of-freedom information of the object 300 corresponding to the second image. The control unit 130 may acquire the degree-of-freedom information of the object 300 based on the degree-of-freedom information of the camera 200, or may acquire it from the second image.

Based on the relative positional relationship between the second reference coordinate system W2 and the second camera coordinate system C2 of the camera, the control unit 130 may calculate the degree-of-freedom information of the object 300 from that of the camera 200 or, conversely, calculate the degree-of-freedom information of the camera 200 from that of the object 300.

This is because the second reference coordinate system W2 of the object 300 and the coordinate system of the camera (the second camera coordinate system C2) have a relative positional relationship with each other.

For example, when an inverse transformation is applied to the degree-of-freedom information of the object 300 with respect to the second reference coordinate system W2 of the object 300, the degree-of-freedom information of the camera 200 with respect to the second reference coordinate system W2 of the object 300 can be obtained.
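The inverse transformation and the conversion between object and camera degree-of-freedom information can be sketched as follows, assuming each pose is written as a 4x4 homogeneous matrix [R t; 0 1]. The helper names `invert_rigid` and `matmul4`, the frame labels, and all numeric values are hypothetical and are not taken from the specification.

```python
def invert_rigid(T):
    """Invert a 4x4 rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]."""
    R_t = [[T[c][r] for c in range(3)] for r in range(3)]  # transpose of R
    t = [T[r][3] for r in range(3)]
    t_inv = [-sum(R_t[r][c] * t[c] for c in range(3)) for r in range(3)]
    return [R_t[r] + [t_inv[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


# Made-up pose of the object expressed in the camera frame:
# rotated 90 degrees about z and shifted 2 units along the camera's x axis.
T_cam_obj = [
    [0.0, -1.0, 0.0, 2.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Inverting it gives the camera pose expressed in the object's frame.
T_obj_cam = invert_rigid(T_cam_obj)

# Made-up camera pose in the reference frame (1 unit along z, no rotation).
T_ref_cam = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

# Chaining the two gives the object pose in the reference frame.
T_ref_obj = matmul4(T_ref_cam, T_cam_obj)
```

Under these assumptions, knowing any two of the three relationships (reference to camera, camera to object, reference to object) is enough to recover the third.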

In this way, the control unit 130 may compute and derive various information based on the relative positional relationship between the second camera coordinate system C2 of the camera 200 and the second reference coordinate system W2 of the object 300, and may store this information in the storage unit 120.

The various information may include at least one of: i) degree-of-freedom information of the object 300 with respect to the second reference coordinate system W2; ii) degree-of-freedom information of the camera 200 that photographed the second image including the object 300, with respect to the second camera coordinate system C2; iii) information on the relative positional relationship between the second reference coordinate system W2 and the second camera coordinate system C2; iv) degree-of-freedom information of the camera 200 that photographed the second image, with respect to the second reference coordinate system W2; v) degree-of-freedom information of the object 300 corresponding to the second image, with respect to the second camera coordinate system C2; and vi) information on the pixel coordinates of the graphic object corresponding to the object 300 in the second image.

Meanwhile, in the present invention, the control unit 130 may collect or generate a vast amount of learning data using at least some of: i) the plurality of first images of the 3D modeling object and the second image of the object 300; ii) the degree-of-freedom information of the object 300 in the first and second images; iii) the degree-of-freedom information of the camera (or virtual camera) in the first and second images; and iv) the relative positional relationship between at least two of the first reference coordinate system W1, the first camera coordinate system C1, the second reference coordinate system W2, and the second camera coordinate system C2.

Such learning data may include a mask (MASK) for each 3D modeling object included in the plurality of first images, together with degree-of-freedom information related to the posture of the 3D modeling object matched to each mask.

Here, the mask may be an image in which the 3D modeling object is composited (or projected) onto the second image captured of the real space 400, or an image of the 3D modeling object itself.
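Compositing a rendered 3D modeling object onto a second image might look like the minimal sketch below. It assumes a binary silhouette of the rendered object is available and that all images share the same size; the function name `composite`, the silhouette input, and the pixel values are hypothetical, and the specification does not describe a particular compositing procedure.

```python
def composite(second_image, rendered, silhouette):
    """Paste rendered-object pixels onto a real image wherever the silhouette is 1."""
    return [
        [obj if inside else bg for bg, obj, inside in zip(bg_row, obj_row, sil_row)]
        for bg_row, obj_row, sil_row in zip(second_image, rendered, silhouette)
    ]


# Tiny 2x3 grayscale example with made-up pixel values.
second_image = [[10, 10, 10], [10, 10, 10]]   # real background
rendered = [[0, 200, 0], [0, 210, 220]]       # rendered modeling object
silhouette = [[0, 1, 0], [0, 1, 1]]           # 1 where the object is visible
mask_image = composite(second_image, rendered, silhouette)
```

Pixels outside the silhouette keep the real background, so the composited image places the modeled object in the photographed scene.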

Meanwhile, the degree-of-freedom information related to the posture of the 3D modeling object matched to each mask may be degree-of-freedom information of the object 300 or the camera 200 with respect to the second reference coordinate system W2, or degree-of-freedom information of the object 300 or the camera 200 with respect to the second camera coordinate system C2. Here, the object 300 may be the real object corresponding to the 3D modeling object 600 described with reference to FIG. 2.

In this way, the control unit 130 may use the plurality of images including the 3D modeling object to obtain degree-of-freedom information of the object 300 or the camera 200 that can be used in the real space 400. Meanwhile, the amount of data that can be generated using the 3D modeling object is very large, ranging from tens of thousands to tens of millions of images or more, so the present invention makes it possible to generate a sufficient amount of learning data for learning.

Hereinafter, a method of collecting learning data is described in more detail, based on the configuration of the learning data collection system according to the present invention described above.

First, according to the learning data collection method of the present invention, a process of generating a 3D modeling object by modeling an object may be performed (S410).

As described above with reference to FIG. 2, the control unit 130 may generate the 3D modeling object 610 by modeling an object (for example, the cup shown in FIG. 2; see the object of reference numeral 300 in FIG. 3).

The 3D modeling object 610 may be modeled to have, as closely as possible, the same shape and size (or size ratio) as the real object 300.

The 3D modeling object 610 may be modeled in many ways and may be generated, for example, using 3D CAD. The 3D modeling object 610 may correspond to a mesh model to which a texture is applied. The texture applied to the 3D modeling object 610 may be identical or similar to that of the real object 300.

In this case, the 3D modeling object 610 may be placed in the first space (not shown), which corresponds to the virtual modeling space, so as to have a specific posture (corresponding to a three-dimensional posture (r (roll), θ (pitch), Φ (yaw))) and a specific position (corresponding to a three-dimensional position (x, y, z)) with respect to the first reference coordinate system W1 of the first space.

Next, in the present invention, a process of collecting a plurality of first images, each including the 3D modeling object 610 in a different posture in the first space having the first reference coordinate system W1, may be performed (S420).

As shown in FIG. 2, the control unit 130 may collect a plurality of first images (see 611 to 616), each including the 3D modeling object 610 in a different posture in the first space having the first reference coordinate system W1.

Meanwhile, the plurality of first images 611 to 616 may be images generated on the assumption that the virtual camera (620, see FIG. 2) having the first camera coordinate system C1 has photographed the 3D modeling object 610.

In this case, the plurality of first images 611 to 616 may be i) images generated on the premise that the virtual camera 620 photographed the 3D modeling object 610 from various postures while the 3D modeling object 610 remained fixed, or ii) images generated on the premise that the 3D modeling object 610 moved through various postures while the virtual camera 620 remained fixed.

As described above, the various information may include at least one of: i) degree-of-freedom information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616, with respect to the first reference coordinate system W1; ii) degree-of-freedom information of the camera 620 that photographed the plurality of first images 611 to 616, with respect to the first camera coordinate system C1; iii) information on the relative positional relationship between the first reference coordinate system W1 and the first camera coordinate system C1; iv) degree-of-freedom information of the camera 620 that photographed the plurality of first images 611 to 616, with respect to the first reference coordinate system W1; v) degree-of-freedom information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616, with respect to the first camera coordinate system C1; and vi) depth information of the 3D modeling object 610 included in each of the plurality of first images 611 to 616.

Meanwhile, the various information listed above may be stored in the storage unit 120 (see FIG. 3). Furthermore, at least one item of the information listed above may be stored in the storage unit 120 in a manner matched with the corresponding first image (or with the image corresponding to the modeling object 610 included in the first image).

Next, in the present invention, a process of collecting a second image of an object placed in the second space, using a camera placed in the second space having the second reference coordinate system, may be performed (S430).

Meanwhile, the process S430 may of course be performed before, or simultaneously with, the processes S410 and S420 described above.

More specifically, as shown in (a) to (c) of FIG. 5, the control unit 130 may photograph the object 300 existing in the real environment (real space, 400) using the camera 200 and collect the second image obtained as a result.

In this case, the second space corresponding to the real environment may have the second reference coordinate system W2, and the camera 200 may have the second camera coordinate system C2.

The camera 200 may be configured to photograph the object 300 placed in the second reference coordinate system W2 of the second space (real environment or real space, 400).

As shown in (a), (b) and (c) of FIG. 5, the control unit 130 may change the three-dimensional posture and three-dimensional position of the camera 200 relative to the object 300, so that the plurality of second images may include graphic objects showing the object 300 in different postures.

For example, regarding how the degree-of-freedom information of the camera 200 is extracted using the marker board 410, the control unit 130 may extract the degree-of-freedom information of the camera 200 from the second images 601, 602 and 603 shown in (a), (b) and (c) of FIG. 6, which capture the second space 400 shown in (a), (b) and (c) of FIG. 5.

The control unit 130 may extract the degree-of-freedom information of the camera 200 that photographed the object 300, based on the arrangement position and degree of rotation of the graphic objects 410a, 410b and 410c corresponding to the marker board 410 in the second images 601, 602 and 603. Furthermore, the control unit 130 may extract the degree-of-freedom information of the camera 200 by taking into account the arrangement position and degree of rotation of the graphic objects 300a, 300b and 300c corresponding to the object 300 in the second images 601, 602 and 603.

The degree-of-freedom information in this case may be degree-of-freedom information of the camera 200 with respect to the second reference coordinate system W2, or degree-of-freedom information of the camera 200 with respect to the second camera coordinate system C2.

Since the control unit 130 knows the relative positional relationship between the second reference coordinate system W2 of the real space 400 and the second camera coordinate system C2, even when it has extracted from the second images 601, 602 and 603 the degree-of-freedom information of the camera 200 with respect to the second reference coordinate system W2, it can also derive the degree-of-freedom information of the camera 200 with respect to the second camera coordinate system C2.

Furthermore, the control unit 130 may of course also obtain the degree-of-freedom information of the object 300 corresponding to the second images 601, 602, and 603. The control unit 130 may obtain the degree-of-freedom information of the object 300 based on the degree-of-freedom information of the camera 200, or directly from the second images.

For example, applying an inverse transformation to the degree-of-freedom information of the object 300 with respect to the second reference coordinate system W2 yields the degree-of-freedom information of the camera 200 with respect to the second reference coordinate system W2.

Meanwhile, once the plurality of first images and the second images have been obtained through steps S410 to S430, together with the degree-of-freedom information related to the object 300 and the camera 200, the present invention may use them to proceed with generating learning data for the object (S440).

More specifically, the control unit 130 may generate learning data for the object using the degree-of-freedom information of the camera 200 and the 3D modeling objects included in the plurality of first images.

This learning data may include a mask (MASK) for each of the 3D modeling objects included in the plurality of first images, and degree-of-freedom information related to the posture of the 3D modeling object matched to each mask. Here, the degree-of-freedom information related to the posture of the 3D modeling object 600 (see FIG. 2) is degree-of-freedom information for the second space 400, and may be expressed with respect to the second reference coordinate system W2 or the second camera coordinate system C2.

In the present invention, to obtain the degree-of-freedom information of the 3D modeling object 600 with respect to the second space 400, the control unit 130 may use the relative positional relationship 800 (see FIG. 8) between the first reference coordinate system W1 of the first space (the modeling space, or virtual space) and the camera coordinate system C2 of the camera 200 disposed in the second space 400 (the real environment, or real space). The “T” shown in FIG. 8 may denote a homogeneous transformation matrix.
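For reference, a homogeneous transformation matrix is conventionally written in the block form below. This is only a sketch of the usual convention: the symbols $R$ (a 3×3 rotation), $\mathbf{t}$ (a translation vector), and the point subscripts are notation assumed here for illustration, since the figure labels the matrix only as “T”.

```latex
T = \begin{bmatrix} R & \mathbf{t} \\ \mathbf{0}^{\top} & 1 \end{bmatrix},
\qquad R \in SO(3),\quad \mathbf{t} \in \mathbb{R}^{3},
\qquad
\begin{bmatrix} \mathbf{p}_{C2} \\ 1 \end{bmatrix}
= T \begin{bmatrix} \mathbf{p}_{W1} \\ 1 \end{bmatrix}
```

Read this way, $T$ takes a point expressed in the first reference coordinate system W1 and re-expresses it in the camera coordinate system C2, combining the rotation and the translation in a single matrix product.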

To this end, as shown in FIG. 7, the control unit 130 may identify, among the plurality of first images (images of the 3D modeling object), a first image 710 that includes a specific 3D modeling object 711 whose posture is similar to that of the object 300 (see (a) of FIG. 5) corresponding to the graphic object 300a (the image object corresponding to the object) included in a second image 720 (an image captured by the camera 200).

The control unit 130 may identify, among the plurality of first images and second images stored in the storage unit 120, the first image 710 and the second image 720 in which the object has the most similar postures. As shown in FIG. 7, the control unit 130 may identify the first image 710 and the second image 720 by performing local feature matching.

Once the first and second images 710 and 720 have been identified in this way, the control unit 130 may use them to extract the relative positional relationship 800 (see FIG. 8) between the first reference coordinate system W1 of the first space (the modeling space, or virtual space) and the second camera coordinate system C2 of the camera 200 disposed in the second space 400 (the real environment, or real space).

To extract the relative positional relationship 800, the control unit 130 may use the relationship between the 3D modeling object 711 included in the identified first image 710 and the graphic object 300a corresponding to the object in the identified second image 720.

Meanwhile, the relative positional relationship 800 may denote the degree to which the first reference coordinate system W1 of the first space is rotated and translated with respect to the second camera coordinate system C2 of the camera 200. In other words, the relative positional relationship 800 may denote the relative positional relationship of the first reference coordinate system W1 of the first space with respect to the second camera coordinate system C2.

More specifically, the degree to which the second camera coordinate system C2 of the camera 200 is rotated and translated with respect to the first reference coordinate system W1 of the first space may be determined based on the relationship between the 3D modeling object 711 included in the identified first image 710 and the graphic object 300a corresponding to the object in the second image 720.

As shown in FIGS. 7 and 8, the control unit 130 may extract the degree to which the second camera coordinate system C2 is rotated and translated with respect to the first reference coordinate system W1 of the first space, using the pixel coordinates (u, v; see reference numeral 810 in FIG. 8) of the graphic object 300a in the identified second image 720 and the 3D coordinates (x, y, z; see reference numeral 820 in FIG. 8) of the 3D modeling object 711 in the identified first image 710.

This relationship relies on the fact that the first image 710 and the second image 720 contain, respectively, the 3D modeling object 711 and the graphic object 300a that are most similar to each other for a given posture of the object.

As shown in (a) of FIG. 8, the control unit 130 may i) enter the 3D coordinates 820 of the 3D modeling object 711 into the PnP equation for the 3D modeling object 711, and ii) apply the matrix 830 corresponding to the relative positional relationship 800, which defines the degree to which the first reference coordinate system W1 of the first space is rotated and translated with respect to the second camera coordinate system C2. Then, using the fact that iii) the pixel coordinates 810 of the graphic object 300a corresponding to the object photographed by the camera 200 are obtained as the result, the control unit 130 may iv) extract the matrix 830 defining the degree to which the second camera coordinate system C2 is rotated and translated with respect to the first reference coordinate system W1 of the first space.
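The forward direction of steps i) to iii) can be sketched as a pinhole projection: a 3D point goes in, the pose matrix and the intrinsic matrix are applied, and a pixel coordinate comes out. The sketch below is a minimal illustration under assumed conventions; the function name `project_point`, the 3x4 pose layout, and all numeric values are hypothetical and not taken from the patent. Solving for the pose (step iv) is the inverse problem and is not implemented here.

```python
def project_point(point_w1, pose_c2_w1, intrinsics):
    """Project a 3D point given in W1 to pixel coordinates (u, v).

    pose_c2_w1: 3x4 nested list [R | t] taking W1 coordinates into the camera frame C2
    intrinsics: 3x3 camera matrix K
    """
    x, y, z = point_w1
    # Camera-frame coordinates: p_c = R * p_w + t
    cam = [row[0] * x + row[1] * y + row[2] * z + row[3] for row in pose_c2_w1]
    # Homogeneous pixel coordinates: s * [u, v, 1]^T = K * p_c
    hom = [sum(k * c for k, c in zip(row, cam)) for row in intrinsics]
    if hom[2] <= 0:
        raise ValueError("point is behind the camera")
    return hom[0] / hom[2], hom[1] / hom[2]


# Hypothetical values: focal length 500 px, principal point (320, 240),
# identity rotation, and the W1 origin 2 units in front of the camera.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
pose = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0]]
u, v = project_point((0.1, -0.2, 0.0), pose, K)
```

A PnP solver searches for the pose that makes projections like this one agree with the observed pixel coordinates across several 3D-2D correspondences.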

Here, the pixel coordinates (u, v; see reference numeral 810 in FIG. 8) of the graphic object 300a in the identified second image 720 and the 3D coordinates (x, y, z; see reference numeral 820 in FIG. 8) of the 3D modeling object 711 in the identified first image 710 are values already known to the control unit 130. The control unit 130 can therefore extract the matrix 830 corresponding to the relative positional relationship 800, which defines the degree to which the second camera coordinate system C2 is rotated and translated with respect to the first reference coordinate system W1 of the first space.

Meanwhile, an intrinsic parameter 840 is applied in this PnP equation; it may correspond to the camera-specific parameters that characterize the camera 200.

Meanwhile, regarding the parameters of the PnP equation shown in (a) of FIG. 8, “S” may denote a scale constant (e.g., 1), “r” a skew parameter (a kind of distortion correction constant), and “t1”, “t2”, and “t3” the matrices and vectors for rotation and translation.
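For orientation, the PnP relation is commonly written in the pinhole form below. This is a sketch of the standard formulation rather than a transcription of FIG. 8: the symbols $S$, $r$, $t_1$, $t_2$, $t_3$, $(u, v)$, and $(x, y, z)$ follow the description above, while $f_x$, $f_y$, $c_x$, $c_y$, and the rotation entries $r_{ij}$ are assumed notation. Note that the skew $r$ and the rotation entries $r_{ij}$ are distinct quantities.

```latex
S \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\underbrace{\begin{bmatrix} f_x & r & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}}_{\text{intrinsic parameters (840)}}
\underbrace{\begin{bmatrix}
r_{11} & r_{12} & r_{13} & t_1 \\
r_{21} & r_{22} & r_{23} & t_2 \\
r_{31} & r_{32} & r_{33} & t_3
\end{bmatrix}}_{\text{rotation and translation (830)}}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}
```

With the pixel coordinates and 3D coordinates known for enough correspondences and the intrinsic parameters fixed, the rotation and translation entries are the unknowns to be solved for.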

Meanwhile, once the relative positional relationship 800 (see FIG. 8) of the first reference coordinate system W1 with respect to the second camera coordinate system C2 has been derived through the PnP equation as described above, the control unit 130 may use it to extract the relative positional relationship 801 between the first reference coordinate system W1 and the second reference coordinate system W2, as shown in (b) of FIG. 8.

The control unit 130 may extract the degree to which the second reference coordinate system W2 is rotated and translated with respect to the first reference coordinate system W1, or the degree to which the first reference coordinate system W1 is rotated and translated with respect to the second reference coordinate system W2.

In (b) of FIG. 8, the control unit 130 may extract the degree 801 to which the second reference coordinate system W2 is transformed with respect to the first reference coordinate system W1. This degree of transformation 801 may also be described as the “relative positional relationship of the second reference coordinate system W2 with respect to the first reference coordinate system W1”.

Meanwhile, as shown in (b) of FIG. 8, the relative positional relationship 801 of the second reference coordinate system W2 with respect to the first reference coordinate system W1 may be obtained as the product of i) the relative positional relationship 802 of the second camera coordinate system C2 with respect to the first reference coordinate system W1 and ii) the relative positional relationship 803 between the second camera coordinate system C2 and the second reference coordinate system W2.

The relative positional relationship 802 of the second camera coordinate system C2 with respect to the first reference coordinate system W1 may be obtained through the inverse matrix (or inverse transformation) 804 of the relative positional relationship 800 (see FIG. 8) of the first reference coordinate system W1 with respect to the second camera coordinate system C2, discussed above with (a) of FIG. 8. Furthermore, the relative positional relationship 803 between the second camera coordinate system C2 and the second reference coordinate system W2 may correspond to information already stored in the storage unit 120.
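The inverse step (relationship 800 to relationship 802) can be sketched as follows, assuming the relationship is a rigid 4x4 homogeneous transform so that the inverse has the closed form [[Rᵀ, -Rᵀt], [0, 1]]. The function name `invert_rigid` and the sample matrix are hypothetical and chosen only for illustration.

```python
def invert_rigid(T):
    """Invert a 4x4 homogeneous transform [[R, t], [0, 1]].

    For a rigid transform the inverse is [[R^T, -R^T t], [0, 1]],
    so no general matrix inversion is needed.
    """
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]  # transpose of R
    t_inv = [-sum(Rt[i][j] * t[j] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


# Hypothetical relationship 800: a 90-degree rotation about Z plus a translation.
T_800 = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 0.0, 1.0, 3.0],
    [0.0, 0.0, 0.0, 1.0],
]
T_802 = invert_rigid(T_800)
```

Using the closed form keeps the result exactly rigid, which a general-purpose numerical inverse does not guarantee.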

Accordingly, using the above relationships, the control unit 130 can obtain the relative positional relationship 801 of the second reference coordinate system W2 with respect to the first reference coordinate system W1, as shown in (b) of FIG. 8.
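The chain described for (b) of FIG. 8 can be summarized in one line. The notation ${}^{A}T_{B}$, read as "the pose of frame B expressed in frame A", is an assumption introduced here; the original figure's symbols are not reproduced in this text.

```latex
\underbrace{{}^{W1}T_{W2}}_{801}
= \underbrace{{}^{W1}T_{C2}}_{802}\;\underbrace{{}^{C2}T_{W2}}_{803}
= \underbrace{\left({}^{C2}T_{W1}\right)^{-1}}_{804\ (\text{inverse of }800)}\;{}^{C2}T_{W2}
```

Under this convention the inner frame C2 cancels between adjacent factors, which is a quick check that the product is formed in the right order.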

In this way, once the control unit 130 has obtained, through the relative positional relationship estimation described above, i) the relative positional relationship (801, 806) of the second reference coordinate system W2 with respect to the first reference coordinate system W1, it may use the product of this positional relationship (801, 806) and ii) the relative positional relationship 807 of the second camera coordinate system C2 with respect to the second reference coordinate system W2 to extract the degree-of-freedom information of the second camera coordinate system C2 with respect to the 3D modeling object included in the first reference coordinate system W1.

Parts (a) and (b) of FIG. 8 are steps for deriving the relationship (or relative positional relationship) 805, shown in (c) of FIG. 8, of the second camera coordinate system C2 to the 3D modeling object included in the first reference coordinate system W1. The relationship 805 may be applied not only to the identified first and second images but also to any first image of the 3D modeling object in an arbitrary posture.

The relationship 805, shown in (c) of FIG. 8, of the second camera coordinate system C2 to the 3D modeling object included in the first reference coordinate system W1 may be derived as the product of i) the relative positional relationship (801, 806) of the second reference coordinate system W2 with respect to the first reference coordinate system W1, derived above through the relations of (a) and (b) of FIG. 8, and ii) the relative positional relationship 807 of the second camera coordinate system C2 with respect to the second reference coordinate system W2.

Meanwhile, the relative positional relationships described in the present invention with reference numerals 800, 801, 802, 803, 804, 805, 806, and 807 may be understood to denote homogeneous transformation matrices. The values of the elements of such a homogeneous transformation matrix may be determined using at least one of: the relative positional relationship between coordinate systems (the degree of rotation and translation), the degree-of-freedom information of the 3D modeling object included in the first image, the degree-of-freedom information of the graphic object corresponding to the object included in the second image, and the degree-of-freedom information of the camera 200 that captured the second image.

Meanwhile, the control unit 130 may apply the relationship (or relative positional relationship) 805 of the second camera coordinate system C2 to the 3D modeling object included in the first reference coordinate system W1 to the plurality of first images containing the 3D modeling object in different postures, and thereby extract, for the 3D modeling object included in each of the first images, the degree-of-freedom information of the second camera coordinate system C2 of the camera 200.
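One way to read this step is sketched below: the product of the W1-to-W2 relationship (801, 806) and the W2-to-C2 relationship 807 gives the camera pose in W1, and inverting it lets each model pose given in W1 be re-expressed in the camera frame. This interpretation, the frame convention (a matrix `T_a_b` is the pose of frame b expressed in frame a), and every name and value below are assumptions made for illustration.

```python
def matmul4(A, B):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(T):
    """Inverse of a rigid 4x4 transform: [[R^T, -R^T t], [0, 1]]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][j] * T[j][3] for j in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def translation(x, y, z):
    """Pure-translation homogeneous transform (identity rotation)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def model_poses_in_camera(T_w1_w2, T_w2_c2, model_poses_w1):
    """Express each model pose (given in W1) in the camera frame C2."""
    T_w1_c2 = matmul4(T_w1_w2, T_w2_c2)  # product described for relationship 805
    T_c2_w1 = invert_rigid(T_w1_c2)
    return [matmul4(T_c2_w1, pose) for pose in model_poses_w1]


# Hypothetical inputs: W2 is W1 shifted by 1 along X; the camera sits 2 along Z in W2.
poses = model_poses_in_camera(
    translation(1.0, 0.0, 0.0),
    translation(0.0, 0.0, 2.0),
    [translation(0.0, 0.0, 0.0), translation(1.0, 0.0, 5.0)],
)
```

Because the frame-to-frame relationship is computed once and reused, the per-image cost is a single 4x4 product, which suits a data set of many rendered postures.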

That is, the control unit 130 may extract the degree-of-freedom information that the camera 200 would have if the 3D modeling object were photographed by the camera 200 disposed in the real space (the second space).

Here, the extracted degree-of-freedom information may be referenced to the second camera coordinate system C2 or to the second reference coordinate system W2.

Meanwhile, because the storage unit 120 holds information on the relative positional relationship between the second reference coordinate system W2 and the second camera coordinate system C2, the control unit 130 can extract the degree-of-freedom information in whichever coordinate system is required.

In this way, the control unit 130 can extract the degree-of-freedom information of the camera or of the 3D modeling object for the case in which the 3D modeling object is placed in the real space.

The degree-of-freedom information extracted in this way may be included in the learning data. As shown in FIG. 9, this learning data may include masks (MASK) 910’, 920’, and 930’ corresponding respectively to the 3D modeling objects 911’, 921’, and 931’ included in the plurality of first images 911, 921, and 931, and degree-of-freedom information related to the postures of the 3D modeling objects 911’, 921’, and 931’ matched to the respective masks 910’, 920’, and 930’.

Here, a mask may be an image in which a 3D modeling object 910’, 920’, or 930’ is composited onto (or projected into) a second image 910, 920, or 930 captured of the real space, or an image of the 3D modeling object itself.

Furthermore, as shown in FIG. 10, the learning data 1010 may include information on the masks and degree-of-freedom information related to the posture of the 3D modeling object matched to each mask. The degree-of-freedom information related to the posture of the 3D modeling object matched to each mask may be the degree-of-freedom information of the object 300 or the camera 200 with respect to the second reference coordinate system W2, or that of the object 300 or the camera 200 with respect to the second camera coordinate system C2.

The control unit 130 may use a plurality of images containing the 3D modeling object to obtain degree-of-freedom information of the object 300 or the camera 200 that is usable in the real space 400. Because the amount of data that can be generated from a 3D modeling object is very large, ranging from tens of thousands to tens of millions of images or more, the present invention can generate a sufficient amount of learning data for training.

Meanwhile, as shown in FIG. 11, the learning data generated using the 3D modeling object as described above may form a reference image data set 1110.

For the method 600 of utilizing learning data based on the 3D modeling object, refer to the description of FIG. 2.

In this data set 1110, at least two of the following may be stored in association with one another: i) an image of the 3D modeling object (a modeling image or rendered image), ii) a depth map of the 3D modeling object, iii) the degree-of-freedom information of the camera that rendered the 3D modeling object (which may be the degree-of-freedom information of the camera in the second reference coordinate system W2 or in the second camera coordinate system C2), and iv) the intrinsic parameters of the camera. The data set 1110 may be obtained in the manner described above.
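As an illustration of how one entry of such a data set might be organized, the sketch below uses a small record type with the four items i) to iv) as optional fields and a check for the "at least two" condition. The class name `ReferenceEntry`, the field names, and the file names are hypothetical; the patent does not prescribe a storage layout.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReferenceEntry:
    """One entry of a reference data set; all field names are hypothetical."""
    rendered_image_path: Optional[str] = None        # i) rendered image of the 3D model
    depth_map_path: Optional[str] = None             # ii) depth map of the 3D model
    camera_pose: Optional[List[List[float]]] = None  # iii) 4x4 camera pose
    pose_frame: str = "W2"                           # frame of camera_pose: "W2" or "C2"
    intrinsics: Optional[List[List[float]]] = None   # iv) 3x3 camera matrix

    def populated_fields(self) -> int:
        """Count how many of the four items i) to iv) are present."""
        items = (self.rendered_image_path, self.depth_map_path,
                 self.camera_pose, self.intrinsics)
        return sum(item is not None for item in items)

    def is_valid(self) -> bool:
        """Require a known pose frame and at least two of the four items."""
        if self.pose_frame not in ("W2", "C2"):
            return False
        return self.populated_fields() >= 2


entry = ReferenceEntry(rendered_image_path="render_0001.png",
                       depth_map_path="depth_0001.png")
```

Keeping the pose frame as an explicit field avoids ambiguity between poses stored relative to W2 and poses stored relative to C2.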

Using this data set 1110, the control unit 130 may extract degree-of-freedom information for an object sensed by a camera (not shown) in a real environment.

As shown in FIG. 12, when a captured image (or input image) of an object (for example, a cup) is received from a camera (not shown) (S1210), the control unit 130 may compare the graphic object 1211’ corresponding to the object in the input image 1211 with the images of the 3D modeling object included in the data set 1110.

Based on global features, the control unit 130 may search the images of the 3D modeling object included in the data set 1110 for a specific image 1221 whose posture is most similar to that of the graphic object 1211’ included in the input image 1211 (S1220). The control unit 130 may search for the specific image 1221 based on the difference in orientation of the camera viewing the object corresponding to the graphic object 1211’ in the input image 1211.
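The retrieval step can be sketched as a nearest-neighbour search over global feature vectors. The text does not state which similarity measure is used; cosine similarity is an assumption here, and the function names and the three-dimensional feature vectors are hypothetical placeholders for the output of a feature extractor.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def retrieve_most_similar(query_feature, reference_features):
    """Return the index of the reference image closest to the query."""
    scores = [cosine_similarity(query_feature, ref) for ref in reference_features]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical global features for three rendered images and one input image.
references = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
best = retrieve_most_similar([0.5, 0.9, 0.0], references)
```

In this toy example the third reference vector points in nearly the same direction as the query, so its index is the one returned.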

The control unit 130 may then perform local feature matching between the input image 1211 and the specific image 1221 (S1230) to extract matching points between the two. A matching point in the input image 1211 and the corresponding matching point in the specific image 1221 may form a pair.

The control unit 130 may perform RGB-based local feature matching, and may extract the matching points using the descriptors of the key points in the input image 1211 and the specific image 1221.
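Descriptor-based matching can be sketched as a mutual nearest-neighbour test: a key point pair is kept only when each descriptor is the other's closest match. This particular matching rule is an assumption chosen for illustration, since the text does not specify one, and the two-dimensional descriptors below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(descriptor, candidates):
    """Index of the candidate descriptor closest to `descriptor`."""
    return min(range(len(candidates)),
               key=lambda j: squared_distance(descriptor, candidates[j]))


def mutual_matches(desc_input, desc_reference):
    """Pairs (i, j) where i and j are each other's nearest neighbours."""
    pairs = []
    for i, d in enumerate(desc_input):
        j = nearest(d, desc_reference)
        if nearest(desc_reference[j], desc_input) == i:
            pairs.append((i, j))
    return pairs


# Hypothetical descriptors: three key points in the input image, two in the reference.
input_desc = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
reference_desc = [[1.1, 0.9], [0.1, 0.0]]
pairs = mutual_matches(input_desc, reference_desc)
```

The third input key point has no counterpart in the reference image, and the mutual check rejects it rather than forcing a poor match.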

Meanwhile, the global feature matching and local feature matching discussed above may be performed in a Dual Feature Network. The Dual Feature Network may be a deep neural network that simultaneously extracts the global features used for image retrieval and the local features used for pose estimation of the object.

When the number of matching points exceeds (or satisfies) a threshold (reference value), the control unit 130 may use the matching points to estimate the pose of the object corresponding to the graphic object 1211’ included in the input image 1211, or of the camera that photographed it (S1240).

When the number of matching points does not exceed the threshold (reference value), the control unit 130 may not estimate the pose of the object corresponding to the graphic object 1211’ included in the input image 1211, or of the camera that photographed it.
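The threshold check amounts to a simple gate in front of the pose solver. In the sketch below, `solve_pose` is a hypothetical callable standing in for a PnP solver, and the default threshold of 4 is an assumed value, since the text does not give one.

```python
def estimate_if_enough(matches, solve_pose, threshold=4):
    """Run `solve_pose` only when the match count exceeds `threshold`.

    `solve_pose` is a hypothetical stand-in for a PnP solver; the default
    threshold is an assumption, not a value from the text.
    Returns None when pose estimation is skipped.
    """
    if len(matches) <= threshold:
        return None
    return solve_pose(matches)


# Stand-in solver that only reports how many correspondences it received.
def fake_solver(matches):
    return {"num_points": len(matches)}


skipped = estimate_if_enough([(0, 1), (1, 0), (2, 2)], fake_solver)
estimated = estimate_if_enough([(i, i) for i in range(6)], fake_solver)
```

Returning `None` for the skipped case lets the caller tell "no estimate attempted" apart from a failed or degenerate estimate.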

Looking at the pose estimation process in detail, the control unit 130 may extract the degree-of-freedom information of the object based on the relationship between the graphic object 1211’ included in the input image 1211 and the 3D modeling object 1221’.

More specifically, the control unit 130 may solve the PnP equation (algorithm) described with (a) of FIG. 8 by entering, as the 3D coordinate information 820, the coordinate information that is contained in the data set 1110 and corresponds to the matching points of the specific image 1221, and, as the 2D coordinate information 810 (pixel coordinates), the coordinate information corresponding to the matching points of the input image 1211, thereby estimating the degree-of-freedom information of the object itself or of the camera photographing the object. For the process of solving the PnP equation (algorithm), refer to the description of FIG. 8.

Meanwhile, because the data set 1110 contains the degree-of-freedom information of the 3D modeling object included in the specific image 1221, the control unit 130 may apply the relative positional relationship between the matching points described above to that degree-of-freedom information, and thereby extract the degree-of-freedom information of the object corresponding to the graphic object 1211’ included in the input image 1211.

The degree-of-freedom information here may be that of the object itself or of the camera photographing the object.

As described above, the learning data collection system and method according to the present invention generate a 3D modeling object for an object and, using the relationship between the graphic object corresponding to the object in a captured image and the generated 3D modeling object, can extract the degree-of-freedom information of the 3D modeling object.

Accordingly, by reflecting the 3D modeling object in an image captured in a real environment, the present invention can generate learning data that reflects the lighting, shadows, and other conditions of the real environment. As a result, the present invention makes it possible to collect learning data that is closer to the real environment.

Meanwhile, the present invention described above may be implemented as a program that is executed by one or more processes on a computer and that can be stored in a computer-readable medium (or recording medium).

Furthermore, the present invention described above can be implemented as computer-readable code or instructions on a medium on which a program is recorded. That is, the present invention may be provided in the form of a program.

Meanwhile, the computer-readable medium includes all types of recording devices that store data readable by a computer system. Examples of computer-readable media include a Hard Disk Drive (HDD), Solid State Disk (SSD), Silicon Disk Drive (SDD), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device.

Furthermore, the computer-readable medium may be a server or cloud storage that includes storage and that an electronic device can access through communication. In this case, the computer may download the program according to the present invention from the server or cloud storage through wired or wireless communication.

Furthermore, in the present invention, the computer described above is an electronic device equipped with a processor, that is, a CPU (Central Processing Unit), and its type is not particularly limited.

Meanwhile, the above detailed description should not be construed as restrictive in any respect and should be considered illustrative. The scope of the present invention should be determined by a reasonable interpretation of the appended claims, and all modifications within the equivalent scope of the present invention are included in the scope of the present invention.

Claims

1. A learning data collection method comprising:
generating a 3D modeling object by modeling an object, and collecting a plurality of first images each including the 3D modeling object in a different posture in a first space having a first reference coordinate system;
collecting a second image obtained by photographing the object disposed in a second space, using a camera disposed in the second space, the second space having a second reference coordinate system different from the first reference coordinate system; and
generating learning data for the object by using degree-of-freedom information of the camera and the 3D modeling object included in the plurality of first images.

2. The method of claim 1,
wherein the learning data comprises a mask for each of the 3D modeling objects included in the plurality of first images.

3. The method of claim 2,
wherein the mask is an image obtained by combining the 3D modeling object included in the plurality of first images with the second image.
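
One way to read the combination described in this claim is as a composite in which rendered pixels of the 3D modeling object overwrite the photographed background. The following is a minimal sketch under that reading. The function name `composite_mask`, the nested-list image representation, and the use of `None` for "no object at this pixel" are all illustrative assumptions, not details from the specification.

```python
def composite_mask(rendered, background):
    """Overlay rendered object pixels onto a background image.

    rendered   -- 2D list of pixel values; None marks pixels where the
                  3D modeling object is absent (an assumption of this sketch)
    background -- 2D list of the same shape (standing in for the second image)
    Returns (composite, binary_mask), where binary_mask holds 1 for
    object pixels and 0 elsewhere.
    """
    if len(rendered) != len(background) or any(
        len(r_row) != len(b_row) for r_row, b_row in zip(rendered, background)
    ):
        raise ValueError("rendered and background must have the same shape")

    composite, binary_mask = [], []
    for r_row, b_row in zip(rendered, background):
        # Keep the background pixel wherever the object was not rendered.
        composite.append([b if r is None else r for r, b in zip(r_row, b_row)])
        binary_mask.append([0 if r is None else 1 for r in r_row])
    return composite, binary_mask
```

In this sketch the binary mask is returned alongside the composite so that each synthesized image can be paired with the region the object occupies.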

4. The method of claim 2,
wherein the degree-of-freedom information related to the posture of the 3D modeling object matched to the mask
is degree-of-freedom information with respect to the second space.

5. The method of claim 4,
wherein the degree-of-freedom information related to the posture of the 3D modeling object matched to the mask comprises at least one of:
degree-of-freedom information of the 3D modeling object matched to the mask in the second reference coordinate system; and
degree-of-freedom information of the camera corresponding to the posture of the 3D modeling object matched to the mask.

6. The method of claim 4,
wherein generating the learning data comprises
extracting, in the second space, the degree-of-freedom information related to the posture of the 3D modeling object matched to each mask,
by using a relative positional relationship of the camera coordinate system of the camera disposed in the second space with respect to the first reference coordinate system of the first space.

7. The method of claim 6,
wherein the relative positional relationship
is specified based on the degree to which the camera coordinate system of the camera is rotated and translated with respect to the reference coordinate system of the first space.
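
A relationship given by a rotation and a translation between two coordinate systems is conventionally written as a rigid transform, p_cam = R p_ref + t. The sketch below shows that mapping for a single point. The helper names `rotation_z` and `to_camera_frame`, and the choice of a rotation about the z-axis for the example, are assumptions made for illustration; the claims do not prescribe a parameterization.

```python
import math


def rotation_z(theta):
    """Return a 3x3 rotation matrix for an angle theta (radians) about z."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def to_camera_frame(R, t, p_ref):
    """Map a point from the reference coordinate system to the camera
    coordinate system using p_cam = R * p_ref + t.

    R     -- 3x3 rotation matrix (list of rows)
    t     -- translation vector of length 3
    p_ref -- point expressed in the reference coordinate system
    """
    return [sum(R[i][j] * p_ref[j] for j in range(3)) + t[i] for i in range(3)]


# Example: rotate 90 degrees about z, then shift by one unit along x.
R = rotation_z(math.pi / 2)
t = [1.0, 0.0, 0.0]
p_cam = to_camera_frame(R, t, [1.0, 0.0, 0.0])
```

Once R and t are known, the same mapping can be applied to every posture of the 3D modeling object, which is how a pose defined in one coordinate system can be re-expressed relative to the camera.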

8. The method of claim 7,
wherein the degree of rotation and translation
is specified based on a relationship between the 3D modeling object included in the first image and a graphic object corresponding to the object in the second image.

9. The method of claim 8,
wherein generating the learning data comprises:
specifying, from the plurality of first images, a first image including a specific 3D modeling object having a posture similar to the posture of the object corresponding to the graphic object included in the second image; and
extracting the relationship by using the second image and the specified first image.

10. The method of claim 9,
wherein the degree to which the camera is rotated and translated
is extracted by using pixel coordinates of the graphic object in the second image and three-dimensional coordinates of the specific 3D modeling object in the specified first image.
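
The claim does not state how the rotation and translation are recovered from the pixel and three-dimensional coordinates. Under the standard pinhole camera model (an assumption here; the intrinsic matrix K and the scale factor s are not named in the claims), each correspondence between a 3D point (X, Y, Z) of the modeling object and a pixel (u, v) of the graphic object constrains the rotation R and translation t as follows:

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \, [\, R \mid t \,]
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```

Estimating R and t from several such 2D-3D correspondences is commonly known as the Perspective-n-Point problem. Whether the inventors use that formulation is not stated in this section.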

11. A learning data collection system comprising:
a modeling unit configured to generate a 3D modeling object by modeling an object in a first space;
a communication unit configured to receive an image of the object from a camera disposed in a second space different from the first space; and
a control unit configured to collect a plurality of images each including the 3D modeling object in a different posture in the first space having a first reference coordinate system,
wherein the control unit
generates learning data for the object by using degree-of-freedom information of the camera and the 3D modeling object included in the plurality of images.

12. A computer program recorded on a computer-readable recording medium for executing the method according to any one of claims 1 to 10.

13. A method of extracting a degree-of-freedom posture, comprising:
receiving a captured image of an object from a camera;
searching a plurality of reference images included in a preset data set for a specific reference image corresponding to the captured image; and
extracting the degree-of-freedom posture of the object corresponding to the captured image by using degree-of-freedom information matched to the specific reference image,
wherein the plurality of reference images
respectively include 3D modeling objects having different postures with respect to the object.

14. The method of claim 13,
wherein searching for the specific reference image comprises
comparing a graphic object corresponding to the object included in the captured image with the 3D modeling object included in each of the plurality of reference images, and
retrieving the specific reference image that includes, among the 3D modeling objects included in the plurality of reference images, a specific 3D modeling object whose posture is most similar to that of the graphic object.
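
The search in this claim amounts to a nearest-neighbor lookup over the reference images. The sketch below illustrates the idea with silhouette overlap (intersection over union) as the similarity measure. That measure, the binary-mask representation, and the names `iou` and `find_reference` are assumptions for illustration; the claim only requires a comparison and does not name a metric.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            intersection += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    return intersection / union if union else 0.0


def find_reference(query_mask, dataset):
    """Return the (reference_mask, dof_info) entry most similar to the query.

    query_mask -- binary silhouette of the graphic object in the captured image
    dataset    -- list of (reference_mask, dof_info) pairs, one per reference
                  image; dof_info is whatever pose label was stored with it
    """
    if not dataset:
        raise ValueError("dataset must contain at least one reference image")
    return max(dataset, key=lambda entry: iou(query_mask, entry[0]))


# Example with two hypothetical reference entries.
query = [[1, 1], [0, 0]]
dataset = [
    ([[0, 0], [1, 1]], "pose_a"),
    ([[1, 1], [0, 1]], "pose_b"),
]
best_mask, best_dof = find_reference(query, dataset)
```

The degree-of-freedom information stored with the retrieved entry then serves as the starting point for the pose of the object in the captured image.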

15. The method of claim 14,
wherein extracting the degree-of-freedom posture of the object comprises
extracting the degree-of-freedom posture of the object corresponding to the captured image based on a relationship between mutually corresponding matching points of the graphic object and the specific 3D modeling object.