KR102555667B1

KR102555667B1 - Learning data collection system and method

Info

Publication number: KR102555667B1
Application number: KR1020200174295A
Authority: KR
Inventors: 이승현
Original assignee: 네이버랩스 주식회사
Priority date: 2020-12-14
Filing date: 2020-12-14
Publication date: 2023-07-17
Also published as: KR20220084628A

Abstract

The present invention relates to a system and method for collecting learning data to be used for training in artificial intelligence. The learning data collection system according to the present invention may include: a sensing unit including sensors that are respectively provided on a plurality of objects and sense degree-of-freedom information for each of the plurality of objects; a camera unit configured to photograph the plurality of objects; and a control unit that sets a different label for each of the plurality of objects based on the image object corresponding to each of the plurality of objects included in an initial image received from the camera, and maps the identification information of the sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects.

Description

Learning data collection system and method {LEARNING DATA COLLECTION SYSTEM AND METHOD}

The present invention relates to a system for collecting learning data to be used for training in artificial intelligence, and to a learning data collection method using the same.

By its dictionary definition, artificial intelligence is a technology that realizes human learning, reasoning, perception, and natural-language understanding abilities through computer programs. Artificial intelligence has advanced rapidly thanks to deep learning, which adds neural networks modeled on the human brain to machine learning.

Deep learning is a technology that enables computers to judge and learn like humans and thereby cluster or classify objects or data. Recently it has become possible to analyze not only text data but also image data, and deep learning is actively used in a wide variety of industries.

For example, in various industries such as robotics, autonomous driving, and medicine, a deep learning-based learning network (hereinafter referred to as a “deep learning network”) performs learning based on learning target data and derives meaningful learning results, which are put to use in each industry.

As an example, in the field of robotics, understanding the task a robot performs requires accurately determining the situation around the robot or the work objects placed around it. To this end, deep learning-based image recognition technology (for example, robot vision technology) is actively used.

Meanwhile, in artificial intelligence fields such as machine learning as well as deep learning, learning on a larger amount of data increases accuracy and makes it possible to derive higher-quality results. Therefore, collecting the data to be learned is essential in the field of artificial intelligence.

In particular, a deep learning network or machine learning network based on image data can estimate the position or posture of an object corresponding to the image data. For this estimation, the object's degree-of-freedom information (position information and posture information) must be secured as learning data together with the image data.

Conventionally, collecting image data and the corresponding degree-of-freedom information as learning data required labeling the image data (for example, identifying a specific image object corresponding to an object in the image data) and then manually mapping each specific image object to its degree-of-freedom information, so securing learning data demanded an enormous amount of labor.

For example, Korean Patent Registration No. 10-2010085 discloses a method and apparatus for generating labeling images of microstructures using superpixels. However, this only simplifies the labeling of a specific image object corresponding to an object; mapping the specific image object to degree-of-freedom information still requires manual work.

Accordingly, there is a pressing need for an improved method that collects learning data including an object's degree-of-freedom information in an automated manner.

The present invention relates to a learning data collection system and method for collecting learning data for training an artificial intelligence network.

More specifically, the present invention relates to a learning data collection system and method for collecting learning data that includes degree-of-freedom information.

Furthermore, the present invention relates to a learning data collection system and method capable of automatically collecting learning data that includes degree-of-freedom information.

Furthermore, the present invention relates to a learning data collection system and method for collecting learning data on objects in various postures.

Furthermore, the present invention relates to a learning data collection system and method capable of minimizing the time and labor required to collect learning data.

To solve the problems described above, the learning data collection method according to the present invention may include: receiving, through a camera, an initial image of a plurality of objects each equipped with a degree-of-freedom sensor; setting a different label for each of the plurality of objects based on the image object corresponding to each of the plurality of objects included in the initial image; mapping the identification information of the degree-of-freedom sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects; tracking the plurality of objects in subsequent images received through the camera and extracting a segmentation mask corresponding to the label set for each of the plurality of objects; and collecting, based on the label set for each of the plurality of objects, learning data including the segmentation mask and the degree-of-freedom information sensed by the degree-of-freedom sensor.
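The one-to-one mapping step can be illustrated with a minimal Python sketch. The function name `map_labels_to_sensors`, the label strings, and the rule that labels and sensor identifiers are paired by list order are all assumptions made for illustration; this passage does not state how the pairing is decided, only that each label corresponds to exactly one sensor.

```python
from typing import Dict, List


def map_labels_to_sensors(labels: List[str], sensor_ids: List[str]) -> Dict[str, str]:
    """Pair each object label with exactly one sensor ID.

    Assumption: the two lists are already in matching order. The sketch
    only enforces that the resulting mapping is one-to-one.
    """
    if len(labels) != len(sensor_ids):
        raise ValueError("each object needs exactly one label and one sensor")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must differ between objects")
    if len(set(sensor_ids)) != len(sensor_ids):
        raise ValueError("sensor IDs must be unique")
    return dict(zip(labels, sensor_ids))


# Hypothetical labels for three objects, paired with three sensor IDs.
mapping = map_labels_to_sensors(["obj_1", "obj_2", "obj_3"], ["120a", "120b", "120c"])
```

Rejecting duplicate labels or duplicate sensor identifiers up front keeps the mapping invertible, so a later lookup can go from a label to its sensor or from a sensor back to its label.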

Furthermore, the learning data collection system according to the present invention may include: a sensing unit including sensors that are respectively provided on a plurality of objects and sense degree-of-freedom information for each of the plurality of objects; a camera unit configured to photograph the plurality of objects; and a control unit that sets a different label for each of the plurality of objects based on the image object corresponding to each of the plurality of objects included in an initial image received from the camera, and maps the identification information of the sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects.

The control unit may track the plurality of objects in subsequent images received through the camera, extract a segmentation mask corresponding to the label set for each of the plurality of objects, and, based on the label set for each of the plurality of objects, collect the segmentation mask and the degree-of-freedom information collected from the sensors as learning data.
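As a rough illustration of collecting learning data keyed by label, the following Python sketch joins the masks extracted for one frame with the sensor readings for the same moment. The `Sample` type, the `collect_frame` function, the list-of-lists mask representation, and the decision to skip objects without a reading are hypothetical choices, not details taken from the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Mask = List[List[int]]  # binary mask: 1 where the object is visible
DoF = Tuple[float, float, float, float, float, float]  # x, y, z, roll, pitch, yaw


@dataclass
class Sample:
    label: str
    mask: Mask
    dof: DoF


def collect_frame(masks_by_label: Dict[str, Mask],
                  label_to_sensor: Dict[str, str],
                  dof_by_sensor: Dict[str, DoF]) -> List[Sample]:
    """Join each label's mask with the reading of the sensor mapped to it."""
    samples = []
    for label, mask in masks_by_label.items():
        sensor_id = label_to_sensor.get(label)
        if sensor_id is None or sensor_id not in dof_by_sensor:
            continue  # assumption: skip objects with no mapped sensor or no reading
        samples.append(Sample(label, mask, dof_by_sensor[sensor_id]))
    return samples


frame_samples = collect_frame(
    masks_by_label={"obj_1": [[0, 1], [1, 1]], "obj_2": [[1, 0], [0, 0]]},
    label_to_sensor={"obj_1": "120a", "obj_2": "120b"},
    dof_by_sensor={"120a": (0.4, 0.1, 0.05, 0.0, 0.35, 1.57)},  # 120b reading missing
)
```

Because the label is the join key, no per-frame manual matching between image objects and sensor readings is needed once the initial mapping exists.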

Furthermore, a program according to the present invention, which is executed by one or more processes in an electronic device and can be stored on a computer-readable recording medium, may include instructions for performing: receiving, through a camera, an initial image of a plurality of objects each equipped with a degree-of-freedom sensor; setting a different label for each of the plurality of objects based on the image object corresponding to each of the plurality of objects included in the initial image; mapping the identification information of the degree-of-freedom sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects; tracking the plurality of objects in subsequent images received through the camera and extracting a segmentation mask corresponding to the label set for each of the plurality of objects; and storing, based on the label set for each of the plurality of objects, the segmentation mask and the degree-of-freedom information collected from the degree-of-freedom sensor as learning data.

As described above, the learning data collection system and method according to the present invention provide the object to be learned with a sensor for collecting degree-of-freedom information, so that the object's degree-of-freedom information can be collected together with image data of the object. In doing so, the present invention may perform segmentation on the image object corresponding to each object in the image data, object by object. Furthermore, by mapping the mask label of each segmented mask to the degree-of-freedom information of each object, the present invention can collect image data and degree-of-freedom information for an object at once simply by capturing an image of the object. As a result, the present invention can drastically reduce the labor and time previously consumed by manually mapping image data to degree-of-freedom information object by object.

Furthermore, the learning data collection system and method according to the present invention can secure learning data for each of a plurality of objects in every image frame by placing the plurality of objects within the camera's field of view. In this way, the present invention can drastically reduce the time required to collect learning data for multiple objects.

FIG. 1 is a conceptual diagram for explaining an example in which learning data collected according to the present invention is utilized.
FIG. 2 is a conceptual diagram for explaining a learning data collection system according to the present invention.
FIG. 3 is a flowchart for explaining a learning data collection method according to the present invention.
FIGS. 4, 5a, 5b, 6, 7, 8, and 9 are conceptual diagrams for explaining a method of collecting learning data.

Hereinafter, the embodiments disclosed in this specification are described in detail with reference to the accompanying drawings. The same or similar components are given the same reference numerals regardless of the drawing, and redundant descriptions of them are omitted. The suffixes "module" and "unit" used for components in the following description are assigned or used interchangeably only for ease of drafting the specification and do not in themselves have distinct meanings or roles. In describing the embodiments disclosed in this specification, a detailed description of related known technology is omitted where it is judged that it could obscure the gist of the embodiments. The accompanying drawings are intended only to make the embodiments disclosed in this specification easy to understand; they do not limit the technical idea disclosed in this specification, which should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms including ordinal numbers, such as first and second, may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. On the other hand, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

Singular expressions include plural expressions unless the context clearly indicates otherwise.

In this application, terms such as "comprise" or "have" are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The present invention relates to a learning data collection system and method for collecting learning data for training an artificial intelligence network, and in particular to a learning data collection method and system capable of automatically collecting learning data that includes degree-of-freedom information.

As discussed above, image recognition technology is being used in various industries thanks to advances in artificial intelligence. In the field of robotics in particular, a robot analyzes and understands its working environment using artificial intelligence-based image recognition technology (for example, deep learning-based image recognition technology) and performs its intended task on that basis.

For example, as shown in FIG. 1, when the robot R is given a specific task (for example, dish-washing), the robot R or a camera (not shown) placed around the robot R may capture an image of the working environment of the robot R. Based on the captured image, the control unit of the robot R may then determine how the robot R should move to perform the specific task and control the robot R to operate according to that determination.

In this case, the control unit of the robot R must recognize, in the captured image, the object A that is the target of the task (for example, bowls a1 and a2), analyze the position and posture (or pose) of the object A, and control the robot R so that it can perform the intended task on the object.

To this end, the control unit of the robot R must collect various information from the captured image. For example, it can control the robot R accurately by using several of the following: i) the type of the object to be worked on; ii) the size of the object; iii) the shape of the object; iv) the position of the object (for example, as shown in FIG. 1, where in the sink the bowl a1 is placed); and v) the posture of the object (for example, as shown in FIG. 1, the posture in which the bowl a1 rests in the sink, such as whether it is tilted).

Here, the position and posture of the object to be worked on may also be expressed as “degrees of freedom” or “degree-of-freedom information”, and degree-of-freedom information may be understood as a concept that includes position information and posture information. This degree-of-freedom information may include position information (or three-dimensional position information) corresponding to a three-dimensional position (x, y, z) and posture information (or three-dimensional posture information) corresponding to a three-dimensional posture (roll r, pitch θ, and yaw).
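The degree-of-freedom information described above can be pictured as a six-value record. The following Python sketch is only an illustration: the class name `DoFInfo`, the field names, and the units (meters and radians) are assumptions and are not specified in this text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class DoFInfo:
    # Position information: three-dimensional position (assumed unit: meters).
    x: float
    y: float
    z: float
    # Posture information: three-dimensional posture (assumed unit: radians).
    roll: float
    pitch: float
    yaw: float

    def as_vector(self) -> List[float]:
        """Flatten to [x, y, z, roll, pitch, yaw], e.g. for use as a training target."""
        return [self.x, self.y, self.z, self.roll, self.pitch, self.yaw]

    @classmethod
    def from_vector(cls, values: Sequence[float]) -> "DoFInfo":
        if len(values) != 6:
            raise ValueError("expected 3 position values and 3 posture values")
        return cls(*(float(v) for v in values))


# Hypothetical reading for a bowl lying tilted in a sink.
bowl_pose = DoFInfo(x=0.40, y=0.10, z=0.05, roll=0.0, pitch=0.35, yaw=1.57)
```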

Meanwhile, for the robot R to work accurately on the object, it is very important to determine the object's degree-of-freedom information.

For example, to grasp the objects a1 and a2, the control unit of the robot R must determine at what angle to control the robot arms R1 and R2 and in what posture to grip, and this is determined based on at least one of the posture and position of the object to be worked on.

If the degree-of-freedom information of an object (for example, a1 or a2) can be obtained merely by recognizing the object in the captured image, both the accuracy and the efficiency of the task can be secured.

To this end, learning data may be used in which i) information on a specific shape of an object included in a captured image and ii) information on the position or posture the object has when it shows that shape are matched with each other.

Meanwhile, for the robot R to perform a task accurately, an artificial intelligence algorithm (for example, a deep learning algorithm or deep learning network) trained on a vast amount of learning data is required. Accordingly, a method of collecting learning data is described below in more detail with reference to the accompanying drawings. FIG. 2 is a conceptual diagram for explaining a learning data collection system according to the present invention, and FIG. 3 is a flowchart for explaining a learning data collection method according to the present invention. FIGS. 4, 5a, 5b, 6, 7, 8, and 9 are conceptual diagrams for explaining a method of collecting learning data.

Before describing the present invention, it is noted that the “object” referred to in this specification is not limited in type and may be interpreted as any of a wide variety of things. An object has a concrete form that can be distinguished visually or physically, and may be understood to include not only articles but also people and animals.

As shown in FIG. 2, the learning data collection system 100 according to the present invention may include at least one of a camera 110, a sensing unit 120, a storage unit 130, a communication unit 140, and a control unit 150.

The camera 110 is a means for capturing images and may be included in the system 100 according to the present invention or provided separately. In the present invention, the camera 110 may also be referred to as an “image sensor”.

The camera 110 may be configured to capture at least one of still images and moving images, and one or more cameras may be provided.

The camera 110 may be a 3D depth camera, an RGB-depth camera, or the like, capable of acquiring depth information of an object (or subject). When the camera 110 is a 3D depth camera, the depth value of each pixel of the captured image is known, and the depth information of the object can be obtained from it.
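One way to picture how per-pixel depth values yield an object's depth is the small Python sketch below, which takes the median depth over the pixels belonging to the object. The function name `object_depth`, the use of a binary mask to select the object's pixels, the median as the summary statistic, and the convention that a depth of zero marks an invalid pixel are all assumptions for illustration.

```python
from statistics import median
from typing import List, Optional


def object_depth(depth_image: List[List[float]],
                 mask: List[List[int]]) -> Optional[float]:
    """Return the median depth over the pixels where mask == 1.

    Assumption: a depth of 0 means the camera had no valid measurement
    at that pixel, so such pixels are ignored.
    """
    values = [
        d
        for depth_row, mask_row in zip(depth_image, mask)
        for d, m in zip(depth_row, mask_row)
        if m and d > 0
    ]
    return median(values) if values else None


depth_image = [[0.0, 1.2, 1.3],
               [2.0, 1.4, 0.0]]
mask = [[1, 1, 1],
        [0, 1, 1]]
depth = object_depth(depth_image, mask)
```

The median is used here rather than the mean so that a few outlier pixels at the object's edge have little effect on the result.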

Next, the sensing unit 120 may include a plurality of sensors 120a, 120b, and 120c. As shown in FIG. 2, each of the plurality of sensors 120a, 120b, and 120c included in the sensing unit 120 may be provided on one of the objects 200, 201, 202, and 203.

In FIG. 2, for convenience of description, reference numerals are given only to the first, second, and third objects 201, 202, and 203 and to the first, second, and third sensors 120a, 120b, and 120c provided on them. However, it will be apparent to those skilled in the art that the description of the present invention applies equally to all of the objects shown in FIG. 2 (for example, the nine objects including the first, second, and third objects 201, 202, and 203).

Meanwhile, the sensing unit 120 is provided on each object 200 and may sense the degree-of-freedom information of the object 200.

As illustrated, the first object 201 may be provided with the first sensor 120a, the second object 202 with the second sensor 120b, and the third object 203 with the third sensor 120c. Likewise, each of the other objects to which no reference numeral is given may be provided with at least one sensor.

Although this specification describes each object as being provided with one sensor (or degree-of-freedom sensor), the present invention is not limited thereto. That is, depending on the case, an object may also be provided with a plurality of sensors.

According to the present invention, the sensing unit 120 can sense, for each object 200, position information (or three-dimensional position information, corresponding to the degrees of freedom of translational motion) corresponding to its three-dimensional position (x, y, z) and posture information (or three-dimensional posture information, corresponding to the degrees of freedom of rotational motion) corresponding to its three-dimensional posture (roll r, pitch θ, and yaw).

Here, the sensing unit 120 may also be referred to as a degree-of-freedom (DoF) sensor. The sensing unit 120 may consist of an inertial measurement unit (IMU).

Next, the storage unit 130 may be configured to store various information according to the present invention. The storage unit 130 may take many forms, and at least part of it may be an external server (at least one of a cloud server and a database (DB)). That is, the storage unit 130 need only be a space in which the relevant information is stored, and it may be understood that there is no restriction on its physical space.

The storage unit 130 may store at least one of: i) learning data collected by the data collection system according to the present invention; ii) images captured by the camera 110; iii) segmentation masks of the image objects corresponding to the objects included in the images; iv) labels (or label information) of the segmentation masks; and v) sensing information (for example, degree-of-freedom information) collected through the sensing unit 120.
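The kinds of items listed above could, for example, be stored together as one record per object per frame. The Python sketch below serializes such a record as JSON; the field names, the file path, and the choice of JSON are hypothetical and are not prescribed by this text.

```python
import json
from typing import Dict, List


def make_record(frame_id: int, label: str, sensor_id: str,
                mask_path: str, dof: List[float]) -> Dict[str, object]:
    """Bundle one object's label, mask reference, and 6-DoF reading for storage."""
    if len(dof) != 6:
        raise ValueError("expected [x, y, z, roll, pitch, yaw]")
    return {
        "frame_id": frame_id,
        "label": label,            # label set for the object
        "sensor_id": sensor_id,    # identification information of the mapped sensor
        "mask_path": mask_path,    # hypothetical location of the segmentation mask
        "dof": {"position": dof[:3], "posture": dof[3:]},
    }


record = make_record(12, "obj_1", "120a", "masks/000012_obj_1.png",
                     [0.4, 0.1, 0.05, 0.0, 0.35, 1.57])
line = json.dumps(record, sort_keys=True)  # one line per record, e.g. for a log file
```

Keeping the mask as a file reference rather than inline data is one possible design; it keeps each record small while the image data lives wherever the storage unit places it, locally or on an external server.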

다음으로, 통신부(140)는 유선 또는 무선 통신 중 적어도 하나를 수행하도록 이루어질 수 있다. 통신부(140)는 통신이 가능한 다양한 대상과 통신을 수행하도록 이루어질 수 있다. 예를 들어, 통신부(140)는 카메라(110) 및 센싱부(120) 중 적어도 하나와 통신을 수행할 수 있다.Next, the communication unit 140 may be configured to perform at least one of wired or wireless communication. The communication unit 140 may be configured to perform communication with various targets capable of communication. For example, the communication unit 140 may communicate with at least one of the camera 110 and the sensing unit 120 .

통신부(140)는 카메라(110)와의 통신을 통해, 카메라(110)를 통해 촬영(또는 센싱)되는 영상을 수신할 수 있다.The communication unit 140 may receive an image photographed (or sensed) through the camera 110 through communication with the camera 110 .

나아가, 통신부(140)는 센싱부(120)와의 통신을 통해, 센싱부(120)를 통해 수신되는 센싱 정보(예를 들어, 자유도 정보)를 수신할 수 있다.Furthermore, the communication unit 140 may receive sensing information (eg, degree of freedom information) received through the sensing unit 120 through communication with the sensing unit 120 .

Furthermore, the communication unit 140 may be configured to communicate with at least one external server. Here, as described above, the external server may include at least one of a cloud server and a database that forms at least part of the storage unit 130. The external server may also be configured to perform at least part of the role of the controller 150. That is, data processing or computation may be carried out on the external server, and the present invention places no particular restriction on this approach.

Meanwhile, the communication unit 140 may support various communication methods according to the communication standard of the target with which it communicates.

For example, the communication unit 140 may be configured to communicate using at least one of the following technologies: WLAN (Wireless LAN), Wi-Fi (Wireless Fidelity), Wi-Fi Direct, DLNA (Digital Living Network Alliance), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), HSUPA (High Speed Uplink Packet Access), LTE (Long Term Evolution), LTE-A (Long Term Evolution-Advanced), 5G (5th Generation Mobile Telecommunication), Bluetooth™, RFID (Radio Frequency Identification), infrared communication (Infrared Data Association; IrDA), UWB (Ultra-Wideband), ZigBee, NFC (Near Field Communication), and Wireless USB (Wireless Universal Serial Bus).

Next, the controller 150 may be configured to control the overall operation of the learning data collection system 100 according to the present invention. The controller 150 may include a processor (or artificial intelligence processor) capable of processing an artificial intelligence algorithm. Based on a deep learning algorithm, the controller 150 may recognize and track the object 200 in an image captured by the camera 110. This operation may also be referred to as tracking. When a plurality of objects 201, 202, 203 are captured by the camera 110, the controller 150 may also recognize each of the plurality of objects in the captured image and track each object individually.

Accordingly, while the plurality of objects 201, 202, 203 are being captured by the camera 110, the controller 150 may track each of the plurality of objects in real time or at preset time intervals, even if the position and posture of at least one of them changes.

Meanwhile, there are many deep learning techniques for recognizing and tracking objects in images, and the present invention is not limited to any particular one.

Hereinafter, a method of collecting learning data will be described in more detail, based on the configuration of the learning data collection system according to the present invention described above.

First, according to the learning data collection method of the present invention, a process of receiving, through the camera 110, an initial image of objects each provided with a degree-of-freedom (DOF) sensor may be performed (S310).

More specifically, in step S310 of acquiring the initial image, the plurality of objects 201, 202, 203 may be photographed through the camera 110, as shown in FIG. 4(a). In the present invention, an object is a thing to be photographed by the camera 110 and may also be referred to as a "subject".

Meanwhile, for purposes of description, this specification divides the images captured by the camera into an "initial image" and a "subsequent image".

In the present invention, the "initial image" is an image used to label the objects, as described below, and does not necessarily mean the first image received. That is, the initial image is the image on which the settings for collecting learning data according to the present invention are made, based on the image objects it contains (an image object being the portion of the image that corresponds to a photographed object).

Furthermore, the "subsequent image" is an image from which learning data is directly collected, and may mean an image received after the initial image. The controller 150 may collect learning data for an object using the image object included in the subsequent image and the sensor (or DOF sensor) provided on the object.

In the present invention, each of the initial image and the subsequent image may be at least one of a still image and a moving image. Furthermore, each of the initial image and the subsequent image may consist of a plurality of images (or frames).

As shown in FIG. 4(b), when the initial image 400 of the objects 200 is received through the camera 110, a process of labeling the plurality of objects 200 in the initial image 400 may be performed (S320).

In the process of labeling the plurality of objects 200, settings may be made for use with the subsequent image, including: i) a setting for tracking each of the plurality of objects 200; ii) a setting for one-to-one mapping between the sensor provided on each of the plurality of objects 200 and the unit area for that object; iii) a setting for extracting a segmentation mask for each of the plurality of objects 200; and iv) a setting for one-to-one mapping between the sensor provided on each of the plurality of objects 200 and the segmentation mask for that object.

First, the setting for tracking each of the plurality of objects 200 may mean a setting that allows each of the plurality of objects 200 to be recognized in the subsequent image even if the postures and positions of the objects change.

To this end, as shown in FIG. 4(b), a unit area 200a may be designated in the initial image 400 to specify the initial position of the image object 410 corresponding to each of the plurality of objects 200, so that the plurality of objects 200 can be tracked in the subsequent image based on a deep learning algorithm.

In the present invention, the unit area 200a may also be referred to as a "box" or a "bounding box".

As shown in FIG. 4(b), the controller 150 may specify the initial position of each image object 410 such that the image object 410 corresponding to each of the plurality of objects 200 is contained within a unit area 200a.

That is, a unit area 200a may be specified in the initial image 400 for each image object 410 corresponding to each of the plurality of objects 200.

The controller 150 may set the area in which an image object 410 is located as a unit area 200a based on a selection by a user or operator. Alternatively, the controller 150 may recognize the image objects 410 included in the initial image 400 based on an image recognition algorithm and set unit areas 201a, 202a, 203a for the respective image objects 411, 412, 413.

Accordingly, each unit area 200a may contain the one image object, among the image objects 410 corresponding to the plurality of objects 200, that corresponds to that unit area 200a.

For example, as shown in FIG. 4(b), the first unit area 201a may contain the first image object 411 corresponding to the first object 201, and the second unit area 202a may contain the second image object 412 corresponding to the second object 202. Likewise, the third unit area 203a may contain the third image object 413 corresponding to the third object 203. In this case, each image object 410 need not be completely contained within its unit area 200a; it is also possible for only part of the image object to be contained.
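As an illustration of the partial-containment remark above, the following minimal sketch computes what fraction of an image object's extent falls inside a unit area. The box representation `(x_min, y_min, x_max, y_max)` and the function name `overlap_fraction` are assumptions made for this example and do not appear in the specification.

```python
def overlap_fraction(object_box, unit_area):
    """Return the fraction of object_box that lies inside unit_area.

    Both boxes are (x_min, y_min, x_max, y_max) in pixel coordinates.
    This representation is an assumption made for illustration only.
    """
    ox0, oy0, ox1, oy1 = object_box
    ux0, uy0, ux1, uy1 = unit_area
    # Width and height of the intersection rectangle (zero if disjoint).
    width = max(0, min(ox1, ux1) - max(ox0, ux0))
    height = max(0, min(oy1, uy1) - max(oy0, uy0))
    object_area = (ox1 - ox0) * (oy1 - oy0)
    return (width * height) / object_area if object_area > 0 else 0.0


# Fully contained, half contained, and not contained at all.
full = overlap_fraction((0, 0, 10, 10), (0, 0, 10, 10))
half = overlap_fraction((0, 0, 10, 10), (5, 0, 20, 10))
none = overlap_fraction((0, 0, 10, 10), (20, 20, 30, 30))
```

A value strictly between 0 and 1 corresponds to the case in which the image object is only partly contained in its unit area.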

Meanwhile, although the unit area 200a is depicted as a rectangle in the present invention, its shape is not particularly limited. For example, the unit area 200a may be set along the edge of the image object 410, following the shape of the image object 410.

Once a unit area 200a has been set for each image object 410 in this way, a different label may be set for each unit area 200a.

Setting a label for each unit area 200a may be understood as having the same or a similar meaning as setting a label for each of the plurality of objects 200. This is because each unit area 200a is set with respect to one of the plurality of objects 200.

As shown in FIG. 5A, a different label may be set for each unit area 200a on the basis of the plurality of objects 200. (FIGS. 5A, 5B, 6, and 8 show labels such as BOX_01, BOX_02, ID_01, and ID_02 for convenience of description only; it will be apparent to those skilled in the art that images displaying these labels need not exist in the data actually acquired.)

As shown in FIG. 5A, the controller 150 may set different labels for the unit areas 200a, each of which contains an image object 410 corresponding to one of the plurality of objects 200. Here, a label may also be understood as "identification information". That is, a label is identification information for distinguishing the unit areas from one another, and different unit areas have different labels.

The controller 150 may track, on the basis of each unit area 200a, the object 200 corresponding to the image object 410 contained in that unit area 200a. To track each object 200 it is essential to distinguish the objects 200 from one another, and the controller 150 may recognize and track each object to be tracked on the basis of its unit area 200a.

Accordingly, as shown in FIG. 5A, a first label 501 may be set or specified for the unit area 201a containing the first image object 411 corresponding to the first object. Likewise, a different label may be set or specified for each of the other unit areas.

Meanwhile, once each unit area 200a has been labeled as in step S320, a process of one-to-one mapping between the identification information of the DOF sensor provided on each of the plurality of objects and the label set for each of the plurality of objects may be performed (S330).

That is, in the present invention, the identification information of the DOF sensor provided on each of the plurality of objects may be mapped one-to-one to the label set for the corresponding unit area.

This one-to-one mapping is made on the basis of the plurality of objects. That is, which DOF sensor's (or sensor's) identification information is mapped to which unit area may be determined with respect to each of the plurality of objects.

For example, as shown in FIGS. 2, 4, and 5B, when the first object 201 is provided with the first DOF sensor 120a, the identification information 502 (for example, "ID_01") of the first DOF sensor 120a may be mapped to the first unit area 201a containing the first image object 411 corresponding to the first object 201. Accordingly, the identification information 502 (for example, "ID_01") of the first DOF sensor 120a may be mapped to the first label 501 (for example, "BOX_01") corresponding to the first unit area 201a.

Likewise, the identification information (for example, "ID_02") of the second DOF sensor 120b may be mapped to the second label (for example, "BOX_02") corresponding to the second unit area 202a.

In the same way, for each of the other objects to be photographed, the identification information of the DOF sensor provided on that object may be mapped to the identification information of the unit area corresponding to that object.

Accordingly, as shown in FIG. 7, the controller 150 may collect information in which the label (BOX_ID) of the unit area corresponding to each of the plurality of objects 200 and the identification information (sensor ID) of the DOF sensor are mapped (or matched) to each other. This mapping information may be stored in the storage unit 130. As a result, even if the position or posture of an object changes, the object can be tracked on the basis of its unit area and its degree-of-freedom information can be collected.

Meanwhile, the mapping between the labels of the unit areas and the identification information of the DOF sensors may be performed under the control of the controller 150. The controller 150 may map the identification information of the DOF sensors sequentially, in the order in which the unit areas were designated. For example, the controller 150 may map the identification information (ID_01) of the first DOF sensor, that is, the sensor having the first identification information among the plurality of DOF sensors, to the label (for example, BOX_01) of the first designated unit area. The controller 150 may then map the identification information (ID_02) of the second DOF sensor, that is, the sensor having the second identification information, to the label (for example, BOX_02) of the second designated unit area. The order of the identification information of the DOF sensors may be ascending or descending.
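The sequential mapping described above can be sketched as follows. This is a minimal illustration that assumes labels and sensor identifiers are plain strings such as `BOX_01` and `ID_01`; the function name `map_labels_to_sensors` and its `descending` flag are hypothetical and are not taken from the specification.

```python
def map_labels_to_sensors(box_labels, sensor_ids, descending=False):
    """Pair unit-area labels (in designation order) with sorted sensor IDs.

    box_labels: labels in the order the unit areas were designated.
    sensor_ids: identification information of the DOF sensors, any order.
    Returns a dict {box_label: sensor_id} that is one-to-one.
    """
    ordered_ids = sorted(sensor_ids, reverse=descending)
    if len(box_labels) != len(ordered_ids):
        raise ValueError("need exactly one sensor per unit area")
    if len(set(box_labels)) != len(box_labels):
        raise ValueError("unit-area labels must be distinct")
    if len(set(ordered_ids)) != len(ordered_ids):
        raise ValueError("sensor IDs must be distinct")
    return dict(zip(box_labels, ordered_ids))


mapping = map_labels_to_sensors(
    ["BOX_01", "BOX_02", "BOX_03"],
    ["ID_02", "ID_01", "ID_03"],
)
```

The distinctness checks are there because the mapping must be one-to-one; a duplicated label or sensor identifier would silently collapse two objects into one.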

Meanwhile, as shown in FIGS. 6(a) and 6(b), the controller 150 may perform segmentation on the image object contained in each unit area 200a and extract a segmentation mask (or segmented mask) 300 for each such image object.

In the present invention, segmentation is the operation of recognizing and extracting image objects from an image on an object-by-object basis. The controller 150 may perform instance segmentation, in which the different image objects corresponding to different objects are recognized as separate kinds or classes.

For example, as shown in FIG. 6, when segmentation is performed on the initial image 400, the controller 150 may recognize each image object as a separate meaningful entity and generate a segmentation mask 300 for each image object.

That is, the controller 150 may recognize and track each object on the basis of the image object contained in its unit area, and may generate and extract a segmentation mask for that image object, thereby obtaining shape information for each object.

Accordingly, the controller 150 may extract, from the initial image 400, a first segmentation mask 301 for the first image object 411, which corresponds to the first object 201 and is contained in the first unit area 201a. The controller 150 may likewise extract a second segmentation mask 302 for the second image object 412, which corresponds to the second object 202 and is contained in the second unit area 202a. In this manner, the controller 150 may extract a corresponding segmentation mask for every object.
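The per-unit-area mask extraction can be pictured with a small sketch. It assumes that an instance segmentation step (not specified here) has already produced an integer instance map in which 0 is background, and it picks the instance that dominates a given unit area. The function name `mask_for_unit_area`, the nested-list image representation, and the half-open box convention are all assumptions made for illustration.

```python
from collections import Counter


def mask_for_unit_area(instance_map, box):
    """Return a binary mask for the instance that dominates the unit area.

    instance_map: 2D list of ints, 0 = background, k > 0 = instance k.
    box: (x_min, y_min, x_max, y_max), half-open, inside the image bounds.
    Returns a 2D list of 0/1 covering the whole image, or None if the
    unit area contains only background.
    """
    x0, y0, x1, y1 = box
    counts = Counter(
        instance_map[y][x]
        for y in range(y0, y1)
        for x in range(x0, x1)
        if instance_map[y][x] != 0
    )
    if not counts:
        return None
    target = counts.most_common(1)[0][0]
    # The mask may extend beyond the box, since an image object need not
    # be completely contained in its unit area.
    return [[1 if value == target else 0 for value in row] for row in instance_map]


instance_map = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]
first_mask = mask_for_unit_area(instance_map, (1, 0, 3, 2))
second_mask = mask_for_unit_area(instance_map, (2, 1, 4, 3))
```

Choosing the dominant instance is only one plausible rule; the specification does not state how a mask is associated with its unit area beyond the image object being contained in it.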

Meanwhile, each extracted segmentation mask may have a different label (or mask label). The label of a segmentation mask may be determined based on the label set for the corresponding unit area. The controller 150 may set the label of the unit area and the label of the segmentation mask to be identical or to correspond to each other.

For example, as shown in FIG. 7, when the label of the first unit area is BOX_01, the label of the first segmentation mask corresponding to the image object contained in the first unit area may also be BOX_01.

As shown in FIG. 7, the controller 150 may map the unit area, the DOF sensor, and the segmentation mask to one another for each object. Accordingly, as shown in FIG. 7, the following may be mapped to one another and the resulting mapping information stored in the storage unit 130: i) the identification information (ID_01) of the first DOF sensor 120a provided on the first object 201; ii) the label (BOX_01) of the first unit area 201a containing the first image object 411 corresponding to the first object 201; iii) the first segmentation mask 301 of the first image object 411; and iv) the label of the first segmentation mask 301 (which may be identical or similar to the label of the unit area).

The mapping described above may be made in the same way for every object. Accordingly, even if the position or posture of an object changes in the subsequent image, the controller 150 can track the object and collect the corresponding segmentation mask and degree-of-freedom information of the object as learning data.

That is, segmentation masks may continue to be extracted from the subsequent image. The controller 150 may track each object in the subsequent image on the basis of its unit area and collect the image object contained in each unit area as the segmentation mask of the tracked object. Meanwhile, the controller 150 extracts the objects on the basis of the unit areas in real time or at preset intervals, and holds information on the position and segmentation mask of each object from before the current tracking time point. Accordingly, even if a specific object is occluded by another object so that no image object for it exists in the subsequent image at some point in time, the controller 150 can resume tracking the specific object once the occlusion is removed and its image object reappears in the subsequent image.
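One way to picture how retained state lets tracking resume after an occlusion is the bookkeeping sketch below. It assumes that some tracker (not shown, and not specified in the text) reports, for each frame, the unit areas it could find, keyed by label; the dictionary layout and the function name `update_tracks` are hypothetical.

```python
def update_tracks(tracks, frame_index, detections):
    """Update per-label track state in place and return it.

    tracks: {label: {"box": tuple, "last_frame": int, "visible": bool}}
    detections: {label: box} for the objects found in this frame.
    A label missing from detections is treated as occluded: its last
    known box and frame are kept so tracking can resume later.
    """
    for label, state in tracks.items():
        if label in detections:
            state["box"] = detections[label]
            state["last_frame"] = frame_index
            state["visible"] = True
        else:
            state["visible"] = False
    return tracks


tracks = {"BOX_01": {"box": (0, 0, 10, 10), "last_frame": 0, "visible": True}}
update_tracks(tracks, 1, {})                          # object occluded
occluded_state = dict(tracks["BOX_01"])
update_tracks(tracks, 2, {"BOX_01": (5, 5, 15, 15)})  # object reappears
```

The sketch only covers the record keeping; re-associating a reappearing image object with the correct label is left to the tracker, which the specification does not restrict to any particular technique.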

Meanwhile, as shown in FIG. 7, the controller 150 may collect degree-of-freedom information from the DOF sensor provided on each object. This degree-of-freedom information may include at least one of three-dimensional position information and three-dimensional posture information of the object.

Furthermore, because the collected degree-of-freedom information is stored in association with the unit-area label mapped to the identification information of the respective DOF sensor, the controller 150 can obtain the degree-of-freedom information of an object together with its segmentation mask. For example, the degree-of-freedom information (or sensing information) collected from the first DOF sensor may be mapped to the label (BOX_01) of the first unit area, which is mapped to the identification information (ID_01) of the first DOF sensor, and to the first segmentation mask 301.

Meanwhile, as shown in FIG. 7, when the sensing information (or degree-of-freedom information) from a DOF sensor is received through a communication port (PORT) mapped to the identification information (sensor ID) of that sensor, the identification information of the communication port (PORT ID) may also be mapped to the identification information of the DOF sensor. Accordingly, the controller 150 may store the degree-of-freedom information received through each communication port in association with the identification information of the DOF sensor mapped to that port.
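The port-based association can be sketched as two table lookups. The port names (`PORT_01`, `PORT_02`), the table names, and the function `route_reading` are assumptions made for illustration; only the sensor identifiers and unit-area labels follow the examples given in the text.

```python
# Hypothetical lookup tables built during setup on the initial image.
PORT_TO_SENSOR = {"PORT_01": "ID_01", "PORT_02": "ID_02"}
SENSOR_TO_BOX = {"ID_01": "BOX_01", "ID_02": "BOX_02"}


def route_reading(port_id, dof_reading):
    """Attach sensor ID and unit-area label to a reading from a port.

    dof_reading is passed through unchanged; here it is assumed to be a
    tuple (x, y, z, roll, pitch, yaw), which is an illustrative choice.
    Raises KeyError if the port was never mapped to a sensor.
    """
    sensor_id = PORT_TO_SENSOR[port_id]
    return {
        "port_id": port_id,
        "sensor_id": sensor_id,
        "box_label": SENSOR_TO_BOX[sensor_id],
        "dof": dof_reading,
    }


record = route_reading("PORT_02", (0.1, 0.2, 0.3, 0.0, 0.0, 1.57))
```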

As shown in FIG. 7, the setup for collecting learning data may be complete once the initial image has been used to establish a one-to-one mapping between at least two of the following: the label of the unit area, the identification information of the DOF sensor, the communication port, the initial position and posture information of the object, the segmentation mask, and the label of the segmentation mask.
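The per-object setup information listed above could be held in a small record type, together with a check that the mapping really is one-to-one. This is a sketch under assumptions: the class `ObjectSetup`, its field names, the port identifiers, and the helper `is_one_to_one` are hypothetical and only mirror the items named in the text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class ObjectSetup:
    """Setup information for one object, fixed on the initial image."""
    box_label: str                      # label of the unit area, e.g. "BOX_01"
    sensor_id: str                      # DOF sensor identifier, e.g. "ID_01"
    port_id: str                        # communication port (hypothetical name)
    mask_label: str                     # label of the segmentation mask
    initial_position: Optional[Vec3] = None
    initial_posture: Optional[Vec3] = None


def is_one_to_one(setups: Sequence[ObjectSetup]) -> bool:
    """True if no label, sensor ID, port, or mask label is shared."""
    for name in ("box_label", "sensor_id", "port_id", "mask_label"):
        values = [getattr(setup, name) for setup in setups]
        if len(set(values)) != len(values):
            return False
    return True


setups = [
    ObjectSetup("BOX_01", "ID_01", "PORT_01", "BOX_01"),
    ObjectSetup("BOX_02", "ID_02", "PORT_02", "BOX_02"),
]
setup_complete = is_one_to_one(setups)
```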

Once this setup is complete, the controller 150 may track each object in the subsequent image, extract a segmentation mask for each tracked object, and collect degree-of-freedom information for each tracked object. The segmentation mask extracted from the subsequent image and the degree-of-freedom information of the object may then be mapped to each other. Accordingly, even when the position and posture of an object change, the controller 150 can extract a segmentation mask for the object in its changed posture and collect the degree-of-freedom information of the object at that moment. The controller 150 can therefore collect learning data for objects in a variety of postures.

More specifically, in the present invention, segmentation may be performed on the plurality of objects in the subsequent image (S340), and the degree-of-freedom information (or sensing information) collected (or received) from the DOF sensors (or sensors) may be stored as learning data together with the segmentation masks extracted by the segmentation (S350).

The controller 150 may track the plurality of objects 410 in the subsequent image received through the camera 110, as shown in FIG. 8(a), and extract segmentation masks 300a each corresponding to the label set for one of the plurality of objects 410 (for example, the label of the unit area or the label of the segmentation mask), as shown in FIG. 8(b).

The label of a segmentation mask extracted from the subsequent image is the same as the label of the segmentation mask set in the initial image. In this case, the label of the segmentation mask may be identical or similar to the label of the unit area.

As shown in FIG. 9, the controller 150 may collect, on the basis of the label set for each of the plurality of objects, learning data that includes the segmentation mask 300a corresponding to each object and the degree-of-freedom information sensed by the DOF sensor (see the "degree of freedom" item in FIG. 9).

The controller 150 may store each segmentation mask matched with the degree-of-freedom information sensed by the DOF sensor at the time point corresponding to the time at which the frame from which that mask was extracted was captured.

When the subsequent image is a video, the sensing frequency (Hz) of the DOF sensor may correspond to the frame rate (FPS, frames per second) of the video. That is, for a video at 30 FPS, the sensing frequency of the DOF sensor may also be set to 30 Hz, in which case the time at which each frame is captured may coincide with the time at which the DOF sensor senses the degree-of-freedom information of the object.
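When frame and sensor time points do not coincide exactly, one plausible way to find the "corresponding" reading is to take the sensor sample nearest in time to the frame. The sketch below assumes integer timestamps in milliseconds on a shared clock and a sorted, non-empty list of sensor timestamps; the function name `nearest_reading_index` is hypothetical, and nearest-neighbour matching is an illustrative choice rather than something the text prescribes.

```python
import bisect


def nearest_reading_index(sensor_times, frame_time):
    """Index of the sensor timestamp closest to frame_time.

    sensor_times: sorted, non-empty list of timestamps (milliseconds).
    frame_time: capture timestamp of the frame on the same clock.
    """
    position = bisect.bisect_left(sensor_times, frame_time)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = [i for i in (position - 1, position) if 0 <= i < len(sensor_times)]
    return min(candidates, key=lambda i: abs(sensor_times[i] - frame_time))


sensor_times = [0, 33, 67, 100]   # roughly 30 Hz, in milliseconds
index_for_frame = nearest_reading_index(sensor_times, 35)
```

With matched rates (for example 30 FPS and 30 Hz) and a shared start time, the nearest reading is simply the one with the same index as the frame.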

Meanwhile, in the present invention, the subsequent image may be a video including a preset number of frames per second, or may include a plurality of images captured at preset time intervals. Each of these images may be referred to as a frame.

The controller 150 may generate learning data by extracting segmentation masks for all frames included in the subsequent image, or may generate learning data by extracting the segmentation masks respectively corresponding to the plurality of objects at every preset interval (or time interval) among the plurality of frames constituting the subsequent image.
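The two sampling options above (every frame, or frames at a preset interval) can be sketched as follows. The function names and the handling of the time-interval variant are assumptions for illustration only.

```python
def select_frame_indices(num_frames, interval=1):
    """Select every `interval`-th frame; interval=1 selects all frames."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return list(range(0, num_frames, interval))


def select_by_time_interval(frame_times, min_gap):
    """Select frames whose capture times are at least min_gap seconds apart.

    frame_times must be sorted in ascending order. The first frame is
    always selected.
    """
    selected = []
    last_time = None
    for index, t in enumerate(frame_times):
        if last_time is None or t - last_time >= min_gap:
            selected.append(index)
            last_time = t
    return selected
```

Segmentation masks would then be extracted only for the selected frame indices.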

Accordingly, the controller 150 may i) extract, at every preset interval or at every frame, a first segmentation mask 301a corresponding to a first object 411 among the plurality of objects 410 and a second segmentation mask 302a corresponding to a second object 412, ii) match the first degree-of-freedom information collected from a first degree-of-freedom sensor (ID_01) provided on the first object 411 with the first segmentation mask 301a and store the result as learning data for the first object 411, and iii) match the second degree-of-freedom information collected from a second degree-of-freedom sensor (ID_02) provided on the second object 412 with the second segmentation mask 302a and store the result as learning data for the second object 412.
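Steps i) to iii) can be sketched as a lookup through the one-to-one mapping between labels and sensor identifiers. The sensor identifiers ID_01 and ID_02 come from the description; the label strings, the dictionary layouts, and the `TrainingRecord` type are hypothetical, and degree-of-freedom information is represented here as a plain tuple by assumption.

```python
from dataclasses import dataclass


@dataclass
class TrainingRecord:
    label: str        # label set for the object in the initial image
    sensor_id: str    # identifier of the sensor mapped one-to-one to the label
    frame_index: int  # frame from which the mask was extracted
    mask: object      # segmentation mask (any representation)
    dof: tuple        # degree-of-freedom information sensed for that frame


def collect_records(label_to_sensor, frame_masks, sensor_readings):
    """Build per-object learning data.

    label_to_sensor: {label: sensor_id}
    frame_masks:     {frame_index: {label: mask}}
    sensor_readings: {sensor_id: {frame_index: dof}}
    """
    records = {label: [] for label in label_to_sensor}
    for frame_index in sorted(frame_masks):
        for label, mask in frame_masks[frame_index].items():
            sensor_id = label_to_sensor.get(label)
            if sensor_id is None:
                continue  # mask whose label has no mapped sensor
            dof = sensor_readings.get(sensor_id, {}).get(frame_index)
            if dof is None:
                continue  # no sensor reading for this frame
            records[label].append(
                TrainingRecord(label, sensor_id, frame_index, mask, dof)
            )
    return records
```

For example, with `label_to_sensor = {"object_1": "ID_01", "object_2": "ID_02"}`, each mask labelled `object_1` is stored together with the reading of sensor ID_01 for the same frame.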

This is possible because a unit area was set for each of the plurality of objects in the initial image, so that the plurality of objects can be tracked simultaneously and their segmentation masks extracted.

In this way, as shown in FIG. 9, the controller 150 extracts a segmentation mask for each object from the subsequent image and stores it matched with the degree-of-freedom information of the object at the time the object had that segmentation mask, thereby efficiently collecting a vast amount of learning data for the plurality of objects.

Furthermore, in the present invention, by applying an external force to at least one of the plurality of objects (for example, an external force for intentionally changing at least one of the posture and position of the plurality of objects, such as a motion that shuffles the objects), segmentation masks having different shapes for each of the plurality of objects, together with the corresponding degree-of-freedom information, can be intentionally collected as learning data.

Meanwhile, the controller 150 according to the present invention may not only store the segmentation mask for each of the plurality of objects matched with its degree-of-freedom information, but may also store the segmentation masks and degree-of-freedom information of other objects disposed around each object (for example, adjacent or overlapping objects), so that information on the mutual positional relationships among the plurality of objects is also secured as learning data.

For example, as shown in (a) of FIG. 8, the controller 150 may include, as learning data for the second object 412, the segmentation masks 301a and 303a and the degree-of-freedom information of the first and third objects 411 and 413 disposed around the second object 412, together with the segmentation mask 302a and the degree-of-freedom information of the second object 412 itself.

Furthermore, when a specific object is occluded by another object, the controller 150 may also store information about the occlusion as learning data.

Through this, the present invention can secure learning data for an object that also takes the surrounding situation into account.
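A record that carries the surrounding situation of an object could look like the following minimal sketch. The record layout, the function name `build_context_record`, and the label strings are hypothetical. How neighboring or occluding objects are determined is not specified in the description, so both are passed in as inputs here.

```python
def build_context_record(target_label, masks, dofs, neighbors, occluded_by=None):
    """Bundle an object's own data with that of its surrounding objects.

    masks, dofs: {label: mask} and {label: dof} for one frame
    neighbors:   labels of objects placed around the target (given as input)
    occluded_by: labels of objects covering the target, if any (given as input)
    """
    return {
        "target": {
            "label": target_label,
            "mask": masks[target_label],
            "dof": dofs[target_label],
        },
        "neighbors": [
            {"label": n, "mask": masks[n], "dof": dofs[n]}
            for n in neighbors
            if n in masks and n in dofs and n != target_label
        ],
        "occluded_by": list(occluded_by or []),
    }
```

In the example of FIG. 8, the target would be the second object 412 and the neighbors would be the first and third objects 411 and 413.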

Meanwhile, the present invention described above may be implemented as a program that is executed by one or more processes on a computer and that can be stored in a computer-readable medium (or recording medium).

Furthermore, the present invention described above can be implemented as computer-readable code or instructions on a medium on which a program is recorded. That is, the present invention may be provided in the form of a program.

Meanwhile, the computer-readable medium includes all types of recording devices in which data readable by a computer system is stored. Examples of the computer-readable medium include a hard disk drive (HDD), solid state disk (SSD), silicon disk drive (SDD), ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage device.

Furthermore, the computer-readable medium may be a server or cloud storage that includes storage and that an electronic device can access through communication. In this case, the computer may download the program according to the present invention from the server or cloud storage through wired or wireless communication.

Furthermore, in the present invention, the computer described above is an electronic device equipped with a processor, that is, a central processing unit (CPU), and its type is not particularly limited.

Meanwhile, the above detailed description should not be construed as limiting in any respect and should be considered illustrative. The scope of the present invention should be determined by reasonable interpretation of the appended claims, and all changes within the equivalent scope of the present invention are included in the scope of the present invention.

Claims

A learning data collection method performed by a computing system, the method comprising:
receiving, through a camera, an initial image of a plurality of objects each equipped with a degree-of-freedom sensor;
setting, by a controller, a different label for each of the plurality of objects based on an image object corresponding to each of the plurality of objects included in the initial image;
mapping, by the controller, identification information of the degree-of-freedom sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects;
extracting, by the controller, segmentation masks respectively corresponding to the labels set for the plurality of objects by tracking the plurality of objects in a subsequent image received through the camera; and
collecting, by the controller, learning data including the segmentation masks and degree-of-freedom information sensed by the degree-of-freedom sensors, based on the label set for each of the plurality of objects.

The method of claim 1,
wherein the subsequent image is a video including a preset number of frames per second,
wherein, in the extracting of the segmentation masks,
the controller extracts the segmentation mask corresponding to each of the plurality of objects at preset intervals among the successive frames included in the subsequent image, and
wherein, in the collecting of the learning data,
the controller stores the degree-of-freedom information sensed by the degree-of-freedom sensor at a time point corresponding to the time point at which the frame from which the segmentation mask is extracted was captured, matched with that segmentation mask.

The method of claim 2,
wherein the plurality of objects include a first object and a second object,
wherein, in the extracting of the segmentation masks,
the controller extracts, for each frame at the preset interval, a first segmentation mask corresponding to the first object and a second segmentation mask corresponding to the second object, and
wherein, in the collecting of the learning data,
the controller matches first degree-of-freedom information collected from a first degree-of-freedom sensor provided on the first object with the first segmentation mask and stores the result in a storage unit as learning data for the first object, and
the controller matches second degree-of-freedom information collected from a second degree-of-freedom sensor provided on the second object with the second segmentation mask and stores the result in the storage unit as learning data for the second object.

The method of claim 3,
wherein each of the first and second degree-of-freedom information
includes at least one of three-dimensional position information and three-dimensional posture information for the corresponding one of the first and second objects.

The method of claim 4,
wherein at least one of the position and the posture of the first object
is changed by an external force while the subsequent image is being received, and
wherein the learning data for the first object
includes three-dimensional position information and three-dimensional posture information corresponding to the changed at least one of the position and the posture.

The method of claim 2,
wherein the sensing frequency of the degree-of-freedom sensor corresponds to the frames per second (FPS) of the video.

The method of claim 1,
wherein, in the setting of the label,
the controller designates, in the initial image, a unit area that specifies an initial position of the image object corresponding to each of the plurality of objects,
so that the plurality of objects can be tracked in the subsequent image based on a deep learning algorithm.

The method of claim 7,
wherein a unit area
is specified, in the initial image, for each image object corresponding to each of the plurality of objects, and
each unit area includes the one image object, among the image objects corresponding to the plurality of objects, that corresponds to that unit area.

The method of claim 8,
wherein the different labels set for the plurality of objects are labels set for the respective unit areas with reference to the plurality of objects.

The method of claim 9,
wherein, in the setting of the label,
the controller performs segmentation on the image object included in each unit area to extract a segmentation mask for each image object included in each unit area, and
the label of the segmentation mask extracted for each image object is specified based on the label set for the corresponding unit area.

The method of claim 10,
wherein the plurality of objects include a first object and a second object,
wherein identification information of a first degree-of-freedom sensor provided on the first object is mapped to a first label set for a first unit area that, among the unit areas, includes a first image object corresponding to the first object, and
wherein identification information of a second degree-of-freedom sensor provided on the second object is mapped to a second label set for a second unit area that, among the unit areas, includes a second image object corresponding to the second object.

The method of claim 11,
wherein, among the segmentation masks extracted from the initial image, a first mask label of a first segmentation mask corresponding to the first image object corresponds to the first label, and
a second mask label of a second segmentation mask corresponding to the second image object corresponds to the second label.

The method of claim 12,
wherein a mask label of a segmentation mask corresponding to the first object extracted from the subsequent image corresponds to the first mask label, and
a mask label of a segmentation mask corresponding to the second object extracted from the subsequent image corresponds to the second mask label.

A learning data collection system comprising:
a sensing unit including sensors that are respectively provided on a plurality of objects and configured to sense degree-of-freedom information for each of the plurality of objects;
a camera unit configured to photograph the plurality of objects; and
a controller configured to set a different label for each of the plurality of objects based on an image object corresponding to each of the plurality of objects included in an initial image received from the camera unit,
and to map identification information of the sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects,
wherein the controller
tracks the plurality of objects in a subsequent image received through the camera unit and extracts segmentation masks respectively corresponding to the labels set for the plurality of objects, and
collects, as learning data, the segmentation masks and the degree-of-freedom information collected from the sensors, based on the label set for each of the plurality of objects.

A program that is executed by one or more processes in an electronic device and that can be stored in a computer-readable recording medium,
the program comprising instructions for performing:
receiving, through a camera, an initial image of a plurality of objects each equipped with a degree-of-freedom sensor;
setting a different label for each of the plurality of objects based on an image object corresponding to each of the plurality of objects included in the initial image;
mapping identification information of the degree-of-freedom sensor provided on each of the plurality of objects one-to-one to the label set for each of the plurality of objects;
tracking the plurality of objects in a subsequent image received through the camera and extracting segmentation masks respectively corresponding to the labels set for the plurality of objects; and
storing, as learning data, the segmentation masks and the degree-of-freedom information collected from the degree-of-freedom sensors, based on the label set for each of the plurality of objects.