KR20230122551A

KR20230122551A - Motion data annotation method and system for ai learning

Info

Publication number: KR20230122551A
Application number: KR1020230018002A
Authority: KR
Inventors: 박진하; 김민경; 정하형; 김동현
Original assignee: 주식회사 아이팝; 전주대학교 산학협력단
Priority date: 2022-02-14
Filing date: 2023-02-10
Publication date: 2023-08-22

Abstract

A motion data annotation method for artificial intelligence (AI) learning according to an embodiment of the present invention, which can quickly and accurately acquire motion data for AI learning using 2D images and also 3D images, comprises the steps of: matching a coordinate system of a first sensor system and a coordinate system of a second sensor system; taking a three-dimensional image including joint position information of a skeleton using the first sensor system; taking a two-dimensional image using the second sensor system; projecting the joint position information of the skeleton to a virtual camera applying internal and external variables of the second sensor system; and by using the joint position information of the skeleton projected to the virtual camera and the two-dimensional image taken by the second sensor system, automatically annotating joints in the two-dimensional image taken by the second sensor system.

Description

Motion data annotation method and system for AI learning {MOTION DATA ANNOTATION METHOD AND SYSTEM FOR AI LEARNING}

An embodiment of the present invention relates to a motion data annotation method for artificial intelligence (AI) learning and, more particularly, to a motion data annotation method for AI learning that works by matching heterogeneous sensor systems.

Motion capture is the recording of an object's movement in digital form. As the game industry, the animation industry, and virtual reality (VR) technology develop, demand for high-quality motion capture is growing.

Converting an object in a 2D image into a 3D model requires a data set from which the pose and shape of the object can be inferred. Such a data set includes joint information, pose information, and shape information of the object, and can be used for AI learning.

In general, an annotation tool may be used to estimate at least one of the joints, pose, and shape of an object from a 2D image. With such a tool, the 2D positions of an object's joints in a 2D image may be annotated manually.

FIGS. 1 and 2 are examples of a general annotation tool.

Referring to FIGS. 1 and 2, a user may manually annotate the joints of an object. As shown in FIG. 1, for a model wearing clothes that reveal the shape of the body, annotation is easy and its accuracy is high. By contrast, as shown in FIG. 2, for a model wearing various equipment or clothes that hide body parts, such as a skirt, errors in pose estimation may occur.

Thus, when annotation is performed manually with an annotation tool, it may be difficult to obtain accurate joint positions. When AI learning is performed on manually annotated results, a skeleton with a large error may be generated.

An embodiment of the present invention aims to provide a method and apparatus for annotating motion data for artificial intelligence (AI) learning.

An embodiment of the present invention aims to provide a method and apparatus for annotating the motion data for AI learning that is needed to predict the pose of an object in a 2D image.

The problems to be solved by the embodiments of the present invention are not limited to the above, and also include objects and effects that can be understood from the solutions and embodiments described below.

A motion data annotation method for artificial intelligence (AI) learning according to an embodiment of the present invention includes: matching a coordinate system of a first sensor system and a coordinate system of a second sensor system; capturing, using the first sensor system, a 3D image including joint position information of a skeleton; capturing a 2D image through the second sensor system; projecting the joint position information of the skeleton onto a virtual camera to which internal variables and external variables of the second sensor system are applied; and automatically annotating joints in the 2D image captured through the second sensor system, using the joint position information of the skeleton projected onto the virtual camera and the 2D image captured through the second sensor system.

In the projecting step, the joint position information of the skeleton may be projected onto the virtual camera as a 2D image.

In the annotating step, the joint position information of the skeleton projected onto the virtual camera as a 2D image may be overlaid on the 2D image captured through the second sensor system.

In the annotating step, the joint position information of the skeleton projected onto the virtual camera may be converted into 2D image coordinates, and the 2D image coordinates may be automatically matched to the 2D image captured through the second sensor system.
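The projecting and annotating steps above can be illustrated with a minimal sketch, assuming an ideal pinhole camera without lens distortion. The function names `project_joints` and `annotate`, the joint names, and all numeric values are hypothetical and chosen only for illustration; `K` stands for the internal variables and `R`, `t` for the external variables of the second sensor system.

```python
import numpy as np

def project_joints(joints_world, K, R, t):
    """Project Nx3 world-frame joint positions to Nx2 pixel coordinates."""
    joints_world = np.asarray(joints_world, dtype=float)
    cam = joints_world @ R.T + t            # world frame -> camera frame
    if np.any(cam[:, 2] <= 0):
        raise ValueError("a joint lies behind the virtual camera")
    uvw = cam @ K.T                         # apply the internal variables
    return uvw[:, :2] / uvw[:, 2:3]         # perspective divide

def annotate(joint_names, joints_world, K, R, t, width, height):
    """Return {joint name: (u, v, in_frame)} for one video frame."""
    pixels = project_joints(joints_world, K, R, t)
    labels = {}
    for name, (u, v) in zip(joint_names, pixels):
        in_frame = bool(0 <= u < width and 0 <= v < height)
        labels[name] = (float(u), float(v), in_frame)
    return labels

# Hypothetical virtual camera: 1280x720 image, camera 3 m in front of the origin.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)
t = np.array([0.0, 0.0, 3.0])

names = ["head", "left_hand"]
joints = [[0.0, -0.5, 0.0], [0.6, 0.0, 0.0]]   # metres, mocap world frame
labels = annotate(names, joints, K, R, t, 1280, 720)
```

Because the labels come from the motion capture skeleton rather than from the pixels, a joint hidden by clothing in the 2D image still receives a coordinate; the `in_frame` flag only records whether that coordinate falls inside the image bounds.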

The method may further include estimating the internal variables of the second sensor system before the matching step.

The first sensor system may include: an optical motion capture module that emits infrared light and receives the infrared light reflected from an object to estimate the position of the object; a sensor-type motion capture module that senses the movement of the object; and an integration module that integrates the information estimated by the optical motion capture module with the information sensed by the sensor-type motion capture module.

In the matching step, the origin of a calibration plate of the optical motion capture module and the origin of a marker of the second sensor system may be made to coincide.

The external variables of the second sensor system may include position and pose information between the marker and the second sensor system.

The capturing of the 3D image and the capturing of the 2D image may be performed simultaneously.

A motion data annotation system for artificial intelligence (AI) learning according to an embodiment of the present invention includes: a first sensor system that captures a 3D image including joint position information of a skeleton; a second sensor system that captures a 2D image; and an annotation device that matches a coordinate system of the first sensor system and a coordinate system of the second sensor system, projects the joint position information of the skeleton onto a virtual camera to which internal variables and external variables of the second sensor system are applied, and automatically annotates joints in the 2D image captured through the second sensor system, using the joint position information of the skeleton projected onto the virtual camera and the 2D image captured through the second sensor system.

According to an embodiment of the present invention, motion data for artificial intelligence (AI) learning can be acquired quickly and accurately by using a 3D image in addition to a 2D image.

According to an embodiment of the present invention, body parts hidden by accessories, clothes, or the like in a 2D image can be annotated effectively.

The various advantages and effects of the present invention are not limited to those described above, and will be more readily understood from the description of specific embodiments of the present invention.

FIGS. 1 and 2 are examples of a general annotation tool.
FIG. 3 is a schematic diagram of a system for annotating motion data for AI learning according to an embodiment of the present invention.
FIG. 4 is a flowchart of a method of annotating motion data for AI learning according to an embodiment of the present invention.
FIG. 5 is a block diagram of a first sensor system according to an embodiment of the present invention.
FIG. 6 is a flowchart of a motion capture method of the first sensor system according to an embodiment of the present invention.
FIG. 7 is a block diagram of a system that annotates human body poses by matching heterogeneous sensors according to an embodiment of the present invention.
FIG. 8 is an example of a method of calibrating the second sensor system.
FIG. 9 is a diagram for explaining a method of matching the coordinate system of the first sensor system and the coordinate system of the second sensor system according to an embodiment of the present invention.
FIG. 10 is an example of using the origin matching jig of FIG. 9(c) to match the coordinate system of the first sensor system and the coordinate system of the second sensor system.
FIG. 11 is an image captured by the second sensor system.
FIG. 12 is a diagram for explaining the external variables between the marker and the second sensor system.
FIG. 13 is an example in which the first sensor system 100 and the second sensor system 200 photograph an object simultaneously.
FIG. 14 is an example of an image captured through the second sensor system 200.
FIGS. 15 to 17 are diagrams for explaining a process of projecting, onto a virtual camera, the joint position information of the skeleton included in the 3D image captured by the first sensor system.
FIG. 18 is skeleton data motion-captured through the virtual camera.
FIG. 19 is position and pose information of individual joints of a human skeleton.
FIG. 20 is an automatically annotated result according to an embodiment of the present invention.

The present invention can be modified in various ways and can have various embodiments; specific embodiments are illustrated in the drawings and described here. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope.

Terms including ordinal numbers, such as "second" and "first", may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a second component may be referred to as a first component, and similarly a first component may be referred to as a second component. The term "and/or" includes any combination of a plurality of related listed items, or any one of the plurality of related listed items.

When a component is said to be "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but other components may also exist in between. By contrast, when a component is said to be "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" specify that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and do not exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in this application, are not to be interpreted in an idealized or excessively formal sense.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. Identical or corresponding components are given the same reference numerals regardless of the figure, and redundant descriptions of them are omitted.

FIG. 3 is a schematic diagram of a system for annotating motion data for AI learning according to an embodiment of the present invention, FIG. 4 is a flowchart of a method of annotating motion data for AI learning according to an embodiment of the present invention, FIG. 5 is a block diagram of a first sensor system according to an embodiment of the present invention, FIG. 6 is a flowchart of a motion capture method of the first sensor system according to an embodiment of the present invention, and FIG. 7 is a block diagram of a system that annotates human body poses by matching heterogeneous sensors according to an embodiment of the present invention.

Referring to FIG. 3, a system 10 for annotating motion data for AI learning includes a first sensor system 100, a second sensor system 200, and an annotation device 300. Here, the first sensor system 100 may be a hybrid motion capture system, and the second sensor system 200 may be a general camera that films a real person. According to an embodiment of the present invention, the annotation device 300 annotates the 2D image captured by the second sensor system 200, using the data obtained from the first sensor system 100 and the data obtained from the second sensor system 200. That is, the annotation device 300 uses the motion capture data obtained from the first sensor system 100 to annotate the live-action footage, which is the 2D image captured by the second sensor system 200.

Referring to FIGS. 5 and 6, the hybrid motion capture system 100 includes an optical motion capture module 110, a sensor-type motion capture module 120, and an integration module 130.

The optical motion capture module 110 includes at least two optical cameras. The at least two optical cameras image the same point, and the 3D coordinates of the object can be computed back from their views by triangulation. Here, the object is the target of motion capture, and may be referred to as a rigid body, a target, a human body, or the like.
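Triangulation from two optical cameras can be sketched as follows. This is a minimal illustration using the linear (DLT) method with noise-free observations; the text does not state which triangulation algorithm the optical motion capture module 110 uses, and the camera parameters, the marker position, and the helper names `projection_matrix`, `project`, and `triangulate` are assumptions.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build the 3x4 camera projection matrix K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X with projection matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, uv1, uv2):
    """Linear (DLT) triangulation of one point seen by two calibrated cameras."""
    A = np.stack([
        uv1[0] * P1[2] - P1[0],
        uv1[1] * P1[2] - P1[1],
        uv2[0] * P2[2] - P2[0],
        uv2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                      # null vector of A (homogeneous point)
    return X[:3] / X[3]

# Hypothetical rig: two identical cameras, the second shifted 1 m along +x.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-1.0, 0.0, 0.0]))

marker = np.array([0.2, -0.1, 4.0])          # true marker position, metres
estimate = triangulate(P1, P2, project(P1, marker), project(P2, marker))
```

With more than two cameras, two further rows per camera would be stacked into `A`, which is one reason optical systems typically use many cameras.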

To increase the accuracy of optical motion capture, the optical motion capture module 110 may further include at least one reflective marker, for example three reflective markers, and the reflective markers may be attached to the object.

The reflective marker may be, for example, a passive marker. A passive marker may reflect light of a specific wavelength, for example infrared light.

The optical cameras emit infrared light, receive the infrared light reflected from the reflective markers attached to the object, and track the positions of the reflective markers, thereby estimating at least one of the absolute 3D position, pose, velocity, and movement of the object (S600).

The sensor-type motion capture module 120 includes a plurality of sensors attached to the object. The plurality of sensors may include, for example, an acceleration sensor and a gyro sensor. The sensor-type motion capture module 120 can thus sense at least one of the position, pose, velocity, and movement of the object (S610).

The integration module 130 captures the motion of the object by integrating the information estimated by the optical motion capture module 110 with the information sensed by the sensor-type motion capture module 120 (S620). For example, the integration module 130 may correct at least one of the position, pose, velocity, and movement of the object in absolute coordinates, as estimated by the optical motion capture module 110, using at least one of the fine-grained position, pose, velocity, and movement of the object sensed by the sensor-type motion capture module 120. The integration module 130 may integrate the two sources of information using a known position-estimation fusion algorithm. For example, the integration module 130 may fuse the error values of the information estimated by the optical motion capture module 110 and the information sensed by the sensor-type motion capture module 120 through an integrated Kalman filter, and compute integrated position, velocity, and pose estimates of the object, thereby estimating the position of the object accurately. The integration module 130 may be a computing device that integrates the information estimated by the optical motion capture module 110 with the information sensed by the sensor-type motion capture module 120. Accordingly, the integration module 130 may be referred to in this specification as a motion capture computer.
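The idea of fusing optical and inertial data with a Kalman filter can be sketched in one dimension. This is a deliberately simplified direct-state filter, not the integrated (error-state) filter the text refers to: the constant-acceleration model, the noise variances, and the function name `fuse` are all assumptions. Accelerometer samples drive the prediction, and optical position fixes correct it whenever a marker is visible.

```python
import numpy as np

def fuse(accels, optical_pos, dt, accel_var=0.5, optical_var=1e-4):
    """Fuse accelerometer samples with (possibly missing) optical positions."""
    x = np.zeros(2)                          # state: [position, velocity]
    P = np.eye(2)                            # state covariance
    F = np.array([[1.0, dt], [0.0, 1.0]])    # constant-velocity transition
    B = np.array([0.5 * dt**2, dt])          # effect of acceleration input
    Q = accel_var * np.outer(B, B)           # process noise from accelerometer
    H = np.array([[1.0, 0.0]])               # optical system measures position
    R = np.array([[optical_var]])
    track = []
    for a, z in zip(accels, optical_pos):
        # Predict with the inertial measurement.
        x = F @ x + B * a
        P = F @ P @ F.T + Q
        # Correct with the optical measurement when the marker is visible.
        if z is not None:
            y = z - H @ x                    # innovation
            S = H @ P @ H.T + R
            gain = P @ H.T @ np.linalg.inv(S)
            x = x + gain @ y
            P = (np.eye(2) - gain @ H) @ P
        track.append(x.copy())
    return np.array(track)

# Hypothetical data: 1 m/s^2 acceleration, optical fix every 10th sample.
dt, steps = 0.01, 200
times = dt * np.arange(1, steps + 1)
truth = 0.5 * 1.0 * times**2
accels = np.full(steps, 1.0)
optical = [p if k % 10 == 0 else None for k, p in enumerate(truth)]
track = fuse(accels, optical, dt)
```

Between optical fixes the estimate relies on the inertial prediction alone, which is how a hybrid system can keep tracking while a marker is occluded.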

Referring again to FIGS. 3 and 4, the annotation device 300 matches the coordinate system of the first sensor system 100 and the coordinate system of the second sensor system 200 (S400). To this end, as shown in FIG. 7, the coordinate system of the first sensor system 100, which is a hybrid motion capture system, is set in advance (S700), the internal variables of the second sensor system 200, which is a general camera that films a real person, are estimated (S702), the external variables of the second sensor system 200 are estimated (S704), and the coordinate system of the first sensor system 100 and the coordinate system of the second sensor system 200 may then be matched (S706).

More specifically, in the second sensor system 200, which is a general camera that films a real person, the degree of distortion varies with the internal variables of the second sensor system 200, for example the lens, the distance between the lens and the image sensor, and the angle between the lens and the image sensor, so the distortion must be corrected. To correct the distortion of the second sensor system 200, a method that uses a chessboard and is based on the pinhole camera model may be used. The pinhole camera model is a widely used camera model in which a ray coming from an object is projected onto a normalized image plane with a focal length of 1 and then converted into image coordinates. In general, radial distortion and tangential distortion may be taken into account. FIG. 8 is an example of a method of calibrating the second sensor system. The parameters of the camera model estimated by calibrating the second sensor system are called the camera internal variables (intrinsic parameters), and the process of obtaining their values is called camera calibration. In the pinhole camera model, camera calibration can be modeled by Equation 1.

Here, (X, Y, Z) are the 3D coordinates of a point in the world coordinate system, [R|t] is the rotation/translation matrix that transforms the world coordinate system into the camera coordinate system, and A is the intrinsic camera matrix.
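A form of Equation 1 consistent with the symbols defined above is the standard pinhole projection. Only (X, Y, Z), [R|t], and A are named in the text; the scale factor s, the image coordinates (x, y), the focal lengths f_x and f_y, and the principal point (c_x, c_y) are symbols assumed here for illustration.

```latex
s \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
  = A \, [R \mid t]
    \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix},
\qquad
A = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
```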

FIG. 9 is a diagram for explaining a method of matching the coordinate system of the first sensor system and the coordinate system of the second sensor system according to an embodiment of the present invention. The coordinate system of the first sensor system 100, which is a hybrid motion capture system, can be set using the calibration plate of the optical position tracking device shown in FIG. 9(a), and the coordinate system of the second sensor system 200, which is a general camera that films a real person, can be set using the marker shown in FIG. 9(b). Here, the marker may be referred to as an AR marker or an active marker. With the marker shown in FIG. 9(b), the position and pose of the second sensor system 200 in 3D space can be estimated.

According to an embodiment of the present invention, to match the coordinate system of the first sensor system 100 and the coordinate system of the second sensor system 200, an origin matching jig may be used that combines the calibration plate for the first sensor system 100 shown in FIG. 9(a) with the marker for the second sensor system 200 shown in FIG. 9(b). FIG. 9(c) is an example of such an origin matching jig. With the origin matching jig shown in FIG. 9(c), the origin seen by the first sensor system 100 and the origin seen by the second sensor system 200 can be made to coincide, and the coordinate system of the first sensor system 100 and the coordinate system of the second sensor system 200 can thereby be matched.

FIG. 10 is an example of using the origin matching jig of FIG. 9(c) to match the coordinate system of the first sensor system and the coordinate system of the second sensor system, FIG. 11 is an image captured by the second sensor system, and FIG. 12 is a diagram for explaining the external variables between the marker and the second sensor system.

Referring to FIGS. 10 to 12, the origin matching jig 1000 is placed together with the first sensor system 100 and the second sensor system 200. The second sensor system 200 then uses its previously estimated internal variables to estimate the external variables, that is, the position and rotation information between the second sensor system 200 and the origin of the first sensor system 100, taking the marker of the origin matching jig 1000 as the reference.

As shown in FIG. 11, the second sensor system 200 captures an image in which the origin seen by the first sensor system 100 and the origin seen by the second sensor system 200 coincide, so that the origin matching between the first sensor system 100 and the second sensor system 200 can be confirmed.

The marker visible on the origin matching jig 1000 is a rectangle, so the 2D coordinates of its four vertices can be estimated. A homography is the relationship between two planes in 2D projective geometry; here it represents the external variables between the marker and the second sensor system 200.
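How a homography follows from the four marker vertices can be sketched with the direct linear transform (DLT). This is an illustrative assumption about the estimation method, which the text does not specify; the marker size, the pixel coordinates, and the names `estimate_homography` and `apply_homography` are hypothetical.

```python
import numpy as np

def estimate_homography(src, dst):
    """DLT: find the 3x3 matrix H with dst ~ H @ src from >= 4 point pairs."""
    rows = []
    for (X, Y), (u, v) in zip(src, dst):
        rows.append([-X, -Y, -1, 0, 0, 0, u * X, u * Y, u])
        rows.append([0, 0, 0, -X, -Y, -1, v * X, v * Y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)        # null vector of the stacked equations
    return H / H[2, 2]              # fix the arbitrary scale

def apply_homography(H, pts):
    """Map Nx2 plane points through H and return Nx2 image points."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical 0.2 m square marker (vertices 1, 2, 3, 4 in the marker plane)
# and the pixel positions of the same vertices (1', 2', 3', 4') in the image.
marker_plane = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.2), (0.0, 0.2)]
pixels = [(600.0, 340.0), (765.0, 328.0), (768.0, 495.0), (604.0, 512.0)]

H = estimate_homography(marker_plane, pixels)
reprojected = apply_homography(H, marker_plane)
```

Four vertices give eight equations, exactly enough to fix the eight degrees of freedom of a homography, which is why a single rectangular marker suffices.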

도 12를 참조하면, 실세계에 존재하는 평면, 즉 원점 일치화 지그에서 확인 가능한 마커 1234가 카메라를 통해 사영되어 평면 1'2'3'4'로 됨을 알 수 있다. 두 평면은 수학식 2에 따른 카메라 사영행렬을 통해 관계의 설명이 가능하다. Referring to FIG. 12 , it can be seen that a plane existing in the real world, that is, a marker 1234 that can be confirmed in the origin matching jig is projected through a camera and becomes a plane 1'2'3'4'. The relationship between the two planes can be explained through the camera projection matrix according to Equation 2.

Equation 2 is the pinhole camera model. Because the Z value of a real-world input point on the marker plane is 0, the third column of the projection matrix is multiplied by 0 in the matrix product and drops out. Equation 2 can therefore be rearranged into the 3x3 matrix of Equation 3, which is called the homography.
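Equations 2 and 3 are not reproduced in this text. As a hedged reconstruction, the standard pinhole model and its reduction for a point on the plane Z = 0 can be written as follows. The symbols s (scale factor), K (internal variable matrix), r_1, r_2, r_3 (columns of the rotation matrix), t (translation vector), and H (homography) are notation assumed here, not taken from the original equations.

```latex
s \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & r_3 & t \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix}
= K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}
  \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}
= H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix},
\qquad
H = K \begin{bmatrix} r_1 & r_2 & t \end{bmatrix}
```

The column r_3 only ever multiplies the zero Z coordinate, so removing it leaves a 3x3 matrix without changing the projected point.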

This homography carries the position and posture of the camera relative to the marker in the real world, and in this sense can be regarded as the camera's external variables.
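To make the link between a homography and the external variables concrete, the sketch below recovers a rotation and translation from a homography when the internal variable matrix is known. It assumes the common planar-target decomposition H ~ K [r1 r2 t]; the function name `pose_from_homography` and the matrix names `H`, `K`, `R`, `t` are illustrative and do not come from the source.

```python
import numpy as np

def pose_from_homography(H, K):
    """Recover rotation R (3x3) and translation t (3,) of the marker plane
    in the camera frame, given homography H and internal variable matrix K."""
    B = np.linalg.inv(K) @ H                 # B ~ [r1 r2 t] up to an unknown scale
    scale = 1.0 / np.linalg.norm(B[:, 0])    # r1 must have unit length
    if B[2, 2] * scale < 0:                  # keep the marker in front of the camera
        scale = -scale
    r1, r2, t = scale * B[:, 0], scale * B[:, 1], scale * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    # Snap to the nearest true rotation matrix to absorb measurement noise.
    U, _, Vt = np.linalg.svd(R)
    return U @ Vt, t
```

The sign check matters because a homography is only defined up to scale, including a negative one; without it the recovered marker could end up behind the camera.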

Referring again to FIGS. 3 and 4, after the annotation device 300 matches the coordinate system of the first sensor system 100 with the coordinate system of the second sensor system 200 (S400), the first sensor system 100 captures a 3D image including joint position information of a skeleton (S410), and the second sensor system 200 captures a 2D image (S420). FIG. 13 shows an example in which the first sensor system 100 and the second sensor system 200 capture an object simultaneously, and FIG. 14 shows an example of an image captured by the second sensor system 200.

That is, as shown in step S708 of FIG. 7 and in FIG. 13, the first sensor system 100 may perform motion capture while the second sensor system 200 captures live-action video. Referring to FIGS. 13 and 14, when a person wearing the reflective markers of the optical motion capture module 110 and the sensors of the sensor-type motion capture module 120 moves within the studio, the first sensor system 100 performs motion capture and the second sensor system 200 records the live-action video.

Here, the step in which the first sensor system 100 captures the 3D image including joint position information of the skeleton (S410) and the step in which the second sensor system 200 captures the 2D image (S420) may be performed simultaneously.
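When the two systems record at the same time, each 2D frame has to be paired with the skeleton sample taken at the same moment. The source does not describe how this pairing is done. The sketch below assumes that both streams carry timestamps on a shared clock and uses nearest-timestamp matching; `pair_frames`, `mocap_times`, `video_times`, and `max_gap` are hypothetical names.

```python
import numpy as np

def pair_frames(mocap_times, video_times, max_gap):
    """For each video timestamp, return the index of the nearest mocap sample,
    or -1 if the nearest sample is farther away than max_gap seconds.

    Assumes mocap_times is sorted and has at least two entries.
    """
    mocap_times = np.asarray(mocap_times, dtype=float)
    video_times = np.asarray(video_times, dtype=float)
    # Index of the first mocap sample at or after each video timestamp.
    right = np.searchsorted(mocap_times, video_times)
    right = np.clip(right, 1, len(mocap_times) - 1)
    left = right - 1
    # Choose whichever neighbour is closer in time.
    pick_left = (video_times - mocap_times[left]) <= (mocap_times[right] - video_times)
    nearest = np.where(pick_left, left, right)
    gap = np.abs(mocap_times[nearest] - video_times)
    return np.where(gap <= max_gap, nearest, -1)
```

For example, with motion capture at 120 samples per second and video at 30 frames per second, every video frame would pair with every fourth motion capture sample.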

Next, the annotation device 300 projects the joint position information of the skeleton, contained in the 3D image captured by the first sensor system 100, onto the virtual camera 2000 (S430). Here, the virtual camera 2000 may be a virtual camera to which the internal and external variables of the second sensor system 200 are applied. That is, as shown in FIG. 7, the annotation device 300 loads the joint position information of the skeleton contained in the 3D image captured by the first sensor system 100 (S710), loads the pre-estimated internal and external variables of the second sensor system 200 (S712), sets the coordinates of the virtual camera 2000 to reflect those internal and external variables (S714), and projects the joint position information of the skeleton onto the virtual camera (S716).

FIGS. 15 to 17 illustrate the process of projecting the joint position information of the skeleton, contained in the 3D image captured by the first sensor system, onto the virtual camera, and FIG. 18 shows the motion-captured skeleton data as seen through the virtual camera. The virtual camera 2000 incorporates the distortion and position information of the second sensor system 200. Referring to FIGS. 15 to 18, the skeleton located in 3D space is projected onto the virtual camera 2000 and displayed as a 2D image: the joint position information of the skeleton, stored as 3D coordinates, is projected onto the 2D image plane according to the pinhole camera equation. The virtual camera 2000 onto which the joint position information is projected is a simulation of the second sensor system 200 that captured the actual footage. Because the virtual camera 2000 occupies the same position in 3D space as the camera that captured the footage, the joints of the skeleton are projected onto the same points as the joints of the actor in the actual footage.
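The projection step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the external variables are given as a rotation `R` and translation `t` that map world coordinates into the camera frame, that the internal variables form a 3x3 matrix `K`, and that distortion is modeled by two radial coefficients. The source states only that the virtual camera includes the distortion of the second sensor system, so the distortion model and all names here are assumptions.

```python
import numpy as np

def project_joints(joints_world, K, R, t, dist=(0.0, 0.0)):
    """Project N x 3 joint positions to N x 2 pixel coordinates.

    Returns (pixels, depth), where depth is each joint's distance along
    the camera's optical axis (positive means in front of the camera).
    """
    cam = joints_world @ R.T + t              # world frame -> camera frame
    depth = cam[:, 2]
    xy = cam[:, :2] / depth[:, None]          # perspective division
    r2 = np.sum(xy**2, axis=1, keepdims=True)
    k1, k2 = dist
    xy = xy * (1.0 + k1 * r2 + k2 * r2**2)    # radial distortion (assumed model)
    pixels = xy @ K[:2, :2].T + K[:2, 2]      # focal lengths and principal point
    return pixels, depth

# Usage: two joints 2 m in front of a camera placed at the world origin.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
joints = np.array([[0.0, 0.0, 2.0], [0.2, -0.1, 2.0]])
pixels, depth = project_joints(joints, K, np.eye(3), np.zeros(3))
```

Because the same `K`, `R`, `t`, and distortion values describe the real camera, the resulting pixel coordinates land on the corresponding joints in the recorded frame.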

Accordingly, referring again to FIGS. 3 and 4, the joints in the 2D image captured by the second sensor system 200 are automatically annotated using the joint position information of the skeleton acquired by the first sensor system 100 and projected onto the virtual camera (S440). That is, the joints in the 2D image captured by the second sensor system 200 are automatically annotated using both the joint position information projected onto the virtual camera and the 2D image itself. As shown in FIG. 7, the annotation device 300 loads the 2D image captured by the second sensor system 200 (S718), overlays the joint position information of the skeleton, projected onto the virtual camera as a 2D image, on the 2D image captured by the second sensor system 200 (S720), and registers the projected joint position information with that 2D image (S722), thereby performing human pose annotation (S724).

FIG. 19 shows the position and posture information of the individual joints of a human skeleton, and FIG. 20 shows the result of automatic annotation according to an embodiment of the present invention. As shown in FIG. 19, a human skeleton with a hierarchical structure, stored through motion capture, can hold position and posture information for each joint. As shown in FIG. 20, when the human skeleton structure is projected onto the virtual camera, the 3D joint coordinates are converted into 2D image coordinates, and each coordinate is at the same time matched to the body joint it represents in the 2D image captured by the second sensor system 200.
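Since each projected coordinate keeps the name of the joint it came from, an annotation record for one frame can be assembled directly. The sketch below assumes the projected pixel coordinates and camera-frame depths are already available; the record layout and the names `annotate_frame`, `joint`, and `in_frame` are hypothetical and not defined by the source.

```python
def annotate_frame(joint_names, pixels, depths, width, height):
    """Build one annotation record: a labeled 2D keypoint per skeleton joint.

    A joint is flagged in_frame when it lies in front of the camera and
    inside the image bounds. Joints hidden by clothing still receive
    coordinates, because they come from the 3D skeleton, not the pixels.
    """
    keypoints = []
    for name, (u, v), z in zip(joint_names, pixels, depths):
        in_frame = z > 0 and 0 <= u < width and 0 <= v < height
        keypoints.append(
            {"joint": name, "x": float(u), "y": float(v), "in_frame": bool(in_frame)}
        )
    return {"width": width, "height": height, "keypoints": keypoints}

# Usage with two illustrative joints in a 1280 x 720 frame.
record = annotate_frame(
    ["head", "left_wrist"],
    [(640.0, 100.0), (1300.0, 400.0)],
    [2.0, 2.0],
    width=1280,
    height=720,
)
```

Running this per video frame yields one labeled keypoint set per frame without manual clicking, which is the sense in which the annotation is automatic.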

Accordingly, human joint annotation for AI learning can be performed automatically.

In particular, according to an embodiment of the present invention, video rather than still photographs can be used for annotation, so human joint annotation for AI learning can be performed quickly.

In addition, according to an embodiment of the present invention, because the 3D skeleton information of the hybrid motion capture module is used, data can be obtained even for body parts hidden by accessories or clothing in the 2D image, which enables effective annotation.

In addition, according to an embodiment of the present invention, using the hybrid motion capture module markedly reduces the number of reflective markers attached to the body, which makes it possible to obtain AI training data of subjects wearing a variety of clothing.

The term '~unit' used in this embodiment means software or a hardware component such as a field-programmable gate array (FPGA) or an ASIC, and a '~unit' performs certain roles. However, '~unit' is not limited to software or hardware. A '~unit' may be configured to reside in an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a '~unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within the components and '~units' may be combined into a smaller number of components and '~units' or further separated into additional components and '~units'. Furthermore, the components and '~units' may be implemented to run on one or more CPUs in a device or a secure multimedia card.

Although the above description has focused on the embodiments, these are merely examples and do not limit the present invention. Those of ordinary skill in the art to which the present invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the embodiments. For example, each component specifically shown in the embodiments may be modified and implemented. Differences related to such modifications and applications should be construed as falling within the scope of the present invention as defined in the appended claims.

Claims

A motion data annotation method for artificial intelligence (AI) learning, the method comprising:
matching a coordinate system of a first sensor system and a coordinate system of a second sensor system;
capturing a three-dimensional image including joint position information of a skeleton using the first sensor system;
capturing a two-dimensional image using the second sensor system;
projecting the joint position information of the skeleton onto a virtual camera to which internal and external variables of the second sensor system are applied; and
automatically annotating joints in the two-dimensional image captured by the second sensor system, using the joint position information of the skeleton projected onto the virtual camera and the two-dimensional image captured by the second sensor system.

The method of claim 1,
wherein, in the projecting step,
the joint position information of the skeleton is projected onto the virtual camera as a two-dimensional image.

The method of claim 2,
wherein, in the annotating step,
the joint position information of the skeleton projected onto the virtual camera as a two-dimensional image is overlaid on the two-dimensional image captured by the second sensor system.

The method of claim 2,
wherein, in the annotating step,
the joint position information of the skeleton projected onto the virtual camera is converted into two-dimensional image coordinates, and the two-dimensional image coordinates are automatically matched with the two-dimensional image captured by the second sensor system.

The method of claim 1,
further comprising estimating the internal variables of the second sensor system prior to the matching step.

The method of claim 1,
wherein the first sensor system comprises:
an optical motion capture module that emits infrared light and receives the infrared light reflected from an object to estimate the position of the object;
a sensor-type motion capture module that senses the movement of the object; and
an integration module that integrates the information estimated by the optical motion capture module and the information sensed by the sensor-type motion capture module.

The method of claim 6,
wherein, in the matching step,
the origin of the correction plate of the optical motion capture module and the origin of the marker of the second sensor system are matched.

The method of claim 7,
wherein the external variables of the second sensor system include position and posture information between the marker and the second sensor system.

The method of claim 1,
wherein the step of capturing the three-dimensional image and the step of capturing the two-dimensional image are performed simultaneously.

A motion data annotation system for artificial intelligence (AI) learning, the system comprising:
a first sensor system that captures a three-dimensional image including joint position information of a skeleton;
a second sensor system that captures a two-dimensional image; and
an annotation device that matches a coordinate system of the first sensor system and a coordinate system of the second sensor system, projects the joint position information of the skeleton onto a virtual camera to which internal and external variables of the second sensor system are applied, and automatically annotates joints in the two-dimensional image captured by the second sensor system using the joint position information of the skeleton projected onto the virtual camera and the two-dimensional image captured by the second sensor system.