KR20240076505A

KR20240076505A - Method and System for tracking 3-dimensional posture and position for a plurality of objects

Info

Publication number: KR20240076505A
Application number: KR1020220157286A
Authority: KR
Inventors: 황태민; 김민준; 김지은; 김명진
Original assignee: 한국전자기술연구원
Priority date: 2022-11-22
Filing date: 2022-11-22
Publication date: 2024-05-30
Also published as: WO2024111776A1

Abstract

A method for tracking the three-dimensional (3D) posture and position of a plurality of objects, according to an embodiment of the present invention, uses a 3D posture and position tracking system including a plurality of cameras, a plurality of edge devices, and a first server. The method includes: a) acquiring object images and depth information by photographing the plurality of objects using each of the plurality of cameras; b) using each of the plurality of edge devices, estimating a two-dimensional (2D) skeleton of each of the plurality of objects based on the object image received from the corresponding camera among the plurality of cameras, and estimating a 3D position of each of the plurality of objects based on the depth information received from the corresponding camera; c) using each of the plurality of edge devices, transmitting estimation data including the 2D skeleton and the estimated 3D position of each of the plurality of objects to the first server; and d) using the first server, performing object grouping based on the estimated 3D positions in the estimation data received from each of the plurality of edge devices, and restoring the 3D posture and position of each of the plurality of objects based on the 2D skeletons in that estimation data and the object grouping.

Description

Method and System for tracking 3-dimensional posture and position for a plurality of objects

The present invention relates to a 3D posture and position tracking method and system for a plurality of objects. Specifically, it relates to a method and system that use a plurality of cameras and edge processing technology to estimate the 3D skeleton of an object (hereinafter also referred to as the '3D posture') in real time and at low cost, and that use depth information to restore the 3D posture and position of each of the plurality of objects.

Recently, demand has been increasing for methods and systems that track the 3D posture and position of objects in order to bring immersive technology to virtual worlds such as the metaverse.

Patent Document 1 (Korean Patent Publication No. 10-2018-0020333) discloses prior art relating to a "multi-user recognition system and method in an immersive experience space through wearable device synchronization." The multi-user recognition system disclosed in Patent Document 1 includes a wearable device that is worn on a part of the user's body and measures movement and motion, and a control system that receives the user's skeleton sensing values recognized by a Kinect camera together with the measurement values of the wearable device, and corrects either the skeleton sensing values or the measurement values of the wearable device. However, this system has the problem that the user, who is the object being tracked, must wear a separate special apparatus such as a wearable device.

Another conventional 3D posture and position tracking system estimates the 2D skeleton coordinates of an object using a single camera and estimates the 3D posture based on those 2D skeleton coordinates and distance information from an infrared sensor. However, such a system cannot estimate the 3D posture of the object under adverse conditions, such as when an obstacle blocks the camera or the object turns around.

In yet another conventional approach, a plurality of cameras are connected to a single server, the 3D posture is estimated in non-real time, and the images of the objects detected by each camera are compared so that objects with high similarity are grouped (clustered). Because the actual images captured by the cameras are transmitted to the server, this approach can infringe on privacy. In addition, as the number of cameras connected to a single server grows, the data transmission load for delivering video to the server increases sharply, which makes it difficult to estimate the 3D posture in real time. Building a separate media server to process the video is also quite costly. Furthermore, when multiple objects wear the same clothing and/or assume the same posture, it is difficult to distinguish between them.

Patent Document 1: Korean Patent Publication No. 10-2018-0020333

An object of the present invention is to provide a 3D posture and position tracking method and system for a plurality of objects that can enhance the realism of a virtual world, for example by estimating the 3D skeleton of an object such as a user in real time to control an avatar in the virtual world, and that can distinguish and restore the plurality of objects even when, for example, the objects wear the same clothing or assume the same posture.

Another object of the present invention is to provide a 3D posture and position tracking method and system for a plurality of objects that can reduce the cost of building a separate media server by restoring the 3D skeleton from 2D skeletons rather than from actual video.

Yet another object of the present invention is to provide a 3D posture and position tracking method and system for a plurality of objects that can prevent privacy infringement by processing the captured video of the objects only on the edge devices and not transmitting it to the server.

To achieve the above objects, a 3D posture and position tracking method for a plurality of objects according to one feature of an embodiment of the present invention uses a 3D posture and position tracking system including a plurality of cameras, a plurality of edge devices, and a first server. The method includes: a) acquiring object images and depth information by photographing the plurality of objects using each of the plurality of cameras; b) using each of the plurality of edge devices, estimating a 2D skeleton of each of the plurality of objects based on the object image received from the corresponding camera among the plurality of cameras, and estimating a 3D position of each of the plurality of objects based on the depth information received from the corresponding camera; c) using each of the plurality of edge devices, transmitting estimation data including the 2D skeleton and the estimated 3D position of each of the plurality of objects to the first server; and d) using the first server, performing object grouping based on the estimated 3D positions in the estimation data received from each of the plurality of edge devices, and restoring the 3D posture and position of each of the plurality of objects based on the 2D skeletons in that estimation data and the object grouping.

A 3D posture and position tracking system for a plurality of objects according to one feature of an embodiment of the present invention includes a plurality of cameras, a plurality of edge devices, and a first server. Each of the plurality of cameras acquires object images and depth information by photographing the plurality of objects. Each of the plurality of edge devices includes: a control unit that estimates a 2D skeleton of each of the plurality of objects based on the object image received from the corresponding camera among the plurality of cameras, and estimates a 3D position of each of the plurality of objects based on the depth information received from the corresponding camera; and a communication unit that transmits estimation data including the 2D skeleton and the estimated 3D position of each of the plurality of objects to the first server. The first server includes a server control unit that performs object grouping based on the estimated 3D positions in the estimation data received from each of the plurality of edge devices, and restores the 3D posture and position of each of the plurality of objects based on the 2D skeletons in that estimation data and the object grouping.

Using the 3D posture and position tracking method and system for a plurality of objects according to an embodiment of the present invention achieves the following effects.

1. The realism of a virtual world can be enhanced, for example by estimating the 3D skeleton of an object such as a user in real time to control an avatar in the virtual world, and the plurality of objects can be distinguished and restored even when, for example, the objects wear the same clothing or assume the same posture.

2. The cost of building a separate media server can be reduced by restoring the 3D skeleton from 2D skeletons rather than from actual video.

3. Privacy infringement can be prevented by processing the captured video of the objects only on the edge devices and not transmitting it to the server.

FIG. 1 is a block diagram of a 3D posture and position tracking system for a plurality of objects according to an embodiment of the present invention.
FIG. 2 is a schematic perspective view of a 3D posture and position tracking system for a plurality of objects according to an embodiment of the present invention.
FIG. 3 is a flowchart of a 3D posture and position tracking method for a plurality of objects according to an embodiment of the present invention.
FIG. 4 shows an example of a 2D skeleton estimated using each of the plurality of edge devices of the 3D posture and position tracking system according to an embodiment of the present invention.
FIGS. 5a to 5d illustrate the step in which the first server of the 3D posture and position tracking system according to an embodiment of the present invention unifies the estimated 3D positions in the estimation data received from each of the plurality of edge devices into a single camera coordinate system.
FIGS. 6a and 6b illustrate the step in which the first server of the 3D posture and position tracking system according to an embodiment of the present invention performs object grouping based on the 3D position information of each of the plurality of objects represented in the single camera coordinate system.
FIG. 7 illustrates the step in which the first server of the 3D posture and position tracking system according to an embodiment of the present invention restores the 3D posture and position of each of the plurality of objects.
FIG. 8 illustrates the step in which the first server of the 3D posture and position tracking system according to an embodiment of the present invention assigns an identification number to each of the plurality of objects in the current frame.

Hereinafter, preferred embodiments of the 3D posture and position tracking method and system for a plurality of objects according to the present invention are described in detail with reference to the accompanying drawings.

In FIGS. 5a to 5d, FIG. 5a shows the 3D position P₁₁ estimated for the first object (U1), the 3D position P₂₁ estimated for the second object (U2), and the 3D position P₃₁ estimated for the third object (U3) in the coordinate system viewed by the first camera (C1) (hereinafter also referred to as the 'first camera coordinate system'). FIG. 5b shows the 3D position P₁₂ estimated for the first object (U1), the 3D position P₂₂ estimated for the second object (U2), and the 3D position P₃₂ estimated for the third object (U3) in the coordinate system viewed by the second camera (C2) (hereinafter also referred to as the 'second camera coordinate system'). FIG. 5c shows the 3D position P₁₃ estimated for the first object (U1), the 3D position P₂₃ estimated for the second object (U2), and the 3D position P₃₃ estimated for the third object (U3) in the coordinate system viewed by the third camera (C3) (hereinafter also referred to as the 'third camera coordinate system'). FIG. 5d shows the 3D positions U_1i1 obtained by unifying the positions P₁₁, P₁₂, and P₁₃ estimated for the first object (U1) in the first to third camera coordinate systems into the first camera coordinate system, the 3D positions U_2i1 obtained by unifying the positions P₂₁, P₂₂, and P₂₃ estimated for the second object (U2) into the first camera coordinate system, and the 3D positions U_3i1 obtained by unifying the positions P₃₁, P₃₂, and P₃₃ estimated for the third object (U3) into the first camera coordinate system.

In FIG. 8, which illustrates the step of assigning an identification number to each of the plurality of objects in the current frame, the upper drawing treats the frame at time t₁ as the current frame and the frame at time t₀ as the previous frame. The lower drawing treats the frame at time t₂ as the current frame and the frame at time t₁ as the previous frame.

With reference to FIGS. 1 and 2 together with FIGS. 4 to 8, a 3D posture and position tracking system 100 for a plurality of objects (U1 to U3) according to an embodiment of the present invention (hereinafter also referred to as the '3D posture and position tracking system 100') is described as follows.

The 3D posture and position tracking system 100 includes a plurality of cameras (C1 to Cn), a plurality of edge devices (ED1 to EDn), and a first server (SV1). The 3D posture and position tracking system 100 may further include a second server (SV2).

Each of the plurality of cameras (C1 to Cn) acquires object images and depth information by photographing the plurality of objects (U1 to U3). For example, each of the plurality of cameras (C1 to Cn) may include one or more of a stereo camera and an infrared sensor, and may acquire the depth information using one or more of the stereo camera and the infrared sensor.

In this specification, the objects (U1 to U3) may include any subject capable of movement, such as a person, an animal, or a robot.

Each of the plurality of edge devices (ED1 to EDn) includes a control unit (CO) and a communication unit (CM). Each of the plurality of edge devices (ED1 to EDn) may further include a storage unit (ST).

The control unit (CO) estimates the 2D skeleton of each of the plurality of objects (U1 to U3) based on the object image received from the corresponding camera among the plurality of cameras (C1 to Cn). The 2D skeleton may include 2D joint coordinates. For example, referring to FIG. 4, the 2D joint coordinates may include coordinates for two or more of the nose (0), neck (1), right shoulder (2), right elbow (3), right wrist (4), left shoulder (5), left elbow (6), left wrist (7), mid hip (8), right hip (9), right knee (10), right ankle (11), left hip (12), left knee (13), left ankle (14), right eye (15), left eye (16), right ear (17), left ear (18), left big toe (19), left little toe (20), left heel (21), right big toe (22), right little toe (23), and right heel (24).
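The joint numbering listed for FIG. 4 can be written down as a lookup table. The following is a minimal sketch only; the names `JOINT_NAMES`, `JOINT_INDEX`, and `named_skeleton` are hypothetical and do not appear in the specification.

```python
# Joint indices as enumerated in the description of FIG. 4 (0 = nose ... 24 = right heel).
JOINT_NAMES = [
    "nose", "neck", "right_shoulder", "right_elbow", "right_wrist",
    "left_shoulder", "left_elbow", "left_wrist", "mid_hip", "right_hip",
    "right_knee", "right_ankle", "left_hip", "left_knee", "left_ankle",
    "right_eye", "left_eye", "right_ear", "left_ear", "left_big_toe",
    "left_little_toe", "left_heel", "right_big_toe", "right_little_toe",
    "right_heel",
]

# Reverse lookup from joint name to index.
JOINT_INDEX = {name: i for i, name in enumerate(JOINT_NAMES)}


def named_skeleton(points):
    """Pair each detected 2D joint with its name; None marks an undetected joint."""
    return {JOINT_NAMES[i]: p for i, p in enumerate(points) if p is not None}
```

Allowing `None` entries reflects the statement that a 2D skeleton may contain coordinates for only two or more of the listed joints.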

In addition, the control unit (CO) estimates the 3D position of each of the plurality of objects (U1 to U3) based on the depth information received from the corresponding camera among the plurality of cameras (C1 to Cn).
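The specification does not state how the depth information is converted into a 3D position. One common approach, assumed here purely for illustration, is to back-project a reference pixel (for example the mid-hip joint) through a pinhole camera model. The function name `backproject` and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
from typing import Tuple


def backproject(u: float, v: float, depth_m: float,
                fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Map pixel (u, v) with measured depth to a 3D point in the camera frame.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy) in pixels; lens distortion is ignored.
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

The result is expressed in the coordinate system of the camera that produced the depth value, which is why a later step has to unify positions from different cameras.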

The communication unit (CM) transmits the estimation data to the first server (SV1). The estimation data includes the 2D skeleton and the estimated 3D position of each of the plurality of objects (U1 to U3).
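A possible shape for the estimation data is sketched below to make concrete that only skeletons and positions, not images, leave the edge device. The field names, the `frame_time` field, and the use of JSON are assumptions; the specification does not define a wire format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class ObjectEstimate:
    local_id: int                             # per-camera object index (hypothetical field)
    skeleton_2d: List[Tuple[float, float]]    # 2D joint coordinates in pixels
    position_3d: Tuple[float, float, float]   # estimated position in this camera's frame


@dataclass
class EstimationData:
    camera_id: int                            # which camera / edge device produced the data
    frame_time: float                         # capture time (hypothetical field)
    objects: List[ObjectEstimate]


def encode(msg: EstimationData) -> str:
    """Serialize the estimation data; no image or depth map is included."""
    return json.dumps(asdict(msg))
```

A message for one object with a single joint is a few dozen bytes, which illustrates why this design lowers the transmission load compared with streaming video to the server.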

The storage unit (ST) may store the object images and depth information received from the corresponding camera (C1 to Cn).

The first server (SV1) includes a server control unit (SCO). The first server (SV1) may further include one or more of a server storage unit (SST) and a server communication unit (SCM). For example, the first server (SV1) may be a local server.

The server control unit (SCO) performs object grouping based on the estimated 3D positions in the estimation data received from each of the plurality of edge devices (ED1 to EDn). The server control unit (SCO) also restores the 3D posture and position of each of the plurality of objects (U1 to U3) based on the 2D skeletons in that estimation data and the object grouping.

The server control unit (SCO) may unify the estimated 3D positions in the estimation data received from each of the plurality of edge devices (ED1 to EDn) into a single camera coordinate system. Referring to FIGS. 5a to 5d, for the a-th object, the 3D position U_aij obtained by converting the 3D position estimated in the i-th camera coordinate system into the j-th camera coordinate system can be calculated using Equation 1 below.

[Equation 1]

$$U_{aij} = T_{ij} \, P_{ai}$$

In Equation 1, P_ai is the 3D position estimated for the a-th object in the i-th camera coordinate system, and T_ij is the 3D transformation matrix that converts the i-th camera coordinate system into the j-th camera coordinate system.
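The coordinate unification can be illustrated as follows, assuming that Equation 1 amounts to applying T_ij to P_ai in homogeneous coordinates and that T_ij is a 4x4 rigid transform obtained from camera calibration. The text only calls T_ij a 3D transformation matrix, so the 4x4 form and the function name `transform_point` are assumptions.

```python
from typing import List, Sequence, Tuple

Matrix4 = List[List[float]]


def transform_point(T_ij: Matrix4, P_ai: Sequence[float]) -> Tuple[float, float, float]:
    """Convert a 3D point from camera i's frame to camera j's frame.

    T_ij is assumed to be a 4x4 homogeneous rigid transform [R | t; 0 0 0 1],
    so only its first three rows are needed.
    """
    h = (P_ai[0], P_ai[1], P_ai[2], 1.0)
    x, y, z = (sum(T_ij[r][c] * h[c] for c in range(4)) for r in range(3))
    return (x, y, z)
```

For example, with a transform that rotates 90 degrees about the z-axis and then shifts by one unit along x, the point (1, 0, 0) in camera i's frame maps to (1, 1, 0) in camera j's frame.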

The server control unit (SCO) may perform object grouping based on the 3D position information of each of the plurality of objects (U1 to U3) represented in the single camera coordinate system.

For convenience, FIGS. 6a and 6b plot the 3D position information of each of the plurality of objects (U1 to U3) on a 2D coordinate system corresponding to the floor plane. In practice, however, object grouping may be performed on the 3D position information of each object using an agglomerative clustering algorithm based on Euclidean distance. A symbol (p, q) in FIG. 6b, such as (5, 1) or (3, 2), denotes the q-th object as seen from the p-th camera. The embodiment shown in FIG. 6b thus illustrates object grouping performed for three objects using six cameras (C1 to C6). With three objects, q would ordinarily be one of 1, 2, or 3. This is not a strict limit, however: if an object leaves the frame of the p-th camera and then re-enters, it may be recognized as a new object. Note therefore that q can exceed 3 even with only three objects, as with (1, 18), (2, 4), and (3, 4) in FIG. 6b.
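The grouping step can be sketched as a small agglomerative clustering routine over (p, q) keys. The specification names only agglomerative clustering with Euclidean distance; the single-linkage criterion, the `max_dist` stopping threshold, and the function name `agglomerate` are assumptions made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Key = Tuple[int, int]                  # (p, q): q-th object as seen from camera p
Point = Tuple[float, float, float]     # position in the unified camera coordinate system


def agglomerate(points: Dict[Key, Point], max_dist: float) -> List[List[Key]]:
    """Merge the closest clusters until the closest pair is farther than max_dist."""
    clusters: List[List[Key]] = [[k] for k in sorted(points)]

    def linkage(a: List[Key], b: List[Key]) -> float:
        # Single linkage: distance between the closest members of two clusters.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        if d > max_dist:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each resulting cluster collects the detections of one physical object across cameras, so keys such as (1, 18) and (2, 4) can end up in the same group even though their per-camera indices differ.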

In addition, when the distance between the plurality of objects (U1 to U3) represented in the single camera coordinate system is greater than or equal to a predetermined distance, the server control unit (SCO) may perform object grouping based on the 3D position information and store the result of the object grouping in the server storage unit (SST). When that distance is less than the predetermined distance, the server control unit (SCO) may instead perform object grouping using the grouping result stored in the server storage unit (SST).
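The distance-dependent choice between a fresh grouping and the stored one might look like the sketch below. How the inter-object distance is measured and what happens when no stored result exists are not specified, so the pairwise check, the fallback to the fresh result, and the name `select_grouping` are all assumptions.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Grouping = List[List[Tuple[int, int]]]   # clusters of (p, q) keys


def select_grouping(object_positions: Sequence[Point],
                    fresh: Grouping,
                    stored: Optional[Grouping],
                    min_separation: float) -> Grouping:
    """Return the grouping to use for this frame (and to keep in storage)."""
    too_close = any(math.dist(a, b) < min_separation
                    for a, b in combinations(object_positions, 2))
    if too_close and stored is not None:
        return stored   # objects are near each other: reuse the stored result
    return fresh        # objects are well separated: use and store the new result
```

The intent is that position-based clustering is trusted only while objects are far enough apart to be separated reliably.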

The server control unit (SCO) may restore the three-dimensional posture and position of each of the plurality of objects (U1 to U3) based on the plurality of two-dimensional skeletons included in the same group as a result of the object grouping. An example of the restored three-dimensional posture and position of each of the plurality of objects (U1 to U3) is shown in FIG. 7. For example, the server control unit (SCO) may restore a three-dimensional skeleton from the plurality of two-dimensional skeletons included in the same group using a DLT (Direct Linear Transform) matrix operation.
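As a rough illustration of DLT-based restoration, the sketch below triangulates each joint from its two-dimensional observations in several cameras. It assumes that a 3x4 projection matrix is available for each camera and that joints are stored in the same order in every skeleton; neither detail is stated in the description, and the function names and sample values are hypothetical.

```python
import numpy as np


def triangulate_point_dlt(projections, pixels):
    """Triangulate one 3D point from its pixel coordinates in several views."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    # The least-squares solution is the right singular vector that belongs
    # to the smallest singular value.
    _, _, vt = np.linalg.svd(np.stack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_skeleton(projections, skeletons_2d):
    """skeletons_2d has shape (num_cameras, num_joints, 2)."""
    skeletons_2d = np.asarray(skeletons_2d, dtype=float)
    num_joints = skeletons_2d.shape[1]
    return np.array(
        [triangulate_point_dlt(projections, skeletons_2d[:, j]) for j in range(num_joints)]
    )


def project(P, X):
    """Project a 3D point with projection matrix P (used only to build test data)."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]


# Two made-up cameras: one at the origin, one shifted by 1 along the x axis.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
joints_3d = np.array([[0.5, 0.2, 4.0], [0.4, -0.3, 4.2]])
skeletons_2d = np.array([[project(P, X) for X in joints_3d] for P in (P1, P2)])
restored = triangulate_skeleton([P1, P2], skeletons_2d)
```

With noise-free observations the restored joints match the original three-dimensional joints up to numerical precision; with real detections the result would be a least-squares estimate.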

The server control unit (SCO) may assign an identification number to each of the plurality of objects (U1 to U3) in the current frame based on the difference between the three-dimensional coordinates of each of the plurality of objects (U1 to U3) in the previous frame and the three-dimensional coordinates of each of the plurality of objects (U1 to U3) in the current frame. For example, the Euclidean distance between a three-dimensional posture in the previous frame and a three-dimensional posture in the current frame (that is, the difference between the three-dimensional coordinates of the posture in the previous frame and those of the posture in the current frame) may be measured, and the same identification number may be assigned to the three-dimensional posture having the smallest measured Euclidean distance. For example, referring to the upper drawing of FIG. 8, the first three-dimensional posture (PS1') in the frame at time t₁ is compared with the first to third three-dimensional postures (PS1 to PS3) in the frame at time t₀ and is assigned the same identification number as the first three-dimensional posture (PS1), which has the smallest Euclidean distance. Likewise, referring to the lower drawing of FIG. 8, the first three-dimensional posture (PS1'') in the frame at time t₂ is compared with the first to third three-dimensional postures (PS1' to PS3') in the frame at time t₁ and is assigned the same identification number as the first three-dimensional posture (PS1'), which has the smallest Euclidean distance.
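The frame-to-frame assignment can be sketched as a nearest-neighbor lookup. The sketch assumes that a posture is a list of (x, y, z) joints in a fixed order and that the distance is taken over all joint coordinates at once; the function names and sample values are hypothetical, and conflicts (two current postures choosing the same previous posture) are not handled because the description does not address them.

```python
import math


def pose_distance(pose_a, pose_b):
    """Euclidean distance between two postures given as lists of (x, y, z) joints."""
    flat_a = [c for joint in pose_a for c in joint]
    flat_b = [c for joint in pose_b for c in joint]
    return math.dist(flat_a, flat_b)


def assign_ids(previous, current_poses):
    """Give each current posture the number of the closest previous posture.

    previous:      dict mapping identification number -> posture (previous frame)
    current_poses: list of postures in the current frame
    """
    return [
        min(previous, key=lambda number: pose_distance(previous[number], pose))
        for pose in current_poses
    ]


# Made-up two-joint postures standing in for PS1 to PS3 at time t0.
previous = {
    1: [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    2: [(2.0, 0.0, 0.0), (2.0, 0.0, 1.0)],
    3: [(4.0, 0.0, 0.0), (4.0, 0.0, 1.0)],
}
# Postures at time t1, listed in a different order and slightly moved.
current = [
    [(2.1, 0.0, 0.0), (2.1, 0.0, 1.0)],
    [(0.1, 0.0, 0.0), (0.1, 0.0, 1.0)],
    [(3.9, 0.0, 0.0), (3.9, 0.0, 1.0)],
]
ids = assign_ids(previous, current)
```

Here each posture at t₁ keeps the number of the posture it was closest to at t₀, regardless of the order in which the postures were restored.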

The server storage unit (SST) may store the estimated data received from each of the plurality of edge devices (ED1 to EDn). In addition, the server storage unit (SST) may store a three-dimensional transformation matrix T_ij that transforms the i-th camera coordinate system into the j-th camera coordinate system.
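Applying such a matrix to a position can be sketched as follows, assuming that T_ij is a 4x4 homogeneous matrix (rotation plus translation). The description does not give its form, so this representation, the function name, and the sample matrix are assumptions.

```python
def transform_point(T_ij, point_i):
    """Map a 3D point from the i-th camera coordinate system to the j-th one.

    T_ij is assumed to be a 4x4 homogeneous transformation matrix given as
    nested lists.
    """
    x, y, z = point_i
    h = (x, y, z, 1.0)
    out = [sum(T_ij[r][c] * h[c] for c in range(4)) for r in range(4)]
    return tuple(out[k] / out[3] for k in range(3))


# Illustrative matrix: rotate 90 degrees about the z axis, then shift by 1 along x.
T_12 = [
    [0.0, -1.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
point_in_c2 = transform_point(T_12, (1.0, 0.0, 0.0))
```

Applying the stored matrix for every camera i with one fixed target j would express all estimated positions in a single camera coordinate system before grouping.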

The server communication unit (SCM) may receive the estimated data from the plurality of edge devices (ED1 to EDn). In addition, the server communication unit (SCM) may transmit the three-dimensional posture and position of each of the plurality of objects (U1 to U3) to the second server (SV2).

The second server (SV2) may generate content relating to avatars (A1 to A3) in a virtual world for each of the plurality of objects (U1 to U3) based on the three-dimensional posture and position of each of the plurality of objects (U1 to U3) received from the first server (SV1). For example, the second server (SV2) may be a web server.

With reference to FIG. 3 together with FIGS. 1, 2, and 4 to 8, a three-dimensional posture and position tracking method for a plurality of objects (U1 to U3) using the three-dimensional posture and position tracking system (100) including a plurality of cameras (C1 to Cn), a plurality of edge devices (ED1 to EDn), and a first server (SV1) according to an embodiment of the present invention is described as follows.

In step 210, each of the plurality of cameras (C1 to Cn) acquires an object image and depth information by photographing the plurality of objects (U1 to U3).

In step 220, each of the plurality of edge devices (ED1 to EDn) estimates a two-dimensional skeleton of each of the plurality of objects (U1 to U3) based on the object image received from the corresponding camera among the plurality of cameras (C1 to Cn), and estimates a three-dimensional position of each of the plurality of objects (U1 to U3) based on the depth information received from the corresponding camera among the plurality of cameras (C1 to Cn).
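The description does not specify how a three-dimensional position is derived from the depth information. One common approach, shown here purely as an assumption, is pinhole back-projection of a reference pixel (for example, a skeleton joint) using camera intrinsics; the parameter names `fx`, `fy`, `cx`, `cy` and the sample values are hypothetical.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Convert a pixel (u, v) with a depth value into camera coordinates.

    Assumes an ideal pinhole camera with focal lengths (fx, fy) and
    principal point (cx, cy), all in pixels; depth is in metres along
    the optical axis.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


# Illustrative values: a joint detected 100 px right of the image center, 2 m away.
position = back_project(u=420.0, v=240.0, depth=2.0, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

With these made-up intrinsics the joint lies about 0.4 m to the right of the optical axis at a depth of 2 m.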

In step 230, each of the plurality of edge devices (ED1 to EDn) transmits estimated data, which includes the two-dimensional skeleton and the estimated three-dimensional position for each of the plurality of objects (U1 to U3), to the first server (SV1).
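What such estimated data might look like on the wire can be sketched as below. The description states only that the data contains the two-dimensional skeleton and the estimated three-dimensional position per object; the JSON encoding, the class names, and fields such as `camera_index`, `frame_index`, and `local_id` are assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class ObjectEstimate:
    local_id: int       # q in the symbol (p, q); hypothetical field name
    skeleton_2d: list   # [[u, v], ...] joint pixel coordinates
    position_3d: list   # [x, y, z] in this camera's coordinate system


@dataclass
class EstimatedData:
    camera_index: int   # p in the symbol (p, q); hypothetical field name
    frame_index: int    # assumed; the source does not mention frame numbering
    objects: list       # list of ObjectEstimate


def encode(data):
    """Serialize estimated data for transmission (JSON is an assumption)."""
    return json.dumps(asdict(data))


message = encode(
    EstimatedData(
        camera_index=1,
        frame_index=0,
        objects=[
            ObjectEstimate(
                local_id=18,
                skeleton_2d=[[420.0, 240.0], [418.0, 300.0]],
                position_3d=[0.4, 0.0, 2.0],
            )
        ],
    )
)
decoded = json.loads(message)
```

Note that the message carries only skeleton coordinates and positions, with no image data, which is consistent with the point made later in the description that captured images are processed only on the edge devices.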

In step 240, the first server (SV1) performs object grouping based on the estimated three-dimensional positions in the estimated data received from each of the plurality of edge devices (ED1 to EDn), and restores the three-dimensional posture and position of each of the plurality of objects (U1 to U3) based on the two-dimensional skeletons in the estimated data received from each of the plurality of edge devices (ED1 to EDn) and on the object grouping.

In step 240, a step of unifying, using the first server (SV1), the estimated three-dimensional positions in the estimated data received from each of the plurality of edge devices (ED1 to EDn) into a single camera coordinate system may be performed. In step 240, after this unifying step, a step of performing object grouping, using the first server (SV1), based on the three-dimensional position information of each of the plurality of objects (U1 to U3) represented in the single camera coordinate system may be performed. In step 240, after this object grouping step, a step of restoring, using the first server (SV1), the three-dimensional posture and position of each of the plurality of objects (U1 to U3) based on the plurality of two-dimensional skeletons included in the same group as a result of the object grouping may be performed. In step 240, after this restoring step, a step of assigning, using the first server (SV1), an identification number to each of the plurality of objects (U1 to U3) in the current frame based on the difference between the three-dimensional coordinates of each of the plurality of objects (U1 to U3) in the previous frame and the three-dimensional coordinates of each of the plurality of objects (U1 to U3) in the current frame may be performed.

For example, the three-dimensional posture and position tracking system (100) may further include a second server (SV2), and, after step 240, a step of generating, using the second server (SV2), content relating to avatars (A1 to A3) in a virtual world for each of the plurality of objects (U1 to U3) based on the three-dimensional posture and position of each of the plurality of objects (U1 to U3) received from the first server (SV1) may further be performed.

According to one feature of an embodiment of the present invention, a three-dimensional posture and position tracking method for a plurality of objects (U1 to U3) using a three-dimensional posture and position tracking system (100) including a plurality of cameras (C1 to Cn), a plurality of edge devices (ED1 to EDn), and a first server (SV1) includes: a) acquiring object images and depth information by photographing the plurality of objects (U1 to U3) using each of the plurality of cameras (C1 to Cn); b) using each of the plurality of edge devices (ED1 to EDn), estimating a two-dimensional skeleton of each of the plurality of objects (U1 to U3) based on the object image received from the corresponding camera among the plurality of cameras (C1 to Cn), and estimating a three-dimensional position of each of the plurality of objects (U1 to U3) based on the depth information received from the corresponding camera among the plurality of cameras (C1 to Cn); c) using each of the plurality of edge devices (ED1 to EDn), transmitting estimated data including the two-dimensional skeleton and the estimated three-dimensional position for each of the plurality of objects (U1 to U3) to the first server (SV1); and d) using the first server (SV1), performing object grouping based on the estimated three-dimensional positions in the estimated data received from each of the plurality of edge devices (ED1 to EDn), and restoring the three-dimensional posture and position of each of the plurality of objects (U1 to U3) based on the two-dimensional skeletons in the estimated data received from each of the plurality of edge devices (ED1 to EDn) and on the object grouping. Accordingly, the realism of the virtual world can be enhanced, for example by estimating the three-dimensional skeletons of objects (U1 to U3) such as users in real time and using them to control the avatars (A1 to A3) in the virtual world. Moreover, the plurality of objects (U1 to U3) can be distinguished and restored even when, for example, the objects (U1 to U3) are wearing the same clothes or taking the same posture. In addition, because the three-dimensional skeleton is restored from two-dimensional skeletons rather than from actual video, the cost of building a separate media server can be saved. Furthermore, because the images captured of the objects (U1 to U3) are processed only in the edge devices (ED1 to EDn) and are not transmitted to the server, privacy infringement can be prevented.

Although the present invention has been shown and described with a focus on the preferred embodiments of the accompanying exemplary drawings, it is not limited thereto, and it goes without saying that a person having ordinary skill in the art to which the present invention pertains may practice it in various forms within the scope of the technical idea of the present invention set forth in the following claims.

100: 3D posture and position tracking system
C1~Cn: Camera U1~U3: Object
ED1~EDn: Edge device CO: Control unit
CM: Communication unit ST: Storage unit
SV1: First server SCO: Server control unit
SST: Server storage unit SCM: Server communication unit
SV2: Second server A1~A3: Avatar

Claims

A three-dimensional posture and position tracking method for a plurality of objects using a three-dimensional posture and position tracking system including a plurality of cameras, a plurality of edge devices, and a first server, the method comprising:
a) acquiring object images and depth information by photographing a plurality of objects using each of the plurality of cameras;
b) using each of the plurality of edge devices, estimating a two-dimensional skeleton of each of the plurality of objects based on the object image received from a corresponding camera among the plurality of cameras, and estimating a three-dimensional position of each of the plurality of objects based on the depth information received from the corresponding camera among the plurality of cameras;
c) using each of the plurality of edge devices, transmitting estimated data including the two-dimensional skeleton and the estimated three-dimensional position for each of the plurality of objects to the first server; and
d) using the first server, performing object grouping based on the estimated three-dimensional positions of the estimated data respectively received from the plurality of edge devices, and restoring a three-dimensional posture and position for each of the plurality of objects based on the two-dimensional skeletons of the estimated data respectively received from the plurality of edge devices and on the object grouping.

The method of claim 1, wherein step d) comprises d1) unifying the estimated three-dimensional positions of the estimated data respectively received from the plurality of edge devices into a single camera coordinate system.

The method of claim 2, wherein step d) further comprises, after step d1), d2) performing the object grouping based on three-dimensional position information of each of the plurality of objects represented in the single camera coordinate system.

The method of claim 3, wherein step d) further comprises, after step d2), d3) restoring the three-dimensional posture and position of each of the plurality of objects based on a plurality of two-dimensional skeletons included in the same group by the object grouping.

The method of claim 4, wherein step d) further comprises, after step d3), d4) assigning an identification number to each of the plurality of objects in a current frame based on a difference between three-dimensional coordinates of each of the plurality of objects in a previous frame and three-dimensional coordinates of each of the plurality of objects in the current frame.

The method of claim 1, wherein the three-dimensional posture and position tracking system further includes a second server, and the method further comprises, after step d), e) generating, using the second server, content relating to an avatar in a virtual world for each of the plurality of objects based on the three-dimensional posture and position of each of the plurality of objects received from the first server.

A three-dimensional posture and position tracking system for a plurality of objects, the system comprising a plurality of cameras, a plurality of edge devices, and a first server,
wherein each of the plurality of cameras acquires an object image and depth information by photographing a plurality of objects,
wherein each of the plurality of edge devices includes:
a control unit that estimates a two-dimensional skeleton of each of the plurality of objects based on the object image received from a corresponding camera among the plurality of cameras, and estimates a three-dimensional position of each of the plurality of objects based on the depth information received from the corresponding camera among the plurality of cameras; and
a communication unit that transmits estimated data including the two-dimensional skeleton and the estimated three-dimensional position for each of the plurality of objects to the first server, and
wherein the first server includes a server control unit that performs object grouping based on the estimated three-dimensional positions of the estimated data respectively received from the plurality of edge devices, and restores a three-dimensional posture and position of each of the plurality of objects based on the two-dimensional skeletons of the estimated data respectively received from the plurality of edge devices and on the object grouping.

The system of claim 7, wherein the server control unit unifies the estimated three-dimensional positions of the estimated data respectively received from the plurality of edge devices into a single camera coordinate system.

The system of claim 8, wherein the server control unit performs the object grouping based on three-dimensional position information of each of the plurality of objects represented in the single camera coordinate system.

The system of claim 9, wherein the server control unit restores the three-dimensional posture and position of each of the plurality of objects based on a plurality of two-dimensional skeletons included in the same group by the object grouping.

The system of claim 10, wherein the server control unit assigns an identification number to each of the plurality of objects in a current frame based on a difference between three-dimensional coordinates of each of the plurality of objects in a previous frame and three-dimensional coordinates of each of the plurality of objects in the current frame.

The system of claim 7, further comprising a second server that generates content relating to an avatar in a virtual world for each of the plurality of objects based on the three-dimensional posture and position of each of the plurality of objects received from the first server.