KR102261498B1

KR102261498B1 - Apparatus and method for estimating the attitude of a picking object

Info

Publication number: KR102261498B1
Application number: KR1020200085712A
Authority: KR
Inventors: 류여형
Original assignee: 주식회사 두산 (Doosan Corporation)
Priority date: 2020-07-10
Filing date: 2020-07-10
Publication date: 2021-06-07

Abstract

An estimation device is provided. The estimation device includes: an acquisition unit that acquires a two-dimensional image of an object; an extractor that extracts a plurality of vectors whose similarity to an object region representing the object in the two-dimensional image is higher than a reference value; a determination unit that determines whether to use depth information of the object; and an estimation unit that estimates a posture of the object using the two-dimensional image. Depending on the determination result of the determination unit, the estimation unit may operate in a first mode, in which it extracts a plurality of first vectors from among the plurality of vectors according to a similarity ranking, or in a second mode, in which it extracts a plurality of second vectors according to an inverse similarity ranking. It is therefore possible to accurately estimate the posture of a picking object while excluding or minimizing dependence on depth information.

Description

Apparatus and method for estimating the attitude of a picking object

The present invention relates to an apparatus and method for estimating the posture of an object to be picked by a robot.

To control a robot that processes or handles an object in a manner suited to that object, a sensor that detects the object may be used.

The robot's processing accuracy on an object may depend on the measurement precision of the sensor; the higher the sensor's measurement precision, the better the robot's processing accuracy can be.

However, sensors with high measurement precision are very expensive, which hinders their widespread adoption. Moreover, even when a high-precision sensor is available, its measurement precision may drop for certain types of objects, for example due to diffuse reflection.

In particular, when the environment allows only low-precision distance or depth measurement means, it may be difficult to determine the posture of the object to be picked. The posture of the picking object is one of the essential factors that must be determined in a picking operation.

Korean Patent Application Laid-Open No. 2019-0072285 discloses a technique for identifying the position of a gripping target selected through a picking-view camera and gripping the target with a gripper.

Korean Patent Application Laid-Open No. 2019-0072285

An object of the present invention is to provide an estimation apparatus and an estimation method that accurately estimate the posture of a picking object while excluding or minimizing dependence on depth information.

According to an embodiment of the present invention, an estimation apparatus is provided. The estimation apparatus may include: an acquisition unit that acquires a two-dimensional image of an object; an extractor that extracts a plurality of vectors whose similarity to an object region representing the object in the two-dimensional image is higher than a reference value; a determination unit that determines whether to use depth information of the object; and an estimator that estimates the posture of the object using the two-dimensional image.

Depending on the determination result of the determination unit, the estimator may operate in a first mode, in which it extracts a plurality of first vectors from among the plurality of vectors according to a similarity ranking, or in a second mode, in which it extracts a plurality of second vectors from among the plurality of vectors according to an inverse similarity ranking.
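The two selection orders can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the candidates are assumed to be already filtered by the reference value and represented only by their similarity scores, and the helper names and the count `k` are hypothetical, since the document does not say how many vectors each mode keeps.

```python
from typing import List, Sequence, Tuple


def rank_by_similarity(similarities: Sequence[float]) -> List[int]:
    """Return candidate indices sorted from most to least similar."""
    return sorted(range(len(similarities)), key=lambda i: similarities[i], reverse=True)


def select_first_vectors(similarities: Sequence[float], k: int) -> List[int]:
    """First mode (hypothetical helper): the k candidates with the highest similarity."""
    return rank_by_similarity(similarities)[:k]


def select_initial_and_second_vectors(
    similarities: Sequence[float], k: int
) -> Tuple[int, List[int]]:
    """Second mode (hypothetical helper): the single most similar candidate
    (initial vector) plus k further candidates in inverse (ascending) similarity order."""
    ranked = rank_by_similarity(similarities)
    if not ranked:
        raise ValueError("no candidate vectors above the reference value")
    initial = ranked[0]
    second = list(reversed(ranked[1:]))[:k]  # remaining candidates, least similar first
    return initial, second
```

Both helpers sort once and slice, so the cost is dominated by the O(n log n) sort over the candidate list.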

The extractor may generate the plurality of vectors in order of similarity to the object region using at least one of a deep learning technique and a similarity determination technique; may extract the plurality of vectors in order of similarity to the object region from a three-dimensional model of the object; or may extract the plurality of vectors in order of similarity to the object region from a database in which the plurality of vectors are stored in advance.

The extractor may divide the entire angular range that the object can take into a plurality of angular ranges and, for each divided angular range, extract vectors whose similarity to the object region is higher than the reference value.

The extractor may detect the object region in the 2D image using a deep-learning-based object detection algorithm.

The extractor may localize the object region.

The extractor may apply the object region as an input to a pre-trained deep learning generative model.

In the encoder part of the deep learning generative model, the extractor may measure the similarity between a source vector, obtained by embedding the object region as the source, and the stored vectors held in a codebook.

Based on the measured similarity, the extractor may extract, from among the stored vectors, the plurality of vectors having high similarity to the object region.

According to another embodiment of the present invention, an estimation apparatus is provided. The estimation apparatus may include: an extractor that extracts, from a two-dimensional image, a plurality of first vectors whose similarity to an object region representing an object is higher than a first reference value; and an estimator that applies the postures of the plurality of first vectors extracted by the extractor to a three-dimensional model of the object.

The estimator may obtain a plurality of 2D rendered images from the 3D model to which the postures of the plurality of first vectors have been applied.

The estimator may compare the plurality of 2D rendered images with the object region.

The estimator may estimate, as the posture of the object, the specific posture of the specific first vector that was used to produce the specific 2D rendered image having the highest similarity to the object region.
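A minimal sketch of this first-mode loop, assuming the renderer and the 2D similarity measure are supplied as callables. The names `render` and `similarity` are hypothetical placeholders, because the document does not fix either one.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Pose = TypeVar("Pose")
Image = TypeVar("Image")


def estimate_pose_first_mode(
    candidate_poses: Sequence[Pose],
    render: Callable[[Pose], Image],
    similarity: Callable[[Image, Image], float],
    object_region: Image,
) -> Tuple[Pose, float]:
    """Render the 3D model once per candidate pose, compare each rendering with the
    object region, and return the pose whose rendering scores highest."""
    if not candidate_poses:
        raise ValueError("at least one candidate pose is required")
    scores = [similarity(render(pose), object_region) for pose in candidate_poses]
    best = max(range(len(scores)), key=scores.__getitem__)
    return candidate_poses[best], scores[best]
```

In practice `render` would project the posed 3D model into the camera view and `similarity` would be a 2D matching score; for a quick check, poses can be plain angles with `render=lambda a: a` and `similarity=lambda r, t: -abs(r - t)`.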

The estimator may render the 3D model, posed with each of the postures of the plurality of vectors, into 2D once per vector, so that as many 2D rendered images are produced as there are vectors.

The estimator may apply a bounding box (BBox) method to restrict the object region to a bounding box area, or apply a masking method to restrict the object region to a segmentation area.
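Both restriction methods amount to simple array operations. The sketch below assumes NumPy, an image stored as a `(height, width)` or `(height, width, channels)` array, a bounding box given as `(x_min, y_min, x_max, y_max)` pixel coordinates with exclusive upper bounds, and a binary segmentation mask of the same height and width; these conventions are assumptions for illustration.

```python
import numpy as np


def restrict_to_bbox(image: np.ndarray, bbox: tuple) -> np.ndarray:
    """Crop the image to a bounding box (x_min, y_min, x_max, y_max),
    with x_max and y_max exclusive (assumed convention)."""
    x_min, y_min, x_max, y_max = bbox
    return image[y_min:y_max, x_min:x_max]


def restrict_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep pixels inside a binary segmentation mask and zero out everything else."""
    keep = mask.astype(bool)
    if image.ndim == 3:  # broadcast the mask across colour channels
        keep = keep[..., np.newaxis]
    return np.where(keep, image, 0).astype(image.dtype)
```

The bounding box keeps a rectangular crop that still contains some background, whereas the mask keeps the original image size but removes every pixel outside the object's silhouette.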

The estimator may perform 2D matching between the object region, restricted to the bounding box area or the segmentation area, and each 2D rendered image.

The estimator may regard the specific 2D rendered image with the highest 2D matching score as the correct answer, and estimate the specific posture of the specific vector used to produce that image as the posture of the object.
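The document does not define the 2D matching score. As one hypothetical choice, the sketch below scores each rendered silhouette by its intersection-over-union (IoU) with the object's segmentation mask and keeps the highest-scoring rendering; a different score (for example, edge- or feature-based matching) could be substituted without changing the selection logic.

```python
from typing import Sequence, Tuple

import numpy as np


def silhouette_iou(rendered_mask: np.ndarray, object_mask: np.ndarray) -> float:
    """Intersection-over-union of two binary masks of equal shape
    (a hypothetical stand-in for the 2D matching score)."""
    a = rendered_mask.astype(bool)
    b = object_mask.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def best_rendering(
    rendered_masks: Sequence[np.ndarray], object_mask: np.ndarray
) -> Tuple[int, float]:
    """Index and score of the rendering that best matches the object region."""
    scores = [silhouette_iou(m, object_mask) for m in rendered_masks]
    best = int(np.argmax(scores))
    return best, scores[best]
```

The returned index identifies the vector whose posture produced the winning rendering, which is then taken as the object's posture.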

An estimation apparatus according to yet another embodiment of the present invention is provided. The estimation apparatus may include: an extractor that extracts a plurality of vectors whose similarity to an object region included in a 2D image satisfies a second reference value; and an estimator that estimates the posture of the object using the 2D image and depth information of the object.

From among the plurality of vectors, the extractor may extract an initial vector having the highest similarity to the object region, and extract a plurality of second vectors in ascending order of similarity to the object region.

The estimator may estimate the posture of the object using at least one of the posture corresponding to the initial vector, the postures of the plurality of second vectors, and the depth information.

The estimator may apply the posture of the initial vector and the postures of the plurality of second vectors to the 3D model of the object, respectively.

Using the 3D model posed with the initial vector's posture and the 3D models posed with the postures of the second vectors, the estimator may generate one model point cloud per vector, that is, as many model point clouds as there are initial and second vectors in total.

The estimator may generate a target point cloud using the depth information.
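A target point cloud is commonly generated by back-projecting each valid depth pixel through a pinhole camera model. The sketch assumes metric depth in a NumPy array, zero marking missing depth, and known intrinsics `fx`, `fy`, `cx`, `cy`; the optional `mask` (for example, the segmentation of the object region) is an assumption, not something the document specifies.

```python
from typing import Optional

import numpy as np


def depth_to_point_cloud(
    depth: np.ndarray,
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[np.ndarray] = None,
) -> np.ndarray:
    """Back-project a depth image into an (N, 3) point cloud in the camera frame
    using a pinhole model; pixels with depth <= 0 are treated as missing."""
    v, u = np.indices(depth.shape)  # v: row index, u: column index
    valid = depth > 0
    if mask is not None:
        valid &= mask.astype(bool)  # optionally keep only the object region
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```

Restricting the cloud to the object region keeps background surfaces from pulling the later 3D matching step off target.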

Using a 3D matching technique, the estimator may correct each of the plurality of model point clouds so that it aligns with the single target point cloud.
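The document only says "a three-dimensional matching technique". Point-to-point ICP (iterative closest point) with an SVD-based (Kabsch) rigid fit is one common choice, sketched below under that assumption with brute-force nearest-neighbour search; it is not necessarily the method used in the patent. In the second mode this correction would be run once per model point cloud against the same target point cloud.

```python
from typing import Tuple

import numpy as np


def best_fit_transform(src: np.ndarray, dst: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Kabsch fit: rotation R and translation t minimising sum ||R @ src_i + t - dst_i||^2."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:  # avoid returning a reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t


def icp(model: np.ndarray, target: np.ndarray, iterations: int = 30, tol: float = 1e-10):
    """Align a model point cloud (N, 3) to a target point cloud (M, 3).
    Returns the aligned model points and the accumulated rotation and translation."""
    current = model.astype(float).copy()
    R_total, t_total = np.eye(3), np.zeros(3)
    prev_err = np.inf
    for _ in range(iterations):
        # Brute-force nearest target point for every model point.
        d2 = ((current[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
        nn = d2.argmin(axis=1)
        err = np.sqrt(d2[np.arange(len(current)), nn]).mean()
        if abs(prev_err - err) < tol:
            break  # mean residual stopped changing
        prev_err = err
        R, t = best_fit_transform(current, target[nn])
        current = current @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return current, R_total, t_total
```

Brute-force matching costs O(N·M) per iteration, so for realistic clouds a k-d tree (for example, `scipy.spatial.cKDTree`) would normally replace it. The accumulated `R_total` and `t_total` are what would be composed with the vector's original posture to obtain the corrected posture.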

The estimator may generate a plurality of 2D rendered images by applying the corrected postures of the plurality of model point clouds to the 3D model.

The estimator may estimate, as the posture of the object, the posture of the specific 2D rendered image that has the highest similarity to the object region among the plurality of 2D rendered images.

An estimation method according to yet another embodiment of the present invention is provided. The estimation method may include: acquiring a 2D image of an object; extracting a plurality of vectors whose similarity to an object region representing the object in the 2D image is higher than a reference value; determining whether to use depth information of the object; and executing a first mode when the depth information is not used, and a second mode when the depth information is used.

The first mode may include: extracting, from among the plurality of vectors, a plurality of first vectors whose similarity to the object region is higher than a first reference value; applying the posture of each of the first vectors to the 3D model of the object; obtaining a plurality of 2D rendered images from the 3D models to which the postures of the first vectors have been applied; comparing the plurality of 2D rendered images with the object region; and estimating, as the posture of the object, the specific posture of the specific first vector used to produce the specific 2D rendered image with the highest similarity to the object region.

The second mode may include: extracting, from among the plurality of vectors, an initial vector having the highest similarity to the object region, and extracting a plurality of second vectors whose similarity to the object region is lower than a second reference value, in ascending order of similarity; applying the posture of the initial vector and the postures of the second vectors to the 3D model of the object, respectively; generating as many model point clouds as there are initial and second vectors, using the 3D model posed with the initial vector and the 3D models posed with the second vectors; generating a target point cloud using the depth information; correcting each of the model point clouds toward the single target point cloud using a 3D matching technique; generating a plurality of 2D rendered images by applying the corrected postures of the model point clouds to the 3D model; and estimating, as the posture of the object, the posture of the specific 2D rendered image with the highest similarity to the object region among the plurality of 2D rendered images.

According to the present invention, in an environment where a depth camera is omitted or only a low-cost, low-performance depth camera is used, the posture of the picking object can be estimated using a 2D image-based camera.

According to the present invention, the six-degree-of-freedom (6D) posture of an object for which a 3D model is available can be estimated using an inexpensive RGB (red, green, blue) camera or RGB-D (depth) camera.

FIG. 1 is a block diagram illustrating an estimation apparatus according to an embodiment of the present invention.
FIG. 2 is a schematic diagram illustrating the operation of an extractor according to an embodiment of the present invention.
FIG. 3 is a schematic diagram illustrating the operation of the estimation apparatus in a first mode.
FIG. 4 is a schematic diagram illustrating an object region t processed with a bounding box according to an embodiment of the present invention.
FIG. 5 is a schematic diagram illustrating the operation of the estimation apparatus in a second mode.
FIG. 6 is a flowchart illustrating an estimation method according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating the first mode according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating the second mode according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating a computing device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily implement them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted to explain the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

In this specification, redundant descriptions of identical components are omitted.

In this specification, when an element is said to be 'connected' or 'coupled' to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may also be present. In contrast, when an element is said to be 'directly connected' or 'directly coupled' to another element, it should be understood that no intervening element is present.

The terms used herein are only for describing specific embodiments and are not intended to limit the present invention.

In this specification, a singular expression may include the plural unless the context clearly indicates otherwise.

In this specification, terms such as 'include' or 'have' are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, the term 'and/or' includes any combination of a plurality of listed items or any one of the plurality of listed items.

In this specification, detailed descriptions of well-known functions and configurations that may obscure the gist of the present invention are omitted.

FIG. 1 is a block diagram illustrating an estimation apparatus according to an embodiment of the present invention. FIG. 2 is a schematic diagram illustrating the operation of the extractor 330 according to an embodiment of the present invention.

As shown in the drawings, the estimation apparatus according to an embodiment of the present invention may include an acquirer 310, an extractor 330, a determiner 350, and an estimator 370.

The acquirer 310 may acquire a 2D image i of the object 90. Here, a 2D image is a planar, two-dimensional image; for example, the 2D image i may be a 2D RGB (red, green, blue) image or a 2D black-and-white image. The acquirer 310 may include an RGB camera, a black-and-white camera, or the like that generates the 2D image i. Alternatively, the acquirer 310 may include a communication means that receives a 2D RGB image, a 2D black-and-white image, or the like from such a camera.

The extractor 330 may extract a plurality of vectors whose similarity to the object region t, which represents the object in the 2D image i, is higher than a reference value. The 2D image i may contain an image of the actual object as captured; the area that the captured object occupies within the 2D image corresponds to the object region t.

A 'vector' as used herein may refer to a three-dimensional shape that represents the object or the object region t in simplified form, built from feature points extracted from the object region t. Alternatively, a 'vector' may refer to a planar image, such as a plan view, of that three-dimensional shape. Alternatively, a 'vector' may represent the object or the object region t as it is, or a 3D model of the object. A 'vector' may denote a simplified 3D shape of the object, the object's original shape, a 3D model of the object, or the like, with a posture applied to it; in other words, a vector may carry a posture.

A 'posture' may include a rotation angle, a position, and the like relative to a specific reference arrangement of the object. For example, a 'posture' may include at least one of the six degree-of-freedom (6-DoF) components.

When three mutually orthogonal coordinate axes (x, y, and z) are defined, a total of six degrees of freedom can be defined: the positions along the x, y, and z axes, and the rotation angles about the x, y, and z axes.
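A 6-DoF posture is usually applied to model points as a rigid transform. The sketch below assumes NumPy, angles in radians, and a rotation composed as R = Rz · Ry · Rx (rotate about x, then y, then z in the fixed frame); the document does not state a rotation order, so this convention is an assumption.

```python
import numpy as np


def pose_to_matrix(x: float, y: float, z: float, rx: float, ry: float, rz: float) -> np.ndarray:
    """4x4 homogeneous transform from a 6-DoF pose (angles in radians).
    The rotation order R = Rz @ Ry @ Rx is an assumed convention."""
    Rx = np.array([[1, 0, 0], [0, np.cos(rx), -np.sin(rx)], [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)], [0, 1, 0], [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0], [np.sin(rz), np.cos(rz), 0], [0, 0, 1]])
    T = np.eye(4)
    T[:3, :3] = Rz @ Ry @ Rx  # three rotational degrees of freedom
    T[:3, 3] = (x, y, z)      # three translational degrees of freedom
    return T


def apply_pose(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Apply a 4x4 rigid transform to an (N, 3) array of model points."""
    return points @ T[:3, :3].T + T[:3, 3]
```

For example, a 90-degree rotation about z followed by a translation of (1, 2, 3) moves the model point (1, 0, 0) to (1, 3, 3).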

The extractor 330 may use a deep learning technique or the like to extract a plurality of vectors similar to the object region t from a previously acquired 3D model of the object or from a database.

For example, the extractor 330 may generate a plurality of vectors in order of similarity to the object region t using at least one of a deep learning technique and a similarity determination technique. For instance, the extractor 330 may use any of various image similarity measures to extract vectors whose similarity to the object region is higher than the reference value. Alternatively, the extractor 330 may extract a plurality of vectors in order of similarity to the object region t from a 3D model of the object, or from a database in which a plurality of vectors are stored in advance. Each vector may include information representing a posture, and the postures contained in the vectors extracted by the extractor 330 may be used to estimate the posture of the actual object.

Instead of extracting only the single vector most similar to the object region t, the extractor 330 of the present invention may extract a plurality of vectors whose similarity exceeds the reference value. Determining the posture of a real 3D object from a 2D image alone is prone to various errors, and using a plurality of vectors above the reference value can reduce them.

In theory, however, there may be infinitely many vectors whose similarity exceeds the reference value, because the angle that characterizes a vector can be subdivided indefinitely. Vectors with very close angles carry nearly the same error, so extracting several vectors within a very narrow angular range adds little.

It is therefore preferable to extract vectors that differ from one another by at least a set angle. For example, the extractor 330 may divide the entire angular range the object can take into a plurality of angular ranges, and extract, for each pre-divided angular range, vectors whose similarity to the object region t exceeds the reference value. For example, when the entire angular range is 360 degrees about the x-axis and each angular range is 60 degrees, the extractor 330 may extract one or more vectors satisfying the reference value in the 0 to 60 degree section, and likewise in each of the 60 to 120, 120 to 180, 180 to 240, 240 to 300, and 300 to 360 degree sections. If no vector in a given section satisfies the reference value, no vector may be extracted from that section.
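The per-section extraction in the 60-degree example can be sketched as follows, assuming each candidate is reduced to a single angle in degrees (about the x-axis, as in the example) and its similarity score. Keeping only the best `per_bin` candidates in each section is a hypothetical parameter standing in for "one or more vectors"; sections with no candidate above the reference value are simply absent from the result.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[float, float]  # (angle in degrees, similarity)


def extract_per_angular_bin(
    candidates: Sequence[Candidate],
    reference: float,
    bin_width_deg: float = 60.0,
    total_range_deg: float = 360.0,
    per_bin: int = 1,
) -> Dict[int, List[Candidate]]:
    """Group candidates into angular sections and keep, per section, the best
    `per_bin` candidates whose similarity exceeds the reference value."""
    bins: Dict[int, List[Candidate]] = defaultdict(list)
    for angle, sim in candidates:
        if sim <= reference:
            continue  # below the reference value: never extracted
        section = int((angle % total_range_deg) // bin_width_deg)
        bins[section].append((angle, sim))
    return {
        section: sorted(members, key=lambda c: c[1], reverse=True)[:per_bin]
        for section, members in sorted(bins.items())
    }
```

Section 0 covers 0 to 60 degrees, section 1 covers 60 to 120 degrees, and so on; a candidate exactly on a boundary falls into the higher section.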

The extractor 330 may detect the object region t in the 2D image using a deep-learning-based object detection algorithm.

For example, as shown in FIG. 2(b), the extractor 330 may delimit or detect the object region t in the 2D image i using an object segmentation technique.

As shown in FIG. 2(c), the extractor 330 may localize the object region t and remove the surrounding background (background deletion).

Through the object segmentation, localization, and background deletion performed by the extractor 330, only the object region t may be extracted from among the various image content in the 2D image i.

The extractor 330 may apply the extracted object region t as an input to the pre-trained deep learning generative model.

According to an embodiment, in the encoder part of the deep learning generative model, the extractor 330 may measure the similarity between the source vector, obtained by embedding the object region t as the source, and the stored vectors held in a codebook. Based on the measured similarity, the extractor 330 may extract, from among the stored vectors, a plurality of vectors with high similarity to the object region.

According to an embodiment, from among the per-posture vectors (stored vectors) saved during the deep learning training stage, the extractor 330 may extract a plurality of vectors whose cosine similarity to the source vector is higher than the reference value.

According to an embodiment, features of the 2D image may be extracted by the pre-trained encoder part of the deep learning generative model. The extracted 2D image features are embedded as a vector and stored in a database (DB). This vector may correspond to the source vector used as the first factor of the cosine similarity.

A 2D rendered image may be prepared for each pose by rendering a 3D model of the object. Features of each 2D rendered image may be extracted using the pre-trained encoder. The extracted features are embedded as vectors and stored in a DB, and these vectors may correspond to the stored vectors used as the second factor of the cosine similarity. A "2D rendered image" may refer to a 2D image of the 3D model viewed from one side. For example, a top view, side view, or front view of the 3D model may correspond to a 2D rendered image.

The cosine similarity may be calculated from the product of the first factor and the second factor, that is, the inner product of the two vectors after each has been normalized to unit length.

The similarities to the entire DB can be calculated with a single vector product. The vectors whose similarity exceeds the reference value may then be sorted in descending order of similarity. The vector with the highest similarity among the vectors extracted by the extractor 330 may be defined as the "initial vector" v0. One would generally expect the initial vector v0 to be the most similar to the actual object region t, but in practice this does not always hold. Experimentally, a vector other than the initial vector v0 was found to be the most similar to the object region t with a probability of about 30% or more.
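The ranking step above can be illustrated with a minimal Python sketch. It rests on several assumptions:

- Each embedding is a plain list of floats.
- The codebook is a dict from a hypothetical pose label to its stored vector.
- The function names, threshold, and toy data are illustrative and are not taken from the patent.

Each vector is first scaled to unit length, so the dot product of a stored vector with the source vector equals their cosine similarity. Scoring every row of the normalized codebook therefore amounts to one matrix-vector product.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale v to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def rank_by_cosine(source: Vector,
                   codebook: Dict[str, Vector],
                   threshold: float) -> List[Tuple[str, float]]:
    """Return (pose_label, similarity) pairs above `threshold`, best first.

    The first entry, if any, corresponds to the initial vector v0.
    """
    s = normalize(source)
    labels = list(codebook)
    # Each row is a normalized stored vector; scoring all rows against s
    # is a single matrix-vector product over the whole DB.
    rows = [normalize(codebook[k]) for k in labels]
    sims = [sum(a * b for a, b in zip(row, s)) for row in rows]
    hits = [(k, sim) for k, sim in zip(labels, sims) if sim > threshold]
    return sorted(hits, key=lambda kv: kv[1], reverse=True)


# Toy usage with made-up 2D embeddings.
codebook = {"pose_a": [1.0, 0.0], "pose_b": [0.6, 0.8], "pose_c": [0.0, 1.0]}
ranked = rank_by_cosine([1.0, 0.1], codebook, threshold=0.5)
```

Here "pose_c" falls below the threshold and is dropped. The first remaining entry plays the role of v0.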

For example, FIG. 2(d) shows the initial vector v0, whose pose is noticeably different from that of the object region t.

To estimate the pose of the object accurately, the estimation apparatus of the present invention does not adopt the initial vector directly as the pose of the object. Instead, it may perform an additional process.

This additional process may be performed by the determination unit 350 and the estimator 370.

Returning to FIG. 1, the determination unit 350 may determine whether to use depth information of the object.

The depth information may include distance values between the object and a sensor such as a time-of-flight (ToF) camera, a range finder, or a vision sensor. The depth information may be acquired through the acquisition unit 310. The acquisition unit 310 may include such a sensor that generates depth information, or a communication means that receives depth information from such a sensor.

The estimator 370 may estimate the pose of the object using the 2D image.

Depending on the determination result of the determination unit 350, the estimator 370 may operate in a first mode. In the first mode, a plurality of first vectors are extracted in descending order of similarity from among the vectors whose similarity to the object region exceeds the reference value. Alternatively, the estimator 370 may operate in a second mode, in which a plurality of second vectors are extracted from among those vectors in ascending order of similarity (inverse similarity ranking).

FIG. 3 is a schematic diagram illustrating the operation of the estimation apparatus in the first mode.

In the first mode, the extractor 330 may extract a plurality of first vectors whose similarity to the object region t, which represents the object in the 2D image i, is higher than a first reference value.

For example, the extractor 330 may extract n vectors (where n is a natural number of 3 or more) whose similarity to the object region t in the 2D image i is higher than the reference value.

The extractor 330 may then extract m first vectors from among the n vectors in descending order of similarity, where m is a natural number of at least 2 and less than n.

Alternatively, the extractor 330 may skip extracting the n vectors and extract the m first vectors directly.

For example, the extractor 330 may extract m first vectors, in order of similarity ranking, from the vectors other than the initial vector v0. In this case, the extractor 330 may additionally extract the initial vector v0. As a result, the extractor 330 may extract one initial vector v0 and m first vectors.

Alternatively, the extractor 330 may extract m+1 first vectors without treating the initial vector v0 separately. In this case, the initial vector v0 is naturally included among the vectors extracted by the extractor 330.
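Assume the above-threshold vectors are already sorted best-first, so the first entry is v0. The two first-mode variants then reduce to taking a prefix of the ranking. The helper below is a hypothetical sketch; its name and the list-of-labels representation are assumptions.

```python
from typing import List, Sequence


def select_first_vectors(ranked: Sequence[str], m: int) -> List[str]:
    """First mode: v0 plus the next m vectors in descending similarity.

    `ranked` is assumed sorted best-first, so ranked[0] is v0. Taking
    v0 and then the top m of the remaining vectors yields the same list
    as taking the first m + 1 entries without singling out v0.
    """
    if not 2 <= m < len(ranked):
        raise ValueError("expected 2 <= m < n")
    v0, rest = ranked[0], ranked[1:]
    return [v0] + list(rest[:m])


# With m = 4 this yields v0..v4, the five candidates shown in FIG. 3.
candidates = select_first_vectors(["v0", "v1", "v2", "v3", "v4", "v5"], m=4)
```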

The estimator 370 may apply the poses of the first vectors (including the initial vector) extracted by the extractor 330 to the 3D model of the object.

The estimator 370 may obtain a plurality of 2D rendered images from the 3D model to which the poses of the first vectors v0, v1, v2, v3, and v4 have been applied. These 2D rendered images may be as shown in FIG. 3(b) and (c).

The estimator 370 may compare the 2D rendered images with the object region t shown in FIG. 3(a).

The estimator 370 may take the 2D rendered image with the highest similarity to the object region t and estimate the pose of the first vector used to produce it as the pose of the object. In FIG. 3, the 2D rendered image with the highest similarity to the object region t may be the fourth image from the left in (c). In this case, the estimator 370 may estimate the pose of the first vector v4, which was used to produce that fourth image, as the pose of the object.

Specifically, the estimator 370 may render the 3D model in 2D once for each first vector, applying that vector's pose. This produces as many 2D rendered images as there are first vectors.

The estimator 370 may apply a bounding box (BBox) method to confine the object region t within a bounding box region. Alternatively, it may apply a masking method to confine the object region t within a segmentation region.

FIG. 4 is a schematic diagram illustrating an object region t processed with bounding boxes according to an embodiment of the present invention.

By applying the BBox method or the masking method, the estimator 370 may represent the object region t in one of two ways: as a bounding box region BB, which has a simple polygonal shape, or as a segmentation region. BBox is short for bounding box and may refer to the smallest box that can fully contain the shape of a 2D or 3D object. The segmentation region may refer to a region obtained through image segmentation of the object 90.
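As a sketch of the bounding box definition above (the smallest box that contains the whole object), the following code does two things. It computes an axis-aligned box from a binary segmentation mask, and it crops an image to that box. Representing the mask as a list of 0/1 rows and the inclusive (row_min, col_min, row_max, col_max) convention are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]


def mask_bbox(mask: List[List[int]]) -> Optional[Box]:
    """Smallest inclusive box (row_min, col_min, row_max, col_max) that
    encloses every nonzero pixel of a binary mask; None if empty."""
    rows = [r for r, line in enumerate(mask) if any(line)]
    if not rows:
        return None
    cols = [c for line in mask for c, v in enumerate(line) if v]
    return (min(rows), min(cols), max(rows), max(cols))


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Return the sub-image inside an inclusive box."""
    r0, c0, r1, c1 = box
    return [row[c0:c1 + 1] for row in image[r0:r1 + 1]]


mask = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
box = mask_bbox(mask)
```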

In FIG. 4, three bounding boxes, 0, 1, and 2, are formed.

The estimator 370 may perform 2D matching between each 2D rendered image and the object region t confined to the bounding box region or segmentation region.

The estimator 370 may regard the 2D rendered image with the highest 2D matching score as the correct answer. It may then estimate the pose of the first vector used to produce that image as the pose of the object.
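The patent does not specify the 2D matching score. The sketch below makes two assumptions:

- Intersection-over-union (IoU) between binary silhouettes serves as a stand-in for the matching score.
- The rendered silhouettes for each candidate pose have already been produced by some renderer (not shown).

The function names are hypothetical.

```python
from typing import Dict, List

Mask = List[List[int]]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            inter += 1 if (x and y) else 0
            union += 1 if (x or y) else 0
    return inter / union if union else 0.0


def pick_best_pose(target: Mask, renders: Dict[str, Mask]) -> str:
    """Label of the candidate pose whose rendered silhouette best matches
    the (cropped) object region."""
    return max(renders, key=lambda pose: iou(target, renders[pose]))


target = [[1, 1], [0, 0]]
renders = {"v0": [[0, 0], [1, 1]], "v2": [[1, 0], [0, 0]], "v4": [[1, 1], [0, 0]]}
best = pick_best_pose(target, renders)
```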

FIG. 5 is a schematic diagram illustrating the operation of the estimation apparatus in the second mode.

The second mode may be an operating mode in which the estimation apparatus estimates the pose of the object using depth information of the object. If the depth information is highly accurate, the pose of the object can be analyzed accurately from the depth information alone. If the depth information is inaccurate, however, the pose must be analyzed by another method, and the second mode may be used as part of that approach.

In the second mode, the extractor 330 may extract a plurality of vectors whose similarity to the object region t in the 2D image satisfies a second reference value.

As shown in FIG. 5(b), the extractor 330 may extract the initial vector v0, which has the highest similarity to the object region t among the plurality of vectors. The extractor 330 may also extract a plurality of second vectors from among the plurality of vectors in ascending order of similarity to the object region t.

For example, the extractor 330 may extract m second vectors, where m is at least 2 and less than n.

Assume that n is 8 and m is 4. Suppose eight vectors v1, v2, v3, v4, v5, v6, v7, and v8 are sorted in descending order of similarity. The extractor 330 may then extract four second vectors, v8, v7, v6, and v5, in ascending order of similarity, as shown in FIG. 5(c).
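The second-mode selection in the n = 8, m = 4 example can be written as a short sketch. It assumes the n candidates are held as labels sorted best-first and that v0 is passed separately, as in the example above. Function names are hypothetical.

```python
from typing import List, Sequence


def select_second_vectors(ranked: Sequence[str], m: int) -> List[str]:
    """Return the m lowest-similarity vectors from `ranked`, lowest first.

    `ranked` holds the n candidate vectors sorted best-first.
    """
    if not 2 <= m < len(ranked):
        raise ValueError("expected 2 <= m < n")
    return list(reversed(ranked))[:m]


def second_mode_candidates(v0: str, ranked: Sequence[str], m: int) -> List[str]:
    """Candidate set for the second mode: v0 plus the second vectors."""
    return [v0] + select_second_vectors(ranked, m)


ranked = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"]
second = select_second_vectors(ranked, 4)
```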

The estimator 370 may estimate the pose of the object using the 2D image i and the depth information of the object. Specifically, the estimator 370 may estimate the pose using at least one of three inputs: the pose corresponding to the initial vector v0, the poses of the second vectors, and the depth information.

The estimator 370 may apply the pose of the initial vector v0 and the poses of the second vectors v8, v7, v6, and v5 to the 3D model of the object, respectively.

The estimator 370 may use the 3D models to which the pose of the initial vector v0 and the poses of the second vectors v8, v7, v6, and v5 have been applied. From these, it may generate one model point cloud per vector, that is, as many model point clouds as the initial vector plus the second vectors.
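To illustrate how one model point cloud per candidate pose might be produced, the sketch below makes two assumptions:

- Each pose is given as a 3x3 rotation matrix R and a translation t.
- Points sampled from the 3D model's surface are already available.

The rot_z helper and the dict-of-poses layout are also illustrative assumptions. The patent does not fix a pose parameterization.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Matrix3 = Sequence[Sequence[float]]


def rot_z(deg: float) -> List[List[float]]:
    """Rotation matrix about the z axis (illustrative pose parameterization)."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(points: Sequence[Point], R: Matrix3, t: Point) -> List[Point]:
    """Rigidly transform model points: p' = R p + t."""
    return [tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
            for p in points]


def model_point_clouds(model_points: Sequence[Point],
                       poses: Dict[str, Tuple[Matrix3, Point]]) -> Dict[str, List[Point]]:
    """One model point cloud per candidate pose (v0 and each second vector)."""
    return {label: apply_pose(model_points, R, t) for label, (R, t) in poses.items()}


clouds = model_point_clouds(
    [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    {"v0": (rot_z(0.0), (0.0, 0.0, 0.0)), "v8": (rot_z(45.0), (0.0, 0.0, 0.0))},
)
```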

A point cloud is a way of representing 3D data. It may refer to a format in which the object (its outer surface) is represented as a set of points.

The estimator 370 may generate a target point cloud using the depth information.
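The patent does not say how the target point cloud is built from depth information. One common approach, assumed here, is to back-project each depth pixel through a pinhole camera model with intrinsics fx, fy, cx, cy. These parameters, and the convention that a non-positive depth marks an invalid pixel, are assumptions.

```python
from typing import List, Sequence, Tuple


def depth_to_points(depth: Sequence[Sequence[float]],
                    fx: float, fy: float,
                    cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Back-project a depth image into a target point cloud (pinhole model).

    Pixel (u, v) with depth z maps to ((u - cx) z / fx, (v - cy) z / fy, z).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


target_cloud = depth_to_points([[0.0, 2.0], [1.0, 0.0]], fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```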

Using a 3D matching technique, the estimator 370 may correct the plurality of model point clouds so that each moves toward the single target point cloud. In general, this correction is only partial: a model point cloud may not fully reach the target pose.

For example, suppose the target point cloud is rotated by 30 degrees and a given model point cloud is rotated by 15 degrees. With a 3D matching technique, the model point cloud may be rotated in the direction that makes it coincide with (follow) the target point cloud. This rotation may not bring the model point cloud all the way to 30 degrees. Instead, it may be corrected to an angle between 15 and 30 degrees, for example 20 degrees. There are many 3D matching techniques, and the 3D matching algorithm may be chosen according to the keypoint descriptor. For example, algorithms from the Iterative Closest Point (ICP), RANdom SAmple Consensus (RANSAC), or Gaussian Mixture Model (GMM) families may be used.
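As a simplified illustration of the ICP family named above, the following is a planar (2D) point-to-point ICP sketch. It uses brute-force nearest-neighbour search and a closed-form rigid fit. It is not the patent's 3D matching step; a real system would work in 3D and would typically use an optimized library. ICP can stop short of the target, either because of a fixed iteration budget or because it stalls in a local minimum. This is one way the partial correction in the 15-to-30-degree example can arise. The function names and the L-shaped test cloud are illustrative.

```python
import math
from typing import List, Sequence, Tuple

P2 = Tuple[float, float]


def _nearest(p: P2, cloud: Sequence[P2]) -> P2:
    """Brute-force nearest neighbour of p in cloud."""
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def mean_sq_error(src: Sequence[P2], target: Sequence[P2]) -> float:
    """Mean squared distance from each source point to its nearest target point."""
    total = 0.0
    for p in src:
        q = _nearest(p, target)
        total += (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return total / len(src)


def best_rigid_2d(src: Sequence[P2], dst: Sequence[P2]) -> Tuple[float, float, float]:
    """Closed-form rotation angle and translation minimizing sum |R a + t - b|^2."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    dot = cross = 0.0
    for (ax, ay), (bx, by) in zip(src, dst):
        ax, ay, bx, by = ax - sx, ay - sy, bx - dx, by - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # t = centroid(dst) - R * centroid(src)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def icp_2d(model: Sequence[P2], target: Sequence[P2],
           iterations: int = 20) -> Tuple[List[P2], List[float]]:
    """Move `model` toward `target`; returns (aligned points, error per step)."""
    cur = list(model)
    errors = [mean_sq_error(cur, target)]
    for _ in range(iterations):
        matches = [_nearest(p, target) for p in cur]
        theta, tx, ty = best_rigid_2d(cur, matches)
        c, s = math.cos(theta), math.sin(theta)
        cur = [(c * x - s * y + tx, s * x + c * y + ty) for x, y in cur]
        errors.append(mean_sq_error(cur, target))
    return cur, errors


def rotate(points: Sequence[P2], deg: float) -> List[P2]:
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [(c * x - s * y, s * x + c * y) for x, y in points]


shape = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
aligned, errors = icp_2d(rotate(shape, 15.0), rotate(shape, 30.0))
```

Each iteration first fits the best rigid motion to the current correspondences and then re-matches. Because of this, the mean squared nearest-neighbour error cannot increase from one step to the next.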

The estimator 370 may generate a plurality of 2D rendered images by applying the poses of the corrected model point clouds to the 3D model.

Among the 2D rendered images, the estimator 370 may select the one with the highest similarity to the object region t. It may estimate that image's pose vt as the pose of the object.

In the second mode, the candidate set of object poses may include a 2D rendered image of the 3D model derived from the initial vector v0, the vector most similar to the object region t. That 3D model has v0's pose applied and has been corrected using depth information. In general, a 2D rendered image derived from the initial vector v0 tends to resemble the object region t.

Experimentally, however, in about 10 to 30% of cases the initial vector v0 turns out to be unrelated to the object region t. The 2D rendered image derived from v0 may then look completely different from the object region t. In such cases, a vector resembling the actual object region t tends to be found among the low-similarity vectors. To cover this tendency, the estimator 370 may use the low-ranked second vectors to estimate the pose of the object.

However, the pose of a low-similarity second vector must be corrected, and depth information may be used for that correction. The second mode therefore always requires depth information. Accordingly, when depth information is used, the estimation apparatus preferably operates in the second mode, which also covers the case in which the initial vector v0 differs from the object region t. If depth information is not used, the estimation apparatus necessarily operates in the first mode, which does not use depth information.

A sensor capable of measuring depth (such as the above-described ToF sensor, a stereo camera, or a range sensor) may be present and depth information may be available. Even then, the estimator 370 may decide not to use the depth information if its reliability is low. In some cases, the user may manually select an option to disable the use of depth information.
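The decision described above could be sketched as follows. The reliability test is purely an illustrative assumption: it compares the fraction of pixels with a positive depth reading against min_valid_ratio, with a default of 0.5. The patent does not define how reliability is measured, and the function name is hypothetical.

```python
from typing import Optional, Sequence


def should_use_depth(depth: Optional[Sequence[Sequence[float]]],
                     min_valid_ratio: float = 0.5,
                     user_disabled: bool = False) -> bool:
    """True selects the second mode (depth used); False selects the first mode.

    Reliability is approximated by the fraction of pixels with a positive
    depth value; this criterion and its default are illustrative only.
    """
    if user_disabled or depth is None:
        return False
    values = [z for row in depth for z in row]
    if not values:
        return False
    valid = sum(1 for z in values if z > 0)
    return valid / len(values) >= min_valid_ratio
```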

In the second mode as well, the BBox method may be applied to confine the object region t within a bounding box region. Alternatively, the masking method may be applied to confine it within a segmentation region. Confining the object region in either way can reduce the amount of computation.
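Confining a point cloud to a region before matching simply discards the points outside it. The sketch below assumes a 3D axis-aligned box given by its lower and upper corners. The bounding box in FIG. 4 is drawn in the image plane, so the 3D box here is an illustrative assumption, as is the function name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def crop_to_box(points: Sequence[Point], lo: Point, hi: Point) -> List[Point]:
    """Keep only points inside the axis-aligned box [lo, hi] (inclusive)."""
    return [p for p in points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]


cloud = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (5.0, 5.0, 5.0)]
kept = crop_to_box(cloud, (0.0, 0.0, 0.0), (2.0, 2.0, 2.0))
```

Fewer points entering the nearest-neighbour search is where the reduction in computation comes from.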

FIG. 6 is a flowchart illustrating an estimation method according to an embodiment of the present invention.

The estimation method shown in FIG. 6 may be performed by the estimation apparatus of FIG. 1.

The estimation method of the present invention may include an acquisition step (S710), an extraction step (S720), a determination step (S730), and an execution step (S740, S750).

In the acquisition step (S710), a 2D image i of the object may be acquired. In some cases, depth information of the object and a 3D model of the object may also be acquired in this step. The acquisition step (S710) may be performed by the acquisition unit 310.

In the extraction step (S720), a plurality of vectors may be extracted whose similarity to the object region t, which represents the object in the 2D image i, is higher than a reference value. Through this step, n vectors with similarity higher than the reference value may be extracted. These n vectors may include the initial vector v0, which has the highest similarity. The extraction step (S720) may be performed by the extractor 330.

The extraction step (S720) mechanically identifies vectors that resemble the object region t in the image plane. However, errors in this mechanical judgment can cause a vector and pose different from those of the actual object to be selected as the object's pose. To minimize such errors, the determination step (S730) and the execution step (S740, S750) may additionally be performed.

In the determination step (S730), it may be determined whether to use the depth information of the object. The determination step (S730) may be performed by the determination unit 350.

In the execution step (S740, S750), the first mode (S740) may be executed if it is determined that depth information is not used. The second mode (S750) may be executed if it is determined that depth information is used. The execution step (S740, S750) may be performed by the estimator 370, while the extraction of the first and second vectors during this step may be performed by the extractor 330.
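The control flow of steps S710 to S750 might be organized as below. This is a hypothetical skeleton: each stage is injected as a callable whose name and signature are assumptions. The sketch therefore shows only the branching on the depth decision, not any particular model or renderer.

```python
from typing import Any, Callable, List, Optional


def estimate_pose(image: Any,
                  depth: Optional[Any],
                  extract: Callable[[Any], List[Any]],
                  use_depth: Callable[[Optional[Any]], bool],
                  first_mode: Callable[[Any, List[Any]], Any],
                  second_mode: Callable[[Any, Any, List[Any]], Any]) -> Any:
    """Skeleton of S710 to S750; image and depth are the S710 outputs."""
    candidates = extract(image)                       # S720: ranked, v0 first
    if use_depth(depth):                              # S730: depth decision
        return second_mode(image, depth, candidates)  # S750: second mode
    return first_mode(image, candidates)              # S740: first mode
```

Keeping the stages injectable lets the same branching be reused whichever ranking, rendering, or matching method is plugged in.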

FIG. 7 is a flowchart illustrating the first mode according to an embodiment of the present invention.

The first mode (S740) may include a plurality of steps (S741, S742, S743, S744, S745, and S746).

In the first mode, a plurality of first vectors whose similarity to the object region t is higher than the first reference value may be extracted from among the plurality of vectors (S741). For example, the extractor 330 operating in the first mode may extract m first vectors from among the n vectors.

In the first mode, the pose of each first vector may be applied to the 3D model of the object (S742). This operation may be performed by the estimator 370.

In the first mode, a plurality of 2D rendered images may be obtained from the 3D models to which the poses of the first vectors have been applied (S743, S744, S745). This operation may be performed by the estimator 370.

To reduce the processing load, the estimator 370 may apply the BBox method or the masking method to the object region t (S743).

If the BBox method is applied, the estimator 370 may confine the object region t within a bounding box region as shown in FIG. 4 (S744).

If the masking method is applied, the estimator 370 may confine the object region t within a segmentation region (S745).

In the first mode, the 2D rendered images may be compared with the object region t. If the object region t has been confined within a bounding box region or segmentation region, that region may be used instead of the object region t as the reference against which the 2D rendered images are compared. The first mode may then take the 2D rendered image with the highest similarity to the object region. The pose of the first vector used to produce that image may be estimated as the pose of the object (S746). This operation may be performed by the estimator 370.

FIG. 8 is a flowchart illustrating the second mode according to an embodiment of the present invention.

The second mode (S750) may include a plurality of steps (S751, S752, S753, S754, S755, S756, S757, and S758).

In the second mode, the initial vector v0, which has the highest similarity to the object region t, may be extracted from among the plurality of vectors. In addition, a plurality of second vectors may be extracted from among the plurality of vectors in ascending order of similarity to the object region t, with reference to the second reference value (S751). When operating in the second mode, the extractor 330 may first extract n vectors in descending order of similarity. It may then extract m second vectors from within those n vectors in ascending order of similarity.

In the second mode, the pose of the initial vector v0 and the poses of the second vectors may each be applied to the 3D model of the object (S752). Using the resulting 3D models, as many model point clouds as the initial vector plus the second vectors may be generated (S752). These operations may be performed by the estimator 370.

In the second mode, a target point cloud may be generated using the depth information. This operation may be performed by the estimator 370.

To reduce the processing load of the model point clouds and the target point cloud, the estimator 370 may selectively apply the BBox method or the masking method (S753).

If the BBox method is selected, the estimator 370 may confine the model point clouds, the target point cloud, or both within a bounding box region (S754).

If the masking method is selected, the estimator 370 may confine the model point clouds, the target point cloud, or both within a segmentation region (S755).

In the second mode, the model point clouds may be corrected toward the single target point cloud using a 3D matching technique (S756). This operation may be performed by the estimator 370.

In the second mode, a plurality of 2D rendered images may be generated by applying the poses of the corrected model point clouds to the 3D model (S757). This operation may be performed by the estimator 370.

In the second mode, the 2D rendered image with the highest similarity to the object region may be selected. Its pose vt may be estimated as the pose of the object (S758). This operation may be performed by the estimator 370.

FIG. 9 is a diagram illustrating a computing device according to an embodiment of the present invention. The computing device TN100 of FIG. 9 may be a device described in this specification (e.g., the estimation apparatus). In the embodiment of FIG. 9, the computing device TN100 may include at least one processor TN110, a transceiver TN120, and a memory TN130. The computing device TN100 may further include a storage device TN140, an input interface device TN150, an output interface device TN160, and the like. The components of the computing device TN100 may be connected by a bus TN170 to communicate with one another.

The processor TN110 may execute program commands stored in at least one of the memory TN130 and the storage device TN140. The processor TN110 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed. The processor TN110 may be configured to implement the procedures, functions, and methods described in connection with embodiments of the present invention. It may also control each component of the computing device TN100.

Each of the memory TN130 and the storage device TN140 may store various information related to the operation of the processor TN110, and each may be configured as at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory TN130 may include at least one of a read-only memory (ROM) and a random access memory (RAM).

The transceiver TN120 may transmit or receive wired or wireless signals, and may be connected to a network to perform communication.

The various methods according to the embodiments of the present invention described above may be implemented as programs readable by various computer means and recorded on a computer-readable recording medium. The recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software. Examples of the recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. Such hardware devices may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

An embodiment of the present invention has been described above. Those of ordinary skill in the art may variously modify and change the present invention by adding, changing, or deleting components without departing from the spirit of the present invention as set forth in the claims, and such modifications and changes also fall within the scope of the present invention.

90...Object 310...Acquisition unit
330...Extraction unit 350...Determination unit
370...Estimator

Claims

1. An estimation device comprising:
an acquisition unit configured to acquire a two-dimensional image of an object;
an extraction unit configured to extract a plurality of vectors whose similarity to an object region representing the object in the two-dimensional image is higher than a reference value;
a determination unit configured to determine whether to use depth information of the object; and
an estimator configured to estimate a posture of the object using the two-dimensional image,
wherein, when the determination unit determines that the depth information of the object is to be used, the estimator operates in a second mode in which a plurality of second vectors are extracted from the plurality of vectors in ascending order of similarity,
in the second mode, the estimator extracts the plurality of second vectors from among the plurality of vectors extracted by the extraction unit in ascending order of similarity to the object region, and estimates the posture of the object using the postures of the plurality of second vectors,
the plurality of vectors consists of n vectors, where n is a natural number of 3 or more, and
the plurality of second vectors consists of m vectors, where m is at least 2 and less than n.
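The mode-dependent choice of candidate vectors in claims 1 and 5 can be sketched as follows. This is a hypothetical illustration, not the claimed implementation. The rule by which the determination unit decides whether to use depth is not given here, so it enters as a `use_depth` flag. The count m is left as a free parameter. `Candidate` and `select_candidates` are invented names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Candidate:
    pose: Tuple[float, float, float]  # hypothetical (roll, pitch, yaw) in degrees
    similarity: float                 # similarity of this vector to the object region


def select_candidates(candidates: List[Candidate], reference: float,
                      m: int, use_depth: bool) -> List[Candidate]:
    """Pick the vectors handed to the estimator in the first or second mode."""
    # Extraction unit: keep only vectors whose similarity exceeds the reference value.
    above = [c for c in candidates if c.similarity > reference]
    n = len(above)
    if use_depth:
        # Second mode (claim 1): lowest similarity first, with n >= 3 and 2 <= m < n.
        if n < 3 or not 2 <= m < n:
            raise ValueError("second mode requires n >= 3 and 2 <= m < n")
        return sorted(above, key=lambda c: c.similarity)[:m]
    # First mode (claim 5): highest similarity first.
    return sorted(above, key=lambda c: c.similarity, reverse=True)[:m]
```

For example, with similarities 0.9, 0.6, 0.8, 0.4, and 0.7, a reference value of 0.5, and m = 2, the second mode returns the vectors scoring 0.6 and 0.7. The first mode returns those scoring 0.9 and 0.8.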

2. The estimation device of claim 1, wherein the extraction unit uses at least one of a deep learning technique and a similarity determination technique to generate the plurality of vectors in order of similarity to the object region, to extract the plurality of vectors in order of similarity to the object region from a three-dimensional model of the object, or to extract the plurality of vectors in order of similarity to the object region from a database in which the plurality of vectors are stored in advance.

3. The estimation device of claim 1, wherein the extraction unit divides the entire angular range that the object can take into a plurality of angular ranges, compares the object region with the vectors corresponding to each of the divided angular ranges, and extracts the plurality of vectors whose similarity is higher than the reference value.
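The angular partitioning of claim 3 can be sketched as follows. This is a simplified, hypothetical version. It assumes that each candidate carries a single rotation angle, such as yaw in degrees, together with its similarity to the object region. It also keeps one best candidate per angular range, although the claim does not fix how many vectors are taken from each range. `best_per_angle_bin` is an invented name.

```python
from typing import Dict, List, Tuple


def best_per_angle_bin(candidates: List[Tuple[float, float]], reference: float,
                       num_bins: int, full_range: float = 360.0) -> List[Tuple[float, float]]:
    """Split [0, full_range) into num_bins equal ranges and keep, per range, the most
    similar (angle, similarity) candidate whose similarity exceeds `reference`."""
    width = full_range / num_bins
    best: Dict[int, Tuple[float, float]] = {}
    for angle, sim in candidates:
        if sim <= reference:
            continue
        # Wrap the angle into the full range and find its bin; min() guards against
        # float rounding placing a value just below full_range in bin num_bins.
        b = min(int((angle % full_range) // width), num_bins - 1)
        if b not in best or sim > best[b][1]:
            best[b] = (angle, sim)
    return [best[b] for b in sorted(best)]
```

Taking the best candidate from each range spreads the retained vectors across the whole angular range, so they do not cluster around a single pose.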

4. The estimation device of claim 1, wherein
the extraction unit detects the object region in the two-dimensional image using a deep-learning-based object detection algorithm,
the extraction unit localizes the object region,
the extraction unit applies the object region as an input to a pre-trained deep learning generative model,
the extraction unit measures the similarity between a source vector, obtained by embedding the object region as a source in the encoder part of the deep learning generative model, and stored vectors held in a codebook, and
the extraction unit extracts, from among the stored vectors, the plurality of vectors having high similarity to the object region according to the measured similarity.
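The codebook lookup in claim 4 can be sketched as follows. This is a minimal illustration under stated assumptions. Object detection, localization, and the generative model's encoder are outside its scope, so the source vector is taken as already computed. Each codebook entry is assumed to pair a pose identifier with a stored vector. Cosine similarity is an assumed choice of measure. `cosine_similarity` and `query_codebook` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def query_codebook(source: Sequence[float],
                   codebook: List[Tuple[str, Sequence[float]]],
                   reference: float, k: int) -> List[Tuple[str, float]]:
    """Return up to k (pose_id, similarity) pairs whose stored vector is more similar
    to the encoder's source vector than `reference`, most similar first."""
    scored = [(pose_id, cosine_similarity(source, vec)) for pose_id, vec in codebook]
    kept = [item for item in scored if item[1] > reference]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:k]
```

For large codebooks, the linear scan would normally be replaced by a batched matrix product or an approximate nearest-neighbour index. The ranking logic stays the same.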

5. The estimation device of claim 1, wherein, when the determination unit determines that the depth information of the object is not to be used, the estimator operates in a first mode in which a plurality of first vectors are extracted from the plurality of vectors in descending order of similarity.

6. (Deleted)

7. An estimation device comprising:
an extraction unit configured to extract a plurality of vectors whose similarity to an object region included in a two-dimensional image of an object satisfies a second reference value; and
an estimator configured to estimate a posture of the object using the two-dimensional image and depth information of the object,
wherein the extraction unit extracts, from among the plurality of vectors, a plurality of second vectors in ascending order of similarity to the object region,
the estimator estimates the posture of the object using the postures of the plurality of second vectors,
the plurality of vectors consists of n vectors, where n is a natural number of 3 or more, and
the plurality of second vectors consists of m vectors, where m is at least 2 and less than n.

8. The estimation device of claim 7, wherein
the estimator applies each of the postures of the plurality of second vectors to a three-dimensional model of the object,
the estimator uses the three-dimensional models to which the postures of the plurality of second vectors are applied to generate as many model point clouds as there are second vectors,
the estimator generates a target point cloud using the depth information,
the estimator uses a three-dimensional matching technique to correct each of the plurality of model point clouds so that it follows the single target point cloud,
the estimator generates a plurality of two-dimensional rendered images by applying the corrected postures of the plurality of model point clouds to the three-dimensional model, and
the estimator estimates, as the posture of the object, the posture of the particular two-dimensional rendered image having the highest similarity to the object region among the plurality of two-dimensional rendered images.
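The point-cloud correction in claim 8 can be sketched with iterative closest point (ICP). ICP is swapped in here as one common "three-dimensional matching technique"; the claim does not name a specific algorithm. To keep the closed-form alignment step in standard-library Python, this sketch assumes the correction is limited to a rotation about the vertical z axis plus a 3-D translation. That assumption fits, for example, objects resting flat in a bin. A full 6-DOF version would replace `_fit_yaw_translation` with an SVD-based (Kabsch) fit. Generating point clouds from the model and the depth map is not shown. The final choice among the corrected candidates is made by rendered-image similarity, as the claim states, not by the residual returned here. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def _nearest(p: Point, cloud: Sequence[Point]) -> Point:
    """Brute-force nearest neighbour (O(N) per query; a k-d tree would be used at scale)."""
    return min(cloud, key=lambda q: math.dist(p, q))


def _fit_yaw_translation(src: Sequence[Point], dst: Sequence[Point]) -> Tuple[float, Point]:
    """Least-squares rotation about z plus 3-D translation mapping src[i] onto dst[i]."""
    n = len(src)
    cs = [sum(p[k] for p in src) / n for k in range(3)]
    cd = [sum(q[k] for q in dst) / n for k in range(3)]
    s_dot = s_cross = 0.0
    for p, q in zip(src, dst):
        px, py = p[0] - cs[0], p[1] - cs[1]
        qx, qy = q[0] - cd[0], q[1] - cd[1]
        s_dot += px * qx + py * qy
        s_cross += px * qy - py * qx
    theta = math.atan2(s_cross, s_dot)  # angle maximising sum of q' . R(theta) p'
    c, s = math.cos(theta), math.sin(theta)
    t = (cd[0] - (c * cs[0] - s * cs[1]), cd[1] - (s * cs[0] + c * cs[1]), cd[2] - cs[2])
    return theta, t


def transform(cloud: Sequence[Point], theta: float, t: Point) -> List[Point]:
    """Rotate by theta about z, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]) for x, y, z in cloud]


def icp_yaw(model: Sequence[Point], target: Sequence[Point],
            max_iters: int = 30, tol: float = 1e-10) -> Tuple[List[Point], float]:
    """Move `model` toward `target`; return the corrected cloud and its mean NN distance."""
    current = list(model)
    err = float("inf")
    for _ in range(max_iters):
        matches = [_nearest(p, target) for p in current]   # correspondence step
        theta, t = _fit_yaw_translation(current, matches)  # alignment step
        current = transform(current, theta, t)
        new_err = sum(math.dist(p, _nearest(p, target)) for p in current) / len(current)
        converged = abs(err - new_err) < tol
        err = new_err
        if converged:
            break
    return current, err


def refine_candidates(model_clouds: Sequence[Sequence[Point]],
                      target: Sequence[Point]) -> List[Tuple[List[Point], float]]:
    """Correct every model point cloud (one per second vector) toward the single target cloud."""
    return [icp_yaw(cloud, target) for cloud in model_clouds]
```

ICP only converges to the correct alignment when the initial pose is already close to it. Starting from several candidate postures, as the claim does, is one way to reduce the risk of settling in a wrong local minimum.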

9. An estimation method performed by an estimation device, the method comprising:
acquiring a two-dimensional image of an object;
extracting a plurality of vectors whose similarity to an object region representing the object in the two-dimensional image is higher than a reference value;
determining whether to use depth information of the object; and
executing a second mode when it is determined that the depth information is to be used,
wherein the second mode comprises:
extracting, from among the plurality of vectors, a plurality of second vectors in ascending order of similarity to the object region, the similarity being lower than a second reference value; and
estimating the posture of the object using the plurality of second vectors,
wherein the plurality of vectors consists of n vectors, where n is a natural number of 3 or more, and
the plurality of second vectors consists of m vectors, where m is at least 2 and less than n.

10. The method of claim 9, further comprising executing a first mode when it is determined that the depth information is not to be used,
wherein the first mode comprises:
extracting, from among the plurality of vectors, a plurality of first vectors whose similarity to the object region is higher than a first reference value;
applying the posture of each of the plurality of first vectors to a three-dimensional model of the object;
obtaining a plurality of two-dimensional rendered images from the plurality of three-dimensional models to which the postures of the first vectors are applied;
comparing the plurality of two-dimensional rendered images with the object region; and
estimating, as the posture of the object, the posture of the particular first vector used to obtain the particular two-dimensional rendered image having the highest similarity to the object region.
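The "apply posture, then render" steps of the first mode can be sketched as follows. This is a toy stand-in for a real renderer, under stated assumptions. The three-dimensional model is assumed to be a set of points, and each posture is reduced to a single yaw angle in degrees. "Rendering" is an orthographic projection along z onto a binary mask, one pixel per model unit. A practical system would rasterize a mesh with a proper camera model. The resulting masks are then compared with the object region, and the best match is kept. `apply_yaw`, `render_silhouette`, and `render_candidates` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Mask = List[List[int]]


def apply_yaw(points: Sequence[Point], yaw_deg: float) -> List[Point]:
    """Apply a (hypothetical, 1-DOF) posture: rotate model points about the z axis."""
    a = math.radians(yaw_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def render_silhouette(points: Sequence[Point], width: int, height: int,
                      scale: float = 1.0) -> Mask:
    """Orthographic projection along z onto a width x height binary mask centred on the origin."""
    mask = [[0] * width for _ in range(height)]
    for x, y, _z in points:
        col = round(x * scale + (width - 1) / 2)
        row = round((height - 1) / 2 - y * scale)  # image rows grow downward
        if 0 <= row < height and 0 <= col < width:
            mask[row][col] = 1
    return mask


def render_candidates(model: Sequence[Point], yaws: Sequence[float],
                      width: int, height: int) -> List[Tuple[float, Mask]]:
    """One rendered silhouette per first-vector posture, paired with that posture."""
    return [(yaw, render_silhouette(apply_yaw(model, yaw), width, height)) for yaw in yaws]
```

For a three-point bar along +x, a yaw of 0 lights the centre row from the middle to the right edge, while a yaw of 90 lights the centre column from the middle to the top.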

11. The method of claim 9, wherein the second mode comprises:
applying each of the postures of the plurality of second vectors to a three-dimensional model of the object;
generating as many model point clouds as there are second vectors, using the three-dimensional models to which the postures of the plurality of second vectors are applied;
generating a target point cloud using the depth information;
correcting, using a three-dimensional matching technique, each of the plurality of model point clouds so that it follows the single target point cloud;
generating a plurality of two-dimensional rendered images by applying the corrected postures of the plurality of model point clouds to the three-dimensional model; and
estimating, as the posture of the object, the posture of the particular two-dimensional rendered image having the highest similarity to the object region among the plurality of two-dimensional rendered images.