KR102177445B1 - An apparatus for pose estimation of object using latent variable from auto encoder and method thereof

Info

Publication number: KR102177445B1
Application number: KR1020200018950A
Authority: KR
Inventors: 박종훈; 강준수
Original assignee: 주식회사 뉴로메카
Priority date: 2020-02-17
Filing date: 2020-02-17
Publication date: 2020-11-11

Abstract

Provided is a method for estimating a posture of an object using a latent variable dictionary of a patch auto-encoder, which is performed by a computing device. The method comprises the steps of: generating a plurality of latent variables corresponding to a plurality of image patches respectively from an image including a target object by using a patch auto-encoder, wherein the image patch includes a central point; determining a position on a three-dimensional model of the central point of an image patch corresponding to selected latent variables with respect to N selected latent variables (where N is a natural number equal to or greater than 2) predetermined among the plurality of latent variables by using the latent variable dictionary connecting a position on the three-dimensional model of the central point of the image patch according to a latent variable value; and determining the posture of the target object by using the position on the three-dimensional model of the central point of the image patch corresponding to the selected latent variables. According to the present invention, it is possible to select feature points with high reliability from the detection results, thereby increasing robustness of object posture estimation.

Description

Device for estimating the attitude of an object using a dictionary of latent variables of an auto encoder {AN APPARATUS FOR POSE ESTIMATION OF OBJECT USING LATENT VARIABLE FROM AUTO ENCODER AND METHOD THEREOF}

The present invention relates to image processing, and more particularly, to an apparatus and method for extracting feature points of an object included in an image and estimating the posture of the object.

When certain feature points of an object included in a 2D image have been detected, the position and posture of the object's model can be estimated in reverse if the positions of those feature points on the 3D model are known. To this end, various traditional descriptors such as SIFT and ORB have been used, as have various methods that estimate specific points through deep learning. The newer deep learning-based methods show better results than the traditional ones, but have the disadvantage that the feature points must be specified in advance for training.

Korean Patent Application Publication No. 2014-0134154 ("Method and terminal for extracting an object from an image", 주식회사 브이플랩)

An object of the present invention, made to solve the above problem, is to provide an object posture estimation method using a latent variable dictionary of a patch auto encoder. Owing to the nature of deep learning, the method is robust and accurate with respect to environmental variables such as lighting and background. Unlike existing deep learning approaches, it selects by itself the feature points that are easy to detect, without requiring feature points to be specified in advance, and it can also select only highly reliable feature points from the detection results, thereby increasing the robustness of object posture estimation.

Another object of the present invention, made to solve the above problem, is to provide an object posture estimation apparatus using a latent variable dictionary of a patch auto encoder. Owing to the nature of deep learning, the apparatus is robust and accurate with respect to environmental variables such as lighting and background. Unlike existing deep learning approaches, it selects by itself the feature points that are easy to detect, without requiring feature points to be specified in advance, and it can also select only highly reliable feature points from the detection results, thereby increasing the robustness of object posture estimation.

However, the problems to be solved by the present invention are not limited thereto, and may be variously extended without departing from the spirit and scope of the present invention.

To achieve the above object, an object posture estimation method using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention may be performed by a computing device. The method may include: generating, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point; determining, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a 3D model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that associates latent variable values with the positions of image patch center points on the 3D model; and determining the posture of the target object using the positions on the 3D model of the center points of the image patches corresponding to the selected latent variables.

According to an aspect, the patch auto encoder may be a deep learning-based auto encoder trained to receive a plurality of training image patches included in at least one item of training data, convert them into latent variables respectively corresponding to the training image patches, and convert the latent variables back into the training image patches for output.

According to an aspect, the center point of the image patch corresponding to a selected latent variable may be a feature point of the target object.

According to an aspect, the N predetermined selected latent variables may be N latent variables chosen from among the plurality of latent variables in order of narrowest mapped 3D region, where the mapped 3D region is obtained by mapping a fixed latent-variable region around each latent variable to a 3D region.

According to an aspect, the N predetermined selected latent variables may be chosen so as to have higher precision than would be obtained by selecting other latent variables.

According to an aspect, the N predetermined selected latent variables may be N latent variables chosen from among the plurality of latent variables in order of widest mapped latent-variable region, where the latent-variable region is the region around each latent variable that maps to a fixed 3D region.

According to an aspect, the N predetermined selected latent variables may be chosen so as to have robustness and generality over a wider range of inputs than would be obtained by selecting other latent variables.

According to an aspect, a first latent variable may be excluded from the selected latent variables in response to a determination that the 3D region matched with the region of neighboring latent variables located within a predetermined distance of the first latent variable lies at least a predetermined distance away from the 3D region corresponding to the first latent variable.

According to an aspect, the N predetermined selected latent variables may be chosen so that outliers are excluded from the feature points.

According to an aspect, determining the posture of the target object may include estimating the position and posture of the target object by applying a Perspective-n-Point (PnP) algorithm.
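For orientation, the Perspective-n-Point problem is usually stated with the pinhole camera model shown below. This is the standard textbook formulation rather than anything specified in this document: the intrinsic matrix $K$, the scale factors $s_i$, and the projection $\pi$ are assumptions introduced for illustration. Here $(u_i, v_i)$ is the image position of the $i$-th selected patch center and $\mathbf{X}_i = (X_i, Y_i, Z_i)$ is its position on the 3D model as returned by the latent variable dictionary.

```latex
\[
s_i \begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
  = K \left[\, R \mid t \,\right]
    \begin{bmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{bmatrix},
\qquad i = 1, \dots, N
\]
\[
(\hat{R}, \hat{t}) = \arg\min_{R,\, t} \sum_{i=1}^{N}
  \left\| \begin{bmatrix} u_i \\ v_i \end{bmatrix}
  - \pi\!\left( K \left( R\, \mathbf{X}_i + t \right) \right) \right\|^2,
\qquad
\pi\!\left( \begin{bmatrix} a \\ b \\ c \end{bmatrix} \right)
  = \begin{bmatrix} a / c \\ b / c \end{bmatrix}
\]
```

The rotation $R$ and translation $t$ that best explain the correspondences give the posture and position of the target object relative to the camera.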

An apparatus according to another embodiment of the present invention for solving the above problem is an object posture estimation apparatus using a latent variable dictionary of a patch auto encoder. The apparatus includes a processor and a memory, and the processor may be configured to: generate, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point; determine, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a 3D model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that associates latent variable values with the positions of image patch center points on the 3D model; and determine the posture of the target object using the positions on the 3D model of the center points of the image patches corresponding to the selected latent variables.

A computer-readable storage medium according to another embodiment of the present invention for solving the above problem is a computer-readable storage medium including instructions executable by a processor, the instructions being for performing object posture estimation using a latent variable dictionary of a patch auto encoder. When executed by the processor, the instructions may cause the processor to: generate, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point; determine, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a 3D model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that associates latent variable values with the positions of image patch center points on the 3D model; and determine the posture of the target object using the positions on the 3D model of the center points of the image patches corresponding to the selected latent variables.

The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, so the scope of rights of the disclosed technology should not be understood as being limited thereby.

The method and apparatus for estimating object posture using a latent variable dictionary of a patch auto encoder according to the embodiment described above are, owing to the nature of deep learning, robust and accurate with respect to environmental variables such as lighting and background. Unlike existing deep learning approaches, they select by themselves the feature points that are easy to detect, without requiring feature points to be specified in advance, and they can also select only highly reliable feature points from the detection results, thereby increasing the robustness of object posture estimation.

Accordingly, high accuracy and robustness are maintained and, in contrast to other deep learning-based feature point extraction, feature points need not be designated in advance, so there is no resulting variation or loss in accuracy. In addition, because labels for a specific object are not learned, a model trained on a sufficiently large amount of data may be directly applicable to a new object without additional training, which can be a great advantage in terms of deployment time.

FIG. 1 is a flowchart of an object posture estimation method using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention.
FIG. 2 shows the training process of an auto encoder latent variable dictionary that can be used for feature points.
FIG. 3 shows an example of feature point selection.
FIG. 4 shows a conceptual diagram of object posture estimation using a latent variable dictionary.
FIG. 5 is a block diagram showing the configuration of a computing system capable of operating as an object posture estimation apparatus using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all changes, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may exist in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component exists in between.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood as not precluding in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present application.

Hereinafter, preferred embodiments of the present invention will be described in more detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

The present invention includes a pipeline for feature point extraction and object posture estimation using a latent variable dictionary of a patch auto encoder.

As discussed above, the method and apparatus for estimating object posture using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention are, owing to the nature of deep learning, robust and accurate with respect to environmental variables such as lighting and background. Unlike existing deep learning approaches, they select by themselves the feature points that are easy to detect, without requiring feature points to be specified in advance, and they can also select only highly reliable feature points from the detection results, thereby increasing the robustness of object posture estimation.

FIG. 1 is a flowchart of an object posture estimation method using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention. The method is described in more detail below with reference to FIG. 1.

As shown in FIG. 1, the object posture estimation method using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention generates, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point (step 110),

determines, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a 3D model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that associates latent variable values with the positions of image patch center points on the 3D model (step 120), and then

may determine the posture of the target object using the positions on the 3D model of the center points of the image patches corresponding to the selected latent variables (step 130).

Hereinafter, the determination of the posture of the target object is described more specifically in the following order: training of the auto encoder, training of the latent variable dictionary, and object posture estimation using the latent variable dictionary.

As discussed above, the present invention proposes a method that uses the latent variables of a deep learning auto encoder in the same way as descriptors in traditional vision to extract key points (hereinafter also referred to as "feature points") and ultimately estimate the posture of an object.

FIG. 2 shows the training process of an auto encoder latent variable dictionary that can be used for feature points. As shown in FIG. 2, the object posture estimation method using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention first extracts patches centered on multiple pixels from the images of the training data, and trains a deep neural network-based auto encoder on them (210). That is, the patch auto encoder may be a deep learning-based auto encoder trained to receive a plurality of training image patches included in at least one item of training data, convert them into latent variables respectively corresponding to the training image patches, and convert the latent variables back into the training image patches for output.
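The patch extraction in step 210 can be sketched as follows. This is a minimal illustration under stated assumptions, not the document's implementation: the function name `extract_patches`, the `stride` parameter, and the representation of a grayscale image as nested lists are all hypothetical. Each patch is returned together with the pixel it is centered on, because that center pixel is what is later associated with a 3D model position.

```python
from typing import List, Tuple

Image = List[List[float]]  # assumed: grayscale image as rows of pixel values


def extract_patches(image: Image, size: int, stride: int) -> List[Tuple[Tuple[int, int], Image]]:
    """Return ((row, col) center, patch) pairs for every patch that fits in the image.

    The center is the pixel at index size // 2 inside the patch, so for an
    even size it is the lower-right of the four middle pixels.
    """
    half = size // 2
    height, width = len(image), len(image[0])
    patches = []
    # The upper bounds keep the full size x size window inside the image.
    for r in range(half, height - size + half + 1, stride):
        for c in range(half, width - size + half + 1, stride):
            top, left = r - half, c - half
            patch = [row[left:left + size] for row in image[top:top + size]]
            patches.append(((r, c), patch))
    return patches
```

A denser `stride` yields more candidate center points at the cost of more patches to encode.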

According to an aspect, an auto encoder is a kind of machine learning method and may belong to unsupervised learning. Using a neural network, it is trained so that the output produced when an input passes through the network is as close as possible to that input. If the number of neurons is greater than or equal to the dimension of the input, the network can simply pass the input through unchanged, so training becomes meaningless. In other words, for an auto encoder to be useful, the number of neurons must be smaller than the dimension of the input; training then yields a compression effect in which the original value can be restored from a smaller number of values. According to an aspect of the present invention, the input of the auto encoder may be an image patch, which is a piece of an image having a predetermined size (for example, 64 × 64 px), and the latent variable may be a vector of a predetermined size converted from the image patch. The auto encoder according to an aspect of the present invention may be trained by extracting at least one point-centered patch from the training images (for example, single-object images), converting the patch into a latent variable through the neurons of the neural network, and outputting from the latent variable an image patch that is as close as possible to the original patch.
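The training objective described above can be written compactly as follows. The notation is introduced here for illustration and is not taken from the document: $x_m \in \mathbb{R}^{d}$ is the $m$-th flattened training patch, $f_\theta$ is the encoder, $g_\phi$ is the decoder, and $k$ is the latent dimension. The squared-error loss is an assumption; the text only requires the output to be as close as possible to the input.

```latex
\[
z = f_\theta(x) \in \mathbb{R}^{k}, \qquad
\hat{x} = g_\phi(z) \in \mathbb{R}^{d}, \qquad
k < d
\]
\[
\min_{\theta,\, \phi} \; \frac{1}{M} \sum_{m=1}^{M}
  \left\| x_m - g_\phi\!\left( f_\theta(x_m) \right) \right\|_2^2
\]
```

For a single-channel 64 × 64 px patch, $d = 64 \cdot 64 = 4096$, so the condition $k < d$ is what forces the latent variable $z$ to be a compressed description of the patch.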

Referring again to FIG. 2, the training data may then be fed into the trained auto encoder again to generate a latent variable from each image patch (220). Here, the training data may be the same data that was used to train the auto encoder. Once a plurality of latent variables have been generated, a dictionary of 3D positions indexed by latent variable value may be created by recording each generated latent variable together with the position of the corresponding patch center point on the 3D model (230).
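Steps 220 and 230 can be sketched as below. This is a minimal sketch under assumptions: latent variables are taken to be plain tuples of floats, the dictionary is a list of (latent, 3D position) pairs, and the nearest-neighbor lookup by Euclidean distance is a swapped-in technique chosen for illustration, since the text does not state how a query latent variable is matched against the dictionary. The names `build_dictionary` and `lookup` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Latent = Tuple[float, ...]
Point3D = Tuple[float, float, float]
Dictionary = List[Tuple[Latent, Point3D]]


def build_dictionary(latents: Sequence[Latent], positions: Sequence[Point3D]) -> Dictionary:
    """Record each latent variable with the 3D model position of its patch center."""
    if len(latents) != len(positions):
        raise ValueError("each latent variable needs exactly one 3D position")
    return list(zip(latents, positions))


def lookup(dictionary: Dictionary, query: Latent) -> Tuple[Point3D, float]:
    """Return the 3D position of the closest stored latent and the latent distance.

    Nearest-neighbor matching is an assumption, not the document's stated method.
    """
    stored, position = min(dictionary, key=lambda entry: math.dist(entry[0], query))
    return position, math.dist(stored, query)
```

Returning the latent distance alongside the position lets a caller discard matches that are too far from any stored entry.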

According to an aspect of the present invention, the method may be configured to select, from among the center pixels of the plurality of image patches included in an image, the feature points that are easiest to detect, and to select only highly reliable feature points. Such selection may be performed, for example, by determining N selected latent variables (where N is a natural number of 2 or more) based on the relationship between the latent variables converted from the image patches and the 3D model space of the image patches. According to an aspect, the center point of the image patch corresponding to a selected latent variable may be a feature point of the target object.

FIG. 3 shows an example of feature point selection. The latent variable dictionary (320) generated as described above can be viewed as the relationship between the latent variable map (310) and the model (330), that is, as a mapping function between the latent variable space (340) and the model space (350). Feature points (that is, the associated latent variables) can be selected by applying, for example, the following three rules to this mapping.

1) When a specific latent-variable region is mapped to a 3D region, the mapped 3D region should be narrow (high precision).

2) At the same time, the latent-variable region should be wide (robustness and generality with respect to inputs).

3) The surrounding latent-variable region should not be connected to a distant 3D position (outlier exclusion).

That is, according to an aspect of the present invention, the N selected latent variables determined for feature point selection may be N latent variables chosen, from among the plurality of latent variables generated by the auto encoder, in order of narrowest mapped 3D region when the latent-variable region for each latent variable is mapped to a 3D region. The N selected latent variables may therefore be chosen so as to have higher precision than would be obtained by selecting other latent variables.

Meanwhile, according to an aspect of the present invention, the N selected latent variables may be N latent variables chosen, from among the plurality of latent variables generated by the auto encoder, in order of widest mapped latent-variable region, where the latent-variable region is the region around each latent variable that maps to a fixed 3D region. They may therefore be chosen so as to have robustness and generality over a wider range of inputs than would be obtained by selecting other latent variables.

Here, the requirement that the three-dimensional region be narrow and the requirement that the latent-variable region be wide may be considered together, and which of the two requirements is given the greater weight may vary with the implementation environment. In addition, when determining the selected latent variables, latent variables satisfying specific thresholds (for example, latent variables whose three-dimensional region is smaller than a first threshold and/or whose latent-variable space is larger than a second threshold) may first be shortlisted, and the selected latent variables may then be determined according to the size of the three-dimensional region and/or of the latent-variable space.
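The shortlisting and weighting described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `Candidate` type, the `select_latents` function, the linear score, and the single weight `w_precision` are hypothetical and are not specified in this document, which leaves the weighting to the implementation environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str           # hypothetical identifier of a latent variable
    region_3d: float    # size of the 3D region its latent neighborhood maps to
    latent_area: float  # size of the latent region mapped into a given 3D region


def select_latents(cands, n, max_region_3d, min_latent_area, w_precision=0.5):
    """Shortlist by the two thresholds, then keep the n best-scoring candidates."""
    # First threshold: 3D region must be small; second: latent region must be large.
    kept = [c for c in cands
            if c.region_3d < max_region_3d and c.latent_area > min_latent_area]

    def score(c):
        # Assumed linear trade-off: reward a wide latent region (robustness)
        # and penalize a wide 3D region (loss of precision).
        return (1.0 - w_precision) * c.latent_area - w_precision * c.region_3d

    return sorted(kept, key=score, reverse=True)[:n]


cands = [
    Candidate("a", region_3d=0.2, latent_area=3.0),
    Candidate("b", region_3d=0.1, latent_area=1.0),
    Candidate("c", region_3d=0.9, latent_area=5.0),  # fails the 3D threshold
    Candidate("d", region_3d=0.3, latent_area=0.2),  # fails the latent threshold
]
selected = select_latents(cands, n=2, max_region_3d=0.5, min_latent_area=0.5)
```

Setting `w_precision` close to 1 ranks almost purely by the narrowness of the three-dimensional region, and setting it close to 0 ranks almost purely by the width of the latent-variable region. In practice the two quantities have different units, so some normalization would likely be needed before combining them.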

Meanwhile, according to an aspect of the present invention, in response to a determination that the three-dimensional region matched to the region of a neighboring latent variable located within a predetermined distance of a first latent variable lies at least a predetermined distance away from the three-dimensional region corresponding to the first latent variable, the first latent variable may be excluded from the selected latent variables, so that outliers are excluded from the feature points. As shown in FIG. 3, for example, latent variable 360 may pass because its neighbors are not separated from it by the predetermined distance or more in the three-dimensional region, whereas latent variable 370 may be rejected because its neighboring latent region, shown in orange, is linked to a distant three-dimensional position.
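The exclusion rule above can be illustrated with a short sketch. It assumes, hypothetically, that the dictionary is a list of (latent vector, 3D position) pairs and that both distances are Euclidean; the names `is_outlier`, `latent_radius`, and `max_3d_gap` are illustrative and do not appear in this document.

```python
import math


def is_outlier(index, entries, latent_radius, max_3d_gap):
    """Return True if a latent-space neighbor of entries[index] maps far away in 3D."""
    z, p = entries[index]
    for j, (z_other, p_other) in enumerate(entries):
        if j == index:
            continue
        near_in_latent = math.dist(z, z_other) <= latent_radius
        far_in_3d = math.dist(p, p_other) >= max_3d_gap
        if near_in_latent and far_in_3d:
            return True
    return False


# Each entry: (latent vector, 3D position on the model); values are made up.
entries = [
    ((0.0, 0.0), (0.0, 0.0, 0.0)),
    ((0.1, 0.0), (0.01, 0.0, 0.0)),   # close in latent space and close in 3D
    ((5.0, 5.0), (1.0, 1.0, 1.0)),
    ((5.1, 5.0), (-1.0, 0.0, 0.0)),   # close in latent space but far in 3D
]
kept = [i for i in range(len(entries))
        if not is_outlier(i, entries, latent_radius=0.5, max_3d_gap=0.5)]
```

In this toy data the first two entries are kept, while the last two reject each other because nearly identical latent values point to distant positions on the model.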

FIG. 4 is a conceptual diagram of object posture estimation using the latent variable dictionary. Object posture estimation for a new input image is described below with reference to FIG. 4.

Once the auto encoder has been trained and the latent variable dictionary has been generated, the latent variable corresponding to each pixel of a new input image is computed (430), and a correspondence indicating which position on the 3D model each pixel corresponds to is generated (440) from the latent variable dictionary. Based on these correspondences, the position and posture of the object can be estimated (460) by, for example, applying (450) a general Perspective-n-Point (PnP) algorithm. If several objects are mixed in the image, or if the detection result is affected by the background, performance can be improved by first applying (410) any of various object region extraction algorithms. That is, when the image contains a plurality of objects, a single object may be separated and pixel-centered patches extracted (420) before the auto encoder generates the latent variables.
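The correspondence step (440) can be sketched as a dictionary lookup. The sketch below assumes a nearest-neighbor search over latent vectors with a rejection threshold; this document does not state how the dictionary is queried, so `lookup_correspondences` and `max_latent_dist` are hypothetical, and the latent values are made up.

```python
import math


def lookup_correspondences(pixel_latents, dictionary, max_latent_dist):
    """Pair each pixel with the 3D model point of its nearest dictionary latent."""
    pairs = []
    for pixel, z in pixel_latents:
        # Nearest dictionary entry in latent space (assumed Euclidean distance).
        nearest_z, model_point = min(dictionary, key=lambda e: math.dist(z, e[0]))
        # Drop pixels whose latent is not close to any dictionary entry.
        if math.dist(z, nearest_z) <= max_latent_dist:
            pairs.append((pixel, model_point))
    return pairs


# Dictionary entries: (latent vector, 3D position of the patch center on the model).
dictionary = [
    ((0.0, 0.0), (0.0, 0.0, 0.0)),
    ((1.0, 0.0), (0.1, 0.0, 0.0)),
    ((0.0, 1.0), (0.0, 0.1, 0.0)),
]
# Latents computed for pixels of a new image: (pixel coordinates, latent vector).
pixel_latents = [
    ((10, 20), (0.05, 0.0)),
    ((30, 40), (0.9, 0.1)),
    ((50, 60), (5.0, 5.0)),  # background-like latent, far from every entry
]
pairs = lookup_correspondences(pixel_latents, dictionary, max_latent_dist=0.2)
```

The resulting list of 2D pixel and 3D model point pairs is the kind of input a Perspective-n-Point solver consumes to recover the position and posture of the object; the solver itself is omitted here.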

FIG. 5 is a block diagram illustrating the configuration of a computing system that can operate as an object posture estimation apparatus using a latent variable dictionary of a patch auto encoder according to an embodiment of the present invention. The apparatus includes a processor and a memory. The processor may be configured to: generate, using the patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point; determine, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on the three-dimensional model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that links latent variable values to positions on the three-dimensional model of image patch center points; and determine the posture of the target object using the positions on the three-dimensional model of the center points of the image patches corresponding to the selected latent variables. The more specific operation of the apparatus may follow the object posture estimation method according to an aspect of the present invention described above.

Meanwhile, referring to FIG. 5, the computing system 800 may include flash storage 810, a processor 820, RAM 830, an input/output device 840, and a power supply 850. The flash storage 810 may include a memory device 811 and a memory controller 812. Although not shown in FIG. 8, the computing system 800 may further include ports for communicating with a video card, a sound card, a memory card, a USB device, and the like, or with other electronic devices.

The computing system 800 may be implemented as a personal computer, or as a portable electronic device such as a notebook computer, a mobile phone, a personal digital assistant (PDA), or a camera.

The processor 820 may perform specific calculations or tasks. Depending on the embodiment, the processor 820 may be a microprocessor or a central processing unit (CPU). The processor 820 may communicate with the RAM 830, the input/output device 840, and the flash storage 810 via a bus 860 such as an address bus, a control bus, and a data bus. The flash storage 810 may be implemented using the flash storage of the embodiments shown in FIGS. 5 to 7.

According to an embodiment, the processor 820 may also be connected to an expansion bus such as a Peripheral Component Interconnect (PCI) bus.

The RAM 830 may store data required for the operation of the computing system 800. For example, any type of random access memory, including DRAM, mobile DRAM, SRAM, PRAM, FRAM, MRAM, and RRAM, may be used as the RAM 830.

The input/output device 840 may include input means such as a keyboard, a keypad, and a mouse, and output means such as a printer and a display. The power supply 850 may supply the operating voltage required for the operation of the computing system 800.

The method according to the present invention described above may be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording medium that stores data readable by a computer system, for example read only memory (ROM), random access memory (RAM), magnetic tape, magnetic disk, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a computer network, so that the code is stored and executed in a distributed manner.

Although the invention has been described above with reference to the drawings and embodiments, this does not mean that the scope of protection of the present invention is limited by those drawings or embodiments, and those skilled in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Specifically, the described features may be implemented in digital electronic circuitry, or in computer hardware, firmware, or combinations thereof. The features may be implemented in a computer program product embodied in a machine-readable storage device, for example for execution by a programmable processor. The features may be performed by a programmable processor executing a program of instructions to perform the functions of the described embodiments by operating on input data and generating output. The described features may be implemented in one or more computer programs executable on a programmable system that includes at least one programmable processor coupled to receive data and instructions from, and to transmit data and instructions to, a data storage system, at least one input device, and at least one output device. A computer program includes a set of instructions that can be used, directly or indirectly, in a computer to perform a specific operation or bring about a given result. A computer program may be written in any form of programming language, including compiled or interpreted languages, and may be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.

Suitable processors for executing a program of instructions include, for example, both general-purpose and special-purpose microprocessors, and a sole processor or one of multiple processors of another kind of computer. Storage devices suitable for embodying the computer program instructions and data that implement the described features include all forms of non-volatile memory, including, for example, semiconductor memory devices such as EPROM, EEPROM, and flash memory devices; magnetic devices such as internal hard disks and removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks. The processor and the memory may be incorporated in, or supplemented by, application-specific integrated circuits (ASICs).

Although the present invention has been described above on the basis of a series of functional blocks, it is not limited by the foregoing embodiments and the accompanying drawings, and it will be apparent to those of ordinary skill in the art to which the present invention pertains that various substitutions, modifications, and changes are possible without departing from the technical spirit of the present invention.

Combinations of the foregoing embodiments are not limited to those described above, and various other combinations may be provided depending on the implementation and/or need.

In the foregoing embodiments, the methods are described on the basis of flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or concurrently with, other steps described above. In addition, those of ordinary skill in the art will understand that the steps shown in a flowchart are not exclusive, that other steps may be included, and that one or more steps of a flowchart may be deleted without affecting the scope of the present invention.

The foregoing embodiments include examples of various aspects. Although not every possible combination for representing the various aspects can be described, those of ordinary skill in the art will recognize that other combinations are possible. Accordingly, the present invention is intended to include all other replacements, modifications, and changes that fall within the scope of the following claims.

Claims

An object posture estimation method using a latent variable dictionary of a patch auto encoder, performed by a computing device, the method comprising:
generating, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point;
determining, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a three-dimensional model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that links latent variable values to positions on the three-dimensional model of image patch center points; and
determining a posture of the target object using the positions on the three-dimensional model of the center points of the image patches corresponding to the selected latent variables,
wherein the N predetermined selected latent variables are
N latent variables selected, from among the plurality of latent variables, in order from the narrowest mapped three-dimensional region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a three-dimensional region.

An object posture estimation method using a latent variable dictionary of a patch auto encoder, performed by a computing device, the method comprising:
generating, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point;
determining, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a three-dimensional model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that links latent variable values to positions on the three-dimensional model of image patch center points; and
determining a posture of the target object using the positions on the three-dimensional model of the center points of the image patches corresponding to the selected latent variables,
wherein the N predetermined selected latent variables are
N latent variables selected, from among the plurality of latent variables, in order from the widest mapped latent-variable region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a predetermined three-dimensional region.

The method according to claim 1 or 2,
wherein the center point of the image patch corresponding to each selected latent variable is a feature point of the target object.

The method according to claim 1 or 2,
wherein the patch auto encoder is
a deep learning-based auto encoder trained to receive a plurality of training image patches included in at least one item of training data, convert them into latent variables respectively corresponding to the plurality of training image patches, and convert the latent variables back into the training image patches for output.

The method of claim 1,
wherein the N predetermined selected latent variables are
selected so as to have higher precision than would be obtained by selecting latent variables other than the selected latent variables.

(Deleted)

The method of claim 2,
wherein the N predetermined selected latent variables are
selected so as to have robustness and generality over a wider range of inputs than would be obtained by selecting latent variables other than the selected latent variables.

The method according to claim 1 or 2,
wherein, in response to a determination that the three-dimensional region matched to the region of a neighboring latent variable located within a predetermined distance of a first latent variable lies at least a predetermined distance away from the three-dimensional region corresponding to the first latent variable, the first latent variable is excluded from the selected latent variables.

The method of claim 8,
wherein the N predetermined selected latent variables are
selected so as to exclude outliers from the feature points.

The method of claim 1,
wherein the step of determining the posture of the target object comprises
estimating the position and posture of the target object by applying a Perspective-n-Point (PnP) algorithm.

An object posture estimation apparatus using a latent variable dictionary of a patch auto encoder, the apparatus including a processor and a memory, wherein the processor is configured to:
generate, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point;
determine, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a three-dimensional model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that links latent variable values to positions on the three-dimensional model of image patch center points; and
determine the posture of the target object using the positions on the three-dimensional model of the center points of the image patches corresponding to the selected latent variables,
wherein the N predetermined selected latent variables are
N latent variables selected, from among the plurality of latent variables, in order from the narrowest mapped three-dimensional region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a three-dimensional region,
N latent variables selected, from among the plurality of latent variables, in order from the widest mapped latent-variable region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a predetermined three-dimensional region.

A computer-readable storage medium including instructions executable by a processor, wherein the instructions are for performing object posture estimation using a latent variable dictionary of a patch auto encoder, and the instructions, when executed by the processor, cause the processor to:
generate, using a patch auto encoder, a plurality of latent variables respectively corresponding to a plurality of image patches from an image including a target object, wherein each image patch includes a center point;
determine, for N predetermined selected latent variables (where N is a natural number of 2 or more) among the plurality of latent variables, the position on a three-dimensional model of the center point of the image patch corresponding to each selected latent variable, using a latent variable dictionary that links latent variable values to positions on the three-dimensional model of image patch center points; and
determine the posture of the target object using the positions on the three-dimensional model of the center points of the image patches corresponding to the selected latent variables,
wherein the N predetermined selected latent variables are
N latent variables selected, from among the plurality of latent variables, in order from the narrowest mapped three-dimensional region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a three-dimensional region,
N latent variables selected, from among the plurality of latent variables, in order from the widest mapped latent-variable region when the latent-variable region surrounding each of the plurality of latent variables is mapped to a predetermined three-dimensional region.