KR102228525B1

KR102228525B1 - Grasping robot, grasping method and learning method for grasp based on neural network

Info

Publication number: KR102228525B1
Application number: KR1020190015126A
Authority: KR
Inventors: 서일홍; 박영빈; 김병완
Original assignee: 한양대학교 산학협력단
Priority date: 2018-11-20
Filing date: 2019-02-08
Publication date: 2021-03-16
Also published as: KR20200059111A

Abstract

Disclosed are a deep supervised learning method for gripping, and a method and robot for gripping a target object using the result of the deep supervised learning. The disclosed gripping method of a gripping robot using a neural network includes: generating a first image in which a workspace image, obtained by photographing a workspace in which a plurality of objects are arranged, is divided into a target object region and a background region; estimating, using a trained first neural network, a proximity position and a proximity posture of an end effector with respect to a target object included in the target object region of the first image; and estimating, using a trained second neural network, a gripping position and a gripping posture of the end effector with respect to the target object included in a second image acquired at the position and posture of the end effector close to the target object.

Description

A gripping method using a neural network, a gripping learning method, and a gripping robot {GRASPING ROBOT, GRASPING METHOD AND LEARNING METHOD FOR GRASP BASED ON NEURAL NETWORK}

The present invention relates to a gripping method, a gripping learning method, and a gripping robot using a neural network, and more particularly to a deep supervised learning method for gripping and to a method and robot for gripping a target object using the result of the deep supervised learning.

Various studies are under way on controlling the motion of robots using machine learning. Among machine learning methods, some studies control the gripping motion of a robot through deep supervised learning. Deep supervised learning is a machine learning method that combines deep learning and supervised learning.

A robot can perform deep supervised learning using an artificial neural network, that is, a neural network. The robot extracts features of the target object to be gripped from an image acquired by a camera, and can perform supervised learning using a label, that is, the gripping motion for the target object presented as the correct answer. A convolutional neural network (CNN) may be used to extract the features of the target object from the image.

Related prior art includes, as a patent document, Korean Patent Laid-Open Publication No. 2018-0114217 and, as a non-patent document, "Sungpil Moon, Youngbin Park, and Ilhong Seo, Pre-grasp posture estimation method using a mixture density network and a deep convolutional neural network, Proceedings of the 2016 Summer Conference of the Institute of Electronics and Information Engineers (대한전자공학회)".

An object of the present invention is to provide a deep supervised learning method for gripping, and a method and robot for gripping a target object using the result of the deep supervised learning.

According to an embodiment of the present invention for achieving the above object, there is provided a gripping method of a gripping robot using a neural network, the method including: generating a first image in which a workspace image, obtained by photographing a workspace in which a plurality of objects are arranged, is divided into a target object region and a background region; estimating, using a trained first neural network, a proximity position and a proximity posture of an end effector with respect to a target object included in the target object region of the first image; and estimating, using a trained second neural network, a gripping position and a gripping posture of the end effector with respect to the target object included in a second image acquired at the position and posture of the end effector close to the target object.

According to another embodiment of the present invention for achieving the above object, there is provided a neural network learning method for a gripping robot, the method including: generating a first image in which a workspace image, obtained by photographing a workspace in which a plurality of objects are arranged, is divided into a training object region and a background region; learning, using a first neural network, a proximity position and a proximity posture of an end effector with respect to a training object included in the training object region of the first image; and learning, using a second neural network, a gripping position and a gripping posture of the end effector with respect to the training object included in a second image acquired at the position and posture of the end effector close to the training object.

According to still another embodiment of the present invention for achieving the above object, there is provided a gripping robot using a neural network, the robot including: a first camera that generates a workspace image by photographing a workspace in which a plurality of objects are arranged; an image processing unit that generates a first image in which the workspace image is divided into a target object region and a background region; a proximity motion control unit that estimates, using a trained first neural network, a proximity position and a proximity posture of an end effector with respect to a target object included in the target object region of the first image; a second camera that generates a second image by photographing the target object at the position and posture of the end effector close to the target object; and a gripping motion control unit that estimates, using a trained second neural network, a gripping position and a gripping posture of the end effector with respect to the target object included in the second image.

According to the present invention, unnecessary information other than the training object can be removed from the image used for learning, so the learning efficiency for gripping can be improved.

In addition, according to the present invention, learning efficiency can be improved by learning the proximity motion and the gripping motion for a training object with separate neural networks, and by learning the gripping motion from an image of the training object taken close to the training object.

FIGS. 1 and 2 are diagrams for explaining the concept of a neural network learning method for a gripping robot according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a neural network learning method for a gripping robot according to an embodiment of the present invention.
FIG. 4 is a diagram showing an embodiment of a second image.
FIG. 5 is a flowchart illustrating a gripping method of a gripping robot using a neural network according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a neural network according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating a gripping robot using a neural network according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments, and it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention. In describing the drawings, similar reference numerals are used for similar elements.

Hereinafter, embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIGS. 1 and 2 are diagrams for explaining the concept of a neural network learning method for a gripping robot according to an embodiment of the present invention.

In the learning stage of the gripping motion, the gripping robot uses training data. The training data include gripping motion data given for each of various training objects. The gripping motion is performed by an end effector of the gripping robot, for example a robot hand, and is a concept that includes the position and posture of the end effector for gripping.

That is, based on the training data, the gripping robot learns the position and posture of the end effector for gripping that are given in advance for each of various training objects.

Thereafter, in the stage of performing the gripping motion, the gripping robot estimates the gripping motion for a target object using the learning result, that is, the learning data. If the target object is similar to a specific training object used for learning, the gripping robot can estimate, as the gripping motion for the target object, a gripping motion similar to the one for that training object.

As described above, the gripping robot can learn the gripping motion for each of various training objects using a neural network, and can estimate the gripping motion for a target object using the trained neural network. Images of the objects are used to recognize the various training objects and the target object, and a convolutional neural network (CNN) may be used to extract feature values of the training objects and the target object from these images.

Because various objects may be present in the workspace for gripping, the training data image obtained by photographing the workspace may, as shown in FIG. 1, contain objects other than the training object, which here is a black tape (110). In this case the training data image contains a large amount of unnecessary information other than the training object, so learning efficiency may be degraded.

Accordingly, the present invention performs learning using an image focused on the training object, as shown in FIG. 2. The image of FIG. 2 is obtained from the workspace image of FIG. 1; compared with the workspace image, everything except the training object region containing the black tape (110), which is the training object, is processed as a background region.

In particular, to use an image in which the training object stands out more, the present invention may set the pixel values of the background region to one preset pixel value; as shown in FIG. 2, this value may be 0, for example. The pixel values of the training object region, on the other hand, correspond to the pixel values of the training object in the workspace image.
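The background replacement described above can be sketched with NumPy. This is a minimal illustration, not the patent's implementation: the function name `make_first_image`, the boolean `object_mask` input, and the toy 4x4 image are assumptions introduced here; only the rule itself (object pixels copied unchanged, background set to one preset value such as 0) comes from the text.

```python
import numpy as np

def make_first_image(workspace_image, object_mask, background_value=0):
    """Copy object pixels unchanged and set every other pixel to one preset value.

    workspace_image: (H, W, C) array; object_mask: (H, W) boolean array.
    Both names are hypothetical and used only for this sketch.
    """
    if workspace_image.shape[:2] != object_mask.shape:
        raise ValueError("mask must match the image height and width")
    # Start from an image filled entirely with the preset background value.
    first_image = np.full_like(workspace_image, background_value)
    # Restore the original pixel values inside the object region only.
    first_image[object_mask] = workspace_image[object_mask]
    return first_image

# Toy 4x4 RGB "workspace" with a 2x2 object region in the middle.
workspace = np.random.default_rng(0).integers(1, 256, size=(4, 4, 3), dtype=np.uint8)
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True
first = make_first_image(workspace, mask)
```

Starting from a filled image and copying the object pixels back keeps the object region bit-identical to the workspace image, which matches the statement that its pixel values correspond to those of the workspace image.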

Consequently, according to the present invention, unnecessary information other than the training object can be removed from the image used for learning, so the learning efficiency for gripping can be improved.

FIG. 3 is a flowchart illustrating a neural network learning method for a gripping robot according to an embodiment of the present invention, and FIG. 4 is a diagram showing an embodiment of a second image.

The learning method according to the present invention may be performed in a computing device including a processor, in a robot, and the like. In the following, a learning method performed in a learning apparatus, which is a computing device, is described as one embodiment.

The learning apparatus according to the present invention generates a first image in which a workspace image, obtained by photographing a workspace in which a plurality of objects are arranged, is divided into a training object region and a background region (S310). The training object region is the region containing a pre-designated training object among the various objects in the workspace image. For example, the workspace image may be as shown in FIG. 1, and the first image, being an image focused on the training object, may be as shown in FIG. 2.

The learning apparatus then uses the first neural network to learn the proximity position and proximity posture of the end effector with respect to the training object included in the training object region of the first image (S320). Here, the proximity position and proximity posture of the end effector may correspond to joint angles of the gripping robot, or control values for its actuators, for approaching the training object.

Then, using the second neural network, the learning apparatus learns the gripping position and gripping posture of the end effector with respect to the training object included in a second image acquired at the position and posture of the end effector close to the training object (S330). Likewise, the gripping position and gripping posture of the end effector may correspond to joint angles of the gripping robot, or control values for its actuators, for gripping the training object.

As described, the learning method according to the present invention performs learning using an image focused on the training object, such as the first image. However, it does not learn the gripping motion directly from the first image; instead, in step S320 it learns, as a preliminary motion for gripping, the motion of bringing the end effector close to the training object.

Because the workspace image captures the entire workspace, the training object emphasized in the first image is small relative to the overall size of the first image. It is therefore difficult to extract feature values of the training object accurately, so the learning apparatus according to the present invention first uses the first image to learn the motion of bringing the end effector close to the training object.

The second image, acquired afterwards while the camera-equipped end effector is close to the training object, contains the training object at a larger size than in the first image, as shown in FIG. 4. In the second image of FIG. 4, the gripper below the black tape (110), which is the training object, is the end effector of the gripping robot.

The second image is acquired by a camera positioned lower than the camera that acquires the workspace image. Feature values of the training object can therefore be extracted more accurately than with the first image, so in step S330 the gripping position and gripping posture of the end effector are learned using the second image.

In step S330, the learning apparatus may estimate the gripping position and gripping posture of the end effector based on the proximity position and proximity posture of the end effector. That is, the learning apparatus learns the gripping position and gripping posture of the end effector for the combination of the training object included in the second image and the proximity position and proximity posture of the end effector.

Thus, according to the present invention, learning efficiency can be improved by learning the proximity motion and the gripping motion for a training object with separate neural networks, and by learning the gripping motion from an image of the training object taken close to the training object.

Meanwhile, to generate the first image in step S310, the learning apparatus may recognize the objects in the workspace image and then divide the workspace image into a training object region, in which a preset training object among the recognized objects is located, and a background region. As one embodiment, the learning apparatus may generate the first image based on a semantic segmentation algorithm, which divides an image into semantic features on a per-object basis.
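As a rough illustration of the region split in step S310, the sketch below assumes that some semantic segmentation model has already produced a per-pixel class label map. The label map format, the class ids, and the function name `split_regions` are assumptions for this example; the patent does not specify a particular segmentation model or output format.

```python
import numpy as np

def split_regions(label_map, target_class_id):
    """Split a per-pixel class label map into an object region and a background region.

    label_map: (H, W) integer array of class ids (assumed segmentation output).
    Returns two boolean masks: (object_region, background_region).
    """
    label_map = np.asarray(label_map)
    object_region = label_map == target_class_id
    if not object_region.any():
        raise ValueError("target class not found in the label map")
    # Every pixel that is not the designated object, including other objects,
    # is treated as background.
    return object_region, ~object_region

# Toy label map: 0 = table, 1 = designated training object, 2 = another object.
labels = np.array([
    [0, 0, 2, 2],
    [0, 1, 1, 2],
    [0, 1, 1, 0],
])
object_region, background_region = split_regions(labels, target_class_id=1)
```

Note that pixels of other recognized objects (class 2 here) end up in the background region, consistent with treating everything except the designated object as background.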

Also in step S310, the learning apparatus uses the pixel values of the area of the workspace image corresponding to the training object region of the first image, unchanged, as the pixel values of the training object region of the first image. The background region, however, may be rendered in a single color, for example black, so that the training object region stands out.

Depending on the embodiment, the pixel value of the background region may vary adaptively with the pixel values of the training object region. For example, if the average pixel value of the training object region is at or below a threshold, meaning the training object region is dark, the pixel value of the background region is chosen so that the background is relatively bright, which makes the training object region stand out more in the first image.
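The adaptive choice of background value can be sketched as follows. The threshold of 128 and the two candidate values 0 and 255 are illustrative assumptions, as is the function name; the text only states that a dark object region (mean at or below a threshold) leads to a relatively bright background, and the opposite branch here is an assumed symmetric default.

```python
import numpy as np

def adaptive_background_value(image, object_mask, threshold=128, dark=0, bright=255):
    """Pick a background pixel value that contrasts with the object region.

    threshold, dark and bright are hypothetical parameters for this sketch.
    """
    object_pixels = np.asarray(image)[object_mask]
    if object_pixels.size == 0:
        raise ValueError("object region is empty")
    mean_object = float(object_pixels.mean())
    # Dark object (mean at or below the threshold) -> bright background.
    # Otherwise fall back to a dark background (assumed default).
    return bright if mean_object <= threshold else dark

dark_object = np.full((4, 4), 10, dtype=np.uint8)
bright_object = np.full((4, 4), 240, dtype=np.uint8)
all_pixels = np.ones((4, 4), dtype=bool)
value_for_dark = adaptive_background_value(dark_object, all_pixels)
value_for_bright = adaptive_background_value(bright_object, all_pixels)
```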

FIG. 5 is a flowchart illustrating a gripping method of a gripping robot using a neural network according to an embodiment of the present invention.

The gripping robot according to the present invention generates a first image in which a workspace image, obtained by photographing a workspace in which a plurality of objects are arranged, is divided into a target object region and a background region (S510). The gripping robot may generate the first image in the same way as in step S310.

That is, the gripping robot may recognize the objects in the workspace image and generate the first image by dividing the workspace image into a target object region, in which a preset target object among the recognized objects is located, and a background region. The pixel values of the target object region in the first image correspond to the pixel values of the target object in the workspace image, and the pixel value of the background region in the first image may be one preset pixel value.

The gripping robot then uses the first neural network trained in step S320 to estimate the proximity position and proximity posture of the end effector with respect to the target object included in the target object region of the first image. That is, the gripping robot estimates the joint angles of the robot, or the control values for its actuators, that allow the end effector to approach the target object.

The gripping robot then uses the second neural network trained in step S330 to estimate the gripping position and gripping posture of the end effector with respect to the target object included in a second image acquired at the position and posture of the end effector close to the target object. That is, the gripping robot estimates the joint angles of the robot, or the control values for its actuators, that allow the end effector to grip the target object.

Here, the gripping robot may estimate the gripping position and gripping posture of the end effector based on the target object included in the second image and on the proximity position and proximity posture of the end effector.
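The two-stage flow described above (segment, estimate the approach, move, re-image, estimate the grip) can be sketched as a small orchestration function. All callables passed in are placeholders assumed for this example: the patent does not define these interfaces, and the lambdas below are dummy stand-ins rather than trained networks or real robot drivers.

```python
import numpy as np

def grip_target(capture_workspace, segment, approach_net, move_to, capture_wrist, grip_net):
    """Run the two-stage approach-then-grip flow with injected components.

    Every argument is a hypothetical callable used only to show the data flow.
    """
    workspace_image = capture_workspace()
    first_image = segment(workspace_image)            # S510: object vs. background
    approach_joints = approach_net(first_image)       # first network: proximity pose
    move_to(approach_joints)                          # bring the end effector close
    second_image = capture_wrist()                    # close-up image near the object
    grip_joints = grip_net(second_image, approach_joints)  # second network: grip pose
    move_to(grip_joints)
    return approach_joints, grip_joints

# Dummy components so the flow can be exercised without hardware.
moves = []
approach, grip = grip_target(
    capture_workspace=lambda: np.zeros((8, 8, 3), dtype=np.uint8),
    segment=lambda img: img,
    approach_net=lambda img: np.array([0.1, 0.2, 0.3]),
    move_to=moves.append,
    capture_wrist=lambda: np.ones((8, 8, 3), dtype=np.uint8),
    grip_net=lambda img, joints: joints + 0.05,
)
```

Passing the approach joint angles into `grip_net` mirrors the statement that the grip estimate is based on both the second image and the proximity position and posture.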

FIG. 6 is a diagram illustrating a neural network according to an embodiment of the present invention.

Referring to FIG. 6, the first and second images are input to a CNN, which outputs feature values (feature points) for the training object and the target object. The RGB image in FIG. 6 represents the first and second images.

A vector value (Target Object) representing the training object or the target object may be input to the neural network together with the image.

As labels for each training object and target object, the proximity position and posture of the end effector and the gripping position and posture of the end effector may be given to the neural network.

In particular, the proximity position and proximity posture of the end effector (Current Joint Angles) may additionally be input to the second neural network, which learns and estimates the gripping position and posture for the training object and the target object.
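One way to picture the inputs of the second network is as a single vector that joins the CNN features, the target object vector, and the current joint angles. The sketch below is an assumption-laden illustration: the one-hot encoding of the target object, the feature and joint dimensions, and plain concatenation are choices made here, not details given in FIG. 6 or the text.

```python
import numpy as np

def build_grip_input(image_features, target_object_id, num_object_classes, current_joint_angles):
    """Concatenate CNN features, a target-object vector and current joint angles.

    The one-hot target vector and simple concatenation are assumptions.
    """
    target_vec = np.zeros(num_object_classes, dtype=np.float32)
    target_vec[target_object_id] = 1.0  # hypothetical one-hot "Target Object" vector
    features = np.asarray(image_features, dtype=np.float32).ravel()
    joints = np.asarray(current_joint_angles, dtype=np.float32).ravel()
    return np.concatenate([features, target_vec, joints])

features = np.linspace(0.0, 1.0, 16)   # stand-in for CNN feature values
joints = np.zeros(6)                   # stand-in for six current joint angles
grip_input = build_grip_input(features, target_object_id=2,
                              num_object_classes=5, current_joint_angles=joints)
```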

Meanwhile, the neural network according to the present invention may be a neural network based on a mixture density network. Because a single target object can be gripped in several different ways, the mixture density network makes it possible to estimate a gripping motion that is appropriate for the way the target object is placed.
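A mixture density network outputs the parameters of a mixture distribution rather than a single value, so several distinct grips can each keep their own mode. As a hedged sketch, assuming a mixture of diagonal Gaussians (the patent does not state the component type), the usual training loss and a simple way to pick one output look like this; the function names and the example numbers are hypothetical.

```python
import numpy as np

def mdn_negative_log_likelihood(weights, means, sigmas, target):
    """Negative log-likelihood of `target` under a mixture of diagonal Gaussians.

    weights: (K,) mixing coefficients summing to 1.
    means, sigmas: (K, D) per-component mean and standard deviation.
    target: (D,) labeled output, e.g. joint angles of a demonstrated grip.
    """
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    target = np.asarray(target, dtype=float)
    d = means.shape[1]
    z = (target - means) / sigmas
    # Log density of each diagonal Gaussian component.
    log_comp = (-0.5 * np.sum(z ** 2, axis=1)
                - np.sum(np.log(sigmas), axis=1)
                - 0.5 * d * np.log(2.0 * np.pi))
    log_mix = np.log(weights) + log_comp
    m = log_mix.max()
    # Log-sum-exp for numerical stability.
    return -(m + np.log(np.sum(np.exp(log_mix - m))))

def most_likely_component_mean(weights, means):
    """Return the mean of the component with the largest mixing coefficient."""
    return np.asarray(means)[int(np.argmax(weights))]

nll_single = mdn_negative_log_likelihood([1.0], [[0.0]], [[1.0]], [0.0])
best_grip = most_likely_component_mean([0.3, 0.7], [[0.0, 0.0], [1.0, 1.0]])
```

Choosing the mean of the highest-weight component is only one possible readout; sampling from the mixture is another, and the source does not say which is used.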

FIG. 7 is a diagram illustrating a gripping robot using a neural network according to an embodiment of the present invention.

Referring to FIG. 7, the gripping robot according to the present invention includes a first camera (710), an image processing unit (720), a proximity motion control unit (730), a second camera (740), and a gripping motion control unit (750).

The first camera (710) generates a workspace image by photographing a workspace in which a plurality of objects are arranged. The first camera (710) is positioned higher than the second camera (740), described later, and generates the workspace image so that it covers the entire workspace. For example, the workspace image may be as shown in FIG. 1.

The image processing unit (720) generates a first image in which the workspace image is divided into a target object region and a background region. For example, the first image may be as shown in FIG. 2, and the image processing unit (720) may generate the first image using a semantic segmentation algorithm.

The proximity motion control unit (730) uses the trained first neural network to estimate the proximity position and proximity posture of the end effector with respect to the target object included in the target object region of the first image.

The second camera (740) generates a second image by photographing the target object at the position and posture of the end effector close to the target object. For example, the second camera (740) may be located at the wrist of the end effector.

The gripping motion control unit (750) uses the trained second neural network to estimate the gripping position and gripping posture of the end effector with respect to the target object included in the second image.

The actuator (760) of the gripping robot may be, for example, a motor that adjusts a joint angle of the gripping robot. It is driven according to the proximity position and proximity posture of the end effector estimated by the proximity motion control unit (730), so that the end effector of the gripping robot is positioned near the target object.

The actuator (760) is also driven according to the gripping position and gripping posture of the end effector estimated by the gripping motion control unit (750), so that the end effector can grip the target object.

The technical content described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been described with reference to specific matters such as specific components, and to limited embodiments and drawings, but these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the field to which the invention belongs can make various modifications and variations from this description. Therefore, the spirit of the present invention should not be limited to the described embodiments; not only the claims set forth below but also everything equal to, or an equivalent modification of, those claims falls within the scope of the spirit of the invention.

Claims

A gripping method of a gripping robot using a neural network, the method comprising:
generating a first image in which a work space image, obtained by photographing a work space in which a plurality of objects are arranged, is divided into a target object region and a background region;
estimating, by using a learned first neural network, a proximity position and a proximity posture of an end effector with respect to a target object included in the target object region of the first image; and
estimating, by using a learned second neural network, a gripping position and a gripping posture of the end effector with respect to the target object included in a second image obtained at the position and posture of the end effector close to the target object,
wherein the pixel value of the target object region included in the first image corresponds to the pixel value of the target object in the work space image, and
the pixel value of the background region included in the first image is a single preset pixel value.

The method of claim 1,
wherein generating the first image comprises:
recognizing the objects in the work space image; and
generating the first image by dividing the work space image into the target object region, in which the target object among the recognized objects is located, and the background region.

(Deleted)

The method of claim 1,
wherein the pixel value of the background region included in the first image is 0.
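The construction of the first image recited above (target pixels kept from the work space image, every background pixel set to one preset value such as 0) can be illustrated with a short sketch. It assumes a grayscale image stored as nested lists and a boolean mask marking the target object region; the function name `make_first_image` and the mask input are hypothetical, and how the mask is obtained (object recognition and segmentation) is outside this sketch.

```python
from typing import List, Sequence

Pixel = int  # assumed: grayscale intensity, to keep the sketch small


def make_first_image(
    workspace: Sequence[Sequence[Pixel]],
    target_mask: Sequence[Sequence[bool]],
    background_value: Pixel = 0,
) -> List[List[Pixel]]:
    """Keep work space pixels inside the target region; set all others to one value."""
    if len(workspace) != len(target_mask):
        raise ValueError("image and mask must have the same number of rows")
    first_image: List[List[Pixel]] = []
    for row, mask_row in zip(workspace, target_mask):
        if len(row) != len(mask_row):
            raise ValueError("image and mask rows must have the same length")
        first_image.append(
            [pix if inside else background_value for pix, inside in zip(row, mask_row)]
        )
    return first_image


# Made-up 3x4 work space image with the target occupying the middle columns
# of the first two rows.
workspace = [
    [12, 40, 41, 13],
    [11, 42, 43, 14],
    [10, 15, 16, 17],
]
target_mask = [
    [False, True, True, False],
    [False, True, True, False],
    [False, False, False, False],
]
first_image = make_first_image(workspace, target_mask)
# first_image == [[0, 40, 41, 0], [0, 42, 43, 0], [0, 0, 0, 0]]
```

Other objects in the work space disappear into the uniform background, so the first neural network sees only the target object's own pixel values.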

The method of claim 1,
wherein estimating the gripping position and the gripping posture of the end effector comprises
estimating the gripping position and the gripping posture of the end effector based on the proximity position and the proximity posture of the end effector.

A neural network learning method for a gripping robot, the method comprising:
generating a first image in which a work space image, obtained by photographing a work space in which a plurality of objects are arranged, is divided into a training object region and a background region;
learning, by using a first neural network, a proximity position and a proximity posture of an end effector with respect to a training object included in the training object region of the first image; and
learning, by using a second neural network, a gripping position and a gripping posture of the end effector with respect to the training object included in a second image obtained at the position and posture of the end effector close to the training object,
wherein the pixel value of the training object region included in the first image corresponds to the pixel value of the training object in the work space image, and
the pixel value of the background region included in the first image is a single preset pixel value.

The method of claim 6,
wherein generating the first image comprises:
recognizing the objects in the work space image; and
generating the first image by dividing the work space image into the training object region, in which the training object among the recognized objects is located, and the background region.

The method of claim 6,
wherein the pixel value of the background region is adaptively determined according to the pixel value of the training object region.

The method of claim 6,
wherein learning the gripping position and the gripping posture of the end effector comprises
learning the gripping position and the gripping posture of the end effector with respect to the training object included in the second image and the proximity position and proximity posture of the end effector.

A gripping robot using a neural network, comprising:
a first camera that generates a work space image by photographing a work space in which a plurality of objects are arranged;
an image processing unit that generates a first image in which the work space image is divided into a target object region and a background region;
a proximity operation control unit that estimates, by using a learned first neural network, a proximity position and a proximity posture of an end effector with respect to a target object included in the target object region of the first image;
a second camera that generates a second image by photographing the target object at the position and posture of the end effector close to the target object; and
a gripping operation control unit that estimates, by using a learned second neural network, a gripping position and a gripping posture of the end effector with respect to the target object included in the second image,
wherein the pixel value of the target object region included in the first image corresponds to the pixel value of the target object in the work space image, and
the pixel value of the background region included in the first image is a single preset pixel value.