KR20220065232A

KR20220065232A - Apparatus and method for controlling robot based on reinforcement learning

Info

Publication number: KR20220065232A
Application number: KR1020200151523A
Authority: KR
Inventors: 김주확
Original assignee: 주식회사 플라잎
Priority date: 2020-11-13
Filing date: 2020-11-13
Publication date: 2022-05-20

Abstract

A robot control apparatus that controls a robot comprises: a receiving unit for receiving a plurality of product images in which products moving according to a predetermined sequence in a workspace are photographed; a deep learning model learning unit that trains a deep learning model to estimate 6D pose information of a product based on a plurality of product images; a reinforcement learning model learning unit that trains a reinforcement learning model so that the reinforcement learning model derives robot behavior information based on 6D pose information of the estimated product; and a control unit for controlling an arm of the robot based on the behavior information of the robot. Objectives of the present invention are to predict a next position of the product being moved in the workspace through the deep learning model, derive action information to be taken by the robot for the product to be located in the next position through the reinforcement learning model, and control the arm of the robot based on the derived action information of the robot.

Description

Apparatus and method for controlling a robot based on reinforcement learning

The present invention relates to an apparatus and method for controlling a robot based on reinforcement learning.

Recently, as smart factories are increasingly adopted, interest in industrial robots and collaborative robots has been growing. Building such a smart factory requires developing intelligent robots that recognize their environment, assess the situation, and operate autonomously.

Meanwhile, performing a specific task with a robot typically involves generating 3D information that matches images of the robot working in the work space and of the work object, and then using that 3D information to control the robot's position, motion, trajectory, and the like. In particular, an automatic pick-up system using a robot must recognize the shape of the work object and identify the object.

A conventional automatic pick-up system cannot recognize an object unless image pattern information on the object to be picked up has been input in advance. In addition, because the gripper that picks up the object is manufactured in advance to fit the shape of the object whose image pattern information was input, the gripper must also be changed whenever the shape of the object changes.

Furthermore, because a conventional automatic pick-up system can pick and place with a robot only while the object to be picked up is stationary, its use has been largely limited to product sorting.

Korean Patent Laid-Open Publication No. 2012-0027253 (published March 21, 2012)

The present invention is intended to solve the problems of the prior art described above. It predicts, through a deep learning model, the next position of a product moving in a work space; derives, through a reinforcement learning model, the behavior information the robot should take for the product that will be located at that next position; and controls the robot's arm based on the derived behavior information.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a technical means for achieving the above technical problem, a robot control device for controlling a robot according to a first aspect of the present invention may include: a receiving unit that receives a plurality of product images capturing a product moving according to a preset sequence in a work space; a deep learning model learning unit that trains a deep learning model so that the deep learning model estimates 6D pose information of the product based on the plurality of product images; a reinforcement learning model learning unit that trains a reinforcement learning model so that the reinforcement learning model derives behavior information of the robot based on the estimated 6D pose information of the product; and a control unit that controls an arm of the robot based on the behavior information of the robot.

A method of controlling a robot, performed by a robot control device according to a second aspect of the present invention, may include: receiving a plurality of product images capturing a product moving according to a preset sequence in a work space; training a deep learning model based on the plurality of product images so that the deep learning model estimates 6D pose information of the product; training a reinforcement learning model so that the reinforcement learning model derives behavior information of the robot based on the estimated 6D pose information of the product; and controlling an arm of the robot based on the behavior information of the robot.

The means for solving the problem described above are merely exemplary and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, additional embodiments may exist as described in the drawings and the detailed description.

According to any one of the means for solving the problem described above, the present invention can use a deep learning model and a reinforcement learning model to predict the next position at which a product will be located in a work space, derive the behavior information the robot should take at the predicted next position, and control the robot's arm based on that behavior information.

FIG. 1 is a block diagram of a robot control device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a deep learning model according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a reinforcement learning model according to an embodiment of the present invention.
FIGS. 4A and 4B are diagrams for explaining a method of controlling a robot according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a method of controlling a robot according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts unrelated to the description are omitted to describe the present invention clearly, and similar reference numerals denote similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only being "directly connected" but also being "electrically connected" with another element interposed between them. In addition, when a part is said to "include" a component, this means that it may further include other components, rather than excluding them, unless specifically stated otherwise.

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by a single piece of hardware.

Some of the operations or functions described in this specification as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, specific details for carrying out the present invention are described with reference to the accompanying configuration diagrams and process flowcharts.

FIG. 1 is a block diagram of a robot control device 10 according to an embodiment of the present invention.

Referring to FIG. 1, the robot control device 10 may include a receiving unit 100, a deep learning model learning unit 110, a reinforcement learning model learning unit 120, and a control unit 130. However, the robot control device 10 shown in FIG. 1 is only one implementation example of the present invention, and various modifications are possible based on the components shown in FIG. 1.

The robot control device 10 may include not only a personal computer such as a desktop or laptop but also a mobile terminal capable of wired or wireless communication. A mobile terminal is a wireless communication device that guarantees portability and mobility, and may include not only smartphones, tablet PCs, and wearable devices but also various devices equipped with a communication module such as Bluetooth (BLE, Bluetooth Low Energy), NFC, RFID, ultrasonic, infrared, Wi-Fi, or Li-Fi. However, the robot control device 10 is not limited to the form shown in FIG. 1 or to the examples given above.

Hereinafter, FIG. 1 is described with reference to FIGS. 2 to 4B as well.

The receiving unit 100 may receive, from a camera (not shown), a plurality of product images capturing a product moving according to a preset sequence in a work space. Here, the work space may be, for example, a conveyor on which the product is moved, but is not limited thereto. The preset sequence includes a plurality of time-ordered processes, such as processing, packaging, and inspecting the product, and may be stored in a memory (not shown).

Meanwhile, the plurality of product images may include an RGB image and a depth image.

The deep learning model learning unit 110 may train the deep learning model so that the deep learning model estimates 6D pose information of the product based on the plurality of product images. Here, the 6D pose information of the product may include x-axis, y-axis, z-axis, pitch, roll, and yaw information that the robot uses to recognize the product.
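The six components of the 6D pose information can be pictured as a small record type. The following is a minimal sketch only: the class name `Pose6D`, the units (meters and radians), and the axis convention for roll, pitch, and yaw are assumptions made for illustration and are not specified in this document.

```python
from dataclasses import dataclass, astuple
from typing import List
import math


@dataclass(frozen=True)
class Pose6D:
    """Hypothetical container for 6D pose: position plus orientation."""
    x: float      # position along the x-axis (meters, assumed)
    y: float      # position along the y-axis (meters, assumed)
    z: float      # position along the z-axis (meters, assumed)
    roll: float   # rotation about the x-axis (radians, assumed convention)
    pitch: float  # rotation about the y-axis (radians, assumed convention)
    yaw: float    # rotation about the z-axis (radians, assumed convention)

    def as_vector(self) -> List[float]:
        """Flatten the pose into [x, y, z, roll, pitch, yaw]."""
        return list(astuple(self))


# Example: a product lying flat on a conveyor, rotated 90 degrees about z.
pose = Pose6D(x=0.40, y=-0.10, z=0.05, roll=0.0, pitch=0.0, yaw=math.pi / 2)
pose_vector = pose.as_vector()
```

A flat six-element vector of this kind is a convenient form for a regression target or a policy input, although other orientation encodings (for example quaternions) would serve equally well.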

When the product is moved in the work space, the deep learning model learning unit 110 may train the deep learning model to estimate the 6D pose information of the product after N time steps (e.g., N seconds).

Referring to FIG. 2, the deep learning model 20 may be formed by connecting a plurality of network structures, each combining a plurality of convolutional neural networks (CNNs) and at least one long short-term memory (LSTM)-based neural network.

The LSTM-based neural network used in the deep learning model 20 may operate as a recurrent neural network (RNN), but is not limited thereto; that is, the LSTM-based neural network may be replaced with another type of neural network. Here, a recurrent neural network is an artificial neural network formed by connecting the network at a reference time step (t) to the network at the next time step (t+1), and is used as a component of a deep learning model for learning data that changes over time, such as time-series data.

The plurality of CNNs may extract feature data from the plurality of product images, and the LSTM-based neural network may estimate the 6D pose information of the product based on the extracted feature data.

For example, each of the plurality of CNNs in the first network structure receives an RGB image and a depth image of the product captured at time T, each of the plurality of CNNs in the second network structure receives an RGB image and a depth image of the product captured at time T+1, and each of the plurality of CNNs in the N-th network structure receives an RGB image and a depth image of the product captured at time T+N. The RGB images and depth images, captured in sequence order, are input simultaneously to the plurality of CNNs in each network structure.

Each of the plurality of CNNs in the first network structure extracts first feature data from the RGB image and the depth image of the product captured at time T, and the LSTM-based neural network in the first network structure may estimate the 6D pose information of the product at time T based on the extracted first feature data.

Each of the plurality of CNNs in the second network structure extracts second feature data from the RGB image and the depth image of the product captured at time T+1, and the LSTM-based neural network in the second network structure may estimate the 6D pose information of the product at time T+1 based on the extracted second feature data.

Each of the plurality of CNNs in the N-th network structure extracts N-th feature data from the RGB image and the depth image of the product captured at time T+N, and the LSTM-based neural network in the N-th network structure may estimate the 6D pose information of the product at time T+N based on the extracted N-th feature data.
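The data flow through the chained network structures can be sketched as a loop over time steps. This is an illustration of the flow only, not a neural network: `estimate_poses`, `cnn_rgb`, `cnn_depth`, and `recurrent_step` are hypothetical names, the toy stand-ins below replace trained CNN and LSTM models, and carrying the recurrent state from one time step to the next is an assumption about how the structures are connected.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def estimate_poses(
    frames: Sequence[Tuple[object, object]],  # one (rgb, depth) pair per time step
    cnn_rgb: Callable[[object], Vector],      # feature extractor for RGB images
    cnn_depth: Callable[[object], Vector],    # feature extractor for depth images
    recurrent_step: Callable[[Vector, Vector], Tuple[Vector, Vector]],
    initial_state: Vector,
) -> List[Vector]:
    """Return one pose estimate per time step (T, T+1, ..., T+N)."""
    state = initial_state
    poses: List[Vector] = []
    for rgb, depth in frames:
        # Concatenate the features extracted from both image types.
        features = cnn_rgb(rgb) + cnn_depth(depth)
        # The recurrent unit updates its state and emits a pose estimate.
        state, pose = recurrent_step(state, features)
        poses.append(pose)
    return poses


# Toy stand-ins so the sketch runs; real models would be trained networks.
def toy_cnn(image: Sequence[float]) -> Vector:
    return [float(sum(image))]


def toy_recurrent(state: Vector, features: Vector) -> Tuple[Vector, Vector]:
    new_state = [state[0] + sum(features)]
    pose = [new_state[0], 0.0, 0.0, 0.0, 0.0, 0.0]  # [x, y, z, roll, pitch, yaw]
    return new_state, pose


frames = [([1, 2], [3]), ([4, 5], [6])]
poses = estimate_poses(frames, toy_cnn, toy_cnn, toy_recurrent, [0.0])
```

Keeping the feature extractors and the recurrent unit as separate callables mirrors the division of roles described above, where the CNNs extract feature data and the LSTM-based network produces the pose estimate.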

The deep learning model learning unit 110 may compare the estimated 6D pose information of the product with the actual position information of the product using a mean squared error (MSE) loss function.
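As a minimal sketch of this comparison, the mean squared error between an estimated pose vector and a ground-truth pose vector can be computed as follows. The function name `pose_mse`, the six-element vector layout, and the sample values are assumptions for illustration; the document does not state how the components are weighted or normalized.

```python
from typing import Sequence


def pose_mse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Mean squared error between two pose vectors of equal length."""
    if len(predicted) != len(actual):
        raise ValueError("pose vectors must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sum(squared_errors) / len(squared_errors)


# Layout assumed here: [x, y, z, roll, pitch, yaw].
predicted = [0.42, -0.10, 0.05, 0.0, 0.0, 1.50]
actual = [0.40, -0.10, 0.05, 0.0, 0.0, 1.60]
loss = pose_mse(predicted, actual)
```

Because position and orientation components have different units, a practical implementation might weight or normalize them separately; the unweighted average here is only the simplest choice.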

For example, the deep learning model learning unit 110 may train the deep learning model 20 by comparing the 6D pose information of the product estimated by the deep learning model 20 with the actual position information of the product. For example, the deep learning model learning unit 110 may train the deep learning model 20 based on the result of comparing the 6D pose information estimated for a specific time with the actual position information of the product, as moved in the work space, at that time.

Returning to FIG. 1, the reinforcement learning model learning unit 120 may train the reinforcement learning model so that it derives behavior information of the robot based on the 6D pose information of the product estimated by the deep learning model. Here, the behavior information may include position values of the joints of the robot's end effector over time. The reinforcement learning model learning unit 120 may train the reinforcement learning model based on the 6D pose information of the product estimated by the deep learning model and the actual 6D pose information of the robot's end effector. Here, the end effector refers to the part (e.g., a gripper) that acts directly on the work object when the robot performs a task.

Referring to FIG. 3, the reinforcement learning model 30 may include an actor 301, which is a neural network that determines the behavior information of the robot, and a critic 303, which is a neural network that evaluates the action value of that behavior information.

The 6D pose information of the product estimated by the deep learning model and the actual 6D pose information of the robot's end effector may be provided as inputs to the actor 301 and the critic 303.

The actor 301 may determine the behavior information of the robot based on the 6D pose information of the product estimated by the deep learning model and the actual 6D pose information of the robot's end effector.

The critic 303 may evaluate the action value of the behavior information determined by the actor 301, based on the 6D pose information of the product estimated by the deep learning model and the actual 6D pose information of the robot's end effector.

When the robot is controlled in a reinforcement learning environment according to the behavior information determined by the actor 301, the reinforcement learning model 30 may determine a reward for that behavior information based on the result of controlling the robot.

For example, when the robot controls the product according to a preset rule in the reinforcement learning environment, the reinforcement learning model 30 may set the reward for the robot's behavior information to a positive reward value. For example, when the robot's arm accurately picks up the product, the reinforcement learning model 30 may set the reward for the robot's behavior information to a positive reward value.

For example, when the robot fails to control the product according to the preset rule in the reinforcement learning environment, the reinforcement learning model 30 may set the reward for the robot's behavior information to a negative reward value. For example, when the robot's arm fails to pick up the product or collides with an obstacle, the reinforcement learning model 30 may set the reward for the robot's behavior information to a negative reward value.
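The reward rule described above can be sketched as a simple function of the control result. The names `ControlResult` and `compute_reward` and the magnitudes +1.0 and -1.0 are hypothetical, since the document specifies only the sign of the reward. Treating a pick-up that also involves a collision as a failure is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlResult:
    """Hypothetical summary of one control attempt in the RL environment."""
    picked_up: bool  # True if the arm picked up the product accurately
    collided: bool   # True if the arm hit an obstacle


def compute_reward(
    result: ControlResult,
    positive: float = 1.0,   # assumed magnitude; only the sign is given
    negative: float = -1.0,  # assumed magnitude; only the sign is given
) -> float:
    """Positive reward for a clean pick-up, negative reward otherwise."""
    if result.picked_up and not result.collided:
        return positive
    return negative


success_reward = compute_reward(ControlResult(picked_up=True, collided=False))
miss_reward = compute_reward(ControlResult(picked_up=False, collided=False))
collision_reward = compute_reward(ControlResult(picked_up=False, collided=True))
```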

The critic 303 may evaluate the action value of the behavior information determined by the actor 301, based on the reward that the reinforcement learning model 30 determined for that behavior information.

The actor 301 may modify the behavior information of the robot based on the action value evaluated by the critic 303.
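One common way to realize this feedback loop is a one-step actor-critic update, sketched below. This is a standard textbook technique substituted for illustration, not the method of this document, which does not name a specific algorithm. The sketch makes several simplifying assumptions: the critic is a linear state-value function rather than a neural network that scores actions directly, the actor is a Gaussian policy with a fixed standard deviation over a single scalar action, and all names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def actor_critic_step(
    theta: List[float],      # actor weights: policy mean = theta . features
    w: List[float],          # critic weights: value = w . features
    phi_s: List[float],      # features of the current state
    phi_next: List[float],   # features of the next state
    action: float,           # scalar action that was actually taken
    reward: float,
    done: bool,
    gamma: float = 0.99,
    sigma: float = 0.5,      # fixed standard deviation of the Gaussian policy
    lr_actor: float = 0.01,
    lr_critic: float = 0.1,
) -> Tuple[List[float], List[float], float]:
    """Apply one update to the actor and the critic; return the TD error too."""
    v_s = dot(w, phi_s)
    v_next = 0.0 if done else dot(w, phi_next)
    # The TD error is the critic's evaluation of how good the action was.
    td_error = reward + gamma * v_next - v_s
    # Critic update: semi-gradient TD(0).
    new_w = [wi + lr_critic * td_error * f for wi, f in zip(w, phi_s)]
    # Actor update: gradient of log N(action; mean, sigma^2) w.r.t. theta.
    mean = dot(theta, phi_s)
    grad_scale = (action - mean) / sigma ** 2
    new_theta = [
        ti + lr_actor * td_error * grad_scale * f for ti, f in zip(theta, phi_s)
    ]
    return new_theta, new_w, td_error


new_theta, new_w, td_error = actor_critic_step(
    theta=[0.0, 0.0], w=[0.0, 0.0],
    phi_s=[1.0, 0.5], phi_next=[1.0, 0.6],
    action=0.3, reward=1.0, done=True,
)
```

In this example the positive reward yields a positive TD error, so the policy mean is shifted toward the action that was taken, which corresponds to the actor modifying its behavior information according to the critic's evaluation.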

Referring to FIGS. 4A and 4B together, the receiving unit 100 may receive, from a camera 40 that photographs a specific position in the work space 42, a real-time product image of a product 46 moving past that position.

A behavior information derivation unit (not shown) may input the real-time product image to the pre-trained deep learning model 20 and estimate the 6D pose information of the product through the pre-trained deep learning model 20.

The behavior information derivation unit (not shown) may input the 6D pose information of the product estimated by the pre-trained deep learning model 20 to the reinforcement learning model 30 and derive the behavior information of the robot through the reinforcement learning model 30. For example, the behavior information derivation unit (not shown) may predict, through the pre-trained deep learning model 20, the 6D pose information at time N of the product 46 moving in the work space, and may derive from the pre-trained reinforcement learning model 30 the behavior information the robot should take for the predicted 6D pose information of the product 46.

The control unit 130 may control the robot's arm 44 based on the derived behavior information of the robot. For example, based on the derived behavior information, the control unit 130 may control the robot's arm 44 so that it picks up the product 46 that will be at the predicted next position in the work space 42.
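The run-time flow from image to arm command can be sketched as a single control cycle. All names here (`control_cycle`, `pose_model`, `policy`, `read_end_effector_pose`, `send_to_arm`) are hypothetical placeholders, and the lambdas in the usage portion are toy stand-ins for the trained models and the robot interface, which this document does not specify at the code level.

```python
from typing import Callable, List, Sequence

Pose = List[float]          # [x, y, z, roll, pitch, yaw]
JointTargets = List[float]  # one target position value per joint


def control_cycle(
    images: Sequence[object],
    pose_model: Callable[[Sequence[object]], Pose],
    policy: Callable[[Pose, Pose], JointTargets],
    read_end_effector_pose: Callable[[], Pose],
    send_to_arm: Callable[[JointTargets], None],
) -> JointTargets:
    """Run one cycle: predict the product pose, derive an action, command the arm."""
    # 1. Predict where the product will be from the latest images.
    predicted_pose = pose_model(images)
    # 2. Derive the action from the predicted pose and the current gripper pose.
    action = policy(predicted_pose, read_end_effector_pose())
    # 3. Hand the action to the arm controller.
    send_to_arm(action)
    return action


# Toy stand-ins so the sketch runs; a real system would plug in trained models.
sent_commands: List[JointTargets] = []
action = control_cycle(
    images=["rgb_frame", "depth_frame"],
    pose_model=lambda imgs: [0.4, -0.1, 0.05, 0.0, 0.0, 1.57],
    policy=lambda product, effector: [p - e for p, e in zip(product, effector)],
    read_end_effector_pose=lambda: [0.0] * 6,
    send_to_arm=sent_commands.append,
)
```

Passing the models and the robot interface in as callables keeps the cycle itself independent of any particular framework or robot driver.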

Meanwhile, those skilled in the art will readily understand that the receiving unit 100, the deep learning model learning unit 110, the reinforcement learning model learning unit 120, and the control unit 130 may each be implemented separately, or that one or more of them may be implemented in an integrated manner.

FIG. 5 is a flowchart illustrating a method of controlling a robot according to an embodiment of the present invention.

Referring to FIG. 5, in step S501 the robot control device 10 may receive a plurality of product images capturing a product moving according to a preset sequence in a work space.

In step S503, the robot control device 10 may train the deep learning model so that the deep learning model estimates 6D pose information of the product based on the plurality of product images.

In step S505, the robot control device 10 may train the reinforcement learning model so that the reinforcement learning model derives behavior information of the robot based on the estimated 6D pose information of the product.

In step S507, the robot control device 10 may control the arm of the robot based on the behavior information of the robot.

In the above description, steps S501 to S507 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as necessary, and the order of the steps may be changed.

An embodiment of the present invention may also be implemented in the form of a recording medium containing instructions executable by a computer, such as a program module executed by a computer. Computer-readable media may be any available media that can be accessed by a computer, and include both volatile and nonvolatile media and both removable and non-removable media. Computer-readable media may also include computer storage media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The foregoing description of the present invention is for illustration, and those of ordinary skill in the art to which the present invention pertains will understand that it can easily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, each component described as a single unit may be implemented in a distributed manner, and likewise components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

10: robot control device
100: receiving unit
110: deep learning model learning unit
120: reinforcement learning model learning unit
130: control unit

Claims

1. A robot control device for controlling a robot, comprising:
a receiver configured to receive a plurality of product images in which products moving according to a preset sequence in a work space are photographed;
a deep learning model learning unit configured to train a deep learning model so that the deep learning model estimates 6D pose information of the product based on the plurality of product images;
a reinforcement learning model learning unit configured to train a reinforcement learning model so that the reinforcement learning model derives behavior information of the robot based on the estimated 6D pose information of the product; and
a control unit configured to control an arm of the robot based on the behavior information of the robot.

2. The robot control device of claim 1,
wherein the plurality of product images include an RGB image and a depth image, and
the deep learning model has a network structure in which a plurality of convolutional neural networks (CNNs) and at least one long short-term memory (LSTM) based neural network are connected to each other.

3. The robot control device of claim 2,
wherein the plurality of CNNs extract feature data from the plurality of product images, and
the LSTM-based neural network estimates the 6D pose information of the product based on the extracted feature data.

4. The robot control device of claim 1,
wherein the deep learning model learning unit trains the deep learning model by comparing the estimated 6D pose information of the product with actual position information of the product.
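As an illustrative aside, and not part of the claim language, the comparison between the estimated 6D pose and the actual pose recited above could be expressed as a training loss. The following is a minimal sketch under stated assumptions: the source does not specify a pose representation or a loss, so the `(x, y, z, roll, pitch, yaw)` layout, the function names `wrap_angle` and `pose_loss`, and the `rot_weight` parameter are all hypothetical.

```python
import math
from typing import Sequence


def wrap_angle(a: float) -> float:
    """Wrap an angle in radians into the interval [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def pose_loss(estimated: Sequence[float],
              actual: Sequence[float],
              rot_weight: float = 1.0) -> float:
    """Squared error between two 6D poses.

    Assumption: each pose is (x, y, z, roll, pitch, yaw), with the
    angles in radians. The source does not define this layout.
    """
    if len(estimated) != 6 or len(actual) != 6:
        raise ValueError("each pose must have exactly 6 components")
    # Translation error: plain squared Euclidean distance.
    trans_err = sum((e - a) ** 2 for e, a in zip(estimated[:3], actual[:3]))
    # Rotation error: squared angular difference, wrapped so that
    # angles differing by a full turn count as equal.
    rot_err = sum(wrap_angle(e - a) ** 2
                  for e, a in zip(estimated[3:], actual[3:]))
    return trans_err + rot_weight * rot_err
```

A loss of this kind is zero when the estimate matches the actual pose and grows with the deviation, which is what a learning unit would minimize during training.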

5. The robot control device of claim 1,
wherein the reinforcement learning model includes an actor, which is a neural network that determines the behavior information of the robot, and a critic, which is a neural network that evaluates an action value of the behavior information.

6. The robot control device of claim 5,
wherein, in the reinforcement learning model, a reward for the behavior information of the robot is determined based on a control result of the robot.

7. The robot control device of claim 6,
wherein, when the robot controls the product according to a preset rule, the reward is determined as a positive reward value, and
when the robot fails to control the product according to the preset rule, the reward is determined as a negative reward value.
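As an illustrative aside, and not part of the claim language, the reward rule and the critic's role recited above could be sketched as follows. This is a minimal sketch under stated assumptions: the reward magnitudes of +1 and -1, the discount factor `gamma`, and the function names `reward` and `td_advantage` are hypothetical, and the one-step temporal-difference estimate is a common actor-critic choice that the source does not confirm.

```python
def reward(followed_rule: bool,
           positive: float = 1.0,
           negative: float = -1.0) -> float:
    """Return a positive value when the robot handled the product
    according to the preset rule, and a negative value otherwise.
    The magnitudes are assumptions; the source gives only the signs."""
    return positive if followed_rule else negative


def td_advantage(r: float,
                 value_s: float,
                 value_next: float,
                 gamma: float = 0.99,
                 done: bool = False) -> float:
    """One-step temporal-difference advantage.

    value_s and value_next stand in for the critic's estimates of the
    current and next state. A positive result suggests the actor's
    action was better than the critic expected.
    """
    # At the end of an episode there is no next state to bootstrap from.
    target = r if done else r + gamma * value_next
    return target - value_s
```

In a scheme of this kind, the critic would be updated to reduce the magnitude of the advantage, while the actor would be updated to make actions with a positive advantage more likely.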

8. A robot control method performed by a robot control device, the method comprising:
receiving a plurality of product images in which products moving according to a preset sequence in a work space are photographed;
training a deep learning model based on the plurality of product images so that the deep learning model estimates 6D pose information of the product;
training a reinforcement learning model so that the reinforcement learning model derives behavior information of the robot based on the estimated 6D pose information of the product; and
controlling an arm of the robot based on the behavior information of the robot.

9. The robot control method of claim 8,
wherein the plurality of product images include an RGB image and a depth image, and
the deep learning model has a network structure in which a plurality of convolutional neural networks (CNNs) and at least one long short-term memory (LSTM) based neural network are connected to each other.

10. The robot control method of claim 9,
wherein the plurality of CNNs extract feature data from the plurality of product images, and
the LSTM-based neural network estimates the 6D pose information of the product based on the extracted feature data.

11. The robot control method of claim 8,
wherein the training of the deep learning model comprises
training the deep learning model by comparing the estimated 6D pose information of the product with actual position information of the product.

12. The robot control method of claim 8,
wherein the reinforcement learning model includes an actor, which is a neural network that determines the behavior information of the robot, and a critic, which is a neural network that evaluates an action value of the behavior information.

13. The robot control method of claim 12,
further comprising determining, through the reinforcement learning model, a reward for the behavior information of the robot based on a control result of the robot.

14. The robot control method of claim 13,
wherein the determining of the reward comprises:
determining the reward as a positive reward value when the robot controls the product according to a preset rule; and
determining the reward as a negative reward value when the robot fails to control the product according to the preset rule.