JP2023183271A - Processing device, robot control system, and machine learning method

Info

Publication number: JP2023183271A
Application number: JP2022096802A
Authority: JP
Inventors: 俊貴小谷; Toshiki Kotani; 洋伊藤; Yo Ito; 秀行一藁; Hideyuki Ichiwara; 健次郎山本; Kenjiro Yamamoto
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2022-06-15
Filing date: 2022-06-15
Publication date: 2023-12-27
Also published as: WO2023243412A1

Abstract

To provide a processing device, a robot control system, and a machine learning method that can reduce temporal variations in teaching data that impede motion learning. SOLUTION: A processor (CPU 23) of a processing device (computer 20) segments teaching data for a robot (robot device 1) into motions of the same kind of the robot (motion segmentation unit 41). The processor applies, to the plurality of segmented teaching data, a correction that equalizes the speed or timing of each motion of the same kind of the robot (speed adjustment unit 42, timing adjustment unit 43). The processor combines the plurality of corrected teaching data (data processing unit 45). The processor performs machine learning using the combined teaching data (machine learning device 34). SELECTED DRAWING: Figure 4

Description

The present invention relates to a processing device, a robot control system, and a machine learning method.

High-mix, low-volume production systems are attracting attention as a way to respond to rapid changes in the business environment and the diversification of people's needs. Automation using robots is one way to raise production efficiency, but controlling a robot requires enormous programming cost and a high level of expertise, so the large number of man-hours needed for introduction is a problem. Machine learning is therefore being applied to a wide range of robot functions in order to reduce these introduction man-hours. In object recognition, for example, many studies have reported using machine learning to quantitatively compute object features from image information and from information on contact with the robot, in order to estimate the type, position, and orientation of an object.

In motion generation, on the other hand, many studies have reported applying machine learning to the autonomous control of robots in order to cope with environmental changes and to realize motions that are difficult to describe in a program. Reinforcement learning is one example of such autonomous control technology. Because the robot acquires the optimal motion for accomplishing a task through trial and error, there is no need to teach the motion explicitly. However, acquiring the optimal motion takes an enormous amount of trial and error (learning time).

Imitation learning is an autonomous control technology that requires little learning time. It consists of three phases: teaching, learning, and execution. In the teaching phase, a teacher teaches the desired motion by operating the robot, and the time-series information output during that operation by the encoders, temperature sensors, vision sensors, ultrasonic sensors, and other sensors mounted on the robot is collected and recorded. Methods for teaching motions to the robot include remote operation using a controller, direct teaching in which the teacher touches the robot directly, and methods that use a control program.

The time-series information obtained in the teaching phase is called teaching data. The type of teaching data depends on the motion generation method. For example, when motions are generated according to the position of the robot or of a target object, the teaching data includes the position of the robot, joint angle information, and the like. When motions are generated according to visual information, images of the robot and the work environment may also be included. When motions are generated according to the state of the target object, the position and orientation of the target object may be included. When motions are generated according to the speed of the robot, the teaching data includes the speed of the robot, joint angular velocity information, and the like. When motions are generated according to the force of the robot, the teaching data includes tactile information from the robot.

In the learning phase, learning data is created from the teaching data collected in the teaching phase, and the motion generation model is trained on that learning data. The teaching data may be used as the learning data without change. Alternatively, the learning data may be created by converting the teaching data in order to remove noise, missing values, error values, and the like contained in it. Converting teaching data into learning data is called preprocessing. Preprocessing includes data cleaning, which removes data containing missing values or outliers, and normalization, which converts data values so that they fall within a specified range. The weights of the machine learning model are updated using the learning data so that the model outputs a desired value when a given value is input. For example, the model is trained so that, given the joint angle information and image at one time step, it predicts the joint angle information and image at the next time step.
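The two preprocessing operations named above can be sketched as follows. This is a minimal illustration only: the function names `clean` and `normalize`, the per-channel min-max scaling, and the sample values are assumptions made for the example and do not come from the specification.

```python
from math import isnan

def clean(samples):
    """Data cleaning: drop samples that contain a missing value (None or NaN)."""
    return [s for s in samples
            if all(v is not None and not isnan(v) for v in s)]

def normalize(samples, lo=0.0, hi=1.0):
    """Normalization: min-max scale every channel independently into [lo, hi]."""
    columns = list(zip(*samples))
    mins = [min(c) for c in columns]
    maxs = [max(c) for c in columns]
    scaled = []
    for s in samples:
        row = []
        for v, mn, mx in zip(s, mins, maxs):
            span = mx - mn
            # A constant channel has no range to scale; map it to lo.
            row.append(lo if span == 0 else lo + (hi - lo) * (v - mn) / span)
        scaled.append(row)
    return scaled

# Hypothetical teaching data: one row per time step, one column per sensor.
teaching = [[0.0, 10.0], [1.0, float("nan")], [2.0, 30.0], [4.0, 20.0]]
learning = normalize(clean(teaching))
```

Here the second row is discarded because one sensor value is missing, and the remaining rows are scaled channel by channel.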

In the execution phase, the task is executed by generating motions using the motion generation model obtained in the learning phase together with the robot's sensor data. Suppose, for example, that in the learning phase a machine learning model was trained so that, given the joint angle information and image at one time step, it predicts the joint angle information and image at the next time step. In the execution phase, the joint angle information and image at each time step are input to this model, and the joint angle information it outputs is used as the control command value for the robot, so that motions can be generated autonomously.

To improve the motion generation accuracy of imitation learning, Patent Document 1 describes the following form of imitation learning. Patent Document 1 states that "imitation learning is performed based on observation information grasped by the operator during the model operation."

Japanese Patent Application Publication No. 2021-10984

One problem with imitation learning is that it is difficult both to reduce the introduction man-hours (the man-hours of the teaching and learning phases) and to acquire generalized motions. In program-based teaching, for example, the robot is operated under computer control, so more uniform teaching data can be obtained than when a person operates the robot. This makes the relationships between teaching data easier to learn and improves generalization performance. However, every motion of the robot must be programmed one by one, so the introduction man-hours are large.

Teaching methods that require fewer introduction man-hours than program-based teaching include methods in which a person operates the robot remotely with a controller or the like, and methods in which a person operates the robot by touching it directly. Because these methods do not require every motion of the robot to be programmed one by one, their introduction man-hours are small.

However, when a person moves the robot, temporal variations such as differences in motion speed and motion timing arise between pieces of teaching data, unlike in program-based teaching, and this makes it difficult to learn the relationships between the teaching data. Making the influence of this temporal variation negligible requires an enormous amount of learning data, which increases the number of teaching sessions.

Accordingly, an object of the present invention is to provide a processing device, a robot control system, and a machine learning method that can reduce temporal variations in teaching data that impede motion learning. Although the technique described in Patent Document 1 is similar in that it manipulates learning data to improve motion generation accuracy, it does not mention reducing variations in the time direction.

To achieve the above object, a processing device according to one example of the present invention includes a processor that segments teaching data for a robot into motions of the same kind of the robot, applies to the plurality of segmented teaching data a correction that aligns the speed or timing of the motions of the same kind of the robot, combines the plurality of corrected teaching data, and performs machine learning using the combined teaching data.

According to the present invention, temporal variations in teaching data that impede motion learning can be reduced. Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

Schematic diagram showing a configuration example of a robot control system to which the present invention is applied.
Block diagram showing a hardware configuration example of a computer that implements the motion planning unit of the robot control system.
Block diagram showing a functional configuration example of the computer included in the motion planning unit of the robot control system.
Block diagram showing a functional configuration example of the consistency verification device included in the motion planning unit in the first embodiment of the present invention.
Explanatory diagram of an example method for realizing speed adjustment in the first embodiment of the present invention.
Block diagram showing a functional configuration example of the machine learning device included in the motion planning unit in the first embodiment of the present invention.
Flowchart showing an operation example of the consistency verification device included in the motion planning unit in the first embodiment of the present invention.
Explanatory diagram of a reaching task, used to show an example of the effect of the consistency verification device included in the motion planning unit in the first embodiment of the present invention.
Diagram showing the relationship between time and hand position in the Y-axis direction during a teaching motion.
Diagram showing the reaching task when the teaching data varies in the time direction.
Diagram showing the relationship between time and hand position in the Y-axis direction during a teaching motion with variation in the time direction.
Explanatory diagram of the consistency verification device applied to the reaching task shown in FIGS. 8C and 8D.
Block diagram showing an internal configuration example of the consistency verification device included in the motion planning unit in the second embodiment of the present invention.
Flowchart showing an operation example of the consistency verification device included in the motion planning unit in the second embodiment of the present invention.
Schematic diagram showing a configuration example of the robot control system in the third embodiment of the present invention.
Block diagram showing a functional configuration example of the computer included in the motion planning unit of the robot control system in the third embodiment of the present invention.
Block diagram showing a hardware configuration example of the computer that implements the motion planning unit of the robot control system in the third embodiment of the present invention.
Block diagram showing a functional configuration example of the screening device included in the motion planning unit of the robot control system in the third embodiment of the present invention.
Block diagram showing a functional configuration example of the screen operation unit included in the robot control system in the third embodiment of the present invention.
Block diagram showing a functional configuration example of the motion planning unit included in the robot control system in the fourth embodiment of the present invention.
Block diagram showing a functional configuration example of the motion parameter adjustment device included in the motion planning unit of the robot control system in the fourth embodiment of the present invention.

The first to fourth embodiments are described below with reference to the drawings. The embodiments relate to a robot motion generation technology that learns time-series data of a robot and generates motions autonomously. They aim to achieve both a reduction in introduction man-hours and the acquisition of generalized motions by reducing the temporal variations in teaching data that impede motion learning.

(First embodiment)
First, a robot control system according to a first embodiment of the present invention will be described with reference to FIGS. 1 to 3.

FIG. 1 shows a configuration example of a robot control system to which the present invention is applied. The robot control system 100 shown in FIG. 1 consists of a robot device 1, a sensor data acquisition unit 2, a motion planning unit 3, and a control unit 4. A single-arm robot is shown as a specific example of the robot device 1, but the configuration of the robot device 1 is not limited to this; it may be, for example, a dual-arm robot. It may also be a mobile device using legs, crawlers, wheels, propellers, or the like.

The sensor data acquisition unit 2 acquires the sensor data output from the robot device 1. The sensor data includes sensor data on the robot device 1 and on its environment acquired during work.

The motion planning unit 3 uses the data obtained by the sensor data acquisition unit 2 as teaching data and learns and generates motions in the following three steps. First, it creates learning data by performing preprocessing that suppresses variations in the teaching data. Next, it builds a motion generation model by performing motion learning on the learning data. Finally, it generates motion command values using the motion generation model it has built.

The control unit 4 supplies the command values calculated by the motion planning unit 3 to the robot device 1, which drives the actuators mounted on the robot device 1.

[Hardware configuration of motion planning unit]
Next, the hardware configuration of the motion planning unit 3 included in the robot control system 100 will be described with reference to FIG. 2. Here, the hardware configuration of the computer that implements the motion planning unit 3 is described.

FIG. 2 is a block diagram showing a hardware configuration example of a computer that implements the motion planning unit 3. The illustrated computer 20 is an example of the hardware constituting a computer used for the sensor data acquisition unit 2, the motion planning unit 3, and the control unit 4. A personal computer, for example, can be used as the computer 20.

The computer 20 includes a ROM (Read Only Memory) 22, a CPU (Central Processing Unit) 23, a RAM (Random Access Memory) 24, a nonvolatile storage 25, an input/output interface 26, and a network interface 27, all of which are connected to a bus 21.

The ROM 22 stores the software program code that implements the functions of the sensor data acquisition unit 2, the motion planning unit 3, and the control unit 4 according to the present embodiment.

The CPU 23 functions as an arithmetic processing unit that reads from the ROM 22 the software program code implementing the functions of the sensor data acquisition unit 2, the motion planning unit 3, and the control unit 4 according to the present embodiment, loads the program into the RAM 24, and executes it. Values such as variables and parameters generated during the arithmetic processing of the CPU 23 are temporarily written to the RAM 24 and are read out by the CPU 23 as needed. Although a CPU is used as the arithmetic processing unit here, another processor such as an MPU (Micro Processing Unit) may be used instead.

The nonvolatile storage 25 is an example of a recording medium and can store data used by programs, data obtained by executing programs, and the like. For example, the nonvolatile storage 25 stores the learning data, the motion generation model, and other data described later. An OS (Operating System) and the programs executed by the CPU 23 may also be recorded in the nonvolatile storage 25. A magnetic recording medium, an optical recording medium, a semiconductor recording medium, or the like can be used as the nonvolatile storage 25.

The input/output interface 26 is an interface for exchanging signals and data with each sensor and each actuator included in the robot control system 100. The input/output interface 26 may have the function of an A/D (Analog/Digital) converter and/or a D/A converter (not shown) that processes input or output signals. In this specification, sensor data includes information obtained from each actuator as well as from each sensor.

The network interface 27 is, for example, a NIC (Network Interface Card) or a modem. It is configured to be able to exchange various data with external devices via a communication network such as a LAN or the Internet, or via a dedicated line, to which its terminal is connected. For example, the network interface 27 can be used to exchange various data with the robot device 1.

[Functional configuration of motion planning unit]
Next, the functional configuration of the motion planning unit 3 included in the robot control system 100 will be described with reference to FIG. 3. FIG. 3 shows a functional configuration example of the motion planning unit 3 according to an embodiment of the present invention. In the motion planning unit 3 shown in FIG. 3, the sensor data storage device 31 records the sensor data acquired by the sensor data acquisition unit 2 as time-series data. The consistency verification device 32 converts the time-series data recorded in the sensor data storage device 31 and outputs learning data. The learning data storage device 33 records the learning data output from the consistency verification device 32. The machine learning device 34 learns from the learning data recorded in the learning data storage device 33 and outputs command values to the control unit 4.

Next, the configuration of the consistency verification device 32 of the motion planning unit 3 will be described with reference to FIG. 4. In the consistency verification device 32, the motion segmentation unit 41 divides the sensor data into coherent units of motion. Methods for realizing motion segmentation include segmentation based on speed changes, segmentation by clustering, and segmentation by deep learning.

Segmentation based on speed changes relies on the motion speed of the robot and can be used when the data characteristics are simple. Segmentation by clustering can be used even when the characteristics of the teaching data are complex, but the number of segments must be set in advance. Segmentation by deep learning can also be used when the characteristics of the teaching data are complex, and it does not require the number of segments to be set in advance. On the other hand, its implementation cost and computational cost are higher than those of segmentation based on speed changes or segmentation by clustering.
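As an illustration of the simplest of these three approaches, the sketch below splits a one-dimensional position series wherever the robot switches between moving and stationary. The function name `segment_by_speed`, the threshold value, and the moving/stationary criterion are assumptions chosen for the example; the specification does not fix a particular rule.

```python
def segment_by_speed(positions, dt=1.0, speed_threshold=0.05):
    """Split a 1-D position series where the robot switches between
    'moving' and 'stationary'.

    Returns (start, end, is_moving) tuples. Indices count steps, where
    step i is the transition from sample i to sample i + 1, and end is
    exclusive.
    """
    if len(positions) < 2:
        return []
    # Classify every step by comparing its speed with the threshold.
    moving = [abs(b - a) / dt > speed_threshold
              for a, b in zip(positions, positions[1:])]
    segments = []
    start = 0
    for i in range(1, len(moving)):
        if moving[i] != moving[i - 1]:
            segments.append((start, i, moving[i - 1]))
            start = i
    segments.append((start, len(moving), moving[-1]))
    return segments

# Hypothetical demonstration: rest, move, rest.
segments = segment_by_speed([0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0])
```

With this input the series is divided into a stationary segment, a moving segment, and a second stationary segment.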

The speed adjustment unit 42 adjusts the motion speed in order to reduce the variation in motion speed between pieces of teaching data, which hinders the acquisition of generalized motions. As one example of how the speed adjustment unit can be realized, an algorithm that aligns the speeds of the teaching data by quantizing the teaching data and removing noise is described below.

FIG. 5 shows the procedure and effect of this algorithm. Graph g5A shows teaching data with different movement speeds; the horizontal axis represents time and the vertical axis represents the hand position along one axis of the world coordinate system (hereinafter called the x-axis). Graph g5B shows the result of quantizing the two pieces of teaching data in graph g5A, that is, converting the hand position values into discrete approximate values. As graph g5B shows, quantization makes the amount of change per unit time in the teaching data constant.

Note that when the amount of change per unit time in the teaching data is larger than the quantization width, the amount of change after quantization is not necessarily constant. The quantization width must therefore be equal to or larger than the amount of change per unit time in the teaching data. In other words, after speed adjustment the speed is equal to or higher than that of the original teaching data.

However, by performing the upsampling described in Modification 2, the speed can also be adjusted to a value lower than that of the original teaching data. Enlarged view 5C shows that quantizing the hand position along one direction produces intervals in which the hand position does not change, that is, intervals in which the speed is zero. Graph g5D shows the result of applying noise removal to the quantized teaching data of graph g5B. Noise removal here means treating the zero-speed intervals that arise after quantization as noise and deleting them. As graph g5D shows, the hand speeds of the two pieces of teaching data are now aligned, and the final hand position is unchanged from before the speed adjustment. Speed adjustment can thus be realized by quantization followed by noise removal.
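The quantize-then-delete procedure can be sketched for a single motion segment as follows. The function names, the use of rounding down to a multiple of the width, and the two example trajectories are assumptions made for illustration; the specification only states that positions are converted to discrete approximate values and that the resulting zero-speed intervals are deleted.

```python
import math

def quantize(positions, width):
    """Snap each hand position down to a multiple of the quantization width."""
    return [math.floor(p / width) * width for p in positions]

def remove_zero_velocity(positions):
    """Drop samples whose position equals the previously kept sample,
    i.e. the zero-speed intervals that quantization introduces."""
    kept = positions[:1]
    for p in positions[1:]:
        if p != kept[-1]:
            kept.append(p)
    return kept

def adjust_speed(positions, width):
    """Quantization followed by noise removal."""
    return remove_zero_velocity(quantize(positions, width))

# Two hypothetical demonstrations of the same reach at different speeds.
slow = [0.25 * i for i in range(9)]   # 0.25 per step, ends at 2.0
fast = [0.5 * i for i in range(5)]    # 0.5 per step, ends at 2.0

# The width (1.0) is at least as large as either per-step change.
slow_adjusted = adjust_speed(slow, 1.0)
fast_adjusted = adjust_speed(fast, 1.0)
```

Both trajectories collapse to the same sequence that advances by one quantization width per step and ends at the original final position. Because every repeated sample is deleted, this sketch would also erase a deliberate pause, so it is meant to be applied within one motion segment rather than to a whole demonstration.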

The timing adjustment unit 43 shown in FIG. 4 adjusts the motion start and end timings in order to reduce the variation between pieces of teaching data in the timing of motion start, motion end, and the like, which hinders the acquisition of generalized motions. One way to realize timing adjustment is to lengthen or shorten the time during which the robot device 1 is stationary. For example, to align the motion start time of the robot device 1 across all the teaching data, the average of the motion start times over all the teaching data can be calculated, and the stationary time in each piece of teaching data can then be lengthened or shortened so that its motion start time equals that average.
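The start-time alignment described above might look like the following sketch. The helper names `motion_start` and `align_start`, the tolerance `eps`, and the rounding of the average to a whole sample index are assumptions introduced for the example.

```python
def motion_start(positions, eps=1e-6):
    """Index of the first sample whose position has left the initial value."""
    for i, p in enumerate(positions):
        if abs(p - positions[0]) > eps:
            return i
    return len(positions)

def align_start(demos):
    """Pad or trim the initial stationary period of every demonstration
    so that all of them start moving at the (rounded) average start index."""
    starts = [motion_start(d) for d in demos]
    target = round(sum(starts) / len(starts))
    aligned = []
    for d, s in zip(demos, starts):
        if s < target:
            # Lengthen the stationary time by repeating the first sample.
            aligned.append([d[0]] * (target - s) + d)
        else:
            # Shorten the stationary time by dropping leading samples.
            aligned.append(d[s - target:])
    return aligned

# Hypothetical demonstrations that start moving at steps 4 and 2.
late = [0.0, 0.0, 0.0, 0.0, 1.0, 2.0]
early = [0.0, 0.0, 1.0, 2.0]
aligned = align_start([late, early])
```

After alignment both demonstrations begin to move at step 3, the average of the two original start indices. Only stationary samples are added or removed, so the motion itself is left untouched.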

The dynamics adjustment unit 44 adjusts dynamics such as acceleration, jerk, and torque in the sensor data. This prevents the generated motion from becoming unstable because the magnitude or rate of change of these dynamics has become excessive in the teaching data after speed adjustment. One way to realize dynamics adjustment is to apply a moving average filter to the hand position data.
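A moving average filter of the kind mentioned above can be written in a few lines. The centred window and the shrinking of the window at both ends are assumptions made so that the output keeps the input length; other edge treatments are equally possible.

```python
def moving_average(values, window=3):
    """Centred moving average. The window shrinks at the edges so that
    the output has the same length as the input."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

# Hypothetical hand positions with an abrupt jump after speed adjustment.
smoothed = moving_average([0.0, 0.0, 3.0, 3.0, 3.0], window=3)
```

The single jump of 3.0 is spread over three steps of 1.0 each, which lowers the peak speed and acceleration while the start and end positions in this example stay the same.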

The data processing unit 45 converts the data output from the dynamics adjustment unit 44 into learning data so that it can be learned by a deep learning model. Specifically, for each piece of teaching data, it builds the learning data by combining the adjusted segments.

Next, the configuration of the machine learning device 34 will be described with reference to FIG. 6. In the machine learning device 34, the machine learning model definition unit 61 defines the structure and parameters of the machine learning model.

The learning unit 62 uses the learning data stored in the learning data storage device 33 to update the weights of the machine learning model defined by the machine learning model definition unit 61, so that the model outputs a desired value when a given value is input. For example, the model is trained so that, given the joint angles of the robot at one time step, it outputs predicted joint angles of the robot at the next time step.
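For the next-step prediction example, the learning data can be arranged as input and target pairs as sketched below. The function name `next_step_pairs` and the sample joint angles are hypothetical, and the weight update itself is outside the scope of this sketch.

```python
def next_step_pairs(sequence):
    """Pair every sample with its successor: (input at t, target at t + 1)."""
    return list(zip(sequence[:-1], sequence[1:]))

# Hypothetical joint angles of a two-joint robot over three time steps.
joint_angles = [[0.0, 0.1], [0.2, 0.3], [0.4, 0.5]]
pairs = next_step_pairs(joint_angles)
```

A sequence of three time steps yields two training pairs, each mapping the joint angles at one step to those at the following step.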

学習済み重み蓄積部６３は、学習済みのモデルの重みパラメータを保存する。 The learned weight storage unit 63 stores the weight parameters of the learned model.

推論部６４は、機械学習モデル定義部６１から読み込んだ機械学習モデルと、学習済み重み蓄積部６３から読み込んだモデルの重みパラメータを用いることで動作生成モデルを構築し、学習時と同様のセンサデータをモデルへ入力することで動作生成を行う。 The inference unit 64 constructs a motion generation model by using the machine learning model read from the machine learning model definition unit 61 and the weight parameters of the model read from the learned weight storage unit 63, and generates motion by inputting the same kind of sensor data as during learning into the model.

例えば、学習時に、ある時刻のロボットの関節角度を入力すると、次の時刻のロボットの関節角度の予測値を出力するようにモデルを構築したとする。この場合、動作生成時は、各時刻ごとの関節角度をモデルに入力することで、次の時刻における関節角度を得られ、得られた関節角度を指令値とすることでロボットの動作を生成できる。 For example, suppose that during learning, a model is constructed such that when the robot's joint angles at a certain time are input, predicted values of the robot's joint angles at the next time are output. In this case, when generating motion, inputting the joint angles at each time into the model yields the joint angles at the next time, and the robot's motion can be generated by using the obtained joint angles as command values.
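The next-step prediction loop described above can be sketched as follows. This is only an illustration: `toy_model` is a hypothetical stand-in for the learned motion generation model, and the sketch assumes the robot reaches each commanded joint angle exactly, so the command is fed back as the next input instead of a measured joint angle.

```python
from typing import Callable, List, Sequence


def rollout(model: Callable[[Sequence[float]], Sequence[float]],
            initial_angles: Sequence[float],
            steps: int) -> List[List[float]]:
    """Generate a sequence of joint angle command values.

    At each time step the current joint angles are input to the model and
    the predicted joint angles for the next time are used as the command.
    """
    commands = []
    current = list(initial_angles)
    for _ in range(steps):
        nxt = list(model(current))
        commands.append(nxt)
        current = nxt  # assumption: the robot tracks the command perfectly
    return commands


def toy_model(angles):
    """Hypothetical predictor: moves each joint halfway toward 1.0 rad."""
    return [a + 0.5 * (1.0 - a) for a in angles]


print(rollout(toy_model, [0.0], steps=3))
```

On a real robot, `current` would be replaced at every step by the joint angles read from the sensors, which is what lets the model react to disturbances.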

［動作計画部３の動作例］
次に、これまで説明した動作計画部３の各機能がどのような順序で動作するかを説明する。動作順序は大きく教示・学習・実行の３つのフェーズに分けられる。教示フェーズでは、ロボット装置１を操作することで動作を教示し、その間得られたセンサデータを、センサデータ取得部２を用いて、センサデータ蓄積装置３１に記録する。 [Example of operation of motion planning unit 3]
Next, the order in which each function of the motion planning section 3 described above operates will be explained. The operation order can be broadly divided into three phases: teaching, learning, and execution. In the teaching phase, movements are taught by operating the robot device 1, and sensor data obtained during the teaching phase is recorded in the sensor data storage device 31 using the sensor data acquisition unit 2.

学習フェーズでは、整合性検証装置３２を用いて、センサデータ蓄積装置３１のセンサデータを学習データへ変換し、機械学習モデル定義部６１で定義した機械学習モデルを用いて学習することで、学習済み重みを得る。整合性検証装置３２では、動作分節化、速度調整、タイミング調整の３つ手順でセンサデータを変換する。これらの３つの手順を、図７を参照して説明する。 In the learning phase, the consistency verification device 32 is used to convert the sensor data in the sensor data storage device 31 into learning data, and learning is performed using the machine learning model defined by the machine learning model definition unit 61 to obtain the learned weights. The consistency verification device 32 converts the sensor data in three steps: motion segmentation, speed adjustment, and timing adjustment. These three steps will be explained with reference to FIG. 7.

図７は、整合性検証装置３２の動作例を示すフローチャートである。前処理開始（Ｓ１）後は、まず、全教示データについて、それぞれ動作分節化を行う（Ｓ２）。動作分節化では、動作分節化部４１を用いて、センサデータを時間方向に分割することで、動作のまとまり毎に分ける。動作分節化後は、各分節について（Ｓ３、Ｓ４）、速度調整（Ｓ５）と、タイミング調整と（Ｓ６）、ダイナミクス調整（Ｓ７）を行う。タイミング調整部４３によって処理されたデータを、データ処理部４５を用いることで、学習データへ変換し（Ｓ８）、前処理を終了する（Ｓ９）。 FIG. 7 is a flowchart showing an example of the operation of the consistency verification device 32. After the start of preprocessing (S1), first, motion segmentation is performed for all teaching data (S2). In motion segmentation, the motion segmentation unit 41 is used to divide sensor data in the time direction, thereby dividing the sensor data into groups of motions. After motion segmentation, speed adjustment (S5), timing adjustment (S6), and dynamics adjustment (S7) are performed for each segment (S3, S4). The data processed by the timing adjustment unit 43 is converted into learning data using the data processing unit 45 (S8), and the preprocessing is ended (S9).

以上のように、整合性検証装置３２で教示データを処理することで、教示データ間の時間方向のばらつきを低減した学習データが得られる。なお、速度調整とタイミング調整は、必ずしも両方とも行う必要はなく、どちらか一方のみ行えばよい場合がある。例えば、教示動作の速度のみが異なる場合は、速度のみを調整すればよいので、タイミング調整は不要である。また、教示動作のタイミングのみが異なる場合は、タイミングのみを調整すればよいので、速度調整は不要である。 As described above, by processing the teaching data with the consistency verification device 32, learning data with reduced temporal variation between the teaching data can be obtained. Note that speed adjustment and timing adjustment need not both be performed; in some cases it is sufficient to perform only one of them. For example, if only the speed of the teaching motions differs, only the speed needs to be adjusted, so timing adjustment is unnecessary. Likewise, if only the timing of the teaching motions differs, only the timing needs to be adjusted, so speed adjustment is unnecessary.

整合性検証装置３２で構築した学習データを、機械学習装置３４の学習部６２へ入力する。学習部６２では、機械学習モデル定義部６１から機械学習モデルを呼び出し、学習データに基づき学習する。学習の結果得られた動作生成モデルは、学習済み重み蓄積部６３へ保存される。 The learning data constructed by the consistency verification device 32 is input to the learning section 62 of the machine learning device 34. The learning unit 62 calls the machine learning model from the machine learning model definition unit 61 and performs learning based on the learning data. The motion generation model obtained as a result of learning is stored in the learned weight storage section 63.

実行フェーズでは、学習フェーズで得られた動作生成モデルを用いて、動作生成を行うことで、タスクを実行する。ロボット装置１への指令値を計算するために、機械学習装置３４の推論部６４を用いる。 In the execution phase, the task is executed by generating motion using the motion generation model obtained in the learning phase. The inference unit 64 of the machine learning device 34 is used to calculate the command value to the robot device 1.

［動作計画部の効果例］
動作計画部３の有効性の検証として、動作計画部３が動作することで、どのように教示データ間の時間方向のばらつきが小さくなるかを、図８（８Ａ～８Ｄ）、図９を参照して説明する。図８（８Ａ～８Ｄ）は、ロボットアームの手先を目標手先位置へ直線的に移動させるという、リーチングタスクを表したものである。図８（８Ａ～８Ｄ）を用いて、教示データにおける時間方向のばらつきが、リーチングタスクに与える影響を説明する。 [Example of effect of motion planning section]
To verify the effectiveness of the motion planning unit 3, how its operation reduces the temporal variation between teaching data is explained with reference to FIG. 8 (8A to 8D) and FIG. 9. FIG. 8 (8A to 8D) shows a reaching task in which the hand of the robot arm is moved linearly to a target hand position. FIG. 8 (8A to 8D) is used to explain how temporal variation in the teaching data affects the reaching task.

まず、図８Ａと図８Ｂ（グラフｇ８Ｂ）を用いて、教示データに時間方向のばらつきがないとタスクが成功する理由を説明する。図８Ａに描かれる手先は、ロボットハンドの初期時刻での位置を示している。また、図８Ａに描かれるｐ、ｑ、ｒは、それぞれ目標手先位置を表している。図８ＡのＸ、Ｙはそれぞれタスク環境に設定された座標軸の名前であり、ここではX軸、Y軸は位置［ｃｍ］を示す。ここで、目標手先位置p、ｒに対する動作を教示したとする。図８Ａにおいて、手先からｐ、ｒへ向かう矢印（実線）は、教示動作中の手先の軌跡を表している。 First, using FIGS. 8A and 8B (graph g8B), the reason why a task is successful when there is no temporal variation in the teaching data will be explained. The hand depicted in FIG. 8A shows the position of the robot hand at the initial time. Moreover, p, q, and r drawn in FIG. 8A each represent the target hand position. X and Y in FIG. 8A are the names of coordinate axes set in the task environment, and here the X and Y axes indicate position [cm]. Here, let us assume that motions for target hand positions p and r are taught. In FIG. 8A, arrows (solid lines) from the hand toward p and r represent the trajectory of the hand during the teaching motion.

図８Ｂ（グラフｇ８Ｂ）は、教示動作中の時間とＹ軸方向の手先位置の関係を表した図である。グラフｇ８Ｂに示す通り、目標手先位置ｐとｒの教示動作は、手先速度とリーチング開始・終了時刻は同じであり、時間方向のばらつきはない。動作学習フェーズで、目標手先位置ｐとｒに対する教示動作を学習することで、現在のセンサデータを入力すると、次の時刻におけるロボットアームへの指令値を出力する動作生成モデルを獲得する。 FIG. 8B (graph g8B) is a diagram showing the relationship between the time during the teaching operation and the hand position in the Y-axis direction. As shown in graph g8B, the teaching motions for the target hand positions p and r have the same hand speed and reaching start/end time, and there is no variation in the time direction. In the motion learning phase, by learning the teaching motion for the target hand positions p and r, a motion generation model is obtained that outputs a command value to the robot arm at the next time when current sensor data is input.

次いで、図８Ａとグラフｇ８Ｂを参照して、動作実行フェーズで、目標手先位置ｑに対して生成される動作を考える。動作実行フェーズでは、動作学習フェーズで構築した動作生成モデルへ目標手先位置ｑは、２点間の学習により、位置汎化することで、目標手先位置ｐ、ｒのちょうど真ん中に位置するようになる。このとき、ｑに対して生成される手先位置は、各時刻ごとに、ｐ、ｒに対する教示動作の手先位置のちょうど中間を通る。図８Ａとグラフｇ８Ｂは、ｑに対して生成される動作を破線で示しており、手先がＹ軸方向にぶれず安定している。このことから、教示データに時間方向のばらつきがない場合は、リーチングタスクに成功することが分かる。 Next, with reference to FIG. 8A and graph g8B, consider the motion generated for the target hand position q in the motion execution phase. In the motion execution phase, the target hand position q is given to the motion generation model constructed in the motion learning phase; because the model generalizes over position by learning the two taught points, q is treated as lying exactly midway between the target hand positions p and r. The hand position generated for q therefore passes, at each time, exactly midway between the hand positions of the teaching motions for p and r. In FIG. 8A and graph g8B, the motion generated for q is shown by a broken line, and the hand is stable without wobbling in the Y-axis direction. From this, it can be seen that the reaching task succeeds when the teaching data has no temporal variation.

次に、図８Ｃと図８Ｄ（グラフｇ８Ｄ）を用いて、ばらつきがあるとタスクが失敗する理由を説明する。図８Ｃは、教示データに時間方向のバラつきがある場合の、リーチングタスクの様子を示したものである。図中の各符号は図８Ａと同じ意味を表す。 Next, using FIG. 8C and FIG. 8D (graph g8D), the reason why the task fails when there is variation will be explained. FIG. 8C shows the state of the reaching task when there is variation in the teaching data in the time direction. Each symbol in the figure represents the same meaning as in FIG. 8A.

図８Ｄ（グラフｇ８Ｄ）は、教示動作中の、時間とＹ軸方向の手先位置の関係を表した図である。目標手先位置ｐとｒの教示動作には、時間方向のばらつきがある。具体的には、目標手先位置ｐよりも、目標手先位置ｒの教示動作のほうが、移動速度が速い。またリーチング開始時刻（２点鎖線）、完了時刻（１点鎖線）も早い。 FIG. 8D (graph g8D) is a diagram showing the relationship between time and the hand position in the Y-axis direction during the teaching motion. The teaching motions for the target hand positions p and r vary in the time direction. Specifically, the teaching motion for the target hand position r moves faster than that for the target hand position p. Its reaching start time (two-dot chain line) and completion time (one-dot chain line) are also earlier.

次いで、図８Ｃとグラフｇ８Ｄを参照して、目標手先位置ｑに対して生成される動作を考える。上述の通り、ｐ、ｑ、ｒの位置関係から、ｑに対して生成される動作は、ｐ、ｒに対する教示動作のちょうど中間を通る。図８Ｃとグラフｇ８Ｄに、ｑに対して生成される動作を、破線で示している。ｑに対して生成される動作を参照すると、手先がＹ軸方向に大きくぶれてしまっている。このことから、教示データに時間方向のばらつきがある場合は、リーチングタスクに失敗することが分かる。 Next, consider the motion generated for the target hand position q with reference to FIG. 8C and graph g8D. As described above, because of the positional relationship between p, q, and r, the motion generated for q passes exactly midway between the teaching motions for p and r. In FIG. 8C and graph g8D, the motion generated for q is shown by a broken line. In this generated motion, the hand wobbles significantly in the Y-axis direction. From this, it can be seen that the reaching task fails when the teaching data varies in the time direction.

次いで、図９を用いて、教示データに時間方向のバラつきがある場合でも、整合性検証装置３２を用いることで、汎化動作が獲得できることを説明する。図９は、図８Ｃ、８Ｄに示したリーチングタスクについて、整合性検証装置３２を適用した際の説明図である。図９中の、同一符号は図８Ｃ、８Ｄと同一部品であるため、再度の説明を省略する。 Next, using FIG. 9, it will be explained that a generalized operation can be obtained by using the consistency verification device 32 even when there is variation in the teaching data in the time direction. FIG. 9 is an explanatory diagram when the consistency verification device 32 is applied to the reaching tasks shown in FIGS. 8C and 8D. Since the same reference numerals in FIG. 9 are the same parts as those in FIGS. 8C and 8D, a repeated explanation will be omitted.

グラフｇ９Ａは、グラフｇ８Ｄで示した目標手先位置ｐ、ｒに対する教示データである。上述の通り、目標手先位置p、ｒに対する教示データには、時間方向のばらつきがある。 Graph g9A is teaching data for target hand positions p and r shown in graph g8D. As described above, there are variations in the teaching data for the target hand positions p and r in the time direction.

グラフｇ９Ｂは、グラフｇ９Ａで示す目標手先位置ｐ、ｒに対する教示動作を、それぞれ動作分節化したものである。グラフｇ９Ｂの一点鎖線は、分節化した時刻を表している。グラフｇ９Ｂに示す通り、リーチング開始時刻とリーチング終了時刻で動作を分割している。例えば、グラフｇ９Ｂに示すように、リーチング開始時刻を検出するためには、動作分節化部４１にて速度変化に基づく分節化を行うことで、手先が静止している状態から動き始める時刻を調べるとよい。 Graph g9B shows the teaching motions for the target hand positions p and r in graph g9A after motion segmentation. The one-dot chain lines in graph g9B indicate the segmentation times. As shown in graph g9B, each motion is divided at the reaching start time and the reaching end time. For example, to detect the reaching start time as shown in graph g9B, the motion segmentation unit 41 may perform segmentation based on speed changes to find the time at which the hand starts moving from a stationary state.
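Segmentation based on speed changes can be sketched as follows for a one-dimensional hand position series. The finite-difference speed estimate and the `speed_threshold` parameter are assumptions introduced for this illustration; the description does not state how the motion segmentation unit 41 decides that the hand is stationary.

```python
def segment_by_motion(positions, dt=1.0, speed_threshold=1e-3):
    """Return (start, end) sample indices of the reaching motion.

    start is the last stationary sample before the hand begins to move,
    end is the first sample at which the hand is at rest again.
    Returns None if the hand never moves.
    """
    moving = [
        abs(positions[i + 1] - positions[i]) / dt > speed_threshold
        for i in range(len(positions) - 1)
    ]
    if not any(moving):
        return None
    first = moving.index(True)
    last = len(moving) - 1 - moving[::-1].index(True)
    return first, last + 1


# Stationary, then reaching, then stationary again.
y = [0, 0, 0, 1, 2, 3, 3, 3]
start, end = segment_by_motion(y)
print(start, end, y[:start], y[start:end + 1], y[end + 1:])
```

The two indices split the series into the three segments drawn in graph g9B: the stationary period before reaching, the reaching motion itself, and the stationary period after it.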

グラフｇ９Ｃは、グラフｇ９Ｂで示す手先動作について、手先速度を揃える説明図である。グラフｇ９Ｃに示す通り、リーチング開始時刻からリーチング完了時刻までのアームの手先速度を調整し、２つの教示データにおける手先速度の平均値＋α（定数）と等しくすれば良い。さらに好ましくは、リーチング開始時刻からリーチング完了時刻までのアームの手先速度を平均値と等しくする。 Graph g9C is an explanatory diagram for aligning the hand speeds for the hand motion shown in graph g9B. As shown in graph g9C, the hand speed of the arm from the reaching start time to the reaching completion time may be adjusted to be equal to the average value of the hand speeds in the two teaching data + α (constant). More preferably, the hand speed of the arm from the reaching start time to the reaching completion time is made equal to the average value.
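One way to picture the speed alignment in graph g9C is to rescale each reaching segment in time so that its average speed equals the mean over the teaching data. The sketch below does this by linear resampling, which is a technique swapped in for illustration and is not the quantization-based method that the speed adjustment unit 42 uses in this embodiment. The function names, the fixed sampling period `dt`, and the rounding of the new duration to whole samples are all assumptions.

```python
def resample_linear(values, new_len):
    """Resample a series to new_len samples by linear interpolation."""
    if len(values) < 2 or new_len < 2:
        raise ValueError("need at least two samples")
    scale = (len(values) - 1) / (new_len - 1)
    out = []
    for k in range(new_len):
        x = k * scale
        i = min(int(x), len(values) - 2)
        frac = x - i
        out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out


def equalize_speed(segments, dt=1.0):
    """Stretch or compress each reaching segment so that its average
    speed matches the mean average speed over all segments."""
    speeds = [abs(s[-1] - s[0]) / ((len(s) - 1) * dt) for s in segments]
    target = sum(speeds) / len(speeds)
    if target == 0:
        raise ValueError("segments contain no motion")
    adjusted = []
    for s in segments:
        n_steps = max(1, round(abs(s[-1] - s[0]) / (target * dt)))
        adjusted.append(resample_linear(s, n_steps + 1))
    return adjusted, target


fast = [0, 2, 4]          # average speed 2 per sample
slow = [0, 1, 2, 3, 4]    # average speed 1 per sample
adjusted, target = equalize_speed([fast, slow])
print(target, adjusted)
```

Because the duration is rounded to a whole number of samples, the resulting speeds match the target only approximately when the distance is not a multiple of the target speed.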

グラフｇ９Ｄは、グラフｇ９Ｃで示す手先動作について、手先の動作タイミングを揃える説明図である。グラフｇ９Ｄに示す通り、教示データ間で、リーチング開始時刻とリーチング完了時刻を等しくしている。 Graph g9D is an explanatory diagram for arranging the timing of hand motions for the hand motions shown in graph g9C. As shown in graph g9D, the reaching start time and the reaching completion time are made equal between the teaching data.

グラフｇ９Ｅは、実行フェーズで、図８Ａにおける目標手先位置qに対して生成される動作を示している。目標手先位置ｑは、目標手先位置ｐ、ｒのちょうど中間に位置するため、ｑに対して生成される動作も、ｐ、ｒに対する教示動作のちょうど中間を通る。ｑに対して生成される動作を参照すると、手先がＹ軸方向にぶれず、安定している。 Graph g9E shows the motion generated for target hand position q in FIG. 8A during the execution phase. Since the target hand position q is located exactly between the target hand positions p and r, the motion generated for q also passes exactly between the taught motions for p and r. Referring to the motion generated for q, the hand does not shake in the Y-axis direction and is stable.

＜変形例１＞
ところで、本実施形態では、動作分節化部４１では、センサデータを動作のまとまりごとに分割する。しかし、データ特性が複雑である場合、分節化がうまく機能しない可能性がある。この対処法として、Recurrent Dropoutを用いることが考えられる。Recurrent Dropoutは、再帰型ニューラルネットワークにおけるノードをランダムに不活性化させながら、学習を行うことである。Recurrent Dropoutを用いることで、学習データにおける時間方向のばらつきに対する汎化性能が向上することが期待される。 <Modification 1>
By the way, in this embodiment, the motion segmentation unit 41 divides sensor data into groups of motions. However, if the data characteristics are complex, segmentation may not work well. One possible solution to this problem is to use Recurrent Dropout. Recurrent Dropout is the process of learning while randomly inactivating nodes in a recurrent neural network. By using Recurrent Dropout, it is expected that generalization performance against temporal variations in training data will be improved.

＜変形例２＞
ところで、本実施形態では、速度調整部４２にて、センサデータの量子化を行っている。しかし、センサデータの単位時間当たりの変化量が量子化幅よりも大きい場合（つまり、変換後の速度を大きくしたい場合）、量子化後のセンサデータの時間当たりの変化量が量子化幅を超えてしまうため、量子化後の時間当たりの変化量が等しくならず速度を揃えることに失敗する。 <Modification 2>
By the way, in this embodiment, the speed adjustment unit 42 quantizes the sensor data. However, if the amount of change per unit time in the sensor data is larger than the quantization width (in other words, if the speed after conversion is to be increased), the amount of change per unit time in the quantized sensor data exceeds the quantization width. As a result, the amounts of change per unit time after quantization are not equal, and the speeds cannot be equalized.

これを避けるためには、速度調整部４２にて、量子化の前にアップサンプリングを行うことで、センサの単位時間あたりの変化量を小さくする必要がある。アップサンプリングとは、時系列データのサンプリング周波数を大きくすることである。アップサンプリングの実現方法としては、線形補間や、最近傍補間、スプライン補間等がある。ただし、これらの補間方法は、画像データに対して適用できないことが知られており、フレーム補間という技術が別途必要となる。詳細は、第２の実施形態で説明する。 In order to avoid this, the speed adjustment unit 42 needs to perform upsampling before quantization to reduce the amount of change in the sensor per unit time. Upsampling means increasing the sampling frequency of time series data. Methods for implementing upsampling include linear interpolation, nearest neighbor interpolation, spline interpolation, and the like. However, it is known that these interpolation methods cannot be applied to image data, and a separate technique called frame interpolation is required. Details will be explained in the second embodiment.
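The effect of upsampling before quantization can be sketched with linear interpolation on a one-dimensional series. Choosing the upsampling factor as the smallest integer that brings the largest per-sample change down to the quantization width is an assumption made for this example, as are the function names; the series is assumed to contain at least two samples.

```python
import math


def max_step(values):
    """Largest absolute change between consecutive samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


def upsample_linear(values, factor):
    """Insert factor - 1 linearly interpolated samples between neighbours."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out = []
    for a, b in zip(values, values[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(values[-1])
    return out


def upsample_for_quantization(values, width):
    """Upsample so that no per-sample change exceeds the quantization width."""
    factor = max(1, math.ceil(max_step(values) / width))
    return upsample_linear(values, factor), factor


series = [0.0, 3.0, 4.0]   # the first step (3.0) exceeds a width of 1.0
dense, factor = upsample_for_quantization(series, width=1.0)
print(factor, dense, max_step(dense))
```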

本実施形態の主な特徴は、次のようにまとめることもできる。 The main features of this embodiment can be summarized as follows.

処理装置（計算機２０）のプロセッサ（ＣＰＵ２３）は、ロボット（ロボット装置１）への教示データをロボットの同種の動作ごとに分節する（動作分節化部４１）。プロセッサ（ＣＰＵ２３）は、分節された複数の教示データに対して、ロボットの同種の動作の速度又はタイミングを揃える補正を行う（速度調整部４２、タイミング調整部４３）。プロセッサ（ＣＰＵ２３）は、補正が行われた複数の教示データを合成する（データ処理部４５）。プロセッサ（ＣＰＵ２３）は、合成された教示データを用いて機械学習を行う（機械学習装置３４）。 The processor (CPU 23) of the processing device (computer 20) segments the teaching data to the robot (robot device 1) for each similar type of robot motion (motion segmentation unit 41). The processor (CPU 23) corrects the plurality of segmented teaching data to align the speed or timing of the same type of motion of the robot (speed adjustment section 42, timing adjustment section 43). The processor (CPU 23) synthesizes the plurality of corrected teaching data (data processing unit 45). The processor (CPU 23) performs machine learning using the combined teaching data (machine learning device 34).

これにより、教示データの時間方向のばらつきが低減される。その結果、導入工数（教示・学習フェーズの工数）が削減されるとともに、教示データ間の関係性を学習しやすくなり汎化動作性能が向上する。 This reduces variations in teaching data in the time direction. As a result, the introduction man-hours (teaching/learning phase man-hours) are reduced, and the relationships between teaching data can be learned more easily, improving generalization performance.

プロセッサ（ＣＰＵ２３）は、分節された複数の教示データに対して、ロボットの同種の動作のダイナミクスの値を平滑化する補正を行う（ダイナミクス調整部４４）。例えば、ダイナミクス（加速度や躍度、トルク等）の値は、移動平均をとることにより平滑化される。これにより、ロボットの動作を安定化することができる。 The processor (CPU 23) performs correction for smoothing the dynamics values of the same type of motion of the robot on the plurality of segmented teaching data (dynamics adjustment unit 44). For example, values of dynamics (acceleration, jerk, torque, etc.) are smoothed by taking a moving average. Thereby, the operation of the robot can be stabilized.

教示データは、例えば、ロボットの位置を示す位置情報、関節角度を示す関節角度情報、ロボットの速度を示す速度情報、ロボットの関節角速度を示す関節角速度情報、及びロボットに設けられる触覚センサのセンサ値を示す触覚情報のうち少なくとも１つを含む。本実施形態では、教示データは、ロボットの位置（グリッパーの位置）を示す位置情報を含む。これにより、ロボットの状態を教示データとして機械学習を行うことができる。 The teaching data includes, for example, at least one of position information indicating the position of the robot, joint angle information indicating the joint angles, speed information indicating the speed of the robot, joint angular velocity information indicating the joint angular velocities of the robot, and tactile information indicating the sensor values of a tactile sensor provided on the robot. In this embodiment, the teaching data includes position information indicating the position of the robot (the position of the gripper). Thereby, machine learning can be performed using the state of the robot as teaching data.

また、教示データは、ロボットが作業を行う対象物の位置を示す位置情報、及び対象物の姿勢を示す姿勢情報のうち少なくとも１つを含んでもよい。これにより、対象物の状態を教示データとして機械学習を行うことができる。 Further, the teaching data may include at least one of position information indicating the position of the object on which the robot works, and posture information indicating the posture of the object. Thereby, machine learning can be performed using the state of the object as teaching data.

ダイナミクスは、例えば、加速度、加速度変化、躍度、躍度変化、トルク、及びトルク変化のうち少なくとも１つを含む。これにより、ロボットの動作をなめらかにすることができる。 The dynamics include, for example, at least one of acceleration, acceleration change, jerk, jerk change, torque, and torque change. This allows the robot to move smoothly.

プロセッサ（ＣＰＵ２３）は、分節された複数の教示データを量子化し、ノイズを削除することでロボットの同種の動作の速度を揃える（図５）。これにより、教示データの時間方向のばらつきを高速に低減することができる。 The processor (CPU 23) quantizes the plurality of segmented teaching data and eliminates noise, thereby making the speeds of the robot's similar motions uniform (FIG. 5). Thereby, variations in the teaching data in the time direction can be quickly reduced.

プロセッサ（ＣＰＵ２３）は、分節された複数の教示データに対して、ロボットの静止時間を増やす又は減らすことで、ロボットの同種の動作のタイミングを揃える補正を行う（図９）。これにより、教示データの時間方向のばらつきを容易に低減することができる。 The processor (CPU 23) corrects the plurality of segmented teaching data by increasing or decreasing the robot's rest time to align the timing of the robot's similar motions (FIG. 9). Thereby, variations in the teaching data in the time direction can be easily reduced.

ロボット制御システム１００は、処理装置（計算機２０）とロボット（ロボット装置１）を含む。プロセッサ（ＣＰＵ２３）は、学習済みの機械学習モデルを用いて、ロボットの動作の指令値を生成する（機械学習装置３４）。ロボットは、指令値に応じて動作する。これにより、ロボットを自律的に制御することができる。 The robot control system 100 includes a processing device (computer 20) and a robot (robot device 1). The processor (CPU 23) uses the learned machine learning model to generate a command value for the robot's operation (machine learning device 34). The robot operates according to command values. This allows the robot to be autonomously controlled.

（第２の実施形態）
次に、本発明の第２の実施形態として、センサデータに画像が含まれる場合について、図１０と図１１を参照して説明する。 (Second embodiment)
Next, as a second embodiment of the present invention, a case where sensor data includes an image will be described with reference to FIGS. 10 and 11.

［機能構成例］
図１０は、本発明の第２の実施形態における、動作計画部が備える整合性検証装置の内部構成例を示すブロック図である。なお、図１０において、図４と同一符号は同一部品を示すので、再度の説明は省略する。図１０に示すように、第２の実施形態は、センサデータを動作分節化部４１で分割し、速度調整部４２と、又は／及び、タイミング調整部４３と、を用いることで時間方向のばらつきを削減し、ダイナミクス調整部４４を用いることでダイナミクスを調整し、データ処理部４５で学習データに変換する点では、第１の実施形態と同じである。第１の実施形態からの変更点は、第２の実施形態では、整合性検証装置３２において、フレーム補間部１０１を備えている点である。フレーム補間部１０１は、画像の時系列データのフレームレートを大きくする。 [Functional configuration example]
FIG. 10 is a block diagram showing an example of the internal configuration of a consistency verification device included in the motion planning section in the second embodiment of the present invention. Note that in FIG. 10, the same reference numerals as in FIG. 4 indicate the same parts, so repeated explanation will be omitted. As shown in FIG. 10, the second embodiment divides sensor data by a motion segmentation unit 41, and uses a speed adjustment unit 42 and/or a timing adjustment unit 43 to reduce variations in the time direction. This is the same as the first embodiment in that the dynamics is adjusted by using the dynamics adjustment unit 44 and converted into learning data by the data processing unit 45. The difference from the first embodiment is that in the second embodiment, the consistency verification device 32 includes a frame interpolation unit 101. The frame interpolation unit 101 increases the frame rate of time-series data of images.

フレーム補間の実現方法としては、オプティカルフローに基づくフレーム補間や、深層学習によるフレーム補間等が挙げられる。オプティカルフローに基づくフレーム補間は、特徴量抽出と、移動変化計算と、補間画像生成という３つのステップから成る。例えば、画像の時系列データのうち、ある連続する２枚の画像を考える。特徴量抽出では、２枚の画像の特徴量を抽出する。移動変化計算では、特徴量抽出で算出された特徴量のうち、同一特徴量に着目し、その移動変化量を計算する。なお、この特徴量の移動変化のことをオプティカルフローという。補間画像生成では移動変化量に基づき、元の画像２枚における画素を移動することで、２枚の画像の間にある画像を推定する。 Examples of methods for implementing frame interpolation include frame interpolation based on optical flow and frame interpolation using deep learning. Frame interpolation based on optical flow consists of three steps: feature extraction, movement change calculation, and interpolated image generation. For example, consider two consecutive images out of time-series data of images. In feature extraction, feature quantities of two images are extracted. In the movement change calculation, attention is paid to the same feature quantity among the feature quantities calculated in the feature quantity extraction, and the movement change amount thereof is calculated. Note that this movement change of the feature amount is called optical flow. In interpolation image generation, an image between the two original images is estimated by moving pixels in the two original images based on the amount of change in movement.

深層学習によるフレーム補間としては、例えば、ＦＬＡＶＲやＦＩＬＭ等のモデルが挙げられる。ＦＬＡＶＲは、オプティカルフローや３次元畳み込み計算をモデルの内部で行うことで、高精度なフレーム補間を可能にする深層学習モデルである。また、ＦＩＬＭは、画像のスケール（拡大や縮小）を考慮した深層学習モデルである。ＦＬＡＶＲ（またはＦＩＬＭ）へ画像の時系列データを入力すると、フレーム補間後の画像の時系列データを得ることができる。 Examples of frame interpolation using deep learning include models such as FLAVR and FILM. FLAVR is a deep learning model that enables highly accurate frame interpolation by performing optical flow and three-dimensional convolution calculations inside the model. Further, FILM is a deep learning model that takes into account the scale (enlargement or reduction) of an image. When time-series data of an image is input to FLAVR (or FILM), time-series data of an image after frame interpolation can be obtained.

［動作例］
図１１は、本発明の第２の実施形態における、整合性検証装置３２の動作例を示すフローチャートである。図１１に示すように、第２の実施形態は、センサデータの動作分節化を行い（Ｓ２）、分節毎に（Ｓ３、Ｓ４）、速度調整（Ｓ６）、タイミング調整（Ｓ７）を行い、教示データ毎に（Ｓ８、Ｓ９）、調整後の各分節を合成することで、学習データを構築する点では、第１の実施形態と同じである。なお、図１１において、図７と同一符号は同一部品を示すので、再度の説明は省略する。第１の実施形態からの変更点は、第２の実施形態では、速度調整（Ｓ６）の前に、フレーム補間（Ｓ１３）を行う点である。 [Operation example]
FIG. 11 is a flowchart showing an example of the operation of the consistency verification device 32 in the second embodiment of the present invention. As shown in FIG. 11, the second embodiment is the same as the first embodiment in that it performs motion segmentation of the sensor data (S2), performs speed adjustment (S6) and timing adjustment (S7) for each segment (S3, S4), and constructs learning data by combining the adjusted segments for each teaching data (S8, S9). Note that in FIG. 11, the same reference numerals as in FIG. 7 indicate the same parts, so repeated explanation will be omitted. The difference from the first embodiment is that in the second embodiment, frame interpolation (S13) is performed before speed adjustment (S6).

教示データは、ロボット（ロボット装置１）又は作業環境の画像を含む。 The teaching data includes images of the robot (robot device 1) or the working environment.

プロセッサ（ＣＰＵ２３）は、分節された複数の教示データに対して、教示データに含まれる画像のフレーム補間を行い、その後、ロボットの同種の動作の速度を揃える補正を行う（フレーム補間部１０１）。これにより、教示データがロボット（ロボット装置１）又は作業環境の画像を含んでいても、教示データの時間方向のばらつきを容易に低減することができる。 The processor (CPU 23) performs frame interpolation of images included in the teaching data for the plurality of segmented teaching data, and then performs correction to equalize the speeds of the same type of motion of the robot (frame interpolation unit 101). Thereby, even if the teaching data includes images of the robot (robot device 1) or the working environment, variations in the teaching data in the time direction can be easily reduced.

（第３の実施形態）
次に、本発明の第３の実施形態として、汎化性能向上に有効な教示データのみを抽出する場合について、図１２、図１３、図１４と参照して説明する。 (Third embodiment)
Next, as a third embodiment of the present invention, a case where only teaching data effective for improving generalization performance is extracted will be described with reference to FIGS. 12, 13, and 14.

図１２は、本発明の第３の実施形態における、ロボット制御システム１００の構成例を示すブロック図である。なお、図１２において、図１と同一符号は同一部品を示すので、再度の説明は省略する。第１の実施形態からの変更点は、第３の実施形態では、ロボット制御システム１００において、画面操作部５を備えている点である。画面操作部５は、動作計画部の処理結果を表示することができる。また、ユーザーから受け取った操作入力に基づいて、動作計画部３における各種パラメータを決定する。 FIG. 12 is a block diagram showing a configuration example of a robot control system 100 in the third embodiment of the present invention. Note that in FIG. 12, the same reference numerals as in FIG. 1 indicate the same parts, so repeated explanation will be omitted. The difference from the first embodiment is that in the third embodiment, the robot control system 100 includes a screen operation section 5. The screen operation section 5 can display the processing results of the motion planning section. Furthermore, various parameters in the motion planning section 3 are determined based on the operation input received from the user.

図１３は、本発明の第３の実施形態における、動作計画部３の構成例を示すブロック図である。なお、図１３において、図３と同一符号は同一部品を示すので、再度の説明は省略する。第１の実施形態からの変更点は、第３の実施形態では、動作計画部３において、スクリーニング装置１３１を備えている点である。スクリーニング装置１３１は、センサデータ蓄積装置３１に保存されたセンサデータの中から、汎化性能向上に有効なもののみを抽出する。その後、抽出された教示データを整合性検証装置３２と画面操作部５へ出力する。 FIG. 13 is a block diagram showing a configuration example of the motion planning section 3 in the third embodiment of the present invention. Note that in FIG. 13, the same reference numerals as those in FIG. 3 indicate the same parts, so a repeated explanation will be omitted. The difference from the first embodiment is that in the third embodiment, the motion planning section 3 includes a screening device 131. The screening device 131 extracts only sensor data that is effective for improving generalization performance from among the sensor data stored in the sensor data storage device 31. Thereafter, the extracted teaching data is output to the consistency verification device 32 and the screen operation section 5.

［ハードウェア構成例］
次に、本発明第３の実施形態における、ロボット制御システム１００が備える動作計画部３のハードウェア構成について図１４を参照して説明する。なお、図１４において、図３と同一符号は同一部品を示すので、再度の説明は省略する。第１の実施形態からの変更点は、第３の実施形態では、映像出力インターフェース１４１を備える点である。 [Hardware configuration example]
Next, the hardware configuration of the motion planning section 3 included in the robot control system 100 in the third embodiment of the present invention will be described with reference to FIG. 14. Note that in FIG. 14, the same reference numerals as those in FIG. 3 indicate the same parts, so a repeated explanation will be omitted. The difference from the first embodiment is that the third embodiment includes a video output interface 141.

映像出力インターフェース１４１は、例えば、ＶＧＡ(Video Graphics Array)やＤＶＩ（Digital Visual interface）、ＨＤＭＩ(High-Definition Multimedia Interface、登録商標)、Display Portが用いられる。映像出力インターフェース１４１は、専用線等を介して、ディスプレイへ映像を送信することが可能となるように構成されている。 As the video output interface 141, for example, VGA (Video Graphics Array), DVI (Digital Visual Interface), HDMI (High-Definition Multimedia Interface, registered trademark), or DisplayPort is used. The video output interface 141 is configured to be able to transmit video to a display via a dedicated line or the like.

［機能構成例］
次に、本発明第３の実施形態における、ロボット制御システム１００が備える動作計画部３のスクリーニング装置１３１の機能構成例について、図１５を参照して説明する。 [Functional configuration example]
Next, an example of the functional configuration of the screening device 131 of the motion planning section 3 included in the robot control system 100 in the third embodiment of the present invention will be described with reference to FIG. 15.

図１５は、本発明の実施となるスクリーニング装置１３１の機能構成例を現したものである。なお、図１５において、図１３と同一符号は同一部品を示すので、再度の説明は省略する。 FIG. 15 shows an example of the functional configuration of a screening device 131 that implements the present invention. Note that in FIG. 15, the same reference numerals as those in FIG. 13 indicate the same parts, so a repeated explanation will be omitted.

図１５に示すスクリーニング装置１３１において、グルーピング部１５１は、センサデータ蓄積装置３１に保存されたセンサデータを、動作が類似したデータが同じグループになるようにする。グルーピング部の実現方法として、例えば、同じ目標手先位置のセンサデータ毎にグループに分けることが考えられる。 In the screening device 131 shown in FIG. 15, the grouping unit 151 groups the sensor data stored in the sensor data storage device 31 so that data with similar actions are grouped together. As a method of implementing the grouping section, for example, it is possible to divide sensor data of the same target hand position into groups.

代表データ算出部１５２は、グルーピング部１５１で得られたグループ毎に、そのグループを代表する時系列データ（これを代表データと呼ぶ）を算出する。代表データ算出部１５２の実現方法としては、例えば、グループ内の全ての時系列データについて、各時刻ごとに中央値を算出することで得られた時系列データを、代表データとすることが考えられる。 The representative data calculation unit 152 calculates, for each group obtained by the grouping unit 151, time series data that represents the group (referred to as representative data). As a method for realizing the representative data calculation unit 152, for example, the time series obtained by taking the median at each time over all the time series data in the group may be used as the representative data.
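The per-time median can be sketched in a few lines. The sketch assumes that all time series in a group have the same length and sampling period, which the description does not state explicitly; the function name is hypothetical.

```python
from statistics import median


def representative_series(group):
    """Median at each time over all time series in one group."""
    if len({len(series) for series in group}) != 1:
        raise ValueError("all time series in a group must have the same length")
    # zip(*group) yields, for each time, the samples of every series.
    return [median(samples) for samples in zip(*group)]


group = [
    [0, 1, 2],
    [0, 2, 4],
    [0, 9, 3],   # the stray value 9 does not pull the median
]
print(representative_series(group))
```

Using the median instead of the mean keeps a single unusual demonstration from distorting the representative data, which matters because that data is the reference for the outlier test.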

外れデータ検出部１５３は、グルーピング部１５１で得られたグループ毎に、代表データ算出部１５２で算出した代表データと類似していない時系列データ（これを外れデータと呼ぶ）を検出する。外れデータ検出部の実現方法として、例えばＤＴＷ（Dynamic Time Warping、動的時間伸縮法）と、ＩＱＲ（Interquartile range, 四分位範囲）を用いることが考えられる。ＤＴＷとは、時系列データ同士の類似度の指標である。ＤＴＷのとり得る値は０以上であり、０に近いほど時系列データ同士が類似していることを意味する。まず、グループ内のすべての時系列データについて、代表データとのＤＴＷを算出することで、ＤＴＷの系列を得る。次に、算出されたＤＴＷの系列のＩＱＲを求める。ＩＱＲとは、データの散らばり度合いを現す指標であり、（第三四分位数）―（第一四分位数）で求まる。最後に、時系列データのうち、代表データに対するＤＴＷが、（第三四分位数＋α×ＩＱＲ）より大きなものを外れデータとする。αの値は初期設定では1.5に設定されている。また、ユーザーが画面操作部５を利用することで値を変更することも可能である。 The outlier data detection unit 153 detects time series data (referred to as outlier data) that is not similar to the representative data calculated by the representative data calculation unit 152 for each group obtained by the grouping unit 151. As a method for implementing the outlier data detection section, it is possible to use, for example, DTW (Dynamic Time Warping) and IQR (Interquartile Range). DTW is an index of similarity between time series data. The possible values of DTW are 0 or more, and the closer it is to 0, the more similar the time series data are. First, the DTW series is obtained by calculating the DTW with the representative data for all the time series data in the group. Next, the IQR of the calculated DTW sequence is determined. IQR is an index that expresses the degree of dispersion of data, and is determined by (3rd quartile) - (1st quartile). Finally, among the time-series data, data whose DTW for the representative data is larger than (third quartile+α×IQR) is determined to be outlier data. The value of α is initially set to 1.5. Further, the value can also be changed by the user using the screen operation section 5.
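The DTW and IQR test described above can be sketched as follows for one-dimensional time series. The absolute difference as the local cost and the "inclusive" quartile convention of `statistics.quantiles` are assumptions made for this example, since the description specifies neither; the function names are hypothetical, and a group is assumed to contain at least two series.

```python
from statistics import quantiles


def dtw_distance(a, b):
    """Classic dynamic-programming DTW with absolute-difference cost."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = abs(x - y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def find_outliers(group, representative, alpha=1.5):
    """Indices of series whose DTW to the representative data exceeds
    (third quartile + alpha * IQR)."""
    distances = [dtw_distance(series, representative) for series in group]
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    threshold = q3 + alpha * (q3 - q1)
    return [i for i, d in enumerate(distances) if d > threshold]


representative = [0, 1, 2, 3]
group = [
    [0, 1, 2, 3],
    [0, 1, 2, 4],
    [0, 1, 2, 3],
    [0, 1, 2, 4],
    [10, 11, 12, 13],   # far from the representative data
]
print(find_outliers(group, representative))
```

Here `alpha` plays the role of the user-adjustable α, with the default of 1.5 taken from the description.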

計算結果出力部１５４は、整合性検証装置３２へ、外れデータ以外のセンサデータを出力する。また、画面操作部５へ、外れデータの情報を出力する。 The calculation result output unit 154 outputs the sensor data other than the outlier data to the consistency verification device 32. It also outputs information on the outlier data to the screen operation unit 5.

次に、本発明第３の実施形態における、ロボット制御システム１００が備える画面操作部５の機能構成例について、図１６を参照して説明する。 Next, an example of the functional configuration of the screen operation unit 5 included in the robot control system 100 in the third embodiment of the present invention will be described with reference to FIG. 16.

図１６は、本発明の実施となる画面操作部５の機能構成例を現したものである。なお、図１６において、図１と同一符号は同一部品を示すので、再度の説明は省略する。 FIG. 16 shows an example of the functional configuration of the screen operation section 5 that embodies the present invention. Note that in FIG. 16, the same reference numerals as in FIG. 1 indicate the same parts, so repeated explanation will be omitted.

図１６に示す画面操作部５において、操作入力部１６１は、例えば、マウス、キーボード等の入力装置で構成され、ユーザーからのマウス入力やキーボード入力等を受け付ける。 In the screen operation unit 5 shown in FIG. 16, the operation input unit 161 is configured with an input device such as a mouse and a keyboard, and receives mouse input, keyboard input, etc. from the user.

画面表示部１６２は、例えば、ディスプレイ等で構成され、操作入力部１６１や、動作計画部３から得られる情報を可視化する。 The screen display section 162 is composed of, for example, a display, and visualizes information obtained from the operation input section 161 and the motion planning section 3.

画面制御部１６３は、動作計画部３や操作入力部１６１からの情報を受信する。また、動作計画部３や画面表示部１６２に情報を出力する。 The screen control unit 163 receives information from the motion planning unit 3 and the operation input unit 161. It also outputs information to the motion planning section 3 and screen display section 162.

プロセッサ（ＣＰＵ２３）は、教示データから外れデータを除去することでスクリーニングを行う（スクリーニング装置１３１）。これにより、汎化動作性能をさらに向上することができる。 The processor (CPU 23) performs screening by removing outlier data from the teaching data (screening device 131). This further improves the generalization performance of the generated motions.

（第４の実施形態） (Fourth embodiment)
次に、本発明の第４の実施形態として、動作生成時に、任意の動作速度、又は／及び、任意の力の大きさを実現する場合について、図１７を参照して説明する。 Next, as a fourth embodiment of the present invention, a case will be described with reference to FIG. 17 in which an arbitrary motion speed and/or arbitrary force magnitude is realized when motion is generated.

図１７は、本発明の第４の実施形態における、動作計画部の構成例を示すブロック図である。なお、図１７において、図３と同一符号は同一部品を示すので、再度の説明は省略する。第１の実施形態からの変更点は、第４の実施形態では、動作計画部３において、動作パラメータ調整装置１７１を備えている点である。動作パラメータ調整装置１７１は、センサデータ蓄積装置３１に保存されたセンサデータから、動作生成時に、任意の動作速度、又は／及び、任意の力の大きさを実現する。 FIG. 17 is a block diagram showing a configuration example of a motion planning section in the fourth embodiment of the present invention. Note that in FIG. 17, the same reference numerals as those in FIG. 3 indicate the same parts, so a repeated explanation will be omitted. The difference from the first embodiment is that in the fourth embodiment, the motion planning section 3 includes a motion parameter adjustment device 171. The motion parameter adjustment device 171 realizes an arbitrary motion speed and/or arbitrary force magnitude when generating a motion from the sensor data stored in the sensor data storage device 31.

［動作パラメータ調整装置のハードウェア構成］ [Hardware configuration of motion parameter adjustment device]
本発明の第４の実施形態における、ロボット制御システム１００が備える動作計画部３のハードウェア構成は、本発明第３の実施形態の場合と同じであるため、再度の説明は省略する。 The hardware configuration of the motion planning unit 3 included in the robot control system 100 in the fourth embodiment of the present invention is the same as that in the third embodiment of the present invention, and therefore, a repeated explanation will be omitted.

［動作パラメータ調整装置の機能構成例］ [Example of functional configuration of motion parameter adjustment device]
次に、本発明第４の実施形態における、ロボット制御システム１００が備える動作計画部３の動作パラメータ調整装置１７１の機能構成例について、図１８を参照して説明する。 Next, an example of the functional configuration of the motion parameter adjustment device 171 of the motion planning section 3 included in the robot control system 100 in the fourth embodiment of the present invention will be described with reference to FIG. 18.

図１８は、本発明の実施となる動作パラメータ調整装置１７１の機能構成例を現したものである。なお、図１８において、図１２、図１３と同一符号は同一部品を示すので、再度の説明は省略する。 FIG. 18 shows an example of the functional configuration of the motion parameter adjustment device 171 that embodies the present invention. Note that in FIG. 18, the same reference numerals as those in FIGS. 12 and 13 indicate the same parts, so a repeated explanation will be omitted.

図１８に示す動作パラメータ調整装置１７１において、動作パラメータ記憶部１８１は、動作パラメータ調整に必要なパラメータの値を保存する。パラメータの種類は、例えば動作生成モデルの制御周期や、電流値の範囲、トルクセンサ値の範囲等がある。パラメータの値は、ユーザーが、画面操作部５を介して変更できる。 In the motion parameter adjustment device 171 shown in FIG. 18, the motion parameter storage unit 181 stores the values of the parameters required for motion parameter adjustment. The parameters include, for example, the control period of the motion generation model, the range of current values, and the range of torque sensor values. The user can change the parameter values via the screen operation unit 5.

動作パラメータ調整演算部１８２は、動作パラメータ記憶部１８１に保存しているパラメータの値に基づいて、動作生成時に、動作速度、又は／及び、力の大きさを調整する。動作パラメータ調整演算部で動作速度を変更する具体的な方法として、例えば、動作生成時の制御周期を変えることが考えられる。例えば、動作速度を大きくしたい場合は、教示データのサンプリング周期よりも、動作生成時の制御周期を小さくすればよい。動作パラメータ調整演算部で力の大きさを変更する具体的な方法として、例えば、電流値やトルクセンサ値を変えることが考えられる。例えば、力の大きさを小さくしたい場合は、電流値やトルクセンサ値の上限を、教示データよりも小さくすればよい。 The motion parameter adjustment calculation unit 182 adjusts the motion speed and/or the magnitude of force when generating motion, based on the parameter values stored in the motion parameter storage unit 181. One specific way for the motion parameter adjustment calculation unit to change the motion speed is, for example, to change the control period used during motion generation. For example, to increase the motion speed, the control period during motion generation may be made shorter than the sampling period of the teaching data. One specific way for the motion parameter adjustment calculation unit to change the magnitude of force is, for example, to change the current value or the torque sensor value. For example, to reduce the magnitude of the force, the upper limit of the current value or torque sensor value may be made smaller than that in the teaching data.

処理装置（計算機２０）は、機械学習後のロボット（ロボット装置１）の動作を調整するパラメータを示す動作パラメータを記憶する記憶装置（動作パラメータ記憶部１８１）を備える。記憶装置は、例えば、ＲＡＭ２４、不揮発性ストレージ２５等で構成される。 The processing device (computer 20) includes a storage device (motion parameter storage unit 181) that stores motion parameters, which are parameters for adjusting the motion of the robot (robot device 1) after machine learning. The storage device includes, for example, the RAM 24, the nonvolatile storage 25, and the like.

プロセッサ（ＣＰＵ２３）は、学習済みの機械学習モデルを用いて、動作パラメータに基づきロボット（ロボット装置１）の動作の指令値を生成する（機械学習装置３４）。これにより、教示・学習を再度行うことなく、ロボットの動作を調整することができる。 The processor (CPU 23) uses the trained machine learning model to generate command values for the motion of the robot (robot device 1) based on the motion parameters (machine learning device 34). This allows the robot's motion to be adjusted without performing teaching and learning again.

動作パラメータは、例えば、教示データのサンプリング周期である。プロセッサ（ＣＰＵ２３）は、指令値において、ロボット（ロボット装置１）の制御周期をサンプリング周期より小さくすることでロボットの動作の速度を大きくし、又は指令値において、ロボットの制御周期をサンプリング周期より大きくすることでロボットの動作の速度を小さくする（動作パラメータ調整装置１７１）。これにより、教示・学習を再度行うことなく、ロボットの動作の速度を調整することができる。 The motion parameter is, for example, the sampling period of the teaching data. In the command value, the processor (CPU 23) increases the motion speed of the robot (robot device 1) by making the control period of the robot shorter than the sampling period, or decreases the motion speed of the robot by making the control period of the robot longer than the sampling period (motion parameter adjustment device 171). This allows the motion speed of the robot to be adjusted without performing teaching and learning again.
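The relationship between the control period and the motion speed can be illustrated with a small sketch. It assumes, beyond what the text states, that the trained model emits one command per control tick and that the same command sequence is simply issued at a different rate, so the speed ratio equals the sampling period divided by the control period. The function names and values are hypothetical.

```python
def speed_ratio(sampling_period_ms, control_period_ms):
    """Ratio of generated motion speed to taught motion speed."""
    if sampling_period_ms <= 0 or control_period_ms <= 0:
        raise ValueError("periods must be positive")
    return sampling_period_ms / control_period_ms


def playback_schedule(commands, control_period_ms):
    """Pair each command with the time (in ms) at which it is issued."""
    return [(i * control_period_ms, c) for i, c in enumerate(commands)]


SAMPLING_PERIOD_MS = 100             # period at which teaching data was sampled
commands = [0.00, 0.10, 0.25, 0.40]  # hypothetical joint-angle commands

# A shorter control period finishes the same sequence sooner (faster motion).
fast = playback_schedule(commands, control_period_ms=50)
# A longer control period stretches the same sequence out (slower motion).
slow = playback_schedule(commands, control_period_ms=200)

fast_ratio = speed_ratio(SAMPLING_PERIOD_MS, 50)
slow_ratio = speed_ratio(SAMPLING_PERIOD_MS, 200)
print(fast, fast_ratio)
print(slow, slow_ratio)
```

With a 100 ms sampling period, a 50 ms control period doubles the motion speed and a 200 ms control period halves it, without retraining the model.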

また、動作パラメータは、例えば、教示データにおけるロボットのトルク又はそれと相関のある値の最大値Ｍである。プロセッサ（ＣＰＵ２３）は、指令値において、ロボット（ロボット装置１）のトルク又はそれと相関のある値（例えば、アクチュエータの駆動電流の値）の上限を最大値Ｍより大きくすることでロボットの力を大きし、又は指令値において、ロボットのトルク又はそれと相関のある値の上限を最大値Ｍより小さくすることで前記ロボットの力を小さくする。これにより、教示・学習を再度行うことなく、ロボットの力を調整することができる。 Further, the motion parameter is, for example, the maximum value M of the robot's torque, or of a value correlated with it, in the teaching data. In the command value, the processor (CPU 23) increases the force of the robot (robot device 1) by making the upper limit of the torque of the robot, or of a value correlated with it (for example, the drive current of an actuator), larger than the maximum value M, or decreases the force of the robot by making that upper limit smaller than the maximum value M. This allows the force of the robot to be adjusted without performing teaching and learning again.
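The force reduction case can be sketched as a simple limit applied to the generated commands. This is an assumption-laden illustration: the text does not say how the upper limit is enforced, so the sketch assumes a symmetric clamp on the magnitude of a torque (or drive current) command, with the limit expressed as a fraction of the maximum value M observed in the teaching data. All names and values are hypothetical.

```python
def max_abs(values):
    """Maximum magnitude M of a torque (or correlated) signal."""
    return max(abs(v) for v in values)


def clamp_commands(commands, upper_limit):
    """Limit each command to the range [-upper_limit, +upper_limit]."""
    if upper_limit < 0:
        raise ValueError("upper_limit must be non-negative")
    return [max(-upper_limit, min(upper_limit, c)) for c in commands]


teaching_torque = [0.5, -2.0, 1.5, 4.0]  # hypothetical teaching data
M = max_abs(teaching_torque)             # maximum value M in the teaching data

# Setting the upper limit below M reduces the force the robot can apply.
limit = 0.5 * M
generated = [1.0, -3.0, 2.5, 4.0]        # hypothetical model output
clamped = clamp_commands(generated, limit)
print(M, limit, clamped)
```

A clamp only restricts commands, so this sketch covers the case of lowering the limit below M; how a limit above M translates into larger forces depends on details the text does not specify.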

なお、本発明は上記した実施形態に限定されるものではなく、様々な変形例が含まれる。例えば、上記した実施形態は本発明を分かりやすく説明するために詳細に説明したものであり、必ずしも説明したすべての構成を備えるものに限定されるものではない。また、ある実施形態の構成の一部を他の実施形態の構成に置き換えることが可能であり、また、ある実施形態の構成に他の実施形態の構成を加えることも可能である。また、各実施形態の構成の一部について、他の構成の追加・削除・置換をすることが可能である。 Note that the present invention is not limited to the embodiments described above, and includes various modifications. For example, the embodiments described above have been described in detail to explain the present invention in an easy-to-understand manner, and the present invention is not necessarily limited to having all the configurations described. Furthermore, it is possible to replace a part of the configuration of one embodiment with the configuration of another embodiment, and it is also possible to add the configuration of another embodiment to the configuration of one embodiment. Furthermore, it is possible to add, delete, or replace some of the configurations of each embodiment with other configurations.

また、上記の各構成、機能、処理部等は、それらの一部又は全部を、例えば集積回路で設計するなどによりハードウェアで実現してもよい。ハードウェアとして、ＦＰＧＡ（Field Programmable Gate Array）やＡＳＩＣ（Application Specific Integrated Circuit）などの広義のプロセッサデバイスを用いてもよい。 Further, each of the configurations, functions, processing units, etc. described above may be partially or entirely realized in hardware by designing, for example, an integrated circuit. As the hardware, a broadly defined processor device such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit) may be used.

また、上述した各実施形態に係る動作計画部３の各構成要素は、制御部４に実装されてもよい。また、動作計画部３のある処理部により実施される処理が、１つのハードウェアにより実現されてもよいし、複数のハードウェアによる分散処理により実現されてもよい。 Further, each component of the motion planning section 3 according to each embodiment described above may be implemented in the control section 4. Further, the processing performed by a certain processing section of the motion planning section 3 may be realized by one piece of hardware, or may be realized by distributed processing by a plurality of pieces of hardware.

上述した各実施形態では、センサデータ取得部２、動作計画部３、制御部４は、一例として１台の計算機２０によって実現されるが、別々の計算機によって実現されていてもよい。計算負荷の大きい動作計画部３を高性能な計算機で実現することで、全体のスループットを向上することができる。なお、計算機どうしは、例えば、ＬＡＮ、インターネット等の通信ネットワークを介して相互に接続される。 In each of the embodiments described above, the sensor data acquisition section 2, the motion planning section 3, and the control section 4 are realized by one computer 20, as an example, but they may be realized by separate computers. By implementing the motion planning unit 3, which has a large calculation load, using a high-performance computer, the overall throughput can be improved. Note that the computers are connected to each other via a communication network such as a LAN or the Internet.

なお、本発明の実施形態は、以下の態様であってもよい。 Note that the embodiment of the present invention may have the following aspects.

（１）．ロボットの動作を生成するための機械学習を行う機械学習システムであって、ロボットへの教示データ（時系列のセンサ情報等）を取得する取得部と、教示データを、ロボットの動作に基づき分節する動作分節化部と、タスクが共通する異なる教示データ間で、分節された動作のうち、同種の動作の速度が異なる場合に、該同種の動作の速度を揃えるように教示データを補正する速度調整部と、タスクが共通する異なる教示データ間で、分節された動作のうち、同種の動作のタイミングが異なる場合に、該同種の動作のタイミングを揃えるように教示データを補正するタイミング調整部と、教示データにおけるロボットのダイナミクスを補正するダイナミクス調整部と、を備え、前記速度と前記タイミングと前記ダイナミクスのうち少なくとも１つが補正された教示データを、前記機械学習システムにおける学習用データとする、機械学習システム。 (1). A machine learning system that performs machine learning for generating robot motions, comprising: an acquisition unit that acquires teaching data for a robot (time-series sensor information, etc.); a motion segmentation unit that segments the teaching data based on the motions of the robot; a speed adjustment unit that, when the speeds of the same type of motion among the segmented motions differ between different teaching data sharing a common task, corrects the teaching data so as to align the speeds of that type of motion; a timing adjustment unit that, when the timings of the same type of motion among the segmented motions differ between different teaching data sharing a common task, corrects the teaching data so as to align the timings of that type of motion; and a dynamics adjustment unit that corrects the dynamics of the robot in the teaching data, wherein teaching data in which at least one of the speed, the timing, and the dynamics has been corrected is used as learning data in the machine learning system.

（２）．（１）において、機械学習システムは、ロボットへの教示データとして、ロボットの位置や、関節角度情報等を取得する取得部を備えた、機械学習システム。 (2). In (1), the machine learning system includes an acquisition unit that acquires the position of the robot, joint angle information, etc. as teaching data to the robot.

（３）．（２）において、機械学習システムは、ロボットの位置や、関節角度情報に加えて、ロボットや作業環境の画像を取得する取得部を備えた、機械学習システム。 (3). In (2), the machine learning system is a machine learning system that includes an acquisition unit that acquires images of the robot and the work environment in addition to the robot's position and joint angle information.

（４）．（２）において、機械学習システムは、ロボットの位置や、関節角度情報に加えて、対象物の位置、姿勢情報を取得する取得部を備えた、機械学習システム。 (4). In (2), the machine learning system includes an acquisition unit that acquires position and posture information of a target object in addition to robot position and joint angle information.

（５）．（１）において、機械学習システムは、ロボットへの教示データとして、ロボットの速度、関節角速度情報等を取得する取得部を備えた、機械学習システム。 (5). In (1), the machine learning system includes an acquisition unit that acquires robot speed, joint angular velocity information, etc. as teaching data to the robot.

（６）．（１）において、機械学習システムは、ロボットへの教示データとして、ロボットの触覚情報を取得する取得部を備えた、機械学習システム。 (6). In (1), the machine learning system includes an acquisition unit that acquires tactile information of the robot as teaching data to the robot.

（７）．（１）において、機械学習システムは、ロボットのダイナミクスのうち、加速度または／および加速度変化を調整するダイナミクス調整部を備えた、機械学習システム。 (7). In (1), the machine learning system includes a dynamics adjustment unit that adjusts acceleration and/or changes in acceleration among the dynamics of the robot.

（８）．（１）において、機械学習システムは、ロボットのダイナミクスのうち、躍度または／および躍度変化を調整するダイナミクス調整部を備えた、機械学習システム。 (8). In (1), the machine learning system includes a dynamics adjustment unit that adjusts jerk and/or jerk change among the dynamics of the robot.

（９）．（１）において、機械学習システムは、ロボットのダイナミクスのうち、トルクまたは／およびトルク変化を調整するダイナミクス調整部を備えた、機械学習システム。 (9). In (1), the machine learning system includes a dynamics adjustment unit that adjusts torque or/and torque change among the dynamics of the robot.

（１０）．（１）において、前記速度調整部は、ロボットへの教示データにカメラ画像が含まれる場合、前記教示データ間で、フレーム補間を行った後に、該同種の動作の速度を揃えるように教示データを補正することを特徴とする、機械学習システム。 (10). In (1), the machine learning system is characterized in that, when the teaching data for the robot includes camera images, the speed adjustment unit performs frame interpolation between the teaching data and then corrects the teaching data so as to align the speeds of the same type of motion.

（１１）．（１）において、機械学習システムは、取得した教示データの中から、動作汎化性能向上に有効な教示データを抽出するスクリーニング装置を備えた、機械学習システム。 (11). In (1), the machine learning system includes a screening device that extracts teaching data effective for improving motion generalization performance from acquired teaching data.

（１２）．（１）において、機械学習システムは、学習済みの機械学習モデルを用いて動作生成を行う際、任意の動作速度を実現するために、ロボットの制御周期を調整する動作パラメータ調整装置を備えた、機械学習システム。 (12). In (1), the machine learning system includes a motion parameter adjustment device that adjusts the control period of the robot in order to achieve an arbitrary motion speed when generating motion using a trained machine learning model.

（１３）．（１）において、機械学習システムは、学習済みの機械学習モデルを用いて動作生成を行う際、任意の力の大きさを実現するために、ロボットのトルクを調整する動作パラメータ調整装置を備えた、機械学習システム。 (13). In (1), the machine learning system includes a motion parameter adjustment device that adjusts the torque of the robot in order to achieve an arbitrary force magnitude when generating motion using a trained machine learning model.

（１）～（１３）によれば、動作学習を阻害する教示データ間の時間方向のばらつきを低減することで、導入工数削減と汎化性能獲得の両立が期待できる。 According to (1) to (13), by reducing variations in the time direction between teaching data that impede motion learning, it is expected that both reduction of the introduction man-hours and acquisition of generalization performance can be achieved.

１…ロボット装置、２…センサデータ取得部、３…動作計画部、４…制御部、５…画面操作部、２２…ＲＯＭ、２３…ＣＰＵ、２４…ＲＡＭ、２５…不揮発性ストレージ、２６…入出力インターフェース、２７…ネットワークインターフェース、３１…センサデータ蓄積装置、３２…整合性検証装置、３３…学習データ蓄積装置、３４…機械学習装置、４１…動作分節化部、４２…速度調整部、４３…タイミング調整部、４４…ダイナミクス調整部、４５…データ処理部、６１…機械学習モデル定義部、６２…学習部、６３…学習済み重み蓄積部、６４…推論部、１０１…フレーム補間部、１３１…スクリーニング装置、１６１…操作入力部、１５１…グルーピング部、１５２…代表データ算出部、１５３…外れデータ検出部、１５４…計算結果出力部、１６１…操作入力部、１６２…画面表示部、１６３…画面制御部、１７１…動作パラメータ調整装置、１８１…動作パラメータ記憶部、１８２…動作パラメータ調整演算部 DESCRIPTION OF SYMBOLS 1... Robot device, 2... Sensor data acquisition part, 3... Motion planning part, 4... Control part, 5... Screen operation part, 22... ROM, 23... CPU, 24... RAM, 25... Non-volatile storage, 26... Input Output interface, 27... Network interface, 31... Sensor data storage device, 32... Consistency verification device, 33... Learning data storage device, 34... Machine learning device, 41... Motion segmentation unit, 42... Speed adjustment unit, 43... Timing adjustment section, 44... Dynamics adjustment section, 45... Data processing section, 61... Machine learning model definition section, 62... Learning section, 63... Learned weight accumulation section, 64... Inference section, 101... Frame interpolation section, 131... Screening device, 161... Operation input section, 151... Grouping section, 152... Representative data calculation section, 153... Outlier data detection section, 154... Calculation result output section, 161... Operation input section, 162... Screen display section, 163... Screen Control unit, 171... Operating parameter adjustment device, 181... Operating parameter storage unit, 182... Operating parameter adjustment calculation unit

Claims

A processing device comprising a processor that:
segments teaching data for a robot by each same type of motion of the robot;
corrects the plurality of segmented teaching data so as to align the speed or timing of the same type of motion of the robot;
synthesizes the plurality of corrected teaching data; and
performs machine learning using the synthesized teaching data.

The processing device according to claim 1,
wherein the processor
corrects the plurality of segmented teaching data so as to smooth the values of the dynamics of the same type of motion of the robot.

The processing device according to claim 2,
wherein the teaching data includes at least one of
position information indicating a position of the robot, joint angle information indicating a joint angle of the robot,
speed information indicating a speed of the robot, joint angular velocity information indicating a joint angular velocity of the robot, and tactile information indicating a sensor value of a tactile sensor provided in the robot.

The processing device according to claim 3,
wherein the teaching data
includes an image of the robot or of the working environment.

The processing device according to claim 3,
wherein the teaching data
includes at least one of position information indicating a position of an object on which the robot works and posture information indicating a posture of the object.

The processing device according to claim 2,
wherein the dynamics include at least one of
acceleration, acceleration change,
jerk, jerk change,
torque, and torque change.

The processing device according to claim 4,
wherein the processor
performs frame interpolation on the images included in the plurality of segmented teaching data and then performs correction to align the speeds of the same type of motion of the robot.

The processing device according to claim 1,
wherein the processor
performs screening by removing outlier data from the teaching data.

The processing device according to claim 1,
further comprising a storage device that stores motion parameters, which are parameters for adjusting the motion of the robot after machine learning,
wherein the processor
generates command values for the robot's motion based on the motion parameters using a trained machine learning model.

The processing device according to claim 9,
wherein the motion parameter is a sampling period of the teaching data, and
the processor,
in the command value, increases the motion speed of the robot by making the control period of the robot shorter than the sampling period, or decreases the motion speed of the robot by making the control period of the robot longer than the sampling period.

The processing device according to claim 9,
wherein the motion parameter is the maximum value, in the teaching data, of the torque of the robot or of a value correlated with it, and
the processor,
in the command value, increases the force of the robot by making the upper limit of the torque of the robot, or of a value correlated with it, larger than the maximum value, or decreases the force of the robot by making that upper limit smaller than the maximum value.

The processing device according to claim 1,
wherein the processor
corrects the plurality of segmented teaching data, through quantization and noise removal, so as to align the speeds of the same type of motion of the robot.

The processing device according to claim 1,
wherein the processor
corrects the plurality of segmented teaching data by increasing or decreasing the stationary time of the robot so as to align the timings of the same type of motion of the robot.

A robot control system comprising the processing device according to claim 1 and a robot,
wherein the processor generates a command value for the robot's motion using a trained machine learning model, and
the robot operates according to the command value.

A machine learning method comprising:
a step of segmenting teaching data for a robot by each same type of motion of the robot;
a step of correcting the plurality of segmented teaching data so as to align the speed or timing of the same type of motion of the robot;
a step of synthesizing the plurality of corrected teaching data; and
a step of performing machine learning using the synthesized teaching data.