JP2024101163A

JP2024101163A - Information processing method, information processing device, robot system, control method of robot system, article manufacturing method, program, and recording medium

Info

Publication number: JP2024101163A
Application number: JP2023004946A
Authority: JP
Inventors: Motohiro Horiuchi (堀内基宏); Akihiro Oda (小田明裕); Kazuhiko Shinagawa (品川和彦); Yuichiro Kudo (工藤雄一郎)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2023-01-17
Filing date: 2023-01-17
Publication date: 2024-07-29

Abstract

PROBLEM TO BE SOLVED: To provide a technology to improve control accuracy for a control object using a learned model.

SOLUTION: An information processing method executes processing for acquiring first data by operating an object corresponding to a control object in an environment different from the operation environment of the control object, and acquiring a learned model by machine learning using the first data. The method also executes processing for acquiring, using the learned model, second data for operating the control object in the operation environment, and determination processing for operating the control object in the operation environment using the second data and determining the result. Finally, the method executes learning processing (steps S31 to S35) that acquires third data on operations determined to be successful in the determination processing and trains the learned model by machine learning using the third data.

SELECTED DRAWING: Figure 13

Description

The present invention relates to an information processing method, an information processing device, a robot system, a control method for a robot system, a method for manufacturing an article, a program, and a recording medium.

Imitation learning is conventionally known as a way to learn robot control by machine learning: a model is trained on the motions of an imitation target that the controlled object is meant to imitate. Imitation learning can learn from a small amount of data and can make a robot perform motions that are difficult to achieve through teaching at a production site. As a technique for improving learning accuracy in imitation learning, Patent Document 1 discloses selecting data at the collection stage so that only data desirable for learning is collected.

Patent Document 1: JP 2021-107970 A

In imitation learning, data are sometimes collected from a system other than the robot that is actually controlled, such as human demonstrations or simulations, when one wants to learn a motion more easily than by training on the robot itself or wants to collect a large amount of data. In such cases, the data obtained from the imitation target (the person or simulation from which motion data are collected) differ from the data obtained from the controlled robot, and these differences act as noise in learning. As a result, a trained model obtained by machine learning from such noisy data controls the controlled object less accurately.

Accordingly, an object of the present invention is to provide a technique that can improve the accuracy of controlling a controlled object.

A first aspect of the present invention is an information processing method that executes: a process of acquiring first data by operating an object corresponding to a control object in an environment different from the operating environment of the control object, and acquiring a trained model by machine learning using the first data; a process of acquiring, using the trained model, second data for operating the control object in the operating environment; a determination process of operating the control object in the operating environment using the second data and determining the result; and a learning process of acquiring third data on an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

A second aspect of the present invention is an information processing device including an information processing unit, wherein the information processing unit executes: a process of acquiring first data by operating an object corresponding to a control object in an environment different from the operating environment of the control object, and acquiring a trained model by machine learning using the first data; a process of acquiring, using the trained model, second data for operating the control object in the operating environment; a determination process of operating the control object in the operating environment using the second data and determining the result; and a learning process of acquiring third data on an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

According to the present invention, the accuracy of controlling a control object can be improved.

FIG. 1 is a conceptual diagram of a robot motion learning device according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a recording device included in the robot motion learning device according to the embodiment.
FIG. 3 is a control block diagram of a control device included in the robot motion learning device according to the embodiment.
FIG. 4 is an explanatory diagram of the correspondence between an imitation target area and a control target area according to the embodiment.
FIG. 5 is an explanatory diagram of a learning device according to the embodiment.
FIG. 6 is a flowchart of a learning process executed by the information processing device according to the embodiment.
FIG. 7 is a diagram showing an example of a control target task executed by a control target robot according to the embodiment.
FIG. 8 is a diagram showing another example of a control target task executed by the control target robot according to the embodiment.
FIG. 9 is an explanatory diagram of a determination unit according to the embodiment.
FIG. 10 is an explanatory diagram of a GUI displayed on a display unit according to the embodiment.
FIG. 11 is an explanatory diagram of a detailed settings window displayed on the display unit according to the embodiment.
FIG. 12 is a flowchart of a determination process executed by the information processing device according to the embodiment.
FIG. 13 is a flowchart of a relearning process executed by the information processing device according to the embodiment.

Embodiments of the present invention are described in detail below with reference to the drawings. The configurations described below are merely examples, and those skilled in the art may modify the details as appropriate without departing from the spirit of the invention. The numerical values given in the following description are reference values and are merely examples.

<Robot system configuration>
FIG. 1 is a conceptual diagram of a robot motion learning device 100 serving as the robot system of this embodiment, and FIG. 2 is a conceptual diagram of a recording device 400 included in the robot motion learning device 100. The robot motion learning device 100 consists of the recording device 400, an imitation target area 200, a control target area 300, a learning device 1 that generates robot motions, and a determination unit 2 that determines whether a motion generated by the learning device 1 succeeded.

In the robot motion learning device 100, an imitation target actor 201 in the imitation target area 200 first performs, as an imitation target task 202, the motion that the control target robot 301 is ultimately meant to perform. The motion of the imitation target actor 201 is detected by an imitation motion sensor 203. The imitation motion sensor 203 detects the motion data (first data) of the imitation target actor 201, such as hand position information, together with the state of the imitation target task 202 and the state of the environment in the imitation target area 200, and records them in the recording device 400 as initial collection data 401.

In the robot motion learning device 100, the sequence in which the imitation motion sensor 203 detects the motion of the imitation target actor 201 performing the imitation target task 202 is repeated multiple times, after which the learning device 1 performs an initial learning process. The initial learning process acquires a trained model that takes as input the recorded hand position of the imitation target actor 201, the state of the imitation target task 202, and the state of the environment in the imitation target area 200, and outputs information on the motion to be performed next in a given situation.

After the initial learning process is complete, the learning device 1 executes an inference process. The inference process takes as input the values detected by the control motion sensor 303, namely the motion data of the control target robot 301, the state of the control target task 302, and the state of the environment in the control target area 300, and outputs the motion data that the control target robot 301 should perform next. The control target robot 301 generates its motion by reproducing the motion data output by the inference process.

The determination unit 2 takes as input the current motion data of the control target robot 301 during or after its motion, together with the values detected by the control motion sensor 303 for the state of the control target task 302 and the state of the environment in the control target area 300, and determines whether the task performed by the generated motion succeeded. The determination method used by the determination unit 2 is described in detail later.

If the task performed by the generated motion is determined to be successful, the determination unit 2 performs a saving process that stores the motion data and the values detected by the control motion sensor 303 that were input for the determination in the recording device 400 as success data (third data) 402. At this time, as shown in FIG. 2, the determination unit 2 labels the data so that the initial collection data 401 and the success data 402 can be clearly distinguished within the recording device 400.
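The labeled saving step can be pictured with a small in-memory store. This is a minimal sketch under assumptions: the `DataLabel`, `RecordingStore`, and `save_if_successful` names, and the choice to store each rollout as a list of per-step dictionaries, are hypothetical and do not describe the actual format of the recording device 400.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class DataLabel(Enum):
    """Hypothetical labels that keep the two kinds of stored data apart."""
    INITIAL_COLLECTION = "initial_collection"  # plays the role of data 401
    SUCCESS = "success"                        # plays the role of data 402


@dataclass
class LabeledRecord:
    label: DataLabel
    trajectory: List[Dict[str, Any]]  # per-step motion data and sensor values


@dataclass
class RecordingStore:
    """In-memory stand-in for the recording device (assumed, not the patent's format)."""
    records: List[LabeledRecord] = field(default_factory=list)

    def save(self, label: DataLabel, trajectory: List[Dict[str, Any]]) -> None:
        # Copy the list so later mutation by the caller does not alter stored data.
        self.records.append(LabeledRecord(label, list(trajectory)))

    def by_label(self, label: DataLabel) -> List[LabeledRecord]:
        return [r for r in self.records if r.label is label]


def save_if_successful(
    store: RecordingStore, trajectory: List[Dict[str, Any]], succeeded: bool
) -> bool:
    """Store a rollout as success data only when the determination succeeded."""
    if succeeded:
        store.save(DataLabel.SUCCESS, trajectory)
    return succeeded
```

Keeping the label on each record, rather than in separate stores, makes it easy to later train on both kinds of data while still being able to tell them apart.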

After the series of operations up to saving the success data 402 in the recording device 400 is complete, the learning device 1 adds the stored success data 402 as new input and performs a relearning process. Through relearning, the robot motion learning device 100 can gradually close the gap between the imitation target and its surroundings on the one hand and the control target and its surroundings on the other. The robot motion learning device 100 can thereby improve the accuracy of control of the control target robot 301 and increase the success rate of the desired task.
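The overall cycle (initial learning, generating motions, judging them, keeping the successes, relearning) can be sketched as a generic loop. This is an illustrative sketch only: `train`, `rollout`, and `is_success` are hypothetical callables standing in for the learning device, the robot execution, and the determination unit. The sketch also assumes that relearning retrains on the combined dataset. The text says only that success data are added as new input, so fine-tuning the existing model would be an equally valid reading.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Traj = TypeVar("Traj")    # one recorded trajectory (sample)
Model = TypeVar("Model")  # whatever the training routine returns


def relearn_from_successes(
    first_data: Sequence[Traj],
    train: Callable[[List[Traj]], Model],
    rollout: Callable[[Model], Traj],
    is_success: Callable[[Traj], bool],
    rounds: int = 3,
    trials_per_round: int = 10,
) -> Tuple[Model, List[Traj]]:
    """Hypothetical loop: train on demonstrations, then repeatedly add successful rollouts."""
    dataset: List[Traj] = list(first_data)  # initial collection data (first data)
    model = train(dataset)                  # initial learning
    for _ in range(rounds):
        successes: List[Traj] = []
        for _ in range(trials_per_round):
            trajectory = rollout(model)     # motion generated by the model (second data)
            if is_success(trajectory):      # determination process
                successes.append(trajectory)  # success data (third data)
        dataset.extend(successes)           # add success data as new input
        model = train(dataset)              # relearning on the enlarged dataset
    return model, dataset
```

Failed rollouts are simply discarded, which mirrors the text: only operations judged successful feed back into training.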

<Control block diagram of the robot motion learning device>
FIG. 3 is a control block diagram of the robot motion learning device 100. As shown in FIG. 3, an information processing device 700 of the robot motion learning device 100 has a CPU 701, which is a type of processor and serves as an information processing unit. The CPU 701 also functions as the learning device 1 and the determination unit 2.

The information processing device 700 also has a ROM 702, a RAM 703, and an HDD 704 as recording units. A display 707 serving as a display unit, and a keyboard 708 and a mouse 709 serving as input units, are connected as devices for operating the system. The imitation motion sensor 203 on the imitation target side, the control motion sensor 303, and the control target robot 301 are also connected to the information processing device 700, which can exchange data with them.

The HDD 704 is readable and writable and retains recorded data. The HDD 704 stores the initial collection data 401, the success data 402, and a trained model 706 generated through learning by the learning device 1. The HDD 704 also stores a program 705 for calling the trained model 706. When the control target robot 301 is to be operated, the robot motion learning device 100 executes the program 705 to acquire the data detected by the control motion sensor 303 and transmit motion data to the control target robot 301. The HDD 704 constitutes the recording device 400 shown in FIG. 1.

<Relationship between the imitation target and the control target>
FIG. 4 shows the correspondence between the imitation target and the control target. As shown in FIG. 4, the imitation target area 200 contains the imitation target actor 201, which is the imitation target, the imitation target task 202, the imitation motion sensor 203, and a light 204 that illuminates the imitation target area 200. The control target area 300 contains the control target robot 301, which is the control target, the control target task 302, the control motion sensor 303, and a light 304 that illuminates the control target area 300.

In the robot motion learning device 100, the imitation target task 202 is first set up in the imitation target area 200. The imitation target task 202 is identical to the control target task 302 that the control target robot 301 is actually to perform. In other words, a task identical to the imitation target task 202 is set up in the control target area 300 as the control target task 302.

The imitation target task 202 may be any task in which an initial state is changed into a desired state by the motion of a robot, such as picking and placing a workpiece, assembling parts into an industrial product, or folding a towel.

The imitation motion sensor 203 includes, for example, a camera, a thermometer, a microphone, a tactile sensor, a torque sensor, and an acceleration sensor, and detects the condition of the imitation target task 202, the movement of the imitation target actor 201, and the state of the environment in the imitation target area 200. The imitation motion sensor 203 may be mounted on the imitation target actor 201. The environment of the imitation target area 200 includes the brightness, installed objects, background, temperature, sound, and so on in the imitation target area 200.

The imitation target actor 201 performs the imitation target task 202. The imitation target actor 201 may be a robot identical to the control target robot 301 (the control target), a different robot, or a human. In other words, the imitation target actor 201 may be any object corresponding to the control target robot 301. The imitation target actor 201 also has components corresponding to those of the control target robot 301 (for example, a component that grasps a workpiece, corresponding to the robot hand 301a).

When the imitation target actor 201 is a robot identical to or different from the control target robot 301, an operator makes the imitation target actor 201 perform the task using a joystick, direct teaching, point teaching, a master-slave robot, or the like. When the imitation target actor 201 is a human, the human must wear a tool or the like that allows observation of the motion data needed to control the control target robot 301, such as the hand position and tactile values.

While the imitation target actor 201 is moving, the imitation motion sensor 203 detects the motion data of the imitation target actor 201 and the state of the environment in the imitation target area 200 at a preset sampling period. The sampling period is set short enough for the control target robot 301 to reproduce the movement of the imitation target actor 201 (for example, 5 ms, i.e., a sampling rate of 200 Hz).
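The relation between sampling rate and period, and the idea of recording one demonstration as a fixed-rate sequence, can be sketched as follows. This is a sketch under assumptions: `read_sensors` is a hypothetical callback returning one dictionary of readings, and the timestamps are nominal (step index times period). A real recorder would pace the loop with a timer instead of assuming instantaneous reads. One such recorded sequence corresponds to what the text calls one sample of the initial collection data 401.

```python
from typing import Callable, Dict, List

Reading = Dict[str, float]


def sampling_period_s(rate_hz: float) -> float:
    """Convert a sampling rate in Hz into a sampling period in seconds."""
    if rate_hz <= 0:
        raise ValueError("rate_hz must be positive")
    return 1.0 / rate_hz


def record_demonstration(
    read_sensors: Callable[[], Reading], rate_hz: float, num_steps: int
) -> List[Reading]:
    """Poll a hypothetical sensor callback num_steps times at a nominal rate.

    Each stored reading gets a nominal timestamp "t" = k * period; no real-time
    pacing is done here.
    """
    period = sampling_period_s(rate_hz)
    samples: List[Reading] = []
    for k in range(num_steps):
        reading = dict(read_sensors())  # copy so the callback's dict is not shared
        reading["t"] = k * period
        samples.append(reading)
    return samples
```

At 200 Hz the period is 1/200 s = 5 ms, so a 10-second demonstration yields 2,000 readings.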

One run of the motion of the imitation target actor 201, together with its detection by the imitation motion sensor 203, constitutes one sample of the initial collection data 401 in this embodiment. The learning device 1 may learn from just one sample of the initial collection data 401, or it may learn after acquiring any number of samples.

The control motion sensor 303 is identical to the imitation motion sensor 203 and is installed in the control target area 300 so that its position relative to the control target area 300 matches the position of the imitation motion sensor 203 relative to the imitation target area 200. Accordingly, if the imitation target actor 201 carries the imitation motion sensor 203, the control motion sensor 303 is mounted on the control target robot 301. The control motion sensor 303 includes, for example, a camera, a thermometer, a microphone, a tactile sensor, a torque sensor, and an acceleration sensor, and detects the condition of the control target task 302, the movement of the control target robot 301, and the state of the environment in the control target area 300.

The environment of the control target area 300 includes the brightness, installed objects, background, temperature, sound, and so on in the control target area 300. The environments of the imitation target area 200 and the control target area 300 need not match and may differ. In this embodiment, for example, the illuminance of the light 204 in the imitation target area 200 differs from that of the light 304 in the control target area 300. In other words, the imitation target actor 201 operates in an environment different from the operating environment of the control target robot 301.

The control target robot 301 is a robot manipulator having at least one drive unit and a robot hand 301a, and its drive units can be controlled by the motion data output from the learning device 1. The control target robot 301 is installed at a position in the control target area 300 from which the motions learned using the imitation target actor 201 can be reproduced, and the robot hand 301a is moved to a motion start position. The motion start position of the robot hand 301a is the position in the control target area 300 that bears the same relation to that area as the motion start position of the imitation target actor 201 bore to the imitation target area 200 when the initial collection data 401 were collected.
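"The same positional relationship" can be read as a change of reference frame: express the demonstrated start position relative to the imitation target area, then re-express that relative position in the control target area. The sketch below assumes each area's pose is known as an origin plus a yaw angle about the vertical axis. The `AreaFrame` type and `map_start_position` function are hypothetical, and the text does not say how the positions are actually measured or aligned.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class AreaFrame:
    """Assumed pose of a work area in a common world frame: origin and yaw about z."""
    origin: Vec3
    yaw_rad: float

    def to_local(self, p: Vec3) -> Vec3:
        """World point -> area-relative coordinates (apply R^T to p - origin)."""
        dx, dy, dz = (p[0] - self.origin[0], p[1] - self.origin[1], p[2] - self.origin[2])
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        return (c * dx + s * dy, -s * dx + c * dy, dz)

    def to_world(self, q: Vec3) -> Vec3:
        """Area-relative coordinates -> world point (origin + R q)."""
        c, s = math.cos(self.yaw_rad), math.sin(self.yaw_rad)
        return (
            self.origin[0] + c * q[0] - s * q[1],
            self.origin[1] + s * q[0] + c * q[1],
            self.origin[2] + q[2],
        )


def map_start_position(p_imitation: Vec3, imitation_area: AreaFrame, control_area: AreaFrame) -> Vec3:
    """Place the robot's start point where it has the demonstrated area-relative position."""
    return control_area.to_world(imitation_area.to_local(p_imitation))
```

For example, a start point 0.5 m along the imitation area's x axis, mapped into a control area rotated by 90 degrees with its origin at (1.0, 2.0, 0.0), lands at approximately (1.0, 2.5) in the world frame.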

After the robot hand 301a of the control target robot 301 has moved to the motion start position, the control motion sensor 303 detects the current state of the control target task 302, the motion of the control target robot 301, and the state of the environment in the control target area 300. The detected data are input to the trained learning device 1.

From the current input data and the past data input up to that point, the learning device 1 outputs the motion data that the control target robot 301 should perform next (for example, the opening degree of the robot hand 301a). The control target robot 301 operates according to the motion data output from the learning device 1 (for example, command values such as the hand position of the robot hand 301a).

In the control target area 300, inputting the data detected by the control motion sensor 303 to the learning device 1 and operating the control target robot 301 according to the motion data output from the learning device 1 are repeated for a preset number of steps. The preset number of steps is preferably the same as the number of steps used when acquiring the initial collection data 401, but it may be larger.
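The closed loop described above (sense, predict the next motion from current and past inputs, command the robot, repeat for a preset number of steps) can be sketched as follows. The callbacks `read_sensors`, `predict_next`, and `send_command` are hypothetical placeholders for the control motion sensor, the trained model, and the robot interface. The dictionary-based observation format is also an assumption.

```python
from typing import Callable, Dict, List, Sequence

Observation = Dict[str, float]
Command = Dict[str, float]


def run_rollout(
    read_sensors: Callable[[], Observation],
    predict_next: Callable[[Sequence[Observation]], Command],
    send_command: Callable[[Command], None],
    num_steps: int,
) -> List[Observation]:
    """Run the sense -> predict -> act loop for a preset number of steps."""
    history: List[Observation] = []
    for _ in range(num_steps):
        obs = read_sensors()              # robot motion, task state, environment
        history.append(obs)
        command = predict_next(history)   # model sees current and past inputs
        send_command(command)             # robot moves according to the output
    return history
```

Passing the whole history to `predict_next` reflects the statement that the learning device uses past inputs as well as current ones. A recurrent model could instead keep that history in its hidden state.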

<Learning process>
Next, the learning process that the learning device 1 executes using the initial collection data 401 as teacher data (first data) is described. FIG. 5 illustrates the machine learning executed by the learning device 1. The learning device 1 of this embodiment is built on a machine learning model such as an LSTM (Long Short-Term Memory) network.

As shown in FIG. 5, an LSTM is a recurrent neural network that is trained by supervised learning to predict future values from the current sample and a number of past samples. For example, the LSTM takes as input the sampled hand position, tactile values, surrounding environment, and task state from an arbitrary time t to time t + n. Taking into account the time-series relationships among these inputs, it learns to predict the hand position, tactile values, surrounding environment, and task state at the future time t + n + 1.
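Training pairs of this kind (samples from t to t + n as input, the sample at t + n + 1 as target) are typically built by sliding a window over each recorded sequence. The sketch below assumes a single list of per-step samples and a window length `window` equal to n + 1. The function name `make_windows` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def make_windows(series: Sequence[T], window: int) -> List[Tuple[List[T], T]]:
    """Pair each run of `window` consecutive samples with the sample that follows it.

    Inputs cover times k .. k+window-1 and the target is time k+window, so
    window = n + 1 matches "t to t + n in, t + n + 1 out".
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [
        (list(series[k:k + window]), series[k + window])
        for k in range(len(series) - window)
    ]
```

For example, `make_windows([0, 1, 2, 3, 4, 5], 3)` yields the pairs `([0, 1, 2], 3)`, `([1, 2, 3], 4)`, and `([2, 3, 4], 5)`. In practice each element would be a vector combining hand position, tactile values, environment, and task state.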

The learning device 1 executes an inference process using the trained model generated by LSTM-based machine learning. Taking as input the values detected by the control motion sensor 303 for the motion data of the control target robot 301, the state of the control target task 302, and the state of the environment in the control target area 300, this inference process outputs the motion data that the control target robot 301 should perform next.

The data input to the LSTM may be a single type of data or a combination, such as hand position together with tactile values. Although an LSTM is given as a specific example in this embodiment, any algorithm capable of predicting time-series data may be used, such as another neural network, an RNN (Recurrent Neural Network), or a Transformer.

<Flow of processing in the learning process>
FIG. 6 is a flowchart of the processing executed in the robot motion learning device 100 when the CPU 701, functioning as the learning device 1, performs machine learning using the initial collection data 401. First, the imitation target actor 201 performs the imitation target task 202 (S1). Next, the motion data accompanying the motion of the imitation target actor 201, together with the state of the imitation target task 202 and the state of the environment in the imitation target area 200 detected by the imitation motion sensor 203, are acquired (S2). In this step, the CPU 701 saves these data in the HDD 704 as the initial collection data 401.

Next, the CPU 701 determines whether enough samples of the initial collection data 401 have been collected for machine learning (S3), that is, whether a preset number of samples has been reached. If the number of samples is below the preset number (No), the CPU 701 returns to step S1 and repeats steps S1 to S3 until the preset number is reached. The preset number of samples is, for example, 20 to 100, but as noted above it may be as few as one; no strict criterion is defined.

If it is determined in step S3 that the number of samples of the initial collection data 401 has reached the preset number (Yes), the CPU 701 executes machine learning and generates the trained model 706 (S4). In this step, the CPU 701 performs LSTM-based machine learning with the initial collection data 401 as input and acquires the trained model 706. The CPU 701 can then operate the control target robot 301 using the motion data output at each time point by inference with the trained model 706.
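Steps S1 to S4 amount to "collect demonstrations until a preset count is reached, then train once". A minimal sketch follows, with hypothetical `demonstrate` and `train` callables standing in for one recorded demonstration (S1 and S2) and the LSTM training (S4).

```python
from typing import Callable, List, TypeVar

Sample = TypeVar("Sample")
Model = TypeVar("Model")


def collect_then_train(
    demonstrate: Callable[[], Sample],
    train: Callable[[List[Sample]], Model],
    required_samples: int,
) -> Model:
    """Repeat S1-S2 until S3 is satisfied, then run S4."""
    if required_samples < 1:
        raise ValueError("required_samples must be >= 1")
    initial_data: List[Sample] = []
    # S3 is checked before each pass; since at least one sample is required,
    # this is equivalent to checking after each S1-S2 pass as in FIG. 6.
    while len(initial_data) < required_samples:
        initial_data.append(demonstrate())  # S1-S2: perform the task and record it
    return train(initial_data)              # S4: machine learning -> trained model
```

Setting `required_samples` to 1 corresponds to the single-sample case the text allows.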

Steps S1 to S4 constitute the process of operating the imitation target actor 201 in an environment different from the operating environment of the controlled robot 301 to acquire the initially collected data 401, and of obtaining the trained model 706 by machine learning using the initially collected data 401.
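
As a rough illustration, the collection loop of steps S1 to S4 might be expressed in Python as below. This is only a sketch under stated assumptions: `learning_phase`, `run_demonstration`, and `train_model` are hypothetical names, the two callbacks stand in for the sensor recording and for the LSTM training described above, and the default of 20 samples is simply the low end of the example range.

```python
from typing import Any, Callable, Dict, List

# One demonstration sample: motion data, task state, environment state.
Sample = Dict[str, Any]


def learning_phase(
    run_demonstration: Callable[[], Sample],         # hypothetical: S1 + S2
    train_model: Callable[[List[Sample]], Any],      # hypothetical: S4 (e.g. LSTM)
    preset_num_samples: int = 20,
) -> Any:
    """Sketch of steps S1-S4: collect demonstrations, then train once."""
    if preset_num_samples < 1:
        raise ValueError("preset_num_samples must be at least 1")
    initial_data: List[Sample] = []
    # S3: loop back to S1 until the preset sample count is reached.
    while len(initial_data) < preset_num_samples:
        # S1/S2: the imitation target actor performs the task, and the motion
        # data, task state and environment state are recorded as one sample.
        initial_data.append(run_demonstration())
    # S4: train on all of the initially collected data.
    return train_model(initial_data)
```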

<Determination unit>
Next, the determination performed by the determination unit 2 is described in detail. In the robot motion learning device 100, after the controlled robot 301 has completed a prescribed number of motion steps, one or more of the following are input to the determination unit 2: the motion data of the controlled robot 301, the environment in the controlled area 300, and the state of the controlled task 302. The determination unit 2 uses the input data to determine whether the controlled robot 301 has succeeded in the controlled task 302.

As a specific example of the determination method, consider the case shown in FIG. 7, where the controlled task 302 is a pick-and-place task in which the controlled robot 301 grasps a workpiece, carries it to a goal point, and places it there. In the example of FIG. 7, the robot hand 301a of the controlled robot 301 grasps the workpiece 501 placed at the start point, carries it along an arbitrary route to the goal point 502, and places it there.

The goal point 502 is located on a weighing scale 503, which is an example of the control operation sensor 303; when the controlled robot 301 correctly transports and places the workpiece 501, the weighing scale 503 measures the weight of the workpiece 501. The user can therefore set a threshold for the measurement value of the weighing scale 503 that suits the actual control environment, so that the controlled task 302 is judged successful when a weight equal to or greater than the threshold is detected. In other words, once the user has set such a threshold, the determination unit 2 judges the controlled task 302 to be successful when the weighing scale 503 detects a weight equal to or greater than the threshold.

In this way, the determination unit 2 judges success or failure using a threshold based on the environment of the controlled area 300. This allows the robot motion learning device 100 to judge success or failure by simple processing, such as comparing the result of the motion of the controlled robot 301 with the threshold, which reduces the processing load of the determination unit 2.
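
A minimal sketch of the threshold check in Python, assuming the scale reading is available as a number in grams. The unit, the class name `WeightThresholdJudge`, and the 45 g threshold are illustrative assumptions, not values from this document.

```python
from dataclasses import dataclass


@dataclass
class WeightThresholdJudge:
    """Success/failure check based on the reading of the weighing scale.

    threshold_g is set by the user to suit the actual environment
    (grams is an assumed unit). A reading at or above the threshold is
    taken to mean the workpiece was placed on the goal point.
    """

    threshold_g: float

    def is_success(self, reading_g: float) -> bool:
        return reading_g >= self.threshold_g


judge = WeightThresholdJudge(threshold_g=45.0)
print(judge.is_success(50.2))  # True: workpiece detected on the scale
print(judge.is_success(0.3))   # False: scale reads (almost) nothing
```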

Next, as another specific example of the determination method, consider the case shown in FIG. 8, where the controlled task 302 is a cable-routing task performed by the controlled robot 301 and the success or failure of the controlled task 302 is judged using machine learning. In the example of FIG. 8, the controlled robot 301 routes a cable 511 on a workpiece 512 from a start point 513, around the outside of relay points 514 and 515, and inserts the tip of the cable 511 into a goal point 516.

Consequently, in the example of FIG. 8, even if the cable 511 is inserted into the goal point 516, the controlled task 302 is not successful if the cable 511 passes along the inner side of the relay points 514 and 515. One way to judge the success or failure of such a task is to determine, from their appearance, the relative positional relationship between the workpiece 512 and the cable 511 after the routing operation is completed; this judgment can be automated by using machine learning in the determination unit 2.

FIG. 9 is a diagram for explaining a case in which the determination unit 2 automates the judgment using machine learning. In the example of FIG. 9, the determination unit 2 is implemented with a machine learning model such as a VAE (Variational AutoEncoder). A VAE has the property that, when unfamiliar data is input, it cannot reproduce that data in its original state at the output.

As shown in FIG. 9, during training, the VAE receives as input only appearance images of the workpiece 512 and the cable 511 from successful trials, taken after completion of the imitation target task 202, which has the same content as the controlled task 302, and learns to output the same data as its input. As a result, the latent features of the appearance image of a successful task are extracted inside the algorithm. The input data is not limited to image data; it may also include the motion data of the imitation target actor 201 while executing the imitation target task 202, information on the environment in the imitation target area 200 detected by the imitation motion sensor 203, and the like.

When judging whether the controlled task 302 has succeeded, the determination unit 2 inputs to the VAE an appearance image of the controlled task 302 taken after the controlled robot 301 has completed its motion. Because the VAE has been trained using only data from successful imitation target tasks 202, when an appearance image of a successful controlled task 302 is input, the VAE can compress it internally and then restore it to its original state. When an appearance image of a failed controlled task 302 is input, on the other hand, the VAE cannot restore it to its original state after compression.

Therefore, the determination unit 2 can judge the success or failure of the motion of the controlled robot 301 by taking the difference between the input image and the output image of the VAE as the degree of abnormality and setting a threshold for that degree of abnormality. When the VAE has been trained on data other than images, the determination unit 2 can calculate the degree of abnormality by inputting, at judgment time, data of the same kind as that used for training. Although a VAE is given as a specific example in this embodiment, any algorithm capable of unsupervised learning from successful examples only may be used, such as a GAN (Generative Adversarial Network).
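
The degree-of-abnormality test can be sketched as follows. This assumes the image (or other data) is flattened into a list of floats and that "difference" means the mean squared error; `reconstruct` is a hypothetical stand-in for the encode/decode pass of a trained VAE, and the threshold would have to be tuned in practice.

```python
from typing import Callable, List, Sequence


def anomaly_score(x: Sequence[float], x_rec: Sequence[float]) -> float:
    """Degree of abnormality: mean squared difference between input and output."""
    if len(x) == 0 or len(x) != len(x_rec):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, x_rec)) / len(x)


def judge_by_reconstruction(
    x: Sequence[float],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    threshold: float,
) -> bool:
    """Success when a model trained only on successes reconstructs x well."""
    return anomaly_score(x, reconstruct(x)) <= threshold


# Toy stand-in for a trained VAE: it can only ever produce the success pattern.
SUCCESS_IMAGE = [0.0, 1.0, 1.0, 0.0]


def toy_reconstruct(x: Sequence[float]) -> List[float]:
    return list(SUCCESS_IMAGE)


print(judge_by_reconstruction([0.0, 1.0, 1.0, 0.0], toy_reconstruct, 0.1))  # True
print(judge_by_reconstruction([1.0, 0.0, 0.0, 1.0], toy_reconstruct, 0.1))  # False
```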

<Operation process of the controlled object>
Next, the judgment of the motion of the controlled robot 301 is described. In the robot motion learning device 100, after the controlled robot 301 completes a prescribed number of motion steps, one or more of the motion data of the controlled robot 301, the environment in the controlled area 300, and the state of the controlled task 302 are input to the determination unit 2.

The sampling data input to the determination unit 2, such as the motion data of the controlled robot 301, the environment in the controlled area 300, and the state of the controlled task 302, must be the data that was configured in advance as the input. For example, if the determination unit 2 is set to judge success or failure only from an image of the imitation target task 202 at one specific time after the motion is completed, then only an image of the controlled task 302 at the corresponding time after the controlled robot 301 has operated is input to the determination unit 2.

The determination unit 2 checks whether the input data satisfies preset conditions and, if so, judges that the motion was successful. In that case, the determination unit 2 performs a storage process that saves the motion data input for the judgment and the values detected by the control operation sensor 303 in the recording device 400 as success data 402. In other words, when the motion is judged successful, the determination unit 2 stores in the recording device 400, as success data 402, the motion data output by the inference process together with the values detected by the control operation sensor 303 while the controlled robot 301 was operating on that motion data.

If the motion is judged to have failed, on the other hand, the determination unit 2 discards the motion data input for the judgment and the values detected by the control operation sensor 303 without storing them in the recording device 400. By discarding the data from failed motions, the robot motion learning device 100 eases the constraints on the storage capacity of the recording device 400.
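
The save-or-discard rule can be sketched as below. `RecordingDevice` and `store_if_successful` are hypothetical names, and keeping the data in an in-memory list is an assumption made only to keep the example self-contained.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class RecordingDevice:
    """Hypothetical in-memory stand-in for the recording device."""

    success_data: List[Dict[str, Any]] = field(default_factory=list)


def store_if_successful(
    device: RecordingDevice,
    motion_data: List[Any],
    sensor_values: List[Any],
    succeeded: bool,
) -> bool:
    """Keep a run as success data only when it was judged successful."""
    if succeeded:
        # Storage process: motion data plus the sensor values recorded
        # while the robot executed that motion data.
        device.success_data.append(
            {"motion": list(motion_data), "sensor": list(sensor_values)}
        )
        return True
    # Failed run: nothing is written, so storage use does not grow.
    return False
```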

After the motion and judgment processes are completed, the robot motion learning device 100 may proceed to the re-learning process described below to re-train the learning device 1, or it may repeat the operation process of operating the controlled object without re-learning.

<Display unit>
The judgment conditions for the motion of the controlled robot 301 can be set, and the judgment results checked, using the GUI (Graphical User Interface) 600 shown in FIG. 10. The GUI 600 is displayed on the display 707.

When the user sets judgment conditions for the motion of the controlled robot 301 or checks the judgment results, a judgment display screen 600a is displayed on the GUI 600. The judgment display screen 600a shows an execution ID display box 601, a success/failure judgment box 602, an input image display box 603, a judgment result image display box 604, a parameter monitor 605, and a settings window 607. It also shows a detailed settings button 612, an add setting item button 613, and a reset setting items button 614.

The settings window 607 displays a condition list 608 listing the judgment conditions. For each condition, the condition list 608 shows a condition label setting box 609 for setting a label that identifies the condition, a threshold setting box 610 for setting the threshold, and a comparison setting box 611 that specifies the inequality relationship to the threshold that must be satisfied.

To add a setting item, the user selects the add setting item button 613. Selecting the delete condition button 615 associated with a condition deletes that condition, and selecting the reset setting items button 614 deletes all conditions.

The user can operate the keyboard 708 and the mouse 709 to enter arbitrary values into the boxes displayed in the settings window 607 and thereby set the conditions. When multiple conditions are set, the final success/failure judgment is the logical AND of all the conditions. The user can also define a custom logical expression that the conditions must satisfy, using the detailed settings window 620 (see FIG. 11), which is displayed by operating the detailed settings button 612.

As shown in FIG. 11, the detailed settings window 620 displays a condition list 623 containing the same conditions as the settings window 607. The detailed settings window 620 also displays a logical expression setting box 621, a term number setting box 622, a done button 624, an add setting item button 625, a reset setting items button 626, and a delete condition button 627. The add setting item button 625, the reset setting items button 626, and the delete condition button 627 have the same functions as the add setting item button 613, the reset setting items button 614, and the delete condition button 615, respectively.

The condition list 623 is synchronized with the condition list 608 of the settings window 607 at the moment the detailed settings button 612 is selected. The user can set arbitrary conditions by editing the condition list 623, adding or removing conditions with the add setting item button 625, the reset setting items button 626, and the delete condition button 627.

The user can also specify the relationships among multiple conditions in detail by assigning an arbitrary term number in the term number setting box 622 and entering, in the logical expression setting box 621, the logical expression that the conditions must satisfy.
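
One way to model the condition list and its logical combination is sketched below. The relation strings, the term-number keys, and the expression syntax (term numbers joined with `and`, `or`, `not`, and parentheses) are assumptions; this document does not specify what the logical expression setting box 621 accepts. With no expression, the conditions are combined by logical AND, matching the default behavior described above.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed encoding of the comparison setting box.
_OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


@dataclass
class Condition:
    label: str        # condition label
    threshold: float  # threshold value
    relation: str     # relation the measured value must satisfy

    def holds(self, value: float) -> bool:
        return _OPS[self.relation](value, self.threshold)


def _eval(node: ast.AST, results: Dict[int, bool]) -> bool:
    """Evaluate a whitelisted boolean expression over term numbers."""
    if isinstance(node, ast.BoolOp):
        vals = [_eval(v, results) for v in node.values]
        return all(vals) if isinstance(node.op, ast.And) else any(vals)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, results)
    if (isinstance(node, ast.Constant) and isinstance(node.value, int)
            and not isinstance(node.value, bool)):
        if node.value not in results:
            raise ValueError(f"unknown term number: {node.value}")
        return results[node.value]
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def judge(conditions: Dict[int, Condition], values: Dict[int, float],
          expression: Optional[str] = None) -> bool:
    results = {n: c.holds(values[n]) for n, c in conditions.items()}
    if expression is None:
        return all(results.values())  # default: logical AND of all terms
    return _eval(ast.parse(expression, mode="eval").body, results)
```

Parsing with `ast` and whitelisting node types avoids calling `eval` on user-entered text.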

After the conditions are set, selecting the done button 624 synchronizes the condition list 608 in the settings window 607 with the condition list 623, closes the detailed settings window 620, and returns to the judgment display screen 600a.

When operation of the controlled robot 301 starts after the settings are completed, the execution ID display box 601 shows the execution ID that is automatically assigned to each run. By checking this execution ID, the user can tell which run is currently in progress. Here, a run is the series of motions of the controlled robot 301 from the start to the end of the controlled task 302; one run completes one controlled task 302.

The parameter monitor 605 displays, in a parameter list 606, the data (parameters) obtained from the result of the run of the controlled robot 301 for each of the set conditions. In the parameter list 606, boxes showing data that satisfies its condition are lit, and boxes showing data that does not satisfy its condition are unlit.

The success/failure judgment box 602 displays the overall success/failure judgment made by the determination unit 2. The box is initially unlit; if the controlled robot 301 is judged to have failed, the text "Judgment NG" lights up to indicate failure. If the controlled robot 301 is judged to have succeeded, the box remains unlit.

When an image is used as input to the determination unit 2, the input image is displayed in the input image display box 603. When an image is used as input and the judgment result is also obtained as an image, that result image can be displayed in the judgment result image display box 604. If the judgment result is a failure, the image in the judgment result image display box 604 is shown as a heat map in which the failed regions are highlighted in color.

<Flow of processing executed in the operation process of the controlled object>
FIG. 12 is a flowchart showing the flow of processing executed when the controlled robot 301 operates in the controlled area 300.

In the robot motion learning device 100, first, the conditions under which the CPU 701, functioning as the determination unit 2, judges success or failure are set using the GUI 600 (S11). In this process, the conditions for judging the success or failure of the controlled task 302 executed by the controlled robot 301 are set through user operations on the GUI 600, using the keyboard 708 and the mouse 709 as input units.

Next, the CPU 701 detects the current state of the controlled robot 301 with the control operation sensor 303 (S12). The CPU 701 then inputs the data acquired in step S12 to the learning device 1 and operates the controlled robot 301 according to the data output by inference with the trained model 706 of the learning device 1 (S13). In this process, the CPU 701 operates the controlled robot 301 based on the motion data of the controlled robot 301 output by the inference, the environment in the controlled area 300, and the state of the controlled task 302. Steps S12 and S13 constitute the process of acquiring, using the trained model 706, motion data for operating the controlled robot 301 in the environment of the controlled area 300.

Next, the CPU 701 determines whether the controlled task 302 has been completed (S14). If it determines that the controlled task 302 has not been completed (No), the CPU 701 returns to step S12 and repeats steps S12 to S14 until the controlled task 302 is completed. If it determines that the controlled task 302 has been completed (Yes), the CPU 701 proceeds to step S15.

In step S15, the CPU 701 judges whether the controlled task 302 was successful and displays the result (S15). Using the judgment conditions set in step S11, the CPU 701 determines whether the data corresponding to each condition satisfies that condition. The CPU 701 also displays the judgment result on the GUI 600 of the display 707.

Because the judgment conditions and results are displayed on the GUI 600, the robot motion learning device 100 lets the user easily check whether the result of the motion of the controlled robot 301 was judged appropriately and whether it was a success. Steps S11 and S15 constitute the process of displaying the judgment conditions of the determination process and the judgment results on the display 707.

Next, the CPU 701 determines whether the controlled task 302 was successful (S16), that is, whether the controlled task 302 was judged successful in step S15. If so (Yes), the process proceeds to step S18; if the controlled task 302 was judged to have failed (No), the process proceeds to step S17. Steps S15 and S16 constitute the determination process of operating the controlled robot 301 in the environment of the controlled area 300 using the motion data and judging the result.

In step S17, the CPU 701 discards the data output by inference with the trained model 706 during the current execution of the controlled task 302 (S17), and returns to step S12.

In step S18, the CPU 701 stores in the recording device 400, as success data 402, the data output by inference with the trained model 706 during the current execution of the controlled task 302 (S18). Step S18 constitutes the storage process of acquiring success data 402 for motions whose results were judged successful in the determination process.

Next, the CPU 701 determines whether the number of samples of success data 402 stored in the recording device 400 has reached the preset number (S19). If it has (Yes), the CPU 701 ends operation of the controlled robot 301. If it has not (No), the CPU 701 returns to step S12.

Thus, the robot motion learning device 100 repeats steps S12 to S19 until the number of samples of success data 402 stored in the recording device 400 reaches the preset number.
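
The operation loop of FIG. 12 (steps S12 to S19, with the judgment conditions of step S11 folded into the `judge` callback) can be sketched as follows. All callback names are hypothetical stand-ins for the sensor, the trained model, the robot controller, and the determination unit, and the `max_runs` guard is an addition not described in this document, included so the sketch cannot loop forever if the robot never succeeds.

```python
from typing import Any, Callable, Dict, List

Trajectory = List[Dict[str, Any]]


def operation_phase(
    read_state: Callable[[], Any],           # S12: control operation sensor
    infer: Callable[[Any], Any],             # S13: trained-model inference
    act: Callable[[Any], None],              # S13: command the robot
    task_done: Callable[[], bool],           # S14
    judge: Callable[[Trajectory], bool],     # S15/S16 (conditions from S11)
    preset_num_samples: int,
    max_runs: int = 1000,                    # assumed safety limit
) -> List[Trajectory]:
    success_data: List[Trajectory] = []
    runs = 0
    while len(success_data) < preset_num_samples:  # S19
        if runs >= max_runs:
            raise RuntimeError("run limit reached before enough successes")
        runs += 1
        trajectory: Trajectory = []
        while True:  # S12-S14 until the task is complete
            state = read_state()
            action = infer(state)
            act(action)
            trajectory.append({"state": state, "action": action})
            if task_done():
                break
        if judge(trajectory):
            success_data.append(trajectory)  # S18: keep as success data
        # Otherwise S17: the trajectory is simply dropped.
    return success_data
```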

<Re-learning process>
Next, the learning device 1 re-learns using the success data 402 from motions that the determination unit 2 judged successful. In re-learning, the learning device 1 decides which of the initially collected data 401 and the success data 402 stored in the recording device 400 to use, depending on the number of samples of success data 402 stored in the recording device 400.

<Flow of processing executed in the re-learning process>
FIG. 13 is a flowchart showing the flow of the re-learning process executed when the CPU 701, functioning as the learning device 1, re-trains the trained model 706 using the success data 402. The CPU 701 first determines whether a sufficient number of samples of success data 402 has been acquired (S31), that is, whether the number of samples of success data 402 stored in the recording device 400 is sufficient. Here, a sufficient number of samples is, for example, the same number of samples as the initially collected data 401 collected in the learning process.

If it determines in step S31 that the number of samples of success data 402 is sufficient (Yes), the CPU 701 performs re-learning by machine learning using the success data 402 stored in the recording device 400 (S32). In this process, the CPU 701 uses the success data 402 and does not use the initially collected data 401. In step S32, the CPU 701 performs machine learning with the same algorithm as in the learning process, obtains the trained model 706, and ends the re-learning process.

In this way, when the number of samples of success data 402 is equal to or greater than a predetermined number, here the total number of samples of the initially collected data 401, the CPU 701 re-learns using that success data 402 as training data. The robot motion learning device 100 thus obtains a trained model 706 re-trained using only the success data 402 obtained from the motion of the controlled robot 301, and by using this re-trained model 706 it can improve the control accuracy of the controlled robot 301.

If it determines in step S31 that the number of samples of success data 402 is not sufficient (No), the CPU 701 determines whether the number of samples of success data 402 acquired so far is at least half the number of samples of the initially collected data 401 (S33). In this process, the CPU 701 compares the number of samples of success data 402 stored in the recording device 400 with the number of samples of the initially collected data 401 acquired in the learning process. The one-half ratio is not fixed; the user may choose any value, provided that the total number of samples of all the success data 402 plus part of the initially collected data 401 is sufficient for learning.

If it determines in step S33 that the number of samples of success data 402 is at least half that of the initially collected data 401 (Yes), the CPU 701 proceeds to step S34. If it is less than half (No), the CPU 701 proceeds to step S35.

In step S34, the CPU 701 performs re-learning by machine learning using all of the success data 402 stored in the recording device 400 together with the same number of samples of the initially collected data 401 (S34). It then obtains the trained model 706 and ends the re-learning process.

By executing step S34, the robot motion learning device 100 reduces the relative differences within the data samples used in re-learning, since the success data 402 and the initially collected data 401 contribute equal numbers of samples. As a result, the success rate of the controlled task 302 can be expected to improve when the controlled robot 301 is operated using the data output by inference from the re-trained model 706.

In step S35, the CPU 701 performs re-learning by machine learning using all of the success data 402 and all of the initially collected data 401 stored in the recording device 400 (S35). The CPU 701 thereby obtains the trained model 706 and ends the re-learning process.

By executing step S35, the robot motion learning device 100 uses more data samples in re-learning than when learning with the initially collected data 401 alone. As a result, the robot motion learning device 100 can be expected to improve the generalization performance of the re-trained model 706.
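The data composition used in steps S34 and S35 can be sketched as follows. This is a hedged illustration: `build_retraining_set` is a hypothetical name, and the text does not say which initially collected samples are chosen in step S34, so random sampling without replacement is an assumption. The `min` guard covers the case, not discussed in the text, where there are more success samples than initial samples.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_retraining_set(success: Sequence[T], initial: Sequence[T], step: str,
                         rng: Optional[random.Random] = None) -> List[T]:
    """Assemble the re-learning data for step S34 or S35 (hypothetical helper)."""
    if step == "S34":
        # All success data plus the same number of initially collected samples.
        # Random selection of the initial samples is an assumption.
        if rng is None:
            rng = random.Random()
        k = min(len(success), len(initial))
        return list(success) + rng.sample(list(initial), k)
    if step == "S35":
        # All success data plus all initially collected data.
        return list(success) + list(initial)
    raise ValueError(f"unsupported step: {step!r}")
```

With 60 success samples and 100 initial samples, step S34 yields 120 samples (balanced 60/60) and step S35 yields 160, which reflects the trade-off described above between balance and sample count.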

Steps S31 to S35 constitute a learning process in which the trained model 706 is trained by machine learning using the success data 402.

<Summary of this embodiment>
As described above, the robot motion learning device 100 of the present embodiment operates the imitation target actor 201, which corresponds to the controlled robot 301, in the imitation target area 200, which is different from the control target area 300, and acquires the initially collected data 401. The robot motion learning device 100 performs machine learning using the acquired initially collected data 401 as teacher data to obtain the trained model 706, and uses the trained model 706 to acquire motion data for operating the controlled robot 301 in the environment of the control target area 300. The robot motion learning device 100 then operates the controlled robot 301 in that environment using the acquired motion data and acquires the success data 402 for the motions whose results are determined to be successful. Finally, the robot motion learning device 100 re-trains the trained model 706 by machine learning using the acquired success data 402.
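The overall flow summarized above can be sketched as one training round. This is a minimal sketch, not the device's implementation: `improve_model` and the callables `train`, `infer`, `execute` and `judge` are hypothetical placeholders for model training, inference of motion data, operation of the controlled robot, and the determination process. For brevity, `infer` takes only the model, and the final step simply combines all success data with all initially collected data (the S35 case) instead of applying the full S31 to S35 branching.

```python
from typing import Any, Callable, List, Sequence


def improve_model(initial_data: Sequence[Any],
                  train: Callable[[Sequence[Any]], Any],
                  infer: Callable[[Any], Any],
                  execute: Callable[[Any], Any],
                  judge: Callable[[Any], bool],
                  trials: int) -> Any:
    """One round of imitation learning plus success-based re-learning (sketch)."""
    model = train(initial_data)              # first data -> trained model
    success_data: List[Any] = []
    for _ in range(trials):
        command = infer(model)               # second data from the trained model
        record = execute(command)            # operate the controlled robot
        if judge(record):                    # determination process
            success_data.append(record)      # keep third data for successes only
    # Re-learning; simplified to the S35 composition (all success + all initial).
    return train(list(success_data) + list(initial_data))
```

In a toy run where `train` returns the number of samples it receives and two of three trials succeed, the re-trained "model" reflects the initial samples plus the two successes.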

With this configuration, the robot motion learning device 100 can obtain a trained model 706 that achieves the controlled task 302 with a sufficiently high success rate. The trained model 706, which was trained on the initially collected data 401 obtained in the environment of the imitation target area 200, is re-trained using the success data 402 obtained in the control target area 300. This allows the robot motion learning device 100 to improve the accuracy with which the controlled robot 301 is controlled using the trained model 706.

<Modification>
In this embodiment, when saving the success data 402 determined to be successful in the determination process, the CPU 701 performs no particular processing on the initially collected data 401; however, the configuration is not limited to this. When saving the success data 402 in the recording device 400, the CPU 701 may discard a number of samples of the initially collected data 401 corresponding to the number of samples of the success data 402.

With this configuration, the robot motion learning device 100 discards at least part of the initially collected data 401 as the acquired success data 402 is added, which prevents the amount of sample data stored in the recording device 400 from growing.
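The discard-on-add modification can be sketched as a small store that keeps the number of stored samples from growing. This is a hedged sketch: `SampleStore` is a hypothetical stand-in for the recording device 400, and discarding the oldest initially collected samples first is an assumption, since the text does not say which samples are discarded.

```python
from typing import Any, List


class SampleStore:
    """Hypothetical stand-in for the sample data kept in the recording device 400."""

    def __init__(self, initial: List[Any]) -> None:
        self.initial = list(initial)    # initially collected data 401
        self.success: List[Any] = []    # success data 402

    def add_success(self, samples: List[Any]) -> None:
        # Save the new success data and discard a corresponding number of
        # initially collected samples (oldest first is an assumption).
        self.success.extend(samples)
        drop = min(len(samples), len(self.initial))
        del self.initial[:drop]

    def total(self) -> int:
        return len(self.initial) + len(self.success)
```

Starting from 100 initial samples, adding 20 success samples leaves 80 initial samples and a total of 100; the total grows only once the initially collected data has been used up.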

In addition, when the number of samples of the success data 402 is less than half the number of samples of the initially collected data 401, the CPU 701 of this embodiment re-learns using all of the success data 402 and all of the initially collected data 401; however, the configuration is not limited to this. For example, the CPU 701 may re-learn using all of the success data 402 and at least part of the initially collected data 401. Specifically, when 100 samples of the initially collected data 401 and 20 samples of the success data 402 are stored in the recording device 400, re-learning may use the 20 samples of the success data 402 and 80 samples of the initially collected data 401.
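The 20 + 80 example keeps the total training-set size equal to the original number of initially collected samples. A minimal sketch of that rule, assuming the hypothetical helper `mix_keep_total` and assuming (the text does not specify) that the first initial samples are the ones kept:

```python
from typing import Any, List, Sequence


def mix_keep_total(success: Sequence[Any], initial: Sequence[Any]) -> List[Any]:
    """All success data plus enough initial data to keep the total at len(initial)."""
    n_keep = max(len(initial) - len(success), 0)
    # Keeping the first n_keep initial samples is an assumption.
    return list(success) + list(initial[:n_keep])
```

With 20 success samples and 100 initial samples this returns 100 samples: the 20 success samples followed by 80 initial samples, matching the example above.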

In addition, in this embodiment, the robot motion learning device 100 has the display 707 as a display unit and the keyboard 708 and mouse 709 as input units, each as a separate component; however, the configuration is not limited to this. The robot motion learning device 100 may instead have a touch panel that integrates the display unit and the input unit. In that case, when the user presses one of the buttons arranged on the GUI 600 displayed on the touch panel, the robot motion learning device 100 performs the setting or processing corresponding to the pressed button.

In addition to production equipment, the information processing method and information processing device of the present invention can be applied to software design and program creation for various machines and equipment, such as industrial robots, service robots, and processing machines operated by computer numerical control. Examples include machines and equipment that can automatically perform extension and retraction, bending and stretching, vertical movement, horizontal movement, or rotation, or a combination of these movements, based on information in a recording device provided in the control device.

The embodiments of the present invention also include a method for controlling a robot system that executes the above-described information processing method and in which the control object includes a robot manipulator. The embodiments of the present invention also include a method for manufacturing an article using a robot system that operates by executing the above-described information processing method. The embodiments of the present invention further include a program capable of executing the above-described information processing and a computer-readable recording medium storing the program.

The present invention can also be realized by supplying a program that implements one or more functions of the above-described embodiments to a system or device via a network or a recording medium (storage medium), and having one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more of the functions.

The present invention is not limited to the embodiments described above, and many modifications are possible within its technical concept. For example, the different embodiments and/or modifications described above may be combined. The effects described in the embodiments are merely a list of the most favorable effects resulting from the present invention, and the effects of the present invention are not limited to those described in the embodiments and/or modifications.

The disclosure of this embodiment includes the following configurations.

(Method 1)
An information processing method comprising executing:
a process of acquiring first data by operating an object corresponding to a control object in an environment different from an operating environment of the control object, and acquiring a trained model by machine learning using the first data;
a process of acquiring, using the trained model, second data for operating the control object in the operating environment;
a determination process of operating the control object in the operating environment using the second data and determining a result; and
a learning process of acquiring third data about an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

(Method 2)
The information processing method according to Method 1, wherein the learning process trains the trained model by machine learning using the third data and at least a portion of the first data.

(Method 3)
The information processing method according to Method 1 or 2, wherein the learning process discards at least a portion of the first data based on the addition of the acquired third data.

(Method 4)
The information processing method according to any one of Methods 1 to 3, wherein, when the number of samples of the third data is equal to or greater than a predetermined number of samples, the learning process performs learning using, as learning data, the third data having at least the predetermined number of samples.

(Method 5)
The information processing method according to any one of Methods 1 to 4, wherein the determination process is performed by machine learning.

(Method 6)
The information processing method according to any one of Methods 1 to 5, wherein the determination process determines success or failure using a threshold based on the operating environment.

(Method 7)
The information processing method according to any one of Methods 1 to 6, further comprising executing a process of displaying, on a display unit, the determination conditions of the determination process and the determination result.

(Method 8)
The information processing method according to any one of Methods 1 to 7, further comprising executing a process of discarding the second data with which the operation in the determination process was determined to have failed.

(Method 9)
The information processing method according to any one of Methods 1 to 8, wherein the object has a component corresponding to a component of the control object.

(Configuration 10)
An information processing device comprising an information processing unit, wherein the information processing unit executes:
a process of acquiring first data by operating an object corresponding to a control object in an environment different from an operating environment of the control object, and acquiring a trained model by machine learning using the first data;
a process of acquiring, using the trained model, second data for operating the control object in the operating environment;
a determination process of operating the control object in the operating environment using the second data and determining a result; and
a learning process of acquiring third data about an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

(Configuration 11)
A robot system comprising the information processing device according to Configuration 10, wherein the control object includes a robot manipulator.

(Configuration 12)
The robot system according to Configuration 11, wherein the robot manipulator has at least one drive unit, and the information processing device acquires, as the second data, a command value for driving the drive unit.

(Configuration 13)
The robot system according to Configuration 11 or 12, wherein the robot manipulator has at least one sensor, and the information processing device performs the determination in the determination process using a value detected by the sensor.

(Method 14)
A method for controlling a robot system, wherein an information processing unit executes the information processing method according to any one of Methods 1 to 9, and the control object includes a robot manipulator.

(Method 15)
A method for manufacturing an article, comprising manufacturing the article using the robot system according to any one of Configurations 11 to 13.

(Configuration 16)
A program for causing a computer to execute the information processing method according to any one of Methods 1 to 9.

(Configuration 17)
A computer-readable recording medium storing the program according to Configuration 16.
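Methods 6 and 8 and Configuration 13 leave the concrete success criterion open. As a minimal sketch, assume the sensor reports a single scalar per operation (for example, a force value) and that an operation succeeds when that value lies within a tolerance band set for the operating environment. `Trial`, `judge_trials`, `target` and `tolerance` are hypothetical names, and the band test is only one possible threshold rule.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Trial:
    command: List[float]   # second data: command values for the drive units
    sensor_value: float    # value detected by the sensor during the operation


def judge_trials(trials: Sequence[Trial], target: float,
                 tolerance: float) -> Tuple[List[Trial], int]:
    """Threshold-based determination: keep successes, discard failures."""
    kept: List[Trial] = []
    discarded = 0
    for t in trials:
        if abs(t.sensor_value - target) <= tolerance:
            kept.append(t)      # retained as third data for re-learning
        else:
            discarded += 1      # second data of a failed operation is dropped
    return kept, discarded
```

With `target=1.0` and `tolerance=0.1`, sensor readings of 1.0, 1.3 and 0.95 keep the first and third trials and discard the second.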

201: Object (imitation target actor); 301: Control object, robot manipulator (controlled robot); 301a: Component (robot hand); 303: Sensor (control operation sensor); 401: First data (initially collected data); 402: Third data; 700: Information processing device; 701: Information processing unit (CPU); 706: Trained model; 707: Display unit (display)

Claims

1. An information processing method comprising executing:
a process of acquiring first data by operating an object corresponding to a control object in an environment different from an operating environment of the control object, and acquiring a trained model by machine learning using the first data;
a process of acquiring, using the trained model, second data for operating the control object in the operating environment;
a determination process of operating the control object in the operating environment using the second data and determining a result; and
a learning process of acquiring third data about an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

2. The information processing method according to claim 1, wherein the learning process trains the trained model by machine learning using the third data and at least a portion of the first data.

3. The information processing method according to claim 1, wherein the learning process discards at least a portion of the first data based on the addition of the acquired third data.

4. The information processing method according to claim 1, wherein, when the number of samples of the third data is equal to or greater than a predetermined number of samples, the learning process performs learning using, as learning data, the third data having at least the predetermined number of samples.

5. The information processing method according to claim 1, wherein the determination process is performed by machine learning.

6. The information processing method according to claim 1, wherein the determination process determines success or failure using a threshold based on the operating environment.

7. The information processing method according to claim 1, further comprising executing a process of displaying, on a display unit, the determination conditions of the determination process and the determination result.

8. The information processing method according to claim 1, further comprising executing a process of discarding the second data with which the operation in the determination process was determined to have failed.

9. The information processing method according to claim 1, wherein the object has a component corresponding to a component of the control object.

10. An information processing device comprising an information processing unit, wherein the information processing unit executes:
a process of acquiring first data by operating an object corresponding to a control object in an environment different from an operating environment of the control object, and acquiring a trained model by machine learning using the first data;
a process of acquiring, using the trained model, second data for operating the control object in the operating environment;
a determination process of operating the control object in the operating environment using the second data and determining a result; and
a learning process of acquiring third data about an operation whose result is determined to be successful in the determination process, and training the trained model by machine learning using the third data.

11. A robot system comprising the information processing device according to claim 10, wherein the control object includes a robot manipulator.

12. The robot system according to claim 11, wherein the robot manipulator has at least one drive unit, and the information processing device acquires, as the second data, a command value for driving the drive unit.

13. The robot system according to claim 11, wherein the robot manipulator has at least one sensor, and the information processing device performs the determination in the determination process using a value detected by the sensor.

14. A method for controlling a robot system, wherein an information processing unit executes the information processing method according to any one of claims 1 to 9, and the control object includes a robot manipulator.

15. A method for manufacturing an article, comprising manufacturing the article using the robot system according to claim 11.

16. A program for causing a computer to execute the information processing method according to any one of claims 1 to 9.

17. A computer-readable recording medium storing the program according to claim 16.