JP2018200537A

JP2018200537A - Learning device, learning control method, and its program

Info

Publication number: JP2018200537A
Application number: JP2017104523A
Authority: JP
Inventors: Tanichi Ando (安藤 丹一)
Original assignee: Omron Corp; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 2017-05-26
Filing date: 2017-05-26
Publication date: 2018-12-20
Anticipated expiration: 2037-05-26
Also published as: JP6863081B2; WO2018216493A1

Abstract

To provide a technique for shortening the time a learning device needs to achieve its learning purpose without manual intervention. SOLUTION: A learning device that learns control of operations related to a predetermined task includes: a learning data reception unit that receives learning data including a learning purpose; a neural network that executes learning on the basis of the learning data; and an output unit that outputs the results of learning by the neural network. The neural network executes first learning for achieving the initial stage of the learning purpose; second learning, based on the result of the first learning, for learning the control that leads to a state in which the learning-related operation cannot be continued; and third learning, based on the result of the second learning, for achieving the learning purpose while excluding the control that leads to that non-continuable state. SELECTED DRAWING: Figure 1

Description

The present invention relates to a learning device, a learning control method, and a program therefor.

Research on artificial intelligence technologies such as neural networks (hereinafter, "AI technology") has long been conducted widely (see, for example, Patent Document 1). In particular, with the rise of the AI technology known as deep learning, the recognition rate of image-based object recognition has improved rapidly in recent years, and image classification is reaching a level that exceeds human recognition rates. Deep learning is expected to find application not only in image recognition but also in a wide range of other fields, such as speech recognition, personal authentication, behavior prediction, text summarization, automatic translation, monitoring, autonomous driving, failure prediction, sensor data analysis, music genre classification, content generation, and security systems.

In machine learning such as deep learning, a machine can be made to learn so that it acquires a predetermined ability. A learning device that performs machine learning repeatedly executes the operation being learned until it has acquired that ability.

For example, Patent Document 1 discloses a learning control method for a robot. In that method, the input values supplied to the robot's drive unit are corrected on the basis of the error between a target trajectory, set in advance by a person as the target of the robot's motion, and the actual trajectory followed when the robot actually operates.

Patent Document 1: JP-A-6-289918

In learning devices that control actuators on the basis of a large amount of sensor information, such as those controlling automobile engines and vehicle travel or chemical plants, the control and the sensor outputs influence each other, so more complex learning is needed to acquire a control method. For a learning device that performs such complex learning, it is therefore not easy for a person to set target values for the control amounts in advance, as in Patent Document 1. On the other hand, if the learning device is made to learn without target values, a very large number of trial-and-error iterations is needed, which is inefficient.

An object of the present invention is therefore to provide a technique for shortening the time required for a learning device to achieve its learning purpose without human intervention.

A learning device according to one aspect of the present invention learns control of operations related to a predetermined task and includes: a learning data reception unit that receives learning data including a learning purpose; a neural network that executes learning on the basis of the learning data; and an output unit that outputs the results of learning by the neural network. The neural network executes first learning for achieving the initial stage of the learning purpose; on the basis of the result of the first learning, executes second learning that learns the control leading to a state in which the learning-related operation cannot be continued; and, on the basis of the result of the second learning, executes third learning for achieving the learning purpose while excluding the control that leads to the non-continuable state.

With this configuration, the control leading to a state in which the learning-related operation cannot be continued is learned before the third learning for achieving the learning purpose. The device can therefore learn while excluding, by itself, the control that leads to the non-continuable state, without a person supplying conditions that restrict the control operations, so the learning purpose can be achieved in a shorter time.
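The three-stage scheme can be illustrated with a toy sketch. Everything in it is an assumption for illustration, not the patent's implementation: a hypothetical one-dimensional course `ToyTrack` on which the car advances 1 to 3 cells per step, wall cells standing in for the non-continuable state, and tabular bookkeeping in place of the neural network. For brevity, stage 2 here does not reuse the stage-1 result; only the failure set is passed from stage 2 to stage 3.

```python
import random
from typing import FrozenSet, List, Optional, Set, Tuple

Pair = Tuple[int, int]   # (state, action): cell index and step size
ACTIONS = (1, 2, 3)      # hypothetical discrete control amounts


class ToyTrack:
    """Hypothetical 1-D course: each step advances the car by `action` cells.
    Landing on a wall cell is the non-continuable state (a crash)."""

    def __init__(self, length: int = 10, walls: Tuple[int, ...] = (3, 7)) -> None:
        self.length = length
        self.walls = set(walls)

    def step(self, state: int, action: int) -> Tuple[int, bool, bool]:
        nxt = state + action
        crashed = nxt in self.walls
        return nxt, crashed, crashed or nxt >= self.length


def run_episode(track: ToyTrack, rng: random.Random,
                banned: FrozenSet[Pair] = frozenset()) -> Tuple[List[Pair], bool]:
    """Drive from cell 0 with random actions, never choosing a banned pair.
    Returns the (state, action) trace and whether the run ended in a crash."""
    state, trace = 0, []
    while True:
        allowed = [a for a in ACTIONS if (state, a) not in banned]
        if not allowed:              # every option is known to fail
            return trace, True
        action = rng.choice(allowed)
        nxt, crashed, done = track.step(state, action)
        trace.append((state, action))
        if done:
            return trace, crashed
        state = nxt


def first_learning(track: ToyTrack, rng: random.Random, max_trials: int = 1000) -> List[Pair]:
    """Stage 1: random trials until the initial-stage goal (one clean run) is met."""
    for _ in range(max_trials):
        trace, crashed = run_episode(track, rng)
        if not crashed:
            return trace
    raise RuntimeError("initial stage not reached")


def second_learning(track: ToyTrack, rng: random.Random, trials: int = 2000) -> Set[Pair]:
    """Stage 2: collect the (state, action) pairs that directly caused a crash."""
    failures: Set[Pair] = set()
    for _ in range(trials):
        trace, crashed = run_episode(track, rng)
        if crashed:
            failures.add(trace[-1])
    return failures


def third_learning(track: ToyTrack, rng: random.Random, failures: Set[Pair],
                   trials: int = 500) -> Tuple[Optional[List[Pair]], int]:
    """Stage 3: look for the shortest clean run while excluding known failures."""
    banned = frozenset(failures)
    best: Optional[List[Pair]] = None
    crashes = 0
    for _ in range(trials):
        trace, crashed = run_episode(track, rng, banned)
        if crashed:
            crashes += 1
        elif best is None or len(trace) < len(best):
            best = trace
    return best, crashes
```

With the default walls at cells 3 and 7, enough stage-2 trials uncover all six (state, action) pairs that land on a wall, after which stage 3 never crashes. Masking known-bad pairs is one design choice; penalizing them in a reward signal would be a common alternative.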

The output unit may also output the result of the second learning. In this aspect, the learned control leading to the non-continuable state can also be used by other learning devices.
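If the second-learning result is held as a set of failing (state, action) pairs, as in a tabular toy setting, handing it to another learning device can be as simple as serializing it. This representation is an assumption; in the embodiment the result resides in a neural network. A minimal JSON sketch with hypothetical helper names:

```python
import json
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]  # hypothetical (state, action) encoding


def export_failures(failures: Iterable[Pair]) -> str:
    """Serialize known failing (state, action) pairs as a sorted JSON list of lists."""
    return json.dumps(sorted(list(p) for p in failures))


def import_failures(payload: str) -> Set[Pair]:
    """Rebuild the set of failing pairs on the receiving device."""
    return {(s, a) for s, a in json.loads(payload)}
```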

The learning device may learn control of a series of operations related to the predetermined task and may further include a classification unit that divides the task into a plurality of scenes and, for each divided scene, identifies the partial operation, out of the series of operations, that is performed in that scene. The neural network may then execute the second learning and the third learning for each partial operation.

In this aspect, the learning device classifies the learning-related operations into partial operations, which are smaller units defined according to the scene, and learns each classified partial operation separately. This allows the learning purpose to be achieved in an even shorter time.

An automatic travel control learning device according to one aspect of the present invention learns control of a series of operations related to the automatic travel of a vehicle that laps a predetermined course. It includes: a learning data reception unit that receives learning data including a learning purpose of lapping the course a predetermined number of times within a predetermined time; a neural network that executes learning on the basis of the learning data; and an output unit that outputs the results of learning by the neural network. The neural network executes first learning for becoming able to complete one lap of the course; on the basis of the result of the first learning, executes second learning that learns the control leading to a state in which the learning-related operation cannot be continued; and, on the basis of the result of the second learning, executes third learning for achieving the learning purpose while excluding the control that leads to the non-continuable state.

A robot control learning device according to one aspect of the present invention learns control of a series of operations related to a task of gripping predetermined workpieces and stacking them at placement locations corresponding to the workpieces' shapes. It includes: a learning data reception unit that receives learning data including a learning purpose of stacking a predetermined number of workpieces at the placement locations within a predetermined time; a neural network that executes learning on the basis of the learning data; and an output unit that outputs the results of learning by the neural network. The neural network executes first learning for becoming able to stack one workpiece at its placement location; on the basis of the result of the first learning, executes second learning that learns the control leading to a state in which the learning-related operation cannot be continued; and, on the basis of the result of the second learning, executes third learning for achieving the learning purpose while excluding the control that leads to the non-continuable state.

A learning method according to one aspect of the present invention is executed by a computer including a control unit and learns control of operations related to a predetermined task. In the method, the control unit executes a step of receiving learning data including a learning purpose, a step of executing learning on the basis of the learning data, and a step of outputting the result of the learning step. The learning step includes executing first learning for achieving the initial stage of the learning purpose; on the basis of the result of the first learning, executing second learning that learns the control leading to a state in which the learning-related operation cannot be continued; and, on the basis of the result of the second learning, executing third learning for achieving the learning purpose while excluding the control that leads to the non-continuable state.

A program according to one aspect of the present invention causes a computer that learns control of operations related to a predetermined task to execute a procedure for receiving learning data including a learning purpose, a procedure for executing learning on the basis of the learning data, and a procedure for outputting the result of the learning procedure. The learning procedure includes executing first learning for achieving the initial stage of the learning purpose; on the basis of the result of the first learning, executing second learning that learns the control leading to a state in which the learning-related operation cannot be continued; and, on the basis of the result of the second learning, executing third learning for achieving the learning purpose while excluding the control that leads to the non-continuable state.

An apparatus according to one aspect of the present invention executes a predetermined task and includes: a first sensor that senses information the apparatus needs in order to operate when executing the task; an actuator; a second sensor that senses changes in the state of the apparatus caused by the actuator; a control unit that controls the actuator on the basis of the sensor values output from the first and second sensors; and a storage unit that stores the results of learning performed by the learning device described above. On the basis of the learning results stored in the storage unit, the control unit determines a control amount corresponding to the sensor values output from the first and second sensors.
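As a minimal sketch of the control unit's role, assume the stored learning result can be queried as a table of sample sensor readings paired with control amounts, and use nearest-neighbour lookup as a stand-in for the trained neural network. `StoredResult` and `Controller` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class StoredResult:
    """Hypothetical learned mapping: (sensor reading, control amount) samples."""
    samples: List[Tuple[Tuple[float, ...], float]]


class Controller:
    def __init__(self, stored: StoredResult) -> None:
        self.stored = stored

    def control_amount(self, first: Sequence[float], second: Sequence[float]) -> float:
        """Concatenate first- and second-sensor values and return the control
        amount of the closest stored sample (nearest-neighbour stand-in)."""
        x = tuple(first) + tuple(second)
        _, amount = min(self.stored.samples, key=lambda s: math.dist(s[0], x))
        return amount
```

A trained network would generalize between samples instead of snapping to the nearest one; the lookup only shows where the stored result enters the control loop.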

According to the present invention, it is possible to provide a technique for shortening the time required for a learning device to achieve its learning purpose without human intervention.

FIG. 1 is a block diagram showing the schematic configuration of the learning device in the first embodiment.
FIG. 2 is a schematic diagram showing the course on which a vehicle controlled by the learning device in the first embodiment travels automatically.
FIG. 3 is a flowchart outlining the processing of the learning device in the first embodiment.
FIG. 4 is a block diagram showing the detailed configuration of the learning device in the first embodiment.
FIG. 5 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 6 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 7 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 8 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 9 is a diagram showing an example of the hardware configuration of the learning device in the first embodiment.
FIG. 10 is a block diagram showing the schematic configuration of the learning device in the second embodiment.

[First embodiment]
Embodiments of the present invention are described in detail below with reference to the drawings. Identical elements are given identical reference signs, and duplicate descriptions are omitted. The following embodiments are examples for explaining the present invention and are not intended to limit the invention to these embodiments alone. Furthermore, the present invention can be modified in various ways without departing from its gist.

<1. System overview>
An overview of the system in this embodiment is described with reference to FIGS. 1 to 3.
FIG. 1 is a block diagram showing the schematic configuration of the learning device 1 according to this embodiment. The learning device 1 learns a predetermined task. As an example, the learning device 1 according to this embodiment is mounted in an automatic travel control vehicle (hereinafter also simply "vehicle") 90 and learns control of the vehicle 90 for automatic travel around a predetermined course (see FIG. 2). Learning data is given to the learning device 1 by, for example, an operator. The learning data includes, for example, the following learning purpose and learning requirements.

(Learning purpose)
・Complete 10 laps of the course within a predetermined time and reach the goal.
(Learning requirements)
・Do not leave the course.
・Lap in the clockwise direction.
・Reach the goal.
・At the initial-stage level: "complete one lap of the course and reach the goal."
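The learning data above might be held in a structure like the following. This is only a sketch: the field and method names are hypothetical, and the predetermined time is a required parameter because the source does not give its value.

```python
from dataclasses import dataclass


@dataclass
class LearningData:
    """Hypothetical container for the learning purpose and requirements."""
    time_limit_s: float              # the "predetermined time"; value not given in the source
    target_laps: int = 10            # learning purpose: 10 laps within the time limit
    direction: str = "clockwise"     # requirement: lap clockwise
    initial_stage_laps: int = 1      # initial-stage level: one lap to the goal

    def _valid_run(self, left_course: bool, direction: str) -> bool:
        # Requirements common to every stage: stay on the course, lap clockwise.
        return not left_course and direction == self.direction

    def initial_stage_met(self, laps: int, left_course: bool, direction: str) -> bool:
        return self._valid_run(left_course, direction) and laps >= self.initial_stage_laps

    def purpose_met(self, laps: int, elapsed_s: float, left_course: bool, direction: str) -> bool:
        return (self._valid_run(left_course, direction)
                and laps >= self.target_laps and elapsed_s <= self.time_limit_s)
```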

The task is what is to be accomplished through the learning-related operation; in this embodiment it is lapping the course. (In this embodiment, the "learning-related operation" consists of the various controls necessary for automatic travel of the vehicle 90; it may also be regarded as the motion the vehicle 90 performs under those controls.) The learning purpose is the level the task should reach; in this embodiment, as stated above, it is "completing 10 laps of the course within a predetermined time and reaching the goal." Accordingly, in this embodiment, being able to perform the task at all can be regarded as the learning requirement given for initial-stage learning.

In the following description, the learning device 1 is described as a computer such as a PC (Personal Computer) or a server device, but it is not limited to this and may be implemented by any embedded device having, for example, a processor, RAM, and ROM. Furthermore, the components implemented in each device are not limited to software; any component included in each device may be implemented in hardware. For example, the neural network 22 described later may be configured as an electronic circuit such as a custom LSI (Large-Scale Integration) or an FPGA (Field-Programmable Gate Array).

As shown in FIG. 1, the learning device 1 includes a control unit 10, a machine learning unit 20, an operation classification unit 30, and a storage unit 40.

In the vehicle 90, the control unit 10 is connected to a control sensor 91, an actuator 92, and a state detection sensor 93, all provided outside the learning device 1. The control unit 10 controls the actuator 92 according to the outputs from the control sensor 91 and the state detection sensor 93, causing the vehicle 90 to travel automatically.

The control sensor 91 is a group of sensors used for automatic travel control of the vehicle 90. It includes, for example, an in-vehicle camera, sensors such as lasers that detect obstacles outside the vehicle, and a road surface condition sensor. The state detection sensor 93, on the other hand, is a group of sensors that detects the control state of the automatically traveling vehicle 90. It includes, for example, a vibration sensor, a noise sensor, a fuel consumption sensor, a vehicle speed sensor, an acceleration sensor, and a yaw rate sensor.

The actuator 92 is controlled by the control unit 10 to make the vehicle 90 travel automatically. It includes, for example, an accelerator actuator, a brake actuator, and a steering actuator. The accelerator actuator controls the driving force of the vehicle by controlling the throttle opening according to control signals from the control unit 10. The brake actuator controls the braking force applied to the vehicle's wheels by controlling the operation amount of the brake pedal according to control signals from the control unit 10. The steering actuator controls the steering of the vehicle by driving the steering assist motor of the electric power steering system according to control signals from the control unit 10.

Next, the procedure by which the learning device 1 learns is outlined with reference to FIG. 3; the processing in each step is described in detail later. FIG. 3 is a flowchart outlining the processing flow when the learning device 1 learns. First, in the initial learning stage (S1), learning is performed with the aim of becoming able to perform the task (that is, becoming able to perform an operation that satisfies the initial-stage learning requirement). For the initial stage, the learning device 1 in this embodiment is given "completing one lap of the course and reaching the goal" as the learning requirement.

Once the initial-stage purpose has been achieved, operation classification (S2) is performed. In this stage, the learning performed in the initial learning stage S1 is analyzed, the task is divided into a plurality of parts on the basis of predetermined parameters (each divided part of the task is hereinafter also called a "scene"), and, for each divided scene, the operation performed in that scene out of the series of operations related to the task (hereinafter also called a "partial operation") is identified. The predetermined parameters for dividing the task are, for example, the displacement of the operation during task learning, or the environment in which the operation is executed during task learning (such as the elapsed time from the start of the task or the position relative to the task's starting location). In this embodiment, the position relative to the task's starting location (an aspect of the environment in which the operation is executed during task learning) is used as the predetermined parameter. That is, the learning device 1 divides the task into scenes on the basis of positions on the course and classifies the series of learning-related operations into scenes on the basis of the operations performed in the course section corresponding to each scene. Learning in units of partial operations classified by scene makes learning more efficient. In this embodiment, more efficient learning may mean, for example, a shorter time from the start of learning to achievement of the learning purpose.
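A minimal sketch of position-based scene division, assuming (unlike the embodiment, where the division is learned from the initial-stage data) that the course is simply cut into `n_scenes` equal sections of arc length measured from the start line. `scene_index` and `split_into_scenes` are hypothetical helpers.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def scene_index(distance_from_start: float, course_length: float, n_scenes: int) -> int:
    """Map arc length from the start line to one of n equal-length scenes (assumed division)."""
    frac = (distance_from_start % course_length) / course_length   # wraps every lap
    return min(int(frac * n_scenes), n_scenes - 1)


def split_into_scenes(log: List[Tuple[float, Hashable]], course_length: float,
                      n_scenes: int) -> Dict[int, List[Hashable]]:
    """Group logged (distance, control) samples into partial operations, one list per scene."""
    scenes: Dict[int, List[Hashable]] = defaultdict(list)
    for distance, control in log:
        scenes[scene_index(distance, course_length, n_scenes)].append(control)
    return dict(scenes)
```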

Once the operations have been classified, the next step is learning, for each classified partial operation, the control that leads to a learning-non-continuable state (S3). Here, a learning-non-continuable state is a state in which the task cannot be continued. For example, when the learning in the learning device 1 concerns control of a given device, it refers to the case in which the controlled device stops operating, or breaks down and becomes inoperable. In this embodiment, learning-non-continuable states include, for example, leaving the course, crashing into a wall or the like and becoming unable to move, and breaking down. By learning in advance the control that leads to a learning-non-continuable state, the learning device can avoid falling into such a state when it learns the optimal control in a later step. This makes learning more efficient.
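One way to picture per-scene learning of non-continuable states is a record, kept for each scene, of the lowest speed at which a course-out was observed, which then caps the speed that later learning may request in that scene. This is a hypothetical simplification (a single speed threshold per scene and an assumed safety margin); the embodiment learns such control with the neural network 22 and does not specify a representation.

```python
from typing import Dict


class FailureModel:
    """Hypothetical per-scene record of the lowest speed at which a course-out occurred."""

    def __init__(self, margin: float = 0.9) -> None:
        self.margin = margin                          # assumed safety factor below the failure speed
        self.min_failure_speed: Dict[int, float] = {}

    def record(self, scene: int, speed: float, failed: bool) -> None:
        # Only failed runs tighten the limit; successful runs are ignored here.
        if failed:
            prev = self.min_failure_speed.get(scene, float("inf"))
            self.min_failure_speed[scene] = min(prev, speed)

    def allowed_speed(self, scene: int, requested: float) -> float:
        """Clamp a requested speed so it stays below the known failure speed of the scene."""
        limit = self.min_failure_speed.get(scene)
        if limit is None:
            return requested
        return min(requested, limit * self.margin)
```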

In the final learning stage (S4), the learning is optimized. In this stage, the partial operations classified and learned for each scene are combined, and learning is performed so that the whole operation, from start to finish, is carried out optimally. In this embodiment, the final-stage learning is learning to complete 10 laps of the course within the predetermined time and reach the goal.
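Combining partial operations matters because adjacent scenes constrain each other. A sketch under assumed conditions, not the patent's learning procedure: each scene has a safe speed limit (for example, from a per-scene failure record), and because the vehicle cannot change speed instantly, speeds in adjacent scenes may differ by at most `max_delta`, with adjacency taken cyclically since the course is a loop. Under these assumptions the fastest feasible plan has a closed form: scene i runs at the minimum over all scenes j of limit_j + max_delta * d(i, j), where d is the cyclic distance between scene indices.

```python
from typing import List


def combined_speeds(limits: List[float], max_delta: float) -> List[float]:
    """Largest per-scene speeds with v[i] <= limits[i] and |v[i] - v[i+1]| <= max_delta,
    where scene indices wrap around because the course is a closed loop."""
    n = len(limits)

    def cyclic_distance(i: int, j: int) -> int:
        d = abs(i - j)
        return min(d, n - d)

    return [min(limits[j] + max_delta * cyclic_distance(i, j) for j in range(n))
            for i in range(n)]


def total_time(lengths: List[float], speeds: List[float], laps: int = 10) -> float:
    """Time for `laps` laps if each scene is driven at its constant planned speed."""
    return laps * sum(length / v for length, v in zip(lengths, speeds))
```

For limits `[30, 10, 30, 30]` and `max_delta=5`, the plan is `[15, 10, 15, 20]`: the single slow scene pulls its neighbours down, which a scene-by-scene optimization would miss. Since lap time decreases in every speed, this componentwise-largest feasible plan is also the fastest one.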

<2. Detailed processing>
Next, the processing of the learning device 1 in each step is described in detail with reference to FIGS. 4 to 8. FIG. 4 is a block diagram showing the detailed configuration of the learning device 1 according to this embodiment. As shown in FIG. 4, the machine learning unit 20 includes a learning data input/output unit 21, a neural network 22, and a learning result output unit 23. The operation classification unit 30 includes a control data extraction unit 31 and an operation classification result extraction unit 32.
The processing of each unit is described in detail below for each step of FIG. 3.

(2-1. Initial learning stage)
FIG. 5 is a flowchart showing the detailed processing flow of the initial learning stage S1 shown in FIG. 3. First, in the initial stage of learning (first learning), the learning data input/output unit 21 receives learning data (S101). The learning data includes, for example, the learning purpose and learning requirements described above.

In the next step (S102), machine learning is performed. In this embodiment, no conditions restricting individual control operations are specified in advance, so the learning device 1 learns the control operations by itself. Specifically, the control unit 10 sets random control amounts for the actuator 92 and operates it. At this point the vehicle 90 naturally cannot follow the course, so it travels erratically, leaving the course and so on. The control unit 10 reads the outputs (hereinafter also "sensor values") of the control sensor 91 and the state detection sensor 93 in response to the randomly given control amounts and stores these data (control amounts and sensor values) in the storage unit 40. The neural network 22 reads the stored control amounts and sensor values from the storage unit 40 and learns, by deep learning, control operations that satisfy the learning requirements (S102).
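The random-exploration part of step S102 can be sketched as follows. `ControlAmount`, its normalized ranges, and the `apply_and_read` callback (which would apply a control amount to the actuator 92 and return the resulting sensor values) are assumptions for illustration; the returned list plays the role of the storage unit 40.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ControlAmount:
    throttle: float   # assumed normalised throttle opening, 0..1
    brake: float      # assumed normalised brake operation amount, 0..1
    steering: float   # assumed normalised steering command, -1..1


def random_control(rng: random.Random) -> ControlAmount:
    """Draw one control amount uniformly from the assumed ranges."""
    return ControlAmount(rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0), rng.uniform(-1.0, 1.0))


def explore(apply_and_read: Callable[[ControlAmount], Tuple[float, ...]],
            steps: int, seed: int = 0) -> List[Tuple[ControlAmount, Tuple[float, ...]]]:
    """Apply random control amounts and log (control amount, sensor values) pairs."""
    rng = random.Random(seed)
    log = []
    for _ in range(steps):
        u = random_control(rng)
        log.append((u, apply_and_read(u)))
    return log
```

A learner (the neural network 22 in the embodiment) would then be trained on this log.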

The learning requirements specify "completing one lap of the course and reaching the goal" as the initial-stage purpose. Accordingly, when the learning device 1 determines, for example from the output of the control sensor 91, that the vehicle has completed one lap and reached the goal, it determines that machine learning has reached the initial-stage level (S103: Y) and ends the initial-stage learning.
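The embodiment judges lap completion from the control sensor 91 without specifying how. One hypothetical method, assuming the course encloses a known centre point and positions are given in x-y coordinates with the y axis pointing up, is to accumulate the winding angle of the vehicle's position around that centre: one clockwise lap corresponds to a total of -2π, so the same check also enforces the clockwise requirement.

```python
import math
from typing import Iterable, Optional, Tuple


def completed_clockwise_lap(positions: Iterable[Tuple[float, float]],
                            centre: Tuple[float, float] = (0.0, 0.0)) -> bool:
    """True once the path has wound at least once clockwise around `centre`."""
    total = 0.0
    prev: Optional[float] = None
    for x, y in positions:
        ang = math.atan2(y - centre[1], x - centre[0])
        if prev is not None:
            # Unwrap the step into [-pi, pi) so crossing the +/-pi seam is not a jump.
            d = (ang - prev + math.pi) % (2 * math.pi) - math.pi
            total += d
            if total <= -2 * math.pi:   # clockwise turns give negative angles
                return True
        prev = ang
    return False
```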

(2-2. Operation classification)
FIG. 6 is a flowchart showing the detailed processing flow of the operation classification S2 shown in FIG. 3. First, to perform the operation classification, the control data extraction unit 31 extracts from the storage unit 40 the sensor values of the control sensor 91 at the end of the initial learning stage, together with the corresponding control amounts of the actuator 92 and sensor values of the state detection sensor 93 (S201). The control data extraction unit 31 inputs the extracted values to the neural network 22 as learning data.

Next, the neural network 22 performs machine learning on the basis of the learning data input by the control data extraction unit 31 (S202). In doing so, the neural network 22 classifies the lapping operation into a predetermined number of scenes.

The classification of the lap operation into scenes by the neural network 22 is described in more detail below. The neural network 22 classifies the lap operation into scenes based on a scene vector and a motion vector. The scene vector represents the scene of the task performed by the vehicle 90. It is obtained, for example, from the sensor values output by the control sensor 91 (such as the position of, or distance from, the start point and the heading relative to the start point). For example, with x and y coordinates whose origin is the start point, the scene vector at a point l can be written as (l_x, l_y).

The motion vector, on the other hand, represents the control state of the traveling vehicle 90. It is obtained, for example, from the sensor values output by the state detection sensor 93 (such as velocity, acceleration, angular velocity, and angular acceleration). For example, the motion vector at a point l is written (v_l, a_l), using the velocity v and the acceleration a at that point.

Based on the scene vector (l_x, l_y), the neural network 22 divides the task into scenes, and based on the motion vector (v_l, a_l), it learns for each scene the classification of the operation to be learned in that scene. By determining which scene it is currently in, the learning device 1 can therefore learn the partial operation appropriate to that scene. For example, by focusing on the change points of the motion vector in addition to the position represented by the scene vector, the neural network 22 can identify acceleration, deceleration, changes of direction, and so on of the vehicle 90, and use these change points to divide the series of operations into scene-specific operations. The neural network 22 can also learn the classification of operations based on, for example, the similarity between motion vectors.
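The change-point idea can be illustrated without a neural network: label each motion sample coarsely and start a new scene wherever the label changes. This is a rule-based stand-in, not the learned classifier the text describes; the thresholds `a_eps` and `w_eps` and the sample layout (x, y, v, a, omega), where omega is the angular velocity, are assumptions.

```python
from itertools import groupby


def label(v, a, omega, a_eps=0.5, w_eps=0.2):
    """Coarse label for one motion vector; thresholds are hypothetical."""
    if abs(omega) > w_eps:
        return "turn"
    if a > a_eps:
        return "accelerate"
    if a < -a_eps:
        return "decelerate"
    return "cruise"


def split_into_scenes(samples):
    """samples: list of (x, y, v, a, omega).

    Consecutive samples sharing a label form one scene; a new scene starts at
    each change point of the motion label. x and y are carried along so that a
    scene can later be located on the course. Returns a list of
    (label, start_index, end_index_exclusive) tuples.
    """
    scenes, i = [], 0
    for lab, group in groupby(samples, key=lambda s: label(*s[2:])):
        n = len(list(group))
        scenes.append((lab, i, i + n))
        i += n
    return scenes
```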

In the course example shown in FIG. 2, the task is divided into five scenes, ア through オ, corresponding to the sections of the course (labeled A to E below). The partial operations classified into each scene are, for example, as follows.
Scene A (ア): first-straight partial operation (e.g., control of the deceleration timing and driving line when approaching the first corner)
Scene B (イ): first-corner partial operation (e.g., steering in the corner and the timing of acceleration when entering the second straight)
Scene C (ウ): second-straight partial operation (e.g., control of the deceleration timing and driving line when approaching the second corner)
Scene D (エ): second-corner partial operation (e.g., steering in the corner and the timing of acceleration when entering the third straight)
Scene E (オ): third-straight partial operation (e.g., control of acceleration and the like when entering the first straight)

The neural network 22 is preferably able to reorder the divided scenes in the order in which they are traversed.
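Reordering scenes by order of progress can be sketched by mapping each scene's starting position to the nearest waypoint on the course and sorting by that index. The waypoint list and the scene-start dictionary are hypothetical inputs; nearest-waypoint index is used here only as a simple proxy for distance travelled.

```python
import math


def progress_index(point, waypoints):
    """Index of the course waypoint closest to point (a proxy for progress)."""
    return min(range(len(waypoints)), key=lambda i: math.dist(point, waypoints[i]))


def order_scenes(scene_starts, waypoints):
    """scene_starts: {scene name: (x, y) where the scene begins}.

    Return the scene names in the order they are reached along the course.
    """
    return sorted(scene_starts, key=lambda name: progress_index(scene_starts[name], waypoints))
```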

The operation classification result extraction unit 32 extracts the classification of partial operations learned by the neural network 22 and stores it in the storage unit 40 (S203).

(2-3. Learning control that leads to a state in which learning cannot continue)
FIG. 7 is a flowchart showing the detailed processing flow of the learning in S3 of FIG. 3 (second learning), which learns control that leads to a state in which learning cannot continue. First, the learning data input/output unit 21 refers to the storage unit 40, selects one of the partial operations classified in S2, and extracts the control amounts for the actuator 92 required for that partial operation. The learning data input/output unit 21 then executes control with the extracted control amounts and determines, for example from the output of the state detection sensor 93, whether the result is a state in which learning cannot continue. The learning data input/output unit 21 reads out the extracted control amounts together with the information on whether that state was reached, and gives them to the neural network 22 as learning data. The neural network 22 learns from this learning data by deep learning (S301).
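The loop of S301 and S303 pairs each tried control amount with a success/failure label per partial operation. The sketch below replaces the deep-learning model with a simple tabular failure-rate estimate so the data flow is visible; `FailureModel`, the binning scheme, and the `execute` hook are all hypothetical.

```python
from collections import defaultdict


class FailureModel:
    """Tabular stand-in for the second learning: for each partial operation
    (scene) and discretized control amount, estimate how often executing that
    control ended in a state where learning cannot continue."""

    def __init__(self, bin_width=1.0):
        self.bin_width = bin_width
        self.counts = defaultdict(lambda: [0, 0])  # key -> [failures, trials]

    def _key(self, scene, control):
        return scene, int(control // self.bin_width)

    def update(self, scene, control, failed):
        c = self.counts[self._key(scene, control)]
        c[0] += int(failed)
        c[1] += 1

    def failure_rate(self, scene, control):
        # Untried controls report 0.0, i.e. they are optimistically treated as safe.
        failures, trials = self.counts.get(self._key(scene, control), (0, 0))
        return failures / trials if trials else 0.0


def second_learning(scenes, candidate_controls, execute, model):
    """For every classified partial operation (S303), execute each candidate
    control amount and record whether it led to an unrecoverable state (S301).

    execute(scene, control) -> bool is a hypothetical hook returning True on failure.
    """
    for scene in scenes:
        for control in candidate_controls:
            model.update(scene, control, execute(scene, control))
    return model
```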

At this time, the learning result output unit 23 can output the learning result of the control that leads to a state in which learning cannot continue. Because such results can be output, the neural network 22 can, for example, receive from another learning device 1' of the same configuration the control that led that device to such a state, and perform additional learning using it as learning data (S302). This enables more efficient learning, where efficient learning means, for example, learning that takes less time from the start of learning until the learning objective is achieved. The processing of S302 is optional.
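Receiving failure-leading controls from another device 1' (S302) can be as simple as pooling counts, provided both devices use the same record format. The `(scene, control_bin) -> (failures, trials)` format below is a hypothetical choice for illustration, not one defined by the source.

```python
from collections import Counter


def merge_failure_logs(own, received):
    """Combine this device's failure statistics with those output by another
    device 1' of the same configuration (S302).

    Each log maps (scene, control_bin) -> (failures, trials).
    """
    failures, trials = Counter(), Counter()
    for log in (own, received):
        for key, (f, t) in log.items():
            failures[key] += f
            trials[key] += t
    return {key: (failures[key], trials[key]) for key in trials}
```

Pooling raw counts rather than averaging rates keeps a device that tried a control once from outweighing one that tried it many times.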

The learning device 1 performs the processing of S301 (and S302) for all classified partial operations (S303).

Optionally, after learning the control that leads to a state in which learning cannot continue for all classified partial operations, the learning device 1 can learn again over the whole series of operations (S304). This enables faster lap control.

In this way, because the learning device 1 of this embodiment first learns, for each classified partial operation, the control that leads to a state in which learning cannot continue, it can avoid that control in subsequent learning. This makes learning more efficient.
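Avoiding the learned failure-leading controls in later learning can be pictured as masking the candidate set before exploring. In this sketch, `failure_rate` is assumed to come from the second learning and `threshold` is a hypothetical tuning parameter.

```python
import random


def allowed_controls(candidates, failure_rate, threshold=0.5):
    """Keep only control amounts whose learned failure rate is below threshold."""
    return [c for c in candidates if failure_rate(c) < threshold]


def sample_control(candidates, failure_rate, rng, threshold=0.5):
    """Draw an exploratory control amount, never one flagged as leading to a
    state where learning cannot continue. Raises if every candidate is flagged."""
    safe = allowed_controls(candidates, failure_rate, threshold)
    if not safe:
        raise ValueError("no safe control amount available")
    return rng.choice(safe)
```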

(2-4. Optimization learning)
FIG. 8 is a flowchart showing the detailed processing flow of the optimization learning (third learning) in S4 of FIG. 3. Optimization learning refines the learning performed up to S3 in order to achieve the learning objective given as learning data at the start of learning (in this embodiment, "completing 10 laps of the course within a predetermined time and reaching the goal"). In optimization learning, the control learned in S3 as leading to a state in which learning cannot continue is excluded. The learning data input/output unit 21 refers to the storage unit 40 and extracts the learning data input at the initial learning stage (S1 in FIG. 3), which was set by the operator. It also refers to the storage unit 40 to extract the state of the neural network 22 after it has learned the control that leads to that state. The learning data input/output unit 21 sets the extracted data in the control unit 10.

Based on the data thus set, the control unit 10 outputs control amounts to the actuator 92 and acquires the resulting sensor values of the control sensor 91 and the state detection sensor 93. The control unit 10 stores the given control amounts and the sensor values output in response in the storage unit 40.

The neural network 22 reads the control amounts and sensor values stored by the control unit 10 in the above processing and learns from them by deep learning (S401). Because the neural network 22 has already learned the control that leads to a state in which learning cannot continue, it can more efficiently learn control operations that satisfy the learning requirements over the entire operation, from start to finish (that is, from the start of the course to the goal). The processing of S401 is repeated until the learning as a whole is optimized (S402). The result of optimization learning is extracted by the learning result output unit 23 and stored in the storage unit 40. In this way, optimization learning is performed with the control leading to a state in which learning cannot continue excluded.
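The repeat-until-optimized loop of S401 and S402 can be illustrated with coordinate descent over per-scene speeds, skipping any speed flagged as unsafe by the second learning. This is a swapped-in toy optimizer, not the deep-learning procedure of the source; `scene_lengths`, `speeds`, and `is_unsafe` are hypothetical inputs, and the slowest candidate speed is assumed to be safe as a starting point.

```python
def optimize(scene_lengths, speeds, is_unsafe, max_rounds=100):
    """For each scene, choose the speed that minimizes lap time (sum of
    length / speed), never using a speed flagged as unsafe, and repeat until
    a full round brings no improvement.

    scene_lengths: {scene: length}; speeds: candidate speeds;
    is_unsafe(scene, speed) -> bool.
    """
    choice = {s: min(speeds) for s in scene_lengths}  # assumed safe starting point

    def lap_time(ch):
        return sum(length / ch[s] for s, length in scene_lengths.items())

    best = lap_time(choice)
    for _ in range(max_rounds):
        improved = False
        for s in scene_lengths:          # individual optimization per scene
            for v in speeds:
                if is_unsafe(s, v):      # exclude control learned in S3
                    continue
                trial = {**choice, s: v}
                t = lap_time(trial)
                if t < best:
                    choice, best, improved = trial, t, True
        if not improved:                 # S402: stop once nothing improves
            break
    return choice, best
```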

As described above, the learning device 1 of this embodiment can itself classify the operations being learned into partial operations and learn them. Because each classified operation can be optimized individually, learning is more efficient (that is, it takes less time). Furthermore, when learning partial operations, the learning device 1 of this embodiment first learns the control that leads to a state in which learning cannot continue. This enables efficient learning without a person having to set detailed conditions for each operation in advance.

(Hardware configuration)
An example of the hardware configuration for implementing the learning device 1 described above on a computer 800 is described with reference to FIG. 9. The configuration of each device may also be distributed across multiple devices.

As shown in FIG. 9, the computer 800 includes a processor 801, a memory 803, a storage device 805, an input interface unit (input I/F unit) 807, a data interface unit (data I/F unit) 809, a communication interface unit (communication I/F unit) 811, and a display device 813.

The processor 801 controls various processes in the computer 800 by executing programs stored in the memory 803. For example, the control unit 10, the machine learning unit 20, the operation classification unit 30, and other components of the learning device 1 can be implemented by the processor 801 executing programs stored in the memory 803.

The memory 803 is a storage medium such as RAM (Random Access Memory). It temporarily stores the program code of programs executed by the processor 801 and the data needed while those programs run.

The storage device 805 is a non-volatile storage medium, for example an auxiliary storage device such as a hard disk drive (HDD) or solid-state drive, or flash memory. It stores the operating system and the various programs that implement the components described above. These programs and data are loaded into the memory 803 as needed and referenced by the processor 801. For example, the storage unit 40 described above is implemented by the storage device 805.

The input I/F unit 807 is a device for receiving input from an administrator. Specific examples include a keyboard, a mouse, a touch panel, various sensors, and wearable devices. The input I/F unit 807 may be connected to the computer 800 through an interface such as USB (Universal Serial Bus).

The data I/F unit 809 is a device for inputting data from outside the computer 800. Specific examples include drive devices for reading data stored on various storage media. The data I/F unit 809 may also be provided outside the computer 800, in which case it is connected to the computer 800 through an interface such as USB.

The communication I/F unit 811 is a device for wired or wireless data communication with devices external to the computer 800 via the Internet N. The communication I/F unit 811 may also be provided outside the computer 800, in which case it is connected to the computer 800 through an interface such as USB.

The display device 813 is a device for displaying various kinds of information. Specific examples include a liquid crystal display, an organic EL (Electro-Luminescence) display, and the display of a wearable device. The display device 813 may be provided outside the computer 800, in which case it is connected to the computer 800 through, for example, a display cable.

[Second Embodiment]
The first embodiment described an example in which the learning device 1 is used for the automatic travel control vehicle 90. However, the learning device 1 is not limited to that example and can be applied to various devices. This embodiment describes an example in which it is applied to controlling a robot whose task is a pick-and-place operation. The description focuses on the differences from the first embodiment.

First, the differences between the system configuration of this embodiment and that of the first embodiment are described with reference to FIG. 10. The configuration of the learning device 1 is the same as in the first embodiment. Outside the learning device 1, however, the control sensor 91' in this embodiment consists of a group of sensors for performing the pick-and-place operation, specifically a workpiece detection sensor (image sensor), a force sensor for robot gripping, and the like. The control sensor 91' also includes an image recognition algorithm and can recognize the shape of the workpiece to be gripped. The remaining configuration outside the learning device 1 is the same as in the first embodiment.

Next, the differences between the learning in this embodiment and the learning in the first embodiment are described.
The pick-and-place operation, which is the task in this embodiment, consists of the following steps.
1. Recognize the shape of a workpiece and grip it.
2. Lift the gripped workpiece.
3. Move the lifted workpiece to a predetermined position that depends on its shape.
4. Stack the workpieces in cylinders, one cylinder per workpiece shape.
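The four steps above form a fixed sequence in which a failure at any step ends the cycle. A minimal sketch, assuming each step is exposed as a hypothetical callable that takes the workpiece shape and reports success:

```python
from enum import Enum


class Step(Enum):
    GRIP = 1   # recognize the workpiece shape and grip it
    LIFT = 2   # lift the gripped workpiece
    MOVE = 3   # carry it to the position for its shape
    STACK = 4  # stack it in the cylinder for its shape


def run_cycle(workpiece_shape, actions):
    """Run one pick-and-place cycle.

    actions maps each Step to a callable taking the shape and returning True
    on success. Returns the steps completed, stopping at the first failure.
    """
    done = []
    for step in Step:  # Enum iteration follows definition order
        if not actions[step](workpiece_shape):
            break
        done.append(step)
    return done
```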

The learning objective and learning requirements given for robot control learning in this embodiment are as follows.

(Learning objective)
・From a container in which workpieces of three different shapes (for example, cylindrical, square-prism, and triangular-prism workpieces) are piled randomly, stack 10 workpieces within a predetermined time, by pick-and-place operations, in cylinders (circular, square, triangular) whose openings match the workpiece shapes.
(Learning requirements)
・Do not place a workpiece anywhere other than the predetermined positions.
・Stack 10 workpieces of each shape in the corresponding cylinder.
・At the initial level: "place one workpiece in the cylinder matching its shape."
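The objective and requirements can be encoded as checks over a simple bookkeeping record. The sketch follows the requirement of 10 workpieces per cylinder; `StackState`, the shape names, and `time_limit_s` are hypothetical, since the source leaves the predetermined time unspecified.

```python
from dataclasses import dataclass, field

SHAPES = ("cylinder", "square_prism", "triangular_prism")


@dataclass
class StackState:
    """Workpieces correctly stacked per cylinder, keyed by workpiece shape."""
    stacked: dict = field(default_factory=lambda: {s: 0 for s in SHAPES})
    misplaced: int = 0       # placed anywhere other than the predetermined positions
    elapsed_s: float = 0.0


def initial_level_met(state):
    """Initial level: at least one workpiece in the cylinder matching its shape."""
    return state.misplaced == 0 and sum(state.stacked.values()) >= 1


def objective_met(state, time_limit_s, per_cylinder=10):
    """Nothing misplaced, per_cylinder workpieces in each cylinder, within the time limit."""
    return (state.misplaced == 0
            and all(n >= per_cylinder for n in state.stacked.values())
            and state.elapsed_s <= time_limit_s)
```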

In this embodiment, the task is to stack the workpieces in the cylinders that match their shapes. The pick-and-place operation to be learned may be classified into partial operations according to scene, by the same procedure as in the first embodiment, where the task is divided into scenes based on the course traveled by the vehicle 90 and the partial operations are classified according to those scenes. For example, in this embodiment the task is divided, based on the displacement of the motion during task learning, into a scene corresponding to gripping the workpiece, a scene corresponding to carrying the workpiece, and a scene corresponding to stacking the workpiece. The pick-and-place operation is then classified into partial operations according to these scenes.
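A displacement-based split into the three scenes can be sketched from the gripper trajectory alone: large horizontal movement marks carrying, and slow samples before and after it mark gripping and stacking. This rule and its threshold `move_eps` are illustrative assumptions, not the classifier the source describes.

```python
import math


def split_pick_and_place(positions, move_eps=0.05):
    """Label each sample of a gripper trajectory as 'grip', 'carry', or 'stack'.

    positions: list of (x, y, z). A sample whose horizontal displacement from
    the previous sample exceeds move_eps is 'carry'; slower samples before the
    first carry are 'grip' (including the vertical lift), and slower samples
    after it are 'stack'.
    """
    labels, carried = [], False
    prev = positions[0]
    for p in positions:
        step = math.hypot(p[0] - prev[0], p[1] - prev[1])
        if step > move_eps:
            carried = True
            labels.append("carry")
        else:
            labels.append("stack" if carried else "grip")
        prev = p
    return labels
```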

In this embodiment, a state in which learning cannot continue is, for example, a state in which a workpiece can no longer enter a cylinder. Accordingly, the control learned at the stage of learning control that leads to such a state includes, for example:
・Placing a workpiece in the wrong location (the workpiece shape does not match the shape of the cylinder opening)
・Stacking a workpiece in the wrong orientation (the orientation of the workpiece shape does not match that of the cylinder)
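The two failure conditions can be checked directly from the shapes and in-plane angles. In this sketch the shapes are named by cross-section, the rotational symmetry of each opening (none for a circle, 90 degrees for a square, 120 degrees for an equilateral triangle) is used to compare orientations, and the tolerance `tol_deg` is a hypothetical parameter.

```python
SYMMETRY_DEG = {"circle": None, "square": 90.0, "triangle": 120.0}


def unrecoverable(workpiece_shape, cylinder_shape, workpiece_angle_deg,
                  cylinder_angle_deg, tol_deg=5.0):
    """Return why a placement leads to a state where learning cannot continue,
    or None if the placement is acceptable."""
    if workpiece_shape != cylinder_shape:
        return "wrong location"
    symmetry = SYMMETRY_DEG[cylinder_shape]
    if symmetry is None:  # a round workpiece fits in any orientation
        return None
    diff = (workpiece_angle_deg - cylinder_angle_deg) % symmetry
    diff = min(diff, symmetry - diff)  # smallest angular misalignment
    return "wrong orientation" if diff > tol_deg else None
```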

By learning in advance the control that leads to a state in which learning cannot continue, the learning device 1 of this embodiment learns beforehand to recognize the workpiece and cylinder shapes correctly and which orientation to grip the workpiece in. In the final stage of learning, it can therefore avoid reaching a state in which learning cannot continue, which makes learning more efficient; that is, the time required to achieve the learning objective is further shortened.
The remaining configuration is the same as in the first embodiment.

An embodiment of the present invention has been described above. This embodiment is intended to facilitate understanding of the present invention and is not to be construed as limiting it. The present invention may be modified or improved without departing from its spirit. For example, the steps of the processing flows described above may be partly omitted, reordered arbitrarily, or executed in parallel, as long as no contradiction arises in the processing.

The above embodiment describes an example in which the system according to the present invention is used to manage abilities acquired by a machine through AI technology such as deep learning; however, the present invention is not limited to this and can be applied to a wide range of fields. For example, it can be applied to discriminating good from defective products; to various industrial fields such as food, machine parts, chemical products, and pharmaceuticals; and to fisheries, agriculture, forestry, service industries, and medical and health care. The present invention may also be applied when AI technology is applied to embedded products, to systems using IT such as social systems, to big data analysis, to classification processing in a wide range of control devices, and the like.

In this specification, "unit", "means", and "procedure" do not refer only to physical configurations; they also cover cases in which the processing performed by the "unit" is implemented in software. Processing performed by one "unit", "means", "procedure", or device may be executed by two or more physical configurations or devices, and processing performed by two or more "units", "procedures", or devices may be executed by a single physical means or device.

Part or all of the above embodiments may also be described as in the following supplementary notes, but are not limited to them.
(Supplementary note 1)
Comprising at least one hardware processor, wherein
the hardware processor:
accepts learning data including a learning objective;
executes learning based on the learning data; and
outputs the learning result of the neural network;
wherein executing the learning includes executing first learning for achieving the initial stage of the learning objective; executing, based on the result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue; and executing, based on the result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.
(Supplementary note 2)
A learning method in which at least one hardware processor executes a step of performing learning, the step comprising:
accepting learning data including a learning objective;
executing learning based on the learning data; and
outputting the learning result of the neural network;
wherein the step of executing the learning includes executing first learning for achieving the initial stage of the learning objective; executing, based on the result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue; and executing, based on the result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.
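The three-stage structure shared by the supplementary notes can be summarized as a small orchestration function in which each stage consumes the previous stage's result. The stage callables and their input/output types are hypothetical placeholders for the learning performed by the neural network.

```python
from typing import Callable


def three_stage_learning(
    first: Callable[[], object],
    second: Callable[[object], set],
    third: Callable[[object, set], object],
):
    """Run the learning stages in order:
    first  -> reach the initial stage of the learning objective,
    second -> from that result, learn controls that lead to an unrecoverable state,
    third  -> achieve the learning objective with those controls excluded.
    """
    initial = first()
    forbidden = second(initial)
    return third(initial, forbidden)
```

Keeping the forbidden set as an explicit value passed into the third stage mirrors the text: the third learning is defined as learning with the second learning's result excluded.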

Reference Signs List
1 Learning device
10 Control unit
20 Machine learning unit
21 Learning data input/output unit
22 Neural network
23 Learning result output unit
30 Operation classification unit
31 Control data extraction unit
32 Operation classification result extraction unit
40 Storage unit
90 Automatic travel control vehicle
91 Control sensor
92 Actuator
93 State detection sensor

Claims

A learning device that learns control of operations related to a predetermined task, comprising:
a learning data receiving unit that receives learning data including a learning objective;
a neural network that performs learning based on the learning data; and
an output unit that outputs a learning result of the neural network,
wherein the neural network
executes first learning for achieving an initial stage of the learning objective, executes, based on a result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue, and executes, based on a result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.

The learning device according to claim 1, wherein
the output unit outputs a result of the second learning.

The learning device according to claim 1, wherein
the learning device learns control of a series of operations related to the predetermined task and further comprises a classification unit that divides the task into a plurality of scenes and identifies, for each divided scene, the partial operation of the series of operations that is performed in that scene, and
the neural network performs the second learning and the third learning for each partial operation.

An automatic travel control learning device that learns control of a series of operations related to automatic travel of a vehicle that laps a predetermined course, comprising:
a learning data receiving unit that receives learning data including a learning objective of lapping the course a predetermined number of times within a predetermined time;
a neural network that performs learning based on the learning data; and
an output unit that outputs a learning result of the neural network,
wherein the neural network
executes first learning for achieving one lap of the course, executes, based on a result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue, and executes, based on a result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.

A robot control learning device that learns control of a series of operations related to a task of gripping a predetermined workpiece and stacking it at a placement location corresponding to the shape of the workpiece, comprising:
a learning data receiving unit that receives learning data including a learning objective of stacking a predetermined number of the workpieces at the placement location within a predetermined time;
a neural network that performs learning based on the learning data; and
an output unit that outputs a learning result of the neural network,
wherein the neural network
executes first learning for achieving placement of the workpiece at the placement location, executes, based on a result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue, and executes, based on a result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.

A learning method, executed by a computer including a control unit, for learning control of operations related to a predetermined task, wherein
the control unit executes:
a step of accepting learning data including a learning objective;
a step of executing learning based on the learning data; and
a step of outputting a learning result obtained by executing the learning,
and the step of executing the learning includes executing first learning for achieving an initial stage of the learning objective, executing, based on a result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue, and executing, based on a result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.

A program that causes a computer that learns control of operations related to a predetermined task to execute:
a procedure for accepting learning data including a learning objective;
a procedure for executing learning based on the learning data; and
a procedure for outputting a learning result of the means that executes the learning,
wherein the procedure for executing the learning includes executing first learning for achieving an initial stage of the learning objective, executing, based on a result of the first learning, second learning that learns control leading to a state in which the operation related to the learning cannot continue, and executing, based on a result of the second learning, third learning for achieving the learning objective while excluding the control that leads to the state in which the operation cannot continue.

An apparatus that performs a predetermined task, comprising:
a first sensor that senses information the apparatus needs to perform the task;
an actuator;
a second sensor that senses a change in the state of the apparatus caused by the actuator;
a control unit that controls the actuator based on sensor values output from the first sensor and the second sensor; and
a storage unit that stores a result of learning performed by the learning device according to any one of claims 1 to 3,
wherein the control unit determines, based on the learning result stored in the storage unit, a control amount corresponding to the sensor values output from the first sensor and the second sensor.