JP6863081B2

JP6863081B2 - Learning device, learning control method, and its program

Info

Publication number: JP6863081B2
Application number: JP2017104523A
Authority: JP
Inventors: 安藤 丹一 (Tanichi Ando)
Original assignee: Omron Corp
Current assignee: Omron Corp
Priority date: 2017-05-26
Filing date: 2017-05-26
Publication date: 2021-04-21
Anticipated expiration: 2037-05-26
Also published as: JP2018200537A; WO2018216493A1

Description

The present invention relates to a learning device, a learning control method, and a program therefor.

Research on artificial intelligence technologies such as neural networks (hereinafter, "AI technology") has long been conducted widely (see, for example, Patent Document 1). In particular, with the rise of the AI technology called deep learning, the recognition rate of image-based object recognition, for example, has improved rapidly over the last few years, and image classification is reaching a level that exceeds the human recognition rate. Deep learning is expected to be applied not only to image recognition but also to a wide range of other fields, including speech recognition, personal authentication, behavior prediction, text summarization, automatic translation, monitoring, autonomous driving, failure prediction, sensor data analysis, music genre classification, content generation, and security systems.

In machine learning such as deep learning, a machine can be made to perform learning and thereby acquire a predetermined ability. A learning device that performs machine learning repeats the learning operation until the predetermined ability is acquired.

For example, Patent Document 1 discloses a learning control method for a robot. In that method, the input value supplied to the drive unit of the robot is corrected based on the error between a target trajectory, which a person sets in advance as the goal of the robot's motion, and the actual trajectory followed when the robot actually moves.

Patent Document 1: Japanese Unexamined Patent Publication No. Hei 6-289918

In a learning device that controls actuators based on information from many sensors, as in the control of an automobile's engine or travel or the control of a chemical plant, the control and the sensor outputs influence each other, so more complex learning is needed to acquire a control method. In a learning device that performs such complex learning, it is therefore not easy for a person to set target values for the controlled variables in advance, as is done in Patent Document 1. On the other hand, if the learning device is made to learn without target values being set, a very large amount of trial and error is required, which is inefficient.

Accordingly, an object of the present invention is to provide a technique for shortening the time a learning device requires to achieve its learning objective, without human intervention.

A learning device according to one aspect of the present invention learns control of a motion related to a predetermined task, and includes: a learning data receiving unit that receives learning data including a learning objective; a neural network that executes learning based on the learning data; and an output unit that outputs the result of learning by the neural network. The neural network executes first learning to achieve an initial stage of the learning objective; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning to achieve the learning objective while excluding the control leading to the non-continuable state.

With this configuration, control that leads to a state in which the motion related to the learning cannot be continued is learned before the third learning for achieving the learning objective. The device can thus exclude, by itself, control that leads to the non-continuable state and carry out learning without a person supplying conditions that restrict the control operations, so the learning objective can be achieved in a shorter period.
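The three learning stages can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the specification: the course is reduced to five scenes, each control is a single speed value, a speed above a hidden per-scene limit counts as a non-continuable state, and simple search stands in for the neural network.

```python
import random

SPEEDS = [1, 2, 3, 4]        # candidate control amounts (hypothetical)
LIMITS = [4, 2, 3, 1, 3]     # highest safe speed per scene; hidden from the learner

def run_lap(controls):
    """Simulate one lap. Returns (completed, lap_time, failed_scene)."""
    lap_time = 0.0
    for scene, speed in enumerate(controls):
        if speed > LIMITS[scene]:        # off course: the lap cannot continue
            return False, None, scene
        lap_time += 1.0 / speed
    return True, lap_time, None

def first_learning(rng):
    """Initial stage: try random controls until one lap is completed."""
    while True:
        controls = [rng.choice(SPEEDS) for _ in LIMITS]
        if run_lap(controls)[0]:
            return controls

def second_learning(baseline):
    """Vary one scene at a time around the working lap and record failures."""
    forbidden = set()
    for scene in range(len(baseline)):
        for speed in SPEEDS:
            trial = list(baseline)
            trial[scene] = speed
            completed, _, failed_scene = run_lap(trial)
            if not completed:
                forbidden.add((failed_scene, speed))
    return forbidden

def third_learning(forbidden):
    """Choose the fastest control per scene among those not known to fail."""
    return [max(s for s in SPEEDS if (scene, s) not in forbidden)
            for scene in range(len(LIMITS))]

rng = random.Random(0)
baseline = first_learning(rng)
forbidden = second_learning(baseline)
best = third_learning(forbidden)
```

Because the failing controls are collected once in the second stage, the third stage never spends a trial on them. In this toy the lap time is a sum of per-scene terms, so a per-scene choice is enough; a real controller would not decompose this cleanly.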

The output unit may output the result of the second learning. In this aspect, the learned result on control leading to the non-continuable state can also be used by other learning devices.

The learning device may be a learning device that learns control of a series of motions related to a predetermined task, and may further include a classification unit that divides the task into a plurality of scenes and, for each of the divided scenes, identifies the partial motion that is performed in that scene among the series of motions. The neural network may then execute the second learning and the third learning for each partial motion.

In this aspect, the learning device can classify the motion related to learning into partial motions, which are smaller units corresponding to scenes, and learn each classified partial motion separately. The learning objective can thereby be achieved in an even shorter period.

An automatic travel control learning device according to one aspect of the present invention learns control of a series of motions related to the automatic travel of a vehicle that laps a predetermined course, and includes: a learning data receiving unit that receives learning data including a learning objective of lapping the course a predetermined number of times within a predetermined time; a neural network that executes learning based on the learning data; and an output unit that outputs the result of learning by the neural network. The neural network executes first learning to achieve one complete lap of the course; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning to achieve the learning objective while excluding the control leading to the non-continuable state.

A robot control learning device according to one aspect of the present invention learns control of a series of motions related to a task of gripping a predetermined workpiece and stacking it at a placement location corresponding to the shape of the workpiece, and includes: a learning data receiving unit that receives learning data including a learning objective of stacking a predetermined number of workpieces at the placement location within a predetermined time; a neural network that executes learning based on the learning data; and an output unit that outputs the result of learning by the neural network. The neural network executes first learning to achieve stacking one workpiece at the placement location; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning to achieve the learning objective while excluding the control leading to the non-continuable state.

A learning method according to one aspect of the present invention is executed by a computer having a control unit and learns control of a motion related to a predetermined task. In the method, the control unit executes: a step of receiving learning data including a learning objective; a step of executing learning based on the learning data; and a step of outputting the result of the learning. The step of executing learning includes executing first learning to achieve an initial stage of the learning objective; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning to achieve the learning objective while excluding the control leading to the non-continuable state.

A program according to one aspect of the present invention causes a computer that learns control of a motion related to a predetermined task to execute: a procedure of receiving learning data including a learning objective; a procedure of executing learning based on the learning data; and a procedure of outputting the result of the learning. The procedure of executing learning includes executing first learning to achieve an initial stage of the learning objective; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning to achieve the learning objective while excluding the control leading to the non-continuable state.

A device according to one aspect of the present invention executes a predetermined task and includes: a first sensor that senses information the device needs in order to operate to execute the task; an actuator; a second sensor that senses changes in the state of the device caused by the actuator; a control unit that controls the actuator based on the sensor values output from the first sensor and the second sensor; and a storage unit that stores the learning result produced by the learning device described above. Based on the learning result stored in the storage unit, the control unit determines a control amount corresponding to the sensor values output from the first sensor and the second sensor.

According to the present invention, it is possible to provide a technique for shortening the time a learning device requires to achieve its learning objective, without human intervention.

FIG. 1 is a block diagram showing the schematic configuration of the learning device in the first embodiment.
FIG. 2 is a schematic diagram showing the course on which the vehicle controlled by the learning device in the first embodiment travels automatically.
FIG. 3 is a flowchart showing an outline of the processing of the learning device in the first embodiment.
FIG. 4 is a block diagram showing the detailed configuration of the learning device in the first embodiment.
FIG. 5 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 6 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 7 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 8 is a flowchart showing details of the processing of the learning device in the first embodiment.
FIG. 9 is a diagram showing an example of the hardware configuration of the learning device in the first embodiment.
FIG. 10 is a block diagram showing the schematic configuration of the learning device in the second embodiment.

[First Embodiment]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Identical elements are given identical reference numerals, and duplicate description is omitted. The following embodiments are examples for explaining the present invention and are not intended to limit the present invention to those embodiments. Furthermore, the present invention can be modified in various ways without departing from its gist.

<1. System overview>
An overview of the system in this embodiment will be described with reference to FIGS. 1 to 3.
FIG. 1 is a block diagram showing the schematic configuration of the learning device 1 according to this embodiment. The learning device 1 learns a predetermined task. As an example, the learning device 1 according to this embodiment is mounted on an automatic travel control vehicle (hereinafter also simply "vehicle") 90 and learns control of the vehicle 90 for automatically traveling along a predetermined course (see FIG. 2). Learning data is given to the learning device 1 by, for example, an operator. The learning data includes, for example, the following learning objective and learning requirements.

(Learning objective)
・Complete 10 laps of the course and finish within a predetermined time.
(Learning requirements)
・Do not go off course
・The lap direction is clockwise
・Reach the finish
・At the initial-stage level: "complete one lap of the course and finish"
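The learning data listed above could be held in a small record such as the following. This is only a sketch: the class, field, and function names are hypothetical, and the 600-second limit in the usage note is an arbitrary stand-in for the unspecified "predetermined time".

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LearningData:
    """Learning objective plus learning requirements (hypothetical field names)."""
    target_laps: int                  # laps required by the learning objective
    time_limit_s: float               # the "predetermined time", set by the operator
    initial_stage_laps: int = 1       # initial-stage level: one lap and finish
    direction: str = "clockwise"      # required lap direction
    must_stay_on_course: bool = True  # going off course violates the requirements

def initial_stage_reached(data, laps_done, off_course):
    """True once the initial-stage requirement is met."""
    if data.must_stay_on_course and off_course:
        return False
    return laps_done >= data.initial_stage_laps

def objective_reached(data, laps_done, elapsed_s, off_course):
    """True once the full learning objective is met within the time limit."""
    if data.must_stay_on_course and off_course:
        return False
    return laps_done >= data.target_laps and elapsed_s <= data.time_limit_s
```

For example, `LearningData(target_laps=10, time_limit_s=600.0)` describes the objective of this embodiment under the assumed time limit. Keeping the initial-stage level in the same record lets one check decide when to leave the first stage and another decide when learning is complete.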

A task is what is to be accomplished through the motion related to learning; in this embodiment, the task is to lap the course. (In this embodiment, the "motion related to learning" is the various controls required for automatic travel of the vehicle 90; it may also be regarded as the motions the vehicle 90 performs under those controls.) The learning objective is the level the task should attain; in this embodiment it is, as stated above, "completing 10 laps of the course and finishing within a predetermined time." In this embodiment, therefore, becoming able to perform the task can be regarded as the learning requirement given for initial-stage learning.

In the following description, the learning device 1 is described as being implemented by a computer such as a PC (Personal Computer) or a server device, but it is not limited to this and may be realized by, for example, any embedded device having a processor, RAM, and ROM. The components implemented in each device are not limited to those realized by software; any component included in a device may be realized by hardware. For example, the neural network 22 described later may be implemented by an electronic circuit such as a custom LSI (Large-Scale Integration) or an FPGA (Field-Programmable Gate Array).

As shown in FIG. 1, the learning device 1 includes a control unit 10, a machine learning unit 20, a motion classification unit 30, and a storage unit 40.

The control unit 10 is connected to a control sensor 91, an actuator 92, and a state detection sensor 93, which are provided in the vehicle 90 outside the learning device 1. The control unit 10 controls the actuator 92 according to the outputs of the control sensor 91 and the state detection sensor 93 to make the vehicle 90 travel automatically.

The control sensor 91 is a group of sensors used for automatic travel control of the vehicle 90. For example, the control sensor 91 consists of external obstacle detection sensors such as an on-board camera and a laser, a road surface condition detection sensor, and the like. The state detection sensor 93, on the other hand, is a group of sensors that detect the control state of the vehicle 90 while it travels automatically. For example, the state detection sensor 93 consists of a vibration sensor, a noise sensor, a fuel consumption detection sensor, a vehicle speed sensor, an acceleration sensor, a yaw rate sensor, and the like.

The actuator 92 is controlled by the control unit 10 to make the vehicle 90 travel automatically. The actuator 92 consists of, for example, an accelerator actuator, a brake actuator, and a steering actuator. The accelerator actuator controls the driving force of the vehicle by controlling the throttle opening according to a control signal from the control unit 10. The brake actuator controls the braking force applied to the wheels of the vehicle by controlling the operation amount of the brake pedal according to a control signal from the control unit 10. The steering actuator controls the steering of the vehicle by controlling the drive of the steering assist motor of the electric power steering system according to a control signal from the control unit 10.

Next, the procedure by which the learning device 1 performs learning will be outlined with reference to FIG. 3. The processing of each step is described in detail later. FIG. 3 is a flowchart showing an outline of the processing flow when the learning device 1 performs learning. First, in the initial stage of learning (S1), learning is performed with the aim of becoming able to perform the task (that is, becoming able to perform a motion that satisfies the initial-stage learning requirement). For the initial stage, the learning device 1 in this embodiment is given "completing one lap of the course and finishing" as the learning requirement.

Once the initial-stage objective has been met, the motions are classified (S2). At this stage, the content learned in the initial stage S1 is analyzed so that the task is divided into a plurality of parts based on a predetermined parameter (hereinafter, each part of the divided task is also called a "scene"), and, for each scene, the motion performed in that scene among the series of motions related to the task (hereinafter also called a "partial motion") is identified. The predetermined parameter for dividing the task is, for example, the displacement of the motion during learning of the task, or the environment in which the motion is executed during learning of the task (the elapsed time from the start of the task, the position relative to the start location of the task, and so on). This embodiment uses the position relative to the start location of the task (an environment in which the motion is executed) as the predetermined parameter. That is, in this embodiment the learning device 1 divides the task into scenes based on the position on the course, and the series of motions related to learning is classified by scene, based on the motions performed in the course section corresponding to each scene. Learning in units of partial motions classified by scene makes the learning more efficient. In this embodiment, making learning more efficient may mean, for example, shortening the time required from the start of learning until the learning objective is achieved.
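Dividing the task by position can be sketched as a lookup from a position on the course to a scene. The boundaries, scene labels, and lap length below are hypothetical, and distance travelled along the course is used as a simplified stand-in for the position relative to the start location.

```python
from bisect import bisect_right

# Hypothetical layout: distance from the start line (m) at which each scene begins.
SCENE_STARTS = [0.0, 120.0, 180.0, 300.0, 360.0]
SCENE_NAMES = ["A", "B", "C", "D", "E"]
LAP_LENGTH = 480.0

def scene_at(distance_from_start):
    """Return the scene that contains the given distance along the course."""
    position = distance_from_start % LAP_LENGTH      # wrap around on later laps
    return SCENE_NAMES[bisect_right(SCENE_STARTS, position) - 1]

def group_by_scene(samples):
    """Group (distance, control) samples so each scene can be learned separately."""
    groups = {name: [] for name in SCENE_NAMES}
    for distance, control in samples:
        groups[scene_at(distance)].append(control)
    return groups
```

Grouping the logged controls this way yields one smaller data set per scene, which is the unit in which the later stages learn partial motions.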

After the motions have been classified, the next step is to learn, for each classified partial motion, the control that leads to a state in which learning cannot be continued (S3). Here, a state in which learning cannot be continued means a state in which the task cannot be continued. For example, when the learning in the learning device 1 concerns control of a predetermined device, it refers to a case where the controlled device has stopped operating or has failed and become inoperable. In this embodiment, such a state is, for example, going off course, crashing into a wall or the like and being unable to move, or breaking down. By learning in advance the control that leads to such a state, the later step of learning the optimum control can proceed while avoiding that state. This makes the learning more efficient.
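One way to picture step S3 is a per-scene record of the controls that ended in a non-continuable state. The sketch below is an assumption-laden illustration: the sensor keys, the 3.0 g impact threshold, and the class and function names are all hypothetical, and a lookup table stands in for what the neural network would learn.

```python
from enum import Enum

class Stop(Enum):
    OFF_COURSE = "off course"
    CRASHED = "crashed and unable to move"
    FAILURE = "device failure"

def non_continuable(sensors):
    """Return the reason the task cannot continue, or None if it can."""
    if sensors.get("fault", False):
        return Stop.FAILURE
    if not sensors.get("on_course", True):
        return Stop.OFF_COURSE
    if sensors.get("impact_g", 0.0) > 3.0 and sensors.get("speed", 0.0) == 0.0:
        return Stop.CRASHED
    return None

class FailureMemory:
    """Remembers, per scene, the controls that led to a non-continuable state."""

    def __init__(self):
        self._bad = {}

    def record(self, scene, control, sensors):
        reason = non_continuable(sensors)
        if reason is not None:
            self._bad.setdefault(scene, set()).add(control)
        return reason

    def allowed(self, scene, candidates):
        """Filter candidate controls down to those not known to fail here."""
        bad = self._bad.get(scene, set())
        return [c for c in candidates if c not in bad]
```

In later learning, `allowed` would be consulted before a control is tried, so trials are not spent on controls already known to end the run.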

In the final stage of learning (S4), the learning is optimized. At this stage, the partial motions learned separately for each scene are combined, and learning is performed so that the motion is carried out optimally from start to end. In this embodiment, the final-stage learning is learning to complete 10 laps of the course and finish within the predetermined time.

<2. Detailed processing>
Next, the processing of the learning device 1 in each step will be described in detail with reference to FIGS. 4 to 8. FIG. 4 is a block diagram showing the detailed configuration of the learning device 1 according to this embodiment. As shown in FIG. 4, the machine learning unit 20 consists of a learning data input/output unit 21, a neural network 22, and a learning result output unit 23. The motion classification unit 30 consists of a control data extraction unit 31 and a motion classification result extraction unit 32.
The processing of each unit is described in detail below for each step of FIG. 3.

(2-1. Initial stage of learning)
FIG. 5 is a flowchart showing the detailed processing flow of the initial stage of learning, S1 in FIG. 3. First, in the initial stage of learning (first learning), the learning data input/output unit 21 receives the learning data (S101). The learning data includes, for example, the learning objective and learning requirements described above.

In the next step (S102), machine learning is performed. In this embodiment, no conditions restricting individual control operations are specified in advance, so the learning device 1 learns the control operations by itself. Specifically, the control unit 10 sets random control amounts for the actuator 92 and operates it. Since the vehicle 90 naturally cannot follow the course at this point, it travels erratically, going off course and so on. The control unit 10 reads the outputs (hereinafter also "sensor values") of the control sensor 91 and the state detection sensor 93 for the randomly given control amounts and stores these data (control amounts and sensor values) in the storage unit 40. The neural network 22 refers to the storage unit 40, reads the stored control amounts and sensor values, and learns control operations that meet the learning requirements by deep learning (S102).
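The data collection part of S102 can be sketched as a loop that applies random control amounts and stores each control together with the sensor values it produced. The sensor model, the value ranges, and all names below are hypothetical placeholders; training the neural network on the stored pairs is not shown.

```python
import random

def read_sensors(control, rng):
    """Toy stand-in for control sensor 91 and state detection sensor 93."""
    speed = max(0.0, control["throttle"] - control["brake"]) * 30.0
    return {
        "speed": speed,
        "yaw_rate": control["steering"] * speed * 0.1,
        "on_course": rng.random() < 0.2,   # random driving mostly leaves the course
    }

def explore(steps, rng):
    """Apply random control amounts and store (control, sensor values) pairs."""
    storage = []                           # plays the role of storage unit 40
    for _ in range(steps):
        control = {
            "throttle": rng.uniform(0.0, 1.0),
            "brake": rng.uniform(0.0, 1.0),
            "steering": rng.uniform(-1.0, 1.0),
        }
        storage.append((control, read_sensors(control, rng)))
    return storage

storage = explore(1000, random.Random(0))
```

Each stored pair links a control amount to its observed result, which is the form of data the neural network 22 is described as reading from the storage unit 40.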

The learning requirements set "completing one lap of the course and finishing" as the initial-stage objective. Accordingly, when the learning device 1 determines, for example from the output of the control sensor 91, that the vehicle has completed one lap of the course and finished, it determines that machine learning has reached the initial-stage level (S103: Y) and ends the initial stage of learning.

(2-2. Classification of motions)
FIG. 6 is a flowchart showing the detailed processing flow of the classification of motions, S2 in FIG. 3. First, to classify the motions, the control data extraction unit 31 extracts from the storage unit 40 the sensor values of the control sensor 91 as of the end of the initial stage of learning, together with the corresponding control amounts of the actuator 92 and sensor values of the state detection sensor 93 (S201). The control data extraction unit 31 inputs the extracted values to the neural network 22 as learning data.

Next, the neural network 22 performs machine learning based on the learning data input by the control data extraction unit 31 (S202). In doing so, the neural network 22 classifies the lapping motion into a predetermined number of divided scenes.

The classification of the lapping motion into scenes by the neural network 22 will now be described in more detail. The neural network 22 classifies the lapping motion into scenes based on a scene vector and a motion vector. The scene vector represents the scene of the task performed by the vehicle 90. The scene vector is obtained from, for example, the sensor values output by the control sensor 91 (for example, the position (or distance) from the start point and the direction from the start point). As an example, assuming x and y coordinates with the start point as the origin, the scene vector at a point l can be expressed as (l_x, l_y).

On the other hand, the motion vector represents the control state of the traveling vehicle 90. It is obtained, for example, from sensor values output by the state detection sensor 93 (for example, speed, acceleration, angular velocity, and angular acceleration). As an example, the motion vector at a point l is expressed as (v_l, a_l), using the speed v and the acceleration a at that point.

The neural network 22 divides the task into scenes based on the scene vector (l_x, l_y) and, based on the motion vector (v_l, a_l), learns for each divided scene the classification of the motions to be learned in that scene. As a result, the learning device 1 can determine which scene it is currently in and learn the partial motion appropriate to that scene. For example, by attending to change points of the motion vector in addition to the position represented by the scene vector, the neural network 22 can identify acceleration, deceleration, changes of direction, and the like in the motion of the vehicle 90, and can classify the series of motions into scene-specific motions based on those change points. The neural network 22 can also learn the classification of motions based on, for example, the similarity of motion vectors.
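The change-point idea described above can be illustrated with a minimal sketch. It is not the neural network 22 of the embodiment: a fixed acceleration threshold stands in for the learned classifier, and the names `Sample`, `label_motion`, `split_into_scenes`, and the threshold value are assumptions introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    scene: Tuple[float, float]   # scene vector (l_x, l_y): position relative to the start point
    motion: Tuple[float, float]  # motion vector (v_l, a_l): speed and acceleration at that point


def label_motion(acceleration: float, threshold: float = 0.5) -> str:
    """Coarse motion label derived from acceleration (hypothetical threshold)."""
    if acceleration > threshold:
        return "accelerate"
    if acceleration < -threshold:
        return "decelerate"
    return "cruise"


def split_into_scenes(samples: List[Sample], threshold: float = 0.5) -> List[List[Sample]]:
    """Start a new scene whenever the motion label changes (a change point)."""
    scenes: List[List[Sample]] = []
    previous_label = None
    for sample in samples:
        label = label_motion(sample.motion[1], threshold)
        if label != previous_label:
            scenes.append([])
            previous_label = label
        scenes[-1].append(sample)
    return scenes


# Made-up samples for part of one lap: accelerate, brake for a corner, then cruise.
lap = [
    Sample((0.0, 0.0), (10.0, 2.0)),
    Sample((5.0, 0.0), (12.0, 2.0)),
    Sample((10.0, 0.0), (12.0, -3.0)),
    Sample((12.0, 2.0), (9.0, -3.0)),
    Sample((12.0, 6.0), (9.0, 0.0)),
]
scenes = split_into_scenes(lap)
```

Each resulting group keeps its scene vectors, so the position at which a scene starts and ends remains available for deciding which scene the vehicle is currently in.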

In the example course shown in FIG. 2, the task is divided into five scenes, A to E (ア to オ in the original Japanese), corresponding to five sections of the course. The partial motions classified into each scene are, for example, as follows.
Scene A: first straight partial motion (for example, control of deceleration timing and running position when approaching the first corner)
Scene B: first corner partial motion (for example, control of steering at the corner and of acceleration timing when entering the second straight)
Scene C: second straight partial motion (for example, control of deceleration timing and running position when approaching the second corner)
Scene D: second corner partial motion (for example, control of steering at the corner and of acceleration timing when entering the third straight)
Scene E: third straight partial motion (for example, control of acceleration when entering the first straight)

It is preferable that the neural network 22 can reorder the divided scenes according to the order of travel.

The motion classification result extraction unit 32 extracts the classification of partial motions learned by the neural network 22 and stores it in the storage unit 40 (S203).

(2-3. Learning of control leading to a state in which learning cannot be continued)
FIG. 7 is a flowchart showing the detailed processing flow of S3 shown in FIG. 3, the learning of control leading to a state in which learning cannot be continued (second learning). First, the learning data input/output unit 21 refers to the storage unit 40, selects one of the partial motions classified in S2, and extracts the control amounts for the actuator 92 required for that partial motion. The learning data input/output unit 21 then executes control with the extracted control amounts and determines, based on for example the output of the state detection sensor 93, whether the control resulted in a state in which learning cannot be continued. The learning data input/output unit 21 reads out the extracted control amounts, together with the information on whether they led to such a state, and supplies them to the neural network 22 as learning data. The neural network 22 performs deep learning on this learning data (S301).

At this time, the learning result output unit 23 can output the learning result for control leading to a state in which learning cannot be continued. This allows the neural network 22 to receive, as learning data, controls that led to such a state from, for example, another learning device 1' having the same configuration, and to perform additional learning (S302). Learning thereby becomes more efficient. Efficient learning here means, for example, learning that takes a short time from the start of learning to the achievement of the learning purpose. The process of S302 is not essential.

The learning device 1 performs the processing of S301 (and S302) for all of the classified partial motions (S303).

Although not essential, once the learning device 1 has learned, for all of the classified partial motions, the controls leading to a state in which learning cannot be continued, it can also perform learning again over the entire series of motions (S304). This enables faster lap control.

In this way, the learning device 1 according to the present embodiment first learns, for each classified partial motion, the controls leading to a state in which learning cannot be continued, and can therefore avoid those controls in subsequent learning. This makes learning more efficient.
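The flow of S301 to S303 can be sketched as follows. This is a simplified stand-in under stated assumptions, not the deep learning of the embodiment: controls are reduced to a single target speed, failures are simply recorded in a set per partial motion, and `toy_simulator`, `SAFE_SPEED_LIMIT`, and the function names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Set

FailingControls = Dict[str, Set[float]]


def learn_failing_controls(
    partial_motions: Iterable[str],
    candidate_controls: Iterable[float],
    leads_to_failure: Callable[[str, float], bool],
) -> FailingControls:
    """For every partial motion (S303), record the controls that end in a
    state in which learning cannot be continued (S301)."""
    controls = list(candidate_controls)
    failing: FailingControls = {}
    for motion in partial_motions:
        failing[motion] = {c for c in controls if leads_to_failure(motion, c)}
    return failing


def merge_failing_controls(own: FailingControls, other: FailingControls) -> FailingControls:
    """Optional step (S302): add failures reported by another learning device."""
    merged = {motion: set(controls) for motion, controls in own.items()}
    for motion, controls in other.items():
        merged.setdefault(motion, set()).update(controls)
    return merged


# Hypothetical environment: a control (target speed) fails when it exceeds
# the safe limit of the partial motion, e.g. the vehicle leaves the course.
SAFE_SPEED_LIMIT = {"straight_1": 30.0, "corner_1": 12.0}


def toy_simulator(motion: str, speed: float) -> bool:
    return speed > SAFE_SPEED_LIMIT[motion]


failing = learn_failing_controls(SAFE_SPEED_LIMIT, [10.0, 20.0, 40.0], toy_simulator)
failing = merge_failing_controls(failing, {"corner_1": {15.0}})
```

Keeping the failures keyed by partial motion mirrors the per-motion structure of S303 and lets later learning look up what to avoid in the scene it is currently in.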

(2-4. Optimization learning)
FIG. 8 is a flowchart showing the detailed processing flow of the optimization learning (third learning) of S4 shown in FIG. 3. Optimization learning optimizes the learning performed in the steps up to S3 in order to achieve the learning purpose given as learning data at the start of learning (in the present embodiment, "complete 10 laps of the course and reach the goal within a predetermined time"). Optimization learning is performed excluding the controls, learned in S3, that lead to a state in which learning cannot be continued. At this time, the learning data input/output unit 21 refers to the storage unit 40 and extracts the learning data (set by the operator) that was input in the initial learning stage (S1 in FIG. 3). The learning data input/output unit 21 also refers to the storage unit 40 and extracts the state of the neural network 22 after it has learned the controls leading to a state in which learning cannot be continued. The learning data input/output unit 21 sets the extracted data in the control unit 10.

Based on the data thus set, the control unit 10 outputs control amounts for the actuator 92 and acquires the corresponding sensor values of the control sensor 91 and the state detection sensor 93. The control unit 10 stores the control amounts it applied, and the sensor values output in response, in the storage unit 40.

The neural network 22 reads out the control amounts and sensor values stored by the control unit 10 in the above process and performs deep learning (S401). Having already learned the controls leading to a state in which learning cannot be continued, the neural network 22 can thus learn more efficiently, over the whole motion from start to end (that is, from the start to the goal of the course), the control motions that satisfy the learning requirements. The process of S401 is repeated until the learning as a whole has been optimized (S402). The result of the optimization learning is extracted by the learning result output unit 23 and stored in the storage unit 40. In this way, optimization learning can be performed excluding the controls leading to a state in which learning cannot be continued.
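As a rough illustration of "optimizing while excluding failing controls", the sketch below masks out the controls recorded as failing and picks, per scene, the fastest remaining one. It replaces the iterative deep learning of S401 and S402 with an exhaustive search over a small candidate list; the scene names, lengths, speeds, and the function `optimize_lap` are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def optimize_lap(
    scene_lengths: Dict[str, float],
    candidate_speeds: List[float],
    failing: Dict[str, Set[float]],
) -> Tuple[Dict[str, float], float]:
    """Choose the fastest allowed speed per scene and return the plan and lap time."""
    plan: Dict[str, float] = {}
    for scene in scene_lengths:
        # Mask out every control known to lead to a state in which learning cannot be continued.
        allowed = [s for s in candidate_speeds if s not in failing.get(scene, set())]
        if not allowed:
            raise ValueError(f"no usable control left for scene {scene!r}")
        plan[scene] = max(allowed)
    lap_time = sum(length / plan[scene] for scene, length in scene_lengths.items())
    return plan, lap_time


# Hypothetical course: lengths in metres, speeds in metres per second.
scene_lengths = {"straight_1": 120.0, "corner_1": 30.0}
failing = {"straight_1": {40.0}, "corner_1": {20.0, 40.0}}
plan, lap_time = optimize_lap(scene_lengths, [10.0, 20.0, 40.0], failing)
```

Because the failing controls are removed before the search, no trial in this stage can end in a state in which learning cannot be continued, which is the source of the efficiency gain the text describes.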

As described above, the learning device 1 according to the present embodiment can itself classify the motions to be learned into partial motions and learn them. Because each classified motion can then be optimized individually, learning can be performed more efficiently (that is, in a shorter period). Furthermore, when learning partial motions, the learning device 1 first learns the controls leading to a state in which learning cannot be continued. Learning can therefore be performed efficiently without a person setting detailed conditions for each motion in advance.

(Hardware configuration)
An example of the hardware configuration for realizing the learning device 1 described above with a computer 800 is described with reference to FIG. 9. The functions of each device may also be divided among a plurality of devices.

As shown in FIG. 9, the computer 800 includes a processor 801, a memory 803, a storage device 805, an input interface unit (input I/F unit) 807, a data interface unit (data I/F unit) 809, a communication interface unit (communication I/F unit) 811, and a display device 813.

The processor 801 controls various processes in the computer 800 by executing programs stored in the memory 803. For example, when the processor 801 executes a program stored in the memory 803, the control unit 10, the machine learning unit 20, the motion classification unit 30, and the like of the learning device 1 can be realized.

The memory 803 is a storage medium such as a RAM (Random Access Memory). The memory 803 temporarily stores the program code of programs executed by the processor 801 and the data required during their execution.

The storage device 805 is a non-volatile storage medium, for example an auxiliary storage device such as a hard disk drive (HDD) or solid state drive, or a flash memory. The storage device 805 stores the operating system and the various programs for realizing the components described above. Such programs and data are loaded into the memory 803 as needed and thereby referenced by the processor 801. For example, the storage unit 40 described above is realized by the storage device 805.

The input I/F unit 807 is a device for receiving input from an administrator. Specific examples include a keyboard, a mouse, a touch panel, various sensors, and wearable devices. The input I/F unit 807 may be connected to the computer 800 via an interface such as USB (Universal Serial Bus).

The data I/F unit 809 is a device for inputting data from outside the computer 800. A specific example is a drive device for reading data stored on various storage media. The data I/F unit 809 may also be provided outside the computer 800, in which case it is connected to the computer 800 via an interface such as USB.

The communication I/F unit 811 is a device for wired or wireless data communication, via the Internet N, with devices external to the computer 800. The communication I/F unit 811 may also be provided outside the computer 800, in which case it is connected to the computer 800 via an interface such as USB.

The display device 813 is a device for displaying various information. Specific examples include a liquid crystal display, an organic EL (Electro-Luminescence) display, and the display of a wearable device. The display device 813 may be provided outside the computer 800, in which case it is connected to the computer 800 via, for example, a display cable.

[Second Embodiment]
In the first embodiment, an example in which the learning device 1 is used for the automatic driving control vehicle 90 was described. However, the learning device 1 is not limited to that example and can be applied to a variety of devices. The present embodiment describes an example of application to the control of a robot whose task is to perform a pick-and-place operation. The description of the second embodiment focuses on the differences from the first embodiment.

First, the differences between the system configuration of the present embodiment and that of the first embodiment are described with reference to FIG. 10. The configuration of the learning device 1 is the same as in the first embodiment. As for the configuration outside the learning device 1, in the present embodiment the control sensor 91' consists of a group of sensors for performing the pick-and-place operation, specifically a workpiece detection sensor (image sensor), a force sensor for robot gripping, and the like. The control sensor 91' also has an image recognition algorithm and can recognize the shape of the workpiece to be gripped. The rest of the configuration outside the learning device 1 is the same as in the first embodiment.

Next, the differences between the learning according to the present embodiment and that of the first embodiment are described.
The pick-and-place operation, which is the task in the present embodiment, is performed by the following procedure.
1. Recognize the shape of a workpiece and grip it.
2. Lift the gripped workpiece.
3. Move the lifted workpiece to a predetermined position according to its shape.
4. Stack the workpieces in tubes, one tube per workpiece shape.

In the learning of robot control according to the present embodiment, the learning purpose and learning requirements given are as follows.

(Learning purpose)
・From a container in which workpieces of three different shapes (for example, cylindrical, quadrangular-prism, and triangular-prism workpieces) are piled in bulk, stack 10 workpieces, by pick-and-place operation and within a predetermined time, in tubes whose entrances (circular, quadrangular, triangular) match the workpiece shapes.
(Learning requirements)
・Do not place a workpiece anywhere other than the predetermined position.
・Stack 10 workpieces in the tube for each workpiece shape.
・At the initial level: "stack one workpiece in the tube of the appropriate shape."

In the present embodiment, the task is to stack workpieces in the tubes matching their shapes. Just as in the first embodiment the task is divided into scenes based on the course traveled by the vehicle 90 and the partial motions are classified based on those scenes, the pick-and-place operation to be learned in the present embodiment may likewise be classified into partial motions by scene. For example, in the present embodiment, the task is divided, based on the amount of displacement of the motion during learning of the task, into a scene corresponding to gripping the workpiece, a scene corresponding to carrying the workpiece, and a scene corresponding to stacking the workpiece. The pick-and-place operation is classified into partial motions according to the divided scenes.

In the present embodiment, a state in which learning cannot be continued is, for example, a state in which the workpiece can no longer enter the tube. Accordingly, the controls learned in the stage of learning control leading to such a state are, for example, as follows.
・Placing the workpiece in the wrong location (the shape of the workpiece differs from the shape of the tube entrance)
・Stacking the workpiece in the wrong orientation (the orientation of the workpiece shape differs from the orientation of the tube shape)
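The two failure conditions listed above can be expressed as a simple check. This is only a sketch under assumptions that the source does not state: the quadrangular entrance is taken to be a square and the triangular one an equilateral triangle (so their rotational symmetry is 90 and 120 degrees), orientation is a single angle about the tube axis, and the tolerance value and all names are hypothetical.

```python
from dataclasses import dataclass

# Rotational symmetry period of each cross-section in degrees (assumed shapes);
# 0.0 marks a circle, for which every orientation fits.
SYMMETRY_DEG = {"circle": 0.0, "square": 90.0, "triangle": 120.0}


@dataclass(frozen=True)
class Placement:
    work_shape: str        # cross-section of the gripped workpiece
    tube_shape: str        # shape of the tube entrance
    work_angle_deg: float  # orientation of the workpiece about the tube axis
    tube_angle_deg: float  # orientation of the tube entrance


def orientation_error_deg(p: Placement) -> float:
    """Smallest rotation needed to align the workpiece with the entrance."""
    period = SYMMETRY_DEG[p.work_shape]
    if period == 0.0:
        return 0.0
    diff = (p.work_angle_deg - p.tube_angle_deg) % period
    return min(diff, period - diff)


def cannot_continue(p: Placement, tolerance_deg: float = 2.0) -> bool:
    """True if the placement would leave the workpiece unable to enter the tube."""
    if p.work_shape != p.tube_shape:                  # wrong placement location
        return True
    return orientation_error_deg(p) > tolerance_deg   # wrong stacking orientation


wrong_tube = cannot_continue(Placement("square", "circle", 0.0, 0.0))
slightly_rotated = cannot_continue(Placement("square", "square", 91.0, 0.0))
misaligned_triangle = cannot_continue(Placement("triangle", "triangle", 60.0, 0.0))
```

In the embodiment such outcomes would be detected from sensor output during trials rather than computed from known angles; the function only makes the two listed conditions concrete.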

By learning in advance the controls leading to a state in which learning cannot be continued, the learning device 1 according to the present embodiment can learn in advance to recognize the workpiece shape and tube shape correctly and to orient the workpiece appropriately when gripping it. In the final stage of learning, states in which learning cannot be continued can therefore be avoided, making learning more efficient; that is, the time required to achieve the learning purpose can be further shortened.
Other configurations are the same as in the first embodiment.

An embodiment of the present invention has been described above. The embodiment is intended to facilitate understanding of the present invention and is not to be construed as limiting it. The present invention may be changed or improved without departing from its gist. For example, some of the steps in the processing flows described above may be omitted, and the steps may be reordered or executed in parallel, as long as no contradiction arises in the processing.

The above embodiments describe an example in which the system according to the present invention is used to manage abilities acquired by a machine through AI technology such as deep learning, but the present invention is not limited to this and can be applied to a wide range of fields. For example, it can be applied to distinguishing good products from defective ones and to various industrial fields such as food, machine parts, chemical products, and pharmaceuticals, as well as to fisheries, agriculture, forestry, the service industry, and the medical and health fields. The present invention may also be applied when AI technology is applied to embedded products, to systems that make use of IT such as social systems, to big data analysis, to classification processing in a wide range of control devices, and so on.

In this specification, "unit", "means", and "procedure" do not simply denote physical configurations; they also include cases where the processing performed by the "unit" is realized by software. Furthermore, the processing performed by one "unit", "means", "procedure", or device may be executed by two or more physical configurations or devices, and the processing performed by two or more "units", "procedures", or devices may be executed by one physical means or device.

Some or all of the above embodiments may also be described as in the following appendices, but are not limited to them.
(Appendix 1)
At least one hardware processor is provided, and
the hardware processor
accepts learning data including a learning purpose,
executes learning based on the learning data, and
outputs the result of the learning by the neural network,
wherein executing the learning includes
executing first learning for achieving an initial stage of the learning purpose; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.
(Appendix 2)
A learning method in which at least one hardware processor executes,
as steps of performing learning,
a step of accepting learning data including a learning purpose,
a step of executing learning based on the learning data, and
a step of outputting the result of the learning by the neural network,
wherein the step of executing the learning includes
a step of executing first learning for achieving an initial stage of the learning purpose; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

1 Learning device
10 Control unit
20 Machine learning unit
21 Learning data input/output unit
22 Neural network
23 Learning result output unit
30 Motion classification unit
31 Control data extraction unit
32 Motion classification result extraction unit
40 Storage unit
90 Automatic driving control vehicle
91 Control sensor
92 Actuator
93 State detection sensor

Claims

A learning device that learns control of a motion related to a predetermined task, comprising:
a learning data reception unit that accepts learning data including a learning purpose;
a neural network that executes learning based on the learning data; and
an output unit that outputs the learning result of the neural network,
wherein the neural network
executes first learning for achieving an initial stage of the learning purpose; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

The learning device according to claim 1, wherein
the output unit outputs the result of the second learning.

The learning device according to claim 1, wherein
the learning device learns control of a series of motions related to a predetermined task,
the learning device further comprises a classification unit that divides the task into a plurality of scenes and, for each of the divided scenes, identifies the partial motion of the series of motions that is performed in that scene, and
the neural network executes the second learning and the third learning for each partial motion.

An automatic driving control learning device that learns control of a series of motions related to automatic driving of a vehicle that laps a predetermined course, comprising:
a learning data reception unit that accepts learning data including a learning purpose of lapping the course a predetermined number of times within a predetermined time;
a neural network that executes learning based on the learning data; and
an output unit that outputs the learning result of the neural network,
wherein the neural network
executes first learning for achieving one lap of the course; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

A robot control learning device that learns control of a series of motions related to a task of gripping a predetermined workpiece and stacking it at a placement location according to the shape of the workpiece, comprising:
a learning data reception unit that accepts learning data including a learning purpose of stacking a predetermined number of the workpieces at the predetermined placement location within a predetermined time;
a neural network that executes learning based on the learning data; and
an output unit that outputs the learning result of the neural network,
wherein the neural network
executes first learning for achieving the stacking of the workpiece at the placement location; executes, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executes, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

A learning method, executed by a computer provided with a control unit, for learning control of a motion related to a predetermined task, wherein
the control unit executes:
a step of accepting learning data including a learning purpose;
a step of executing learning based on the learning data; and
a step of outputting the learning result of the step of executing the learning,
and wherein the step of executing the learning includes
a step of executing first learning for achieving an initial stage of the learning purpose; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

A program that causes a computer that learns control of a motion related to a predetermined task to execute:
a procedure of accepting learning data including a learning purpose;
a procedure of executing learning based on the learning data; and
a procedure of outputting the learning result of the procedure of executing the learning,
wherein the procedure of executing the learning includes
a procedure of executing first learning for achieving an initial stage of the learning purpose; executing, based on the result of the first learning, second learning that learns control leading to a state in which the motion related to the learning cannot be continued; and executing, based on the result of the second learning, third learning for achieving the learning purpose while excluding the control leading to the state in which the motion cannot be continued.

An apparatus that performs a predetermined task, comprising:
a first sensor that senses information necessary for the apparatus to perform the task;
an actuator;
a second sensor that senses a change in the state of the apparatus caused by the actuator;
a control unit that controls the actuator based on the sensor values output from the first sensor and the second sensor; and
a storage unit that stores the result of learning performed by the learning device according to any one of claims 1 to 3,
wherein the control unit
determines, based on the learning result stored in the storage unit, a control amount according to the sensor values output from the first sensor and the second sensor.