JP2007041723A

JP2007041723A - Sensor designing device, sensor designing method, sensor designing program and robot

Info

Publication number: JP2007041723A
Application number: JP2005223343A
Authority: JP
Inventors: Komei Sugiura; 孔明杉浦; Katsunori Shimohara; 勝憲下原; Kenkin Ryu; 健勤劉
Original assignee: ATR Advanced Telecommunications Research Institute International
Current assignee: ATR Advanced Telecommunications Research Institute International
Priority date: 2005-08-01
Filing date: 2005-08-01
Publication date: 2007-02-15
Anticipated expiration: 2025-08-01
Also published as: JP4670007B2

Abstract

PROBLEM TO BE SOLVED: To provide a sensor designing device that can automatically design a sensor form advantageous to a robot's action learning.

SOLUTION: An initial generation creation part 2 creates a plurality of genotypes that specify the form of the sensor and virtually creates a plurality of robots, each having the sensor form specified by one genotype. An action learning part 3 makes the plurality of robots learn virtually and calculates the adaptability of each robot based on the learning result. A selection part 4 selects a plurality of robots to serve as parent individuals based on the adaptability of each robot. A next-generation creation part 5 creates next-generation genotypes from the genotypes of the selected robots based on a genetic algorithm and virtually creates a plurality of robots, each having the sensor form specified by one next-generation genotype. The action learning part 3 makes the next-generation robots learn virtually again. The processes of the selection part 4, the next-generation creation part 5, and the action learning part 3 are repeated for a predetermined number of generations, and the form of the sensor is determined.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a sensor design device, a sensor design method, and a sensor design program for designing the form of a sensor used for robot behavior learning, and to a robot having a sensor designed by the sensor design device.

To build a robot that adapts to changes in its environment, the balance among the robot's form, its control system, and the environment must be considered. In recent years, research that gives robots adaptive behavior by skillfully combining form and control system has attracted attention. For example, in Non-Patent Document 1, a method of evolving an agent's form and control system in software is combined with rapid prototyping technology, so that the form obtained in software is realized as hardware.
Lipson H. et al., "Automatic design and manufacture of robotic lifeforms", Nature, 2000, Vol. 406, No. 6799, pp. 974-978

However, such methods cannot automatically design a sensor form that is advantageous for adaptation on an ontogenetic time span, that is, for learning.

An object of the present invention is to provide a sensor design device, a sensor design method, a sensor design program, and a robot that make it possible to automatically design a sensor form advantageous to robot behavior learning.

The sensor design device according to the present invention is a sensor design device that designs the form of a sensor used for robot behavior learning, and comprises: initial generation creation means for creating a plurality of genotypes that specify the form of the sensor and virtually creating a plurality of robots, each having the sensor form specified by the respective genotype; learning means for causing the plurality of robots created by the initial generation creation means to learn virtually and calculating the fitness of each robot based on the learning result; and next generation creation means for selecting a plurality of robots to serve as parent individuals based on the fitness of each robot calculated by the learning means, creating next-generation genotypes from the genotypes of the selected robots based on a genetic algorithm, and virtually creating a plurality of robots, each having the sensor form specified by the respective next-generation genotype. The learning means causes the robots created by the next generation creation means to learn virtually again and calculates the fitness of each robot based on the learning result, and the form of the sensor is determined by repeating the processing by the next generation creation means and the learning means a predetermined number of times.

In the sensor design device according to the present invention, a plurality of genotypes that specify the form of the sensor are created, a plurality of robots each having the sensor form specified by the respective genotype are virtually created, the created robots are made to learn, and the fitness of each robot is calculated based on its learning result.

Next, a plurality of robots to serve as parent individuals are selected based on the calculated fitness of each robot, next-generation genotypes are created from the genotypes of the selected robots based on a genetic algorithm, and a plurality of robots each having the sensor form specified by the respective next-generation genotype are virtually created. The created robots are made to learn again, the fitness of each robot is calculated based on the learning result, and the form of the sensor is determined by repeating this next-generation creation processing and its learning processing a predetermined number of times.

Therefore, based on the learning results, robots with high fitness, that is, robots with high learning performance, are selected as parent individuals without dying out, and the genotype that specifies the form of the sensor can be evolved. A sensor form advantageous to robot behavior learning can thus be designed automatically.

The learning means preferably causes the plurality of robots to learn by Q-learning. In this case, robots having various genotypes, that is, various sensor forms, can be trained efficiently, and a sensor of a suitable form can be designed quickly.

Preferably, the next generation creation means comprises: selection means for selecting, based on the fitness of each robot calculated by the learning means, a predetermined number of robots with high learning performance from among the plurality of robots as parent individuals, and for selecting the same number of robots from the remaining robots by tournament selection as parent individuals; and creation means for creating next-generation genotypes from the genotypes of the parent individuals selected by the selection means based on a genetic algorithm and virtually creating, as next-generation robots, a plurality of robots each having the sensor form specified by the respective next-generation genotype. The learning means then causes the next-generation robots created by the creation means to learn virtually again and calculates the fitness of each robot based on the learning result.

In this case, the elite strategy selects robots with high fitness directly as parent individuals, which prevents robots with high learning performance from dying out merely because they happened not to be selected. In addition, next-generation robots can also be created from the high-fitness robots chosen by tournament selection, so robots with high learning performance can be made prolific. As a result, robots with high learning performance can be trained generation after generation without dying out, and an optimal sensor form can be designed based on the learning results.

The genotype preferably specifies at least one of the position, number, resolution, and sensing interval of the sensors. In this case, aspects of the sensor form such as position, number, resolution, and sensing interval can be designed automatically.

The next generation creation means preferably creates next-generation genotypes from the genotypes of the parent robots using at least one of crossover and mutation, and virtually creates a plurality of robots each having the sensor form specified by the respective genotype. In this case, an optimal genotype can be searched for in a wide search space, and a sensor of an optimal form can be designed efficiently.

The sensor design method according to the present invention is a sensor design method that designs the form of a sensor used for robot behavior learning by using a sensor design device comprising initial generation creation means, learning means, and next generation creation means. The method includes: a first step in which the initial generation creation means creates a plurality of genotypes that specify the form of the sensor and virtually creates a plurality of robots, each having the sensor form specified by the respective genotype; a second step in which the learning means causes the plurality of robots created by the initial generation creation means to learn virtually and calculates the fitness of each robot based on the learning result; a third step in which the next generation creation means selects a plurality of robots to serve as parent individuals based on the fitness of each robot calculated by the learning means, creates next-generation genotypes from the genotypes of the selected robots based on a genetic algorithm, and virtually creates a plurality of robots, each having the sensor form specified by the respective next-generation genotype; and a fourth step in which the learning means causes the robots created in the third step to learn virtually again and calculates the fitness of each robot based on the learning result. The form of the sensor is determined by repeating the processing of the third and fourth steps a predetermined number of times.

The sensor design program according to the present invention is a sensor design program for designing the form of a sensor used for robot behavior learning. It causes a computer to function as: initial generation creation means for creating a plurality of genotypes that specify the form of the sensor and virtually creating a plurality of robots, each having the sensor form specified by the respective genotype; learning means for causing the plurality of robots created by the initial generation creation means to learn virtually and calculating the fitness of each robot based on the learning result; and next generation creation means for selecting a plurality of robots to serve as parent individuals based on the fitness of each robot calculated by the learning means, creating next-generation genotypes from the genotypes of the selected robots based on a genetic algorithm, and virtually creating a plurality of robots, each having the sensor form specified by the respective next-generation genotype. The learning means causes the robots created by the next generation creation means to learn virtually again and calculates the fitness of each robot based on the learning result, and the form of the sensor is determined by repeating the processing by the next generation creation means and the learning means a predetermined number of times.

The robot according to the present invention has a sensor designed by any of the sensor design devices described above.

According to the present invention, based on the learning results, robots with high fitness, that is, robots with high learning performance, are selected as parent individuals without dying out, and the genotype that specifies the form of the sensor can be evolved. A sensor form advantageous to robot behavior learning can thus be designed automatically.

Hereinafter, a sensor design device according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the sensor design device according to the embodiment.

The sensor design device shown in FIG. 1 includes an input unit 1, an initial generation creation unit 2, a behavior learning unit 3, a selection unit 4, a next generation creation unit 5, and an output unit 6. The sensor design device can be implemented on an ordinary computer equipped with a ROM (read-only memory), a CPU (central processing unit), a RAM (random access memory), an external storage device, a recording medium drive, an input device, a display device, and the like; each of the above functions is realized by having the CPU execute a sensor design program that carries out the sensor design processing described later.

In this embodiment, the above computer is used together with Webots, a simulator made by Cyberbotics: the characteristics of the sensors and actuators are defined, the robots described later are created virtually, and the sensor form is simulated. The configuration of the sensor design device is not limited to this example, and various modifications are possible, such as implementing some or all of the above functions with dedicated hardware circuits.

The input unit 1 is used by the user to input model data and the like that define the sensors, actuators, controller (agent), and other components of the robot.

The initial generation creation unit 2 uses the model data and the like input from the input unit 1 to create a plurality of random genotypes that specify the form of the sensor, virtually creates a plurality of initial generation robots each having the sensor form specified by the respective genotype, and outputs the data of the created initial generation robots to the behavior learning unit 3. The components of the initial generation robots other than the sensors, such as the actuators and the controller, are common to all robots, and the same applies to the next-generation robots described later.

Here, the form of the sensor covers not only its physical form but also the sensor's characteristics, usage conditions, and the like. The genotype used in this embodiment is information that specifies codes for parameters such as the position, number, resolution, and sensing interval of the sensors and the coupling strength that represents the weight of a sensor value on the control system, and it is expressed using "1" and "0". For example, when one parameter is decoded from 4 bits and the genotype is expressed from four kinds of parameters, the genotype length is 256 bits.

The behavior learning unit 3 uses the data of the initial generation robots to make them perform reinforcement learning virtually, calculates a fitness based on the learning result, and outputs it to the selection unit 4 together with the data of each robot. For example, the behavior learning unit 3 trains the robots by Q-learning and has each robot's controller virtually execute the following procedure.
1. Observe the state $s_t$ of the environment using the sensors.
2. Execute an action $a_t$ according to the action selection strategy.
3. Receive a reward $r_t$ according to the state.
4. Observe the state $s_{t+1}$ after the state transition using the sensors.
5. Update the Q-value according to equation (1) below.

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \right] \quad (1)$$
Here, $\alpha$ is the learning rate ($0 < \alpha < 1$) and $\gamma$ is the discount rate ($0 < \gamma < 1$).
6. Advance the time step $t$ to $t+1$ and return to step 1.
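
The update in step 5 can be sketched as a small tabular routine. This is only an illustration under stated assumptions: the dictionary-based Q-table, the function name `q_update`, and the integer state and action labels are hypothetical, and the default values of `alpha` and `gamma` are the ones the description later suggests for the line tracer.

```python
from collections import defaultdict

ALPHA = 0.8    # learning rate suggested later in the text for the line tracer
GAMMA = 0.999  # discount rate suggested later in the text for the line tracer


def q_update(q, s_t, a_t, r_t, s_next, actions, alpha=ALPHA, gamma=GAMMA):
    """Apply equation (1) in place to the Q-table `q` and return the new value."""
    # max over a_{t+1} of Q(s_{t+1}, a_{t+1})
    best_next = max(q[(s_next, a)] for a in actions)
    # bracketed term of equation (1)
    td_error = r_t + gamma * best_next - q[(s_t, a_t)]
    q[(s_t, a_t)] += alpha * td_error
    return q[(s_t, a_t)]


# Hypothetical usage: unseen (state, action) pairs start at 0.0.
q_table = defaultdict(float)
new_value = q_update(q_table, s_t=0, a_t=1, r_t=1.0, s_next=2, actions=range(5))
```

Starting every Q-value at zero is a design choice of this sketch; the text does not state how the table is initialized.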

Following the elite strategy, the selection unit 4 selects a predetermined number of top-ranked robots by fitness as parent individuals. Following tournament selection, it further selects the same number of robots as parent individuals from the remaining, unselected robots, and outputs the data of all selected parents to the next generation creation unit 5.
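
The two-stage selection described above might look like the following sketch. The function name `select_parents`, the tournament size of 2, and the fact that a robot can win more than one tournament are assumptions made for illustration; the text does not fix these details.

```python
import random


def select_parents(fitness, n_elite, tournament_size=2, rng=random):
    """Return (elites, winners) as lists of indices into `fitness`.

    elites  : the n_elite individuals with the highest fitness (elite strategy)
    winners : n_elite individuals picked from the rest by tournament selection
    """
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    elites = ranked[:n_elite]
    rest = ranked[n_elite:]
    winners = []
    for _ in range(n_elite):
        # Draw a few contenders at random and keep the fittest of them.
        contenders = rng.sample(rest, tournament_size)
        winners.append(max(contenders, key=lambda i: fitness[i]))
    return elites, winners


# Hypothetical usage with six individuals and two elites.
elites, winners = select_parents([5, 1, 9, 3, 7, 2], n_elite=2,
                                 rng=random.Random(0))
```

Here `elites` holds the indices of the two fittest individuals, and `winners` holds two indices drawn only from the other four.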

The next generation creation unit 5 creates next-generation genotypes from the genotypes of the parent individuals selected by the selection unit 4 based on a genetic algorithm, virtually creates a plurality of next-generation robots each having the sensor form specified by the respective genotype, and outputs the data of the created next-generation robots to the behavior learning unit 3. For example, the next generation creation unit 5 creates the next-generation genotypes from the parents' genotypes by crossover and mutation.
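
As a rough illustration of the crossover and mutation mentioned above, the sketch below uses one-point crossover and independent bit-flip mutation on genotypes stored as lists of 0/1. Both operator choices and the function names are assumptions; the text does not say which crossover or mutation rate the embodiment uses.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    point = rng.randrange(1, len(parent_a))  # cut strictly inside the genotype
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(genotype, rate, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in genotype]


# Hypothetical usage on two 32-bit parents.
rng = random.Random(1)
a, b = [0] * 32, [1] * 32
c1, c2 = one_point_crossover(a, b, rng)
mutated = mutate(c1, rate=0.02, rng=rng)
```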

The selection and genetic operations performed by the selection unit 4 and the next generation creation unit 5 are not limited to the above example, and various modifications are possible. For example, the lowest-ranked robots by fitness (for example, the bottom 10%) may be removed, and the same number of parents may be selected from those remaining by tournament selection, roulette selection, expected value selection, ranking selection, or the like before genetic operations are applied. Various crossover schemes, such as one-point, multi-point, or uniform crossover, can also be used.

On receiving the data of the next-generation robots output from the next generation creation unit 5, the behavior learning unit 3 uses that data to make the next-generation robots perform reinforcement learning virtually, for example Q-learning, and outputs the fitness based on the learning result to the selection unit 4 together with the data of each robot.

By repeating the above processing by the selection unit 4, the next generation creation unit 5, and the behavior learning unit 3 for a predetermined number of generations, the final-generation robots are virtually created. The selection unit 4 then selects the robot with the highest fitness from among the final-generation robots and outputs its data to the output unit 6 as the best individual.
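
The generational loop described above can be outlined as follows. This is a minimal sketch, not the embodiment itself: `evaluate` stands in for the behavior learning unit (it should return the fitness obtained after learning), and the number of generations, number of elites, mutation rate, one-point crossover, and the decision to build every child by crossover are all assumptions. Only the population size of 50 and the 32-bit genotype come from the line tracer example in the text.

```python
import random


def evolve(evaluate, pop_size=50, genome_len=32, generations=10,
           n_elite=5, mutation_rate=0.02, seed=0):
    """Return (best_genotype, best_fitness) from the final generation."""
    rng = random.Random(seed)
    # Initial generation: random bit strings.
    population = [[rng.randint(0, 1) for _ in range(genome_len)]
                  for _ in range(pop_size)]
    fitness = [evaluate(g) for g in population]
    for _ in range(generations):
        # Selection: elites plus the same number of tournament winners.
        ranked = sorted(range(pop_size), key=lambda i: fitness[i], reverse=True)
        elites, rest = ranked[:n_elite], ranked[n_elite:]
        winners = [max(rng.sample(rest, 2), key=lambda i: fitness[i])
                   for _ in range(n_elite)]
        parents = [population[i] for i in elites + winners]
        # Next generation: one-point crossover followed by bit-flip mutation.
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randrange(1, genome_len)
            child = a[:point] + b[point:]
            children.append([bit ^ 1 if rng.random() < mutation_rate else bit
                             for bit in child])
        population = children
        # Re-learning and fitness calculation for the new generation.
        fitness = [evaluate(g) for g in population]
    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]
```

For a quick check, a cheap stand-in such as `evolve(sum, generations=5)` (fitness equals the number of 1 bits) exercises the loop without any simulator.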

The output unit 6 displays or prints the data of the best individual to inform the user of the finally determined sensor form. The sensor form output by the output unit 6 may be given as a genotype, a display form, or the like.

In this embodiment, the initial generation creation unit 2 corresponds to an example of the initial generation creation means, the behavior learning unit 3 to an example of the learning means, the selection unit 4 and the next generation creation unit 5 together to an example of the next generation creation means, the selection unit 4 to an example of the selection means, and the next generation creation unit 5 to an example of the creation means.

Next, the sensor design processing performed by the sensor design device configured as described above will be explained. FIG. 2 is a flowchart for explaining the sensor design processing performed by the sensor design device shown in FIG. 1.

First, the user uses the input unit 1 to input model data and the like that define the sensors, actuators, controller, and other components of the robot. In step S1, the initial generation creation unit 2 then uses the input model data and the like to create a plurality of random genotypes that specify the form of the sensor.

Here, an example of a robot whose sensor form is designed by this sensor design device will be described. FIG. 3 is a schematic bottom view showing an example of a robot to be designed by the sensor design device shown in FIG. 1, FIG. 4 is a schematic diagram showing the task environment of the robot shown in FIG. 3, and FIG. 5 is a schematic diagram showing the state of the robot on the course shown in FIG. 4.

The robot 10 shown in FIG. 3 includes sensors 11, each composed of an infrared LED and a light-receiving element, a mounting base 12 to which the sensors 11 are fixed, a main body 13 that holds the mounting base 12, and two wheels 14 rotatably supported by the main body 13. The robot 10 moves along a line drawn on the floor toward a goal and is called a line tracer.

The course shown in FIG. 4 is a 2 m × 5 m rectangle with checkpoints P1 to P4 placed at each quarter of the course. As shown in FIG. 5, the robot 10 moves along the line LI toward the goal. To detect the line LI on the course, each sensor 11 of the robot 10 outputs a value corresponding to the color of the floor surface to a controller (not shown) in the main body 13. The controller controls a motor and a drive circuit (not shown) in the main body 13 and drives the wheels 14 so that the robot traces the line LI according to the outputs of the sensors 11.

The sensors 11 can be placed at any number of positions chosen from 32 predefined candidate positions arranged in an 8 × 4 matrix, and the sensor design device automatically designs the positions and number of sensors 11 that are advantageous to the robot's behavior learning. In this case, model data and the like for modeling the line tracer and the course are input in step S1, and the initial generation creation unit 2 uses the input model data and the like to encode the positions and number of the sensors 11 as a 32-bit genotype and randomly creates 50 genotypes.
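
One way to picture the 32-bit encoding is to treat each bit as "sensor present" or "sensor absent" at one candidate position. The sketch below does this under assumptions the text does not confirm: bits are laid out row by row over a grid of 4 rows and 8 columns, and the function names are hypothetical.

```python
import random

ROWS, COLS = 4, 8            # 8 x 4 matrix of candidate positions from the text
N_POSITIONS = ROWS * COLS    # 32 bits per genotype
POPULATION_SIZE = 50         # number of genotypes created in step S1


def random_population(rng=random):
    """Create POPULATION_SIZE random genotypes, each a list of 32 bits."""
    return [[rng.randint(0, 1) for _ in range(N_POSITIONS)]
            for _ in range(POPULATION_SIZE)]


def decode_genotype(bits):
    """Return the (row, column) of every position whose bit is 1.

    Row-major bit order is an assumption made for this sketch.
    """
    if len(bits) != N_POSITIONS:
        raise ValueError("genotype must have exactly 32 bits")
    return [(i // COLS, i % COLS) for i, bit in enumerate(bits) if bit]


# Hypothetical usage: a genotype with sensors at bit 0 and bit 9.
example = [0] * N_POSITIONS
example[0] = 1
example[9] = 1
positions = decode_genotype(example)   # [(0, 0), (1, 1)]
```

With this encoding the number of sensors is simply the number of 1 bits, so position and count are specified together by a single genotype.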

Next, in step S2, the initial generation creation unit 2 virtually creates a plurality of initial generation robots each having the sensor form specified by the respective genotype, and outputs the data of the created initial generation robots to the behavior learning unit 3. At this time, the behavior learning unit 3 initializes the time step t and related variables.

For the line tracer described above, for example, in step S2 the initial generation creation unit 2 virtually creates 50 initial generation robots, each having the sensor positions and arrangement specified by one of the 50 genotypes, so the population size of the initial generation is 50.

Next, in step S3, the behavior learning unit 3 has the controller of the virtual robot, modeled from the data of the initial generation robot, observe the state $s_t$ of the environment using the sensors. For example, the state $s_t$ is composed of the outputs of the individual sensors; for a robot with four sensors whose outputs are binary, the number of states is 2 × 2 × 2 × 2 = 16. The sensor output is not limited to this example, and multi-valued outputs may also be used.
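
The state count above follows from treating the binary sensor outputs as the bits of one integer. A minimal sketch, assuming binary outputs and a hypothetical bit order in which the first sensor is the most significant bit:

```python
def num_states(n_sensors):
    """Number of distinct states for n binary sensors."""
    return 2 ** n_sensors


def encode_state(sensor_outputs):
    """Pack binary sensor outputs into a single integer state index."""
    state = 0
    for out in sensor_outputs:
        state = (state << 1) | (1 if out else 0)
    return state


# Four binary sensors give 16 states, indexed 0 to 15.
state_index = encode_state([1, 0, 1, 1])   # binary 1011 = 11
```

Because the number of states doubles with every added sensor, the Q-table grows quickly with the sensor count, which is one reason the choice of sensor form can affect how well learning proceeds.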

Next, in step S4, the behavior learning unit 3 has the controller execute an action $a_t$ chosen by the ε-greedy strategy, which serves as the action selection strategy. Under the ε-greedy strategy, a random action is selected with probability ε; otherwise the action with the largest Q-value is taken.
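
The ε-greedy rule can be sketched in a few lines. The function name and the list-of-Q-values interface are hypothetical, and ties between equal Q-values are broken in favor of the lowest action index, which is a choice of this sketch rather than something the text specifies.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of Q-values for the current state."""
    if rng.random() < epsilon:
        # Explore: uniformly random action.
        return rng.randrange(len(q_values))
    # Exploit: action with the largest Q-value (first one on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])


# With epsilon = 0 the choice is purely greedy.
greedy_action = epsilon_greedy([0.1, 0.7, 0.3, 0.0, 0.2], epsilon=0.0)
```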

For the line tracer described above, for example, the controller can take five actions $a_t$: going straight ($a_{t0}$), turning left ($a_{t1}$), turning right ($a_{t2}$), turning sharply left ($a_{t3}$), and turning sharply right ($a_{t4}$). Let $\omega_L$ and $\omega_R$ be the angular velocities (rad/s) of the robot's left and right wheels. The wheel angular velocities are then controlled so that $(\omega_L, \omega_R) = (15, 15)$ for $a_{t0}$, $(13, 6)$ for $a_{t1}$, $(6, 13)$ for $a_{t2}$, $(8, 2)$ for $a_{t3}$, and $(2, 8)$ for $a_{t4}$, and $\varepsilon = 0.01$ can be used.
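
For reference, the five actions can be held in a simple lookup table. The values are those listed in the text; the dictionary layout and the function name are hypothetical.

```python
# (omega_L, omega_R) in rad/s for each action index, as listed in the text.
WHEEL_SPEEDS = {
    0: (15.0, 15.0),  # a_t0: go straight
    1: (13.0, 6.0),   # a_t1: turn left
    2: (6.0, 13.0),   # a_t2: turn right
    3: (8.0, 2.0),    # a_t3: turn sharply left
    4: (2.0, 8.0),    # a_t4: turn sharply right
}


def wheel_command(action):
    """Return the (left, right) wheel angular velocities for an action index."""
    return WHEEL_SPEEDS[action]
```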

Next, in step S5, the behavior learning unit 3 gives the controller a reward $r_t$ according to the state. For the line tracer described above, for example, a reward can be used whose penalty grows as the center of the robot moves farther off the line. Number the eight columns of sensors shown in FIG. 3 as column 0, column 1, ..., column 7 from the left as seen from below the robot. The reward $r_t$ for the observed state can then be set as follows: (1) $r_t = -20.0$ when a sensor in column 0 or column 7 is on the line; (2) $r_t = -10.0$ when a sensor in column 1 or column 6 is on the line; (3) $r_t = 0.0$ when a sensor in column 2 or column 5 is on the line; (4) $r_t = 1.0$ when a sensor in column 3 or column 4 is on the line; and (5) $r_t = -100.0$ when all sensors are off the line. Conditions (1) to (4) are applied with priority in the order (4), (3), (2), (1).
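
The reward rule and its priority order can be written as a short function. This is a sketch under one assumption about the interface: the caller passes the set of column indices (0 to 7) in which at least one sensor currently detects the line. The function name is hypothetical.

```python
def reward(columns_on_line):
    """Return r_t given the columns (0..7) whose sensors are on the line."""
    cols = set(columns_on_line)
    if not cols <= set(range(8)):
        raise ValueError("column indices must be between 0 and 7")
    if not cols:
        return -100.0          # (5) every sensor is off the line
    # Conditions are checked in the priority order (4), (3), (2), (1).
    if cols & {3, 4}:
        return 1.0             # (4) a center column sees the line
    if cols & {2, 5}:
        return 0.0             # (3)
    if cols & {1, 6}:
        return -10.0           # (2)
    return -20.0               # (1) only column 0 or 7 sees the line
```

For example, `reward({0, 4})` returns 1.0 because condition (4) takes priority over condition (1).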

Next, in step S6, the behavior learning unit 3 causes the controller to update the Q value according to equation (1) above. By equation (1), Q(s_t, a_t) is updated using the reward r_t obtained for the action a_t taken by the robot. If the reward is positive, Q(s_t, a_t) increases, so the robot becomes more likely to take that action the next time it is in the same state. For example, in the case of the line tracer described above, a learning rate α = 0.8 and a discount rate γ = 0.999 can be used in equation (1).
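Equation (1) is not reproduced in this passage. The sketch below assumes it is the standard one-step Q-learning update, Q(s_t, a_t) ← Q(s_t, a_t) + α [r_t + γ max_a Q(s_{t+1}, a) − Q(s_t, a_t)]; if equation (1) differs, the code should be adjusted accordingly. The table layout and function names are hypothetical.

```python
from collections import defaultdict

ALPHA = 0.8    # learning rate from the text
GAMMA = 0.999  # discount rate from the text
N_ACTIONS = 5  # five actions in the line tracer example


def make_q_table():
    """Q table mapping a state key to a list of per-action values (all 0.0)."""
    return defaultdict(lambda: [0.0] * N_ACTIONS)


def q_update(q, s, a, r, s_next, alpha=ALPHA, gamma=GAMMA):
    """Apply one standard Q-learning update in place and return the new value.

    Assumption: this is the form of equation (1); the patent text shown here
    only states the parameter values.
    """
    target = r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

Starting from an all-zero table, a reward of 1.0 raises the corresponding Q value to 0.8, which matches the statement that a positive reward makes the same action more likely next time.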

Next, in step S7, the behavior learning unit 3 determines whether the time step t has reached the maximum number of steps and one trial has therefore ended. If the maximum number of steps has not been reached, the time step t is advanced to t + 1 and the process returns to step S3 to continue the subsequent processing; if the maximum number of steps has been reached, the process proceeds to step S8. For example, in the case of the line tracer described above, a maximum number of steps of 2000 can be used. A trial also ends when the line tracer reaches the goal within the maximum number of steps, in which case the process likewise proceeds to step S8.

Next, in step S8, the behavior learning unit 3 determines whether the number of trials has reached the number of episodes and learning has therefore ended. If learning has not ended, the learning conditions and the like are initialized and the process returns to step S3 to continue with the next trial; if learning has ended, the process proceeds to step S9. For example, in the case of the line tracer described above, 100 episodes can be used, and the position and orientation of the robot are initialized at every trial (episode) before the next trial continues.
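The two nested termination checks of steps S7 and S8 can be sketched as a pair of loops. The environment below is a hypothetical stand-in for the simulator (it simply reports the goal after a fixed number of steps), and steps S3 to S6 (observe, act, reward, update) are omitted.

```python
MAX_STEPS = 2000   # maximum number of steps per trial, from the text
N_EPISODES = 100   # number of episodes, from the text


class ToyEnv:
    """Hypothetical simulator stand-in: the goal is reached after a fixed step count."""

    def __init__(self, steps_to_goal):
        self.steps_to_goal = steps_to_goal
        self.t = 0

    def reset(self):
        self.t = 0  # re-initialize the robot's position and orientation

    def step(self):
        self.t += 1  # steps S3-S6 would run here in a real simulator
        return self.t >= self.steps_to_goal  # True once the goal is reached


def run_learning(env, n_episodes=N_EPISODES, max_steps=MAX_STEPS):
    """Return the number of steps used in each trial."""
    steps_per_trial = []
    for _ in range(n_episodes):                # step S8: repeat for every episode
        env.reset()
        steps = 0
        for steps in range(1, max_steps + 1):  # step S7: stop at the step limit
            if env.step():                     # or as soon as the goal is reached
                break
        steps_per_trial.append(steps)
    return steps_per_trial
```

With `ToyEnv(10)` every trial ends after 10 steps; with a goal that is never reached every trial runs for the full 2000 steps.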

When learning has ended, in step S9, the behavior learning unit 3 calculates the fitness based on the learning result and outputs the fitness to the selection unit 4 together with the data of each robot that has performed learning. For example, in the case of the line tracer described above, 100 trials are performed per individual, and the fitness φ is calculated by equation (2) below.

Here, N is the number of episodes (100), H_i is the degree of achievement of the i-th trial, t_i is the time (s) taken to reach checkpoint P_i shown in FIG. 4, and T_max is the maximum trial time (128 s). According to the above equation, the further an individual (robot) has progressed in learning, the higher its degree of achievement in each trial and the higher its fitness. In other words, an individual with high fitness can be said to have a sensor form that makes learning easy. The method of calculating the fitness is not limited to the above example, and various modifications are possible.

Next, in step S10, the selection unit 4 determines, based on the robot data output from the behavior learning unit 3, whether the robots that have finished learning are the final-generation robots. If they are, the selection unit 4 selects the robot with the highest fitness and outputs its data to the output unit 6 as the best individual, and the output unit 6 displays the genotype of the best individual's sensors. For example, in the case of the line tracer described above, 50 is used as the number of the final generation, and after the 50th-generation robots have finished learning, the genotype representing the positions and number of sensors of the robot with the highest fitness is displayed.

On the other hand, if the robots are not the final-generation robots, in step S11 the selection unit 4 follows an elite strategy: it selects a predetermined number of robots with the highest fitness and outputs their data to the next generation creation unit 5 as parent individuals. For example, in the case of the line tracer described above, the number of elites is set to 5, and of the 50 robots, the 5 robots ranked in the top five by fitness are selected as parent individuals and are carried over to the next generation unconditionally.
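Elite selection as described in step S11 amounts to ranking by fitness and keeping the top few. A minimal sketch follows; the function name and the parallel-list representation of the population are assumptions made for illustration.

```python
def select_elites(population, fitness, n_elite=5):
    """Return the n_elite individuals with the highest fitness, best first.

    population and fitness are parallel lists: fitness[i] belongs to
    population[i]. n_elite defaults to 5 as in the line tracer example.
    """
    ranked = sorted(range(len(population)), key=lambda i: fitness[i], reverse=True)
    return [population[i] for i in ranked[:n_elite]]
```

For a population of 50 robots, `select_elites(robots, scores)` would return the five parents that are carried over unconditionally.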

Next, in step S12, the selection unit 4 follows tournament selection: it randomly picks a predetermined number of individuals from the remaining robots that were not selected, selects the individual with the highest fitness among them, and repeats this process until the population size is reached, thereby selecting parent individuals and outputting their data to the next generation creation unit 5. For example, in the case of the line tracer described above, 45 robots are selected as parent individuals by tournament selection from the 45 robots that were not selected as elites.
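Tournament selection as described in step S12 can be sketched as follows. The tournament size `k` is an assumed value (the text only says "a predetermined number"), and the function name is hypothetical.

```python
import random


def tournament_select(candidates, fitness, n_parents, k=2, rng=None):
    """Pick n_parents winners; each is the fittest of k randomly drawn candidates.

    k (the tournament size) is an assumption; the text does not give its value.
    The same candidate may win more than one tournament.
    """
    rng = rng or random.Random()
    parents = []
    for _ in range(n_parents):
        entrants = rng.sample(range(len(candidates)), k)  # k distinct entrants
        winner = max(entrants, key=lambda i: fitness[i])  # fittest entrant wins
        parents.append(candidates[winner])
    return parents
```

In the line tracer example this would be called with the 45 non-elite robots and `n_parents=45`.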

Next, in step S13, the next generation creation unit 5 creates next-generation genotypes by crossover and mutation from the genotypes of the parent individuals selected by the selection unit 4, virtually creates a plurality of next-generation robots having the sensor forms specified by those genotypes, and outputs the data of the created next-generation robots to the behavior learning unit 3. For example, in the case of the line tracer described above, genetic operations are applied to the 50 robots selected as parent individuals with a mutation rate of 0.03 and a crossover rate of 1.0, and 50 next-generation robots are created.
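The genetic operations of step S13 can be illustrated with a small sketch. Several details here are assumptions not given in this passage: the genotype is treated as a list of bits, crossover is one-point, and mutation flips individual bits. Only the two rates come from the text.

```python
import random

MUTATION_RATE = 0.03   # from the text
CROSSOVER_RATE = 1.0   # from the text


def crossover(p1, p2, rng, rate=CROSSOVER_RATE):
    """One-point crossover of two equal-length bit lists (an assumed operator)."""
    if rng.random() >= rate or len(p1) < 2:
        return p1[:], p2[:]                # no crossover: return copies
    cut = rng.randrange(1, len(p1))        # cut point strictly inside the list
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]


def mutate(genotype, rng, rate=MUTATION_RATE):
    """Flip each bit independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in genotype]


def next_generation(parents, rng):
    """Pair parents off in order and return one child per parent."""
    children = []
    for i in range(0, len(parents) - 1, 2):
        c1, c2 = crossover(parents[i], parents[i + 1], rng)
        children += [mutate(c1, rng), mutate(c2, rng)]
    if len(parents) % 2:                   # odd parent out: mutation only
        children.append(mutate(parents[-1], rng))
    return children
```

Given 50 parent genotypes, `next_generation(parents, random.Random())` returns 50 child genotypes, one per next-generation robot.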

After step S13, the process returns to step S3 and the next-generation robots perform learning. Further generations of robots are then created one after another by the genetic algorithm, and steps S3 to S13 are repeated until the final-generation robots have finished learning.
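The overall generational loop can be sketched end to end. This is only one possible reading of steps S10 to S13: elites are copied unchanged and the other 45 children come from tournament winners. The fitness function is a loudly labeled placeholder for the Q-learning of steps S3 to S9, and the genotype length, tournament size of 2, and one-point crossover are assumptions.

```python
import random

POP_SIZE, N_ELITE, N_GENERATIONS = 50, 5, 50   # values from the text
MUTATION_RATE = 0.03                            # value from the text
GENOME_LEN = 16                                 # assumed genotype length


def toy_fitness(genotype):
    """Placeholder for steps S3-S9 (Q-learning plus the fitness calculation)."""
    return sum(genotype) / len(genotype)


def evolve(rng, fitness_fn=toy_fitness, generations=N_GENERATIONS):
    """Run the generational loop and return the best final-generation genotype."""
    pop = [[rng.randint(0, 1) for _ in range(GENOME_LEN)] for _ in range(POP_SIZE)]
    for _ in range(generations - 1):
        pop.sort(key=fitness_fn, reverse=True)
        elites, rest = pop[:N_ELITE], pop[N_ELITE:]        # step S11: elites
        children = []
        while len(children) < POP_SIZE - N_ELITE:
            # step S12: two size-2 tournaments among the non-elite robots
            p1 = max(rng.sample(rest, 2), key=fitness_fn)
            p2 = max(rng.sample(rest, 2), key=fitness_fn)
            cut = rng.randrange(1, GENOME_LEN)             # step S13: crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < MUTATION_RATE else g for g in child]
            children.append(child)
        pop = [e[:] for e in elites] + children
    return max(pop, key=fitness_fn)                        # step S10: best individual
```

Because the elites are copied unchanged, the best fitness in the population never decreases from one generation to the next in this sketch.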

Through the above processing, in this embodiment, a plurality of genotypes for specifying the sensor form are created, a plurality of initial-generation robots having the sensor forms specified by those genotypes are virtually created, the initial-generation robots are made to perform Q-learning, and the fitness of each robot is calculated based on its learning result.

Next, based on the calculated fitness of each robot, a predetermined number of robots with high learning performance are selected from the plurality of robots as parent individuals without modification, and the same number of robots are selected as parent individuals from the remaining robots by tournament selection. Next-generation genotypes are created from the genotypes of the selected parent individuals by crossover and mutation, and a plurality of robots having the sensor forms specified by those genotypes are virtually created as next-generation robots. The next-generation robots are made to perform Q-learning again, and the fitness of each robot is calculated based on the learning result. The sensor form is determined by repeating this next-generation creation processing and learning processing up to the final generation.

Therefore, based on the learning results, robots with high fitness, that is, robots with high learning performance, are selected as parent robots without dying out, and the genotype specifying the sensor form can be evolved. As a result, a sensor form that is advantageous for the robot's behavior learning can be designed automatically, and the sensor form can be designed autonomously from the learning results so as to construct a state space suited to the robot's learning ability.

In addition, the learning performed virtually at design time also yields a learning algorithm that is best suited to the automatically designed sensor form. Furthermore, because the automatic design is restricted to the sensors among the elements of the robot's form, the sensors, which are the interface between the physical world and the information world, can be built bottom-up, and because the actuators are fixed, the design is also easy to realize in hardware.

Next, the results of sensor design by the sensor design device are described concretely, taking the line tracer shown in FIG. 3 as an example. FIG. 6 shows how the fitness in each generation of the line tracer designed by the sensor design device shown in FIG. 1, and the number of sensors of the best individual in each generation, change over the generations. In the figure, the solid line indicates the maximum fitness, the broken line indicates the number of sensors of the best individual, and the dash-dot line indicates the average fitness over 10 runs of the experiment.

FIG. 6 shows that, for the best individual, the fitness increases as the number of sensors decreases. For example, the fitness in the 5th generation (around 7 sensors) is about 0.2, whereas the fitness in the 50th generation (around 5 sensors) has risen to about 0.4. This indicates that, in the line-tracing environment shown in FIG. 4, around 5 sensors are advantageous for learning, and that using a learner makes it possible to reduce the number of sensors below the 8 used as standard in the microcomputer car rally.

FIG. 7 shows representative examples of sensor forms designed by the sensor design device shown in FIG. 1; the black circles in the figure represent sensors. For example, the sensor arrangement shown in FIG. 7(a) was designed for four sensors, the arrangement shown in FIG. 7(b) for five sensors, and the arrangement shown in FIG. 7(c) for six sensors.

Next, learning results were compared between a sensor form designed by the sensor design device shown in FIG. 1 and sensor forms designed by hand. FIG. 8 shows examples of a sensor form designed by the sensor design device shown in FIG. 1 and of sensor forms designed by hand; the black circles in the figure represent sensors.

The sensor form shown in FIG. 8(c) was designed by the sensor design device shown in FIG. 1 and has five sensors arranged in a plane at predetermined intervals. The sensor form shown in FIG. 8(a) is the one used as standard in the microcomputer car rally and has eight sensors arranged at equal intervals on a straight line. The sensor form shown in FIG. 8(b) is a microcomputer car sensor form whose state space has the same dimension as that of FIG. 8(c), with five sensors arranged on a straight line at predetermined intervals.

FIG. 9 shows how the fitness of each sensor form shown in FIG. 8 changes with the number of episodes. This change in fitness with the number of episodes represents the change in how quickly the goal is reached. The solid line shows the change for the sensor form of FIG. 8(c), the dash-dot line for the sensor form of FIG. 8(a), and the broken line for the sensor form of FIG. 8(b).

FIG. 9 shows that the sensor form designed by the sensor design device shown in FIG. 1, that is, the sensor form of FIG. 8(c), has a higher fitness in every episode than the hand-designed sensor forms of FIG. 8(a) and (b). This tendency remained the same when the number of episodes was increased. The sensor form of FIG. 8(c) also outperforms the sensor form of FIG. 8(a), which has a larger state space, in both convergence speed and performance, showing that more effective learning is possible when the physical world is observed appropriately, as in this embodiment.

Next, the arrangement and number of sensors in the sensor form of FIG. 8(c), designed by the sensor design device shown in FIG. 1, are considered. First, the sensor form of FIG. 8(c) has the following features with respect to sensor arrangement.

(1) The sensor arrangement is left-right asymmetric. This is thought to be because the task was set up so that the robot goes around the course counterclockwise; apart from the S-curve, left curves predominate, so sensor forms that handle left curves well achieved higher fitness.

(2) The sensors are distributed in the front-rear direction. With sensors arranged in a single transverse row, the front-rear relationship of the line cannot be read, so the line tracer cannot tell whether it is on a curve or on a straight section. With the sensors distributed front to rear, the shape of the line, such as a straight section or a curve, can be read from the front-rear relationship of the line, which is thought to be the reason for this arrangement.

Next, regarding the number of sensors, many of the automatically designed sensor forms have around five sensors. The number of sensors determines the dimension of the state space and therefore affects both the learning speed and the amount of information that can be acquired. In general, learning converges faster with fewer dimensions, but the performance after convergence is expected to be lower than with more dimensions. In the above results, however, the form with fewer sensors showed higher performance even after learning had converged. This shows that if a small state space can be used effectively, not only the convergence speed but also the learning outcome can be improved.
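The effect of the sensor count on the size of the state space can be made concrete with a short calculation. The sketch assumes that each sensor reports a binary value (line detected or not), which this passage implies but does not state; under that assumption n sensors give 2^n states.

```python
N_ACTIONS = 5  # number of actions in the line tracer example


def q_table_size(n_sensors, n_actions=N_ACTIONS):
    """Number of (state, action) entries, assuming each sensor is on/off."""
    return (2 ** n_sensors) * n_actions


for n in (5, 8):
    print(n, "sensors:", 2 ** n, "states,", q_table_size(n), "Q values")
```

Under this assumption, five sensors give 32 states and 160 Q values, while eight sensors give 256 states and 1280 Q values, so the five-sensor form has one eighth as many values to learn.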

Thus, by using this sensor design device, a state space suited to the task environment and to the robot's learning ability can be constructed, and the robot can be made to perform more adaptive behavior.

Next, a line tracer having the sensor form designed as described above (the line tracer shown in FIG. 8(c)) was actually built and run on a line-tracing course, and it was compared with the hand-designed line tracers (those shown in FIG. 8(a) and (b)). As for the controllers, the line tracer of FIG. 8(a) used a hand-coded controller obtained by improving the bundled sample program. The line tracers of FIG. 8(b) and (c) used the learning results of the design-time simulation: the controller determines its action from the Q values obtained at design time, selecting the action with the largest Q value for the state observed by the sensors.
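Selecting the action with the largest Q value for the observed state can be sketched as a table lookup. The dictionary layout of the Q table and the fallback for states never visited during the design-time simulation are assumptions made for illustration.

```python
def greedy_action(q_table, state, default_action=0):
    """Return the action with the largest Q value for the observed state.

    q_table maps a state key (here a tuple of sensor readings) to a list of
    per-action Q values. default_action is a hypothetical fallback for states
    that never occurred during the design-time simulation.
    """
    row = q_table.get(state)
    if row is None:
        return default_action
    return max(range(len(row)), key=lambda a: row[a])
```

No exploration is involved here: the controller on the physical robot only reads the table that was learned in simulation.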

Each of the above line tracers was given five trials. A trial ended when the line tracer completed 10 laps of the course or left the course. The average lap time (seconds) was measured to compare running performance, and the average number of laps completed on the course was measured to compare robustness. The line tracer of FIG. 8(c) had an average lap time of 13.5 seconds and an average of 8.4 laps; the line tracer of FIG. 8(a) had an average lap time of 15.6 seconds and an average of 4.4 laps; and the line tracer of FIG. 8(b) had an average lap time of 16.0 seconds and an average of 3.0 laps. The line tracer having the sensor form designed by the sensor design device shown in FIG. 1 was thus superior in both running performance and robustness.

Next, how each of the above line tracers makes use of its sensor arrangement is described, taking as an example the right-angle corner, which is regarded as particularly difficult in line tracing.

First, the line tracer of FIG. 8(a) uses the cross line that precedes a right-angle corner as a cue. When it reads the cross line just before the corner, it switches to a mode for getting through right-angle corners and clears the corner using a dedicated control law. In this way, a hand-written controller can exploit the designer's knowledge of the course, namely that a right-angle corner follows a cross line, and can therefore get through right-angle corners with a control law different from the one used for ordinary curves.

FIG. 10 is a schematic diagram for explaining how the line tracer of FIG. 8(c) uses its sensor form at a right-angle corner. Because this line tracer cannot use the designer's knowledge of the course, it must clear right-angle corners with the same control law as ordinary curves. When it approaches a right-angle corner, the sensors respond as shown in FIG. 10(a). Since this is the same sensor state as for a left curve, the line tracer turns slightly to the left. The course actually turns at a right angle, however, so the line tracer is about to leave the course, as shown in FIG. 10(b). At this point the fourth sensor detects the center line and the line tracer turns sharply left, so it can clear the right-angle corner.

In contrast, the line tracer of FIG. 8(b) has its sensors arranged in a single transverse row and often leaves the course at right-angle corners. This is thought to be because the sensors are not distributed front to rear, so the front-rear relationship on the course could not be learned.

As described above, the line tracer designed by the sensor design device shown in FIG. 1 observes the physical world appropriately and was therefore found to be superior to the hand-designed line tracers in running performance and robustness, as well as in learning performance.

Although a line tracer has been used as an example in the above description, the robots to which the present invention can be applied are not limited to this example; the invention is applicable to a wide variety of robots, provided that they use sensors.

FIG. 1 is a block diagram showing the configuration of a sensor design device according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining the sensor design processing performed by the sensor design device shown in FIG. 1.
FIG. 3 is a schematic bottom view showing an example of a robot to be designed by the sensor design device shown in FIG. 1.
FIG. 4 is a schematic diagram showing the task environment of the robot shown in FIG. 3.
FIG. 5 is a schematic diagram showing states of the robot on the course shown in FIG. 4.
FIG. 6 is a diagram showing changes in the fitness in each generation of the line tracer designed by the sensor design device shown in FIG. 1 and in the number of sensors of the best individual in each generation.
FIG. 7 is a diagram showing representative examples of sensor forms designed by the sensor design device shown in FIG. 1.
FIG. 8 is a diagram showing examples of a sensor form designed by the sensor design device shown in FIG. 1 and of sensor forms designed by hand.
FIG. 9 is a diagram showing the change in fitness with the number of episodes for each sensor form shown in FIG. 8.
FIG. 10 is a schematic diagram for explaining how the line tracer shown in FIG. 8(c) uses its sensor form at a right-angle corner.

Explanation of symbols

1 Input unit
2 Initial generation creation unit
3 Behavior learning unit
4 Selection unit
5 Next generation creation unit
6 Output unit

Claims

A sensor design device for designing the form of a sensor used for robot behavior learning,
An initial generation creating means for creating a plurality of genotypes for specifying the form of the sensor and virtually creating a plurality of robots having the form of the sensor specified by each genotype;
Learning means for virtually learning a plurality of robots created by the initial generation creating means, and calculating fitness of each robot based on a learning result;
Based on the fitness of each robot calculated by the learning means, select a plurality of robots that are parent individuals, create a next-generation genotype based on a genetic algorithm from the genotypes of the selected plurality of robots, A next generation creation means for virtually creating a plurality of robots having a sensor form specified by the next generation genotype,
The learning means causes the robot created by the next generation creating means to virtually perform learning again, calculates the fitness of each robot based on the learning result,
wherein the sensor form is determined by repeating the processing by the next generation creation means and the learning means for a predetermined number of generations.

The sensor design device according to claim 1, wherein the learning means causes the robots to perform learning using Q-learning.

The sensor design device according to claim 1, wherein the next generation creation means includes:
selection means for selecting, based on the fitness of each robot calculated by the learning means, a predetermined number of robots with high learning performance from the plurality of robots as parent individuals, and for selecting the same number of robots from the remaining robots as parent individuals by tournament selection; and
creation means for creating next-generation genotypes, based on a genetic algorithm, from the genotypes of the parent individuals selected by the selection means, and for virtually creating, as next-generation robots, a plurality of robots having the sensor forms specified by the next-generation genotypes,
and wherein the learning means virtually causes the next-generation robots created by the creation means to perform learning again and calculates the fitness of each robot based on the learning result.

The sensor design device according to any one of claims 1 to 3, wherein the genotype specifies at least one of the position, number, resolution, and sensing interval of the sensor.

The sensor design device according to claim 1, wherein the next generation creation means creates next-generation genotypes from the genotypes of the robots serving as parent individuals using at least one of crossover and mutation, and virtually creates a plurality of robots having the sensor forms specified by those genotypes.

A sensor design method for designing a form of a sensor used for behavioral learning of a robot using a sensor design apparatus including an initial generation creation means, a learning means, and a next generation creation means,
A first step in which the initial generation creating means creates a plurality of genotypes for specifying the form of the sensor and virtually creates a plurality of robots having the form of the sensor specified by each genotype; ,
A second step in which the learning means virtually causes the plurality of robots created by the initial generation creation means to perform learning, and calculates the fitness of each robot based on the learning results;
The next generation creation means selects a plurality of robots as parent individuals based on the fitness of each robot calculated by the learning means, and generates a next generation based on a genetic algorithm from a genotype of the selected plurality of robots. A third step of creating a genotype and virtually creating a plurality of robots having the form of a sensor specified by each next generation genotype;
The learning means includes a fourth step of causing the robot created in the third step to virtually re-learn and calculating the fitness of each robot based on the learning result;
wherein the sensor form is determined by repeating the processing of the third and fourth steps for a predetermined number of generations.

A sensor design program for designing the form of a sensor used for robot behavior learning,
An initial generation creating means for creating a plurality of genotypes for specifying the form of the sensor and virtually creating a plurality of robots having the form of the sensor specified by each genotype;
Learning means for virtually learning a plurality of robots created by the initial generation creating means, and calculating fitness of each robot based on a learning result;
Based on the fitness of each robot calculated by the learning means, select a plurality of robots that are parent individuals, create a next-generation genotype based on a genetic algorithm from the genotypes of the selected plurality of robots, The computer functions as a next generation creation means for virtually creating a plurality of robots having the form of a sensor specified by the next generation genotype,
The learning means causes the robot created by the next generation creating means to virtually perform learning again, calculates the fitness of each robot based on the learning result,
wherein the sensor form is determined by repeating the processing by the next generation creation means and the learning means for a predetermined number of generations.

A robot having a sensor designed by the sensor design device according to claim 1.