JP7237891B2

JP7237891B2 - LEARNING EXECUTION DEVICE, PROGRAM, AND LEARNING EXECUTION METHOD

Info

Publication number: JP7237891B2
Application number: JP2020121597A
Authority: JP
Inventors: 裕子石若; 智博吉田; 忠輝伊藤
Original assignee: SoftBank Corp
Current assignee: SoftBank Corp
Priority date: 2020-07-15
Filing date: 2020-07-15
Publication date: 2023-03-13
Anticipated expiration: 2040-07-15
Also published as: JP2023085258A; JP2022018477A; JP2023130362A; JP7379742B2; JP7379750B2

Description

The present invention relates to a learning execution device, a program, and a learning execution method.

In the field of CG (computer graphics), simulation techniques based on muscle contraction have been known (see, for example, Non-Patent Documents 1 to 6). Conventional simulation techniques have used so-called Hill-type models, so-called CPGs (central pattern generators), and the like.
[Prior Art Documents]
[Non-Patent Documents]
[Non-Patent Document 1] Thomas Geijtenbeek, Michiel van de Panne, A. Frank van der Stappen. Flexible muscle-based locomotion for bipedal creatures. ACM Transactions on Graphics, 32(6):206, 2013.
[Non-Patent Document 2] Jack M. Wang, Samuel R. Hamner, Scott L. Delp, Vladlen Koltun. Optimizing locomotion controllers using biologically-based actuators and objectives. ACM Transactions on Graphics, 31(4), 2012.
[Non-Patent Document 3] Yoonsang Lee, Moon Seok Park, Taesoo Kwon, Jehee Lee. Locomotion control for many-muscle humanoids. ACM Transactions on Graphics, 33(6), 2014.
[Non-Patent Document 4] Sehee Min, Jungdam Won, Seunghwan Lee, Jungnam Park, Jehee Lee. SoftCon: Simulation and control of soft-bodied animals with biomimetic actuators. ACM Transactions on Graphics, 38(6):208:1-208:12, 2019.
[Non-Patent Document 5] Cecilia Laschi, Matteo Cianchetti, Barbara Mazzolai, Laura Margheri, Maurizio Follador, Paolo Dario. Soft robot arm inspired by the octopus. Advanced Robotics, 26(7):709-727, 2012.
[Non-Patent Document 6] Jungdam Won, Jongho Park, Kwanyu Kim, Jehee Lee. How to train your dragon: Example-guided control of flapping flight. ACM Transactions on Graphics, 36(4):1:1-1:12, 2017.

According to a first aspect of the present invention, a learning execution device is provided. The learning execution device may include a storage unit that stores a muscle model, which operates a muscle by contracting muscle fibers connected to motor neurons included in motor units according to a firing pattern of a plurality of interneurons, each of which is connected to a motor unit. The learning execution device may include a motion setting unit that sets a target motion of the muscle model. The learning execution device may include a learning execution unit that learns firing patterns; by executing learning that rewards, among a plurality of firing patterns, those under which the motion of the muscle model is closer to the target motion, the learning execution unit learns a firing pattern that realizes the target motion.

The learning execution unit may learn the firing pattern that realizes the target motion by repeating a cycle of operating the muscle model according to each of the plurality of firing patterns and generating a new plurality of firing patterns based on the firing pattern under which the motion of the muscle model was closer to the target motion. The plurality of firing patterns used in the first cycle may be generated randomly, or may be generated based on an already learned firing pattern.
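The repeated evaluate-and-regenerate cycle above can be sketched as a simple elitist evolutionary search over binary firing patterns. This is a minimal sketch under stated assumptions, not the claimed implementation: the list-of-lists pattern representation, the bit-flip mutation, the population sizes, and the `reward` callable (a hypothetical stand-in for running the muscle model and scoring how close its motion is to the target) are all illustrative choices. The optional `initial` argument corresponds to starting from an already learned firing pattern instead of random ones.

```python
import random
from typing import Callable, List, Optional

# Hypothetical representation: pattern[t][i] is True when interneuron i is ON at step t.
Pattern = List[List[bool]]


def random_pattern(n_neurons: int, n_steps: int, rng: random.Random) -> Pattern:
    """Randomly generated firing pattern, used when no learned pattern is available."""
    return [[rng.random() < 0.5 for _ in range(n_neurons)] for _ in range(n_steps)]


def mutate(pattern: Pattern, flip_prob: float, rng: random.Random) -> Pattern:
    """Derive a new pattern from a good one by flipping each ON/OFF bit with probability flip_prob."""
    return [[(not bit) if rng.random() < flip_prob else bit for bit in step] for step in pattern]


def learn_firing_pattern(
    reward: Callable[[Pattern], float],  # hypothetical: higher = motion closer to the target
    n_neurons: int,
    n_steps: int,
    population: int = 32,
    elites: int = 4,
    iterations: int = 200,
    flip_prob: float = 0.05,
    seed: int = 0,
    initial: Optional[Pattern] = None,
) -> Pattern:
    rng = random.Random(seed)
    if initial is None:
        candidates = [random_pattern(n_neurons, n_steps, rng) for _ in range(population)]
    else:
        # Warm start: keep the learned pattern and add variations of it.
        candidates = [initial] + [mutate(initial, flip_prob, rng) for _ in range(population - 1)]
    best = max(candidates, key=reward)
    for _ in range(iterations):
        # "Operate the muscle model" under every candidate and rank them by reward.
        ranked = sorted(candidates, key=reward, reverse=True)
        if reward(ranked[0]) > reward(best):
            best = ranked[0]
        # Generate the next candidates from the patterns closest to the target motion.
        parents = ranked[:elites]
        candidates = parents + [
            mutate(rng.choice(parents), flip_prob, rng) for _ in range(population - elites)
        ]
    return best
```

Because the elite parents are carried over unchanged, the best reward never decreases from one cycle to the next; a real system would replace `reward` with a simulation of the muscle model.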
The learning execution unit may grow the motor unit that contracted its muscle fibers when the muscle model is operated based on the firing pattern. The muscle model may include fast-twitch motor units and slow-twitch motor units, and when the muscle model is operated based on the firing pattern, the learning execution unit may grow the fast-twitch motor units and the slow-twitch motor units according to different criteria. The information storage unit may store, for each motor unit, a first parameter indicating whether the motor unit is fast-twitch or slow-twitch, a second parameter indicating contractile energy, the maximum value of the second parameter, a third parameter indicating self-recovery capacity, and the maximum value of the third parameter; the learning execution unit may execute learning using the first parameter, the second parameter and its maximum value, and the third parameter and its maximum value. The learning execution unit may subtract a predetermined value from the second parameter each time the motor unit contracts and, while the third parameter is not 0, restore the second parameter over time. When the motor unit is fast-twitch, the information storage unit may store a fourth parameter indicating the amount of energy consumed each time the motor unit contracts, and the learning execution unit may execute learning using the first, second, third, and fourth parameters together with the maximum values of the second and third parameters. Each time the motor unit contracts, the learning execution unit may subtract the value of the fourth parameter from the second parameter if the motor unit is fast-twitch, and a value other than the value of the fourth parameter if the motor unit is slow-twitch.
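The per-contraction bookkeeping described above can be sketched as a small state object. This is a hedged illustration: the field names mirror the HP, MAXHP, MP, MAXMP, and DIAM parameters introduced in the embodiment, but the fixed `slow_cost` for slow-twitch units, the recovery rate, the rule that recovery spends MP one-for-one, and the "HP back at MAXHP" recovery criterion are assumptions, since the text only states that HP recovers over time while MP is not 0.

```python
from dataclasses import dataclass


@dataclass
class MotorUnitEnergy:
    fast: bool               # first parameter: True = fast-twitch, False = slow-twitch
    hp: float                # second parameter: contractile energy (HP)
    max_hp: float            # maximum value of HP (MAXHP)
    mp: float                # third parameter: self-recovery capacity (MP)
    max_mp: float            # maximum value of MP (MAXMP)
    diam: float = 1.0        # fourth parameter: energy used per contraction when fast-twitch (DIAM)
    slow_cost: float = 0.5   # hypothetical per-contraction cost for slow-twitch units
    damaged: bool = False

    def contract(self) -> bool:
        """Contract once, spending energy; return False if no energy is left."""
        if self.hp <= 0:
            return False
        cost = self.diam if self.fast else self.slow_cost
        self.hp = max(0.0, self.hp - cost)
        if self.hp == 0:
            self.damaged = True  # HP reaching zero is treated as fiber damage
        return True

    def recover(self, rate: float) -> None:
        """Restore HP for one time step while MP remains (assumed: MP is spent 1:1)."""
        if self.mp <= 0:
            return
        amount = min(rate, self.max_hp - self.hp, self.mp)
        self.hp += amount
        self.mp -= amount
        if self.damaged and self.hp >= self.max_hp:
            self.damaged = False  # assumed recovery criterion: HP back at MAXHP
```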
When the learning execution unit determines that a muscle fiber it had judged to be damaged has recovered, it may increase the maximum value of the second parameter and the value of the fourth parameter if the motor unit is fast-twitch, and may increase the maximum value of the third parameter if the motor unit is slow-twitch. In that case, the maximum value of the third parameter need not be increased for a fast-twitch motor unit, and the maximum value of the second parameter need not be increased for a slow-twitch motor unit. The learning execution unit may determine that the muscle fiber is damaged when the second parameter reaches zero. The information storage unit may store, for each motor unit, a fifth parameter related to the use of that motor unit; the learning execution unit may raise the level of the motor unit as the fifth parameter increases, and the higher the level, the harder it may become to increase the maximum value of the second parameter and the value of the fourth parameter for a fast-twitch unit, or the maximum value of the third parameter for a slow-twitch unit. The learning execution unit may learn the firing pattern while preventing a motor unit that has just contracted from contracting again until a predetermined refractory period has elapsed, and may shorten the refractory period as the temperature of the motor unit rises.
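The growth and timing rules above can be sketched as follows. Every numeric rule here is hypothetical: one level per 100 uses of EXP, the `base_gain / (1 + level)` damping, the 10% coupling between MAXHP and DIAM growth, and the linear temperature dependence of the refractory period are illustrative choices that only preserve the qualitative behavior the text describes (fast-twitch units grow MAXHP and DIAM but not MAXMP, slow-twitch units grow MAXMP but not MAXHP, growth slows at higher levels, and the refractory period shortens as temperature rises).

```python
from dataclasses import dataclass


@dataclass
class GrowthState:
    fast: bool       # first parameter: True = fast-twitch
    max_hp: float    # maximum value of the second parameter (MAXHP)
    max_mp: float    # maximum value of the third parameter (MAXMP)
    diam: float      # fourth parameter (DIAM)
    exp: int = 0     # fifth parameter: usage count (EXP)

    def level(self) -> int:
        # Hypothetical levelling rule: one level per 100 uses.
        return self.exp // 100

    def grow_after_recovery(self, base_gain: float = 1.0) -> None:
        """Apply growth once a damaged unit has been judged to have recovered."""
        gain = base_gain / (1 + self.level())  # higher level -> smaller gain (assumed form)
        if self.fast:
            self.max_hp += gain         # fast-twitch: MAXHP and DIAM grow; MAXMP unchanged
            self.diam += 0.1 * gain
        else:
            self.max_mp += gain         # slow-twitch: MAXMP grows; MAXHP unchanged


def refractory_period(base: float, temperature: float, ref_temp: float = 37.0,
                      k: float = 0.05, floor: float = 0.1) -> float:
    """Refractory period that shortens as the unit's temperature rises (assumed linear form)."""
    return max(floor, base * (1.0 - k * (temperature - ref_temp)))


def can_contract(last_contraction: float, now: float, period: float) -> bool:
    """A unit may contract again only after its refractory period has elapsed."""
    return now - last_contraction >= period
```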
When the motion of the muscle model operated according to the time series of firing patterns achieves the target motion, the learning execution unit may perform the learning by updating the firing patterns of the states going back a predetermined time from the state in which the target motion was achieved.
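One possible reading of this look-back update is a credit-assignment window: when the target motion is reached at some step, only the firing decisions within a predetermined number of steps before it are reinforced. The sketch below assumes a hypothetical per-step, per-interneuron firing-probability table and a hypothetical learning rate `lr`; neither is specified in the text.

```python
from typing import List


def backtrack_update(
    probs: List[List[float]],    # hypothetical table: probs[t][i] = P(interneuron i is ON at step t)
    executed: List[List[bool]],  # firing patterns actually used in this episode
    success_step: int,           # step at which the target motion was achieved
    window: int,                 # predetermined look-back length, in steps
    lr: float = 0.5,             # hypothetical learning rate
) -> None:
    """Move the firing probabilities of the look-back window toward the executed pattern."""
    start = max(0, success_step - window)
    for t in range(start, success_step + 1):
        for i, fired in enumerate(executed[t]):
            target = 1.0 if fired else 0.0
            probs[t][i] += lr * (target - probs[t][i])
```

Steps earlier than the window, and any steps after the success step, are left unchanged.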

According to a second aspect of the present invention, a learning execution device is provided. The learning execution device may include an information storage unit that stores, for each of a plurality of muscle fibers included in a muscle, a first parameter indicating whether the muscle fiber is fast-twitch or slow-twitch, a second parameter indicating contractile energy, the maximum value of the second parameter, a third parameter indicating self-recovery capacity, and the maximum value of the third parameter. The learning execution device may include a learning execution unit that learns a model of the muscle by executing learning using the first parameter, the second parameter, the maximum value of the second parameter, the third parameter, and the maximum value of the third parameter.

When the muscle fiber is fast-twitch, the information storage unit may store a fourth parameter indicating the amount of energy consumed each time the muscle fiber contracts, and the learning execution unit may execute learning using the first parameter, the second parameter, the maximum value of the second parameter, the third parameter, the maximum value of the third parameter, and the fourth parameter. The learning execution unit may learn the muscle model by subtracting a predetermined value from the second parameter each time the muscle fiber contracts; restoring the second parameter over time while the third parameter is not 0; and, when it determines that a muscle fiber it had judged to be damaged has recovered, increasing the maximum value of the second parameter and the value of the fourth parameter if the muscle fiber is fast-twitch, or increasing the maximum value of the third parameter if the muscle fiber is slow-twitch. Each time the muscle fiber contracts, the learning execution unit may subtract the value of the fourth parameter from the second parameter if the muscle fiber is fast-twitch, and a value other than the value of the fourth parameter if the muscle fiber is slow-twitch. When the learning execution unit determines that a damaged muscle fiber has recovered, it need not increase the maximum value of the third parameter if the muscle fiber is fast-twitch, and need not increase the maximum value of the second parameter if the muscle fiber is slow-twitch.

According to a third aspect of the present invention, a program is provided for causing a computer to function as the learning execution device.

According to a fourth aspect of the present invention, a learning execution method executed by a computer is provided. The learning execution method may include a motion setting step of setting a target motion of a muscle model that operates a muscle by contracting muscle fibers connected to motor neurons included in motor units according to a firing pattern of a plurality of interneurons, each of which is connected to a motor unit. The learning execution method may include a learning execution step of learning a firing pattern that realizes the target motion by executing learning that rewards, among a plurality of firing patterns, those under which the motion of the muscle model is closer to the target motion.

According to a fifth aspect of the present invention, a learning execution method executed by a computer is provided. The learning execution method may include a storage step of storing, for each of a plurality of muscle fibers included in a muscle, a first parameter indicating whether the muscle fiber is fast-twitch or slow-twitch, a second parameter indicating contractile energy, the maximum value of the second parameter, a third parameter indicating self-recovery capacity, and the maximum value of the third parameter. The learning execution method may include a learning execution step of learning a model of the muscle by executing learning using the first parameter, the second parameter, the maximum value of the second parameter, the third parameter, and the maximum value of the third parameter.

Note that the above summary of the invention does not list all the necessary features of the present invention. Subcombinations of these feature groups may also constitute inventions.

FIG. 1 schematically shows an example of a learning execution device 100.
FIG. 2 schematically shows an example of a muscle model 300.
FIG. 3 schematically shows an example of a firing pattern 400.
FIG. 4 schematically shows an example of the functional configuration of the learning execution device 100.
FIG. 5 schematically shows a specific example of the muscle model 300.
FIG. 6 schematically shows an example of the hardware configuration of a computer 1200 functioning as the learning execution device 100.

The present invention will be described below through embodiments of the invention, but the following embodiments do not limit the invention according to the claims. In addition, not all combinations of features described in the embodiments are necessarily essential to the solution of the invention.

FIG. 1 schematically shows an example of the learning execution device 100. The learning execution device 100 executes learning for causing a muscle model, which models the movement of muscles, to perform a target motion.

The muscle model corresponds, for example, to some of the muscles of a person. The muscle model may correspond to all of a person's muscles. The muscle model is not limited to humans and may correspond to any organism having muscles. The muscle model may also correspond to a CG character or the like.

The learning execution device 100 according to the present embodiment stores, for example, a muscle model that operates a muscle by contracting muscle fibers connected to motor neurons included in motor units according to a firing pattern of a plurality of interneurons, each of which is connected to a motor unit. Interneurons are also called intermediate neurons, and motor neurons are also called motoneurons.

The learning execution device 100 learns a firing pattern with which the muscle model achieves the target motion. For example, the learning execution device 100 learns the firing pattern by executing learning that rewards, among a plurality of randomly generated firing patterns, those under which the motion of the muscle model is closer to the target motion.

Hill-type models, CPGs, and the like are known as conventional simulation techniques based on muscle contraction. In the conventional techniques, parameters were set manually to simulate motion, so making the muscle model perform a different motion required setting all of the parameters by hand again. In contrast, the learning execution device 100 according to the present embodiment can automatically learn a firing pattern that realizes the target motion, which eliminates the need to set parameters individually for each type of motion.

While learning progresses, the learning execution device 100 may grow the muscles of the muscle model when the muscle model is operated based on a firing pattern. For example, when the muscle model is operated based on a firing pattern, the learning execution device 100 grows the motor units that contracted their muscle fibers. In the conventional techniques, depending on how the parameters were set, the simulated motion could differ from actual muscle movement. In contrast, the learning execution device 100 according to the present embodiment also takes muscle growth into account, which makes more realistic motion achievable.

The learning execution device 100 may be applied in various fields. For example, the learning execution device 100 learns a firing pattern that causes a CG character to perform an arbitrary motion, and generates a CG animation of the character performing that motion.

Conventionally, animation had to be crafted by hand to make a character perform an arbitrary motion. With the learning execution device 100 according to the present embodiment, a CG animation of a character performing an arbitrary motion can be generated automatically, for example by learning interneuron firing patterns that perform the target motion while growing the muscles of the muscle model. For example, if a dance motion is set as the target motion, a CG animation of the character performing that dance can be generated automatically. Because the learning execution device 100 learns firing patterns originating from interneurons and thereby reproduces motion through the same control pathway as real organisms, it can achieve realistic motion.

Further, in the conventional techniques, when, for example, a character with a three-heads-tall body proportion was made to perform the dance of a human with an eight-heads-tall proportion, the movements sometimes failed to correspond, resulting in unnatural motion. In contrast, the learning execution device 100 according to the present embodiment can give the three-heads-tall character natural movement by executing learning that takes into account the structure and growth of that character's muscles.

The learning execution device 100, for example, displays the generated CG animation on a display included in the learning execution device 100. The learning execution device 100 may also, for example, transmit the generated CG animation to a communication terminal 200 via a network 20 and cause the communication terminal 200 to display it.

The communication terminal 200 may be a PC (personal computer), a tablet terminal, a smartphone, or the like. The learning execution device 100 and the communication terminal 200 may communicate via the network 20. The network 20 may include the Internet. The network 20 may include a LAN (local area network). The network 20 may include a mobile communication network. The mobile communication network may comply with any of the 3G (3rd generation) communication method, the LTE (Long Term Evolution) communication method, the 5G (5th generation) communication method, and the 6G (6th generation) or later communication methods.

The learning execution device 100 may also be applied, for example, to the field of rehabilitation. For example, the learning execution device 100 registers a muscle model of a person undergoing walking rehabilitation and registers walking as the target motion. It then advances the learning of the interneuron firing patterns and records the motions and muscle growth until the model becomes able to walk. This makes it possible to search for appropriate motions leading up to the ability to walk.

The learning execution device 100 may also be applied, for example, to the field of sports science. For example, the learning execution device 100 registers a muscle model of an athlete and registers an ideal form or the like as the target motion. It then advances the learning of the interneuron firing patterns and records the motions and muscle growth until the ideal form is acquired. This makes it possible to explore training methods.

Note that the learning execution device 100 may apply muscle growth to muscle models other than one that operates muscles according to interneuron firing patterns. For example, the learning execution device 100 applies muscle growth to a muscle model based on a Hill-type model, to a muscle model using a CPG, or to a muscle model using a DQN (Deep Q-Network). The learning execution device 100 may also apply muscle growth to any other existing model.

FIG. 2 schematically shows an example of the muscle model 300. The muscle model 300 includes a plurality of interneurons 320 in a spinal cord 310 and a plurality of motor units 330 connected to each of the interneurons 320. Each motor unit 330 includes a motor neuron 340 and muscle fibers 350 connected to the motor neuron 340. A plurality of muscle fibers 350 are connected to one motor neuron 340.

FIG. 3 schematically shows an example of a firing pattern 400. The firing pattern 400 indicates the time series of ON 402 and OFF 404 states of the plurality of interneurons 320. When the firing pattern 400 is applied to the muscle model 300, signals are input in time series from the interneurons 320 to the motor units 330, and the muscle fibers 350 of each motor unit 330 contract in accordance with ON 402. In this way, various muscle movements are realized.
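How a firing pattern drives fiber contraction can be sketched as follows, assuming a hypothetical wiring table from interneuron index to the motor units it drives and a hypothetical count of muscle fibers per motor unit. Each ON entry activates every motor unit connected to that interneuron, and all fibers of an active unit contract at that step; force generation, fiber type, and energy are deliberately left out.

```python
from typing import Dict, List

# Hypothetical wiring: interneuron index -> ids of the motor units it drives.
Wiring = Dict[int, List[str]]


def apply_firing_pattern(
    pattern: List[List[bool]],        # pattern[t][i] is True when interneuron i is ON at step t
    wiring: Wiring,
    fibers_per_unit: Dict[str, int],  # muscle fibers attached to each unit's motor neuron
) -> List[int]:
    """Return the number of contracting muscle fibers at each time step."""
    counts = []
    for step in pattern:
        active_units = set()
        for i, on in enumerate(step):
            if on:
                active_units.update(wiring.get(i, []))
        counts.append(sum(fibers_per_unit[u] for u in active_units))
    return counts
```

Using a set ensures that a motor unit reached through more than one ON interneuron is counted only once per step.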

FIG. 4 schematically shows an example of the functional configuration of the learning execution device 100. The learning execution device 100 includes an information storage unit 102, an input reception unit 104, a data reception unit 106, a motion setting unit 108, a learning execution unit 110, and a display control unit 112.

The information storage unit 102 stores various kinds of information. The information storage unit 102 may store a muscle model. The information storage unit 102 may store a muscle model that operates a muscle by contracting the muscle fibers 350 connected to the motor neurons 340 included in the motor units 330 according to the firing pattern of the plurality of interneurons 320, each of which is connected to motor units 330.

The information storage unit 102 may store relevant parameters for each of the plurality of motor units 330 included in the muscle model 300. The information storage unit 102 may store a type parameter indicating whether the motor unit 330 is fast-twitch or slow-twitch. The type parameter may be an example of the first parameter.

The information storage unit 102 may store HP, a parameter indicating contractile energy. HP may be an example of the second parameter. The information storage unit 102 may store MAXHP, which indicates the maximum value of HP.

The information storage unit 102 may likewise store MP, a parameter indicating self-recovery capacity and an example of a third parameter. It may also store MAXMP, the maximum value of MP.

When a muscle fiber 350 is fast-twitch, the information storage unit 102 may store a fourth parameter indicating the amount of energy consumed each time the muscle fiber 350 contracts. In this example, the fourth parameter is DIAM, which indicates the diameter of the muscle fiber 350. The information storage unit 102 may also store EXP, a parameter associated with use of the motor unit 330 and an example of a fifth parameter. EXP may take any of the following forms:
- a value that increases each time the motor unit 330 is used;
- a value related to the number of times the motor unit 330 has been used;
- that number of uses itself.

The input reception unit 104 receives various inputs, for example via an input device of the learning execution device 100.

The data receiving unit 106 receives various data via the network 20. For example, it receives the muscle model 300 and the parameters of the motor units 330 from the communication terminal 200 and stores them in the information storage unit 102.

The motion setting unit 108 sets a target motion for the muscle model 300. It may do so according to an input received by the input reception unit 104, or according to a setting instruction that the data receiving unit 106 receives from the communication terminal 200.

The learning execution unit 110 executes learning, and may learn firing patterns. Among a plurality of firing patterns, it rewards those that bring the motion of the muscle model 300 closer to the target motion, and in this way may learn a firing pattern that realizes the target motion. It may use reinforcement learning, a DQN (Deep Q-Network), a GA (Genetic Algorithm), or any other learning method.

For example, to learn a firing pattern that realizes a given target motion, the learning execution unit 110 proceeds as follows:
1. Generate a plurality of firing patterns at random.
2. Operate the muscle model 300 according to each pattern.
3. Generate a new set of firing patterns based on those whose resulting motion was closer to the target motion.
4. Repeat steps 2 and 3 until a firing pattern that realizes the target motion is learned.
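The generate, evaluate and regenerate loop above resembles an evolutionary search. The following Python sketch is one minimal way to realize it, under these assumptions:
- A firing pattern is a grid of 0/1 (OFF/ON) entries, with one row per time step and one column per interneuron.
- The muscle model is replaced by a hypothetical `toy_angle` function.
- Selection keeps the best few patterns and mutation flips bits at random. This scheme is an illustrative choice, not the method claimed here.

```python
import random

def toy_angle(pattern):
    """Hypothetical stand-in for the muscle model: maps the fraction of ON
    entries in a firing pattern to a knee angle between 0 and 120 degrees."""
    bits = [bit for step in pattern for bit in step]
    return 120.0 * sum(bits) / len(bits)

def random_pattern(steps, neurons, rng):
    # One row per time step, one 0/1 (OFF/ON) entry per interneuron.
    return [[rng.randint(0, 1) for _ in range(neurons)] for _ in range(steps)]

def mutate(pattern, rate, rng):
    # Flip each ON/OFF entry independently with probability `rate`.
    return [[bit ^ (rng.random() < rate) for bit in step] for step in pattern]

def learn_pattern(target_deg, steps=10, neurons=8, pop=30, elite=5,
                  generations=60, rate=0.05, seed=0):
    rng = random.Random(seed)
    error = lambda p: abs(toy_angle(p) - target_deg)
    population = [random_pattern(steps, neurons, rng) for _ in range(pop)]
    for _ in range(generations):
        population.sort(key=error)          # closest to the target first
        parents = population[:elite]        # keep the best patterns unchanged
        children = [mutate(rng.choice(parents), rate, rng)
                    for _ in range(pop - elite)]
        population = parents + children
    return min(population, key=error)

best = learn_pattern(target_deg=30.0)
print(toy_angle(best))
```

Because the best patterns are carried over unchanged, the error of the best pattern never increases from one generation to the next.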

The learning execution unit 110 may also generate candidate firing patterns from firing patterns it has already learned. For example, it may generate a plurality of firing patterns from the patterns learned for three target motions: holding the knee bent at 20 degrees, at 60 degrees, and at 90 degrees. This makes it easy to prepare candidates for holding the knee bent at an arbitrary angle. It also shortens the overall learning time compared with generating the candidates at random.
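The text does not say how new candidates are derived from the learned patterns. One plausible approach is uniform crossover between two learned patterns followed by light mutation, sketched below under these assumptions:
- The function name `seed_from_learned` is hypothetical.
- The three toy patterns merely stand in for patterns learned at 20, 60 and 90 degrees.

```python
import random

def seed_from_learned(learned, count, rate=0.02, seed=0):
    """Hypothetical seeding step: build `count` candidate firing patterns by
    uniform crossover of two previously learned patterns, then flip each
    ON/OFF entry with probability `rate`."""
    rng = random.Random(seed)
    steps, neurons = len(learned[0]), len(learned[0][0])
    candidates = []
    for _ in range(count):
        a, b = rng.sample(learned, 2)   # two distinct learned patterns
        child = [[(a[t][n] if rng.random() < 0.5 else b[t][n])
                  ^ (rng.random() < rate)
                  for n in range(neurons)]
                 for t in range(steps)]
        candidates.append(child)
    return candidates

# Toy patterns standing in for ones learned at 20, 60 and 90 degrees.
learned = {
    20: [[1, 0, 0, 0]] * 5,
    60: [[1, 1, 0, 0]] * 5,
    90: [[1, 1, 1, 0]] * 5,
}
candidates = seed_from_learned(list(learned.values()), count=10)
```

Entries on which all learned patterns agree, such as the first and last columns here, are inherited unchanged unless mutation flips them. The search therefore starts near already-useful patterns instead of from noise.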

While learning progresses, each time the muscle model 300 is operated according to a firing pattern, the learning execution unit 110 may grow the motor units 330 whose muscle fibers 350 contracted.

The muscle model 300 may include both fast-twitch and slow-twitch motor units 330. When the muscle model 300 is operated according to a firing pattern, the learning execution unit 110 may grow the fast-twitch and slow-twitch motor units 330 according to different criteria.

Each time a motor unit 330 contracts, the learning execution unit 110 may subtract a predetermined value from HP: DIAM if the motor unit 330 is fast-twitch, and 1 if it is slow-twitch. These values are not limiting. A value other than DIAM may be subtracted for a fast-twitch motor unit 330, and a value other than 1, such as DIAM, for a slow-twitch one. While MP is not 0, the learning execution unit 110 may restore HP over time; once MP reaches 0, HP need not be restored. The learning execution unit 110 may also restore MP over time.

Suppose the learning execution unit 110 determines that a muscle fiber 350 has recovered after having been damaged. It may then increase MAXHP and DIAM if the motor unit 330 is fast-twitch, and increase MAXMP if the motor unit 330 is slow-twitch.

When a damaged muscle fiber 350 is determined to have recovered, MAXMP need not be increased for a fast-twitch motor unit 330, and MAXHP need not be increased for a slow-twitch motor unit 330. The learning execution unit 110 may, for example, determine that a muscle fiber 350 is damaged when its HP reaches 0. It may determine that the fiber has recovered when HP returns to MAXHP or rises above a predetermined threshold.
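The HP/MP bookkeeping and the type-dependent growth described above can be sketched as a small state object. The following assumptions are not taken from the text:
- Healing HP draws on MP one-for-one.
- The per-step recovery rates and growth increments (`hp_rate`, `mp_rate`, `hp_gain`, `diam_gain`, `mp_gain`) are hypothetical values.
- A fiber counts as recovered once HP returns to MAXHP, one of the options the text allows.

```python
from dataclasses import dataclass

@dataclass
class MotorUnit:
    fast: bool            # type parameter: True = fast-twitch, False = slow-twitch
    hp: float             # contractile energy (HP)
    max_hp: float         # MAXHP
    mp: float             # self-recovery capacity (MP)
    max_mp: float         # MAXMP
    diam: float = 1.0     # fiber diameter (DIAM); HP cost per fast-twitch contraction
    exp: int = 0          # usage count (EXP)
    damaged: bool = False

    def contract(self) -> bool:
        """Contract once; return False if the fiber is torn and cannot fire."""
        if self.damaged:
            return False
        cost = self.diam if self.fast else 1.0
        self.hp = max(0.0, self.hp - cost)
        self.exp += 1
        if self.hp == 0.0:
            self.damaged = True   # HP exhausted: treat as a muscle tear
        return True

    def rest(self, hp_rate=1.0, mp_rate=0.5, hp_gain=1.0, diam_gain=0.1, mp_gain=1.0):
        """Advance one idle step: heal HP only while MP > 0, regenerate MP,
        and apply type-specific growth once a torn fiber is back at MAXHP."""
        if self.mp > 0:
            healed = min(hp_rate, self.max_hp - self.hp, self.mp)
            self.hp += healed
            self.mp -= healed     # assumption: healing draws on MP one-for-one
        self.mp = min(self.max_mp, self.mp + mp_rate)
        if self.damaged and self.hp >= self.max_hp:
            self.damaged = False
            if self.fast:
                self.max_hp += hp_gain    # fast-twitch: MAXHP and DIAM grow
                self.diam += diam_gain
            else:
                self.max_mp += mp_gain    # slow-twitch: only MAXMP grows
```

A fast-twitch unit with DIAM 1.5 and HP 3 tears after two contractions. After it heals, it gains MAXHP and DIAM but not MAXMP; a slow-twitch unit gains only MAXMP.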

The learning execution unit 110 may raise the level of a motor unit 330 as its EXP increases. For example, it registers an EXP threshold for each level and raises the level of the motor unit 330 when its EXP exceeds the threshold for that level. Higher levels may be assigned larger thresholds.

The higher the level of a motor unit 330, the harder the learning execution unit 110 may make it to grow. For a fast-twitch motor unit 330 this applies to MAXHP and DIAM, and for a slow-twitch motor unit 330 to MAXMP.
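A minimal sketch of the level mechanism, under these assumptions:
- The EXP thresholds in `LEVEL_EXP` are hypothetical.
- "Harder to increase" is modeled by dividing each growth increment by (1 + level). The text does not fix a rule, so this is only one option.

```python
import bisect

# Hypothetical EXP thresholds, increasing with level as the text allows.
LEVEL_EXP = [10, 30, 70, 150]

def level_for(exp):
    """Number of thresholds that `exp` strictly exceeds (level 0 to 4)."""
    return bisect.bisect_left(LEVEL_EXP, exp)

def grow(fast, exp, max_hp, diam, max_mp,
         hp_gain=1.0, diam_gain=0.1, mp_gain=1.0):
    """Growth applied on recovery from damage, damped by level: each
    increment is divided by (1 + level), so higher-level units grow less."""
    scale = 1.0 / (1 + level_for(exp))
    if fast:
        return max_hp + hp_gain * scale, diam + diam_gain * scale, max_mp
    return max_hp, diam, max_mp + mp_gain * scale
```

`bisect_left` counts the thresholds strictly below `exp`. This matches the rule that a level is gained only when EXP *exceeds* its threshold.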

After the muscle fibers 350 of a motor unit 330 contract, the learning execution unit 110 may prevent them from contracting again until a predetermined refractory period has elapsed. The information storage unit 102 may store a temperature for each of the motor units 330. The learning execution unit 110 may raise the temperature of a motor unit 330 the more it is used, and lower it over time while the unit is not used. It may shorten the refractory period as the temperature rises.
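The refractory-period and temperature rules can be sketched per fiber as follows. All of these are assumptions chosen only to respect the stated monotonic relationships:
- the numeric defaults;
- the formula `base / (1 + sensitivity * warming)`;
- exponential cooling toward ambient.

```python
from dataclasses import dataclass

@dataclass
class FiberTiming:
    # All numeric defaults are hypothetical illustration values.
    base_refractory_ms: float = 5.0   # refractory period at resting temperature
    ambient: float = 37.0             # resting temperature (deg C)
    temp: float = 37.0                # current temperature
    heat_per_use: float = 0.5         # warming per contraction
    cooling: float = 0.1              # fraction of excess heat lost per ms
    sensitivity: float = 0.1          # shortening factor per degree of warming
    last_fire_ms: float = float("-inf")

    def refractory_ms(self) -> float:
        # Warmer fibers get a shorter (but always positive) refractory period.
        excess = max(0.0, self.temp - self.ambient)
        return self.base_refractory_ms / (1.0 + self.sensitivity * excess)

    def try_fire(self, now_ms: float) -> bool:
        """Contract if the refractory period has elapsed; warm up on success."""
        if now_ms - self.last_fire_ms < self.refractory_ms():
            return False
        self.last_fire_ms = now_ms
        self.temp += self.heat_per_use
        return True

    def cool(self, dt_ms: float) -> None:
        # Exponential relaxation of temperature toward ambient while idle.
        self.temp = self.ambient + (self.temp - self.ambient) * (1.0 - self.cooling) ** dt_ms
```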

Suppose the muscle model, operated according to a time series of firing patterns, achieves the target motion. The learning execution unit 110 may then perform learning by updating not the firing pattern active at that moment but the firing pattern a predetermined time earlier. Various delays, such as refractory periods and inertia, separate the generation of a firing pattern from the actual movement of the muscle. Updating the pattern active at the instant the reward is obtained may therefore credit the wrong pattern. Updating the pattern from a predetermined time earlier improves learning accuracy.

The predetermined time may be set arbitrarily and may be changeable. The learning execution unit 110 may use different times for fast-twitch and slow-twitch muscle. For example, it may update the firing pattern 20 ms before the target motion was achieved when the motor unit 330 is fast-twitch, and 40 ms before when it is slow-twitch.
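A sketch of this delayed credit assignment, under these assumptions:
- The lags of 20 ms (fast-twitch) and 40 ms (slow-twitch) come from the example above.
- The history format of `(time_ms, state, action)` tuples is hypothetical.
- The update is a plain incremental average toward the reward, not a full Q-learning backup.

```python
from collections import defaultdict

# Look-back per fiber type, taken from the 20 ms / 40 ms example in the text.
LAG_MS = {"fast": 20, "slow": 40}

def credit_delayed(history, goal_time_ms, fiber_type, q, reward, alpha=0.1):
    """Credit the (state, action) that was active LAG_MS[fiber_type] before the
    goal was reached. `history` holds (time_ms, state, action) tuples.
    Returns the time of the entry that was updated, or None."""
    cutoff = goal_time_ms - LAG_MS[fiber_type]
    earlier = [h for h in history if h[0] <= cutoff]
    if not earlier:
        return None
    t, state, action = max(earlier, key=lambda h: h[0])
    key = (state, action)
    q[key] += alpha * (reward - q[key])   # simple incremental-average update
    return t

history = [(t, f"s{t}", 1) for t in range(0, 101, 10)]
q = defaultdict(float)
```

With the goal reached at 100 ms, a fast-twitch update lands on the entry at 80 ms and a slow-twitch update on the entry at 60 ms.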

The learning execution unit 110 may generate display data from the learned firing patterns, for example a CG animation in which an arbitrary character moves according to those patterns. It may also generate display data showing the motion of the muscle model 300 and the growth of its muscles. This data covers the period from the start of learning until the target motion, or an ideal form, is achieved.

The display control unit 112 controls various displays related to the results of learning by the learning execution unit 110. For example, it shows the display data generated by the learning execution unit 110 on a display of the learning execution device 100. Alternatively, it may transmit the data via the network 20 to the communication terminal 200 for display on that terminal's display.

The information storage unit 102 may instead store a muscle model that follows a known model. Examples include a model based on the Hill-type model, one using a CPG, and one using a DQN.

For a muscle model that follows a known model, the information storage unit 102 may store a type parameter, HP, MAXHP, MP, MAXMP, and DIAM for each muscle fiber in each muscle.

Each time a muscle fiber contracts, the learning execution unit 110 may subtract DIAM from HP if the fiber is fast-twitch, and 1 if it is slow-twitch. While MP is not 0, it may restore HP over time. When it determines that a damaged muscle fiber has recovered, it may increase MAXHP and DIAM if the fiber is fast-twitch, and MAXMP if it is slow-twitch.

The learning execution unit 110 need not restore HP once MP reaches 0, and may restore MP over time. When HP recovers after a muscle fiber has been determined to be damaged, MAXMP need not be increased for a fast-twitch fiber, and MAXHP need not be increased for a slow-twitch fiber.

The learning execution unit 110 may raise the level of a muscle fiber as its EXP increases. For example, it registers an EXP threshold for each level and raises the fiber's level when its EXP exceeds the threshold for that level; higher levels may be assigned larger thresholds. The higher the level of a muscle fiber, the harder the learning execution unit 110 may make it to grow. For a fast-twitch fiber this applies to MAXHP and DIAM, and for a slow-twitch fiber to MAXMP.

After a muscle fiber contracts, the learning execution unit 110 may prevent it from contracting again until a predetermined refractory period has elapsed. The information storage unit 102 may store a temperature for each muscle fiber. The learning execution unit 110 may raise the temperature of a muscle fiber the more it is used, and lower it over time while the fiber is not used. It may shorten the refractory period as the temperature rises.

FIG. 5 schematically shows a concrete example of the muscle model 300, here for muscles 360 associated with a human tendon 372, knee 374, and bone 376. As described above, the spinal cord 310 contains a plurality of interneurons 320 and can itself be regarded as a learner. Each interneuron 320 takes one of two states: firing or not firing.

A motor unit 330 comprises a motor neuron 340 and the muscle fibers 350 connected to it; a single motor neuron 340 is connected to a plurality of muscle fibers 350. Motor neurons 340 may be of two types, fast-twitch and slow-twitch. For example, a motor neuron 340 whose size exceeds a threshold is fast-twitch, and one whose size is below the threshold is slow-twitch. Muscle fibers 350 may likewise be fast-twitch or slow-twitch, and a muscle 360 is a collection of muscle fibers 350.

In this example, the model is a single knee joint to which two muscles, an extensor and a flexor, are attached. The learning execution unit 110 controls simple movements of this joint with firing patterns. Because motor units 330 may be fast-twitch or slow-twitch, the learning execution unit 110 may make the two types grow differently.

To control muscles with firing patterns, the action potentials of the neurons must be computed. When an interneuron 320 fires, the learning execution unit 110 may compute its action potential using, for example, the Hodgkin-Huxley model (A. L. Hodgkin and A. F. Huxley, "A quantitative description of membrane current and its application to conduction and excitation in nerve," The Journal of Physiology, 117(4), pp. 500-544, 1952). The computed action potential is distributed to the connected motor neurons 340 according to Kirchhoff's laws.
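For reference, the sketch below integrates the classic Hodgkin-Huxley equations with forward Euler. It uses the standard squid-axon parameters in the modern convention, with rest near -65 mV. The `split_current` helper illustrates one reading of "distributed according to Kirchhoff's laws": current divides among parallel branches in proportion to their conductances. The branch conductances, stimulus values and step size are assumptions.

```python
import math

# Classic squid-axon parameters (modern sign convention, rest near -65 mV).
C_M, G_NA, G_K, G_L = 1.0, 120.0, 36.0, 0.3      # uF/cm^2, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.387            # mV

def _vtrap(x, y):
    """x / (exp(x/y) - 1), with the removable singularity at x = 0 handled."""
    if abs(x / y) < 1e-6:
        return y * (1.0 - x / (2.0 * y))
    return x / (math.exp(x / y) - 1.0)

def rates(v):
    an = 0.01 * _vtrap(-(v + 55.0), 10.0)
    bn = 0.125 * math.exp(-(v + 65.0) / 80.0)
    am = 0.1 * _vtrap(-(v + 40.0), 10.0)
    bm = 4.0 * math.exp(-(v + 65.0) / 18.0)
    ah = 0.07 * math.exp(-(v + 65.0) / 20.0)
    bh = 1.0 / (1.0 + math.exp(-(v + 35.0) / 10.0))
    return an, bn, am, bm, ah, bh

def simulate(i_stim, t_ms=50.0, dt=0.01, v0=-65.0):
    """Forward-Euler HH integration; returns the membrane-voltage trace (mV)."""
    an, bn, am, bm, ah, bh = rates(v0)
    n, m, h = an / (an + bn), am / (am + bm), ah / (ah + bh)  # gates at steady state
    v, trace = v0, []
    for _ in range(int(t_ms / dt)):
        an, bn, am, bm, ah, bh = rates(v)
        i_ion = (G_NA * m**3 * h * (v - E_NA) + G_K * n**4 * (v - E_K)
                 + G_L * (v - E_L))
        v += dt * (i_stim - i_ion) / C_M
        n += dt * (an * (1 - n) - bn * n)
        m += dt * (am * (1 - m) - bm * m)
        h += dt * (ah * (1 - h) - bh * h)
        trace.append(v)
    return trace

def count_spikes(trace, threshold=0.0):
    # Count upward crossings of the threshold voltage.
    return sum(1 for a, b in zip(trace, trace[1:]) if a < threshold <= b)

def split_current(total, conductances):
    """Kirchhoff's current law for parallel branches: each branch gets a share
    proportional to its conductance, and the shares sum to `total`."""
    g_sum = sum(conductances)
    return [total * g / g_sum for g in conductances]
```

A sustained stimulus of 10 uA/cm^2 produces repetitive firing, while no stimulus leaves the membrane at rest.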

The learning execution unit 110 may use an extended Hill-type model, taking the sum of the action potentials of the firing motor neurons 340 as the input signal to the muscle model. The muscle model computes the muscle's contractile force and converts it into knee-joint torque according to the laws of physics. The torque moves the knee and changes the joint angle. The learning execution unit 110 may output the result of the movement as a joint angle, and may reward the firing pattern when the joint angle reaches the target angle.

As noted above, the learning execution unit 110 may use an extended Hill-type model. The conventional Hill-type model consists of three elements: a contractile element (CE), a parallel elastic element (PEE) arranged in parallel with the CE, and a series elastic element (SEE) arranged in series with it. In the Hill-type model, a muscle fiber receives current from a motor neuron and converts it into force using the PEE, SEE, and CE. The extended model adds a damping coefficient to the tendon-force calculation of the conventional Hill-type model, to reduce the muscle spasms caused by the spring constants.

In the CE force-length relation, lopt is the CE length at which the CE produces its maximum force, A is the muscle activation ratio, and lce is the CE length. Several equations have been proposed to approximate this relation. For example, the Rosen and Kuo model (P.-H. Kuo and A. D. Deshpande, "Contribution of passive properties of muscle-tendon units to the metacarpophalangeal joint torque of the index finger," IEEE, pp. 288-294, 2010) may be applied.

Vce is the contraction velocity of the CE, and Vmax is its maximum contraction velocity. The force Fpe generated by the PEE depends on the following quantities.

kpe is the spring constant of the PEE, lpe its length, lpe_rest its equilibrium length, dpe its damping coefficient, and Vpe the velocity of its end point. The force Fse of the SEE depends on the following quantities.

kse is the spring constant of the SEE, lse its length, lse_rest its equilibrium length, dse its damping coefficient, and Vse the velocity of its end point.
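Assume each elastic element acts as a linear spring in parallel with the damper that the extended model adds. The two forces can then be written in a form consistent with the symbols defined above. This is an assumed reconstruction, not necessarily the exact published equations:

$$F_{pe} = k_{pe}\,(l_{pe} - l_{pe\_rest}) + d_{pe}\,V_{pe}$$

$$F_{se} = k_{se}\,(l_{se} - l_{se\_rest}) + d_{se}\,V_{se}$$

Under this reading, setting d_pe = d_se = 0 recovers the undamped springs of the conventional Hill-type model.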

When a motor unit 330 receives an action potential, the muscle contracts. The time the contraction takes to reach peak force is called the contraction time. Slow-twitch motor units 330 have a long contraction time and a small maximum contractile force. Fast-twitch motor units 330 have a short contraction time and a large maximum contractile force. A single muscle consists of multiple fast-twitch and multiple slow-twitch motor units 330, so a Hill-type model composed of these motor units 330 is adopted.

In the muscle's total contractile force, N is the number of fast-twitch motor units 330 and M is the number of slow-twitch motor units 330. Fce_f_i is the contractile force of the i-th fast-twitch motor unit, and Fce_s_j is the contractile force of the j-th slow-twitch motor unit.
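These definitions correspond to a total CE force that sums the contributions of all motor units. A form consistent with them is:

$$F_{ce} = \sum_{i=1}^{N} F_{ce\_f\_i} + \sum_{j=1}^{M} F_{ce\_s\_j}$$

The Python sketch below illustrates why mixing the two unit types matters: each unit produces an alpha-function twitch that peaks at its contraction time. The twitch shape and the parameter values are assumptions for illustration, not values from the source:
- fast-twitch: peak 2.0 at 30 ms;
- slow-twitch: peak 0.5 at 80 ms (arbitrary force units).

```python
import math

def twitch(t_ms, peak_force, contraction_time_ms):
    """Alpha-function twitch: zero before onset, equal to `peak_force`
    exactly when t equals the contraction time, then decaying."""
    if t_ms < 0:
        return 0.0
    x = t_ms / contraction_time_ms
    return peak_force * x * math.exp(1.0 - x)

FAST = (2.0, 30.0)   # hypothetical (peak force, contraction time in ms)
SLOW = (0.5, 80.0)

def total_ce_force(t_ms, fast_onsets_ms, slow_onsets_ms):
    """Sum over N fast-twitch and M slow-twitch units, each fired at its onset."""
    fast = sum(twitch(t_ms - t0, *FAST) for t0 in fast_onsets_ms)
    slow = sum(twitch(t_ms - t0, *SLOW) for t0 in slow_onsets_ms)
    return fast + slow
```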

The muscle model according to this embodiment models the biological properties of slow-twitch and fast-twitch motor units 330. In the growth model, a motor unit 330 consists of a motor neuron 340 and muscle fibers 350. Every muscle fiber 350 has an energy value (HP) available for contraction. With each contraction, the HP of a fast-twitch fiber may decrease by an amount equal to the diameter of the muscle fiber 350, and the HP of a slow-twitch fiber may decrease by 1.

Continuous contraction depletes HP, and when HP reaches 0 a muscle tear occurs. Until it recovers, a torn muscle fiber 350 cannot contract again, even when it receives an electrical signal from an interneuron 320. If the intervals between signals from the interneurons 320 are long enough, however, the muscle fiber 350 can recover spontaneously. In this model, it recovers over time as long as MP, the self-recovery capacity, is not 0. These mechanisms allow the learning execution unit 110 to learn a variety of firing patterns.

A motor unit 330 gains EXP each time it is used, which promotes its growth. The model defines LV to represent the level of growth. Slow-twitch and fast-twitch motor units follow different growth rules, based on biological growth. For a fast-twitch motor unit 330, the MAXHP and muscle fiber 350 diameter parameters increase.

In fast-twitch motor units 330, satellite cells surround the muscle fibers. When a muscle tear occurs, the satellite cells divide and the fast-twitch muscle fiber 350 grows thicker. Thicker muscle fibers 350 are stronger but consume more HP.

Slow-twitch muscle fibers 350 do not grow thicker; instead, their self-recovery capacity increases. Biologically, the number of capillaries around slow-twitch muscle fibers 350 increases, which raises the amount of oxygen delivered to them.

The model may further include SP, a parameter indicating the fatigue of a muscle fiber 350. SP decreases with growth only in slow-twitch muscle fibers 350, so slow-twitch fibers can be used for longer.

Table 1 describes each parameter, and Table 2 shows an example of the algorithm.

The learning execution unit 110 may use Q-learning to learn the firing patterns of the interneurons 320. The learning process may be based on multi-agent learning in which each interneuron 320 is an agent. Each agent monitors its environment. The environment may be defined as the connectivity between interneurons 320 and motor neurons 340, together with the parameters of the motor units 330. During learning, the initial connection settings do not change, but the motor-unit parameters may.

Each interneuron 320 is connected to a plurality of motor neurons 340, and connects to either fast-twitch or slow-twitch muscle. The size of the connected motor neuron determines whether the muscle fibers 350 of a motor unit 330 are fast-twitch or slow-twitch, a principle taken from biology. Agents can share state information with one another, which corresponds to information sharing through myelin connections.

In Q-learning, the state-action pair of agent i is Qi = (si, ai), where si denotes the state of agent i.

M is the total number of motor neurons 340 connected to the interneuron 320, and O is the total number of motor neurons 340 connected to the other interneurons 320. Each agent takes an action ai: fire (1) or not fire (0).
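A minimal sketch of one interneuron agent, using standard tabular Q-learning with epsilon-greedy action selection over the two actions. The exact state si is not recoverable here, so the following are assumptions made only to exercise the update rule:
- `observe` is a hypothetical discretized stand-in for si: HP summed over the agent's own motor units, and HP summed over the units of the agents it shares information with.
- The toy task at the end is hypothetical and not from the source.

```python
import random
from collections import defaultdict

class InterneuronAgent:
    """Tabular Q-learning agent for one interneuron; actions 0 = rest, 1 = fire."""
    ACTIONS = (0, 1)

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)          # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Epsilon-greedy: explore at random, otherwise pick the best-valued action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup.
        best_next = max(self.q[(next_state, a)] for a in self.ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

def observe(own_hp, shared_hp, bucket=10.0):
    """Hypothetical discretized state: HP summed over this agent's motor units,
    and HP summed over units of the agents it shares information with."""
    return (int(sum(own_hp) // bucket), int(sum(shared_hp) // bucket))

# Toy task (not from the source): fire when the knee is below target, else rest.
agent = InterneuronAgent(epsilon=0.2, seed=1)
for _ in range(500):
    s = agent.rng.choice(("below", "at_goal"))
    a = agent.act(s)
    r = 1.0 if (a == 1) == (s == "below") else 0.0
    agent.update(s, a, r, "terminal")
```

In the full system, one such agent would run per interneuron 320, each reading its own state and receiving the rewards described below.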

Each interneuron 320 monitors two things before deciding whether to fire: all parameters of the motor units connected to it, and the total energy of the motor units held by the other interneurons 320 with which it shares information. Based on the firing of the interneurons 320, the electrical signals of the connected motor units are computed with the Hodgkin-Huxley model and used to compute the input signal. The joint angle resulting from the muscle contraction is then computed with the extended Hill-type model and fed back to the agents.

Rewards may be of two types: immediate and delayed. As the immediate reward, the agents receive rgoal each time the knee joint reaches the target angle. They continue to receive it for as long as the knee joint stays at the target angle.

As the delayed reward, the total HP remaining across all agents at the end of an episode is distributed equally among all agents. This encourages cooperative behavior that produces efficient movement.
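The two reward signals can be sketched as plain functions, under these assumptions:
- The tolerance band `tol_deg` for "reaching" the target angle is hypothetical.
- The input layout for the delayed reward (agent id mapped to the HP values of that agent's motor units) is hypothetical.

```python
def immediate_reward(angle_deg, target_deg, r_goal=1.0, tol_deg=0.5):
    """r_goal for every step at which the knee is at the target angle.
    The tolerance band is an assumption; the text only says 'achieves'."""
    return r_goal if abs(angle_deg - target_deg) <= tol_deg else 0.0

def delayed_rewards(remaining_hp):
    """End-of-episode reward: total HP left over all agents, split equally.
    `remaining_hp` maps agent id -> HP values of that agent's motor units."""
    total = sum(sum(hps) for hps in remaining_hp.values())
    share = total / len(remaining_hp)
    return {agent: share for agent in remaining_hp}

angles = [10.0, 40.0, 59.8, 60.3, 60.0, 45.0]
episode_immediate = sum(immediate_reward(a, 60.0) for a in angles)
bonus = delayed_rewards({"in1": [2.0, 3.0], "in2": [5.0], "in3": [0.0, 2.0]})
```

Every agent receives the same share regardless of how much of the leftover HP was its own. This is what pushes the agents toward energy-efficient, coordinated firing rather than individually greedy firing.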

FIG. 6 schematically shows an example of the hardware configuration of a computer 1200 that functions as the learning execution device 100. A program installed on the computer 1200 can cause the computer 1200 to function as one or more "units" of the device according to the present embodiment, or to execute operations associated with that device or with those one or more "units", and/or can cause the computer 1200 to execute the process according to the present embodiment or steps of that process. Such a program may be executed by the CPU 1212 to cause the computer 1200 to perform specific operations associated with some or all of the blocks of the flowcharts and block diagrams described in this specification.

The computer 1200 according to the present embodiment includes a CPU 1212, a RAM 1214, and a graphics controller 1216, which are interconnected by a host controller 1210. The computer 1200 also includes input/output units such as a communication interface 1222, a storage device 1224, a DVD drive, and an IC card drive, which are connected to the host controller 1210 via an input/output controller 1220. The DVD drive may be a DVD-ROM drive, a DVD-RAM drive, or the like. The storage device 1224 may be a hard disk drive, a solid state drive, or the like. The computer 1200 further includes legacy input/output units such as a ROM 1230 and a keyboard, which are connected to the input/output controller 1220 via an input/output chip 1240.

The CPU 1212 operates according to programs stored in the ROM 1230 and the RAM 1214 and thereby controls each unit. The graphics controller 1216 acquires image data generated by the CPU 1212 in a frame buffer or the like provided in the RAM 1214 or in the graphics controller itself, and causes the image data to be displayed on a display device 1218.

The communication interface 1222 communicates with other electronic devices via a network. The storage device 1224 stores programs and data used by the CPU 1212 in the computer 1200. The DVD drive reads programs or data from a DVD-ROM or the like and provides them to the storage device 1224. The IC card drive reads programs and data from an IC card and/or writes programs and data to an IC card.

The ROM 1230 stores a boot program or the like that the computer 1200 executes at activation, and/or programs that depend on the hardware of the computer 1200. The input/output chip 1240 may also connect various input/output units to the input/output controller 1220 via a USB port, a parallel port, a serial port, a keyboard port, a mouse port, or the like.

Programs are provided by a computer-readable storage medium such as a DVD-ROM or an IC card. A program is read from the computer-readable storage medium, installed in the storage device 1224, the RAM 1214, or the ROM 1230 (which are also examples of computer-readable storage media), and executed by the CPU 1212. The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above. A device or method may be constituted by implementing operations on, or processing of, information through the use of the computer 1200.

For example, when communication is performed between the computer 1200 and an external device, the CPU 1212 may execute a communication program loaded into the RAM 1214 and, based on the processing described in the communication program, instruct the communication interface 1222 to perform communication processing. Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer area provided in a recording medium such as the RAM 1214, the storage device 1224, a DVD-ROM, or an IC card, and transmits the read data to the network, or writes reception data received from the network to a reception buffer area or the like provided on the recording medium.

The CPU 1212 may also cause all or a required portion of a file or database stored in an external recording medium, such as the storage device 1224, a DVD drive (DVD-ROM), or an IC card, to be read into the RAM 1214, and perform various types of processing on the data in the RAM 1214. The CPU 1212 may then write the processed data back to the external recording medium.

Various types of information, such as programs, data, tables, and databases, may be stored on a recording medium and subjected to information processing. On data read from the RAM 1214, the CPU 1212 may perform various types of processing described throughout this disclosure and specified by the instruction sequences of programs, including various operations, information processing, conditional judgments, conditional branching, unconditional branching, and information search/replacement, and write the results back to the RAM 1214. The CPU 1212 may also search for information in files, databases, or the like in a recording medium. For example, when a plurality of entries, each having an attribute value of a first attribute associated with an attribute value of a second attribute, are stored in the recording medium, the CPU 1212 may search those entries for one whose first-attribute value matches a specified condition, read the second-attribute value stored in that entry, and thereby obtain the attribute value of the second attribute associated with the first attribute that satisfies the predetermined condition.

The programs or software modules described above may be stored on a computer-readable storage medium on or near the computer 1200. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as the computer-readable storage medium, thereby providing the programs to the computer 1200 via the network.

The blocks in the flowcharts and block diagrams of the present embodiment may represent steps of a process in which operations are executed, or "units" of a device responsible for executing operations. Specific steps and "units" may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable storage medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable storage medium. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits, such as field programmable gate arrays (FPGAs) and programmable logic arrays (PLAs), that include logical AND, logical OR, exclusive OR, NAND, NOR, and other logic operations, flip-flops, registers, and memory elements.

A computer-readable storage medium may include any tangible device capable of storing instructions to be executed by a suitable device, so that a computer-readable storage medium having instructions stored thereon constitutes a product including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable storage media include electronic, magnetic, optical, electromagnetic, and semiconductor storage media. More specific examples include a floppy disk, a diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a Blu-ray disc, a memory stick, and an integrated circuit card.

Computer-readable instructions may include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages.

Computer-readable instructions may be provided, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, or to programmable circuitry, so that the processor or programmable circuitry executes those instructions to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.

Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in those embodiments. It will be apparent to those skilled in the art that various modifications or improvements can be made to the embodiments described above. It is clear from the claims that forms incorporating such modifications or improvements can also be included in the technical scope of the present invention.

Note that processes such as operations, procedures, steps, and stages in the devices, systems, programs, and methods shown in the claims, specification, and drawings can be executed in any order, unless the order is explicitly indicated by expressions such as "before" or "prior to", and unless the output of an earlier process is used in a later process. Even if an operation flow in the claims, specification, or drawings is described using "first", "next", and the like for convenience, this does not mean that it must be performed in that order.

20 network, 100 learning execution device, 102 information storage unit, 104 input reception unit, 106 data reception unit, 108 motion setting unit, 110 learning execution unit, 112 display control unit, 200 communication terminal, 300 muscle model, 310 spinal cord, 320 interneuron, 330 motor unit, 340 motor neuron, 350 muscle fiber, 360 muscle, 372 tendon, 374 knee, 376 bone, 400 firing pattern, 402 on, 404 off, 1200 computer, 1210 host controller, 1212 CPU, 1214 RAM, 1216 graphics controller, 1218 display device, 1220 input/output controller, 1222 communication interface, 1224 storage device, 1230 ROM, 1240 input/output chip

Claims

1. A learning execution device comprising:
an information storage unit that stores a muscle model which moves a muscle by contracting muscle fibers connected to motor neurons included in motor units according to a firing pattern of a plurality of interneurons, each of which is connected to a motor unit;
a motion setting unit that sets a target motion of the muscle model; and
a learning execution unit that learns a firing pattern realizing the target motion by executing learning that rewards, among a plurality of firing patterns, a firing pattern with which the motion of the muscle model is closer to the target motion,
wherein the learning execution unit executes multi-agent system learning using the plurality of interneurons as agents.

2. The learning execution device according to claim 1, wherein the learning execution unit executes the multi-agent system learning in which each of the plurality of interneurons monitors all parameters of each motor unit connected to it and the sum of the energy of the motor units held by the other interneurons sharing information with it, and determines whether to fire.

3. The learning execution device according to claim 1 or 2, wherein the learning execution unit learns a firing pattern realizing the target motion by operating the muscle model according to each of the plurality of firing patterns, generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion, and repeatedly operating the muscle model according to each of the generated firing patterns and generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion.

4. The learning execution device according to claim 3, wherein the learning execution unit learns a firing pattern realizing the target motion by operating the muscle model according to each of a plurality of randomly generated firing patterns, generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion, and repeatedly operating the muscle model according to each of the generated firing patterns and generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion.

5. The learning execution device according to claim 3, wherein the learning execution unit learns a firing pattern realizing the target motion by operating the muscle model according to each of a plurality of firing patterns generated based on a learned firing pattern, generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion, and repeatedly operating the muscle model according to each of the generated firing patterns and generating a plurality of firing patterns based on the firing pattern with which the motion of the muscle model is closer to the target motion.

6. A program for causing a computer to function as the learning execution device according to any one of claims 1 to 5.

7. A learning execution method executed by a computer, comprising:
a motion setting step of setting a target motion of a muscle model that moves a muscle by contracting muscle fibers connected to motor neurons included in motor units according to a firing pattern of a plurality of interneurons, each of which is connected to a motor unit; and
a learning execution step of learning a firing pattern realizing the target motion by executing learning that rewards, among a plurality of firing patterns, a firing pattern with which the motion of the muscle model is closer to the target motion,
wherein the learning execution step executes multi-agent system learning using the plurality of interneurons as agents.
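The generate-evaluate-regenerate procedure recited in claims 3 to 5 resembles a simple (1+λ)-style evolutionary search, which the sketch below uses as a stand-in. The toy muscle model (`toy_muscle_model`, with its `gain` and `decay`), the squared-error distance, the bit-flip mutation, and all population sizes are hypothetical and are not taken from the specification. Passing `initial=None` starts from random patterns as in claim 4; passing a learned pattern starts from it as in claim 5.

```python
import random
from typing import List, Optional

Pattern = List[List[int]]  # pattern[i][t] == 1 if interneuron i fires at step t


def toy_muscle_model(pattern: Pattern, gain: float = 10.0, decay: float = 0.8) -> List[float]:
    """Hypothetical stand-in for the muscle model: returns a joint-angle trajectory."""
    n_inter, steps = len(pattern), len(pattern[0])
    angle, trajectory = 0.0, []
    for t in range(steps):
        active = sum(row[t] for row in pattern) / n_inter  # fraction of firing interneurons
        angle = decay * angle + gain * active  # contraction raises the angle; it decays otherwise
        trajectory.append(angle)
    return trajectory


def distance(trajectory: List[float], target: List[float]) -> float:
    """Squared error between simulated and target motion (smaller is closer)."""
    return sum((a - b) ** 2 for a, b in zip(trajectory, target))


def mutate(pattern: Pattern, rate: float, rng: random.Random) -> Pattern:
    """Flip each on/off bit with probability `rate` to generate a new candidate."""
    return [[1 - bit if rng.random() < rate else bit for bit in row] for row in pattern]


def search(target: List[float], n_inter: int = 4, pop: int = 20, iters: int = 200,
           rate: float = 0.05, seed: int = 0, initial: Optional[Pattern] = None) -> Pattern:
    rng = random.Random(seed)
    steps = len(target)

    def score(p: Pattern) -> float:
        return distance(toy_muscle_model(p), target)

    if initial is None:
        # Claim 4 style: start from randomly generated firing patterns.
        candidates = [[[rng.randint(0, 1) for _ in range(steps)] for _ in range(n_inter)]
                      for _ in range(pop)]
    else:
        # Claim 5 style: start from patterns generated from an already learned pattern.
        candidates = [initial] + [mutate(initial, rate, rng) for _ in range(pop - 1)]
    best = min(candidates, key=score)
    for _ in range(iters):
        # Keep the closest pattern and generate new candidates around it (claim 3).
        candidates = [best] + [mutate(best, rate, rng) for _ in range(pop - 1)]
        best = min(candidates, key=score)
    return best
```

Because the closest pattern is always carried into the next round, the distance to the target never increases from one iteration to the next; the multi-agent Q-learning described earlier is a different mechanism and is not modeled here.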