JP2007175860A - Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller - Google Patents


Info

Publication number
JP2007175860A
Authority
JP
Japan
Prior art keywords
response curve
phase
phase response
periodic motion
walking
Legal status
Pending
Application number
JP2006251704A
Inventor
Jun Morimoto (森本 淳)
Jun Nakanishi (中西 淳)
Gen Endo (遠藤 玄)
Gordon Cheng (チェン ゴードン)
Mitsuo Kawato (川人 光男)
Current Assignee
Japan Science and Technology Agency
ATR Advanced Telecommunications Research Institute International
Sony Corp
Original Assignee
Japan Science and Technology Agency
ATR Advanced Telecommunications Research Institute International
Sony Corp
Application filed by Japan Science and Technology Agency, ATR Advanced Telecommunications Research Institute International, and Sony Corp
Priority to JP2006251704A
Priority to PCT/JP2006/318504 (WO2007063633A1)
Publication of JP2007175860A
Legal status: Pending (current)


Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B62 LAND VEHICLES FOR TRAVELLING OTHERWISE THAN ON RAILS
    • B62D MOTOR VEHICLES; TRAILERS
    • B62D57/00 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track
    • B62D57/02 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track with ground-engaging propulsion means, e.g. walking members
    • B62D57/032 Vehicles characterised by having other propulsion or other ground-engaging means than wheels or endless track, alone or in addition to wheels or endless track with ground-engaging propulsion means, e.g. walking members with alternately or sequentially lifted supporting base and legs; with alternately or sequentially lifted feet or skid
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 Systems controlled by a computer
    • G05B15/02 Systems controlled by a computer electric
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B5/00 Anti-hunting arrangements
    • G05B5/01 Anti-hunting arrangements electric

Abstract

PROBLEM TO BE SOLVED: To provide a phase response curve learning method that updates a phase response curve effective for stabilizing periodic motion such as walking.
SOLUTION: In a phase response curve learning device 1, a synchronization event detection section 12 detects whether an event that should establish synchronization (a synchronization event) has occurred between a control device 3 and a biped robot 5. When a synchronization event is detected, a success/failure judging section 13 judges whether the event succeeded. A reward setting section 14 sets a reward for each event, as its degree of achievement, from the judgment result, and a curve parameter updating section 15 updates a curve parameter 11a so that the cumulative reward over the events is maximized. The control device 3 establishes synchronization with the biped robot 5 by resetting its phase based on the phase response curve optimized by the phase response curve learning device 1.
COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to a phase response curve learning method and device that update a phase response curve effective for stabilizing periodic motion such as walking, to a periodic motion control method and device using them, and to a walking motion control device.

In recent years, a wide variety of robots have been developed, not only for industrial use but also for entertainment, such as toy-like robots, and for practical tasks such as home monitoring. Among these, biped robots modeled on human movement are extremely difficult to stabilize in posture, but they offer great versatility of movement and could be used in many applications.

In periodic motion such as walking, synchronizing the controller with the object it controls is considered important. For example, techniques have been proposed that quickly establish synchronization between the controller and the controlled object by resetting the phase using a phase response curve, which expresses the change Δφ in the controller's phase, applied at the moment an event that should establish synchronization occurs, as a function of the phase φ (see, for example, Non-Patent Documents 1, 2, and 3). In a biped walking robot, each time a leg touches the ground, the phase φ of that leg (the stance side) is set to 0 and the phase φ of the other leg (the swing side) is set to π.
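The reset rule above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, phases are in radians wrapped to [0, 2π), and the phase response curve is passed in as an ordinary function. The biped rule (touchdown leg to 0) is the special case of a curve Δφ(φ) = -φ, which cancels whatever phase the controller had at the event.

```python
import math

TWO_PI = 2.0 * math.pi


def wrap_phase(phi: float) -> float:
    """Wrap a phase (radians) into [0, 2*pi)."""
    return phi % TWO_PI


def reset_with_prc(phi: float, prc) -> float:
    """Generic phase reset: shift the controller phase by the PRC value at the event phase."""
    return wrap_phase(phi + prc(phi))


def reset_on_touchdown(touchdown_leg: str) -> dict:
    """Biped reset (hypothetical helper): the leg that just touched down gets phase 0,
    the other (swing) leg gets phase pi."""
    if touchdown_leg not in ("left", "right"):
        raise ValueError("touchdown_leg must be 'left' or 'right'")
    swing_leg = "right" if touchdown_leg == "left" else "left"
    return {touchdown_leg: 0.0, swing_leg: math.pi}
```

For instance, `reset_with_prc(1.2, lambda p: -p)` returns `0.0` (a full reset), while a curve of smaller magnitude, such as `lambda p: -0.5 * math.sin(p)`, only nudges the phase toward the event.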

FIG. 13 shows an example of the phase response curve 30. The phase response curve 30 gives the phase change caused by a disturbance to a periodic motion in which a plurality of events occur periodically. For example, when a disturbance is applied to an oscillator undergoing periodic motion, the oscillator's period converges back to the original period after a long time, but a phase shift (phase change) remains. The phase response curve 30 plots this relationship, with the timing of the disturbance (phase φ) on the horizontal axis and the resulting phase change Δφ on the vertical axis.
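A phase response curve can be measured numerically by kicking an oscillator at a chosen phase, letting it relax back to its cycle, and comparing its phase with that of an unperturbed copy. The sketch below assumes a Stuart-Landau oscillator, which is not a model used in the patent; it is chosen because its isochrons are radial, so the exact answer Δφ = atan2(sin φ, cos φ + ε) - φ ≈ -ε sin φ is known and the numerical result can be checked. The kick size `eps`, frequency `omega`, and settling time are illustrative values.

```python
import math


def stuart_landau(state, omega):
    """Stuart-Landau vector field; its stable limit cycle is the unit circle."""
    x, y = state
    r2 = x * x + y * y
    return ((1.0 - r2) * x - omega * y, omega * x + (1.0 - r2) * y)


def rk4_step(state, omega, dt):
    """One classical Runge-Kutta step."""
    k1 = stuart_landau(state, omega)
    k2 = stuart_landau((state[0] + 0.5 * dt * k1[0], state[1] + 0.5 * dt * k1[1]), omega)
    k3 = stuart_landau((state[0] + 0.5 * dt * k2[0], state[1] + 0.5 * dt * k2[1]), omega)
    k4 = stuart_landau((state[0] + dt * k3[0], state[1] + dt * k3[1]), omega)
    return (
        state[0] + dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
        state[1] + dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]),
    )


def wrap_to_pi(a):
    """Wrap an angle into [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


def measure_prc(phi, eps=0.1, omega=2.0 * math.pi, t_settle=5.0, dt=5e-3):
    """Kick the oscillator by eps along x at phase phi, let both copies run, return the phase shift."""
    unperturbed = (math.cos(phi), math.sin(phi))
    perturbed = (math.cos(phi) + eps, math.sin(phi))
    for _ in range(int(round(t_settle / dt))):
        unperturbed = rk4_step(unperturbed, omega, dt)
        perturbed = rk4_step(perturbed, omega, dt)
    shift = math.atan2(perturbed[1], perturbed[0]) - math.atan2(unperturbed[1], unperturbed[0])
    return wrap_to_pi(shift)
```

`measure_prc(math.pi / 2)` comes out close to -atan(0.1) ≈ -0.0997, i.e. a kick along x at that phase delays the oscillator; sweeping `phi` over [0, 2π) traces a full curve of the kind plotted in FIG. 13.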
Non-Patent Document 1: T. Yamasaki, T. Nomura, and S. Sato, "Possible functional roles of phase resetting during walking," Biological Cybernetics, vol. 88, no. 6, pp. 468-496, 2003.
Non-Patent Document 2: K. Tsuchiya, S. Aoi, and K. Tsujita, "Locomotion control of biped locomotion robot using nonlinear oscillators," in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Las Vegas, USA, 2003, pp. 1745-1750.
Non-Patent Document 3: J. Nakanishi, J. Morimoto, G. Endo, G. Cheng, S. Schaal, and M. Kawato, "Learning from demonstration and adaptation of biped locomotion," Robotics and Autonomous Systems, vol. 47, pp. 79-91, 2004.
Non-Patent Document 4: R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, 1998.
Non-Patent Document 5: K. Doya, "Reinforcement learning in continuous time and space," Neural Computation, vol. 12, no. 1, pp. 219-245, 2000.
Non-Patent Document 6: M. Sato and S. Ishii, "Reinforcement learning based on on-line EM algorithm," in M. S. Kearns, S. A. Solla, and D. A. Cohn (eds.), Advances in Neural Information Processing Systems 11, MIT Press, Cambridge, 1999, pp. 1052-1058.
Non-Patent Document 7: J. Morimoto and K. Doya, "Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning," Robotics and Autonomous Systems, vol. 36, pp. 37-51, 2001.

However, because no established method for designing phase response curves currently exists, in practice engineers design them by hand, repeatedly running walking experiments on each individual robot. Consequently, every newly developed robot requires its own walking experiments, and robot development takes an enormous amount of time. In addition, because the designer's intentions are built into the phase response curve, the result lacks objectivity.

The present invention has been made in view of these circumstances. Its object is to provide a phase response curve learning method and device that judge the success or failure of each of a plurality of events in a periodic motion such as walking, set a degree of achievement for each event from the judgment result, and update the parameters of the phase response curve so that the cumulative degree of achievement is maximized. This makes it possible to update the phase response curve objectively, without human design or exhaustive search, and thereby stabilize the periodic motion.
A further object of the present invention is to provide a periodic motion control method and device that realize stable periodic motion by resetting the phase that defines the periodic motion, based on the phase response curve obtained by reinforcement learning, so as to establish synchronization between the controller and the controlled object.

A further object of the present invention is to provide a periodic motion control device and a walking motion control device that judge the success or failure of each of a plurality of events in a periodic motion such as walking, set a degree of achievement for each event from the judgment result, and update the parameters of the phase response curve so that the cumulative degree of achievement is maximized, while changing the periodic motion pattern in response to disturbances. These devices update the phase response curve objectively, without human design or exhaustive search, and stabilize periodic motion and walking.
A further object of the present invention is to provide a periodic motion control device and a walking motion control device that realize stable periodic motion, such as walking, by resetting the phase that defines the periodic motion, based on the phase response curve obtained by reinforcement learning, so as to establish synchronization between the controller and the controlled object.

A phase response curve learning method according to a first aspect of the present invention updates, by reinforcement learning, a phase response curve indicating the phase change caused by a disturbance to a periodic motion in which a plurality of events occur periodically. The method judges the success or failure of each event, sets a degree of achievement for each event based on the judgment result, and updates the parameters of the phase response curve so that the cumulative degree of achievement over the events is maximized.
A phase response curve learning device according to a second aspect of the present invention updates, by reinforcement learning, a phase response curve indicating the phase change caused by a disturbance to a periodic motion in which a plurality of events occur periodically. It comprises means for judging the success or failure of each event, means for setting a degree of achievement for each event based on the judgment result, and means for updating the parameters of the phase response curve so that the cumulative degree of achievement is maximized.
A periodic motion control method according to a third aspect of the present invention is a method in which a controller controls a controlled object performing a periodic motion, in which a plurality of events occur periodically, based on the state of the controlled object. Based on the phase response curve updated by the learning method of the first aspect, the phase defining the periodic motion is reset to establish synchronization between the controller and the controlled object.
A periodic motion control device according to a fourth aspect of the present invention controls a controlled object performing a periodic motion, in which a plurality of events occur periodically, based on the state of the controlled object. It comprises means for resetting the phase defining the periodic motion, based on the phase response curve updated by the learning method of the first aspect, to establish synchronization with the controlled object.

A periodic motion control device according to a fifth aspect of the present invention has a controller that makes a controlled object execute a periodic motion in which a plurality of events occur periodically, and a phase response curve learning device that updates, by reinforcement learning, a phase response curve indicating the phase change caused by a disturbance to the periodic motion. The learning device comprises means for judging the success or failure of each event, means for setting a degree of achievement for each event based on the judgment result, and means for updating the parameters of the phase response curve so that the cumulative degree of achievement is maximized. The periodic motion pattern is changed in response to disturbances.
A periodic motion control device according to a sixth aspect of the present invention resets the phase defining the periodic motion based on the phase response curve with updated parameters, and thereby establishes synchronization with the controlled object.
In a periodic motion control device according to a seventh aspect of the present invention, the controlled object is a biped walking robot, and the reset is performed when a foot of the robot touches the ground.

In a periodic motion control device according to an eighth aspect of the present invention, the biped walking robot has five links, and the pattern is changed by displacing the hip joint angle and/or the knee joint angle to avoid falling.
In a periodic motion control device according to a ninth aspect of the present invention, the biped walking robot has five links, and the pattern is changed by displacing both the hip joint angle and the knee joint angle in the positive direction to avoid falling.
In a periodic motion control device according to a tenth aspect of the present invention, the biped walking robot has five links, and the pattern is changed by displacing the hip joint trajectory and/or the knee joint trajectory to avoid falling.
In a periodic motion control device according to an eleventh aspect of the present invention, the biped walking robot has five links, and the pattern is changed by displacing the hip joint trajectory in the negative direction and the knee joint trajectory in the positive direction to avoid falling.

A walking motion control device according to a twelfth aspect of the present invention controls a two-link biped walking robot that walks periodically, based on the state of the robot, and comprises:

[Equations shown as images in the original publication; not reproduced here.]

a controller that makes the robot follow the periodic trajectory given by the above equations, where φ denotes the phase; and a phase response curve learning device comprising means for judging the success or failure of the walking motion, means for setting a degree of achievement for each step based on the judgment result, and means for updating the parameters of a phase response curve, which indicates the change in the phase of the walking motion caused by a disturbance, so that the cumulative degree of achievement is maximized. Based on the phase response curve with updated parameters, the phase defining the walking motion is reset to establish synchronization with the robot.
A walking motion control device according to a thirteenth aspect of the present invention performs the reset when a foot of the robot touches the ground.

In the first and second aspects, the success or failure of each of a plurality of events in the periodic motion is judged, a degree of achievement is set for each event based on the judgment result, and the parameters of the phase response curve are updated so that the cumulative degree of achievement is maximized.
In the third and fourth aspects, the phase defining the periodic motion is reset, based on the phase response curve updated as described above, to establish synchronization with the controlled object.

In the fifth and twelfth aspects, the success or failure of each of a plurality of events in the periodic or walking motion is judged, a degree of achievement is set for each event based on the judgment result, and the parameters of the phase response curve are updated so that the cumulative degree of achievement is maximized. In the fifth aspect, the periodic motion pattern is additionally changed in response to disturbances to keep the motion stable.
In the sixth and thirteenth aspects, the phase defining the periodic motion is reset, based on the phase response curve updated as described above, to establish synchronization with the controlled object. In the seventh and thirteenth aspects, the moment a foot of the biped walking robot touches the ground is used as the reset timing for establishing synchronization between the controlling side and the controlled side.
In the eighth to eleventh aspects, foot-lowering and foot-raising responses are performed to cope with stumbling and avoid falling.

According to the first and second aspects, the success or failure of each of a plurality of events in a periodic motion such as walking is judged, a degree of achievement is set for each event based on the judgment result, and the parameters of the phase response curve are updated so that the cumulative degree of achievement is maximized. The phase response curve can therefore be updated objectively, without human design or exhaustive search, and the periodic motion can be stabilized. Because reinforcement learning updates the phase response curve automatically, the designer's intentions are not built into it.
According to the third and fourth aspects, the phase defining the periodic motion is reset, based on the phase response curve obtained by reinforcement learning, to establish synchronization between the controller and the controlled object, so stable periodic motion can be realized. As a result, for example, walking experiments for each individual robot become unnecessary and the time required for robot development can be greatly shortened.

According to the fifth aspect, the success or failure of each of a plurality of events in a periodic motion such as walking is judged, a degree of achievement is set for each event based on the judgment result, the parameters of the phase response curve are updated so that the cumulative degree of achievement is maximized, and the periodic pattern is changed in response to disturbances. The phase response curve can therefore be updated objectively, without human design or exhaustive search, the periodic motion can be stabilized, and periodic motion such as walking can continue under disturbances, for example without falling. The designer's intentions are also not built into the phase response curve.
According to the sixth and seventh aspects, the phase defining the periodic motion is reset to establish synchronization between the controller and the controlled object, so stable periodic motion can be realized. In a biped walking robot, synchronization is established at a moment that is easy to detect physically and at which the motion is highly stable.
According to the eighth to eleventh aspects, the five-link biped walking robot copes with stumbling without falling over.
According to the twelfth aspect, in a two-link biped walking robot, the phase defining the periodic motion is reset, based on the phase response curve acquired by reinforcement learning, to establish synchronization between the controller and the controlled object, so stable periodic motion can be realized. As a result, for example, walking experiments for each individual robot become unnecessary and the time required for robot development can be greatly shortened.
According to the thirteenth aspect, synchronization with the controller is established at a moment that is easy to detect physically on the biped walking robot and at which the motion is highly stable.

Hereinafter, the present invention will be described in detail with reference to the drawings illustrating its embodiments.

FIG. 1 is a block diagram showing the configuration of a biped walking robot connected to the phase response curve learning device according to the present invention.
The phase response curve learning device 1 according to the present invention updates, by reinforcement learning, the phase response curve 30 (see FIG. 13) used for posture control of the controlled object (biped walking robot) 5.

The controller 3, serving as the periodic motion control device, establishes synchronization between itself and the biped walking robot 5 by resetting the phase based on the phase response curve 30 optimized by the phase response curve learning device 1.

As shown in FIG. 2, the two-link biped walking robot 5 has actuators 51 and 52, serving as hip joints, on the left and right of a waist 50 corresponding to the human waist, and legs 53 and 54 corresponding to human legs are attached to the actuators 51 and 52. In the figure, θL and θR denote the angles that the legs 53 and 54, respectively, make with the vertical axis A.

The phase response curve learning device 1 has a control unit 10 implemented by a CPU. The control unit 10 is connected to a storage unit 11, a synchronization event detection unit 12, a success/failure determination unit 13, a reward setting unit 14, a curve parameter update unit 15, and other units, and performs various functions in cooperation with these units according to a computer program stored in advance in the storage unit 11.

The storage unit 11 stores a curve parameter 11a that defines the phase response curve 30. The phase response curve learning device 1 optimizes the phase response curve 30 by updating the curve parameter 11a according to events that occur in the controlled object. The curve parameter 11a must be stored in the storage unit 11 before learning begins; its initial value is chosen by the user as appropriate. The storage unit 11 also stores a value function 11b.
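This section does not state how the curve parameter 11a encodes the curve. One common choice, assumed here purely for illustration, is a weighted sum of periodic (von Mises-shaped) basis functions, so that the stored parameters are just a weight vector and the resulting Δφ(φ) is automatically 2π-periodic. `make_prc`, `kappa`, and the evenly spaced centers are hypothetical.

```python
import math
from typing import Callable, Sequence


def make_prc(weights: Sequence[float], kappa: float = 4.0) -> Callable[[float], float]:
    """Build a periodic PRC  dphi(phi) = sum_k w_k * exp(kappa * (cos(phi - c_k) - 1)).

    Hypothetical parameterization: 'weights' stands in for the stored curve
    parameters, and the centers c_k are spread evenly over [0, 2*pi).
    Each basis function equals 1 at its center and decays with angular distance.
    """
    n = len(weights)
    centers = [2.0 * math.pi * k / n for k in range(n)]

    def prc(phi: float) -> float:
        return sum(w * math.exp(kappa * (math.cos(phi - c) - 1.0))
                   for w, c in zip(weights, centers))

    return prc
```

For example, `make_prc([0.5])(0.0)` returns `0.5`, because the single basis function peaks at its center with value 1; learning then only has to adjust the weight vector.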

The synchronization event detection unit 12 detects whether a synchronization event has occurred. A synchronization event is an event at which synchronization should be established between the controller 3 and the biped robot 5, such as the ground contact of each leg during walking. For example, ground contact sensors can be mounted on the soles of the legs 53 and 54 of the biped robot 5. When the sole of either leg touches the ground, the sensor detects the contact and sends a notification signal to the synchronization event detection unit 12. The unit then determines that a leg has touched down, that is, that an event requiring synchronization has occurred.
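The touchdown detection described above can be sketched as rising-edge detection on two boolean contact signals. This is a minimal illustration that assumes the sensors report a clean in-contact / not-in-contact value on every control tick. The class `TouchdownDetector` is hypothetical.

```python
from typing import Optional


class TouchdownDetector:
    """Hypothetical sync-event detector: reports a touchdown when a foot's
    contact sensor switches from 'not in contact' to 'in contact'."""

    def __init__(self):
        self.prev = {"left": False, "right": False}

    def update(self, left_contact: bool, right_contact: bool) -> Optional[str]:
        event = None
        for foot, now in (("left", left_contact), ("right", right_contact)):
            if now and not self.prev[foot]:
                event = foot  # rising edge => this foot has just touched down
            self.prev[foot] = now
        # If both feet touch down on the same tick, "right" is reported
        # (an arbitrary choice made for this sketch).
        return event
```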

When the synchronization event detection unit 12 detects a synchronization event, the success/failure determination unit 13 determines whether that event succeeded or failed. Based on this determination, the reward setting unit 14 sets a reward r that represents the degree of achievement of the event. The curve parameter update unit 15 then updates the curve parameters 11a so as to maximize the cumulative sum of the rewards r set by the reward setting unit 14. Alternatively, the biped robot 5 itself may determine the success or failure of each synchronization event and set the reward r for that event based on the result.
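As a concrete sketch of the reward setting step, the function below uses the reward values reported in Example 1: r = −1 on a fall and r = 0.1 on a swing-leg touchdown. Two details are assumptions for illustration: a reward of 0 between events, and a fall taking precedence over a touchdown.

```python
def reward(fell: bool, swing_leg_touchdown: bool) -> float:
    # Values from Example 1: -1 on a fall, 0.1 on a swing-leg touchdown.
    if fell:
        return -1.0  # assumption: a fall overrides a simultaneous touchdown
    if swing_leg_touchdown:
        return 0.1
    return 0.0  # assumption: no reward between events


def cumulative_reward(rewards):
    # Plain (undiscounted) sum of rewards over one trial.
    return sum(rewards)
```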

[Value Function / Temporal Difference Error (TD Error)]
The state transition probability of the phase φ of the controller 3 at a synchronization event (including an event in which the task fails) is expressed by Equation (1).

[Equation (1): image not reproduced]

In Equation (1), the phase reset amount Δφ is determined by the phase response curve 30. The reward r is assumed to be given by a probability conditioned on the phase φ of the controller 3 and the phase reset amount Δφ, as in Equation (2).

[Equation (2): image not reproduced]

Under the probability distribution π(Δφ(t)|φ(t)) that generates the phase response curve 30, the value function at phase φ(t) is defined as in Equation (3).

[Equation (3): image not reproduced]
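Equation (3) is not reproduced in this text. Assume it has the standard reinforcement-learning form V(φ(t)) = E[Σ_k γ^k r(t+k+1)]. Under that assumption, one sample of the quantity inside the expectation can be computed backwards over a recorded reward sequence. The function name is hypothetical.

```python
def discounted_return(rewards, gamma):
    """Return sum_k gamma**k * rewards[k], where rewards[0] is r(t+1).

    Computed backwards via G_k = r_k + gamma * G_{k+1}, which is the same
    recursion that the TD error later exploits.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```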

Taking the difference of both sides of Equation (3) between consecutive time steps yields the consistency constraint on the value function given in Equation (4).

[Equation (4): image not reproduced]

Equation (5) is then used as the predicted value of the value function.

[Equation (5): image not reproduced]

If the value prediction is accurate, the value function satisfies Equation (4). If it is not, the value function is trained to reduce the TD error given in Equation (6) (see, e.g., Non-Patent Document 4).

$$\delta(t) = r(t+1) + \gamma V(t+1) - V(t) \qquad (6)$$

The TD error is the temporal difference of the value function between successive states. It is obtained by subtracting the state value V(t) at time t from the sum of the reward r(t+1) at time t+1 and the estimated state value V(t+1), where V(t+1) is weighted by the discount rate γ of the value function. The TD error therefore measures the discrepancy between the predicted value of a state and what actually followed, which indicates whether the prediction was correct. A positive TD error means more reward was obtained than predicted; a negative TD error means less reward was obtained than predicted.
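The definition above maps directly to a one-line function. One detail is an assumption not stated in the text: at a terminal event such as a fall, the bootstrap term γV(t+1) is dropped, which is common practice in TD learning.

```python
def td_error(r_next, v_next, v_now, gamma, terminal=False):
    # delta(t) = r(t+1) + gamma * V(t+1) - V(t).
    # Assumption: at a terminal event (e.g. a fall) V(t+1) is treated as 0.
    bootstrap = 0.0 if terminal else gamma * v_next
    return r_next + bootstrap - v_now
```

A positive return value means the transition turned out better than V(t) predicted, matching the interpretation given above.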

[Value Function Update]
The value function above was described for discrete time. To handle a value function over a continuous state, a normalized Gaussian network such as Equation (7) can be used, for example (see Non-Patent Documents 5 and 6).

[Equation (7): image not reproduced]
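Equation (7) is not reproduced here, so the sketch below uses a generic normalized Gaussian network of the form V(φ) = Σ_i w_i b_i(φ), where b_i = a_i / Σ_j a_j and a_i are Gaussian bumps. Several details are assumptions made because φ is a phase: the shared width, the evenly spaced centers, and the wrapping of distances onto the circle. The names are hypothetical.

```python
import numpy as np


def ngnet_features(phi, centers, width):
    """Normalized Gaussian basis b_i(phi) = a_i / sum_j a_j (hypothetical
    parameters: 'centers' and a shared 'width'). Because phase is periodic,
    the distance to each center is wrapped into [-pi, pi)."""
    d = (phi - centers + np.pi) % (2.0 * np.pi) - np.pi
    a = np.exp(-0.5 * (d / width) ** 2)
    return a / a.sum()


def value(phi, w, centers, width):
    # V(phi) = sum_i w_i * b_i(phi)
    return float(w @ ngnet_features(phi, centers, width))
```

Because the basis functions sum to one, a network with all weights equal to c outputs c everywhere, which is a convenient sanity check.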

For Equation (7), the eligibility trace for the value-function parameters and the TD-error-driven update rule for those parameters are given by Equations (8) and (9).

[Equations (8) and (9): images not reproduced]
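Equations (8) and (9) are not reproduced. The sketch below uses the standard TD(λ) form for a linear value function V = w·b: the trace accumulates ∂V/∂w = b, and the weights move along the trace in proportion to the TD error. The step size α and trace-decay λ are assumed hyperparameters, not values from the patent.

```python
import numpy as np


def critic_update(w, e, features, delta, alpha, gamma, lam):
    """One TD(lambda) step for a linear value function V = w @ features.

    For V = w @ b, dV/dw = b, so the eligibility trace accumulates the features.
    """
    e = gamma * lam * e + features  # decay old credit, add current gradient
    w = w + alpha * delta * e       # update driven by the TD error
    return w, e
```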

[Phase Response Curve Update]
The output of the phase response curve is expressed as a realization of a stochastic policy (action rule), as in Equation (10).

[Equation (10): image not reproduced]

Its realization can therefore be written as Equation (11).

[Equation (11): image not reproduced]

For Equation (11), the eligibilities with respect to the mean μ and the standard deviation σ are given by Equations (12) and (13).

[Equations (12) and (13): images not reproduced]
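Equations (10) to (13) are not reproduced. Assuming a Gaussian policy Δφ ~ N(μ, σ²), which is consistent with the mean and standard deviation named in the text, the sampling step and the gradients of ln π with respect to μ and σ take the standard forms below. The function names are hypothetical.

```python
import random


def sample_reset(mu, sigma, rng=random):
    # Delta_phi ~ N(mu, sigma^2): the stochastic phase reset amount.
    return rng.gauss(mu, sigma)


def log_policy_grads(dphi, mu, sigma):
    # Gradients of ln pi(dphi | mu, sigma) for a Gaussian policy:
    #   d ln pi / d mu    = (dphi - mu) / sigma^2
    #   d ln pi / d sigma = ((dphi - mu)^2 - sigma^2) / sigma^3
    z = dphi - mu
    return z / sigma ** 2, (z * z - sigma ** 2) / sigma ** 3
```

Intuitively, a sample that lands one standard deviation from the mean leaves σ unchanged, while samples farther out push σ up.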

The mean μ is represented by a normalized Gaussian network (see, e.g., Non-Patent Documents 5 and 7), as in Equation (14). The standard deviation σ is represented by a sigmoid function applied to a normalized Gaussian network, as in Equation (15).

[Equations (14) and (15): images not reproduced]

The eligibilities with respect to the curve parameters that define the phase response curve are derived as Equations (16) and (17).

[Equations (16) and (17): images not reproduced]
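Equations (14) to (17) are not reproduced, so the exact parameterization is unknown. The sketch below is one plausible instance, and every detail of it is an assumption: μ = w_μ·b, σ = SIGMA_MAX · sigmoid(w_σ·b), where b is the normalized Gaussian feature vector and SIGMA_MAX is a hypothetical noise bound. The eligibilities for w_μ and w_σ then follow from the chain rule applied to the Gaussian log-likelihood gradients.

```python
import numpy as np

SIGMA_MAX = 1.0  # hypothetical upper bound on the exploration noise


def policy_params(b, w_mu, w_sigma):
    # mu = w_mu . b ; sigma = SIGMA_MAX * sigmoid(w_sigma . b)
    mu = float(w_mu @ b)
    sigma = SIGMA_MAX / (1.0 + np.exp(-float(w_sigma @ b)))
    return mu, sigma


def policy_eligibility(dphi, b, w_mu, w_sigma):
    """Gradients of ln pi(dphi | phi) with respect to w_mu and w_sigma."""
    mu, sigma = policy_params(b, w_mu, w_sigma)
    z = dphi - mu
    dlnpi_dmu = z / sigma ** 2
    dlnpi_dsigma = (z * z - sigma ** 2) / sigma ** 3
    # Chain rule: d sigma / d(w_sigma . b) = sigma * (1 - sigma / SIGMA_MAX)
    g_mu = dlnpi_dmu * b
    g_sigma = dlnpi_dsigma * sigma * (1.0 - sigma / SIGMA_MAX) * b
    return g_mu, g_sigma
```

The squashing keeps σ strictly between 0 and SIGMA_MAX, so exploration can neither vanish nor blow up during learning.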

The update rules for the phase response curve can thus be written as Equations (18) and (19).

[Equations (18) and (19): images not reproduced]

The corresponding eligibility traces used in learning are expressed by Equations (20) and (21).

[Equations (20) and (21): images not reproduced]

As described above, the value-function parameters are updated (Equations (8) and (9)) and the phase response curve is updated (Equations (18) and (19)) using the TD error, so that the cumulative reward over events is maximized. A person therefore does not need to design a phase response curve by hand each time a new controlled object such as a robot is developed. Instead, updating the phase response curve by reinforcement learning yields a periodic motion that is entrained to the rhythm. Walking experiments for each individual robot become unnecessary, and the time required for robot development can be greatly reduced. Moreover, because the phase response curve is updated automatically by reinforcement learning, the designer's own assumptions are not built into it.

Example 1.
The phase response curve learning device 1 according to the present invention was used to learn the phase response curve 30 for the two-link biped robot 5. The controller 3 tracked the periodic trajectory given by Equations (22) and (23). During learning, a reward of r = −1 was given on a fall and r = 0.1 on each swing-leg touchdown, and reinforcement learning was performed to maximize the cumulative reward.

[Equations (22) and (23): images not reproduced]
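Equations (22) and (23) are not reproduced, so the exact periodic trajectory is unknown. The following sketch shows one common way a phase-driven tracking controller can be built, and every detail of it is an assumption: anti-phase sinusoidal hip targets θ_L^d = A sin φ and θ_R^d = −A sin φ, tracked by a PD law with hypothetical gains kp and kd.

```python
import math


def desired_hip_angles(phi, amplitude):
    # Hypothetical anti-phase sinusoidal targets for the left and right legs.
    return amplitude * math.sin(phi), -amplitude * math.sin(phi)


def pd_torque(theta, theta_dot, theta_des, theta_dot_des, kp, kd):
    # PD tracking law: torque pulls the joint toward the phase-dependent target.
    return kp * (theta_des - theta) + kd * (theta_dot_des - theta_dot)
```

In this arrangement the phase φ is the only link between the oscillator and the joint targets, which is why resetting φ at touchdown is enough to re-synchronize the whole gait.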

FIG. 3 shows the phase response curve 30 and the value function obtained by the phase response curve learning device according to the present invention: (a) the value function and (b) the stochastic phase response curve, where the broken lines indicate the standard deviation.
The phase response curve is represented stochastically because the reinforcement learning framework used here represents the policy stochastically.

FIG. 4 shows how the cumulative reward changes over the course of learning.
In FIG. 4, the cumulative reward converges to (is maximized at) a fixed value, here 5, after approximately 70 learning trials. In other words, the reinforcement learning of the present invention optimizes the phase response curve in about 70 trials.

Next, to examine how useful phase resetting is for posture control of the two-link biped robot 5, the effects of phase resetting and of reinforcement learning on walking were investigated.

FIG. 5 shows the walking trajectory without phase resetting. FIG. 6 shows the walking trajectory with phase resetting according to a linear phase response curve designed in advance. FIG. 7 shows the walking trajectory with phase resetting according to the phase response curve obtained by the phase response curve learning device according to the present invention. In each of FIGS. 5, 6 and 7, (a) shows the phase response curve and (b) shows the walking trajectory.

FIGS. 5 and 6 show that without any phase resetting, the biped robot 5 falls after about 5 steps. With phase resetting, which establishes synchronization between the controller 3 and the biped robot 5, the robot keeps walking for about 10 steps. However, with the linear phase response curve the biped robot 5 still falls after about 10 steps, so this curve is unsuitable for practical use.

FIGS. 6 and 7 show that when the phase response curve is optimized by reinforcement learning, the two-link biped robot 5 controls its own posture and does not fall. Updating the phase response curve by the reinforcement learning of the present invention therefore makes it possible to develop, in a very short time, a periodic motion such as walking that is entrained to a rhythm.

Example 2.
Next, to examine whether the phase response curve learning device according to the present invention remains useful for a more complex control system, the phase response curve was learned with a five-link biped robot as the controlled object.

FIG. 8 is a schematic diagram of the skeleton model of the five-link biped robot.
The five-link biped robot 6 has actuators 61 and 62, which serve as hip joints, on the left and right of a waist 60 corresponding to the human waist. The actuators 61 and 62 carry upper legs 63 and 64 corresponding to human thighs. A column 65 corresponding to the human spine is mounted on the waist 60. In the figure, θpitch denotes the angle between the column 65 and the vertical axis A, and θl_hip and θr_hip denote the angles of the upper legs 63 and 64, respectively, relative to the column 65.

The upper legs 63 and 64 are provided with actuators 66 and 67, respectively, which carry lower legs 68 and 69 corresponding to the human lower legs. In the figure, θl_knee and θr_knee denote the angles of the lower legs 68 and 69, respectively, relative to the upper legs 63 and 64.

FIG. 9 shows the phase response curve and the value function obtained by the phase response curve learning device according to the present invention: (a) the value function and (b) the stochastic phase response curve, where the broken lines indicate the standard deviation.
As in Example 1, the phase response curve is represented stochastically because the reinforcement learning framework used here represents the policy stochastically.

Next, to examine how useful phase resetting is for posture control of the five-link biped robot 6, the effects of phase resetting and of reinforcement learning on walking were investigated.

FIG. 10 shows the walking trajectory without phase resetting. FIG. 11 shows the walking trajectory with phase resetting according to a linear phase response curve designed in advance. FIG. 12 shows the walking trajectory with phase resetting according to the phase response curve obtained by the phase response curve learning device according to the present invention. In each of FIGS. 10, 11 and 12, (a) shows the phase response curve and (b) shows the walking trajectory.

FIGS. 10 and 11 show that the five-link biped robot 6 falls after 3 steps in both cases: without phase resetting, and with phase resetting by the linear phase response curve to establish synchronization between the controller 3 and the robot 6. In other words, the more complex the control system, the more important the phase response curve becomes for posture control.

FIGS. 11 and 12 show that when the phase response curve is optimized by reinforcement learning, the five-link biped robot 6 controls its own posture and does not fall.

Example 3.
Examples 1 and 2 use a phase response curve obtained by reinforcement learning to make a periodic motion such as walking stable, so that the robot does not fall. However, if the robot stumbles over an obstacle, it may not be able to avoid a fall. The movements humans make to avoid falling after a stumble are known to fall into three patterns:
(1) elevating strategy (quickly raising the leg after the stumble: avoidance by lifting the leg)
(2) lowering strategy (quickly lowering the leg after the stumble: avoidance by lowering the leg)
(3) an elevating strategy that fails and turns into a lowering strategy
Example 3 is designed so that, when the five-link biped robot stumbles over an obstacle, it performs movements (1) and (2) as observed in humans. The numerical conditions below were chosen with reference to observations of human movement.

Specifically, the lowering strategy is executed when the phase at the moment of stumbling is φ = 5.4 rad or later, and the elevating strategy is executed before that. Each fall-avoidance strategy is realized by modifying the target trajectory as follows.
The lowering strategy was realized by displacing the hip joint angle of the stumbling leg by 0.3 rad in the positive direction (clockwise in FIG. 8) and the knee joint angle by 0.1 rad in the positive direction.
The elevating strategy was realized by displacing the hip joint trajectory of the stumbling leg by 0.2 rad in the negative direction (counterclockwise in FIG. 8) and the knee joint trajectory by 0.3 rad in the positive direction.
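The phase-dependent switching rule above can be sketched as a small lookup that returns joint-target offsets for the stumbling leg. The numbers (threshold 5.4 rad; +0.3/+0.1 rad for lowering; −0.2/+0.3 rad for elevating) come from the text. The function names, and the idea of adding the offsets directly to the current hip and knee targets, are hypothetical.

```python
LOWERING_PHASE_THRESHOLD = 5.4  # rad, from Example 3


def stumble_offsets(phase):
    """Return (strategy, hip_offset, knee_offset) in rad for the stumbling leg.

    Sign convention as in Example 3: positive = clockwise in FIG. 8.
    """
    if phase >= LOWERING_PHASE_THRESHOLD:
        return "lowering", 0.3, 0.1
    return "elevating", -0.2, 0.3


def modified_targets(hip_target, knee_target, phase):
    # Hypothetical helper: shift the stumbling leg's joint targets.
    _, d_hip, d_knee = stumble_offsets(phase)
    return hip_target + d_hip, knee_target + d_knee
```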

FIG. 14 shows the walking trajectory with phase resetting by the phase response curve alone, and FIG. 15 shows the walking trajectory when the elevating strategy is executed in addition to phase resetting. In the former the robot falls after stumbling, whereas in the latter it keeps walking. FIG. 16 shows the walking trajectory with phase resetting by the phase response curve alone, and FIG. 17 shows the walking trajectory when the lowering strategy is executed in addition to phase resetting. Again, in the former the robot falls after stumbling, whereas in the latter it keeps walking. The choice between the elevating strategy and the lowering strategy depends on the controller's phase at the moment of the stumble.

The embodiment describes the present invention applied to a biped robot as the controlled object. However, the invention is applicable to periodic motion in general.

FIG. 1 is a block diagram of a biped robot connected to the phase response curve learning device according to the present invention.
FIG. 2 is a schematic diagram of the skeleton model of the two-link biped robot.
FIG. 3 shows the phase response curve and the value function obtained by the phase response curve learning device according to the present invention.
FIG. 4 shows the change in cumulative reward over the course of learning.
FIG. 5 shows the walking trajectory without phase resetting.
FIG. 6 shows the walking trajectory with phase resetting according to a linear phase response curve designed in advance.
FIG. 7 shows the walking trajectory with phase resetting according to the phase response curve obtained by the phase response curve learning device according to the present invention.
FIG. 8 is a schematic diagram of the skeleton model of the five-link biped robot.
FIG. 9 shows the phase response curve and the value function obtained by the phase response curve learning device according to the present invention.
FIG. 10 shows the walking trajectory without phase resetting.
FIG. 11 shows the walking trajectory with phase resetting according to a linear phase response curve designed in advance.
FIG. 12 shows the walking trajectory with phase resetting according to the phase response curve obtained by the phase response curve learning device according to the present invention.
FIG. 13 shows an example of a phase response curve.
FIG. 14 shows the walking trajectory with phase resetting only.
FIG. 15 shows the walking trajectory when the target trajectory is modified in addition to phase resetting.
FIG. 16 shows the walking trajectory with phase resetting only.
FIG. 17 shows the walking trajectory when a different target-trajectory modification is made in addition to phase resetting.

Explanation of Symbols

1 Phase response curve learning device
3 Controller (periodic motion control device)
5 Two-link biped robot (controlled object)
6 Five-link biped robot (controlled object)
10 Control unit
11 Storage unit
11a Curve parameters
11b Value function
12 Synchronization event detection unit
13 Success/failure determination unit
14 Reward setting unit
15 Curve parameter update unit

Claims (13)

1. A phase response curve learning method for updating, by reinforcement learning, a phase response curve that indicates the amount of phase change in response to a disturbance factor of a periodic motion in which a plurality of events occur periodically, wherein
the success or failure of each event is determined, a degree of achievement of each event is set based on the determination result, and the parameters of the phase response curve are updated so that the cumulative value of the set degrees of achievement is maximized.
2. A phase response curve learning device for updating, by reinforcement learning, a phase response curve that indicates the amount of phase change in response to a disturbance factor of a periodic motion in which a plurality of events occur periodically, the device comprising:
means for determining the success or failure of each event;
means for setting a degree of achievement of each event based on the determination result; and
means for updating the parameters of the phase response curve so that the cumulative value of the set degrees of achievement is maximized.
3. A periodic motion control method in which a controller controls a controlled object performing a periodic motion, in which a plurality of events occur periodically, based on the state of the controlled object, wherein
a phase that defines the periodic motion is reset based on a phase response curve updated by the phase response curve learning method according to claim 1, thereby establishing synchronization between the controller and the controlled object.
4. A periodic motion control device that controls a controlled object performing a periodic motion, in which a plurality of events occur periodically, based on the state of the controlled object, the device comprising
means for resetting a phase that defines the periodic motion based on a phase response curve updated by the phase response curve learning method according to claim 1, thereby establishing synchronization with the controlled object.
5. A periodic motion control device comprising:
a controller that causes a controlled object to execute a periodic motion in which a plurality of events occur periodically; and
a phase response curve learning device for updating, by reinforcement learning, a phase response curve that indicates the amount of phase change in response to a disturbance factor of the periodic motion, the learning device comprising
means for determining the success or failure of each event,
means for setting a degree of achievement of each event based on the determination result, and
means for updating the parameters of the phase response curve so that the cumulative value of the set degrees of achievement is maximized;
wherein the pattern of the periodic motion is changed in response to a disturbance factor.
6. The periodic motion control device according to claim 5, wherein a phase that defines the periodic motion is reset based on the phase response curve whose parameters have been updated, thereby establishing synchronization with the controlled object.
7. The periodic motion control device according to claim 6, wherein the controlled object is a biped robot, and the reset is performed when a foot of the robot touches the ground.
8. The periodic motion control device according to claim 7, wherein the biped robot has five links, and a pattern change that displaces the hip joint angle and/or the knee joint angle is performed to avoid a fall.
9. The periodic motion control device according to claim 8, wherein the biped robot has five links, and a pattern change that displaces the hip joint angle and the knee joint angle in the positive direction is performed to avoid a fall.
10. The periodic motion control device according to any one of claims 7 to 9, wherein the biped robot has five links, and a pattern change that displaces the hip joint trajectory and/or the knee joint trajectory is performed to avoid a fall.
11. The periodic motion control device according to any one of claims 7 to 10, wherein the biped robot has five links, and a pattern change that displaces the hip joint trajectory in the negative direction and the knee joint trajectory in the positive direction is performed to avoid a fall.
12. A walking motion control device that controls a two-link biped robot performing periodic walking based on the state of the robot, the device comprising:
a controller that causes the robot to follow the periodic trajectory given by
[Equation: image not reproduced]
where φ denotes the phase; and
a phase response curve learning device comprising
means for determining the success or failure of the walking motion,
means for setting a degree of achievement of each step based on the determination result of that means, and
means for updating the parameters of a phase response curve, which indicates the amount of change in the phase of the walking motion caused by a disturbance factor, so that the cumulative value of the degrees of achievement set by that means is maximized;
wherein the phase that defines the walking motion is reset based on the phase response curve whose parameters have been updated, thereby establishing synchronization with the robot.
13. The walking motion control device according to claim 12, wherein the reset is performed when a foot of the robot touches the ground.
JP2006251704A 2005-11-30 2006-09-15 Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller Pending JP2007175860A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2006251704A JP2007175860A (en) 2005-11-30 2006-09-15 Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller
PCT/JP2006/318504 WO2007063633A1 (en) 2005-11-30 2006-09-19 Phase reaction curve learning method and device, periodic motion control method and device, and walking control device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2005346122 2005-11-30
JP2006251704A JP2007175860A (en) 2005-11-30 2006-09-15 Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller

Publications (1)

Publication Number Publication Date
JP2007175860A (en) 2007-07-12

Family

ID=38091973

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2006251704A Pending JP2007175860A (en) 2005-11-30 2006-09-15 Method and device for learning phase reaction curve, method and device for controlling cyclic movement, and walking movement controller

Country Status (2)

Country Link
JP (1) JP2007175860A (en)
WO (1) WO2007063633A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112472530B (en) * 2020-12-01 2023-02-03 Tianjin University of Technology Method for establishing a reward function based on trend changes in the walking ratio

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05305583A (en) * 1992-04-30 1993-11-19 Honda Motor Co Ltd Walking control device for leg type mobile robot
JP2004202652A (en) * 2002-12-26 2004-07-22 Toyota Motor Corp Biped robot walking with trunk twisting and method therefor
JP2005096068A (en) * 2003-08-25 2005-04-14 Sony Corp Robot device and attitude control method for robot

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CSNG200501504011; Takeshi Mori: 'A reinforcement learning method based on the policy gradient method and its application to biped walking motion control', IEICE Transactions, Vol. J88-D-II, No. 6, The Institute of Electronics, Information and Communication Engineers *
JPN6011053326; Takeshi Mori: 'A reinforcement learning method based on the policy gradient method and its application to biped walking motion control', IEICE Transactions, Vol. J88-D-II, No. 6, The Institute of Electronics, Information and Communication Engineers *
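JPN6012037730; C. Kadowaki, J. Morimoto, J. Nakanishi, Y. Otaka, M. Kawato: 'Acquisition of a phase response curve for biped walking by reinforcement learning', Proceedings of the 15th Annual Conference of the Japanese Neural Network Society, 2005-09-20, pp. 42-43, Japanese Neural Network Society *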
JPN6012037732; A. M. Schillings, B. M. H. van Wezel, Th. Mulder, J. Duysens: 'Muscular responses and movement strategies during stumbling over obstacles', Journal of Neurophysiology, Vol. 83, No. 4, 2000, pp. 2093-2102 *
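JPN6012037736; C. Kadowaki: 'An approach focusing on CPG phase toward understanding human walking and fall-avoidance mechanisms', Master's thesis, Information Life Science Program, Graduate School of Information Science, Nara Institute of Science and Technology, 2006-03-10 *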

Also Published As

Publication number Publication date
WO2007063633A1 (en) 2007-06-07

Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20090907

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A711

Effective date: 20090907

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20090907

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20091106

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20091106

A711 Notification of change in applicant

Free format text: JAPANESE INTERMEDIATE CODE: A711

Effective date: 20100308

A521 Written amendment

Free format text: JAPANESE INTERMEDIATE CODE: A821

Effective date: 20100308

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20111011

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20120724

A02 Decision of refusal

Free format text: JAPANESE INTERMEDIATE CODE: A02

Effective date: 20121113