JPH04211801A

JPH04211801A - Learning controller

Info

Publication number: JPH04211801A
Application number: JP503291A
Authority: JP
Inventors: Shigeaki Matsubayashi; 成彰松林; Osamu Ito; 修伊藤
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1990-01-22
Filing date: 1991-01-21
Publication date: 1992-08-03
Anticipated expiration: 2013-03-04
Also published as: JP2720605B2

Abstract

PURPOSE: To reduce the number of repetitions required by a learning controller. The controller changes the input U to a controlled object by a very small amount, detects the resulting output y, and repeats such small changes until y coincides with a target value yd. CONSTITUTION: The qualitative model arithmetic circuit 303 and the input change vector selection circuit 309 select only an input change vector ΔUj capable of bringing the output y close to the target value yd, and a trial is executed only for that vector. It is therefore unnecessary to try every input change vector as the conventional controller does, and the number of repetitions is greatly reduced. When the state changes and the output y tends to move away from the target value yd, the qualitative model correction circuit 312 corrects the qualitative model so that y approaches yd. The reduction in the number of repetitions is thus maintained in all states.

Description

[Detailed Description of the Invention]

[0001]

[Field of Industrial Application] The present invention relates to a learning control device for controlled objects whose input-output relationship is difficult to determine accurately in advance, such as walking robots and chemical plants.

[0002]

[Prior Art] One conventional learning control device was proposed by Nakano in the paper "Machines That Act" (Seitai no Kagaku [Science of Living Organisms], Vol. 37, No. 1, pp. 41-48, 1986). This paper discusses the control of the walking robot shown in FIG. 2.

[0003] In FIG. 2, a walking robot 105 consists of front legs 102A and hind legs 102B connected by a body 100. Motors 103A and 103B drive the front legs 102A and the hind legs 102B, respectively, and a driver circuit 104 commands the rotation of each motor. An output detector 106 detects the distance traveled by the walking robot.

[0004] The operation of the walking robot 105 configured as described above can be expressed by equation (2).

[0005]

[Math 2]

$$y = g(U)$$

[0006] Here, y is the walking distance, which is the output of the walking robot. U = (u1A, u1B, u2A, u2B) is the motor rotation angle vector, i.e., the input vector applied to the front legs 102A and hind legs 102B. g is a function that is difficult to determine accurately.

[0007] To find a U that makes y in equation (2) as large as possible, conventional learning control devices generally use a "hill-climbing method" consisting of the following steps.

[0008] Step 1: Create 81 input change vectors ΔU1, ..., ΔU81 whose elements are minute values, for example (Δu1A, 0, 0, 0) or (0, −Δu1B, Δu2A, Δu2B). In this example there are 3^4 = 81 input change vectors. The base 3 is the number of possible signs for each element ("+", "−", or "0"), and the exponent 4 is the dimension of the input change vector ΔUi.
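Step 1 amounts to enumerating every combination of +Δ, −Δ and 0 over the four elements. The Python sketch below shows this under one assumption: a single uniform step of 2° for all elements. The text gives 2° only as an example and allows the magnitudes to differ per element. The function name and `STEP_DEG` are hypothetical. The 81 combinations include the all-zero vector.

```python
from itertools import product

STEP_DEG = 2.0  # assumption: one uniform step; the text only gives 2 degrees as an example


def make_input_change_vectors(step=STEP_DEG, dims=4):
    """Return every vector whose elements are +step, -step or 0.

    Element order follows U = (u1A, u1B, u2A, u2B), so dims=4 gives 3**4 = 81.
    """
    return [tuple(s * step for s in signs)
            for signs in product((1, -1, 0), repeat=dims)]


vectors = make_input_change_vectors()
print(len(vectors))  # 81
```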

[0009] Step 2: Add the input change vectors one at a time to the current input vector U. That is, apply Ui ← U + ΔUi to the walking robot and detect the resulting output changes Δy1, ..., Δy81.

[0010] Step 3: Select the input change vector ΔUj that maximizes the output change, update the current input vector as U ← U + ΔUj, and repeat Steps 2 and 3. If all of the output changes are negative or zero, the current input vector is the desired vector and the iteration ends.
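Steps 1 to 3 together form the conventional hill-climbing loop. The Python sketch below treats the controlled object as a black box and counts the trials spent. The quadratic `toy_plant` is purely hypothetical and stands in for the unknown function g. Each iteration costs 81 trials, so the total is 81 times the number of iterations.

```python
from itertools import product


def hill_climb(plant, u, step=2.0, max_iters=100):
    """Conventional hill climbing over all 3**len(u) input change vectors.

    `plant` is treated as a black box returning the output y for an input U.
    Returns the final input and the number of trials (one per ΔU_i tried).
    """
    deltas = [tuple(s * step for s in signs)
              for signs in product((1, -1, 0), repeat=len(u))]   # Step 1
    trials = 0
    for _ in range(max_iters):
        y = plant(u)
        best_gain, best_u = 0.0, None
        for d in deltas:                                  # Step 2: try every ΔU_i
            candidate = tuple(a + b for a, b in zip(u, d))
            gain = plant(candidate) - y                   # output change Δy_i
            trials += 1
            if gain > best_gain:
                best_gain, best_u = gain, candidate
        if best_u is None:     # Step 3: every Δy_i <= 0, so stop
            break
        u = best_u             # Step 3: U <- U + ΔU_j
    return u, trials


def toy_plant(u):
    """Hypothetical stand-in for g: peaks at U = (6, 0, 0, 4)."""
    u1a, u1b, u2a, u2b = u
    return -((u1a - 6) ** 2 + u1b ** 2 + u2a ** 2 + (u2b - 4) ** 2)


u_final, n_trials = hill_climb(toy_plant, (0.0, 0.0, 0.0, 0.0))
print(u_final, n_trials)  # (6.0, 0.0, 0.0, 4.0) 324
```

On this toy plant the loop needs four iterations of 81 trials each. The last iteration only confirms that no vector improves the output.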

[0011]

[Problem to be Solved by the Invention] This learning control device has one advantage: with exactly the same configuration, it can be applied not only to walking robots but to any controlled object whose characteristics are unknown. However, Step 2 requires 81 trials. If 10 repetitions of Steps 2 and 3 are needed for the output y to reach its maximum, a total of 810 trials must be performed, which is a serious practical problem.

[0012]

[Means for Solving the Problem] Accordingly, an object of the present invention is to provide a learning control device that requires far fewer repetitions than conventional learning control devices.

[0013] To achieve this object, the present invention has the following configuration. It provides a learning control device comprising: means for generating a plurality of input change vectors ΔUi that change the control input U applied to the controlled object; qualitative model calculation means that performs a calculation on the input change vectors ΔUi based on a predetermined qualitative model and outputs predicted sign data

[0014]

[Math 3]

$$[\Delta\hat{y}_i]$$

[0015] ; detection means for detecting the output y of the controlled object; error sign detection means for detecting the sign of the difference between the value y detected by the detection means and the target value yd; an input change vector selection circuit that selects an input change vector ΔUi based on the output [e] of the error sign detection means and the predicted sign data (Math 3); output sign detection means for detecting a predetermined sign representing the change in the output value of the controlled object; input vector updating means for adding the input change vector selected by the input change vector selection circuit to the input of the controlled object; and qualitative model correction means for correcting the qualitative model based on the input of the controlled object and the detection output of the output sign detection means.

[0016]

[Operation] According to the present invention, the qualitative model calculation means and the input change vector selection means select only an input change vector ΔUj that can bring the output y closer to the desired target value yd. A trial is made only with that vector. There is therefore no need to try every input change vector as in the conventional method, and the number of repetitions until y matches yd can be greatly reduced. If the state changes and y tends to deviate from yd, the qualitative model correction means corrects the qualitative model so that y approaches yd. The reduction in the number of repetitions is thus maintained under all conditions.

[0017]

[Embodiments] A first embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of the learning control device in the first embodiment. In FIG. 1, the controlled object is the walking robot 105 shown in FIGS. 3(a) and 3(b). In those figures, the walking machine 105 has front legs 102A and hind legs 102B attached to a body 100, and motors 103A and 103B rotate them, respectively. The front foot tip 102C and the rear foot tip 102D, which touch the floor 101, have different coefficients of friction. The output detector 106 detects the distance traveled by the walking robot.

[0018] The operation of the walking robot 105 is explained below. The input vector U given to the walking robot is expressed by equation (4).

[0019]

[Math 4]

$$U = (u_{1A}, u_{1B}, u_{2A}, u_{2B})$$

[0020] In equation (4), u1A and u1B are the angles of the front leg before and after the movement. u2A and u2B are the angles of the hind leg before and after the movement.

[0021] The control input U is a vector quantity whose elements u1A, u1B, u2A and u2B are all real numbers.

[0022] Motors 103A and 103B rotate the front legs 102A and the hind legs 102B, respectively, as shown in FIGS. 3(a) and 3(b). If the front foot tip 102C and the rear foot tip 102D exert different frictional forces on the floor 101, the walking robot 105 moves in a fixed direction.

[0023] The walking robot moves from the state of FIG. 3(a) to the state of FIG. 3(b) and then returns to the state of FIG. 3(a). This completes one cycle of the walking motion. Equation (4) therefore describes a half cycle of the robot's motion.

[0024] Let y be the distance the walking robot 105 advances in one cycle of the walking motion shown in FIGS. 3(a) and 3(b). The relationship between the control input U and the distance y is then expressed by equation (2). The function g in equation (2) varies with several factors:
- the weight distribution of the walking robot 105 between the front legs 102A and the hind legs 102B;
- the ratio of the length L1 of the front legs 102A to the length L2 of the hind legs 102B;
- the coefficients of friction between the floor 101 and each foot tip 102C, 102D.

[0025] In FIG. 1, the learning control device of the first embodiment has the following components:
- an input change vector determination circuit 310 that determines the input change vectors;
- an input vector update circuit 311 that updates the input vector applied to the walking robot based on the output of circuit 310;
- an output sign detection circuit 313 that detects, from the output of the distance detector 106, the sign of the direction of movement (a fixed direction is defined in advance as positive or negative);
- a qualitative model correction circuit 312;
- an error sign detection circuit 308.

[0026] The input change vector determination circuit 310 has the following circuits.

(1) Input change vector memory 301: stores 81 predetermined input change vectors ΔU1, ..., ΔU81. Their number is determined by the method described under "Prior Art". Each input change vector ΔUi contains four data (Δu1A, Δu1B, Δu2A, Δu2B), each of which is positive, negative, or zero. Examples are (Δu1A, 0, 0, 0) and (0, −Δu1B, Δu2A, Δu2B). A positive value represents an increase in a predetermined direction, a negative value a decrease, and zero no change. Each datum is a minute angle, such as 2°, added to the rotation angle of the front leg 102A or the hind leg 102B. The data need not all have the same angle; different values may be set (e.g., 2°, −3°, 0°, 2°).

(2) Switch 305A: closed when data from the input change vector memory 301 is input to the sign vector detector 302.

(3) Sign vector detector 302: from the input change vector ΔUi supplied by memory 301, outputs a sign vector [ΔUi] giving the sign (+, −, 0) of each datum. Hereafter, a symbol in brackets [ ] denotes the sign "+", "−", or "0" of the quantity it represents. For example, for the input change vector ΔUi = (0, −Δu1B, Δu2A, Δu2B), the output is the sign vector [ΔUi] = (0, −, +, +).

(4) Qualitative model calculation circuit 303: has an arithmetic circuit that predicts, from the sign vector [ΔUi] output by detector 302, the sign of the output y. The output y represents the distance and direction of movement of the walking robot 105, so its sign corresponds to the direction of movement. The calculation follows a preset qualitative model, and the resulting predicted sign data

[0027]

[Math 5]

$$[\Delta\hat{y}_i]$$

[0028] is output. Hereafter, a hat "^" over a symbol denotes predicted data for the quantity that symbol represents. The predicted sign data (Math 5) gives the predicted direction of change of the output y: "+" for an increase, "−" for a decrease, "0" for no change, and "?" when no prediction is possible.

(5) Switch 305B: closed when the output data of the qualitative model calculation circuit 303 is input to the memory 304.

(6) Memory 304: stores the predicted sign data (Math 5) output from the qualitative model calculation circuit 303, received via the switch 305B. Normally, 81 predicted sign data

[0029]

[Math 6]

$$[\Delta\hat{y}_1], [\Delta\hat{y}_2], \ldots, [\Delta\hat{y}_{81}]$$

[0030] are stored.

(7) Input change vector selection circuit 309: receives the predicted sign data (Math 5) from memory 304 together with the input change vectors ΔUi. From all the predicted sign data (Math 6), it selects the one predicted sign datum whose sign matches the sign [e] of the error value. That sign comes from the error sign detection circuit 308 described below. The selected datum is

[0031]

[Math 7]

$$[\Delta\hat{y}_j]$$

[0032] and it is applied to the qualitative model correction circuit 312. The learning control device also includes the following circuits. The error sign detection circuit 308 contains an error calculation circuit 306, which computes the difference between the value y detected by the distance detector 106 and the target value yd. The resulting error e goes to the sign detection circuit 307. Circuit 307 detects the sign [e] of the error e and supplies it to the input change vector selection circuit 309. The sign [e] is "+", "−", or "0". It indicates whether the output y should be increased, decreased, or held at its current value to bring it closer to the target output yd.
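The sign handling in items (3) and (7) and in the error sign detection circuit 308 can be sketched in Python as follows. Two points are assumptions not stated in the text:
- The error is taken as e = yd − y, so [e] = "+" means the output should increase.
- When several predictions match [e], the first one is chosen; the text says only that one is selected.

All function names are hypothetical.

```python
def sign(x):
    """Return '+', '-' or '0' for a real number."""
    return '+' if x > 0 else '-' if x < 0 else '0'


def sign_vector(delta_u):
    """Sign vector detector 302 (sketch): [ΔU_i] from ΔU_i."""
    return tuple(sign(d) for d in delta_u)


def select_input_change(delta_us, predicted, y, y_d):
    """Selection circuit 309 (sketch): pick a ΔU_j whose prediction equals [e].

    Assumption: e = y_d - y, so [e] = '+' means y should increase.
    predicted[i] is the predicted sign for delta_us[i] ('+', '-', '0' or '?').
    Assumption: the first match is returned; None if nothing matches.
    """
    e_sign = sign(y_d - y)
    for j, (du, p) in enumerate(zip(delta_us, predicted)):
        if p == e_sign:
            return j, du
    return None


print(sign_vector((0, -2, 2, 2)))  # ('0', '-', '+', '+')
```

A '?' prediction never equals [e], which is always "+", "−" or "0". Such vectors are therefore never selected in this sketch.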

[0033] The input vector update circuit 311 adds the input change vector ΔUj output from the input change vector selection circuit 309 to the current input U and outputs the updated input U. The switch 316 is open during this addition.

[0034] The input U and the predicted sign data (Math 7) are input to the qualitative model correction circuit 312. The output sign detection circuit 313 detects a sign change vector [Δy] representing the direction of change in the distance traveled. The switch 314 then closes (steps 1 and 2 in the flowchart of FIG. 4), and [Δy] is input to the qualitative model correction circuit 312 (step 3).

[0035] The qualitative model correction circuit 312 compares the sign change vector [Δy] with the predicted sign data (Math 7) (step 4). If the two differ, the switch 315 closes and the correction outputs QA and QB are input to the qualitative model calculation circuit 303 (steps 5 and 6).
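A minimal sketch of the comparison in steps 1 to 4 of FIG. 4 is shown below. The text does not specify how circuit 313 forms [Δy]. This sketch assumes, hypothetically, that it is the sign of the difference between the outputs measured before and after the trial. A mismatch is what triggers steps 5 and 6. The function name is hypothetical.

```python
def sign(x):
    """Return '+', '-' or '0' for a real number."""
    return '+' if x > 0 else '-' if x < 0 else '0'


def check_prediction(predicted_sign, y_before, y_after):
    """Steps 1-4 of FIG. 4 (sketch): detect [Δy] and compare it with [Δŷ_j].

    Assumption: [Δy] is the sign of y_after - y_before.
    Returns the observed sign and whether correction (steps 5 and 6:
    close switch 315 and send QA, QB) should be triggered.
    """
    observed = sign(y_after - y_before)
    return observed, observed != predicted_sign


print(check_prediction('+', 10.0, 8.5))  # ('-', True)
```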

[0036] The qualitative model is described below. Suppose the walking robot changes from the posture of FIG. 3(a), with the front leg 102A and hind leg 102B spread apart, to the posture of FIG. 3(b), with both legs brought together. If the frictional force at the front foot tip 102C exceeds that at the rear foot tip 102D, the front foot tip 102C does not slide on the floor 101. Only the rear foot tip 102D slides, and the walking robot moves by a distance yAB as shown in FIG. 5. The larger the change in angle of the front leg 102A (u1A − u1B), the larger the distance yAB. The rotation of the hind leg 102B does not contribute to the distance traveled. The distance yAB moved by this change in posture is therefore expressed by equation (8).

[0037]

[Math 8]

[0038] Here, F1A is the frictional force at the front foot tip 102C, and F2A is the frictional force at the rear foot tip 102D.

[0039] g1 and g2 are increasing functions with g1(0) = g2(0) = 0. Equation (8) requires the sign of (F1A − F2A), but these frictional forces are extremely difficult to detect. An expression equivalent to (F1A − F2A) is therefore formulated using the input vector (u1A, u1B, u2A, u2B), which consists of measurable angle data.

[0040] In equation (8), F1A − F2A = 0 means that the front foot tip 102C and the rear foot tip 102D experience equal frictional forces. Assume two equalities: the length L1 of the front leg 102A equals the length L2 of the hind leg 102B, and the coefficient of friction μ1 between the front leg 102A and the floor 101 equals the coefficient μ2 between the hind leg 102B and the floor 101. Under these assumptions, F1A − F2A = 0 is equivalent to u1A − u2A = 0.

[0041] In general, this relationship is expressed by equation (9).

[0042]

[Math 9]

$$F_{1A} - F_{2A} = 0 \iff u_{2A} - u_{1A} - Q_A = 0$$

[0043] Here, QA is a boundary parameter that varies with the relationship among L1, L2, μ1 and μ2. The expression u2A − u1A − QA is therefore a boundary function composed of the input and the boundary parameter, and it has the same dimension as the input. When L1 = L2 and μ1 = μ2, QA = 0.

[0044] Combining equations (4) and (8) gives equation (10).

[0045]

[Math 10]

[0046] By the same reasoning, the walking distance yBA when the robot changes from FIG. 3(b) to FIG. 3(a) is expressed by equation (11).

[0047]

[Math 11]

[0048] When the walking robot changes from FIG. 3(a) to FIG. 3(b) and back to FIG. 3(a), the walking distance y is expressed by equation (12).

[0049]

[Math 12]

$$y = y_{AB} + y_{BA}$$

[0050] Equations (9) to (11) are summarized in Table 1.

[0051]

[Table 1]

[0052] In Table 1, the region numbers (1) to (9) identify regions of the input U = (u1A, u1B, u2A, u2B) given to the walking robot. The regions are defined by the signs of its differences from the boundary parameters QA and QB. Equation (10) splits the inputs into three cases according to the sign of the difference between (u2A − u1A) and QA. Equation (11) splits them into three cases according to the sign of the difference between (u2B − u1B) and QB. Together these give 3 × 3 = 9 regions, and the function giving the walking distance y differs from region to region.

[0053] The signs of the boundary functions are obtained as follows. In region (1), for example, u2A − u1A − QA > 0, so the boundary function sign [u2A − u1A − QA] is "+". Similarly, in region (2), u2B − u1B − QB = 0, so [u2B − u1B − QB] is "0".
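The classification into 3 × 3 regions can be sketched as computing the pair of boundary-function signs. Table 1 maps each sign pair to a region number (1) to (9), but that table is not reproduced here. The sketch therefore returns the sign pair rather than a region number. The boundary parameters QA = 20° and QB = 10° are taken from paragraph [0069]. The input angles are hypothetical, and the function name is also hypothetical.

```python
from itertools import product


def sign(x):
    """Return '+', '-' or '0' for a real number."""
    return '+' if x > 0 else '-' if x < 0 else '0'


def boundary_signs(u, qa, qb):
    """Signs of the two boundary functions that partition the input space.

    u = (u1A, u1B, u2A, u2B); the boundary functions follow the text:
    [u2A - u1A - QA] and [u2B - u1B - QB].
    """
    u1a, u1b, u2a, u2b = u
    return sign(u2a - u1a - qa), sign(u2b - u1b - qb)


# Each boundary sign has 3 possible values, giving 3 x 3 = 9 regions.
all_regions = list(product('+-0', repeat=2))
print(len(all_regions))                                    # 9
print(boundary_signs((0.0, 0.0, 30.0, 10.0), 20.0, 10.0))  # ('+', '0')
```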

[0054] The output value y in each region is obtained as follows. In region (1), equation (10) gives yAB = g1(u1A − u1B) and equation (11) gives yBA = −g1(u1A − u1B). The walking distance y is therefore

[0055]

[Math 13]

$$y = y_{AB} + y_{BA} = g_1(u_{1A} - u_{1B}) - g_1(u_{1A} - u_{1B}) = 0$$

[0056] In region (2), equation (10) gives yAB = g1(u1A − u1B) and equation (11) gives yBA = 0. The walking distance y is therefore

[0057]

[Math 14]

$$y = y_{AB} + y_{BA} = g_1(u_{1A} - u_{1B})$$

[0058] Because g1 and g2 are increasing functions, the sign of the output can be predicted from the signs of the input vector values. This "sign prediction" is performed on the basis of the "qualitative model" set in the qualitative model calculation circuit 303. Table 2 represents this qualitative model. For each combination of the boundary function signs [u2A − u1A − QA] and [u2B − u1B − QB], it gives the corresponding predicted sign data (Math 3).

[0059]

[Table 2]

[0060] In Table 2, the predicted sign data (Math 3) is obtained as follows. In region (1), for example, the predicted sign data (Math 5) for the sign vector [ΔUi] = (+, 0, −, +) is "0". (Whatever value the sign vector [ΔUi] takes, the predicted sign data is

[0061]

[Math 15]

$$[\Delta\hat{y}_i] = 0$$

[0062] in this region.) In region (2), for example, the predicted sign data (Math 5) is "+" for the sign vector [ΔUi] = (+, −, −, +):

[0063]

[Math 16]

$$[\Delta\hat{y}_i] = +$$

[0064] For the sign vector [ΔUi] = (+, +, −, +), on the other hand, the predicted sign data (Math 5) has no definite value:

[0065]

[Math 17]

$$[\Delta\hat{y}_i] = \text{?}$$
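The predictions for regions (1) and (2) follow from sign arithmetic applied to Math 13 and Math 14. In region (1) the output is identically zero. In region (2), because g1 is increasing, the sign of the change in y equals the sign of Δu1A − Δu1B.

The Python sketch below encodes that reasoning under stated limits:
- It assumes the minute input change keeps U inside the same region.
- It covers only these two regions, because the other rows of Table 2 are not reproduced here.
- The helper names are hypothetical.

It reproduces the three examples above.

```python
NEG = {'+': '-', '-': '+', '0': '0', '?': '?'}


def q_add(a, b):
    """Qualitative addition of two signs; opposite signs give '?'."""
    if '?' in (a, b):
        return '?'
    if a == '0':
        return b
    if b == '0':
        return a
    return a if a == b else '?'


def q_sub(a, b):
    """Qualitative subtraction a - b = a + (-b)."""
    return q_add(a, NEG[b])


def predict_region1(sv):
    """Region (1): y = y_AB + y_BA = 0 whatever the input change."""
    return '0'


def predict_region2(sv):
    """Region (2): y = g1(u1A - u1B), g1 increasing, so [Δy] = [Δu1A] - [Δu1B].

    Assumption: the input change is small enough to stay in region (2).
    """
    du1a, du1b, _du2a, _du2b = sv
    return q_sub(du1a, du1b)


print(predict_region1(('+', '0', '-', '+')))  # 0
print(predict_region2(('+', '-', '-', '+')))  # +
print(predict_region2(('+', '+', '-', '+')))  # ?
```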

[0066] The output of the qualitative model correction circuit 312 consists of the boundary parameters QA and QB. These are determined by the coefficient of friction μ1 between the front foot tip 102C and the floor 101, the coefficient μ2 between the rear foot tip 102D and the floor 101, and the lengths of the front leg 102A and the hind leg 102B. The coefficients μ1 and μ2 are difficult to measure and cannot be predicted. Consequently, QA and QB cannot be predicted accurately either, and the predictions in Table 2 are not always correct. When a prediction is wrong, the two signs disagree: the sign data [Δy] of the actual output, detected by the output sign detection circuit 313, differs from the predicted sign data (Math 3) output by the input change vector selection circuit 309. The qualitative model used by the qualitative model calculation circuit 303 is then judged to be inappropriate, and its boundary parameters QA and QB are changed.

[0067] An example of the correction operation with actual numerical values is shown below. Suppose the input to the walking robot is

[0068]

[Math 18]

[0069] and let QA = 20° and QB = 10°. Then, from equation (10),

[0070]

[Math 19]

[0071] and from equation (11),

[0072]

[Math 20]

[0073] From the results of equations (19) and (20), region (2) of Table 2 is selected.

[0074] Suppose that the following data, for example, is then input as the input change vector:

[0075]

[Math 21]

[0076] In this case, the predicted sign data (Math 3) is obtained from Table 2 as follows:

[0077]

[Math 22]

[0078] Suppose the sign data [Δy] obtained after the walking robot completes the walking motion with this input change vector is "−". The region was then presumably selected incorrectly, so Table 2 is searched for a region whose predicted sign data (Math 3) is "−". The matching region turns out to be (4), from the calculation of equation (20).

[0079] Boundary parameters QA and QB are therefore determined so that the data of equations (18) and (21) fit the boundary functions of region (4).

[0080] From equations (10) and (11),

[0081]

[Math 23]

[0082] For the above two expressions to hold, QA' and QB' may be set as follows:

[0083]

[Math 24]

[0084] Here, ε is a positive real number. If, on the other hand, the sign data [Δy] is "+", then

[0085]

[Math 25]

[0086] and the predicted sign data matches the sign data, so the boundary parameters QA and QB are not corrected.
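The correction step can be realized by choosing each new boundary parameter so that the current input falls strictly on the required side of its boundary, with a margin ε. This follows only from the definitions of the boundary functions u2A − u1A − QA and u2B − u1B − QB. The target signs would come from the row of Table 2 whose prediction matches the observed [Δy]. Neither that table nor the exact form of Math 24 is reproduced here, so this is an assumption-based sketch, not the patent's formula. The function name, the value of ε, and the input angles are hypothetical.

```python
def corrected_boundary(diff, target_sign, eps=1.0):
    """Return a new boundary parameter Q' that gives diff - Q' the requested sign.

    diff is u2A - u1A (for QA') or u2B - u1B (for QB').
    eps > 0 plays the role of the positive real number ε; its value is an assumption.
    """
    if target_sign == '+':
        return diff - eps     # diff - Q' = +eps > 0
    if target_sign == '-':
        return diff + eps     # diff - Q' = -eps < 0
    return diff               # diff - Q' = 0: input lies on the boundary


# Hypothetical input angles (not the values of Math 18).
u1a, u1b, u2a, u2b = 0.0, 0.0, 30.0, 10.0
qa_new = corrected_boundary(u2a - u1a, '-')
print(qa_new, u2a - u1a - qa_new)  # 31.0 -1.0
```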

[0087] When the coefficients of friction of both feet are equal (μ1 = μ2) and the front and hind legs have equal lengths (L1 = L2), QA = QB = 0, and the qualitative model is not corrected. The circuit of FIG. 6 can then be used. It omits the qualitative model correction circuit 312, the output sign detection circuit 313, and the switches 314 and 315.

[0088] Although this embodiment applies the learning control to a walking robot, the learning control of the present invention can also be applied to chemical plants, air-conditioning systems, and the like.

[0089]

[Effects of the Invention] As described above, in the present invention the qualitative model calculation circuit 303 and the input change vector selection circuit 309 select only those input change vectors ΔUj that can bring the walking distance y closer to the desired target walking distance yd. The walking motion is then performed only for those vectors. Unlike the conventional device, the controller does not need to try every input change vector. The number of walking-motion repetitions needed to reach the target walking distance yd is therefore greatly reduced.

The friction coefficients μ1 and μ2 may change, as may the length L1 of the front leg 102A or the length L2 of the hind leg 102B. If such a change makes the walking distance y tend to move away from the target walking distance yd, the qualitative model correction circuit corrects the qualitative model so that y approaches yd. The reduction in the number of repetitions is therefore maintained.

In actual experiments, the conventional example needed approximately 810 trials to reach the same target walking distance yd, as described earlier. The present invention reached it in approximately 10 trials, which confirms a significant effect.
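The reduction in trials described in [0089] comes from trying only the input change vectors whose predicted sign of Δy matches the error sign [e]. The Python sketch below shows that selection loop on a toy problem. Several assumptions do not come from the patent:

- The qualitative model is reduced to the sign of ∂y/∂u_k for each input component.
- Each candidate ΔUi changes one component by ±step.
- The plant `toy_plant` is hypothetical.
- The qualitative model correction of [0078] to [0086] is omitted.

```python
from typing import Callable, List, Sequence, Tuple


def sign(x: float) -> int:
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def predicted_sign(sensitivity: Sequence[int], du: Sequence[float]) -> int:
    """Qualitatively predict sign(Δy) for the input change du.

    sensitivity[k] is the assumed sign of ∂y/∂u_k (hypothetical model).
    Mixed contributions are ambiguous and give 0, as does a zero change.
    """
    contributions = {s * sign(d) for s, d in zip(sensitivity, du)}
    contributions.discard(0)
    return contributions.pop() if len(contributions) == 1 else 0


def learn(plant: Callable[[Sequence[float]], float], u0: Sequence[float], yd: float,
          sensitivity: Sequence[int], step: float, tol: float,
          max_trials: int = 100) -> Tuple[List[float], float, int]:
    u = list(u0)
    n = len(u)
    # Candidate input change vectors ΔUi: ±step on one component at a time.
    candidates: List[List[float]] = []
    for k in range(n):
        for s in (1, -1):
            du = [0.0] * n
            du[k] = s * step
            candidates.append(du)
    y = plant(u)
    trials = 0
    while abs(yd - y) > tol and trials < max_trials:
        e = sign(yd - y)  # error sign [e]
        # Try only a candidate predicted to move y toward yd.
        chosen = next((du for du in candidates if predicted_sign(sensitivity, du) == e), None)
        if chosen is None:
            break  # the qualitative model offers no suitable candidate
        u = [ui + di for ui, di in zip(u, chosen)]
        y = plant(u)  # one trial, e.g. one walking motion
        trials += 1
    return u, y, trials


def toy_plant(u: Sequence[float]) -> float:
    """Hypothetical plant; the controller never sees this formula."""
    return 2.0 * u[0] - u[1]


u, y, trials = learn(toy_plant, [0.0, 0.0], yd=0.7, sensitivity=[1, -1],
                     step=0.05, tol=0.03)
print(trials, round(y, 3))
```

A conventional controller that tests every candidate at each step would evaluate all 2n vectors per step. This loop spends one trial per step because the qualitative model filters the candidates first.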

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the learning control device according to the first embodiment of the present invention.

FIG. 2 is a perspective view of a walking robot, which is an example of an object controlled by the learning control device of the present invention.

FIG. 3(a) and FIG. 3(b) are front views, each showing an example of the operation of a walking robot, which is an example of an object controlled by the learning control device of the present invention.

FIG. 4 is a flowchart showing the operation of the qualitative model correction circuit and the output sign detection circuit in the learning control device according to the first embodiment of the present invention.

FIG. 5 is a front view of a walking robot in operation, which is an example of an object controlled by the learning control device of the present invention.

FIG. 6 is a block diagram of the learning control device according to the second embodiment of the present invention.

[Description of Reference Numerals]

100 Torso
101 Floor
102A Front leg
102B Hind leg
102C Front foot tip
102D Hind foot tip
103A Motor
103B Motor
104 Driver circuit
105 Walking robot
106 Output detector
305A, 305B Switches
306 Error calculation circuit
308 Error sign detection circuit
310 Input change vector determination circuit
311 Input vector update circuit
314 Switch
315 Switch
316 Switch

Claims

[Claims]

1. A learning control device comprising: means for generating a plurality of input change vectors ΔUi for changing a control input U applied to a controlled object; qualitative model calculation means for performing a calculation on each input change vector ΔUi based on a predetermined qualitative model and outputting predicted sign data (Equation 1); detection means for detecting the output y of the controlled object; error sign detection means for detecting the sign of the difference between the output y and a target value yd; an input change vector selection circuit for selecting an input change vector ΔUi based on the output [e] of the error sign detection means and the predicted sign data (Equation 1); output sign detection means for detecting a predetermined sign representing a change in the value of the output of the controlled object; and qualitative model modification means for modifying the qualitative model based on the input of the controlled object and the detection output of the output sign detection means; wherein the device makes the output y of the controlled object match the target value yd by repeating the above series of operations.

2. The learning control device according to claim 1, wherein the qualitative model calculation means comprises a qualitative model represented by an input vector U, a boundary function having at least one boundary parameter, and at least one qualitative formula corresponding to the sign of the value obtained by substituting the input vector into the boundary function.

3. The learning control device according to claim 2, wherein the qualitative model modification means includes means for changing boundary parameters.

4. A learning control device comprising: means for generating a plurality of input change vectors ΔUi for changing a control input U applied to a controlled object; qualitative model calculation means for performing a calculation on each input change vector ΔUi based on a predetermined qualitative model and outputting predicted sign data (Equation 1); detection means for detecting the output y of the controlled object; error sign detection means for detecting the sign of the difference between the output y and a target value yd; an input change vector selection circuit for selecting an input change vector ΔUi based on the output [e] of the error sign detection means and the predicted sign data (Equation 1); and input vector update means for adding the input change vector selected by the input change vector selection circuit to the input of the controlled object; wherein the device makes the output y of the controlled object match the target value yd by repeating the above series of operations.

5. The learning control device according to claim 4, wherein the qualitative model calculation means comprises a qualitative model represented by an input vector U, a boundary function having at least one boundary parameter, and at least one qualitative formula corresponding to the sign of the value obtained by substituting the input vector into the boundary function.