JPH07306707A

JPH07306707A - Learning control method for robot

Info

Publication number: JPH07306707A
Application number: JP9676294A
Authority: JP
Inventors: Yoshito Nanjo; 義人南條
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1994-05-10
Filing date: 1994-05-10
Publication date: 1995-11-21

Abstract

PURPOSE: To provide a learning control method for a robot that mitigates the deterioration of learning performance caused by the error between the target trajectory and the robot's actual trajectory at the initial time. CONSTITUTION: In a first step, the errors between the target trajectory and the actual trajectory are measured over the entire time interval. In a second step, a coefficient (learning gain) is calculated whose value is zero over the time interval from the initial time to an appropriate time, that interval varying with the number of playback operations performed. In a third step, each error measured in the first step is multiplied by the coefficient calculated in the second step. In a fourth step, each product is added to the input value supplied to the drive unit during the previous playback operation. In a fifth step, the sum is supplied to the drive unit as the input value for the next playback operation.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a learning control method for a robot, suitable for use in, for example, an industrial robot that repeatedly performs playback operations.

[0002]

[Prior Art] As is well known, various techniques have been developed for improving the accuracy of a robot through repeated playback operations. One technique of this kind is the learning control method described in document (a) below.

(a) S. Arimoto: Learning Control Theory for Robotic Motion, International Journal of Adaptive Control and Signal Processing, 4-6, 546/564 (1990)

[0003] In the learning control method described in document (a), a negative feedback loop on position error and velocity error is configured individually for each degree of freedom of the robot, and the robot performs a playback operation along a target trajectory prepared in advance for the rotary joint corresponding to each degree of freedom. The velocity error between the target trajectory and the actual trajectory produced during the playback operation is measured and multiplied by a constant (hereinafter called the learning gain). The product is added to the input value given to the robot's drive element during the previous playback operation (hereinafter called the feedforward value), and the sum is given to the drive element as the new feedforward value in the next playback operation, thereby improving the accuracy of the playback operation.

[0004] According to this learning control method, the actual trajectory of the robot is corrected successively as the playback operation is repeated, and converges to the vicinity of the target trajectory prepared in advance. Dynamic effects such as centrifugal and Coriolis forces, which arise in robots having rotary joints with multiple degrees of freedom, are thus ultimately compensated, enabling high-speed, high-accuracy trajectory control.

[0005] On the other hand, when an error exists between the state of the robot and the state of the target trajectory at the initial time of each playback operation, the actual trajectory may exhibit oscillatory behavior as learning progresses and fail to converge to the vicinity of the target trajectory. It is known, however, that this can be remedied with a positive constant α (0 ≤ α < 1) called the forgetting factor: when the product of the measured error and the learning gain is added to the feedforward value of the previous playback operation, the previous feedforward value is first multiplied by (1 − α). This reduces the influence of the error at the initial time, and the actual trajectory converges to the vicinity of the target trajectory.
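The forgetting-factor update described in paragraph [0005] can be sketched as follows. This is a minimal illustration, not code from the patent or from document (a): the function name `forgetting_update` and its argument names are hypothetical, and the trajectory is assumed to be sampled once per control period. With `alpha = 0` the expression reduces to the plain update of paragraph [0003].

```python
def forgetting_update(u_prev, errors, gain, alpha):
    """Return the next feedforward samples using a forgetting factor.

    u_prev : feedforward values of the previous playback, one per control period
    errors : measured trajectory errors (target minus actual), same length
    gain   : constant learning gain
    alpha  : forgetting factor, 0 <= alpha < 1 (hypothetical argument name)
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must satisfy 0 <= alpha < 1")
    # Scale the previous feedforward value by (1 - alpha), then add gain * error.
    return [(1.0 - alpha) * u + gain * e for u, e in zip(u_prev, errors)]
```

A larger `alpha` discards more of the previously learned input on every playback, which is what limits the build-up caused by an initial-state error.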

[0006]

[Problems to be Solved by the Invention] However, in the conventional learning control method described above, when the error between the actual trajectory and the target trajectory at the initial time is large, the actual trajectory may become oscillatory and fail to converge to the vicinity of the target trajectory even if the forgetting factor α is used.

[0007] FIG. 4 shows, for each learning iteration, the change in the velocity error between the actual trajectory and the target trajectory when the conventional learning method is applied to a robot having six degrees of freedom and a constant-velocity straight-line trajectory is given as the target trajectory. For simplicity, the figure shows the velocity error of only one of the six degrees of freedom.

[0008] When a constant-velocity straight-line trajectory is given as the target trajectory to a robot at rest, the state of the robot at the initial time (for example, the state of a rotary joint) has velocity 0, whereas the target trajectory velocity at the initial time is not 0. That is, an error exists in the initial state. When this initial-state error is large, the error grows as learning progresses even if the forgetting factor described above is used, as can be seen in FIG. 4, particularly at time t = 20 [T_s].

[0009] Also, when the robot is gripping an object, for example, a steady-state deviation generally arises unless exact gravity compensation is performed. When the conventional learning control method is applied to a robot in such a state, a position error (steady-state deviation) again exists at the initial time, so a phenomenon similar to the error accumulation shown in FIG. 4 occurs.

[0010] As described above, when the conventional learning control method is used, the target trajectory has had to be given with attention to the initial state of the robot.

[0011] The present invention has been made against this background, and its object is to mitigate the deterioration of the learning function caused by the error between the target trajectory and the actual trajectory of the robot at the initial time.

[0012]

[Means for Solving the Problems] To solve the problems described above, the invention according to claim 1 comprises: a first step of measuring, over the entire time interval, one or more of the position error, velocity error, and acceleration error between the target trajectory and the actual trajectory; a second step of calculating a coefficient whose value is 0 in the time interval from the initial time to an appropriate time, that time interval changing according to the number of playback operations; a third step of multiplying each error measured in the first step by the coefficient calculated in the second step; a fourth step of adding each product of the third step to the input value supplied to the drive unit during the previous playback operation; and a fifth step of supplying the sum of the fourth step to the drive unit as the input value for the next playback operation.

[0013] The invention according to claim 2 is characterized in that, in the invention according to claim 1, the second step sets the coefficient to 0 when the number of playback operations is equal to or greater than a specific number and the time within the playback operation is at or before a specific time, and otherwise sets the coefficient to a specific positive constant.

[0014]

[Operation] According to the present invention, in the first step the various errors between the target trajectory and the actual trajectory are measured over the entire time interval, and in the second step a coefficient (learning gain) is calculated whose value is 0 in the time interval from the initial time to an appropriate time, that time interval changing according to the number of playback operations. In the third step, each error measured in the first step is multiplied by the coefficient calculated in the second step; in the fourth step, each product of the third step is added to the input value supplied to the drive unit during the previous playback operation; and in the fifth step, the sum of the fourth step is supplied to the drive unit as the input value for the next playback operation.

[0015] As a result, learning control identical to the conventional method can be carried out over the entire time interval up to an appropriate number of playback operations, and in subsequent playback operations learning can be stopped for the interval from the initial time to an appropriate time. In other words, after the appropriately chosen number of playback operations, learning proceeds with the time at which the trajectory error has become small serving as a new initial time, so that the accumulation of trajectory error caused by the initial error can be avoided.

[0016] For example, in FIG. 4, from the 10th playback operation onward the trajectory after time t = 20 [T_s] becomes oscillatory and the trajectory error accumulates. In such a case, if the learning gain for 0 ≤ t ≤ 20 [T_s] is set to 0 from the 11th playback operation onward, a new trajectory is learned whose initial time is t = 21 [T_s], the time at which the trajectory error has become sufficiently small. If the initial-state error is small, the actual trajectory can be made to converge to the vicinity of the target trajectory by, for example, the conventional method using a forgetting factor.

[0017]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of a control circuit that can be used to implement the present invention. The figure shows the control system for only one degree of freedom of an electric-motor-driven robot; the other degrees of freedom have the same configuration and are omitted.

[0018] In FIG. 1, reference numeral 4 denotes an arithmetic processing unit that performs various digital operations. It comprises memories 1 to 3 for storing various data, a CPU (central processing unit, not shown), and input/output interface circuits (not shown) such as D/A (digital/analog) conversion elements that digitize external signals.

[0019] Each of the memories 1 to 3 has a storage capacity sufficient to hold at least as many data items as the time T required to traverse the target trajectory divided by the control period T_s. The trajectory data θ_ref(t) of the target trajectory is stored in advance in memory 2 as the target position θ_ref(T_s i) of the motor 9 at each control period T_s (where i is an integer). The uses of memories 1 and 3 are described later.
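As a rough illustration of how memory 2 might be filled, the sketch below samples a target trajectory once per control period. The function name `sample_target_trajectory` and its arguments are hypothetical. Storing T / T_s + 1 samples (one more than the minimum stated above) is an assumption made here so that every sample except the last has a successor.

```python
def sample_target_trajectory(theta_ref, duration, period):
    """Sample theta_ref(t) at t = period * i for i = 0 .. duration / period.

    theta_ref : callable giving the target position at time t
    duration  : time T needed to traverse the target trajectory
    period    : control period T_s
    """
    # Number of control periods in the trajectory (assumes T is a multiple of T_s).
    n = round(duration / period)
    # One extra sample at i = n is kept; this is an assumption of the sketch.
    return [theta_ref(i * period) for i in range(n + 1)]
```

For a trajectory lasting 2.0 s at a 0.5 s control period, the function stores five samples (i = 0 to 4).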

[0020] Reference numeral 5 denotes a comparator that performs subtraction: it subtracts the signal at its subtraction input from the target trajectory data θ_ref(t) supplied from memory 2 and outputs the result. Reference numeral 6 denotes a multiplier that multiplies the output of comparator 5 by a positive constant K_1. Reference numeral 7 denotes a comparator that performs addition and subtraction: it adds u_k(t) (described later), which is output from memory 1 every control period T_s, to the product from multiplier 6, subtracts the signal at its subtraction input, and outputs the result. Reference numeral 8 denotes an amplifier that amplifies and outputs the output of comparator 7. Reference numeral 9 denotes a motor that rotates according to the output of amplifier 8 and drives a rotary joint or the like of the robot.

[0021] Reference numeral 11 denotes a rotational speed detector that measures the actual velocity dθ/dt of the motor 9 (hereinafter written θ^*(t)). Reference numeral 10 denotes a multiplier that multiplies the measurement θ^*(t) of the rotational speed detector 11 by a positive constant K_2 and supplies the product to the subtraction input of comparator 7. The measurement θ^*(t) is also output to memory 3 and stored every control period T_s. Reference numeral 12 denotes a rotational position detector that measures the actual trajectory θ(t) of the motor 9; its measurement is supplied to the subtraction input of comparator 5.

[0022] The above is the feedback control system of this control circuit. With this configuration, the motor 9 is driven in the direction that brings the error of its actual trajectory θ(t) relative to the target trajectory θ_ref(t) to 0. In practice, the positive constants K_1 and K_2 cannot be made extremely large, because the stability of the feedback control system must be ensured. Moreover, when the robot is operated at high speed, the centrifugal and Coriolis forces that arise in a robot with multiple degrees of freedom also have an effect, and the motor 9 cannot be made to follow the target trajectory θ_ref(t) perfectly. The feedforward control described below is therefore performed to absorb the errors arising from these causes.
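The signal that reaches amplifier 8 in FIG. 1 can be summarized in one expression, based on the description of comparators 5 and 7 and multipliers 6 and 10 in paragraphs [0020] and [0021]. The sketch below is illustrative only; the function and argument names are hypothetical, and the amplifier and motor themselves are not modeled.

```python
def amplifier_input(u_ff, theta_ref, theta, theta_dot, k1, k2):
    """Compute the output of comparator 7 for one control period.

    u_ff      : feedforward value u_k(t) read from memory 1
    theta_ref : target position from memory 2
    theta     : position measured by rotational position detector 12
    theta_dot : velocity measured by rotational speed detector 11
    k1, k2    : positive feedback constants K_1 and K_2
    """
    position_error = theta_ref - theta            # comparator 5
    position_term = k1 * position_error           # multiplier 6
    velocity_term = k2 * theta_dot                # multiplier 10
    return u_ff + position_term - velocity_term   # comparator 7
```

With `u_ff = 0` this is an ordinary position loop with velocity damping; the learning procedure changes only `u_ff` from one playback operation to the next.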

[0023] Next, the feedforward control operation based on the operation of the CPU of the arithmetic processing unit 4 is described with reference to the flowchart shown in FIG. 2. First, in step p1, the contents u_k of memory 1 are cleared to 0. Here the data u_k is the feedforward value given to the drive element of the robot during the k-th playback operation. Since this is the first playback operation, the feedforward value u_1 becomes 0.

[0024] Next, in step p2, the feedforward value u_1 in memory 1 is supplied to amplifier 8 every control period T_s in the feedback control system shown in FIG. 1, and the first playback operation is performed. During this operation, the actual-trajectory velocity θ^*(T_s i) measured by the speed detector 11 every control period T_s is stored in memory 3 for the entire time interval. When the first playback operation ends, the feedforward value in memory 1 is corrected by steps p3 to p5 below.

[0025] In step p3, the target velocity θ^*_ref(t) is calculated from the target trajectory θ_ref(t) stored in memory 2. That is, as shown in equation (1) below, the velocity θ^*_ref(T_s i) at time t = T_s i can be calculated by dividing the difference between the target positions in successive control periods by the control period T_s, where i is an integer indicating the index of the control period.

$$\theta^{*}_{\mathrm{ref}}(T_s i) = \frac{\theta_{\mathrm{ref}}(T_s (i+1)) - \theta_{\mathrm{ref}}(T_s i)}{T_s} \tag{1}$$
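Equation (1) is a forward difference over the stored target positions. A minimal sketch follows, assuming the samples are held in a Python list; the function name `target_velocity` is hypothetical. Because the last sample has no successor, the result has one element fewer than the input.

```python
def target_velocity(theta_ref_samples, period):
    """Forward-difference target velocity per equation (1).

    theta_ref_samples : target positions theta_ref(T_s * i), i = 0, 1, 2, ...
    period            : control period T_s
    """
    # Pair each sample with its successor and divide the difference by T_s.
    return [
        (nxt - cur) / period
        for cur, nxt in zip(theta_ref_samples, theta_ref_samples[1:])
    ]
```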

[0026] Next, in step p4, the difference between θ^*_ref(t) calculated in step p3 and the actual-trajectory velocity θ^*(t) of the previous playback operation stored in memory 3 is calculated for every control period T_s, and the result is multiplied by the learning gain φ(t). The learning gain φ(t) is a function that is 0 from the initial time t = 0 to an appropriate time. For example, it is obtained from the function expressed by equation (2) below.

$$\varphi(t) = \begin{cases} 0, & K_c \le k \ \text{and}\ 0 \le t \le T_c \\ \varphi_0, & \text{otherwise} \end{cases} \tag{2}$$

[0027] Here T_c and K_c are appropriately chosen values, t is time, k is the number of playback operations, and φ_0 is an appropriate positive constant. FIG. 3 shows how the learning gain φ(t) based on equation (2) changes.
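The gain of equation (2) translates directly into a small function. The sketch below is illustrative; the name `learning_gain` and its argument names are hypothetical, and time is assumed to be expressed in the same unit as T_c.

```python
def learning_gain(k, t, k_c, t_c, phi_0):
    """Learning gain per equation (2).

    k     : number of the current playback operation
    t     : time within the playback operation
    k_c   : playback count K_c from which the early interval is frozen
    t_c   : end T_c of the frozen interval
    phi_0 : positive constant used everywhere else
    """
    if k >= k_c and 0 <= t <= t_c:
        return 0.0  # learning is stopped in the early interval
    return phi_0
```

Using the values from the example in paragraph [0016] (K_c = 11, T_c = 20 control periods), `learning_gain(10, 20, 11, 20, phi_0)` still returns `phi_0`, while `learning_gain(11, 20, 11, 20, phi_0)` returns 0 and `learning_gain(11, 21, 11, 20, phi_0)` returns `phi_0`.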

[0028] Next, in step p5, the value multiplied by the learning gain φ(t) is added to the feedforward value u_1(t) in memory 1, and the sum is stored in memory 1 as u_2(t). Then, in step p6, it is determined whether to continue the learning operation. That is, if the number of playback operations has not reached the preset number of learning iterations, or the actual trajectory has not reached the preset desired accuracy, the determination is yes, the playback count k is incremented, and the process returns to step p2. The learning operation accompanying the second playback operation is thereby performed.

[0029] Next, in the second playback operation, the feedforward value u_2(t) newly stored in memory 1 in step p5 is supplied to amplifier 8. Thereafter, as long as the determination in step p6 is yes, steps p3 to p5 are repeated for each playback operation, and the feedforward value in memory 1 is corrected each time. The feedforward value is calculated according to equation (3) below.

$$u_{k+1}(t) = \varphi(t)\,\bigl(\theta^{*}_{\mathrm{ref}}(t) - \theta^{*}(t)\bigr) + u_k(t) \tag{3}$$
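Applied sample by sample, equation (3) might look like the sketch below. The function name `update_feedforward` is hypothetical, and the gain values φ(t) are assumed to have been evaluated beforehand, one per control period, and passed in as a list.

```python
def update_feedforward(u_k, vel_ref, vel_actual, gains):
    """Return u_{k+1} per equation (3), one value per control period.

    u_k        : feedforward values used in the k-th playback operation
    vel_ref    : target velocities from step p3
    vel_actual : velocities measured during the k-th playback (memory 3)
    gains      : learning gain phi(t) for each control period
    """
    return [
        u + g * (r - a)  # phi(t) * velocity error + previous feedforward value
        for u, r, a, g in zip(u_k, vel_ref, vel_actual, gains)
    ]
```

Samples whose gain is 0 keep their previous feedforward value unchanged, which is how learning is stopped in the early interval.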

[0030] When the number of playback operations reaches the preset number of learning iterations, or when the actual trajectory reaches the desired accuracy, the determination in step p6 becomes no and the learning operation is complete.
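Steps p1 to p6 can be put together as the loop sketched below. This is a simplified illustration under several assumptions, not the patent's implementation: `run_playback` is a hypothetical callable standing in for the real robot (it takes the feedforward samples and returns the measured velocities), the target velocities of step p3 are passed in already computed, and "desired accuracy" is taken to mean that the largest absolute velocity error is within `tolerance`.

```python
def learn_feedforward(run_playback, vel_ref, period, k_c, t_c, phi_0,
                      max_iterations, tolerance):
    """Repeat playback and feedforward correction; return (u, playbacks run)."""
    u = [0.0] * len(vel_ref)                              # step p1: clear memory 1
    k = 0
    for k in range(1, max_iterations + 1):
        vel = run_playback(u)                             # step p2: playback, measure
        errors = [r - v for r, v in zip(vel_ref, vel)]    # step p4: velocity error
        gains = [0.0 if (k >= k_c and i * period <= t_c) else phi_0
                 for i in range(len(vel_ref))]            # equation (2)
        u = [ui + g * e for ui, g, e in zip(u, gains, errors)]  # step p5, equation (3)
        if max(abs(e) for e in errors) <= tolerance:      # step p6: accuracy reached
            break
    return u, k
```

With a toy plant that simply returns its input (`lambda u: list(u)`), a constant target velocity of 1.0, `phi_0 = 0.5`, and no freezing, the error halves on every playback, so the loop stops once the error falls below the tolerance.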

[0031] Thus, according to this embodiment, from an appropriate playback count K_c onward, setting the learning gain φ(t) = 0 in an appropriate time interval (0 ≤ t ≤ T_c) starting at the initial time t = 0 stops learning in that interval. As a result, compared with the conventional method, the oscillatory behavior of the actual trajectory that arises when the initial-state error is large can be suppressed, and the actual trajectory can be made to converge to the vicinity of the target trajectory.

[0032] In this embodiment the learning gain φ(t) is set by the function expressed by equation (2), but another function may be used as long as it changes the time interval in which φ(t) = 0 appropriately according to the number of playback operations.

[0033] In this embodiment the target velocity θ^*_ref(t) is calculated from θ_ref(t) stored in memory 2, but the target velocity θ^*_ref(t) may instead be stored in advance in a separate memory. Alternatively, instead of storing the actual velocity θ^*(t) of the playback operation in memory 3, the feedforward value u_{k+1}(t) for each control period, calculated during the playback operation using equation (3), may be stored directly in memory 1.

[0034] This embodiment uses a computer (the arithmetic processing unit 4 comprising a CPU and the like) as the control circuit, but it may be replaced by any of various other well-known control elements.

[0035] This embodiment has been described for a robot in which each joint is driven by an electric motor and negative feedback control of position error and velocity error is configured individually for each electric motor, but the invention is also applicable to robots driven by hydraulic cylinders and to other machines having various servo mechanisms.

[0036] In this embodiment only the velocity error between the target trajectory and the actual trajectory of the robot is used to calculate the feedforward value, but learning may also be performed by multiplying the position error or acceleration error by a learning gain φ(t) as described above and adding the result to the feedforward value of the previous playback operation.

[0037] This embodiment uses the learning rule expressed by equation (3), but a learning rule using a forgetting factor, as in document (a) and elsewhere, may also be used.

[0038] Furthermore, in this embodiment the feedforward value supplied to the drive unit of the robot is corrected through repeated playback operations, but the target trajectory given in advance may instead be corrected successively by learning so as to have an effect equivalent to the feedforward.

[0039]

[Effects of the Invention] As described above, the present invention uses a learning gain that is 0 from the initial time to an appropriate time, with the duration of that zero interval changing according to the number of playback operations. This mitigates the accumulation of trajectory error that occurs in the conventional technique when the error between the target trajectory and the actual trajectory at the initial time is large. As a result, the constraint of the conventional method, in which the target trajectory had to be given with attention to the initial state of the robot, can be relaxed.

[Brief description of drawings]

[FIG. 1] A block diagram showing the configuration of an embodiment of the present invention.

[FIG. 2] A flowchart showing the feedforward control operation based on the operation of the CPU in the embodiment.

[FIG. 3] A diagram showing how the learning gain φ(t) changes with the number of playback operations in the embodiment.

[FIG. 4] A diagram illustrating the change in the error between the target trajectory velocity and the actual trajectory velocity in the conventional learning control method.

[Explanation of symbols]

1 to 3: memories
4: arithmetic processing unit
5, 7: comparators
6, 10: multipliers
8: amplifier
9: motor
11: rotational speed detector
12: rotational position detector

Continuation of the front page: (51) Int. Cl.⁶, identification code, internal reference number, FI, technical indication: G05D 3/12 305 V 306 G

Claims

[Claims]

1. A learning control method for a robot which, when the robot repeatedly performs a playback operation, corrects an input value supplied to a drive unit of the robot on the basis of an error arising between a target trajectory of the playback operation and an actual trajectory of the robot, the method comprising: a first step of measuring, over the entire time interval, one or more of the position error, velocity error, and acceleration error between the target trajectory and the actual trajectory; a second step of calculating a coefficient whose value is 0 in the time interval from the initial time to an appropriate time, that time interval changing according to the number of playback operations; a third step of multiplying each error measured in the first step by the coefficient calculated in the second step; a fourth step of adding each product of the third step to the input value supplied to the drive unit during the previous playback operation; and a fifth step of supplying the sum of the fourth step to the drive unit as the input value for the next playback operation.

2. The learning control method for a robot according to claim 1, wherein the second step sets the coefficient to 0 when the number of playback operations is equal to or greater than a specific number and the time within the playback operation is at or before a specific time, and otherwise sets the coefficient to a specific positive constant.