JPH06314106A

JPH06314106A - Learning controller

Info

Publication number: JPH06314106A
Application number: JP12531193A
Authority: JP
Inventors: Yuji Nakamura; 裕司中村; Shingo Ando; 慎悟安藤; Etsujirou Shimura; 悦二郎示村
Original assignee: Yaskawa Electric Corp
Current assignee: Yaskawa Electric Corp
Priority date: 1993-04-28
Filing date: 1993-04-28
Publication date: 1994-11-08

Abstract

PURPOSE: To reduce the amount of computation by using a state space model of the controlled object directly. The correction amount at the current time is determined from time-series data and the state space model of the controlled object so that an evaluation function of the predicted deviations up to M samples ahead, the deviation, and the correction amount is minimized. CONSTITUTION: A subtracter 10 obtains the difference e(i-D) between the target command r(i-D) stored in a memory 2 and the output y(i-D). A subtracter 11 obtains the difference η(i-D) between the state vector x(i'-D) stored in a memory 3 and x(i-D). Memories 5 and 6 store past deviations and state vectors, and the outputs e(i-D) and η(i-D) of the subtracters 10 and 11 are newly stored in them. A computing unit 9 determines the correction amount σ(i) at the current time from the time-series data and the state space model of the controlled object so that the evaluation function of the predicted deviations e^* up to M samples ahead, the deviation e(i-D), and the correction amount is minimized. The amount of computation is thereby reduced.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a control device for machine tools, robots, and the like.

[0002]

[Prior Art] Learning control devices for repetitive target values include the devices proposed by the present applicant in Japanese Patent Application Laid-Open No. 1-237701, Japanese Patent Application No. 3-354789, and Japanese Patent Application No. 4-289431. In these devices, the operation for the same target value is repeated, and the control input is determined from the deviation, the correction amount, the control input, and the step response of the controlled object so that the predicted future deviation is minimized. As a result, the output eventually matches the target value, and highly accurate follow-up operation is achieved.

[0003]

[Problems to Be Solved by the Invention] In the prior art, however, predicting the future deviation requires the step response of the controlled object up to the point where it has sufficiently settled. When a state space model of the controlled object is available, the step response can be computed by simulation or the like, but this is indirect and takes extra effort. Furthermore, the amount of computation grows as the settling time increases or the sampling period is shortened. An object of the present invention is therefore to provide a learning control device that uses the state space model directly and requires little computation.

[0004]

[Means for Solving the Problems] To solve the above problems, the first invention of the present application is a learning control device that, so that the output of a controlled object follows a target command repeating the same pattern with period L, receives at the current time i the target command r(i) (= r(i'), i' = i-L), the output y(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state vector x(i-D), and outputs a control input u(i) to the controlled object. The device comprises: means for obtaining u(i) by adding a correction amount σ(i) to the control input u(i') of one period earlier; means for storing the target command and obtaining the deviation e(i-D); means for storing the state vector x and obtaining its change η from one period earlier; means for storing learning control constants; means for storing time-series data of the deviation, the state change vector, the correction amount, and the control input; and means for determining the correction amount σ(i) at the current time, using the time-series data and a state space model of the controlled object, so that the evaluation function of the predicted deviations e^* up to M samples ahead, the deviation e(i-D), and the correction amount σ,

[0005]

[Equation 7]

[0006] is minimized. The second invention of the present application is a learning control device that, so that the output of a controlled object follows a target command repeating the same pattern with period L, receives at the current time i the target command increment value Δr(i) (= Δr(i'), i' = i-L), the output increment value Δy(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state increment value vector Δx(i-D), and outputs a control input u(i) to the controlled object. The device comprises: means for obtaining Δu(i) by adding a correction amount increment value Δσ(i) to the control input increment value Δu(i') of one period earlier; means for storing the target command increment value and obtaining the deviation increment value and the deviation; means for storing the state increment value vector Δx and obtaining its change Δη from one period earlier; means for storing learning control constants; means for obtaining the correction amount from the correction amount increment value; means for storing time-series data of the deviation increment value, the correction amount increment value, and the control input increment value; means for determining the correction amount increment value Δσ(i) at the current time, using the deviation, the change in the state increment value vector, the correction amount, the time-series data, and a state space model of the controlled object, so that the evaluation function of the predicted deviation increment values Δe^* up to M samples ahead, the deviation e, and the correction amount,

[0007]

[Equation 8]

[0008] is minimized; and means for obtaining the control input from the control input increment value.

[0009]

[Operation] With the above means, a learning control device that uses the state space model directly and requires little computation is realized, and highly accurate follow-up operation becomes possible.

[0010]

[Embodiments] First, a specific embodiment of the first invention of the present application is described with reference to FIG. 1. In the figure, 1 is the learning control device of the present invention. At the current time i, it receives the current value r(i) (= r(i'), i' = i-L) of a target command that repeats the same pattern with period L, the output y(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state vector x(i-D), and outputs the control input u(i) to the controlled object. 2 is a memory that stores the target commands r(i), r(i-1), ..., r(i-D); 3 is a memory that stores one period of state vectors; and 4 is a memory that stores the constants q_M1, ..., q_M, Q, E, g₀, g₁, S, S_D, s₁, ..., s_D. 10 is a subtracter that obtains the difference e(i-D) between the target command r(i-D) stored in memory 2 and the output y(i-D); 11 is a subtracter that obtains the difference η(i-D) between the state vector x(i'-D) stored in memory 3 and x(i-D). 5 and 6 are memories that store past deviations and state vectors, and the outputs e(i-D) and η(i-D) of subtracters 10 and 11 are newly stored in them. 7 is a memory that stores past correction amounts, and 8 is a memory that stores the control inputs of the past one period. 9 is a computing unit, which performs the operation

[0011]

[Equation 9]

[0012] (where the last term on the right-hand side is zero when D = 0) to calculate the correction amount σ(i). The calculated σ(i) is input to adder 12 and is also stored in memory 7. Adder 12 adds the output σ(i) of computing unit 9 to u(i') stored in memory 8 to calculate the control input u(i). The resulting control input u(i) is output to the controlled object as the output of learning control device 1 and is also stored in memory 8.

Equation (1) is now derived. At time i, the control input u(i) is determined by adder 12 as

u(i) = u(i') + σ(i)   (2)

We therefore consider determining the correction amount σ(i) at the current time so that the predicted future deviations e^*(i+m) (1 ≤ M1 ≤ m ≤ M) are minimized. Suppose that a discretized model of the controlled object is available in the following state space representation.

[0013]

[Equation 10]

[0014] Here x(i) ∈ R^(n×1) is the state vector, and ^ denotes a model value. Using the above model, the output change δ(i) and the state change vector η(i) defined by

[0015]

[Equation 11]

[0016] have the following model.

[0017]

[Equation 12]
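
The link between equation (2) and the change model can be checked numerically. The following is a minimal sketch, not the patent's implementation. It assumes the model has the standard discrete-time form x(i+1) = A x(i) + b u(i), y(i) = c x(i), which is consistent with the weight sequence h_j = cA^(j-1)b given later in the description, and it assumes η(i) = x(i) - x(i') and δ(i) = y(i) - y(i'). Under those assumptions linearity gives η(i+1) = A η(i) + b σ(i) and δ(i) = c η(i). The matrices, the period, and the random data are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical second-order model; the numbers are illustrative only.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])

L = 5                                   # period of the repetitive command
rng = np.random.default_rng(0)
u_prev = rng.normal(size=L)             # control inputs u(i') of the previous period
sigma = rng.normal(size=L)              # correction amounts sigma(i) of this period
u_now = u_prev + sigma                  # equation (2): u(i) = u(i') + sigma(i)


def simulate(x0, inputs):
    """Run x(i+1) = A x(i) + b * input(i) and return all visited states."""
    xs = [x0]
    for value in inputs:
        xs.append(A @ xs[-1] + b * value)
    return np.array(xs)


x_first = simulate(np.zeros(2), u_prev)     # previous period
x_second = simulate(x_first[-1], u_now)     # current period starts where the last ended

# Measured changes: eta(i) = x(i) - x(i'), delta(i) = y(i) - y(i')
eta_measured = x_second - x_first
delta_measured = eta_measured @ c

# The same quantities propagated through the change model, driven by sigma only
eta_model = simulate(eta_measured[0], sigma)
delta_model = eta_model @ c

max_model_error = float(np.max(np.abs(eta_model - eta_measured)))
```

In this sketch the change model needs only the corrections σ and the initial state change, not the full input history, which is why a measured η(i-D) is enough to start the prediction.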

[0018] At time i the measured value η(i-D) is available. Expressing

[0019]

[Equation 13]

[0020] for times from i-D onward in terms of the measured value η(i-D) gives, from equation (5),

[0021]

[Equation 14]

[0022] Then, assuming σ(j) = σ(i) (j > i), the state change vector from time i-D onward is predicted by

[0023]

[Equation 15]

[0024] or

[0025]

[Equation 16]

[0026] Here, the term

[Equation 17]

[0027] on the right-hand side of equation (8) is given by equation (7), but it may instead be obtained by substituting η^*(i+m-1) from equation (8) into the following term on the right-hand side of equation (6).

[0028]

[Equation 18]

[0029] From equations (4), (5), (7), and (8) and the assumption σ(j) = σ(i) (j > i), the predicted output change δ^*(i+m) is given by

[0030]

[Equation 19]

[0031] or

[0032]

[Equation 20]

[0033] Here h_j and H_j are the weight sequence of the model of equation (3) and its cumulative sum (h_j = cA^(j-1)b, H_j = h₁ + ... + h_j, j ≥ 1). The predicted future deviation e^*(i+m) is therefore given by

e^*(i+m) = e(i'+m) - δ^*(i+m),   M1 ≤ m ≤ M   (10)

and when the correction amount σ(i) is determined so that the evaluation function

[0034]

[Equation 21]

[0035] is minimized, the condition ∂J/∂σ(i) = 0 yields equation (1) above. The constants q_m, Q, E, S, S_D, s_j and the vectors g₀, g₁ are given by the following equations.

[0036]

[Equation 22]
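
The weight sequence and the stationarity condition can be illustrated with a short computation. This is a sketch under stated assumptions, not the patent's equations (1) or (11): the model matrices are hypothetical, the delay is taken as D = 0, and the cost J = Σ w_m (f_m - H_m σ)² + α σ² is a simplified stand-in in which f_m denotes the deviation predicted for σ(i) = 0. The names `weight_sequence`, `correction`, `f`, `w`, and `alpha` are introduced here only for illustration.

```python
import numpy as np


def weight_sequence(A, b, c, M):
    """Return h_1..h_M with h_j = c A^(j-1) b, and the running sums H_1..H_M."""
    h = np.empty(M)
    v = b.astype(float)                 # v holds A^(j-1) b
    for j in range(M):
        h[j] = c @ v
        v = A @ v
    return h, np.cumsum(h)


def correction(f, H, w, alpha):
    """Minimise J = sum_m w_m (f_m - H_m s)^2 + alpha s^2 over the scalar s.

    This cost is a simplified stand-in, not the patent's evaluation function.
    """
    return float(np.sum(w * H * f) / (np.sum(w * H * H) + alpha))


A = np.array([[0.9, 0.1], [0.0, 0.8]])  # hypothetical model
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
M = 4

h, H = weight_sequence(A, b, c, M)

# Cross-check: H_m equals the output change m steps after a unit correction
# is applied and held, starting from eta = 0.
eta = np.zeros(2)
step = []
for _ in range(M):
    eta = A @ eta + b * 1.0
    step.append(c @ eta)
step = np.array(step)

f = np.array([0.4, 0.3, 0.2, 0.1])      # illustrative deviations for sigma(i) = 0
w = np.ones(M)
alpha = 0.01
sigma_i = correction(f, H, w, alpha)

# dJ/dsigma evaluated at the minimiser; it should vanish
grad = -2.0 * np.sum(w * H * (f - H * sigma_i)) + 2.0 * alpha * sigma_i
```

Because the prediction is linear in σ(i) and the cost is quadratic, the minimiser has a closed form whose coefficients depend only on the model. That is the sense in which such constants can be computed once and stored in memory 4.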

[0037] Alternatively, instead of equation (7),

[0038]

[Equation 23]

[0039] may be used to give

[0040]

[Equation 24]

[0041] and, if the correction amount is determined so as to minimize the evaluation function of equation (11) (with α = 0) using the prediction equations (8b) and (10), the trials can be performed intermittently, and the correction amounts for the whole of the next trial can be calculated together between trials by the following equation.

[0042]

[Equation 25]

[0043] Here the vector g is given by the following equation.

[0044]

[Equation 26]

[0045] Next, a specific embodiment of the second invention of the present application is described with reference to FIG. 2. In the figure, 21 is the learning control device of the present invention. At the current time i, it receives the increment value Δr(i) (= Δr(i'), i' = i-L) of a target command that repeats the same pattern with period L, the output increment value Δy(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state increment value vector Δx(i-D), and outputs the control input u(i) to the controlled object. Δ denotes the increment over one sampling period. 22 is a memory that stores the target command increment values Δr(i), ..., Δr(i-D); 23 is a memory that stores one period of state increment value vectors; and 24 is a memory that stores the constants v_(-D+1), ..., v_M, E, g₀, S, s₁, ..., s_D. 29 is a subtracter that obtains the difference Δe(i-D) between Δr(i-D) stored in memory 22 and Δy(i-D); 30 is a subtracter that obtains the difference Δη(i-D) between Δx(i'-D) stored in memory 23 and Δx(i-D). 25 is a memory that stores past deviation increment values, and the output Δe(i-D) of subtracter 29 is newly stored in it. 26 is a memory that stores past correction amount increment values, and 27 is a memory that stores the control input increment values of the past one period. 32 and 33 are integrators that obtain the deviation e(i-D) and the correction amount σ(i-1). 28 is a computing unit, which performs the operation

[0046]

[Equation 27]

[0047] (where the last term on the right-hand side is zero when D = 0) to calculate the correction amount increment value Δσ(i). The calculated Δσ(i) is input to adder 31 and integrator 33 and is also stored in memory 26. Adder 31 adds Δσ(i) to Δu(i') stored in memory 27 to calculate the control input increment value Δu(i). The resulting Δu(i) is input to integrator 34 and is also stored in memory 27. The control input u(i) obtained by integrator 34 is output to the controlled object as the output of learning control device 21.

Equation (21) is now derived. At time i, the control input increment value Δu(i) is determined by adder 31 as

Δu(i) = Δu(i') + Δσ(i)   (22)

We therefore consider determining the correction amount increment value Δσ(i) at the current time so that the predicted future deviations are minimized. If the state space model of the controlled object is given by equation (3) above, the model of the output change increment value Δδ(i) and the state change increment value vector Δη(i) becomes the following.

[0048]

[Equation 28]
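
The incremental update of equation (22), followed by an integrator, can be compared with the direct update of equation (2). The following is an illustrative sketch only: the `Integrator` class is a hypothetical stand-in for integrator 34, all signals are assumed to be zero before the first sample shown, and the data are random.

```python
import numpy as np


class Integrator:
    """Running sum of increments (hypothetical stand-in for integrator 34)."""

    def __init__(self, initial=0.0):
        self.value = initial

    def step(self, increment):
        self.value += increment
        return self.value


L = 4
rng = np.random.default_rng(1)
u_prev = rng.normal(size=L)                 # previous-period inputs u(i')
sigma = rng.normal(size=L)                  # corrections sigma(i) for this period

# Increments over one sampling period; values before the first sample are
# assumed to be zero in this sketch.
du_prev = np.diff(u_prev, prepend=0.0)      # delta u(i')
dsigma = np.diff(sigma, prepend=0.0)        # delta sigma(i)

u_integrator = Integrator()
u_incremental = []
for k in range(L):
    du = du_prev[k] + dsigma[k]             # equation (22)
    u_incremental.append(u_integrator.step(du))
u_incremental = np.array(u_incremental)

u_direct = u_prev + sigma                   # equation (2) of the first embodiment
```

Under these assumptions both forms produce the same control input. The incremental form differs in what is stored and predicted (increments rather than absolute values), not in the input applied to the controlled object.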

[0049] Since the measured value Δη(i-D) is available at time i, the state change increment value vector from time i-D onward is predicted from equation (23) by

[0050]

[Equation 29]

[0051] and, with the further assumption Δσ(j) = 0 (j > i), the predicted value of the output change increment value is given by

[0052]

[Equation 30]

[0053] The predicted future deviation e^*(i+m) is therefore given by

[0054]

[Equation 31]

[0055] and when the correction amount increment value Δσ(i) is determined so that the evaluation function

[0056]

[Equation 32]

[0057] is minimized, the condition ∂J/∂Δσ(i) = 0 yields equation (21) above. The constants v_m, E, S, s_j and the vector g₀ are given by the following equations.

[0058]

[Equation 33]

[0059] In the first and second inventions of the present application, when measured values of the state change vector η and its increment value vector Δη are not available, values estimated by an observer may be used.
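
The description does not specify the observer. As one possible illustration, the sketch below uses a standard Luenberger observer on an assumed change model η(i+1) = A η(i) + b σ(i), δ(i) = c η(i). The matrices and the gain `K` are hypothetical values, with `K` chosen so that A - K c has eigenvalues inside the unit circle.

```python
import numpy as np

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # hypothetical model
b = np.array([0.0, 1.0])
c = np.array([1.0, 0.0])
K = np.array([0.5, 0.2])                 # hypothetical observer gain; A - K c is stable


def observer_step(eta_hat, sigma, delta_measured):
    """Update the estimate from the applied correction and the measured output change."""
    innovation = delta_measured - c @ eta_hat
    return A @ eta_hat + b * sigma + K * innovation


rng = np.random.default_rng(2)
eta_true = np.array([1.0, -1.0])         # unknown to the observer
eta_hat = np.zeros(2)
for _ in range(60):
    sigma = rng.normal()
    delta = c @ eta_true                 # measured output change
    eta_hat = observer_step(eta_hat, sigma, delta)
    eta_true = A @ eta_true + b * sigma

estimation_error = float(np.linalg.norm(eta_true - eta_hat))
```

The estimation error here evolves as (A - K c) applied repeatedly to the initial error, so it decays regardless of the correction sequence. The estimate could then be used wherever the measured η(i-D) appears.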

[0060]

[Effects of the Invention] As described above, the present invention realizes a learning control device that uses the state space model directly and requires little computation, making highly accurate follow-up operation possible.

[Brief description of drawings]

FIG. 1 is a diagram showing a specific embodiment of the first invention of the present application.

FIG. 2 is a diagram showing a specific embodiment of the second invention of the present application.

[Explanation of symbols]

1 Learning control device
2 Memory for storing the target command
3 Memory for storing the state vector
4 Memory for storing constants
5 Memory for storing the deviation
6 Memory for storing the state change
7 Memory for storing the correction amount
8 Memory for storing the control input
9 Computing unit
10, 11 Subtracters
12 Adder

Claims

[Claims]

1. A learning control device that, so that the output of a controlled object follows a target command repeating the same pattern with period L, receives at the current time i the target command r(i) (= r(i'), i' = i-L), the output y(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state vector x(i-D), and outputs a control input u(i) to the controlled object, the learning control device comprising: means for obtaining u(i) by adding a correction amount σ(i) to the control input u(i') of one period earlier; means for storing the target command and obtaining the deviation e(i-D); means for storing the state vector x and obtaining its change η from one period earlier; means for storing learning control constants; means for storing time-series data of the deviation, the state change vector, the correction amount, and the control input; and means for determining the correction amount σ(i) at the current time, using the time-series data and a state space model of the controlled object, so that the evaluation function of the predicted deviations e^* up to M samples ahead, the deviation e(i-D), and the correction amount σ is minimized.

2. The learning control device according to claim 1, further comprising means for determining the correction amount σ(i) at the current time by [Equation] (where q_m, Q, E, g₀, g₁, S, S_D, and s_j are learning control constants, and the last term on the right-hand side is zero when D = 0).

3. The learning control device according to claim 1, further comprising means for performing the trials of each period intermittently and determining, between trials, the correction amounts for the whole of the next trial together by [Equation] (where q_m, g, and S are learning control constants, and [Equation] is a state change vector calculated by the state space model).

4. A learning control device that, so that the output of a controlled object follows a target command repeating the same pattern with period L, receives at the current time i the target command increment value Δr(i) (= Δr(i'), i' = i-L), the output increment value Δy(i-D) of the controlled object D (D ≥ 0) samples earlier, and the state increment value vector Δx(i-D), and outputs a control input u(i) to the controlled object, the learning control device comprising: means for obtaining Δu(i) by adding a correction amount increment value Δσ(i) to the control input increment value Δu(i') of one period earlier; means for storing the target command increment value and obtaining the deviation increment value and the deviation; means for storing the state increment value vector Δx and obtaining its change Δη from one period earlier; means for storing learning control constants; means for obtaining the correction amount from the correction amount increment value; means for storing time-series data of the deviation increment value, the correction amount increment value, and the control input increment value; means for determining the correction amount increment value Δσ(i) at the current time, using the deviation, the change in the state increment value vector, the correction amount, the time-series data, and a state space model of the controlled object, so that the evaluation function [Equation 5] of the predicted deviation increment values Δe^* up to M samples ahead, the deviation e, and the correction amount is minimized; and means for obtaining the control input from the control input increment value.

5. The learning control device according to claim 4, further comprising means for determining the correction amount increment value Δσ(i) at the current time by [Equation] (where v_m, E, g₀, S, and s_j are learning control constants, and the last term on the right-hand side is zero when D = 0).

6. The learning control device according to claim 1, further comprising means for estimating the state change vector η(i-D) or its increment value vector Δη(i-D) by an observer.