JPH056204A - Controller construction processing system

Info

Publication number: JPH056204A
Application number: JP3281741A
Authority: JP
Inventors: Minoru Sekiguchi (関口 実); Tamami Sugasaka (菅坂 玉美); Shigemi Osada (長田 茂美)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1990-10-30
Filing date: 1991-10-28
Publication date: 1993-01-14
Anticipated expiration: 2013-03-09
Also published as: JP2723720B2

Abstract

PURPOSE: To make it easy to construct, in a general control rule form, a controller that handles a nonlinear controlled system, in a controller construction processing system for constructing the controller used to control the controlled system. CONSTITUTION: The system is provided with a learning processor 2 that sets a signal conversion function according to a teaching signal group, a virtual target managing device 5 that manages experiential knowledge relating the control state variables to one another, and a manipulation correction calculator 6 that calculates a correction to the control manipulated variable from a control state variable and a virtual target value specified from the management data of the managing device 5. The control state variable is input to the processor 2, and its output is applied to the controlled system as the control manipulated variable. The teaching signals are obtained by correcting the control manipulated variable at each such moment by the correction from the calculator 6, and the processor 2 is thereby constructed as the desired controller. The processor 2 receives as input the difference between the control state variable and the target control state variable, and the managing device 5 manages the experiential knowledge relating the control state variables using, as a parameter, the difference from the control state variable that is the control target.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a control device construction processing system for constructing a control device used to control a controlled object, and more particularly to one that allows a control device handling a nonlinear controlled object to be constructed easily and in a general control rule format.

[0002] For complex controlled objects such as the one-input, two-output system of an inverted pendulum, classical PID control is inadequate, so control devices are designed by applying modern control theory. However, when the linear control theory of modern control theory is used, the controlled object model is built by linearizing the equations of motion of the controlled object, so control performance degrades outside the linear region.

[0003] When the nonlinear control theory of modern control theory is used, the equations of motion must be described completely, so the parameters of the controlled object must be identified accurately, which is an extremely difficult task. Against this background, control devices of a new configuration using neural networks have been proposed in recent years.

[0004]

[Prior Art] A neural network has the property that, when given a group of teacher signals, it acquires through learning the input/output characteristics of those teacher signals, and that it thereafter realizes an adaptive data processing function, producing a plausible output signal when given an unknown input signal.

[0005] To construct such a neural network as a control device, a sufficient amount of control data is obtained from the controlled object and used as teacher signals for learning, so that the control rules for the controlled object are mapped onto the neural network.

[0006] In practice, however, obtaining control data from the controlled object is often impossible when the controlled object is complex. One way of dealing with this was recently proposed for controlled objects about which a certain amount of a priori knowledge is available qualitatively but much remains unknown quantitatively: teacher signals are obtained by trial and used to map the control rules for the controlled object onto a neural network (Saito and Kitamura, "Stabilized Learning Control of an Inverted Pendulum Using a Multilayer Neural Network," Robotics and Mechatronics '90 Proceedings, pp. 283-286, 1990).

[0007] To bring an inverted pendulum mounted on a cart to rest at the origin, this new method employs a configuration comprising a neural network, a virtual target value generation unit, and an evaluation unit.

[0008] The virtual target value generation unit expresses as a formula the a priori knowledge that "the farther the cart is from the origin, the more the virtual target angle of the pendulum should be tilted from the vertical toward the origin." Given the position and velocity of the cart, it generates virtual target values of the pendulum angle and angular velocity for moving the cart to the origin.

[0009] The evaluation unit, for its part, evaluates what the difference between the generated virtual target value and the control output (the angle and angular velocity of the pendulum) should become one sampling period later, obtains from this a correction to the force to be applied to the cart, and specifies the force corrected by that amount as the teacher signal. The neural network receives the angle and angular velocity of the pendulum and the position and velocity of the cart at each sampling time, and outputs the force to be applied to the cart. The neural network is trained by the backpropagation method on the teacher signals generated by the evaluation unit.

[0010] With this configuration, for a controlled object about which a certain amount of a priori knowledge is available qualitatively but much remains unknown quantitatively, teacher signals are obtained by trial and used to map the control rules for the controlled object onto the neural network, so that the neural network is constructed as the control device for that controlled object.

[0011] Control devices have previously been devised that represent an intermediate state the system can take by a virtual quantity called a virtual target value, and perform control by learning an input/output relationship that realizes that virtual target value. However, because these devices use only one virtual target value, control performance degrades when the state of the system changes in a complicated manner. The present invention therefore sets a plurality of virtual target values according to the state of the system in order to improve control performance.

[0012]

[Problems to Be Solved by the Invention] The newly proposed method does have the advantage that a control device for a nonlinear controlled object can be constructed easily. However, the virtual target value generation unit takes the absolute values of control state quantities as input and outputs virtual target values of the corresponding control state quantities: given the position and velocity of the cart, it generates virtual targets for the pendulum angle and angular velocity that move the cart to the origin. The neural network likewise takes absolute values of the control state quantities as input, namely the angle and angular velocity of the pendulum and the position and velocity of the cart at each sampling time. As a result, learning must be redone every time the target value of a control state quantity is changed.

[0013] In addition, although not mentioned above, the evaluation unit must obtain the correction to the force applied to the cart according to a complicated evaluation formula that alters the response characteristics of the controlled object. As a result, the control rules that are set are not faithful to the response characteristics of the controlled object, and the control device cannot be constructed in a short time.

[0014] An object of the present invention is to provide a new control device construction processing system that allows a control device handling a nonlinear controlled object to be constructed easily and in a general control rule format.

[0015] Another object of the present invention is to set a plurality of virtual target values according to the state of the system and to select among them according to the distance to the target value, so that the target value is reached stably.

[0016]

[Means for Solving the Problems] FIG. 1 is a diagram showing the principle of the present invention. In the figure, 1 is a data processing device, 2 is a learning processing device, 3 is a controlled object, 4 is a target value setting device, 5 is a virtual target management device, 6 is an operation correction amount calculation device, 7 is a first differencer, and 8 is a second differencer.

[0017] The data processing device 1 has a variable signal conversion function which, when a group of teacher signals is given, can be set so as to realize the input/output characteristics of those teacher signals. This data processing device 1 is what is constructed as the control device: given the control state quantities of the controlled object 3 and their target values, it outputs the control operation amount for bringing the controlled object 3 into the target control state.

[0018] The data processing device 1 is constituted by a network structure formed by interconnecting basic units, each of which receives one or more inputs and the internal state values by which those inputs are to be multiplied, obtains their sum of products, and converts that sum by a predetermined function to obtain its output value.

[0019] Alternatively, the data processing device 1 may be constituted by a fuzzy device that describes the qualitative relationship between the control state quantities and the control operation amount as IF-THEN rules, and describes the qualitative attributes of the control state quantities and the control operation amount appearing in those rules by membership functions.
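A fuzzy device of this general kind can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the patent: the triangular membership functions, the three rules, and the use of singleton rule outputs combined by a weighted average (a simplification in which the output attributes are single values rather than membership functions).

```python
def triangle(x, left, peak, right):
    """Triangular membership function (shape is an illustrative assumption)."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical IF-THEN rules: (membership of the input difference, output value).
RULES = [
    (lambda e: triangle(e, -2.0, -1.0, 0.0), -1.0),  # IF difference is negative THEN output is negative
    (lambda e: triangle(e, -1.0, 0.0, 1.0), 0.0),    # IF difference is near zero THEN output is zero
    (lambda e: triangle(e, 0.0, 1.0, 2.0), 1.0),     # IF difference is positive THEN output is positive
]

def fuzzy_output(difference):
    """Combine the rule outputs, weighted by how strongly each rule fires."""
    fired = [(membership(difference), value) for membership, value in RULES]
    total = sum(weight for weight, _ in fired)
    if total == 0.0:
        return 0.0  # no rule fires outside the covered range
    return sum(weight * value for weight, value in fired) / total
```

In a learning setting, the parameters of the membership functions and the rule outputs would play the role that the weight values play in the network structure.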

[0020] The learning processing device 2, when given a group of teacher signals, trains the signal conversion function of the data processing device 1 so that it realizes the input/output characteristics of those teacher signals. When the data processing device 1 is constituted by a network structure, the learning processing device 2 executes a well-known learning algorithm such as the backpropagation method.

[0021] The controlled object 3 is the object controlled by the data processing device 1 constructed as the control device. It is preferable to use the actual controlled object as the controlled object 3, but a model of the controlled object may be used instead.

[0022] The target value setting device 4 sets the target values of the control state quantities that represent the desired control state of the controlled object 3. The virtual target management device 5 manages a priori knowledge, obtained for realizing the desired control state of the controlled object 3, about the relationships between control state quantities. The virtual target management device 5 of the present invention manages this a priori knowledge using, as a parameter, the difference between a control state quantity and its target value.

[0023] The operation correction amount calculation device 6 calculates, from a control state quantity of the controlled object 3 and the virtual target value of that quantity specified from the management data of the virtual target management device 5, the correction to the control operation amount applied to the controlled object 3 that is needed to realize the target values of the control state quantities.

[0024] The operation correction amount calculation device 6 may be configured to calculate the correction to the control operation amount by multiplying the difference between a control state quantity of the controlled object 3 and the virtual target value output by the virtual target management device 5 by a proportional coefficient.
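As a worked form of this proportional rule, the correction can be written as below. The symbols are illustrative assumptions rather than notation from the patent: x(n) is one control state quantity at sampling time n, x_d(n) is its virtual target value, K is the proportional coefficient, and the sign convention (state minus virtual target) simply follows the order in which the two quantities are named in the text.

```latex
\Delta u(n) = K \,\bigl( x(n) - x_d(n) \bigr)
```

When several control state quantities have virtual targets, one plausible extension is to sum such terms with a separate coefficient for each quantity.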

[0025] The first differencer 7 calculates the difference between the target values of the control state quantities set by the target value setting device 4 and the control state quantities of the controlled object 3, and inputs that difference to the data processing device 1 and the learning processing device 2. The output of the first differencer 7 may be multiplied by a proportional coefficient to adjust its dynamic range.

[0026] The second differencer 8 calculates the difference between the control operation amount output by the data processing device 1 and the correction output by the operation correction amount calculation device 6, and inputs that difference, that is, the control operation amount output by the data processing device 1 corrected by the calculated correction, to the learning processing device 2.
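The two differencers together produce one teacher signal: the difference values seen at the input, paired with the corrected control operation amount. A minimal sketch, assuming the difference is taken as target minus state and using hypothetical function names:

```python
def first_differencer(targets, states, scale=1.0):
    """Difference between target values and control state quantities.

    `scale` stands for the optional proportional coefficient used for
    dynamic range adjustment. The sign (target minus state) is an assumption.
    """
    return [scale * (t - s) for t, s in zip(targets, states)]

def second_differencer(output, correction):
    """Control operation amount corrected by the calculated correction."""
    return output - correction

def teacher_signal(targets, states, output, correction, scale=1.0):
    """One (input, desired output) pair for the learning processing device."""
    return (first_differencer(targets, states, scale),
            second_differencer(output, correction))
```

For example, `teacher_signal([1.0, 0.0], [0.5, 0.25], 2.0, 0.5)` pairs the input differences `[0.5, -0.25]` with the corrected output `1.5`.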

[0027]

[Operation] In the present invention, when the signal conversion function of the data processing device 1 is set to, for example, an initial state and the controlled object 3 outputs the initial values of its control state quantities, the first differencer 7 calculates the difference between the target values set by the target value setting device 4 and those initial values and inputs it to the data processing device 1. The data processing device 1 then calculates the control operation amount determined by its initial signal conversion function and outputs it to the controlled object 3, which in response moves to a control state different from the initial one. This process is repeated until the control state of the controlled object 3 reaches a prescribed limit.

[0028] During this process, the virtual target management device 5, on receiving a control state quantity from the controlled object 3, specifies a virtual target value of a control state quantity according to its management data. For example, if the controlled object 3 is a one-input, two-output control system, then for one of the control state quantities output by the controlled object 3, the virtual target value of the other control state quantity is specified.

[0029] Once the virtual target management device 5 has specified the virtual target value of a control state quantity, the operation correction amount calculation device 6 uses it to calculate the correction to the control operation amount for the controlled object 3 that is needed to realize the target values of the control state quantities. This calculation determines how the control operation amount output by the data processing device 1 at that point should be corrected in order to realize the target control state set by the target value setting device 4.

[0030] When a group of teacher signals has been obtained in this way, each consisting of a difference value of the control state quantities input to the data processing device 1 and the more desirable control operation amount at the time of that input, the learning processing device 2 trains the signal conversion function of the data processing device 1, setting it to one better suited to realizing the target control state.

[0031] The same process is then repeated with the newly set signal conversion function of the data processing device 1 to generate the next group of teacher signals, and the learning processing device 2 again updates the signal conversion function. By repeating this, the signal conversion function is brought to one that realizes the target control state, and the data processing device 1 is constructed as the control device.
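The trial procedure described in this section can be sketched as a single loop that collects one group of teacher signals. This is a structural illustration only: the callables `controller`, `plant_step`, and `virtual_target`, the two-quantity state layout, and the sign conventions are all hypothetical and would have to be supplied for a real controlled object.

```python
def run_trial(controller, plant_step, virtual_target, gain,
              targets, state, limit, max_steps=100):
    """Run one trial and return the teacher signal group.

    Each teacher signal is (difference values, corrected control output).
    """
    teachers = []
    for _ in range(max_steps):
        # First differencer: target minus current state (sign assumed).
        diffs = tuple(t - s for t, s in zip(targets, state))
        # Data processing device with its current signal conversion function.
        output = controller(diffs)
        # Virtual target for the second quantity, specified from the
        # difference of the first quantity.
        target_2 = virtual_target(diffs[0])
        # Operation correction: proportional to (state - virtual target).
        correction = gain * (state[1] - target_2)
        # Second differencer: output corrected by the correction amount.
        teachers.append((diffs, output - correction))
        # The uncorrected output is what drives the controlled object.
        state = plant_step(state, output)
        # Stop once the control state reaches the prescribed limit.
        if abs(state[1]) >= limit:
            break
    return teachers
```

After each trial, the learning step would fit the controller to the collected pairs, and the next trial would run with the updated controller.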

[0032] In this way, for a controlled object 3 about which a certain amount of a priori knowledge is available qualitatively but much remains unknown quantitatively, the present invention obtains teacher signals by trial and uses them to map the control rules for the controlled object 3 onto the signal conversion function of the data processing device 1, thereby constructing the data processing device 1 as the control device for the controlled object 3. Because this construction is carried out in terms of the difference from the target values of the control state quantities, learning does not have to be redone even when those target values are changed.

[0033] Even if the target position changes, the input to the controlled object needed, for example, to keep the pendulum inverted does not change as a function of the difference between the target position and the current position. In other words, the input to the controlled object depends only on the difference value: if the target position changes but the difference value does not, the same input should be output. Therefore, once the input to the controlled object has been learned as a function of the difference value, control simply follows the difference value, and there is no need to relearn.
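The point of this paragraph can be checked with a toy example. The linear mapping below is a hypothetical stand-in for a learned signal conversion function; what matters is only that it sees the difference value and never the absolute position.

```python
def learned_mapping(difference):
    """Hypothetical stand-in for a learned difference-to-input mapping."""
    return 0.8 * difference

def control_input(position, target_position):
    """The controller only ever sees the difference from the target."""
    return learned_mapping(target_position - position)

# Shifting both the position and the target by the same amount leaves the
# difference, and therefore the control input, unchanged.
same = control_input(1.0, 3.0) == control_input(11.0, 13.0)
```

A controller that took the absolute position as input would have no such guarantee and would need new training data around each new target.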

[0034] According to the present invention, by learning in advance the relationship between the difference between the current value and the target value and the corresponding input to the controlled object, the control state quantities can be controlled to desired values even when their target values are changed; that is, the controlled object can be brought into the desired control state at an arbitrary target position. Furthermore, because the target value setting device 4 is provided, the inverted position can be moved at a given speed while the pendulum remains inverted.

[0035] The present invention also provides a plurality of virtual target curves representing the dependency between variables of the controlled object, and selects, according to the control state quantities of the controlled object, one of a plurality of control units corresponding to the respective virtual target curves.

[0036] The present invention further provides a plurality of virtual target curves representing the dependency between variables of the controlled object, and performs control by selecting one of the virtual target curves according to the region in which a variable of the controlled object lies.
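Selecting among several virtual target curves by region can be sketched as a simple lookup. The two curves, their shapes, and the region boundary of 0.5 are invented for illustration; the text only states that several curves exist and that one is chosen according to the region of a variable.

```python
import math

# Hypothetical (region upper bound, virtual target curve) pairs, ordered
# from the region nearest the target outward.
CURVES = [
    (0.5, lambda diff: 0.1 * math.tanh(4.0 * diff)),       # near the target
    (math.inf, lambda diff: 0.3 * math.tanh(1.0 * diff)),  # far from the target
]

def select_curve(diff):
    """Return the virtual target curve for the region containing |diff|."""
    for upper_bound, curve in CURVES:
        if abs(diff) <= upper_bound:
            return curve

def virtual_target(diff):
    """Evaluate the selected curve at the given difference value."""
    return select_curve(diff)(diff)
```

The same lookup could return a whole control unit per curve instead of only the curve, which corresponds to the variant in which one of several control units is selected.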

[0037]

[Embodiments] The present invention is described in detail below with reference to embodiments. FIG. 2 shows one embodiment of the present invention. In the figure, 10 is a neural network that functions as the control device; 11 is a learning processing device that executes the learning process of the neural network 10; 12 is an inverted pendulum model, a one-input, two-output control system that is the controlled object; 13 is a target value setting device that sets the target values of the control state quantities of the inverted pendulum model 12; 14 is a virtual target calculation device that calculates virtual targets for the control state quantities of the inverted pendulum model 12; 15 is a torque correction amount calculation device that calculates the correction to the torque output by the neural network 10; 16 is a first delay unit that delays the control state quantities output by the inverted pendulum model 12 by one sampling period and supplies them to the virtual target calculation device 14; 17 is a second delay unit that delays the control state quantities output by the inverted pendulum model 12 by one sampling period; 18 is a first differencer that calculates the difference between the target values of the control state quantities set by the target value setting device 13 and the control state quantities output by the second delay unit 17, and inputs that difference to the neural network 10 and the learning processing device 11; and 19 is a second differencer that calculates the difference between the torque output by the neural network 10 and the torque correction calculated by the torque correction amount calculation device 15, and inputs that difference to the learning processing device 11. Here, (n) in the figure denotes the sampling time.

[0038] FIG. 3 shows the inverted pendulum model 12 assumed as the controlled object model in this embodiment. As shown in FIG. 3, the inverted pendulum model 12 consists of a base link L₁, connected at the origin 0 to a motor shaft (Z axis), and a link L₂ connected to the other end C of the base link L₁ so as to rotate about the base link L₁ as its axis. Let the rotation angles of the base link L₁ and the link L₂ be θ₁ and θ₂, their masses m₁ and m₂, and their lengths l₁ and l₂, let g be the gravitational acceleration, and let T be the motor torque. The equations of motion of the inverted pendulum model 12 are then as shown in FIG. 4.

[0039] The control objective in this embodiment is, for the inverted pendulum model 12 obeying these equations of motion, to invert the pendulum by controlling the torque T generated by the motor while feeding back the states of the links L₁ and L₂, and further to stop the base link L₁ at a suitable target position.

[0040] The inverted pendulum model 12 thus has four control state quantities,

[0041]

[Equation 1]

[0042] and one control operation amount, the motor torque T. Accordingly, if the corresponding target values set by the target value setting device 13 are

[0043]

[Equation 2]

[0044] then the first differencer 18 calculates the difference values

[0045]

[Equation 3]

[0046] and inputs them to the neural network 10. In the following, for convenience of notation, an angular velocity (the time derivative of an angle) may be written by prefixing (d/dt) to the angle.

[0047] The neural network 10 therefore needs four units in its input layer and, in its output layer, one unit that outputs the torque T. In this embodiment, as shown in FIG. 5, a hierarchical neural network 10 is prepared that comprises an input layer of four input units 20, a single-stage intermediate layer of eight basic units 21, and an output layer of one basic unit 21, with weight values assigned to the internal connections between the input units 20 and the intermediate-layer basic units 21 and between the intermediate-layer basic units 21 and the output-layer basic unit 21.

[0048] Each input unit 20 of the input layer distributes its input signal value unchanged to the basic units 21 of the intermediate layer. Each basic unit 21 of the intermediate and output layers comprises a multiplication unit that multiplies each of its inputs by the weight value of the corresponding internal connection, an accumulation unit that sums all the products, and a threshold unit that applies nonlinear threshold processing to the sum and outputs a single final output. The learning processing device 11 trains the weight values of these internal connections so that the network realizes the input/output characteristics of the teacher signals.
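The forward computation of such a 4-8-1 hierarchical network can be sketched as follows. The use of tanh as the nonlinear threshold function and the random initial weight range are assumptions; the text specifies only the layer sizes and the multiply, accumulate, and threshold steps of each basic unit.

```python
import math
import random

def basic_unit(inputs, weights):
    """Multiply each input by its weight, accumulate, then apply a threshold.

    tanh is an assumed choice of nonlinear threshold function.
    """
    accumulated = sum(x * w for x, w in zip(inputs, weights))
    return math.tanh(accumulated)

def forward(differences, hidden_weights, output_weights):
    """Input units pass the four difference values through unchanged."""
    hidden = [basic_unit(differences, weights) for weights in hidden_weights]
    return basic_unit(hidden, output_weights)

def initial_weights(n_inputs=4, n_hidden=8, seed=0):
    """Small random weights for the initial signal conversion function."""
    rng = random.Random(seed)
    hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
              for _ in range(n_hidden)]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    return hidden, output
```

With a bounded threshold function on the output unit, the network output would need to be scaled to the torque range of the motor; that scaling is not shown here.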

[0049] The virtual target calculation device 14 calculates and outputs the virtual target value θ_{d2} of the rotation angle θ₂ of the link L₂ and the virtual target value (d/dt)θ_{d2} of its angular velocity (d/dt)θ₂, based on the a priori control knowledge obtained for the inverted pendulum model 12 that "when the rotation angle θ₁ of the base link L₁ is away from the target position, tilting the rotation angle θ₂ of the link L₂ toward the target position brings the rotation angle θ₁ of the base link L₁ closer to the target position."

【００５０】That is, given the target value θ_{t1} of the rotation angle θ₁ of the base link L₁ from the target value setting device 13, and given the rotation angle θ₁ of the base link L₁ and its angular velocity (d/dt)θ₁ from the inverted pendulum model via the first delay device 16, the virtual target calculation device 14 calculates and outputs the virtual target value θ_{d2} of the rotation angle θ₂ of the link L₂ and the virtual target value (d/dt)θ_{d2} of its angular velocity (d/dt)θ₂ according to the following formula:

【００５１】[0051]

【数４】 [Equation 4]

【００５２】The virtual target values are calculated and output according to the above formula. Here, θ_{2max} is the maximum tilt angle of the rotation angle θ₂ of the link L₂. For the rotation angle θ₁ of the base link L₁ and its angular velocity (d/dt)θ₁, the virtual target calculation device 14 outputs the target values given by the target value setting device 13 unchanged. Although this formula uses a sigmoid function, a relation in which θ_{d2} is proportional to θ₁ may also be used. As the formula shows, when the rotation angle θ₁ of the base link L₁ reaches the target position θ_{t1}, the virtual target value θ_{d2} of the rotation angle θ₂ of the link L₂ becomes 0, so the pendulum stands upright.
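Equation 4 itself is not reproduced in this text, so the following is only a sketch of a function with the stated properties: it is sigmoid-shaped, bounded by θ_{2max}, and equal to 0 when θ₁ reaches θ_{t1}. The exact form, the sign convention (taken from the later description of FIG. 12), and the slope parameter `a` are assumptions, not the formula of the patent.

```python
import math


def virtual_target(theta1, theta_t1, theta2_max, a=1.0):
    """Assumed sigmoid-shaped virtual target for the pendulum angle (radians).

    Returns a value in (-theta2_max, theta2_max) that tilts the pendulum
    toward the target side and vanishes when the arm is at its target.
    """
    error = theta1 - theta_t1
    # Logistic curve rescaled to the range (-1, 1); zero at error == 0.
    return theta2_max * (2.0 / (1.0 + math.exp(a * error)) - 1.0)
```

For example, with the arm on the positive side of the target the function returns a negative pendulum target, and the magnitude saturates at `theta2_max` for large errors.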

【００５３】The torque correction amount calculation device 15 corrects the torque T applied to the base link L₁ so that the rotation angle θ₂ of the link L₂ approaches the virtual target value θ_{d2}. That is, the torque correction amount calculation device 15 receives from the virtual target calculation device 14 the virtual target value θ_{d2} of the rotation angle θ₂ of the link L₂, the virtual target value (d/dt)θ_{d2} of its angular velocity (d/dt)θ₂, and the virtual target value (d/dt)θ_{d1} of the angular velocity (d/dt)θ₁ of the rotation angle θ₁ of the base link L₁ (which in this case equals the actual target value (d/dt)θ_{t1}). It also receives from the inverted pendulum model 12 the rotation angle θ₂ of the link L₂, its angular velocity (d/dt)θ₂, and the angular velocity (d/dt)θ₁ of the rotation angle θ₁ of the base link L₁. From these it calculates and outputs the torque correction amount ΔT(n) according to the following formula:

【００５４】[0054]

【数５】 [Equation 5]

【００５５】The torque correction amount is calculated and output according to the above formula. The third term of the formula acts as a damping term and has the effect of bringing the motion rapidly to a stop near the target position of the rotation angle θ₁.

【００５６】The second subtractor 19 receives the torque correction amount output by the torque correction amount calculation device 15 and calculates the difference T'(n) = T(n) − ΔT(n) between the torque T(n) output by the neural network 10 and the torque correction amount ΔT(n) calculated by the torque correction amount calculation device 15. It then reports this difference to the learning processing device 11 as the preferable value of the torque output by the neural network 10.
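Equation 5 is not reproduced in this text. The sketch below assumes a linear, three-term form built from the quantities the description lists as inputs, with the coefficients K₁, K₂, K₃ taking the simulation values given later (1, 1, 0.1) and the third term acting on the angular velocity of θ₁ as the damping term. The signs of the terms are an assumption; only the relation T'(n) = T(n) − ΔT(n) is taken directly from the text.

```python
def torque_correction(theta2, dtheta2, dtheta1,
                      theta_d2, dtheta_d2, dtheta_d1,
                      k1=1.0, k2=1.0, k3=0.1):
    """Assumed linear form of the torque correction amount dT(n)."""
    return (k1 * (theta_d2 - theta2)        # pendulum angle error
            + k2 * (dtheta_d2 - dtheta2)    # pendulum angular velocity error
            + k3 * (dtheta_d1 - dtheta1))   # damping term on the arm velocity


def teacher_torque(torque, delta_t):
    """Preferred torque reported to the learning device: T'(n) = T(n) - dT(n)."""
    return torque - delta_t
```

Each sampled network input paired with `teacher_torque(...)` for that step would then form one teacher signal.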

【００５７】The learning processing device 11 uses the input values to the neural network 10 obtained in this way, together with the corresponding more preferable torque output values of the neural network 10, as teacher signals. It learns the weight values of the internal connections of the neural network 10 at high speed according to the improved backpropagation method that the present applicant has proposed as the virtual impedance control method.

【００５８】The learning procedure is as follows. First, a trial is performed with the base link L₁ (arm) set to 30° and the link L₂ (pendulum) set to 0°. Suppose that 20 sampled values are obtained before the pendulum falls. These are stored in memory, the torque values of the teacher signals are generated automatically from the sampled values, and the weight values of the neural network are updated to carry out learning.

【００５９】Next, a second trial is performed with the arm again set to 30° and the pendulum to 0°. During this trial, using the weight values determined from the previous 20 sampled values, suppose that, for example, 40 sampled values are obtained. These are stored in memory as the new sampled values, the torque values of the teacher signals are generated automatically from them, and the weight values of the neural network are updated to carry out learning.

【００６０】In this way, the arm is moved from 0° to 90°. Using the weight values finally stored by the neural network, the pendulum can be kept inverted at the target position for any given initial position and target position, provided that the difference between the initial position and the target position is between 0° and 90°.

【００６１】Moving the arm over the entire range, for example from 0° to 90°, yields a better network, but experience shows that it is not necessary to cover the entire range. Moreover, even when the difference between the initial position and the target position lies outside the range of 0° to 90°, control is possible to some extent.

【００６２】Even if the target position changes, the control torque needed, for example, to keep the pendulum inverted does not change as a function of the difference between the target position and the current position. In other words, the control torque depends only on the difference value: even if the target position changes, the same control torque can be output as long as the difference value is unchanged. Therefore, according to the present invention, once the control torque has been learned as a function of the difference value, control is simply performed according to that difference value, and there is no need to relearn when the target position changes.
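The point that only the difference value matters can be made concrete with a small sketch. It assumes the network input is the state minus the target for each of the four state quantities, and it applies the 0.1 scaling to the arm's angular velocity error that the simulation settings in this description mention. The function name and the ordering of the state tuple are hypothetical.

```python
def controller_input(state, target, rate_scale=0.1):
    """Build the network input from differences between state and target.

    state and target are (theta1, dtheta1, theta2, dtheta2); the sign
    convention (state minus target) is an assumption.
    """
    theta1, dtheta1, theta2, dtheta2 = state
    t_theta1, t_dtheta1, t_theta2, t_dtheta2 = target
    return [theta1 - t_theta1,
            rate_scale * (dtheta1 - t_dtheta1),  # scaled arm velocity error
            theta2 - t_theta2,
            dtheta2 - t_dtheta2]
```

An arm at 30° with a target of 0° and an arm at 60° with a target of 30° produce the same input vector, so a controller trained on one case can be reused for the other without relearning.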

【００６３】Next, the effectiveness of the embodiment configured in this way is described on the basis of simulation results. The simulation assumed, for the inverted pendulum model 12, a mass m₁ of "1" for the base link L₁, a mass m₂ of "0.25" for the link L₂, a length l₁ of "0.2" for the base link L₁, and a length l₂ of "0.5" for the link L₂, and values of "1", "1", and "0.1" for the coefficients K₁, K₂, and K₃ used by the torque correction amount calculation device 15.

【００６４】Of the difference values output by the first subtractor 18, (d/dt)E₁ was multiplied by a coefficient of "0.1" before being input to the neural network 10. The initial value of the rotation angle θ₁ was set to 30°, the initial value of the rotation angle θ₂ to 0°, the target value θ_{t1} of the rotation angle θ₁ to 0°, the target value θ_{t2} of the rotation angle θ₂ to 0°, and the maximum tilt angle θ_{2max} of the rotation angle θ₂ to 20°. The initial weight values of the internal connections of the neural network 10 were set to random values within ±0.01.

【００６５】The simulation proceeds as follows. The state [θ_i(n), (d/dt)θ_i(n)] (i = 1, 2) of the inverted pendulum model 12 is sampled at a sampling interval of 0.01 seconds. The torque output of the neural network 10 at that time is calculated and applied to the base link L₁ of the inverted pendulum model 12, and the resulting state [θ_i(n+1), (d/dt)θ_i(n+1)] of the inverted pendulum model 12 is simulated.

【００６６】At each step, a teacher signal is obtained by calculating a more preferable output torque for the neural network 10 using the torque correction amount calculation device 15. This process is repeated for 500 steps, that is, for 5 seconds. If the inverted pendulum tilts by 45° or more, the trial is terminated at that point. A trial therefore yields at most 500 teacher signals.

【００６７】Once a teacher signal group has been obtained in this way, the learning processing device 11 learns the weights of the internal connections of the neural network 10 according to the improved backpropagation method. The learning is stopped after 100 backpropagation iterations, and the weight values obtained at that point are set as the new weight values of the internal connections of the neural network 10. The trial described above is then repeated with the neural network 10 carrying the newly set weight values. In this way, the construction of a neural network 10 that controls the inversion of the pendulum of the inverted pendulum model 12 is simulated.
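The trial-and-learn cycle described in the preceding paragraphs can be outlined as follows. This is a structural sketch only: `controller`, `plant_step`, `corrector`, and `fit` are hypothetical callables standing in for the neural network 10, the inverted pendulum model 12, the torque correction amount calculation device 15, and one iteration of the learning processing device 11. The pendulum dynamics and the improved backpropagation method are not reproduced here.

```python
import math


def run_trial(controller, plant_step, corrector, state, target,
              max_steps=500, dt=0.01, fall_limit=math.radians(45.0)):
    """Run one trial and collect teacher signals (network input, preferred torque).

    state and target are (theta1, dtheta1, theta2, dtheta2) in radians.
    """
    samples = []
    for _ in range(max_steps):
        error = [s - t for s, t in zip(state, target)]
        torque = controller(error)
        # Preferred torque T' = T - dT becomes the teacher output for this input.
        samples.append((error, torque - corrector(state, target)))
        state = plant_step(state, torque, dt)
        if abs(state[2]) >= fall_limit:  # pendulum tilted 45 degrees or more
            break
    return samples


def construct_controller(controller, fit, plant_step, corrector,
                         initial_state, target, n_trials=10, n_epochs=100):
    """Alternate trials with a fixed number of learning iterations."""
    for _ in range(n_trials):
        samples = run_trial(controller, plant_step, corrector,
                            initial_state, target)
        for _ in range(n_epochs):
            fit(samples)  # assumed to update the controller's weights in place
    return controller
```

Early trials end quickly because the untrained controller lets the pendulum fall, so they yield few teacher signals; as the weights improve, trials run longer and supply more samples, which matches the 20-sample and 40-sample example given earlier.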

【００６８】FIGS. 6 to 11 show examples of the simulation data obtained. The simulation data of FIG. 6 show the response when, after the above trial had been performed 10 times, the inverted pendulum model 12 was operated with the initial value of the rotation angle θ₁ set to 90°, the initial value of the rotation angle θ₂ to 10°, the target value θ_{t1} of the rotation angle θ₁ to 0°, and the target value θ_{t2} of the rotation angle θ₂ to 0°. The simulation data of FIG. 7 show the response of the rotation angles θ₁ and θ₂ from the same initial state when the target value θ_{t1} of the rotation angle θ₁ was set to −30° and the target value θ_{t2} of the rotation angle θ₂ to 0°. FIG. 8 shows the torque curve for the simulation of FIG. 6, and FIG. 9 shows the response of the rotation angle θ₂ and the virtual target value θ_{d2} for the simulation of FIG. 6.

【００６９】In both the simulation of FIG. 6 and that of FIG. 7, the system is brought to the target control state after about 5 seconds. Because the present invention constructs the neural network 10 as a control device according to the difference between the control state quantities and their target values, the neural network 10 can bring the inverted pendulum model 12 to the desired control state even for a target state different from the one used in the construction process.

【００７０】The simulation data of FIGS. 10 and 11 were obtained by running the simulation of FIG. 6 with the maximum tilt angle θ_{2max} of the rotation angle θ₂, used in the calculation by the torque correction amount calculation device 15, set to 10°, 30°, and 40° in addition to 20°, performing 10 trials for each setting, and determining the response of the rotation angles θ₁ and θ₂ in each case. These data show that the response of the inverted pendulum model 12 remains essentially unchanged when the maximum tilt angle θ_{2max} of the rotation angle θ₂ is varied.

【００７１】Although the illustrated embodiment has been described, the present invention is not limited to it. For example, the embodiment constructs the control device using a neural network, but the invention is not limited to this and can be applied as it is to any data processing device whose signal conversion function can be adjusted according to teacher signals.

【００７２】The embodiment uses an inverted pendulum as the controlled object, but the invention is not limited to this and can be applied as it is to any controlled object. In addition, the embodiment constructs the control device using a model of the controlled object rather than the actual controlled object, but the invention is not limited to this and the actual controlled object itself may be used. In that case, accurate system identification that includes effects such as Coulomb friction is achieved, so a more appropriate control device can be constructed.

【００７３】Next, a scheme for stabilizing control of an inverted pendulum is described as another embodiment of the present invention. The rule of thumb used in this embodiment is based on the observation that "when a person moves to some position while balancing a stick such as a broom upright, the person moves while tilting the stick in the direction of travel." Applied to this embodiment, this becomes "when the arm moves to the target position while keeping the pendulum upright, it moves while tilting the pendulum toward the target position."

【００７４】FIG. 12 expresses this as a function. θ₁ is the arm position, and θ_{d2} is the virtual target value corresponding to the arm position θ₁. In FIG. 12, the origin represents the target position of the arm θ₁. The figure expresses that when the arm θ₁ is on the positive side of the target position, the pendulum θ₂ is tilted to the negative side (toward the target position), and conversely, when the arm θ₁ is on the negative side of the target position, the pendulum θ₂ is tilted to the positive side (toward the target position).

【００７５】FIG. 12 shows two such functions. They differ in the magnitude of the tilt of the pendulum θ₂. Choosing a small slope for the virtual target value makes the inverted pendulum less likely to fall, while choosing a large slope returns the arm to the target position quickly. By using the two kinds of virtual target values, it is therefore possible to make the pendulum hard to topple far from the target position and to make it track the target position quickly near the target position.

【００７６】In the function shown below, which corresponds to the virtual target curve, different functions can be formed by changing the parameter a and the maximum swing angle θ_{2max} of the pendulum according to the type of virtual target curve. For example, a point (here, the origin) on the two curves (1) and (2) corresponding to two such functions can be taken as the virtual target value.

【００７７】[0077]

【数６】 [Equation 6]

【００７８】FIG. 13 shows the configuration of the other embodiment of the present invention illustrated in FIG. 12. In FIG. 13, parts corresponding to those in FIG. 1 carry the same reference numerals. FIG. 13 differs from FIG. 1 in two respects: the virtual target management device 5 of FIG. 13 provides a plurality of curves (functions) for a plurality of virtual target values, and the control unit 10 of FIG. 13 consists of a plurality of control units 10, each comprising the neural network 1 and the learning device 2 of FIG. 1, with each control unit 10 provided in correspondence with one of the curves (functions) of the virtual target management device 5. Each control unit learns the curve (function) corresponding to one virtual target value, and different control units learn the functions corresponding to different virtual target values. That is, in this embodiment the virtual target values are set in advance according to the state of the system, and during learning the control unit is switched each time the state of the system changes, so that learning proceeds as needed.

【００７９】The control unit 10 is trained to produce an output that brings the controlled object toward the given virtual target value according to the current state of the controlled object. Choosing a small slope for the virtual target value makes the inverted pendulum less likely to fall, while choosing a large slope returns the arm to the target position quickly.

【００８０】Accordingly, by using the two kinds of virtual target values represented by curves (1) and (2), the pendulum can be made hard to topple far from the target value, where curve (1) keeps the change in the virtual pendulum angle θ_{d2} small even when the arm angle θ₁ changes, and can be made to track the target quickly near the target value. Learning is performed for one virtual target value and then for the other virtual target values in turn.

【００８１】When inversion control is executed, the selection circuit 19 uses the current arm position output by the inverted pendulum to select the control unit trained on the curve (function) with the small slope when the arm is far from the target value, and the control unit trained on the curve (function) with the large slope when the arm is close to the target value.
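The selection step can be sketched as a simple distance test. The threshold `switch_distance` is a hypothetical parameter: the text says only that the choice depends on whether the arm is far from or close to the target value, not where the boundary lies. The two controllers are passed in as arbitrary objects.

```python
def select_controller(theta1, theta_t1, far_controller, near_controller,
                      switch_distance):
    """Pick the control unit according to the arm's distance from its target.

    far_controller:  trained on the gentle-slope curve (hard to topple).
    near_controller: trained on the steep-slope curve (fast tracking).
    """
    if abs(theta1 - theta_t1) > switch_distance:
        return far_controller
    return near_controller
```

A hard switch like this may cause a jump in the output torque at the boundary, which is one reason a single controller trained on a combined curve can be attractive.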

【００８２】FIG. 14 shows yet another embodiment of the present invention. It differs from FIG. 13 in that a single control unit 10 is provided, and this control unit learns one curve (function), shown as a thick line, obtained by combining the plurality of curves (functions) in the virtual target management device 5.

【００８３】At execution time, because the control unit 10 has already learned that curve (function), the selection circuit 19 is unnecessary, unlike in the embodiment of FIG. 13.

【００８４】[0084]

【発明の効果】[Effects of the Invention] As described above, the present invention addresses controlled objects for which some qualitative a priori knowledge is available but much remains unknown quantitatively. Teacher signals are obtained through trials and are used to map the control rules for the controlled object onto the signal conversion function of a data processing device, thereby constructing the data processing device as the control device for that controlled object. Because this construction is carried out according to the difference between the control state quantities and their target values, learning does not have to be repeated when the target values of the control state quantities are changed. As a result, a control device that handles a nonlinear controlled object can be constructed easily and in a general control rule form.

【００８５】Furthermore, by following a characteristic synthesized from a plurality of virtual target curves, the present invention can move the controlled object toward the desired target value on the basis of a priori knowledge. In addition, the device that calculates the correction amount of the control operation amount, which generates the teacher signals used for this purpose, follows a linear calculation formula and therefore calculates the correction amount without altering the response characteristics of the controlled object. This makes it possible to set control rules that are faithful to the response characteristics of the controlled object and to construct the control device in a short time.

[Brief description of drawings]

【図１】FIG. 1 is a diagram illustrating the principle of the present invention.

【図２】FIG. 2 shows one embodiment of the present invention.

【図３】FIG. 3 is an explanatory diagram of the inverted pendulum used in the embodiment.

【図４】FIG. 4 is an explanatory diagram of the equations of motion of the inverted pendulum used in the embodiment.

【図５】FIG. 5 is a configuration diagram of the neural network.

【図６】FIG. 6 is an explanatory diagram of simulation data.

【図７】FIG. 7 is an explanatory diagram of simulation data.

【図８】FIG. 8 is an explanatory diagram of simulation data.

【図９】FIG. 9 is an explanatory diagram of simulation data.

【図１０】FIG. 10 is an explanatory diagram of simulation data.

【図１１】FIG. 11 is an explanatory diagram of simulation data.

【図１２】FIG. 12 is a diagram illustrating another embodiment of the present invention.

【図１３】FIG. 13 is a configuration diagram of another embodiment of the present invention.

【図１４】FIG. 14 is a configuration diagram showing a modification of the embodiment shown in FIG. 13.

[Explanation of symbols]

1 Data processing device
2 Learning processing device
3 Controlled object
4 Target value setting device
5 Virtual target management device
6 Operation correction amount calculation device
7 First subtractor
8 Second subtractor

【手続補正書】[Procedure amendment]

【提出日】平成４年１月３０日[Submission date] January 30, 1992

【手続補正１】[Procedure Amendment 1]

【補正対象書類名】[Document name to be amended] Specification

【補正対象項目名】請求項５[Name of item to be corrected] Claim 5

【補正方法】変更[Correction method] Change

【補正内容】[Correction content]

Claims


1. A control device construction processing system comprising: a data processing device (1) that has a variable signal conversion function and that, when given a teacher signal group, can set the signal conversion function so as to realize the input/output characteristics of the teacher signal group; a virtual target management device (5) that manages a priori knowledge of the data relationships between control state quantities, obtained in order to realize a desired control state of a controlled object; and an operation correction amount calculation device (6) that, when a control operation amount is applied, calculates the correction amount of the control operation amount required to realize the desired control state from the control state quantities of the controlled object or its model and the virtual target values of the control state quantities specified from the management data of the virtual target management device (5) corresponding to those control state quantities; wherein control state quantities are input to the data processing device (1), the output of the data processing device (1) is applied to the controlled object or its model as the control operation amount, teacher signals are obtained by correcting the control operation amount according to the correction amount from the operation correction amount calculation device (6) at that time, and the signal conversion function is set according to the obtained teacher signals, whereby the data processing device (1) is constructed as a control device that realizes the desired control state; the system being characterized in that the data processing device (1) is configured to receive as input the difference between the control state quantities output by the controlled object or its model and the control state quantities serving as the control target, or a value corresponding to that difference, and the virtual target management device (5) is configured to manage the a priori knowledge of the data relationships between control state quantities with the difference from the target control state quantities as a parameter.

2. The control device construction processing system according to claim 1, wherein the operation correction amount calculation device (6) calculates the correction amount of the control operation amount by multiplying, by a proportional coefficient, the difference between the control state quantities of the controlled object or its model and the virtual target values output by the virtual target management device (5).

3. The control device construction processing system according to claim 1, wherein the data processing device (1) comprises a network structure formed by internally connecting basic units, each of which receives one or more inputs and internal state values by which those inputs are multiplied, obtains the sum of products, and converts the sum of products by a predetermined function to obtain an output value.

4. The control device construction processing system according to claim 1 or 2, wherein the data processing device (1) comprises a fuzzy device that describes the qualitative data relationships between the control state quantities and the control operation amount by IF-THEN rules and describes the qualitative attributes of the state quantities and control operation amounts appearing in those IF-THEN rules by membership functions.

5. A control device construction processing system comprising: virtual target management means (5) for calculating a virtual target value for a second variable, according to a predetermined virtual target curve, from the difference between the current control state quantity and the target control state quantity of a first variable; operation correction amount calculation means (6) for calculating a correction amount of the input signal of the controlled object from the obtained virtual target value for the second variable and the current output values of the controlled object for the first and second variables; first calculation means (7) for forming a new teacher signal from the current input signal given to the controlled object and the correction amount of the input signal of the controlled object; second calculation means (8) for calculating the differences between the current values of the first and second variables of the controlled object and the target values of the first and second variables; and control means (1, 2, 10) having a learning function, which receives the difference signal as input and learns, using the teacher signal, so that the relationship between the first variable and the second variable satisfies the data dependency defined by the virtual target management means (5) and so that this data dependency is maintained even when the controlled object reaches the target value given by target value setting means, and which, at execution time, outputs on the basis of the learning result a response to the difference value from the first calculation means (7) and applies it to the controlled object so that the controlled object achieves the desired purpose at the given target value.

6. The control device construction processing system according to claim 5, wherein the control means comprises a neural network.

7. The control device construction processing system according to claim 6, wherein the neural network learns by a backpropagation algorithm.
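For readers unfamiliar with the technique named in claim 7, the following Python sketch trains a very small network by backpropagation on pairs of difference value and teacher signal. The network size (one input, one tanh hidden layer, one linear output), the learning rate, the epoch count, and the sample data are assumptions chosen for illustration and do not come from the patent.

```python
import math
import random


def train_controller(samples, hidden=4, lr=0.1, epochs=2000, seed=0):
    """Fit a 1-input, 1-output network with one tanh hidden layer.

    samples: list of (difference_value, teacher_signal) pairs.
    Returns (predict, initial_loss, final_loss).
    """
    rng = random.Random(seed)
    w1 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # input -> hidden
    b1 = [0.0] * hidden                                   # hidden biases
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # hidden -> output
    b2 = [0.0]                                            # output bias

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        y = sum(w2[j] * h[j] for j in range(hidden)) + b2[0]
        return h, y

    def mean_squared_error():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial_loss = mean_squared_error()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)
            delta_out = y - t  # dE/dy for E = 0.5 * (y - t)^2
            for j in range(hidden):
                # Propagate the output error back through the tanh unit,
                # using the output weight as it was in the forward pass.
                delta_hidden = delta_out * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * delta_out * h[j]
                w1[j] -= lr * delta_hidden * x
                b1[j] -= lr * delta_hidden
            b2[0] -= lr * delta_out
    return (lambda x: forward(x)[1]), initial_loss, mean_squared_error()


# Hypothetical teacher data: difference value -> corrected input signal.
samples = [(-1.0, -0.5), (-0.5, -0.25), (0.0, 0.0), (0.5, 0.25), (1.0, 0.5)]
predict, initial_loss, final_loss = train_controller(samples)
```

After training, `predict` plays the role of the learned control means at execution time: it takes a difference value and returns the signal to give to the controlled object.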

8. The control device construction processing system according to claim 5, wherein a plurality of the virtual target curves are prepared, a plurality of control units, each trained in correspondence with one of the plurality of curves, are prepared, and the system further comprises means for selecting one of the plurality of control units according to the current output of the controlled object.

9. The control device construction processing system according to claim 5, wherein a plurality of the virtual target curves are prepared, and the value of the second variable with respect to the first variable is selected from among the plurality of values on the virtual target curves.

10. A control device comprising: a virtual target management unit for expressing, by a virtual target curve based on experience, a part or all of the control variables for controlling a controlled object to a given target value, and for setting a virtual target value on the virtual target curve; a plurality of control units for setting the input to the controlled object that realizes the virtual target value; operation correction amount calculating means for calculating a correction amount of the input signal of the controlled object from the virtual target value and the current output value of the controlled object; calculating means for forming a new controlled object input signal from the current input signal given to the controlled object and the correction amount of the controlled object input signal; and means for causing the plurality of control units to learn the input/output relationship between an input relating to the current control state quantity of the controlled object and an output, given to the input of the controlled object, that can realize the virtual target value calculated by the virtual target management unit.

11. The control device according to claim 10, wherein the plurality of control units are made to learn the input/output relationships that can realize a plurality of virtual target curves, and one of the plurality of control units is selected according to the control state quantity of the controlled object.

12. The control device according to claim 10, wherein the virtual target curve is changed according to the control state quantity of the controlled object, and a control unit is made to learn the input/output relationship that can realize the virtual target value.