JP2851901B2

JP2851901B2 - Learning control device

Info

Publication number: JP2851901B2
Application number: JP2056686A
Authority: JP
Inventors: 稔藤田; 亮土橋; 均斎藤
Original assignee: Hitachi Ltd; Hitachi Keiyo Engineering Co Ltd
Current assignee: Hitachi Ltd; Hitachi Keiyo Engineering Co Ltd
Priority date: 1990-03-09
Filing date: 1990-03-09
Publication date: 1999-01-27
Anticipated expiration: 2014-01-27
Also published as: JPH03259303A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Industrial application field]

The present invention relates to a control system that produces a predetermined output in response to an input command under preset response characteristics, and in particular to a learning control device suitable for controlling robots.

[Conventional technology]

Conventionally, when the gains of a control system using a PID (proportional, integral, derivative) control law were to be adjusted automatically by a learning device called a neural network, the teacher signals used to train the neural network were prepared in advance by a human, experimentally or analytically, as described in the Proceedings of the 6th Annual Conference of the Robotics Society of Japan (Showa 63-10, i.e. October 1988), pp. 143 to 144.

However, in hybrid position/force control of a robot manipulator, the dynamics of the controlled object are contained in the feedback loop of the overall control system, so stable control is impossible unless those dynamics are known. In addition, control with fixed PID gains has limited adaptability, so the PID gains must be changed to suit the object being handled.

For this reason, the conventional technique described above uses a neural network to vary the PID gains, which are the parameters of the control system, as required.

In this scheme, the neural network forms by itself, through learning, the way to determine the optimal PID gain values for a given input. That is, the inputs of the neural network are the position of the manipulator tip, the fed-back force, and the force target value, and its outputs are the PID gains.

Training of this neural network uses the backpropagation algorithm, which compares data representing the desired output for a given input with the actual output of the neural network and changes the state of the network so as to minimize the error. Repeating the backpropagation algorithm brings the output of the neural network closer to the desired output.

The desired output data to be given for an input is called a teacher signal. In the conventional technique, the teacher signals were PID gains that a human had determined in advance, by experiment or by analysis of the control system, so that the control system would be stable.

Using a number of teacher signals obtained in this way, the neural network learned part of the characteristics of the control system. After learning, the neural network estimated the dynamics of the control system from the learned characteristics and output PID gains according to its input.

[Problems to be solved by the invention]

The conventional technique requires, as the teacher signals used to train the learning device, signals that a human has prepared in advance, experimentally or analytically, so that the evaluation value of the control system output reaches the target value.

Consequently, when good experimental results cannot be obtained, or when a model for analysis cannot be constructed, the teacher signals cannot be prepared in advance and the learning device cannot be trained.

Moreover, even when teacher signals can be prepared, producing a large amount of uniform teacher data takes an enormous amount of time.

An object of the present invention is to provide a learning control device that can train its learning device with as little human intervention as possible, and that has a function for automatically generating teacher signals such that, by using the trained learning device, the evaluation value of the control system output can be brought to the target value.

[Means for solving the problem]

To achieve the above object, a learning mode in which the learning device is trained automatically is provided in addition to the operation mode of an ordinary learning control device, and a teacher signal generation device is provided that, in the learning mode, automatically generates random adjustment signals for the control system.

An evaluation device is also provided that evaluates the output of the control system in response to the input of the control system and the random adjustment signal generated by the teacher signal generation device.

A target evaluation value setting device is also provided for setting, in the operation mode of an ordinary learning control device, a target value for the evaluation value of the control system output.

Furthermore, a switching device is provided that, between the two modes, connects or disconnects the teacher signal generation device and switches between the evaluation device and the target evaluation value setting device.

[Action]

In the learning mode, the teacher signal generation device randomly generates an adjustment signal for operating the control system and passes this value to the control system as the adjustment signal.

When an input is given to the control system, the control system outputs a response according to the input/output characteristics determined by that adjustment signal.

The evaluation device then evaluates this output. As a result, one set of learning data, consisting of the adjustment signal, the corresponding response of the control system, and the evaluation value of that response, is obtained without human intervention.
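The collection of one set of learning data in the learning mode can be sketched in Python as follows. This is a minimal illustration, not the implementation in the patent: the first-order plant in `control_system_response`, the mean-absolute-error metric in `evaluate`, and the gain range are hypothetical stand-ins chosen only to make the loop runnable.

```python
import random

def control_system_response(gain, command, steps=50, dt=0.02):
    """Hypothetical control system: dy/dt = gain * (command - y), Euler-integrated."""
    y = 0.0
    response = []
    for _ in range(steps):
        y += dt * gain * (command - y)
        response.append(y)
    return response

def evaluate(response, command):
    """Hypothetical evaluation device: mean absolute error against the command."""
    return sum(abs(command - y) for y in response) / len(response)

def make_learning_sample(rng, command=1.0, gain_range=(1.0, 20.0)):
    """Generate one (adjustment signal, response, evaluation value) set."""
    gain = rng.uniform(*gain_range)   # random adjustment signal (teacher signal)
    response = control_system_response(gain, command)
    return {
        "adjustment": gain,
        "response": response,
        "evaluation": evaluate(response, command),
    }

rng = random.Random(0)                # fixed seed so the sketch is repeatable
sample = make_learning_sample(rng)
```

Calling `make_learning_sample` repeatedly yields as many sets as needed without a human preparing any of them, which is the point of the learning mode.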

The learning data is then passed to the learning device, which learns the relationship between the output of the control system and the evaluation value from the evaluation device on the one hand, and the adjustment signal generated by the teacher signal generation device on the other, by changing its internal state.

By repeating this process and accumulating learning, the output of the learning device approaches the random teacher signal, and the association between the control system output and evaluation value and the adjustment signal is built up inside the learning device. In other words, the characteristics of the control system are learned.

After learning, the switching device switches the learning control device to the operation mode. The teacher signal is no longer needed, so the teacher signal generation device is disconnected. The evaluation device is also no longer needed, and the target evaluation value setting device is connected in its place. When a human sets a target evaluation value in the target evaluation value setting device, that target evaluation value becomes an input to the learning device.

Because the learning device has internally associated the control system output and evaluation value with the adjustment signal, inputting the current control system output and the set target evaluation value causes it to output the corresponding adjustment signal. Inputting this adjustment signal to the control system adjusts the input/output characteristics of the control system so that the evaluation value of its output matches the set target evaluation value.

[Example]

A first embodiment of the present invention is described below with reference to FIGS. 1 to 5, and a second embodiment with reference to FIGS. 6 to 9.

In the first embodiment, the control system is a force control system using a PID control law, and the PID gains are adjusted as required by a learning element called a neural network so as to suppress vibration of the force control system. It is thus an example of adaptive control, and in this embodiment it is applied to a robot hand.

FIG. 2 shows an example of a robot hand control system to which an embodiment of the present invention is applied. In the figure, a force command value given to the control device 1 causes a current to flow through the servomotor 2, and the ball screw 3 rotates. As the ball screw 3 rotates, the movable finger 5, to which the force sensor 4 is attached, moves so as to grip the object 6.

When the movable finger 5 grips the object 6, the gripping force is fed back by the force sensor 4, and the control device 1 supplies current to the servomotor 2 until the deviation from the force command value becomes zero, so that the gripping force finally matches the force command value.

FIG. 3 shows the above control system as a block diagram; a PID control law is used as the control law. The gripped object Go includes, in addition to the second-order transfer function obtained by regarding the object as a spring-dashpot-mass system, the conversion coefficient from force command value to motor current, the conversion coefficient from motor current to torque, and the reduction ratio of the ball screw.
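As a rough illustration of the loop in FIG. 3, the following Python sketch simulates PID force control of an object modeled as a spring-dashpot-mass system. The numerical values (`m`, `c`, `k`, the gains, the time step) are assumptions made for illustration, and the conversion coefficients and ball screw reduction ratio are lumped into a unity gain; none of these values come from the patent.

```python
def simulate_pid_force_loop(kp, ki, kd, force_cmd=1.0,
                            m=1.0, c=20.0, k=100.0,
                            dt=0.001, steps=3000):
    """Return the sensed force over time for a PID force loop.

    Plant (assumed): m*x'' + c*x' + k*x = u, sensed force = k*x.
    """
    x = 0.0               # displacement of the contact point
    v = 0.0               # velocity of the contact point
    integral = 0.0        # integral of the force error
    prev_err = force_cmd  # error at t = 0, avoids a derivative kick
    forces = []
    for _ in range(steps):
        force = k * x                     # force sensor reading
        err = force_cmd - force
        integral += err * dt
        deriv = (err - prev_err) / dt
        prev_err = err
        u = kp * err + ki * integral + kd * deriv   # PID control law
        accel = (u - c * v - k * x) / m
        x += v * dt                       # explicit Euler integration
        v += accel * dt
        forces.append(force)
    return forces

# Example gains for which this toy loop is well damped.
forces = simulate_pid_force_loop(kp=0.5, ki=5.0, kd=0.0)
```

With other values of `k` or `c` the same gains can produce hunting or vibration, which is the situation the gain adjustment is meant to handle.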

This control system hunts or vibrates unless the PID gains are set to suit the gripped object Go. In this embodiment, therefore, the PID gains are set automatically by a learning device called a neural network. The process of setting the PID gains is described below.

FIG. 5 shows the mode transition flow, which consists of a learning mode 51 and an operation mode 52. The learning mode 51 is a mode in which the neural network learns how to adjust the PID gains. The operation mode 52 is a mode in which the trained neural network is used to adjust the PID gains as required and suppress vibration of the force control system.

To adjust the PID gains with the neural network, the neural network is first trained in the learning mode 51, and the device is then switched to the operation mode 52 to adjust the PID gains.

FIG. 1 is an overall system configuration diagram showing an embodiment of the present invention. The system consists broadly of a control system 11 and an adjustment device 12 (a learning device that creates teacher signals) for creating a teacher signal made up of the three signals shown in FIG. 3: a proportional element gain adjustment signal, an integral element gain adjustment signal, and a differential element gain adjustment signal.

The control system 11 is the control system using the PID control law described above. The adjustment device 12 contains an evaluation device 13, a teacher signal generation device 14, a learning device 15, a target evaluation value setting device 16, and a switching device 17.

The adjustment device 12 adjusts the values of the PID gains, which are the parameters of the input/output characteristics of the control system 11. The switching device 17 switches between the learning mode 51 and the operation mode 52 described above. The switch positions drawn with solid lines correspond to the learning mode, and those drawn with broken lines to the operation mode.
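The routing performed by the switching device 17 can be sketched as a small Python function. The names `Mode` and `route_signals` are hypothetical; the sketch only captures which source feeds the control system and which value feeds the learning device in each mode.

```python
from enum import Enum

class Mode(Enum):
    LEARNING = "learning"
    OPERATION = "operation"

def route_signals(mode, teacher_signal, network_output,
                  measured_evaluation, target_evaluation):
    """Return (adjustment signal for the control system,
               evaluation value fed to the learning device)."""
    if mode is Mode.LEARNING:
        # The random teacher signal drives the control system, and the
        # learning device sees the evaluation device output.
        return teacher_signal, measured_evaluation
    # The learning device output drives the control system, and the
    # learning device sees the human-set target evaluation value.
    return network_output, target_evaluation
```

For example, `route_signals(Mode.OPERATION, 1, 2, 3, 4)` returns `(2, 4)`: the teacher signal and the measured evaluation are both disconnected.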

First, the learning mode is described.

The evaluation device 13 evaluates the degree of vibration appearing in the output of the control system 11 and generates the value of a predetermined evaluation function.
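The patent does not state which evaluation function is used. One hypothetical choice, shown here only as a sketch, is the total variation of the sampled output minus its net change: this is zero for a monotonic response and grows as the response oscillates.

```python
def vibration_index(response):
    """Hypothetical vibration measure for a sampled response.

    Total variation minus net change: 0 for a monotonic response,
    positive when the response overshoots or oscillates.
    """
    total_variation = sum(abs(b - a) for a, b in zip(response, response[1:]))
    net_change = abs(response[-1] - response[0])
    return total_variation - net_change

smooth = vibration_index([0.0, 0.5, 0.8, 1.0])        # monotonic rise
ringing = vibration_index([0.0, 1.5, 0.7, 1.2, 1.0])  # overshoot and ringing
```

Any other scalar measure of vibration could be substituted, since the learning device only needs a consistent evaluation value.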

The teacher signal generation device 14 consists of a random number generator. It randomly generates numerical values within a limited range and thereby randomly determines the adjustment signals that set the PID gain values.

The learning device 15 includes a neural network 18. From the command value and output of the control system 11 and the output of the evaluation device 13, it predicts the output of the teacher signal generation device 14, and it changes the internal state of the neural network 18 so that the difference between this prediction and the actual output of the teacher signal generation device 14 becomes zero.

Next, the operation mode is described.

The target evaluation value setting device 16 is a register that stores the target evaluation value, that is, the degree of vibration targeted for the control system 11.

The learning device 15 outputs the adjustment signals for the PID gains from the command value and output of the control system 11 and, this time, the target evaluation value stored in the target evaluation value setting device 16. Because the neural network 18 has learned how to adjust the PID gains, this output is an adjustment signal that makes the evaluation of the control system 11 output match the target evaluation value.

FIG. 4 is a basic configuration diagram of the neural network 18.

The neural network 18 is a combination of a plurality of neurons 41. Each neuron 41 is a multi-input, single-output element that performs a sum-of-products operation with a separate weight for each independent input.

The neural network 18 in this embodiment connects the neurons 41 in layers, forming an input layer 43, an intermediate layer 44, and an output layer 45. Input information given to the input layer 43 is processed by being propagated in turn to the intermediate layer 44 and the output layer 45. The neural network 18 forms this way of processing information by itself through learning.

This learning is performed by adjusting the strength of each connection 42 between neurons, that is, by changing the input weights of the neurons 41. A method called backpropagation is known as an algorithm for this learning, and this embodiment also uses backpropagation.

In this embodiment, therefore, the PID gains can be adjusted as required by the neural network 18 so as to suppress vibration of the force control system, as described above. As a result, the PID force control system contained in the control system 11 need not be analyzed, and the labor of doing so is saved.

Furthermore, the target value for the degree of vibration can be input to the adjustment device 12 without being converted into parameters, so the labor of that conversion is saved.

In addition, the targeted control can be achieved simply by changing the evaluation element of the evaluation device 13 and retraining, without any other modification.

Next, a second embodiment of the present invention is described.

In this embodiment, the control system is the position servo system of an X-Y plotter, and the characteristics of this servo system are compensated by a neural network.

FIG. 6 is a block diagram of the X-axis position servo system of the X-Y plotter addressed by the second embodiment, identified on the assumption that the response of the velocity servo system contained in the position servo system is sufficiently faster than the response of the position servo system itself. In this case the position servo system 61 consists of an integral element 62 and its adjustment gain 63.

As is clear from the block diagram of FIG. 6, the position servo system 61 behaves as a first-order system. Since this embodiment is an X-Y plotter, an identical position servo system is also provided for the Y axis.

FIG. 7 shows trajectories of an X-Y plotter equipped with the position servo system of FIG. 6. The trajectory 71 from P0 to P1 follows the commanded path, but the trajectory 72 from P0 through P2 to P1 deviates from the commanded path in the vicinity of point P2.

Analysis of the position servo system 61 shows the following. The command here is a ramp input, and when a ramp is input to the position servo system 61, its response lags the command by a constant amount determined by the velocity, which is the slope of the ramp, and the adjustment gain 63. A method of compensating for this lag is considered below.
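The constant lag can be checked with the final value theorem. The derivation below assumes that the block diagram of FIG. 6 is a unity-feedback loop around the integral element 62 with the adjustment gain 63, written here as $k$, and a ramp command of slope $v$; the symbols and the feedback structure are assumptions, since the figure is not reproduced here.

```latex
\begin{aligned}
G_{cl}(s) &= \frac{k/s}{1 + k/s} = \frac{k}{s + k}, \qquad R(s) = \frac{v}{s^{2}} \\
E(s) &= R(s) - G_{cl}(s)\,R(s) = \frac{s}{s + k}\cdot\frac{v}{s^{2}} = \frac{v}{s\,(s + k)} \\
e_{ss} &= \lim_{s \to 0} s\,E(s) = \frac{v}{k}
\end{aligned}
```

Under these assumptions the steady-state lag is $v/k$, which agrees with the statement that it is fixed by the ramp velocity and the adjustment gain.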

FIG. 8 is a simplified equivalent block diagram showing the learning mode when the control system 11 of the embodiment of FIG. 1 is replaced by the control system 11' addressed by the second embodiment. Considered together with the evaluation device 13 and the known command value, the control system 11' is a system that takes a parameter K (corresponding to the adjustment signal output from the switching device 17 in the embodiment of FIG. 1) as input and outputs an evaluation value A (corresponding to the output of the evaluation device 13), and it can be expressed by a transfer function G as A = G(K).

As described above, the adjustment device 12 contains the neural network 18. Letting its transfer function be G', it can be expressed as K' = G'(A).

The neural network 18 in the adjustment device 12 is therefore trained so that K = K'. Once K = K', the transfer function G' of the adjustment device 12 is the inverse function of the transfer function G of the control system 11'.

The connections are then switched as shown in FIG. 9.

FIG. 9 is a simplified equivalent block diagram for the operation mode when the control system 11 of the embodiment of FIG. 1 is replaced by the control system 11' addressed by the second embodiment. The control system 11' is the same as in FIG. 8, and the adjustment device 12 is in the state reached when K = K'.

When the target evaluation value A' is input, the adjustment device 12 outputs the parameter K' determined by its transfer function G'. The control system 11', receiving this parameter K', outputs the evaluation value A. Since the transfer function G' of the adjustment device 12 is the inverse function of the transfer function G of the control system 11', the overall transfer function is 1, and A = A'.
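The inverse relationship can be illustrated numerically. In the Python sketch below, the function `G` is a made-up monotonic stand-in for the control system (A = 1/K), and the learned inverse G' is replaced by piecewise-linear interpolation over randomly sampled (A, K) pairs. The interpolation is a deliberate simplification that stands in for the neural network of the patent; it is used only to show that feeding G'(A') into G returns approximately A'.

```python
import bisect
import random

def G(K):
    """Hypothetical control system: parameter K in, evaluation value A out."""
    return 1.0 / K

# Learning mode: random parameters K and the evaluation values they produce.
rng = random.Random(0)
samples = sorted((G(K), K) for K in (rng.uniform(1.0, 10.0) for _ in range(200)))
A_values = [a for a, _ in samples]
K_values = [k for _, k in samples]

def G_inv(A_target):
    """Stand-in for the trained adjustment device: K' = G'(A')."""
    i = bisect.bisect_left(A_values, A_target)
    i = min(max(i, 1), len(A_values) - 1)       # clamp to a valid segment
    a0, a1 = A_values[i - 1], A_values[i]
    k0, k1 = K_values[i - 1], K_values[i]
    t = (A_target - a0) / (a1 - a0)
    return k0 + t * (k1 - k0)                   # linear interpolation

# Operation mode: a target evaluation value goes in, the plant reproduces it.
A_target = 0.25
K_prime = G_inv(A_target)
A_achieved = G(K_prime)
```

Here `A_achieved` comes out close to `A_target`, which is the overall transfer function of 1 described above, up to the approximation error of the learned inverse.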

Therefore, if A is taken to be the position and A' the position command, the position can be controlled with neither loss nor phase shift.

According to this embodiment, the characteristics of the servo system can be compensated even when an exact block diagram of the servo system cannot be specified.

[Effects of the invention]

According to the present invention, a human need only provide a target evaluation value in order to control the output evaluation value of the control system so that it reaches the target evaluation value.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an embodiment of the learning control device according to the present invention. FIG. 2 is a configuration diagram showing an example of a robot hand control system to which an embodiment of the present invention is applied. FIG. 3 is a block diagram of the control system. FIG. 4 is a basic configuration diagram of the neural network. FIG. 5 is an explanatory diagram of the mode transition flow. FIG. 6 is a block diagram of the X-axis position servo system of the X-Y plotter addressed by the second embodiment of the present invention. FIG. 7 is an explanatory diagram showing trajectories of the X-Y plotter equipped with the position servo system. FIG. 8 is a simplified equivalent block diagram showing the learning mode. FIG. 9 is a simplified equivalent block diagram for the operation mode.

1: control device, 2: servomotor, 3: ball screw, 4: force sensor, 5: movable finger, 6: gripped object, 11: control system, 12: adjustment device, 13: evaluation device, 14: teacher signal generation device, 15: learning device, 16: target evaluation value setting device, 17: switching device, 18: neural network.

Continued from the front page

(72) Inventor: Hitoshi Saito, c/o Hitachi Keiyo Engineering Co., Ltd., 7-1-1 Higashi-Narashino, Narashino-shi, Chiba

(56) References:
JP-A-2-136904 (JP, A)
JP-A-3-105662 (JP, A)
JP-A-3-105663 (JP, A)
Kazushige Saga and 3 others, "A Study of a Self-Learning Method", Proceedings of the 1989 IEICE Autumn National Convention, Part 6 (Information and Systems), The Institute of Electronics, Information and Communication Engineers, received by the Patent Office library on August 24, 1989, pp. 6-278 to 6-279
Hideki Hashimoto and 2 others, "Control of Servo Systems Using Neural Networks", Proceedings of the 7th Annual Conference of the Robotics Society of Japan, The Robotics Society of Japan, November 2, 1989, pp. 745 to 746
Demetri Psaltis and 2 others, "A Multilayered Neural Network Controller", IEEE Control Systems Magazine, The Institute of Electrical and Electronics Engineers, Inc., USA, April 1988, Vol. 8, No. 2, pp. 17-21
Victor C. Chen and 1 other, "Learning Control with Neural Networks", Proceedings of the IEEE International Conference on Robotics & Automation, USA, 1989, Vol. 3, pp. 1448-1453

(58) Fields searched (Int. Cl.6, DB name): G05B 13/00 - 13/04; G06F 15/18 - 15/18 560; JICST file (JOIS)

Claims

(57) [Claims]

1. A learning control device comprising: a control system that gives a predetermined output result in response to a predetermined input command; and a learning device that generates a predetermined adjustment signal, under a predetermined signal generation characteristic, based on the input command, the output result, and a preset target evaluation value, the response characteristic of the control system being set by the adjustment signal, the learning control device further comprising: signal generation means for generating a teacher signal that takes an arbitrary value within a predetermined range; evaluation means for evaluating the output result of the control system; and switching means for switching between a learning mode and an operation mode, wherein, when the learning mode is selected by the switching means, the learning device takes in the output of the evaluation means in place of the target evaluation value while the response characteristic of the control system is set by the teacher signal, and operates so as to change its signal generation characteristic in a direction in which the adjustment signal generated by the learning device converges on the teacher signal, and, when the operation mode is selected by the switching means, the control system operates with its response characteristic set by the adjustment signal.

2. The learning control device according to claim 1, wherein the signal generation means is a random number generator, and the signal generation characteristic of the learning device is given by a neural network.

3. The learning control device according to claim 1, wherein the control system is a PID force control system, and the adjustment signal comprises a proportional element gain adjustment signal, an integral element gain adjustment signal, and a differential element gain adjustment signal of the PID force control system.