JP2022150281A

JP2022150281A - Learning device, trajectory generator and manipulator system

Info

Publication number: JP2022150281A
Application number: JP2021052825A
Authority: JP
Inventors: 正樹小田井; Masaki Odai; 孝一黒澤; Koichi Kurosawa; 克彦平野; Katsuhiko Hirano; 克宜上野; Katsunobu Ueno
Original assignee: Hitachi GE Nuclear Energy Ltd
Current assignee: Hitachi GE Nuclear Energy Ltd
Priority date: 2021-03-26
Filing date: 2021-03-26
Publication date: 2022-10-07
Anticipated expiration: 2041-03-26
Also published as: JP7479321B2

Abstract

To provide a learning device capable of generating a trajectory that can be used in common for a plurality of work manipulators having different configurations, as well as a trajectory generator and a manipulator system. SOLUTION: A learning device performs learning using a drive command and a state signal of a learning manipulator obtained when the learning manipulator is driven according to the drive command. The learning device comprises: standardization means that standardizes the drive command and the state signal based on the specifications of the learning manipulator; and a first learner that learns using the standardized drive command and state signal. SELECTED DRAWING: Figure 3

Description

The present invention relates to a learning device, a trajectory generator, and a manipulator system used when generating a trajectory of the end effector position or end effector orientation of a work manipulator.

Conventionally, a work manipulator composed of at least a robot arm and an end effector has been used as a work device that automatically performs tasks such as pick-and-place, fitting, screw tightening, cutting, and drilling. To perform a desired task with such a work manipulator, the position, the orientation, or both of the end effector that executes the task must move along a trajectory that allows the task to be carried out. For this reason, work manipulators are known that include a trajectory generator, which generates the trajectory of the end effector position and orientation, and a tracking control system, which makes the end effector follow the generated trajectory.

For example, Patent Literature 1 describes a motion editing device that can generate a robot trajectory by creating a model of the target robot in a virtual space and manipulating the model within that virtual space.

Patent Literature 1: JP 2008-254074 A

Such a trajectory generator can generate a trajectory of the position and orientation of the end effector, which is the working part of the work manipulator in real space, based on manipulation of the model in the virtual space.

However, with the technique described in Patent Literature 1, generating a trajectory for a work manipulator with a new configuration requires building a model of that new work manipulator and manipulating it in the virtual space.

The present invention has been made in view of the above circumstances, and its object is to provide a learning device, a trajectory generator, and a manipulator system capable of generating a trajectory that can be used in common for a plurality of work manipulators having different configurations.

In view of the above, the present invention provides "a learning device that performs learning using a drive command and a state signal of a learning manipulator obtained when the learning manipulator is driven in accordance with the drive command, the learning device comprising: standardization means that standardizes the drive command and the state signal based on the specifications of the learning manipulator; and a first learner that learns using the standardized drive command and state signal."

The present invention also provides "a trajectory generator that provides a command trajectory for a working manipulator using a drive command for the working manipulator and a state signal of the working manipulator, the trajectory generator comprising: standardization means that standardizes the drive command and the state signal for the working manipulator based on the specifications of the working manipulator; a second learner that performs inference using the standardized drive command and state signal; and inverse standardization means that inversely standardizes the output of the second learner based on the specifications of the working manipulator, wherein the output of the inverse standardization means is used as the command trajectory of the working manipulator, and the second learner is a first learner that, in order to learn from a drive command and a state signal of a learning manipulator, has been trained using the drive command and state signal of the learning manipulator standardized based on the specifications of the learning manipulator."

The present invention further provides "a manipulator learning system for reflecting the learning results of a learning manipulator in a plurality of types of working manipulators, comprising: a learning manipulator that stores a drive command and a state signal obtained when a learning mechanism is driven in accordance with the drive command; a learning device that performs learning using the drive command and the state signal of the learning manipulator obtained when the learning manipulator is driven, the learning device comprising standardization means that standardizes the drive command and the state signal based on the specifications of the learning manipulator and a first learner that learns using the standardized drive command and state signal; and one or more working manipulators driven based on a command trajectory from a trajectory generator, wherein the trajectory generator provides the command trajectory of the working manipulator using a drive command for the working manipulator and a state signal of the working manipulator, and comprises standardization means that standardizes the drive command and the state signal for the working manipulator based on the specifications of the working manipulator, a second learner that performs inference using the standardized drive command and state signal, and inverse standardization means that inversely standardizes the output of the second learner based on the specifications of the working manipulator, the output of the inverse standardization means being used as the command trajectory of the working manipulator, and the second learner being the first learner."

According to the present invention, a common trajectory generator can be provided for a plurality of work manipulators having different configurations.

Problems, configurations, and effects other than those described above will become clear from the following description of the embodiments.

FIG. 1 is a diagram showing a configuration example of a manipulator system according to an embodiment of the present invention.
FIG. 2 is a diagram schematically showing a configuration example of the learning device.
FIG. 3 is a diagram schematically showing a configuration example of the trajectory generator.
FIG. 4 is a diagram schematically showing another configuration example of the trajectory generator.
FIG. 5 is a diagram schematically showing an example of upsampling.
FIG. 6 is a diagram schematically showing an example of downsampling.

Embodiments of the present invention are described below with reference to the drawings. The present invention is not limited to the embodiments, and the various signals and the like in the embodiments are examples. In this specification and the drawings, identical components, or components having substantially the same function, are given the same reference numerals, and duplicate descriptions are omitted.

FIG. 1 is a schematic diagram showing an example of the configuration of a manipulator system according to an embodiment of the present invention.

The manipulator system 1 comprises a learning manipulator 11, a learning device 12, and one or more working manipulators 13. The learning manipulator 11 may be one of, or a subset of, the one or more working manipulators 13.

The learning manipulator 11 comprises a learning mechanism 111, a learning control device 112, and an operation terminal 113. The learning mechanism 111 in turn comprises a learning working unit 1111 that performs a predetermined task, a learning moving mechanism 1112 that places the learning working unit 1111 at a desired position and orientation, and a learning sensor 1113 that senses the state of the learning mechanism 111. Examples of the learning sensor 1113 include an encoder that detects the joint angles of the learning moving mechanism 1112 and a force/torque sensor capable of detecting the work reaction force acting on the learning working unit 1111.

The learning control device 112 comprises a learning drive unit 1121, a learning state calculation unit 1122, and a learning storage unit 1123. The learning drive unit 1121 drives the learning mechanism 111 in accordance with a drive command 201 issued from the operation terminal 113, for example through operation by an operator.

Based on the sensor output 202 of the learning sensor 1113, the learning state calculation unit 1122 calculates the state signal 203 of the learning mechanism 111, the target work, and so on. In doing so, it calculates and outputs at least one state signal 203 that does not depend on the configuration of the learning mechanism 111, for example the position and orientation of the learning working unit 1111 or the work reaction force. The position and orientation of the learning working unit 1111 can be calculated by coordinate transformation from the link lengths and joint angles of the learning moving mechanism 1112, and the work reaction force can be calculated by coordinate transformation from the relative position and orientation between the force/torque sensor and the learning working unit 1111. The learning storage unit 1123 stores the drive command 201 output by the operation terminal 113 and the state signal 203 output by the learning state calculation unit 1122.
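The coordinate transformation from link lengths and joint angles to the working unit pose can be illustrated with a minimal sketch. It assumes a planar serial arm with revolute joints, which is a simplification chosen for illustration and not a structure stated in the text; the function name and the two example arms are hypothetical.

```python
import math

def planar_forward_kinematics(link_lengths, joint_angles):
    """Return (x, y, heading) of the working unit of a planar serial arm.

    Each joint angle is measured relative to the previous link, in radians.
    """
    x = y = heading = 0.0
    for length, angle in zip(link_lengths, joint_angles):
        heading += angle                  # accumulate joint rotations
        x += length * math.cos(heading)   # advance along the rotated link
        y += length * math.sin(heading)
    return x, y, heading

# Two hypothetical arms with different link structures reach the same pose,
# so the pose itself carries no trace of the arm configuration.
pose_a = planar_forward_kinematics([1.0, 1.0], [0.0, math.pi / 2])
pose_b = planar_forward_kinematics([0.5, 0.5, 1.0], [0.0, 0.0, math.pi / 2])
```

Because both arms report the same working unit pose, a learner fed with poses rather than joint angles does not need to know which arm produced the data.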

With this configuration, the learning manipulator 11 performs the target work, either in simulation or in practice, with the learning mechanism 111 based on operation of the operation terminal 113 by an operator or the like, while collecting the drive command 201 and the state signal 203 produced during the target work and storing them in the learning storage unit 1123.

In FIG. 1, the learning device 12 learns from data such as the drive command 201 and the state signal 203 stored in the learning storage unit 1123 during simulated or actual work by the learning manipulator 11, and builds a learner that generates the trajectory of the working unit position and orientation for executing the target work. The learning device 12 is described in detail later with reference to FIG. 2.

Each of the one or more working manipulators 13 comprises a working mechanism 131 and a working control device 132. The working mechanism 131 of the working manipulator 13 may have the same configuration as the learning mechanism 111 of the learning manipulator 11, or may differ from it in dimensions and the like. Conventionally, results learned on one model have been applied to actual machines of the same model, whereas the present invention makes the learning results applicable to machines of a different type as well.

Like the learning mechanism 111, the working mechanism 131 of the working manipulator 13 comprises a working unit 1311, a working moving mechanism 1312, and a working sensor 1313. As with the learning sensor 1113, examples of the working sensor 1313 include an encoder and a force/torque sensor.

The working control device 132 comprises a working drive unit 1321, a working state calculation unit 1322, and a trajectory generator 1323. The working drive unit 1321 drives the working mechanism 131 in accordance with the command trajectory 204 for the position and orientation of the working unit 1311, which is the output of the trajectory generator 1323. The working state calculation unit 1322 calculates and outputs the state signal 203 in the same way as the learning state calculation unit 1122.

The trajectory generator 1323 generates and outputs the command trajectory for the position and orientation of the working unit 1311, using the learner 121 described later with reference to FIG. 2, based on the output of the working state calculation unit 1322 and the drive command 201, which is a work instruction from the work terminal 133, an external device. Here, the learner 121 is a learning function configured by reflecting (porting) the results learned by the learning device 12 of FIG. 1.

With this configuration, the working manipulator 13 can perform the target work by driving the working unit 1311 based on the command trajectory 204 generated by the trajectory generator 1323, whose learner 121 has learned from the data recorded when the learning manipulator 11 performed the target work.

FIG. 2 is a schematic diagram showing an example of the configuration of the learning device 12. The learning device 12 comprises a learner 121, a standardization calculation unit 122, and a normalization calculation unit 123. As shown in FIG. 1, the inputs to the learning device 12 are the drive command 201 from the operation terminal 113 and the state signal 203 of the learning mechanism 111, the target work, and so on from the learning state calculation unit 1122. In this example, the drive command 201 comprises command trajectory data 201a at the current control time, command trajectory data 201c for the next control cycle, and a final target value 201b for position and orientation.

The standardization calculation unit 122 standardizes each input (the drive command 201 and the state signal 203) based on the specifications of the learning manipulator 11. For example, if the vertical operating range of the learning manipulator 11 is from ZL1 to ZH1, an input signal Z representing vertical data is standardized by applying equation (1).

As another example, if the steady-state external force value applied to the learning working unit 1111 of the learning manipulator 11 is Fs1, an input signal F representing an external force applied to the learning working unit 1111, such as the work reaction force, is standardized by applying equation (2).
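Equations (1) and (2) are not reproduced in this text, so the following sketch only illustrates one plausible reading of the specification-based scaling: a position is scaled by the operating range ZL1 to ZH1, and a force is divided by the steady-state value Fs1. Both forms, the function names, and the numeric specification values are assumptions made for illustration.

```python
def standardize_range(z, z_low, z_high):
    """Scale a position by the operating range (assumed form of equation (1))."""
    return (z - z_low) / (z_high - z_low)

def standardize_by_rating(f, f_rated):
    """Scale a force by the steady-state force value (assumed form of equation (2))."""
    return f / f_rated

# Hypothetical learning-manipulator specification:
# ZL1 = 0.0 m, ZH1 = 2.0 m, Fs1 = 10.0 N.
z_std = standardize_range(0.5, 0.0, 2.0)    # a quarter of the vertical range
f_std = standardize_by_rating(15.0, 10.0)   # above the rated value, so larger than 1
```

After this step the signals are expressed relative to the manipulator's own specification rather than in absolute units.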

The normalization calculation unit 123 normalizes each input to a value between 0 and 1. For example, if the input can take values from WL to WH, the input W is normalized by applying equation (3).

Here, normalization maps the input into the range from 0 to 1.0 by taking its minimum and maximum values as 0 and 1.0, whereas standardization takes a rated value determined by the equipment as, for example, 1.0, so a standardized value can exceed 1.0 under conditions such as overload.
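The difference between the two scalings can be made concrete with a short sketch. Equation (3) is not reproduced in this text; the min-max form below is the usual way to map a range WL to WH onto 0 to 1 and is assumed here. The rated force and the range chosen for the scaled force are hypothetical values.

```python
def normalize(w, w_low, w_high):
    """Min-max normalization onto [0, 1] (assumed form of equation (3))."""
    return (w - w_low) / (w_high - w_low)

rated_force = 10.0                    # hypothetical rated value, mapped to 1.0
overload_scaled = 15.0 / rated_force  # rated-value scaling can exceed 1.0

# Hypothetical choice: scaled forces are expected between WL = 0.0 and WH = 2.0,
# so even the overload case lands inside [0, 1] after normalization.
overload_normalized = normalize(overload_scaled, 0.0, 2.0)
```

Choosing WL and WH wide enough to cover overload values is a design decision of this sketch, not something the text prescribes.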

The learner 121 can be implemented, for example, as a neural network. FIG. 2 lists the final target value 201b, the command trajectory data 201a in the current control cycle, and the state signal data 203 as examples of inputs to the learner 121; these are standardized and normalized before being input to the learner 121.

The final target value 201b is the target value of the position and orientation of the learning working unit 1111 at the time the target work ends; for example, it may be the working unit position and orientation calculated from the drive command 201 at the time the target work by the learning manipulator 11 was completed. The command trajectory data 201a in the current control cycle is command data for the position and orientation of the learning working unit 1111 that includes at least the current time. The state signal data 203 is data that includes at least the current time, for example the achieved values of the position and orientation of the learning working unit 1111 and the work reaction force.

The output of the learner 121 is the next-cycle command trajectory generated value 210, which is the command value for the position and orientation of the learning working unit 1111 in the next cycle. The learning device 12 trains the parameters of the learner 121 so as to reduce the error 211 between the next-cycle command trajectory generated value 210 and the normalized next-cycle command trajectory data 209, which is obtained by standardizing and normalizing the position and orientation trajectory point of the learning working unit 1111 in the next cycle, as recorded when data were acquired during the target work using the learning manipulator 11.
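The training objective described above, reducing the error 211 between the learner output 210 and the recorded next-cycle data 209, can be sketched as follows. A linear model trained by plain gradient descent stands in for the neural network, and the training data are synthetic; both are assumptions made to keep the example small and are not the method of the patent.

```python
import random

def predict(weights, bias, features):
    """Learner output (210) for one sample of already-scaled inputs."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def mean_squared_error(samples, weights, bias):
    """Mean squared error (211) against the recorded next-cycle data (209)."""
    return sum((predict(weights, bias, f) - t) ** 2 for f, t in samples) / len(samples)

def train(samples, learning_rate=0.3, epochs=2000):
    """Batch gradient descent on half the mean squared error."""
    n = len(samples)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, target in samples:
            error = predict(weights, bias, features) - target
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Synthetic records: (final target 201b, current command 201a, state 203) -> next command.
random.seed(0)
samples = []
for _ in range(50):
    goal, command, state = random.random(), random.random(), random.random()
    samples.append(((goal, command, state), command + 0.1 * (goal - command)))

loss_before = mean_squared_error(samples, [0.0, 0.0, 0.0], 0.0)
weights, bias = train(samples)
loss_after = mean_squared_error(samples, weights, bias)
```

In a real system the samples would come from the learning storage unit after both scaling steps, and the model would be the chosen neural network rather than this linear stand-in.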

This makes it possible to infer the next-cycle command trajectory generated value 210, which represents the command trajectory for the position and orientation of the learning working unit 1111 in the next cycle, from the final target value 201b together with the command trajectory data 201a and the state signal data 203, which represent the current command and work situation. For input signals that represent the current situation, such as the command trajectory data 201a and the state signal data 203, using past data in addition to current data lets the learner learn how the command trajectory is connected over time, so that a continuous command trajectory can be generated. A continuous command trajectory can likewise be generated by giving the learner 121 a network structure capable of time-series learning, for example by building it as a recurrent neural network.
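One simple way to supply past data alongside current data is a fixed-length history buffer whose contents are concatenated into the learner input. The sketch below assumes this sliding-window approach; the class name and window length are hypothetical, and the text does not say how the history is organized.

```python
from collections import deque

class HistoryWindow:
    """Keep the most recent `length` values of one signal, oldest first."""

    def __init__(self, length, initial):
        # Pre-fill so the learner input has a fixed size from the first cycle.
        self._buffer = deque([initial] * length, maxlen=length)

    def push(self, value):
        """Store the value for the current cycle; the oldest one drops out."""
        self._buffer.append(value)

    def as_input(self):
        """Return the window as a flat list suitable for a learner input."""
        return list(self._buffer)

window = HistoryWindow(length=3, initial=0.0)
for command in (0.1, 0.2, 0.3):
    window.push(command)
history = window.as_input()
```

A recurrent network would keep this history in its internal state instead, which is the alternative the paragraph mentions.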

Furthermore, by learning with input signals from which the influence of the specifications of the learning manipulator 11 has been removed through standardization by the standardization calculation unit 122, the learner 121 can be applied to working manipulators 13 whose specifications, such as size, differ from those of the learning manipulator 11.

In addition, by making the state signal 203 input to the learner 121 a state quantity that is unaffected by the configuration of the learning manipulator 11, such as the position and orientation of the learning working unit 1111 or the work reaction force, the learner 121 can be applied to working manipulators 13 with different configurations, such as different link lengths.

FIG. 3 is a schematic diagram showing an example of the configuration of the trajectory generator 1323. The trajectory generator 1323 comprises a working-time standardization calculation unit 13231, a working-time normalization calculation unit 13232, the learner 121, a delay unit 13233, an inverse normalization calculation unit 13234, and an inverse standardization calculation unit 13235.

The working-time standardization calculation unit 13231 standardizes the state signal 203 and the final target value 201b given by the work instruction from the work terminal 133, an external device. If the target work is, for example, picking up an article, differences between the learning manipulator 11 and the working manipulator 13 in the operating range of the working unit 1311 or in the external force specification have no essential effect on the work. In this case, the working-time standardization calculation unit 13231 standardizes the input signals using the specifications of the working manipulator 13, so that those specifications can be used to full effect. For an input signal whose absolute magnitude matters in the target work, such as the work reaction force in cutting, that input signal is instead standardized using the specifications of the learning manipulator 11, and the remaining input signals are standardized using the specifications of the working manipulator 13.

The working-time normalization calculation unit 13232 normalizes the standardized final target value 201b and state signal 203 to values between 0 and 1. The maximum value WH and minimum value WL used in the normalization of equation (3) are the same values as those used by the normalization calculation unit 123 of the learning device 12.

The learner 121 used here is the one trained by the learning device 12. The next-cycle trajectory generated value 210 output by the learner 121 is delayed by one cycle in the delay unit 13233 and fed back to the learner 121 as the normalized command trajectory data 212, together with the normalized final target value 201b and state signal 203; the learner 121 can then infer the next-cycle trajectory generated value 210. If past values are used in addition to current values for the state signal 203 and the normalized command trajectory data 212, the past values can be stored in a memory (not shown) and used.
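The feedback through the delay unit amounts to an autoregressive loop: each inferred command becomes the command input of the following cycle. The sketch below illustrates that loop with scalar signals. The stand-in learner, which simply moves halfway toward the target each cycle, and the `read_state` callback are hypothetical placeholders for the trained learner 121 and the scaled state signal 203.

```python
def rollout(learner, final_target, initial_command, read_state, n_cycles):
    """Generate a normalized command trajectory one cycle at a time."""
    delayed_command = initial_command   # content of the one-cycle delay element
    trajectory = []
    for _ in range(n_cycles):
        state = read_state()            # scaled state signal for this cycle
        next_command = learner(final_target, delayed_command, state)
        trajectory.append(next_command)
        delayed_command = next_command  # becomes the input of the next cycle
    return trajectory

def stand_in_learner(goal, command, state):
    """Placeholder for the trained learner: halve the remaining distance."""
    return command + 0.5 * (goal - command)

trajectory = rollout(stand_in_learner, final_target=1.0, initial_command=0.0,
                     read_state=lambda: 0.0, n_cycles=3)
```

The returned values are still in normalized form; they must pass through the inverse normalization and inverse standardization stages before they can drive the manipulator.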

The inverse normalization calculation unit 13234 inversely normalizes the next-cycle trajectory generated value 210 (W2 in the equation) using equation (4). For WH and WL, it uses the parameters of equation (3) that the working-time normalization calculation unit 13232 used to normalize the position and orientation of the working unit 1311.

The inverse standardization calculation unit 13235 inversely standardizes the output signal of the inverse normalization calculation unit 13234 (G2 in the equation) using equation (5). For GH and GL, it uses the parameters that the working-time standardization calculation unit 13231 used to standardize the position and orientation of the working unit 1311.
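Equations (4) and (5) are not reproduced in this text. The sketch below assumes they are the algebraic inverses of a min-max scaling, which is the natural reading but remains an assumption, and shows how one learner output maps to different physical commands on two hypothetical manipulators of different size.

```python
def inverse_normalize(w2, w_low, w_high):
    """Undo min-max normalization (assumed form of equation (4))."""
    return w2 * (w_high - w_low) + w_low

def inverse_standardize(g2, g_low, g_high):
    """Undo range-based scaling (assumed form of equation (5))."""
    return g2 * (g_high - g_low) + g_low

def to_command(learner_output, w_range, g_range):
    """Apply both inverse stages in the order described in the text."""
    range_scaled = inverse_normalize(learner_output, *w_range)
    return inverse_standardize(range_scaled, *g_range)

w_range = (0.0, 1.0)                                     # same WL, WH as at training time
small_arm_z = to_command(0.25, w_range, (0.0, 1.0))      # hypothetical 1 m vertical range
large_arm_z = to_command(0.25, w_range, (0.0, 2.0))      # hypothetical 2 m vertical range
```

The same learner output thus lands at a quarter of the vertical range on either arm, which is what allows one learner to serve manipulators of different size.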

With this configuration, the working manipulator 13 can generate the command trajectory 204 for the position and orientation of the working unit 1311 using the learner 121 trained on the target work performed by the learning manipulator 11.

By configuring the manipulator system 1 as described above, when the target work is performed with the working manipulator 13, the command trajectory 204 for the position and orientation of the working unit 1311 can be generated using the learner 121 trained on the target work performed by the learning manipulator 11. Standardizing the inputs of the learner 121 makes it applicable to working manipulators 13 with different specifications, such as size.

In addition, by having the learning state calculation unit 1122 and the working state calculation unit 1322 output state signals 203 that are state quantities unaffected by the configurations of the learning manipulator 11 and the working manipulator 13, the learner 121 can be applied to working manipulators 13 with different configurations, such as different link lengths.

In other words, once the target work has been learned with the learning manipulator 11, it can be performed by one or more working manipulators 13 with different specifications and configurations.

According to the first embodiment, even if the learning manipulator 11 and the working manipulator 13 are different types of machine that differ in size and the like, the learning results obtained with the learning manipulator 11 can be ported to and reflected in the working manipulator 13.

The problem to be expected in this case stems from the difference in size. Assuming, for example, that the control amount per control cycle is the same, a working manipulator 13 that is larger than the learning manipulator 11 takes longer to move to the target position. Conversely, a smaller one completes the movement sooner, and positional accuracy becomes difficult to secure. The second embodiment addresses this point. Only the parts that differ from the manipulator system 1 of the first embodiment are described.

FIG. 4 is a schematic diagram showing an example of the configuration of the trajectory generator according to the second embodiment. The trajectory generator 1323 comprises the working-time standardization calculation unit 13231, the working-time normalization calculation unit 13232, the learner 121, the delay unit 13233, the inverse normalization calculation unit 13234, the inverse standardization calculation unit 13235, a sampler A 13236, and a sampler B 13237.

Compared with the first embodiment, samplers 13236 and 13237 are added before and after the learner 121. With these samplers, the computation period of the learner 121 portion can be set to TC, independently of the control computation period TB of the working manipulator 13.

In FIG. 4, the trajectory generator 1323 first performs the inference computation by the learner 121 at an inference computation cycle TC that differs from the working control cycle TB, which is the control computation cycle of the working manipulator 13.

When TB is longer than TC, the sampler A 13236 upsamples the signal of period TB to period TC, and the sampler B 13237 downsamples the signal of period TC to period TB.

When TC is longer than TB, the sampler A 13236 downsamples the signal of period TB to period TC, and the sampler B 13237 upsamples the signal of period TC to period TB.
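The two cases above can be summarized in a short Python sketch. The function name `sampler_modes` and the returned labels are hypothetical and chosen only for illustration. The case TB = TC is not covered by the description, so the pass-through branch is an assumption.

```python
def sampler_modes(tb, tc):
    """Return the conversion performed by sampler A (TB -> TC) and
    sampler B (TC -> TB) for control period tb and inference period tc."""
    if tb > tc:
        # Control period longer than inference period.
        return {"sampler_a": "upsample", "sampler_b": "downsample"}
    if tc > tb:
        # Inference period longer than control period.
        return {"sampler_a": "downsample", "sampler_b": "upsample"}
    # Assumption: equal periods need no conversion (not stated in the source).
    return {"sampler_a": "passthrough", "sampler_b": "passthrough"}


if __name__ == "__main__":
    print(sampler_modes(tb=10, tc=2))
    print(sampler_modes(tb=2, tc=10))
```

Sampler B always performs the opposite conversion of sampler A, so the signal leaving the trajectory generator is back at the working control cycle TB.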

For example, let TA be the cycle at which the learning manipulator 11 records data in the learning storage unit 1123, RA its operating range, and VA its maximum speed, and let RB be the operating range of the working manipulator 13 and VB its maximum speed. When the learner 121 is run at the inference cycle TC derived from equation (6), the command trajectory 204 generated by the trajectory generator 1323 for the working manipulator 13 stays at or below the maximum speed VB. In other words, using multiple cycles in the trajectory generator 1323 prevents the generation of trajectories that cannot be followed because of standardization or differing control cycles.

FIG. 5 is a schematic diagram showing an example of an upsampling method. In the sampler A 13236 or the sampler B 13237, let T1 be the period of the input signal 301 and T2 the period of the output signal 302. When T1 is longer than T2, upsampling can be performed, for example, by the method shown in FIG. 5.

The interval from time (k-1)T1 to time kT1, at which the next input signal 301 arrives, is interpolated to obtain an interpolation characteristic 303. At each time iT2 at which the output signal 302 is generated, the value of the interpolation characteristic 303 is used as the output signal 302.

This makes it possible to generate an output signal 302 whose period T2 is shorter than the period T1 of the input signal 301.

In FIG. 5, the interpolation characteristic 303 is generated by interpolating the input signal 301 at the immediately preceding time (k-1)T1 with a zero-order function, but the method is not limited to this. Earlier input signals 301, such as the one at time (k-2)T1, may also be used to interpolate with a function of first order or higher, and a spline function or the like may be used.
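The upsampling described for FIG. 5 might be sketched in Python as follows. The function `upsample`, its parameters, and the choice to hold the last input for one full input period are assumptions made for illustration. The first-order branch is one possible reading of "using earlier input signals": it extends the straight line through the two most recent inputs. Periods are passed as integers (for example, milliseconds) or strings so that `Fraction` keeps the sample times exact.

```python
from fractions import Fraction
import math


def upsample(samples, t1, t2, order=0):
    """Resample `samples` (period t1) to the shorter period t2.

    order=0 holds the most recent input (zero-order function).
    order=1 extends the line through the two most recent inputs and
    falls back to holding while only one input is available.
    """
    t1, t2 = Fraction(t1), Fraction(t2)
    if not 0 < t2 < t1:
        raise ValueError("upsampling requires 0 < t2 < t1")
    if order not in (0, 1):
        raise ValueError("only order 0 and order 1 are sketched here")
    # Assumption: the last input is held for one full input period.
    n_out = math.ceil(t1 * len(samples) / t2)
    out = []
    for i in range(n_out):
        t = i * t2                  # output time i*T2
        k = math.floor(t / t1)      # index of the latest input at or before t
        value = samples[k]
        if order == 1 and k >= 1:
            slope = (samples[k] - samples[k - 1]) / t1
            value = samples[k] + slope * (t - k * t1)
        out.append(float(value))
    return out


if __name__ == "__main__":
    print(upsample([0, 10, 20], t1=4, t2=2))           # zero-order hold
    print(upsample([0, 10, 20], t1=4, t2=2, order=1))  # first-order
```

Both variants use only inputs that have already arrived, so they can run inside a real-time control loop without waiting for the next input sample.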

FIG. 6 is a schematic diagram showing an example of a downsampling method. In the sampler A 13236 or the sampler B 13237, let T1 be the period of the input signal 301 and T2 the period of the output signal 302. When T2 is longer than T1, downsampling can be performed, for example, by the method shown in FIG. 6.

This makes it possible to generate an output signal 302 whose period T2 is longer than the period T1 of the input signal 301.

In FIG. 6, the interpolation characteristic 303 is generated by interpolating the input signal 301 at the immediately preceding time (k-1)T1 with a zero-order function, but the method is not limited to this. Earlier input signals 301, such as the one at time (k-2)T1, may also be used to interpolate with a function of first order or higher, and a spline function or the like may be used.
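A minimal Python sketch of the zero-order downsampling described for FIG. 6 follows. The function name `downsample_hold` and the rule for the number of output samples are assumptions for illustration. At each output time iT2, the sketch emits the most recent input at or before that time. Periods are passed as integers or strings so that `Fraction` keeps the sample times exact.

```python
from fractions import Fraction
import math


def downsample_hold(samples, t1, t2):
    """Resample `samples` (period t1) to the longer period t2 by taking,
    at each output time i*t2, the most recent input at or before it."""
    t1, t2 = Fraction(t1), Fraction(t2)
    if not 0 < t1 < t2:
        raise ValueError("downsampling requires 0 < t1 < t2")
    # Assumption: outputs are produced while i*t2 lies within the input span.
    n_out = math.ceil(t1 * len(samples) / t2)
    return [samples[math.floor(i * t2 / t1)] for i in range(n_out)]


if __name__ == "__main__":
    print(downsample_hold([0, 1, 2, 3, 4, 5], t1=2, t2=4))
    print(downsample_hold([0, 1, 2, 3, 4], t1=2, t2=3))
```

This sketch simply discards intermediate inputs. Whether the signal should be filtered before downsampling is not discussed in the description and is left out here.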

1: Manipulator system
11: Learning manipulator
111: Learning mechanism
1111: Learning work unit
1112: Learning moving mechanism
1113: Learning sensor
112: Learning control device
1121: Learning drive unit
1122: Learning state calculation unit
1123: Learning storage unit
113: Operation terminal
12: Learning device
121: Learner
122: Standardization calculation unit
123: Normalization calculation unit
13: Working manipulator
131: Working mechanism
1311: Working work unit
1312: Working moving mechanism
1313: Learning sensor
132: Working control device
1321: Working drive unit
1322: Working state calculation unit
1323: Trajectory generator
13231: Work-time standardization calculation unit
13232: Work-time normalization calculation unit
13233: Delay unit
13234: Inverse normalization calculation unit
13235: Inverse standardization calculation unit
133: Work terminal
201: Drive command
202: Sensor output
203: State signal
204: Command trajectory
201b: Final target value
201a: Command trajectory data
208: Next-cycle command trajectory data
209: Normalized next-cycle command trajectory data
201c: Next-cycle command trajectory generation value
211: Error
212: Normalized command trajectory data
301: Interpolation characteristic

Claims

A learning device that performs learning using a drive command and a state signal of a learning manipulator obtained when the learning manipulator is driven in accordance with the drive command,
the learning device comprising: standardization means for standardizing the drive command and the state signal based on the specifications of the learning manipulator; and a first learner that learns using the standardized drive command and state signal.

The learning device according to claim 1,
wherein the drive command includes command trajectory data in the current control cycle and in the next control cycle, and final target values of position and orientation.

The learning device according to claim 2,
wherein the first learner receives the command trajectory data in the current control cycle, the final target value, and the state signal as inputs, and performs learning that optimizes the difference between the learning result and the command trajectory data in the next control cycle.

The learning device according to any one of claims 1 to 3,
further comprising normalization means for converting the output of the standardization means into a value within a predetermined numerical range, the converted output being used as the input of the first learner.

A trajectory generator that provides a command trajectory for a working manipulator using a drive command for the working manipulator and a state signal of the working manipulator,
the trajectory generator comprising: standardization means for standardizing the drive command and the state signal for the working manipulator based on the specifications of the working manipulator; a second learner that performs inference using the standardized drive command and state signal; and inverse standardization means for inversely standardizing the output of the second learner based on the specifications of the working manipulator, wherein the output of the inverse standardization means is used as the command trajectory of the working manipulator, and
the second learner is a first learner that, in order to learn using a drive command and a state signal of a learning manipulator, has learned using the drive command and the state signal of the learning manipulator standardized based on the specifications of the learning manipulator.

The trajectory generator according to claim 5,
wherein the drive command includes final target values of position and orientation, and
the second learner delays its output, which is the command trajectory data in the next control cycle, by one cycle and receives it, together with the final target value and the state signal, as the command trajectory data in the current control cycle for learning.

The trajectory generator according to claim 5 or claim 6,
wherein the second learner includes normalization means for converting the output of the standardization means into a value within a predetermined numerical range, and inverse normalization means for inversely normalizing the output of the second learner, and the output of the inverse normalization means is used as the input of the inverse standardization means.

The trajectory generator according to any one of claims 5 to 7,
wherein the trajectory generator is implemented by a computer, and the control cycle in the second learner differs from the control cycle in the portions other than the second learner.

A manipulator system for reflecting learning results of a learning manipulator on a plurality of types of working manipulators, the manipulator system comprising:
a learning manipulator that stores a drive command and a state signal obtained when a learning mechanism is driven according to the drive command;
a learning device that performs learning using the drive command and the state signal of the learning manipulator obtained when the learning manipulator is driven, the learning device comprising standardization means for standardizing the drive command and the state signal based on the specifications of the learning manipulator, and a first learner that learns using the standardized drive command and state signal; and
one or more working manipulators driven based on a command trajectory from a trajectory generator,
wherein the trajectory generator provides the command trajectory for the working manipulator using a drive command for the working manipulator and a state signal of the working manipulator, and comprises standardization means for standardizing the drive command and the state signal for the working manipulator based on the specifications of the working manipulator, a second learner that performs inference using the standardized drive command and state signal, and inverse standardization means for inversely standardizing the output of the second learner based on the specifications of the working manipulator, the output of the inverse standardization means is used as the command trajectory of the working manipulator, and the second learner is the first learner.