JP2019185742A

JP2019185742A - Controller and control method

Info

Publication number: JP2019185742A
Application number: JP2019015507A
Authority: JP
Inventors: 鈔支; Zhi Chao; 文杰陳; Wenjie Chen; 凱濛王; Kaimeng Wang; 和臣前田; Kazuomi Maeda
Original assignee: Fanuc Corp
Current assignee: Fanuc Corp
Priority date: 2018-04-17
Filing date: 2019-01-31
Publication date: 2019-10-24
Anticipated expiration: 2039-01-31
Also published as: JP6841852B2

Abstract

To provide a controller and a control method capable of identifying a coefficient of a friction model. SOLUTION: A controller 1 which performs position control in view of friction for one or more axes of a machine includes: a data acquisition part 70 which acquires at least a position command and a position feedback; and a correction torque estimation part 80 which estimates, based on position deviation as a difference between the position command and the position feedback, a coefficient of a friction model upon performing the position control. SELECTED DRAWING: Figure 2

Description

The present invention relates to a control device and a control method, and more particularly to a control device and a control method capable of identifying the coefficients of a friction model.

In the control of industrial machines (hereinafter simply referred to as "machines"), including machine tools, injection molding machines, laser processing machines, electrical discharge machines, and industrial robots, precise control performance can be obtained by compensating for the frictional forces acting on the drive mechanism.

FIG. 11 shows an example of a drive mechanism of a machine tool. A servo motor rotates a ball screw supported by bearings, which moves the stage. Frictional forces act, for example, between the bearings and the ball screw and between the ball screw and the stage. The behavior of the stage is therefore affected by friction.

FIG. 12 is a graph showing a typical relationship between the frictional force and the behavior of the drive mechanism. During the transition from the stationary state (speed = 0) to the moving state, or from the moving state to the stationary state, the frictional force changes nonlinearly. This is called the Stribeck effect. Because of the Stribeck effect, positioning takes longer and a trajectory error (quadrant protrusion) occurs when the direction of motion reverses.

The LuGre model is known as an effective friction model for compensating such nonlinear friction. The LuGre model yields a correction value (correction torque) for suppressing the nonlinear friction effect. As shown in FIG. 13, adding this correction value to the current command compensates for the nonlinear frictional force, so that the controlled object can be controlled precisely. This correction can be carried out within known feedback control. The control device of the machine determines the current command based on the deviation between the position command and the position feedback and the deviation between the speed command and the speed feedback. At that point, the control device adds the correction torque obtained from the LuGre model to the current command.
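The way the correction torque enters the feedback loop can be sketched in a few lines. This is a minimal illustration, not the control law of the patent: the purely proportional loops, the `LoopGains` class, the function name `current_command`, and all numeric values are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LoopGains:
    """Hypothetical proportional gains; real controllers typically use PI or PID terms."""
    kp_position: float  # position loop gain
    kp_velocity: float  # velocity loop gain


def current_command(pos_cmd, pos_fb, vel_fb, correction_torque, gains):
    """One control cycle: cascaded position/velocity loops plus friction correction."""
    # Position loop: the position deviation yields a velocity command.
    vel_cmd = gains.kp_position * (pos_cmd - pos_fb)
    # Velocity loop: the velocity deviation yields the uncorrected current command.
    base_cmd = gains.kp_velocity * (vel_cmd - vel_fb)
    # The correction torque from the friction model is added to the current command.
    return base_cmd + correction_torque


gains = LoopGains(kp_position=30.0, kp_velocity=2.0)
cmd = current_command(pos_cmd=1.0, pos_fb=0.9, vel_fb=1.0,
                      correction_torque=0.5, gains=gains)
print(cmd)
```

The correction is purely additive, so the feedback loops themselves stay unchanged whether or not a friction model is available.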

The LuGre model is given by Equation 1. F is the correction torque output by the LuGre model. v and z are variables related to speed and position. Fc, Fs, v0, σ0, σ1, and σ2 are coefficients specific to the drive mechanism.
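Equation 1 itself is not reproduced in this text. For orientation, the LuGre model is commonly written in the literature in the form below, using the same symbols as the paragraph above. This is an assumption about what Equation 1 looks like, not a transcription of it; in this form z is an internal friction state and g(v) describes the Stribeck curve.

```latex
\begin{aligned}
F &= \sigma_0 z + \sigma_1 \frac{dz}{dt} + \sigma_2 v, \\
\frac{dz}{dt} &= v - \frac{\sigma_0 \lvert v \rvert}{g(v)}\, z, \\
g(v) &= F_c + (F_s - F_c)\, e^{-(v/v_0)^2}.
\end{aligned}
```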

As related art, Patent Document 1 discloses that correction data can be obtained from a friction model.

Patent Document 1: JP 2004-234327 A

However, the coefficients of friction models such as the LuGre model differ from one machine and one usage environment to another, so they have had to be identified individually for each controlled object. In addition, because there are many coefficients to identify, the identification work has required a great deal of effort. A means of identifying the coefficients of a friction model with little effort is therefore needed.

Accordingly, a control device and a control method that can identify the coefficients of a friction model are desired.

One aspect of the present invention is a control device that performs position control in consideration of friction for one or more axes of a machine, the control device comprising: a data acquisition unit that acquires at least a position command and position feedback; and a correction torque estimation unit that estimates the coefficients of a friction model used in performing the position control, based on a position deviation that is the difference between the position command and the position feedback.

Another aspect of the present invention is a control method for performing position control in consideration of friction for one or more axes of a machine, the control method comprising: a data acquisition step of acquiring at least a position command and position feedback; and a correction torque estimation step of estimating the coefficients of a friction model used in performing the position control, based on a position deviation that is the difference between the position command and the position feedback.

According to the present invention, it is possible to provide a control device and a control method capable of identifying the coefficients of a friction model.

A schematic hardware configuration diagram of the control device 1 of the first embodiment.
A schematic functional block diagram of the control device 1 of the first embodiment.
A schematic hardware configuration diagram of the control device 1 of the second and third embodiments.
A schematic functional block diagram of the control device 1 of the second embodiment.
A functional block diagram of the learning unit 83 in the second embodiment.
A flowchart showing one form of reinforcement learning.
A diagram explaining a neuron.
A diagram explaining a neural network.
A schematic functional block diagram of the control device 1 and the machine learning device 100 of the third embodiment.
A schematic functional block diagram showing one form of a system incorporating the control device 1.
A schematic functional block diagram showing another form of a system incorporating the machine learning device 120 (or 100).
A diagram showing an example of a drive mechanism of a machine tool.
A graph showing the relationship between the frictional force and the behavior of the drive mechanism.
A diagram showing an example of a method of correcting a nonlinear frictional force using a friction model.
A diagram showing another example of a method of correcting a nonlinear frictional force using a friction model.
A diagram showing another example of a method of correcting a nonlinear frictional force using a friction model.

FIG. 1 is a schematic hardware configuration diagram showing the control device 1 according to the first embodiment of the present invention and the main parts of an industrial machine controlled by the control device 1. The control device 1 controls machines such as machine tools. The control device 1 includes a CPU 11, a ROM 12, a RAM 13, a nonvolatile memory 14, an interface 18, a bus 20, an axis control circuit 30, and a servo amplifier 40. A servo motor 50 and an operation panel 60 are connected to the control device 1.

The CPU 11 is a processor that controls the control device 1 as a whole. The CPU 11 reads the system program stored in the ROM 12 via the bus 20 and controls the entire control device 1 in accordance with that program.

The ROM 12 stores in advance a system program for executing various kinds of machine control and the like (including a communication program for controlling exchanges with the machine learning device 100 described later).

The RAM 13 temporarily stores temporary calculation data, display data, data entered by the operator via the operation panel 60 described later, and the like.

The nonvolatile memory 14 is backed up by, for example, a battery (not shown) and retains its contents even when the control device 1 is powered off. The nonvolatile memory 14 stores data entered from the operation panel 60 as well as machine control programs and data entered via an interface (not shown). The programs and data stored in the nonvolatile memory 14 may be loaded into the RAM 13 when they are executed or used.

The axis control circuit 30 controls the operating axes of the machine. The axis control circuit 30 receives the axis movement command amount output by the CPU 11 and outputs an axis movement command to the servo amplifier 40. In doing so, the axis control circuit 30 performs the feedback control described later and also corrects the nonlinear frictional force using the correction torque that the CPU 11 outputs based on the LuGre model or the like. Alternatively, the axis control circuit 30 itself may calculate the correction torque based on the LuGre model or the like and use it to correct the nonlinear frictional force. Performing the correction inside the axis control circuit 30 is generally faster than performing it in the CPU 11.

The servo amplifier 40 receives the axis movement command output by the axis control circuit 30 and drives the servo motor 50.

The servo motor 50 is driven by the servo amplifier 40 and moves an axis of the machine. The servo motor 50 typically has a built-in position/speed detector. Alternatively, the detector may be omitted from the motor and a position detector provided on the machine side. The position/speed detector outputs a position/speed feedback signal, and position and speed feedback control is performed by feeding this signal back to the axis control circuit 30.

Although FIG. 1 shows only one axis control circuit 30, one servo amplifier 40, and one servo motor 50, in practice one set is provided for each axis of the machine to be controlled. For example, to control a machine with six axes, a total of six sets of the axis control circuit 30, the servo amplifier 40, and the servo motor 50 are provided, one set per axis.

The operation panel 60 is a data input device equipped with hardware keys and the like. One type of operation panel is a manual data input device called a teach pendant, which has a display, hardware keys, and the like. The teach pendant shows on its display the information received from the CPU 11 via the interface 18. The operation panel 60 passes pulses, commands, data, and the like entered from the hardware keys to the CPU 11 via the interface 18.

FIG. 2 is a schematic functional block diagram of the control device 1 according to the first embodiment. Each functional block shown in FIG. 2 is realized by the CPU 11 of the control device 1 shown in FIG. 1 executing the system program and controlling the operation of each part of the control device 1.

The control device 1 of this embodiment includes a data acquisition unit 70 and a correction torque estimation unit 80. The correction torque estimation unit 80 includes an optimization unit 81 and a correction torque calculation unit 82. In addition, an acquired data storage unit 71, which stores the data acquired by the data acquisition unit 70, is provided in the nonvolatile memory 14.

The data acquisition unit 70 is a functional means that acquires various data from the CPU 11, the servo motor 50, the machine, and the like. The data acquisition unit 70 acquires, for example, the position command, the position feedback, the speed command, and the speed feedback, and stores them in the acquired data storage unit 71.

The correction torque estimation unit 80 is a functional means that estimates, based on the data stored in the acquired data storage unit 71, the optimal coefficients of the friction model (typically the LuGre model, in which case the coefficients are Fc, Fs, v0, σ0, σ1, and σ2). In this embodiment, the optimization unit 81 estimates the coefficients of the friction model by solving an optimization problem that, for example, minimizes the deviation between the position command and the position feedback. Typically, the combination of coefficients that minimizes this deviation can be estimated by a technique such as grid search, which exhaustively searches the coefficient combinations; random search, which tries coefficient combinations at random; or Bayesian optimization, which searches for the optimal combination based on a probability distribution and an acquisition function. That is, the optimization unit 81 repeats a cycle of operating the machine with a changed combination of coefficients and evaluating the deviation between the position command and the position feedback, and thereby finds the combination of coefficients that minimizes the deviation.
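The search cycle described above can be sketched with random search, one of the techniques named in the text. This is an illustration under stated assumptions, not the patent's implementation: the search ranges, the function names, and in particular `toy_deviation`, which stands in for actually running the machine and measuring the position deviation, are all hypothetical.

```python
import random

# Coefficient names follow the text; the search ranges are hypothetical.
SEARCH_RANGES = {
    "Fc": (0.0, 2.0), "Fs": (0.0, 3.0), "v0": (0.001, 0.1),
    "sigma0": (1e3, 1e5), "sigma1": (0.0, 500.0), "sigma2": (0.0, 1.0),
}


def random_search(evaluate, ranges, trials, seed=0):
    """Try random coefficient sets and keep the one with the smallest deviation."""
    rng = random.Random(seed)
    best_coeffs, best_dev = None, float("inf")
    for _ in range(trials):
        coeffs = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        # On a real system this step would run the axis and measure the deviation.
        dev = evaluate(coeffs)
        if dev < best_dev:
            best_coeffs, best_dev = coeffs, dev
    return best_coeffs, best_dev


def toy_deviation(coeffs):
    """Stand-in for a machine run: grows with distance from hidden 'true' values."""
    true = {"Fc": 1.0, "Fs": 1.5, "v0": 0.01,
            "sigma0": 1e4, "sigma1": 100.0, "sigma2": 0.4}
    return sum(((coeffs[k] - true[k]) / true[k]) ** 2 for k in true)


best, dev = random_search(toy_deviation, SEARCH_RANGES, trials=2000)
few, dev_few = random_search(toy_deviation, SEARCH_RANGES, trials=20)
print(dev, dev_few)
```

Because each evaluation corresponds to one machine run, the number of trials is the real cost. That is the usual motivation for preferring Bayesian optimization over grid or random search when runs are expensive.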

The correction torque calculation unit 82 uses the result estimated by the optimization unit 81 (the optimal combination of friction model coefficients) to calculate and output a correction torque based on the friction model. The control device 1 adds the correction torque output by the correction torque calculation unit 82 to the current command.
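The calculation of a correction torque from a given coefficient set can be sketched as follows. The sketch assumes the LuGre form commonly cited in the literature (internal state z, Stribeck function g(v)); Equation 1 is not reproduced in this text, so the exact form, the explicit Euler integration, the class and function names, and the numeric values are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class LuGreCoefficients:
    Fc: float      # Coulomb friction level
    Fs: float      # static (stiction) friction level
    v0: float      # Stribeck velocity
    sigma0: float  # stiffness of the internal state
    sigma1: float  # damping of the internal state
    sigma2: float  # viscous friction coefficient


def lugre_step(c, v, z, dt):
    """Advance the internal state z by one Euler step; return (torque, new z)."""
    # Stribeck curve: friction level falls from Fs toward Fc as |v| grows.
    g = c.Fc + (c.Fs - c.Fc) * math.exp(-((v / c.v0) ** 2))
    dz = v - c.sigma0 * abs(v) / g * z
    torque = c.sigma0 * z + c.sigma1 * dz + c.sigma2 * v
    return torque, z + dz * dt


coeffs = LuGreCoefficients(Fc=1.0, Fs=1.5, v0=0.01,
                           sigma0=1.0e4, sigma1=10.0, sigma2=0.4)
z, v, dt = 0.0, 0.05, 1.0e-5
for _ in range(20000):  # constant velocity until the internal state settles
    torque, z = lugre_step(coeffs, v, z, dt)
print(torque)
```

At constant velocity the internal state settles and the torque approaches g(v) + σ2·v. Explicit Euler integration is only stable for small time steps relative to σ0·|v|/g(v), so a production implementation would need a more careful discretization.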

According to this embodiment, the optimization unit 81 identifies the coefficients of the friction model by solving an optimization problem, so optimal coefficients suited to a variety of machines and usage environments can be obtained easily.

FIG. 3 is a schematic hardware block diagram of the control device 1 equipped with the machine learning device 100 in the second and third embodiments. The control device 1 of these embodiments has the same configuration as that of the first embodiment except that it includes the components related to the machine learning device 100. A system program, including a communication program for controlling exchanges with the machine learning device 100, is written in advance in the ROM 12 of the control device 1.

The interface 21 connects the control device 1 and the machine learning device 100. The machine learning device 100 includes a processor 101, a ROM 102, a RAM 103, and a nonvolatile memory 104.

The processor 101 controls the machine learning device 100 as a whole. The ROM 102 stores the system program and the like. The RAM 103 provides temporary storage for each process related to machine learning. The nonvolatile memory 104 stores the learning model and the like.

The machine learning device 100 observes, via the interface 21, the various kinds of information that the control device 1 can acquire (position command, speed command, position feedback, and so on). The machine learning device 100 learns and estimates by machine learning the coefficients of a friction model (typically the LuGre model) for precisely controlling the servo motor 50, and outputs the correction torque to the control device 1 via the interface 21.

FIG. 4 is a schematic functional block diagram of the control device 1 and the machine learning device 100 according to the second embodiment. The control device 1 shown in FIG. 4 has the configuration required when the machine learning device 100 performs learning (learning mode). Each functional block shown in FIG. 4 is realized by the CPU 11 of the control device 1 shown in FIG. 3 and the processor 101 of the machine learning device 100 executing their respective system programs and controlling the operation of each part of the control device 1 and the machine learning device 100.

The control device 1 of this embodiment includes the data acquisition unit 70 and a correction torque estimation unit 80 implemented on the machine learning device 100. The correction torque estimation unit 80 includes a learning unit 83. In addition, the acquired data storage unit 71, which stores the data acquired by the data acquisition unit 70, is provided in the nonvolatile memory 14, and a learning model storage unit 84, which stores the learning model built by the machine learning of the learning unit 83, is provided in the nonvolatile memory 104 of the machine learning device 100.

The data acquisition unit 70 in this embodiment operates in the same way as in the first embodiment. The data acquisition unit 70 acquires, for example, the position command, the position feedback, the speed command, and the speed feedback, and stores them in the acquired data storage unit 71. The data acquisition unit 70 also acquires the set of LuGre model coefficients (Fc, Fs, v0, σ0, σ1, σ2) that the control device 1 is currently using to correct nonlinear friction, and stores it in the acquired data storage unit 71.

The preprocessing unit 90 creates, from the data acquired by the data acquisition unit 70, the learning data used for machine learning by the machine learning device 100. The preprocessing unit 90 creates the learning data by converting each piece of data (digitizing, sampling, and so on) into the uniform format handled by the machine learning device 100. When the machine learning device 100 performs unsupervised learning, the preprocessing unit 90 creates state data S in the predetermined format for that learning as the learning data. When the machine learning device 100 performs supervised learning, it creates a pair of state data S and label data L in the predetermined format as the learning data. When the machine learning device 100 performs reinforcement learning, it creates a pair of state data S and determination data D in the predetermined format as the learning data.
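The three cases handled by the preprocessing unit 90 can be sketched as a small data structure. The names `LearningSample` and `make_sample`, the mode strings, and the choice of plain float lists as the "uniform format" are assumptions for illustration; the text does not specify the actual format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LearningSample:
    state: List[float]                          # state data S
    label: Optional[List[float]] = None         # label data L (supervised learning)
    determination: Optional[List[float]] = None  # determination data D (reinforcement)


def make_sample(mode, state, label=None, determination=None):
    """Build one learning sample in the form the chosen learning type expects."""
    s = [float(x) for x in state]  # numeric conversion into a uniform format
    if mode == "unsupervised":
        return LearningSample(state=s)
    if mode == "supervised":
        if label is None:
            raise ValueError("supervised learning needs label data L")
        return LearningSample(state=s, label=[float(x) for x in label])
    if mode == "reinforcement":
        if determination is None:
            raise ValueError("reinforcement learning needs determination data D")
        return LearningSample(state=s, determination=[float(x) for x in determination])
    raise ValueError(f"unknown learning mode: {mode}")


sample = make_sample("reinforcement", [1, 2.5], determination=[0.01])
print(sample)
```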

The learning unit 83 performs machine learning using the learning data created by the preprocessing unit 90. The learning unit 83 generates a learning model by a known machine learning technique such as unsupervised learning, supervised learning, or reinforcement learning, and stores the generated model in the learning model storage unit 84. Examples of unsupervised learning techniques used by the learning unit 83 include the autoencoder method and the k-means method. Examples of supervised learning techniques include the multilayer perceptron, recurrent neural network, Long Short-Term Memory, and convolutional neural network methods. Examples of reinforcement learning techniques include Q-learning.

FIG. 5 shows the internal functional configuration of the learning unit 83 when it performs reinforcement learning, as one example of a learning technique. Reinforcement learning repeats, by trial and error, a cycle of observing the current state (the input) of the environment in which the learning target exists, executing a given action (the output) in that state, and giving some reward for that action. Through this repetition it learns, as the optimal solution, the policy that maximizes the total reward (in this embodiment, the setting of the LuGre model coefficients).

The learning unit 83 includes a state observation unit 831, a determination data acquisition unit 832, and a reinforcement learning unit 833. Each functional block shown in FIG. 5 is realized by the CPU 11 of the control device 1 shown in FIG. 3 and the processor 101 of the machine learning device 100 executing their respective system programs and controlling the operation of each part of the control device 1 and the machine learning device 100.

The state observation unit 831 observes the state variables S, which represent the current state of the environment. The state variables S include, for example, the current LuGre model coefficients S1, the current position command S2, the current speed command S3, and the position feedback S4 from the previous cycle.

As the LuGre model coefficients S1, the state observation unit 831 acquires the set of LuGre model coefficients (Fc, Fs, v0, σ0, σ1, σ2) that the control device 1 is currently using to correct nonlinear friction.

As the current position command S2 and speed command S3, the state observation unit 831 acquires the position command and the speed command that the control device 1 is currently outputting.

As the position feedback S4, the state observation unit 831 acquires the position feedback that the control device 1 obtained one cycle earlier (the value used in feedback control to generate the current position command and speed command).

The determination data acquisition unit 832 acquires determination data D, an index of the result obtained when the machine is controlled under the state variables S. The determination data D includes position feedback D1.

As the position feedback D1, the determination data acquisition unit 832 acquires the position feedback obtained as a result of controlling the machine based on the LuGre model coefficients S1, the position command S2, and the speed command S3.

The reinforcement learning unit 833 uses the state variables S and the determination data D to learn the correlation between the LuGre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4. That is, the reinforcement learning unit 833 generates a model structure that represents the correlation among the components S1, S2, S3, and S4 of the state variables S. The reinforcement learning unit 833 includes a reward calculation unit 834 and a value function update unit 835.

The reward calculation unit 834 obtains a reward R associated with the result of position control when the LuGre model coefficients are set based on the state variables S (this result corresponds to the determination data D used in the learning cycle following the one in which the state variables S were acquired).

The value function update unit 835 uses the reward R to update the function Q, which represents the value of the LuGre model coefficients. As the value function update unit 835 repeatedly updates the function Q, the reinforcement learning unit 833 learns the correlation between the LuGre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4.

An example of the reinforcement learning algorithm executed by the reinforcement learning unit 833 is described below. The algorithm in this example is known as Q-learning. Taking as independent variables the state s of an agent and the action a that the agent can select in state s, it learns a function Q(s, a) that represents the value of selecting action a in state s. Selecting the action a that gives the highest value of Q in state s is the optimal solution. Q-learning starts with the correlation between state s and action a unknown, and by repeating trial and error in which various actions a are selected in arbitrary states s, the value function Q is updated iteratively and brought closer to the optimal solution. The learning is set up so that, when the environment (that is, the state s) changes as a result of selecting action a in state s, a reward r (that is, a weighting of action a) corresponding to that change is obtained. Guiding the learning toward actions a that yield higher rewards r brings the value function Q close to the optimal solution in a relatively short time.

The update rule for the value function Q can generally be expressed as Equation 2 below. In Equation 2, s_t and a_t are the state and the action at time t, respectively, and the action a_t changes the state to s_{t+1}. r_{t+1} is the reward obtained when the state changes from s_t to s_{t+1}. The max Q term denotes the value Q obtained when the action a that gives the maximum value Q at time t+1 (as estimated at time t) is taken. α and γ are the learning coefficient and the discount rate, respectively, and are set arbitrarily within 0 < α ≤ 1 and 0 < γ ≤ 1.
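Equation 2 is not reproduced in this text. The standard Q-learning update, which matches term by term the symbols described in the paragraph above, is given here for reference; it is the textbook form, not a transcription of the patent's equation.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)
```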

When the reinforcement learning unit 833 performs Q-learning, the state variables S observed by the state observation unit 831 and the determination data D acquired by the determination data acquisition unit 832 correspond to the state s in the update rule. The action of deciding how to set the LuGre model coefficients S1 for the current state, that is, for the position command S2, the speed command S3, and the position feedback S4, corresponds to the action a in the update rule. The reward R obtained by the reward calculation unit 834 corresponds to the reward r in the update rule. The value function update unit 835 therefore repeatedly updates, by Q-learning using the reward R, the function Q that represents the value of the LuGre model coefficients for the current state.

The reward calculation unit 834 can, for example, set the reward R to a positive value when machine control is performed based on the determined Lugre model coefficients S1 and the result of the position control is judged to be "acceptable". Conversely, it can set the reward R to a negative value when the result of the position control is judged to be "unacceptable". The absolute values of the positive and negative rewards R may be the same or different.

The result of the position control is "acceptable" when, for example, the difference between the position feedback D1 and the position command S2 is within a predetermined threshold. It is "unacceptable" when, for example, that difference exceeds the predetermined threshold. In other words, the result is "acceptable" if the position control follows the position command S2 at least as accurately as a predetermined criterion requires, and "unacceptable" otherwise.

The result of the position control can also be graded in multiple levels rather than only the two levels "acceptable" and "unacceptable". For example, the reward calculation unit 834 can set stepwise rewards so that the reward increases as the difference between the position feedback D1 and the position command S2 decreases.
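The two-level and stepwise reward schemes described above can be sketched as follows. This is a minimal illustration only: the function name, the threshold, and the reward magnitudes are assumptions and do not come from the patent.

```python
def reward(position_command, position_feedback, threshold=0.01, graded=False):
    """Return a reward R from the position deviation (all values hypothetical)."""
    deviation = abs(position_command - position_feedback)
    if not graded:
        # Two-level judgment: acceptable within the threshold, unacceptable otherwise.
        return 1.0 if deviation <= threshold else -1.0
    # Stepwise variant: a smaller deviation earns a larger reward.
    if deviation <= 0.5 * threshold:
        return 2.0
    if deviation <= threshold:
        return 1.0
    if deviation <= 2.0 * threshold:
        return -1.0
    return -2.0
```

The positive and negative magnitudes need not be symmetric; they are equal here only for simplicity.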

The value function update unit 835 can hold an action value table in which the state variables S, the determination data D, and the reward R are organized in association with action values (for example, numerical values) expressed by the function Q. In that case, the value function update unit 835 updating the function Q is synonymous with it updating the action value table. At the start of Q-learning, the correlation between the Lugre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4 is unknown, so the action value table is prepared with various state variables S, determination data D, and rewards R associated with randomly chosen action values (function Q). Once the determination data D is known, the reward calculation unit 834 can immediately calculate the corresponding reward R, and the calculated value R is written to the action value table.

As Q-learning proceeds using the reward R that reflects the result of the position control, learning is steered toward selecting actions that yield a higher reward R. The action value (function Q) of the action taken in the current state is rewritten according to the state of the environment (that is, the state variables S and the determination data D) that changes as a result of executing the selected action in the current state, and the action value table is updated. Repeating this update rewrites the action values (function Q) in the action value table so that more appropriate actions have larger values. In this way, the previously unknown correlation between the current state of the environment (the position command S2, the speed command S3, and the position feedback S4) and the corresponding action (the Lugre model coefficients S1 to be set) gradually becomes clear. In other words, updating the action value table gradually brings the correlation between the Lugre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4 closer to the optimal solution.
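A single rewrite of the action value table can be sketched as below. The table layout (a dictionary keyed by state and action), the state labels, and the candidate names are assumptions for illustration. The table starts at zero here for reproducibility, whereas the text describes randomly chosen initial values.

```python
from collections import defaultdict

def q_update(q_table, state, action, reward, next_state, actions,
             alpha=0.5, gamma=0.9):
    """Apply one Q-learning update to the action value table in place."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]

q_table = defaultdict(float)             # unseen (state, action) pairs start at 0.0
actions = ["coef_set_A", "coef_set_B"]   # hypothetical candidate coefficient sets
new_value = q_update(q_table, "s0", "coef_set_A", 1.0, "s1", actions)
```

Repeating the same rewarded transition moves the stored value further toward the target, which is how appropriate actions end up with larger table entries.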

The flow of Q-learning executed by the reinforcement learning unit 833 (that is, one form of the machine learning method) is further described with reference to FIG. 6.
Step SA01: Referring to the action value table at that point, the value function update unit 835 randomly selects Lugre model coefficients S1 as the action to take in the current state indicated by the state variables S observed by the state observation unit 831.
Step SA02: The value function update unit 835 takes in the state variables S of the current state observed by the state observation unit 831.
Step SA03: The value function update unit 835 takes in the determination data D of the current state acquired by the determination data acquisition unit 832.
Step SA04: Based on the determination data D, the value function update unit 835 determines whether the Lugre model coefficients S1 were appropriate. If they were, the process proceeds to step SA05. If they were not, the process proceeds to step SA07.
Step SA05: The value function update unit 835 applies the positive reward R obtained by the reward calculation unit 834 to the update formula of the function Q.
Step SA06: The value function update unit 835 updates the action value table using the state variables S and the determination data D in the current state, the reward R, and the action value (the updated function Q).
Step SA07: The value function update unit 835 applies the negative reward R obtained by the reward calculation unit 834 to the update formula of the function Q.

The reinforcement learning unit 833 repeatedly updates the action value table by repeating steps SA01 to SA07, thereby advancing the learning. The processing for obtaining the reward R and updating the value function in steps SA04 to SA07 is executed for each item of data included in the determination data D.
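The repeated cycle of steps SA01 to SA07 can be sketched as a loop over a toy plant. Everything specific here is an assumption for illustration: the two discretised states, the four candidate coefficient values, the toy rule that defines the position deviation, and the hyperparameters.

```python
import random

def run_q_learning(episodes=500, alpha=0.3, gamma=0.5, threshold=0.05, seed=0):
    rng = random.Random(seed)
    states = ["low_speed", "high_speed"]        # hypothetical discretised states
    candidates = [0.1, 0.2, 0.3, 0.4]           # hypothetical coefficient values
    # Toy plant, unknown to the learner: the coefficient that suits each state.
    best_for_state = {"low_speed": 0.2, "high_speed": 0.3}
    q = {(s, a): 0.0 for s in states for a in candidates}
    state = states[0]
    for _ in range(episodes):
        action = rng.choice(candidates)                   # SA01: random selection
        next_state = rng.choice(states)                   # SA02: observe state variables
        deviation = abs(action - best_for_state[state])   # SA03: determination data
        ok = deviation <= threshold                       # SA04: judge the result
        r = 1.0 if ok else -1.0                           # SA05 (positive) or SA07 (negative)
        best_next = max(q[(next_state, a)] for a in candidates)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])  # SA06
        state = next_state
    return q, states, candidates

q, states, candidates = run_q_learning()
greedy = {s: max(candidates, key=lambda a: q[(s, a)]) for s in states}
```

In this toy setting only the suitable candidate ever receives a positive reward, so the greedy choice per state settles on it once each pair has been tried.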

In carrying out reinforcement learning, a neural network can be used, for example, in place of Q-learning. FIG. 7A schematically shows a model of a neuron. FIG. 7B schematically shows a model of a three-layer neural network formed by combining the neurons shown in FIG. 7A. A neural network can be implemented by, for example, an arithmetic unit and a storage device that imitate the neuron model.

The neuron shown in FIG. 7A outputs a result y for a plurality of inputs x (here, as an example, inputs x₁ to x₃). Each of the inputs x₁ to x₃ is multiplied by a weight w (w₁ to w₃) corresponding to that input. The neuron thereby outputs the output y expressed by Equation 3. In Equation 3, the input x, the output y, and the weight w are all vectors. θ is a bias, and f_k is an activation function.
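Equation 3 is not reproduced in this text. A common form consistent with the description is y = f_k(Σ x_i w_i − θ), and the sketch below assumes that form. The function name and the choice of tanh as the activation function are illustrative assumptions.

```python
import math

def neuron(inputs, weights, theta, activation=math.tanh):
    """Weighted sum of the inputs minus the bias, passed through an activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum - theta)

# Three inputs x1..x3 with weights w1..w3 and bias theta (hypothetical values).
y = neuron([1.0, 2.0, 3.0], [0.5, 0.5, 0.5], theta=3.0)
```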

In the three-layer neural network shown in FIG. 7B, a plurality of inputs x (here, as an example, inputs x1 to x3) are input from the left side, and results y (here, as an example, results y1 to y3) are output from the right side. In the illustrated example, each of the inputs x1, x2, and x3 is multiplied by a corresponding weight (collectively denoted W1), and all of the inputs x1, x2, and x3 are fed to each of the three neurons N11, N12, and N13.

In FIG. 7B, the outputs of the neurons N11 to N13 are collectively denoted z1. z1 can be regarded as a feature vector obtained by extracting the features of the input vector. In the illustrated example, each element of the feature vector z1 is multiplied by a corresponding weight (collectively denoted W2), and every element is fed to both of the two neurons N21 and N22. The feature vector z1 represents the features between the weight W1 and the weight W2.

In FIG. 7B, the outputs of the neurons N21 and N22 are collectively denoted z2. z2 can be regarded as a feature vector obtained by extracting the features of the feature vector z1. In the illustrated example, each element of the feature vector z2 is multiplied by a corresponding weight (collectively denoted W3), and every element is fed to each of the three neurons N31, N32, and N33. The feature vector z2 represents the features between the weight W2 and the weight W3. Finally, the neurons N31 to N33 output the results y1 to y3, respectively.
So-called deep learning, which uses a neural network with three or more layers, can also be used.
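The forward pass through the three-layer network of FIG. 7B (three inputs, neurons N11 to N13, neurons N21 and N22, neurons N31 to N33) can be sketched as follows. The weight values, the zero biases, and the tanh activation are assumptions chosen only to exercise the 3 → 3 → 2 → 3 shape.

```python
import math

def layer(inputs, weights, biases, activation=math.tanh):
    """One fully connected layer: each row of `weights` feeds one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)) - b)
            for row, b in zip(weights, biases)]

def forward(x, W1, W2, W3, b1, b2, b3):
    z1 = layer(x, W1, b1)    # neurons N11..N13 produce feature vector z1
    z2 = layer(z1, W2, b2)   # neurons N21, N22 produce feature vector z2
    y = layer(z2, W3, b3)    # neurons N31..N33 produce results y1..y3
    return z1, z2, y

# Hypothetical weights: W1 is 3x3, W2 is 2x3, W3 is 3x2.
W1 = [[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [0.3, 0.1, -0.2]]
W2 = [[0.5, -0.5, 0.1], [0.2, 0.2, 0.2]]
W3 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
z1, z2, y = forward([1.0, 0.5, -0.5], W1, W2, W3, [0.0] * 3, [0.0] * 2, [0.0] * 3)
```

Adding further `layer` calls between z1 and y gives the deeper networks mentioned above.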

By repeating such a learning cycle, the reinforcement learning unit 833 becomes able to automatically identify features that imply the correlation between the Lugre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4. At the start of the learning algorithm this correlation is substantially unknown, but the reinforcement learning unit 833 gradually identifies features and interprets the correlation as learning progresses. Once the correlation has been interpreted to a reasonably reliable level, the learning results that the reinforcement learning unit 833 repeatedly outputs can be used to select an action (that is, to make a decision) as to which Lugre model coefficients S1 should be set for the current state, namely the position command S2, the speed command S3, and the position feedback S4. In this way, the reinforcement learning unit 833 generates a learning model that can output the optimal action for the current state.

FIG. 8 is a schematic functional block diagram of the control device 1 and the machine learning device 100 according to the third embodiment. The control device 1 of this embodiment has the configuration required when the machine learning device 100 performs estimation (estimation mode). Each functional block shown in FIG. 8 is realized by the CPU 11 of the control device 1 shown in FIG. 3 and the processor 101 of the machine learning device 100 executing their respective system programs and controlling the operation of each part of the control device 1 and the machine learning device 100.

As in the second embodiment, the control device 1 of this embodiment includes a data acquisition unit 70 and a correction torque estimation unit 80 configured on the machine learning device 100. The correction torque estimation unit 80 includes an estimation unit 85 and a correction torque calculation unit 82. An acquired data storage unit 71, which stores the data acquired by the data acquisition unit 70, is provided in the nonvolatile memory 14. A learning model storage unit 84, which stores the learning model built through machine learning by the learning unit 83, is provided in the nonvolatile memory 104 of the machine learning device 100.

The data acquisition unit 70 and the preprocessing unit 90 of this embodiment operate in the same way as in the second embodiment. The preprocessing unit 90 converts the data acquired by the data acquisition unit 70 (by digitizing, sampling, and so on) into the unified format handled by the machine learning device 100, thereby generating state data S. The state data S created by the preprocessing unit 90 is used for estimation by the machine learning device 100.
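As a rough sketch of the kind of conversion performed by the preprocessing unit 90, the following downsamples raw records and quantizes each value onto a fixed grid. The record layout (position command, speed command, position feedback), the stride, and the resolution are assumptions; the patent does not specify the actual format.

```python
def preprocess(samples, stride=2, resolution=0.01):
    """Downsample raw records and quantize each value onto a fixed grid."""
    state_data = []
    for record in samples[::stride]:                                      # sampling
        state_data.append(tuple(round(v / resolution) for v in record))   # digitizing
    return state_data

# Hypothetical raw records: (position command, speed command, position feedback).
raw = [(0.103, 0.50, 0.101), (0.204, 0.51, 0.199), (0.298, 0.49, 0.301)]
S = preprocess(raw)
```

Integer tuples of this kind are hashable, so they can serve directly as keys of an action value table.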

Based on the state data S created by the preprocessing unit 90, the estimation unit 85 estimates the Lugre model coefficients S1 using the learning model stored in the learning model storage unit 84. The estimation unit 85 of this embodiment inputs the state data S received from the preprocessing unit 90 into the learning model generated by the learning unit 83 (that is, the model whose parameters have been determined), and thereby estimates and outputs the Lugre model coefficients S1.
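If the learning model is an action value table, estimation can be sketched as a greedy lookup: for the given state, return the candidate coefficient set with the highest learned value. The table contents, the state keys, and the candidate labels below are hypothetical.

```python
def estimate_coefficients(q_table, state, candidates):
    """Pick the candidate coefficient set with the highest learned action value."""
    return max(candidates, key=lambda a: q_table.get((state, a), float("-inf")))

# Hypothetical learned table: (state, candidate label) -> action value.
q_table = {("s0", "A"): 0.2, ("s0", "B"): 1.4,
           ("s1", "A"): 0.9, ("s1", "B"): -0.3}
chosen = estimate_coefficients(q_table, "s0", ["A", "B"])
```

A neural network model would replace the lookup with a forward pass, but the selection rule stays the same.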

The correction torque calculation unit 82 uses the result estimated by the estimation unit 85 (the combination S1 of friction model coefficients) to calculate and output a correction torque based on the friction model. The control device 1 adds the correction torque output by the correction torque calculation unit 82 to the current command.
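How a correction torque might be computed from an estimated coefficient set can be sketched with the widely used LuGre formulation: a bristle state z, a Stribeck curve g(v), and a friction output σ0·z + σ1·dz/dt + σ2·v. This is an assumed formulation with assumed parameter names and values; the formulation used in this patent is given elsewhere in the document and may differ. The sketch also assumes the correction torque equals the modeled friction expressed in torque units.

```python
import math

def lugre_step(z, v, dt, Fc, Fs, vs, sigma0, sigma1, sigma2):
    """Advance the bristle state z by one Euler step; return (z_next, friction)."""
    g = Fc + (Fs - Fc) * math.exp(-(v / vs) ** 2)   # Stribeck curve
    dz = v - sigma0 * abs(v) / g * z                # bristle dynamics
    friction = sigma0 * z + sigma1 * dz + sigma2 * v
    return z + dz * dt, friction

# Hypothetical coefficient set and a constant commanded speed of 0.1.
coef = dict(Fc=1.0, Fs=1.5, vs=0.01, sigma0=1000.0, sigma1=1.0, sigma2=0.4)
z, torque = 0.0, 0.0
for _ in range(2000):
    z, torque = lugre_step(z, 0.1, 1e-3, **coef)
```

At constant speed the output settles at g(v) + σ2·v. The explicit Euler step is only stable for a sufficiently small dt relative to σ0·|v|/g(v).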

According to the second and third embodiments, the machine learning device 100 generates a learning model that represents the correlation between the Lugre model coefficients S1 and the position command S2, the speed command S3, and the position feedback S4, and the coefficients of the friction model are estimated using that learning model. Optimal coefficients for various machines and usage environments can therefore be obtained easily.

Embodiments of the present invention have been described above, but the present invention is not limited to the examples of the embodiments described above and can be implemented in various forms with appropriate modifications.

For example, in the embodiments described above, the control device 1 and the machine learning device 100 are described as devices having different CPUs (processors), but the machine learning device 100 may instead be realized by the CPU 11 of the control device 1 and a system program stored in the ROM 12.

As a modification of the machine learning device 100, the learning unit 83 can use the state variables S and the determination data D obtained for each of a plurality of machines of the same type to learn appropriate Lugre model coefficients common to those machines. With this configuration, the amount of data (sets of state variables S and determination data D) obtained in a given time increases and more diverse data sets can be input, so the speed and reliability of learning can be improved. The learning model obtained in this way can also be used as an initial value, with additional learning performed for each machine, to further optimize the Lugre model for individual machines.

FIG. 9 shows a system 170 in which a plurality of machines are added to the control device 1. The system 170 includes a plurality of machines 160 and machines 160'. All of the machines 160 and 160' are connected to one another by a wired or wireless network 172.

The machines 160 and the machines 160' have the same type of mechanism. However, each machine 160 includes the control device 1, whereas each machine 160' does not.

In a machine 160, which includes the control device 1, the estimation unit 85 can use the learning model resulting from learning by the learning unit 83 to estimate the Lugre model coefficients S1 corresponding to the position command S2, the speed command S3, and the position feedback S4. In addition, the control device 1 of at least one machine 160 can be configured to learn position control common to all of the machines 160 and 160' by using the state variables S and the determination data D obtained for each of the other machines 160 and 160', and to share the learning result among all of the machines 160 and 160'. With the system 170, the speed and reliability of learning position control can be improved by using more diverse data sets (including the state variables S and the determination data D) as input.

FIG. 10 shows a system 170' that includes a plurality of machines 160'. The system 170' includes a plurality of machines 160' having the same machine configuration and a machine learning device 120 that is independent of the control device 1 (or the machine learning device 100 included in the control device 1). The machines 160' and the machine learning device 120 (or the machine learning device 100) are connected to one another by a wired or wireless network 172.

Based on the state variables S and the determination data D obtained for each of the plurality of machines 160', the machine learning device 120 (or the machine learning device 100) learns Lugre model coefficients S1 common to all of the machines 160'. Using the learning result, it can estimate the Lugre model coefficients S1 corresponding to the position command S2, the speed command S3, and the position feedback S4.

With this configuration, the required number of machines 160' can be connected to the machine learning device 120 (or the machine learning device 100) when needed, regardless of where or when each of the machines 160' exists.

The embodiments described above assume that the control device 1 and the machine learning device 100 (or the machine learning device 120) are each a single, locally installed information processing device, but the present invention is not limited to this. For example, the control device 1 and the machine learning device 100 (or the machine learning device 120) may be implemented in an information processing environment referred to as cloud computing, fog computing, edge computing, or the like.

In the embodiments described above, a method for determining the coefficients of the Lugre model, a representative friction model, was presented. However, the present invention is not limited to this and can be applied to determining the coefficients of various friction models such as the Seven parameter model, State variable model, Karnopp model, LuGre model, Modified Dahl model, and M2 model.

In the embodiments described above, processing machines were mainly given as examples of the machine, but the present invention is not limited to these. It is applicable to various machines (for example, medical robots, disaster response robots, and construction robots) that have a drive mechanism, typically a positioning mechanism, in which friction is a concern.

The embodiments described above obtain the coefficients of the friction model based on the control system shown in FIG. 13, but the present invention is not limited to this and is also applicable to various control systems that modify it. For example, as shown in FIG. 14, a control system may be used in which s, the derivative of the position command (equivalent to the speed command), is input to the friction model instead of the speed command. In this case, the state observation unit 831 of the machine learning device 100 observes s, the speed command equivalent, instead of the speed command. With this configuration, the correction torque can be calculated from the position command alone, which has the advantage that the calculation of the correction torque can be completed entirely on the control device 1 side.
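Deriving the speed command equivalent from the position command alone can be sketched with a backward difference over the sampled position commands. The function name, the sampling period, and the choice of 0.0 for the first sample are assumptions for illustration.

```python
def speed_equivalent(position_commands, dt):
    """Backward-difference derivative of a sampled position command sequence."""
    speeds = [0.0]  # no previous sample exists for the first element (assumption)
    for prev, cur in zip(position_commands, position_commands[1:]):
        speeds.append((cur - prev) / dt)
    return speeds

s = speed_equivalent([0.0, 1.0, 3.0, 3.0], dt=0.5)
```

Because only commanded positions are used, no feedback signal is needed to produce the friction model input.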

Alternatively, as shown in FIG. 15, a control system may be used in which position feedback and speed feedback, instead of the position command and the speed command, are input to the friction model. In this case, the state observation unit 831 of the machine learning device 100 observes the position feedback instead of the position command and the speed feedback instead of the speed command. This configuration is easy to implement on the axis control circuit 30 side. It allows high-speed processing, and because it uses feedback, the actual friction force is easier to estimate.

Description of Symbols
1 Control device
11 CPU
12 ROM
13 RAM
14 Nonvolatile memory
18, 19, 21, 22 Interface
20 Bus
30 Axis control circuit
40 Servo amplifier
50 Servo motor
60 Operation panel
70 Data acquisition unit
71 Acquired data storage unit
80 Correction torque estimation unit
81 Optimization unit
82 Correction torque calculation unit
83 Learning unit
831 State observation unit
832 Determination data acquisition unit
833 Reinforcement learning unit
834 Reward calculation unit
835 Value function update unit
84 Learning model storage unit
85 Estimation unit
100 Machine learning device (included in control device 1)
101 Processor
102 ROM
103 RAM
104 Nonvolatile memory
120 Machine learning device (independent of control device 1)
160, 160' Machine
170, 170' System
172 Network

Claims

A control device that performs position control in consideration of friction for one or more axes of a machine, the control device comprising:
a data acquisition unit that acquires at least a position command and position feedback; and
a correction torque estimation unit that estimates a coefficient of a friction model used in performing the position control, based on a position deviation that is a difference between the position command and the position feedback.

The control device according to claim 1, wherein the correction torque estimation unit includes an optimization unit that estimates a coefficient of a friction model by solving an optimization problem that minimizes the position deviation.

The correction torque estimation unit includes a learning unit that generates a learning model by performing machine learning using, as state variables, a coefficient of the friction model, a position command and position feedback, and a speed command or speed feedback. The control device described.

The control device according to claim 3, wherein the learning unit performs reinforcement learning based on determination data indicating a result of the position control.

The control device according to claim 1, wherein the correction torque estimation unit includes:
a learning model storage unit that stores a learning model obtained by machine learning using a coefficient of the friction model, a position command and position feedback, and a speed command or speed feedback; and
an estimation unit that estimates the coefficient of the friction model using the learning model, based on the position command and position feedback and on the speed command or speed feedback.

The control device according to claim 1, wherein the data acquisition unit acquires data from a plurality of the machines.

The friction model is a Lugre model, a Seven parameter model, a State variable model, a Karnopp model, a LuGre model, a Modified Dahl model, or an M2 model.

A control method for performing position control in consideration of friction for one or more axes of a machine, the control method comprising:
a data acquisition step of acquiring at least a position command and position feedback; and
a correction torque estimation step of estimating a coefficient of a friction model used in performing the position control, based on a position deviation that is a difference between the position command and the position feedback.