JP6458912B1

JP6458912B1 - Position control device and position control method

Info

Publication number: JP6458912B1
Application number: JP2018530627A
Authority: JP
Inventors: 勇人山中; 高志南本
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2018-01-24
Filing date: 2018-01-24
Publication date: 2019-01-30
Anticipated expiration: 2038-01-24
Also published as: JPWO2019146007A1; TW201932257A; WO2019146007A1

Abstract

For alignment of two objects that involves insertion, the device includes a path determination unit 802, which indicates a control amount for insertion based on an image acquired from an imaging unit 201 and the value of a force sensor 801 and learns from the result of the alignment, and a control parameter adjustment unit 1401, which outputs a periodic control amount adjustment value based on a periodic control amount set for each control cycle in order to reach the control amount and on a control amount adapted to the external force derived from the value of the force sensor 801. The robot arm 100 is therefore operated by a control amount that sums trajectory generation control and compliant motion control using the force sensor 801. An ordinary reinforcement learning model requires trial and error until learning converges and may damage the environment, but the present invention allows trials to be performed safely even in the early stages of learning.

Description

The present invention relates to a position control device and a position control method.

When building a production system in which a robot arm performs assembly, it is common to carry out manual instruction work known as teaching. However, a taught robot simply repeats its motion toward the stored positions, so it may be unable to cope when errors arise from manufacturing or mounting. If a position correction technique that absorbs these individual errors can be developed, productivity should improve and robots can be put to use in a wider range of settings.

Existing technology does include techniques that use a camera image to correct the position up to just before connector insertion (Patent Document 1). Using multiple devices such as a force sensor and a stereo camera also makes it possible to absorb position errors in assembly (insertion, workpiece holding, and so on). However, determining the position correction amount requires explicitly calculating quantities such as the center coordinates of the gripped connector and the center coordinates of the receiving connector from the image information, as in that document. This calculation depends on the connector shape and must be set up by the designer for each connector used. The calculation is relatively easy if three-dimensional information is available from a range camera or the like, but obtaining it from two-dimensional image information requires developing an image processing algorithm for each connector, which incurs substantial design cost.

There are also techniques, called deep learning and deep reinforcement learning, by which a robot learns on its own and acquires appropriate behavior. Acquiring appropriate behavior through such learning, however, normally requires collecting a large amount of suitable training data. When data is collected with a technique such as reinforcement learning, the same scene must be experienced over and over, which takes an enormous number of trials, and performance cannot be guaranteed for scenes that have not been experienced. Training data for a wide variety of scenes must therefore be collected evenly, which takes a great deal of effort.
For example, there are techniques that find an optimal path from a single successful trial, as in Patent Document 2, but they cannot gather data usable for deep learning or deep reinforcement learning, so a considerable number of learning runs is still required.

Patent Document 1: WO98-017444
Patent Document 2: JP 2005-125475 A

When two objects are aligned with insertion, the function that specifies positions for learning and the function that operates servo motors and the like to control the robot's position normally exist independently. The load applied to the objects is therefore not taken into account, and depending on the displacement commanded for learning, an excessive load may be applied to the objects.

The present invention was made to solve the above problem. Its object is to collect training data while preventing an excessive load from being applied to the objects, even when the learning function and the robot position control function are separate.

The position control device according to the present invention performs alignment that involves inserting one of two objects into the other, and comprises: a path determination unit that indicates a control amount for the position of a robot arm for the insertion, based on an image acquired from an imaging unit and showing the one object gripped by a gripping portion of the robot arm together with the other object, and on the value of a force sensor that measures the load on the gripping portion, and that learns, from the results of the alignment, the control amount for the position of the robot arm with which alignment succeeds given the image and the force sensor value; and a control parameter adjustment unit that, during the learning process, receives the control amount for the position of the robot arm from the path determination unit and outputs a periodic control amount adjustment value to a control unit, based on a periodic control amount, which is set for each control cycle of the control unit in order to reach the control amount for the position of the robot arm, and on a control amount adapted to the external force derived from the force sensor value corresponding to that control cycle.

According to the present invention, training data can be collected while preventing an excessive load from being applied to the objects, even when the learning function and the robot position control function are separate.

FIG. 1 shows the arrangement of the robot arm 100, the male connector 110, and the female connector 120 in Embodiment 1.
FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
FIG. 4 is a flowchart of position control by the position control device in Embodiment 1.
FIG. 5 is an example showing camera images captured by the monocular camera 102 in Embodiment 1 at and around the insertion start position, together with the control amounts.
FIG. 6 shows an example of the neural network and its learning rule in Embodiment 1.
FIG. 7 is a flowchart that uses multiple networks for the neural network in Embodiment 1.
FIG. 8 is a functional configuration diagram of the position control device in Embodiment 2.
FIG. 9 is a hardware configuration diagram of the position control device in Embodiment 2.
FIG. 10 shows trial fitting of the male connector 110 and the female connector 120 in Embodiment 2.
FIG. 11 is a flowchart of path learning by the position control device in Embodiment 2.
FIG. 12 is a flowchart of path learning by the position control device in Embodiment 3.
FIG. 13 shows an example of the neural network and its learning rule in Embodiment 3.
FIG. 14 is a functional configuration diagram of the position control device in Embodiment 4.
FIG. 15 is a flowchart of path learning by the position control device in Embodiment 4.

Embodiment 1
Embodiments of the present invention are described below.

Embodiment 1 describes a robot arm that learns the insertion position of each connector and performs assembly on a production line, together with its position control method.

The configuration is described first. FIG. 1 shows the arrangement of the robot arm 100, the male connector 110, and the female connector 120 in Embodiment 1. The robot arm 100 has a gripping portion 101 that grips the male connector 110, and a monocular camera 102 is mounted on the robot arm 100 at a position from which the gripping portion is visible. The monocular camera 102 is positioned so that, when the gripping portion 101 at the tip of the robot arm 100 grips the male connector 110, both the tip of the gripped male connector 110 and the female connector 120 into which it is to be inserted are visible.

FIG. 2 is a functional configuration diagram of the position control device in Embodiment 1.
As shown in FIG. 2, the device consists of: an imaging unit 201, which is a function of the monocular camera 102 in FIG. 1 and captures images; a control parameter generation unit 202, which uses the captured image to generate a control amount for the position of the robot arm 100; a control unit 203, which uses the position control amount to control the current and voltage values supplied to the drive unit 204 of the robot arm 100; and the drive unit 204, which changes the position of the robot arm 100 based on the current and voltage values output from the control unit 203.

When the control parameter generation unit 202 acquires an image from the imaging unit 201 (the image-capturing function of the monocular camera 102), it determines the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) relative to the position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100 and outputs the control amount to the control unit 203. (X, Y, and Z are the position of the robot arm 100; Ax, Ay, and Az are its posture angles.)
The control unit 203 determines and controls the current and voltage values for each device making up the drive unit 204, based on the received control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) relative to the position values (X, Y, Z, Ax, Ay, Az) of the robot arm 100.
The drive unit 204 operates at the current and voltage values received from the control unit 203 for each device, so that the robot arm 100 moves to the position (X+ΔX, Y+ΔY, Z+ΔZ, Ax+ΔAx, Ay+ΔAy, Az+ΔAz).
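The relationship between the current position, the control amount, and the target position can be sketched in a few lines of Python. This is only an illustration: the `Pose` class, its field names, and the numeric values are assumptions made for the example and do not appear in the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Robot arm position (x, y, z) and posture angles (ax, ay, az).

    Hypothetical container; the patent only names the six quantities.
    """
    x: float
    y: float
    z: float
    ax: float
    ay: float
    az: float

    def apply(self, delta: "Pose") -> "Pose":
        """Return the target pose obtained by adding the control amount
        component by component, i.e. (X+dX, Y+dY, ..., Az+dAz)."""
        return Pose(
            self.x + delta.x, self.y + delta.y, self.z + delta.z,
            self.ax + delta.ax, self.ay + delta.ay, self.az + delta.az,
        )


# Illustrative values only.
current = Pose(100.0, 50.0, 30.0, 0.0, 0.0, 90.0)
control_amount = Pose(1.5, -0.5, 0.0, 0.0, 0.0, 2.0)
target = current.apply(control_amount)
```

Here the same six-component structure is reused for both the pose and the control amount, which keeps the addition explicit.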

FIG. 3 is a hardware configuration diagram of the position control device in Embodiment 1.
The monocular camera 102 is communicably connected, by wire or wirelessly, to the processor 302 and the memory 303 via the input/output interface 301. The input/output interface 301, the processor 302, and the memory 303 together implement the function of the control parameter generation unit 202 in FIG. 2. The input/output interface 301 is also communicably connected, by wire or wirelessly, to the control circuit 304, which corresponds to the control unit 203. The control circuit 304 is in turn electrically connected to the motor 305. The motor 305 corresponds to the drive unit 204 in FIG. 2 and is the component that controls the position of each device. Although this embodiment uses the motor 305 as the hardware corresponding to the drive unit 204, any hardware capable of controlling position may be used. The monocular camera 102 and the input/output interface 301, and likewise the input/output interface 301 and the control circuit 304, may be configured as separate units.

Next, the operation is described.
FIG. 4 is a flowchart of position control by the position control device in Embodiment 1.
First, in step S101, the gripping portion 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side in FIG. 2, and the arm operates according to a control program registered in advance on the control unit 203 side.

Next, in step S102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side in FIG. 2, and the male connector 110 is moved according to a control program registered in advance on the control unit 203 side.
Next, in step S103, the control parameter generation unit 202 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image showing both the male connector 110 held by the gripping portion 101 and the female connector 120 that is the insertion target.

Next, in step S104, the control parameter generation unit 202 acquires the image from the imaging unit 201 and determines the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). To do so, the control parameter generation unit 202 uses the processor 302 and the memory 303 of FIG. 3 as hardware and computes the control amount with a neural network. The neural-network-based calculation is described later.

Next, in step S105, the control unit 203 acquires the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202 and compares every component of the control amount with a predetermined threshold. If all components are at or below the threshold, the process proceeds to step S107, where the control unit 203 controls the drive unit 204 to insert the male connector 110 into the female connector 120.
If any component exceeds the threshold, then in step S106 the control unit 203 controls the drive unit 204 using the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) output by the control parameter generation unit 202, and the process returns to step S103.
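The loop formed by steps S103 to S107 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `align_then_insert`, the callback parameters, the use of absolute values in the threshold comparison, and the `max_loops` safety cap are all introduced here for illustration, and the toy estimator stands in for the neural network.

```python
from typing import Callable, Sequence


def align_then_insert(
    capture_image: Callable[[], object],
    estimate_control: Callable[[object], Sequence[float]],
    move_by: Callable[[Sequence[float]], None],
    insert: Callable[[], None],
    threshold: float,
    max_loops: int = 20,  # assumption: the source defines no loop limit
) -> int:
    """Repeat capture, estimate, and move until every component of the
    control amount is at or below the threshold, then insert.
    Returns the number of loops used."""
    for loop in range(1, max_loops + 1):
        image = capture_image()                      # S103
        delta = estimate_control(image)              # S104
        if all(abs(c) <= threshold for c in delta):  # S105 (abs is an assumption)
            insert()                                 # S107
            return loop
        move_by(delta)                               # S106
    raise RuntimeError("alignment did not converge")


# Toy simulation: the "image" is the true remaining offset, and the
# estimator recovers only 80% of it on each pass.
state = {"offset": [4.0, -2.0, 0.0, 0.0, 0.0, 0.0], "inserted": False}


def capture():
    return list(state["offset"])


def estimate(image):
    return [0.8 * c for c in image]


def move(delta):
    state["offset"] = [o - d for o, d in zip(state["offset"], delta)]


def do_insert():
    state["inserted"] = True


loops = align_then_insert(capture, estimate, move, do_insert, threshold=0.1)
```

With an imperfect estimator the loop needs several passes, which mirrors the situation the text describes when the learned control amount does not reach the insertion start position in one step.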

Next, the neural-network-based calculation of the control amount in step S104 of FIG. 4 is described.
Before the control amount can be calculated with a neural network, sets of images and the corresponding required movement amounts are collected in advance, so that the neural network can compute the movement from an input image to a successful fit. For example, with the male connector 110 and the female connector 120 in a fitted state at a known position, the gripping portion 101 of the robot arm 100 grips the male connector 110. The gripping portion 101 is then moved in the known withdrawal direction to the insertion start position while the monocular camera 102 captures multiple images. In addition, taking the insertion start position as the control amount (0, 0, 0, 0, 0, 0), images are acquired not only for the movement from the fitted state to the insertion start position but also for movement amounts around it, that is, for control amounts (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) and their corresponding images.
FIG. 5 is an example showing camera images captured by the monocular camera 102 in Embodiment 1 at and around the insertion start position, together with the control amounts.

The network is then trained according to a standard neural network learning rule (for example, stochastic gradient descent), using multiple sets consisting of the movement amount from the fitted state to the insertion start position and the images from the monocular camera 102 at the insertion start position and surrounding positions.
Neural networks come in various forms, such as CNNs and RNNs, but the present invention does not depend on the form, and any form may be used.

FIG. 6 shows an example of the neural network and its learning rule in Embodiment 1.
An image obtained from the monocular camera 102 (for example, the luminance and color-difference values of each pixel) is fed to the input layer, and the output layer outputs the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
During training, the parameters of the intermediate layers are optimized so that the output-layer values obtained from the input image through the intermediate layers approximate the control amounts stored with the image set. Stochastic gradient descent is one such approximation method.
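As a rough illustration of this kind of training, the sketch below fits a model to (input, control amount) pairs with per-sample stochastic gradient descent on a squared error. It makes strong simplifying assumptions: a single linear layer stands in for the neural network, two-element vectors stand in for camera images, and the function names, learning rate, and data are invented for the example.

```python
import random


def predict(w, b, x):
    """Linear layer: one output value per row of w."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj
            for row, bj in zip(w, b)]


def train_sgd(samples, n_in, n_out, lr=0.1, epochs=500, seed=0):
    """Adjust w and b one sample at a time so that predict(w, b, x)
    approaches the stored control amount y for each (x, y) pair."""
    rng = random.Random(seed)
    w = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # visit samples in random order each epoch
        for x, y in data:
            pred = predict(w, b, x)
            for j in range(n_out):
                err = pred[j] - y[j]  # gradient of 0.5 * err**2 w.r.t. pred[j]
                for i in range(n_in):
                    w[j][i] -= lr * err * x[i]
                b[j] -= lr * err
    return w, b


# Toy "image" features paired with two-component control amounts.
samples = [
    ([0.0, 0.0], [0.0, 0.0]),
    ([1.0, 0.0], [2.0, 0.0]),
    ([0.0, 1.0], [-1.0, 1.0]),
    ([1.0, 1.0], [1.0, 1.0]),
]
w, b = train_sgd(samples, n_in=2, n_out=2)
estimate = predict(w, b, [1.0, 1.0])
```

A real system would use a deeper network and actual pixel data; the update rule inside the inner loop is the part that corresponds to the learning rule named in the text.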

Accordingly, as shown in FIG. 5, more accurate learning is achieved by acquiring and training on not only the movement from the fitted state to the insertion start position but also movements around it and their corresponding images.
FIG. 5 shows the case where the male connector 110 is fixed relative to the monocular camera 102 and only the position of the female connector 120 changes. In practice, however, the gripping portion 101 of the robot arm 100 does not always grip the male connector 110 at exactly the right position, and the position of the male connector 110 may be offset because of individual differences and the like. By also acquiring and training on sets of control amounts and images at and near the insertion start position with the male connector 110 offset from its exact position, the learning can accommodate individual differences in both the male connector 110 and the female connector 120.

Note, however, that the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is calculated excluding the movement from the fitted position to the insertion start position at the time of image capture, so the movement from the insertion start position to the fitted position must be stored separately for use in step S107 of FIG. 4. Also, because these coordinates are obtained in the coordinate system of the monocular camera, the control unit 203 must convert them before controlling the robot arm 100 if the camera coordinate system differs from the coordinate system of the robot arm 100 as a whole.

This is because, in this embodiment, the monocular camera is fixed to the robot arm 100, so the coordinate system in which the female connector 120 is placed differs from that of the monocular camera 102. If the monocular camera 102 shared the coordinate system of the female connector 120's position, the conversion from the coordinate system of the monocular camera 102 to that of the robot arm 100 would be unnecessary.
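One common way to express such a conversion is to rotate the translational part of the control amount from the camera frame into the robot frame. The sketch below assumes, purely for illustration, that the two frames differ by a known rotation about the Z axis; the patent does not specify the transform, and the function names and the 90-degree angle are hypothetical. The angular components (ΔAx, ΔAy, ΔAz) are left out of this sketch.

```python
import math


def rotation_z(theta_rad):
    """3x3 rotation matrix for a rotation of theta_rad about the Z axis."""
    c, s = math.cos(theta_rad), math.sin(theta_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def camera_to_robot(delta_cam, r_robot_from_cam):
    """Rotate a displacement (dX, dY, dZ) given in the camera frame into
    the robot frame. A displacement needs only the rotation, not the
    offset between the two frame origins."""
    return [sum(r_robot_from_cam[i][j] * delta_cam[j] for j in range(3))
            for i in range(3)]


# Assumed mounting: camera frame rotated 90 degrees about Z relative to the robot.
r = rotation_z(math.pi / 2)
delta_robot = camera_to_robot([1.0, 0.0, 0.0], r)
```

In this assumed mounting, a unit move along the camera X axis becomes a unit move along the robot Y axis, up to floating-point rounding.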

Next, the operation of FIG. 4 is described in more detail, with an example.
In step S101, the robot arm 100 grips the male connector 110 by following the pre-registered motion, and in step S102 it is moved to roughly above the female connector 120.

At this point, the position of the male connector 110 just before it was gripped is not necessarily always the same. Slight errors may always be present, for example because of small deviations in the motion of the machine that sets the male connector 110 in place. The female connector 120 may likewise carry some error.

It is therefore important that the image acquired in step S103 by the imaging unit 201 of the monocular camera 102 mounted on the robot arm 100 shows both the male connector 110 and the female connector 120, as in FIG. 5. Because the position of the monocular camera 102 relative to the robot arm 100 is always fixed, the image reflects the relative position of the male connector 110 and the female connector 120.

In step S104, the control parameter generation unit 202, which has a neural network like that of FIG. 6 trained in advance on this relative position information, calculates the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). Depending on how well training went, however, the control amount output by the control parameter generation unit 202 may not bring the arm all the way to the insertion start position. In that case, the loop of steps S103 to S106 may be repeated several times, with the control parameter generation unit 202 recalculating and the control unit 203 and the drive unit 204 repositioning the robot arm 100, until the control amount falls to or below the threshold of step S105.

The threshold in S105 is determined by the accuracy required for mating the male connector 110 and the female connector 120. For example, if the connectors fit loosely and their characteristics do not demand high accuracy, the threshold can be set large; in the opposite case it is set small. In manufacturing processes the permissible fabrication error is usually specified, so that value can also be used.

Also, to allow for the case where, depending on how well training went, the control amount output by the control parameter generation unit 202 does not bring the arm to the insertion start position, multiple insertion start positions may be set. If the insertion start position is set without leaving enough distance between the male connector 110 and the female connector 120, the two may come into contact before insertion begins, with a risk of damaging one of them. In that case, the insertion start position may be set according to the number of passes through the loop of steps S103 to S106 in FIG. 4, for example with the clearance between the male connector 110 and the female connector 120 set to 5 mm first, then 20 mm, then 10 mm.

Although this embodiment has been described using connectors, the technique is not limited to connector mating. It can be applied, for example, to mounting an IC on a board, and the same method is also effective when inserting components such as capacitors, whose leads have particularly large dimensional errors, into holes in a board.
Nor is it limited to insertion into a board; it can be used for position control in general where the control amount is obtained from the relationship between an image and the control amount. By having a neural network learn the relationship between images and control amounts, the invention has the advantage of absorbing the individual differences of each object when aligning one object with another.

Embodiment 1 thus comprises: an imaging unit 201 that captures an image in which two objects are present; a control parameter generation unit 202 that feeds the information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, a position control amount for controlling the positional relationship of the two objects; a control unit 203 that uses the output position control amount to control the current or voltage for controlling the positional relationship of the two objects; and a drive unit 204 that uses that current or voltage to move the position of one of the two objects. As a result, alignment can be performed with only a monocular camera even when there are individual differences in the objects or errors in the positional relationship between them.

An example using a single neural network has been described so far, but in some cases several networks are needed. When the input is an image and the output is a numerical value, as here, the approximation accuracy of that value is limited, and depending on the situation an error of several percent can occur. Depending on the distance from the position near the insertion start in step 2 of FIG. 4 to the insertion start position, the determination in step S105 may always be No, and the operation may never complete. In such cases, a plurality of networks is used, as shown in FIG. 7.
FIG. 7 is a flowchart for using a plurality of networks in the neural network of Embodiment 1. It shows the detailed steps of step S104 in FIG. 4. The plurality of parameters is included in the control parameter generation unit 202 of FIG. 2.

In step S701, the control parameter generation unit 202 selects which network to use based on the input image.
On the first loop, or when the obtained control amount is 25 mm or more, neural network 1 is selected and the process proceeds to step S702. On the second or a later loop, when the obtained control amount is 5 mm or more and less than 25 mm, neural network 2 is selected and the process proceeds to step S703. On the second or a later loop, when the obtained control amount is less than 5 mm, neural network 3 is selected and the process proceeds to step S704. The control amount is then calculated using the neural network selected in steps S702 to S704.
For example, each neural network is trained according to the distance between the male connector 110 and the female connector 120, or according to the control amount. The range of training data is varied in stages: neural network 3 in the figure is trained on data with errors within ±1 mm and ±1 degree, and neural network 2 on data in the range of ±1 to ±10 mm and ±1 to ±5 degrees. It is more efficient not to overlap the ranges of images used by the different networks.
Although FIG. 7 shows an example with three networks, the number of networks is not limited. When this scheme is used, the discrimination function of step S701, which decides which network to use, must be prepared as a "network selection switch".
This network selection switch can itself be implemented as a neural network. In that case, the input to the input layer is an image and the output of the output layer is a network number. The training data consists of pairs of an image used by any of the networks and the corresponding network number.
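The rule-based form of the selection in step S701 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `select_network` is hypothetical, and taking the absolute value of the control amount is an assumption made so that the 5 mm and 25 mm thresholds apply to signed corrections as well.

```python
def select_network(loop_count: int, control_amount_mm: float) -> int:
    """Return 1, 2 or 3: the neural network chosen in step S701."""
    if loop_count < 1:
        raise ValueError("loop_count starts at 1")
    # Assumption: thresholds are compared against the magnitude of the amount.
    magnitude = abs(control_amount_mm)
    # First loop, or 25 mm or more remaining: coarse network 1 (step S702).
    if loop_count == 1 or magnitude >= 25.0:
        return 1
    # Second loop onward, 5 mm or more and less than 25 mm: network 2 (step S703).
    if magnitude >= 5.0:
        return 2
    # Second loop onward, less than 5 mm: fine network 3 (step S704).
    return 3
```

A learned switch, as described above, would replace this function with a classifier that maps an image to a network number.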

Although the example using a plurality of neural networks has also been described using connectors, the technique is not limited to connector fitting. For example, it can be applied to mounting an IC on a board, and the same method is also effective when a component such as a capacitor, whose leads have large dimensional errors, is inserted into holes in a board.
Likewise, the example using a plurality of neural networks is not limited to insertion into a board; it can be used for position control in general in which a control amount is obtained from the relationship between an image and the control amount. By having a neural network learn the relationship between images and control amounts, the invention absorbs the individual differences that arise when two objects are aligned, and the control amount can be calculated more accurately.

Therefore, this configuration includes: an imaging unit 201 that captures an image in which two objects are present; a control parameter generation unit 202 that inputs information of the captured image of the two objects to the input layer of a neural network and outputs, from the output layer of the neural network, a position control amount for controlling the positional relationship between the two objects; a control unit 203 that uses the output position control amount to control a current or voltage for controlling the positional relationship between the two objects; and a drive unit 204 that uses that current or voltage to move one of the two objects. Because the control parameter generation unit 202 is configured to select one of a plurality of neural networks, alignment can be performed more accurately even when the individual objects differ from one another or there is an error in the positional relationship between the two objects.

Embodiment 2.
In Embodiment 1, for the male connector 110 and the female connector 120 in a fitted state at a known position, the grip portion 101 of the robot arm 100 grips the male connector 110. The grip portion 101 is then moved in the known withdrawal direction to the insertion start position while the monocular camera 102 acquires a plurality of images. Embodiment 2 describes the case in which the fitting position of the male connector 110 and the female connector 120 is unknown.

Reinforcement learning has been studied as prior work on techniques by which a robot learns by itself and acquires appropriate behavior. In this technique the robot tries various actions by trial and error and optimizes its behavior by remembering the actions that produced good results, but a large number of trials is required for the optimization.
As a way of reducing the number of trials, a framework called on-policy learning is commonly used in reinforcement learning. However, applying this framework to the teaching of a robot arm is difficult because it requires various adaptations specific to the robot arm and its control signals, and it has not yet been put to practical use.
Embodiment 2 describes a configuration that can reduce the large number of trials needed when a robot, as in Embodiment 1, tries various actions by trial and error and optimizes its behavior by remembering the actions that produced good results.

The system configuration is as follows. Parts not specifically described are the same as in Embodiment 1.
The overall hardware configuration is the same as FIG. 1 of Embodiment 1, except that a force sensor 801 (not shown in FIG. 1), which measures the load applied to the grip portion 101, is added to the robot arm 100.

FIG. 8 is a functional configuration diagram of the position control device in Embodiment 2. It differs from FIG. 2 in that a force sensor 801 and a route determination unit 802 are added, and the route determination unit 802 consists of a Critic unit 803, an Actor unit 804, an evaluation unit 805, and a route setting unit 806.
FIG. 9 is a hardware configuration diagram of the position control device in Embodiment 2. The only difference from FIG. 3 is that the force sensor 801 is connected to the input/output interface 301, either electrically or so that the two can communicate. The input/output interface 301, the processor 302, and the memory 303 implement the function of the control parameter generation unit 202 in FIG. 8 and also the function of the route determination unit 802. Accordingly, the force sensor 801, the monocular camera 102, and the input/output interface 301, as well as the input/output interface 301 and the control circuit 304, may be configured as separate units.

Next, FIG. 8 is described in detail.
The force sensor 801 measures the load applied to the grip portion 101 of the robot arm 100; for example, it can measure the force when the male connector 110 and the female connector 120 of FIG. 1 come into contact.
The Critic unit 803 and the Actor unit 804 are the same as the Critic and the Actor in conventional reinforcement learning.
The conventional reinforcement learning technique is described here. This embodiment uses a reinforcement learning model called the Actor-Critic model (reference: Reinforcement Learning, R. S. Sutton and A. G. Barto, published December 2000). The Actor unit 804 and the Critic unit 803 obtain the state of the environment through the imaging unit 201 and the force sensor 801. The Actor unit 804 is a function that takes as input the environment state I obtained with the sensor devices and outputs a control amount A to the robot controller. The Critic unit 803 is the mechanism by which the Actor unit 804 learns to produce an appropriate output A for the input I so that the fitting succeeds.
The conventional reinforcement learning scheme is described below.

In reinforcement learning, a quantity called the reward R is defined, and the Actor unit 804 is made to acquire the action A that maximizes R. For example, if the task to be learned is the fitting of the male connector 110 and the female connector 120 as in Embodiment 1, R is defined as R = 1 when the fitting succeeds and R = 0 otherwise. Here the action A is the movement correction amount from the current position (X, Y, Z, Ax, Ay, Az), that is, A = (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). X, Y, and Z are position coordinates with the center of the robot as the origin, and Ax, Ay, and Az are the rotation amounts about the X, Y, and Z axes, respectively. The movement correction amount is the control amount from the current point to the fitting start position used for the first fitting attempt of the male connector 110. The environment state, that is, the result of a trial, is observed from the image from the imaging unit 201 and the value of the force sensor 801.
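The action and reward defined above can be written down as a small data structure. This is only a sketch under stated assumptions: the names `Pose`, `apply_correction`, and `reward` are hypothetical, and adding the rotation corrections component by component is a simplification that a real robot controller may not use.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Pose:
    """Position (x, y, z) and rotation amounts (ax, ay, az) about the X, Y, Z axes."""
    x: float
    y: float
    z: float
    ax: float
    ay: float
    az: float


def apply_correction(current: Pose, action: Pose) -> Pose:
    """Add the movement correction A = (dX, dY, dZ, dAx, dAy, dAz) to a pose."""
    # Assumption: every component, including rotations, is simply summed.
    return Pose(*(c + d for c, d in zip(astuple(current), astuple(action))))


def reward(fitting_succeeded: bool) -> float:
    """Binary reward: R = 1 when the fitting succeeds, R = 0 otherwise."""
    return 1.0 if fitting_succeeded else 0.0
```

Here the same `Pose` type is reused for the correction A purely for brevity; a separate type could make the distinction between a pose and a correction explicit.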

In reinforcement learning, the Critic unit 803 learns a function called the state value function V(I). Suppose that at time t = 1 (for example, at the start of a fitting trial) action A(1) is taken in state I(1), and that at time t = 2 (for example, after the first fitting trial ends and before the second begins) the environment has changed to I(2) and a reward R(2) (the result of the first fitting trial) is obtained. Various update rules are possible; the following is one example.
The update rule for V(I) is defined as follows.
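The formula itself is not reproduced in this text. As an assumption, a standard temporal-difference update that is consistent with the symbols used in this passage (prediction error δ, learning coefficient α, discount rate γ, and the states and reward at t = 1 and t = 2) would read:

```latex
\delta = R(2) + \gamma\, V\bigl(I(2)\bigr) - V\bigl(I(1)\bigr), \qquad
V\bigl(I(1)\bigr) \leftarrow V\bigl(I(1)\bigr) + \alpha\, \delta
```

This is the textbook form and may differ in detail from the formula in the original publication.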

Here, δ is the prediction error, α is the learning coefficient, a positive real number between 0 and 1, and γ is the discount rate, a positive real number between 0 and 1.
The Actor unit 804 takes I as input and outputs A(I), and A(I) is updated as follows.
When δ > 0:

When δ ≤ 0:

Here, σ is the standard deviation of the output: in state I, the Actor adds to A(I) a random number drawn from a distribution with mean 0 and variance σ². In other words, the second movement correction amount is determined randomly, regardless of the result of the trial.
The update rule above is only one example; the Actor-Critic model has many variants, and any commonly used model may be substituted.
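The exploration and update behavior described above can be illustrated with a small tabular sketch. The Actor update formulas are not reproduced in this text, so the rule used here is an assumption: when δ > 0 the stored action A(I) is pulled toward the action that was tried, and when δ ≤ 0 it is left unchanged. The class name `ActorCritic`, the use of discrete state labels, and the scalar action are all hypothetical simplifications.

```python
import random


class ActorCritic:
    """Tabular Actor-Critic over discrete state labels with a scalar action."""

    def __init__(self, alpha=0.5, gamma=0.9, sigma=1.0, seed=0):
        self.alpha, self.gamma, self.sigma = alpha, gamma, sigma
        self.value = {}    # Critic: state -> V(I)
        self.action = {}   # Actor: state -> A(I)
        self.rng = random.Random(seed)

    def explore(self, state):
        """Return A(I) plus Gaussian noise with mean 0 and variance sigma**2."""
        return self.action.get(state, 0.0) + self.rng.gauss(0.0, self.sigma)

    def update(self, state, taken_action, reward, next_state):
        """Apply one temporal-difference update and return delta."""
        v = self.value.get(state, 0.0)
        delta = reward + self.gamma * self.value.get(next_state, 0.0) - v
        self.value[state] = v + self.alpha * delta
        if delta > 0:
            # Assumed rule: move A(I) toward the action that beat the prediction.
            mean = self.action.get(state, 0.0)
            self.action[state] = mean + self.alpha * (taken_action - mean)
        # Assumed rule for delta <= 0: A(I) is left as it is.
        return delta
```

With a binary reward, `delta` stays at or below zero until the first success, which is why exploration in this scheme remains random for so long.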

With the above configuration the Actor unit 804 learns the appropriate action for each state, but it operates as in Embodiment 1 only once learning has completed. During learning, the route setting unit 806 calculates and passes on a recommended action, so during learning the control unit 203 receives the movement signal from the route setting unit 806 as is and controls the drive unit 204 accordingly.
In the conventional Actor-Critic model, R = 1 when the fitting succeeds and R = 0 otherwise, so learning takes place only once a fitting succeeds. Until then, the movement correction amounts used for trials are assigned randomly, and the movement correction amount for the next trial is not chosen according to how badly the previous trial failed. The same holds not only for the conventional Actor-Critic model but also for other reinforcement learning models such as Q-Learning, because they too evaluate only the success or failure of the fitting itself. This embodiment describes a process that evaluates the degree of failure and uses it to determine the movement correction amount for the next trial.

The evaluation unit 805 generates a function that evaluates each fitting trial.
FIG. 10 shows fitting trials of the male connector 110 and the female connector 120 in Embodiment 2.
Suppose, for example, that an image like FIG. 10(A) is obtained as the result of a trial. This trial has failed because the fitting positions of the connectors are far out of alignment. How close the trial came to success is then measured and quantified to obtain an evaluation value indicating the degree of success. One quantification method, shown in FIG. 10(B), is to calculate the surface area (number of pixels) of the insertion-target connector in the image. With this method, when the force sensor 801 of the robot arm 100 detects a failed insertion of the male connector 110 into the female connector 120, obtaining and processing the data from the image is easier if only the fitting surface of the female connector 120 has been painted, or covered with a sticker, in a color different from the rest of the background. The method described so far uses one camera, but several cameras may be arranged to capture images and the results from the individual images combined. Besides the connector surface area, a similar evaluation can be made from, for example, the number of pixels along two-dimensional directions (such as the X and Y directions).
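The pixel-counting evaluation described above can be sketched in plain Python. This is an illustration under assumptions: the image is taken to be a list of rows of (r, g, b) tuples, the marker color and `tolerance` are hypothetical parameters, and the function name `evaluate_visible_surface` does not come from the source.

```python
def evaluate_visible_surface(image, marker_rgb, tolerance=30):
    """Count pixels close to marker_rgb and return (area, width, height).

    image is a list of rows, each row a list of (r, g, b) tuples.
    width and height are the pixel extents of the marked region in X and Y.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # A pixel matches when every channel is within the tolerance.
            if all(abs(p - m) <= tolerance for p, m in zip(pixel, marker_rgb)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return 0, 0, 0
    return len(xs), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
```

The area corresponds to the surface-area measure of FIG. 10(B), and the width and height correspond to the alternative of counting pixels along the X and Y directions.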

The processing of the route setting unit 806 is divided into two steps.
In the first step, it learns from the evaluation results produced by the evaluation unit 805 and the movements the robot actually made. With A denoting the robot's movement correction amount and E the evaluation value indicating the degree of success produced by the evaluation unit 805, the route setting unit 806 prepares and fits a function that takes A as input and gives E as output. One example of such a function is an RBF (Radial Basis Function) network. RBFs are known as functions that can readily approximate a wide variety of unknown functions.
For example, for the k-th input

the output f(x) is defined as follows.

Here, σ is the standard deviation and μ is the center of the RBF.
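The input and output formulas are not reproduced in this text. As an assumption, a common Gaussian RBF network that uses the same symbols (weights w_j, centers μ_j, standard deviation σ) can be written as below; the input dimension D is a label introduced here for illustration.

```latex
x^{k} = \left(x^{k}_{1}, \ldots, x^{k}_{D}\right), \qquad
f(x) = \sum_{j=1}^{J} w_j\, \varphi_j(x), \qquad
\varphi_j(x) = \exp\!\left(-\frac{\lVert x - \mu_j \rVert^{2}}{2\sigma^{2}}\right)
```

The exact normalization inside the exponent may differ in the original publication.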

The data used for RBF learning is not a single sample but all data from the start of the trials up to the latest one. For example, at the N-th trial, N data samples are available. The weights W = (w_1, ..., w_J) above must be determined by learning; various methods are possible, and RBF interpolation as follows is one example.

With the quantities defined in this way, learning is completed by the following.

After the approximation by RBF interpolation is complete, the minimum of the RBF network is found with a general optimization method such as steepest descent or PSO (Particle Swarm Optimization). This minimum is passed to the Actor unit 804 as the next recommended value.
To put the above example concretely: the surface area or the number of pixels along two-dimensional directions corresponding to the movement correction amount of each failed trial is taken as the evaluation value, these values are arranged in time series by trial number, and the resulting sequence is used to find the optimal solution. More simply, a movement correction amount may be obtained by moving at a fixed rate in the direction that reduces the number of pixels along the two-dimensional directions.
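The fit-then-minimize procedure above can be sketched in plain Python. This is a minimal illustration, not the method of the original publication: it assumes Gaussian basis functions with one center per past trial, solves the interpolation system by Gaussian elimination, and uses finite-difference steepest descent. All function names, the step size, and the iteration count are hypothetical.

```python
import math


def gaussian(x, center, sigma):
    """Gaussian radial basis function value for point x and a center."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist2 / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve matrix * w = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (aug[r][n] - tail) / aug[r][r]
    return w


def fit_rbf(actions, evaluations, sigma):
    """RBF interpolation: one center per past trial, weights from Phi w = E."""
    phi = [[gaussian(a, c, sigma) for c in actions] for a in actions]
    return solve(phi, evaluations)


def predict(x, actions, weights, sigma):
    """Evaluate the fitted RBF network at x."""
    return sum(w * gaussian(x, c, sigma) for w, c in zip(weights, actions))


def recommend(actions, weights, sigma, start, step=0.05, iters=200, eps=1e-4):
    """Steepest descent on the fitted surface using central differences."""
    x = list(start)
    for _ in range(iters):
        grad = []
        for i in range(len(x)):
            hi = x[:i] + [x[i] + eps] + x[i + 1:]
            lo = x[:i] + [x[i] - eps] + x[i + 1:]
            grad.append((predict(hi, actions, weights, sigma)
                         - predict(lo, actions, weights, sigma)) / (2 * eps))
        x = [xi - step * g for xi, g in zip(x, grad)]
    return x
```

For example, `fit_rbf([[0.0], [1.0], [2.0]], [4.0, 1.0, 3.0], 1.0)` fits three one-dimensional trials, and `recommend` started from the best trial then searches nearby for a lower predicted evaluation value. Steepest descent only finds a local minimum, which is one reason a population method such as PSO may be preferred.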

The operation flow is shown in FIG. 11.
FIG. 11 is a flowchart of route learning by the position control device in Embodiment 2.
First, in step S1101, the grip portion 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance in the control unit 203 of FIG. 8, and the arm is operated according to a control program registered in advance in the control unit 203.

Next, in step S1102, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance in the control unit 203 of FIG. 8, and the male connector 110 is moved according to a control program registered in advance in the control unit 203. Up to this point the steps are the same as steps S101 to S102 of the flowchart of FIG. 4 in Embodiment 1.

Next, in step S1103, the route determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image showing both the male connector 110 held by the grip portion 101 and the female connector 120 that is the insertion target. The route determination unit 802 also instructs the control unit 203 and the monocular camera 102 to capture images near the current position: at each position to which the drive unit 204 moves the arm according to the plurality of movement values given to the control unit 203, the monocular camera captures an image showing both the male connector 110 and the female connector 120.

Next, in step S1104, the Actor unit 804 of the route determination unit 802 gives the control unit 203 a control amount for fitting, the drive unit 204 moves the robot arm 100, and fitting of the male connector 110 into the female connector 120 is attempted.
Next, in step S1105, if the connectors come into contact while the drive unit 204 is moving the robot arm 100, the evaluation unit 805 and the Critic unit 803 of the route determination unit 802 store the value of the force sensor 801 and the image from the monocular camera 102 for each unit amount of movement.

Then, in step S1106, the evaluation unit 805 and the Critic unit 803 check whether the fitting succeeded.
Normally the fitting does not succeed at this point. Therefore, in step S1108, the evaluation unit 805 evaluates the degree of success by the method described with FIG. 10 and gives the route setting unit 806 an evaluation value indicating the degree of success of the alignment.
In step S1109, the route setting unit 806 performs learning by the method described above and gives the next recommended value to the Actor unit 804, while the Critic unit 803 outputs a value obtained according to the reward, which the Actor unit 804 receives. In step S1110, the Actor unit 804 obtains the movement correction amount by adding the value that the Critic unit 803 obtained according to the reward and the next recommended value output by the route setting unit 806. If using only the next recommended value from the route setting unit 806 is sufficiently effective in this step, the value from the Critic unit 803 need not be added. The Actor unit 804 may also set an addition ratio between the value from the Critic unit 803 and the next recommended value from the route setting unit 806, and vary the movement correction amount according to that ratio.
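The addition in step S1110, including the optional addition ratio, can be sketched as a weighted sum. This is an illustration only: the function name `movement_correction` and the two weight parameters are hypothetical, and the source does not specify how the ratio is represented.

```python
def movement_correction(critic_value, recommended,
                        critic_weight=1.0, route_weight=1.0):
    """Combine the Critic-derived value and the recommended value.

    Both inputs are sequences of the same length, for example
    (dX, dY, dZ, dAx, dAy, dAz). With both weights at 1.0 this is the plain
    addition of step S1110; critic_weight=0.0 uses the recommendation alone.
    """
    if len(critic_value) != len(recommended):
        raise ValueError("inputs must have the same length")
    return [critic_weight * c + route_weight * r
            for c, r in zip(critic_value, recommended)]
```

One possible design is to start with `critic_weight` near zero, relying on the route setting unit early on, and raise it as the Critic unit gathers successful trials; the source leaves this schedule open.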

Then, in step S1111, the Actor unit 804 gives the movement correction amount to the control unit 203 to move the grip portion 101 of the robot arm 100.
The process then returns to step S1103, an image is captured at the position reached by the movement correction amount, and the fitting operation is performed. This is repeated until the fitting succeeds.
When the fitting succeeds, in step S1107 the Actor unit 804 and the Critic unit 803 are trained on the environment state I from steps S1102 to S1106 of the successful trial. Finally, the route determination unit 802 gives the data of this trained neural network to the control parameter generation unit 202, which makes the operation of Embodiment 1 possible.
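The loop from step S1103 to step S1111 can be outlined as follows. This is a schematic sketch with a scalar position: the three callables are hypothetical stand-ins for the robot, the evaluation unit 805, and the combination of the route setting unit 806 and the Actor unit 804, and the `max_trials` cap is an assumption added so that the sketch always terminates.

```python
def learn_route(attempt_fitting, evaluate, next_correction, start, max_trials=100):
    """Repeat fitting trials until one succeeds; return (position, history).

    attempt_fitting(position) -> bool     stands in for steps S1104 to S1106
    evaluate(position) -> float           stands in for step S1108
    next_correction(history) -> float     stands in for steps S1109 and S1110
    """
    position = start
    history = []  # (position, evaluation value) of every failed trial
    for _ in range(max_trials):
        if attempt_fitting(position):
            return position, history          # S1107 would train on this trial
        history.append((position, evaluate(position)))
        position += next_correction(history)  # S1111: move the grip portion
    raise RuntimeError("fitting did not succeed within max_trials")
```

The returned `history` holds the failed trials, which is the data an RBF fit or a later training pass over all trials would consume.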

In step S1107 above, the Actor unit 804 and the Critic unit 803 are trained on I for the successful fitting, but they may instead be trained on the data from all trials, from the start of the fitting trials to the success. Embodiment 1 described forming a plurality of neural networks according to the control amount; once the position of a successful fitting is known, the distance to that position can be used to form, at the same time, a plurality of neural networks suited to the magnitude of the control amount.

The reinforcement learning module has been described on the basis of the Actor-Critic model, but other reinforcement learning models such as Q-Learning may be used.
An RBF network was given as the function approximator, but other function approximation methods (linear, quadratic, and so on) may be used.
Coloring the connector surface differently was given as the evaluation method, but the evaluation may instead use other image processing techniques, for example to measure the misalignment between the connectors.

As stated in Embodiment 1 and in this embodiment, the technique is not limited to connector fitting. For example, it can be applied to mounting an IC on a board, and the same method is also effective when a component such as a capacitor, whose leads have large dimensional errors, is inserted into holes in a board.
Furthermore, the technique is not limited to insertion into a board; it can be used for position control in general in which a control amount is obtained from the relationship between an image and the control amount. By having a neural network learn the relationship between images and control amounts, the invention absorbs the individual differences that arise when two objects are aligned, and the control amount can be calculated more accurately.

Therefore, in this embodiment, when the Actor-Critic model is used to learn the control amount, the Actor unit 804 obtains the movement correction amount for each trial by adding the value that the Critic unit 803 obtained according to the reward and the recommended value that the route setting unit 806 obtained from the evaluation value. A conventional Actor-Critic model needs a very large number of trial-and-error attempts before alignment succeeds, whereas the present invention can greatly reduce the number of alignment trials.

This embodiment has described reducing the number of alignment trials by evaluating the image from the imaging unit 201 when alignment fails, but the number of trials can also be reduced by using the value of the force sensor 801 during an alignment trial. For example, in alignment that involves fitting a connector or inserting one object into another, a failure is typically handled as follows: when the value of the force sensor 801 reaches or exceeds a threshold, the Actor unit 804 judges whether the two objects are at the position where fitting or insertion is complete. Two cases can then arise:
a. Fitting or insertion was still in progress when the threshold was reached.
b. Fitting or insertion was completed, but the value of the force sensor 801 during fitting or insertion reached a certain level.
In case a, both the value of the force sensor 801 and the image can be learned; the method described in Embodiment 3 can be used for this.
In case b, the method described in Embodiment 3 can likewise be used, learning from the value of the force sensor 801 alone. Alternatively, the same effect can be obtained by defining the reward R of the Actor-Critic model as R = (1 - A/F) on success and R = 0 on failure, where F is the maximum load applied during fitting or insertion and A is a positive constant.
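The reward definition for case b can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the function name `insertion_reward` and its argument names are hypothetical, and the formula R = 1 - A/F is reproduced exactly as stated in the text (with this form, R approaches 1 as F grows, so A has to be chosen with that in mind).

```python
def insertion_reward(success: bool, max_load: float, a: float) -> float:
    """Reward R for one fitting/insertion trial, as stated in the text.

    success  : whether fitting or insertion succeeded
    max_load : F, the maximum load measured during the trial (assumed > 0)
    a        : A, a positive constant
    """
    if a <= 0.0:
        raise ValueError("A must be a positive constant")
    if not success:
        return 0.0  # failure: R = 0
    if max_load <= 0.0:
        raise ValueError("F must be positive to evaluate 1 - A/F")
    return 1.0 - a / max_load  # success: R = 1 - A/F
```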

Embodiment 3.
This embodiment describes a method for collecting data efficiently in the learning process carried out after alignment has succeeded in Embodiment 2. Anything not specifically described is the same as in Embodiment 2. That is, the functional configuration diagram of the position control device in Embodiment 3 is FIG. 8, and the hardware configuration diagram is FIG. 9.

In terms of operation, the following describes a method for collecting learning data more efficiently during step S1107 of FIG. 11 in Embodiment 2.

FIG. 12 shows a flowchart of the path learning of the position control device in Embodiment 3.
First, in step S1201, when the fitting of the male connector 110 and the female connector 120 has succeeded in step S1107 of FIG. 11, the path setting unit 806 initializes the variables as i = 0, j = 1, k = 1. The variable i is the number of learning iterations of the robot arm 100 from this point on, the variable k is the number of learning iterations since the male connector 110 and the female connector 120 came apart, and the variable j is the loop count in the flowchart of FIG. 12.

Next, in step S1202, the path setting unit 806 gives the control unit 203, via the Actor unit 804, a movement amount that retracts by 1 mm from the movement amount given for fitting in step S1104 of FIG. 11, and the drive unit 204 moves the robot arm 100. The variable i is then incremented by 1. The retraction is given here as 1 mm, but it need not be 1 mm; another unit amount such as 0.5 mm or 2 mm may be used.

Next, in step S1203, the path setting unit 806 stores the coordinates at that time as O(i) (at this point i = 1).
In step S1204, the path setting unit 806 randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i) and gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100. The maximum magnitude of this control amount can be set arbitrarily within the range in which movement is possible.

Next, in step S1205, at the position reached in step S1204, the Actor unit 804 collects the value of the force sensor 801 corresponding to the movement amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). In step S1206, the Critic unit 803 and the Actor unit 804 record, as learning data, the movement amount multiplied by -1, that is (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz), together with the value of the force sensor 801, which measures the force applied in holding the male connector 110.

Next, in step S1207, the path setting unit 806 determines whether the number of collected data items has reached the specified number J. If not, it increments the variable j by 1 in step S1208 and returns to step S1204, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by a random number and data is acquired again. Steps S1204 to S1207 are repeated until J data items have accumulated.
Once the specified number of data items has accumulated, the path setting unit 806 sets the variable j to 1 in step S1209 and then, in step S1210, checks whether the male connector 110 and the female connector 120 have come apart.

If they have not come apart, the process returns to step S1202 via step S1211.
In step S1211, the path setting unit 806 gives the control unit 203, via the Actor unit 804, a control amount that returns the robot arm 100 to the coordinates O(i) it had before the control amount was applied, and the drive unit 204 moves the robot arm 100.
The loop from step S1202 to step S1210 then repeats two operations until the male connector 110 and the female connector 120 come apart: retracting by 1 mm, or by the unit amount, from the control amount given for fitting, and applying control amounts centered on the retracted position to collect data from the force sensor 801. When the male connector 110 and the female connector 120 have come apart, the process proceeds to step S1212.

In step S1212, the path setting unit 806 sets the variable i to I (I is an integer greater than the value of i at the time the male connector 110 and the female connector 120 were judged to have come apart). It also gives the control unit 203, via the Actor unit 804, a control amount that retracts by, for example, 10 mm (another value may be used here as well) from the movement amount given for fitting, and the drive unit 204 moves the robot arm 100.

Next, in step S1213, the path setting unit 806 stores the coordinates of the robot arm 100 after the move in step S1212 as the center position O(i+k).
Next, in step S1214, the path setting unit 806 again randomly determines a control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) centered on O(i+k) and gives it to the control unit 203 via the Actor unit 804, and the drive unit 204 moves the robot arm 100.

In step S1215, the Critic unit 803 and the Actor unit 804 acquire the image captured by the imaging unit 201 of the monocular camera 102 at the position of the robot arm 100 after it has moved by the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz).
In step S1216, the Critic unit 803 and the Actor unit 804 record the movement amount multiplied by -1, that is (-ΔX, -ΔY, -ΔZ, -ΔAx, -ΔAy, -ΔAz), together with the image as one item of learning data.

In step S1217, the path setting unit 806 determines whether the number of collected data items has reached the specified number J. If not, it increments the variable j by 1 in step S1212 and returns to step S1214, where the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) is changed by a random number and data is acquired again. Steps S1214 to S1217 are repeated until J data items have accumulated.
The maximum random value of the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) in S1204 and that in S1214 may differ.
The learning data acquired by the above method is used to train the Actor unit 804 and the Critic unit 803.
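The collection procedure of FIG. 12 can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the implementation of the invention: `collect_around`, `collect_on_retract_path`, `observe`, and `is_fitted` are hypothetical names, the robot motion is reduced to arithmetic on a six-element pose, and retraction is assumed to be along the Z axis.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pose = Tuple[float, ...]          # (X, Y, Z, Ax, Ay, Az)
Sample = Tuple[object, Pose]      # (observation, control amount back to the center)


def collect_around(center: Pose, max_abs: Sequence[float], n_samples: int,
                   observe: Callable[[Pose], object],
                   rng: random.Random) -> List[Sample]:
    """Perturb randomly around a center and label each sample with -delta."""
    samples: List[Sample] = []
    for _ in range(n_samples):
        delta = tuple(rng.uniform(-m, m) for m in max_abs)
        pose = tuple(c + d for c, d in zip(center, delta))
        observation = observe(pose)       # force value or image at the moved pose
        label = tuple(-d for d in delta)  # movement amount multiplied by -1
        samples.append((observation, label))
    return samples


def collect_on_retract_path(fitted_pose: Pose, retract_step: float,
                            max_abs: Sequence[float], n_samples: int,
                            observe: Callable[[Pose], object],
                            is_fitted: Callable[[Pose], bool],
                            rng: random.Random,
                            max_steps: int = 100) -> List[Sample]:
    """Retract step by step and sample around each center until the parts come apart."""
    data: List[Sample] = []
    center = list(fitted_pose)
    for _ in range(max_steps):
        center[2] += retract_step         # assumption: retraction is along +Z
        data.extend(collect_around(tuple(center), max_abs, n_samples, observe, rng))
        if not is_fitted(tuple(center)):  # stop once the connectors are apart
            break
    return data
```

In a real system `observe` would read the force sensor 801 while the connectors are fitted and the camera image once they are apart; here it is only a callback supplied by the caller.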

FIG. 13 shows an example of the neural network in Embodiment 3 and of its learning rule.
Embodiments 1 and 2 did not describe a learning method that uses the data of the force sensor 801. In Embodiments 1 and 2 the input layer took only the image, whereas in Embodiment 3 the value of the force sensor 801 is fed to the input layer in place of the image. The force sensor 801 may provide three values (a force and moments in two directions) or six values (forces in three directions and moments in three directions). The output layer outputs the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz). When the male connector 110 and the female connector 120 have come apart, the image and the value of the force sensor 801 are input to the input layer together.
In the learning process of the neural network, the parameters of the intermediate layer are optimized so that the output value of the output layer, obtained through the intermediate layer from the input image and the value of the force sensor 801, approximates the control amount stored as a set with that image and force sensor value. This is how the network is trained.
Finally, the route determination unit 802 gives the data of this trained neural network to the control parameter generation unit 202, which enables the operation described in Embodiment 1.

In this embodiment, the robot arm 100 is moved slightly around its position for learning while being retracted little by little from the movement made to fit the male connector 110 and the female connector 120. The description so far has assumed that, until the connectors come apart, the image from the monocular camera 102 may not support adequate learning, depending on its pixel count.
However, if the image from the monocular camera 102 is of sufficiently high resolution that learning is possible even from images taken while the robot arm 100 is moved slightly around its position, learning may use the image from the monocular camera 102 alone. Alternatively, both the image from the monocular camera 102 and the value of the force sensor 801 may be used even while the male connector 110 and the female connector 120 are fitted.

Furthermore, Embodiments 1 and 2 described cases in which multiple neural networks are used. In this embodiment too, separate neural networks may be used for the state in which the male connector 110 and the female connector 120 are fitted and the state in which they are not. As explained above, learning is more accurate when the input layer is formed from the force sensor 801 alone while the connectors are fitted and from the image alone once they have come apart. Even when learning from images only, distinguishing the fitted and unfitted cases allows accurate learning, because the composition of the image differs between the two.
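Switching networks by fitting state can be sketched as below. This is only an illustration under assumptions: the names `select_input` and `predict_control`, the state keys "fitted" and "apart", and the representation of a network as a plain callable are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Network = Callable[[List[float]], List[float]]


def select_input(fitted: bool, force: Sequence[float],
                 image: Sequence[float]) -> Tuple[str, List[float]]:
    """Use the force sensor value while fitted, the image once apart."""
    if fitted:
        return "fitted", list(force)
    return "apart", list(image)


def predict_control(networks: Dict[str, Network], fitted: bool,
                    force: Sequence[float], image: Sequence[float]) -> List[float]:
    """Run the network that was trained for the current fitting state."""
    name, x = select_input(fitted, force, image)
    return networks[name](x)
```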

As stated in Embodiments 1 and 2, in this embodiment too the technique is not limited to connector fitting. It also applies, for example, to mounting an IC on a board, and the same method is particularly effective when a component whose leads have large dimensional errors, such as a capacitor, is inserted into holes in a board.
Nor is it limited to insertion into a board: it can be used for position control in general wherever a control amount is obtained from the relationship between an image and the control amount. Because a neural network learns the relationship between the image and the control amount, the invention can absorb the individual differences that arise when two objects are aligned, and so calculates the control amount more accurately.

Therefore, in this embodiment, for alignment of two objects that involves insertion, the device comprises a path setting unit 806 that, in order to learn the control amount, instructs control amounts that move the object along the path from the inserted state and around that path when it is withdrawn from the inserted state, and an Actor unit 804 that acquires the moved position and the value of the force sensor 801 so that learning is performed with the moved position as the output layer and the value of the force sensor 801 at that position as the input layer. Learning data can therefore be collected efficiently.

Embodiment 4.
This embodiment describes a method for performing safe control even during the learning process of Embodiment 2, particularly in the early stage of learning. The hardware configuration diagram of the position control device in Embodiment 4 is FIG. 9, the same as in Embodiment 2.

FIG. 14 shows the functional configuration diagram of the position control device in Embodiment 4. It differs from FIG. 8 in that a control parameter adjustment unit 1401 is added, which consists of a trajectory generation unit 1402, a coordinate conversion unit 1403, a gravity correction unit 1404, a compliant motion control unit 1405, and a synthesis unit 1406.

The configuration of the route determination unit 802 follows Embodiment 2: an Actor-Critic model is described as the reinforcement learning module, and the evaluation unit 805 and the path setting unit 806 as the module that evaluates the degree of success. Other reinforcement learning models such as Q-Learning or DDPG may be used instead. Moreover, as far as the essence of this embodiment is concerned, learning data can be collected while preventing excessive load on the objects even without the evaluation unit 805 or the path setting unit 806, and even if the learning function and the robot position control function are separate.

Next, FIG. 14 is described in detail.
In Embodiment 2, the control amount generated by the control parameter generation unit 202 was output to the control unit 203, which determined and controlled the current and voltage values for each device of the drive unit 204. With this method, however, the control amount can be inappropriate, especially early in learning, so that the drive unit 204 stops with an error or the surroundings, such as the robot arm, the male connector 110, and the female connector 120, are damaged. In addition, if the male connector 110 and the female connector 120 are weaker than expected, they and their surroundings may be damaged even when the control amount is set sufficiently small during learning. These problems can arise because the side that sets the control amount and the side that performs control based on it are independent. This embodiment therefore introduces a mechanism that keeps the load on the surroundings from becoming excessive even during learning.

The trajectory generation unit 1402 takes the control amount (ΔX, ΔY, ΔZ, ΔAx, ΔAy, ΔAz) generated by the control parameter generation unit 202 as the target position and outputs a periodic control amount (ΔX', ΔY', ΔZ', ΔAx', ΔAy', ΔAz'), adjusted so that velocity and acceleration vary smoothly, in step with the control cycle of the robot arm 100, that is, the control cycle of the control unit 203. The control amount as a target position is defined as an amount reached over multiple control cycles, whereas the periodic control amount is essentially the amount set for each single cycle in order to reach that control amount. It takes the load on the surroundings into account and can also respond to unexpected loads detected by the force sensor 801. The method of adjusting the periodic control amount is described later.
The force sensor 801 measures the load on the gripping unit 101 of the robot arm 100; for example, it can measure the force when the male connector 110 and the female connector 120 of FIG. 1 come into contact. In Embodiment 2, the motions output during trials early in learning can apply excessive force to the surroundings and damage the robot arm 100, the male connector 110, the female connector 120, and other surroundings. In Embodiment 4, the compliant motion control unit 1405 is therefore placed downstream of the control parameter generation unit 202 so that the arm yields to the external force acquired by the force sensor 801, which prevents excessive force from being applied to the robot arm 100, the male connector 110, the female connector 120, and other surroundings. This allows the trials needed for learning to be carried out safely.
The force sensor 801 may provide three values (a force and moments in two directions) or six values (forces in three directions and moments in three directions). With six values, the value of the force sensor 801 can be written (Fx, Fy, Fz, Tx, Ty, Tz). Because these values are obtained in the coordinate system of the force sensor, the coordinate conversion unit 1403 converts the value of the force sensor 801 into the coordinate system of the robot arm 100 as a whole when the two coordinate systems differ.
The value measured by the force sensor 801 is affected by gravity. The gravity correction unit 1404 removes the effect of gravity from the measured value.
The compliant motion control unit 1405 acquires the value of the force sensor 801 corrected by the coordinate conversion unit 1403 and the gravity correction unit 1404 and, in accordance with physical laws, outputs a control amount adapted to the detected external force. The method of adjusting this control amount is described later.
The synthesis unit 1406 combines the control amount output by the trajectory generation unit 1402 with the control amount output by the compliant motion control unit 1405 and outputs the result to the control unit 203. The two are combined by addition. Alternatively, an addition ratio may be set and a weighted addition performed according to that ratio.
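The signal path through the coordinate conversion unit 1403, the gravity correction unit 1404, and the synthesis unit 1406 might be sketched as follows. This is a simplified illustration under assumptions, not the device's actual processing: all function names are hypothetical, the frame change is treated as a pure rotation (any offset between the frames is ignored), and gravity is removed from the force part only, using a known payload weight.

```python
from typing import Sequence, Tuple

Vec = Tuple[float, ...]


def rotate(r: Sequence[Sequence[float]], v: Sequence[float]) -> Vec:
    """Multiply a 3x3 rotation matrix by a 3-vector."""
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


def to_arm_frame(wrench: Sequence[float],
                 r_arm_from_sensor: Sequence[Sequence[float]]) -> Vec:
    """Rotate (Fx, Fy, Fz, Tx, Ty, Tz) from the sensor frame into the arm frame."""
    return rotate(r_arm_from_sensor, wrench[:3]) + rotate(r_arm_from_sensor, wrench[3:6])


def remove_gravity(wrench_arm: Sequence[float], payload_weight: float,
                   down: Sequence[float] = (0.0, 0.0, -1.0)) -> Vec:
    """Subtract the payload weight from the force part; torque is left unchanged."""
    force = tuple(f - payload_weight * g for f, g in zip(wrench_arm[:3], down))
    return force + tuple(wrench_arm[3:6])


def combine(trajectory_cmd: Sequence[float], compliance_cmd: Sequence[float],
            w_traj: float = 1.0, w_comp: float = 1.0) -> Vec:
    """Plain addition by default; the weights give the weighted addition."""
    return tuple(w_traj * a + w_comp * b for a, b in zip(trajectory_cmd, compliance_cmd))
```

With both weights left at 1.0, `combine` is the simple addition; setting the weights corresponds to the addition ratio mentioned in the text.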

The trajectory generation unit 1402 takes as given the control cycle, maximum velocity, maximum acceleration, and maximum jerk of the robot arm 100, and calculates the periodic control amount for each control cycle so that at least one of these limits is not exceeded.
For example, as in the non-patent literature (KROGER, Torsten; PADIAL, Jose. Simple and robust visual servo control of robot arms using an on-line trajectory generator. In: Robotics and Automation (ICRA), 2012 IEEE International Conference on. IEEE, 2012. p. 4862-4869.), the periodic control amount can be calculated so that all of the following conditions are satisfied.
The following constants are given constants corresponding to the specifications of the robot arm 100.

Tcycle: control cycle
Vmax: maximum velocity
Amax: maximum acceleration
Jmax: maximum jerk

Here, xi, vi, αi, and ji are the following variables.
xi: current position at step i
vi: current velocity at step i
αi: current acceleration at step i
ji: current jerk at step i
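One way to make the role of Tcycle, Vmax, Amax, and Jmax concrete is a check that a sequence of per-cycle positions respects all three limits. The sketch below is an assumption-laden illustration: it is not the method of the cited literature, the function names are hypothetical, and velocity, acceleration, and jerk are approximated by backward finite differences over one control cycle.

```python
from typing import List, Sequence


def finite_differences(values: Sequence[float], t_cycle: float) -> List[float]:
    """Backward differences divided by the control cycle."""
    return [(b - a) / t_cycle for a, b in zip(values, values[1:])]


def within_limits(positions: Sequence[float], t_cycle: float,
                  v_max: float, a_max: float, j_max: float) -> bool:
    """True if every per-cycle velocity, acceleration, and jerk stays within its limit."""
    v = finite_differences(positions, t_cycle)
    a = finite_differences(v, t_cycle)
    j = finite_differences(a, t_cycle)
    return (all(abs(x) <= v_max for x in v)
            and all(abs(x) <= a_max for x in a)
            and all(abs(x) <= j_max for x in j))
```

A trajectory generator would search for the next xi that keeps `within_limits` true while approaching the target; the check itself only validates a candidate sequence.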

The method by which the compliant motion control unit 1405 adjusts the control amount adapted to the external force is as follows. Compliant motion control takes as given the coefficients representing the stability and stiffness of the environment and calculates the compliant motion from the external force information. For example, when the external force is f(t), the compliant motion Δx(t) can be calculated by solving the following differential equation.

Here, m, d, and k are coefficients representing the stability and stiffness of the environment.
m: gravity constant
d: resistance
k: spring constant
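The differential equation itself is not reproduced in this text. As a hedged sketch, the code below assumes the common second-order form m·Δx''(t) + d·Δx'(t) + k·Δx(t) = f(t), with m acting as an inertia-like coefficient, and integrates it numerically with a semi-implicit Euler step. The function name and the choice of integrator are assumptions for illustration only.

```python
from typing import List, Sequence


def compliant_displacement(forces: Sequence[float], dt: float,
                           m: float, d: float, k: float) -> List[float]:
    """Integrate m*x'' + d*x' + k*x = f(t) (assumed form) from rest.

    forces : external force sampled once per time step
    dt     : time step
    Returns the displacement after each step.
    """
    x, v = 0.0, 0.0
    out: List[float] = []
    for f in forces:
        acc = (f - d * v - k * x) / m
        v += acc * dt  # semi-implicit Euler: update velocity first
        x += v * dt    # then position with the new velocity
        out.append(x)
    return out
```

Under a constant force the displacement settles at f/k, which is the spring-like yielding that keeps the contact force from growing without bound.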

Next, the operation flow is shown in FIG. 15.
FIG. 15 is a flowchart of the path learning of the position control device in Embodiment 4.
First, in step S1501, the gripping unit 101 of the robot arm 100 grips the male connector 110. The position and orientation of the male connector 110 are registered in advance on the control unit 203 side of FIG. 14, and the arm is operated based on a control program registered in advance on the control unit 203 side.

Next, in step S1502, the robot arm 100 is brought close to the insertion position of the female connector 120. The approximate position and orientation of the female connector 120 are registered in advance on the control unit 203 side of FIG. 14, and the male connector 110 is moved based on a control program registered in advance on the control unit 203 side. Up to this point the procedure is the same as steps S101 to S102 of the flowchart of FIG. 4 in Embodiment 1.

Next, in step S1503, the route determination unit 802 instructs the imaging unit 201 of the monocular camera 102 to capture an image, and the monocular camera 102 captures an image showing both the male connector 110 held by the gripping unit 101 and the female connector 120 into which it is to be inserted. The evaluation unit 805 and the Critic unit 803 of the route determination unit 802 store the image from the monocular camera 102.
Further, in step S1504, the route determination unit 802 instructs the force sensor 801 to acquire the external force, and the force sensor 801 acquires the external force at the current position. At the same time, the evaluation unit 805 and the Critic unit 803 of the route determination unit 802 store the value of the force sensor 801.

Next, in step S1505, the Actor unit 804 of the route determination unit 802 calculates the control amount for fitting and gives it to the trajectory generation unit 1402.

Next, in step S1506, the trajectory generation unit 1402 calculates a new control amount adjusted so that velocity and acceleration vary smoothly. Specifically, it calculates the periodic control amount, that is, the target position xi for each control cycle, that satisfies the given constants Tcycle, Vmax, Amax, and Jmax corresponding to the specifications of the robot arm 100 described above.

In step S1507, the coordinate conversion unit 1403 converts the value of the force sensor 801 acquired in step S1504 into the coordinate system of the robot arm 100 as a whole.

Next, in step S1508, the gravity correction unit 1404 removes the effect of gravity from the value of the force sensor 801 converted in step S1507 and gives it to the compliant motion control unit 1405.

Next, in step S1509, the compliant motion control unit 1405 calculates a control amount adapted to the external force from the gravity-corrected value of the force sensor 801 obtained in step S1508 and gives it to the synthesis unit 1406. The control amount adapted to the external force is calculated by the method described above so that, for example, the value of the force sensor 801 becomes smaller.

Next, in step S1510, the synthesis unit 1406 combines the periodic control amount calculated in step S1506 and the compliant motion control amount calculated in step S1509 by addition or weighted addition, and supplies the result to the control unit 203 as the periodic control amount adjustment value.
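The addition or weighted addition performed by the synthesis unit 1406 amounts to a per-axis weighted sum. A minimal sketch follows; the weight parameters and their default of 1.0 (plain addition) are hypothetical, since the text gives no weight values.

```python
def synthesize(periodic, compliant, w_periodic=1.0, w_compliant=1.0):
    """Combine the two control amounts axis by axis into the adjustment value."""
    return [w_periodic * p + w_compliant * c for p, c in zip(periodic, compliant)]


# Plain addition, then a weighted variant that halves the compliance contribution.
plain = synthesize([1.0, 0.0, 0.0], [-0.25, 0.0, 0.0])
weighted = synthesize([1.0, 0.0, 0.0], [-0.25, 0.0, 0.0], w_compliant=0.5)
```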

Next, in step S1511, the drive unit 204 moves the robot arm 100 and connector insertion is attempted. The control parameter adjustment unit 1401 checks whether the periodic control amount adjustment value has reached the control amount generated by the control parameter generation unit 202, and if it has not, the process returns to step S1504. The operations from step S1504 to step S1511 can thus be repeated in every control cycle.
Furthermore, even if the male connector 110 and the female connector 120 come into contact before the control amount generated by the control parameter generation unit 202 is reached, the rise in the value of the force sensor 801 is detected in every control cycle and fed back through the periodic control amount adjustment value, so the possibility of damaging the surrounding environment can be reduced even in the early stage of learning.
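The effect of repeating steps S1504 to S1511 every control cycle can be illustrated with a toy one-axis simulation. Everything in it is an assumption made for illustration: the contact is modeled as a linear spring starting at a wall position, the periodic control amount is reduced to a clamped step toward the target, and the units (mm, N) and values are hypothetical.

```python
def clamp(value, low, high):
    return max(low, min(high, value))


def external_force(x, wall, stiffness):
    """Toy contact model: reaction force on the gripper, opposing penetration past the wall."""
    return -stiffness * max(0.0, x - wall)


def run_cycles(target, wall, stiffness, step_max, gain, max_cycles=200):
    """Repeat the per-cycle loop; return (final position, peak |force|, target reached)."""
    x = 0.0
    peak = 0.0
    for _ in range(max_cycles):
        force = external_force(x, wall, stiffness)           # S1504: read the force sensor
        peak = max(peak, abs(force))
        if abs(target - x) < 1e-9:                           # control amount reached
            return x, peak, True
        step_traj = clamp(target - x, -step_max, step_max)   # S1506: periodic control amount
        step_comp = gain * force                             # S1509: yield to the external force
        x += step_traj + step_comp                           # S1510, S1511: synthesize and move
    return x, peak, False


# Target 10 mm, obstruction at 6 mm, stiffness 10 N/mm, 0.5 mm per cycle.
with_compliance = run_cycles(10.0, 6.0, 10.0, 0.5, gain=0.05)
without_compliance = run_cycles(10.0, 6.0, 10.0, 0.5, gain=0.0)
```

Without the compliance term the simulated arm pushes on to the target and the contact force reaches 40 N. With it, the arm settles about 1 mm past the contact point at roughly 10 N and the target is not reached, which is the behavior that keeps a poorly chosen control amount from damaging the environment early in learning.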

Next, in step S1512, the evaluation unit 805 and the Critic unit 803 check whether the fitting has succeeded, and at the same time the neural network parameters of the Actor unit 804 and the Critic unit 803 are updated from the values of the monocular camera 102 and the force sensor 801 stored in step S1503 and the control parameter values calculated in step S1504.
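For orientation, a generic one-step actor-critic update is sketched below. It is a textbook form with a linear critic and a linear Gaussian policy, not the neural network update of the Actor unit 804 and the Critic unit 803, whose details are not given here. The feature vector stands in for the stored image and force sensor values, the action for the control amount, and the reward for the fitting result; all names, learning rates, and values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def actor_critic_step(theta, w, state, action, reward, next_state, done,
                      gamma=0.99, lr_actor=0.01, lr_critic=0.1, sigma=1.0):
    """Return updated (actor weights, critic weights, TD error) for one transition."""
    value = dot(w, state)
    next_value = 0.0 if done else dot(w, next_state)
    td_error = reward + gamma * next_value - value
    # Critic: move the value estimate toward the one-step target.
    new_w = [wi + lr_critic * td_error * si for wi, si in zip(w, state)]
    # Actor: Gaussian policy with mean theta . state; follow the score function scaled by the TD error.
    score = (action - dot(theta, state)) / sigma**2
    new_theta = [ti + lr_actor * td_error * score * si for ti, si in zip(theta, state)]
    return new_theta, new_w, td_error


# One successful trial (reward 1.0) from a two-feature state.
theta, w, delta = actor_critic_step([0.0, 0.0], [0.0, 0.0], [1.0, 2.0], 0.5,
                                    reward=1.0, next_state=[1.0, 2.0], done=True)
```

A positive TD error shifts the policy mean toward the action that was tried, so control amounts that led to a successful fit become more likely.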

If the fitting has not succeeded in step S1513, the position of the robot arm 100 is left unchanged and the process returns to step S1503 for the next trial.
Although not shown in FIG. 15, the same effect as in Embodiment 2 can be obtained by performing, before returning to step S1503, steps S1108 and S1109 of FIG. 11 of Embodiment 2 and the evaluation by the evaluation method of Embodiment 2 shown in FIG. 10.

If the fitting has succeeded in step S1513, the fitting task itself ends. To continue training the Actor unit 804 and the Critic unit 803 further, the process returns to step S1501 and retries from the gripping of the male connector 110, which makes it possible to improve the learning accuracy.

Although the reinforcement learning module has been described on the basis of the Actor-Critic model, other reinforcement learning models such as DDPG may be used.

Therefore, in the present embodiment, for alignment of two objects that involves insertion, the device includes the route determination unit 802, which instructs a control amount for the insertion on the basis of the image acquired from the imaging unit 201 and the value of the force sensor 801 and learns from the result of the alignment, and the control parameter adjustment unit 1401, which outputs a periodic control amount adjustment value on the basis of the periodic control amount set for each control cycle in order to reach the control amount and the control amount adapted to the external force based on the value of the force sensor 801. The robot arm 100 is thus operated by a control amount obtained by adding the trajectory generation control and the compliant motion control using the force sensor 801. An ordinary reinforcement learning model requires trial and error until learning converges and may therefore damage the environment, whereas the present invention allows trials to be performed safely even in the early stage of learning.

In the present embodiment, the description has been limited to the function of the control parameter adjustment unit 1401. However, the function of the control parameter adjustment unit 1401 can also be added to the configurations described in Embodiments 2 and 3, which still operate as described in those embodiments, making it possible to improve the learning speed safely.

100: Robot arm
101: Gripping unit
102: Monocular camera
110: Male connector
120: Female connector
201: Imaging unit
202: Control parameter generation unit
203: Control unit
204: Drive unit
301: Input/output interface
302: Processor
303: Memory
304: Control circuit
305: Motor
801: Force sensor
802: Route determination unit
803: Critic unit
804: Actor unit
805: Evaluation unit
806: Route setting unit
1401: Control parameter adjustment unit
1402: Trajectory generation unit
1403: Coordinate conversion unit
1404: Gravity correction unit
1405: Compliant motion control unit
1406: Synthesis unit

Claims

A position control device comprising:
a route determination unit that, when alignment involving insertion of one of two objects into the other object is performed, instructs a control amount of the position of a robot arm for the insertion on the basis of an image, acquired from an imaging unit, in which the one object gripped by a gripping unit of the robot arm and the other object are captured, and a value of a force sensor that measures a load applied to the gripping unit, and that learns, from a result of the alignment, the control amount of the position of the robot arm with which the alignment succeeds from the image and the value of the force sensor; and
a control parameter adjustment unit that, in the course of the learning, receives the control amount of the position of the robot arm from the route determination unit and outputs a periodic control amount adjustment value to a control unit on the basis of a periodic control amount set for each control cycle of the control unit in order to reach the control amount of the position of the robot arm, and a control amount adapted to an external force based on the value of the force sensor corresponding to that control cycle.

The position control device according to claim 1, wherein the periodic control amount is set for each cycle in consideration of any of a maximum speed, a maximum acceleration, and a maximum jerk for reaching the control amount, and the control amount adapted to the external force is determined according to a value obtained by removing a gravity component from the value of the force sensor.

The position control device described above, comprising: a control unit that, using the periodic control amount adjustment value, controls a current or voltage for controlling the positional relationship between the two objects; and a drive unit that, using the current or voltage for controlling the positional relationship between the two objects, moves the position of one of the two objects, wherein the force sensor acquires the force applied while the positional relationship between the two objects is held.

The position control device according to claim 1, wherein the route determination unit comprises: a route setting unit that, when extraction from the inserted state is performed, instructs movement amounts for moving onto and around the path from the inserted state; and an Actor unit that acquires the values of the moved positions and the values of the force sensor in order to perform learning with the data of the moved positions as an output layer and the values of the force sensor at the moved positions as an input layer.

The position control device according to claim 4, wherein the Actor unit performs learning from the input layer and the output layer using an Actor-Critic model.

The position control device according to claim 5, wherein the Actor unit trains a plurality of neural networks, one of the plurality of neural networks uses, for learning, data of positions at which the two objects are in an inserted positional relationship, and another uses, for learning, data of positions at which the two objects are not in an inserted positional relationship.

The position control device according to claim 6, wherein the Actor unit uses the value of the force sensor for the data of positions at which the two objects are in an inserted positional relationship, and uses image data for the data of positions at which the two objects are not in an inserted positional relationship.

A position control method for two objects, comprising:
when alignment involving insertion of one of the two objects into the other object is performed, outputting a control amount of the position of a robot arm for the insertion on the basis of an image in which the one object gripped by a gripping unit of the robot arm and the other object are captured, and a value of a force sensor that measures a load applied to the gripping unit;
in the course of learning, receiving the control amount of the position of the robot arm and outputting, to a control unit, a periodic control amount adjustment value based on a periodic control amount of the control unit that can be reached in one control cycle toward the control amount of the position of the robot arm, and a control amount adapted to an external force based on the value of the force sensor corresponding to the one control cycle; and
learning, from a result of the alignment, the control amount of the position of the robot arm with which the alignment succeeds from the image and the value of the force sensor.