JP2019162712A

JP2019162712A - Control device, machine learning device and system

Info

Publication number: JP2019162712A
Application number: JP2019001285A
Authority: JP
Inventors: 祐一朗木山; Yuichiro Kiyama
Original assignee: Fanuc Corp
Current assignee: Fanuc Corp
Priority date: 2018-03-20
Filing date: 2019-01-08
Publication date: 2019-09-26
Also published as: CN110303492A

Abstract

To provide a control device and a machine learning device which can optimize polishing quality. SOLUTION: A control device 1 controlling a robot which polishes a workpiece comprises a machine learning device 100 that learns a polishing condition for performing polishing. The machine learning device 100 comprises: a state observing part 106 that observes characteristics of a surface state of the polished workpiece and the polishing condition as state variables representing a current state of an environment; a determination data acquiring part 108 that acquires determination data indicating an evaluated result of the surface state of the polished workpiece; and a learning part 110 that uses the state variables and the determination data to associate the characteristics of the surface state of the polished workpiece and the polishing condition with each other and learns them. SELECTED DRAWING: Figure 2

Description

The present invention relates to a control device, a machine learning device, and a system, and more particularly to a control device, a machine learning device, and a system that optimize polishing quality.

Conventionally, when a robot polishes a machine part or the like, polishing quality has generally been checked by human visual inspection. Moreover, improving polishing quality has required repeated test polishing while varying conditions such as the robot's operating speed, the pressing force, and the rotation speed and torque of the polishing tool.

Patent Document 1 describes a burr grinding robot that alternates between a measurement operation, in which a sensor measures the remaining burr height, and a grinding operation. Patent Document 2 describes a method in which an inspection robot uses an imaging unit to monitor a workpiece for surface defects.

Patent Document 1: JP-A-7-246552; Patent Document 2: JP-A-5-196444

Reaching a desired polishing quality through manual trial and error takes a great deal of labor and time. Neither Patent Document 1 nor Patent Document 2 discloses specific technical means for automatically optimizing polishing quality.

A control device, a machine learning device, and a system that optimize polishing quality are therefore desired.

A control device according to one aspect of the present invention controls a robot that polishes a workpiece and includes a machine learning device that learns the polishing conditions used for the polishing. The machine learning device includes: a state observation unit that observes features of the surface state of the workpiece after the polishing, together with the polishing conditions, as state variables representing the current state of the environment; a determination data acquisition unit that acquires determination data indicating an evaluation result of the surface state of the workpiece after the polishing; and a learning unit that uses the state variables and the determination data to learn the features of the surface state of the workpiece after the polishing in association with the polishing conditions.
A machine learning device according to another aspect of the present invention learns the polishing conditions used when a robot polishes a workpiece. The machine learning device includes: a state observation unit that observes features of the surface state of the workpiece after the polishing, together with the polishing conditions, as state variables representing the current state of the environment; a determination data acquisition unit that acquires determination data indicating an evaluation result of the surface state of the workpiece after the polishing; and a learning unit that uses the state variables and the determination data to learn the features of the surface state of the workpiece after the polishing in association with the polishing conditions.
A system according to another aspect of the present invention is a system in which a plurality of devices are connected to one another via a network, the plurality of devices including at least a first robot provided with a control device having a machine learning device.

The present invention can provide a control device and a machine learning device that optimize polishing quality.

Hardware configuration diagram of the control device 1.
Functional block diagram of the control device 1 according to the first embodiment.
Functional block diagram showing one form of the control device 1.
Schematic flowchart showing one form of the machine learning method.
Diagram illustrating a neuron.
Diagram illustrating a neural network.
Functional block diagram of the control device 1 according to the second embodiment.
Diagram showing an example of a three-layer system comprising a cloud server, fog computers, and edge computers.
Functional block diagram showing one form of a system incorporating the control device 1.
Functional block diagram showing another form of a system incorporating the control device 1.
Functional block diagram showing another form of a system incorporating the control device 1.
Schematic hardware configuration diagram of the computer shown in FIG. 10.
Functional block diagram showing another form of a system incorporating the control device 1.
Schematic diagram of a robot that performs polishing.
Schematic diagram of a robot that performs polishing.
Diagram showing an example of the surface state of a workpiece.
Functional block diagram showing one form of the control device 1.

FIG. 1 is a schematic hardware configuration diagram showing a control device 1 according to an embodiment of the present invention and the main parts of an industrial robot controlled by the control device 1. The control device 1 controls, for example, an industrial robot that performs polishing (hereinafter simply referred to as the robot). The control device 1 includes a CPU 11, a ROM 12, a RAM 13, a nonvolatile memory 14, an interface 18, an interface 19, an interface 21, an interface 22, a bus 20, an axis control circuit 30, and a servo amplifier 40. A servo motor 50, a teaching operation panel 60, a polishing tool 70, and an imaging device 80 are connected to the control device 1.

The CPU 11 is a processor that controls the control device 1 as a whole. The CPU 11 reads the system program stored in the ROM 12 via the interface 22 and the bus 20, and controls the entire control device 1 in accordance with that program.

The ROM 12 stores in advance the system programs for executing the various kinds of robot control, including a system program for controlling exchanges with the machine learning device 100 described later.

The RAM 13 temporarily stores transient calculation data, display data, data entered by the operator via the teaching operation panel 60 described later, and the like.

The nonvolatile memory 14 is backed up by, for example, a battery (not shown) and retains its contents even when the control device 1 is powered off. The nonvolatile memory 14 stores data entered from the teaching operation panel 60 as well as robot control programs and data entered via an interface (not shown). Programs and data stored in the nonvolatile memory 14 may be loaded into the RAM 13 when they are executed or used.

The axis control circuit 30 controls axes such as the joints of the robot's arm. It receives the axis movement command amount output by the CPU 11 and outputs an axis movement command to the servo amplifier 40.

The servo amplifier 40 receives the axis movement command output by the axis control circuit 30 and drives the servo motor 50.

The servo motor 50 is driven by the servo amplifier 40 and moves an axis of the robot. The servo motor 50 typically has a built-in position/speed detector. The detector outputs a position/speed feedback signal, which is fed back to the axis control circuit 30 to perform position/speed feedback control.

Although FIG. 1 shows only one axis control circuit 30, one servo amplifier 40, and one servo motor 50, in practice one of each is provided for every axis of the robot to be controlled. For example, to control a six-axis robot, six sets of an axis control circuit 30, a servo amplifier 40, and a servo motor 50 are provided, one set per axis.

The teaching operation panel 60 is a manual data input device equipped with a display, a handle, hardware keys, and the like. The teaching operation panel 60 shows on its display the information received from the CPU 11 via the interface 18, and passes pulses, commands, data, and the like entered with the handle or hardware keys to the CPU 11 via the interface 18.

The polishing tool 70 is held at the tip of the robot's arm and polishes the object to be polished (the workpiece) with a rotating grindstone. The polishing tool 70 polishes at the rotation speed, rotation torque, and pressing force specified by commands received from the CPU 11 via the interface 19.

The imaging device 80 is a device for photographing the surface state of the workpiece, for example a vision sensor. The imaging device 80 photographs the surface state of the workpiece in response to a command received from the CPU 11 via the interface 22, and passes the captured image data to the CPU 11 via the interface 22.

The interface 21 connects the control device 1 and the machine learning device 100. The machine learning device 100 includes a processor 101, a ROM 102, a RAM 103, and a nonvolatile memory 104.

The processor 101 controls the machine learning device 100 as a whole. The ROM 102 stores system programs and the like. The RAM 103 provides temporary storage for each process involved in machine learning. The nonvolatile memory 104 stores the learning model and the like.

The machine learning device 100 observes, via the interface 21, the various kinds of information the control device 1 can acquire (the rotation speed, rotation torque, and pressing force of the polishing tool 70, the operating speed of the robot's arm, image data acquired by the imaging device 80, and so on). The machine learning device 100 outputs commands for controlling the servo motor 50 and the polishing tool 70 to the control device 1 via the interface 21. On receiving a command from the machine learning device 100, the control device 1 corrects the robot's control commands accordingly.

FIGS. 13 and 14 are schematic diagrams showing an example of a robot controlled by the control device 1.
The robot shown in FIG. 13 has an arm that moves freely when driven by the servo motors 50. A polishing tool 70 fitted with an imaging device 80 (vision sensor) is mounted at the tip of the arm, and the polishing tool 70 polishes the surface of the workpiece. After polishing, the imaging device 80 photographs the surface state of the workpiece, as shown in FIG. 14.

FIG. 2 is a schematic functional block diagram of the control device 1 and the machine learning device 100 in the first embodiment. The machine learning device 100 includes a state observation unit 106, a determination data acquisition unit 108, and a learning unit 110. These units can be implemented, for example, as functions of the processor 101, or by the processor 101 executing software stored in the ROM 102.

The state observation unit 106 observes state variables S representing the current state of the environment. The state variables S include the rotation speed S1 of the polishing tool 70, the rotation torque S2 of the polishing tool 70, the pressing force S3 of the polishing tool 70, the operating speed S4 of the robot's arm, and the workpiece surface state feature S5.
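As an illustration only, the state variables S could be held in a small record type such as the one below. This is a minimal sketch: the class name, field names, the choice of three surface features, and the sample values are all assumptions made for the example and are not specified in the patent.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StateVariables:
    """State variables S (hypothetical container; names and units are assumptions)."""
    rotation_speed: float    # S1: rotation speed of the polishing tool 70
    rotation_torque: float   # S2: rotation torque of the polishing tool 70
    pressing_force: float    # S3: pressing force of the polishing tool 70
    arm_speed: float         # S4: operating speed of the robot's arm
    surface_features: Tuple[float, ...]  # S5: e.g. (streak darkness, smoothness, spacing)

    def polishing_condition(self) -> Tuple[float, float, float, float]:
        """Return the polishing condition (S1..S4), i.e. S without the surface feature S5."""
        return (self.rotation_speed, self.rotation_torque,
                self.pressing_force, self.arm_speed)


# Sample values are invented purely for illustration.
s = StateVariables(3000.0, 1.2, 15.0, 50.0, (0.4, 0.8, 0.1))
print(s.polishing_condition())
```

Separating S1 to S4 from S5 in this way mirrors the text's distinction between the polishing conditions and the observed surface state.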

The state observation unit 106 acquires the rotation speed S1, rotation torque S2, and pressing force S3 of the polishing tool 70 from the control device 1. The control device 1 can obtain these values from the motor of the polishing tool 70 or from a sensor attached to the polishing tool 70.

The state observation unit 106 acquires the operating speed S4 of the robot's arm from the control device 1. The control device 1 can obtain this value from the servo motor 50 or from a sensor attached to the arm.

The state observation unit 106 acquires the workpiece surface state feature S5 from the control device 1. The feature S5 is data representing features extracted from an image of the workpiece surface taken by the imaging device 80 after polishing. For example, S5 can be obtained by extracting feature quantities from the image, using either a function of the imaging device 80 or image processing software in the control device 1. The imaging device 80 or the control device 1 can automatically extract feature quantities indicating the darkness (depth), smoothness, spacing, and so on of the streaks on the workpiece surface by a known technique such as deep learning.

FIG. 15 shows an example of an image of the workpiece surface taken by the imaging device 80 after polishing. As shown, streaks of varying darkness (depth), smoothness, and spacing remain on the polished workpiece surface. The state observation unit 106 recognizes these streak characteristics from the image and extracts them as the workpiece surface state feature S5.

The determination data acquisition unit 108 acquires determination data D, an index of the result obtained when the robot polishes under the state variables S. The determination data D includes the streak darkness D1, the streak smoothness D2, and the streak spacing D3 in the image of the workpiece surface taken by the imaging device 80 after polishing.

For example, a function of the imaging device 80 or image processing software in the control device 1 can analyze the image of the workpiece surface taken by the imaging device 80 after polishing and output the streak darkness D1, smoothness D2, and spacing D3 as numerical values. Alternatively, an operator may visually evaluate the image and enter values indicating the evaluation result (for example, pass = "1", fail = "0") from the teaching operation panel 60 to supply D1, D2, and D3.

As a modification, the determination data D may include the rotation torque D4 of the polishing tool 70; the rotation torque D4 is known to correlate with the smoothness of the workpiece surface. The determination data D may also include the temperature D5 of the polishing tool 70; the temperature D5 is known to correlate with an appropriate pressing force.

Using the state variables S and the determination data D, the learning unit 110 learns the correlation between the workpiece surface state feature S5 and the polishing conditions (the rotation speed S1, rotation torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot's arm). That is, the learning unit 110 generates a model structure representing the correlation among the components S1, S2, S3, S4, and S5 of the state variables S.

Viewed in terms of the learning cycle of the learning unit 110, the state variables S input to the learning unit 110 are based on data from one learning cycle before the cycle in which the determination data D was acquired. While the machine learning device 100 is learning, the following are carried out repeatedly in the environment: (1) acquisition of the workpiece surface state feature S5; (2) setting of the rotation speed S1, rotation torque S2, and pressing force S3 of the polishing tool 70 and the operating speed S4 of the robot's arm, that is, setting of the polishing conditions; (3) polishing in accordance with (1) and (2); and (4) acquisition of the determination data D. The values S1 to S4 in (2) are the polishing conditions obtained from the learning results so far. The determination data D in (4) is the evaluation result of the polishing performed with those values of S1 to S4.
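The four repeated steps can be sketched as a simple loop. This is an illustrative skeleton under stated assumptions, not the patent's implementation: the function `run_learning_cycles` and all of its callback parameters are hypothetical names, and the stub callbacks in the usage lines return invented constants in place of real sensor, robot, and learner interfaces.

```python
from typing import Callable, List, Tuple

Features = Tuple[float, ...]                    # S5
Condition = Tuple[float, float, float, float]   # (S1, S2, S3, S4)
Judgment = Tuple[float, float, float]           # (D1, D2, D3)


def run_learning_cycles(
    observe_features: Callable[[], Features],
    choose_condition: Callable[[Features], Condition],
    polish: Callable[[Condition], None],
    acquire_judgment: Callable[[], Judgment],
    learn: Callable[[Features, Condition, Judgment], None],
    n_cycles: int,
) -> List[Condition]:
    """Repeat steps (1)-(4) and hand each result to the learner."""
    history: List[Condition] = []
    for _ in range(n_cycles):
        s5 = observe_features()            # (1) acquire surface state feature S5
        condition = choose_condition(s5)   # (2) set polishing conditions S1..S4
        polish(condition)                  # (3) polish under those conditions
        d = acquire_judgment()             # (4) acquire determination data D
        learn(s5, condition, d)            # update the model with this cycle's data
        history.append(condition)
    return history


# Usage with stub callbacks (all values invented for illustration).
log = []
history = run_learning_cycles(
    observe_features=lambda: (0.4, 0.8, 0.1),
    choose_condition=lambda s5: (3000.0, 1.2, 15.0, 50.0),
    polish=lambda cond: None,
    acquire_judgment=lambda: (0.3, 0.9, 0.1),
    learn=lambda s5, cond, d: log.append((s5, cond, d)),
    n_cycles=3,
)
```

Passing the hardware-facing steps in as callbacks keeps the loop itself independent of any particular robot or camera interface.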

By repeating such learning cycles, the learning unit 110 becomes able to automatically identify features that imply the correlation between the workpiece surface state feature S5 and the polishing conditions (the rotation speed S1, rotation torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot's arm). At the start of the learning algorithm this correlation is substantially unknown, but as learning proceeds the learning unit 110 gradually identifies features and interprets the correlation. Once the correlation has been interpreted to a reasonably reliable level, the learning results that the learning unit 110 repeatedly outputs can be used to select an action (make a decision), namely which polishing conditions S1 to S4 should be set for the current state, that is, for the workpiece surface state feature S5. In other words, the learning unit 110 becomes able to output the optimal action for the current state.

The state variables S consist of data that is not easily affected by disturbances, and the determination data D is obtained unambiguously by acquiring, from the control device 1, the result of analyzing the image data from the imaging device 80. With the machine learning device 100, therefore, the learning results of the learning unit 110 can be used to obtain the optimal polishing conditions S1 to S4 for the current state, that is, for the workpiece surface state feature S5, automatically and accurately, without relying on calculation or estimation. In other words, the optimal polishing conditions can be determined quickly simply by grasping the current state, that is, the workpiece surface state feature S5. The polishing conditions for robot polishing can therefore be set efficiently.

As a modification of the machine learning device 100, the learning unit 110 can use the state variables S and determination data D obtained for each of a plurality of robots performing the same task to learn appropriate polishing conditions common to all of those robots. This configuration increases the amount of data (sets of state variables S and determination data D) obtained in a given time and allows more varied data sets to be input, which improves the speed and reliability of learning.

The learning algorithm executed by the learning unit 110 is not particularly limited, and any learning algorithm known in machine learning can be adopted. FIG. 3 shows one form of the control device 1 shown in FIG. 2, with a learning unit 110 that performs reinforcement learning as an example of a learning algorithm. Reinforcement learning observes the current state (the input) of the environment in which the learning target exists, executes a given action (the output) in that state, and gives some reward for the action; it repeats this cycle by trial and error and learns, as the optimal solution, the policy (in this embodiment, the setting of the polishing conditions) that maximizes the total reward.

In the machine learning device 100 of the control device 1 shown in FIG. 3, the learning unit 110 includes a reward calculation unit 112 and a value function update unit 114.

The reward calculation unit 112 obtains a reward R associated with the evaluation result of polishing performed under polishing conditions set on the basis of the state variables S (this evaluation result corresponds to the determination data D used in the learning cycle following the one in which the state variables S were acquired).

The value function update unit 114 uses the reward R to update a function Q representing the value of the polishing conditions. As the value function update unit 114 repeatedly updates the function Q, the learning unit 110 learns the correlation between the workpiece surface state feature S5 and the polishing conditions (the rotation speed S1, rotation torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot's arm).

An example of the reinforcement learning algorithm executed by the learning unit 110 is described below. The algorithm in this example is known as Q-learning. Taking the state s of an agent and an action a that the agent can select in state s as independent variables, it learns a function Q(s, a) that represents the value of selecting action a in state s. The optimal solution is to select, in state s, the action a for which the value function Q is highest. Q-learning starts with the correlation between state s and action a unknown, and repeatedly updates the value function Q through trial and error, selecting various actions a in arbitrary states s, so that Q approaches the optimal solution. Here, the system is configured so that when the environment (that is, the state s) changes as a result of selecting action a in state s, a reward r (that is, a weighting of action a) corresponding to that change is obtained; guiding the learning toward actions a that yield higher rewards r brings the value function Q close to the optimal solution in a relatively short time.
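The text says that various actions are tried by trial and error and that the best action is the one with the highest Q, but it does not say how the two are balanced. One common choice, assumed here and not taken from the patent, is epsilon-greedy selection. The function name `select_action`, the dictionary-based Q table, and the example state and action labels are all hypothetical.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def select_action(q: QTable, state: Hashable, actions: Sequence[Hashable],
                  epsilon: float, rng: random.Random) -> Hashable:
    """Epsilon-greedy selection (an assumed exploration scheme).

    With probability epsilon a random action is tried (trial and error);
    otherwise the action with the highest Q(state, action) is chosen.
    State-action pairs not yet in the table are treated as 0.0.
    """
    if rng.random() < epsilon:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q.get((state, a), 0.0))


# Invented labels: a coarse surface-state class and two candidate conditions.
q = {("rough", "slow"): 0.2, ("rough", "fast"): 0.7}
chosen = select_action(q, "rough", ["slow", "fast"], epsilon=0.0,
                       rng=random.Random(0))
print(chosen)  # fast
```

With `epsilon=0.0` the function is purely greedy, which corresponds to using the learned Q to pick the polishing conditions once learning is considered reliable.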

The update rule for the value function Q can generally be expressed as Formula 1 below. In Formula 1, $s_t$ and $a_t$ are the state and the action at time $t$, and the action $a_t$ changes the state to $s_{t+1}$. $r_{t+1}$ is the reward obtained when the state changes from $s_t$ to $s_{t+1}$. The $\max Q$ term denotes Q when the action $a$ that gives the maximum value Q at time $t+1$ (as estimated at time $t$) is taken. $\alpha$ and $\gamma$ are the learning coefficient and the discount rate, respectively, and are set arbitrarily within $0 < \alpha \le 1$ and $0 < \gamma \le 1$.

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right) \qquad \text{(Formula 1)}$$
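A minimal tabular sketch of this update is given below. It assumes the standard Q-learning rule, which matches the symbols defined above; the function name `q_update`, the dictionary-based table, the default values of `alpha` and `gamma`, and the example states and actions are assumptions for illustration.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, s: Hashable, a: Hashable, r_next: float,
             s_next: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Apply one standard Q-learning update in place and return the new value.

    Q(s_t, a_t) <- Q(s_t, a_t)
                   + alpha * (r_{t+1} + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t))
    Entries missing from the table are treated as 0.0.
    """
    if not (0.0 < alpha <= 1.0 and 0.0 < gamma <= 1.0):
        raise ValueError("alpha and gamma must lie in (0, 1]")
    best_next = max(q.get((s_next, a2), 0.0) for a2 in actions)
    old = q.get((s, a), 0.0)
    q[(s, a)] = old + alpha * (r_next + gamma * best_next - old)
    return q[(s, a)]


# Worked example with invented values:
# old = 0.0, best_next = 1.0, so new = 0.5 * (1.0 + 0.9 * 1.0 - 0.0) = 0.95
table: QTable = {("s1", "x"): 1.0}
new_value = q_update(table, "s0", "x", r_next=1.0, s_next="s1",
                     actions=["x", "y"])
```

A table keyed by discrete state-action pairs only works if the surface features and polishing conditions are discretized; the patent text does not specify such a discretization, so that step is left out here.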

When the learning unit 110 executes Q-learning, the state variables S observed by the state observation unit 106 and the determination data D acquired by the determination data acquisition unit 108 correspond to the state s of the update formula. The action of deciding how the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) should be set for the current state, that is, the feature S5 of the surface state of the workpiece, corresponds to the action a of the update formula. The reward R obtained by the reward calculation unit 112 corresponds to the reward r of the update formula. Accordingly, the value function updating unit 114 repeatedly updates the function Q, which represents the value of a polishing-condition setting for the current state, by Q-learning using the reward R.

For example, the reward calculation unit 112 can set the reward R to a positive value when polishing is performed under the determined polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) and the polishing evaluation result is judged "acceptable". Conversely, when the polishing evaluation result is judged "unacceptable", the reward R can be set to a negative value. The absolute values of the positive and negative rewards R may be the same or different.

When the determination data D is given as multi-level values, the polishing evaluation result can be judged "acceptable" if, for example, the differences between the value D1 indicating the density of streaks, the value D2 indicating smoothness, and the value D3 indicating the interval and the reference values predetermined for each of them are within a predetermined range. If they are outside the predetermined range, the result can be judged "unacceptable". When the determination data D is given as binary values, for example when D1, D2, and D3 are each given as acceptable = "1" or unacceptable = "0", the polishing evaluation result can be judged "acceptable" if the input is "1" and "unacceptable" if it is "0".

The polishing evaluation result can be set not only in the two levels "acceptable" and "unacceptable" but also in multiple levels. For example, the reward calculation unit 112 can be configured to reduce the reward R as D1, D2, and D3 move farther from their reference values, that is, as the differences between D1, D2, and D3 and their respective predetermined reference values grow.
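The two reward schemes described above (a two-level reward based on whether D1, D2, and D3 fall within a range of their reference values, and a graded reward that shrinks with distance from the reference values) could be sketched as follows. The function names, the single shared tolerance, and the linear penalty are illustrative assumptions; the description does not specify how the range or the reduction is computed.

```python
from typing import Sequence


def is_acceptable(values: Sequence[float], references: Sequence[float],
                  tolerance: float) -> bool:
    """Judge "acceptable" when every value (D1, D2, D3) is within
    `tolerance` of its reference value (assumed: one shared tolerance)."""
    return all(abs(v - r) <= tolerance for v, r in zip(values, references))


def binary_reward(values: Sequence[float], references: Sequence[float],
                  tolerance: float, positive: float = 1.0,
                  negative: float = -1.0) -> float:
    """Two-level reward: positive if acceptable, negative otherwise."""
    return positive if is_acceptable(values, references, tolerance) else negative


def graded_reward(values: Sequence[float], references: Sequence[float],
                  max_reward: float = 1.0,
                  penalty_per_unit: float = 1.0) -> float:
    """Multi-level reward: reduced as the total deviation from the
    reference values grows (assumed: linear reduction)."""
    deviation = sum(abs(v - r) for v, r in zip(values, references))
    return max_reward - penalty_per_unit * deviation
```

With this shape, the positive and negative reward magnitudes can be set independently, matching the statement that their absolute values may be the same or different.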

Note that the reward calculation unit 112 may judge acceptability by combining a plurality of values included in the determination data D.

The value function updating unit 114 can hold an action value table in which the state variables S, the determination data D, and the rewards R are organized in association with action values (for example, numerical values) represented by the function Q. In this case, the act of the value function updating unit 114 updating the function Q is synonymous with the act of the value function updating unit 114 updating the action value table. At the start of Q-learning, the correlation between the feature S5 of the surface state of the workpiece and the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) is unknown, so the action value table is prepared with various state variables S, determination data D, and rewards R associated with randomly determined action values (function Q). Once the determination data D is known, the reward calculation unit 112 can immediately calculate the corresponding reward R, and the calculated value R is written into the action value table.

As Q-learning proceeds using the reward R corresponding to the polishing evaluation result, learning is guided toward selecting actions that yield a higher reward R. The action value (function Q) for the action taken in the current state is rewritten according to the state of the environment (that is, the state variables S and the determination data D) that changes as a result of executing the selected action in the current state, and the action value table is updated. By repeating this update, the action values (function Q) held in the action value table are rewritten so that more appropriate actions have larger values. In this way, the previously unknown correlation between the current state of the environment, that is, the feature S5 of the surface state of the workpiece, and the corresponding action, that is, the polishing conditions to be set (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm), gradually becomes clear. In other words, updating the action value table gradually brings the relationship between the feature S5 of the surface state of the workpiece and the polishing conditions closer to the optimal solution.
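A single update of the action value table can be illustrated with a small tabular sketch. It assumes the standard Q-learning update and a dictionary keyed by (state, action) pairs; the state and action labels used in the usage note are hypothetical discretizations, since the description does not say how S5 or the polishing conditions are quantized.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q_table: QTable, state: Hashable, action: Hashable,
             reward: float, next_state: Hashable,
             actions: Sequence[Hashable],
             alpha: float = 0.5, gamma: float = 0.9) -> float:
    """Rewrite the action value for (state, action) in place and return it.

    alpha is the learning coefficient and gamma the discount rate,
    both assumed to lie in (0, 1].
    """
    # Highest action value currently believed reachable from next_state.
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (td_target - q_table[(state, action)])
    return q_table[(state, action)]
```

For instance, starting from `q = defaultdict(float)`, calling `q_update(q, "rough", "slow", 1.0, "smooth", ["slow", "fast"])` moves the entry for ("rough", "slow") from 0 toward the reward.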

With reference to FIG. 4, the flow of Q-learning executed by the learning unit 110 (that is, one form of the machine learning method) will be further described.
Step SA01: While referring to the action value table at that point in time, the value function updating unit 114 randomly selects polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) as the action to be taken in the current state indicated by the state variables S observed by the state observation unit 106.
Step SA02: The value function updating unit 114 takes in the state variables S of the current state observed by the state observation unit 106.
Step SA03: The value function updating unit 114 takes in the determination data D of the current state acquired by the determination data acquisition unit 108.
Step SA04: Based on the determination data D, the value function updating unit 114 judges whether the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) were appropriate. If they were appropriate, the process proceeds to step SA05. If they were not, the process proceeds to step SA07.
Step SA05: The value function updating unit 114 applies the positive reward R obtained by the reward calculation unit 112 to the update formula of the function Q.
Step SA06: The value function updating unit 114 updates the action value table using the state variables S and the determination data D in the current state, the reward R, and the action value (the updated function Q).
Step SA07: The value function updating unit 114 applies the negative reward R obtained by the reward calculation unit 112 to the update formula of the function Q.
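The steps SA01 to SA07 can be sketched as a loop over a toy environment. Everything about the environment is an assumption made for illustration: `toy_polish` stands in for a real polishing trial, the three action labels are a hypothetical discretization of the polishing conditions, and the selection in SA01 is purely random rather than guided by the table.

```python
import random
from collections import defaultdict

# Hypothetical discretized polishing conditions (not from the source).
ACTIONS = ["low_speed", "mid_speed", "high_speed"]


def toy_polish(state, action):
    """Stand-in for a polishing trial: returns (next_state, passed).
    Assumed rule: only "mid_speed" yields an acceptable surface."""
    passed = action == "mid_speed"
    return ("smooth" if passed else "rough"), passed


def run_learning(cycles=200, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # action value table
    state = "rough"
    for _ in range(cycles):
        action = rng.choice(ACTIONS)                    # SA01: random selection
        next_state, passed = toy_polish(state, action)  # SA02, SA03: observe S and D
        reward = 1.0 if passed else -1.0                # SA04 then SA05 or SA07
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (                 # SA06: update the table
            reward + gamma * best_next - q[(state, action)])
        state = next_state
    return q
```

After enough cycles, the entry for the acceptable action in each visited state ends up larger than the entries for the other actions, which is the sense in which the table "approaches the optimal solution" in this toy setting.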

The learning unit 110 iteratively updates the action value table by repeating steps SA01 to SA07, thereby advancing the learning. Note that the processing for obtaining the reward R and updating the value function in steps SA04 to SA07 is executed for each item of data included in the determination data D.

FIG. 16 shows another form of the control device 1 shown in FIG. 2, with a configuration including a learning unit 110 that executes supervised learning as another example of a learning algorithm. Unlike the reinforcement learning described above, in which learning starts with the relationship between input and output unknown, supervised learning is a technique in which a large number of known data sets of inputs and corresponding outputs (referred to as teacher data) are given in advance, and features implying the correlation between input and output are identified from the teacher data, thereby learning a correlation model for estimating the required output for a new input (in the machine learning device 100 of the present application, the polishing conditions for polishing by the robot).

In the machine learning device 100 shown in FIG. 16, the learning unit 110 includes an error calculation unit 116 and a model update unit 118. The error calculation unit 116 calculates an error E between a correlation model M, which derives the polishing conditions for polishing by the robot from the state variables S and the determination data D, and a correlation feature identified from teacher data T prepared in advance. The model update unit 118 updates the correlation model M so as to reduce the error E. The learning unit 110 learns the polishing conditions for polishing by the robot as the model update unit 118 repeatedly updates the correlation model M.

The initial value of the correlation model M is, for example, a simplified representation (for example, a linear function) of the correlation between the state variables S and determination data D and the polishing conditions for polishing by the robot, and is given to the learning unit 110 before supervised learning starts. The teacher data T can be composed of, for example, empirical values accumulated by recording the polishing conditions determined by skilled workers in past polishing (known data sets of features of the surface state of the workpiece and the polishing conditions for polishing by the robot), and is given to the learning unit 110 before supervised learning starts. The error calculation unit 116 identifies, from the large amount of teacher data T given to the learning unit 110, a correlation feature implying the correlation between the features of the surface state of the workpiece and the polishing conditions for polishing by the robot, and obtains the error E between this correlation feature and the correlation model M corresponding to the state variables S and the determination data D in the current state. The model update unit 118 updates the correlation model M in the direction that reduces the error E, for example according to a predetermined update rule.
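As a minimal sketch of this error-and-update cycle, the correlation model M can be taken to be a one-variable linear function, the error E a mean squared error against the teacher data T, and the update rule plain gradient descent. All three choices are assumptions for illustration; the description only says that M may be a linear function initially and that the update follows a predetermined rule.

```python
from typing import List, Tuple

Model = Tuple[float, float]               # (slope w, intercept b): stands in for M
TeacherData = List[Tuple[float, float]]   # (surface feature, polishing condition) pairs


def predict(model: Model, x: float) -> float:
    w, b = model
    return w * x + b


def error(model: Model, teacher_data: TeacherData) -> float:
    """Error E: mean squared difference between model output and teacher data."""
    return sum((predict(model, x) - y) ** 2
               for x, y in teacher_data) / len(teacher_data)


def update_model(model: Model, teacher_data: TeacherData,
                 learning_rate: float = 0.05) -> Model:
    """One update of M in the direction that reduces E (gradient descent)."""
    w, b = model
    n = len(teacher_data)
    grad_w = sum(2 * (predict(model, x) - y) * x for x, y in teacher_data) / n
    grad_b = sum(2 * (predict(model, x) - y) for x, y in teacher_data) / n
    return (w - learning_rate * grad_w, b - learning_rate * grad_b)
```

Calling `update_model` repeatedly on a small teacher data set drives `error` toward zero, mirroring the repeated updates performed by the model update unit 118.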

In the next learning cycle, the error calculation unit 116 uses the state variables S and the determination data D that have changed as a result of trial polishing performed according to the updated correlation model M to obtain the error E for the correlation model M corresponding to those changed state variables S and determination data D, and the model update unit 118 updates the correlation model M again. In this way, the previously unknown correlation between the current state of the environment (the features of the surface state of the workpiece) and the corresponding action (the determination of the polishing conditions for polishing by the robot) gradually becomes clear. In other words, updating the correlation model M gradually brings the relationship between the features of the surface state of the workpiece and the polishing conditions for polishing by the robot closer to the optimal solution.

Note that the machine learning device 100 can also be configured so that the learning unit 110 executes supervised learning in the initial stage of learning and, once learning has progressed to some extent, executes reinforcement learning using the polishing conditions for polishing by the robot obtained by supervised learning as initial values. Because the initial values for reinforcement learning then have a certain degree of reliability, the optimal solution can be reached relatively quickly.

When carrying out reinforcement learning or supervised learning, a neural network can be used, for example in place of Q-learning. FIG. 5A schematically shows a model of a neuron. FIG. 5B schematically shows a model of a three-layer neural network formed by combining the neurons shown in FIG. 5A. A neural network can be implemented by, for example, arithmetic devices and storage devices that imitate the neuron model.

The neuron shown in FIG. 5A outputs a result y for a plurality of inputs x (here, as an example, inputs x₁ to x₃). Each of the inputs x₁ to x₃ is multiplied by a weight w (w₁ to w₃) corresponding to that input. The neuron thereby outputs the output y expressed by Formula 2. In Formula 2, the input x, the output y, and the weight w are all vectors. θ is a bias, and f_k is an activation function.
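Formula 2 itself does not survive in this text. Given the description (each input multiplied by its weight, a bias θ, and an activation function f_k), it presumably has the usual weighted-sum form. The following is a reconstruction under that assumption; in particular, the sign convention of subtracting θ is assumed rather than confirmed by the text:

```latex
y = f_k\left( \sum_{i=1}^{n} x_i w_i - \theta \right)
```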

In the three-layer neural network shown in FIG. 5B, a plurality of inputs x (here, as an example, inputs x1 to x3) are input from the left side, and results y (here, as an example, results y1 to y3) are output from the right side. In the illustrated example, the inputs x1, x2, and x3 are each multiplied by corresponding weights (collectively denoted W1), and each of the inputs x1, x2, and x3 is input to all three neurons N11, N12, and N13.

In FIG. 5B, the outputs of the neurons N11 to N13 are collectively denoted z1. z1 can be regarded as a feature vector obtained by extracting the feature quantities of the input vector. In the illustrated example, the elements of the feature vector z1 are each multiplied by corresponding weights (collectively denoted W2), and each element of the feature vector z1 is input to both of the two neurons N21 and N22. The feature vector z1 represents the features between the weight W1 and the weight W2.

In FIG. 5B, the outputs of the neurons N21 and N22 are collectively denoted z2. z2 can be regarded as a feature vector obtained by extracting the feature quantities of the feature vector z1. In the illustrated example, the elements of the feature vector z2 are each multiplied by corresponding weights (collectively denoted W3), and each element of the feature vector z2 is input to all three neurons N31, N32, and N33. The feature vector z2 represents the features between the weight W2 and the weight W3. Finally, the neurons N31 to N33 output the results y1 to y3, respectively.
It is also possible to use a so-called deep learning technique that uses a neural network with three or more layers.
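The forward pass through the network of FIG. 5B (three inputs, neurons N11 to N13, then N21 and N22, then N31 to N33) can be sketched in plain Python. The sigmoid activation, the per-neuron bias lists, and the weight layout (one row of weights per neuron) are assumptions; the description does not name the activation function.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of input weights per neuron


def neuron(inputs: Vector, weights: Vector, theta: float) -> float:
    """Weighted sum minus bias, passed through a sigmoid (assumed activation)."""
    s = sum(x * w for x, w in zip(inputs, weights)) - theta
    return 1.0 / (1.0 + math.exp(-s))


def layer(inputs: Vector, weight_rows: Matrix, thetas: Vector) -> Vector:
    return [neuron(inputs, w, t) for w, t in zip(weight_rows, thetas)]


def forward(x: Vector, W1: Matrix, W2: Matrix, W3: Matrix,
            th1: Vector, th2: Vector, th3: Vector) -> Vector:
    z1 = layer(x, W1, th1)     # outputs of N11..N13 (feature vector z1)
    z2 = layer(z1, W2, th2)    # outputs of N21, N22 (feature vector z2)
    return layer(z2, W3, th3)  # results y1..y3 from N31..N33
```

Here `W1` has shape 3x3, `W2` 2x3, and `W3` 3x2, so that every output of one layer feeds every neuron of the next.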

In the machine learning device 100, the learning unit 110 can take the state variables S and the determination data D as the input x and perform multilayer computation according to a neural network, thereby outputting the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) as the result y. The machine learning device 100 can also use a neural network as the value function in reinforcement learning: with the state variables S and the action a as the input x, the learning unit 110 performs multilayer computation according to the neural network and outputs the value of that action in that state (the result y). The operation modes of the neural network include a learning mode and a value prediction mode. For example, the weights w can be learned using a learning data set in the learning mode, and the value of an action can be judged in the value prediction mode using the learned weights w. Detection, classification, inference, and the like can also be performed in the value prediction mode.

The configuration of the control device 1 described above can be described as a machine learning method (or program) executed by the processor 101. This machine learning method learns the polishing conditions for polishing by the robot (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm), and includes the following steps performed by the CPU of a computer: a step of observing the feature S5 of the surface state of the workpiece as a state variable S representing the current state of the environment in which polishing is performed; a step of acquiring determination data D indicating the evaluation result of polishing performed according to the set polishing conditions; and a step of learning the feature S5 of the surface state of the workpiece and the polishing conditions in association with each other using the state variables S and the determination data D.

FIG. 6 shows a control device 2 according to a second embodiment of the present invention. The control device 2 includes a machine learning device 120 and a state data acquisition unit 3.
The state data acquisition unit 3 acquires the feature S5 of the surface state of the workpiece and the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) as state data S0 and supplies them to the state observation unit 106. The state data acquisition unit 3 can acquire the state data S0 from, for example, the control device 2 and the various devices and sensors provided in the robot.

The machine learning device 120 includes a decision making unit 122 in addition to the state observation unit 106, the determination data acquisition unit 108, and the learning unit 110. The decision making unit 122 can be realized, for example, as a function of the processor 101. Alternatively, it may be realized by the processor 101 executing software stored in, for example, the ROM 102.

The machine learning device 120 includes software (a learning algorithm and the like) and hardware (the processor 101 and the like) for learning by itself, through machine learning, the polishing conditions for polishing by the robot (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm). In addition, it includes software (an arithmetic algorithm and the like) and hardware (the processor 101 and the like) for outputting the polishing conditions obtained from the learning result as a command to the control device 2. The machine learning device 120 may also be configured so that a single common processor executes all of the software, such as the learning algorithm and the arithmetic algorithm.

Based on the result learned by the learning unit 110, the decision making unit 122 generates a command value C that includes a command determining the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) corresponding to the feature S5 of the surface state of the workpiece. When the decision making unit 122 outputs the command value C to the control device 2, the control device 2 controls the robot according to the command value C. The state of the environment changes as a result.
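One way the decision making unit 122 might turn a learned action value table into a command value C is to pick the candidate polishing conditions with the highest action value for the observed state. The sketch below assumes this greedy choice, a dictionary-based table, and a command represented as a plain dictionary; none of these representations is specified in the description, and all field names are hypothetical.

```python
from typing import Dict, Hashable, Sequence, Tuple

# (rotational speed S1, rotational torque S2, pressing force S3, arm speed S4)
Conditions = Tuple[float, float, float, float]


def decide_command(q_table: Dict[Tuple[Hashable, Conditions], float],
                   state: Hashable,
                   candidates: Sequence[Conditions]) -> Dict[str, float]:
    """Return a command value C for the candidate with the highest action value.
    Unseen (state, conditions) pairs are treated as having value 0.0."""
    best = max(candidates, key=lambda c: q_table.get((state, c), 0.0))
    return {
        "rotational_speed": best[0],
        "rotational_torque": best[1],
        "pressing_force": best[2],
        "arm_speed": best[3],
    }
```

In a learning deployment, a purely greedy choice would normally be mixed with some exploration so that untried polishing conditions still get evaluated.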

In the next learning cycle, the state observation unit 106 observes the state variables S that have changed as a result of the decision making unit 122 outputting the command value C to the environment. The learning unit 110 uses the changed state variables S to update, for example, the value function Q (that is, the action value table), thereby learning the polishing conditions for polishing by the robot (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm). At that time, instead of acquiring the polishing conditions from the state data S0 acquired by the state data acquisition unit 3, the state observation unit 106 may observe them from the RAM 103 of the machine learning device 120, as described in the first embodiment.

The decision making unit 122 then again outputs to the control device 2 a command value C specifying the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) obtained from the learning result. By repeating this learning cycle, the machine learning device 120 advances its learning and gradually improves the reliability of the polishing conditions that it determines.

The machine learning device 120 provides the same effects as the machine learning device 100 of the first embodiment. In addition, the machine learning device 120 can change the state of the environment through the output of the decision making unit 122. In the machine learning device 100, the learning result of the learning unit 110 can be reflected in the environment by relying on an external device for a function equivalent to the decision making unit 122.

The following third to fifth embodiments describe configurations in which the control devices 1 and 2 according to the first and second embodiments are interconnected, via a wired or wireless network, with a plurality of devices including a cloud server, a host computer, fog computers, and edge computers (robot controllers, control devices, and the like). As illustrated in FIG. 7, the third to fifth embodiments assume a system in which the plurality of devices, each connected to the network, are logically divided into three layers: a layer including the cloud server 6 and the like, a layer including the fog computers 7 and the like, and a layer including the edge computers 8 (the robot controllers, control devices, and the like included in the cells 9) and the like. In such a system, the control devices 1 and 2 can be implemented on any of the cloud server 6, the fog computers 7, and the edge computers 8. They can share learning data with the other devices over the network to perform distributed learning, collect the generated learning models in the fog computers 7 or the cloud server 6 for large-scale analysis, and reuse the generated learning models among one another. In the system illustrated in FIG. 7, a plurality of cells 9 are provided in each of the factories in various locations, and the fog computers 7 in the upper layer manage the cells 9 in predetermined units (per factory, per group of factories of the same manufacturer, and so on). The data collected and analyzed by the fog computers 7 is further collected and analyzed by the cloud server 6 in the layer above, and the resulting information can be used for control of the individual edge computers and the like.

FIG. 8 shows a system 170 according to the third embodiment, in which a plurality of robots are added to the control devices 1 and 2. The system 170 includes a plurality of robots 160 and robots 160'. All of the robots 160 and the robots 160' are connected to one another by a wired or wireless network 172.

The robots 160 and the robots 160' have the mechanisms required for work of the same purpose and perform the same work. However, while the robots 160 include the control devices 1 and 2, the robots 160' do not include a control device equivalent to the control devices 1 and 2.

The robots 160, which include the control devices 1 and 2, can use the learning result of the learning unit 110 to obtain the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) corresponding to the feature S5 of the surface state of the workpiece automatically and accurately, without relying on calculation or rough estimation. In addition, the control device 2 of at least one robot 160 can be configured to use the state variables S and the determination data D obtained for each of the other robots 160 and robots 160' to learn polishing conditions common to all of the robots 160 and robots 160', with the learning result shared by all of the robots 160 and robots 160'. With the system 170, the speed and reliability of learning the polishing conditions for polishing by the robot can be improved by using a more diverse data set (including the state variables S and the determination data D) as input.

FIG. 9 shows a system 170 according to a fourth embodiment, which includes a plurality of robots 160'. The system 170 includes a plurality of robots 160' having the same mechanical configuration and a machine learning device 120 (or machine learning device 100). The robots 160' and the machine learning device 120 (or machine learning device 100) are connected to one another via a wired or wireless network 172.

The machine learning device 120 (or machine learning device 100) learns polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm) common to all the robots 160', based on the state variables S and the determination data D obtained for each of the robots 160'. Using the learning result, the machine learning device 120 (or machine learning device 100) can determine, automatically and accurately and without relying on calculation or estimation, the polishing conditions (S1 to S4) that correspond to the feature S5 of the surface state of the workpiece.

The machine learning device 120 (or machine learning device 100) may be implemented on a cloud server, a fog computer, an edge computer, or the like. With this configuration, the required number of robots 160' can be connected to the machine learning device 120 (or machine learning device 100) when needed, regardless of where or when each of the robots 160' exists.

The system 170, or an operator working with the system 170, can determine at a suitable time after the machine learning device 120 (or machine learning device 100) has started learning whether the degree of attainment of its learning of the polishing conditions (the rotational speed S1, rotational torque S2, and pressing force S3 of the polishing tool 70, and the operating speed S4 of the robot arm), that is, the reliability of the polishing conditions it outputs, has reached the required level.

FIG. 10 shows a system 170 according to a fifth embodiment, which includes the control device 1. The system 170 includes at least one machine learning device 100' implemented on a computer 5 such as an edge computer, fog computer, host computer, or cloud server; at least one control device 1 implemented as a control device (edge computer) that controls a robot 160; and a wired or wireless network 172 that connects the computer 5 and the robots 160 to one another.

In the system 170 configured as above, the computer 5 that includes the machine learning device 100' acquires, from the control device 1 that controls each robot 160, the learning model obtained as a result of machine learning by the machine learning device 100 in that control device 1. The machine learning device 100' in the computer 5 then optimizes the knowledge based on these learning models, or makes it more efficient, to generate a newly optimized or more efficient learning model, and distributes the generated learning model to the control devices 1 that control the robots 160.

One example of the optimization or efficiency improvement of learning models performed by the machine learning device 100' is the generation of a distillation model based on the learning models acquired from the control devices 1. In this case, the machine learning device 100' according to this embodiment creates input data to be fed to the learning models and, using the outputs obtained by feeding that input data to each learning model, trains from scratch to generate a new learning model (the distillation model). As also described above, a distillation model generated in this way is better suited to distribution to other computers via an external storage medium, a network, or the like.
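The distillation step described above can be sketched roughly as follows. This is a minimal illustration under strong assumptions, not the method of this embodiment: each acquired learning model is represented by a hypothetical scalar function (surface-state score in, one polishing condition out), the teacher outputs are simply averaged, and the student is a straight line fitted by closed-form least squares. The names `distill`, `teacher_a`, and `teacher_b` are invented for the example.

```python
from statistics import mean

def distill(teachers, inputs):
    """Fit a new linear student y = a*x + b to the averaged teacher outputs."""
    # Step 1: feed the same generated input data to every acquired learning model.
    targets = [mean(t(x) for t in teachers) for x in inputs]
    # Step 2: train the student from scratch (closed-form least squares).
    x_bar, y_bar = mean(inputs), mean(targets)
    sxx = sum((x - x_bar) ** 2 for x in inputs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(inputs, targets))
    a = sxy / sxx
    b = y_bar - a * x_bar
    return lambda x: a * x + b

# Hypothetical learning models collected from two control devices (assumed forms).
teacher_a = lambda x: 2.0 * x + 1.0
teacher_b = lambda x: 2.0 * x + 3.0

# Input data created by the distilling side (assumed: evenly spaced scores).
inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
student = distill([teacher_a, teacher_b], inputs)
```

In practice the student would more likely be a compact neural network and the inputs multidimensional; the linear fit is used here only to keep the sketch self-contained.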

As another example of the optimization or efficiency improvement of learning models performed by the machine learning device 100', in the course of distilling the learning models acquired from the control devices 1, the distribution of each learning model's outputs for the input data may be analyzed by a general statistical method, outliers among the pairs of input data and output data may be extracted, and distillation may be performed using the pairs of input data and output data from which the outliers have been excluded. Through this process, exceptional estimation results are excluded from the pairs of input data and output data obtained from each learning model, and the distillation model is generated from the remaining pairs. In this way, a distillation model that is general-purpose for the robots 160 controlled by the control devices 1 can be generated from the learning models produced by the plurality of control devices 1.

Other general methods of optimizing learning models or making them more efficient (such as analyzing each learning model and optimizing its hyperparameters based on the analysis result) can also be introduced as appropriate.
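The outlier-exclusion step mentioned above could look roughly like the following sketch. The text only refers to "a general statistical method"; the median absolute deviation (MAD) rule used here is one assumed choice among many, and the function name `filter_pairs`, the threshold `k`, and the example models are hypothetical.

```python
from statistics import median

def filter_pairs(teachers, inputs, k=3.0):
    """Return (input, output) pairs with per-input outliers removed."""
    kept = []
    for x in inputs:
        outputs = [t(x) for t in teachers]
        center = median(outputs)
        # Median absolute deviation: a robust estimate of the spread.
        mad = median(abs(y - center) for y in outputs)
        for y in outputs:
            # Keep a pair only if it lies within k MADs of the median.
            # If mad == 0, only outputs equal to the median are kept.
            if abs(y - center) <= k * mad:
                kept.append((x, y))
    return kept

# Three hypothetical models that roughly agree and one that misbehaves.
teachers = [
    lambda x: 2.0 * x + 1.9,
    lambda x: 2.0 * x + 2.0,
    lambda x: 2.0 * x + 2.1,
    lambda x: 100.0,
]
kept = filter_pairs(teachers, [0.0, 1.0])
```

The surviving pairs would then serve as the training data for the distillation model, so that a single misbehaving learning model does not distort it.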

In the system 170 according to this embodiment, the machine learning device 100' can be placed, for example, on a computer 5 serving as a fog computer installed for a plurality of robots 160 (control devices 1) serving as edge computers. The learning models generated by the individual robots 160 (control devices 1) are aggregated and stored on the fog computer, optimized or made more efficient based on the stored learning models, and then redistributed to the robots 160 (control devices 1) as needed.

Also, in the system 170 according to this embodiment, the learning models aggregated and stored on the computer 5 serving as a fog computer, and the learning models optimized or made more efficient on the fog computer, can be collected on a higher-level host computer or cloud server and applied to intellectual work at the factory or at the manufacturer of the robots 160 (construction and redistribution of more general-purpose learning models on the higher-level server, support for maintenance work based on analysis of the learning models, analysis of the performance of each robot 160, application to the development of new machines, and so on).

FIG. 11 is a schematic hardware configuration diagram of the computer 5 shown in FIG. 10.
The CPU 511 of the computer 5 is a processor that controls the computer 5 as a whole. The CPU 511 reads the system program stored in the ROM 512 via the bus 520 and controls the entire computer 5 in accordance with that system program. The RAM 513 temporarily stores temporary calculation data, various data entered by the operator via the input device 531, and the like.

The nonvolatile memory 514 is composed of, for example, a memory backed up by a battery (not shown) or a solid state drive (SSD), and retains its stored contents even when the computer 5 is powered off. The nonvolatile memory 514 stores a setting area holding setting information related to the operation of the computer 5, data entered from the input device 531, learning models acquired from (the control devices of) the robots 160, data read via an external storage device (not shown) or a network, and the like. Programs and data stored in the nonvolatile memory 514 may be loaded into the RAM 513 when executed or used. A system program, including known analysis programs for analyzing various data, is written in the ROM 512 in advance.

The computer 5 is connected to the network 172 via the interface 516. At least one robot 160, other computers, and the like are connected to the network 172 and exchange data with the computer 5.

The display device 530 displays, via the interface 517, data loaded into memory and data obtained as a result of executing programs and the like. The input device 531, which consists of a keyboard, a pointing device, and the like, passes commands and data based on operator input to the CPU 511 via the interface 518.
The machine learning device 100 has the same hardware configuration as that described with reference to FIG. 1, except that it is used, in cooperation with the CPU 511 of the computer 5, to optimize learning models or make them more efficient.

FIG. 12 shows a system 170 according to a sixth embodiment, which includes the control device 1. The system 170 includes a plurality of control devices 1 implemented as control devices (edge computers) that control robots 160, a plurality of other robots 160 (control devices 1), and a wired or wireless network 172 that connects them to one another.

In the system 170 configured as above, the control device 1 that includes the machine learning device 100 performs machine learning based on the state data and determination data acquired from the robot 160 it controls and on the state data and determination data acquired from other robots 160 (which do not include the machine learning device 100), and generates a learning model. The learning model generated in this way is used to determine the polishing conditions for the polishing operation of the robot 160 that the control device itself controls. It is also used, in response to a request from another robot 160' that does not include the machine learning device 100, to determine the polishing conditions for the polishing operation performed by that other robot 160 (its control device). Furthermore, when a control device 1 whose machine learning device 100 has not yet generated a learning model is newly introduced, it can acquire a learning model via the network 172 from another control device 1 that already has one, and use it.
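The sharing arrangement described above might be sketched as follows. This is only an illustration under stated assumptions: the class `ControlDevice`, its methods, and the helper `conditions_for_remote_robot` are hypothetical names, the network 172 is reduced to direct method calls, and the example model and its numeric values are invented.

```python
class ControlDevice:
    """Hypothetical stand-in for a control device 1 on the network."""

    def __init__(self, model=None):
        # model: callable mapping a surface-state feature to polishing conditions.
        self.model = model

    def polishing_conditions(self, surface_feature):
        """Determine polishing conditions using the locally held learning model."""
        if self.model is None:
            raise RuntimeError("no learning model available yet")
        return self.model(surface_feature)

    def adopt_model_from(self, peer):
        """Acquire a peer's learning model, as a newly introduced device might."""
        self.model = peer.model


def conditions_for_remote_robot(provider, surface_feature):
    """A robot without a machine learning device asks a provider for conditions."""
    return provider.polishing_conditions(surface_feature)


# Assumed example model: the numbers are placeholders, not values from the text.
trained = ControlDevice(
    model=lambda s5: {"rotation_speed": 1000.0 + 500.0 * s5, "pressing_force": 10.0}
)
newcomer = ControlDevice()
newcomer.adopt_model_from(trained)
remote = conditions_for_remote_robot(trained, 0.5)
```

A real deployment would need model serialization, versioning, and access control between devices, none of which the text specifies.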

In the system according to this embodiment, the data used for learning and the learning models can be shared among a plurality of robots 160 (control devices 1) acting as so-called edge computers. This improves the efficiency of machine learning and reduces its cost (for example, by installing the machine learning device 100 in only one control device (control device 1) that controls a robot 160 and sharing it with the other robots 160).

Although embodiments of the present invention have been described above, the present invention is not limited to the examples of the embodiments described above and can be implemented in various forms with appropriate modifications.

For example, the learning algorithm executed by the machine learning device 100 or the machine learning device 120, the arithmetic algorithm executed by the machine learning device 120, and the control algorithm executed by the control device 1 or the control device 2 are not limited to those described above, and various algorithms can be adopted.

In the embodiments above, the control device 1 (or control device 2) and the machine learning device 100 (or machine learning device 120) are described as devices having different CPUs. However, the machine learning device 100 (or machine learning device 120) may instead be implemented by the CPU 11 of the control device 1 (or control device 2) and a system program stored in the ROM 12.

Description of Symbols
1, 2 Control device
3 State data acquisition unit
5 Computer
6 Cloud server
7 Fog computer
8 Edge computer
9 Cell
11 CPU
12 ROM
13 RAM
14 Nonvolatile memory
18, 19, 21, 22 Interface
20 Bus
30 Axis control circuit
40 Servo amplifier
50 Servo motor
60 Teaching operation panel
70 Polishing tool
80 Imaging device
100 Machine learning device
101 Processor
102 ROM
103 RAM
104 Nonvolatile memory
106 State observation unit
108 Determination data acquisition unit
110 Learning unit
112 Reward calculation unit
114 Value function update unit
116 Error calculation unit
118 Model update unit
120 Machine learning device
122 Decision-making unit
160, 160' Robot
170, 170' System
172 Network
511 CPU
512 ROM
513 RAM
514 Nonvolatile memory
516, 517, 518 Interface
520 Bus
530 Display device
531 Input device

Claims

1. A control device for controlling a robot that polishes a workpiece, the control device comprising:
a machine learning device that learns polishing conditions for performing the polishing,
wherein the machine learning device includes:
a state observation unit that observes a feature of the surface state of the workpiece after the polishing and the polishing conditions as state variables representing a current state of an environment;
a determination data acquisition unit that acquires determination data indicating an evaluation result of the surface state of the workpiece after the polishing; and
a learning unit that learns the feature of the surface state of the workpiece after the polishing and the polishing conditions in association with each other, using the state variables and the determination data.

2. The control device according to claim 1, wherein, among the state variables, the polishing conditions include at least one of a rotation speed of a polishing tool, a rotation torque of the polishing tool, a pressing force of the polishing tool, and an operating speed of the robot, and
the determination data includes at least one of a stripe density D1 on the surface of the workpiece after the polishing, a stripe smoothness D2, and a stripe interval D3.

3. The control device according to claim 1, wherein the learning unit includes:
a reward calculation unit that obtains a reward related to the evaluation result; and
a value function update unit that uses the reward to update a function representing the value of the polishing conditions with respect to the feature of the surface state of the workpiece after the polishing.

4. The control device according to claim 1 or 2, wherein the learning unit includes:
an error calculation unit that calculates an error between a correlation model that derives, from the state variables and the determination data, the polishing conditions for performing the polishing and a correlation feature identified from teacher data prepared in advance; and
a model update unit that updates the correlation model so as to reduce the error.

5. The control device according to claim 1, wherein the learning unit computes the state variables and the determination data in a multilayer structure.

6. The control device according to any one of claims 1 to 5, further comprising a decision-making unit that outputs a command value based on the polishing conditions, on the basis of a learning result of the learning unit.

7. The control device according to claim 1, wherein the learning unit learns the polishing conditions using the state variables and the determination data obtained from a plurality of the robots.

8. The control device according to claim 1, wherein the machine learning device is implemented in a cloud computing, fog computing, or edge computing environment.

9. A machine learning device that learns polishing conditions for polishing of a workpiece by a robot, the machine learning device comprising:
a state observation unit that observes a feature of the surface state of the workpiece after the polishing and the polishing conditions as state variables representing a current state of an environment;
a determination data acquisition unit that acquires determination data indicating an evaluation result of the surface state of the workpiece after the polishing; and
a learning unit that learns the feature of the surface state of the workpiece after the polishing and the polishing conditions in association with each other, using the state variables and the determination data.

10. A system in which a plurality of devices are connected to one another via a network,
wherein the plurality of devices include a first robot including at least the control device according to claim 1.

11. The system according to claim 10, wherein the plurality of devices include a computer including a machine learning device,
the computer acquires at least one learning model generated by learning in the learning unit of the control device, and
the machine learning device in the computer performs optimization or efficiency improvement based on the acquired learning model.

12. The system according to claim 10, wherein the plurality of devices include a second robot different from the first robot, and
a learning result of the learning unit in the control device of the first robot is shared with the second robot.

13. The system according to claim 10, wherein the plurality of devices include a second robot different from the first robot, and
data observed by the second robot is made available, via the network, for learning by the learning unit in the control device of the first robot.