DE102018004330A1

DE102018004330A1 - Control and machine learning device

Info

Publication number: DE102018004330A1
Application number: DE102018004330.5A
Authority: DE
Inventors: Tetsuji Ueda
Original assignee: Fanuc Corp
Current assignee: Fanuc Corp
Priority date: 2017-06-07
Filing date: 2018-05-30
Publication date: 2018-12-13
Anticipated expiration: 2038-05-31
Also published as: US10576628B2; DE102018004330B4; JP2018206162A; CN108994818B; JP6577522B2; CN108994818A; US20180354126A1

Abstract

A machine learning device of a controller observes, as state variables expressing a current state of an environment, data on the moving speed of each motor of a robot and the adjustment amount of that moving speed, a target speed of the tip end of the robot, and a movement path in the vicinity of the tip end of the robot, and acquires determination data indicating a suitability determination result for the moving speed of the tip end of the robot. Using the observed state variables and the acquired determination data, the machine learning device then learns the target speed data, the moving speed data, and the movement path data in association with the adjustment amount of the moving speed of each of the motors of the robot.

Description

BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to a controller and a machine learning device, and more particularly to a controller and a machine learning device that optimize a teaching speed.

Description of the Related Art

Typical industrial robots are driven according to a previously created operation program, or are driven so as to pass through teaching points, taught in advance with a teach pendant or the like, at a teaching speed. That is, the robots are driven at a predetermined speed along a predetermined path. For example, as related art concerning the teaching operation of a robot, Japanese Patent Application Laid-open No. 6-285402 discloses a technique by which a sealing operation (such as teaching points and a teaching speed) is taught to a robot so that the robot performs sealing.

According to the technique disclosed in Japanese Patent Application Laid-open No. 6-285402, the rotational speed of the motor of a sealant supply pump is controlled in accordance with the moving speed of a robot in order to supply sealant to a sealant gun, so that the amount of sealant applied per unit distance on an object is kept constant regardless of the moving speed of the robot, and the film thickness of the bead is kept constant. However, a pump with such a pressure regulating function is expensive, which increases the cost of the entire system. To reduce the cost of the entire system, one conceivable approach is to control the robot so that the sealant gun provided at the tip of the robot passes through the teaching points while its movement is kept at a predetermined speed. If such control can be adopted, the cost of the entire system can be reduced by using an inexpensive pump that can only be switched ON and OFF. However, the movement path and the moving speed of the tip end of the robot are determined by the combined movement of a plurality of motors, and how the movement of the tip end changes when the movement of one motor is changed depends on the moving state or the acceleration/deceleration state of the other motors. Therefore, even a skilled worker has difficulty adjusting the robot so that its tip end moves along a movement path at a constant moving speed, and the adjustment has to be repeated by trial and error. As a result, the worker must expend enormous effort on the adjustment.

SUMMARY OF THE INVENTION

In view of the above problem, an object of the present invention is to provide a controller and a machine learning device capable of adjusting the teaching speed of the tip end of a robot to a predetermined target speed.

To solve the above problem, a controller according to the present invention performs machine learning of an adjustment amount of the moving speed of each motor of a robot with respect to a target speed of the tip end of the robot, the current speed of each motor of the robot, and the movement path of the tip end of the robot. Based on the result of the machine learning, the controller then performs control such that the moving speed of the tip end of the robot matches the target speed when the robot moves to a teaching position.

A controller according to a first embodiment of the present invention adjusts the moving speed of each motor of a robot that applies a coating of sealing material. The controller includes a machine learning device that learns an adjustment amount of the moving speed of each of the motors of the robot. The machine learning device has: a state observation section that observes, as state variables expressing a current state of an environment, teaching speed adjustment amount data indicating the adjustment amount of the moving speed of each of the motors of the robot, target speed data indicating a target speed of the tip end of the robot, moving speed data indicating the moving speed of each of the motors of the robot, and movement path data indicating a movement path in the vicinity of the tip end of the robot; a determination data acquisition section that acquires determination data indicating a suitability determination result for the moving speed of the tip end of the robot; and a learning section that learns the target speed data, the moving speed data, and the movement path data in association with the adjustment amount of the moving speed of each of the motors of the robot, using the state variables and the determination data.

In addition to the suitability determination result for the moving speed of the tip end of the robot, the determination data may include a suitability determination result for the position of the tip end of the robot.

The learning section may have a reward calculation section that calculates a reward associated with the suitability determination result, and a value function update section that uses the reward to update a function expressing the value of the adjustment amount of the moving speed of each of the motors of the robot with respect to the target speed of the tip end of the robot, the moving speed of each of the motors of the robot, and the movement path in the vicinity of the tip end of the robot.
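The interplay of a reward calculation section and a value function update section can be illustrated with a minimal sketch. The passage above does not fix the update rule, so the tabular Q-learning rule used here is an assumption, as are the names `calculate_reward` and `update_value`, the speed tolerance, the reward values of +1 and -1, the learning rate, and the discount factor.

```python
from collections import defaultdict

ALPHA = 0.1  # learning rate (assumed value, not from the source)
GAMMA = 0.9  # discount factor (assumed value, not from the source)


def calculate_reward(tip_speed, target_speed, tolerance=1.0):
    """Hypothetical reward: +1 if the tip speed is within tolerance of the target, else -1."""
    return 1.0 if abs(tip_speed - target_speed) <= tolerance else -1.0


def update_value(q, state, action, reward, next_state, actions):
    """One tabular Q-learning step (an assumed choice of value function update)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += ALPHA * (reward + GAMMA * best_next - q[(state, action)])
    return q[(state, action)]


# Value table: (state, adjustment action) -> value, initialised to 0.0.
q_table = defaultdict(float)
# Hypothetical discrete adjustment actions: decrease, keep, increase one motor's speed.
actions = (-1, 0, 1)

reward = calculate_reward(tip_speed=99.5, target_speed=100.0)
new_value = update_value(q_table, "s0", 1, reward, "s1", actions)
```

In this sketch the state keys `"s0"` and `"s1"` stand in for whatever encoding of target speed, motor speeds, and movement path a real implementation would use.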

The learning section may perform calculations on the state variables and the determination data in a multilayer structure.

The controller may further include a decision-making section that, on the basis of a learning result of the learning section, outputs a command value based on the adjustment amount of the moving speed of each of the motors of the robot.

The learning section may learn the adjustment amount of the moving speed of each of the motors in each of a plurality of robots, using the state variables and the determination data obtained for each of the plurality of robots.

The machine learning device may reside in a cloud server.

A machine learning device according to a second embodiment of the present invention learns an adjustment amount of the moving speed of each motor of a robot that applies a coating of sealing material. The machine learning device includes: a state observation section that observes, as state variables expressing a current state of an environment, teaching speed adjustment amount data indicating the adjustment amount of the moving speed of each of the motors of the robot, target speed data indicating a target speed of the tip end of the robot, moving speed data indicating the moving speed of each of the motors of the robot, and movement path data indicating a movement path in the vicinity of the tip end of the robot; a determination data acquisition section that acquires determination data indicating a suitability determination result for the moving speed of the tip end of the robot; and a learning section that learns the target speed data, the moving speed data, and the movement path data in association with the adjustment amount of the moving speed of each of the motors of the robot, using the state variables and the determination data.

According to an embodiment of the present invention, adjusting the teaching speed of a robot based on a learning result makes it possible to keep the moving speed of the tip end of the robot constant, and thus to keep the film thickness of a bead constant, without using an expensive pump.

List of figures

FIG. 1 is a schematic hardware configuration diagram of a controller according to a first embodiment;
FIG. 2 is a schematic functional block diagram of the controller according to the first embodiment;
FIG. 3 is a schematic functional block diagram showing an embodiment of the controller;
FIG. 4 is a schematic flowchart showing an embodiment of a machine learning method;
FIG. 5A is a diagram for describing a neuron;
FIG. 5B is a diagram for describing a neural network;
FIG. 6 is a schematic functional block diagram of a controller according to a second embodiment;
FIG. 7 is a schematic functional block diagram showing one embodiment of a system including controllers; and
FIG. 8 is a schematic functional block diagram showing another embodiment of a system including a controller.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

FIG. 1 is a schematic hardware configuration diagram showing a controller and the essential parts of a machine tool controlled by the controller according to a first embodiment.

A controller 1 can be implemented, for example, as a controller for controlling an industrial robot (not shown) that applies a coating of sealing material or the like. A central processing unit (CPU) 11 of the controller 1 according to the embodiment is a processor that controls the controller 1 as a whole. The CPU 11 reads a system program stored in a read-only memory (ROM) 12 via a bus 20 and controls the entire controller 1 according to the system program. A random-access memory (RAM) 13 temporarily stores calculation data, display data, and various data or the like entered by an operator via a teach pendant 60, which is described later.

A nonvolatile memory 14 is configured as a memory that retains its stored contents even when the controller 1 is powered off, for example by being backed up with a battery (not shown). The nonvolatile memory 14 stores teaching data entered from the teach pendant 60 via an interface 18, a robot control program entered via an interface (not shown), and the like. Programs and various data stored in the nonvolatile memory 14 may be loaded into the RAM 13 when they are executed or used. Furthermore, the ROM 12 stores in advance various system programs (including a system program for controlling communication with a machine learning device 100, which is described later) for executing processing for the control of a robot, the teaching of a teaching position, and the like.

An axis control circuit 30 for controlling an axis of a joint or the like of a robot receives a commanded movement amount for the axis from the CPU 11 and outputs a command for moving the axis to a servo amplifier 40. On receiving the command, the servo amplifier 40 drives a servomotor 50 that moves the axis of the robot. The servomotor 50 for the axis includes a position/speed detection device and feeds a position/speed feedback signal from this device back to the axis control circuit 30 to perform position/speed feedback control. Note that the hardware configuration diagram of FIG. 1 shows only one axis control circuit 30, one servo amplifier 40, and one servomotor 50, but in practice as many of each are provided as the robot to be controlled has axes.

For example, in the case of a robot with six axes, an axis control circuit 30, a servo amplifier 40, and a servomotor 50 are provided for each of the six axes.

The teach pendant 60 is a manual data input device with a display, a handle, hardware keys, and the like. The teach pendant 60 receives information from the controller 1 via the interface 18 and displays it, and passes pulses, commands, and various data entered with the handle, the hardware keys, or the like to the CPU 11.

A pump 70 supplies sealing material to a sealant gun (not shown) held at the tip of a robot. Based on a command from the CPU 11 via an interface 19, the pump 70 can switch the supply of sealing material on and off.

An interface 21 connects the controller 1 and the machine learning device 100 to each other. The machine learning device 100 includes a processor 101 that controls the entire machine learning device 100, a ROM 102 that stores a system program and the like, a RAM 103 that temporarily stores data in each process associated with machine learning, and a nonvolatile memory 104 used for storing a learning model and the like. The machine learning device 100 can observe, via the interface 21, any information that the controller 1 is able to acquire (for example, position information or speed information of the servomotor 50, current values, and setting information such as the running program and teaching information stored in the RAM 13 or the like). Furthermore, on receiving commands output from the machine learning device 100 for controlling the servomotor 50 and the peripheral devices of a robot, the controller 1 performs compensation or the like of a command for controlling the robot based on a program or teaching data.

FIG. 2 is a schematic functional block diagram of the controller 1 and the machine learning device 100 according to the first embodiment.

The machine learning device 100 includes software (such as a learning algorithm) and hardware (such as the processor 101) for learning by itself, through so-called machine learning, the adjustment amount of the moving speed of each motor of a robot with respect to a target speed of the tip end of the robot, the current speed of each motor of the robot, and the movement path of the tip end of the robot. What the machine learning device 100 of the controller 1 learns corresponds to a model structure expressing the correlation between, on the one hand, the target speed of the tip end of a robot, the current speed of each motor of the robot, and the movement path of the tip end of the robot and, on the other hand, the adjustment amount of the moving speed of each motor of the robot.

As shown in the functional blocks of FIG. 2, the machine learning device 100 of the controller 1 includes a state observation section 106, a determination data acquisition section 108, and a learning section 110. The state observation section 106 observes state variables S expressing the current state of an environment, which include teaching speed adjustment amount data S1 indicating the adjustment amount of the moving speed of each motor of a robot in the control of the robot based on teaching data, target speed data S2 indicating a target speed of the tip end of the robot, moving speed data S3 indicating the moving speed of each motor of the robot, and movement path data S4 indicating a movement path in the vicinity of the tip end of the robot. The determination data acquisition section 108 acquires determination data D, which include moving speed determination data D1 indicating a suitability determination result for the moving speed of the tip end of the robot when the teaching speed of each motor has been adjusted. Using the state variables S and the determination data D, the learning section 110 learns the target speed of the tip end of the robot, the current speed of each motor of the robot, and the movement path of the tip end of the robot in association with the teaching speed adjustment amount data S1.
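The grouping of the state variables S (S1 to S4) and the determination data D (D1) can be pictured as plain data records. The following is a minimal sketch only: the class and field names, the use of tuples, the three-dimensional path points, and the boolean encoding of D1 are all assumptions made for illustration and are not specified in the text.

```python
from dataclasses import dataclass
from typing import Tuple

Point3D = Tuple[float, float, float]  # assumed representation of one path point


@dataclass(frozen=True)
class StateVariables:
    """State variables S as observed by the state observation section 106."""
    speed_adjustments: Tuple[float, ...]  # S1: adjustment amount per motor
    target_tip_speed: float               # S2: target speed of the tip end
    motor_speeds: Tuple[float, ...]       # S3: moving speed per motor
    tip_path: Tuple[Point3D, ...]         # S4: movement path near the tip end


@dataclass(frozen=True)
class DeterminationData:
    """Determination data D as acquired by the determination data acquisition section 108."""
    tip_speed_suitable: bool              # D1: hypothetical pass/fail encoding


def to_training_pair(s: StateVariables):
    """Pair (S2, S3, S4) with S1, mirroring 'learned in association with S1'."""
    return (s.target_tip_speed, s.motor_speeds, s.tip_path), s.speed_adjustments


# Example with a six-axis robot; all numbers are made up for illustration.
state = StateVariables(
    speed_adjustments=(0.5, -0.2, 0.0, 0.0, 0.1, 0.0),
    target_tip_speed=100.0,
    motor_speeds=(10.0, 12.0, 8.0, 5.0, 3.0, 1.0),
    tip_path=((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)),
)
result = DeterminationData(tip_speed_suitable=True)
inputs, label = to_training_pair(state)
```

Frozen dataclasses are used here so that one observation per learning cycle cannot be modified after it has been recorded; a real implementation may choose differently.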

The state observation section 106 may be configured, for example, as one of the functions of the processor 101 or as software stored in the ROM 102 for operating the processor 101. Among the state variables S observed by the state observation section 106, the teaching speed adjustment amount data S1 can be acquired as a set of adjustment amounts for the movement speed of each motor of the robot. Here, the adjustment amount of the movement speed of each motor includes the direction (a positive or negative value) in which the movement speed of the motor is adjusted.

At the start of learning, the teaching speed adjustment amount data S1 may be, for example, an adjustment amount of the movement speed of each motor of the robot that is reported by a skilled worker and given to the controller 1, or an adjustment amount of the movement speed of each motor obtained from a result simulated by a simulation device. Once learning has advanced to some extent, the adjustment amount of the movement speed of each motor that the machine learning device 100 determined in the previous learning cycle, based on a learning result of the learning section 110, can be used as the teaching speed adjustment amount data S1. In that case, the machine learning device 100 may temporarily store the determined adjustment amount of the movement speed of each motor in the RAM 103 in each learning cycle, so that the state observation section 106 acquires from the RAM 103 the adjustment amount that the machine learning device 100 determined in the previous learning cycle.

As the target speed data S2 among the state variables S, for example, a teaching speed that is set by a worker and included in the teaching data can be used. Because a teaching speed set by a worker is a value the worker has chosen as a target, it can serve as the target speed.

As the movement speed data S3 among the state variables S, for example, the movement speed in the current cycle of each motor (that is, the servomotor 50) of the robot can be used. The movement speed data S3 can be acquired using a position and speed detection device attached to the motor.

As the movement path data S4 among the state variables S, for example, a movement path of the tip end of the robot calculated from the teaching positions included in the teaching data can be used. The movement path data S4 can be calculated as series data, sampled at each prescribed cycle, of relative coordinate values obtained by viewing the movement path within a prescribed period after the current time from the current position of the tip end of the robot.
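
The relative-coordinate representation of S4 can be sketched as follows. This is a minimal illustration that assumes the upcoming path points have already been sampled at the prescribed cycle; the function name `relative_path` and the three-dimensional point type are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def relative_path(current: Point, upcoming: Sequence[Point]) -> List[Point]:
    """Express upcoming path points relative to the current tip position."""
    cx, cy, cz = current
    return [(x - cx, y - cy, z - cz) for (x, y, z) in upcoming]
```

Expressing the path relative to the current tip position makes the series independent of where in the workspace the motion takes place.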

The determination data acquisition section 108 may be configured, for example, as one of the functions of the processor 101 or as software stored in the ROM 102 for operating the processor 101. As the determination data D, the determination data acquisition section 108 can use movement speed determination data D1, which indicate a suitability determination value for the movement speed of the tip end of the robot when the teaching speed of each motor has been adjusted. The determination data D1 can be calculated from the movement speed of each motor of the robot obtained when the state observation section 106 observes the movement speed data S3. The determination data D are an index expressing the result obtained when the robot is controlled under the state variables S.

In terms of the learning cycle of the learning section 110, the state variables S input to the learning section 110 at the same time as the determination data D are those based on data from the learning cycle preceding the one in which the determination data D were acquired. While the machine learning device 100 of the controller 1 advances machine learning as described above, the following steps are repeated in the environment: acquisition of the target speed data S2, the movement speed data S3, and the movement path data S4; control of the robot at a teaching speed adjusted based on the teaching speed adjustment amount data S1; and acquisition of the determination data D.

The learning section 110 may be configured, for example, as one of the functions of the processor 101 or as software stored in the ROM 102 for operating the processor 101. Following an arbitrary learning algorithm of the kind collectively called machine learning, the learning section 110 learns the teaching speed adjustment amount data S1 with respect to the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot. The learning section 110 can repeatedly execute learning based on a data set containing the state variables S and the determination data D described above. When this learning cycle is repeated, the target speed data S2, the movement speed data S3, and the movement path data S4 among the state variables S are acquired from the teaching data or, as described above, from the state of each motor acquired in the previous learning cycle; the teaching speed adjustment amount data S1 correspond to the adjustment amount of the movement speed of each motor determined based on the learning results so far; and the determination data D correspond to the suitability determination result for the (adjusted) movement speed of the tip end of the robot in the current learning cycle, in a state in which the teaching speed has been adjusted based on the teaching speed adjustment amount data S1.

By repeating such a learning cycle, the learning section 110 can automatically identify features that indicate the correlation between, on the one hand, the target speed (target speed data S2) of the tip end of the robot, the movement speed (movement speed data S3) of each motor of the robot, and the movement path (movement path data S4) near the tip end of the robot and, on the other hand, the adjustment amount of the movement speed of each motor for that state. Although the correlation between the target speed data S2, the movement speed data S3, and the movement path data S4 and the adjustment amount of the movement speed of each motor is essentially unknown at the start of the learning algorithm, the learning section 110 gradually identifies features indicating the correlation and interprets it as learning progresses. Once the correlation has been interpreted to a reasonably reliable degree, the learning results repeatedly output by the learning section 110 can be used to select an action (that is, to make a decision) about how much to adjust the movement speed of each motor of the robot for the current state (that is, the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot). In other words, as the learning algorithm advances, the learning section 110 can gradually bring the correlation between the target speed of the tip end of the robot, the movement speed of each motor, and the movement path near the tip end of the robot and the action of determining how much to adjust the movement speed of each motor for that state closer to an optimal solution.

As described above, in the machine learning device 100 of the controller 1, the learning section 110 learns the adjustment amount of the movement speed of each motor of the robot according to a machine learning algorithm, using the state variables S observed by the state observation section 106 and the determination data D acquired by the determination data acquisition section 108. The state variables S consist of the teaching speed adjustment amount data S1, the target speed data S2, the movement speed data S3, and the movement path data S4, which are hardly affected by disturbances. In addition, the determination data D are uniquely calculated from the teaching speed stored in the controller 1 and the movement speed of the servomotor 50 acquired by the controller 1. Accordingly, using the learning results of the learning section 110, the machine learning device 100 of the controller 1 can automatically and accurately calculate the adjustment amount of the movement speed of each motor of the robot according to the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot, without relying on calculation or estimation.

If the adjustment amount of the movement speed of each motor of the robot can be calculated automatically without relying on calculation or estimation, an appropriate value of the adjustment amount can be determined quickly simply by knowing the target speed (target speed data S2) of the tip end of the robot, the movement speed (movement speed data S3) of each motor of the robot, and the movement path (movement path data S4) near the tip end of the robot. Accordingly, the movement speed of each motor of the robot can be adjusted efficiently.

As a first modified example of the machine learning device 100 of the controller 1, the determination data acquisition section 108 may use, as the determination data D, in addition to the movement speed determination data D1 indicating a suitability determination value for the movement speed of the tip end of the robot, tip end position determination data D2 indicating a suitability determination result for the position of the tip end of the robot, or the like.

According to this modified example, the machine learning device 100 can also take into account the degree of deviation from a teaching position when learning the adjustment amount of the movement speed of each motor of the robot with respect to the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot.

As a second modified example of the machine learning device 100 of the controller 1, the learning section 110 may learn the adjustment amount of the movement speed of each motor for a plurality of robots that perform the same work, using the state variables S and the determination data D obtained for each of those robots. This configuration increases the number of data sets, each containing state variables S and determination data D, that can be acquired in a given period. Therefore, with a more diverse set of data as input, the speed and reliability of learning the adjustment amount of the movement speed of each motor of the robot can be improved.

In the machine learning device 100 having the above configuration, the learning algorithm executed by the learning section 110 is not particularly limited. For example, a learning algorithm known in the field of machine learning can be employed. FIG. 3 shows, as one embodiment of the controller 1 shown in FIG. 1, a configuration that includes the learning section 110 performing reinforcement learning as an example of a learning algorithm.

Reinforcement learning is a method in which a cycle of observing the current state (that is, an input) of the environment in which the learning target exists, performing a prescribed action (that is, an output) in the current state, and giving a reward for that action is repeated by trial and error, so that the measures that maximize the total reward (in the machine learning device of the present application, the adjustment of the movement speed of each motor of the robot) are learned as an optimal solution.

In the machine learning device 100 of the controller 1 shown in FIG. 3, the learning section 110 includes a reward calculation section 112 and a value function update section 114. The reward calculation section 112 calculates a reward R associated with the suitability determination result (corresponding to the determination data D used in the learning cycle following the one in which the state variables S were acquired) for the operating state of the tip end of the robot when the teaching speed of each motor is adjusted based on the state variables S. Using the calculated reward R, the value function update section 114 updates a function Q that expresses the value of an adjustment amount of the movement speed of each motor of the robot. The learning section 110 learns the adjustment amount of the movement speed of each motor of the robot with respect to the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot as the value function update section 114 repeatedly updates the function Q.
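
One possible shape for the calculation performed by the reward calculation section 112 is sketched below. It assumes the suitability check is a tolerance test on the difference between the tip speed and the target speed; the function name, the `tolerance` parameter, and the reward magnitudes of +1.0 and -1.0 are illustrative assumptions rather than values given in the application.

```python
def compute_reward(tip_speed: float, target_speed: float, tolerance: float) -> float:
    """Return +1.0 if the tip speed error is within tolerance, else -1.0.

    The threshold test and the reward magnitudes are assumptions made
    for illustration only.
    """
    if tolerance < 0:
        raise ValueError("tolerance must be non-negative")
    error = abs(tip_speed - target_speed)
    return 1.0 if error <= tolerance else -1.0
```

A graded reward that shrinks as the error grows would be an equally plausible choice; the binary form is used here only because it is the simplest.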

An example of a reinforcement learning algorithm executed by the learning section 110 is described below. The algorithm in this example is known as Q-learning. It treats the state s of an acting subject and an action a that the subject can take in the state s as independent variables, and learns a function Q(s, a) expressing the value of the action when the action a is selected in the state s. Selecting the action a for which the value function Q is largest in the state s yields an optimal solution. Q-learning starts in a state in which the correlation between the state s and the action a is unknown; by repeatedly selecting various actions a by trial and error in arbitrary states s, the value function Q is repeatedly updated and brought closer to the optimal solution. Here, when the environment (that is, the state s) changes as a result of selecting the action a in the state s, a reward r (that is, a weighting of the action a) corresponding to that change is obtained, and learning is directed toward selecting actions a that yield a higher reward r. In this way, the value function Q can be brought close to the optimal solution in a relatively short time.
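
The text describes trial-and-error selection of actions without naming a specific strategy. A common way to realize it is epsilon-greedy selection, shown below as a sketch; the choice of epsilon-greedy, the dictionary-based Q table, and all names are assumptions for illustration.

```python
import random
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def select_action(q: QTable, state: Hashable, actions: Sequence[Hashable],
                  epsilon: float, rng: random.Random) -> Hashable:
    """Explore with probability epsilon, otherwise pick the highest-valued action."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # exploration: try an arbitrary action
    # exploitation: unseen (state, action) pairs default to a value of 0.0
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

With `epsilon` set to 0 the function always returns the action with the largest Q value in the given state, which corresponds to the greedy choice described above.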

In general, the update formula for the value function Q can be expressed as formula (1) below. In formula (1), $s_t$ and $a_t$ express the state and the action at time t, respectively, and the action $a_t$ changes the state to $s_{t+1}$. $r_{t+1}$ expresses the reward obtained when the state changes from $s_t$ to $s_{t+1}$. The term $\max Q$ expresses Q when the action a that gives the maximum value Q at time t + 1 (as estimated at time t) is taken. α and γ express a learning coefficient and a discount factor, respectively, and are set arbitrarily within 0 < α ≤ 1 and 0 < γ ≤ 1.

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right)$ (1)
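
The update can be written as a short function. The sketch below uses the standard Q-learning form, in which the maximum is taken over the actions available in the state at time t + 1, as the accompanying text describes; the tabular dictionary, the default value of 0.0 for unseen entries, and the default `alpha` and `gamma` are assumptions for illustration.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one Q-learning update in place and return the new Q(state, action)."""
    # Best value reachable from the next state (unseen entries count as 0.0)
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    new = old + alpha * (reward + gamma * best_next - old)
    q[(state, action)] = new
    return new
```

For a large or continuous state space such as S1 to S4, a table would normally be replaced by a function approximator; the table is used here only to keep the update rule visible.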

When the learning section 110 performs Q-learning, the state variables S observed by the state observation section 106 and the determination data D acquired by the determination data acquisition section 108 correspond to the state s in the update formula; the action of determining the adjustment amount of the movement speed of each motor of the robot for the current state (that is, the target speed of the tip end of the robot, the movement speed of each motor of the robot, and the movement path near the tip end of the robot) corresponds to the action a in the update formula; and the reward R calculated by the reward calculation section 112 corresponds to the reward r in the update formula. Accordingly, the value function update section 114 repeatedly updates, by Q-learning using the reward R, the function Q expressing the value of an adjustment amount of the movement speed of each motor of the robot for the current state.

For example, when the robot is controlled according to the moving speed of each motor that results from a determined adjustment amount, the reward R calculated by the reward calculation section 112 may be positive (plus) if the suitability determination result of the operating state of the robot is "suitable" (for example, the difference between the moving speed and the target speed of the tip end of the robot falls within an allowable range, or the difference between the position of the tip end of the robot and the taught position falls within an allowable range). The reward R may be negative (minus) if the suitability determination result of the operating state of the robot is "unsuitable" (for example, the difference between the moving speed and the target speed of the tip end of the robot exceeds the allowable range, or the difference between the position of the tip end of the robot and the taught position exceeds the allowable range).

The absolute values of the positive and negative rewards R may be the same or different. In addition, as a determination condition, a plurality of values included in the determination data D may be combined to make the determination.

In addition, the suitability determination result of the operation of the robot may include not only "suitable" and "unsuitable" but also a plurality of graded levels. For example, if the maximum value of the allowable range of the difference between the moving speed and the target speed of the tip end of the robot is G_max, a reward R = 5 is given when the difference G between the moving speed and the target speed of the tip end of the robot satisfies 0 ≤ G < G_max/5, a reward R = 2 is given when G_max/5 ≤ G < G_max/2, and a reward R = 1 is given when G_max/2 ≤ G ≤ G_max. In addition, G_max may be set relatively large in the initial phase of learning and reduced as learning progresses.
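The graded reward described above can be written as a small function. The following is a minimal sketch: the three levels (5, 2, 1) and their thresholds come from the text, while the function names, the `penalty` value returned outside the allowable range, and the exponential shrinking schedule for G_max are assumptions made for illustration only.

```python
def graded_reward(g, g_max, penalty=-1):
    """Return the reward R for a speed difference g (assumed non-negative)."""
    if g < 0:
        raise ValueError("g is a difference magnitude and must be non-negative")
    if g < g_max / 5:
        return 5              # 0 <= G < G_max/5
    if g < g_max / 2:
        return 2              # G_max/5 <= G < G_max/2
    if g <= g_max:
        return 1              # G_max/2 <= G <= G_max
    return penalty            # assumption: the text gives no value outside the range

def shrink_g_max(g_max_start, g_max_floor, decay, cycle):
    """Hypothetical schedule: G_max starts large and decreases as learning progresses."""
    return max(g_max_floor, g_max_start * decay ** cycle)

# Example with an assumed G_max of 10.0 (arbitrary units)
rewards = [graded_reward(g, 10.0) for g in (1.0, 2.0, 5.0, 10.0, 10.5)]
```

Tightening G_max over time means that a difference which earned a reward early in learning may later fall outside the allowable range, which pushes learning toward smaller speed errors.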

The value function update section 114 may have an action value table in which the state variables S, the determination data D, and the rewards R are organized in association with action values (for example, numerical values) expressed by the function Q. In this case, updating the function Q by the value function update section 114 is equivalent to updating the action value table by the value function update section 114. At the start of Q-learning, the correlation between the current state of the environment and the adjustment amount of the moving speed of each motor is unknown. Therefore, in the action value table, various state variables S, determination data D, and rewards R are prepared in association with randomly set action values (function Q). Note that the reward calculation section 112 can calculate the reward R corresponding to the determination data D as soon as the determination data D are known, and the calculated value of the reward R is written to the action value table.
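A randomly initialized action value table of this kind might look as follows. This is only a sketch: keying the table by a (state, action) pair, the state labels, the action set, and the initial value range of [-1, 1] are all assumptions, since the text does not specify the table layout.

```python
import random

def init_action_value_table(states, actions, seed=None):
    """Build a table that maps (state, action) to a randomly set action value."""
    rng = random.Random(seed)  # seeded generator so the sketch is reproducible
    return {(s, a): rng.uniform(-1.0, 1.0) for s in states for a in actions}

# Hypothetical discretized states and adjustment amounts
table = init_action_value_table(
    states=["below_target", "on_target", "above_target"],
    actions=[-1, 0, 1],
    seed=0,
)
```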

When Q-learning is advanced using the reward R that corresponds to the suitability determination result of the operating state of the robot, learning is directed toward selecting actions that obtain a higher reward R. The action value (function Q) for the action performed in the current state is then rewritten, and the action value table is updated, according to the state of the environment (that is, the state variables S and the determination data D) that changes as a result of performing the selected action in the current state. By repeating this update, the action values (function Q) in the action value table are rewritten to larger values the more suitable an action is. In this way, the correlation between the current state of a previously unknown environment (the target speed of the tip end of the robot, the moving speed of each motor of the robot, and the moving path near the tip end of the robot) and the corresponding action (adjustment of the moving speed of each motor of the robot) gradually becomes clear. That is, by updating the action value table, the relationship between the target speed of the tip end of the robot, the moving speed of each motor of the robot, and the moving path near the tip end of the robot on the one hand and the adjustment amount of the moving speed of each motor of the robot on the other is gradually brought closer to an optimal solution.

The flow of the above Q-learning performed by the learning section 110 (that is, one embodiment of the machine learning method) is described in more detail with reference to FIG. 4.

First, in step SA01, the value function update section 114 refers to the action value table at that point and arbitrarily selects an adjustment amount of the moving speed of each motor of the robot as the action to be performed in the current state indicated by the state variables S observed by the state observation section 106. Next, the value function update section 114 takes in the state variables S of the current state observed by the state observation section 106 in step SA02, and takes in the determination data D of the current state acquired by the determination data acquisition section 108 in step SA03. In step SA04, the value function update section 114 determines, based on the determination data D, whether the adjustment amount of the moving speed of each motor of the robot is suitable. If the adjustment amount is suitable, in step SA05 the value function update section 114 applies a positive reward R calculated by the reward calculation section 112 to the update formula of the function Q. Then, in step SA06, the value function update section 114 updates the action value table using the state variables S and the determination data D of the current state, the reward R, and the action value (the updated function Q). If, on the other hand, it is determined in step SA04 that the adjustment amount of the moving speed of each motor of the robot is unsuitable, in step SA07 the value function update section 114 applies a negative reward R calculated by the reward calculation section 112 to the update formula of the function Q, and then, in step SA06, updates the action value table in the same way. The learning section 110 repeatedly updates the action value table by repeating steps SA01 to SA07, thereby advancing the learning of the adjustment amount of the moving speed of each motor of the robot. Note that the reward calculation and the value function update in steps SA04 to SA07 are performed for each item of data included in the determination data D.
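Steps SA01 to SA07 can be sketched as a loop over a toy environment. Everything about the environment is an assumption made for illustration: the tip speed is reduced to an integer from 0 to 6, the target speed is 3, an action adds an integer adjustment, and the result counts as suitable only if the resulting speed equals the target. The rewards of +1 and -1, the learning coefficient, and the small discount factor are likewise hypothetical.

```python
import random

def run_learning(states=tuple(range(7)), actions=(-2, -1, 0, 1, 2), target=3,
                 cycles=2000, alpha=0.5, gamma=0.1, seed=0):
    """Repeat steps SA01..SA07 on a toy speed model and return the action value table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(cycles):
        s = rng.choice(states)                        # current state S
        a = rng.choice(actions)                       # SA01: pick an adjustment arbitrarily
        s_next = min(max(s + a, states[0]), states[-1])  # SA02: observe state after acting
        d = abs(s_next - target)                      # SA03: determination data D
        suitable = (d == 0)                           # SA04: suitability check
        r = 1.0 if suitable else -1.0                 # SA05 (positive) / SA07 (negative)
        best_next = max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])  # SA06: update table
    return q

def greedy_policy(q, states, actions):
    """For each state, return the action with the highest learned value."""
    policy = {}
    for s in states:
        policy[s] = max(actions, key=lambda a: q[(s, a)])
    return policy

q_table = run_learning()
policy = greedy_policy(q_table, tuple(range(7)), (-2, -1, 0, 1, 2))
```

In this toy setting the learned table prefers, for each speed from 1 to 5, the adjustment that lands exactly on the target, for example +2 from speed 1 and 0 from speed 3.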

To advance the reinforcement learning described above, a neural network may be used instead of, for example, Q-learning. FIG. 5A schematically shows a model of a neuron. FIG. 5B schematically shows a model of a three-layer neural network formed by combining the neurons shown in FIG. 5A. The neural network can be implemented, for example, by an arithmetic unit, a storage unit, or the like that follows the neuron model.

The neuron shown in FIG. 5A outputs a result y for a plurality of inputs x (here, inputs x_1 to x_3 as an example). The inputs x_1 to x_3 are multiplied by corresponding weights w (w_1 to w_3). The neuron thus outputs the result y expressed by formula (2) below. Note that in formula (2), the input x, the result y, and the weight w are all vectors. In addition, θ denotes a bias and f_k denotes an activation function.

$y = f_k \left( \sum_{i=1}^{n} x_i w_i - \theta \right) \quad (2)$
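Formula (2) translates directly into code. The sketch below treats a single neuron with a scalar output; the function name `neuron`, the choice of `math.tanh` as the default activation, and the sample numbers are assumptions, since the text does not name a specific activation function.

```python
import math

def neuron(x, w, theta, f=math.tanh):
    """Compute y = f(sum_i x_i * w_i - theta) for one neuron."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    weighted_sum = sum(xi * wi for xi, wi in zip(x, w))
    return f(weighted_sum - theta)

# Three inputs as in the example; an identity activation keeps the arithmetic visible.
y = neuron([1.0, 2.0, 3.0], [0.5, -1.0, 0.5], theta=1.0, f=lambda u: u)
# weighted sum = 0.5 - 2.0 + 1.5 = 0.0, so y = 0.0 - 1.0 = -1.0
```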

In the three-layer neural network shown in FIG. 5B, a plurality of inputs x (here, inputs x1 to x3 as an example) are input from the left side of the neural network, and results y (here, results y1 to y3 as an example) are output from the right side. In the example shown in FIG. 5B, the inputs x1, x2, and x3 are multiplied by corresponding weights (collectively denoted w1) and input to three neurons N11, N12, and N13, respectively.

In FIG. 5B, the outputs of the neurons N11 to N13 are collectively denoted z1. The outputs z1 can be regarded as feature vectors obtained by extracting feature quantities of the input vectors. In the example shown in FIG. 5B, the feature vectors z1 are multiplied by corresponding weights (collectively denoted w2) and input to two neurons N21 and N22, respectively. The feature vectors z1 express the features between the weights w1 and the weights w2.

In FIG. 5B, the outputs of the neurons N21 and N22 are collectively denoted z2. The outputs z2 can be regarded as feature vectors obtained by extracting feature quantities of the feature vectors z1. In the example shown in FIG. 5B, the feature vectors z2 are multiplied by corresponding weights (collectively denoted w3) and input to three neurons N31, N32, and N33, respectively. The feature vectors z2 express the features between the weights w2 and the weights w3. Finally, the neurons N31 to N33 output the results y1 to y3, respectively.
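The data flow through the network of FIG. 5B (three inputs, three neurons, two neurons, three output neurons) can be sketched as a forward pass. This is an illustrative sketch only: the weight values are arbitrary, the biases are assumed to be zero, and the identity activation in the usage example is chosen so the numbers can be checked by hand.

```python
import math

def forward(x, w1, w2, w3, f=math.tanh):
    """Propagate x through the weight sets w1, w2, w3; return (z1, z2, y)."""
    def layer(inputs, weights):
        # One output per weight row: f(sum_i inputs_i * row_i); bias assumed to be 0.
        return [f(sum(v * w for v, w in zip(inputs, row))) for row in weights]

    z1 = layer(x, w1)    # outputs of N11, N12, N13
    z2 = layer(z1, w2)   # outputs of N21, N22
    y = layer(z2, w3)    # outputs of N31, N32, N33 (results y1..y3)
    return z1, z2, y

# Hypothetical weights: each inner list holds the input weights of one neuron.
w1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]   # 3 inputs -> 3 neurons
w2 = [[1, 1, 1], [1, -1, 0]]             # 3 features -> 2 neurons
w3 = [[1, 0], [0, 1], [1, 1]]            # 2 features -> 3 outputs
z1, z2, y = forward([1.0, 2.0, 3.0], w1, w2, w3, f=lambda u: u)
# z1 = [1.0, 2.0, 3.0], z2 = [6.0, -1.0], y = [6.0, -1.0, 5.0]
```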

Note that so-called deep learning, which uses a neural network with three or more layers, may also be employed.

In the machine learning device 100 of the controller 1, the learning section 110 performs computation in a multilayer structure according to the above neural network, with the state variables S and the determination data D as the inputs x, so that the learning section 110 can output an adjustment amount (result y) of the moving speed of each motor of the robot. In addition, in the machine learning device 100 of the controller 1, the learning section 110 may use a neural network as the value function in reinforcement learning and perform computation in a multilayer structure according to the above neural network, with the state variables S and the action a as the inputs x, so that the learning section 110 can output the value (result y) of that action in that state. Note that the operation modes of the neural network include a learning mode and a value prediction mode. For example, the weights w can be learned in the learning mode using a learning data set, and an action value can be determined in the value prediction mode using the learned weights w. Note that detection, classification, inference, and the like can also be performed in the value prediction mode.

The configuration of the above controller 1 can be described as a machine learning method (or software) executed by the processor 101. The machine learning method is a method of learning an adjustment amount of the moving speed of each motor of the robot. The machine learning method includes:

a step of observing, by the CPU of a computer, teaching speed adjustment amount data S1, target speed data S2, moving speed data S3, and moving path data S4 as state variables S that express the current state of the environment in which the robot is controlled;
a step of acquiring determination data D that indicate a suitability determination result of the operating state of the robot according to the adjusted moving speed of each of the motors; and
a step of learning the target speed data S2, the moving speed data S3, and the moving path data S4 in association with the adjustment amount of the moving speed of each of the motors of the robot, using the state variables S and the determination data D.
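The quantities handled by these three steps can be sketched as plain data structures. The field names, types, and units below are assumptions made for illustration; the text names the data items S1 to S4 and D but does not define their representation. The two error fields in the determination data follow the examples given earlier (speed difference and position difference of the tip end).

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class StateVariables:
    """State variables S observed in the first step (representation assumed)."""
    s1_adjustment: Tuple[float, ...]                 # S1: adjustment amount per motor
    s2_target_speed: float                           # S2: target speed of the tip end
    s3_motor_speeds: Tuple[float, ...]               # S3: moving speed per motor
    s4_path: Tuple[Tuple[float, float, float], ...]  # S4: path points near the tip end

@dataclass(frozen=True)
class DeterminationData:
    """Determination data D acquired in the second step (representation assumed)."""
    speed_error: float       # tip speed minus target speed
    position_error: float    # tip position deviation from the taught position

def is_suitable(d, speed_tol, position_tol):
    """Combine several values in D into one suitability determination."""
    return abs(d.speed_error) <= speed_tol and abs(d.position_error) <= position_tol

# Hypothetical sample values
s = StateVariables((0.0, 0.1), 100.0, (50.0, 52.0), ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0)))
ok = is_suitable(DeterminationData(speed_error=1.5, position_error=0.2), 2.0, 0.5)
bad = is_suitable(DeterminationData(speed_error=3.0, position_error=0.2), 2.0, 0.5)
```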

FIG. 6 shows a controller 2 according to a second embodiment.

The controller 2 includes a machine learning device 120 and a state data acquisition section 3 that acquires, as state data S0, the teaching speed adjustment amount data S1, the target speed data S2, the moving speed data S3, and the moving path data S4 of the state variables S observed by the state observation section 106. The state data acquisition section 3 can acquire the state data S0 from each section of the controller 2, from various sensors of the robot, from appropriate data input by an operator, or the like.

In addition to software (such as a learning algorithm) and hardware (such as the processor 101) for learning by itself, through machine learning, the adjustment amount of the moving speed of each motor of the robot, the machine learning device 120 of the controller 2 includes software (such as a calculation algorithm) and hardware (such as the processor 101) for outputting the learned adjustment amount of the moving speed of each motor of the robot as a command to the controller 2. The machine learning device 120 of the controller 2 may be configured so that a single common processor executes all of the software, such as the learning algorithm and the calculation algorithm.

A decision-making section 122 may be configured, for example, as one of the functions of the processor 101 or as software stored in the ROM 102 for operating the processor 101. Based on the learning result of the learning section 110, the decision-making section 122 generates and outputs a command value C that includes a command for determining the adjustment amount of the moving speed of each motor of the robot with respect to the target speed of the tip end of the robot, the moving speed of each motor of the robot, and the moving path near the tip end of the robot. When the decision-making section 122 outputs the command value C to the controller 2, the state of the environment changes accordingly.

In the next learning cycle, the state observation section 106 observes the state variables S that have changed after the decision-making section 122 output the command value C to the environment. The learning section 110 learns the adjustment amount of the moving speed of each motor of the robot by, for example, updating the value function Q (that is, the action value table) using the changed state variables S. Note that the state observation section 106 may observe the teaching speed adjustment amount data S1 from the RAM 103 of the machine learning device 120, as described in the first embodiment, instead of acquiring them from the state data S0 acquired by the state data acquisition section 3.

The decision-making section 122 outputs to the controller 2 a command value C, calculated based on the learning result, that requests adjustment of the moving speed of each motor of the robot. By repeating this learning cycle, the machine learning device 120 advances its learning of the adjustment amount of the moving speed of each motor of the robot and gradually improves the reliability of the adjustment amount that it determines itself.
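One simple way a decision-making step could derive a command from a learned action value table is to pick the adjustment amount with the highest value for the current state. The sketch below assumes a table keyed by (state, action); the function name `decide_command`, the state label, and the values are hypothetical, and the text does not state that the decision-making section 122 works this way.

```python
def decide_command(q, state, actions):
    """Return the adjustment amount with the highest learned value for `state`."""
    # Actions missing from the table are treated as worst; ties keep the first action.
    return max(actions, key=lambda a: q.get((state, a), float("-inf")))

# Hypothetical learned values for one state
q_example = {("too_slow", -1): -0.3, ("too_slow", 0): 0.2, ("too_slow", 1): 0.9}
command_c = decide_command(q_example, "too_slow", actions=(-1, 0, 1))
```

Here the selected command is the adjustment +1, because it has the highest learned value (0.9) for the state "too_slow".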

The machine learning device 120 of the controller 2 having the above configuration produces the same effect as the machine learning device 100 described above. In particular, the machine learning device 120 can change the state of the environment through the output of the decision-making section 122. The machine learning device 100, on the other hand, may request from an external device a function corresponding to the decision-making section in order to reflect the learning results of the learning section 110 in the environment.

FIG. 7 shows a system 170 including robots 160 according to an embodiment.

The system 170 includes a plurality of robots 160 and 160' that perform at least the same operation, and a wired/wireless network 172 that connects the robots 160 and 160' to each other. At least one of the plurality of robots is configured as a robot 160 including the above controller 2. The system 170 may also include robots 160' that do not include the controller 2. The robots 160 and 160' have the mechanism required to perform an operation for the same purpose.

In the system 170 having the above configuration, the robots 160 that include the controller 2, among the plurality of robots 160 and 160', can use the learning results of the learning section 110 to automatically and accurately calculate the adjustment amount of the movement speed of each of the motors of the robot with respect to the target speed of the tip end of the robot, the movement speed of each of the motors of the robot, and the movement path near the tip end of the robot, without relying on calculation or estimation. In addition, the controller 2 of at least one of the robots 160 can learn an adjustment amount of the movement speed of each of the motors of the robot that is common to all the robots 160 and 160', based on the state variables S and the determination data D obtained for each of the other robots 160 and 160', so that the learning results are shared among all the robots 160 and 160'. Accordingly, the system 170 makes it possible to improve the speed and reliability of learning the adjustment amount of the movement speed of each of the motors of the robot, using a wider range of data sets (including state variables S and determination data D) as inputs.
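
The sharing of learning results across robots can be sketched as a single action value table that is updated from samples collected by several robots. This is only an illustration under assumptions: it uses a standard tabular Q-learning update, and the function name `learn_from_fleet`, the robot identifiers, the state labels, the discrete adjustment steps, and the constants are all hypothetical rather than taken from the embodiment.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # assumed learning rate and discount factor
ACTIONS = (-1, 0, 1)     # hypothetical discrete adjustment steps


def learn_from_fleet(samples_by_robot):
    """Build one shared action value table from the samples of every robot.

    samples_by_robot maps a robot identifier to a list of
    (state, action, reward, next_state) tuples, standing in for the
    state variables S and determination data D obtained for that robot.
    """
    q = defaultdict(float)
    for robot_id, samples in samples_by_robot.items():
        for state, action, reward, next_state in samples:
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (
                reward + GAMMA * best_next - q[(state, action)]
            )
    return q


if __name__ == "__main__":
    fleet = {
        "robot_a": [("tip_too_slow", 1, 1.0, "tip_on_target")],
        "robot_b": [("tip_too_slow", 1, 1.0, "tip_on_target")],
    }
    shared = learn_from_fleet(fleet)
    print(shared[("tip_too_slow", 1)])
```

Because every robot contributes to the same table, an entry is refined by more samples than any single robot would provide, which is the intuition behind the wider range of data sets mentioned above.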

FIG. 8 shows a system 170' including a plurality of robots 160' according to another embodiment.

The system 170' includes the machine learning device 120 (or 100), the plurality of robots 160' having the same machine configuration, and a wired/wireless network 172 that connects the robots 160' and the machine learning device 120 (or 100) to each other.

In the system 170' having the above configuration, the machine learning device 120 (or 100) can learn, based on the state variables S and the determination data D obtained for each of the plurality of robots 160', an adjustment amount of the movement speed of each of the motors of the robot that is common to all the robots 160', with respect to the target speed of the tip end of the robot, the movement speed of each of the motors of the robot, and the movement path near the tip end of the robot. Using the learning results, it can then automatically and accurately calculate the adjustment amount of the movement speed of each of the motors of the robot with respect to the target speed of the tip end of the robot, the movement speed of each of the motors of the robot, and the movement path near the tip end of the robot, without relying on calculation or estimation.

In the system 170', the machine learning device 120 (or 100) may be configured to reside in a cloud server or the like provided in the network 172. With this configuration, a desired number of robots 160' can be connected to the machine learning device 120 (or 100) when needed, regardless of where and when each of the plurality of robots 160' exists.

Workers engaged in the systems 170 and 170' can determine, at an appropriate time after the machine learning device 120 (or 100) has started learning, whether the degree of achievement of learning the adjustment amount of the movement speed of each of the motors of the robot by the machine learning device 120 (or 100) (that is, the reliability of the adjustment amount of the movement speed of each of the motors of the robot) has reached a required level.

The embodiments of the present invention are described above. However, the present invention is not limited to the examples of the above embodiments and can be carried out in various modes with appropriate modifications.

For example, the learning algorithm executed by the machine learning devices 100 and 120, the calculation algorithm executed by the machine learning device 120, and the control algorithm executed by the controllers 1 and 2 are not limited to the algorithms described above; various other algorithms may be employed.

In addition, the above embodiments describe a configuration in which the controller 1 (or 2) and the machine learning device 100 (or 120) have different CPUs. However, the machine learning device 100 (or 120) may instead be realized by the CPU 11 of the controller 1 (or 2) and a system program stored in the ROM 12.

CITATIONS INCLUDED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

JP 6285402 [0002, 0003]

Claims

A controller that adjusts a movement speed of each motor of a robot that performs coating with a sealing material, the controller comprising: a machine learning device that learns an adjustment amount of the movement speed of each of the motors of the robot, wherein the machine learning device has: a state observation section that observes, as state variables expressing a current state of an environment, teaching speed adjustment amount data indicating the adjustment amount of the movement speed of each of the motors of the robot, target speed data indicating a target speed of a tip end of the robot, movement speed data indicating the movement speed of each of the motors of the robot, and movement path data indicating a movement path near the tip end of the robot; a determination data acquisition section that acquires determination data indicating a suitability determination result of the movement speed of the tip end of the robot; and a learning section that learns the target speed data, the movement speed data, and the movement path data in association with the adjustment amount of the movement speed of each of the motors of the robot, using the state variables and the determination data.

The controller according to claim 1, wherein the determination data includes, in addition to the suitability determination result of the movement speed of the tip end of the robot, a suitability determination result of a position of the tip end of the robot.

The controller according to claim 1 or 2, wherein the learning section comprises: a reward calculation section that calculates a reward associated with the suitability determination result; and a value function update section that updates, using the reward, a function expressing a value of the adjustment amount of the movement speed of each of the motors of the robot with respect to the target speed of the tip end of the robot, the movement speed of each of the motors of the robot, and the movement path near the tip end of the robot.

The controller according to any one of claims 1 to 3, wherein the learning section performs calculation on the state variables and the determination data in a multilayer structure.

The controller according to any one of claims 1 to 4, further comprising: a decision-making section that outputs, on the basis of a learning result of the learning section, a command value based on the adjustment amount of the movement speed of each of the motors of the robot.

The controller according to any one of claims 1 to 5, wherein the learning section learns the adjustment amount of the movement speed of each of the motors of the robot in each of a plurality of robots, using the state variables and the determination data obtained for each of the plurality of robots.

The controller according to any one of claims 1 to 6, wherein the machine learning device resides in a cloud server.

A machine learning device that learns an adjustment amount of a movement speed of each motor of a robot that performs coating with a sealing material, the machine learning device comprising: a state observation section that observes, as state variables expressing a current state of an environment, teaching speed adjustment amount data indicating the adjustment amount of the movement speed of each of the motors of the robot, target speed data indicating a target speed of a tip end of the robot, movement speed data indicating the movement speed of each of the motors of the robot, and movement path data indicating a movement path near the tip end of the robot; a determination data acquisition section that acquires determination data indicating a suitability determination result of the movement speed of the tip end of the robot; and a learning section that learns the target speed data, the movement speed data, and the movement path data in association with the adjustment amount of the movement speed of each of the motors of the robot, using the state variables and the determination data.