CN113199483A - Robot system, robot control method, machine learning device, and machine learning method


Info

Publication number
CN113199483A
CN113199483A
Authority
CN
China
Prior art keywords
robot
information
machine learning
neural network
hand
Prior art date
Legal status
Pending
Application number
CN202110544521.3A
Other languages
Chinese (zh)
Inventor
山崎岳
尾山拓未
陶山峻
中山一隆
组谷英俊
中川浩
冈野原大辅
奥田辽介
松元睿一
河合圭悟
Current Assignee
Fanuc Corp
Preferred Networks Inc
Original Assignee
Fanuc Corp
Preferred Networks Inc
Priority date
Filing date
Publication date
Application filed by Fanuc Corp, Preferred Networks Inc filed Critical Fanuc Corp
Publication of CN113199483A

Classifications

    • B25J9/163 Programme controls characterised by the control loop: learning, adaptive, model based, rule based expert control
    • B25J9/1687 Assembly, peg and hole, palletising, straight line, weaving pattern movement
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • G05B2219/39297 First learn inverse model, then fine tune with ffw error learning
    • G05B2219/40053 Pick 3-D object from pile of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot system, a robot control method, a machine learning device, and a machine learning method. The robot system includes: a robot; an observation unit that acquires data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing that information; a determination unit that inputs the data to a neural network to determine information for the robot to grip the object; and a control device that controls the robot based on the information determined by the determination unit.

Description

Robot system, robot control method, machine learning device, and machine learning method
The present application is a divisional application of Chinese patent application No. 201610617361.X, entitled "Machine learning apparatus, robot system, and machine learning method", filed on July 29, 2016.
Technical Field
The present invention relates to a machine learning device, a robot system, and a machine learning method for learning the operation of taking out workpieces placed at random, including in a bulk state.
Background
Conventionally, as disclosed in, for example, Japanese Patent No. 5642738 and Japanese Patent No. 5670397, there are known robot systems in which a robot hand grips and conveys workpieces stacked in bulk in a basket-shaped box. In such a robot system, for example, the position information of a plurality of workpieces is acquired using a three-dimensional measuring instrument provided above the basket-shaped box, and the workpieces are taken out one by one by the robot hand of the robot based on that position information.
However, in such a conventional robot system, it is necessary to set in advance how the workpiece to be taken out is extracted from, for example, the distance images of the plurality of workpieces measured by the three-dimensional measuring instrument, and at which position that workpiece is to be gripped. It is also necessary to program in advance how the robot hand should operate when taking out the workpiece; specifically, it is necessary, for example, to teach the robot the take-out operation using a teaching pendant.
Therefore, if the setting for extracting the workpiece to be taken out from the distance images of the plurality of workpieces is inappropriate, or if the operation program of the robot is not created appropriately, the success rate with which the robot takes out and conveys the workpiece falls. To raise the success rate, it is necessary to keep improving the workpiece detection settings and the operation program of the robot while repeating trial and error to find the optimum operation of the robot.
Disclosure of Invention
In view of the above circumstances, an object of the present invention is to provide a machine learning device, a robot system, and a machine learning method that can learn, without human intervention, the optimum operation of a robot for taking out workpieces placed at random, including in a bulk state.
According to a first aspect of the present invention, there is provided a machine learning device for learning an operation of a robot that takes out, by a robot hand, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the robot hand; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation. Preferably, the machine learning device further includes an intention determining unit that determines the instruction data to be given to the robot by referring to the operation amount learned by the learning unit.
According to a second aspect of the present invention, there is provided a machine learning device for learning an operation of a robot that takes out, by a robot hand, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the robot hand; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including a measurement parameter of the three-dimensional measuring instrument, in association with the state quantity of the robot and the result of the take-out operation. Preferably, the machine learning device further includes an intention determining unit that determines the measurement parameter of the three-dimensional measuring instrument by referring to the operation amount learned by the learning unit.
The state quantity observation unit may observe a state quantity of the robot that includes output data of a coordinate calculation unit that calculates the three-dimensional position of each workpiece based on the output of the three-dimensional measuring instrument. The coordinate calculation unit may further calculate the posture of each workpiece and output the calculated three-dimensional position and posture data of each workpiece. The operation result acquisition unit may use the output data of the three-dimensional measuring instrument. Preferably, the machine learning device further includes a preprocessing unit that processes the output data of the three-dimensional measuring instrument before it is input to the state quantity observation unit, and the state quantity observation unit receives the output data of the preprocessing unit as the state quantity of the robot. The preprocessing unit may make the direction and height of each workpiece uniform in the output data of the three-dimensional measuring instrument. The operation result acquisition unit may acquire at least one of: success or failure in taking out the workpiece, the damage state of the workpiece, and the degree of completion when the taken-out workpiece is transferred to a subsequent process.
The learning unit may include: a return calculation unit that calculates a return based on the output of the operation result acquisition unit; and a value function update unit that has a value function specifying the value of the take-out operation of the workpiece and updates the value function in accordance with the return. Alternatively, the learning unit may include a learning model for learning the take-out operation of the workpiece, together with: an error calculation unit that calculates an error based on the output of the operation result acquisition unit and the output of the learning model; and a learning model update unit that updates the learning model in accordance with the error. The machine learning device preferably has a neural network.
According to a third aspect of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state. The machine learning device includes: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation. The robot system further includes: the robot, the three-dimensional measuring instrument, and a control device that controls the robot and the three-dimensional measuring instrument, respectively.
According to a fourth aspect of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state. The machine learning device includes: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including a measurement parameter of the three-dimensional measuring instrument, in association with the state quantity of the robot and the result of the take-out operation. The robot system further includes: the robot, the three-dimensional measuring instrument, and a control device that controls the robot and the three-dimensional measuring instrument, respectively.
Preferably, the robot system includes a plurality of the robots, the machine learning device is provided for each of the robots, and the machine learning devices provided for the robots share or exchange data with each other via a communication medium. The machine learning device may reside on a cloud server.
According to a fifth aspect of the present invention, there is provided a machine learning method for learning an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning method including the steps of: observing a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; acquiring a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and receiving the observed state quantity and the acquired result, and learning an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation.
Drawings
The present invention will be more clearly understood by reference to the following drawings.
Fig. 1 is a block diagram showing a conceptual configuration of a robot system according to an embodiment of the present invention.
Fig. 2 is a diagram schematically showing a model of a neuron.
Fig. 3 is a diagram schematically showing a three-layer neural network formed by combining the neurons shown in fig. 2.
Fig. 4 is a flowchart showing an example of the operation of the machine learning device shown in fig. 1.
Fig. 5 is a block diagram showing a conceptual configuration of a robot system according to another embodiment of the present invention.
Fig. 6 is a diagram for explaining an example of processing by the preprocessing unit in the robot system shown in fig. 5.
Fig. 7 is a block diagram showing a modification of the robot system shown in fig. 1.
Detailed Description
Embodiments of a machine learning device, a robot system, and a machine learning method according to the present invention will be described below with reference to the drawings. However, the invention is not limited to the embodiments illustrated in the drawings or described below. In the drawings, the same reference numerals are assigned to components having the same functions, and the scale of the drawings has been adjusted as appropriate for easy understanding.
Fig. 1 is a block diagram showing a conceptual configuration of a robot system according to an embodiment of the present invention. The robot system 10 of the present embodiment includes: a robot 14 to which a hand 13 for gripping the workpieces 12 stacked in bulk in the basket-shaped box 11 is attached; a three-dimensional measuring device 15 that measures a three-dimensional map (map) of the surface of the workpiece 12; a control device 16 for controlling the robot 14 and the three-dimensional measuring device 15; a coordinate calculation unit 19; and a machine learning device 20.
Here, the machine learning device 20 includes: a state quantity observation unit 21, an operation result acquisition unit 26, a learning unit 22, and an intention determination unit 25. As described in detail later, the machine learning device 20 learns and outputs operation amounts such as instruction data instructing the robot 14 to perform the operation of taking out the workpiece 12 and measurement parameters of the three-dimensional measuring instrument 15.
The robot 14 is, for example, a 6-axis articulated robot, and the drive axes of the robot 14 and the hand portion 13 are controlled by the control device 16. The robot 14 is used to take out the workpieces 12 one by one from the box 11 provided at a predetermined position and to move them sequentially to a predetermined place such as a conveyor or a work table (not shown).
However, when the workpieces 12 stacked in bulk are taken out of the box 11, the robot hand 13 or the workpiece 12 may collide with or contact the wall of the box 11, or the robot hand 13 or the workpiece 12 may be caught on another workpiece 12. In such cases, a function of detecting the force acting on the hand 13 is required in order to immediately avoid an overload applied to the robot 14. Therefore, a 6-axis force sensor 17 is provided between the tip of the arm of the robot 14 and the hand portion 13. The robot system 10 according to the present embodiment also has a function of estimating the force acting on the hand portion 13 from the current values of the motors (not shown) that drive the drive shafts of the joints of the robot 14.
Further, since the force sensor 17 can detect the force acting on the hand portion 13, it can be determined whether or not the hand portion 13 is actually holding the workpiece 12. That is, since the weight of the workpiece 12 acts on the hand portion 13 while the hand portion 13 grips the workpiece 12, it can be determined that the hand portion 13 is gripping the workpiece 12 if the detection value of the force sensor 17 exceeds a predetermined threshold value after the take-out operation of the workpiece 12 is performed. Whether or not the hand portion 13 holds the workpiece 12 may also be determined from, for example, the imaging data of a camera used in the three-dimensional measuring instrument 15, or the output of a photoelectric sensor (not shown) attached to the hand portion 13. The determination may also be made based on data from a pressure gauge of a suction-type hand described later.
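For illustration, the threshold judgment described above might look like the following minimal sketch; the sensor interface and the threshold value are assumptions, not values from the embodiment.

```python
# A minimal sketch of the grip-success judgment described above.
# The sensor interface and the threshold are illustrative assumptions.

def is_workpiece_gripped(force_z_after: float, force_z_empty: float,
                         threshold: float = 2.0) -> bool:
    """Judge that the hand grips a workpiece when the detected vertical
    force exceeds the empty-hand reading by more than a threshold,
    since the workpiece's weight acts on the hand."""
    return (force_z_after - force_z_empty) > threshold
```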
Here, the hand portion 13 may have various forms as long as it can grip the workpiece 12. For example, the hand portion 13 may be configured to grip the workpiece 12 by opening and closing 2 or more claw portions, or may include an electromagnet or a negative pressure generating device that generates an attractive force to the workpiece 12. That is, fig. 1 illustrates a case where the hand portion 13 grips the workpiece by 2 claw portions, but the present invention is not limited thereto.
In order to measure the plurality of workpieces 12, the three-dimensional measuring instrument 15 is provided at a predetermined position above the workpieces 12 via the support portion 18. As the three-dimensional measuring instrument 15, for example, a three-dimensional vision sensor can be used that obtains three-dimensional position information by performing image processing on image data of the workpieces 12 captured by 2 cameras (not shown). Specifically, the three-dimensional map (the positions of the surfaces of the plurality of workpieces 12 stacked in bulk) is measured by applying triangulation, a light-section method, a time-of-flight method, a depth-from-defocus method, or a combination of these methods.
The coordinate calculation unit 19 calculates (measures) the positions of the surfaces of the plurality of works 12 stacked in bulk, using the three-dimensional map obtained by the three-dimensional measuring instrument 15 as an input. That is, the three-dimensional position data (x, y, z) or the three-dimensional position data (x, y, z) and the orientation data (w, p, r) of each workpiece 12 can be obtained by the output of the three-dimensional measuring instrument 15. Here, the state quantity observation unit 21 receives both the three-dimensional map from the three-dimensional measurement device 15 and the position data (posture data) from the coordinate calculation unit 19 to observe the state quantity of the robot 14, but may also observe the state quantity of the robot 14 by receiving only the three-dimensional map from the three-dimensional measurement device 15, for example. Further, similarly to the case described below with reference to fig. 5, a preprocessing unit 50 may be added, and the preprocessing unit 50 may process (preprocess) the three-dimensional map from the three-dimensional measuring device 15 before inputting the processed three-dimensional map to the state quantity observation unit 21, and input the processed three-dimensional map to the state quantity observation unit 21.
It is assumed that the relative positions of the robot 14 and the three-dimensional measuring instrument 15 are determined in advance by calibration. As the three-dimensional measuring instrument 15 of the present invention, a laser distance measuring instrument may also be used instead of a three-dimensional vision sensor. That is, the distance from the position where the three-dimensional measuring instrument 15 is installed to the surface of each workpiece 12 may be measured by laser scanning, and the three-dimensional positions and postures (x, y, z, w, p, r) of the plurality of workpieces 12 stacked in bulk may also be acquired using various sensors such as a monocular camera or a touch sensor.
That is, in the present invention, the three-dimensional measuring instrument 15 can be applied to any kind of three-dimensional measuring method as long as data (x, y, z, w, p, r) of each workpiece 12 can be acquired. The form of installation of the three-dimensional measuring instrument 15 is not particularly limited, and may be fixed to a floor, a wall, or the like, or may be attached to an arm portion of the robot 14.
The three-dimensional measuring device 15 acquires a three-dimensional map of the plurality of workpieces 12 stacked in bulk in the box 11 by a command from the control device 16, and the coordinate calculation unit 19 acquires (calculates) data of three-dimensional positions (postures) of the plurality of workpieces 12 from the three-dimensional map, and outputs the data to the control device 16 and a state quantity observation unit 21 and an operation result acquisition unit 26 of a machine learning device 20 described later. In particular, the coordinate calculation unit 19 estimates the boundary between a certain workpiece 12 and another workpiece 12 or the boundary between the workpiece 12 and the box 11, for example, from the captured image data of a plurality of workpieces 12, and acquires three-dimensional position data for each workpiece 12.
The three-dimensional position data for each workpiece 12 is, for example, data obtained by estimating the existing position or retainable position of each workpiece 12 from the positions of a plurality of points on the surface of a plurality of workpieces 12 stacked in bulk. Of course, the three-dimensional position data of each workpiece 12 may also include data of the posture of the workpiece 12.
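For illustration only, the per-workpiece data described above could be held in a small container such as the following; the field names are assumptions, not definitions from the embodiment.

```python
from dataclasses import dataclass

# Illustrative container for the per-workpiece data described above.
# Field names are assumptions for the sketch.

@dataclass
class WorkpiecePose:
    x: float            # three-dimensional position of the workpiece
    y: float
    z: float
    w: float = 0.0      # posture data (optional in the text above)
    p: float = 0.0
    r: float = 0.0
```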
The coordinate calculation unit 19 may also use machine learning to acquire the three-dimensional position and orientation data of each workpiece 12. For example, a method such as the supervised learning described later can be applied to object recognition or angle estimation from an input image or from a laser distance measuring instrument.
When the three-dimensional position data of each workpiece 12 is input from the three-dimensional measuring instrument 15 to the control device 16 via the coordinate calculation unit 19, the control device 16 controls the operation of the hand 13 that takes out a certain workpiece 12 from the box 11. At this time, motors (not shown) of the respective axes of the robot hand 13 and the robot 14 are driven based on command values (operation amounts) corresponding to the optimal position, posture, and pickup direction of the robot hand 13 obtained by the machine learning device 20 described later.
The machine learning device 20 may learn variables of the imaging conditions of the camera used in the three-dimensional measuring instrument 15 (measurement parameters of the three-dimensional measuring instrument 15, for example, the exposure time adjusted at the time of imaging using an exposure meter, the illuminance of an illumination system when illuminating the object to be imaged, and the like), and may control the three-dimensional measuring instrument 15 via the control device 16 based on the learned operation amount including those measurement parameters. Variables of the position/orientation estimation conditions used for estimating the existing position/orientation and the graspable position/orientation of each workpiece 12 from the positions of the plurality of workpieces 12 measured by the three-dimensional measuring instrument 15 may also be included in the output data of the three-dimensional measuring instrument 15.
In addition, as described above, the output data from the three-dimensional measuring instrument 15 may be processed in advance by the preprocessing unit 50 or the like described later with reference to fig. 5, and the processed data (image data) may be provided to the state quantity observation unit 21. The operation result acquisition unit 26 may acquire the result of the robot hand 13 of the robot 14 taking out the workpiece 12 based on the output data from the three-dimensional measuring instrument 15 (the output data of the coordinate calculation unit 19), and may acquire, for example, the operation result of the degree of completion when the taken-out workpiece 12 is transferred to the subsequent step and the state change such as the presence or absence of the breakage of the taken-out workpiece 12 via another means (for example, a camera, a sensor, or the like provided in the subsequent step). As described above, the state quantity observation unit 21 and the operation result acquisition unit 26 are functional modules, but it is needless to say that both functions may be realized by one module.
Next, the machine learning device 20 shown in fig. 1 will be described in detail. The machine learning device 20 has a function of extracting, by analysis, useful rules, knowledge expressions, judgment criteria, and the like from the set of data input to the device, outputting the judgment results, and learning the knowledge (machine learning). There are various machine learning methods; roughly divided, they are classified into, for example, "supervised learning", "unsupervised learning", and "reinforcement learning". In addition, to realize these methods, there is a method called "deep learning" that learns the extraction of the feature amounts themselves. Such machine learning (the machine learning device 20) may use a general-purpose computer or processor, but applying a GPGPU (General-Purpose computing on Graphics Processing Units), a large-scale PC cluster, or the like enables higher-speed processing.
First, supervised learning is a method in which a large number of data sets of certain inputs and results (labels) are given to the machine learning device 20, whereby it learns the features present in those data sets and inductively acquires a model for estimating the result from the input, that is, their relationship. When supervised learning is applied in the present embodiment, it can be used, for example, in a part that estimates the workpiece position from the sensor input, or a part that estimates the success probability for a candidate workpiece. It can be realized using an algorithm such as the neural network described later.
Unsupervised learning is a method in which only a large amount of input data is given to the learning device, whereby it learns how the input data is distributed and learns to compress, classify, shape, or otherwise transform the input data without being given corresponding teacher output data. For example, features in those data sets can be clustered by similarity. Using this result, output prediction becomes possible by setting a certain criterion and allocating outputs so as to optimize it.
Further, as a problem setting intermediate between supervised learning and unsupervised learning, there is what is called semi-supervised learning, which corresponds to a case where, for example, input-output data sets exist for only a part of the data and only input data is available for the rest. In the present embodiment, data that can be acquired without actually operating the robot (image data, simulation data, and the like) is used for unsupervised learning, so that learning can be performed efficiently.
Next, reinforcement learning will be described. First, as a problem setting of reinforcement learning, the following is considered.
  • The robot observes the state of the environment and determines its behavior.
  • The environment changes according to a certain rule, and the robot's own behavior may also change the environment.
  • A return (reward signal) is given back each time a behavior is taken.
  • What is to be maximized is the total of future (discounted) returns.
  • Learning starts from a state in which the results caused by a behavior are not known at all, or are known only incompletely. That is, the robot can obtain the result as data only after it actually acts, and it is therefore necessary to search for the optimum behavior by trial and error.
  • It is also possible to start learning from a good starting point by setting, as the initial state, a state learned in advance (by a method such as the above-described supervised learning or inverse reinforcement learning) so as to imitate human motion.
Here, reinforcement learning is a method of learning not only determination and classification but also behavior, thereby learning an appropriate behavior based on the interaction that the behavior exerts on the environment, that is, learning so as to maximize the return obtained in the future. In the present embodiment, this means that it is possible to acquire behavior that affects the future, for example collapsing a mountain of workpieces 12 so that workpieces 12 become easier to take out later. The description below takes Q learning as an example, but the method is not limited thereto.
Q learning is a method of learning the value Q(s, a) of selecting a behavior a in a certain environmental state s. That is, in a certain state s, the behavior a with the highest value Q(s, a) should be selected as the optimum behavior. At first, however, the correct value of Q(s, a) is entirely unknown for each combination of a state s and a behavior a. The agent (the subject of behavior) therefore selects various behaviors a in a certain state s and is given a return for each behavior a. In this way, the agent continues to learn the selection of better behaviors, that is, the correct value Q(s, a).
Further, since it is desired to maximize the total of the returns obtained in the future as a result of behavior, the objective is finally to satisfy $Q(s, a) = E[\sum_t \gamma^t r_t]$. Here, $E[\cdot]$ denotes an expected value, $t$ is time, $\gamma$ is a parameter called the discount rate described later, $r_t$ is the return at time $t$, and the sum is taken over the times $t$. The expected value in this expression is the value obtained when the state changes according to the optimal behavior; since the optimal behavior is unknown, it has to be learned while searching. An update formula for the value $Q(s, a)$ is, for example, the following formula (1):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Big) \tag{1}$$

In the above formula (1), $s_t$ represents the state of the environment at time $t$, and $a_t$ represents the behavior at time $t$. The behavior $a_t$ changes the state to $s_{t+1}$, and $r_{t+1}$ represents the return obtained by that state change. The term with $\max$ is the Q value of the behavior $a$ that is known, at that point, to give the highest Q value in the state $s_{t+1}$, multiplied by $\gamma$. Here, $\gamma$ is a parameter with $0 < \gamma \le 1$ called the discount rate, and $\alpha$ is a learning coefficient set in the range $0 < \alpha \le 1$.

The above formula (1) expresses a method of updating the evaluation value $Q(s_t, a_t)$ of the behavior $a_t$ in the state $s_t$, based on the return $r_{t+1}$ that comes back as a result of the trial $a_t$. That is, if the sum of the return $r_{t+1}$ and the evaluation value $Q(s_{t+1}, \max a_{t+1})$ of the best behavior in the state following the behavior $a$ is larger than the evaluation value $Q(s_t, a_t)$ of the behavior $a$ in the state $s$, then $Q(s_t, a_t)$ is increased; conversely, if it is smaller, $Q(s_t, a_t)$ is decreased. In other words, the value of a certain behavior in a certain state is brought closer to the sum of the return that immediately comes back as a result and the value of the best behavior in the next state brought about by that behavior.
Here, as methods of expressing Q(s, a) on a computer, there is a method of holding its value as a table for all state-behavior pairs (s, a), and a method of preparing a function that approximates Q(s, a). In the latter method, the above formula (1) can be realized by adjusting the parameters of the approximation function with a technique such as stochastic gradient descent. As the approximation function, the neural network described later can be used.
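As a concrete illustration of the table-holding approach, the following is a minimal Q-learning sketch in Python implementing update formula (1); the state and behavior encodings and the values of alpha and gamma are assumptions for the example.

```python
from collections import defaultdict

# A minimal tabular Q-learning sketch implementing update formula (1).
# States and behaviors are assumed to be hashable identifiers; alpha
# and gamma follow the ranges given in the text.

class QTable:
    def __init__(self, behaviors, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # Q(s, a), initially unknown (0.0)
        self.behaviors = behaviors
        self.alpha = alpha            # learning coefficient, 0 < alpha <= 1
        self.gamma = gamma            # discount rate, 0 < gamma <= 1

    def best_behavior(self, state):
        # the behavior a with the highest value Q(s, a)
        return max(self.behaviors, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r_next, s_next):
        # Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(s_next, b)] for b in self.behaviors)
        self.q[(s, a)] += self.alpha * (
            r_next + self.gamma * best_next - self.q[(s, a)])
```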
Further, a neural network can be used as an approximation algorithm for the value function in reinforcement learning, or as a learning model for supervised or unsupervised learning. Fig. 2 is a diagram schematically showing a model of a neuron, and fig. 3 is a diagram schematically showing a three-layer neural network formed by combining the neurons shown in fig. 2. That is, the neural network is configured by, for example, an arithmetic device and a memory that simulate a neuron model such as that shown in fig. 2.
As shown in fig. 2, a neuron outputs an output (result) $y$ for a plurality of inputs $x$ (in fig. 2, inputs $x_1$ to $x_3$ are taken as examples). Each input $x$ ($x_1$, $x_2$, $x_3$) is multiplied by a weight $w$ ($w_1$, $w_2$, $w_3$) corresponding to that input. Thereby, the neuron outputs a result $y$ expressed by the following formula (2). Here, the input $x$, the result $y$, and the weight $w$ are all vectors, $\theta$ is a bias, and $f_k$ is an activation function:

$$y = f_k\left( \sum_{i=1}^{n} x_i w_i - \theta \right) \tag{2}$$
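Written out in code, formula (2) is only a few lines; the sigmoid used below is an assumed choice for the activation function $f_k$, not one specified in the text.

```python
import numpy as np

# A sketch of the single-neuron model of fig. 2 and formula (2):
# the activation f_k applied to the weighted input sum minus the bias.
# The sigmoid is an assumed choice of f_k for illustration.

def neuron_output(x: np.ndarray, w: np.ndarray, theta: float) -> float:
    f_k = lambda u: 1.0 / (1.0 + np.exp(-u))  # activation function
    return float(f_k(np.dot(w, x) - theta))
```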
A three-layer neural network formed by combining the neurons shown in fig. 2 will be described with reference to fig. 3. As shown in fig. 3, a plurality of inputs x (here, input x1 to input x3 are taken as examples) are input from the left side of the neural network, and a result y (here, result y1 to result y3 are taken as examples) is output from the right side. Specifically, the inputs x1, x2, and x3 are multiplied by the corresponding weights and input to each of the 3 neurons N11 to N13. The weights multiplied by these inputs are collectively labeled as W1.
Neurons N11 to N13 output z11 to z13, respectively. In fig. 3, z11 to z13 are collectively denoted as a feature vector Z1, which can be regarded as a vector obtained by extracting the feature amounts of the input vector. The feature vector Z1 is a feature vector between the weight W1 and the weight W2. z11 to z13 are multiplied by the corresponding weights and input to each of the 2 neurons N21 and N22. The weights multiplied by these feature vectors are collectively labeled as W2.

Neurons N21 and N22 output z21 and z22, respectively. In fig. 3, z21 and z22 are collectively denoted as a feature vector Z2. The feature vector Z2 is a feature vector between the weight W2 and the weight W3. z21 and z22 are multiplied by the corresponding weights and input to each of the 3 neurons N31 to N33. The weights multiplied by these feature vectors are collectively labeled as W3.
Finally, the neurons N31 to N33 output the results y1 to y3, respectively. The operation of the neural network includes a learning mode and a value prediction mode: for example, in the learning mode the weights W are learned using a learning data set, and in the prediction mode the behavior of the robot is determined using those parameters. Although "prediction" is written here for convenience, various tasks such as detection, classification, and inference are of course possible.
Here, it is possible to immediately learn from data obtained by actually operating the robot in the prediction mode and to reflect it in the next behavior (online learning), or to perform collective learning using a data set collected in advance and thereafter run the detection mode with those parameters (batch learning). An intermediate approach is also possible, in which the learning mode is inserted each time a certain amount of data has accumulated.
The weights W1 to W3 can be learned by the error backpropagation method. The error information enters from the right side and flows to the left side. The error backpropagation method is a method of adjusting (learning) the respective weights for each neuron so as to reduce the difference between the output y obtained when the input x is input and the true output y (teacher).
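A compact sketch of the three-layer network of fig. 3 trained by error backpropagation follows; the layer sizes match the figure, while the sigmoid activation, squared-error loss, and learning rate are assumptions for illustration.

```python
import numpy as np

# A sketch of the three-layer network of fig. 3 (inputs x1..x3, feature
# vectors Z1 and Z2, outputs y1..y3) trained by error backpropagation.
# Activation, loss, and learning rate are illustrative assumptions.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # input -> Z1
W2 = rng.normal(size=(2, 3))   # Z1 -> Z2
W3 = rng.normal(size=(3, 2))   # Z2 -> outputs

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x):
    z1 = sigmoid(W1 @ x)       # feature vector Z1
    z2 = sigmoid(W2 @ z1)      # feature vector Z2
    y = W3 @ z2                # results y1..y3
    return z1, z2, y

def train_step(x, y_teacher, lr=0.1):
    """One backpropagation step: the error flows from the output side
    (right) back toward the input side (left), adjusting each weight."""
    global W1, W2, W3
    z1, z2, y = forward(x)
    dy = y - y_teacher                       # output error
    dz2 = (W3.T @ dy) * z2 * (1.0 - z2)      # error at Z2
    dz1 = (W2.T @ dz2) * z1 * (1.0 - z1)     # error at Z1
    W3 -= lr * np.outer(dy, z2)
    W2 -= lr * np.outer(dz2, z1)
    W1 -= lr * np.outer(dz1, x)
```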
Such a neural network may have more than three layers (so-called deep learning). It is also possible to use an arithmetic device that automatically performs feature extraction of the input in stages, using only teacher data, and regresses the result.
Therefore, as shown in fig. 1, the machine learning device 20 of the present embodiment includes the following in order to perform the Q learning described above: a state quantity observation unit 21, an operation result acquisition unit 26, a learning unit 22, and an intention determination unit 25. However, the machine learning method applied in the present invention is not limited to Q learning; various methods usable in a machine learning device, such as "supervised learning", "unsupervised learning", "semi-supervised learning", and "reinforcement learning", can be applied. Such machine learning (the machine learning device 20) may use a general-purpose computer or processor, but applying a GPGPU, a large-scale PC cluster, or the like enables higher-speed processing.
That is, according to the present embodiment, there is provided a machine learning device that learns the operation of a robot 14 that takes out, by a hand portion 13, workpieces 12 from a plurality of workpieces 12 placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit 21 that observes a state quantity of the robot 14 including output data of a three-dimensional measuring instrument 15 that measures a three-dimensional position (x, y, z) or a three-dimensional position and posture (x, y, z, w, p, r) of each workpiece 12; an operation result acquisition unit 26 that acquires the result of the take-out operation in which the robot 14 takes out the workpiece 12 by the hand portion 13; and a learning unit 22 that receives the output from the state quantity observation unit 21 and the output from the operation result acquisition unit 26, and learns an operation amount including instruction data for instructing the robot 14 to take out the workpiece 12, in association with the state quantity of the robot 14 and the result of the take-out operation.
The state quantity observed by the state quantity observation unit 21 may include, for example, state variables that set the position, posture, and take-out direction of the hand portion 13 when a certain workpiece 12 is removed from the box 11. The learned operation amount may include, for example, command values such as the torque, speed, and rotational position given from the control device 16 to each drive shaft of the robot 14 and the hand portion 13 when the workpiece 12 is taken out of the box 11.
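For illustration only, the state variables and operation amounts named above might be held in simple containers such as the following; the field names are assumptions based on the description, not definitions from the embodiment.

```python
from dataclasses import dataclass

# Illustrative containers for the state variables and operation amounts
# named above. Field names are assumptions for the sketch.

@dataclass
class HandStateVariables:
    position: tuple[float, float, float]           # hand position
    posture: tuple[float, float, float]            # hand posture (w, p, r)
    takeout_direction: tuple[float, float, float]  # take-out direction

@dataclass
class OperationAmount:
    torque: list[float]               # command value per drive shaft
    speed: list[float]
    rotational_position: list[float]
```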
When one of the plurality of workpieces 12 stacked in bulk is taken out, the learning unit 22 learns the state variables in association with the result of the take-out operation of the workpiece 12 (the output of the operation result acquisition unit 26). That is, the output data of the three-dimensional measuring instrument 15 (coordinate calculation unit 19) and the command data of the hand portion 13 are set at random, or set intentionally according to a predetermined rule, by the control device 16, and the workpiece 12 is taken out by the hand portion 13. As the predetermined rule there is, for example, a rule of taking out, in order, the workpieces that are high in the height (z) direction among the plurality of workpieces 12 stacked in bulk. The output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 thus correspond to the behavior of taking out a certain workpiece. Successes and failures in taking out the workpiece 12 then occur, and each time, the learning unit 22 evaluates the state variables composed of the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13.
The learning unit 22 stores the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 when the workpiece 12 is taken out, in association with the evaluation of the result of the take-out operation of the workpiece 12. As examples of failure, there are cases where the hand portion 13 cannot hold the workpiece 12, and cases where, even though the workpiece 12 is held, the workpiece 12 collides with or contacts the wall of the box 11. Whether such removal of the workpiece 12 succeeds is determined based on the detection value of the force sensor 17 and the imaging data of the three-dimensional measuring instrument. Here, the machine learning device 20 may perform learning using, for example, a part of the command data of the hand portion 13 output from the control device 16.
Here, the learning unit 22 of the present embodiment preferably includes a return calculation unit 23 and a value function update unit 24. For example, the return calculation unit 23 calculates a return, for example a score, based on the success or failure of the removal of the workpiece 12 resulting from the state variables: the return is set high when the removal of the workpiece 12 succeeds, and low when it fails. The return may also be calculated based on the number of workpieces 12 successfully removed within a predetermined time. Further, in calculating the return, a return may be calculated for each stage of the removal of the workpiece 12, such as the success of gripping by the hand portion 13, the success of conveyance by the hand portion 13, and the success of the placing operation of the workpiece 12.
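The staged return calculation described above might look like the following sketch; the particular score values and stages are assumptions for illustration.

```python
# A sketch of the staged return calculation described above. The score
# values are illustrative assumptions, not values from the embodiment.

def compute_return(grip_ok: bool, convey_ok: bool, place_ok: bool) -> float:
    ret = 1.0 if grip_ok else -1.0        # high on success, low on failure
    if grip_ok and convey_ok:
        ret += 0.5                        # conveyance stage succeeded
        if place_ok:
            ret += 0.5                    # placing stage succeeded
    return ret
```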
The value function update unit 24 has a value function that specifies the value of the take-out operation of the workpiece 12, and updates the value function in accordance with the return. For this update, the update formula for the value Q(s, a) described above is used. The update is preferably performed by creating a behavior value table. The behavior value table here is a table that stores, in association with each other, the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 when the workpiece 12 was taken out, and the value function (that is, the evaluation value) updated in accordance with the result of taking out the workpiece 12 at that time.
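Such a behavior value table could be sketched as follows, with the intention determining unit's selection of the highest evaluation value included; the key layout built from measurement output data and hand command data is an assumption.

```python
# An illustrative behavior value table: each entry associates the
# measuring-instrument output data and hand command data of one take-out
# attempt with its updated evaluation value. The key layout is an
# assumption for the sketch.

behavior_value_table: dict[tuple, float] = {}

def store_evaluation(measurement_data: tuple, command_data: tuple,
                     evaluation: float) -> None:
    behavior_value_table[(measurement_data, command_data)] = evaluation

def select_best_behavior() -> tuple:
    # As the intention determining unit does, select the entry with the
    # highest evaluation value (assumes the table is non-empty).
    return max(behavior_value_table, key=behavior_value_table.get)
```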
Further, as the behavior value table, a function approximated using the neural network described above may be used, which is particularly effective when the amount of information in the state s, such as image data, is enormous. The value function is also not limited to one kind. For example, a value function that evaluates whether the gripping of the workpiece 12 by the hand portion 13 succeeded or failed, and a value function that evaluates the time (cycle time) required for gripping and conveying the workpiece 12 by the hand portion 13, are conceivable.
Further, as the above-described value function, a value function that evaluates interference between the box 11 and the hand portion 13 or the workpiece 12 at the time of workpiece removal may be used. In order to calculate the return used for updating the value function, the state quantity observation unit 21 preferably observes the force applied to the hand portion 13, for example the value detected by the force sensor 17. Since it can be estimated that interference has occurred when the amount of change in the force detected by the force sensor 17 exceeds a predetermined threshold value, it is preferable in that case to set the return to a negative value, for example, so as to lower the value determined by the value function.
Further, according to the present embodiment, the measurement parameters of the three-dimensional measuring instrument 15 can be learned as the operation amount. That is, according to the present embodiment, there is provided a machine learning device that learns the operation of a robot 14 that takes out, by a hand portion 13, workpieces 12 from a plurality of workpieces 12 placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit 21 that observes a state quantity of the robot 14 including output data of a three-dimensional measuring instrument 15 that measures a three-dimensional position (x, y, z) or a three-dimensional position and posture (x, y, z, w, p, r) of each workpiece 12; an operation result acquisition unit 26 that acquires the result of the take-out operation in which the robot 14 takes out the workpiece 12 by the hand portion 13; and a learning unit 22 that receives the output from the state quantity observation unit 21 and the output from the operation result acquisition unit 26, and learns an operation amount including the measurement parameters of the three-dimensional measuring instrument 15, in association with the state quantity of the robot 14 and the result of the take-out operation.
The robot system 10 according to the present embodiment may further include an automatic hand changer (not shown) that replaces the hand portion 13 attached to the robot 14 with a hand portion 13 of another type. In this case, the value function update unit 24 may have a value function for each hand portion 13 of a different form, and update the value function of the replaced hand portion 13 in accordance with the return. This makes it possible to learn the optimum operation of each of a plurality of hand portions 13 having different forms, so that the automatic hand changer can be made to select the hand portion 13 whose value function is highest.
Next, the intention determining unit 25 preferably refers to the behavior value table created as described above, and selects the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 corresponding to the highest evaluation value. The intention determining unit 25 then outputs the selected optimum data for the hand portion 13 and the three-dimensional measuring instrument 15 to the control device 16.

The control device 16 then takes out the workpiece 12 by controlling the three-dimensional measuring instrument 15 and the robot 14, respectively, using the optimum data for the hand portion 13 and the three-dimensional measuring instrument 15 output from the learning unit 22. For example, the control device 16 preferably operates the drive shafts of the hand portion 13 and the robot 14 based on state variables that set the optimum position, posture, and take-out direction of the hand portion 13 obtained by the learning unit 22.
As shown in fig. 1, the robot system 10 according to the above embodiment includes one machine learning device 20 for one robot 14. However, in the present invention, the number of each of the robot 14 and the machine learning device 20 is not limited to one. For example, the robot system 10 may further include a plurality of robots 14, and one or more machine learning devices 20 may be provided corresponding to the respective robots 14. The robot system 10 preferably shares or exchanges the optimal state variables of the three-dimensional measurement instrument 15 and the hand 13 acquired by the machine learning device 20 of each robot 14 with each other through a communication medium such as a network. Thus, even if the operation rate of one robot 14 is lower than the operation rates of the other robots 14, the optimal operation result obtained by the machine learning device 20 provided in the other robot 14 can be used in the operation of one robot 14. Further, by sharing the learning model among a plurality of robots, or sharing the operation amount including the measurement parameters of the three-dimensional measuring instrument 15, the state amount of the robot 14, and the result of the extracting operation, the time taken for learning can be shortened.
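Sharing learned values among robots, as suggested above, could be sketched as a simple merge of the robots' behavior value tables; the merge rule (keeping the best evaluation seen for each entry) is an assumption for illustration.

```python
# A sketch of sharing learned data among several robots: the behavior
# value tables are merged by keeping the best evaluation seen for each
# (state, behavior) entry. The merge rule is an illustrative assumption.

def merge_value_tables(tables: list[dict]) -> dict:
    merged: dict = {}
    for table in tables:
        for key, value in table.items():
            merged[key] = max(value, merged.get(key, float("-inf")))
    return merged
```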
The machine learning device 20 may be located inside the robot 14 or may be located outside the robot 14. Alternatively, the machine learning device 20 may be located in the control device 16 or may be present in a cloud server (not shown).
In the case where the robot system 10 includes a plurality of robots 14, while one robot 14 is conveying a workpiece 12 gripped by its hand portion 13, the hand portion of another robot 14 can be made to perform the take-out operation of a workpiece 12. The value function update unit 24 may update the value function while the robot 14 performing the take-out of the workpiece 12 is switched in this way. The machine learning device 20 may also hold state variables for a plurality of robot models, perform a take-out simulation using the plurality of robot models during the take-out operation of the workpiece 12, and learn the state variables of the plurality of robot models in association with the result of the take-out operation of the workpiece 12, based on the result of the take-out simulation.
In the machine learning device 20, the output data obtained when the three-dimensional measuring instrument 15 acquires the three-dimensional map of the workpieces 12 is transmitted from the three-dimensional measuring instrument 15 to the state quantity observation unit 21. Since this transmitted data may include abnormal data, the machine learning device 20 may have a function of filtering abnormal data, that is, a function of selecting whether or not to input the data from the three-dimensional measuring instrument 15 to the state quantity observation unit 21. In this way, the learning unit 22 of the machine learning device 20 can efficiently learn the optimum operation of the three-dimensional measuring instrument 15 and the hand portion 13 of the robot 14.
Also, while the output data from the learning unit 22 of the machine learning device 20 is input to the control device 16, this output data may likewise include abnormal data, so the device may have a function of filtering abnormal data, that is, a function of selecting whether or not to output the data from the learning unit 22 to the control device 16. In this way, the control device 16 can make the robot 14 perform the optimum operation of the hand portion 13 more safely.
The abnormal data may be detected by the following procedure: a probability distribution of the input data is estimated, the probability of occurrence of a new input is derived using that probability distribution, and if the probability of occurrence is below a certain value, the input is regarded as abnormal data deviating greatly from typical behavior.
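The procedure above might be sketched as follows; the independent per-dimension Gaussian model and the cutoff probability are assumptions for illustration.

```python
import numpy as np

# A sketch of the anomaly-detection procedure described above: estimate
# a probability model of past inputs, derive the occurrence probability
# of a new input, and reject it below a cutoff. The per-dimension
# Gaussian model and the cutoff value are illustrative assumptions.

class AnomalyFilter:
    def fit(self, samples: np.ndarray) -> None:
        self.mean = samples.mean(axis=0)
        self.std = samples.std(axis=0) + 1e-8

    def is_anomalous(self, x: np.ndarray, cutoff: float = 1e-4) -> bool:
        z = (x - self.mean) / self.std
        # log of the product of independent Gaussian densities
        log_p = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * self.std ** 2))
        return np.exp(log_p) < cutoff
```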
Next, an example of the operation of the machine learning device 20 included in the robot system 10 according to the present embodiment will be described. Fig. 4 is a flowchart showing an example of the operation of the machine learning device shown in fig. 1. As shown in fig. 4, when the learning operation (learning process) is started, three-dimensional measurement is performed by the three-dimensional measuring instrument 15 and its result is output (step S11 in fig. 4). That is, in step S11, for example, a three-dimensional map of the workpieces 12, which are placed at random, including in a bulk state (the output data of the three-dimensional measuring instrument 15), is acquired and output to the state quantity observation unit 21; the coordinate calculation unit 19 receives this three-dimensional map, calculates the three-dimensional position (x, y, z) of each workpiece 12, and outputs it to the state quantity observation unit 21, the operation result acquisition unit 26, and the control device 16. The coordinate calculation unit 19 may also calculate and output the postures (w, p, r) of the workpieces 12 from the output of the three-dimensional measuring instrument 15.
As described with reference to fig. 5, the output (three-dimensional map) of the three-dimensional measuring instrument 15 may be processed by the preprocessing unit 50 before being input to the state quantity observation unit 21. As described with reference to fig. 7, only the output of the three-dimensional measuring instrument 15 may be input to the state quantity observation unit 21, either directly or via the preprocessing unit 50. Thus, the three-dimensional measurement in step S11 and the output of its result may be carried out in various ways.
Specifically, in the case of fig. 1, the state quantity observation unit 21 observes the three-dimensional map of each workpiece 12 from the three-dimensional measuring device 15 and the state quantities (output data of the three-dimensional measuring device 15) such as the three-dimensional position (x, y, z) and the posture (w, p, r) of each workpiece 12 from the coordinate calculation unit 19. The operation result acquisition unit 26 acquires the result of the picking operation of the robot 14 for picking up the workpiece 12 by the robot hand 13, based on the output data of the three-dimensional measuring instrument 15 (the output data of the coordinate calculation unit 19). The operation result acquiring unit 26 may acquire the result of the picking operation, such as the degree of completion when the picked-up workpiece 12 is transferred to a subsequent process and the damage of the picked-up workpiece 12, in addition to the output data of the three-dimensional measuring instrument.
For example, the machine learning device 20 determines an optimum operation based on the output data of the three-dimensional measuring instrument 15 (step S12 in fig. 4), and the control device 16 outputs command data (operation amount) of the robot hand 13 (robot 14) to perform a workpiece 12 picking-up operation (step S13 in fig. 4). Then, the above-described operation result acquisition unit 26 acquires the workpiece extraction result (step S14 in fig. 4).
Next, whether or not the workpiece 12 was successfully taken out is determined based on the output of the operation result acquisition unit 26 (step S15 in fig. 4). A positive return is set when the take-out succeeded (step S16 in fig. 4), and a negative return is set when it failed (step S17 in fig. 4); the action value table (value function) is then updated (step S18 in fig. 4).
Here, the success or failure of the take-out of the workpiece 12 may be determined based on, for example, the output data of the three-dimensional measuring instrument 15 after the take-out operation. Further, the evaluation is not limited to the simple success or failure of the take-out; it may also cover, for example, the degree of completion when the taken-out workpiece 12 is transferred to a subsequent process, a change of state such as whether the taken-out workpiece 12 is damaged, and the time (cycle time) and energy (electric energy) required for the hand 13 to grip and convey the workpiece 12.
The return value based on the determination of success or failure of the take-out is calculated by the return calculation unit 23, and the action value table is updated by the merit function update unit 24. That is, when the take-out of the workpiece 12 succeeds, the learning unit 22 applies a positive return to the update of the value Q(s, a) (S16), and when it fails, applies a negative return (S17). The learning unit 22 then updates the action value table each time a workpiece 12 is taken out (S18). By repeating steps S11 to S18 described above, the learning unit 22 continues to update the action value table, that is, to learn.
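As a rough, hypothetical illustration of steps S15 to S18 (the state/action encoding and the constants ALPHA and GAMMA below are assumptions, not values given in the embodiment), the update of the action value table could be sketched in Python as follows:

    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.9        # assumed learning rate and discount factor
    q_table = defaultdict(float)   # action value table Q(s, a), initially 0

    def update_action_value(state, action, next_state, actions, succeeded):
        # Steps S16/S17: positive return on a successful take-out, negative on failure.
        reward = 1.0 if succeeded else -1.0
        # Step S18: one-step Q-learning update of the action value table.
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                             - q_table[(state, action)])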
In the above description, the data input to the state quantity observation unit 21 are not limited to the output data of the three-dimensional measuring instrument 15; they may include data such as the output of another sensor, or a part of the command data from the control device 16. In any case, the control device 16 causes the robot 14 to execute the operation of taking out the workpiece 12 using the command data (operation amount) output from the machine learning device 20. What the machine learning device 20 learns is also not limited to the take-out operation for the workpiece 12; it may include, for example, the measurement parameters of the three-dimensional measuring instrument 15, as described above.
As described above, according to the robot system 10 including the machine learning device 20 of the present embodiment, the operation by which the robot 14 takes out, with the hand 13, a workpiece 12 from a plurality of workpieces 12 placed randomly, including in a bulk state, can be learned. The robot system 10 can thus learn to select the optimum operation of the robot 14 for taking out bulk-stacked workpieces 12 without human intervention.
Fig. 5 is a block diagram showing a conceptual configuration of a robot system according to another embodiment of the present invention, namely a robot system to which supervised learning is applied. As is apparent from a comparison of fig. 5 with fig. 1, the robot system 10' to which supervised learning is applied further includes a data recording unit 40 with results (labels), compared with the robot system 10 to which Q-learning (reinforcement learning) is applied as shown in fig. 1. The robot system 10' shown in fig. 5 also includes a preprocessing unit 50 that preprocesses the output data of the three-dimensional measuring instrument 15. Needless to say, the preprocessing unit 50 may also be provided in the robot system 10 shown in fig. 1, for example.
As shown in fig. 5, the machine learning device 30 in the robot system 10' to which supervised learning is applied includes a state quantity observation unit 31, an operation result acquisition unit 36, a learning unit 32, and an intention determination unit 35. The learning unit 32 includes an error calculation unit 33 and a learning model update unit 34. In the robot system 10' of the present embodiment as well, the machine learning device 30 learns and outputs operation amounts, such as the command data instructing the robot 14 to take out the workpiece 12 and the measurement parameters of the three-dimensional measuring instrument 15.
That is, in the robot system 10' to which supervised learning is applied, shown in fig. 5, the error calculation unit 33 and the learning model update unit 34 correspond, respectively, to the return calculation unit 23 and the merit function update unit 24 in the robot system 10 to which Q-learning is applied, shown in fig. 1. The other components, for example the three-dimensional measuring instrument 15, the control device 16, and the robot 14, have the same configurations as in fig. 1, and their description is omitted.
The error calculation unit 33 calculates the error between the result (label) output from the operation result acquisition unit 36 and the output of the learning model mounted in the learning unit 32. Here, when the shape of the workpieces 12 and the process performed by the robot 14 are unchanged, the data recording unit 40 with results (labels) can, for example, hold the labeled data obtained up to the day before a scheduled date on which the robot 14 performs the work, and supply that labeled data to the error calculation unit 33 on the scheduled date. Alternatively, data obtained by a simulation performed outside the robot system 10', or labeled data from another robot system, may be supplied to the error calculation unit 33 of the robot system 10' via a memory card or a communication line. Further, the data recording unit 40 with results (labels) may be configured by a nonvolatile memory such as a flash memory; in that case, the data recording unit (nonvolatile memory) 40 may be built into the learning unit 32, and the labeled data held in it may be used directly by the learning unit 32.
Fig. 6 is a diagram for explaining an example of the processing performed by the preprocessing unit in the robot system shown in fig. 5. Fig. 6(a) shows an example of the output data of the three-dimensional measuring instrument 15, namely data on the three-dimensional positions (postures) of a plurality of workpieces 12 stacked in bulk in the box 11, and fig. 6(b) to 6(d) show examples of image data obtained by preprocessing the workpieces 121 to 123 in fig. 6(a).
Here, cylindrical metal members are assumed as the workpieces 12 (121 to 123), and, as the hand 13, a suction pad that holds the longitudinal center portion of the cylindrical workpiece 12 by negative pressure is assumed, rather than a hand that grips the workpiece with two claws. Therefore, if the position of the longitudinal center portion of a workpiece 12 is known, the workpiece 12 can be taken out by moving the suction pad (13) to that position and applying suction. The numerical values in fig. 6(a) to 6(d) are in [mm], along the x, y, and z directions, respectively. The z direction corresponds to the height (depth) direction of the image data obtained by imaging the box 11, in which the plurality of workpieces 12 are stacked, with the three-dimensional measuring instrument 15 (for example, one having two cameras) provided above.
As is apparent from a comparison of fig. 6(a) with fig. 6(b) to 6(d), in one example of the processing performed by the preprocessing unit 50 in the robot system 10' shown in fig. 5, each workpiece 12 of interest (for example, the three workpieces 121 to 123) is rotated based on the output data (three-dimensional image) of the three-dimensional measuring instrument 15 and processed so that the height of its center becomes "0".
That is, the output data of the three-dimensional measuring instrument 15 include, for example, information on the three-dimensional position (x, y, z) and posture (w, p, r) of the longitudinal center portion of each workpiece 12. As shown in figs. 6(b), 6(c), and 6(d), the three workpieces 121, 122, and 123 of interest are each rotated by -r and have z subtracted, so that they all satisfy the same conditions. Such preprocessing can reduce the load on the machine learning device 30.
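A minimal sketch of such preprocessing, under the assumptions that the depth image patch of each workpiece and its pose (x, y, z, w, p, r) are already available and that r is the in-plane rotation in degrees (the function and variable names are illustrative, not part of the embodiment):

    import numpy as np
    from scipy.ndimage import rotate

    def normalize_patch(depth_patch, pose):
        # pose = (x, y, z, w, p, r); r is taken here as the in-plane
        # rotation of the workpiece in degrees.
        x, y, z, w, p, r = pose
        # Rotate by -r so that every workpiece has the same orientation.
        aligned = rotate(depth_patch, angle=-r, reshape=False,
                         order=1, mode='nearest')
        # Subtract z so that the height of the center portion becomes 0.
        return aligned - z

    patch = np.zeros((40, 40))    # placeholder depth patch
    print(normalize_patch(patch, (0, 0, 35.0, 0, 0, 25.0)).shape)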
Note that the three-dimensional map shown in fig. 6(a) need not be the output data of the three-dimensional measuring instrument 15 itself; it may, for example, be obtained by a program that determines the order of taking out the workpieces 12, with the threshold value for selecting candidates from the obtained images set lower than usual, and this processing itself may be performed by the preprocessing unit 50. The processing in the preprocessing unit 50 may also be varied in many ways depending on conditions such as the shape of the workpieces 12 and the type of the hand 13.
In this way, the output data of the three-dimensional measuring instrument 15 (the three-dimensional map of each workpiece 12), processed by the preprocessing unit 50, are input to the state quantity observation unit 31. Referring again to fig. 5, the error calculation unit 33, which receives the result (label) output from the operation result acquisition unit 36, performs processing to minimize the error of the learning model: for example, when the output of the neural network shown in fig. 3 is y, the error is taken to be -log(y) when the workpiece 12 is actually taken out and the take-out succeeds, and -log(1-y) when it fails. As inputs to the neural network shown in fig. 3, for example, the preprocessed image data of the workpieces 121 to 123 of interest shown in fig. 6(b) to 6(d) and the data on the three-dimensional positions and postures (x, y, z, w, p, r) of those workpieces are provided.
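This error corresponds to the ordinary binary cross-entropy loss. A small illustrative sketch (the function name and the clipping constant are assumptions) for the grasp-success output y of the network:

    import numpy as np

    def grasp_error(y, succeeded, eps=1e-7):
        # y: network output in (0, 1), interpreted as the predicted
        # probability that the take-out will succeed.
        y = float(np.clip(y, eps, 1.0 - eps))   # guard against log(0)
        # Error is -log(y) on success and -log(1 - y) on failure.
        return -np.log(y) if succeeded else -np.log(1.0 - y)

    print(grasp_error(0.9, True))    # small error: confident and correct
    print(grasp_error(0.9, False))   # large error: confident but wrong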
Fig. 7 is a block diagram showing a modification of the robot system shown in fig. 1. As is apparent from a comparison of fig. 7 with fig. 1, in this modification the coordinate calculation unit 19 is eliminated, and the state quantity observation unit 21 observes the state quantities of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring instrument 15. Needless to say, the control device 16 may instead be provided with a configuration corresponding to the coordinate calculation unit 19. The configuration shown in fig. 7 can also be applied to, for example, the robot system 10' to which supervised learning is applied, described with reference to fig. 5. That is, the preprocessing unit 50 may be eliminated from the robot system 10' shown in fig. 5, and the state quantity observation unit 31 may observe the state quantities of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring instrument 15. In this way, the embodiments described above can be modified and changed in various ways.
As described above in detail, the present embodiment can provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimum operation of a robot when taking out randomly placed workpieces, including those in a bulk state. The machine learning devices 20 and 30 of the present invention are not limited to reinforcement learning (for example, Q-learning) or supervised learning; various other machine learning algorithms may be applied.
According to the machine learning device, the robot system, and the machine learning method of the present invention, the optimum operation of the robot when taking out randomly placed workpieces, including those in a bulk state, can be learned without human intervention.
Although embodiments have been described above, all the examples and conditions described herein are intended to aid understanding of the inventive concept applied to the invention and the technology, and the examples and conditions specifically described are not intended to limit the scope of the invention; nor do such descriptions in the specification represent advantages or disadvantages of the invention. Although the embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention.

Claims (39)

1. A robot system is characterized by comprising:
a robot;
an observation unit that acquires data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing the information;
a determination unit configured to input the data to a neural network to determine information for gripping the object by the robot; and
and a control device for controlling the robot based on the information determined by the determination unit.
2. The robotic system of claim 1,
the data includes at least information related to any one of a position, a posture, and a distance of the object.
3. The robotic system of claim 1,
the data includes at least any one of distance image information of the object, three-dimensional position information of the object, and posture information.
4. The robotic system of claim 1,
the measuring instrument includes a three-dimensional vision sensor.
5. The robotic system of any of claims 1-4,
the neural network is trained by reinforcement learning using a return calculated based on a result of gripping the object.
6. The robotic system of claim 5,
the result of the gripping of the object includes at least one of: the number of times the object is successfully gripped, the time required for gripping and conveying the object, the force acting on the hand of the robot, the degree of completion in the post-process after gripping the object, the state of the object, and the energy required for gripping and conveying the object.
7. The robotic system of any of claims 1-4,
the neural network is trained so that an error calculated from a label related to the gripping result of the object and the output of the neural network is minimized.
8. The robotic system of any of claims 1-4,
the neural network outputs information related to an operation amount of a hand of the robot.
9. The robotic system of any of claims 1-4,
the neural network outputs information related to a probability of success of the gripping of the object or information related to a position of the object.
10. The robotic system of any of claims 1-4,
the information determined by the determination unit includes at least information for setting any one of a position, a posture, and a pickup direction of a hand of the robot.
11. The robotic system of any of claims 1-4,
the information determined by the determination unit includes at least information related to any one of a torque, a speed, and a rotational speed of a drive shaft provided to the robot.
12. The robotic system of any of claims 1-4,
the data acquired by the observation unit includes the information determined by the determination unit.
13. The robotic system of any of claims 1-4,
the determination unit outputs, using the neural network, information for operating the measuring instrument.
14. The robotic system of any of claims 1-4,
the neural network learns from data acquired by other robots.
15. The robotic system of any of claims 1-4,
the neural network learns according to the results of the simulation.
16. The robotic system of any of claims 1-4,
the neural network resides on a cloud server.
17. The robotic system of any of claims 1-4,
the measuring instrument is mounted on an arm of the robot.
18. The robotic system of any of claims 1-4,
holding the object includes attracting or adsorbing the object by a hand of the robot.
19. The robotic system as claimed in claim 18,
the information determined by the determination unit includes information for attracting or adsorbing the object by the hand of the robot.
20. The robotic system as claimed in claim 18,
the hand of the robot generates magnetic force or negative pressure.
21. The robotic system of any of claims 1-4,
the information obtained by processing the information is information obtained using machine learning.
22. A control method of a robot is characterized in that,
comprises the following steps:
acquiring data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing the information;
determining information for holding the object by the robot by inputting the data into a neural network; and
controlling the robot according to the determined information.
23. A machine learning device for learning the motion of a robot gripping an object with a hand,
the machine learning device includes:
an observation unit that acquires data including at least one of information relating to the object measured by the measuring instrument and information obtained by processing the information;
a determination unit configured to input the data to a neural network to determine information for gripping the object with the hand; and
and a learning unit that trains the neural network based on a result of the gripping of the object by the robot controlled based on the information determined by the determination unit.
24. The machine learning apparatus of claim 23,
the data includes at least information related to any one of a position, a posture, and a distance of the object.
25. The machine learning apparatus of claim 23,
the data includes at least any one of distance image information of the object, three-dimensional position information of the object, and posture information.
26. The machine learning apparatus of any one of claims 23 to 25,
the learning unit calculates a return based on the result of gripping the object, and trains the neural network by reinforcement learning using the return.
27. The machine learning apparatus of claim 26,
the result of the gripping of the object includes at least one of: the number of times the object is successfully gripped, the time required for gripping and conveying the object, the force acting on the hand, the degree of completion in the post-process after gripping the object, the state of the object, and the energy required for gripping and conveying the object.
28. The machine learning apparatus of any one of claims 23 to 25,
the learning unit trains the neural network so as to minimize an error calculated from a label related to the gripping result of the object and the output of the neural network.
29. The machine learning apparatus of any one of claims 23 to 25,
the neural network outputs information related to an operation amount of the hand.
30. The machine learning apparatus of any one of claims 23 to 25,
the neural network outputs information related to a probability of success of the gripping of the object or information related to a position of the object.
31. The machine learning apparatus of any one of claims 23 to 25,
the information determined by the determination unit includes at least information for setting any one of the position, the posture, and the extraction direction of the hand.
32. The machine learning apparatus of any one of claims 23 to 25,
the information determined by the determination unit includes at least information related to any one of torque, speed, and rotational position of a drive shaft provided to the robot.
33. The machine learning apparatus of any one of claims 23 to 25,
the data acquired by the observation unit includes the information determined by the determination unit.
34. The machine learning apparatus of any one of claims 23 to 25,
the determination unit outputs, using the neural network, information for operating the measuring instrument.
35. The machine learning apparatus of any one of claims 23 to 25,
the information on the object is information measured by the measuring instrument attached to the arm portion of the robot.
36. The machine learning apparatus of any one of claims 23 to 25,
holding the object includes attracting or adsorbing the object by the hand.
37. The machine learning apparatus of claim 36,
the information determined by the determination unit includes information for the hand to attract or adsorb the object.
38. The machine learning apparatus of claim 36,
the hand generates a magnetic force or negative pressure.
39. A machine learning method for learning the motion of a robot gripping an object with a hand,
comprises the following steps:
acquiring data including at least one of information relating to the object measured by the measuring instrument and information obtained by processing the information;
determining information for holding the object by the hand by inputting the data into a neural network; and
the neural network is trained based on a result of the gripping of the object by the robot controlled based on the determined information.
CN202110544521.3A 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method Pending CN113199483A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
JP2015152067 2015-07-31
JP2015-152067 2015-07-31
JP2015-233857 2015-11-30
JP2015233857A JP6522488B2 (en) 2015-07-31 2015-11-30 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
CN201610617361.XA CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610617361.XA Division CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Publications (1)

Publication Number Publication Date
CN113199483A true CN113199483A (en) 2021-08-03

Family

ID=57985283

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Country Status (3)

Country Link
JP (5) JP6522488B2 (en)
CN (2) CN113199483A (en)
DE (1) DE102016015873B3 (en)

Families Citing this family (110)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
JP6771744B2 (en) * 2017-01-25 2020-10-21 株式会社安川電機 Handling system and controller
JP6453922B2 (en) * 2017-02-06 2019-01-16 ファナック株式会社 Work picking apparatus and work picking method for improving work picking operation
WO2018163242A1 (en) * 2017-03-06 2018-09-13 株式会社Fuji Data structure for creating image-processing data and method for creating image-processing data
JP6542824B2 (en) * 2017-03-13 2019-07-10 ファナック株式会社 Image processing apparatus and image processing method for calculating likelihood of image of object detected from input image
JP6438512B2 (en) 2017-03-13 2018-12-12 ファナック株式会社 ROBOT SYSTEM, MEASUREMENT DATA PROCESSING DEVICE, AND MEASUREMENT DATA PROCESSING METHOD FOR TAKE OUT WORK WITH MEASUREMENT DATA CORRECTED BY MACHINE LEARN
JP6902369B2 (en) * 2017-03-15 2021-07-14 株式会社オカムラ Presentation device, presentation method and program, and work system
JP6869060B2 (en) * 2017-03-15 2021-05-12 株式会社オカムラ Manipulator controls, control methods and programs, and work systems
JP6983524B2 (en) * 2017-03-24 2021-12-17 キヤノン株式会社 Information processing equipment, information processing methods and programs
JP6557272B2 (en) * 2017-03-29 2019-08-07 ファナック株式会社 State determination device
JP6680714B2 (en) * 2017-03-30 2020-04-15 ファナック株式会社 Control device and machine learning device for wire electric discharge machine
JP6490132B2 (en) * 2017-03-31 2019-03-27 ファナック株式会社 Robot control device, machine learning device, and machine learning method
JP6526100B2 (en) * 2017-04-28 2019-06-05 ファナック株式会社 Material pick-up system
JP6487495B2 (en) * 2017-05-19 2019-03-20 ファナック株式会社 Work removal system
JP7045139B2 (en) * 2017-06-05 2022-03-31 株式会社日立製作所 Machine learning equipment, machine learning methods, and machine learning programs
JP6542839B2 (en) * 2017-06-07 2019-07-10 ファナック株式会社 Control device and machine learning device
JP6886869B2 (en) * 2017-06-09 2021-06-16 川崎重工業株式会社 Motion prediction system and motion prediction method
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
CN116476044A (en) 2017-06-19 2023-07-25 谷歌有限责任公司 Robot gripping prediction using neural network and geometric aware object representation
CN107329445B (en) * 2017-06-28 2020-09-08 重庆柚瓣家科技有限公司 Intelligent supervision method for robot behavior criterion
CN107255969B (en) * 2017-06-28 2019-10-18 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107252785B (en) * 2017-06-29 2019-05-10 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
JP6564426B2 (en) * 2017-07-07 2019-08-21 ファナック株式会社 Parts supply device and machine learning device
JP7116901B2 (en) * 2017-08-01 2022-08-12 オムロン株式会社 ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD AND ROBOT CONTROL PROGRAM
DE102017213658A1 (en) * 2017-08-07 2019-02-07 Robert Bosch Gmbh Handling arrangement with a handling device for performing at least one work step and method and computer program
JP6680730B2 (en) 2017-08-08 2020-04-15 ファナック株式会社 Control device and learning device
JP6680732B2 (en) * 2017-08-23 2020-04-15 ファナック株式会社 Goods stacking device and machine learning device
JP6795472B2 (en) * 2017-08-28 2020-12-02 ファナック株式会社 Machine learning device, machine learning system and machine learning method
CA3073516A1 (en) * 2017-09-01 2019-03-07 The Regents Of The University Of California Robotic systems and methods for robustly grasping and targeting objects
JP6608890B2 (en) * 2017-09-12 2019-11-20 ファナック株式会社 Machine learning apparatus, robot system, and machine learning method
EP3456485B1 (en) * 2017-09-15 2021-02-17 Siemens Aktiengesellschaft Optimisation of an automated process for selecting and gripping an object by a robot
JP6895563B2 (en) * 2017-09-25 2021-06-30 ファナック株式会社 Robot system, model generation method, and model generation program
JP6695843B2 (en) 2017-09-25 2020-05-20 ファナック株式会社 Device and robot system
JP6579498B2 (en) * 2017-10-20 2019-09-25 株式会社安川電機 Automation device and position detection device
JP2019084601A (en) 2017-11-02 2019-06-06 キヤノン株式会社 Information processor, gripping system and information processing method
JP6815309B2 (en) * 2017-11-16 2021-01-20 株式会社東芝 Operating system and program
JP6676030B2 (en) 2017-11-20 2020-04-08 株式会社安川電機 Grasping system, learning device, gripping method, and model manufacturing method
JP6680750B2 (en) * 2017-11-22 2020-04-15 ファナック株式会社 Control device and machine learning device
US10828778B2 (en) * 2017-11-30 2020-11-10 Abb Schweiz Ag Method for operating a robot
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
JP7136554B2 (en) * 2017-12-18 2022-09-13 国立大学法人信州大学 Grasping device, learning device, program, grasping system, and learning method
KR102565444B1 (en) * 2017-12-21 2023-08-08 삼성전자주식회사 Method and apparatus for identifying object
JP6587195B2 (en) * 2018-01-16 2019-10-09 株式会社Preferred Networks Tactile information estimation device, tactile information estimation method, program, and non-transitory computer-readable medium
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
JP6892400B2 (en) * 2018-01-30 2021-06-23 ファナック株式会社 Machine learning device that learns the failure occurrence mechanism of laser devices
JP6703020B2 (en) * 2018-02-09 2020-06-03 ファナック株式会社 Control device and machine learning device
JP6874712B2 (en) * 2018-02-19 2021-05-19 オムロン株式会社 Simulation equipment, simulation method and simulation program
JP7005388B2 (en) * 2018-03-01 2022-01-21 株式会社東芝 Information processing equipment and sorting system
JP6873941B2 (en) 2018-03-02 2021-05-19 株式会社日立製作所 Robot work system and control method of robot work system
WO2019171123A1 (en) 2018-03-05 2019-09-12 Omron Corporation Method, apparatus, system and program for controlling a robot, and storage medium
JP6879238B2 (en) * 2018-03-13 2021-06-02 オムロン株式会社 Work picking device and work picking method
JP6911798B2 (en) * 2018-03-15 2021-07-28 オムロン株式会社 Robot motion control device
JP2019162712A (en) * 2018-03-20 2019-09-26 ファナック株式会社 Control device, machine learning device and system
JP6687657B2 (en) * 2018-03-20 2020-04-28 ファナック株式会社 Article taking-out apparatus using sensor and robot, and article taking-out method
KR102043898B1 (en) * 2018-03-27 2019-11-12 한국철도기술연구원 Auto picking system and method for automatically picking using the same
JP6810087B2 (en) * 2018-03-29 2021-01-06 ファナック株式会社 Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
US11260534B2 (en) 2018-04-04 2022-03-01 Canon Kabushiki Kaisha Information processing apparatus and information processing method
US11579000B2 (en) 2018-04-05 2023-02-14 Fanuc Corporation Measurement operation parameter adjustment apparatus, machine learning device, and system
JP6829271B2 (en) * 2018-04-05 2021-02-10 ファナック株式会社 Measurement operation parameter adjustment device, machine learning device and system
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
EP3785866B1 (en) 2018-04-26 2023-12-20 Panasonic Holdings Corporation Actuator device, method for removing target object using actuator device, and target object removal system
JP7154815B2 (en) 2018-04-27 2022-10-18 キヤノン株式会社 Information processing device, control method, robot system, computer program, and storage medium
CN112203812B (en) 2018-05-25 2023-05-16 川崎重工业株式会社 Robot system and additional learning method
JP7039389B2 (en) * 2018-05-25 2022-03-22 川崎重工業株式会社 Robot system and robot control method
KR102094360B1 (en) * 2018-06-11 2020-03-30 동국대학교 산학협력단 System and method for predicting force based on image
JP7008136B2 (en) * 2018-06-14 2022-01-25 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
JP7102241B2 (en) * 2018-06-14 2022-07-19 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
WO2019239563A1 (en) * 2018-06-14 2019-12-19 ヤマハ発動機株式会社 Robot system
JP6784722B2 (en) * 2018-06-28 2020-11-11 ファナック株式会社 Output device, control device, and evaluation function value output method
JP2020001127A (en) * 2018-06-28 2020-01-09 勇貴 高橋 Picking system, picking processing equipment, and program
WO2020009139A1 (en) * 2018-07-04 2020-01-09 株式会社Preferred Networks Learning method, learning device, learning system, and program
JP6740288B2 (en) * 2018-07-13 2020-08-12 ファナック株式会社 Object inspection apparatus, object inspection system, and method for adjusting inspection position
WO2020021643A1 (en) * 2018-07-24 2020-01-30 株式会社Fuji End effector selection method and selection system
JP7191569B2 (en) * 2018-07-26 2022-12-19 Ntn株式会社 gripping device
WO2020026447A1 (en) * 2018-08-03 2020-02-06 株式会社Fuji Parameter learning method and work system
JP7034035B2 (en) * 2018-08-23 2022-03-11 株式会社日立製作所 Motion generation method for autonomous learning robot device and autonomous learning robot device
CN109434844B (en) * 2018-09-17 2022-06-28 鲁班嫡系机器人(深圳)有限公司 Food material processing robot control method, device and system, storage medium and equipment
JP6895128B2 (en) * 2018-11-09 2021-06-30 オムロン株式会社 Robot control device, simulation method, and simulation program
JP7159525B2 (en) * 2018-11-29 2022-10-25 京セラドキュメントソリューションズ株式会社 ROBOT CONTROL DEVICE, LEARNING DEVICE, AND ROBOT CONTROL SYSTEM
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
EP3904017A4 (en) 2018-12-27 2023-01-18 Kawasaki Jukogyo Kabushiki Kaisha Robot control device, robot system, and robot control method
JP7128736B2 (en) 2018-12-27 2022-08-31 川崎重工業株式会社 ROBOT CONTROL DEVICE, ROBOT SYSTEM AND ROBOT CONTROL METHOD
CN109784400A (en) * 2019-01-12 2019-05-21 鲁班嫡系机器人(深圳)有限公司 Intelligent body Behavioral training method, apparatus, system, storage medium and equipment
JP7000359B2 (en) * 2019-01-16 2022-01-19 ファナック株式会社 Judgment device
JP6632095B1 (en) * 2019-01-16 2020-01-15 株式会社エクサウィザーズ Learned model generation device, robot control device, and program
JP7252787B2 (en) 2019-02-28 2023-04-05 川崎重工業株式会社 Machine learning model operation management system and machine learning model operation management method
JP7336856B2 (en) * 2019-03-01 2023-09-01 株式会社Preferred Networks Information processing device, method and program
WO2020194392A1 (en) * 2019-03-22 2020-10-01 connectome.design株式会社 Computer, method, and program for generating teaching data for autonomous robot
JP7302226B2 (en) * 2019-03-27 2023-07-04 株式会社ジェイテクト SUPPORT DEVICE AND SUPPORT METHOD FOR GRINDER
JP7349423B2 (en) * 2019-06-19 2023-09-22 株式会社Preferred Networks Learning device, learning method, learning model, detection device and grasping system
JP2021013996A (en) * 2019-07-12 2021-02-12 キヤノン株式会社 Control method of robot system, manufacturing method of articles, control program, recording medium, and robot system
JP7415356B2 (en) * 2019-07-29 2024-01-17 セイコーエプソン株式会社 Program transfer system and robot system
CN110456644B (en) * 2019-08-13 2022-12-06 北京地平线机器人技术研发有限公司 Method and device for determining execution action information of automation equipment and electronic equipment
DE112020004135T5 (en) 2019-08-28 2022-06-02 Daily Color Inc. robot control device
JP7021158B2 (en) 2019-09-04 2022-02-16 株式会社東芝 Robot system and drive method
JP7458741B2 (en) * 2019-10-21 2024-04-01 キヤノン株式会社 Robot control device and its control method and program
JP6924448B2 (en) * 2019-12-02 2021-08-25 Arithmer株式会社 Picking system, picking method, and program
US20230064484A1 (en) * 2020-01-16 2023-03-02 Omron Corporation Control apparatus, control method, and computer-readable storage medium storing a control program
JP7463777B2 (en) 2020-03-13 2024-04-09 オムロン株式会社 CONTROL DEVICE, LEARNING DEVICE, ROBOT SYSTEM, AND METHOD
JP7245959B2 (en) 2020-04-28 2023-03-24 ヤマハ発動機株式会社 Machine learning method and robot system
JP2023145809A (en) * 2020-07-10 2023-10-12 株式会社Preferred Networks Reinforcement learning device, reinforcement learning system, object operation device, model generation method and reinforcement learning program
CN112476424A (en) * 2020-11-13 2021-03-12 腾讯科技(深圳)有限公司 Robot control method, device, equipment and computer storage medium
CN116547706A (en) 2020-12-08 2023-08-04 索尼集团公司 Learning device, learning system, and learning method
DE102021104001B3 (en) 2021-02-19 2022-04-28 Gerhard Schubert Gesellschaft mit beschränkter Haftung Method for automatically grasping, in particular moving, objects
KR102346900B1 (en) * 2021-08-05 2022-01-04 주식회사 애자일소다 Deep reinforcement learning apparatus and method for pick and place system
DE102021209646B4 (en) 2021-09-02 2024-05-02 Robert Bosch Gesellschaft mit beschränkter Haftung Robot device, method for computer-implemented training of a robot control model and method for controlling a robot device
JPWO2023042306A1 (en) * 2021-09-15 2023-03-23
EP4311632A1 (en) * 2022-07-27 2024-01-31 Siemens Aktiengesellschaft Method for gripping an object, computer program and electronically readable data carrier
CN115816466B (en) * 2023-02-02 2023-06-16 中国科学技术大学 Method for improving control stability of vision observation robot
CN117697769B (en) * 2024-02-06 2024-04-30 成都威世通智能科技有限公司 Robot control system and method based on deep learning

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0588721A (en) * 1991-09-30 1993-04-09 Fujitsu Ltd Controller for articulated robot
JPH06106490A (en) * 1992-09-29 1994-04-19 Fujitsu Ltd Control device
JP3211186B2 (en) * 1997-12-15 2001-09-25 オムロン株式会社 Robot, robot system, robot learning method, robot system learning method, and recording medium
JP3859371B2 (en) * 1998-09-25 2006-12-20 松下電工株式会社 Picking equipment
JP2001019165A (en) 1999-07-02 2001-01-23 Murata Mach Ltd Work picking device
KR20020008848A (en) * 2000-03-31 2002-01-31 이데이 노부유끼 Robot device, robot device action control method, external force detecting device and external force detecting method
US6925357B2 (en) * 2002-07-25 2005-08-02 Intouch Health, Inc. Medical tele-robotic system
JP3834307B2 (en) * 2003-09-29 2006-10-18 ファナック株式会社 Robot system
JP4630553B2 (en) * 2004-01-15 2011-02-09 ソニー株式会社 Dynamic control device and biped walking mobile body using dynamic control device
JP2005238422A (en) 2004-02-27 2005-09-08 Sony Corp Robot device, its state transition model construction method and behavior control method
JP4746349B2 (en) * 2005-05-18 2011-08-10 日本電信電話株式会社 Robot action selection device and robot action selection method
JP2007280054A (en) * 2006-04-06 2007-10-25 Sony Corp Learning device, learning method, and program
JP4199264B2 (en) * 2006-05-29 2008-12-17 ファナック株式会社 Work picking apparatus and method
JP2010086405A (en) 2008-10-01 2010-04-15 Fuji Heavy Ind Ltd System for adapting control parameter
JP5330138B2 (en) * 2008-11-04 2013-10-30 本田技研工業株式会社 Reinforcement learning system
EP2249292A1 (en) * 2009-04-03 2010-11-10 Siemens Aktiengesellschaft Decision making mechanism, method, module, and robot configured to decide on at least one prospective action of the robot
JP5743499B2 (en) 2010-11-10 2015-07-01 キヤノン株式会社 Image generating apparatus, image generating method, and program
JP5767464B2 (en) * 2010-12-15 2015-08-19 キヤノン株式会社 Information processing apparatus, information processing apparatus control method, and program
JP5750657B2 (en) * 2011-03-30 2015-07-22 株式会社国際電気通信基礎技術研究所 Reinforcement learning device, control device, and reinforcement learning method
JP5787642B2 (en) 2011-06-28 2015-09-30 キヤノン株式会社 Object holding device, method for controlling object holding device, and program
JP5670397B2 (en) * 2012-08-29 2015-02-18 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP6106490B2 (en) 2013-03-28 2017-03-29 シャープ株式会社 Self-propelled electronic device and travel area designation system for self-propelled electronic device
JP6126437B2 (en) 2013-03-29 2017-05-10 キヤノン株式会社 Image processing apparatus and image processing method
JP5968259B2 (en) * 2013-04-11 2016-08-10 日本電信電話株式会社 Reinforcement learning method, apparatus and program based on linear model
JP5929854B2 (en) * 2013-07-31 2016-06-08 株式会社安川電機 Robot system and method of manufacturing workpiece
CN103753557B (en) * 2014-02-14 2015-06-17 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06203166A (en) * 1993-01-06 1994-07-22 Fujitsu Ltd Measurement, controller and learning method for multi-dimensional position
JPH11272845A (en) * 1998-03-23 1999-10-08 Denso Corp Image recognition device
CN101034418A (en) * 2006-03-10 2007-09-12 发那科株式会社 Device, program, recording medium and method for robot simulation
US20070282485A1 (en) * 2006-06-06 2007-12-06 Fanuc Ltd Robot simulation apparatus
US20090033655A1 (en) * 2007-08-02 2009-02-05 Boca Remus F System and method of three-dimensional pose estimation
JP2009262279A (en) * 2008-04-25 2009-11-12 Nec Corp Robot, robot program sharing system, robot program sharing method, and program
CN101726251A (en) * 2009-11-13 2010-06-09 江苏大学 Automatic fruit identification method of apple picking robot on basis of support vector machine
CN101782976A (en) * 2010-01-15 2010-07-21 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
US20130151007A1 (en) * 2010-06-24 2013-06-13 Zenrobotics Oy Method for the selection of physical objects in a robot system
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
CN103568014A (en) * 2012-07-26 2014-02-12 发那科株式会社 Apparatus and method of taking out bulk stored articles by manipulator
US20140114888A1 (en) * 2012-10-18 2014-04-24 Sony Corporation Information processing apparatus, information processing method, and program
CN104793620A (en) * 2015-04-17 2015-07-22 中国矿业大学 Obstacle avoidance robot based on visual feature binding and reinforcement learning theory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN Fen; SHI Chuan; LUO Jiewen; SHI Zhongzhi: "Two-Layer Reinforcement Learning Algorithm Based on Bias Information Learning", Journal of Computer Research and Development, no. 09, 15 September 2008 (2008-09-15), pages 1455 - 1462 *

Also Published As

Publication number Publication date
JP2024069414A (en) 2024-05-21
JP7491971B2 (en) 2024-05-28
JP2020168719A (en) 2020-10-15
JP2017064910A (en) 2017-04-06
DE102016015873B3 (en) 2020-10-29
CN106393102A (en) 2017-02-15
JP2022145915A (en) 2022-10-04
JP7100426B2 (en) 2022-07-13
CN106393102B (en) 2021-06-01
JP6522488B2 (en) 2019-05-29
JP2017030135A (en) 2017-02-09

Similar Documents

Publication Publication Date Title
CN106393102B (en) Machine learning device, robot system, and machine learning method
US11780095B2 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
JP6810087B2 (en) Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
CN109483573B (en) Machine learning device, robot system, and machine learning method
CN109421071B (en) Article stacking device and machine learning device
US10486306B2 (en) Control device for controlling robot by learning action of person, robot system, and production system
CN108393908B (en) Workpiece taking-out device and workpiece taking-out method for improving workpiece taking-out action
CN107866809B (en) Machine learning device and machine learning method for learning optimal article holding path
CN106826812B (en) Machine learning device, machine learning method, laminated core manufacturing device, and laminated core manufacturing system
US20180222048A1 (en) Control device, robot, and robot system
CN109955115B (en) Chip removing device and information processing device
JP7191569B2 (en) gripping device
CN109814615B (en) Control device and machine learning device
JP2018202550A (en) Machine learning device, machine learning method, and machine learning program
CN113826051A (en) Generating digital twins of interactions between solid system parts
CN111745640B (en) Object detection method, object detection device, and robot system
CN111319039B (en) Robot
US20210072734A1 (en) Information processing apparatus and method, robot controlling apparatus and method, and non-transitory computer-readable storage medium
US10807234B2 (en) Component supply device and machine learning device
CN108687766B (en) Robot control device, machine learning device, and machine learning method
CN117377558A (en) Automatic pick and place system
CN117916771A (en) Image processing apparatus, component holding system, image processing method, and component holding method
Cabrera et al. Real time object recognition methodology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination