CN113199483A - Robot system, robot control method, machine learning device, and machine learning method


Info

Publication number
CN113199483A
CN113199483A
Authority
CN
China
Prior art keywords
robot
information
machine learning
neural network
hand
Prior art date
Legal status
Pending
Application number
CN202110544521.3A
Other languages
Chinese (zh)
Inventor
山崎岳
尾山拓未
陶山峻
中山一隆
组谷英俊
中川浩
冈野原大辅
奥田辽介
松元睿一
河合圭悟
Current Assignee
Fanuc Corp
Preferred Networks Inc
Original Assignee
Fanuc Corp
Preferred Networks Inc
Priority date
Filing date
Publication date
Application filed by Fanuc Corp, Preferred Networks Inc filed Critical Fanuc Corp
Publication of CN113199483A

Classifications

    • B25J9/163 Programme controls characterised by the control loop: learning, adaptive, model based, rule based expert control
    • B25J9/1687 Assembly, peg and hole, palletising, straight line, weaving pattern movement
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • G05B2219/39297 First learn inverse model, then fine tune with ffw error learning
    • G05B2219/40053 Pick 3-D object from pile of objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot system, a robot control method, a machine learning device, and a machine learning method. The robot system includes: a robot; an observation unit that acquires data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing that information; a determination unit that inputs the data to a neural network to determine information for the robot to grip the object; and a control device that controls the robot based on the information determined by the determination unit.

Description

Robot system, robot control method, machine learning device, and machine learning method
The present application is a divisional application of Chinese patent application No. 201610617361.X, entitled "Machine learning apparatus, robot system, and machine learning method", filed on July 29, 2016.
Technical Field
The present invention relates to a machine learning device, a robot system, and a machine learning method for learning the operation of taking out workpieces placed at random, including in a bulk state.
Background
Conventionally, as disclosed in, for example, Japanese Patent No. 5642738 and Japanese Patent No. 5670397, there are known robot systems in which a robot hand grips and conveys workpieces stacked in bulk in a basket-shaped box. In such a robot system, for example, the position information of a plurality of workpieces is acquired using a three-dimensional measuring instrument provided above the basket-shaped box, and the workpieces are taken out one by one by the robot hand of the robot based on that position information.
However, in such a conventional robot system, it is necessary to set in advance how the workpiece to be taken out is extracted from, for example, the distance images of the plurality of workpieces measured by the three-dimensional measuring instrument, and at which position that workpiece is to be gripped. It is also necessary to program in advance how the robot hand should operate when taking out the workpiece; specifically, it is necessary, for example, to teach the robot the take-out operation using a teaching pendant.
Therefore, if the setting for extracting the workpiece to be taken out from the distance images of the plurality of workpieces is inappropriate, or if the operation program of the robot is not created appropriately, the success rate with which the robot takes out and conveys the workpiece falls. To raise the success rate, it is necessary to keep improving the workpiece detection settings and the operation program of the robot while repeating trial and error to find the optimum operation of the robot.
Disclosure of Invention
In view of the above circumstances, an object of the present invention is to provide a machine learning device, a robot system, and a machine learning method that can learn, without human intervention, the optimum operation of a robot for taking out workpieces placed at random, including in a bulk state.
According to a first aspect of the present invention, there is provided a machine learning device for learning an operation of a robot that takes out, by a robot hand, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the robot hand; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation. Preferably, the machine learning device further includes an intention determining unit that determines the instruction data to be given to the robot by referring to the operation amount learned by the learning unit.
According to a second aspect of the present invention, there is provided a machine learning device for learning an operation of a robot that takes out, by a robot hand, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the robot hand; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including a measurement parameter of the three-dimensional measuring instrument, in association with the state quantity of the robot and the result of the take-out operation. Preferably, the machine learning device further includes an intention determining unit that determines the measurement parameter of the three-dimensional measuring instrument by referring to the operation amount learned by the learning unit.
The state quantity observation unit may observe a state quantity of the robot that includes output data of a coordinate calculation unit that calculates the three-dimensional position of each workpiece based on the output of the three-dimensional measuring instrument. The coordinate calculation unit may further calculate the posture of each workpiece and output the calculated three-dimensional position and posture data of each workpiece. The operation result acquisition unit may use the output data of the three-dimensional measuring instrument. Preferably, the machine learning device further includes a preprocessing unit that processes the output data of the three-dimensional measuring instrument before it is input to the state quantity observation unit, and the state quantity observation unit receives the output data of the preprocessing unit as the state quantity of the robot. The preprocessing unit may make the direction and height of each workpiece uniform in the output data of the three-dimensional measuring instrument. The operation result acquisition unit may acquire at least one of: success or failure in taking out the workpiece, the damage state of the workpiece, and the degree of completion when the taken-out workpiece is transferred to a subsequent process.
The learning unit may include: a return calculation unit that calculates a return based on the output of the operation result acquisition unit; and a value function update unit that has a value function specifying the value of the take-out operation of the workpiece and updates the value function in accordance with the return. Alternatively, the learning unit may include a learning model for learning the take-out operation of the workpiece, together with: an error calculation unit that calculates an error based on the output of the operation result acquisition unit and the output of the learning model; and a learning model update unit that updates the learning model in accordance with the error. The machine learning device preferably has a neural network.
According to a third aspect of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state. The machine learning device includes: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation. The robot system further includes: the robot, the three-dimensional measuring instrument, and a control device that controls the robot and the three-dimensional measuring instrument, respectively.
According to a fourth aspect of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state. The machine learning device includes: a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; an operation result acquisition unit that acquires a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and a learning unit that receives the output from the state quantity observation unit and the output from the operation result acquisition unit, and learns an operation amount including a measurement parameter of the three-dimensional measuring instrument, in association with the state quantity of the robot and the result of the take-out operation. The robot system further includes: the robot, the three-dimensional measuring instrument, and a control device that controls the robot and the three-dimensional measuring instrument, respectively.
Preferably, the robot system includes a plurality of the robots, the machine learning device is provided for each of the robots, and the machine learning devices provided for the robots share or exchange data with each other via a communication medium. The machine learning device may reside on a cloud server.
According to a fifth aspect of the present invention, there is provided a machine learning method for learning an operation of a robot that takes out, by a hand unit, workpieces from a plurality of workpieces placed at random, including in a bulk state, the machine learning method including the steps of: observing a state quantity of the robot including output data of a three-dimensional measuring instrument that measures a three-dimensional map of each of the workpieces; acquiring a result of a take-out operation in which the robot takes out the workpiece by the hand unit; and receiving the observed state quantity and the acquired result, and learning an operation amount including instruction data for instructing the robot to take out the workpiece, in association with the state quantity of the robot and the result of the take-out operation.
Drawings
The present invention will be more clearly understood by reference to the following drawings.
Fig. 1 is a block diagram showing a conceptual configuration of a robot system according to an embodiment of the present invention.
Fig. 2 is a diagram schematically showing a model of a neuron.
Fig. 3 is a diagram schematically showing a three-layer neural network formed by combining the neurons shown in fig. 2.
Fig. 4 is a flowchart showing an example of the operation of the machine learning device shown in fig. 1.
Fig. 5 is a block diagram showing a conceptual configuration of a robot system according to another embodiment of the present invention.
Fig. 6 is a diagram for explaining an example of processing by the preprocessing unit in the robot system shown in fig. 5.
Fig. 7 is a block diagram showing a modification of the robot system shown in fig. 1.
Detailed Description
Embodiments of a machine learning device, a robot system, and a machine learning method according to the present invention will be described below with reference to the drawings. However, the invention is not limited to the embodiments illustrated in the drawings or described below. In the drawings, the same reference numerals are assigned to components having the same functions, and the scale of the drawings has been adjusted as appropriate for easy understanding.
Fig. 1 is a block diagram showing a conceptual configuration of a robot system according to an embodiment of the present invention. The robot system 10 of the present embodiment includes: a robot 14 to which a hand 13 for gripping the workpieces 12 stacked in bulk in the basket-shaped box 11 is attached; a three-dimensional measuring device 15 that measures a three-dimensional map (map) of the surface of the workpiece 12; a control device 16 for controlling the robot 14 and the three-dimensional measuring device 15; a coordinate calculation unit 19; and a machine learning device 20.
Here, the machine learning device 20 includes: a state quantity observation unit 21, an operation result acquisition unit 26, a learning unit 22, and an intention determination unit 25. As described in detail later, the machine learning device 20 learns and outputs operation amounts such as instruction data instructing the robot 14 to perform the operation of taking out the workpiece 12 and measurement parameters of the three-dimensional measuring instrument 15.
The robot 14 is, for example, a 6-axis articulated robot, and the drive axes of the robot 14 and the hand portion 13 are controlled by the control device 16. The robot 14 is used to take out the workpieces 12 one by one from the box 11 provided at a predetermined position and to move them sequentially to a predetermined place such as a conveyor or a work table (not shown).
However, when the workpieces 12 stacked in bulk are taken out of the box 11, the robot hand 13 or the workpiece 12 may collide with or contact the wall of the box 11, or the robot hand 13 or the workpiece 12 may be caught on another workpiece 12. In such cases, a function of detecting the force acting on the hand 13 is required in order to immediately avoid an overload applied to the robot 14. Therefore, a 6-axis force sensor 17 is provided between the tip of the arm of the robot 14 and the hand portion 13. The robot system 10 according to the present embodiment also has a function of estimating the force acting on the hand portion 13 from the current values of the motors (not shown) that drive the drive shafts of the joints of the robot 14.
Further, since the force sensor 17 can detect the force acting on the hand portion 13, it can be determined whether or not the hand portion 13 is actually holding the workpiece 12. That is, since the weight of the workpiece 12 acts on the hand portion 13 while the hand portion 13 grips the workpiece 12, it can be determined that the hand portion 13 is gripping the workpiece 12 if the detection value of the force sensor 17 exceeds a predetermined threshold value after the take-out operation of the workpiece 12 is performed. Whether or not the hand portion 13 holds the workpiece 12 may also be determined from, for example, the imaging data of a camera used in the three-dimensional measuring instrument 15, or the output of a photoelectric sensor (not shown) attached to the hand portion 13. The determination may also be made based on data from a pressure gauge of a suction-type hand described later.
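For illustration, the threshold judgment described above might look like the following minimal sketch; the sensor interface and the threshold value are assumptions, not values from the embodiment.

```python
# A minimal sketch of the grip-success judgment described above.
# The sensor interface and the threshold are illustrative assumptions.

def is_workpiece_gripped(force_z_after: float, force_z_empty: float,
                         threshold: float = 2.0) -> bool:
    """Judge that the hand grips a workpiece when the detected vertical
    force exceeds the empty-hand reading by more than a threshold,
    since the workpiece's weight acts on the hand."""
    return (force_z_after - force_z_empty) > threshold
```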
Here, the hand portion 13 may have various forms as long as it can grip the workpiece 12. For example, the hand portion 13 may be configured to grip the workpiece 12 by opening and closing 2 or more claw portions, or may include an electromagnet or a negative pressure generating device that generates an attractive force to the workpiece 12. That is, fig. 1 illustrates a case where the hand portion 13 grips the workpiece by 2 claw portions, but the present invention is not limited thereto.
In order to measure the plurality of workpieces 12, the three-dimensional measuring instrument 15 is provided at a predetermined position above the workpieces 12 via the support portion 18. As the three-dimensional measuring instrument 15, for example, a three-dimensional vision sensor can be used that obtains three-dimensional position information by performing image processing on image data of the workpieces 12 captured by 2 cameras (not shown). Specifically, the three-dimensional map (the positions of the surfaces of the plurality of workpieces 12 stacked in bulk) is measured by applying triangulation, a light-section method, a time-of-flight method, a depth-from-defocus method, or a combination of these methods.
The coordinate calculation unit 19 calculates (measures) the positions of the surfaces of the plurality of works 12 stacked in bulk, using the three-dimensional map obtained by the three-dimensional measuring instrument 15 as an input. That is, the three-dimensional position data (x, y, z) or the three-dimensional position data (x, y, z) and the orientation data (w, p, r) of each workpiece 12 can be obtained by the output of the three-dimensional measuring instrument 15. Here, the state quantity observation unit 21 receives both the three-dimensional map from the three-dimensional measurement device 15 and the position data (posture data) from the coordinate calculation unit 19 to observe the state quantity of the robot 14, but may also observe the state quantity of the robot 14 by receiving only the three-dimensional map from the three-dimensional measurement device 15, for example. Further, similarly to the case described below with reference to fig. 5, a preprocessing unit 50 may be added, and the preprocessing unit 50 may process (preprocess) the three-dimensional map from the three-dimensional measuring device 15 before inputting the processed three-dimensional map to the state quantity observation unit 21, and input the processed three-dimensional map to the state quantity observation unit 21.
It is assumed that the relative positions of the robot 14 and the three-dimensional measuring instrument 15 are determined in advance by calibration. As the three-dimensional measuring instrument 15 of the present invention, a laser distance measuring instrument may also be used instead of a three-dimensional vision sensor. That is, the distance from the position where the three-dimensional measuring instrument 15 is installed to the surface of each workpiece 12 may be measured by laser scanning, and the three-dimensional positions and postures (x, y, z, w, p, r) of the plurality of workpieces 12 stacked in bulk may also be acquired using various sensors such as a monocular camera or a touch sensor.
That is, in the present invention, the three-dimensional measuring instrument 15 can be applied to any kind of three-dimensional measuring method as long as data (x, y, z, w, p, r) of each workpiece 12 can be acquired. The form of installation of the three-dimensional measuring instrument 15 is not particularly limited, and may be fixed to a floor, a wall, or the like, or may be attached to an arm portion of the robot 14.
The three-dimensional measuring device 15 acquires a three-dimensional map of the plurality of workpieces 12 stacked in bulk in the box 11 by a command from the control device 16, and the coordinate calculation unit 19 acquires (calculates) data of three-dimensional positions (postures) of the plurality of workpieces 12 from the three-dimensional map, and outputs the data to the control device 16 and a state quantity observation unit 21 and an operation result acquisition unit 26 of a machine learning device 20 described later. In particular, the coordinate calculation unit 19 estimates the boundary between a certain workpiece 12 and another workpiece 12 or the boundary between the workpiece 12 and the box 11, for example, from the captured image data of a plurality of workpieces 12, and acquires three-dimensional position data for each workpiece 12.
The three-dimensional position data for each workpiece 12 is, for example, data obtained by estimating the existing position or retainable position of each workpiece 12 from the positions of a plurality of points on the surface of a plurality of workpieces 12 stacked in bulk. Of course, the three-dimensional position data of each workpiece 12 may also include data of the posture of the workpiece 12.
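For illustration only, the per-workpiece data described above could be held in a small container such as the following; the field names are assumptions, not definitions from the embodiment.

```python
from dataclasses import dataclass

# Illustrative container for the per-workpiece data described above.
# Field names are assumptions for the sketch.

@dataclass
class WorkpiecePose:
    x: float            # three-dimensional position of the workpiece
    y: float
    z: float
    w: float = 0.0      # posture data (optional in the text above)
    p: float = 0.0
    r: float = 0.0
```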
The coordinate calculation unit 19 may also use machine learning to acquire the three-dimensional position and orientation data of each workpiece 12. For example, a method such as the supervised learning described later can be applied to object recognition or angle estimation from an input image or from a laser distance measuring instrument.
When the three-dimensional position data of each workpiece 12 is input from the three-dimensional measuring instrument 15 to the control device 16 via the coordinate calculation unit 19, the control device 16 controls the operation of the hand 13 that takes out a certain workpiece 12 from the box 11. At this time, motors (not shown) of the respective axes of the robot hand 13 and the robot 14 are driven based on command values (operation amounts) corresponding to the optimal position, posture, and pickup direction of the robot hand 13 obtained by the machine learning device 20 described later.
The machine learning device 20 may learn variables of the imaging conditions of the camera used in the three-dimensional measuring instrument 15 (measurement parameters of the three-dimensional measuring instrument 15, for example, the exposure time adjusted at the time of imaging using an exposure meter, the illuminance of an illumination system when illuminating the object to be imaged, and the like), and may control the three-dimensional measuring instrument 15 via the control device 16 based on the learned operation amount including those measurement parameters. Variables of the position/orientation estimation conditions used for estimating the existing position/orientation and the graspable position/orientation of each workpiece 12 from the positions of the plurality of workpieces 12 measured by the three-dimensional measuring instrument 15 may also be included in the output data of the three-dimensional measuring instrument 15.
In addition, as described above, the output data from the three-dimensional measuring instrument 15 may be processed in advance by the preprocessing unit 50 or the like described later with reference to fig. 5, and the processed data (image data) may be provided to the state quantity observation unit 21. The operation result acquisition unit 26 may acquire the result of the robot hand 13 of the robot 14 taking out the workpiece 12 based on the output data from the three-dimensional measuring instrument 15 (the output data of the coordinate calculation unit 19), and may acquire, for example, the operation result of the degree of completion when the taken-out workpiece 12 is transferred to the subsequent step and the state change such as the presence or absence of the breakage of the taken-out workpiece 12 via another means (for example, a camera, a sensor, or the like provided in the subsequent step). As described above, the state quantity observation unit 21 and the operation result acquisition unit 26 are functional modules, but it is needless to say that both functions may be realized by one module.
Next, the machine learning device 20 shown in fig. 1 will be described in detail. The machine learning device 20 has a function of extracting, by analysis, useful rules, knowledge expressions, judgment criteria, and the like from the set of data input to the device, outputting the judgment results, and learning the knowledge (machine learning). There are various machine learning methods; roughly divided, they are classified into, for example, "supervised learning", "unsupervised learning", and "reinforcement learning". In addition, to realize these methods, there is a method called "deep learning" that learns the extraction of the feature amounts themselves. Such machine learning (the machine learning device 20) may use a general-purpose computer or processor, but applying a GPGPU (General-Purpose computing on Graphics Processing Units), a large-scale PC cluster, or the like enables higher-speed processing.
First, supervised learning is a method in which a large number of data sets of certain inputs and results (labels) are given to the machine learning device 20, whereby it learns the features present in those data sets and inductively acquires a model for estimating the result from the input, that is, their relationship. When supervised learning is applied in the present embodiment, it can be used, for example, in a part that estimates the workpiece position from the sensor input, or a part that estimates the success probability for a candidate workpiece. It can be realized using an algorithm such as the neural network described later.
Unsupervised learning is a method in which only a large amount of input data is given to the learning device, whereby it learns how the input data is distributed and learns to compress, classify, shape, or otherwise transform the input data without being given corresponding teacher output data. For example, features in those data sets can be clustered by similarity. Using this result, output prediction becomes possible by setting a certain criterion and allocating outputs so as to optimize it.
Further, as a problem setting intermediate between supervised learning and unsupervised learning, there is what is called semi-supervised learning, which corresponds to a case where, for example, input-output data sets exist for only a part of the data and only input data is available for the rest. In the present embodiment, data that can be acquired without actually operating the robot (image data, simulation data, and the like) is used for unsupervised learning, so that learning can be performed efficiently.
Next, reinforcement learning will be described. First, as a problem setting of reinforcement learning, the following is considered.
  • The robot observes the state of the environment and determines its behavior.
  • The environment changes according to a certain rule, and the robot's own behavior may also change the environment.
  • A return (reward signal) is given back each time a behavior is taken.
  • What is to be maximized is the total of future (discounted) returns.
  • Learning starts from a state in which the results caused by a behavior are not known at all, or are known only incompletely. That is, the robot can obtain the result as data only after it actually acts, and it is therefore necessary to search for the optimum behavior by trial and error.
  • It is also possible to start learning from a good starting point by setting, as the initial state, a state learned in advance (by a method such as the above-described supervised learning or inverse reinforcement learning) so as to imitate human motion.
Here, reinforcement learning is a method of learning not only determination and classification but also behavior, thereby learning an appropriate behavior based on the interaction that the behavior exerts on the environment, that is, learning so as to maximize the return obtained in the future. In the present embodiment, this means that it is possible to acquire behavior that affects the future, for example collapsing a mountain of workpieces 12 so that workpieces 12 become easier to take out later. The description below takes Q learning as an example, but the method is not limited thereto.
Q learning is a method of learning the value Q(s, a) of selecting a behavior a in a certain environmental state s. That is, in a certain state s, the behavior a with the highest value Q(s, a) should be selected as the optimum behavior. At first, however, the correct value of Q(s, a) is entirely unknown for each combination of a state s and a behavior a. The agent (the subject of behavior) therefore selects various behaviors a in a certain state s and is given a return for each behavior a. In this way, the agent continues to learn the selection of better behaviors, that is, the correct value Q(s, a).
Further, since it is desired to maximize the total of the returns obtained in the future as a result of behavior, the objective is finally to satisfy $Q(s, a) = E[\sum_t \gamma^t r_t]$. Here, $E[\cdot]$ denotes an expected value, $t$ is time, $\gamma$ is a parameter called the discount rate described later, $r_t$ is the return at time $t$, and the sum is taken over the times $t$. The expected value in this expression is the value obtained when the state changes according to the optimal behavior; since the optimal behavior is unknown, it has to be learned while searching. An update formula for the value $Q(s, a)$ is, for example, the following formula (1):

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \Big( r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \Big) \tag{1}$$

In the above formula (1), $s_t$ represents the state of the environment at time $t$, and $a_t$ represents the behavior at time $t$. The behavior $a_t$ changes the state to $s_{t+1}$, and $r_{t+1}$ represents the return obtained by that state change. The term with $\max$ is the Q value of the behavior $a$ that is known, at that point, to give the highest Q value in the state $s_{t+1}$, multiplied by $\gamma$. Here, $\gamma$ is a parameter with $0 < \gamma \le 1$ called the discount rate, and $\alpha$ is a learning coefficient set in the range $0 < \alpha \le 1$.

The above formula (1) expresses a method of updating the evaluation value $Q(s_t, a_t)$ of the behavior $a_t$ in the state $s_t$, based on the return $r_{t+1}$ that comes back as a result of the trial $a_t$. That is, if the sum of the return $r_{t+1}$ and the evaluation value $Q(s_{t+1}, \max a_{t+1})$ of the best behavior in the state following the behavior $a$ is larger than the evaluation value $Q(s_t, a_t)$ of the behavior $a$ in the state $s$, then $Q(s_t, a_t)$ is increased; conversely, if it is smaller, $Q(s_t, a_t)$ is decreased. In other words, the value of a certain behavior in a certain state is brought closer to the sum of the return that immediately comes back as a result and the value of the best behavior in the next state brought about by that behavior.
Here, as methods of expressing Q(s, a) on a computer, there is a method of holding its value as a table for all state-behavior pairs (s, a), and a method of preparing a function that approximates Q(s, a). In the latter method, the above formula (1) can be realized by adjusting the parameters of the approximation function with a technique such as stochastic gradient descent. As the approximation function, the neural network described later can be used.
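As a concrete illustration of the table-holding approach, the following is a minimal Q-learning sketch in Python implementing update formula (1); the state and behavior encodings and the values of alpha and gamma are assumptions for the example.

```python
from collections import defaultdict

# A minimal tabular Q-learning sketch implementing update formula (1).
# States and behaviors are assumed to be hashable identifiers; alpha
# and gamma follow the ranges given in the text.

class QTable:
    def __init__(self, behaviors, alpha=0.1, gamma=0.9):
        self.q = defaultdict(float)   # Q(s, a), initially unknown (0.0)
        self.behaviors = behaviors
        self.alpha = alpha            # learning coefficient, 0 < alpha <= 1
        self.gamma = gamma            # discount rate, 0 < gamma <= 1

    def best_behavior(self, state):
        # the behavior a with the highest value Q(s, a)
        return max(self.behaviors, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r_next, s_next):
        # Q(s,a) <- Q(s,a) + alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
        best_next = max(self.q[(s_next, b)] for b in self.behaviors)
        self.q[(s, a)] += self.alpha * (
            r_next + self.gamma * best_next - self.q[(s, a)])
```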
Further, a neural network can be used as an approximation algorithm for the value function in reinforcement learning, or as a learning model for supervised or unsupervised learning. Fig. 2 is a diagram schematically showing a model of a neuron, and fig. 3 is a diagram schematically showing a three-layer neural network formed by combining the neurons shown in fig. 2. That is, the neural network is configured by, for example, an arithmetic device and a memory that simulate a neuron model such as that shown in fig. 2.
As shown in fig. 2, a neuron outputs an output (result) $y$ for a plurality of inputs $x$ (in fig. 2, inputs $x_1$ to $x_3$ are taken as examples). Each input $x$ ($x_1$, $x_2$, $x_3$) is multiplied by a weight $w$ ($w_1$, $w_2$, $w_3$) corresponding to that input. Thereby, the neuron outputs a result $y$ expressed by the following formula (2). Here, the input $x$, the result $y$, and the weight $w$ are all vectors, $\theta$ is a bias, and $f_k$ is an activation function:

$$y = f_k\left( \sum_{i=1}^{n} x_i w_i - \theta \right) \tag{2}$$
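Written out in code, formula (2) is only a few lines; the sigmoid used below is an assumed choice for the activation function $f_k$, not one specified in the text.

```python
import numpy as np

# A sketch of the single-neuron model of fig. 2 and formula (2):
# the activation f_k applied to the weighted input sum minus the bias.
# The sigmoid is an assumed choice of f_k for illustration.

def neuron_output(x: np.ndarray, w: np.ndarray, theta: float) -> float:
    f_k = lambda u: 1.0 / (1.0 + np.exp(-u))  # activation function
    return float(f_k(np.dot(w, x) - theta))
```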
A three-layer neural network formed by combining the neurons shown in fig. 2 will be described with reference to fig. 3. As shown in fig. 3, a plurality of inputs x (here, input x1 to input x3 are taken as examples) are input from the left side of the neural network, and a result y (here, result y1 to result y3 are taken as examples) is output from the right side. Specifically, the inputs x1, x2, and x3 are multiplied by the corresponding weights and input to each of the 3 neurons N11 to N13. The weights multiplied by these inputs are collectively labeled as W1.
Neurons N11 to N13 output z11 to z13, respectively. In fig. 3, z11 to z13 are collectively denoted as a feature vector Z1, which can be regarded as a vector obtained by extracting the feature amounts of the input vector. The feature vector Z1 is a feature vector between the weight W1 and the weight W2. z11 to z13 are multiplied by the corresponding weights and input to each of the 2 neurons N21 and N22. The weights multiplied by these feature vectors are collectively labeled as W2.

Neurons N21 and N22 output z21 and z22, respectively. In fig. 3, z21 and z22 are collectively denoted as a feature vector Z2. The feature vector Z2 is a feature vector between the weight W2 and the weight W3. z21 and z22 are multiplied by the corresponding weights and input to each of the 3 neurons N31 to N33. The weights multiplied by these feature vectors are collectively labeled as W3.
Finally, the neurons N31 to N33 output the results y1 to y3, respectively. The operation of the neural network includes a learning mode and a value prediction mode: for example, in the learning mode the weights W are learned using a learning data set, and in the prediction mode the behavior of the robot is determined using those parameters. Although "prediction" is written here for convenience, various tasks such as detection, classification, and inference are of course possible.
Here, it is possible to immediately learn from data obtained by actually operating the robot in the prediction mode and to reflect it in the next behavior (online learning), or to perform collective learning using a data set collected in advance and thereafter run the detection mode with those parameters (batch learning). An intermediate approach is also possible, in which the learning mode is inserted each time a certain amount of data has accumulated.
The weights W1 to W3 can be learned by the error backpropagation method. The error information enters from the right side and flows to the left side. The error backpropagation method is a method of adjusting (learning) the respective weights for each neuron so as to reduce the difference between the output y obtained when the input x is input and the true output y (teacher).
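A compact sketch of the three-layer network of fig. 3 trained by error backpropagation follows; the layer sizes match the figure, while the sigmoid activation, squared-error loss, and learning rate are assumptions for illustration.

```python
import numpy as np

# A sketch of the three-layer network of fig. 3 (inputs x1..x3, feature
# vectors Z1 and Z2, outputs y1..y3) trained by error backpropagation.
# Activation, loss, and learning rate are illustrative assumptions.

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # input -> Z1
W2 = rng.normal(size=(2, 3))   # Z1 -> Z2
W3 = rng.normal(size=(3, 2))   # Z2 -> outputs

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def forward(x):
    z1 = sigmoid(W1 @ x)       # feature vector Z1
    z2 = sigmoid(W2 @ z1)      # feature vector Z2
    y = W3 @ z2                # results y1..y3
    return z1, z2, y

def train_step(x, y_teacher, lr=0.1):
    """One backpropagation step: the error flows from the output side
    (right) back toward the input side (left), adjusting each weight."""
    global W1, W2, W3
    z1, z2, y = forward(x)
    dy = y - y_teacher                       # output error
    dz2 = (W3.T @ dy) * z2 * (1.0 - z2)      # error at Z2
    dz1 = (W2.T @ dz2) * z1 * (1.0 - z1)     # error at Z1
    W3 -= lr * np.outer(dy, z2)
    W2 -= lr * np.outer(dz2, z1)
    W1 -= lr * np.outer(dz1, x)
```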
Such a neural network may have more than three layers (so-called deep learning). It is also possible to use an arithmetic device that automatically performs feature extraction of the input in stages, using only teacher data, and regresses the result.
Therefore, as shown in fig. 1, the machine learning device 20 of the present embodiment includes the following in order to perform the Q learning described above: a state quantity observation unit 21, an operation result acquisition unit 26, a learning unit 22, and an intention determination unit 25. However, the machine learning method applied in the present invention is not limited to Q learning; various methods usable in a machine learning device, such as "supervised learning", "unsupervised learning", "semi-supervised learning", and "reinforcement learning", can be applied. Such machine learning (the machine learning device 20) may use a general-purpose computer or processor, but applying a GPGPU, a large-scale PC cluster, or the like enables higher-speed processing.
That is, according to the present embodiment, there is provided a machine learning device that learns the operation of a robot 14 that takes out, by a hand portion 13, workpieces 12 from a plurality of workpieces 12 placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit 21 that observes a state quantity of the robot 14 including output data of a three-dimensional measuring instrument 15 that measures a three-dimensional position (x, y, z) or a three-dimensional position and posture (x, y, z, w, p, r) of each workpiece 12; an operation result acquisition unit 26 that acquires the result of the take-out operation in which the robot 14 takes out the workpiece 12 by the hand portion 13; and a learning unit 22 that receives the output from the state quantity observation unit 21 and the output from the operation result acquisition unit 26, and learns an operation amount including instruction data for instructing the robot 14 to take out the workpiece 12, in association with the state quantity of the robot 14 and the result of the take-out operation.
The state quantity observed by the state quantity observation unit 21 may include, for example, state variables that set the position, posture, and take-out direction of the hand portion 13 when a certain workpiece 12 is removed from the box 11. The learned operation amount may include, for example, command values such as the torque, speed, and rotational position given from the control device 16 to each drive shaft of the robot 14 and the hand portion 13 when the workpiece 12 is taken out of the box 11.
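For illustration only, the state variables and operation amounts named above might be held in simple containers such as the following; the field names are assumptions based on the description, not definitions from the embodiment.

```python
from dataclasses import dataclass

# Illustrative containers for the state variables and operation amounts
# named above. Field names are assumptions for the sketch.

@dataclass
class HandStateVariables:
    position: tuple[float, float, float]           # hand position
    posture: tuple[float, float, float]            # hand posture (w, p, r)
    takeout_direction: tuple[float, float, float]  # take-out direction

@dataclass
class OperationAmount:
    torque: list[float]               # command value per drive shaft
    speed: list[float]
    rotational_position: list[float]
```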
When one of the plurality of workpieces 12 stacked in bulk is taken out, the learning unit 22 learns the state variables in association with the result of the take-out operation of the workpiece 12 (the output of the operation result acquisition unit 26). That is, the output data of the three-dimensional measuring instrument 15 (coordinate calculation unit 19) and the command data of the hand portion 13 are set at random, or set intentionally according to a predetermined rule, by the control device 16, and the workpiece 12 is taken out by the hand portion 13. As the predetermined rule there is, for example, a rule of taking out, in order, the workpieces that are high in the height (z) direction among the plurality of workpieces 12 stacked in bulk. The output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 thus correspond to the behavior of taking out a certain workpiece. Successes and failures in taking out the workpiece 12 then occur, and each time, the learning unit 22 evaluates the state variables composed of the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13.
The learning unit 22 stores the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 when the workpiece 12 is taken out, in association with the evaluation of the result of the take-out operation of the workpiece 12. As examples of failure, there are cases where the hand portion 13 cannot hold the workpiece 12, and cases where, even though the workpiece 12 is held, the workpiece 12 collides with or contacts the wall of the box 11. Whether such removal of the workpiece 12 succeeds is determined based on the detection value of the force sensor 17 and the imaging data of the three-dimensional measuring instrument. Here, the machine learning device 20 may perform learning using, for example, a part of the command data of the hand portion 13 output from the control device 16.
Here, the learning unit 22 of the present embodiment preferably includes a return calculation unit 23 and a value function update unit 24. For example, the return calculation unit 23 calculates a return, for example a score, based on the success or failure of the removal of the workpiece 12 resulting from the state variables: the return is set high when the removal of the workpiece 12 succeeds, and low when it fails. The return may also be calculated based on the number of workpieces 12 successfully removed within a predetermined time. Further, in calculating the return, a return may be calculated for each stage of the removal of the workpiece 12, such as the success of gripping by the hand portion 13, the success of conveyance by the hand portion 13, and the success of the placing operation of the workpiece 12.
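The staged return calculation described above might look like the following sketch; the particular score values and stages are assumptions for illustration.

```python
# A sketch of the staged return calculation described above. The score
# values are illustrative assumptions, not values from the embodiment.

def compute_return(grip_ok: bool, convey_ok: bool, place_ok: bool) -> float:
    ret = 1.0 if grip_ok else -1.0        # high on success, low on failure
    if grip_ok and convey_ok:
        ret += 0.5                        # conveyance stage succeeded
        if place_ok:
            ret += 0.5                    # placing stage succeeded
    return ret
```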
The value function update unit 24 has a value function that specifies the value of the take-out operation of the workpiece 12, and updates the value function in accordance with the return. For this update, the update formula for the value Q(s, a) described above is used. The update is preferably performed by creating a behavior value table. The behavior value table here is a table that stores, in association with each other, the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 when the workpiece 12 was taken out, and the value function (that is, the evaluation value) updated in accordance with the result of taking out the workpiece 12 at that time.
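Such a behavior value table could be sketched as follows, with the intention determining unit's selection of the highest evaluation value included; the key layout built from measurement output data and hand command data is an assumption.

```python
# An illustrative behavior value table: each entry associates the
# measuring-instrument output data and hand command data of one take-out
# attempt with its updated evaluation value. The key layout is an
# assumption for the sketch.

behavior_value_table: dict[tuple, float] = {}

def store_evaluation(measurement_data: tuple, command_data: tuple,
                     evaluation: float) -> None:
    behavior_value_table[(measurement_data, command_data)] = evaluation

def select_best_behavior() -> tuple:
    # As the intention determining unit does, select the entry with the
    # highest evaluation value (assumes the table is non-empty).
    return max(behavior_value_table, key=behavior_value_table.get)
```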
Further, as the behavior value table, a function approximated using the neural network described above may be used, which is particularly effective when the amount of information in the state s, such as image data, is enormous. The value function is also not limited to one kind. For example, a value function that evaluates whether the gripping of the workpiece 12 by the hand portion 13 succeeded or failed, and a value function that evaluates the time (cycle time) required for gripping and conveying the workpiece 12 by the hand portion 13, are conceivable.
Further, as the above-described value function, a value function that evaluates interference between the box 11 and the hand portion 13 or the workpiece 12 at the time of workpiece removal may be used. In order to calculate the return used for updating the value function, the state quantity observation unit 21 preferably observes the force applied to the hand portion 13, for example the value detected by the force sensor 17. Since it can be estimated that interference has occurred when the amount of change in the force detected by the force sensor 17 exceeds a predetermined threshold value, it is preferable in that case to set the return to a negative value, for example, so as to lower the value determined by the value function.
Further, according to the present embodiment, the measurement parameters of the three-dimensional measuring instrument 15 can be learned as the operation amount. That is, according to the present embodiment, there is provided a machine learning device that learns the operation of a robot 14 that takes out, by a hand portion 13, workpieces 12 from a plurality of workpieces 12 placed at random, including in a bulk state, the machine learning device including: a state quantity observation unit 21 that observes a state quantity of the robot 14 including output data of a three-dimensional measuring instrument 15 that measures a three-dimensional position (x, y, z) or a three-dimensional position and posture (x, y, z, w, p, r) of each workpiece 12; an operation result acquisition unit 26 that acquires the result of the take-out operation in which the robot 14 takes out the workpiece 12 by the hand portion 13; and a learning unit 22 that receives the output from the state quantity observation unit 21 and the output from the operation result acquisition unit 26, and learns an operation amount including the measurement parameters of the three-dimensional measuring instrument 15, in association with the state quantity of the robot 14 and the result of the take-out operation.
The robot system 10 according to the present embodiment may further include an automatic hand changer (not shown) that replaces the hand portion 13 attached to the robot 14 with a hand portion 13 of another type. In this case, the value function update unit 24 may have a value function for each hand portion 13 of a different form, and update the value function of the replaced hand portion 13 in accordance with the return. This makes it possible to learn the optimum operation of each of a plurality of hand portions 13 having different forms, so that the automatic hand changer can be made to select the hand portion 13 whose value function is highest.
Next, the intention determining unit 25 preferably refers to the behavior value table created as described above, and selects the output data of the three-dimensional measuring instrument 15 and the command data of the hand portion 13 corresponding to the highest evaluation value. The intention determining unit 25 then outputs the selected optimum data for the hand portion 13 and the three-dimensional measuring instrument 15 to the control device 16.

The control device 16 then takes out the workpiece 12 by controlling the three-dimensional measuring instrument 15 and the robot 14, respectively, using the optimum data for the hand portion 13 and the three-dimensional measuring instrument 15 output from the learning unit 22. For example, the control device 16 preferably operates the drive shafts of the hand portion 13 and the robot 14 based on state variables that set the optimum position, posture, and take-out direction of the hand portion 13 obtained by the learning unit 22.
As shown in fig. 1, the robot system 10 according to the above embodiment includes one machine learning device 20 for one robot 14. However, in the present invention, the number of each of the robot 14 and the machine learning device 20 is not limited to one. For example, the robot system 10 may further include a plurality of robots 14, and one or more machine learning devices 20 may be provided corresponding to the respective robots 14. The robot system 10 preferably shares or exchanges the optimal state variables of the three-dimensional measurement instrument 15 and the hand 13 acquired by the machine learning device 20 of each robot 14 with each other through a communication medium such as a network. Thus, even if the operation rate of one robot 14 is lower than the operation rates of the other robots 14, the optimal operation result obtained by the machine learning device 20 provided in the other robot 14 can be used in the operation of one robot 14. Further, by sharing the learning model among a plurality of robots, or sharing the operation amount including the measurement parameters of the three-dimensional measuring instrument 15, the state amount of the robot 14, and the result of the extracting operation, the time taken for learning can be shortened.
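Sharing learned values among robots, as suggested above, could be sketched as a simple merge of the robots' behavior value tables; the merge rule (keeping the best evaluation seen for each entry) is an assumption for illustration.

```python
# A sketch of sharing learned data among several robots: the behavior
# value tables are merged by keeping the best evaluation seen for each
# (state, behavior) entry. The merge rule is an illustrative assumption.

def merge_value_tables(tables: list[dict]) -> dict:
    merged: dict = {}
    for table in tables:
        for key, value in table.items():
            merged[key] = max(value, merged.get(key, float("-inf")))
    return merged
```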
The machine learning device 20 may be located inside the robot 14 or may be located outside the robot 14. Alternatively, the machine learning device 20 may be located in the control device 16 or may be present in a cloud server (not shown).
In the case where the robot system 10 includes a plurality of robots 14, while one robot 14 is conveying a workpiece 12 gripped by its hand portion 13, the hand portion of another robot 14 can be made to perform the take-out operation of a workpiece 12. The value function update unit 24 may update the value function while the robot 14 performing the take-out of the workpiece 12 is switched in this way. The machine learning device 20 may also hold state variables for a plurality of robot models, perform a take-out simulation using the plurality of robot models during the take-out operation of the workpiece 12, and learn the state variables of the plurality of robot models in association with the result of the take-out operation of the workpiece 12, based on the result of the take-out simulation.
In the machine learning device 20, the output data obtained when the three-dimensional measuring instrument 15 acquires the three-dimensional map of the workpieces 12 is transmitted from the three-dimensional measuring instrument 15 to the state quantity observation unit 21. Since this transmitted data may include abnormal data, the machine learning device 20 may have a function of filtering abnormal data, that is, a function of selecting whether or not to input the data from the three-dimensional measuring instrument 15 to the state quantity observation unit 21. In this way, the learning unit 22 of the machine learning device 20 can efficiently learn the optimum operation of the three-dimensional measuring instrument 15 and the hand portion 13 of the robot 14.
Also, while the output data from the learning unit 22 of the machine learning device 20 is input to the control device 16, this output data may likewise include abnormal data, so the device may have a function of filtering abnormal data, that is, a function of selecting whether or not to output the data from the learning unit 22 to the control device 16. In this way, the control device 16 can make the robot 14 perform the optimum operation of the hand portion 13 more safely.
The abnormal data may be detected by the following procedure: a probability distribution of the input data is estimated, the probability of occurrence of a new input is derived using that probability distribution, and if the probability of occurrence is below a certain value, the input is regarded as abnormal data deviating greatly from typical behavior.
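The procedure above might be sketched as follows; the independent per-dimension Gaussian model and the cutoff probability are assumptions for illustration.

```python
import numpy as np

# A sketch of the anomaly-detection procedure described above: estimate
# a probability model of past inputs, derive the occurrence probability
# of a new input, and reject it below a cutoff. The per-dimension
# Gaussian model and the cutoff value are illustrative assumptions.

class AnomalyFilter:
    def fit(self, samples: np.ndarray) -> None:
        self.mean = samples.mean(axis=0)
        self.std = samples.std(axis=0) + 1e-8

    def is_anomalous(self, x: np.ndarray, cutoff: float = 1e-4) -> bool:
        z = (x - self.mean) / self.std
        # log of the product of independent Gaussian densities
        log_p = -0.5 * np.sum(z ** 2 + np.log(2 * np.pi * self.std ** 2))
        return np.exp(log_p) < cutoff
```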
Next, an example of the operation of the machine learning device 20 included in the robot system 10 according to the present embodiment will be described. Fig. 4 is a flowchart showing an example of the operation of the machine learning device shown in fig. 1. As shown in fig. 4, when the learning operation (learning process) is started, three-dimensional measurement is performed by the three-dimensional measuring instrument 15 and its result is output (step S11 in fig. 4). That is, in step S11, for example, a three-dimensional map of the workpieces 12, which are placed at random, including in a bulk state (the output data of the three-dimensional measuring instrument 15), is acquired and output to the state quantity observation unit 21; the coordinate calculation unit 19 receives this three-dimensional map, calculates the three-dimensional position (x, y, z) of each workpiece 12, and outputs it to the state quantity observation unit 21, the operation result acquisition unit 26, and the control device 16. The coordinate calculation unit 19 may also calculate and output the postures (w, p, r) of the workpieces 12 from the output of the three-dimensional measuring instrument 15.
As described with reference to fig. 5, the output (three-dimensional map) of the three-dimensional measuring instrument 15 may be processed by the preprocessing unit 50 before being input to the state quantity observation unit 21. As described with reference to fig. 7, only the output of the three-dimensional measuring instrument 15 may be input to the state quantity observation unit 21, either directly or via the preprocessing unit 50. Thus, the three-dimensional measurement in step S11 and the output of its result may be carried out in various ways.
Specifically, in the case of fig. 1, the state quantity observation unit 21 observes the three-dimensional map of each workpiece 12 from the three-dimensional measuring device 15 and the state quantities (output data of the three-dimensional measuring device 15) such as the three-dimensional position (x, y, z) and the posture (w, p, r) of each workpiece 12 from the coordinate calculation unit 19. The operation result acquisition unit 26 acquires the result of the picking operation of the robot 14 for picking up the workpiece 12 by the robot hand 13, based on the output data of the three-dimensional measuring instrument 15 (the output data of the coordinate calculation unit 19). The operation result acquiring unit 26 may acquire the result of the picking operation, such as the degree of completion when the picked-up workpiece 12 is transferred to a subsequent process and the damage of the picked-up workpiece 12, in addition to the output data of the three-dimensional measuring instrument.
For example, the machine learning device 20 determines an optimum operation based on the output data of the three-dimensional measuring instrument 15 (step S12 in fig. 4), and the control device 16 outputs command data (operation amount) of the robot hand 13 (robot 14) to perform a workpiece 12 picking-up operation (step S13 in fig. 4). Then, the above-described operation result acquisition unit 26 acquires the workpiece extraction result (step S14 in fig. 4).
Next, whether or not the workpiece 12 was successfully taken out is determined based on the output of the operation result acquisition unit 26 (step S15 in fig. 4). A positive return is set when the take-out succeeded (step S16 in fig. 4), and a negative return is set when it failed (step S17 in fig. 4); the action value table (value function) is then updated (step S18 in fig. 4).
Here, the success or failure of the take-out of the workpiece 12 may be determined based on, for example, the output data of the three-dimensional measuring instrument 15 after the take-out operation. Further, the evaluation is not limited to the simple success or failure of the take-out; it may also cover, for example, the degree of completion when the taken-out workpiece 12 is transferred to a subsequent process, a change of state such as whether the taken-out workpiece 12 is damaged, and the time (cycle time) and energy (electric energy) required for the hand 13 to grip and convey the workpiece 12.
The return value based on the determination of success or failure of the take-out is calculated by the return calculation unit 23, and the action value table is updated by the merit function update unit 24. That is, when the take-out of the workpiece 12 succeeds, the learning unit 22 applies a positive return to the update of the value Q(s, a) (S16), and when it fails, applies a negative return (S17). The learning unit 22 then updates the action value table each time a workpiece 12 is taken out (S18). By repeating steps S11 to S18 described above, the learning unit 22 continues to update the action value table, that is, to learn.
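As a rough, hypothetical illustration of steps S15 to S18 (the state/action encoding and the constants ALPHA and GAMMA below are assumptions, not values given in the embodiment), the update of the action value table could be sketched in Python as follows:

    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.9        # assumed learning rate and discount factor
    q_table = defaultdict(float)   # action value table Q(s, a), initially 0

    def update_action_value(state, action, next_state, actions, succeeded):
        # Steps S16/S17: positive return on a successful take-out, negative on failure.
        reward = 1.0 if succeeded else -1.0
        # Step S18: one-step Q-learning update of the action value table.
        best_next = max(q_table[(next_state, a)] for a in actions)
        q_table[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                             - q_table[(state, action)])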
In the above description, the data input to the state quantity observation unit 21 are not limited to the output data of the three-dimensional measuring instrument 15; they may include data such as the output of another sensor, or a part of the command data from the control device 16. In any case, the control device 16 causes the robot 14 to execute the operation of taking out the workpiece 12 using the command data (operation amount) output from the machine learning device 20. What the machine learning device 20 learns is also not limited to the take-out operation for the workpiece 12; it may include, for example, the measurement parameters of the three-dimensional measuring instrument 15, as described above.
As described above, according to the robot system 10 including the machine learning device 20 of the present embodiment, the operation by which the robot 14 takes out, with the hand 13, a workpiece 12 from a plurality of workpieces 12 placed randomly, including in a bulk state, can be learned. The robot system 10 can thus learn to select the optimum operation of the robot 14 for taking out bulk-stacked workpieces 12 without human intervention.
Fig. 5 is a block diagram showing a conceptual configuration of a robot system according to another embodiment of the present invention, namely a robot system to which supervised learning is applied. As is apparent from a comparison of fig. 5 with fig. 1, the robot system 10' to which supervised learning is applied further includes a data recording unit 40 with results (labels), compared with the robot system 10 to which Q-learning (reinforcement learning) is applied as shown in fig. 1. The robot system 10' shown in fig. 5 also includes a preprocessing unit 50 that preprocesses the output data of the three-dimensional measuring instrument 15. Needless to say, the preprocessing unit 50 may also be provided in the robot system 10 shown in fig. 1, for example.
As shown in fig. 5, the machine learning device 30 in the robot system 10' to which supervised learning is applied includes a state quantity observation unit 31, an operation result acquisition unit 36, a learning unit 32, and an intention determination unit 35. The learning unit 32 includes an error calculation unit 33 and a learning model update unit 34. In the robot system 10' of the present embodiment as well, the machine learning device 30 learns and outputs operation amounts, such as the command data instructing the robot 14 to take out the workpiece 12 and the measurement parameters of the three-dimensional measuring instrument 15.
That is, in the robot system 10' to which supervised learning is applied, shown in fig. 5, the error calculation unit 33 and the learning model update unit 34 correspond, respectively, to the return calculation unit 23 and the merit function update unit 24 in the robot system 10 to which Q-learning is applied, shown in fig. 1. The other components, for example the three-dimensional measuring instrument 15, the control device 16, and the robot 14, have the same configurations as in fig. 1, and their description is omitted.
The error calculation unit 33 calculates the error between the result (label) output from the operation result acquisition unit 36 and the output of the learning model mounted in the learning unit 32. Here, when the shape of the workpieces 12 and the process performed by the robot 14 are unchanged, the data recording unit 40 with results (labels) can, for example, hold the labeled data obtained up to the day before a scheduled date on which the robot 14 performs the work, and supply that labeled data to the error calculation unit 33 on the scheduled date. Alternatively, data obtained by a simulation performed outside the robot system 10', or labeled data from another robot system, may be supplied to the error calculation unit 33 of the robot system 10' via a memory card or a communication line. Further, the data recording unit 40 with results (labels) may be configured by a nonvolatile memory such as a flash memory; in that case, the data recording unit (nonvolatile memory) 40 may be built into the learning unit 32, and the labeled data held in it may be used directly by the learning unit 32.
Fig. 6 is a diagram for explaining an example of the processing performed by the preprocessing unit in the robot system shown in fig. 5. Fig. 6(a) shows an example of the output data of the three-dimensional measuring instrument 15, namely data on the three-dimensional positions (postures) of a plurality of workpieces 12 stacked in bulk in the box 11, and fig. 6(b) to 6(d) show examples of image data obtained by preprocessing the workpieces 121 to 123 in fig. 6(a).
Here, cylindrical metal members are assumed as the workpieces 12 (121 to 123), and, as the hand 13, a suction pad that holds the longitudinal center portion of the cylindrical workpiece 12 by negative pressure is assumed, rather than a hand that grips the workpiece with two claws. Therefore, if the position of the longitudinal center portion of a workpiece 12 is known, the workpiece 12 can be taken out by moving the suction pad (13) to that position and applying suction. The numerical values in fig. 6(a) to 6(d) are in [mm], along the x, y, and z directions, respectively. The z direction corresponds to the height (depth) direction of the image data obtained by imaging the box 11, in which the plurality of workpieces 12 are stacked, with the three-dimensional measuring instrument 15 (for example, one having two cameras) provided above.
As is apparent from a comparison of fig. 6(a) with fig. 6(b) to 6(d), in one example of the processing performed by the preprocessing unit 50 in the robot system 10' shown in fig. 5, each workpiece 12 of interest (for example, the three workpieces 121 to 123) is rotated based on the output data (three-dimensional image) of the three-dimensional measuring instrument 15 and processed so that the height of its center becomes "0".
That is, the output data of the three-dimensional measuring instrument 15 include, for example, information on the three-dimensional position (x, y, z) and posture (w, p, r) of the longitudinal center portion of each workpiece 12. As shown in figs. 6(b), 6(c), and 6(d), the three workpieces 121, 122, and 123 of interest are each rotated by -r and have z subtracted, so that they all satisfy the same conditions. Such preprocessing can reduce the load on the machine learning device 30.
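A minimal sketch of such preprocessing, under the assumptions that the depth image patch of each workpiece and its pose (x, y, z, w, p, r) are already available and that r is the in-plane rotation in degrees (the function and variable names are illustrative, not part of the embodiment):

    import numpy as np
    from scipy.ndimage import rotate

    def normalize_patch(depth_patch, pose):
        # pose = (x, y, z, w, p, r); r is taken here as the in-plane
        # rotation of the workpiece in degrees.
        x, y, z, w, p, r = pose
        # Rotate by -r so that every workpiece has the same orientation.
        aligned = rotate(depth_patch, angle=-r, reshape=False,
                         order=1, mode='nearest')
        # Subtract z so that the height of the center portion becomes 0.
        return aligned - z

    patch = np.zeros((40, 40))    # placeholder depth patch
    print(normalize_patch(patch, (0, 0, 35.0, 0, 0, 25.0)).shape)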
Note that the three-dimensional map shown in fig. 6(a) need not be the output data of the three-dimensional measuring instrument 15 itself; it may, for example, be obtained by a program that determines the order of taking out the workpieces 12, with the threshold value for selecting candidates from the obtained images set lower than usual, and this processing itself may be performed by the preprocessing unit 50. The processing in the preprocessing unit 50 may also be varied in many ways depending on conditions such as the shape of the workpieces 12 and the type of the hand 13.
In this way, the output data of the three-dimensional measuring instrument 15 (the three-dimensional map of each workpiece 12), processed by the preprocessing unit 50, are input to the state quantity observation unit 31. Referring again to fig. 5, the error calculation unit 33, which receives the result (label) output from the operation result acquisition unit 36, performs processing to minimize the error of the learning model: for example, when the output of the neural network shown in fig. 3 is y, the error is taken to be -log(y) when the workpiece 12 is actually taken out and the take-out succeeds, and -log(1-y) when it fails. As inputs to the neural network shown in fig. 3, for example, the preprocessed image data of the workpieces 121 to 123 of interest shown in fig. 6(b) to 6(d) and the data on the three-dimensional positions and postures (x, y, z, w, p, r) of those workpieces are provided.
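This error corresponds to the ordinary binary cross-entropy loss. A small illustrative sketch (the function name and the clipping constant are assumptions) for the grasp-success output y of the network:

    import numpy as np

    def grasp_error(y, succeeded, eps=1e-7):
        # y: network output in (0, 1), interpreted as the predicted
        # probability that the take-out will succeed.
        y = float(np.clip(y, eps, 1.0 - eps))   # guard against log(0)
        # Error is -log(y) on success and -log(1 - y) on failure.
        return -np.log(y) if succeeded else -np.log(1.0 - y)

    print(grasp_error(0.9, True))    # small error: confident and correct
    print(grasp_error(0.9, False))   # large error: confident but wrong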
Fig. 7 is a block diagram showing a modification of the robot system shown in fig. 1. As is apparent from a comparison of fig. 7 with fig. 1, in this modification the coordinate calculation unit 19 is eliminated, and the state quantity observation unit 21 observes the state quantities of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring instrument 15. Needless to say, the control device 16 may instead be provided with a configuration corresponding to the coordinate calculation unit 19. The configuration shown in fig. 7 can also be applied to, for example, the robot system 10' to which supervised learning is applied, described with reference to fig. 5. That is, the preprocessing unit 50 may be eliminated from the robot system 10' shown in fig. 5, and the state quantity observation unit 31 may observe the state quantities of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring instrument 15. In this way, the embodiments described above can be modified and changed in various ways.
As described above in detail, the present embodiment can provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimum operation of a robot when taking out randomly placed workpieces, including those in a bulk state. The machine learning devices 20 and 30 of the present invention are not limited to reinforcement learning (for example, Q-learning) or supervised learning; various other machine learning algorithms may be applied.
According to the machine learning device, the robot system, and the machine learning method of the present invention, the optimum operation of the robot when taking out randomly placed workpieces, including those in a bulk state, can be learned without human intervention.
Although embodiments have been described above, all the examples and conditions described herein are intended to aid understanding of the inventive concept applied to the invention and the technology, and the examples and conditions specifically described are not intended to limit the scope of the invention; nor do such descriptions in the specification represent advantages or disadvantages of the invention. Although the embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention.

Claims (39)

1. A robot system is characterized by comprising:
a robot;
an observation unit that acquires data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing the information;
a determination unit configured to input the data to a neural network to determine information for gripping the object by the robot; and
and a control device for controlling the robot based on the information determined by the determination unit.
2. The robotic system of claim 1,
the data includes at least information related to any one of a position, a posture, and a distance of the object.
3. The robotic system of claim 1,
the data includes at least any one of distance image information of the object, three-dimensional position information of the object, and posture information.
4. The robotic system of claim 1,
the measuring instrument includes a three-dimensional vision sensor.
5. The robotic system of any of claims 1-4,
the neural network is trained by reinforcement learning using a return calculated based on a result of gripping the object.
6. The robotic system of claim 5,
the result of the gripping of the object includes at least one of: the number of times the object is successfully gripped, the time required for gripping and conveying the object, the force acting on the hand of the robot, the degree of completion in the post-process after gripping the object, the state of the object, and the energy required for gripping and conveying the object.
7. The robotic system of any of claims 1-4,
the neural network is trained so that an error calculated from a label related to the gripping result of the object and the output of the neural network is minimized.
8. The robotic system of any of claims 1-4,
the neural network outputs information related to an operation amount of a hand of the robot.
9. The robotic system of any of claims 1-4,
the neural network outputs information related to a probability of success of the gripping of the object or information related to a position of the object.
10. The robotic system of any of claims 1-4,
the information determined by the determination unit includes at least information for setting any one of a position, a posture, and a pickup direction of a hand of the robot.
11. The robotic system of any of claims 1-4,
the information determined by the determination unit includes at least information related to any one of a torque, a speed, and a rotational speed of a drive shaft provided to the robot.
12. The robotic system of any of claims 1-4,
the data acquired by the observation unit includes the information determined by the determination unit.
13. The robotic system of any of claims 1-4,
the determination unit outputs, using the neural network, information for operating the measuring instrument.
14. The robotic system of any of claims 1-4,
the neural network learns from data acquired by other robots.
15. The robotic system of any of claims 1-4,
the neural network learns according to the results of the simulation.
16. The robotic system of any of claims 1-4,
the neural network resides on a cloud server.
17. The robotic system of any of claims 1-4,
the measuring instrument is mounted on an arm of the robot.
18. The robotic system of any of claims 1-4,
holding the object includes attracting or adsorbing the object by a hand of the robot.
19. The robotic system as claimed in claim 18,
the information determined by the determination unit includes information for attracting or adsorbing the object by the hand of the robot.
20. The robotic system as claimed in claim 18,
the hand of the robot generates magnetic force or negative pressure.
21. The robotic system of any of claims 1-4,
the information obtained by processing the information is information obtained using machine learning.
22. A control method of a robot is characterized in that,
comprises the following steps:
acquiring data including at least one of information relating to an object measured by a measuring instrument and information obtained by processing the information;
determining information for holding the object by the robot by inputting the data into a neural network; and
controlling the robot according to the determined information.
23. A machine learning device for learning the motion of a robot gripping an object with a hand,
the machine learning device includes:
an observation unit that acquires data including at least one of information relating to the object measured by the measuring instrument and information obtained by processing the information;
a determination unit configured to input the data to a neural network to determine information for gripping the object with the hand; and
and a learning unit that trains the neural network based on a result of the gripping of the object by the robot controlled based on the information determined by the determination unit.
24. The machine learning apparatus of claim 23,
the data includes at least information related to any one of a position, a posture, and a distance of the object.
25. The machine learning apparatus of claim 23,
the data includes at least any one of distance image information of the object, three-dimensional position information of the object, and posture information.
26. The machine learning apparatus of any one of claims 23 to 25,
the learning unit calculates a return based on the result of gripping the object, and trains the neural network by reinforcement learning using the return.
27. The machine learning apparatus of claim 26,
the result of the gripping of the object includes at least one of: the number of times the object is successfully gripped, the time required for gripping and conveying the object, the force acting on the hand, the degree of completion in the post-process after gripping the object, the state of the object, and the energy required for gripping and conveying the object.
28. The machine learning apparatus of any one of claims 23 to 25,
the learning unit trains the neural network so as to minimize an error calculated from a label related to the gripping result of the object and the output of the neural network.
29. The machine learning apparatus of any one of claims 23 to 25,
the neural network outputs information related to an operation amount of the hand.
30. The machine learning apparatus of any one of claims 23 to 25,
the neural network outputs information related to a probability of success of the gripping of the object or information related to a position of the object.
31. The machine learning apparatus of any one of claims 23 to 25,
the information determined by the determination unit includes at least information for setting any one of the position, the posture, and the extraction direction of the hand.
32. The machine learning apparatus of any one of claims 23 to 25,
the information determined by the determination unit includes at least information related to any one of torque, speed, and rotational position of a drive shaft provided to the robot.
33. The machine learning apparatus of any one of claims 23 to 25,
the data acquired by the observation unit includes the information determined by the determination unit.
34. The machine learning apparatus of any one of claims 23 to 25,
the determination unit outputs, using the neural network, information for operating the measuring instrument.
35. The machine learning apparatus of any one of claims 23 to 25,
the information on the object is information measured by the measuring instrument attached to the arm portion of the robot.
36. The machine learning apparatus of any one of claims 23 to 25,
holding the object includes attracting or adsorbing the object by the hand.
37. The machine learning apparatus of claim 36,
the information determined by the determination unit includes information for the hand to attract or adsorb the object.
38. The machine learning apparatus of claim 36,
the hand generates a magnetic force or negative pressure.
39. A machine learning method for learning the motion of a robot gripping an object with a hand,
comprises the following steps:
acquiring data including at least one of information relating to the object measured by the measuring instrument and information obtained by processing the information;
determining information for holding the object by the hand by inputting the data into a neural network; and
the neural network is trained based on a result of the gripping of the object by the robot controlled based on the determined information.
CN202110544521.3A 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method Pending CN113199483A (en)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
JP2015152067 2015-07-31
JP2015-152067 2015-07-31
JP2015-233857 2015-11-30
JP2015233857A JP6522488B2 (en) 2015-07-31 2015-11-30 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
CN201610617361.XA CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201610617361.XA Division CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Publications (1)

Publication Number Publication Date
CN113199483A true CN113199483A (en) 2021-08-03

Family

ID=57985283

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Country Status (3)

Country Link
JP (5) JP6522488B2 (en)
CN (2) CN113199483A (en)
DE (1) DE102016015873B3 (en)

Families Citing this family (110)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
JP6771744B2 (en) * 2017-01-25 2020-10-21 株式会社安川電機 Handling system and controller
JP6453922B2 (en) * 2017-02-06 2019-01-16 ファナック株式会社 Work picking apparatus and work picking method for improving work picking operation
WO2018163242A1 (en) * 2017-03-06 2018-09-13 株式会社Fuji Data structure for creating image-processing data and method for creating image-processing data
JP6542824B2 (en) * 2017-03-13 2019-07-10 ファナック株式会社 Image processing apparatus and image processing method for calculating likelihood of image of object detected from input image
JP6438512B2 (en) 2017-03-13 2018-12-12 ファナック株式会社 ROBOT SYSTEM, MEASUREMENT DATA PROCESSING DEVICE, AND MEASUREMENT DATA PROCESSING METHOD FOR TAKE OUT WORK WITH MEASUREMENT DATA CORRECTED BY MACHINE LEARN
JP6902369B2 (en) * 2017-03-15 2021-07-14 株式会社オカムラ Presentation device, presentation method and program, and work system
JP6869060B2 (en) * 2017-03-15 2021-05-12 株式会社オカムラ Manipulator controls, control methods and programs, and work systems
JP6983524B2 (en) * 2017-03-24 2021-12-17 キヤノン株式会社 Information processing equipment, information processing methods and programs
JP6557272B2 (en) * 2017-03-29 2019-08-07 ファナック株式会社 State determination device
JP6680714B2 (en) * 2017-03-30 2020-04-15 ファナック株式会社 Control device and machine learning device for wire electric discharge machine
JP6490132B2 (en) * 2017-03-31 2019-03-27 ファナック株式会社 Robot control device, machine learning device, and machine learning method
JP6526100B2 (en) * 2017-04-28 2019-06-05 ファナック株式会社 Material pick-up system
JP6487495B2 (en) * 2017-05-19 2019-03-20 ファナック株式会社 Work removal system
JP7045139B2 (en) * 2017-06-05 2022-03-31 株式会社日立製作所 Machine learning equipment, machine learning methods, and machine learning programs
JP6542839B2 (en) * 2017-06-07 2019-07-10 ファナック株式会社 Control device and machine learning device
JP6886869B2 (en) * 2017-06-09 2021-06-16 川崎重工業株式会社 Motion prediction system and motion prediction method
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
CN116476044A (en) 2017-06-19 2023-07-25 谷歌有限责任公司 Robot gripping prediction using neural network and geometric aware object representation
CN107329445B (en) * 2017-06-28 2020-09-08 重庆柚瓣家科技有限公司 Intelligent supervision method for robot behavior criterion
CN107255969B (en) * 2017-06-28 2019-10-18 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107252785B (en) * 2017-06-29 2019-05-10 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
JP6564426B2 (en) * 2017-07-07 2019-08-21 ファナック株式会社 Parts supply device and machine learning device
JP7116901B2 (en) * 2017-08-01 2022-08-12 オムロン株式会社 ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD AND ROBOT CONTROL PROGRAM
DE102017213658A1 (en) * 2017-08-07 2019-02-07 Robert Bosch Gmbh Handling arrangement with a handling device for performing at least one work step and method and computer program
JP6680730B2 (en) 2017-08-08 2020-04-15 ファナック株式会社 Control device and learning device
JP6680732B2 (en) * 2017-08-23 2020-04-15 ファナック株式会社 Goods stacking device and machine learning device
JP6795472B2 (en) * 2017-08-28 2020-12-02 ファナック株式会社 Machine learning device, machine learning system and machine learning method
CA3073516A1 (en) * 2017-09-01 2019-03-07 The Regents Of The University Of California Robotic systems and methods for robustly grasping and targeting objects
JP6608890B2 (en) * 2017-09-12 2019-11-20 ファナック株式会社 Machine learning apparatus, robot system, and machine learning method
EP3456485B1 (en) * 2017-09-15 2021-02-17 Siemens Aktiengesellschaft Optimisation of an automated process for selecting and gripping an object by a robot
JP6895563B2 (en) * 2017-09-25 2021-06-30 ファナック株式会社 Robot system, model generation method, and model generation program
JP6695843B2 (en) 2017-09-25 2020-05-20 ファナック株式会社 Device and robot system
JP6579498B2 (en) * 2017-10-20 2019-09-25 株式会社安川電機 Automation device and position detection device
JP2019084601A (en) 2017-11-02 2019-06-06 キヤノン株式会社 Information processor, gripping system and information processing method
JP6815309B2 (en) * 2017-11-16 2021-01-20 株式会社東芝 Operating system and program
JP6676030B2 (en) 2017-11-20 2020-04-08 株式会社安川電機 Grasping system, learning device, gripping method, and model manufacturing method
JP6680750B2 (en) * 2017-11-22 2020-04-15 ファナック株式会社 Control device and machine learning device
US10828778B2 (en) * 2017-11-30 2020-11-10 Abb Schweiz Ag Method for operating a robot
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
JP7136554B2 (en) * 2017-12-18 2022-09-13 国立大学法人信州大学 Grasping device, learning device, program, grasping system, and learning method
KR102565444B1 (en) * 2017-12-21 2023-08-08 삼성전자주식회사 Method and apparatus for identifying object
JP6587195B2 (en) * 2018-01-16 2019-10-09 株式会社Preferred Networks Tactile information estimation device, tactile information estimation method, program, and non-transitory computer-readable medium
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
JP6892400B2 (en) * 2018-01-30 2021-06-23 ファナック株式会社 Machine learning device that learns the failure occurrence mechanism of laser devices
JP6703020B2 (en) * 2018-02-09 2020-06-03 ファナック株式会社 Control device and machine learning device
JP6874712B2 (en) * 2018-02-19 2021-05-19 オムロン株式会社 Simulation equipment, simulation method and simulation program
JP7005388B2 (en) * 2018-03-01 2022-01-21 株式会社東芝 Information processing equipment and sorting system
JP6873941B2 (en) 2018-03-02 2021-05-19 株式会社日立製作所 Robot work system and control method of robot work system
WO2019171123A1 (en) 2018-03-05 2019-09-12 Omron Corporation Method, apparatus, system and program for controlling a robot, and storage medium
JP6879238B2 (en) * 2018-03-13 2021-06-02 オムロン株式会社 Work picking device and work picking method
JP6911798B2 (en) * 2018-03-15 2021-07-28 オムロン株式会社 Robot motion control device
JP2019162712A (en) * 2018-03-20 2019-09-26 ファナック株式会社 Control device, machine learning device and system
JP6687657B2 (en) * 2018-03-20 2020-04-28 ファナック株式会社 Article taking-out apparatus using sensor and robot, and article taking-out method
KR102043898B1 (en) * 2018-03-27 2019-11-12 한국철도기술연구원 Auto picking system and method for automatically picking using the same
JP6810087B2 (en) * 2018-03-29 2021-01-06 ファナック株式会社 Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
US11260534B2 (en) 2018-04-04 2022-03-01 Canon Kabushiki Kaisha Information processing apparatus and information processing method
US11579000B2 (en) 2018-04-05 2023-02-14 Fanuc Corporation Measurement operation parameter adjustment apparatus, machine learning device, and system
JP6829271B2 (en) * 2018-04-05 2021-02-10 ファナック株式会社 Measurement operation parameter adjustment device, machine learning device and system
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
EP3785866B1 (en) 2018-04-26 2023-12-20 Panasonic Holdings Corporation Actuator device, method for removing target object using actuator device, and target object removal system
JP7154815B2 (en) 2018-04-27 2022-10-18 キヤノン株式会社 Information processing device, control method, robot system, computer program, and storage medium
CN112203812B (en) 2018-05-25 2023-05-16 川崎重工业株式会社 Robot system and additional learning method
JP7039389B2 (en) * 2018-05-25 2022-03-22 川崎重工業株式会社 Robot system and robot control method
KR102094360B1 (en) * 2018-06-11 2020-03-30 동국대학교 산학협력단 System and method for predicting force based on image
JP7008136B2 (en) * 2018-06-14 2022-01-25 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
JP7102241B2 (en) * 2018-06-14 2022-07-19 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
WO2019239563A1 (en) * 2018-06-14 2019-12-19 ヤマハ発動機株式会社 Robot system
JP6784722B2 (en) * 2018-06-28 2020-11-11 ファナック株式会社 Output device, control device, and evaluation function value output method
JP2020001127A (en) * 2018-06-28 2020-01-09 勇貴 高橋 Picking system, picking processing equipment, and program
WO2020009139A1 (en) * 2018-07-04 2020-01-09 株式会社Preferred Networks Learning method, learning device, learning system, and program
JP6740288B2 (en) * 2018-07-13 2020-08-12 ファナック株式会社 Object inspection apparatus, object inspection system, and method for adjusting inspection position
WO2020021643A1 (en) * 2018-07-24 2020-01-30 株式会社Fuji End effector selection method and selection system
JP7191569B2 (en) * 2018-07-26 2022-12-19 Ntn株式会社 gripping device
WO2020026447A1 (en) * 2018-08-03 2020-02-06 株式会社Fuji Parameter learning method and work system
JP7034035B2 (en) * 2018-08-23 2022-03-11 株式会社日立製作所 Motion generation method for autonomous learning robot device and autonomous learning robot device
CN109434844B (en) * 2018-09-17 2022-06-28 鲁班嫡系机器人(深圳)有限公司 Food material processing robot control method, device and system, storage medium and equipment
JP6895128B2 (en) * 2018-11-09 2021-06-30 オムロン株式会社 Robot control device, simulation method, and simulation program
JP7159525B2 (en) * 2018-11-29 2022-10-25 京セラドキュメントソリューションズ株式会社 ROBOT CONTROL DEVICE, LEARNING DEVICE, AND ROBOT CONTROL SYSTEM
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
EP3904017A4 (en) 2018-12-27 2023-01-18 Kawasaki Jukogyo Kabushiki Kaisha Robot control device, robot system, and robot control method
JP7128736B2 (en) 2018-12-27 2022-08-31 川崎重工業株式会社 ROBOT CONTROL DEVICE, ROBOT SYSTEM AND ROBOT CONTROL METHOD
CN109784400A (en) * 2019-01-12 2019-05-21 鲁班嫡系机器人(深圳)有限公司 Intelligent body Behavioral training method, apparatus, system, storage medium and equipment
JP7000359B2 (en) * 2019-01-16 2022-01-19 ファナック株式会社 Judgment device
JP6632095B1 (en) * 2019-01-16 2020-01-15 株式会社エクサウィザーズ Learned model generation device, robot control device, and program
JP7252787B2 (en) 2019-02-28 2023-04-05 川崎重工業株式会社 Machine learning model operation management system and machine learning model operation management method
JP7336856B2 (en) * 2019-03-01 2023-09-01 株式会社Preferred Networks Information processing device, method and program
WO2020194392A1 (en) * 2019-03-22 2020-10-01 connectome.design株式会社 Computer, method, and program for generating teaching data for autonomous robot
JP7302226B2 (en) * 2019-03-27 2023-07-04 株式会社ジェイテクト SUPPORT DEVICE AND SUPPORT METHOD FOR GRINDER
JP7349423B2 (en) * 2019-06-19 2023-09-22 株式会社Preferred Networks Learning device, learning method, learning model, detection device and grasping system
JP2021013996A (en) * 2019-07-12 2021-02-12 キヤノン株式会社 Control method of robot system, manufacturing method of articles, control program, recording medium, and robot system
JP7415356B2 (en) * 2019-07-29 2024-01-17 セイコーエプソン株式会社 Program transfer system and robot system
CN110456644B (en) * 2019-08-13 2022-12-06 北京地平线机器人技术研发有限公司 Method and device for determining execution action information of automation equipment and electronic equipment
DE112020004135T5 (en) 2019-08-28 2022-06-02 Daily Color Inc. robot control device
JP7021158B2 (en) 2019-09-04 2022-02-16 株式会社東芝 Robot system and drive method
JP7458741B2 (en) * 2019-10-21 2024-04-01 キヤノン株式会社 Robot control device and its control method and program
JP6924448B2 (en) * 2019-12-02 2021-08-25 Arithmer株式会社 Picking system, picking method, and program
US20230064484A1 (en) * 2020-01-16 2023-03-02 Omron Corporation Control apparatus, control method, and computer-readable storage medium storing a control program
JP7463777B2 (en) 2020-03-13 2024-04-09 オムロン株式会社 CONTROL DEVICE, LEARNING DEVICE, ROBOT SYSTEM, AND METHOD
JP7245959B2 (en) 2020-04-28 2023-03-24 ヤマハ発動機株式会社 Machine learning method and robot system
JP2023145809A (en) * 2020-07-10 2023-10-12 株式会社Preferred Networks Reinforcement learning device, reinforcement learning system, object operation device, model generation method and reinforcement learning program
CN112476424A (en) * 2020-11-13 2021-03-12 腾讯科技(深圳)有限公司 Robot control method, device, equipment and computer storage medium
CN116547706A (en) 2020-12-08 2023-08-04 索尼集团公司 Learning device, learning system, and learning method
DE102021104001B3 (en) 2021-02-19 2022-04-28 Gerhard Schubert Gesellschaft mit beschränkter Haftung Method for automatically grasping, in particular moving, objects
KR102346900B1 (en) * 2021-08-05 2022-01-04 주식회사 애자일소다 Deep reinforcement learning apparatus and method for pick and place system
DE102021209646B4 (en) 2021-09-02 2024-05-02 Robert Bosch Gesellschaft mit beschränkter Haftung Robot device, method for computer-implemented training of a robot control model and method for controlling a robot device
JPWO2023042306A1 (en) * 2021-09-15 2023-03-23
EP4311632A1 (en) * 2022-07-27 2024-01-31 Siemens Aktiengesellschaft Method for gripping an object, computer program and electronically readable data carrier
CN115816466B (en) * 2023-02-02 2023-06-16 中国科学技术大学 Method for improving control stability of vision observation robot
CN117697769B (en) * 2024-02-06 2024-04-30 成都威世通智能科技有限公司 Robot control system and method based on deep learning

Family Cites Families (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0588721A (en) * 1991-09-30 1993-04-09 Fujitsu Ltd Controller for articulated robot
JPH06106490A (en) * 1992-09-29 1994-04-19 Fujitsu Ltd Control device
JP3211186B2 (en) * 1997-12-15 2001-09-25 オムロン株式会社 Robot, robot system, robot learning method, robot system learning method, and recording medium
JP3859371B2 (en) * 1998-09-25 2006-12-20 松下電工株式会社 Picking equipment
JP2001019165A (en) 1999-07-02 2001-01-23 Murata Mach Ltd Work picking device
KR20020008848A (en) * 2000-03-31 2002-01-31 이데이 노부유끼 Robot device, robot device action control method, external force detecting device and external force detecting method
US6925357B2 (en) * 2002-07-25 2005-08-02 Intouch Health, Inc. Medical tele-robotic system
JP3834307B2 (en) * 2003-09-29 2006-10-18 ファナック株式会社 Robot system
JP4630553B2 (en) * 2004-01-15 2011-02-09 ソニー株式会社 Dynamic control device and biped walking mobile body using dynamic control device
JP2005238422A (en) 2004-02-27 2005-09-08 Sony Corp Robot device, its state transition model construction method and behavior control method
JP4746349B2 (en) * 2005-05-18 2011-08-10 日本電信電話株式会社 Robot action selection device and robot action selection method
JP2007280054A (en) * 2006-04-06 2007-10-25 Sony Corp Learning device, learning method, and program
JP4199264B2 (en) * 2006-05-29 2008-12-17 ファナック株式会社 Work picking apparatus and method
JP2010086405A (en) 2008-10-01 2010-04-15 Fuji Heavy Ind Ltd System for adapting control parameter
JP5330138B2 (en) * 2008-11-04 2013-10-30 本田技研工業株式会社 Reinforcement learning system
EP2249292A1 (en) * 2009-04-03 2010-11-10 Siemens Aktiengesellschaft Decision making mechanism, method, module, and robot configured to decide on at least one prospective action of the robot
JP5743499B2 (en) 2010-11-10 2015-07-01 キヤノン株式会社 Image generating apparatus, image generating method, and program
JP5767464B2 (en) * 2010-12-15 2015-08-19 キヤノン株式会社 Information processing apparatus, information processing apparatus control method, and program
JP5750657B2 (en) * 2011-03-30 2015-07-22 株式会社国際電気通信基礎技術研究所 Reinforcement learning device, control device, and reinforcement learning method
JP5787642B2 (en) 2011-06-28 2015-09-30 キヤノン株式会社 Object holding device, method for controlling object holding device, and program
JP5670397B2 (en) * 2012-08-29 2015-02-18 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP6106490B2 (en) 2013-03-28 2017-03-29 シャープ株式会社 Self-propelled electronic device and travel area designation system for self-propelled electronic device
JP6126437B2 (en) 2013-03-29 2017-05-10 キヤノン株式会社 Image processing apparatus and image processing method
JP5968259B2 (en) * 2013-04-11 2016-08-10 日本電信電話株式会社 Reinforcement learning method, apparatus and program based on linear model
JP5929854B2 (en) * 2013-07-31 2016-06-08 株式会社安川電機 Robot system and method of manufacturing workpiece
CN103753557B (en) * 2014-02-14 2015-06-17 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06203166A (en) * 1993-01-06 1994-07-22 Fujitsu Ltd Measurement, controller and learning method for multi-dimensional position
JPH11272845A (en) * 1998-03-23 1999-10-08 Denso Corp Image recognition device
CN101034418A (en) * 2006-03-10 2007-09-12 发那科株式会社 Device, program, recording medium and method for robot simulation
US20070282485A1 (en) * 2006-06-06 2007-12-06 Fanuc Ltd Robot simulation apparatus
US20090033655A1 (en) * 2007-08-02 2009-02-05 Boca Remus F System and method of three-dimensional pose estimation
JP2009262279A (en) * 2008-04-25 2009-11-12 Nec Corp Robot, robot program sharing system, robot program sharing method, and program
CN101726251A (en) * 2009-11-13 2010-06-09 江苏大学 Automatic fruit identification method of apple picking robot on basis of support vector machine
CN101782976A (en) * 2010-01-15 2010-07-21 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
US20130151007A1 (en) * 2010-06-24 2013-06-13 Zenrobotics Oy Method for the selection of physical objects in a robot system
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
CN103568014A (en) * 2012-07-26 2014-02-12 发那科株式会社 Apparatus and method of taking out bulk stored articles by manipulator
US20140114888A1 (en) * 2012-10-18 2014-04-24 Sony Corporation Information processing apparatus, information processing method, and program
CN104793620A (en) * 2015-04-17 2015-07-22 中国矿业大学 Obstacle avoidance robot based on visual feature binding and reinforcement learning theory

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LIN Fen; SHI Chuan; LUO Jiewen; SHI Zhongzhi: "Two-Layer Reinforcement Learning Algorithm Based on Bias Information Learning", Journal of Computer Research and Development, no. 09, 15 September 2008 (2008-09-15), pages 1455 - 1462 *

Also Published As

Publication number Publication date
JP2024069414A (en) 2024-05-21
JP7491971B2 (en) 2024-05-28
JP2020168719A (en) 2020-10-15
JP2017064910A (en) 2017-04-06
DE102016015873B3 (en) 2020-10-29
CN106393102A (en) 2017-02-15
JP2022145915A (en) 2022-10-04
JP7100426B2 (en) 2022-07-13
CN106393102B (en) 2021-06-01
JP6522488B2 (en) 2019-05-29
JP2017030135A (en) 2017-02-09

Similar Documents

Publication Publication Date Title
CN106393102B (en) Machine learning device, robot system, and machine learning method
US11780095B2 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
JP6810087B2 (en) Machine learning device, robot control device and robot vision system using machine learning device, and machine learning method
CN109483573B (en) Machine learning device, robot system, and machine learning method
CN109421071B (en) Article stacking device and machine learning device
US10486306B2 (en) Control device for controlling robot by learning action of person, robot system, and production system
CN108393908B (en) Workpiece taking-out device and workpiece taking-out method for improving workpiece taking-out action
CN107866809B (en) Machine learning device and machine learning method for learning optimal article holding path
CN106826812B (en) Machine learning device, machine learning method, laminated core manufacturing device, and laminated core manufacturing system
US20180222048A1 (en) Control device, robot, and robot system
CN109955115B (en) Chip removing device and information processing device
JP7191569B2 (en) gripping device
CN109814615B (en) Control device and machine learning device
JP2018202550A (en) Machine learning device, machine learning method, and machine learning program
CN113826051A (en) Generating digital twins of interactions between solid system parts
CN111745640B (en) Object detection method, object detection device, and robot system
CN111319039B (en) Robot
US20210072734A1 (en) Information processing apparatus and method, robot controlling apparatus and method, and non-transitory computer-readable storage medium
US10807234B2 (en) Component supply device and machine learning device
CN108687766B (en) Robot control device, machine learning device, and machine learning method
CN117377558A (en) Automatic pick and place system
CN117916771A (en) Image processing apparatus, component holding system, image processing method, and component holding method
Cabrera et al. Real time object recognition methodology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination