CN106393102A - Machine learning device, robot system, and machine learning method - Google Patents

Machine learning device, robot system, and machine learning method

Info

Publication number
CN106393102A
CN106393102A (Application CN201610617361.XA)
Authority
CN
China
Prior art keywords
robot
workpiece
action
state
taking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610617361.XA
Other languages
Chinese (zh)
Other versions
CN106393102B (en)
Inventor
山崎岳
尾山拓未
陶山峻
中山一隆
组谷英俊
中川浩
冈野原大辅
奥田辽介
松元睿一
河合圭悟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Preferred Network Co
Fanuc Corp
Original Assignee
Preferred Network Co
Fanuc Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Preferred Network Co, Fanuc Corp
Priority to CN202110544521.3A (published as CN113199483A)
Publication of CN106393102A
Application granted
Publication of CN106393102B
Legal status: Active
Anticipated expiration


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • B25J 9/1679: Programme controls characterised by the tasks executed
    • B25J 9/1687: Assembly, peg and hole, palletising, straight line, weaving pattern movement
    • B25J 9/1628: Programme controls characterised by the control loop
    • B25J 9/163: Programme controls characterised by the control loop; learning, adaptive, model based, rule based expert control
    • B25J 9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J 9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00: Program-control systems
    • G05B 2219/30: Nc systems
    • G05B 2219/39: Robotics, robotics to robotics hand
    • G05B 2219/39297: First learn inverse model, then fine tune with ffw error learning
    • G05B 2219/40: Robotics, robotics mapping to robotics vision
    • G05B 2219/40053: Pick 3-D object from pile of objects

Abstract

The invention provides a machine learning device, a robot system, and a machine learning method. The machine learning device that learns an operation of a robot (14) for picking up, by a hand unit, any of a plurality of workpieces (12) placed in a random fashion, including a bulk-loaded state, includes a state variable observation unit (21) that observes a state variable representing a state of the robot, including data output from a three-dimensional measuring device (15) that obtains a three-dimensional map for each workpiece, an operation result obtaining unit (26) that obtains a result of a picking operation of the robot for picking up the workpiece by the hand unit, and a learning unit (22) that learns a manipulated variable including command data for commanding the robot to perform the picking operation of the workpiece, in association with the state variable of the robot and the result of the picking operation, upon receiving output from the state variable observation unit and output from the operation result obtaining unit.

Description

Machine learning device, robot system, and machine learning method
Technical field
The present invention relates to a machine learning device, a robot system, and a machine learning method for learning an operation of picking up workpieces placed in a random fashion, including a bulk-loaded state.
Background art
Robot systems that hold workpieces loaded in bulk in a container with a hand unit of a robot and convey them are conventionally known, as disclosed, for example, in Japanese Patent Nos. 5642738 and 5670397. In such a robot system, position information of a plurality of workpieces is obtained, for example, by a three-dimensional measuring device installed above the container, and the workpieces are picked up one by one by the hand unit of the robot based on the position information.
In such a conventional robot system, however, it is necessary to set in advance, for example, how to extract the workpiece to be picked up, and at which position to pick it up, from a range image of the plurality of workpieces measured by the three-dimensional measuring device. It is also necessary to program in advance how the hand unit of the robot is to operate when picking up the workpiece. Specifically, for example, a human operator needs to teach the robot the pick-up operation of the workpiece using a teaching pendant.
Therefore, when the setting for extracting the workpiece to be picked up from the range image of the plurality of workpieces is inappropriate, or when the operation program of the robot is not suitable, the success rate with which the robot picks up and conveys the workpiece decreases. To improve the success rate, a human operator has to repeat trial and error to find the optimum operation of the robot while updating the workpiece detection settings and the operation program of the robot.
Summary of the invention
In view of the above circumstances, an object of the present invention is to provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimum operation of a robot for picking up workpieces placed in a random fashion, including a bulk-loaded state.
According to a first embodiment of the present invention, there is provided a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation. The machine learning device preferably further includes a decision-making unit that determines the command data to be given to the robot by referring to the manipulated variable learned by the learning unit.
According to a second embodiment of the present invention, there is provided a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device, in association with the state variable of the robot and the result of the pick-up operation. The machine learning device preferably further includes a decision-making unit that determines the measurement parameter of the three-dimensional measuring device by referring to the manipulated variable learned by the learning unit.
The state variable observation unit may also observe a state variable of the robot that includes data output from a coordinate calculation unit, which calculates a three-dimensional position of each workpiece based on the output of the three-dimensional measuring device. The coordinate calculation unit may further calculate an orientation of each workpiece and output data of the calculated three-dimensional position and orientation of each workpiece. The operation result obtaining unit may use the output data of the three-dimensional measuring device. The machine learning device preferably further includes a preprocessing unit that processes the output data of the three-dimensional measuring device before it is input to the state variable observation unit, and the state variable observation unit receives the output data of the preprocessing unit as the state variable of the robot. The preprocessing unit may make the direction and height of each workpiece in the output data of the three-dimensional measuring device uniform. The operation result obtaining unit may obtain at least one of: whether the pick-up of the workpiece has succeeded, a damage state of the workpiece, and an achievement level attained when the picked-up workpiece is transferred to a post-process.
The learning unit may include: a reward calculation unit that calculates a reward based on the output of the operation result obtaining unit; and a value function update unit that has a value function describing the value of the pick-up operation of the workpiece and updates the value function in accordance with the reward. Alternatively, the learning unit may have a learning model for learning the pick-up operation of the workpiece, and may include: an error calculation unit that calculates an error based on the output of the operation result obtaining unit and the output of the learning model; and a learning model update unit that updates the learning model in accordance with the error. The machine learning device preferably has a neural network.
According to a third embodiment of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation. The robot system includes the robot, the three-dimensional measuring device, and a control device that controls the robot and the three-dimensional measuring device.
According to a fourth embodiment of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device, in association with the state variable of the robot and the result of the pick-up operation. The robot system includes the robot, the three-dimensional measuring device, and a control device that controls the robot and the three-dimensional measuring device.
Preferably, the robot system includes a plurality of the robots, the machine learning device is provided for each robot, and the plurality of machine learning devices provided for the plurality of robots share or exchange data with one another via a communication medium. The machine learning device may reside on a cloud server.
According to a fifth embodiment of the present invention, there is provided a machine learning method for learning an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning method includes: observing a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; obtaining a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and, upon receiving the observed state variable and the obtained result of the pick-up operation, learning a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation.
Brief description of the drawings
The present invention will be understood more clearly by referring to the following accompanying drawings.
Fig. 1 is a block diagram illustrating the conceptual configuration of a robot system according to an embodiment of the present invention.
Fig. 2 is a diagram schematically illustrating a model of a neuron.
Fig. 3 is a diagram schematically illustrating a three-layer neural network formed by combining the neurons illustrated in Fig. 2.
Fig. 4 is a flowchart illustrating an example of the operation of the machine learning device illustrated in Fig. 1.
Fig. 5 is a block diagram illustrating the conceptual configuration of a robot system according to another embodiment of the present invention.
Fig. 6 is a diagram for explaining an example of processing by the preprocessing unit in the robot system illustrated in Fig. 5.
Fig. 7 is a block diagram illustrating a modification of the robot system illustrated in Fig. 1.
Detailed description of embodiments
Embodiments of a machine learning device, a robot system, and a machine learning method according to the present invention will be described in detail below with reference to the accompanying drawings. It should be understood, however, that the present invention is not limited to the drawings or the embodiments described below. In the drawings, the same members are denoted by the same reference numerals, and members denoted by the same reference numerals in different drawings have the same functions. For ease of understanding, the scales of the drawings have been changed as appropriate.
Fig. 1 is a block diagram illustrating the conceptual configuration of a robot system according to an embodiment of the present invention. The robot system 10 of the present embodiment includes: a robot 14 equipped with a hand unit 13 for holding workpieces 12 loaded in bulk in a container 11; a three-dimensional measuring device 15 that measures a three-dimensional map of the surfaces of the workpieces 12; a control device 16 that controls the robot 14 and the three-dimensional measuring device 15; a coordinate calculation unit 19; and a machine learning device 20.
The machine learning device 20 includes a state variable observation unit 21, an operation result obtaining unit 26, a learning unit 22, and a decision-making unit 25. As will be described in detail later, the machine learning device 20 learns and outputs manipulated variables such as command data for commanding the robot 14 to perform the pick-up operation of a workpiece 12, or measurement parameters of the three-dimensional measuring device 15.
The robot 14 is, for example, a six-axis articulated robot, and the drive shafts of the robot 14 and of the hand unit 13 are controlled by the control device 16. The robot 14 is used to pick up the workpieces 12 one by one from the container 11 placed at a predetermined position and move them, in sequence, to a designated place such as a conveyor or a work table (not illustrated).
When a workpiece 12 is picked up from the bulk in the container 11, however, the hand unit 13 or the workpiece 12 may collide with or contact a wall of the container 11, or the hand unit 13 or the workpiece 12 may be caught by another workpiece 12. In such cases, a function for detecting the force acting on the hand unit 13 is needed so that an excessive load on the robot 14 can be avoided immediately. For this purpose, a six-axis force sensor 17 is provided between the tip of the arm of the robot 14 and the hand unit 13. The robot system 10 of the present embodiment also has a function of estimating the force acting on the hand unit 13 from the current values of the motors (not illustrated) that drive the joint shafts of the robot 14.
Since the force sensor 17 can detect the force acting on the hand unit 13, it can also be used to judge whether the hand unit 13 is actually holding a workpiece 12. That is, when the hand unit 13 holds a workpiece 12, the weight of the workpiece 12 acts on the hand unit 13; therefore, if the detection value of the force sensor 17 exceeds a predetermined threshold after the pick-up operation of the workpiece 12 has been performed, it can be judged that the hand unit 13 is holding the workpiece 12. Whether the hand unit 13 is holding a workpiece 12 may also be judged, for example, from image data captured by a camera used in the three-dimensional measuring device 15, or from the output of a photoelectric sensor (not illustrated) attached to the hand unit 13. It may also be judged from the pressure gauge data of a suction-type hand described later.
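As an illustration only, the holding checks described above could be combined as in the following sketch; the function name, argument names, and threshold values are assumptions for illustration and are not specified in the patent.

```python
from typing import Optional

def is_holding_workpiece(force_n: float,
                         photo_sensor_blocked: Optional[bool] = None,
                         suction_pressure_kpa: Optional[float] = None,
                         force_threshold_n: float = 2.0,
                         pressure_threshold_kpa: float = -20.0) -> bool:
    """Judge holding of a workpiece 12 by the hand unit 13 after the pick-up operation:
    the force sensor 17 reading, optionally a photoelectric sensor on the hand,
    and optionally the pressure gauge of a suction-type hand (illustrative sketch)."""
    if force_n > force_threshold_n:          # workpiece weight acting on the hand
        return True
    if photo_sensor_blocked:                 # beam interrupted by the held workpiece
        return True
    if suction_pressure_kpa is not None and suction_pressure_kpa < pressure_threshold_kpa:
        return True                          # vacuum established in a suction-type hand
    return False
```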
The hand unit 13 may take various forms as long as it can hold a workpiece 12. For example, the hand unit 13 may hold a workpiece 12 by opening and closing two or more claws, or may include an electromagnet or a suction generator that attracts the workpiece 12. Although Fig. 1 depicts a hand unit 13 that holds a workpiece with two claws, the hand unit is not limited to this form.
The three-dimensional measuring device 15 is installed, by a support unit 18, at a predetermined position above the plurality of workpieces 12 in order to measure them. As the three-dimensional measuring device 15, for example, a three-dimensional vision sensor can be used that obtains three-dimensional position information by performing image processing on image data of the workpieces 12 captured by two cameras (not illustrated). Specifically, the three-dimensional map (the surface positions of the plurality of workpieces 12 loaded in bulk) is measured by applying triangulation, the light-section method, the time-of-flight method, the depth-from-defocus method, or a combination of these methods.
The coordinate calculation unit 19 receives the three-dimensional map obtained by the three-dimensional measuring device 15 as input and calculates (measures) the surface positions of the plurality of workpieces 12 loaded in bulk. That is, using the output of the three-dimensional measuring device 15, three-dimensional position data (x, y, z) of each workpiece 12, or three-dimensional position data (x, y, z) and orientation data (w, p, r), can be obtained. In the present embodiment, the state variable observation unit 21 receives both the three-dimensional map from the three-dimensional measuring device 15 and the position data (orientation data) from the coordinate calculation unit 19 to observe the state variable of the robot 14; however, it may also observe the state variable of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring device 15, for example. Alternatively, as in the case described later with reference to Fig. 5, a preprocessing unit 50 may be added so that the three-dimensional map from the three-dimensional measuring device 15 is processed (preprocessed) by the preprocessing unit 50 before being input to the state variable observation unit 21.
It is assumed that the relative positions of the robot 14 and the three-dimensional measuring device 15 have been determined in advance by calibration. Furthermore, a laser range finder may be used as the three-dimensional measuring device 15 of the present invention instead of a three-dimensional vision sensor. That is, the distance from the installation position of the three-dimensional measuring device 15 to the surface of each workpiece 12 may be measured by laser scanning, or the three-dimensional position and orientation data (x, y, z, w, p, r) of the plurality of workpieces 12 loaded in bulk may be obtained using various sensors such as a monocular camera or a tactile sensor.
In other words, any three-dimensional measuring device 15 can be used in the present invention, regardless of which three-dimensional measurement method it applies, as long as the data (x, y, z, w, p, r) of each workpiece 12 can be obtained. The manner in which the three-dimensional measuring device 15 is installed is also not particularly limited; for example, it may be fixed to a floor, a wall, or the like, or attached to the arm of the robot 14.
In response to a command from the control device 16, the three-dimensional measuring device 15 obtains the three-dimensional map of the plurality of workpieces 12 loaded in bulk in the container 11, the coordinate calculation unit 19 obtains (calculates) the three-dimensional position (orientation) data of the plurality of workpieces 12 from the three-dimensional map, and this data is output to the control device 16 and to the state variable observation unit 21 and the operation result obtaining unit 26 of the machine learning device 20 described later. In particular, the coordinate calculation unit 19 estimates, for example, the boundary between one workpiece 12 and another workpiece 12 and the boundary between a workpiece 12 and the container 11 from the captured image data of the plurality of workpieces 12, and obtains the three-dimensional position data of each workpiece 12.
The three-dimensional position data of each workpiece 12 is, for example, data obtained by estimating, from the positions of a plurality of points on the surfaces of the plurality of workpieces 12 loaded in bulk, the position at which each workpiece 12 exists and at which it can be held. The three-dimensional position data of each workpiece 12 may, of course, also include orientation data of the workpiece 12.
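Purely as an illustration of the per-workpiece data described above, it could be represented by a small data structure such as the following; the class name, field names, and example values are hypothetical and not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class WorkpiecePose:
    """Estimated position (and optionally orientation) of one workpiece 12
    derived from the three-dimensional map."""
    x: float  # three-dimensional position
    y: float
    z: float
    w: float = 0.0  # orientation angles (optional)
    p: float = 0.0
    r: float = 0.0

# Example: candidate pick-up poses estimated by the coordinate calculation unit 19
candidates = [WorkpiecePose(x=102.5, y=-34.0, z=87.2, w=0.0, p=90.0, r=15.0)]
```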
The coordinate calculation unit 19 may also obtain the three-dimensional position and orientation data of each workpiece 12 using machine learning. For example, object recognition, angle estimation, or the like using input images with a method such as the supervised learning described later, or using a laser range finder or the like, can be applied.
When the three-dimensional position data of each workpiece 12 is input from the three-dimensional measuring device 15 to the control device 16 via the coordinate calculation unit 19, the control device 16 controls the operation of the hand unit 13 for picking up a certain workpiece 12 from the container 11. At this time, the motors (not illustrated) of the axes of the hand unit 13 and the robot 14 are driven according to command values (manipulated variables) corresponding to the optimum position, orientation, and pick-up direction of the hand unit 13 obtained by the machine learning device 20 described later.
The machine learning device 20 can also learn variables of the imaging conditions of the cameras used in the three-dimensional measuring device 15 (measurement parameters of the three-dimensional measuring device 15, for example, the exposure time adjusted with an exposure meter at the time of imaging and the illuminance of an illumination device that illuminates the object to be imaged), and can control the three-dimensional measuring device 15 via the control device 16 according to the manipulated variable of the learned measurement parameters. Variables of the conditions used for estimating, from the positions of the plurality of workpieces 12 measured by the three-dimensional measuring device 15, the position and orientation at which each workpiece 12 exists and the position and orientation at which it can be held, may also be included in the output data of the three-dimensional measuring device 15 described above.
As described above, the output data of the three-dimensional measuring device 15 may be processed in advance by the preprocessing unit 50 or the like described in detail later with reference to Fig. 5, and the processed data (image data) may be given to the state variable observation unit 21. The operation result obtaining unit 26 can obtain the result of the pick-up of a workpiece 12 by the hand unit 13 of the robot 14, for example, from the output data of the three-dimensional measuring device 15 (the output data of the coordinate calculation unit 19); in addition, it may of course also obtain, via other units (for example, a camera or a sensor installed in a post-process), operation results such as the achievement level attained when the picked-up workpiece 12 is transferred to a post-process and state changes such as whether the picked-up workpiece 12 is damaged. The state variable observation unit 21 and the operation result obtaining unit 26 are functional modules, and both functions may also be implemented by a single module.
The machine learning device 20 illustrated in Fig. 1 will now be described in detail. The machine learning device 20 has the function of extracting, by analysis, useful rules, knowledge representations, criteria for judgment, and the like from the set of data input to the device, outputting the judgment results, and learning the knowledge (machine learning). There are various machine learning methods, which are roughly classified, for example, into "supervised learning", "unsupervised learning", and "reinforcement learning". There is also a method called "deep learning", in which the extraction of feature quantities themselves is learned in order to implement these methods. A general-purpose computer or processor may be used for such machine learning (the machine learning device 20), but when GPGPU (General-Purpose computing on Graphics Processing Units), a large-scale PC cluster, or the like is applied, processing can be performed at higher speed.
First, supervised learning is a method in which a large number of pairs of inputs and results (labels) are given to the machine learning device 20, features present in these data sets are learned, and a model for estimating the result from the input, i.e., the relationship between them, is obtained inductively. When supervised learning is applied to the present embodiment, it can be used, for example, in the part that estimates a workpiece position from sensor input, or in the part that estimates the probability of success for a workpiece candidate. It can be realized, for example, using an algorithm such as the neural network described later.
Unsupervised learning is a method in which only a large amount of input data is given to the learning device, which learns how the input data is distributed and learns, for example, to compress, classify, and shape the input data without being given corresponding supervised output data. For example, features present in these data sets can be clustered into groups of similar items. Using this result, a certain criterion is set and output allocation is performed so as to optimize it, whereby prediction of the output can be realized.
There is also a problem setting called semi-supervised learning, which is intermediate between supervised learning and unsupervised learning and corresponds, for example, to the case where only part of the data consists of input-output pairs while the rest is input-only data. In the present embodiment, data that can be obtained without actually operating the robot (image data, simulation data, and the like) can be used in unsupervised learning, so that learning can be performed efficiently.
Next, reinforcement learning will be described. First, consider the following as a problem setting of reinforcement learning.
The robot observes the state of the environment and decides its action.
The environment changes according to some rule, and the robot's own action may also change the environment.
A reward signal is returned each time an action is taken.
What is to be maximized is the total (discounted) reward to be obtained in the future.
Learning starts from a state in which the result caused by an action is completely unknown or only incompletely known. That is, the robot can obtain the result as data only after it actually acts. In other words, the optimum action has to be searched for by trial and error.
Learning can also be started from a good starting point by setting, as the initial state, a state learned in advance (by a method such as the above-described supervised learning or inverse reinforcement learning) so as to imitate human actions.
Here, reinforcement learning is a method of learning not only judgment and classification but also actions, thereby learning an appropriate action based on the interaction of the action with the environment, i.e., learning so as to maximize the reward to be obtained in the future. This indicates that, in the present embodiment, actions that affect the future can be acquired, such as an action of collapsing a pile of workpieces 12 so that a workpiece 12 can be picked up easily later. The description below takes Q-learning as an example, but the method is not limited to Q-learning.
Q-learning is a method of learning a value Q(s, a) of selecting an action a under a certain environment state s. That is, in a certain state s, the action a with the highest value Q(s, a) is selected as the optimum action. At first, however, the correct value of Q(s, a) is completely unknown for a combination of state s and action a. The agent (the subject of the action) therefore selects various actions a under a certain state s and is given a reward for the action a at that time. In this way, the agent keeps learning to select a better action, i.e., learns the correct value Q(s, a).
Furthermore, since the total of the rewards obtained in the future as a result of actions is to be maximized, the final goal is to achieve Q(s, a) = E[Σ(γ^t)r_t]. Here, E[ ] denotes the expected value, t denotes time, γ denotes a parameter called the discount rate described later, r_t denotes the reward at time t, and Σ denotes the sum over time t. The expected value in this expression is the value taken when the state changes according to the optimum action; since this is unknown, learning is performed while searching. An update formula of such a value Q(s, a) can be represented, for example, by the following formula (1):
Q(s_t, a_t) <- Q(s_t, a_t) + α( r_(t+1) + γ max_a Q(s_(t+1), a) - Q(s_t, a_t) )   (1)
In formula (1) above, s_t denotes the state of the environment at time t, and a_t denotes the action at time t. The state changes to s_(t+1) as a result of action a_t. r_(t+1) denotes the reward obtained by this state change. The term with max is the Q value, multiplied by γ, obtained when the action a with the highest Q value known at that time is selected in state s_(t+1). γ is a parameter satisfying 0 < γ ≤ 1 and is called the discount rate. α is the learning coefficient and is set in the range 0 < α ≤ 1.
Formula (1) above represents a method of updating the evaluation value Q(s_t, a_t) of the action a_t in state s_t based on the reward r_(t+1) returned as a result of the trial a_t. It indicates that if the sum of the reward r_(t+1) and the evaluation value Q(s_(t+1), max a_(t+1)) of the best action in the state following action a is larger than the evaluation value Q(s_t, a_t) of the action a in state s, Q(s_t, a_t) is increased; conversely, if it is smaller, Q(s_t, a_t) is decreased. In other words, the value of a certain action in a certain state is brought closer to the sum of the reward immediately returned as a result of that action and the value of the best action in the state that follows it.
There are two methods of representing Q(s, a) on a computer: a method of holding the values of all state-action pairs (s, a) in the form of a table, and a method of preparing a function that approximates Q(s, a). In the latter method, formula (1) above can be realized by adjusting the parameters of the approximation function using a method such as stochastic gradient descent. The neural network described later can be used as the approximation function.
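The following is a minimal, illustrative sketch of the tabular form of the update in formula (1); the variable names, the epsilon-greedy selection, and the parameter values are assumptions for illustration and are not prescribed by the patent.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning coefficient, discount rate, exploration rate
q_table = defaultdict(float)             # Q(s, a), initialized to 0 for every state-action pair

def select_action(state, actions):
    """Epsilon-greedy selection: mostly pick the action with the highest Q(s, a)."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update_q(state, action, reward, next_state, actions):
    """Formula (1): Q(s_t,a_t) += alpha * (r_(t+1) + gamma * max_a Q(s_(t+1),a) - Q(s_t,a_t))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error
```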
A neural network can be used as a learning model for supervised learning or unsupervised learning, or as an approximation algorithm for the value function in reinforcement learning. Fig. 2 is a diagram schematically illustrating a model of a neuron, and Fig. 3 is a diagram schematically illustrating a three-layer neural network formed by combining the neurons illustrated in Fig. 2. That is, a neural network is constituted, for example, of an arithmetic unit, a memory, and the like that imitate a model of neurons as illustrated in Fig. 2.
As shown in Fig. 2 neuron is directed to multiple input x (in Fig. 2, to input x1~input x3 as) output output (result) y.Each input x (x1, x2, x3) is made to be multiplied by weight w (w1, w2, w3) corresponding with this input x.Thus, neuron output Result y being showed by following formula (2).Additionally, input x, result y and weight w are entirely vector.Additionally, in following formula (2), θ It is biasing (bias), fkIt is activation primitive.
A three-layer neural network formed by combining the neurons illustrated in Fig. 2 will now be described with reference to Fig. 3. As illustrated in Fig. 3, a plurality of inputs x (here, inputs x1 to x3 as an example) are input from the left side of the neural network, and results y (here, results y1 to y3 as an example) are output from the right side. Specifically, the inputs x1, x2, and x3 are multiplied by the corresponding weights and input to each of three neurons N11 to N13. The weights by which these inputs are multiplied are collectively denoted by W1.
The neurons N11 to N13 output z11 to z13, respectively. In Fig. 3, z11 to z13 are collectively denoted by a feature vector Z1, which can be regarded as a vector obtained by extracting the feature quantities of the input vector. The feature vector Z1 is a feature vector between the weights W1 and the weights W2. z11 to z13 are multiplied by the corresponding weights and input to each of two neurons N21 and N22. The weights by which these feature vectors are multiplied are collectively denoted by W2.
The neurons N21 and N22 output z21 and z22, respectively. In Fig. 3, z21 and z22 are collectively denoted by a feature vector Z2. The feature vector Z2 is a feature vector between the weights W2 and the weights W3. z21 and z22 are multiplied by the corresponding weights and input to each of three neurons N31 to N33. The weights by which these feature vectors are multiplied are collectively denoted by W3.
Finally, the neurons N31 to N33 output the results y1 to y3, respectively. The operation of the neural network has a learning mode and a value prediction mode. For example, in the learning mode, the weights W are learned using a learning data set, and in the prediction mode, the action of the robot is judged using the learned parameters. Although the term "prediction" is used here for convenience, various tasks such as detection, classification, and inference are of course possible.
Here, data obtained by actually operating the robot in the prediction mode can be learned immediately and reflected in the next action (online learning), or collective learning can be performed using a data group collected in advance and the detection mode can subsequently be performed with the resulting parameters (batch learning). An intermediate approach is also possible, in which the learning mode is interposed each time a certain amount of data has accumulated.
The weights W1 to W3 can be learned by the error backpropagation method (backpropagation). The error information enters from the right side and flows to the left. The error backpropagation method is a method of adjusting (learning) the respective weights for each neuron so as to reduce the difference between the output y obtained when the input x is input and the true output y (teacher).
Such a neural network can have more than three layers (so-called deep learning). An arithmetic unit that performs feature extraction on the input in stages and regresses the result can be obtained automatically from teacher data alone.
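A minimal sketch of the forward pass of the three-layer network of Fig. 3 (formula (2) applied layer by layer) is shown below; the choice of tanh as the activation function f_k and the random initialization are assumptions made purely for illustration.

```python
import numpy as np

def forward(x, W1, W2, W3, theta1=0.0, theta2=0.0, theta3=0.0):
    """Forward pass of the three-layer network of Fig. 3."""
    f = np.tanh                      # activation function f_k (assumed for illustration)
    z1 = f(x @ W1 - theta1)          # feature vector Z1 (neurons N11 to N13)
    z2 = f(z1 @ W2 - theta2)         # feature vector Z2 (neurons N21 and N22)
    y = z2 @ W3 - theta3             # results y1 to y3 (neurons N31 to N33)
    return y

# Example with randomly initialized weights: 3 inputs -> 3 -> 2 -> 3 outputs
rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 3)), rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
y = forward(np.array([0.2, -0.5, 1.0]), W1, W2, W3)
```

In the learning mode, the weights W1 to W3 would then be adjusted by backpropagation as described above.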
To perform the Q-learning described above, the machine learning device 20 of the present embodiment includes, as illustrated in Fig. 1, the state variable observation unit 21, the operation result obtaining unit 26, the learning unit 22, and the decision-making unit 25. However, as described above, the machine learning method applied in the present invention is not limited to Q-learning. That is, various methods that can be used in a machine learning device, such as "supervised learning", "unsupervised learning", "semi-supervised learning", and "reinforcement learning", can be applied. A general-purpose computer or processor may be used for such machine learning (the machine learning device 20), but when GPGPU, a large-scale PC cluster, or the like is applied, processing can be performed at higher speed.
That is, according to the present embodiment, there is provided a machine learning device that learns an operation of a robot 14 for picking up, by a hand unit 13, any of a plurality of workpieces 12 placed in a random fashion, including a bulk-loaded state, the machine learning device including: a state variable observation unit 21 that observes a state variable of the robot 14 including data output from a three-dimensional measuring device 15 that measures a three-dimensional position (x, y, z), or a three-dimensional position and orientation (x, y, z, w, p, r), of each workpiece 12; an operation result obtaining unit 26 that obtains a result of a pick-up operation of the robot 14 for picking up the workpiece 12 by the hand unit 13; and a learning unit 22 that, upon receiving output from the state variable observation unit 21 and output from the operation result obtaining unit 26, learns a manipulated variable including command data for commanding the robot 14 to perform the pick-up operation of the workpiece 12, in association with the state variable of the robot 14 and the result of the pick-up operation.
The state variable observed by the state variable observation unit 21 may include, for example, state variables respectively set for the position, orientation, and pick-up direction of the hand unit 13 when a certain workpiece 12 is picked up from the container 11. The manipulated variable to be learned may include, for example, command values such as the torque, speed, and rotational position given from the control device 16 to each drive shaft of the robot 14 and the hand unit 13 when a workpiece 12 is picked up from the container 11.
The learning unit 22 learns the above state variables in association with the result of the pick-up operation of a workpiece 12 (the output of the operation result obtaining unit 26) when one of the plurality of workpieces 12 loaded in bulk is picked up. That is, the control device 16 sets the output data of the three-dimensional measuring device 15 (the coordinate calculation unit 19) and the command data of the hand unit 13 either substantially at random or intentionally according to a predetermined rule, and the pick-up operation of a workpiece 12 is performed by the hand unit 13. An example of the predetermined rule is to pick up, in sequence, the workpieces that are higher in the height (z) direction among the plurality of workpieces 12 loaded in bulk. The output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 thus correspond to the action of picking up a certain workpiece. Since successes and failures of the pick-up of workpieces 12 occur, the learning unit 22 evaluates the state variables, constituted of the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13, each time such a success or failure occurs.
The learning unit 22 stores the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 at the time a workpiece 12 is picked up, in association with the evaluation of the result of the pick-up operation of the workpiece 12. Examples of failures include the case where the hand unit 13 cannot hold a workpiece 12, and the case where the hand unit 13 holds a workpiece 12 but the workpiece 12 collides with or contacts a wall of the container 11. Whether the pick-up of a workpiece 12 has succeeded is judged based on the detection value of the force sensor 17 or the data captured by the three-dimensional measuring device. The machine learning device 20 may also perform learning using only part of the command data of the hand unit 13 output from the control device 16.
The learning unit 22 of the present embodiment preferably includes a reward calculation unit 23 and a value function update unit 24. For example, the reward calculation unit 23 calculates a reward, such as a score, based on whether the pick-up of the workpiece 12 resulting from the above state variables has succeeded: the reward is set higher for a successful pick-up of the workpiece 12 and lower for a failed pick-up. The reward may also be calculated based on the number of successful pick-ups of workpieces 12 within a predetermined time. Furthermore, when calculating the reward, a reward may be calculated for each stage of the pick-up of the workpiece, such as success of holding by the hand unit 13, success of conveyance by the hand unit 13, and success of the placing operation of the workpiece 12.
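As an illustration of the staged reward described above, a reward calculation could look like the following sketch; the stage breakdown follows the description, but the numeric scores are assumptions and not values given in the patent.

```python
def calculate_reward(held: bool, conveyed: bool, placed: bool) -> float:
    """Illustrative staged reward for one pick-up attempt of a workpiece 12."""
    reward = 0.0
    reward += 1.0 if held else -1.0           # holding by the hand unit 13
    if held:
        reward += 1.0 if conveyed else -0.5   # conveyance of the workpiece 12
    if conveyed:
        reward += 1.0 if placed else -0.5     # placing operation of the workpiece 12
    return reward
```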
The value function update unit 24 has a value function that determines the value of the pick-up operation of the workpiece 12, and updates the value function in accordance with the above reward. The update formula of the value Q(s, a) described above is used for updating the value function. An action value table is preferably created at the time of this update. The action value table referred to here is a table in which the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 at the time a workpiece 12 was picked up are recorded in association with the pick-up result of the workpiece 12 at that time and with the value function (evaluation value) updated accordingly.
As the action value table, a function approximated using the above-described neural network may be used, which is particularly effective when the amount of information in the state s, such as image data, is enormous. The value function is not limited to one type. For example, a value function that evaluates whether the holding of a workpiece 12 by the hand unit 13 has succeeded, and a value function that evaluates the time (cycle time) required to hold and convey the workpiece 12 by the hand unit 13, may be considered.
As the value function, a value function that evaluates interference between the container 11 and the hand unit 13 or the workpiece 12 at the time the workpiece is picked up may also be used. In order to calculate the reward used for updating this value function, the state variable observation unit 21 preferably observes the force applied to the hand unit 13, for example the value detected by the force sensor 17. When the amount of change in the force detected by the force sensor 17 exceeds a predetermined threshold, it can be estimated that the above-mentioned interference has occurred; therefore, the reward in that case is preferably set, for example, to a negative value so that the value determined by the value function is lowered.
According to the present embodiment, the measurement parameters of the three-dimensional measuring device 15 can also be learned as manipulated variables. That is, according to the present embodiment, there is provided a machine learning device that learns an operation of a robot 14 for picking up, by a hand unit 13, any of a plurality of workpieces 12 placed in a random fashion, including a bulk-loaded state, the machine learning device including: a state variable observation unit 21 that observes a state variable of the robot 14 including data output from a three-dimensional measuring device 15 that measures a three-dimensional position (x, y, z), or a three-dimensional position and orientation (x, y, z, w, p, r), of each workpiece 12; an operation result obtaining unit 26 that obtains a result of a pick-up operation of the robot 14 for picking up the workpiece 12 by the hand unit 13; and a learning unit 22 that, upon receiving output from the state variable observation unit 21 and output from the operation result obtaining unit 26, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device 15, in association with the state variable of the robot 14 and the result of the pick-up operation.
The robot system 10 of the present embodiment may also include an automatic hand changer (not illustrated) that replaces the hand unit 13 attached to the robot 14 with a hand unit 13 of a different form. In this case, the value function update unit 24 may have the above-described value function for each hand unit 13 of a different form, and may update the value function of the hand unit 13 after replacement in accordance with the reward. This makes it possible to learn the optimum operation of the hand unit 13 for each of a plurality of hand units 13 of different forms, and therefore to cause the automatic hand changer to select the hand unit 13 whose value function is higher.
The decision-making unit 25 preferably selects the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 that correspond to a higher evaluation value, by referring, for example, to the action value table created as described above. The decision-making unit 25 then outputs the selected optimum data of the hand unit 13 and the three-dimensional measuring device 15 to the control device 16.
The control device 16 then controls the three-dimensional measuring device 15 and the robot 14 to pick up the workpiece 12, using the optimum data of the hand unit 13 and the three-dimensional measuring device 15 output by the learning unit 22. For example, the control device 16 preferably operates the drive shafts of the hand unit 13 and the robot 14 according to state variables respectively set for the optimum position, orientation, and pick-up direction of the hand unit 13 obtained by the learning unit 22.
The robot system 10 of the embodiment described above includes one machine learning device 20 for one robot 14, as illustrated in Fig. 1. In the present invention, however, the numbers of robots 14 and machine learning devices 20 are not limited to one each. For example, the robot system 10 may include a plurality of robots 14, with one or more machine learning devices 20 provided for each robot 14. The robot system 10 then preferably shares or exchanges, via a communication medium such as a network, the optimum state variables of the three-dimensional measuring device 15 and the hand unit 13 acquired by the machine learning device 20 of each robot 14. Thus, even if the operating rate of a certain robot 14 is lower than that of another robot 14, the optimum operation results acquired by the machine learning device 20 of the other robot 14 can be utilized in the operation of the former robot 14. Moreover, by sharing learning models among the plurality of robots, or by sharing the state variables, including the manipulated variables such as the measurement parameters of the three-dimensional measuring device 15 and of the robot 14, and the results of the pick-up operations, the time required for learning can be shortened.
The machine learning device 20 may be located inside or outside the robot 14. Alternatively, the machine learning device 20 may be located in the control device 16 or may reside on a cloud server (not illustrated).
When the robot system 10 includes a plurality of robots 14, the hand unit of another robot 14 can perform the operation of picking up a workpiece 12 while one robot 14 is conveying a workpiece 12 held by its hand unit 13. The value function update unit 24 may then update the value function using the time during which the robots 14 picking up the workpieces 12 are switched in this way. Furthermore, the machine learning device 20 may have state variables for a plurality of hand models, perform pick-up simulations using the plurality of hand models during the pick-up operation of a workpiece 12, and learn the state variables of the plurality of hand models in association with the result of the pick-up operation of the workpiece 12, according to the results of the pick-up simulations.
In the machine learning device 20 described above, the output data of the three-dimensional measuring device 15 produced when the three-dimensional map data of each workpiece 12 is obtained is transmitted from the three-dimensional measuring device 15 to the state variable observation unit 21. Since the transmitted data may include anomalous data, the machine learning device 20 may have a filtering function for anomalous data, i.e., a function for selecting whether or not to input the data from the three-dimensional measuring device 15 to the state variable observation unit 21. This allows the learning unit 22 of the machine learning device 20 to efficiently learn the optimum operation of the three-dimensional measuring device 15 and the hand unit 13 of the robot 14.
In the machine learning device 20 described above, the output data from the learning unit 22 is input to the control device 16, but this output data may also include anomalous data; therefore, a filtering function for anomalous data may likewise be provided, i.e., a function for selecting whether or not to output the data from the learning unit 22 to the control device 16. This allows the control device 16 to cause the robot 14 to execute the optimum operation of the hand unit 13 more safely.
The anomalous data described above can be detected by the following procedure: the probability distribution of the input data is estimated, the occurrence probability of a new input is derived using the probability distribution, and if the occurrence probability is below a certain level, the input is regarded as anomalous data that deviates significantly from typical behavior.
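The following sketch illustrates that procedure under an assumed probability model; the patent does not specify the model, so a diagonal Gaussian fit and the threshold value are assumptions made purely for illustration.

```python
import numpy as np

class AnomalyFilter:
    """Fit a probability model to past inputs and reject new inputs whose
    estimated occurrence probability is too low (illustrative sketch)."""

    def __init__(self, threshold: float = 1e-6):
        self.threshold = threshold
        self.mean = None
        self.var = None

    def fit(self, samples: np.ndarray) -> None:
        """Estimate the probability distribution of the input data (rows = samples)."""
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0) + 1e-9

    def is_anomalous(self, x: np.ndarray) -> bool:
        """Derive the occurrence probability of a new input and compare it to the threshold."""
        density = np.exp(-0.5 * (x - self.mean) ** 2 / self.var) / np.sqrt(2 * np.pi * self.var)
        return float(np.prod(density)) < self.threshold
```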
Next, an example of the operation of the machine learning device 20 included in the robot system 10 of the present embodiment will be described. Fig. 4 is a flowchart illustrating an example of the operation of the machine learning device illustrated in Fig. 1. As illustrated in Fig. 4, when the learning operation (learning process) of the machine learning device 20 illustrated in Fig. 1 is started, three-dimensional measurement is performed by the three-dimensional measuring device 15 and its output is produced (step S11 of Fig. 4). That is, in step S11, for example, the three-dimensional map (output data of the three-dimensional measuring device 15) of each workpiece 12 placed in a random fashion, including a bulk-loaded state, is obtained and output to the state variable observation unit 21, and the coordinate calculation unit 19 receives the three-dimensional map of each workpiece 12, calculates the three-dimensional position (x, y, z) of each workpiece 12, and outputs it to the state variable observation unit 21, the operation result obtaining unit 26, and the control device 16. The coordinate calculation unit 19 may also calculate the orientation (w, p, r) of each workpiece 12 from the output of the three-dimensional measuring device 15 and output it.
As described with reference to Fig. 5, the output (three-dimensional map) of the three-dimensional measuring device 15 may be input to the state variable observation unit 21 via the preprocessing unit 50, which processes it before it is input to the state variable observation unit 21. As described with reference to Fig. 7, only the output of the three-dimensional measuring device 15 may be input to the state variable observation unit 21, and furthermore, only the output of the three-dimensional measuring device 15 may be input to the state variable observation unit 21 via the preprocessing unit 50. In this way, the execution and output of the three-dimensional measurement in step S11 can take various forms.
Specifically, in the case of Fig. 1, the state variable observation unit 21 observes state variables (output data of the three-dimensional measuring device 15) such as the three-dimensional map of each workpiece 12 from the three-dimensional measuring device 15 and the three-dimensional position (x, y, z) and orientation (w, p, r) of each workpiece 12 from the coordinate calculation unit 19. The operation result obtaining unit 26 obtains the result of the pick-up operation of the robot 14 for picking up a workpiece 12 by the hand unit 13, based on the output data of the three-dimensional measuring device 15 (the output data of the coordinate calculation unit 19). In addition to the output data of the three-dimensional measuring device, the operation result obtaining unit 26 may also obtain results of the pick-up operation such as the achievement level attained when the picked-up workpiece 12 is transferred to a post-process and damage to the picked-up workpiece 12.
For example, the machine learning device 20 determines the optimum operation based on the output data of the three-dimensional measuring device 15 (step S12 of Fig. 4), the control device 16 outputs the command data (manipulated variable) of the hand unit 13 (robot 14), and the pick-up operation of the workpiece 12 is performed (step S13 of Fig. 4). The result of the pick-up of the workpiece is then obtained by the above-described operation result obtaining unit 26 (step S14 of Fig. 4).
Next, based on the output from the action result acquisition unit 26, it is determined whether the take-out of the workpiece 12 succeeded or failed (step S15 of Fig. 4). When the take-out of the workpiece 12 succeeds, a positive reward is set (step S16 of Fig. 4); when it fails, a negative reward is set (step S17 of Fig. 4); and the action value table (value function) is then updated (step S18 of Fig. 4).
The success or failure of the take-out of the workpiece 12 can be determined, for example, from the output data of the three-dimensional measuring device 15 after the take-out operation. The determination is not limited to evaluating whether the take-out itself succeeded; for example, it may also evaluate the achievement level when the taken-out workpiece 12 is transferred to a subsequent process, state changes such as whether the taken-out workpiece 12 has been damaged, or the time (cycle time) and energy (electric power) required to grip and transport the workpiece 12 with the robot arm 13.
The reward calculation unit 23 calculates the reward value based on the determination of the success or failure of the take-out of the workpiece 12, and the value function update unit 24 updates the action value table. That is, when the take-out of the workpiece 12 succeeds, the learning unit 22 sets a positive reward as the reward in the update formula of the value Q(s, a) described above (S16); when the take-out fails, it sets a negative reward as the reward in that update formula (S17). The learning unit 22 then updates the action value table described above each time a take-out of the workpiece 12 is performed (S18). By repeating steps S11 to S18, the learning unit 22 continues to update (learn) the action value table.
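As a rough sketch only, the following code mirrors the S11 to S18 loop with a tabular Q-learning update; the environment functions (`observe_state`, `execute_takeout`, `takeout_succeeded`) are stand-ins for the three-dimensional measuring device, the control device, and the action result acquisition unit, and the learning rate, discount factor, and reward values are illustrative, not taken from the patent.

```python
import random

ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor (illustrative)
ACTIONS = ["pick_candidate_0", "pick_candidate_1", "pick_candidate_2"]
q_table = {}                      # action value table: (state, action) -> Q

def q(s, a):
    return q_table.get((s, a), 0.0)

# --- stand-ins for the real system -------------------------------------------
def observe_state():              # S11: 3D measurement -> state quantity
    return random.choice(["sparse", "cluttered"])

def execute_takeout(action):      # S13: control device drives the robot arm
    pass

def takeout_succeeded():          # S14/S15: action result acquisition and judgement
    return random.random() < 0.5
# ------------------------------------------------------------------------------

def learning_loop(n_trials=1000):
    for _ in range(n_trials):
        s = observe_state()
        a = max(ACTIONS, key=lambda act: q(s, act))        # S12: pick best-known action
        execute_takeout(a)
        reward = 1.0 if takeout_succeeded() else -1.0      # S16 / S17
        s_next = observe_state()
        best_next = max(q(s_next, act) for act in ACTIONS)
        # S18: Q-learning update of the action value table
        q_table[(s, a)] = q(s, a) + ALPHA * (reward + GAMMA * best_next - q(s, a))

learning_loop()
```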
The data input to the state quantity observation unit 21 is not limited to the output data of the three-dimensional measuring device 15; for example, it may also include the output of other sensors, or part of the command data from the control device 16. The control device 16 uses the command data (operation amount) output from the machine learning device 20 to make the robot 14 execute the take-out operation of the workpiece 12. Moreover, what the machine learning device 20 learns is not limited to the take-out operation of the workpiece 12; as described above, it may also be, for example, the measurement parameters of the three-dimensional measuring device 15.
As described above, the robot system 10 including the machine learning device 20 of the present embodiment can learn the operation of the robot 14 in which the robot arm 13 takes out a workpiece 12 from a plurality of workpieces 12 placed in random order, including a bulk-loaded state. The robot system 10 can thereby learn to select the optimal action of the robot 14 for taking out the bulk-loaded workpieces 12 without human intervention.
Fig. 5 is a block diagram showing the conceptual configuration of a robot system according to another embodiment of the present invention, namely a robot system to which supervised learning is applied. As is clear from a comparison of Fig. 5 with Fig. 1, the robot system 10' of Fig. 5, which applies supervised learning, further includes a result (label) data recording unit 40, in contrast to the robot system 10 of Fig. 1, which applies Q-learning (reinforcement learning). The robot system 10' of Fig. 5 also includes a preprocessing unit 50 that preprocesses the output data of the three-dimensional measuring device 15. A preprocessing unit 50 may, of course, also be provided in the robot system 10 shown in Fig. 1.
As shown in Fig. 5, the machine learning device 30 in the robot system 10' to which supervised learning is applied includes a state quantity observation unit 31, an action result acquisition unit 36, a learning unit 32, and a decision-making unit 35. The learning unit 32 includes an error calculation unit 33 and a learning model update unit 34. In the robot system 10' of this embodiment as well, the machine learning device 30 learns and outputs operation amounts such as the command data instructing the robot 14 to take out the workpiece 12, or the measurement parameters of the three-dimensional measuring device 15.
That is, in the robot system 10' of Fig. 5 applying supervised learning, the error calculation unit 33 and the learning model update unit 34 correspond, respectively, to the reward calculation unit 23 and the value function update unit 24 in the robot system 10 of Fig. 1 applying Q-learning. The other components, such as the three-dimensional measuring device 15, the control device 16, and the robot 14, are the same as in Fig. 1, and their description is omitted.
The error calculation unit 33 calculates the error between the result (label) output from the action result acquisition unit 36 and the output of the learning model in the learning unit 32. For example, when the shape of the workpiece 12 and the process performed by the robot 14 are the same, the result (label) data recording unit 40 can hold the result (label) data obtained up to the day before a given date on which the robot 14 performs the work, and supply that result (label) data to the error calculation unit 33 on that date. Alternatively, data obtained through simulation performed outside the robot system 10', or result (label) data from another robot system, may be supplied to the error calculation unit 33 of the robot system 10' via a memory card, a communication line, or the like. Furthermore, the result (label) data recording unit 40 may be constituted by a nonvolatile memory such as a flash memory and built into the learning unit 32, so that the learning unit 32 directly uses the result (label) data held in that result (label) data recording unit 40.
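As a simple illustration of this arrangement, the sketch below accumulates (state, result) pairs as labeled data and hands them back on request, for example to the error calculation step on the day the robot performs the work; persistence to nonvolatile memory is only suggested by a file path, and all names here are assumptions.

```python
import json
from pathlib import Path

class ResultLabelRecorder:
    """Holds result (label) data collected so far and supplies it on request."""

    def __init__(self, store: Path = Path("labels.jsonl")):
        self.store = store  # stand-in for nonvolatile storage such as flash memory

    def append(self, state_features: list, success: bool) -> None:
        # Record one labeled sample: observed state plus take-out result.
        record = {"state": state_features, "label": 1 if success else 0}
        with self.store.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def load_all(self) -> list:
        # Supply all accumulated result (label) data, e.g. to the error calculation.
        if not self.store.exists():
            return []
        with self.store.open() as f:
            return [json.loads(line) for line in f]
```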
Fig. 6 is a diagram for explaining an example of the processing performed by the preprocessing unit in the robot system of Fig. 5. Fig. 6(a) shows an example of the output data of the three-dimensional measuring device 15, namely data representing the three-dimensional positions (orientations) of a plurality of workpieces 12 bulk-loaded in the container 11, and Figs. 6(b) to 6(d) show examples of image data obtained by preprocessing the workpieces 121 to 123 in Fig. 6(a).
Here, cylindrical metal parts are assumed as the workpieces 12 (121 to 123), and, as the robot arm (13), a suction pad that holds the longitudinal center of the cylindrical workpiece 12 by negative-pressure suction is assumed, rather than a gripper that holds the workpiece with two claws. Therefore, if, for example, the position of the longitudinal center of a workpiece 12 is known, the workpiece 12 can be taken out by moving the suction pad (13) to that position and performing suction. The numerical values in Figs. 6(a) to 6(d) are expressed in [mm] and represent the x, y, and z directions, respectively. The z direction corresponds to the height (depth) direction of the image data obtained by photographing the container 11 containing the bulk-loaded workpieces 12 with the three-dimensional measuring device 15 (for example, having two cameras) arranged above.
As is clear from Fig. 6(a) and Figs. 6(b) to 6(d), one example of the processing performed by the preprocessing unit 50 in the robot system 10' of Fig. 5 is to rotate the workpieces 12 of interest (for example, the three workpieces 121 to 123) based on the output data (three-dimensional image) of the three-dimensional measuring device 15, and to process the data so that the height of each workpiece center becomes "0".
That is, the output data of the three-dimensional measuring device 15 includes, for example, information on the three-dimensional position (x, y, z) and orientation (w, p, r) of the longitudinal center of each workpiece 12. As shown in Figs. 6(b), 6(c), and 6(d), each of the three workpieces of interest 121, 122, and 123 is rotated by -r and its z value is subtracted, so that they all satisfy the same condition. Such preprocessing can reduce the load on the machine learning device 30.
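A minimal sketch of this normalization is shown below, assuming each workpiece of interest is represented by a small depth-image patch plus its center pose (x, y, z, w, p, r); the actual three-dimensional map format is not specified at this level of detail, so the data layout and function name are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def normalize_workpiece(depth_patch: np.ndarray, pose: dict):
    """Rotate the patch of one workpiece of interest by -r and subtract z so that
    every workpiece ends up in the same orientation with its center height at 0."""
    rotated = rotate(depth_patch, angle=-np.degrees(pose["r"]), reshape=False, order=1)
    normalized_patch = rotated - pose["z"]            # center height becomes 0
    normalized_pose = dict(pose, z=0.0, r=0.0)        # all workpieces satisfy the same condition
    return normalized_patch, normalized_pose
```

Feeding such normalized patches to the learning unit, instead of the raw map, is what reduces the load on the machine learning device 30.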
The three-dimensional map shown in Fig. 6(a) is not the output data of the three-dimensional measuring device 15 itself; it is obtained, for example, by lowering the selection threshold, compared with before, in an image produced by a program executed to determine the take-out order of the workpieces 12, and this processing itself may also be performed by the preprocessing unit 50. Naturally, the processing of the preprocessing unit 50 can vary in many ways depending on various conditions, such as the shape of the workpieces 12 and the type of the robot arm 13.
In this way, the output data of the three-dimensional measuring device 15 (the three-dimensional map of each workpiece 12), processed by the preprocessing unit 50 before being input to the state quantity observation unit 31, is input to the state quantity observation unit 31. Referring again to Fig. 5, the error calculation unit 33, which receives the result (label) output from the action result acquisition unit 36, regards the error as -log(y) when the actually executed take-out operation of the workpiece 12 succeeds and as -log(1-y) when it fails, where y is the output of the learning model, for example the neural network shown in Fig. 3, and performs processing aimed at minimizing this error. As the input to the neural network of Fig. 3, the preprocessed image data of the workpieces of interest 121 to 123, as shown in Figs. 6(b) to 6(d), and the three-dimensional positions and orientations (x, y, z, w, p, r) of each of these workpieces of interest 121 to 123 are provided.
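The error described above is the usual binary cross-entropy on the predicted take-out success probability; a minimal numeric sketch follows, with the helper name chosen here only for illustration.

```python
import math

def takeout_error(y: float, success: bool) -> float:
    """Error between the learning model output y (predicted success probability)
    and the observed result: -log(y) on success, -log(1 - y) on failure."""
    eps = 1e-12  # guard against log(0)
    return -math.log(y + eps) if success else -math.log(1.0 - y + eps)

# A model that was confident (y = 0.9) about an attempt that actually failed
# incurs a large error, which the learning model update unit then works to minimize.
print(round(takeout_error(0.9, success=False), 3))  # 2.303
print(round(takeout_error(0.9, success=True), 3))   # 0.105
```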
Fig. 7 is a block diagram showing a modification of the robot system shown in Fig. 1. As is clear from comparing Fig. 7 with Fig. 1, in the modified robot system 10 of Fig. 7 the coordinate calculation unit 19 is omitted, and the state quantity observation unit 21 receives only the three-dimensional map from the three-dimensional measuring device 15 to observe the state quantity of the robot 14. Of course, a configuration corresponding to the coordinate calculation unit 19 may instead be provided in the control device 16. The configuration of Fig. 7 can also be applied, for example, to the robot system 10' applying supervised learning described with reference to Fig. 5. That is, the preprocessing unit 50 may be omitted from the robot system 10' of Fig. 5, so that the state quantity observation unit 31 receives only the three-dimensional map from the three-dimensional measuring device 15 to observe the state quantity of the robot 14. In this way, various changes and modifications can be made to each of the embodiments described above.
As described in detail above, the present embodiment can provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimal action of a robot when taking out workpieces placed in random order, including a bulk-loaded state. The machine learning devices 20 and 30 of the present invention are not limited to applying reinforcement learning (for example, Q-learning) or supervised learning; various other machine learning algorithms may also be applied.
The machine learning device, robot system, and machine learning method of the present invention achieve the effect of being able to learn, without human intervention, the optimal action of a robot when taking out workpieces placed in random order, including a bulk-loaded state.
Embodiments have been described above, but all the examples and conditions described here are intended to aid understanding of the inventive concept applied to the invention and the technology; the examples and conditions specifically described do not limit the scope of the invention, and such descriptions in the specification do not indicate advantages or disadvantages of the invention. Although the embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made without departing from the spirit and scope of the invention.

Claims (18)

1. A machine learning device that learns an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state,
the machine learning device being characterized by comprising:
a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring device that acquires a three-dimensional map of each of the workpieces;
an action result acquisition unit that acquires a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
a learning unit that receives the output from the state quantity observation unit and the output from the action result acquisition unit, and learns, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including command data that instructs the robot to perform the take-out operation of the workpiece.
2. The machine learning device according to claim 1, characterized in that
the machine learning device further comprises a decision-making unit that determines the command data to be instructed to the robot by referring to the operation amount learned by the learning unit.
3. A machine learning device that learns an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state,
the machine learning device being characterized by comprising:
a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional map of each of the workpieces;
an action result acquisition unit that acquires a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
a learning unit that receives the output from the state quantity observation unit and the output from the action result acquisition unit, and learns, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including a measurement parameter of the three-dimensional measuring device.
4. The machine learning device according to claim 3, characterized in that
the machine learning device further comprises a decision-making unit that determines the measurement parameter of the three-dimensional measuring device by referring to the operation amount learned by the learning unit.
5. The machine learning device according to any one of claims 1 to 4, characterized in that
the state quantity observation unit further observes a state quantity of the robot including output data of a coordinate calculation unit that calculates a three-dimensional position of each of the workpieces based on the output of the three-dimensional measuring device.
6. The machine learning device according to claim 5, characterized in that
the coordinate calculation unit further calculates an orientation of each of the workpieces and outputs data on the calculated three-dimensional position and orientation of each of the workpieces.
7. The machine learning device according to any one of claims 1 to 6, characterized in that
the action result acquisition unit uses the output data of the three-dimensional measuring device.
8. The machine learning device according to any one of claims 1 to 7, characterized in that
the machine learning device further comprises a preprocessing unit that processes the output data of the three-dimensional measuring device before it is input to the state quantity observation unit,
and the state quantity observation unit receives the output data of the preprocessing unit as the state quantity of the robot.
9. The machine learning device according to claim 8, characterized in that
the preprocessing unit makes the direction and the height of each of the workpieces in the output data of the three-dimensional measuring device uniform.
10. The machine learning device according to any one of claims 1 to 9, characterized in that
the action result acquisition unit acquires at least one of: whether the take-out of the workpiece succeeded or failed, a damage state of the workpiece, and an achievement level when the taken-out workpiece is transferred to a subsequent process.
11. The machine learning device according to any one of claims 1 to 10, characterized in that
the learning unit comprises:
a reward calculation unit that calculates a reward based on the output of the action result acquisition unit; and
a value function update unit that has a value function for determining a value of the take-out operation of the workpiece and updates the value function in accordance with the reward.
12. The machine learning device according to any one of claims 1 to 10, characterized in that
the learning unit has a learning model for learning the take-out operation of the workpiece,
and the learning unit comprises:
an error calculation unit that calculates an error based on the output of the action result acquisition unit and the output of the learning model; and
a learning model update unit that updates the learning model in accordance with the error.
13. The machine learning device according to any one of claims 1 to 12, characterized in that
the machine learning device has a neural network.
14. A robot system comprising the machine learning device according to any one of claims 1 to 13, the robot system being characterized by comprising:
the robot;
the three-dimensional measuring device; and
a control device that controls the robot and the three-dimensional measuring device, respectively.
15. The robot system according to claim 14, characterized in that
the robot system comprises a plurality of the robots,
the machine learning device is provided for each of the robots,
and the plurality of machine learning devices provided for the plurality of robots share or exchange data with one another via a communication medium.
16. The robot system according to claim 15, characterized in that
the machine learning device is located on a cloud server.
17. A machine learning method for learning an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state, the machine learning method being characterized by:
observing a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional position of each of the workpieces;
acquiring a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
receiving the observed state quantity of the robot and the acquired result of the take-out operation of the robot, and learning, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including command data that instructs the robot to perform the take-out operation of the workpiece.
18. A machine learning method for learning an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state, the machine learning method being characterized by:
observing a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional map of each of the workpieces;
acquiring a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
receiving the observed state quantity of the robot and the acquired result of the take-out operation of the robot, and learning, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including a measurement parameter of the three-dimensional measuring device.
CN201610617361.XA 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method Active CN106393102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110544521.3A CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2015152067 2015-07-31
JP2015-152067 2015-07-31
JP2015-233857 2015-11-30
JP2015233857A JP6522488B2 (en) 2015-07-31 2015-11-30 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110544521.3A Division CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Publications (2)

Publication Number Publication Date
CN106393102A true CN106393102A (en) 2017-02-15
CN106393102B CN106393102B (en) 2021-06-01

Family

ID=57985283

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Country Status (3)

Country Link
JP (4) JP6522488B2 (en)
CN (2) CN113199483A (en)
DE (1) DE102016015873B3 (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107252785A (en) * 2017-06-29 2017-10-17 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
CN107255969A (en) * 2017-06-28 2017-10-17 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107329445A (en) * 2017-06-28 2017-11-07 重庆柚瓣家科技有限公司 The method of robot behavior criterion intelligent supervision
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
CN108687766A (en) * 2017-03-31 2018-10-23 发那科株式会社 Control device, machine learning device and the machine learning method of robot
CN108789499A (en) * 2017-04-28 2018-11-13 发那科株式会社 Article extraction system
CN108942916A (en) * 2017-05-19 2018-12-07 发那科株式会社 Workpiece extraction system
CN109002012A (en) * 2017-06-07 2018-12-14 发那科株式会社 control device and machine learning device
CN109202394A (en) * 2017-07-07 2019-01-15 发那科株式会社 Assembly supply device and machine learning device
CN109420859A (en) * 2017-08-28 2019-03-05 发那科株式会社 Machine learning device, machine learning system and machine learning method
CN109434844A (en) * 2018-09-17 2019-03-08 鲁班嫡系机器人(深圳)有限公司 Food materials handling machine people control method, device, system, storage medium and equipment
CN109500809A (en) * 2017-09-15 2019-03-22 西门子股份公司 Optimization to the automation process by Robot Selection and crawl object
CN109551459A (en) * 2017-09-25 2019-04-02 发那科株式会社 Robot system and method for taking out work
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
CN109814615A (en) * 2017-11-22 2019-05-28 发那科株式会社 Control device and machine learning device
CN110091084A (en) * 2018-01-30 2019-08-06 发那科株式会社 Learn the machine learning device of the failure mechanism of laser aid
CN110125955A (en) * 2018-02-09 2019-08-16 发那科株式会社 Control device and machine learning device
CN110174875A (en) * 2018-02-19 2019-08-27 欧姆龙株式会社 Simulator, analogy method and storage medium
CN110303473A (en) * 2018-03-20 2019-10-08 发那科株式会社 Use the object picking device and article removing method of sensor and robot
CN110315505A (en) * 2018-03-29 2019-10-11 发那科株式会社 Machine learning device and method, robot controller, robotic vision system
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN110691676A (en) * 2017-06-19 2020-01-14 谷歌有限责任公司 Robot crawling prediction using neural networks and geometrically-aware object representations
CN110712194A (en) * 2018-07-13 2020-01-21 发那科株式会社 Object inspection device, object inspection system, and method for adjusting inspection position
CN111194452A (en) * 2017-06-09 2020-05-22 川崎重工业株式会社 Motion prediction system and motion prediction method
CN112135719A (en) * 2018-06-14 2020-12-25 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112203811A (en) * 2018-05-25 2021-01-08 川崎重工业株式会社 Robot system and robot control method
CN112218748A (en) * 2018-06-14 2021-01-12 雅马哈发动机株式会社 Robot system
CN112512757A (en) * 2018-11-09 2021-03-16 欧姆龙株式会社 Robot control device, simulation method, and simulation program
CN112757284A (en) * 2019-10-21 2021-05-07 佳能株式会社 Robot control apparatus, method and storage medium
CN113412177A (en) * 2018-12-27 2021-09-17 川崎重工业株式会社 Robot control device, robot system, and robot control method
WO2022100363A1 (en) * 2020-11-13 2022-05-19 腾讯科技(深圳)有限公司 Robot control method, apparatus and device, and storage medium and program product
CN114786888A (en) * 2020-01-16 2022-07-22 欧姆龙株式会社 Control device, control method, and control program
CN115816466A (en) * 2023-02-02 2023-03-21 中国科学技术大学 Method for improving control stability of visual observation robot

Families Citing this family (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
JP6771744B2 (en) * 2017-01-25 2020-10-21 株式会社安川電機 Handling system and controller
JP6453922B2 (en) * 2017-02-06 2019-01-16 ファナック株式会社 Work picking apparatus and work picking method for improving work picking operation
US11222417B2 (en) 2017-03-06 2022-01-11 Fuji Corporation Data structure for creating image-processing data and method for creating image-processing data
JP6542824B2 (en) * 2017-03-13 2019-07-10 ファナック株式会社 Image processing apparatus and image processing method for calculating likelihood of image of object detected from input image
JP6438512B2 (en) * 2017-03-13 2018-12-12 ファナック株式会社 ROBOT SYSTEM, MEASUREMENT DATA PROCESSING DEVICE, AND MEASUREMENT DATA PROCESSING METHOD FOR TAKE OUT WORK WITH MEASUREMENT DATA CORRECTED BY MACHINE LEARN
JP6869060B2 (en) * 2017-03-15 2021-05-12 株式会社オカムラ Manipulator controls, control methods and programs, and work systems
JP6902369B2 (en) * 2017-03-15 2021-07-14 株式会社オカムラ Presentation device, presentation method and program, and work system
JP6983524B2 (en) * 2017-03-24 2021-12-17 キヤノン株式会社 Information processing equipment, information processing methods and programs
JP6557272B2 (en) * 2017-03-29 2019-08-07 ファナック株式会社 State determination device
JP6680714B2 (en) * 2017-03-30 2020-04-15 ファナック株式会社 Control device and machine learning device for wire electric discharge machine
JP7045139B2 (en) * 2017-06-05 2022-03-31 株式会社日立製作所 Machine learning equipment, machine learning methods, and machine learning programs
JP7116901B2 (en) * 2017-08-01 2022-08-12 オムロン株式会社 ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD AND ROBOT CONTROL PROGRAM
DE102017213658A1 (en) * 2017-08-07 2019-02-07 Robert Bosch Gmbh Handling arrangement with a handling device for performing at least one work step and method and computer program
JP6680730B2 (en) * 2017-08-08 2020-04-15 ファナック株式会社 Control device and learning device
JP6680732B2 (en) * 2017-08-23 2020-04-15 ファナック株式会社 Goods stacking device and machine learning device
US11446816B2 (en) * 2017-09-01 2022-09-20 The Regents Of The University Of California Robotic systems and methods for robustly grasping and targeting objects
JP6608890B2 (en) * 2017-09-12 2019-11-20 ファナック株式会社 Machine learning apparatus, robot system, and machine learning method
JP6895563B2 (en) * 2017-09-25 2021-06-30 ファナック株式会社 Robot system, model generation method, and model generation program
JP6579498B2 (en) 2017-10-20 2019-09-25 株式会社安川電機 Automation device and position detection device
JP2019084601A (en) 2017-11-02 2019-06-06 キヤノン株式会社 Information processor, gripping system and information processing method
JP6815309B2 (en) * 2017-11-16 2021-01-20 株式会社東芝 Operating system and program
JP6676030B2 (en) 2017-11-20 2020-04-08 株式会社安川電機 Grasping system, learning device, gripping method, and model manufacturing method
US10828778B2 (en) * 2017-11-30 2020-11-10 Abb Schweiz Ag Method for operating a robot
JP7136554B2 (en) * 2017-12-18 2022-09-13 国立大学法人信州大学 Grasping device, learning device, program, grasping system, and learning method
KR102565444B1 (en) * 2017-12-21 2023-08-08 삼성전자주식회사 Method and apparatus for identifying object
JP6587195B2 (en) * 2018-01-16 2019-10-09 株式会社Preferred Networks Tactile information estimation device, tactile information estimation method, program, and non-transitory computer-readable medium
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
JP7005388B2 (en) * 2018-03-01 2022-01-21 株式会社東芝 Information processing equipment and sorting system
JP6873941B2 (en) 2018-03-02 2021-05-19 株式会社日立製作所 Robot work system and control method of robot work system
EP3762185A1 (en) 2018-03-05 2021-01-13 Omron Corporation Method, apparatus, system and program for controlling a robot, and storage medium
JP6879238B2 (en) 2018-03-13 2021-06-02 オムロン株式会社 Work picking device and work picking method
JP6911798B2 (en) * 2018-03-15 2021-07-28 オムロン株式会社 Robot motion control device
JP2019162712A (en) * 2018-03-20 2019-09-26 ファナック株式会社 Control device, machine learning device and system
KR102043898B1 (en) * 2018-03-27 2019-11-12 한국철도기술연구원 Auto picking system and method for automatically picking using the same
US11260534B2 (en) * 2018-04-04 2022-03-01 Canon Kabushiki Kaisha Information processing apparatus and information processing method
US11579000B2 (en) 2018-04-05 2023-02-14 Fanuc Corporation Measurement operation parameter adjustment apparatus, machine learning device, and system
JP6829271B2 (en) * 2018-04-05 2021-02-10 ファナック株式会社 Measurement operation parameter adjustment device, machine learning device and system
JP7252944B2 (en) 2018-04-26 2023-04-05 パナソニックホールディングス株式会社 Actuator device, object retrieval method using actuator device, and object retrieval system
JP7154815B2 (en) 2018-04-27 2022-10-18 キヤノン株式会社 Information processing device, control method, robot system, computer program, and storage medium
CN112203812B (en) 2018-05-25 2023-05-16 川崎重工业株式会社 Robot system and additional learning method
KR102094360B1 (en) * 2018-06-11 2020-03-30 동국대학교 산학협력단 System and method for predicting force based on image
JP7102241B2 (en) * 2018-06-14 2022-07-19 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
JP2020001127A (en) * 2018-06-28 2020-01-09 勇貴 高橋 Picking system, picking processing equipment, and program
JP6784722B2 (en) * 2018-06-28 2020-11-11 ファナック株式会社 Output device, control device, and evaluation function value output method
WO2020009139A1 (en) * 2018-07-04 2020-01-09 株式会社Preferred Networks Learning method, learning device, learning system, and program
WO2020021643A1 (en) * 2018-07-24 2020-01-30 株式会社Fuji End effector selection method and selection system
JP7191569B2 (en) * 2018-07-26 2022-12-19 Ntn株式会社 gripping device
CN112512942B (en) * 2018-08-03 2022-05-17 株式会社富士 Parameter learning method and operating system
JP7034035B2 (en) * 2018-08-23 2022-03-11 株式会社日立製作所 Motion generation method for autonomous learning robot device and autonomous learning robot device
JP7159525B2 (en) * 2018-11-29 2022-10-25 京セラドキュメントソリューションズ株式会社 ROBOT CONTROL DEVICE, LEARNING DEVICE, AND ROBOT CONTROL SYSTEM
US20220016761A1 (en) 2018-12-27 2022-01-20 Kawasaki Jukogyo Kabushiki Kaisha Robot control device, robot system, and robot control method
CN109784400A (en) * 2019-01-12 2019-05-21 鲁班嫡系机器人(深圳)有限公司 Intelligent body Behavioral training method, apparatus, system, storage medium and equipment
JP7000359B2 (en) * 2019-01-16 2022-01-19 ファナック株式会社 Judgment device
JP6632095B1 (en) * 2019-01-16 2020-01-15 株式会社エクサウィザーズ Learned model generation device, robot control device, and program
JP7252787B2 (en) 2019-02-28 2023-04-05 川崎重工業株式会社 Machine learning model operation management system and machine learning model operation management method
JP7336856B2 (en) * 2019-03-01 2023-09-01 株式会社Preferred Networks Information processing device, method and program
WO2020194392A1 (en) * 2019-03-22 2020-10-01 connectome.design株式会社 Computer, method, and program for generating teaching data for autonomous robot
JP7302226B2 (en) * 2019-03-27 2023-07-04 株式会社ジェイテクト SUPPORT DEVICE AND SUPPORT METHOD FOR GRINDER
JP7349423B2 (en) * 2019-06-19 2023-09-22 株式会社Preferred Networks Learning device, learning method, learning model, detection device and grasping system
JP2021013996A (en) * 2019-07-12 2021-02-12 キヤノン株式会社 Control method of robot system, manufacturing method of articles, control program, recording medium, and robot system
JP7415356B2 (en) * 2019-07-29 2024-01-17 セイコーエプソン株式会社 Program transfer system and robot system
WO2021039995A1 (en) 2019-08-28 2021-03-04 株式会社DailyColor Robot control device
JP7021158B2 (en) 2019-09-04 2022-02-16 株式会社東芝 Robot system and drive method
JP6924448B2 (en) * 2019-12-02 2021-08-25 Arithmer株式会社 Picking system, picking method, and program
JP7463777B2 (en) 2020-03-13 2024-04-09 オムロン株式会社 CONTROL DEVICE, LEARNING DEVICE, ROBOT SYSTEM, AND METHOD
US20230158667A1 (en) 2020-04-28 2023-05-25 Yamaha Hatsudoki Kabushiki Kaisha Machine learning method and robot system
JP2023145809A (en) * 2020-07-10 2023-10-12 株式会社Preferred Networks Reinforcement learning device, reinforcement learning system, object operation device, model generation method and reinforcement learning program
EP4260994A1 (en) 2020-12-08 2023-10-18 Sony Group Corporation Training device, training system, and training method
DE102021104001B3 (en) 2021-02-19 2022-04-28 Gerhard Schubert Gesellschaft mit beschränkter Haftung Method for automatically grasping, in particular moving, objects
KR102346900B1 (en) 2021-08-05 2022-01-04 주식회사 애자일소다 Deep reinforcement learning apparatus and method for pick and place system
DE102021209646B4 (en) 2021-09-02 2024-05-02 Robert Bosch Gesellschaft mit beschränkter Haftung Robot device, method for computer-implemented training of a robot control model and method for controlling a robot device
WO2023042306A1 (en) * 2021-09-15 2023-03-23 ヤマハ発動機株式会社 Image processing device, component gripping system, image processing method, and component gripping method
EP4311632A1 (en) * 2022-07-27 2024-01-31 Siemens Aktiengesellschaft Method for gripping an object, computer program and electronically readable data carrier
CN117697769B (en) * 2024-02-06 2024-04-30 成都威世通智能科技有限公司 Robot control system and method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1380846A (en) * 2000-03-31 2002-11-20 索尼公司 Robot device, robot device action control method, external force detecting device and method
JP2005103681A (en) * 2003-09-29 2005-04-21 Fanuc Ltd Robot system
CN101051215A (en) * 2006-04-06 2007-10-10 索尼株式会社 Learning apparatus, learning method, and program
US20090105881A1 (en) * 2002-07-25 2009-04-23 Intouch Technologies, Inc. Medical Tele-Robotic System
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
CN103753557A (en) * 2014-02-14 2014-04-30 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system
JP2014206795A (en) * 2013-04-11 2014-10-30 日本電信電話株式会社 Reinforcement learning method based on linear model, device therefor and program

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0588721A (en) * 1991-09-30 1993-04-09 Fujitsu Ltd Controller for articulated robot
JPH06106490A (en) * 1992-09-29 1994-04-19 Fujitsu Ltd Control device
JPH06203166A (en) * 1993-01-06 1994-07-22 Fujitsu Ltd Measurement, controller and learning method for multi-dimensional position
JP3211186B2 (en) * 1997-12-15 2001-09-25 オムロン株式会社 Robot, robot system, robot learning method, robot system learning method, and recording medium
JPH11272845A (en) * 1998-03-23 1999-10-08 Denso Corp Image recognition device
JP3859371B2 (en) * 1998-09-25 2006-12-20 松下電工株式会社 Picking equipment
JP2001019165A (en) 1999-07-02 2001-01-23 Murata Mach Ltd Work picking device
JP4630553B2 (en) * 2004-01-15 2011-02-09 ソニー株式会社 Dynamic control device and biped walking mobile body using dynamic control device
JP2005238422A (en) * 2004-02-27 2005-09-08 Sony Corp Robot device, its state transition model construction method and behavior control method
JP4746349B2 (en) * 2005-05-18 2011-08-10 日本電信電話株式会社 Robot action selection device and robot action selection method
JP4153528B2 (en) * 2006-03-10 2008-09-24 ファナック株式会社 Apparatus, program, recording medium and method for robot simulation
JP4199264B2 (en) * 2006-05-29 2008-12-17 ファナック株式会社 Work picking apparatus and method
JP4238256B2 (en) * 2006-06-06 2009-03-18 ファナック株式会社 Robot simulation device
US7957583B2 (en) * 2007-08-02 2011-06-07 Roboticvisiontech Llc System and method of three-dimensional pose estimation
JP2009262279A (en) * 2008-04-25 2009-11-12 Nec Corp Robot, robot program sharing system, robot program sharing method, and program
JP2010086405A (en) 2008-10-01 2010-04-15 Fuji Heavy Ind Ltd System for adapting control parameter
JP5330138B2 (en) * 2008-11-04 2013-10-30 本田技研工業株式会社 Reinforcement learning system
EP2249292A1 (en) * 2009-04-03 2010-11-10 Siemens Aktiengesellschaft Decision making mechanism, method, module, and robot configured to decide on at least one prospective action of the robot
CN101726251A (en) * 2009-11-13 2010-06-09 江苏大学 Automatic fruit identification method of apple picking robot on basis of support vector machine
CN101782976B (en) * 2010-01-15 2013-04-10 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
FI20105732A0 (en) * 2010-06-24 2010-06-24 Zenrobotics Oy Procedure for selecting physical objects in a robotic system
JP5743499B2 (en) * 2010-11-10 2015-07-01 キヤノン株式会社 Image generating apparatus, image generating method, and program
JP5767464B2 (en) * 2010-12-15 2015-08-19 キヤノン株式会社 Information processing apparatus, information processing apparatus control method, and program
JP5750657B2 (en) * 2011-03-30 2015-07-22 株式会社国際電気通信基礎技術研究所 Reinforcement learning device, control device, and reinforcement learning method
JP5787642B2 (en) 2011-06-28 2015-09-30 キヤノン株式会社 Object holding device, method for controlling object holding device, and program
JP5642738B2 (en) * 2012-07-26 2014-12-17 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP5670397B2 (en) * 2012-08-29 2015-02-18 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP2014081863A (en) * 2012-10-18 2014-05-08 Sony Corp Information processing device, information processing method and program
JP6126437B2 (en) 2013-03-29 2017-05-10 キヤノン株式会社 Image processing apparatus and image processing method
JP5929854B2 (en) * 2013-07-31 2016-06-08 株式会社安川電機 Robot system and method of manufacturing workpiece
CN104793620B (en) * 2015-04-17 2019-06-18 中国矿业大学 The avoidance robot of view-based access control model feature binding and intensified learning theory
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1380846A (en) * 2000-03-31 2002-11-20 索尼公司 Robot device, robot device action control method, external force detecting device and method
US20090105881A1 (en) * 2002-07-25 2009-04-23 Intouch Technologies, Inc. Medical Tele-Robotic System
JP2005103681A (en) * 2003-09-29 2005-04-21 Fanuc Ltd Robot system
CN101051215A (en) * 2006-04-06 2007-10-10 索尼株式会社 Learning apparatus, learning method, and program
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
JP2014206795A (en) * 2013-04-11 2014-10-30 日本電信電話株式会社 Reinforcement learning method based on linear model, device therefor and program
CN103753557A (en) * 2014-02-14 2014-04-30 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林芬等: "基于偏向信息学习的双层强化学习算法", 《计算机研究与发展》 *

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108687766A (en) * 2017-03-31 2018-10-23 发那科株式会社 Control device, machine learning device and the machine learning method of robot
CN108789499B (en) * 2017-04-28 2019-12-31 发那科株式会社 Article retrieval system
US10518417B2 (en) 2017-04-28 2019-12-31 Fanuc Corporation Article retrieval system
CN108789499A (en) * 2017-04-28 2018-11-13 发那科株式会社 Article extraction system
CN108942916A (en) * 2017-05-19 2018-12-07 发那科株式会社 Workpiece extraction system
CN108942916B (en) * 2017-05-19 2019-08-23 发那科株式会社 Workpiece extraction system
CN109002012A (en) * 2017-06-07 2018-12-14 发那科株式会社 control device and machine learning device
CN109002012B (en) * 2017-06-07 2020-05-29 发那科株式会社 Control device and machine learning device
CN111194452B (en) * 2017-06-09 2023-10-10 川崎重工业株式会社 Motion prediction system and motion prediction method
CN111194452A (en) * 2017-06-09 2020-05-22 川崎重工业株式会社 Motion prediction system and motion prediction method
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
US11554483B2 (en) 2017-06-19 2023-01-17 Google Llc Robotic grasping prediction using neural networks and geometry aware object representation
CN110691676A (en) * 2017-06-19 2020-01-14 谷歌有限责任公司 Robot crawling prediction using neural networks and geometrically-aware object representations
CN107255969B (en) * 2017-06-28 2019-10-18 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107329445A (en) * 2017-06-28 2017-11-07 重庆柚瓣家科技有限公司 The method of robot behavior criterion intelligent supervision
CN107255969A (en) * 2017-06-28 2017-10-17 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107252785A (en) * 2017-06-29 2017-10-17 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
CN109202394A (en) * 2017-07-07 2019-01-15 发那科株式会社 Assembly supply device and machine learning device
CN109202394B (en) * 2017-07-07 2020-10-30 发那科株式会社 Component supply device and machine learning device
CN109420859B (en) * 2017-08-28 2021-11-26 发那科株式会社 Machine learning device, machine learning system, and machine learning method
CN109420859A (en) * 2017-08-28 2019-03-05 发那科株式会社 Machine learning device, machine learning system and machine learning method
CN109500809A (en) * 2017-09-15 2019-03-22 西门子股份公司 Optimization to the automation process by Robot Selection and crawl object
CN109551459A (en) * 2017-09-25 2019-04-02 发那科株式会社 Robot system and method for taking out work
CN109814615B (en) * 2017-11-22 2021-03-02 发那科株式会社 Control device and machine learning device
CN109814615A (en) * 2017-11-22 2019-05-28 发那科株式会社 Control device and machine learning device
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
CN110091084A (en) * 2018-01-30 2019-08-06 发那科株式会社 Learn the machine learning device of the failure mechanism of laser aid
CN110125955A (en) * 2018-02-09 2019-08-16 发那科株式会社 Control device and machine learning device
CN110125955B (en) * 2018-02-09 2021-09-24 发那科株式会社 Control device and machine learning device
CN110174875A (en) * 2018-02-19 2019-08-27 欧姆龙株式会社 Simulator, analogy method and storage medium
CN110303473A (en) * 2018-03-20 2019-10-08 发那科株式会社 Use the object picking device and article removing method of sensor and robot
CN110303473B (en) * 2018-03-20 2022-10-18 发那科株式会社 Article pickup apparatus and article pickup method using sensor and robot
CN110315505A (en) * 2018-03-29 2019-10-11 发那科株式会社 Machine learning device and method, robot controller, robotic vision system
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
CN112203811A (en) * 2018-05-25 2021-01-08 川崎重工业株式会社 Robot system and robot control method
CN112203811B (en) * 2018-05-25 2023-05-09 川崎重工业株式会社 Robot system and robot control method
CN112135719B (en) * 2018-06-14 2023-08-22 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112135719A (en) * 2018-06-14 2020-12-25 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112218748A (en) * 2018-06-14 2021-01-12 雅马哈发动机株式会社 Robot system
CN112218748B (en) * 2018-06-14 2023-09-05 雅马哈发动机株式会社 robot system
CN110712194A (en) * 2018-07-13 2020-01-21 发那科株式会社 Object inspection device, object inspection system, and method for adjusting inspection position
CN109434844A (en) * 2018-09-17 2019-03-08 鲁班嫡系机器人(深圳)有限公司 Food materials handling machine people control method, device, system, storage medium and equipment
CN112512757A (en) * 2018-11-09 2021-03-16 欧姆龙株式会社 Robot control device, simulation method, and simulation program
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
CN113412177A (en) * 2018-12-27 2021-09-17 川崎重工业株式会社 Robot control device, robot system, and robot control method
CN110456644B (en) * 2019-08-13 2022-12-06 北京地平线机器人技术研发有限公司 Method and device for determining execution action information of automation equipment and electronic equipment
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN112757284A (en) * 2019-10-21 2021-05-07 佳能株式会社 Robot control apparatus, method and storage medium
CN112757284B (en) * 2019-10-21 2024-03-22 佳能株式会社 Robot control device, method, and storage medium
CN114786888A (en) * 2020-01-16 2022-07-22 欧姆龙株式会社 Control device, control method, and control program
WO2022100363A1 (en) * 2020-11-13 2022-05-19 腾讯科技(深圳)有限公司 Robot control method, apparatus and device, and storage medium and program product
CN115816466A (en) * 2023-02-02 2023-03-21 中国科学技术大学 Method for improving control stability of visual observation robot

Also Published As

Publication number Publication date
JP2017064910A (en) 2017-04-06
JP2022145915A (en) 2022-10-04
JP2020168719A (en) 2020-10-15
JP2017030135A (en) 2017-02-09
JP7100426B2 (en) 2022-07-13
CN106393102B (en) 2021-06-01
CN113199483A (en) 2021-08-03
JP6522488B2 (en) 2019-05-29
DE102016015873B3 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
CN106393102A (en) Machine learning device, robot system, and machine learning method
US11780095B2 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
Schwarz et al. Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing
CN106393101B (en) Rote learning device and method, robot controller, robot system
Xu et al. Densephysnet: Learning dense physical object representations via multi-step dynamic interactions
CN105082132B (en) Fast machine people's learning by imitation of power moment of torsion task
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
CN107825422A (en) Rote learning device, robot system and learning by rote
CN109685141B (en) Robot article sorting visual detection method based on deep neural network
Toussaint et al. Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference
CN107914270A (en) control device, robot system and production system
CN106557069A (en) Rote learning apparatus and method and the lathe with the rote learning device
CN107866809A (en) Learn the machine learning device and machine learning method in optimal Article gripping path
CN109421071A (en) Article stacking adapter and machine learning device
CN109531584A (en) A kind of Mechanical arm control method and device based on deep learning
CN114299150A (en) Depth 6D pose estimation network model and workpiece pose estimation method
CN105426901A (en) Method For Classifying A Known Object In A Field Of View Of A Camera
Dyrstad et al. Teaching a robot to grasp real fish by imitation learning from a human supervisor in virtual reality
Zhang et al. Deep learning reactive robotic grasping with a versatile vacuum gripper
Yang et al. Automation of SME production with a Cobot system powered by learning-based vision
CN115319739A (en) Workpiece grabbing method based on visual mechanical arm
WO2023014369A1 (en) Synthetic dataset creation for object detection and classification with deep learning
RU2745380C1 (en) Method and system for capturing objects using robotic device
Ojiro et al. A study of smart factory with artificial intelligence
Bowkett Functional Autonomy Techniques for Manipulation in Uncertain Environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant