CN106393102A - Machine learning device, robot system, and machine learning method - Google Patents

Machine learning device, robot system, and machine learning method

Info

Publication number
CN106393102A
CN106393102A (Application CN201610617361.XA)
Authority
CN
China
Prior art keywords
robot
workpiece
action
state
taking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610617361.XA
Other languages
Chinese (zh)
Other versions
CN106393102B (en)
Inventor
山崎岳
尾山拓未
陶山峻
中山一隆
组谷英俊
中川浩
冈野原大辅
奥田辽介
松元睿一
河合圭悟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Preferred Network Co
Fanuc Corp
Original Assignee
Preferred Network Co
Fanuc Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Preferred Network Co, Fanuc Corp
Priority to CN202110544521.3A (published as CN113199483A)
Publication of CN106393102A
Application granted
Publication of CN106393102B
Legal status: Active
Anticipated expiration


Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00: Programme-controlled manipulators
    • B25J 9/16: Programme controls
    • B25J 9/1679: Programme controls characterised by the tasks executed
    • B25J 9/1687: Assembly, peg and hole, palletising, straight line, weaving pattern movement
    • B25J 9/1628: Programme controls characterised by the control loop
    • B25J 9/163: Programme controls characterised by the control loop; learning, adaptive, model based, rule based expert control
    • B25J 9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J 9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • G: PHYSICS
    • G05: CONTROLLING; REGULATING
    • G05B: CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 2219/00: Program-control systems
    • G05B 2219/30: Nc systems
    • G05B 2219/39: Robotics, robotics to robotics hand
    • G05B 2219/39297: First learn inverse model, then fine tune with ffw error learning
    • G05B 2219/40: Robotics, robotics mapping to robotics vision
    • G05B 2219/40053: Pick 3-D object from pile of objects

Abstract

The invention provides a machine learning device, a robot system, and a machine learning method. The machine learning device that learns an operation of a robot (14) for picking up, by a hand unit, any of a plurality of workpieces (12) placed in a random fashion, including a bulk-loaded state, includes a state variable observation unit (21) that observes a state variable representing a state of the robot, including data output from a three-dimensional measuring device (15) that obtains a three-dimensional map for each workpiece, an operation result obtaining unit (26) that obtains a result of a picking operation of the robot for picking up the workpiece by the hand unit, and a learning unit (22) that learns a manipulated variable including command data for commanding the robot to perform the picking operation of the workpiece, in association with the state variable of the robot and the result of the picking operation, upon receiving output from the state variable observation unit and output from the operation result obtaining unit.

Description

Machine learning device, robot system, and machine learning method
Technical field
The present invention relates to a machine learning device, a robot system, and a machine learning method for learning an operation of picking up workpieces placed in a random fashion, including a bulk-loaded state.
Background art
Robot systems that hold workpieces loaded in bulk in a container with a hand unit of a robot and convey them are conventionally known, as disclosed, for example, in Japanese Patent Nos. 5642738 and 5670397. In such a robot system, position information of a plurality of workpieces is obtained, for example, by a three-dimensional measuring device installed above the container, and the workpieces are picked up one by one by the hand unit of the robot based on the position information.
In such a conventional robot system, however, it is necessary to set in advance, for example, how to extract the workpiece to be picked up, and at which position to pick it up, from a range image of the plurality of workpieces measured by the three-dimensional measuring device. It is also necessary to program in advance how the hand unit of the robot is to operate when picking up the workpiece. Specifically, for example, a human operator needs to teach the robot the pick-up operation of the workpiece using a teaching pendant.
Therefore, when the setting for extracting the workpiece to be picked up from the range image of the plurality of workpieces is inappropriate, or when the operation program of the robot is not suitable, the success rate with which the robot picks up and conveys the workpiece decreases. To improve the success rate, a human operator has to repeat trial and error to find the optimum operation of the robot while updating the workpiece detection settings and the operation program of the robot.
Summary of the invention
In view of the above circumstances, an object of the present invention is to provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimum operation of a robot for picking up workpieces placed in a random fashion, including a bulk-loaded state.
According to a first embodiment of the present invention, there is provided a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation. The machine learning device preferably further includes a decision-making unit that determines the command data to be given to the robot by referring to the manipulated variable learned by the learning unit.
According to a second embodiment of the present invention, there is provided a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device, in association with the state variable of the robot and the result of the pick-up operation. The machine learning device preferably further includes a decision-making unit that determines the measurement parameter of the three-dimensional measuring device by referring to the manipulated variable learned by the learning unit.
The state variable observation unit may also observe a state variable of the robot that includes data output from a coordinate calculation unit, which calculates a three-dimensional position of each workpiece based on the output of the three-dimensional measuring device. The coordinate calculation unit may further calculate an orientation of each workpiece and output data of the calculated three-dimensional position and orientation of each workpiece. The operation result obtaining unit may use the output data of the three-dimensional measuring device. The machine learning device preferably further includes a preprocessing unit that processes the output data of the three-dimensional measuring device before it is input to the state variable observation unit, and the state variable observation unit receives the output data of the preprocessing unit as the state variable of the robot. The preprocessing unit may make the direction and height of each workpiece in the output data of the three-dimensional measuring device uniform. The operation result obtaining unit may obtain at least one of: whether the pick-up of the workpiece has succeeded, a damage state of the workpiece, and an achievement level attained when the picked-up workpiece is transferred to a post-process.
The learning unit may include: a reward calculation unit that calculates a reward based on the output of the operation result obtaining unit; and a value function update unit that has a value function describing the value of the pick-up operation of the workpiece and updates the value function in accordance with the reward. Alternatively, the learning unit may have a learning model for learning the pick-up operation of the workpiece, and may include: an error calculation unit that calculates an error based on the output of the operation result obtaining unit and the output of the learning model; and a learning model update unit that updates the learning model in accordance with the error. The machine learning device preferably has a neural network.
According to a third embodiment of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation. The robot system includes the robot, the three-dimensional measuring device, and a control device that controls the robot and the three-dimensional measuring device.
According to a fourth embodiment of the present invention, there is provided a robot system including a machine learning device that learns an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning device includes: a state variable observation unit that observes a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; an operation result obtaining unit that obtains a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and a learning unit that, upon receiving output from the state variable observation unit and output from the operation result obtaining unit, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device, in association with the state variable of the robot and the result of the pick-up operation. The robot system includes the robot, the three-dimensional measuring device, and a control device that controls the robot and the three-dimensional measuring device.
Preferably, the robot system includes a plurality of the robots, the machine learning device is provided for each robot, and the plurality of machine learning devices provided for the plurality of robots share or exchange data with one another via a communication medium. The machine learning device may reside on a cloud server.
According to a fifth embodiment of the present invention, there is provided a machine learning method for learning an operation of a robot for picking up, by a hand unit, any of a plurality of workpieces placed in a random fashion, including a bulk-loaded state. The machine learning method includes: observing a state variable of the robot, including data output from a three-dimensional measuring device that obtains a three-dimensional map for each workpiece; obtaining a result of a pick-up operation of the robot for picking up the workpiece by the hand unit; and, upon receiving the observed state variable and the obtained result of the pick-up operation, learning a manipulated variable including command data for commanding the robot to perform the pick-up operation of the workpiece, in association with the state variable of the robot and the result of the pick-up operation.
Brief description of the drawings
The present invention will be understood more clearly by referring to the following accompanying drawings.
Fig. 1 is a block diagram illustrating the conceptual configuration of a robot system according to an embodiment of the present invention.
Fig. 2 is a diagram schematically illustrating a model of a neuron.
Fig. 3 is a diagram schematically illustrating a three-layer neural network formed by combining the neurons illustrated in Fig. 2.
Fig. 4 is a flowchart illustrating an example of the operation of the machine learning device illustrated in Fig. 1.
Fig. 5 is a block diagram illustrating the conceptual configuration of a robot system according to another embodiment of the present invention.
Fig. 6 is a diagram for explaining an example of processing by the preprocessing unit in the robot system illustrated in Fig. 5.
Fig. 7 is a block diagram illustrating a modification of the robot system illustrated in Fig. 1.
Detailed description of embodiments
Embodiments of a machine learning device, a robot system, and a machine learning method according to the present invention will be described in detail below with reference to the accompanying drawings. It should be understood, however, that the present invention is not limited to the drawings or the embodiments described below. In the drawings, the same members are denoted by the same reference numerals, and members denoted by the same reference numerals in different drawings have the same functions. For ease of understanding, the scales of the drawings have been changed as appropriate.
Fig. 1 is a block diagram illustrating the conceptual configuration of a robot system according to an embodiment of the present invention. The robot system 10 of the present embodiment includes: a robot 14 equipped with a hand unit 13 for holding workpieces 12 loaded in bulk in a container 11; a three-dimensional measuring device 15 that measures a three-dimensional map of the surfaces of the workpieces 12; a control device 16 that controls the robot 14 and the three-dimensional measuring device 15; a coordinate calculation unit 19; and a machine learning device 20.
The machine learning device 20 includes a state variable observation unit 21, an operation result obtaining unit 26, a learning unit 22, and a decision-making unit 25. As will be described in detail later, the machine learning device 20 learns and outputs manipulated variables such as command data for commanding the robot 14 to perform the pick-up operation of a workpiece 12, or measurement parameters of the three-dimensional measuring device 15.
The robot 14 is, for example, a six-axis articulated robot, and the drive shafts of the robot 14 and of the hand unit 13 are controlled by the control device 16. The robot 14 is used to pick up the workpieces 12 one by one from the container 11 placed at a predetermined position and move them, in sequence, to a designated place such as a conveyor or a work table (not illustrated).
When a workpiece 12 is picked up from the bulk in the container 11, however, the hand unit 13 or the workpiece 12 may collide with or contact a wall of the container 11, or the hand unit 13 or the workpiece 12 may be caught by another workpiece 12. In such cases, a function for detecting the force acting on the hand unit 13 is needed so that an excessive load on the robot 14 can be avoided immediately. For this purpose, a six-axis force sensor 17 is provided between the tip of the arm of the robot 14 and the hand unit 13. The robot system 10 of the present embodiment also has a function of estimating the force acting on the hand unit 13 from the current values of the motors (not illustrated) that drive the joint shafts of the robot 14.
Since the force sensor 17 can detect the force acting on the hand unit 13, it can also be used to judge whether the hand unit 13 is actually holding a workpiece 12. That is, when the hand unit 13 holds a workpiece 12, the weight of the workpiece 12 acts on the hand unit 13; therefore, if the detection value of the force sensor 17 exceeds a predetermined threshold after the pick-up operation of the workpiece 12 has been performed, it can be judged that the hand unit 13 is holding the workpiece 12. Whether the hand unit 13 is holding a workpiece 12 may also be judged, for example, from image data captured by a camera used in the three-dimensional measuring device 15, or from the output of a photoelectric sensor (not illustrated) attached to the hand unit 13. It may also be judged from the pressure gauge data of a suction-type hand described later.
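As an illustration only, the holding checks described above could be combined as in the following sketch; the function name, argument names, and threshold values are assumptions for illustration and are not specified in the patent.

```python
from typing import Optional

def is_holding_workpiece(force_n: float,
                         photo_sensor_blocked: Optional[bool] = None,
                         suction_pressure_kpa: Optional[float] = None,
                         force_threshold_n: float = 2.0,
                         pressure_threshold_kpa: float = -20.0) -> bool:
    """Judge holding of a workpiece 12 by the hand unit 13 after the pick-up operation:
    the force sensor 17 reading, optionally a photoelectric sensor on the hand,
    and optionally the pressure gauge of a suction-type hand (illustrative sketch)."""
    if force_n > force_threshold_n:          # workpiece weight acting on the hand
        return True
    if photo_sensor_blocked:                 # beam interrupted by the held workpiece
        return True
    if suction_pressure_kpa is not None and suction_pressure_kpa < pressure_threshold_kpa:
        return True                          # vacuum established in a suction-type hand
    return False
```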
The hand unit 13 may take various forms as long as it can hold a workpiece 12. For example, the hand unit 13 may hold a workpiece 12 by opening and closing two or more claws, or may include an electromagnet or a suction generator that attracts the workpiece 12. Although Fig. 1 depicts a hand unit 13 that holds a workpiece with two claws, the hand unit is not limited to this form.
The three-dimensional measuring device 15 is installed, by a support unit 18, at a predetermined position above the plurality of workpieces 12 in order to measure them. As the three-dimensional measuring device 15, for example, a three-dimensional vision sensor can be used that obtains three-dimensional position information by performing image processing on image data of the workpieces 12 captured by two cameras (not illustrated). Specifically, the three-dimensional map (the surface positions of the plurality of workpieces 12 loaded in bulk) is measured by applying triangulation, the light-section method, the time-of-flight method, the depth-from-defocus method, or a combination of these methods.
The coordinate calculation unit 19 receives the three-dimensional map obtained by the three-dimensional measuring device 15 as input and calculates (measures) the surface positions of the plurality of workpieces 12 loaded in bulk. That is, using the output of the three-dimensional measuring device 15, three-dimensional position data (x, y, z) of each workpiece 12, or three-dimensional position data (x, y, z) and orientation data (w, p, r), can be obtained. In the present embodiment, the state variable observation unit 21 receives both the three-dimensional map from the three-dimensional measuring device 15 and the position data (orientation data) from the coordinate calculation unit 19 to observe the state variable of the robot 14; however, it may also observe the state variable of the robot 14 by receiving only the three-dimensional map from the three-dimensional measuring device 15, for example. Alternatively, as in the case described later with reference to Fig. 5, a preprocessing unit 50 may be added so that the three-dimensional map from the three-dimensional measuring device 15 is processed (preprocessed) by the preprocessing unit 50 before being input to the state variable observation unit 21.
It is assumed that the relative positions of the robot 14 and the three-dimensional measuring device 15 have been determined in advance by calibration. Furthermore, a laser range finder may be used as the three-dimensional measuring device 15 of the present invention instead of a three-dimensional vision sensor. That is, the distance from the installation position of the three-dimensional measuring device 15 to the surface of each workpiece 12 may be measured by laser scanning, or the three-dimensional position and orientation data (x, y, z, w, p, r) of the plurality of workpieces 12 loaded in bulk may be obtained using various sensors such as a monocular camera or a tactile sensor.
In other words, any three-dimensional measuring device 15 can be used in the present invention, regardless of which three-dimensional measurement method it applies, as long as the data (x, y, z, w, p, r) of each workpiece 12 can be obtained. The manner in which the three-dimensional measuring device 15 is installed is also not particularly limited; for example, it may be fixed to a floor, a wall, or the like, or attached to the arm of the robot 14.
In response to a command from the control device 16, the three-dimensional measuring device 15 obtains the three-dimensional map of the plurality of workpieces 12 loaded in bulk in the container 11, the coordinate calculation unit 19 obtains (calculates) the three-dimensional position (orientation) data of the plurality of workpieces 12 from the three-dimensional map, and this data is output to the control device 16 and to the state variable observation unit 21 and the operation result obtaining unit 26 of the machine learning device 20 described later. In particular, the coordinate calculation unit 19 estimates, for example, the boundary between one workpiece 12 and another workpiece 12 and the boundary between a workpiece 12 and the container 11 from the captured image data of the plurality of workpieces 12, and obtains the three-dimensional position data of each workpiece 12.
The three-dimensional position data of each workpiece 12 is, for example, data obtained by estimating, from the positions of a plurality of points on the surfaces of the plurality of workpieces 12 loaded in bulk, the position at which each workpiece 12 exists and at which it can be held. The three-dimensional position data of each workpiece 12 may, of course, also include orientation data of the workpiece 12.
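Purely as an illustration of the per-workpiece data described above, it could be represented by a small data structure such as the following; the class name, field names, and example values are hypothetical and not part of the patent.

```python
from dataclasses import dataclass

@dataclass
class WorkpiecePose:
    """Estimated position (and optionally orientation) of one workpiece 12
    derived from the three-dimensional map."""
    x: float  # three-dimensional position
    y: float
    z: float
    w: float = 0.0  # orientation angles (optional)
    p: float = 0.0
    r: float = 0.0

# Example: candidate pick-up poses estimated by the coordinate calculation unit 19
candidates = [WorkpiecePose(x=102.5, y=-34.0, z=87.2, w=0.0, p=90.0, r=15.0)]
```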
The coordinate calculation unit 19 may also obtain the three-dimensional position and orientation data of each workpiece 12 using machine learning. For example, object recognition, angle estimation, or the like using input images with a method such as the supervised learning described later, or using a laser range finder or the like, can be applied.
When the three-dimensional position data of each workpiece 12 is input from the three-dimensional measuring device 15 to the control device 16 via the coordinate calculation unit 19, the control device 16 controls the operation of the hand unit 13 for picking up a certain workpiece 12 from the container 11. At this time, the motors (not illustrated) of the axes of the hand unit 13 and the robot 14 are driven according to command values (manipulated variables) corresponding to the optimum position, orientation, and pick-up direction of the hand unit 13 obtained by the machine learning device 20 described later.
The machine learning device 20 can also learn variables of the imaging conditions of the cameras used in the three-dimensional measuring device 15 (measurement parameters of the three-dimensional measuring device 15, for example, the exposure time adjusted with an exposure meter at the time of imaging and the illuminance of an illumination device that illuminates the object to be imaged), and can control the three-dimensional measuring device 15 via the control device 16 according to the manipulated variable of the learned measurement parameters. Variables of the conditions used for estimating, from the positions of the plurality of workpieces 12 measured by the three-dimensional measuring device 15, the position and orientation at which each workpiece 12 exists and the position and orientation at which it can be held, may also be included in the output data of the three-dimensional measuring device 15 described above.
As described above, the output data of the three-dimensional measuring device 15 may be processed in advance by the preprocessing unit 50 or the like described in detail later with reference to Fig. 5, and the processed data (image data) may be given to the state variable observation unit 21. The operation result obtaining unit 26 can obtain the result of the pick-up of a workpiece 12 by the hand unit 13 of the robot 14, for example, from the output data of the three-dimensional measuring device 15 (the output data of the coordinate calculation unit 19); in addition, it may of course also obtain, via other units (for example, a camera or a sensor installed in a post-process), operation results such as the achievement level attained when the picked-up workpiece 12 is transferred to a post-process and state changes such as whether the picked-up workpiece 12 is damaged. The state variable observation unit 21 and the operation result obtaining unit 26 are functional modules, and both functions may also be implemented by a single module.
The machine learning device 20 illustrated in Fig. 1 will now be described in detail. The machine learning device 20 has the function of extracting, by analysis, useful rules, knowledge representations, criteria for judgment, and the like from the set of data input to the device, outputting the judgment results, and learning the knowledge (machine learning). There are various machine learning methods, which are roughly classified, for example, into "supervised learning", "unsupervised learning", and "reinforcement learning". There is also a method called "deep learning", in which the extraction of feature quantities themselves is learned in order to implement these methods. A general-purpose computer or processor may be used for such machine learning (the machine learning device 20), but when GPGPU (General-Purpose computing on Graphics Processing Units), a large-scale PC cluster, or the like is applied, processing can be performed at higher speed.
First, supervised learning is a method in which a large number of pairs of inputs and results (labels) are given to the machine learning device 20, features present in these data sets are learned, and a model for estimating the result from the input, i.e., the relationship between them, is obtained inductively. When supervised learning is applied to the present embodiment, it can be used, for example, in the part that estimates a workpiece position from sensor input, or in the part that estimates the probability of success for a workpiece candidate. It can be realized, for example, using an algorithm such as the neural network described later.
Unsupervised learning is a method in which only a large amount of input data is given to the learning device, which learns how the input data is distributed and learns, for example, to compress, classify, and shape the input data without being given corresponding supervised output data. For example, features present in these data sets can be clustered into groups of similar items. Using this result, a certain criterion is set and output allocation is performed so as to optimize it, whereby prediction of the output can be realized.
There is also a problem setting called semi-supervised learning, which is intermediate between supervised learning and unsupervised learning and corresponds, for example, to the case where only part of the data consists of input-output pairs while the rest is input-only data. In the present embodiment, data that can be obtained without actually operating the robot (image data, simulation data, and the like) can be used in unsupervised learning, so that learning can be performed efficiently.
Next, reinforcement learning will be described. First, consider the following as a problem setting of reinforcement learning.
The robot observes the state of the environment and decides its action.
The environment changes according to some rule, and the robot's own action may also change the environment.
A reward signal is returned each time an action is taken.
What is to be maximized is the total (discounted) reward to be obtained in the future.
Learning starts from a state in which the result caused by an action is completely unknown or only incompletely known. That is, the robot can obtain the result as data only after it actually acts. In other words, the optimum action has to be searched for by trial and error.
Learning can also be started from a good starting point by setting, as the initial state, a state learned in advance (by a method such as the above-described supervised learning or inverse reinforcement learning) so as to imitate human actions.
Here, reinforcement learning is a method of learning not only judgment and classification but also actions, thereby learning an appropriate action based on the interaction of the action with the environment, i.e., learning so as to maximize the reward to be obtained in the future. This indicates that, in the present embodiment, actions that affect the future can be acquired, such as an action of collapsing a pile of workpieces 12 so that a workpiece 12 can be picked up easily later. The description below takes Q-learning as an example, but the method is not limited to Q-learning.
Q-learning is a method of learning a value Q(s, a) of selecting an action a under a certain environment state s. That is, in a certain state s, the action a with the highest value Q(s, a) is selected as the optimum action. At first, however, the correct value of Q(s, a) is completely unknown for a combination of state s and action a. The agent (the subject of the action) therefore selects various actions a under a certain state s and is given a reward for the action a at that time. In this way, the agent keeps learning to select a better action, i.e., learns the correct value Q(s, a).
Furthermore, since the total of the rewards obtained in the future as a result of actions is to be maximized, the final goal is to achieve Q(s, a) = E[Σ(γ^t)r_t]. Here, E[ ] denotes the expected value, t denotes time, γ denotes a parameter called the discount rate described later, r_t denotes the reward at time t, and Σ denotes the sum over time t. The expected value in this expression is the value taken when the state changes according to the optimum action; since this is unknown, learning is performed while searching. An update formula of such a value Q(s, a) can be represented, for example, by the following formula (1):
Q(s_t, a_t) <- Q(s_t, a_t) + α( r_(t+1) + γ max_a Q(s_(t+1), a) - Q(s_t, a_t) )   (1)
In formula (1) above, s_t denotes the state of the environment at time t, and a_t denotes the action at time t. The state changes to s_(t+1) as a result of action a_t. r_(t+1) denotes the reward obtained by this state change. The term with max is the Q value, multiplied by γ, obtained when the action a with the highest Q value known at that time is selected in state s_(t+1). γ is a parameter satisfying 0 < γ ≤ 1 and is called the discount rate. α is the learning coefficient and is set in the range 0 < α ≤ 1.
Formula (1) above represents a method of updating the evaluation value Q(s_t, a_t) of the action a_t in state s_t based on the reward r_(t+1) returned as a result of the trial a_t. It indicates that if the sum of the reward r_(t+1) and the evaluation value Q(s_(t+1), max a_(t+1)) of the best action in the state following action a is larger than the evaluation value Q(s_t, a_t) of the action a in state s, Q(s_t, a_t) is increased; conversely, if it is smaller, Q(s_t, a_t) is decreased. In other words, the value of a certain action in a certain state is brought closer to the sum of the reward immediately returned as a result of that action and the value of the best action in the state that follows it.
There are two methods of representing Q(s, a) on a computer: a method of holding the values of all state-action pairs (s, a) in the form of a table, and a method of preparing a function that approximates Q(s, a). In the latter method, formula (1) above can be realized by adjusting the parameters of the approximation function using a method such as stochastic gradient descent. The neural network described later can be used as the approximation function.
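The following is a minimal, illustrative sketch of the tabular form of the update in formula (1); the variable names, the epsilon-greedy selection, and the parameter values are assumptions for illustration and are not prescribed by the patent.

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # learning coefficient, discount rate, exploration rate
q_table = defaultdict(float)             # Q(s, a), initialized to 0 for every state-action pair

def select_action(state, actions):
    """Epsilon-greedy selection: mostly pick the action with the highest Q(s, a)."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: q_table[(state, a)])

def update_q(state, action, reward, next_state, actions):
    """Formula (1): Q(s_t,a_t) += alpha * (r_(t+1) + gamma * max_a Q(s_(t+1),a) - Q(s_t,a_t))."""
    best_next = max(q_table[(next_state, a)] for a in actions)
    td_error = reward + GAMMA * best_next - q_table[(state, action)]
    q_table[(state, action)] += ALPHA * td_error
```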
A neural network can be used as a learning model for supervised learning or unsupervised learning, or as an approximation algorithm for the value function in reinforcement learning. Fig. 2 is a diagram schematically illustrating a model of a neuron, and Fig. 3 is a diagram schematically illustrating a three-layer neural network formed by combining the neurons illustrated in Fig. 2. That is, a neural network is constituted, for example, of an arithmetic unit, a memory, and the like that imitate a model of neurons as illustrated in Fig. 2.
As shown in Fig. 2 neuron is directed to multiple input x (in Fig. 2, to input x1~input x3 as) output output (result) y.Each input x (x1, x2, x3) is made to be multiplied by weight w (w1, w2, w3) corresponding with this input x.Thus, neuron output Result y being showed by following formula (2).Additionally, input x, result y and weight w are entirely vector.Additionally, in following formula (2), θ It is biasing (bias), fkIt is activation primitive.
A three-layer neural network formed by combining the neurons illustrated in Fig. 2 will now be described with reference to Fig. 3. As illustrated in Fig. 3, a plurality of inputs x (here, inputs x1 to x3 as an example) are input from the left side of the neural network, and results y (here, results y1 to y3 as an example) are output from the right side. Specifically, the inputs x1, x2, and x3 are multiplied by the corresponding weights and input to each of three neurons N11 to N13. The weights by which these inputs are multiplied are collectively denoted by W1.
The neurons N11 to N13 output z11 to z13, respectively. In Fig. 3, z11 to z13 are collectively denoted by a feature vector Z1, which can be regarded as a vector obtained by extracting the feature quantities of the input vector. The feature vector Z1 is a feature vector between the weights W1 and the weights W2. z11 to z13 are multiplied by the corresponding weights and input to each of two neurons N21 and N22. The weights by which these feature vectors are multiplied are collectively denoted by W2.
The neurons N21 and N22 output z21 and z22, respectively. In Fig. 3, z21 and z22 are collectively denoted by a feature vector Z2. The feature vector Z2 is a feature vector between the weights W2 and the weights W3. z21 and z22 are multiplied by the corresponding weights and input to each of three neurons N31 to N33. The weights by which these feature vectors are multiplied are collectively denoted by W3.
Finally, the neurons N31 to N33 output the results y1 to y3, respectively. The operation of the neural network has a learning mode and a value prediction mode. For example, in the learning mode, the weights W are learned using a learning data set, and in the prediction mode, the action of the robot is judged using the learned parameters. Although the term "prediction" is used here for convenience, various tasks such as detection, classification, and inference are of course possible.
Here, data obtained by actually operating the robot in the prediction mode can be learned immediately and reflected in the next action (online learning), or collective learning can be performed using a data group collected in advance and the detection mode can subsequently be performed with the resulting parameters (batch learning). An intermediate approach is also possible, in which the learning mode is interposed each time a certain amount of data has accumulated.
The weights W1 to W3 can be learned by the error backpropagation method (backpropagation). The error information enters from the right side and flows to the left. The error backpropagation method is a method of adjusting (learning) the respective weights for each neuron so as to reduce the difference between the output y obtained when the input x is input and the true output y (teacher).
Such a neural network can have more than three layers (so-called deep learning). An arithmetic unit that performs feature extraction on the input in stages and regresses the result can be obtained automatically from teacher data alone.
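A minimal sketch of the forward pass of the three-layer network of Fig. 3 (formula (2) applied layer by layer) is shown below; the choice of tanh as the activation function f_k and the random initialization are assumptions made purely for illustration.

```python
import numpy as np

def forward(x, W1, W2, W3, theta1=0.0, theta2=0.0, theta3=0.0):
    """Forward pass of the three-layer network of Fig. 3."""
    f = np.tanh                      # activation function f_k (assumed for illustration)
    z1 = f(x @ W1 - theta1)          # feature vector Z1 (neurons N11 to N13)
    z2 = f(z1 @ W2 - theta2)         # feature vector Z2 (neurons N21 and N22)
    y = z2 @ W3 - theta3             # results y1 to y3 (neurons N31 to N33)
    return y

# Example with randomly initialized weights: 3 inputs -> 3 -> 2 -> 3 outputs
rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 3)), rng.normal(size=(3, 2)), rng.normal(size=(2, 3))
y = forward(np.array([0.2, -0.5, 1.0]), W1, W2, W3)
```

In the learning mode, the weights W1 to W3 would then be adjusted by backpropagation as described above.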
To perform the Q-learning described above, the machine learning device 20 of the present embodiment includes, as illustrated in Fig. 1, the state variable observation unit 21, the operation result obtaining unit 26, the learning unit 22, and the decision-making unit 25. However, as described above, the machine learning method applied in the present invention is not limited to Q-learning. That is, various methods that can be used in a machine learning device, such as "supervised learning", "unsupervised learning", "semi-supervised learning", and "reinforcement learning", can be applied. A general-purpose computer or processor may be used for such machine learning (the machine learning device 20), but when GPGPU, a large-scale PC cluster, or the like is applied, processing can be performed at higher speed.
That is, according to the present embodiment, there is provided a machine learning device that learns an operation of a robot 14 for picking up, by a hand unit 13, any of a plurality of workpieces 12 placed in a random fashion, including a bulk-loaded state, the machine learning device including: a state variable observation unit 21 that observes a state variable of the robot 14 including data output from a three-dimensional measuring device 15 that measures a three-dimensional position (x, y, z), or a three-dimensional position and orientation (x, y, z, w, p, r), of each workpiece 12; an operation result obtaining unit 26 that obtains a result of a pick-up operation of the robot 14 for picking up the workpiece 12 by the hand unit 13; and a learning unit 22 that, upon receiving output from the state variable observation unit 21 and output from the operation result obtaining unit 26, learns a manipulated variable including command data for commanding the robot 14 to perform the pick-up operation of the workpiece 12, in association with the state variable of the robot 14 and the result of the pick-up operation.
The state variable observed by the state variable observation unit 21 may include, for example, state variables respectively set for the position, orientation, and pick-up direction of the hand unit 13 when a certain workpiece 12 is picked up from the container 11. The manipulated variable to be learned may include, for example, command values such as the torque, speed, and rotational position given from the control device 16 to each drive shaft of the robot 14 and the hand unit 13 when a workpiece 12 is picked up from the container 11.
The learning unit 22 learns the above state variables in association with the result of the pick-up operation of a workpiece 12 (the output of the operation result obtaining unit 26) when one of the plurality of workpieces 12 loaded in bulk is picked up. That is, the control device 16 sets the output data of the three-dimensional measuring device 15 (the coordinate calculation unit 19) and the command data of the hand unit 13 either substantially at random or intentionally according to a predetermined rule, and the pick-up operation of a workpiece 12 is performed by the hand unit 13. An example of the predetermined rule is to pick up, in sequence, the workpieces that are higher in the height (z) direction among the plurality of workpieces 12 loaded in bulk. The output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 thus correspond to the action of picking up a certain workpiece. Since successes and failures of the pick-up of workpieces 12 occur, the learning unit 22 evaluates the state variables, constituted of the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13, each time such a success or failure occurs.
The learning unit 22 stores the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 at the time a workpiece 12 is picked up, in association with the evaluation of the result of the pick-up operation of the workpiece 12. Examples of failures include the case where the hand unit 13 cannot hold a workpiece 12, and the case where the hand unit 13 holds a workpiece 12 but the workpiece 12 collides with or contacts a wall of the container 11. Whether the pick-up of a workpiece 12 has succeeded is judged based on the detection value of the force sensor 17 or the data captured by the three-dimensional measuring device. The machine learning device 20 may also perform learning using only part of the command data of the hand unit 13 output from the control device 16.
The learning unit 22 of the present embodiment preferably includes a reward calculation unit 23 and a value function update unit 24. For example, the reward calculation unit 23 calculates a reward, such as a score, based on whether the pick-up of the workpiece 12 resulting from the above state variables has succeeded: the reward is set higher for a successful pick-up of the workpiece 12 and lower for a failed pick-up. The reward may also be calculated based on the number of successful pick-ups of workpieces 12 within a predetermined time. Furthermore, when calculating the reward, a reward may be calculated for each stage of the pick-up of the workpiece, such as success of holding by the hand unit 13, success of conveyance by the hand unit 13, and success of the placing operation of the workpiece 12.
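As an illustration of the staged reward described above, a reward calculation could look like the following sketch; the stage breakdown follows the description, but the numeric scores are assumptions and not values given in the patent.

```python
def calculate_reward(held: bool, conveyed: bool, placed: bool) -> float:
    """Illustrative staged reward for one pick-up attempt of a workpiece 12."""
    reward = 0.0
    reward += 1.0 if held else -1.0           # holding by the hand unit 13
    if held:
        reward += 1.0 if conveyed else -0.5   # conveyance of the workpiece 12
    if conveyed:
        reward += 1.0 if placed else -0.5     # placing operation of the workpiece 12
    return reward
```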
The value function update unit 24 has a value function that determines the value of the pick-up operation of the workpiece 12, and updates the value function in accordance with the above reward. The update formula of the value Q(s, a) described above is used for updating the value function. An action value table is preferably created at the time of this update. The action value table referred to here is a table in which the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 at the time a workpiece 12 was picked up are recorded in association with the pick-up result of the workpiece 12 at that time and with the value function (evaluation value) updated accordingly.
As the action value table, a function approximated using the above-described neural network may be used, which is particularly effective when the amount of information in the state s, such as image data, is enormous. The value function is not limited to one type. For example, a value function that evaluates whether the holding of a workpiece 12 by the hand unit 13 has succeeded, and a value function that evaluates the time (cycle time) required to hold and convey the workpiece 12 by the hand unit 13, may be considered.
As the value function, a value function that evaluates interference between the container 11 and the hand unit 13 or the workpiece 12 at the time the workpiece is picked up may also be used. In order to calculate the reward used for updating this value function, the state variable observation unit 21 preferably observes the force applied to the hand unit 13, for example the value detected by the force sensor 17. When the amount of change in the force detected by the force sensor 17 exceeds a predetermined threshold, it can be estimated that the above-mentioned interference has occurred; therefore, the reward in that case is preferably set, for example, to a negative value so that the value determined by the value function is lowered.
According to the present embodiment, the measurement parameters of the three-dimensional measuring device 15 can also be learned as manipulated variables. That is, according to the present embodiment, there is provided a machine learning device that learns an operation of a robot 14 for picking up, by a hand unit 13, any of a plurality of workpieces 12 placed in a random fashion, including a bulk-loaded state, the machine learning device including: a state variable observation unit 21 that observes a state variable of the robot 14 including data output from a three-dimensional measuring device 15 that measures a three-dimensional position (x, y, z), or a three-dimensional position and orientation (x, y, z, w, p, r), of each workpiece 12; an operation result obtaining unit 26 that obtains a result of a pick-up operation of the robot 14 for picking up the workpiece 12 by the hand unit 13; and a learning unit 22 that, upon receiving output from the state variable observation unit 21 and output from the operation result obtaining unit 26, learns a manipulated variable including a measurement parameter of the three-dimensional measuring device 15, in association with the state variable of the robot 14 and the result of the pick-up operation.
The robot system 10 of the present embodiment may also include an automatic hand changer (not illustrated) that replaces the hand unit 13 attached to the robot 14 with a hand unit 13 of a different form. In this case, the value function update unit 24 may have the above-described value function for each hand unit 13 of a different form, and may update the value function of the hand unit 13 after replacement in accordance with the reward. This makes it possible to learn the optimum operation of the hand unit 13 for each of a plurality of hand units 13 of different forms, and therefore to cause the automatic hand changer to select the hand unit 13 whose value function is higher.
The decision-making unit 25 preferably selects the output data of the three-dimensional measuring device 15 and the command data of the hand unit 13 that correspond to a higher evaluation value, by referring, for example, to the action value table created as described above. The decision-making unit 25 then outputs the selected optimum data of the hand unit 13 and the three-dimensional measuring device 15 to the control device 16.
The control device 16 then controls the three-dimensional measuring device 15 and the robot 14 to pick up the workpiece 12, using the optimum data of the hand unit 13 and the three-dimensional measuring device 15 output by the learning unit 22. For example, the control device 16 preferably operates the drive shafts of the hand unit 13 and the robot 14 according to state variables respectively set for the optimum position, orientation, and pick-up direction of the hand unit 13 obtained by the learning unit 22.
The robot system 10 of the embodiment described above includes one machine learning device 20 for one robot 14, as illustrated in Fig. 1. In the present invention, however, the numbers of robots 14 and machine learning devices 20 are not limited to one each. For example, the robot system 10 may include a plurality of robots 14, with one or more machine learning devices 20 provided for each robot 14. The robot system 10 then preferably shares or exchanges, via a communication medium such as a network, the optimum state variables of the three-dimensional measuring device 15 and the hand unit 13 acquired by the machine learning device 20 of each robot 14. Thus, even if the operating rate of a certain robot 14 is lower than that of another robot 14, the optimum operation results acquired by the machine learning device 20 of the other robot 14 can be utilized in the operation of the former robot 14. Moreover, by sharing learning models among the plurality of robots, or by sharing the state variables, including the manipulated variables such as the measurement parameters of the three-dimensional measuring device 15 and of the robot 14, and the results of the pick-up operations, the time required for learning can be shortened.
The machine learning device 20 may be located inside or outside the robot 14. Alternatively, the machine learning device 20 may be located in the control device 16 or may reside on a cloud server (not illustrated).
When the robot system 10 includes a plurality of robots 14, the hand unit of another robot 14 can perform the operation of picking up a workpiece 12 while one robot 14 is conveying a workpiece 12 held by its hand unit 13. The value function update unit 24 may then update the value function using the time during which the robots 14 picking up the workpieces 12 are switched in this way. Furthermore, the machine learning device 20 may have state variables for a plurality of hand models, perform pick-up simulations using the plurality of hand models during the pick-up operation of a workpiece 12, and learn the state variables of the plurality of hand models in association with the result of the pick-up operation of the workpiece 12, according to the results of the pick-up simulations.
In the machine learning device 20 described above, the output data of the three-dimensional measuring device 15 produced when the three-dimensional map data of each workpiece 12 is obtained is transmitted from the three-dimensional measuring device 15 to the state variable observation unit 21. Since the transmitted data may include anomalous data, the machine learning device 20 may have a filtering function for anomalous data, i.e., a function for selecting whether or not to input the data from the three-dimensional measuring device 15 to the state variable observation unit 21. This allows the learning unit 22 of the machine learning device 20 to efficiently learn the optimum operation of the three-dimensional measuring device 15 and the hand unit 13 of the robot 14.
In the machine learning device 20 described above, the output data from the learning unit 22 is input to the control device 16, but this output data may also include anomalous data; therefore, a filtering function for anomalous data may likewise be provided, i.e., a function for selecting whether or not to output the data from the learning unit 22 to the control device 16. This allows the control device 16 to cause the robot 14 to execute the optimum operation of the hand unit 13 more safely.
The anomalous data described above can be detected by the following procedure: the probability distribution of the input data is estimated, the occurrence probability of a new input is derived using the probability distribution, and if the occurrence probability is below a certain level, the input is regarded as anomalous data that deviates significantly from typical behavior.
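The following sketch illustrates that procedure under an assumed probability model; the patent does not specify the model, so a diagonal Gaussian fit and the threshold value are assumptions made purely for illustration.

```python
import numpy as np

class AnomalyFilter:
    """Fit a probability model to past inputs and reject new inputs whose
    estimated occurrence probability is too low (illustrative sketch)."""

    def __init__(self, threshold: float = 1e-6):
        self.threshold = threshold
        self.mean = None
        self.var = None

    def fit(self, samples: np.ndarray) -> None:
        """Estimate the probability distribution of the input data (rows = samples)."""
        self.mean = samples.mean(axis=0)
        self.var = samples.var(axis=0) + 1e-9

    def is_anomalous(self, x: np.ndarray) -> bool:
        """Derive the occurrence probability of a new input and compare it to the threshold."""
        density = np.exp(-0.5 * (x - self.mean) ** 2 / self.var) / np.sqrt(2 * np.pi * self.var)
        return float(np.prod(density)) < self.threshold
```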
Next, an example of the operation of the machine learning device 20 included in the robot system 10 of the present embodiment will be described. Fig. 4 is a flowchart illustrating an example of the operation of the machine learning device illustrated in Fig. 1. As illustrated in Fig. 4, when the learning operation (learning process) of the machine learning device 20 illustrated in Fig. 1 is started, three-dimensional measurement is performed by the three-dimensional measuring device 15 and its output is produced (step S11 of Fig. 4). That is, in step S11, for example, the three-dimensional map (output data of the three-dimensional measuring device 15) of each workpiece 12 placed in a random fashion, including a bulk-loaded state, is obtained and output to the state variable observation unit 21, and the coordinate calculation unit 19 receives the three-dimensional map of each workpiece 12, calculates the three-dimensional position (x, y, z) of each workpiece 12, and outputs it to the state variable observation unit 21, the operation result obtaining unit 26, and the control device 16. The coordinate calculation unit 19 may also calculate the orientation (w, p, r) of each workpiece 12 from the output of the three-dimensional measuring device 15 and output it.
As described with reference to Fig. 5, the output (three-dimensional map) of the three-dimensional measuring device 15 may be input to the state variable observation unit 21 via the preprocessing unit 50, which processes it before it is input to the state variable observation unit 21. As described with reference to Fig. 7, only the output of the three-dimensional measuring device 15 may be input to the state variable observation unit 21, and furthermore, only the output of the three-dimensional measuring device 15 may be input to the state variable observation unit 21 via the preprocessing unit 50. In this way, the execution and output of the three-dimensional measurement in step S11 can take various forms.
Specifically, in the case of Fig. 1, the state variable observation unit 21 observes state variables (output data of the three-dimensional measuring device 15) such as the three-dimensional map of each workpiece 12 from the three-dimensional measuring device 15 and the three-dimensional position (x, y, z) and orientation (w, p, r) of each workpiece 12 from the coordinate calculation unit 19. The operation result obtaining unit 26 obtains the result of the pick-up operation of the robot 14 for picking up a workpiece 12 by the hand unit 13, based on the output data of the three-dimensional measuring device 15 (the output data of the coordinate calculation unit 19). In addition to the output data of the three-dimensional measuring device, the operation result obtaining unit 26 may also obtain results of the pick-up operation such as the achievement level attained when the picked-up workpiece 12 is transferred to a post-process and damage to the picked-up workpiece 12.
For example, the machine learning device 20 determines the optimum operation based on the output data of the three-dimensional measuring device 15 (step S12 of Fig. 4), the control device 16 outputs the command data (manipulated variable) of the hand unit 13 (robot 14), and the pick-up operation of the workpiece 12 is performed (step S13 of Fig. 4). The result of the pick-up of the workpiece is then obtained by the above-described operation result obtaining unit 26 (step S14 of Fig. 4).
Next, based on the output from the action result acquisition unit 26, it is determined whether the take-out of the workpiece 12 succeeded or failed (step S15 of Fig. 4). When the take-out of the workpiece 12 succeeds, a positive reward is set (step S16 of Fig. 4); when it fails, a negative reward is set (step S17 of Fig. 4); and the action value table (value function) is then updated (step S18 of Fig. 4).
The success or failure of the take-out of the workpiece 12 can be determined, for example, from the output data of the three-dimensional measuring device 15 after the take-out operation. The determination is not limited to evaluating whether the take-out itself succeeded; for example, it may also evaluate the achievement level when the taken-out workpiece 12 is transferred to a subsequent process, state changes such as whether the taken-out workpiece 12 has been damaged, or the time (cycle time) and energy (electric power) required to grip and transport the workpiece 12 with the robot arm 13.
The reward calculation unit 23 calculates the reward value based on the determination of the success or failure of the take-out of the workpiece 12, and the value function update unit 24 updates the action value table. That is, when the take-out of the workpiece 12 succeeds, the learning unit 22 sets a positive reward as the reward in the update formula of the value Q(s, a) described above (S16); when the take-out fails, it sets a negative reward as the reward in that update formula (S17). The learning unit 22 then updates the action value table described above each time a take-out of the workpiece 12 is performed (S18). By repeating steps S11 to S18, the learning unit 22 continues to update (learn) the action value table.
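As a rough sketch only, the following code mirrors the S11 to S18 loop with a tabular Q-learning update; the environment functions (`observe_state`, `execute_takeout`, `takeout_succeeded`) are stand-ins for the three-dimensional measuring device, the control device, and the action result acquisition unit, and the learning rate, discount factor, and reward values are illustrative, not taken from the patent.

```python
import random

ALPHA, GAMMA = 0.1, 0.9          # learning rate and discount factor (illustrative)
ACTIONS = ["pick_candidate_0", "pick_candidate_1", "pick_candidate_2"]
q_table = {}                      # action value table: (state, action) -> Q

def q(s, a):
    return q_table.get((s, a), 0.0)

# --- stand-ins for the real system -------------------------------------------
def observe_state():              # S11: 3D measurement -> state quantity
    return random.choice(["sparse", "cluttered"])

def execute_takeout(action):      # S13: control device drives the robot arm
    pass

def takeout_succeeded():          # S14/S15: action result acquisition and judgement
    return random.random() < 0.5
# ------------------------------------------------------------------------------

def learning_loop(n_trials=1000):
    for _ in range(n_trials):
        s = observe_state()
        a = max(ACTIONS, key=lambda act: q(s, act))        # S12: pick best-known action
        execute_takeout(a)
        reward = 1.0 if takeout_succeeded() else -1.0      # S16 / S17
        s_next = observe_state()
        best_next = max(q(s_next, act) for act in ACTIONS)
        # S18: Q-learning update of the action value table
        q_table[(s, a)] = q(s, a) + ALPHA * (reward + GAMMA * best_next - q(s, a))

learning_loop()
```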
The data input to the state quantity observation unit 21 is not limited to the output data of the three-dimensional measuring device 15; for example, it may also include the output of other sensors, or part of the command data from the control device 16. The control device 16 uses the command data (operation amount) output from the machine learning device 20 to make the robot 14 execute the take-out operation of the workpiece 12. Moreover, what the machine learning device 20 learns is not limited to the take-out operation of the workpiece 12; as described above, it may also be, for example, the measurement parameters of the three-dimensional measuring device 15.
As described above, the robot system 10 including the machine learning device 20 of the present embodiment can learn the operation of the robot 14 in which the robot arm 13 takes out a workpiece 12 from a plurality of workpieces 12 placed in random order, including a bulk-loaded state. The robot system 10 can thereby learn to select the optimal action of the robot 14 for taking out the bulk-loaded workpieces 12 without human intervention.
Fig. 5 is a block diagram showing the conceptual configuration of a robot system according to another embodiment of the present invention, namely a robot system to which supervised learning is applied. As is clear from a comparison of Fig. 5 with Fig. 1, the robot system 10' of Fig. 5, which applies supervised learning, further includes a result (label) data recording unit 40, in contrast to the robot system 10 of Fig. 1, which applies Q-learning (reinforcement learning). The robot system 10' of Fig. 5 also includes a preprocessing unit 50 that preprocesses the output data of the three-dimensional measuring device 15. A preprocessing unit 50 may, of course, also be provided in the robot system 10 shown in Fig. 1.
As shown in Fig. 5, the machine learning device 30 in the robot system 10' to which supervised learning is applied includes a state quantity observation unit 31, an action result acquisition unit 36, a learning unit 32, and a decision-making unit 35. The learning unit 32 includes an error calculation unit 33 and a learning model update unit 34. In the robot system 10' of this embodiment as well, the machine learning device 30 learns and outputs operation amounts such as the command data instructing the robot 14 to take out the workpiece 12, or the measurement parameters of the three-dimensional measuring device 15.
That is, in the robot system 10' of Fig. 5 applying supervised learning, the error calculation unit 33 and the learning model update unit 34 correspond, respectively, to the reward calculation unit 23 and the value function update unit 24 in the robot system 10 of Fig. 1 applying Q-learning. The other components, such as the three-dimensional measuring device 15, the control device 16, and the robot 14, are the same as in Fig. 1, and their description is omitted.
The error calculation unit 33 calculates the error between the result (label) output from the action result acquisition unit 36 and the output of the learning model in the learning unit 32. For example, when the shape of the workpiece 12 and the process performed by the robot 14 are the same, the result (label) data recording unit 40 can hold the result (label) data obtained up to the day before a given date on which the robot 14 performs the work, and supply that result (label) data to the error calculation unit 33 on that date. Alternatively, data obtained through simulation performed outside the robot system 10', or result (label) data from another robot system, may be supplied to the error calculation unit 33 of the robot system 10' via a memory card, a communication line, or the like. Furthermore, the result (label) data recording unit 40 may be constituted by a nonvolatile memory such as a flash memory and built into the learning unit 32, so that the learning unit 32 directly uses the result (label) data held in that result (label) data recording unit 40.
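As a simple illustration of this arrangement, the sketch below accumulates (state, result) pairs as labeled data and hands them back on request, for example to the error calculation step on the day the robot performs the work; persistence to nonvolatile memory is only suggested by a file path, and all names here are assumptions.

```python
import json
from pathlib import Path

class ResultLabelRecorder:
    """Holds result (label) data collected so far and supplies it on request."""

    def __init__(self, store: Path = Path("labels.jsonl")):
        self.store = store  # stand-in for nonvolatile storage such as flash memory

    def append(self, state_features: list, success: bool) -> None:
        # Record one labeled sample: observed state plus take-out result.
        record = {"state": state_features, "label": 1 if success else 0}
        with self.store.open("a") as f:
            f.write(json.dumps(record) + "\n")

    def load_all(self) -> list:
        # Supply all accumulated result (label) data, e.g. to the error calculation.
        if not self.store.exists():
            return []
        with self.store.open() as f:
            return [json.loads(line) for line in f]
```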
Fig. 6 is a diagram for explaining an example of the processing performed by the preprocessing unit in the robot system of Fig. 5. Fig. 6(a) shows an example of the output data of the three-dimensional measuring device 15, namely data representing the three-dimensional positions (orientations) of a plurality of workpieces 12 bulk-loaded in the container 11, and Figs. 6(b) to 6(d) show examples of image data obtained by preprocessing the workpieces 121 to 123 in Fig. 6(a).
Here, cylindrical metal parts are assumed as the workpieces 12 (121 to 123), and, as the robot arm (13), a suction pad that holds the longitudinal center of the cylindrical workpiece 12 by negative-pressure suction is assumed, rather than a gripper that holds the workpiece with two claws. Therefore, if, for example, the position of the longitudinal center of a workpiece 12 is known, the workpiece 12 can be taken out by moving the suction pad (13) to that position and performing suction. The numerical values in Figs. 6(a) to 6(d) are expressed in [mm] and represent the x, y, and z directions, respectively. The z direction corresponds to the height (depth) direction of the image data obtained by photographing the container 11 containing the bulk-loaded workpieces 12 with the three-dimensional measuring device 15 (for example, having two cameras) arranged above.
As is clear from Fig. 6(a) and Figs. 6(b) to 6(d), one example of the processing performed by the preprocessing unit 50 in the robot system 10' of Fig. 5 is to rotate the workpieces 12 of interest (for example, the three workpieces 121 to 123) based on the output data (three-dimensional image) of the three-dimensional measuring device 15, and to process the data so that the height of each workpiece center becomes "0".
That is, the output data of the three-dimensional measuring device 15 includes, for example, information on the three-dimensional position (x, y, z) and orientation (w, p, r) of the longitudinal center of each workpiece 12. As shown in Figs. 6(b), 6(c), and 6(d), each of the three workpieces of interest 121, 122, and 123 is rotated by -r and its z value is subtracted, so that they all satisfy the same condition. Such preprocessing can reduce the load on the machine learning device 30.
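A minimal sketch of this normalization is shown below, assuming each workpiece of interest is represented by a small depth-image patch plus its center pose (x, y, z, w, p, r); the actual three-dimensional map format is not specified at this level of detail, so the data layout and function name are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

def normalize_workpiece(depth_patch: np.ndarray, pose: dict):
    """Rotate the patch of one workpiece of interest by -r and subtract z so that
    every workpiece ends up in the same orientation with its center height at 0."""
    rotated = rotate(depth_patch, angle=-np.degrees(pose["r"]), reshape=False, order=1)
    normalized_patch = rotated - pose["z"]            # center height becomes 0
    normalized_pose = dict(pose, z=0.0, r=0.0)        # all workpieces satisfy the same condition
    return normalized_patch, normalized_pose
```

Feeding such normalized patches to the learning unit, instead of the raw map, is what reduces the load on the machine learning device 30.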
The three-dimensional map shown in Fig. 6(a) is not the output data of the three-dimensional measuring device 15 itself; it is obtained, for example, by lowering the selection threshold, compared with before, in an image produced by a program executed to determine the take-out order of the workpieces 12, and this processing itself may also be performed by the preprocessing unit 50. Naturally, the processing of the preprocessing unit 50 can vary in many ways depending on various conditions, such as the shape of the workpieces 12 and the type of the robot arm 13.
In this way, the output data of the three-dimensional measuring device 15 (the three-dimensional map of each workpiece 12), processed by the preprocessing unit 50 before being input to the state quantity observation unit 31, is input to the state quantity observation unit 31. Referring again to Fig. 5, the error calculation unit 33, which receives the result (label) output from the action result acquisition unit 36, regards the error as -log(y) when the actually executed take-out operation of the workpiece 12 succeeds and as -log(1-y) when it fails, where y is the output of the learning model, for example the neural network shown in Fig. 3, and performs processing aimed at minimizing this error. As the input to the neural network of Fig. 3, the preprocessed image data of the workpieces of interest 121 to 123, as shown in Figs. 6(b) to 6(d), and the three-dimensional positions and orientations (x, y, z, w, p, r) of each of these workpieces of interest 121 to 123 are provided.
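The error described above is the usual binary cross-entropy on the predicted take-out success probability; a minimal numeric sketch follows, with the helper name chosen here only for illustration.

```python
import math

def takeout_error(y: float, success: bool) -> float:
    """Error between the learning model output y (predicted success probability)
    and the observed result: -log(y) on success, -log(1 - y) on failure."""
    eps = 1e-12  # guard against log(0)
    return -math.log(y + eps) if success else -math.log(1.0 - y + eps)

# A model that was confident (y = 0.9) about an attempt that actually failed
# incurs a large error, which the learning model update unit then works to minimize.
print(round(takeout_error(0.9, success=False), 3))  # 2.303
print(round(takeout_error(0.9, success=True), 3))   # 0.105
```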
Fig. 7 is a block diagram showing a modification of the robot system shown in Fig. 1. As is clear from comparing Fig. 7 with Fig. 1, in the modified robot system 10 of Fig. 7 the coordinate calculation unit 19 is omitted, and the state quantity observation unit 21 receives only the three-dimensional map from the three-dimensional measuring device 15 to observe the state quantity of the robot 14. Of course, a configuration corresponding to the coordinate calculation unit 19 may instead be provided in the control device 16. The configuration of Fig. 7 can also be applied, for example, to the robot system 10' applying supervised learning described with reference to Fig. 5. That is, the preprocessing unit 50 may be omitted from the robot system 10' of Fig. 5, so that the state quantity observation unit 31 receives only the three-dimensional map from the three-dimensional measuring device 15 to observe the state quantity of the robot 14. In this way, various changes and modifications can be made to each of the embodiments described above.
As described in detail above, the present embodiment can provide a machine learning device, a robot system, and a machine learning method capable of learning, without human intervention, the optimal action of a robot when taking out workpieces placed in random order, including a bulk-loaded state. The machine learning devices 20 and 30 of the present invention are not limited to applying reinforcement learning (for example, Q-learning) or supervised learning; various other machine learning algorithms may also be applied.
The machine learning device, robot system, and machine learning method of the present invention achieve the effect of being able to learn, without human intervention, the optimal action of a robot when taking out workpieces placed in random order, including a bulk-loaded state.
Embodiments have been described above, but all the examples and conditions described here are intended to aid understanding of the inventive concept applied to the invention and the technology; the examples and conditions specifically described do not limit the scope of the invention, and such descriptions in the specification do not indicate advantages or disadvantages of the invention. Although the embodiments of the invention have been described in detail, it should be understood that various changes, substitutions, and modifications can be made without departing from the spirit and scope of the invention.

Claims (18)

1. A machine learning device that learns an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state,
the machine learning device being characterized by comprising:
a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring device that acquires a three-dimensional map of each of the workpieces;
an action result acquisition unit that acquires a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
a learning unit that receives the output from the state quantity observation unit and the output from the action result acquisition unit, and learns, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including command data that instructs the robot to perform the take-out operation of the workpiece.
2. The machine learning device according to claim 1, characterized in that
the machine learning device further comprises a decision-making unit that determines the command data to be instructed to the robot by referring to the operation amount learned by the learning unit.
3. A machine learning device that learns an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state,
the machine learning device being characterized by comprising:
a state quantity observation unit that observes a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional map of each of the workpieces;
an action result acquisition unit that acquires a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
a learning unit that receives the output from the state quantity observation unit and the output from the action result acquisition unit, and learns, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including a measurement parameter of the three-dimensional measuring device.
4. The machine learning device according to claim 3, characterized in that
the machine learning device further comprises a decision-making unit that determines the measurement parameter of the three-dimensional measuring device by referring to the operation amount learned by the learning unit.
5. The machine learning device according to any one of claims 1 to 4, characterized in that
the state quantity observation unit further observes a state quantity of the robot including output data of a coordinate calculation unit that calculates a three-dimensional position of each of the workpieces based on the output of the three-dimensional measuring device.
6. The machine learning device according to claim 5, characterized in that
the coordinate calculation unit further calculates an orientation of each of the workpieces and outputs data on the calculated three-dimensional position and orientation of each of the workpieces.
7. The machine learning device according to any one of claims 1 to 6, characterized in that
the action result acquisition unit uses the output data of the three-dimensional measuring device.
8. The machine learning device according to any one of claims 1 to 7, characterized in that
the machine learning device further comprises a preprocessing unit that processes the output data of the three-dimensional measuring device before it is input to the state quantity observation unit,
and the state quantity observation unit receives the output data of the preprocessing unit as the state quantity of the robot.
9. The machine learning device according to claim 8, characterized in that
the preprocessing unit makes the direction and the height of each of the workpieces in the output data of the three-dimensional measuring device uniform.
10. The machine learning device according to any one of claims 1 to 9, characterized in that
the action result acquisition unit acquires at least one of: whether the take-out of the workpiece succeeded or failed, a damage state of the workpiece, and an achievement level when the taken-out workpiece is transferred to a subsequent process.
11. The machine learning device according to any one of claims 1 to 10, characterized in that
the learning unit comprises:
a reward calculation unit that calculates a reward based on the output of the action result acquisition unit; and
a value function update unit that has a value function for determining a value of the take-out operation of the workpiece and updates the value function in accordance with the reward.
12. The machine learning device according to any one of claims 1 to 10, characterized in that
the learning unit has a learning model for learning the take-out operation of the workpiece,
and the learning unit comprises:
an error calculation unit that calculates an error based on the output of the action result acquisition unit and the output of the learning model; and
a learning model update unit that updates the learning model in accordance with the error.
13. The machine learning device according to any one of claims 1 to 12, characterized in that
the machine learning device has a neural network.
14. A robot system comprising the machine learning device according to any one of claims 1 to 13, the robot system being characterized by comprising:
the robot;
the three-dimensional measuring device; and
a control device that controls the robot and the three-dimensional measuring device, respectively.
15. The robot system according to claim 14, characterized in that
the robot system comprises a plurality of the robots,
the machine learning device is provided for each of the robots,
and the plurality of machine learning devices provided for the plurality of robots share or exchange data with one another via a communication medium.
16. The robot system according to claim 15, characterized in that
the machine learning device is located on a cloud server.
17. A machine learning method for learning an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state, the machine learning method being characterized by:
observing a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional position of each of the workpieces;
acquiring a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
receiving the observed state quantity of the robot and the acquired result of the take-out operation of the robot, and learning, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including command data that instructs the robot to perform the take-out operation of the workpiece.
18. A machine learning method for learning an action of a robot for taking out a workpiece by a robot arm from a plurality of workpieces placed in random order, including a bulk-loaded state, the machine learning method being characterized by:
observing a state quantity of the robot including output data of a three-dimensional measuring device that measures a three-dimensional map of each of the workpieces;
acquiring a result of a take-out operation of the robot for taking out the workpiece by the robot arm; and
receiving the observed state quantity of the robot and the acquired result of the take-out operation of the robot, and learning, in association with the state quantity of the robot and the result of the take-out operation, an operation amount including a measurement parameter of the three-dimensional measuring device.
CN201610617361.XA 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method Active CN106393102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110544521.3A CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP2015152067 2015-07-31
JP2015-152067 2015-07-31
JP2015-233857 2015-11-30
JP2015233857A JP6522488B2 (en) 2015-07-31 2015-11-30 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110544521.3A Division CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Publications (2)

Publication Number Publication Date
CN106393102A true CN106393102A (en) 2017-02-15
CN106393102B CN106393102B (en) 2021-06-01

Family

ID=57985283

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method
CN201610617361.XA Active CN106393102B (en) 2015-07-31 2016-07-29 Machine learning device, robot system, and machine learning method

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110544521.3A Pending CN113199483A (en) 2015-07-31 2016-07-29 Robot system, robot control method, machine learning device, and machine learning method

Country Status (3)

Country Link
JP (4) JP6522488B2 (en)
CN (2) CN113199483A (en)
DE (1) DE102016015873B3 (en)

Cited By (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107252785A (en) * 2017-06-29 2017-10-17 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
CN107255969A (en) * 2017-06-28 2017-10-17 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107329445A (en) * 2017-06-28 2017-11-07 重庆柚瓣家科技有限公司 The method of robot behavior criterion intelligent supervision
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
CN108687766A (en) * 2017-03-31 2018-10-23 发那科株式会社 Control device, machine learning device and the machine learning method of robot
CN108789499A (en) * 2017-04-28 2018-11-13 发那科株式会社 Article extraction system
CN108942916A (en) * 2017-05-19 2018-12-07 发那科株式会社 Workpiece extraction system
CN109002012A (en) * 2017-06-07 2018-12-14 发那科株式会社 control device and machine learning device
CN109202394A (en) * 2017-07-07 2019-01-15 发那科株式会社 Assembly supply device and machine learning device
CN109420859A (en) * 2017-08-28 2019-03-05 发那科株式会社 Machine learning device, machine learning system and machine learning method
CN109434844A (en) * 2018-09-17 2019-03-08 鲁班嫡系机器人(深圳)有限公司 Food materials handling machine people control method, device, system, storage medium and equipment
CN109500809A (en) * 2017-09-15 2019-03-22 西门子股份公司 Optimization to the automation process by Robot Selection and crawl object
CN109551459A (en) * 2017-09-25 2019-04-02 发那科株式会社 Robot system and method for taking out work
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
CN109814615A (en) * 2017-11-22 2019-05-28 发那科株式会社 Control device and machine learning device
CN110091084A (en) * 2018-01-30 2019-08-06 发那科株式会社 Learn the machine learning device of the failure mechanism of laser aid
CN110125955A (en) * 2018-02-09 2019-08-16 发那科株式会社 Control device and machine learning device
CN110174875A (en) * 2018-02-19 2019-08-27 欧姆龙株式会社 Simulator, analogy method and storage medium
CN110303473A (en) * 2018-03-20 2019-10-08 发那科株式会社 Use the object picking device and article removing method of sensor and robot
CN110315505A (en) * 2018-03-29 2019-10-11 发那科株式会社 Machine learning device and method, robot controller, robotic vision system
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN110691676A (en) * 2017-06-19 2020-01-14 谷歌有限责任公司 Robot crawling prediction using neural networks and geometrically-aware object representations
CN110712194A (en) * 2018-07-13 2020-01-21 发那科株式会社 Object inspection device, object inspection system, and method for adjusting inspection position
CN111194452A (en) * 2017-06-09 2020-05-22 川崎重工业株式会社 Motion prediction system and motion prediction method
CN112135719A (en) * 2018-06-14 2020-12-25 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112203811A (en) * 2018-05-25 2021-01-08 川崎重工业株式会社 Robot system and robot control method
CN112218748A (en) * 2018-06-14 2021-01-12 雅马哈发动机株式会社 Robot system
CN112512757A (en) * 2018-11-09 2021-03-16 欧姆龙株式会社 Robot control device, simulation method, and simulation program
CN112757284A (en) * 2019-10-21 2021-05-07 佳能株式会社 Robot control apparatus, method and storage medium
CN113412177A (en) * 2018-12-27 2021-09-17 川崎重工业株式会社 Robot control device, robot system, and robot control method
WO2022100363A1 (en) * 2020-11-13 2022-05-19 腾讯科技(深圳)有限公司 Robot control method, apparatus and device, and storage medium and program product
CN114786888A (en) * 2020-01-16 2022-07-22 欧姆龙株式会社 Control device, control method, and control program
CN115816466A (en) * 2023-02-02 2023-03-21 中国科学技术大学 Method for improving control stability of visual observation robot

Families Citing this family (75)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation
JP6771744B2 (en) * 2017-01-25 2020-10-21 株式会社安川電機 Handling system and controller
JP6453922B2 (en) * 2017-02-06 2019-01-16 ファナック株式会社 Work picking apparatus and work picking method for improving work picking operation
US11222417B2 (en) 2017-03-06 2022-01-11 Fuji Corporation Data structure for creating image-processing data and method for creating image-processing data
JP6542824B2 (en) * 2017-03-13 2019-07-10 ファナック株式会社 Image processing apparatus and image processing method for calculating likelihood of image of object detected from input image
JP6438512B2 (en) * 2017-03-13 2018-12-12 ファナック株式会社 ROBOT SYSTEM, MEASUREMENT DATA PROCESSING DEVICE, AND MEASUREMENT DATA PROCESSING METHOD FOR TAKE OUT WORK WITH MEASUREMENT DATA CORRECTED BY MACHINE LEARN
JP6869060B2 (en) * 2017-03-15 2021-05-12 株式会社オカムラ Manipulator controls, control methods and programs, and work systems
JP6902369B2 (en) * 2017-03-15 2021-07-14 株式会社オカムラ Presentation device, presentation method and program, and work system
JP6983524B2 (en) * 2017-03-24 2021-12-17 キヤノン株式会社 Information processing equipment, information processing methods and programs
JP6557272B2 (en) * 2017-03-29 2019-08-07 ファナック株式会社 State determination device
JP6680714B2 (en) * 2017-03-30 2020-04-15 ファナック株式会社 Control device and machine learning device for wire electric discharge machine
JP7045139B2 (en) * 2017-06-05 2022-03-31 株式会社日立製作所 Machine learning equipment, machine learning methods, and machine learning programs
JP7116901B2 (en) * 2017-08-01 2022-08-12 オムロン株式会社 ROBOT CONTROL DEVICE, ROBOT CONTROL METHOD AND ROBOT CONTROL PROGRAM
DE102017213658A1 (en) * 2017-08-07 2019-02-07 Robert Bosch Gmbh Handling arrangement with a handling device for performing at least one work step and method and computer program
JP6680730B2 (en) * 2017-08-08 2020-04-15 ファナック株式会社 Control device and learning device
JP6680732B2 (en) * 2017-08-23 2020-04-15 ファナック株式会社 Goods stacking device and machine learning device
US11446816B2 (en) * 2017-09-01 2022-09-20 The Regents Of The University Of California Robotic systems and methods for robustly grasping and targeting objects
JP6608890B2 (en) * 2017-09-12 2019-11-20 ファナック株式会社 Machine learning apparatus, robot system, and machine learning method
JP6895563B2 (en) * 2017-09-25 2021-06-30 ファナック株式会社 Robot system, model generation method, and model generation program
JP6579498B2 (en) 2017-10-20 2019-09-25 株式会社安川電機 Automation device and position detection device
JP2019084601A (en) 2017-11-02 2019-06-06 キヤノン株式会社 Information processor, gripping system and information processing method
JP6815309B2 (en) * 2017-11-16 2021-01-20 株式会社東芝 Operating system and program
JP6676030B2 (en) 2017-11-20 2020-04-08 株式会社安川電機 Grasping system, learning device, gripping method, and model manufacturing method
US10828778B2 (en) * 2017-11-30 2020-11-10 Abb Schweiz Ag Method for operating a robot
JP7136554B2 (en) * 2017-12-18 2022-09-13 国立大学法人信州大学 Grasping device, learning device, program, grasping system, and learning method
KR102565444B1 (en) * 2017-12-21 2023-08-08 삼성전자주식회사 Method and apparatus for identifying object
JP6587195B2 (en) * 2018-01-16 2019-10-09 株式会社Preferred Networks Tactile information estimation device, tactile information estimation method, program, and non-transitory computer-readable medium
JP6458912B1 (en) * 2018-01-24 2019-01-30 三菱電機株式会社 Position control device and position control method
JP7005388B2 (en) * 2018-03-01 2022-01-21 株式会社東芝 Information processing equipment and sorting system
JP6873941B2 (en) 2018-03-02 2021-05-19 株式会社日立製作所 Robot work system and control method of robot work system
EP3762185A1 (en) 2018-03-05 2021-01-13 Omron Corporation Method, apparatus, system and program for controlling a robot, and storage medium
JP6879238B2 (en) 2018-03-13 2021-06-02 オムロン株式会社 Work picking device and work picking method
JP6911798B2 (en) * 2018-03-15 2021-07-28 オムロン株式会社 Robot motion control device
JP2019162712A (en) * 2018-03-20 2019-09-26 ファナック株式会社 Control device, machine learning device and system
KR102043898B1 (en) * 2018-03-27 2019-11-12 한국철도기술연구원 Auto picking system and method for automatically picking using the same
US11260534B2 (en) * 2018-04-04 2022-03-01 Canon Kabushiki Kaisha Information processing apparatus and information processing method
US11579000B2 (en) 2018-04-05 2023-02-14 Fanuc Corporation Measurement operation parameter adjustment apparatus, machine learning device, and system
JP6829271B2 (en) * 2018-04-05 2021-02-10 ファナック株式会社 Measurement operation parameter adjustment device, machine learning device and system
JP7252944B2 (en) 2018-04-26 2023-04-05 パナソニックホールディングス株式会社 Actuator device, object retrieval method using actuator device, and object retrieval system
JP7154815B2 (en) 2018-04-27 2022-10-18 キヤノン株式会社 Information processing device, control method, robot system, computer program, and storage medium
CN112203812B (en) 2018-05-25 2023-05-16 川崎重工业株式会社 Robot system and additional learning method
KR102094360B1 (en) * 2018-06-11 2020-03-30 동국대학교 산학협력단 System and method for predicting force based on image
JP7102241B2 (en) * 2018-06-14 2022-07-19 ヤマハ発動機株式会社 Machine learning device and robot system equipped with it
JP2020001127A (en) * 2018-06-28 2020-01-09 勇貴 高橋 Picking system, picking processing equipment, and program
JP6784722B2 (en) * 2018-06-28 2020-11-11 ファナック株式会社 Output device, control device, and evaluation function value output method
WO2020009139A1 (en) * 2018-07-04 2020-01-09 株式会社Preferred Networks Learning method, learning device, learning system, and program
WO2020021643A1 (en) * 2018-07-24 2020-01-30 株式会社Fuji End effector selection method and selection system
JP7191569B2 (en) * 2018-07-26 2022-12-19 Ntn株式会社 gripping device
CN112512942B (en) * 2018-08-03 2022-05-17 株式会社富士 Parameter learning method and operating system
JP7034035B2 (en) * 2018-08-23 2022-03-11 株式会社日立製作所 Motion generation method for autonomous learning robot device and autonomous learning robot device
JP7159525B2 (en) * 2018-11-29 2022-10-25 京セラドキュメントソリューションズ株式会社 ROBOT CONTROL DEVICE, LEARNING DEVICE, AND ROBOT CONTROL SYSTEM
US20220016761A1 (en) 2018-12-27 2022-01-20 Kawasaki Jukogyo Kabushiki Kaisha Robot control device, robot system, and robot control method
CN109784400A (en) * 2019-01-12 2019-05-21 鲁班嫡系机器人(深圳)有限公司 Intelligent body Behavioral training method, apparatus, system, storage medium and equipment
JP7000359B2 (en) * 2019-01-16 2022-01-19 ファナック株式会社 Judgment device
JP6632095B1 (en) * 2019-01-16 2020-01-15 株式会社エクサウィザーズ Learned model generation device, robot control device, and program
JP7252787B2 (en) 2019-02-28 2023-04-05 川崎重工業株式会社 Machine learning model operation management system and machine learning model operation management method
JP7336856B2 (en) * 2019-03-01 2023-09-01 株式会社Preferred Networks Information processing device, method and program
WO2020194392A1 (en) * 2019-03-22 2020-10-01 connectome.design株式会社 Computer, method, and program for generating teaching data for autonomous robot
JP7302226B2 (en) * 2019-03-27 2023-07-04 株式会社ジェイテクト SUPPORT DEVICE AND SUPPORT METHOD FOR GRINDER
JP7349423B2 (en) * 2019-06-19 2023-09-22 株式会社Preferred Networks Learning device, learning method, learning model, detection device and grasping system
JP2021013996A (en) * 2019-07-12 2021-02-12 キヤノン株式会社 Control method of robot system, manufacturing method of articles, control program, recording medium, and robot system
JP7415356B2 (en) * 2019-07-29 2024-01-17 セイコーエプソン株式会社 Program transfer system and robot system
WO2021039995A1 (en) 2019-08-28 2021-03-04 株式会社DailyColor Robot control device
JP7021158B2 (en) 2019-09-04 2022-02-16 株式会社東芝 Robot system and drive method
JP6924448B2 (en) * 2019-12-02 2021-08-25 Arithmer株式会社 Picking system, picking method, and program
JP7463777B2 (en) 2020-03-13 2024-04-09 オムロン株式会社 CONTROL DEVICE, LEARNING DEVICE, ROBOT SYSTEM, AND METHOD
US20230158667A1 (en) 2020-04-28 2023-05-25 Yamaha Hatsudoki Kabushiki Kaisha Machine learning method and robot system
JP2023145809A (en) * 2020-07-10 2023-10-12 株式会社Preferred Networks Reinforcement learning device, reinforcement learning system, object operation device, model generation method and reinforcement learning program
EP4260994A1 (en) 2020-12-08 2023-10-18 Sony Group Corporation Training device, training system, and training method
DE102021104001B3 (en) 2021-02-19 2022-04-28 Gerhard Schubert Gesellschaft mit beschränkter Haftung Method for automatically grasping, in particular moving, objects
KR102346900B1 (en) 2021-08-05 2022-01-04 주식회사 애자일소다 Deep reinforcement learning apparatus and method for pick and place system
DE102021209646B4 (en) 2021-09-02 2024-05-02 Robert Bosch Gesellschaft mit beschränkter Haftung Robot device, method for computer-implemented training of a robot control model and method for controlling a robot device
WO2023042306A1 (en) * 2021-09-15 2023-03-23 ヤマハ発動機株式会社 Image processing device, component gripping system, image processing method, and component gripping method
EP4311632A1 (en) * 2022-07-27 2024-01-31 Siemens Aktiengesellschaft Method for gripping an object, computer program and electronically readable data carrier
CN117697769B (en) * 2024-02-06 2024-04-30 成都威世通智能科技有限公司 Robot control system and method based on deep learning

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1380846A (en) * 2000-03-31 2002-11-20 索尼公司 Robot device, robot device action control method, external force detecting device and method
JP2005103681A (en) * 2003-09-29 2005-04-21 Fanuc Ltd Robot system
CN101051215A (en) * 2006-04-06 2007-10-10 索尼株式会社 Learning apparatus, learning method, and program
US20090105881A1 (en) * 2002-07-25 2009-04-23 Intouch Technologies, Inc. Medical Tele-Robotic System
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
CN103753557A (en) * 2014-02-14 2014-04-30 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system
JP2014206795A (en) * 2013-04-11 2014-10-30 日本電信電話株式会社 Reinforcement learning method based on linear model, device therefor and program

Family Cites Families (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0588721A (en) * 1991-09-30 1993-04-09 Fujitsu Ltd Controller for articulated robot
JPH06106490A (en) * 1992-09-29 1994-04-19 Fujitsu Ltd Control device
JPH06203166A (en) * 1993-01-06 1994-07-22 Fujitsu Ltd Measurement, controller and learning method for multi-dimensional position
JP3211186B2 (en) * 1997-12-15 2001-09-25 オムロン株式会社 Robot, robot system, robot learning method, robot system learning method, and recording medium
JPH11272845A (en) * 1998-03-23 1999-10-08 Denso Corp Image recognition device
JP3859371B2 (en) * 1998-09-25 2006-12-20 松下電工株式会社 Picking equipment
JP2001019165A (en) 1999-07-02 2001-01-23 Murata Mach Ltd Work picking device
JP4630553B2 (en) * 2004-01-15 2011-02-09 ソニー株式会社 Dynamic control device and biped walking mobile body using dynamic control device
JP2005238422A (en) * 2004-02-27 2005-09-08 Sony Corp Robot device, its state transition model construction method and behavior control method
JP4746349B2 (en) * 2005-05-18 2011-08-10 日本電信電話株式会社 Robot action selection device and robot action selection method
JP4153528B2 (en) * 2006-03-10 2008-09-24 ファナック株式会社 Apparatus, program, recording medium and method for robot simulation
JP4199264B2 (en) * 2006-05-29 2008-12-17 ファナック株式会社 Work picking apparatus and method
JP4238256B2 (en) * 2006-06-06 2009-03-18 ファナック株式会社 Robot simulation device
US7957583B2 (en) * 2007-08-02 2011-06-07 Roboticvisiontech Llc System and method of three-dimensional pose estimation
JP2009262279A (en) * 2008-04-25 2009-11-12 Nec Corp Robot, robot program sharing system, robot program sharing method, and program
JP2010086405A (en) 2008-10-01 2010-04-15 Fuji Heavy Ind Ltd System for adapting control parameter
JP5330138B2 (en) * 2008-11-04 2013-10-30 本田技研工業株式会社 Reinforcement learning system
EP2249292A1 (en) * 2009-04-03 2010-11-10 Siemens Aktiengesellschaft Decision making mechanism, method, module, and robot configured to decide on at least one prospective action of the robot
CN101726251A (en) * 2009-11-13 2010-06-09 江苏大学 Automatic fruit identification method of apple picking robot on basis of support vector machine
CN101782976B (en) * 2010-01-15 2013-04-10 南京邮电大学 Automatic selection method for machine learning in cloud computing environment
FI20105732A0 (en) * 2010-06-24 2010-06-24 Zenrobotics Oy Procedure for selecting physical objects in a robotic system
JP5743499B2 (en) * 2010-11-10 2015-07-01 キヤノン株式会社 Image generating apparatus, image generating method, and program
JP5767464B2 (en) * 2010-12-15 2015-08-19 キヤノン株式会社 Information processing apparatus, information processing apparatus control method, and program
JP5750657B2 (en) * 2011-03-30 2015-07-22 株式会社国際電気通信基礎技術研究所 Reinforcement learning device, control device, and reinforcement learning method
JP5787642B2 (en) 2011-06-28 2015-09-30 キヤノン株式会社 Object holding device, method for controlling object holding device, and program
JP5642738B2 (en) * 2012-07-26 2014-12-17 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP5670397B2 (en) * 2012-08-29 2015-02-18 ファナック株式会社 Apparatus and method for picking up loosely stacked articles by robot
JP2014081863A (en) * 2012-10-18 2014-05-08 Sony Corp Information processing device, information processing method and program
JP6126437B2 (en) 2013-03-29 2017-05-10 キヤノン株式会社 Image processing apparatus and image processing method
JP5929854B2 (en) * 2013-07-31 2016-06-08 株式会社安川電機 Robot system and method of manufacturing workpiece
CN104793620B (en) * 2015-04-17 2019-06-18 中国矿业大学 The avoidance robot of view-based access control model feature binding and intensified learning theory
JP6522488B2 (en) * 2015-07-31 2019-05-29 ファナック株式会社 Machine learning apparatus, robot system and machine learning method for learning work taking-out operation

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1380846A (en) * 2000-03-31 2002-11-20 索尼公司 Robot device, robot device action control method, external force detecting device and method
US20090105881A1 (en) * 2002-07-25 2009-04-23 Intouch Technologies, Inc. Medical Tele-Robotic System
JP2005103681A (en) * 2003-09-29 2005-04-21 Fanuc Ltd Robot system
CN101051215A (en) * 2006-04-06 2007-10-10 索尼株式会社 Learning apparatus, learning method, and program
JP2013052490A (en) * 2011-09-06 2013-03-21 Mitsubishi Electric Corp Workpiece takeout device
JP2014206795A (en) * 2013-04-11 2014-10-30 日本電信電話株式会社 Reinforcement learning method based on linear model, device therefor and program
CN103753557A (en) * 2014-02-14 2014-04-30 上海创绘机器人科技有限公司 Self-balance control method of movable type inverted pendulum system and self-balance vehicle intelligent control system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
林芬等: "基于偏向信息学习的双层强化学习算法", 《计算机研究与发展》 *

Cited By (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108687766A (en) * 2017-03-31 2018-10-23 发那科株式会社 Control device, machine learning device and the machine learning method of robot
CN108789499B (en) * 2017-04-28 2019-12-31 发那科株式会社 Article retrieval system
US10518417B2 (en) 2017-04-28 2019-12-31 Fanuc Corporation Article retrieval system
CN108789499A (en) * 2017-04-28 2018-11-13 发那科株式会社 Article extraction system
CN108942916A (en) * 2017-05-19 2018-12-07 发那科株式会社 Workpiece extraction system
CN108942916B (en) * 2017-05-19 2019-08-23 发那科株式会社 Workpiece extraction system
CN109002012A (en) * 2017-06-07 2018-12-14 发那科株式会社 control device and machine learning device
CN109002012B (en) * 2017-06-07 2020-05-29 发那科株式会社 Control device and machine learning device
CN111194452B (en) * 2017-06-09 2023-10-10 川崎重工业株式会社 Motion prediction system and motion prediction method
CN111194452A (en) * 2017-06-09 2020-05-22 川崎重工业株式会社 Motion prediction system and motion prediction method
CN107336234A (en) * 2017-06-13 2017-11-10 赛赫智能设备(上海)股份有限公司 A kind of reaction type self study industrial robot and method of work
US11554483B2 (en) 2017-06-19 2023-01-17 Google Llc Robotic grasping prediction using neural networks and geometry aware object representation
CN110691676A (en) * 2017-06-19 2020-01-14 谷歌有限责任公司 Robot crawling prediction using neural networks and geometrically-aware object representations
CN107255969B (en) * 2017-06-28 2019-10-18 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107329445A (en) * 2017-06-28 2017-11-07 重庆柚瓣家科技有限公司 The method of robot behavior criterion intelligent supervision
CN107255969A (en) * 2017-06-28 2017-10-17 重庆柚瓣家科技有限公司 Endowment robot supervisory systems
CN107252785A (en) * 2017-06-29 2017-10-17 顺丰速运有限公司 A kind of express mail grasping means applied to quick despatch robot piece supplying
CN109202394A (en) * 2017-07-07 2019-01-15 发那科株式会社 Assembly supply device and machine learning device
CN109202394B (en) * 2017-07-07 2020-10-30 发那科株式会社 Component supply device and machine learning device
CN109420859B (en) * 2017-08-28 2021-11-26 发那科株式会社 Machine learning device, machine learning system, and machine learning method
CN109420859A (en) * 2017-08-28 2019-03-05 发那科株式会社 Machine learning device, machine learning system and machine learning method
CN109500809A (en) * 2017-09-15 2019-03-22 西门子股份公司 Optimization to the automation process by Robot Selection and crawl object
CN109551459A (en) * 2017-09-25 2019-04-02 发那科株式会社 Robot system and method for taking out work
CN109814615B (en) * 2017-11-22 2021-03-02 发那科株式会社 Control device and machine learning device
CN109814615A (en) * 2017-11-22 2019-05-28 发那科株式会社 Control device and machine learning device
CN108340367A (en) * 2017-12-13 2018-07-31 深圳市鸿益达供应链科技有限公司 Machine learning method for mechanical arm crawl
CN110091084A (en) * 2018-01-30 2019-08-06 发那科株式会社 Learn the machine learning device of the failure mechanism of laser aid
CN110125955A (en) * 2018-02-09 2019-08-16 发那科株式会社 Control device and machine learning device
CN110125955B (en) * 2018-02-09 2021-09-24 发那科株式会社 Control device and machine learning device
CN110174875A (en) * 2018-02-19 2019-08-27 欧姆龙株式会社 Simulator, analogy method and storage medium
CN110303473A (en) * 2018-03-20 2019-10-08 发那科株式会社 Use the object picking device and article removing method of sensor and robot
CN110303473B (en) * 2018-03-20 2022-10-18 发那科株式会社 Article pickup apparatus and article pickup method using sensor and robot
CN110315505A (en) * 2018-03-29 2019-10-11 发那科株式会社 Machine learning device and method, robot controller, robotic vision system
CN108527371A (en) * 2018-04-17 2018-09-14 重庆邮电大学 A kind of Dextrous Hand planing method based on BP neural network
CN112203811A (en) * 2018-05-25 2021-01-08 川崎重工业株式会社 Robot system and robot control method
CN112203811B (en) * 2018-05-25 2023-05-09 川崎重工业株式会社 Robot system and robot control method
CN112135719B (en) * 2018-06-14 2023-08-22 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112135719A (en) * 2018-06-14 2020-12-25 雅马哈发动机株式会社 Machine learning device and robot system provided with same
CN112218748A (en) * 2018-06-14 2021-01-12 雅马哈发动机株式会社 Robot system
CN112218748B (en) * 2018-06-14 2023-09-05 雅马哈发动机株式会社 robot system
CN110712194A (en) * 2018-07-13 2020-01-21 发那科株式会社 Object inspection device, object inspection system, and method for adjusting inspection position
CN109434844A (en) * 2018-09-17 2019-03-08 鲁班嫡系机器人(深圳)有限公司 Food materials handling machine people control method, device, system, storage medium and equipment
CN112512757A (en) * 2018-11-09 2021-03-16 欧姆龙株式会社 Robot control device, simulation method, and simulation program
CN109731793A (en) * 2018-12-17 2019-05-10 上海航天电子有限公司 A kind of small lot chip bulk cargo device intelligent sorting equipment
CN113412177A (en) * 2018-12-27 2021-09-17 川崎重工业株式会社 Robot control device, robot system, and robot control method
CN110456644B (en) * 2019-08-13 2022-12-06 北京地平线机器人技术研发有限公司 Method and device for determining execution action information of automation equipment and electronic equipment
CN110456644A (en) * 2019-08-13 2019-11-15 北京地平线机器人技术研发有限公司 Determine the method, apparatus and electronic equipment of the execution action message of automation equipment
CN112757284A (en) * 2019-10-21 2021-05-07 佳能株式会社 Robot control apparatus, method and storage medium
CN112757284B (en) * 2019-10-21 2024-03-22 佳能株式会社 Robot control device, method, and storage medium
CN114786888A (en) * 2020-01-16 2022-07-22 欧姆龙株式会社 Control device, control method, and control program
WO2022100363A1 (en) * 2020-11-13 2022-05-19 腾讯科技(深圳)有限公司 Robot control method, apparatus and device, and storage medium and program product
CN115816466A (en) * 2023-02-02 2023-03-21 中国科学技术大学 Method for improving control stability of visual observation robot

Also Published As

Publication number Publication date
JP2017064910A (en) 2017-04-06
JP2022145915A (en) 2022-10-04
JP2020168719A (en) 2020-10-15
JP2017030135A (en) 2017-02-09
JP7100426B2 (en) 2022-07-13
CN106393102B (en) 2021-06-01
CN113199483A (en) 2021-08-03
JP6522488B2 (en) 2019-05-29
DE102016015873B3 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
CN106393102A (en) Machine learning device, robot system, and machine learning method
US11780095B2 (en) Machine learning device, robot system, and machine learning method for learning object picking operation
Schwarz et al. Fast object learning and dual-arm coordination for cluttered stowing, picking, and packing
CN106393101B (en) Rote learning device and method, robot controller, robot system
Xu et al. Densephysnet: Learning dense physical object representations via multi-step dynamic interactions
CN105082132B (en) Fast machine people's learning by imitation of power moment of torsion task
CN110000785B (en) Agricultural scene calibration-free robot motion vision cooperative servo control method and equipment
CN107825422A (en) Rote learning device, robot system and learning by rote
CN109685141B (en) Robot article sorting visual detection method based on deep neural network
Toussaint et al. Integrated motor control, planning, grasping and high-level reasoning in a blocks world using probabilistic inference
CN107914270A (en) control device, robot system and production system
CN106557069A (en) Rote learning apparatus and method and the lathe with the rote learning device
CN107866809A (en) Learn the machine learning device and machine learning method in optimal Article gripping path
CN109421071A (en) Article stacking adapter and machine learning device
CN109531584A (en) A kind of Mechanical arm control method and device based on deep learning
CN114299150A (en) Depth 6D pose estimation network model and workpiece pose estimation method
CN105426901A (en) Method For Classifying A Known Object In A Field Of View Of A Camera
Dyrstad et al. Teaching a robot to grasp real fish by imitation learning from a human supervisor in virtual reality
Zhang et al. Deep learning reactive robotic grasping with a versatile vacuum gripper
Yang et al. Automation of SME production with a Cobot system powered by learning-based vision
CN115319739A (en) Workpiece grabbing method based on visual mechanical arm
WO2023014369A1 (en) Synthetic dataset creation for object detection and classification with deep learning
RU2745380C1 (en) Method and system for capturing objects using robotic device
Ojiro et al. A study of smart factory with artificial intelligence
Bowkett Functional Autonomy Techniques for Manipulation in Uncertain Environments

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant