CN108052004A - Industrial robotic arm automatic control method based on deep reinforcement learning - Google Patents
Industrial robotic arm automatic control method based on deep reinforcement learning
- Publication number
- CN108052004A CN108052004A CN201711275146.7A CN201711275146A CN108052004A CN 108052004 A CN108052004 A CN 108052004A CN 201711275146 A CN201711275146 A CN 201711275146A CN 108052004 A CN108052004 A CN 108052004A
- Authority
- CN
- China
- Prior art keywords
- network
- input state
- state
- parameter
- network parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Programme-controlled manipulators
- B25J9/16—Programme controls
- B25J9/1656—Programme controls characterised by programming, planning systems for manipulators
- B25J9/1664—Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
- G05B13/045—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance using a perturbation signal
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Robotics (AREA)
- Automation & Control Theory (AREA)
- Mechanical Engineering (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Geometry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Feedback Control In General (AREA)
Abstract
The present invention relates to an industrial robotic arm automatic control method based on deep reinforcement learning. The method builds a deep reinforcement learning model, constructs an output disturbance, establishes a reward r_t computation model, builds a simulation environment, accumulates an experience pool, trains the deep reinforcement learning neural network, and uses the trained model to control the motion of the robotic arm in practice. By incorporating a deep reinforcement learning network, the method solves the problem of automatically controlling a robotic arm in a complex environment and accomplishes the arm's automatic control; once training is complete, the arm runs quickly and with high precision.
Description
Technical field
The invention belongs to the technical field of reinforcement learning algorithms, and in particular relates to an industrial robotic arm automatic control method based on deep reinforcement learning.
Background technology
Compared with human labour, an industrial robotic arm can complete simple, repetitive, and heavy operations more efficiently. While greatly improving production efficiency, it also reduces labour cost and labour intensity, and, while guaranteeing production quality, lowers the probability of workplace accidents. In harsh environments such as high temperature, high pressure, low temperature, low pressure, dust, or flammable and explosive surroundings, replacing manual work with a robotic arm prevents accidents caused by operator negligence, and is therefore of great significance.
The motion-solving procedure of a robotic arm first obtains the pose information of the grasping target, then obtains the arm's own pose information, and solves the rotation angle of each axis by inverse dynamics. Because of the flexibility of the joints and links during motion, the structure deforms and precision drops, so controlling a flexible robotic arm is a hard problem. Common control methods include PID control, force-feedback control, adaptive control, fuzzy control, and neural-network control. Neural-network control has the clear advantage of not requiring a mathematical model of the controlled plant, and in a future society built on artificial intelligence, automatic control based on neural networks will be the mainstream.
Summary of the invention
The object of the present invention is to provide an industrial robotic arm automatic control method based on deep reinforcement learning which, by incorporating a deep reinforcement learning network, solves the problem of automatically controlling a robotic arm in a complex environment and accomplishes the arm's automatic control.
To achieve the above object, the industrial robotic arm automatic control method based on deep reinforcement learning designed by the present invention is characterised in that the control method comprises the following steps:
Step 1) Build the deep reinforcement learning model
1.1) Initialise the experience pool: the experience pool is an m-row, n-column two-dimensional matrix with every element initialised to 0, where m is the sample capacity, n is the amount of information stored per sample, n = 2 × state_dim + action_dim + 1, state_dim is the dimension of the state, and action_dim is the dimension of the action. The "+1" in n = 2 × state_dim + action_dim + 1 is the space reserved in the experience pool for storing the reward information.
1.2) Initialise the neural networks: the neural network is divided into an Actor part and a Critic part; the Actor network is the behaviour network and the Critic network is the evaluation network. Each part is further built as two networks with identical structure but different parameters, an eval net (estimation network) and a target net (target network), giving four networks in total: μ(s|θ^μ), μ(s|θ^μ′), Q(s,a|θ^Q), and Q(s,a|θ^Q′). Here μ(s|θ^μ) is the behaviour estimation network, μ(s|θ^μ′) the behaviour target network, Q(s,a|θ^Q) the evaluation estimation network, and Q(s,a|θ^Q′) the evaluation target network. Randomly initialise the parameters θ^μ of μ(s|θ^μ) and θ^Q of Q(s,a|θ^Q), then copy θ^μ to the behaviour target network, i.e. θ^μ′ ← θ^μ, and copy θ^Q to the evaluation target network, i.e. θ^Q′ ← θ^Q.
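The four-network initialisation can be sketched as follows, with each parameter set reduced to one weight matrix for clarity; the function names and dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(in_dim, out_dim):
    """Random initialisation of one network's parameters (a stand-in for θ)."""
    return rng.normal(size=(in_dim, out_dim))

state_dim, action_dim = 4, 3
# Behaviour (Actor) networks: μ(s|θ^μ) eval net and μ(s|θ^μ') target net.
theta_mu = init_params(state_dim, action_dim)
theta_mu_target = theta_mu.copy()          # θ^μ' ← θ^μ
# Evaluation (Critic) networks: Q(s,a|θ^Q) eval net and Q(s,a|θ^Q') target net.
theta_q = init_params(state_dim + action_dim, 1)
theta_q_target = theta_q.copy()            # θ^Q' ← θ^Q
```

After initialisation the eval and target copies are identical; they diverge once training updates the eval parameters while the targets are only refreshed periodically (step 6.4).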
Step 2) Construct the output disturbance
Given the current input state s_t, obtain an action a_t′ from the μ(s|θ^μ) network. Build a random normal distribution N(a_t′, var²) with mean a_t′ and variance var², and randomly draw an actual output action value a_t from it. The distribution N(a_t′, var²) thus applies a disturbance to the action a_t′ that is used to explore the environment; θ^μ denotes the parameters of the behaviour estimation network at time t, and t is the time of the current input state.
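The exploration step above reduces to one normal-distribution draw; the stand-in `actor` function and the state values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
var = 3.0                       # standard deviation; the distribution's variance is var²

def actor(s):
    """Stand-in for μ(s|θ^μ): any deterministic state-to-action map works here."""
    return np.tanh(s.sum()) * np.ones(3)

s_t = np.array([0.2, -0.1, 0.4, 0.0])
a_prime = actor(s_t)                           # a_t' = μ(s_t|θ^μ)
a_t = rng.normal(loc=a_prime, scale=var)       # draw a_t from N(a_t', var²)
```

Because a_t is a noisy sample around the deterministic policy output, the agent visits actions it would not otherwise take; shrinking var over training (step 6.5) gradually turns exploration off.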
Step 3) Establish the reward r_t computation model
Step 4) Build the simulation environment
The robot simulation software V-REP (Virtual Robot Experimentation Platform) includes models of the world's major industrial robots, which greatly lowers the difficulty of building a robotic-arm simulation environment. Using V-REP, a simulation environment consistent with the practical application is built.
Step 5) Accumulate the experience pool
5.1) Given the current input state s_t, obtain the action a_t′ from the μ(s|θ^μ) network, then apply the output disturbance established in step 2) to obtain the actual output action a_t, and receive the reward r_t and the subsequent input state s_{t+1} from the environment. Store the current input state s_t, the actual output action a_t, the reward r_t, and the subsequent input state s_{t+1} in the experience pool; together these four items are called a state transition record ("transition").
5.2) Take the subsequent input state s_{t+1} as the new current input state s_t, repeat step 5.1), and store the resulting transition in the experience pool.
5.3) Repeat step 5.2) until the experience pool is full; once the pool is full, each execution of step 5.2) is followed by one jump to step 6).
Step 6) Train the deep reinforcement learning neural network
6.1) Sampling
Take batch groups of samples from the experience pool for neural-network learning, where batch is a natural number.
6.2) Update the evaluation network parameters
6.3) Update the behaviour estimation network parameters
6.4) Update the target network parameters
6.5) Training is divided into xm episodes, and in each episode steps 6.1)–6.4) are repeated xn times. After each pass over 6.1)–6.4), the var value of the output disturbance is updated to var = max{0.1, var × gamma}, where xm and xn are natural numbers and gamma is a rational number greater than zero and less than 1.
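The variance-decay rule of step 6.5) can be written in two lines; the gamma value, initial var, and iteration count below are illustrative.

```python
gamma = 0.995        # decay factor, 0 < gamma < 1
var = 3.0            # initial exploration standard deviation

for _ in range(2000):            # once per pass over steps 6.1)-6.4)
    var = max(0.1, var * gamma)  # decay, but never let exploration fall below 0.1
```

The `max` clamp keeps a small residual amount of exploration for the whole training run instead of letting the noise vanish entirely.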
Step 7) Use the deep reinforcement learning model trained in step 6) to control the robotic arm in practice
7.1) In the real environment, the input of the industrial CCD camera is pre-processed: the picture at time t is Gaussian-filtered and then used as the state for neural-network processing.
7.2) Obtain the current input state s_t of the real environment through the camera; the deep reinforcement learning network controls the rotation of the robotic arm according to s_t and obtains the subsequent input state s_{t+1}. Take s_{t+1} as the new current input state s_t, and loop in this way until the deep reinforcement learning model controls the arm to grasp the target.
Further, in step 3), the detailed process of establishing the reward r_t computation model is:
At time t the robotic arm obtains the image information of the moment through the industrial CCD camera in the environment, and Gaussian noise is added to obtain the current input state s_t. Given the state s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is randomly drawn from the random normal distribution N(a_t′, var²) of step 2). The end-effector position coordinates of the robotic arm are (x1_t, y1_t, z1_t) and the target position is (x0_t, y0_t, z0_t); the reward r_t is computed from these coordinates.
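The source does not reproduce the reward formula itself (it appears as an image in the original patent). As a sketch only, one common choice with the right shape — an ASSUMPTION, not the patent's exact formula — is the negative Euclidean distance between the end-effector and target coordinates defined above:

```python
import math

def reward(end_effector, target):
    # ASSUMPTION: the patent's exact formula is not reproduced in the source.
    # Negative Euclidean distance between (x1_t, y1_t, z1_t) and
    # (x0_t, y0_t, z0_t) is one plausible reward of this form: it is 0 at the
    # target and increasingly negative as the gripper moves away.
    return -math.dist(end_effector, target)

r_t = reward((1.0, 2.0, 2.0), (0.0, 0.0, 0.0))  # -3.0
```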
Further, in step 6.2), the detailed process of updating the evaluation network parameters is:
From the batch groups of transitions taken in step 6.1), the Q(s,a|θ^Q) network and the Q(s,a|θ^Q′) network respectively produce, for each group, the estimated value eval_Q′ and the target value target_Q′, from which the temporal-difference error TD_error′ is obtained: TD_error′ = target_Q′ − eval_Q′. Here t′ is the input-state time of an execution of step 5.2) after the experience pool has been filled in step 5.3); that is, each time step 5.2) is executed once the pool is full, the corresponding input-state time is t′.
The loss function Loss is constructed from the temporal-difference error TD_error′: Loss = Σ TD_error′ / batch.
The evaluation estimation network parameters θ^Q are updated by gradient descent on the loss function Loss.
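The TD-error and loss computation can be sketched directly; the Q values below are illustrative placeholders for network outputs. Note that the text's literal formula is the mean of the TD errors (Σ TD_error′ / batch), whereas standard DDPG uses the mean *squared* TD error; the sketch follows the text.

```python
import numpy as np

target_Q = np.array([1.2, 0.8, 1.0, 0.5])   # target_Q' from the target networks
eval_Q = np.array([1.0, 1.0, 0.9, 0.4])     # eval_Q' from the evaluation estimation network

td_error = target_Q - eval_Q                # TD_error' = target_Q' - eval_Q'
batch = len(td_error)
loss = td_error.sum() / batch               # Loss = Σ TD_error' / batch, as in the text
```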
Further, in step 6.3), the detailed process of updating the behaviour estimation network parameters is:
For the state s_t in each of the batch transitions, the μ(s|θ^μ) network together with the output disturbance yields the corresponding actual output action a_t. Differentiating the estimated value eval_Q′ of the Q(s,a|θ^Q) network with respect to the actual output action a_t gives the gradient ∇_a Q of the estimated Q value with respect to a_t, where ∇_a denotes differentiation with respect to the actual output action a_t. Differentiating the value of the actual output action a_t of the μ(s|θ^μ) network with respect to the network parameters gives the gradient ∇_{θ^μ} a of a_t with respect to the parameters, where ∇_{θ^μ} denotes differentiation with respect to the behaviour estimation network parameters.
The product of the gradient ∇_a Q of the estimated Q value with respect to a_t and the gradient ∇_{θ^μ} a of a_t with respect to the behaviour estimation network parameters is the gradient of the estimated Q value with respect to the behaviour estimation network parameters.
The behaviour estimation network parameters are updated by gradient ascent.
Further, in step 6.4), the detailed process of updating the target network parameters is:
Every J episodes, the network parameters of actor_eval are assigned to actor_target; every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K.
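The periodic hard copy can be sketched in a few lines; using J ≠ K staggers the actor and critic target refreshes. The helper name and values are illustrative.

```python
import numpy as np

def hard_update(eval_params, target_params, episode, interval):
    """Copy eval-network parameters into the target network every `interval` episodes."""
    if episode % interval == 0:
        target_params[:] = eval_params
    return target_params

actor_eval = np.array([1.0, 2.0])
actor_target = np.zeros(2)
J = 5
actor_target = hard_update(actor_eval, actor_target, episode=10, interval=J)
```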
Compared with the prior art, the present invention has the following advantages: by incorporating a deep reinforcement learning network, the industrial robotic arm automatic control method based on deep reinforcement learning solves the problem of automatically controlling a robotic arm in a complex environment and accomplishes the arm's automatic control; once training is complete, the arm runs quickly and with high precision.
Description of the drawings
Fig. 1 is a flow diagram of the industrial robotic arm automatic control method based on deep reinforcement learning of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in Fig. 1, the flow of the industrial robotic arm automatic control method based on deep reinforcement learning comprises the following steps:
Step 1) Build the deep reinforcement learning model
1.1) Initialise the experience pool: the experience pool is an m-row, n-column two-dimensional matrix with every element initialised to 0, where m is the sample capacity, n is the amount of information stored per sample, n = 2 × state_dim + action_dim + 1, state_dim is the dimension of the state, and action_dim is the dimension of the action. The "+1" in n = 2 × state_dim + action_dim + 1 is the space reserved in the experience pool for storing the reward information.
1.2) Initialise the neural networks: the neural network is divided into an Actor part and a Critic part; the Actor network is the behaviour network and the Critic network is the evaluation network. Each part is further built as two networks with identical structure but different parameters, an eval net (estimation network) and a target net (target network), giving four networks in total: μ(s|θ^μ), μ(s|θ^μ′), Q(s,a|θ^Q), and Q(s,a|θ^Q′). Here μ(s|θ^μ) is the behaviour estimation network, μ(s|θ^μ′) the behaviour target network, Q(s,a|θ^Q) the evaluation estimation network, and Q(s,a|θ^Q′) the evaluation target network. Randomly initialise the parameters θ^μ of μ(s|θ^μ) and θ^Q of Q(s,a|θ^Q), then copy θ^μ to the behaviour target network, i.e. θ^μ′ ← θ^μ, and copy θ^Q to the evaluation target network, i.e. θ^Q′ ← θ^Q.
Step 2) Construct the output disturbance
Given the current input state s_t, obtain an action a_t′ from the μ(s|θ^μ) network. Build a random normal distribution N(a_t′, var²) with mean a_t′ and variance var², and randomly draw an actual output action value a_t from it. The distribution N(a_t′, var²) thus applies a disturbance to the action a_t′ that is used to explore the environment; θ^μ denotes the parameters of the behaviour estimation network at time t, and t is the time of the current input state.
Step 3) Establish the reward r_t computation model
At time t the robotic arm obtains the image information of the moment through the industrial CCD camera in the environment, and Gaussian noise is added to obtain the current input state s_t. Given the state s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is randomly drawn from the random normal distribution N(a_t′, var²) of step 2). The end-effector position coordinates of the robotic arm are (x1_t, y1_t, z1_t) and the target position is (x0_t, y0_t, z0_t); the reward r_t is computed from these coordinates.
Step 4) Build the simulation environment
The robot simulation software V-REP (Virtual Robot Experimentation Platform) includes models of the world's major industrial robots, which greatly lowers the difficulty of building a robotic-arm simulation environment. Using V-REP, a simulation environment consistent with the practical application is built.
Step 5) Accumulate the experience pool
5.1) Given the current input state s_t, obtain the action a_t′ from the μ(s|θ^μ) network, then apply the output disturbance established in step 2) to obtain the actual output action a_t, and receive the reward r_t and the subsequent input state s_{t+1} from the environment. Store the current input state s_t, the actual output action a_t, the reward r_t, and the subsequent input state s_{t+1} in the experience pool; together these four items are called a state transition record ("transition").
5.2) Take the subsequent input state s_{t+1} as the new current input state s_t, repeat step 5.1), and store the resulting transition in the experience pool.
5.3) Repeat step 5.2) until the experience pool is full; once the pool is full, each execution of step 5.2) is followed by one jump to step 6).
Step 6) Train the deep reinforcement learning neural network
6.1) Sampling
Take batch groups of samples from the experience pool for neural-network learning, where batch is a natural number.
6.2) Update the evaluation network parameters
From the batch groups of transitions taken in step 6.1), the Q(s,a|θ^Q) network and the Q(s,a|θ^Q′) network respectively produce, for each group, the estimated value eval_Q′ and the target value target_Q′, from which the temporal-difference error TD_error′ is obtained: TD_error′ = target_Q′ − eval_Q′. Here t′ is the input-state time of an execution of step 5.2) after the experience pool has been filled in step 5.3); that is, each time step 5.2) is executed once the pool is full, the corresponding input-state time is t′.
The loss function Loss is constructed from the temporal-difference error TD_error′: Loss = Σ TD_error′ / batch.
The evaluation estimation network parameters θ^Q are updated by gradient descent on the loss function Loss.
6.3) Update the behaviour estimation network parameters
For the state s_t in each of the batch transitions, the μ(s|θ^μ) network together with the output disturbance yields the corresponding actual output action a_t. Differentiating the estimated value eval_Q′ of the Q(s,a|θ^Q) network with respect to the actual output action a_t gives the gradient ∇_a Q of the estimated Q value with respect to a_t, where ∇_a denotes differentiation with respect to the actual output action a_t. Differentiating the value of the actual output action a_t of the μ(s|θ^μ) network with respect to the network parameters gives the gradient ∇_{θ^μ} a of a_t with respect to the parameters, where ∇_{θ^μ} denotes differentiation with respect to the behaviour estimation network parameters.
The product of the gradient ∇_a Q of the estimated Q value with respect to a_t and the gradient ∇_{θ^μ} a of a_t with respect to the behaviour estimation network parameters is the gradient of the estimated Q value with respect to the behaviour estimation network parameters.
The behaviour estimation network parameters are updated by gradient ascent.
6.4) Update the target network parameters
Every J episodes, the network parameters of actor_eval are assigned to actor_target; every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K.
6.5) Training is divided into xm episodes, and in each episode steps 6.1)–6.4) are repeated xn times. After each pass over 6.1)–6.4), the var value of the output disturbance is updated to var = max{0.1, var × gamma}; that is, var takes the larger of 0.1 and the previous var value after decay. Here xm and xn are natural numbers, and gamma is a rational number greater than zero and less than 1.
Step 7) Use the deep reinforcement learning model trained in step 6) to control the robotic arm in practice
7.1) In the real environment, the input of the industrial CCD camera is pre-processed: the picture at time t is Gaussian-filtered and then used as the state for neural-network processing.
7.2) Obtain the current input state s_t of the real environment through the camera; the deep reinforcement learning network controls the rotation of the robotic arm according to s_t and obtains the subsequent input state s_{t+1}. Take s_{t+1} as the new current input state s_t, and loop in this way until the deep reinforcement learning model controls the arm to grasp the target.
Experimental data
The experimental object is a SCARA robot in the simulation environment; through the deep reinforcement learning neural network, the robotic arm automatically locates the target and grasps it. The experiment was set up to train 600 episodes of 200 steps each. After training, the arm can grasp the target within 20–30 steps of running, which meets the requirements of modern industrial assembly-line production, whereas traditional robotic-arm control requires building a mathematical model and solving the inverse dynamics in real time, which is computationally expensive.
Claims (5)
1. a kind of industrial machinery arm autocontrol method based on depth enhancing study, it is characterised in that:The control method bag
Include following steps:
Step 1) structure depth enhancing learning model
1.1) experience pond initializes:Experience pond is set as m rows, the two-dimensional matrix of n row, the value of each element is initial in two-dimensional matrix
0 is turned to, wherein, the information content that m is sample size size, n is each sample storage, n=2 × state_dim+action_
The dimension that dim+1, state_dim are the dimension of state, action_dim is action;Meanwhile it reserves and is used in experience pond
The space of incentive message 1 is stored, 1 in this formula of n=2 × state_dim+action_dim+1 is storage incentive message
Headspace;
1.2) neutral net initializes:Neutral net is divided into two parts of Actor networks and Critic networks, and Actor networks are
Behavior network, Critic networks for evaluation network, be each partly divided into not Gou Jian two structure is identical and parameter is different
Eval net and target net, eval net are that estimation network, target net are objective network, so as to formed μ (s | θμ)
Network, μ (s | θμ′) network, Q (s, a | θQ) network and Q (s, a | θQ′) network totally four networks, i.e. μ (s | θμ) network estimates for behavior
Count network, μ (s | θμ′) network for performance-based objective network, Q (s, a | θQ) network for evaluation estimation network, Q (s, a | θQ′) network is
Evaluate objective network;Random initializtion μ (s | θμ) network parameter θμWith random initializtion Q (s, a | θQ) network parameter θQ, so
Afterwards by μ (s | θμ) network parameter θμValue assigns performance-based objective network, i.e. θμ′←θμ, by Q (s, a | θQ) network parameter θQValue is assigned
Give evaluation objective network, i.e. θQ′←θQ;
Step 2) construction output interference
According to current input state st, pass throughNetwork obtains action at', an average is reset as at', variance be
var2Random normal distributionIt is distributed from random normalIn be randomly derived a reality output working value at, random normal
DistributionTo acting at' interference is applied with, for exploring environment, wherein,Represent the parameter of t moment evaluation estimation network, t
For current input state at the time of;
Step 3) establishes reward rtComputation model
Step 4) builds simulated environment
Robot simulation simulation softward V-REP have the major industrial robot in the world model, based on this, the emulation ring of robotic arm
Difficulty reduction is built in border, by V-REP (Virtual Robot Experimentation Platform) software, is built and real
The simulated environment that border application is consistent;
Step 5) is accumulated experience pond
5.1) according to current input state st, pass throughNetwork obtains action at', the output established further according to step 2) is done
It disturbs to obtain reality output action at, and the r that receives awards from environmenttWith follow-up input state st+1, by current input state st, it is real
Border output action at, reward rtWith follow-up input state st+1It is stored in experience pond, and by current input state st, reality output
Act at, reward rt, follow-up input state st+1It is referred to as state transinformation transition;
5.2) by follow-up input state st+1As present current input state st, repeat step 5.1), the shape that will be calculated
State transinformation transition is stored in experience pond;
5.3) until the space in experience pond is full by storage, the space in experience pond is often performed once repetition step 5.2) after storage completely
Step 5.2), which just redirects, performs a step 6);
Step 6) training deeply learning neural network
6.1) sample
Batch groups sample is taken out from experience pond for neural network learning, batch represents natural number;
6.2) evaluation network parameter is updated
6.3) behavior estimation network parameter is updated
6.4) objective network parameter is updated
6.5) it is divided into xm bouts, each bout repeats step 6.1)~6.4) xn times, every time repeatedly 6.1)~6.4) after, output is dry
The var values disturbed are updated to var=max { 0.1, var=var × gamma }, and wherein xm, xn represents natural number, gamma be more than
Zero is less than 1 rational;
Step 7) utilizes trained depth enhancing learning model control machinery arm movement in practice in step 6)
7.1) in true environment, the input of industrial ccd cameras pre-processes, and the picture of t moment after gaussian filtering by being used as
For the state of Processing with Neural Network;
7.2) the current input state s of true environment is obtained by camerat, depth enhancing learning network is according to current input state
stControl machinery arm rotates, and obtains follow-up input state st+1.By follow-up input state st+1As current input state st, so
Xun Huan, until depth enhancing learning model control machinery arm grabs target.
2. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterised in that in step 3), the detailed process of establishing the reward r_t computation model is: at time t the robotic arm obtains the image information of the moment through the industrial CCD camera in the environment, and Gaussian noise is added to obtain the current input state s_t; given the state s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is randomly drawn from the random normal distribution N(a_t′, var²) of step 2); the end-effector position coordinates of the robotic arm are (x1_t, y1_t, z1_t) and the target position is (x0_t, y0_t, z0_t), and the reward r_t is computed from these coordinates.
3. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterised in that in step 6.2), the detailed process of updating the evaluation network parameters is: from the batch groups of transitions taken in step 6.1), the Q(s,a|θ^Q) network and the Q(s,a|θ^Q′) network respectively produce, for each group, the estimated value eval_Q′ and the target value target_Q′, from which the temporal-difference error TD_error′ = target_Q′ − eval_Q′ is obtained; t′ is the input-state time of an execution of step 5.2) after the experience pool has been filled in step 5.3), that is, each time step 5.2) is executed once the pool is full, the corresponding input-state time is t′; the loss function Loss = Σ TD_error′ / batch is constructed from the temporal-difference error TD_error′; the evaluation estimation network parameters θ^Q are updated by gradient descent on the loss function Loss.
4. The industrial mechanical arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that: in the step 6.3), the detailed process of updating the behavior estimation network parameters is:
Each s_t in the batch of sample state-transition tuples is passed through the actor_eval network, and output noise is added to obtain the corresponding actual output action a_t; the estimated Q′ value eval_Q′ of the critic_eval network is differentiated with respect to the actual output action a_t, giving the gradient of the estimated Q′ value with respect to a_t, i.e., ∂Q/∂a_t; the actual output action a_t of the actor_eval network is differentiated with respect to the network parameters, giving the gradient ∂a_t/∂θ^μ, where θ^μ denotes the parameters of the behavior estimation network;
The product of the gradient ∂Q/∂a_t of the estimated Q value with respect to the actual output action a_t and the gradient ∂a_t/∂θ^μ of the actual output action with respect to the behavior estimation network parameters is the gradient of the estimated Q value with respect to the behavior estimation network parameters;
The behavior estimation network parameters are updated by gradient ascent.
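The chain rule in this claim, grad_θ Q = (∂Q/∂a_t) · (∂a_t/∂θ^μ) applied by gradient ascent, can be illustrated with a deliberately tiny example: a hypothetical 1-D linear actor and quadratic critic (not the patent's networks):

```python
# Toy illustration of the actor update: actor a = theta * s,
# critic Q(s, a) = -(a - a_star)^2, so the critic prefers action a_star.
s, a_star = 2.0, 3.0        # fixed input state and the critic's preferred action
theta, lr = 0.1, 0.05       # actor parameter theta^mu and learning rate

for _ in range(200):
    a = theta * s                      # actor_eval output a_t
    dQ_da = -2.0 * (a - a_star)        # gradient of Q w.r.t. the action
    da_dtheta = s                      # gradient of the action w.r.t. theta
    theta += lr * dQ_da * da_dtheta    # gradient ascent on Q via the chain rule
# theta converges so that the actor's output approaches a_star
```

The update drives theta toward the value whose action maximizes Q, which is exactly what the product-of-gradients rule in the claim achieves for the real networks.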
5. The industrial mechanical arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that: in the step 6.4), the detailed process of updating the target network parameters is:
Every J episodes, the network parameters of actor_eval are assigned to actor_target; every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K.
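The staggered hard-copy schedule in this claim can be sketched as a small helper; the function name and dict-based parameter containers are illustrative, not the patent's:

```python
def sync_target(episode: int, eval_params: dict, target_params: dict,
                interval: int) -> bool:
    """Hard-copy evaluation-network parameters into the target network once
    every `interval` episodes (the claim uses interval J for the actor pair
    and K for the critic pair, with J != K so the copies are staggered)."""
    if episode % interval == 0:
        target_params.clear()
        target_params.update(eval_params)
        return True
    return False
```

In a training loop one would call this twice per episode, e.g. `sync_target(ep, actor_eval, actor_target, J)` and `sync_target(ep, critic_eval, critic_target, K)`, so the two target networks lag their evaluation networks by different amounts.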
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711275146.7A CN108052004B (en) | 2017-12-06 | 2017-12-06 | Industrial mechanical arm automatic control method based on deep reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108052004A true CN108052004A (en) | 2018-05-18 |
CN108052004B CN108052004B (en) | 2020-11-10 |
Family
ID=62121722
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711275146.7A Active CN108052004B (en) | 2017-12-06 | 2017-12-06 | Industrial mechanical arm automatic control method based on deep reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108052004B (en) |
Cited By (49)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108803615A (en) * | 2018-07-03 | 2018-11-13 | 东南大学 | A kind of visual human's circumstances not known navigation algorithm based on deeply study |
CN108927806A (en) * | 2018-08-13 | 2018-12-04 | 哈尔滨工业大学(深圳) | A kind of industrial robot learning method applied to high-volume repeatability processing |
CN109242099A (en) * | 2018-08-07 | 2019-01-18 | 中国科学院深圳先进技术研究院 | Training method, device, training equipment and the storage medium of intensified learning network |
CN109240280A (en) * | 2018-07-05 | 2019-01-18 | 上海交通大学 | Anchoring auxiliary power positioning system control method based on intensified learning |
CN109352648A (en) * | 2018-10-12 | 2019-02-19 | 北京地平线机器人技术研发有限公司 | Control method, device and the electronic equipment of mechanical mechanism |
CN109352649A (en) * | 2018-10-15 | 2019-02-19 | 同济大学 | A kind of method for controlling robot and system based on deep learning |
CN109379752A (en) * | 2018-09-10 | 2019-02-22 | 中国移动通信集团江苏有限公司 | Optimization method, device, equipment and the medium of Massive MIMO |
CN109483534A (en) * | 2018-11-08 | 2019-03-19 | 腾讯科技(深圳)有限公司 | A kind of grasping body methods, devices and systems |
CN109614631A (en) * | 2018-10-18 | 2019-04-12 | 清华大学 | Aircraft Full automatic penumatic optimization method based on intensified learning and transfer learning |
CN109605377A (en) * | 2019-01-21 | 2019-04-12 | 厦门大学 | A kind of joint of robot motion control method and system based on intensified learning |
CN109800864A (en) * | 2019-01-18 | 2019-05-24 | 中山大学 | A kind of robot Active Learning Method based on image input |
CN109948642A (en) * | 2019-01-18 | 2019-06-28 | 中山大学 | Multiple agent cross-module state depth deterministic policy gradient training method based on image input |
CN110053053A (en) * | 2019-06-14 | 2019-07-26 | 西南科技大学 | Mechanical arm based on deeply study screws the adaptive approach of valve |
CN110053034A (en) * | 2019-05-23 | 2019-07-26 | 哈尔滨工业大学 | A kind of multi purpose space cellular machineries people's device of view-based access control model |
CN110070099A (en) * | 2019-02-20 | 2019-07-30 | 北京航空航天大学 | A kind of industrial data feature structure method based on intensified learning |
CN110125939A (en) * | 2019-06-03 | 2019-08-16 | 湖南工学院 | A kind of method of Robot Virtual visualization control |
CN110238839A (en) * | 2019-04-11 | 2019-09-17 | 清华大学 | It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting |
CN110370295A (en) * | 2019-07-02 | 2019-10-25 | 浙江大学 | Soccer robot active control suction ball method based on deeply study |
CN110400345A (en) * | 2019-07-24 | 2019-11-01 | 西南科技大学 | Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting |
CN110826701A (en) * | 2019-11-15 | 2020-02-21 | 北京邮电大学 | Method for carrying out system identification on two-degree-of-freedom flexible leg based on BP neural network algorithm |
CN110879595A (en) * | 2019-11-29 | 2020-03-13 | 江苏徐工工程机械研究院有限公司 | Unmanned mine card tracking control system and method based on deep reinforcement learning |
CN110900601A (en) * | 2019-11-15 | 2020-03-24 | 武汉理工大学 | Robot operation autonomous control method for human-robot cooperation safety guarantee |
CN110909859A (en) * | 2019-11-29 | 2020-03-24 | 中国科学院自动化研究所 | Bionic robot fish motion control method and system based on antagonistic structured control |
CN111223141A (en) * | 2019-12-31 | 2020-06-02 | 东华大学 | Automatic assembly line work efficiency optimization system and method based on reinforcement learning |
CN111360834A (en) * | 2020-03-25 | 2020-07-03 | 中南大学 | Humanoid robot motion control method and system based on deep reinforcement learning |
CN111461325A (en) * | 2020-03-30 | 2020-07-28 | 华南理工大学 | Multi-target layered reinforcement learning algorithm for sparse rewarding environment problem |
CN111476257A (en) * | 2019-01-24 | 2020-07-31 | 富士通株式会社 | Information processing method and information processing apparatus |
CN111487863A (en) * | 2020-04-14 | 2020-08-04 | 东南大学 | Active suspension reinforcement learning control method based on deep Q neural network |
CN111515961A (en) * | 2020-06-02 | 2020-08-11 | 南京大学 | Reinforcement learning reward method suitable for mobile mechanical arm |
CN111618847A (en) * | 2020-04-22 | 2020-09-04 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN111644398A (en) * | 2020-05-28 | 2020-09-11 | 华中科技大学 | Push-grab cooperative sorting network based on double viewing angles and sorting method and system thereof |
CN111881772A (en) * | 2020-07-06 | 2020-11-03 | 上海交通大学 | Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning |
EP3760390A1 (en) * | 2019-07-01 | 2021-01-06 | KUKA Deutschland GmbH | Performance of a predetermined task using at least one robot |
WO2021001312A1 (en) * | 2019-07-01 | 2021-01-07 | Kuka Deutschland Gmbh | Carrying out an application using at least one robot |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
CN112405543A (en) * | 2020-11-23 | 2021-02-26 | 长沙理工大学 | Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN112506044A (en) * | 2020-09-10 | 2021-03-16 | 上海交通大学 | Flexible arm control and planning method based on visual feedback and reinforcement learning |
CN112528552A (en) * | 2020-10-23 | 2021-03-19 | 洛阳银杏科技有限公司 | Mechanical arm control model construction method based on deep reinforcement learning |
CN112643668A (en) * | 2020-12-01 | 2021-04-13 | 浙江工业大学 | Mechanical arm pushing and grabbing cooperation method suitable for intensive environment |
CN112894796A (en) * | 2019-11-19 | 2021-06-04 | 财团法人工业技术研究院 | Gripping device and gripping method |
CN113159410A (en) * | 2021-04-14 | 2021-07-23 | 北京百度网讯科技有限公司 | Training method for automatic control model and fluid supply system control method |
CN113283167A (en) * | 2021-05-24 | 2021-08-20 | 暨南大学 | Special equipment production line optimization method and system based on safety reinforcement learning |
CN113510709A (en) * | 2021-07-28 | 2021-10-19 | 北京航空航天大学 | Industrial robot pose precision online compensation method based on deep reinforcement learning |
CN113843802A (en) * | 2021-10-18 | 2021-12-28 | 南京理工大学 | Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm |
WO2022142271A1 (en) * | 2020-12-31 | 2022-07-07 | 山东大学 | Comprehensive intelligent nursing system and method for high infectiousness isolation ward |
CN114789444A (en) * | 2022-05-05 | 2022-07-26 | 山东省人工智能研究院 | Compliant human-computer contact method based on deep reinforcement learning and impedance control |
CN115464659A (en) * | 2022-10-05 | 2022-12-13 | 哈尔滨理工大学 | Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information |
CN117618125A (en) * | 2024-01-25 | 2024-03-01 | 科弛医疗科技(北京)有限公司 | Image trolley |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105690392A (en) * | 2016-04-14 | 2016-06-22 | 苏州大学 | Robot motion control method and device based on actor-critic method |
CN106094516A (en) * | 2016-06-08 | 2016-11-09 | 南京大学 | A kind of robot self-adapting grasping method based on deeply study |
WO2017083772A1 (en) * | 2015-11-12 | 2017-05-18 | Google Inc. | Asynchronous deep reinforcement learning |
CN107065881A (en) * | 2017-05-17 | 2017-08-18 | 清华大学 | A kind of robot global path planning method learnt based on deeply |
Non-Patent Citations (2)
Title |
---|
JELLE MUNK et al.: "Learning State Representation for Deep Actor-Critic Control", 2016 IEEE 55th Conference on Decision and Control * |
TANG Peng: "Research on Learning Algorithms for Robot Soccer Behavior Control", China Master's Theses Full-text Database, Information Science and Technology * |
Cited By (78)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108803615A (en) * | 2018-07-03 | 2018-11-13 | 东南大学 | A kind of visual human's circumstances not known navigation algorithm based on deeply study |
CN109240280A (en) * | 2018-07-05 | 2019-01-18 | 上海交通大学 | Anchoring auxiliary power positioning system control method based on intensified learning |
CN109240280B (en) * | 2018-07-05 | 2021-09-07 | 上海交通大学 | Anchoring auxiliary power positioning system control method based on reinforcement learning |
CN109242099A (en) * | 2018-08-07 | 2019-01-18 | 中国科学院深圳先进技术研究院 | Training method, device, training equipment and the storage medium of intensified learning network |
CN109242099B (en) * | 2018-08-07 | 2020-11-10 | 中国科学院深圳先进技术研究院 | Training method and device of reinforcement learning network, training equipment and storage medium |
CN108927806A (en) * | 2018-08-13 | 2018-12-04 | 哈尔滨工业大学(深圳) | A kind of industrial robot learning method applied to high-volume repeatability processing |
CN109379752B (en) * | 2018-09-10 | 2021-09-24 | 中国移动通信集团江苏有限公司 | Massive MIMO optimization method, device, equipment and medium |
CN109379752A (en) * | 2018-09-10 | 2019-02-22 | 中国移动通信集团江苏有限公司 | Optimization method, device, equipment and the medium of Massive MIMO |
CN109352648A (en) * | 2018-10-12 | 2019-02-19 | 北京地平线机器人技术研发有限公司 | Control method, device and the electronic equipment of mechanical mechanism |
CN109352649A (en) * | 2018-10-15 | 2019-02-19 | 同济大学 | A kind of method for controlling robot and system based on deep learning |
CN109352649B (en) * | 2018-10-15 | 2021-07-20 | 同济大学 | Manipulator control method and system based on deep learning |
CN109614631A (en) * | 2018-10-18 | 2019-04-12 | 清华大学 | Aircraft Full automatic penumatic optimization method based on intensified learning and transfer learning |
CN109483534A (en) * | 2018-11-08 | 2019-03-19 | 腾讯科技(深圳)有限公司 | A kind of grasping body methods, devices and systems |
CN109948642A (en) * | 2019-01-18 | 2019-06-28 | 中山大学 | Multiple agent cross-module state depth deterministic policy gradient training method based on image input |
CN109800864A (en) * | 2019-01-18 | 2019-05-24 | 中山大学 | A kind of robot Active Learning Method based on image input |
CN109605377A (en) * | 2019-01-21 | 2019-04-12 | 厦门大学 | A kind of joint of robot motion control method and system based on intensified learning |
CN111476257A (en) * | 2019-01-24 | 2020-07-31 | 富士通株式会社 | Information processing method and information processing apparatus |
CN110070099A (en) * | 2019-02-20 | 2019-07-30 | 北京航空航天大学 | A kind of industrial data feature structure method based on intensified learning |
CN110238839A (en) * | 2019-04-11 | 2019-09-17 | 清华大学 | It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting |
CN110238839B (en) * | 2019-04-11 | 2020-10-20 | 清华大学 | Multi-shaft-hole assembly control method for optimizing non-model robot by utilizing environment prediction |
CN110053034A (en) * | 2019-05-23 | 2019-07-26 | 哈尔滨工业大学 | A kind of multi purpose space cellular machineries people's device of view-based access control model |
CN110125939A (en) * | 2019-06-03 | 2019-08-16 | 湖南工学院 | A kind of method of Robot Virtual visualization control |
CN110125939B (en) * | 2019-06-03 | 2020-10-20 | 湖南工学院 | Virtual visual control method for robot |
CN110053053B (en) * | 2019-06-14 | 2022-04-12 | 西南科技大学 | Self-adaptive method of mechanical arm screwing valve based on deep reinforcement learning |
CN110053053A (en) * | 2019-06-14 | 2019-07-26 | 西南科技大学 | Mechanical arm based on deeply study screws the adaptive approach of valve |
WO2021001312A1 (en) * | 2019-07-01 | 2021-01-07 | Kuka Deutschland Gmbh | Carrying out an application using at least one robot |
CN114051444B (en) * | 2019-07-01 | 2024-04-26 | 库卡德国有限公司 | Executing an application by means of at least one robot |
EP3760390A1 (en) * | 2019-07-01 | 2021-01-06 | KUKA Deutschland GmbH | Performance of a predetermined task using at least one robot |
CN114051444A (en) * | 2019-07-01 | 2022-02-15 | 库卡德国有限公司 | Executing an application by means of at least one robot |
CN110370295B (en) * | 2019-07-02 | 2020-12-18 | 浙江大学 | Small-sized football robot active control ball suction method based on deep reinforcement learning |
CN110370295A (en) * | 2019-07-02 | 2019-10-25 | 浙江大学 | Soccer robot active control suction ball method based on deeply study |
CN110400345B (en) * | 2019-07-24 | 2021-06-15 | 西南科技大学 | Deep reinforcement learning-based radioactive waste push-grab cooperative sorting method |
CN110400345A (en) * | 2019-07-24 | 2019-11-01 | 西南科技大学 | Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting |
CN110900601B (en) * | 2019-11-15 | 2022-06-03 | 武汉理工大学 | Robot operation autonomous control method for human-robot cooperation safety guarantee |
CN110826701A (en) * | 2019-11-15 | 2020-02-21 | 北京邮电大学 | Method for carrying out system identification on two-degree-of-freedom flexible leg based on BP neural network algorithm |
CN110900601A (en) * | 2019-11-15 | 2020-03-24 | 武汉理工大学 | Robot operation autonomous control method for human-robot cooperation safety guarantee |
CN112894796A (en) * | 2019-11-19 | 2021-06-04 | 财团法人工业技术研究院 | Gripping device and gripping method |
TWI790408B (en) * | 2019-11-19 | 2023-01-21 | 財團法人工業技術研究院 | Gripping device and gripping method |
CN112894796B (en) * | 2019-11-19 | 2023-09-05 | 财团法人工业技术研究院 | Grabbing device and grabbing method |
CN110909859B (en) * | 2019-11-29 | 2023-03-24 | 中国科学院自动化研究所 | Bionic robot fish motion control method and system based on antagonistic structured control |
CN110909859A (en) * | 2019-11-29 | 2020-03-24 | 中国科学院自动化研究所 | Bionic robot fish motion control method and system based on antagonistic structured control |
CN110879595A (en) * | 2019-11-29 | 2020-03-13 | 江苏徐工工程机械研究院有限公司 | Unmanned mine card tracking control system and method based on deep reinforcement learning |
CN111223141A (en) * | 2019-12-31 | 2020-06-02 | 东华大学 | Automatic assembly line work efficiency optimization system and method based on reinforcement learning |
CN111223141B (en) * | 2019-12-31 | 2023-10-24 | 东华大学 | Automatic pipeline work efficiency optimization system and method based on reinforcement learning |
CN111360834A (en) * | 2020-03-25 | 2020-07-03 | 中南大学 | Humanoid robot motion control method and system based on deep reinforcement learning |
CN111461325B (en) * | 2020-03-30 | 2023-06-20 | 华南理工大学 | Multi-target layered reinforcement learning algorithm for sparse rewarding environmental problem |
CN111461325A (en) * | 2020-03-30 | 2020-07-28 | 华南理工大学 | Multi-target layered reinforcement learning algorithm for sparse rewarding environment problem |
CN111487863B (en) * | 2020-04-14 | 2022-06-17 | 东南大学 | Active suspension reinforcement learning control method based on deep Q neural network |
CN111487863A (en) * | 2020-04-14 | 2020-08-04 | 东南大学 | Active suspension reinforcement learning control method based on deep Q neural network |
CN111618847A (en) * | 2020-04-22 | 2020-09-04 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN111618847B (en) * | 2020-04-22 | 2022-06-21 | 南通大学 | Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements |
CN111644398A (en) * | 2020-05-28 | 2020-09-11 | 华中科技大学 | Push-grab cooperative sorting network based on double viewing angles and sorting method and system thereof |
CN111515961A (en) * | 2020-06-02 | 2020-08-11 | 南京大学 | Reinforcement learning reward method suitable for mobile mechanical arm |
CN111515961B (en) * | 2020-06-02 | 2022-06-21 | 南京大学 | Reinforcement learning reward method suitable for mobile mechanical arm |
CN111881772B (en) * | 2020-07-06 | 2023-11-07 | 上海交通大学 | Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning |
CN111881772A (en) * | 2020-07-06 | 2020-11-03 | 上海交通大学 | Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning |
CN112506044A (en) * | 2020-09-10 | 2021-03-16 | 上海交通大学 | Flexible arm control and planning method based on visual feedback and reinforcement learning |
CN112528552A (en) * | 2020-10-23 | 2021-03-19 | 洛阳银杏科技有限公司 | Mechanical arm control model construction method based on deep reinforcement learning |
CN112434464B (en) * | 2020-11-09 | 2021-09-10 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm |
CN112434464A (en) * | 2020-11-09 | 2021-03-02 | 中国船舶重工集团公司第七一六研究所 | Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm |
CN112338921A (en) * | 2020-11-16 | 2021-02-09 | 西华师范大学 | Mechanical arm intelligent control rapid training method based on deep reinforcement learning |
CN112405543A (en) * | 2020-11-23 | 2021-02-26 | 长沙理工大学 | Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning |
CN112405543B (en) * | 2020-11-23 | 2022-05-06 | 长沙理工大学 | Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning |
CN112643668A (en) * | 2020-12-01 | 2021-04-13 | 浙江工业大学 | Mechanical arm pushing and grabbing cooperation method suitable for intensive environment |
CN112643668B (en) * | 2020-12-01 | 2022-05-24 | 浙江工业大学 | Mechanical arm pushing and grabbing cooperation method suitable for intensive environment |
WO2022142271A1 (en) * | 2020-12-31 | 2022-07-07 | 山东大学 | Comprehensive intelligent nursing system and method for high infectiousness isolation ward |
CN113159410B (en) * | 2021-04-14 | 2024-02-27 | 北京百度网讯科技有限公司 | Training method of automatic control model and fluid supply system control method |
CN113159410A (en) * | 2021-04-14 | 2021-07-23 | 北京百度网讯科技有限公司 | Training method for automatic control model and fluid supply system control method |
CN113283167A (en) * | 2021-05-24 | 2021-08-20 | 暨南大学 | Special equipment production line optimization method and system based on safety reinforcement learning |
CN113510709A (en) * | 2021-07-28 | 2021-10-19 | 北京航空航天大学 | Industrial robot pose precision online compensation method based on deep reinforcement learning |
CN113510709B (en) * | 2021-07-28 | 2022-08-19 | 北京航空航天大学 | Industrial robot pose precision online compensation method based on deep reinforcement learning |
CN113843802A (en) * | 2021-10-18 | 2021-12-28 | 南京理工大学 | Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm |
CN113843802B (en) * | 2021-10-18 | 2023-09-05 | 南京理工大学 | Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm |
CN114789444B (en) * | 2022-05-05 | 2022-12-16 | 山东省人工智能研究院 | Compliant human-computer contact method based on deep reinforcement learning and impedance control |
CN114789444A (en) * | 2022-05-05 | 2022-07-26 | 山东省人工智能研究院 | Compliant human-computer contact method based on deep reinforcement learning and impedance control |
CN115464659A (en) * | 2022-10-05 | 2022-12-13 | 哈尔滨理工大学 | Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information |
CN115464659B (en) * | 2022-10-05 | 2023-10-24 | 哈尔滨理工大学 | Mechanical arm grabbing control method based on visual information deep reinforcement learning DDPG algorithm |
CN117618125A (en) * | 2024-01-25 | 2024-03-01 | 科弛医疗科技(北京)有限公司 | Image trolley |
Also Published As
Publication number | Publication date |
---|---|
CN108052004B (en) | 2020-11-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108052004A (en) | Industrial machinery arm autocontrol method based on depth enhancing study | |
Chen et al. | A system for general in-hand object re-orientation | |
US11928765B2 (en) | Animation implementation method and apparatus, electronic device, and storage medium | |
Guo et al. | Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning | |
EP3825962A3 (en) | Virtual object driving method, apparatus, electronic device, and readable storage medium | |
CN109523029A (en) | For the adaptive double from driving depth deterministic policy Gradient Reinforcement Learning method of training smart body | |
CN109782600A (en) | A method of autonomous mobile robot navigation system is established by virtual environment | |
Kusuma | FIBROUS ROOT MODEL IN BATIK PATTERN GENERATION. | |
CN109800864A (en) | A kind of robot Active Learning Method based on image input | |
CN107679522A (en) | Action identification method based on multithread LSTM | |
CN110315544B (en) | Robot operation learning method based on video image demonstration | |
CN108229678A (en) | Network training method, method of controlling operation thereof, device, storage medium and equipment | |
Jiang et al. | Mastering the complex assembly task with a dual-arm robot: A novel reinforcement learning method | |
Vacaro et al. | Sim-to-real in reinforcement learning for everyone | |
Zakaria et al. | Robotic control of the deformation of soft linear objects using deep reinforcement learning | |
Zhang et al. | Reinforcement learning based pushing and grasping objects from ungraspable poses | |
Lv et al. | Sam-rl: Sensing-aware model-based reinforcement learning via differentiable physics-based simulation and rendering | |
Kim et al. | Pre-and post-contact policy decomposition for non-prehensile manipulation with zero-shot sim-to-real transfer | |
CN108944940A (en) | Driving behavior modeling method neural network based | |
Chen et al. | A simple method for complex in-hand manipulation | |
WO2021100267A1 (en) | Information processing device and information processing method | |
Sanchez et al. | Towards advanced robotic manipulation | |
CN110751869B (en) | Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method | |
CN109635942B (en) | Brain excitation state and inhibition state imitation working state neural network circuit structure and method | |
Li et al. | Learning a skill-sequence-dependent policy for long-horizon manipulation tasks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||