CN108052004A - Industrial robotic arm automatic control method based on deep reinforcement learning - Google Patents

Industrial robotic arm automatic control method based on deep reinforcement learning

Info

Publication number
CN108052004A
CN108052004A (application CN201711275146.7A)
Authority
CN
China
Prior art keywords
network
input state
state
parameter
network parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711275146.7A
Other languages
Chinese (zh)
Other versions
CN108052004B (en)
Inventor
柯丰恺
周唯倜
赵大兴
孙国栋
许万
丁国龙
吴震宇
赵迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology filed Critical Hubei University of Technology
Priority to CN201711275146.7A
Publication of CN108052004A
Application granted
Publication of CN108052004B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B 13/0265 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
    • G05B 13/027 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 9/00 Programme-controlled manipulators
    • B25J 9/16 Programme controls
    • B25J 9/1656 Programme controls characterised by programming, planning systems for manipulators
    • B25J 9/1664 Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B 13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B 13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
    • G05B 13/045 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance using a perturbation signal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Automation & Control Theory (AREA)
  • Mechanical Engineering (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Feedback Control In General (AREA)

Abstract

The present invention relates to an industrial robotic arm automatic control method based on deep reinforcement learning: build a deep reinforcement learning model, construct an output disturbance, establish a reward r_t computation model, build a simulated environment, accumulate an experience pool, train the deep reinforcement learning neural network, and use the trained deep reinforcement learning model to control the motion of the robotic arm in practice. By incorporating a deep reinforcement learning network, the method solves the problem of automatically controlling a robotic arm in a complex environment and achieves automatic control of the robotic arm; once training is complete, it runs fast and with high precision.

Description

Industrial robotic arm automatic control method based on deep reinforcement learning
Technical field
The invention belongs to the technical field of reinforcement learning algorithms, and in particular relates to an industrial robotic arm automatic control method based on deep reinforcement learning.
Background art
Compared with manual labor, industrial robotic arms can complete simple, repetitive, and heavy operations more efficiently. They greatly improve production efficiency while also reducing labor cost and labor intensity, and they lower the probability of workplace accidents while guaranteeing production quality. In harsh environments such as high temperature, high pressure, low temperature, low pressure, dust, or flammable and explosive surroundings, replacing manual work with robotic arms prevents accidents caused by operator negligence, which is of great significance.
The motion-solving procedure of a robotic arm first obtains the pose information of the grasping target, then obtains the arm's own pose information, and solves for the rotation angle of each axis by inverse kinematics. Because of the flexibility of the joints and links during motion, the structure deforms and precision decreases, so controlling a flexible robotic arm is a difficult problem. Common control methods include PID control, force-feedback control, adaptive control, fuzzy control, and neural network control. Neural network control has the clear advantage of not requiring a mathematical model of the controlled target, and in the coming age of artificial intelligence, automatic control based on neural networks will become mainstream.
Summary of the invention
It is an object of the invention to provide an industrial robotic arm automatic control method based on deep reinforcement learning which, by incorporating a deep reinforcement learning network, solves the problem of automatically controlling a robotic arm in a complex environment and achieves automatic control of the robotic arm.
To achieve the above object, the industrial robotic arm automatic control method based on deep reinforcement learning designed by the present invention is characterized in that the control method comprises the following steps:
Step 1) Build the deep reinforcement learning model
1.1) Experience pool initialization: the experience pool is set up as a two-dimensional matrix of m rows and n columns, with every element initialized to 0, where m is the sample capacity, n is the amount of information stored per sample, and n = 2 × state_dim + action_dim + 1; state_dim is the dimension of the state and action_dim is the dimension of the action. The experience pool also reserves space for the reward information: the 1 in the formula n = 2 × state_dim + action_dim + 1 is the space reserved for storing the reward;
1.2) Neural network initialization: the neural network is divided into two parts, an Actor network (the behavior network) and a Critic network (the evaluation network). Each part is in turn built as two networks with identical structure but different parameters, an eval net (estimation network) and a target net (target network), giving four networks in total: μ(s|θ^μ), μ(s|θ^μ′), Q(s,a|θ^Q) and Q(s,a|θ^Q′). That is, the μ(s|θ^μ) network is the behavior estimation network, the μ(s|θ^μ′) network is the behavior target network, the Q(s,a|θ^Q) network is the evaluation estimation network, and the Q(s,a|θ^Q′) network is the evaluation target network. The parameters θ^μ of the μ(s|θ^μ) network and θ^Q of the Q(s,a|θ^Q) network are randomly initialized; the value of θ^μ is then assigned to the behavior target network, i.e. θ^μ′ ← θ^μ, and the value of θ^Q is assigned to the evaluation target network, i.e. θ^Q′ ← θ^Q;
Step 2) Construct the output disturbance
According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′. A normal distribution N(a_t′, var²) with mean a_t′ and variance var² is constructed, and an actual output action a_t is drawn at random from it; the normal distribution N thus applies a disturbance to the action a_t′ that is used to explore the environment, where θ^μ_t denotes the parameters of the behavior estimation network at time t, and t is the time of the current input state;
Step 3) Establish the reward r_t computation model
Step 4) Build the simulated environment
The robot simulation software V-REP (Virtual Robot Experimentation Platform) provides models of the world's major industrial robots, which greatly reduces the difficulty of building a robotic arm simulation environment; a simulated environment consistent with the practical application is built with the V-REP software;
Step 5) Accumulate the experience pool
5.1) According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′; the output disturbance established in step 2) then yields the actual output action a_t, and the reward r_t and the subsequent input state s_{t+1} are received from the environment. The current input state s_t, the actual output action a_t, the reward r_t and the subsequent input state s_{t+1} are stored in the experience pool, and together they are referred to as the state transition information, transition;
5.2) The subsequent input state s_{t+1} is taken as the new current input state s_t, step 5.1) is repeated, and the resulting state transition information is stored in the experience pool;
5.3) Step 5.2) is repeated until the experience pool is full; once full, each further execution of step 5.2) is followed by one execution of step 6);
Step 6) Train the deep reinforcement learning neural network
6.1) Sampling
batch groups of samples are taken from the experience pool for neural network learning, where batch is a natural number;
6.2) Update the evaluation network parameters
6.3) Update the behavior estimation network parameters
6.4) Update the target network parameters
6.5) Training is divided into xm episodes, and each episode repeats steps 6.1)~6.4) xn times; after each repetition of 6.1)~6.4), the var value of the output disturbance is updated to var = max{0.1, var × gamma}, where xm and xn are natural numbers and gamma is a rational number greater than zero and less than 1;
Step 7) Use the deep reinforcement learning model trained in step 6) to control the motion of the real robotic arm
7.1) In the real environment, the picture captured by an industrial CCD camera at time t is preprocessed by Gaussian filtering and used as the state for neural network processing;
7.2) The current input state s_t of the real environment is obtained through the camera; the deep reinforcement learning network controls the rotation of the robotic arm according to the current input state s_t and obtains the subsequent input state s_{t+1}. The subsequent input state s_{t+1} is then taken as the new current input state s_t, and the cycle continues until the deep reinforcement learning model controls the robotic arm to grasp the target.
Further, in step 3), the detailed process of establishing the reward r_t computation model is:
At time t, the robotic arm obtains the image information of time t through the industrial CCD camera in the environment and adds Gaussian noise to obtain the current input state s_t. When the current input state is s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is drawn at random from the normal distribution N(a_t′, var²) of step 2). The end-effector position coordinates are (x1_t, y1_t, z1_t) and the target position is (x0_t, y0_t, z0_t), and the reward r_t is computed as a function of these two positions.
Further, in step 6.2), the detailed process of updating the evaluation network parameters is:
According to the batch groups of state transition information taken out in step 6.1), the evaluation estimation network Q(s,a|θ^Q) and the evaluation target network Q(s,a|θ^Q′) yield, for each group of state transition information, the corresponding estimated Q′ value eval_Q′ and target Q′ value target_Q′, from which the temporal-difference error TD_error′ is obtained: TD_error′ = target_Q′ − eval_Q′. Here t′ is the input-state time at which step 5.2) is executed after the experience pool has been filled in step 5.3); that is, each time step 5.2) is executed once the experience pool is full, the input-state time is t′;
The loss function Loss is constructed from the temporal-difference error TD_error′: Loss = Σ TD_error′ / batch;
According to the loss function Loss, the evaluation estimation network parameters θ^Q are updated by gradient descent.
Further, in step 6.3), the detailed process of updating the behavior estimation network parameters is:
The state s_t in each of the batch groups of state transition information is passed through the behavior estimation network μ(s|θ^μ) and the output disturbance to obtain the corresponding actual output action a_t. Differentiating the estimated Q′ value eval_Q′ of the evaluation estimation network with respect to the actual output action a_t gives the gradient ∇_a Q of the estimated Q′ value with respect to a_t, where ∇_a denotes differentiation with respect to the actual output action a_t. Differentiating the actual output action a_t of the behavior estimation network with respect to the network parameters gives the gradient ∇_θ^μ μ of a_t with respect to the behavior estimation network parameters, where ∇_θ^μ denotes differentiation with respect to the parameters of the behavior estimation network;
The product of the gradient ∇_a Q of the estimated Q value with respect to the actual output action a_t and the gradient ∇_θ^μ μ of the actual output action a_t with respect to the behavior estimation network parameters is the gradient of the estimated Q value with respect to the behavior estimation network parameters;
The behavior estimation network parameters are updated by gradient ascent.
Further, in step 6.4), the detailed process of updating the target network parameters is:
Every J episodes, the network parameters of actor_eval are assigned to actor_target, and every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K.
Compared with the prior art, the present invention has the following advantages: the industrial robotic arm automatic control method based on deep reinforcement learning solves the problem of automatically controlling a robotic arm in a complex environment by incorporating a deep reinforcement learning network, achieves automatic control of the robotic arm, and, once training is complete, runs fast with high precision.
Description of the drawings
Fig. 1 is a flow diagram of the industrial robotic arm automatic control method based on deep reinforcement learning according to the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1, the industrial robotic arm automatic control method based on deep reinforcement learning comprises the following steps:
Step 1) Build the deep reinforcement learning model
1.1) Experience pool initialization: the experience pool is set up as a two-dimensional matrix of m rows and n columns, with every element initialized to 0, where m is the sample capacity, n is the amount of information stored per sample, and n = 2 × state_dim + action_dim + 1; state_dim is the dimension of the state and action_dim is the dimension of the action. The experience pool also reserves space for the reward information: the 1 in the formula n = 2 × state_dim + action_dim + 1 is the space reserved for storing the reward;
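For illustration only, a minimal sketch of such an experience pool backed by a NumPy matrix; the class and method names are not part of the patent, and the overwrite-oldest eviction policy is an assumption:

    import numpy as np

    class ExperiencePool:
        """m-by-n matrix of zeros; one transition per row:
        (s_t, a_t, r_t, s_{t+1}), so n = 2*state_dim + action_dim + 1."""

        def __init__(self, m, state_dim, action_dim):
            self.m = m
            self.n = 2 * state_dim + action_dim + 1
            self.pool = np.zeros((m, self.n))
            self.count = 0  # transitions stored so far

        def store(self, s, a, r, s_next):
            row = self.count % self.m            # overwrite the oldest row once full
            self.pool[row] = np.hstack([s, a, [r], s_next])
            self.count += 1

        def full(self):
            return self.count >= self.m

        def sample(self, batch):
            idx = np.random.randint(0, self.m, size=batch)
            return self.pool[idx]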
1.2) Neural network initialization: the neural network is divided into two parts, an Actor network (the behavior network) and a Critic network (the evaluation network). Each part is in turn built as two networks with identical structure but different parameters, an eval net (estimation network) and a target net (target network), giving four networks in total: μ(s|θ^μ), μ(s|θ^μ′), Q(s,a|θ^Q) and Q(s,a|θ^Q′). That is, the μ(s|θ^μ) network is the behavior estimation network, the μ(s|θ^μ′) network is the behavior target network, the Q(s,a|θ^Q) network is the evaluation estimation network, and the Q(s,a|θ^Q′) network is the evaluation target network. The parameters θ^μ of the μ(s|θ^μ) network and θ^Q of the Q(s,a|θ^Q) network are randomly initialized; the value of θ^μ is then assigned to the behavior target network, i.e. θ^μ′ ← θ^μ, and the value of θ^Q is assigned to the evaluation target network, i.e. θ^Q′ ← θ^Q;
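A minimal PyTorch sketch of the four-network initialization, assuming simple fully connected layers; the hidden size and the example dimensions are illustrative, not taken from the patent:

    import torch
    import torch.nn as nn

    def mlp(in_dim, out_dim, hidden=64):
        return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                             nn.Linear(hidden, out_dim))

    state_dim, action_dim = 6, 3  # example dimensions only

    # mu(s | theta_mu): behavior estimation network, plus its target copy mu(s | theta_mu')
    actor_eval = mlp(state_dim, action_dim)
    actor_target = mlp(state_dim, action_dim)
    # Q(s, a | theta_Q): evaluation estimation network, plus its target copy Q(s, a | theta_Q')
    critic_eval = mlp(state_dim + action_dim, 1)
    critic_target = mlp(state_dim + action_dim, 1)

    # theta_mu' <- theta_mu and theta_Q' <- theta_Q: targets start as exact copies
    actor_target.load_state_dict(actor_eval.state_dict())
    critic_target.load_state_dict(critic_eval.state_dict())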
Step 2) Construct the output disturbance
According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′. A normal distribution N(a_t′, var²) with mean a_t′ and variance var² is constructed, and an actual output action a_t is drawn at random from it; the normal distribution N thus applies a disturbance to the action a_t′ that is used to explore the environment, where θ^μ_t denotes the parameters of the behavior estimation network at time t, and t is the time of the current input state;
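A sketch of the output disturbance under the stated distribution N(a_t′, var²), assuming the behavior network is exposed as a callable `policy` that maps a state array to an action array; the clipping to joint limits is an added assumption:

    import numpy as np

    var = 3.0  # standard deviation of the disturbance, so the variance is var^2; value illustrative

    def noisy_action(policy, s_t, low, high):
        a_prime = policy(s_t)                           # a_t' = mu(s_t | theta_mu)
        a_t = np.random.normal(loc=a_prime, scale=var)  # draw a_t from N(a_t', var^2)
        return np.clip(a_t, low, high)                  # joint-limit clipping is an assumption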
Step 3) Establish the reward r_t computation model
At time t, the robotic arm obtains the image information of time t through the industrial CCD camera in the environment and adds Gaussian noise to obtain the current input state s_t. When the current input state is s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is drawn at random from the normal distribution N(a_t′, var²) of step 2). The end-effector position coordinates are (x1_t, y1_t, z1_t) and the target position is (x0_t, y0_t, z0_t), and the reward r_t is computed as a function of these two positions.
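A sketch of a reward computation consistent with the coordinates named above, under the assumption that r_t is the negative Euclidean distance between the end-effector and the target; the patent's exact formula is not reproduced in this text, so this form is an assumption:

    import numpy as np

    def reward(end_effector, target):
        """Assumed r_t: negative Euclidean distance between the end-effector
        position (x1_t, y1_t, z1_t) and the target position (x0_t, y0_t, z0_t)."""
        d = np.asarray(end_effector, dtype=float) - np.asarray(target, dtype=float)
        return -float(np.linalg.norm(d))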
Step 4) Build the simulated environment
The robot simulation software V-REP (Virtual Robot Experimentation Platform) provides models of the world's major industrial robots, which greatly reduces the difficulty of building a robotic arm simulation environment; a simulated environment consistent with the practical application is built with the V-REP software;
Step 5) Accumulate the experience pool
5.1) According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′; the output disturbance established in step 2) then yields the actual output action a_t, and the reward r_t and the subsequent input state s_{t+1} are received from the environment. The current input state s_t, the actual output action a_t, the reward r_t and the subsequent input state s_{t+1} are stored in the experience pool, and together they are referred to as the state transition information, transition;
5.2) The subsequent input state s_{t+1} is taken as the new current input state s_t, step 5.1) is repeated, and the resulting state transition information is stored in the experience pool;
5.3) Step 5.2) is repeated until the experience pool is full; once full, each further execution of step 5.2) is followed by one execution of step 6), as in the loop sketched below;
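An illustrative accumulation loop for steps 5.1)~5.3), assuming a Gym-style environment object `env` and the helpers from the earlier sketches; `policy`, `low`, `high`, `total_steps` and `train_step` are placeholders, and array/tensor conversions are omitted:

    pool = ExperiencePool(m=10000, state_dim=state_dim, action_dim=action_dim)
    s_t = env.reset()                                # `env` is assumed Gym-style
    for _ in range(total_steps):                     # step count set by the caller
        a_t = noisy_action(policy, s_t, low, high)   # step 5.1) via the step 2) disturbance
        s_next, r_t, done, _ = env.step(a_t)
        pool.store(s_t, a_t, r_t, s_next)            # store the transition
        s_t = env.reset() if done else s_next        # step 5.2): s_{t+1} becomes s_t
        if pool.full():
            train_step()                             # step 5.3): execute step 6) once full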
Step 6) Train the deep reinforcement learning neural network
6.1) Sampling
batch groups of samples are taken from the experience pool for neural network learning, where batch is a natural number;
6.2) Update the evaluation network parameters
According to the batch groups of state transition information taken out in step 6.1), the evaluation estimation network Q(s,a|θ^Q) and the evaluation target network Q(s,a|θ^Q′) yield, for each group of state transition information, the corresponding estimated Q′ value eval_Q′ and target Q′ value target_Q′, from which the temporal-difference error TD_error′ is obtained: TD_error′ = target_Q′ − eval_Q′. Here t′ is the input-state time at which step 5.2) is executed after the experience pool has been filled in step 5.3); that is, each time step 5.2) is executed once the experience pool is full, the input-state time is t′;
The loss function Loss is constructed from the temporal-difference error TD_error′: Loss = Σ TD_error′ / batch;
According to the loss function Loss, the evaluation estimation network parameters θ^Q are updated by gradient descent;
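A PyTorch sketch of the evaluation-network update. The patent does not spell out how target_Q′ is formed, so the standard DDPG target r + γ·Q′(s′, μ′(s′)) is assumed here; likewise, the mean squared TD error is minimized, the usual DDPG reading of Loss = Σ TD_error′ / batch:

    import torch
    import torch.nn.functional as F

    discount = 0.9  # discount factor; an assumption, not stated in the patent
    critic_opt = torch.optim.Adam(critic_eval.parameters(), lr=1e-3)

    def update_critic(s, a, r, s_next):
        # target_Q' from the two target networks (standard DDPG target assumed)
        with torch.no_grad():
            a_next = actor_target(s_next)                  # mu(s' | theta_mu')
            target_q = r + discount * critic_target(torch.cat([s_next, a_next], dim=1))
        eval_q = critic_eval(torch.cat([s, a], dim=1))     # Q(s, a | theta_Q)
        # TD_error' = target_Q' - eval_Q'; minimize its mean square
        loss = F.mse_loss(eval_q, target_q)
        critic_opt.zero_grad()
        loss.backward()                                    # gradient descent on theta_Q
        critic_opt.step()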
6.3) Update the behavior estimation network parameters
The state s_t in each of the batch groups of state transition information is passed through the behavior estimation network μ(s|θ^μ) and the output disturbance to obtain the corresponding actual output action a_t. Differentiating the estimated Q′ value eval_Q′ of the evaluation estimation network with respect to the actual output action a_t gives the gradient ∇_a Q of the estimated Q′ value with respect to a_t, where ∇_a denotes differentiation with respect to the actual output action a_t. Differentiating the actual output action a_t of the behavior estimation network with respect to the network parameters gives the gradient ∇_θ^μ μ of a_t with respect to the behavior estimation network parameters, where ∇_θ^μ denotes differentiation with respect to the parameters of the behavior estimation network;
The product of the gradient ∇_a Q of the estimated Q value with respect to the actual output action a_t and the gradient ∇_θ^μ μ of the actual output action a_t with respect to the behavior estimation network parameters is the gradient of the estimated Q value with respect to the behavior estimation network parameters;
The behavior estimation network parameters are updated by gradient ascent;
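A sketch of the behavior-network update of step 6.3); descending on −Q(s, μ(s)) is equivalent to ascending the product of gradients described above, with autograd applying the chain rule:

    actor_opt = torch.optim.Adam(actor_eval.parameters(), lr=1e-4)

    def update_actor(s):
        a = actor_eval(s)                            # a_t = mu(s_t | theta_mu)
        q = critic_eval(torch.cat([s, a], dim=1))    # estimated Q value of (s_t, a_t)
        loss = -q.mean()                             # gradient ascent on Q via descent on -Q
        actor_opt.zero_grad()
        loss.backward()   # autograd chains grad_a Q with grad_theta_mu mu, as in step 6.3)
        actor_opt.step()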
6.4) Update the target network parameters
Every J episodes, the network parameters of actor_eval are assigned to actor_target, and every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K;
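A sketch of the hard target-network updates of step 6.4); the interval values J and K are illustrative, the patent only requiring J ≠ K:

    J, K = 10, 11  # update intervals in episodes; values illustrative

    def update_targets(episode):
        if episode % J == 0:  # actor_eval -> actor_target every J episodes
            actor_target.load_state_dict(actor_eval.state_dict())
        if episode % K == 0:  # critic_eval -> critic_target every K episodes
            critic_target.load_state_dict(critic_eval.state_dict())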
6.5) Training is divided into xm episodes, and each episode repeats steps 6.1)~6.4) xn times; after each repetition of 6.1)~6.4), the var value of the output disturbance is updated to var = max{0.1, var × gamma}, i.e. var takes the larger of 0.1 and the damped var value of the previous moment, where xm and xn are natural numbers and gamma is a rational number greater than zero and less than 1;
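The variance decay of step 6.5) in code form; the initial var and gamma values are illustrative:

    var, gamma = 3.0, 0.9995  # initial deviation and decay rate, values illustrative

    # after each repetition of steps 6.1)~6.4):
    var = max(0.1, var * gamma)  # damp the disturbance, never below 0.1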
Step 7) Use the deep reinforcement learning model trained in step 6) to control the motion of the real robotic arm
7.1) In the real environment, the picture captured by an industrial CCD camera at time t is preprocessed by Gaussian filtering and used as the state for neural network processing;
7.2) The current input state s_t of the real environment is obtained through the camera; the deep reinforcement learning network controls the rotation of the robotic arm according to the current input state s_t and obtains the subsequent input state s_{t+1}. The subsequent input state s_{t+1} is then taken as the new current input state s_t, and the cycle continues until the deep reinforcement learning model controls the robotic arm to grasp the target.
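A sketch of the preprocessing in step 7.1), assuming OpenCV for the Gaussian filtering; the kernel size and the scaling to [0, 1] are assumptions:

    import cv2
    import numpy as np

    def image_to_state(frame):
        """Gaussian-filter the CCD camera picture of time t and use it as state s_t."""
        blurred = cv2.GaussianBlur(frame, (5, 5), 0)  # 5x5 kernel is an assumption
        return blurred.astype(np.float32) / 255.0     # normalization is an assumption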
Experimental data
The experiment was carried out in a SCARA robot simulation environment: the deep reinforcement learning neural network controls the robotic arm to automatically locate the target and grasp it. The setup was 600 training episodes of 200 steps each. After training, the arm grasps the target within 20~30 steps of operation, which meets the requirements of modern industrial assembly-line production; traditional robotic arm control, by contrast, requires building a mathematical model, and the real-time inverse-dynamics solution is computationally expensive.

Claims (5)

1. An industrial robotic arm automatic control method based on deep reinforcement learning, characterized in that the control method comprises the following steps:
Step 1) Build the deep reinforcement learning model
1.1) Experience pool initialization: the experience pool is set up as a two-dimensional matrix of m rows and n columns, with every element initialized to 0, where m is the sample capacity, n is the amount of information stored per sample, and n = 2 × state_dim + action_dim + 1; state_dim is the dimension of the state and action_dim is the dimension of the action; the experience pool also reserves space for the reward information, the 1 in the formula n = 2 × state_dim + action_dim + 1 being the space reserved for storing the reward;
1.2) Neural network initialization: the neural network is divided into two parts, an Actor network (the behavior network) and a Critic network (the evaluation network); each part is in turn built as two networks with identical structure but different parameters, an eval net (estimation network) and a target net (target network), giving four networks in total: μ(s|θ^μ), μ(s|θ^μ′), Q(s,a|θ^Q) and Q(s,a|θ^Q′), where the μ(s|θ^μ) network is the behavior estimation network, the μ(s|θ^μ′) network is the behavior target network, the Q(s,a|θ^Q) network is the evaluation estimation network, and the Q(s,a|θ^Q′) network is the evaluation target network; the parameters θ^μ of the μ(s|θ^μ) network and θ^Q of the Q(s,a|θ^Q) network are randomly initialized, the value of θ^μ is then assigned to the behavior target network, i.e. θ^μ′ ← θ^μ, and the value of θ^Q is assigned to the evaluation target network, i.e. θ^Q′ ← θ^Q;
Step 2) Construct the output disturbance
According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′; a normal distribution N(a_t′, var²) with mean a_t′ and variance var² is constructed, and an actual output action a_t is drawn at random from it, the normal distribution N thereby applying a disturbance to the action a_t′ that is used to explore the environment, where θ^μ_t denotes the parameters of the behavior estimation network at time t and t is the time of the current input state;
Step 3) Establish the reward r_t computation model
Step 4) Build the simulated environment
The robot simulation software V-REP (Virtual Robot Experimentation Platform) provides models of the world's major industrial robots, which greatly reduces the difficulty of building a robotic arm simulation environment; a simulated environment consistent with the practical application is built with the V-REP software;
Step 5) Accumulate the experience pool
5.1) According to the current input state s_t, the behavior estimation network μ(s|θ^μ_t) produces an action a_t′; the output disturbance established in step 2) then yields the actual output action a_t, and the reward r_t and the subsequent input state s_{t+1} are received from the environment; the current input state s_t, the actual output action a_t, the reward r_t and the subsequent input state s_{t+1} are stored in the experience pool and are together referred to as the state transition information, transition;
5.2) The subsequent input state s_{t+1} is taken as the new current input state s_t, step 5.1) is repeated, and the resulting state transition information is stored in the experience pool;
5.3) Step 5.2) is repeated until the experience pool is full; once full, each further execution of step 5.2) is followed by one execution of step 6);
Step 6) Train the deep reinforcement learning neural network
6.1) Sampling
batch groups of samples are taken from the experience pool for neural network learning, where batch is a natural number;
6.2) Update the evaluation network parameters
6.3) Update the behavior estimation network parameters
6.4) Update the target network parameters
6.5) Training is divided into xm episodes, and each episode repeats steps 6.1)~6.4) xn times; after each repetition of 6.1)~6.4), the var value of the output disturbance is updated to var = max{0.1, var × gamma}, where xm and xn are natural numbers and gamma is a rational number greater than zero and less than 1;
Step 7) Use the deep reinforcement learning model trained in step 6) to control the motion of the real robotic arm
7.1) In the real environment, the picture captured by an industrial CCD camera at time t is preprocessed by Gaussian filtering and used as the state for neural network processing;
7.2) The current input state s_t of the real environment is obtained through the camera; the deep reinforcement learning network controls the rotation of the robotic arm according to the current input state s_t and obtains the subsequent input state s_{t+1}; the subsequent input state s_{t+1} is then taken as the new current input state s_t, and the cycle continues until the deep reinforcement learning model controls the robotic arm to grasp the target.
2. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that in step 3), the detailed process of establishing the reward r_t computation model is:
At time t, the robotic arm obtains the image information of time t through the industrial CCD camera in the environment and adds Gaussian noise to obtain the current input state s_t; when the current input state is s_t, an actual output action value a_t (i.e. the rotation angle of each axis of the robotic arm) is drawn at random from the normal distribution N(a_t′, var²) of step 2); the end-effector position coordinates are (x1_t, y1_t, z1_t), the target position is (x0_t, y0_t, z0_t), and the reward r_t is computed as a function of these two positions.
3. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that in step 6.2), the detailed process of updating the evaluation network parameters is:
According to the batch groups of state transition information taken out in step 6.1), the evaluation estimation network Q(s,a|θ^Q) and the evaluation target network Q(s,a|θ^Q′) yield, for each group of state transition information, the corresponding estimated Q′ value eval_Q′ and target Q′ value target_Q′, from which the temporal-difference error TD_error′ is obtained: TD_error′ = target_Q′ − eval_Q′; here t′ is the input-state time at which step 5.2) is executed after the experience pool has been filled in step 5.3), that is, each time step 5.2) is executed once the experience pool is full, the input-state time is t′;
The loss function Loss is constructed from the temporal-difference error TD_error′: Loss = Σ TD_error′ / batch;
According to the loss function Loss, the evaluation estimation network parameters θ^Q are updated by gradient descent.
4. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that in step 6.3), the detailed process of updating the behavior estimation network parameters is:
The state s_t in each of the batch groups of state transition information is passed through the behavior estimation network μ(s|θ^μ) and the output disturbance to obtain the corresponding actual output action a_t; differentiating the estimated Q′ value eval_Q′ of the evaluation estimation network with respect to the actual output action a_t gives the gradient ∇_a Q of the estimated Q′ value with respect to a_t, where ∇_a denotes differentiation with respect to the actual output action a_t; differentiating the actual output action a_t of the behavior estimation network with respect to the network parameters gives the gradient ∇_θ^μ μ of a_t with respect to the behavior estimation network parameters, where ∇_θ^μ denotes differentiation with respect to the parameters of the behavior estimation network;
The product of the gradient ∇_a Q of the estimated Q value with respect to the actual output action a_t and the gradient ∇_θ^μ μ of the actual output action a_t with respect to the behavior estimation network parameters is the gradient of the estimated Q value with respect to the behavior estimation network parameters;
The behavior estimation network parameters are updated by gradient ascent.
5. The industrial robotic arm automatic control method based on deep reinforcement learning according to claim 1, characterized in that in step 6.4), the detailed process of updating the target network parameters is:
Every J episodes, the network parameters of actor_eval are assigned to actor_target, and every K episodes, the network parameters of critic_eval are assigned to critic_target, where J ≠ K.
CN201711275146.7A 2017-12-06 2017-12-06 Industrial mechanical arm automatic control method based on deep reinforcement learning Active CN108052004B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711275146.7A CN108052004B (en) 2017-12-06 2017-12-06 Industrial mechanical arm automatic control method based on deep reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711275146.7A CN108052004B (en) 2017-12-06 2017-12-06 Industrial mechanical arm automatic control method based on deep reinforcement learning

Publications (2)

Publication Number Publication Date
CN108052004A true CN108052004A (en) 2018-05-18
CN108052004B CN108052004B (en) 2020-11-10

Family

ID=62121722

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711275146.7A Active CN108052004B (en) 2017-12-06 2017-12-06 Industrial mechanical arm automatic control method based on deep reinforcement learning

Country Status (1)

Country Link
CN (1) CN108052004B (en)

Cited By (49)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803615A (en) * 2018-07-03 2018-11-13 东南大学 A kind of visual human's circumstances not known navigation algorithm based on deeply study
CN108927806A (en) * 2018-08-13 2018-12-04 哈尔滨工业大学(深圳) A kind of industrial robot learning method applied to high-volume repeatability processing
CN109242099A (en) * 2018-08-07 2019-01-18 中国科学院深圳先进技术研究院 Training method, device, training equipment and the storage medium of intensified learning network
CN109240280A (en) * 2018-07-05 2019-01-18 上海交通大学 Anchoring auxiliary power positioning system control method based on intensified learning
CN109352648A (en) * 2018-10-12 2019-02-19 北京地平线机器人技术研发有限公司 Control method, device and the electronic equipment of mechanical mechanism
CN109352649A (en) * 2018-10-15 2019-02-19 同济大学 A kind of method for controlling robot and system based on deep learning
CN109379752A (en) * 2018-09-10 2019-02-22 中国移动通信集团江苏有限公司 Optimization method, device, equipment and the medium of Massive MIMO
CN109483534A (en) * 2018-11-08 2019-03-19 腾讯科技(深圳)有限公司 A kind of grasping body methods, devices and systems
CN109614631A (en) * 2018-10-18 2019-04-12 清华大学 Aircraft Full automatic penumatic optimization method based on intensified learning and transfer learning
CN109605377A (en) * 2019-01-21 2019-04-12 厦门大学 A kind of joint of robot motion control method and system based on intensified learning
CN109800864A (en) * 2019-01-18 2019-05-24 中山大学 A kind of robot Active Learning Method based on image input
CN109948642A (en) * 2019-01-18 2019-06-28 中山大学 Multiple agent cross-module state depth deterministic policy gradient training method based on image input
CN110053053A (en) * 2019-06-14 2019-07-26 西南科技大学 Mechanical arm based on deeply study screws the adaptive approach of valve
CN110053034A (en) * 2019-05-23 2019-07-26 哈尔滨工业大学 A kind of multi purpose space cellular machineries people's device of view-based access control model
CN110070099A (en) * 2019-02-20 2019-07-30 北京航空航天大学 A kind of industrial data feature structure method based on intensified learning
CN110125939A (en) * 2019-06-03 2019-08-16 湖南工学院 A kind of method of Robot Virtual visualization control
CN110238839A (en) * 2019-04-11 2019-09-17 清华大学 It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting
CN110370295A (en) * 2019-07-02 2019-10-25 浙江大学 Soccer robot active control suction ball method based on deeply study
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
CN110826701A (en) * 2019-11-15 2020-02-21 北京邮电大学 Method for carrying out system identification on two-degree-of-freedom flexible leg based on BP neural network algorithm
CN110879595A (en) * 2019-11-29 2020-03-13 江苏徐工工程机械研究院有限公司 Unmanned mine card tracking control system and method based on deep reinforcement learning
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN111223141A (en) * 2019-12-31 2020-06-02 东华大学 Automatic assembly line work efficiency optimization system and method based on reinforcement learning
CN111360834A (en) * 2020-03-25 2020-07-03 中南大学 Humanoid robot motion control method and system based on deep reinforcement learning
CN111461325A (en) * 2020-03-30 2020-07-28 华南理工大学 Multi-target layered reinforcement learning algorithm for sparse rewarding environment problem
CN111476257A (en) * 2019-01-24 2020-07-31 富士通株式会社 Information processing method and information processing apparatus
CN111487863A (en) * 2020-04-14 2020-08-04 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111515961A (en) * 2020-06-02 2020-08-11 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN111618847A (en) * 2020-04-22 2020-09-04 南通大学 Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN111644398A (en) * 2020-05-28 2020-09-11 华中科技大学 Push-grab cooperative sorting network based on double viewing angles and sorting method and system thereof
CN111881772A (en) * 2020-07-06 2020-11-03 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
EP3760390A1 (en) * 2019-07-01 2021-01-06 KUKA Deutschland GmbH Performance of a predetermined task using at least one robot
WO2021001312A1 (en) * 2019-07-01 2021-01-07 Kuka Deutschland Gmbh Carrying out an application using at least one robot
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112405543A (en) * 2020-11-23 2021-02-26 长沙理工大学 Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning
CN112434464A (en) * 2020-11-09 2021-03-02 中国船舶重工集团公司第七一六研究所 Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm
CN112506044A (en) * 2020-09-10 2021-03-16 上海交通大学 Flexible arm control and planning method based on visual feedback and reinforcement learning
CN112528552A (en) * 2020-10-23 2021-03-19 洛阳银杏科技有限公司 Mechanical arm control model construction method based on deep reinforcement learning
CN112643668A (en) * 2020-12-01 2021-04-13 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
CN112894796A (en) * 2019-11-19 2021-06-04 财团法人工业技术研究院 Gripping device and gripping method
CN113159410A (en) * 2021-04-14 2021-07-23 北京百度网讯科技有限公司 Training method for automatic control model and fluid supply system control method
CN113283167A (en) * 2021-05-24 2021-08-20 暨南大学 Special equipment production line optimization method and system based on safety reinforcement learning
CN113510709A (en) * 2021-07-28 2021-10-19 北京航空航天大学 Industrial robot pose precision online compensation method based on deep reinforcement learning
CN113843802A (en) * 2021-10-18 2021-12-28 南京理工大学 Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm
WO2022142271A1 (en) * 2020-12-31 2022-07-07 山东大学 Comprehensive intelligent nursing system and method for high infectiousness isolation ward
CN114789444A (en) * 2022-05-05 2022-07-26 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN115464659A (en) * 2022-10-05 2022-12-13 哈尔滨理工大学 Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN117618125A (en) * 2024-01-25 2024-03-01 科弛医疗科技(北京)有限公司 Image trolley

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105690392A (en) * 2016-04-14 2016-06-22 苏州大学 Robot motion control method and device based on actor-critic method
CN106094516A (en) * 2016-06-08 2016-11-09 南京大学 A kind of robot self-adapting grasping method based on deeply study
WO2017083772A1 (en) * 2015-11-12 2017-05-18 Google Inc. Asynchronous deep reinforcement learning
CN107065881A (en) * 2017-05-17 2017-08-18 清华大学 A kind of robot global path planning method learnt based on deeply

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017083772A1 (en) * 2015-11-12 2017-05-18 Google Inc. Asynchronous deep reinforcement learning
CN105690392A (en) * 2016-04-14 2016-06-22 苏州大学 Robot motion control method and device based on actor-critic method
CN106094516A (en) * 2016-06-08 2016-11-09 南京大学 A kind of robot self-adapting grasping method based on deeply study
CN107065881A (en) * 2017-05-17 2017-08-18 清华大学 A kind of robot global path planning method learnt based on deeply

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JELLE MUNK等: "Learning State Representation for Deep Actor-Critic Control", 《2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL》 *
唐鹏: "机器人足球行为控制学习算法的研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108803615A (en) * 2018-07-03 2018-11-13 东南大学 A kind of visual human's circumstances not known navigation algorithm based on deeply study
CN109240280A (en) * 2018-07-05 2019-01-18 上海交通大学 Anchoring auxiliary power positioning system control method based on intensified learning
CN109240280B (en) * 2018-07-05 2021-09-07 上海交通大学 Anchoring auxiliary power positioning system control method based on reinforcement learning
CN109242099A (en) * 2018-08-07 2019-01-18 中国科学院深圳先进技术研究院 Training method, device, training equipment and the storage medium of intensified learning network
CN109242099B (en) * 2018-08-07 2020-11-10 中国科学院深圳先进技术研究院 Training method and device of reinforcement learning network, training equipment and storage medium
CN108927806A (en) * 2018-08-13 2018-12-04 哈尔滨工业大学(深圳) A kind of industrial robot learning method applied to high-volume repeatability processing
CN109379752B (en) * 2018-09-10 2021-09-24 中国移动通信集团江苏有限公司 Massive MIMO optimization method, device, equipment and medium
CN109379752A (en) * 2018-09-10 2019-02-22 中国移动通信集团江苏有限公司 Optimization method, device, equipment and the medium of Massive MIMO
CN109352648A (en) * 2018-10-12 2019-02-19 北京地平线机器人技术研发有限公司 Control method, device and the electronic equipment of mechanical mechanism
CN109352649A (en) * 2018-10-15 2019-02-19 同济大学 A kind of method for controlling robot and system based on deep learning
CN109352649B (en) * 2018-10-15 2021-07-20 同济大学 Manipulator control method and system based on deep learning
CN109614631A (en) * 2018-10-18 2019-04-12 清华大学 Aircraft Full automatic penumatic optimization method based on intensified learning and transfer learning
CN109483534A (en) * 2018-11-08 2019-03-19 腾讯科技(深圳)有限公司 A kind of grasping body methods, devices and systems
CN109948642A (en) * 2019-01-18 2019-06-28 中山大学 Multiple agent cross-module state depth deterministic policy gradient training method based on image input
CN109800864A (en) * 2019-01-18 2019-05-24 中山大学 A kind of robot Active Learning Method based on image input
CN109605377A (en) * 2019-01-21 2019-04-12 厦门大学 A kind of joint of robot motion control method and system based on intensified learning
CN111476257A (en) * 2019-01-24 2020-07-31 富士通株式会社 Information processing method and information processing apparatus
CN110070099A (en) * 2019-02-20 2019-07-30 北京航空航天大学 A kind of industrial data feature structure method based on intensified learning
CN110238839A (en) * 2019-04-11 2019-09-17 清华大学 It is a kind of to optimize non-molding machine people multi peg-in-hole control method using environmental forecasting
CN110238839B (en) * 2019-04-11 2020-10-20 清华大学 Multi-shaft-hole assembly control method for optimizing non-model robot by utilizing environment prediction
CN110053034A (en) * 2019-05-23 2019-07-26 哈尔滨工业大学 A kind of multi purpose space cellular machineries people's device of view-based access control model
CN110125939A (en) * 2019-06-03 2019-08-16 湖南工学院 A kind of method of Robot Virtual visualization control
CN110125939B (en) * 2019-06-03 2020-10-20 湖南工学院 Virtual visual control method for robot
CN110053053B (en) * 2019-06-14 2022-04-12 西南科技大学 Self-adaptive method of mechanical arm screwing valve based on deep reinforcement learning
CN110053053A (en) * 2019-06-14 2019-07-26 西南科技大学 Mechanical arm based on deeply study screws the adaptive approach of valve
WO2021001312A1 (en) * 2019-07-01 2021-01-07 Kuka Deutschland Gmbh Carrying out an application using at least one robot
CN114051444B (en) * 2019-07-01 2024-04-26 库卡德国有限公司 Executing an application by means of at least one robot
EP3760390A1 (en) * 2019-07-01 2021-01-06 KUKA Deutschland GmbH Performance of a predetermined task using at least one robot
CN114051444A (en) * 2019-07-01 2022-02-15 库卡德国有限公司 Executing an application by means of at least one robot
CN110370295B (en) * 2019-07-02 2020-12-18 浙江大学 Small-sized football robot active control ball suction method based on deep reinforcement learning
CN110370295A (en) * 2019-07-02 2019-10-25 浙江大学 Soccer robot active control suction ball method based on deeply study
CN110400345B (en) * 2019-07-24 2021-06-15 西南科技大学 Deep reinforcement learning-based radioactive waste push-grab cooperative sorting method
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
CN110900601B (en) * 2019-11-15 2022-06-03 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee
CN110826701A (en) * 2019-11-15 2020-02-21 北京邮电大学 Method for carrying out system identification on two-degree-of-freedom flexible leg based on BP neural network algorithm
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee
CN112894796A (en) * 2019-11-19 2021-06-04 财团法人工业技术研究院 Gripping device and gripping method
TWI790408B (en) * 2019-11-19 2023-01-21 財團法人工業技術研究院 Gripping device and gripping method
CN112894796B (en) * 2019-11-19 2023-09-05 财团法人工业技术研究院 Grabbing device and grabbing method
CN110909859B (en) * 2019-11-29 2023-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN110909859A (en) * 2019-11-29 2020-03-24 中国科学院自动化研究所 Bionic robot fish motion control method and system based on antagonistic structured control
CN110879595A (en) * 2019-11-29 2020-03-13 江苏徐工工程机械研究院有限公司 Unmanned mine card tracking control system and method based on deep reinforcement learning
CN111223141A (en) * 2019-12-31 2020-06-02 东华大学 Automatic assembly line work efficiency optimization system and method based on reinforcement learning
CN111223141B (en) * 2019-12-31 2023-10-24 东华大学 Automatic pipeline work efficiency optimization system and method based on reinforcement learning
CN111360834A (en) * 2020-03-25 2020-07-03 中南大学 Humanoid robot motion control method and system based on deep reinforcement learning
CN111461325B (en) * 2020-03-30 2023-06-20 华南理工大学 Multi-target layered reinforcement learning algorithm for sparse rewarding environmental problem
CN111461325A (en) * 2020-03-30 2020-07-28 华南理工大学 Multi-target layered reinforcement learning algorithm for sparse rewarding environment problem
CN111487863B (en) * 2020-04-14 2022-06-17 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111487863A (en) * 2020-04-14 2020-08-04 东南大学 Active suspension reinforcement learning control method based on deep Q neural network
CN111618847A (en) * 2020-04-22 2020-09-04 南通大学 Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN111618847B (en) * 2020-04-22 2022-06-21 南通大学 Mechanical arm autonomous grabbing method based on deep reinforcement learning and dynamic motion elements
CN111644398A (en) * 2020-05-28 2020-09-11 华中科技大学 Push-grab cooperative sorting network based on double viewing angles and sorting method and system thereof
CN111515961A (en) * 2020-06-02 2020-08-11 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN111515961B (en) * 2020-06-02 2022-06-21 南京大学 Reinforcement learning reward method suitable for mobile mechanical arm
CN111881772B (en) * 2020-07-06 2023-11-07 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
CN111881772A (en) * 2020-07-06 2020-11-03 上海交通大学 Multi-mechanical arm cooperative assembly method and system based on deep reinforcement learning
CN112506044A (en) * 2020-09-10 2021-03-16 上海交通大学 Flexible arm control and planning method based on visual feedback and reinforcement learning
CN112528552A (en) * 2020-10-23 2021-03-19 洛阳银杏科技有限公司 Mechanical arm control model construction method based on deep reinforcement learning
CN112434464B (en) * 2020-11-09 2021-09-10 中国船舶重工集团公司第七一六研究所 Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG algorithm
CN112434464A (en) * 2020-11-09 2021-03-02 中国船舶重工集团公司第七一六研究所 Arc welding cooperative welding method for multiple mechanical arms of ship based on MADDPG reinforcement learning algorithm
CN112338921A (en) * 2020-11-16 2021-02-09 西华师范大学 Mechanical arm intelligent control rapid training method based on deep reinforcement learning
CN112405543A (en) * 2020-11-23 2021-02-26 长沙理工大学 Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning
CN112405543B (en) * 2020-11-23 2022-05-06 长沙理工大学 Mechanical arm dense object temperature-first grabbing method based on deep reinforcement learning
CN112643668A (en) * 2020-12-01 2021-04-13 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
CN112643668B (en) * 2020-12-01 2022-05-24 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
WO2022142271A1 (en) * 2020-12-31 2022-07-07 山东大学 Comprehensive intelligent nursing system and method for high infectiousness isolation ward
CN113159410B (en) * 2021-04-14 2024-02-27 北京百度网讯科技有限公司 Training method of automatic control model and fluid supply system control method
CN113159410A (en) * 2021-04-14 2021-07-23 北京百度网讯科技有限公司 Training method for automatic control model and fluid supply system control method
CN113283167A (en) * 2021-05-24 2021-08-20 暨南大学 Special equipment production line optimization method and system based on safety reinforcement learning
CN113510709A (en) * 2021-07-28 2021-10-19 北京航空航天大学 Industrial robot pose precision online compensation method based on deep reinforcement learning
CN113510709B (en) * 2021-07-28 2022-08-19 北京航空航天大学 Industrial robot pose precision online compensation method based on deep reinforcement learning
CN113843802A (en) * 2021-10-18 2021-12-28 南京理工大学 Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm
CN113843802B (en) * 2021-10-18 2023-09-05 南京理工大学 Mechanical arm motion control method based on deep reinforcement learning TD3 algorithm
CN114789444B (en) * 2022-05-05 2022-12-16 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN114789444A (en) * 2022-05-05 2022-07-26 山东省人工智能研究院 Compliant human-computer contact method based on deep reinforcement learning and impedance control
CN115464659A (en) * 2022-10-05 2022-12-13 哈尔滨理工大学 Mechanical arm grabbing control method based on deep reinforcement learning DDPG algorithm of visual information
CN115464659B (en) * 2022-10-05 2023-10-24 哈尔滨理工大学 Mechanical arm grabbing control method based on visual information deep reinforcement learning DDPG algorithm
CN117618125A (en) * 2024-01-25 2024-03-01 科弛医疗科技(北京)有限公司 Image trolley

Also Published As

Publication number Publication date
CN108052004B (en) 2020-11-10

Similar Documents

Publication Publication Date Title
CN108052004A (en) Industrial machinery arm autocontrol method based on depth enhancing study
Chen et al. A system for general in-hand object re-orientation
US11928765B2 (en) Animation implementation method and apparatus, electronic device, and storage medium
Guo et al. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
EP3825962A3 (en) Virtual object driving method, apparatus, electronic device, and readable storage medium
CN109523029A (en) For the adaptive double from driving depth deterministic policy Gradient Reinforcement Learning method of training smart body
CN109782600A (en) A method of autonomous mobile robot navigation system is established by virtual environment
Kusuma FIBROUS ROOT MODEL IN BATIK PATTERN GENERATION.
CN109800864A (en) A kind of robot Active Learning Method based on image input
CN107679522A (en) Action identification method based on multithread LSTM
CN110315544B (en) Robot operation learning method based on video image demonstration
CN108229678A (en) Network training method, method of controlling operation thereof, device, storage medium and equipment
Jiang et al. Mastering the complex assembly task with a dual-arm robot: A novel reinforcement learning method
Vacaro et al. Sim-to-real in reinforcement learning for everyone
Zakaria et al. Robotic control of the deformation of soft linear objects using deep reinforcement learning
Zhang et al. Reinforcement learning based pushing and grasping objects from ungraspable poses
Lv et al. Sam-rl: Sensing-aware model-based reinforcement learning via differentiable physics-based simulation and rendering
Kim et al. Pre-and post-contact policy decomposition for non-prehensile manipulation with zero-shot sim-to-real transfer
CN108944940A (en) Driving behavior modeling method neural network based
Chen et al. A simple method for complex in-hand manipulation
WO2021100267A1 (en) Information processing device and information processing method
Sanchez et al. Towards advanced robotic manipulation
CN110751869B (en) Simulated environment and battlefield situation strategy transfer technology based on countermeasure discrimination migration method
CN109635942B (en) Brain excitation state and inhibition state imitation working state neural network circuit structure and method
Li et al. Learning a skill-sequence-dependent policy for long-horizon manipulation tasks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant