CN108255059B - Robot control method based on simulator training


Info

Publication number
CN108255059B
CN108255059B (application CN201810054083.0A)
Authority
CN
China
Prior art keywords
robot
optimal
simulator
robots
task
Prior art date: 2018-01-19
Legal status: Active
Application number
CN201810054083.0A
Other languages
Chinese (zh)
Other versions
CN108255059A (en)
Inventor
俞扬 (Yang Yu)
张超 (Chao Zhang)
周志华 (Zhi-Hua Zhou)
Current Assignee
Nanjing University
Original Assignee
Nanjing University
Priority date
Filing date
Publication date
Application filed by Nanjing University
Priority claimed from CN201810054083.0A
Publication of CN108255059A
Application granted
Publication of CN108255059B

Classifications

    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems as above, electric
    • G05B13/04 - Adaptive control systems as above, electric, involving the use of models or simulators
    • G05B13/042 - Adaptive control systems as above, in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science
  • Health & Medical Sciences
  • Artificial Intelligence
  • Computer Vision & Pattern Recognition
  • Evolutionary Computation
  • Medical Informatics
  • Software Systems
  • Physics & Mathematics
  • General Physics & Mathematics
  • Automation & Control Theory
  • Manipulator
  • Feedback Control In General

Abstract

The invention discloses a robot control method based on simulator training. A simulation model of the task environment to be executed by the robot is built to establish a simulator. In the simulator, T robots with different performance parameters are randomly generated and a policy is trained for each robot, finally obtaining a base policy set formed by these policies. In the simulator, M robots with different performance parameters are then randomly generated; the optimal combination weight $w_i^*$ of the base policy set used by each robot in task execution is obtained by optimization over the M robots, and each robot executes a random action sequence to obtain a feature $F_i(A)$. The features $F_i(A)$ and the optimal combination weights $w_i^*$ serve, respectively, as the inputs and labels of a regression model, which is optimized to obtain the optimal regression model θ. In the simulator, N robots with different performance parameters are randomly generated, and the optimal action $A^*$ is optimized on the N robots. In the same task, robots with unknown, differing performance parameters execute the optimal action $A^*$, obtaining each robot's optimal action policy.

Description

Robot control method based on simulator training
Technical Field
The invention relates to a robot control method based on simulator training, which can be used to control equipment such as robots, robotic arms, and mobile devices, and belongs to the technical field of robotics.
Background
At present, robots increasingly enter people's daily lives to reduce human labor and assist people in completing tasks such as navigation, tracking, object grasping, part assembly, and transport of high-risk items. However, existing robot control methods are often fixed strategies: for a specific task, professionals make repeated attempts and then the robot executes actions strictly according to a fixed, hand-programmed flow, so considerable manpower is required for task execution. In addition, since robots in daily life are of many types and the performance parameters of individuals differ, such as sensor parameters, physical dimensions, and range-of-motion parameters, even when the same task is performed, a uniform fixed program flow preset by professionals cannot serve every individual well due to these differences, and each individual needs to be debugged separately. In the field of automatic control, a feasible action strategy can be updated and solved in real time by numerical calculation while the robot executes a task, but this introduces a large number of distributional assumptions and requires the robot's relevant performance parameters to be input in advance; although such methods reduce some manpower, human participation is still needed, and the resulting action strategy is very sensitive to the input performance parameters, so high-precision performance parameters must be supplied at task execution. Reinforcement learning further reduces manual involvement: the robot continuously interacts with the environment in a simulator by trial and error, optimizing its action strategy until an action strategy meeting the task requirements is obtained. However, the action strategy finally learned by reinforcement learning is also highly correlated with the robot's performance parameters, so an effective action strategy for unknown robots with differing performance in the same task still cannot be obtained.
Therefore, a new technical solution is needed for robot task execution, especially when the robots performing the same task have unknown, differing performance parameters.
Disclosure of Invention
The purpose of the invention is as follows: to overcome the defects of the prior art, the invention provides a robot control method based on simulator training that can obtain action strategies for robots with unknown, differing performance parameters in the same task; under these action strategies, the robots can effectively meet the task requirements.
The technical scheme is as follows: a robot control method based on simulator training comprises the following steps:
step 1: carrying out simulation modeling on a task environment to be executed, establishing a simulator which is the same as or similar to the task, and establishing four factors of reinforcement learning aiming at the task design: state s, action a, reward function R (s, a), state transition probability P (s' | s, a);
Step 2: in the simulator, randomly generate T robots with different performance parameters and train each one with a reinforcement learning algorithm to obtain its action policy π as a base policy, finally obtaining the base policy set

$\{\pi_1, \pi_2, \ldots, \pi_T\}$

and the combined policy

$\pi_w(a \mid s) = \sum_{t=1}^{T} w_t\, \pi_t(a \mid s)$,

where w is the weight coefficient vector;
Step 3: in the simulator, randomly generate M robots with different performance parameters; for each robot i, obtain by optimization the optimal combination weight of the base policy set used in task execution,

$w_i^* = \arg\max_{w} \mathbb{E}_{\tau \sim p_{\pi_w}(\tau)}[R(\tau)]$,

where τ is the trajectory formed by the state-action pairs $(s_0, a_0, s_1, a_1, \ldots, s_t, a_t)$ of the robot when performing the task, $p_{\pi_w}(\tau)$ is the probability of generating trajectory τ after the robot executes the combined policy $\pi_w$, and $R(\tau)$ is the total reward obtained on trajectory τ. Then let each of the M robots execute a given string of initial random actions A, and take the output states of robot i after executing A as its feature $F_i(A)$. Using each robot's feature $F_i(A)$ and optimal combination weight $w_i^*$ as the input and label, respectively, of a regression model, optimize to obtain the optimal regression model θ, i.e.

$\theta = \arg\min_{\theta} \sum_{i=1}^{M} \big\| \theta(F_i(A)) - w_i^* \big\|^2$;
And 4, step 4: in the simulator, N robots with different performance parameters are generated randomly, and the optimal action is optimized on the N robots
Figure GDA0002876104790000027
And 5: in the same task, enabling the robot with unknown different performance parameters to execute optimal action A*Obtaining the optimal action strategy of the robot
Figure GDA0002876104790000028
The reinforcement learning algorithm used in step 2 adopts the trust region policy optimization algorithm (TRPO), and the value range of the weight coefficient w is 0 to 1.
The optimization algorithm for the optimal combination weights of the base policy set used in step 3 adopts the sequential randomized coordinate shrinking algorithm (SRACOS), the regression model optimization algorithm used adopts gradient descent, and the given string of initial random actions A comprises 5 actions.
The optimal action optimization algorithm used in step 4 adopts the sequential randomized coordinate shrinking algorithm (SRACOS), and the optimal action $A^*$ comprises 5 actions.
Beneficial effects: the action strategy of some robots when executing tasks is a fixed program flow written by professionals after repeated trials, which requires considerable manpower. Although a robot's action strategy can also be solved automatically by numerical calculation, the task requirements can be completed well only if high-precision performance parameters of the robot are input manually. Traditional reinforcement learning, introduced later, greatly reduces manual participation, but because the action strategy it learns is highly correlated with the robot's performance parameters, it does not generalize to robots with unknown, differing performance in the same task, and a corresponding effective action strategy cannot be obtained directly.
Compared with the prior art, the robot control method based on simulator training provided by the invention optimizes a group of optimal actions in the simulator. When a robot with unknown, differing performance parameters executes these actions, a group of output states is obtained, so the robot's condition can be recognized indirectly; the past action strategies of similar robots are then combined, and the robot's action strategy for the task is finally obtained directly, effectively completing the task requirements.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention is further illustrated by the following examples. It should be understood that these examples are purely exemplary and are not intended to limit the scope of the invention; after reading the present disclosure, those skilled in the art may make various equivalent modifications of the invention, which likewise fall within the scope of the appended claims.
As shown in FIG. 1, the robot control method based on simulator training comprises the following steps:
step 1: carrying out simulation modeling on a task environment to be executed, establishing a simulator which is the same as or similar to the task, and establishing four factors of reinforcement learning aiming at the task design: state s, action a, reward function R (s, a), state transition probability P (s' | s, a);
Step 2: in the simulator, randomly generate T robots with different performance parameters and train each one with a reinforcement learning algorithm to obtain its action policy π as a base policy, finally obtaining the base policy set

$\{\pi_1, \pi_2, \ldots, \pi_T\}$

and the combined policy

$\pi_w(a \mid s) = \sum_{t=1}^{T} w_t\, \pi_t(a \mid s)$,

where w is the weight coefficient vector and $\pi_t(a \mid s)$ denotes the policy model $\pi_t$ taking the state s as input and outputting the action a;
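Continuing the toy sketch from step 1, and again only as an illustration, the fragment below trains one base policy per randomly generated robot and forms the combined policy. The patent specifies TRPO for this step; as a deliberately simple stand-in, this sketch fits a linear policy a = k·s by random search, so the base policy set reduces to the parameters k_1, ..., k_T.

```python
# Illustrative sketch of step 2; random search stands in for TRPO, and a
# linear policy a = k*s stands in for a general policy model pi_t(a|s).
import numpy as np

def rollout_return(sim, k, episodes=5):
    """Average total reward of the linear policy a = k*s on simulator sim."""
    total = 0.0
    for _ in range(episodes):
        s, done = sim.reset(), False
        while not done:
            s, r, done = sim.step(k * s)
            total += r
    return total / episodes

def train_base_policy(sim, iters=200, rng=None):
    """Stand-in for TRPO: random search over the policy parameter k."""
    rng = rng if rng is not None else np.random.default_rng()
    best_k, best_ret = 0.0, rollout_return(sim, 0.0)
    for _ in range(iters):
        k = rng.uniform(-2.0, 0.0)            # candidate policy parameter
        ret = rollout_return(sim, k)
        if ret > best_ret:
            best_k, best_ret = k, ret
    return best_k

def combined_policy(base_ks, w):
    """pi_w = sum_t w_t * pi_t; for deterministic linear base policies this
    reduces to acting with the weighted sum of their parameters."""
    k_mix = float(np.dot(w, base_ks))
    return lambda s: k_mix * s

rng = np.random.default_rng(0)
T = 4                                          # number of base policies
robots = [random_robot(rng) for _ in range(T)] # T random robots
base_ks = np.array([train_base_policy(r, rng=rng) for r in robots])
```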
Step 3: in the simulator, randomly generate M robots with different performance parameters; for each robot i, obtain by optimization the optimal combination weight of the base policy set used in task execution,

$w_i^* = \arg\max_{w} \mathbb{E}_{\tau \sim p_{\pi_w}(\tau)}[R(\tau)]$,

where τ is the trajectory formed by the state-action pairs $(s_0, a_0, s_1, a_1, \ldots, s_t, a_t)$ of the robot when performing the task, $p_{\pi_w}(\tau)$ is the probability of generating trajectory τ after the robot executes the combined policy $\pi_w$, and $R(\tau)$ is the total reward obtained on trajectory τ. Then let each of the M robots execute a given string of initial random actions A, and take the output states of robot i after executing A as its feature $F_i(A)$. Using each robot's feature $F_i(A)$ and optimal combination weight $w_i^*$ as the input and label, respectively, of a regression model, optimize to obtain the optimal regression model θ, i.e.

$\theta = \arg\min_{\theta} \sum_{i=1}^{M} \big\| \theta(F_i(A)) - w_i^* \big\|^2$;
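A sketch of step 3 in the same toy setting follows. The patent prescribes SRACOS for the weight search and gradient descent for the regression; here a plain random search over the probability simplex stands in for SRACOS, and the regression is a linear model fitted by gradient descent. The 5-action probe string matches the patent; M = 20 and all other sizes are arbitrary choices for the sketch.

```python
# Illustrative sketch of step 3: optimize w_i* per robot (random search over
# the simplex stands in for SRACOS), probe every robot with one fixed random
# 5-action string A, and fit theta: F_i(A) -> w_i* by gradient descent.
import numpy as np

def feature(sim, A):
    """F_i(A): the output states observed while executing the action string A."""
    s = sim.reset()
    states = []
    for a in A:
        s, _, _ = sim.step(a)
        states.append(s)
    return np.array(states)

def optimize_weights(sim, base_ks, iters=100, rng=None):
    """Stand-in for SRACOS: sample weight vectors in [0, 1] summing to one
    and keep the one whose combined policy earns the highest return."""
    rng = rng if rng is not None else np.random.default_rng()
    best_w, best_ret = None, -np.inf
    for _ in range(iters):
        w = rng.dirichlet(np.ones(len(base_ks)))
        pi = combined_policy(base_ks, w)
        s, done, ret = sim.reset(), False, 0.0
        while not done:
            s, r, done = sim.step(pi(s))
            ret += r
        if ret > best_ret:
            best_w, best_ret = w, ret
    return best_w

rng3 = np.random.default_rng(1)
M = 20
A = rng3.uniform(-1.0, 1.0, size=5)     # the given string of 5 random actions
X, Y = [], []                           # features F_i(A) and labels w_i*
for _ in range(M):
    sim = random_robot(rng3)
    Y.append(optimize_weights(sim, base_ks, rng=rng3))
    X.append(feature(sim, A))
X, Y = np.array(X), np.array(Y)

# Linear regression theta fitted by gradient descent, as the patent specifies
# for the regression step; the step size is kept conservatively small.
theta = np.zeros((X.shape[1], Y.shape[1]))
lr = 1.0 / (np.trace(X.T @ X) / M + 1e-8)
for _ in range(5000):
    grad = X.T @ (X @ theta - Y) / M    # gradient of the mean squared error
    theta -= lr * grad
```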
And 4, step 4: in the simulator, N robots with different performance parameters are generated randomly, and the optimal action is optimized on the N robots
Figure GDA0002876104790000045
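In the same toy setting, step 4 can be sketched as a search over 5-action probe strings: a candidate A is scored by letting each of N fresh robots execute it, predicting that robot's weights with θ, and summing the returns of the resulting combined policies. Random search again stands in for SRACOS, and N = 10 and the candidate count are arbitrary.

```python
# Illustrative sketch of step 4: random search (standing in for SRACOS) for
# the optimal 5-action probe string A*.
import numpy as np

def score_probe(A, sims, base_ks, theta):
    """Total return over sims when each robot's combination weights are
    predicted from its feature F_j(A) by the regression model theta."""
    total = 0.0
    for sim in sims:
        w = feature(sim, A) @ theta        # predicted combination weights
        pi = combined_policy(base_ks, w)
        s, done = sim.reset(), False
        while not done:
            s, r, done = sim.step(pi(s))
            total += r
    return total

rng4 = np.random.default_rng(2)
N = 10
fresh_sims = [random_robot(rng4) for _ in range(N)]   # N new random robots
A_star, best_score = None, -np.inf
for _ in range(50):                                    # candidate probes
    cand = rng4.uniform(-1.0, 1.0, size=5)
    sc = score_probe(cand, fresh_sims, base_ks, theta)
    if sc > best_score:
        A_star, best_score = cand, sc
```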
And 5: in the same task, enabling the robot with unknown different performance parameters to execute optimal action A*Obtaining the optimal action strategy of the robot
Figure GDA0002876104790000046
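Step 5 then amounts to a few lines: the unseen robot executes A*, its output states form the feature F(A*), θ maps the feature to combination weights w*, and the combined policy controls the robot for the rest of the task. The sketch below closes the toy example under the same assumptions as above.

```python
# Illustrative sketch of step 5: adapt to a robot whose performance
# parameters were never seen during training.
import numpy as np

unknown_robot = random_robot(np.random.default_rng(3))
w_star = feature(unknown_robot, A_star) @ theta   # w* = theta(F(A*))
pi_star = combined_policy(base_ks, w_star)        # pi* = pi_{w*}

s, done, ret = unknown_robot.reset(), False, 0.0
while not done:
    s, r, done = unknown_robot.step(pi_star(s))
    ret += r
print("return of the adapted policy:", ret)
```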
The reinforcement learning algorithm used in the method adopts the trust region policy optimization algorithm (TRPO), and the value range of the weight coefficient w is 0 to 1.
The optimization algorithm for the optimal combination weights of the base policy set adopts the sequential randomized coordinate shrinking algorithm (SRACOS), the regression model optimization algorithm used adopts gradient descent, and the given string of initial random actions A comprises 5 actions.
The optimal action optimization algorithm adopts the sequential randomized coordinate shrinking algorithm (SRACOS), and the optimal action $A^*$ comprises 5 actions.

Claims (5)

1. A robot control method based on simulator training, characterized by comprising the following steps:
Step 1: perform simulation modeling of the task environment to be executed, establish a simulator, and design the four elements of reinforcement learning for the task: state s, action a, reward function R(s, a), and state transition probability P(s'|s, a);
Step 2: in the simulator, randomly generate T robots with different performance parameters and train each one with a reinforcement learning algorithm to obtain its action policy π as a base policy, finally obtaining the base policy set

$\{\pi_1, \pi_2, \ldots, \pi_T\}$

and the combined policy

$\pi_w(a \mid s) = \sum_{t=1}^{T} w_t\, \pi_t(a \mid s)$,

where w is the weight coefficient vector;
Step 3: in the simulator, randomly generate M robots with different performance parameters, and obtain by optimization over the M robots the optimal combination weight $w_i^*$ of the base policy set used by each robot in task execution; then let each of the M robots execute a given string of initial random actions A, take the output states of robot i after executing A as its feature $F_i(A)$, and use each robot's feature $F_i(A)$ and optimal combination weight $w_i^*$ as the input and label, respectively, of a regression model; optimize to obtain the optimal regression model θ, i.e.

$\theta = \arg\min_{\theta} \sum_{i=1}^{M} \big\| \theta(F_i(A)) - w_i^* \big\|^2$;
And 4, step 4: in the simulator, N robots with different performance parameters are generated randomly, and the optimal action is optimized on the N robots
Figure FDA0002876104780000016
And 5: in the same task, enabling the robot with unknown different performance parameters to execute optimal action A*Obtaining the optimal action strategy of the robot
Figure FDA0002876104780000017
2. The robot control method based on simulator training as claimed in claim 1, wherein the reinforcement learning algorithm used in step 2 is the trust region policy optimization algorithm, and the value range of the weight coefficient w is 0 to 1.
3. The simulator-training-based robot control method of claim 1, wherein the optimization algorithm for the optimal combination weights of the base policy set used in step 3 is the sequential randomized coordinate shrinking algorithm, the regression model optimization algorithm used is gradient descent, and the given string of initial random actions A comprises 5 actions.
4. The robot control method based on simulator training as claimed in claim 1, wherein the optimal action optimization algorithm used in step 4 is the sequential randomized coordinate shrinking algorithm, and the optimal action $A^*$ comprises 5 actions.
5. The method of claim 1, wherein the optimal combination weight of the base policy set used by each robot in performing the task in step 3 is

$w_i^* = \arg\max_{w} \mathbb{E}_{\tau \sim p_{\pi_w}(\tau)}[R(\tau)]$,

where τ is the trajectory formed by the state-action pairs $(s_0, a_0, s_1, a_1, \ldots, s_t, a_t)$ of the robot when performing the task, $p_{\pi_w}(\tau)$ is the probability of generating trajectory τ after the robot executes the combined policy $\pi_w$, and $R(\tau)$ is the total reward obtained on trajectory τ.
CN201810054083.0A, priority date 2018-01-19, filing date 2018-01-19: Robot control method based on simulator training (granted as CN108255059B, Active)

Priority Applications (1)

CN201810054083.0A, priority date 2018-01-19, filing date 2018-01-19: Robot control method based on simulator training

Publications (2)

CN108255059A (en), published 2018-07-06
CN108255059B (en), granted 2021-03-19

Family

ID=62726768

Family Applications (1)

CN201810054083.0A (granted as CN108255059B, Active), priority date 2018-01-19, filing date 2018-01-19: Robot control method based on simulator training

Country Status (1)

Country Link
CN (1) CN108255059B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710966B (en) * 2018-11-12 2023-07-14 南京南大电子智慧型服务机器人研究院有限公司 Method for designing cylindrical body of service robot based on scattered sound power
CN112034888B (en) * 2020-09-10 2021-07-30 南京大学 Autonomous control cooperation strategy training method for fixed wing unmanned aerial vehicle
US11995577B2 (en) * 2022-03-03 2024-05-28 Caterpillar Inc. System and method for estimating a machine's potential usage, profitability, and cost of ownership based on machine's value and mechanical state
CN115598985B (en) * 2022-11-01 2024-02-02 南栖仙策(南京)高新技术有限公司 Training method and device of feedback controller, electronic equipment and medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160055427A1 (en) * 2014-10-15 2016-02-25 Brighterion, Inc. Method for providing data science, artificial intelligence and machine learning as-a-service

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103878772A (en) * 2014-03-31 2014-06-25 北京工业大学 Biomorphic wheeled robot system with simulation learning mechanism and method
CN106094813A (en) * 2016-05-26 2016-11-09 华南理工大学 It is correlated with based on model humanoid robot gait's control method of intensified learning
US9671777B1 (en) * 2016-06-21 2017-06-06 TruPhysics GmbH Training robots to execute actions in physics-based virtual environment
CN110023965A (en) * 2016-10-10 2019-07-16 渊慧科技有限公司 For selecting the neural network of the movement executed by intelligent robot body
CN107562052A (en) * 2017-08-30 2018-01-09 唐开强 A kind of Hexapod Robot gait planning method based on deeply study

Also Published As

CN108255059A (en), published 2018-07-06


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
EE01: Entry into force of recordation of patent licensing contract

Application publication date: 20180706

Assignee: HUAWEI TECHNOLOGIES Co.,Ltd.

Assignor: NANJING University

Contract record no.: X2020980005991

Denomination of invention: A robot control method based on simulator training

License type: Common License

Record date: 20200911

CB02: Change of applicant information

Address after: 210008 No. 22, Hankou Road, Gulou District, Jiangsu, Nanjing

Applicant after: NANJING University

Address before: 210023 163 Xianlin Road, Qixia District, Nanjing, Jiangsu

Applicant before: NANJING University

GR01: Patent grant