CN114995479A - Parameter control method of quadruped robot virtual model controller based on reinforcement learning - Google Patents


Info

Publication number
CN114995479A
Authority
CN
China
Prior art keywords: robot, reinforcement learning, virtual model, joint, virtual
Legal status: Pending
Application number
CN202210673604.7A
Other languages
Chinese (zh)
Inventor
张云伟
龚泽武
Current Assignee
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Application filed by Kunming University of Science and Technology
Priority to CN202210673604.7A
Publication of CN114995479A

Classifications

    • GPHYSICS
    • G05CONTROLLING; REGULATING
    • G05DSYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/08Control of attitude, i.e. control of roll, pitch, or yaw
    • G05D1/0891Control of attitude, i.e. control of roll, pitch, or yaw specially adapted for land vehicles
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

The invention discloses a parameter control method for a virtual model controller of a quadruped robot based on reinforcement learning, comprising the following steps. Step (1): establish a mathematical model of the quadruped robot by deriving its forward kinematics. Step (2): gait generation: optimize the cycloid trajectory under constraints to obtain a constraint-optimized cycloid trajectory. Step (3): design the virtual model controller: establish virtual model control of the quadruped robot. Step (4): integrate a deep reinforcement learning algorithm: select and design the state space, action space and reward function. Step (5): train and simulate the virtual prototype model. The invention uses the strong exploration capability of deep reinforcement learning to find the optimal controller parameters, reducing the difficulty of tuning them. The invention improves the control accuracy of the traditional virtual model controller and provides a degree of disturbance rejection, so that the controller can still accurately control the motion of the quadruped robot.

Description

Parameter control method of quadruped robot virtual model controller based on reinforcement learning
Technical Field
The invention relates to the technical field of robot control methods, in particular to a parameter control method of a quadruped robot virtual model controller based on reinforcement learning.
Background
With the development of robot bionics, the application fields of multi-legged robots have grown ever wider. Compared with a biped robot, a quadruped robot has better stability and terrain-traversing ability; compared with robots having six or more legs, it strikes a better balance in control-algorithm complexity; and compared with traditional wheeled and tracked mobile robots, it is more adaptable. In nature almost all large mammals move on four legs, so a motion strategy for a quadruped robot can be constructed on bionic principles. Virtual model control is a control method widely applied in the quadruped robot field: it does not require accurate, complex dynamic and inverse kinematic modeling of the robot and can make the robot move according to the designed gait trajectory plan. However, the algorithm sacrifices control accuracy, exhibits large errors, and makes precise control of the quadruped robot difficult.
Disclosure of Invention
The invention mainly aims to provide a parameter control method of a quadruped robot virtual model controller based on reinforcement learning, and aims to solve the existing technical problems.
In order to achieve the purpose, the invention provides a parameter control method of a virtual model controller of a quadruped robot based on reinforcement learning, which is characterized by comprising the following steps of:
step (1): establishing a mathematical model of the quadruped robot: calculating the forward kinematics model of the quadruped robot, wherein the quadruped robot has 12 joints in total, each leg consisting of an upper, a middle and a lower segment with link lengths L_1, L_2 and L_3 respectively, and obtaining the mapping between the robot's leg joint angles and the foot-end positions;
step (2): gait generation: optimizing the cycloid locus to obtain a cycloid locus after constraint optimization;
and (3): designing a virtual model controller: establishing virtual model control of the quadruped robot through the mathematical model obtained in the step (1), and initializing parameters of a virtual model controller;
and (4): and (3) combining deep reinforcement learning algorithms: selecting and designing a state space, an action space and a reward function, and combining the virtual model controller designed in the step (3) to design and select an algorithm of the reinforcement learning intelligent agent;
and (5): training and simulating a virtual prototype model: and training the built virtual model to obtain an optimal control strategy.
Further, the specific steps of the step (1) are as follows,
body structure design of four-legged robot
The trunk of a quadruped moves in six dimensions: translation along x, y and z and rotation about the x, y and z axes. To meet this motion requirement the quadruped robot is given 12 degrees of freedom; each leg consists of an upper, a middle and a lower segment with link lengths L_1, L_2 and L_3, and an all-elbow configuration is selected.
Robot forward kinematics modeling
Establish the world coordinate system {W} at a ground origin and the body coordinate system {B} at the center of mass of the robot body. Establish a coordinate system at the hip-joint origin of each leg: front left {FL}, front right {FR}, rear left {BL} and rear right {BR}; the hip joint, shoulder joint, knee joint and foot end are denoted {0} to {3} respectively. According to the quadruped robot parameters described in step (1) of claim 1, the forward kinematics equation is established with the DH method; the structural parameters of the four legs are identical. The forward kinematics equation is as follows:
{}^{0}T_{3} = {}^{0}T_{1}\,{}^{1}T_{2}\,{}^{2}T_{3} \qquad (1)
wherein the transformation matrix for rotation about the z-axis is:
R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2)

The transformation matrix for translation in the y direction is:

T_y(L) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & L \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (3)

The transformation matrix for translation in the z direction is:

T_z(L) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & L \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (4)
the transformation matrix of the robot foot end coordinate system {3} relative to the hip joint coordinate system {0} can be obtained by the formulas (2), (3) and (4) as follows:
[Equation (5): the 4×4 homogeneous transform {}^{0}T_{3} of the foot-end frame {3} with respect to the hip frame {0}, the product of (2)–(4); the original equation image is not reproduced here.]
In equation (5), c_{ij} = cos(θ_i + θ_j), c_i = cos(θ_i), s_i = sin(θ_i).
The forward kinematic equation obtained from equation (5) is:
[Equation (6): the foot-end position (x, y, z) as a function of the joint angles θ_1, θ_2, θ_3, read from the translation column of {}^{0}T_{3}; the original equation image is not reproduced here.]
further, the specific steps of the step (2) are as follows,
During the motion of the quadruped robot, each leg is in one of two states, swing phase or support phase. To prevent the foot end from slipping and dragging on the ground while walking, the position, velocity and acceleration in the horizontal x direction and vertical y direction are constrained; the constraint equations are as follows:
Constraints in the horizontal x direction:

x(0) = 0, \quad x(T) = S \qquad (7)

\dot{x}(0) = \dot{x}(T) = 0 \qquad (8)

\ddot{x}(0) = \ddot{x}(T) = 0 \qquad (9)

Constraints in the vertical y direction:

y(0) = y(T) = 0, \quad y(T/2) = H \qquad (10)

\dot{y}(0) = \dot{y}(T) = 0 \qquad (11)

\ddot{y}(0) = \ddot{y}(T) = 0 \qquad (12)
In equations (7) to (12), T is the gait cycle, S is the stride of one gait cycle, and H is the lift height of the foot end. The desired trajectory of the robot obtained from the above constraints is shown in equation (13):
x(t) = S\left[\dfrac{t}{T} - \dfrac{\sin(2\pi t/T)}{2\pi}\right], \qquad
y(t) = \begin{cases} 2H\left[\dfrac{t}{T} - \dfrac{\sin(4\pi t/T)}{4\pi}\right], & 0 \le t \le T/2 \\ 2H\left[1 - \dfrac{t}{T} + \dfrac{\sin(4\pi t/T)}{4\pi}\right], & T/2 < t \le T \end{cases} \qquad (13)
Further, in step (3) the virtual model controller is designed as the base controller; the specific steps are as follows.
① Calculating the Jacobian matrix:
The virtual model controller requires the Jacobian matrix of the controlled object, obtained by differentiating the foot-end coordinates (x, y, z) of each leg with respect to the joint variables, denoted q_1, q_2 and q_3, whose physical meaning is the rotation angles of the hip, shoulder and knee joint motors. From the leg forward kinematics equation (6) obtained in step (1), the Jacobian matrix is:

J = \begin{bmatrix} \partial x/\partial q_1 & \partial x/\partial q_2 & \partial x/\partial q_3 \\ \partial y/\partial q_1 & \partial y/\partial q_2 & \partial y/\partial q_3 \\ \partial z/\partial q_1 & \partial z/\partial q_2 & \partial z/\partial q_3 \end{bmatrix}
② Calculating the virtual force:
Virtual model control assumes a virtual spring-damper element. The current position (x, y, z) and velocity (\dot{x}, \dot{y}, \dot{z}) of the controlled object are input, together with the desired reference trajectory (x_{bd}, y_{bd}, z_{bd}). The swing-phase virtual force f_{sw} is calculated as:

f_{sw} = \begin{bmatrix} k_x(x_{bd} - x) + b_x(\dot{x}_{bd} - \dot{x}) \\ k_y(y_{bd} - y) + b_y(\dot{y}_{bd} - \dot{y}) \\ k_z(z_{bd} - z) + b_z(\dot{z}_{bd} - \dot{z}) \end{bmatrix}

where (k_x, k_y, k_z) are the stiffness coefficients of the virtual force and (b_x, b_y, b_z) are its damping coefficients.
The method for controlling the supporting phase and calculating the virtual force of the swinging phase are different, the virtual force is applied to the tail ends of the legs and the feet in the swinging phase, the position of the tail ends of the legs and the feet relative to the ground does not change greatly in the supporting phase, the machine body moves relative to the ground contact point, and the virtual force is applied to the hip joint position, which is equivalent to applying-f to the tail ends of the legs and the feet Branch stand It should be noted that only the forward speed exists in the process of uniform forward motion of the robot
Figure BDA0003690518700000054
And the height of the machine body from the ground need to have values, and the rest are uniformly set to be 0. The control law for the support phase can thus be derived:
Figure BDA0003690518700000055
calculation of joint moment
After obtaining the virtual force applied on the controlled object, the moments of the swing phase and the support phase of the leg joint motor can be obtained according to the calculated Jacobian matrix as follows:
Figure BDA0003690518700000056
further, the specific steps of the step (4) are as follows,
designing a state space:
Reinforcement learning is based on the Markov decision process; the agent obtains a sequence of state vectors (s_1, s_2, …, s_n) while interacting with the environment. The selected states are the position coordinates of the four foot ends relative to the hip joints, the body attitude angles (roll, pitch and yaw), the accumulated error between the foot ends and the reference desired trajectory, and the previous action output vector k, for 48 states in total.
designing an action space:
The action-space outputs are the spring and damping coefficients k and b of the virtual model controller designed in step (3); each leg has 3 degrees of freedom and therefore 6 action inputs, giving 24 action outputs in total. To reduce the learning difficulty of the reinforcement learning agent, a rough parameter range is explored in advance; for convenience the stiffness and damping coefficients of the four legs can be set identical, and the agent's action-space output range is limited on this basis to prevent the optimization space from being too large for the algorithm to converge.
Designing a reward function:
The main objectives of the quadruped robot are to move forward as steadily as possible, maintain a certain body height, and actively recover the preset trajectory under external disturbance. Here v_x is the forward velocity of the robot, y the lateral displacement, u the action output, θ_i the body attitude angle in each of the 3 dimensions, T_f the maximum duration of a training episode, and T_s the minimum simulation step; the per-step term T_s/T_f encourages the robot to survive more steps during training and thus collect more reward. The reward function is set as follows:

[Equation (14): reward function; the original equation image is not reproduced here. It combines the quantities v_x, y, θ_i, u and T_s/T_f described above.]
setting of termination function
A termination function is set for each training episode of the reinforcement learning agent: when the robot body deviates beyond a threshold, the episode is terminated promptly so that the next one can begin, reducing training time. Thresholds are set on the sum of foot-end trajectory errors E_{sum}, the body attitude angles α, β and γ, the lateral offset y of the robot, and the height z of the robot above the ground. The termination function is set as follows:

S_{IsDone} = (E_{sum} \ge 0.57)\ |\ (\alpha,\beta,\gamma \ge 0.36)\ |\ (|y| \ge 0.4)\ |\ (z \le 0.20) \qquad (15)
The episode is terminated when any one of the conditions in equation (15) is satisfied.
Design of reinforcement learning algorithm
The state space and action space of the legged robot are both multidimensional continuous vectors, so the method optimizes the parameters of the quadruped robot's virtual model controller with the deep deterministic policy gradient (DDPG) method. DDPG contains two neural networks: an actor network and a critic network. The actor takes the current environment state s as input and outputs the agent action a; the critic evaluates the actor's output based on the action taken and the environment's response; the actor in turn improves its policy output according to the critic's evaluation. As the number of episodes grows, the actor's output fits the current environment state better and better, the critic's evaluation becomes more and more accurate, and the optimal policy is finally reached.
Further, the specific steps of step (5) are as follows.
① Quadruped robot body modeling
The robot body model is built in SolidWorks, and the physical model is exported as a URDF file through the sw2urdf plug-in.
② Building the MATLAB and Simulink co-simulation model
Using the URDF file generated in step ①, the model is imported into Simulink with MATLAB's smimport function, and the corresponding algorithm model is built.
The invention has the following beneficial effects:
(1) The invention uses the strong exploration capability of deep reinforcement learning to find the optimal controller parameters, reducing the difficulty of tuning them.
(2) The invention improves the control accuracy of the traditional virtual model controller and provides a degree of disturbance rejection, so that the controller can still accurately control the motion of the quadruped robot even when the sensors are subject to external interference.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a topology diagram of a quadruped robot structure;
FIG. 3 is a block diagram of the control architecture of the present invention;
FIG. 4 is an update flow diagram of a depth deterministic policy gradient algorithm;
FIG. 5 is a composite cycloid locus of the foot end of the quadruped robot of the present invention;
FIG. 6 is a graph of the reinforcement learning controller reward function of the present invention;
FIG. 7 is the actual foot end trajectory in XZ space of the robot in the present invention;
FIG. 8 is the position curve of the robot of the present invention in the world coordinate system: (a) x-axis direction; (b) y-axis direction; (c) z-axis direction;
FIG. 9 is a transformation curve of the spring rate of a single leg of the robot of the present invention;
FIG. 10 is a transformation curve of the damping coefficient of a single leg of the robot of the present invention;
FIG. 11 is a graph comparing error curves for the present invention and a prior art control method;
FIG. 12 is the lateral position variation curve of the quadruped robot;
FIG. 13 is the torque output curve of the joint motor under the conventional control method;
FIG. 14 is the torque output curve of the joint motor under the method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front and back) are involved in the embodiments of the present invention, the directional indications are only used to explain the relative positional relationship, movement situation, etc. between the components in a specific posture (as shown in the drawings); if the specific posture changes, the directional indication changes accordingly.
In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the meaning of "and/or" appearing throughout includes three juxtapositions, exemplified by "A and/or B" including either A or B or both A and B. In addition, "a plurality" means two or more. In addition, technical solutions between various embodiments can be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be nonexistent.
Referring to fig. 1 and 2, the invention relates to a parameter control method of a virtual model controller of a quadruped robot based on reinforcement learning, which comprises the following specific steps:
step (1): establishing a mathematical model of the quadruped robot: calculating the forward kinematics model of the quadruped robot, wherein the robot has 12 joints in total, each leg consisting of an upper, a middle and a lower segment with link lengths L_1, L_2 and L_3 respectively, and obtaining the mapping between the robot's leg joint angles and the foot-end positions;
step (2): gait generation: optimizing the cycloid locus to obtain a cycloid locus after constraint optimization;
and (3): designing a virtual model controller: establishing virtual model control of the quadruped robot through the mathematical model obtained in the step (1), and initializing parameters of a virtual model controller;
and (4): and (3) combining deep reinforcement learning algorithms: selecting and designing a state space, an action space and a reward function, and combining the virtual model controller designed in the step (3) to design and select an algorithm of the reinforcement learning intelligent agent;
and (5): training and simulating a virtual prototype model: and training the built virtual model to obtain an optimal control strategy.
The traditional method: the traditional VMC controller estimates its parameters empirically, choosing the damping and stiffness coefficients of the mechanical leg joints by trial and experience; once the parameters are determined, they remain unchanged in subsequent experiments. In reality the ground the robot walks on cannot be guaranteed to be uniform and flat. Fixed parameter selection therefore gives the system poor dynamic control performance, since the parameters cannot be tuned to the specific terrain, resulting in large errors. When the robot body is disturbed by an external force, for example a lateral or longitudinal push, the robot easily tips over.
The present method: the invention improves the parameter-selection method of the published VMC controller. The original fixed-parameter scheme is replaced by a deep reinforcement learning agent that makes real-time decisions and outputs the corresponding parameters, realizing adaptive control of the quadruped robot and improving the disturbance rejection of the published method. The specific effects are as follows:
among them, the parameter adaptive curve in the walking process of the four-footed robot is shown in fig. 9 and fig. 10.
The invention uses the strong exploration capability of deep reinforcement learning to find the optimal controller parameters, reducing the difficulty of tuning them.
The invention improves the control accuracy of the traditional virtual model controller and provides a degree of disturbance rejection, so that the controller can still accurately control the motion of the quadruped robot even when the sensors are subject to external interference.
In one embodiment, the specific steps of step (1) are as follows,
body structure design of four-legged robot
The trunk of a quadruped moves in six dimensions: translation along x, y and z and rotation about the x, y and z axes. To meet this motion requirement the quadruped robot is given 12 degrees of freedom; each leg consists of an upper, a middle and a lower segment with link lengths L_1, L_2 and L_3, and an all-elbow configuration is selected.
[Table 1 — Physical parameters of the quadruped robot: the original table image is not reproduced here.]
Robot forward kinematics modeling
Establish the world coordinate system {W} at a ground origin and the body coordinate system {B} at the center of mass of the robot body. Establish a coordinate system at the hip-joint origin of each leg: front left {FL}, front right {FR}, rear left {BL} and rear right {BR}; the hip joint, shoulder joint, knee joint and foot end are denoted {0} to {3} respectively. According to the quadruped robot parameters described in step (1) of claim 1, the forward kinematics equation is established with the DH method; the structural parameters of the four legs are identical.
[Table 2 — Robot D-H parameters: the original table image is not reproduced here.]
The forward kinematics equation is as follows:
{}^{0}T_{3} = {}^{0}T_{1}\,{}^{1}T_{2}\,{}^{2}T_{3} \qquad (1)
wherein the transformation matrix for rotation about the z-axis is:
R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 & 0 \\ \sin\theta & \cos\theta & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2)

The transformation matrix for translation in the y direction is:

T_y(L) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & L \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (3)

The transformation matrix for translation in the z direction is:

T_z(L) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & L \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (4)
the transformation matrix of the robot foot end coordinate system {3} relative to the hip joint coordinate system {0} can be obtained by the formulas (2), (3) and (4) as follows:
[Equation (5): the 4×4 homogeneous transform {}^{0}T_{3} of the foot-end frame {3} with respect to the hip frame {0}, the product of (2)–(4); the original equation image is not reproduced here.]
In equation (5), c_{ij} = cos(θ_i + θ_j), c_i = cos(θ_i), s_i = sin(θ_i).
The forward kinematic equation obtained from equation (5) is:
[Equation (6): the foot-end position (x, y, z) as a function of the joint angles θ_1, θ_2, θ_3, read from the translation column of {}^{0}T_{3}; the original equation image is not reproduced here.]
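The chain of homogeneous transforms used in the forward kinematics above can be sketched in Python. The joint/link arrangement below (rotation about each local z axis, link offsets along y and z) is an illustrative assumption and does not claim to reproduce the patent's exact D-H table:

```python
import numpy as np

def rot_z(theta):
    """Homogeneous transform for a rotation by theta about the local z-axis (cf. Eq. (2))."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0, 0.0],
                     [s,  c, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

def trans(dx, dy, dz):
    """Homogeneous transform for a pure translation (cf. Eqs. (3)-(4))."""
    T = np.eye(4)
    T[:3, 3] = [dx, dy, dz]
    return T

def foot_position(q, L1, L2, L3):
    """Foot-end position in the hip frame {0} for joint angles q = (q1, q2, q3).

    The specific transform chain is a hypothetical example leg, not the
    patent's exact kinematic structure."""
    T = (rot_z(q[0]) @ trans(0.0, L1, 0.0)
         @ rot_z(q[1]) @ trans(0.0, 0.0, -L2)
         @ rot_z(q[2]) @ trans(0.0, 0.0, -L3))
    return T[:3, 3]
```

With all joint angles zero this example leg hangs straight, so the foot sits at (0, L1, -(L2 + L3)) in the hip frame.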
in one embodiment, the specific steps of step (2) are as follows,
During the motion of the quadruped robot, each leg is in one of two states, swing phase or support phase. To prevent the foot end from slipping and dragging on the ground while walking, the position, velocity and acceleration in the horizontal x direction and vertical y direction are constrained; the constraint equations are as follows:
Constraints in the horizontal x direction:

x(0) = 0, \quad x(T) = S \qquad (7)

\dot{x}(0) = \dot{x}(T) = 0 \qquad (8)

\ddot{x}(0) = \ddot{x}(T) = 0 \qquad (9)

Constraints in the vertical y direction:

y(0) = y(T) = 0, \quad y(T/2) = H \qquad (10)

\dot{y}(0) = \dot{y}(T) = 0 \qquad (11)

\ddot{y}(0) = \ddot{y}(T) = 0 \qquad (12)
In equations (7) to (12), T is the gait cycle, S is the stride of one gait cycle, and H is the lift height of the foot end. The desired trajectory of the robot obtained from the above constraints is shown in equation (13):
x(t) = S\left[\dfrac{t}{T} - \dfrac{\sin(2\pi t/T)}{2\pi}\right], \qquad
y(t) = \begin{cases} 2H\left[\dfrac{t}{T} - \dfrac{\sin(4\pi t/T)}{4\pi}\right], & 0 \le t \le T/2 \\ 2H\left[1 - \dfrac{t}{T} + \dfrac{\sin(4\pi t/T)}{4\pi}\right], & T/2 < t \le T \end{cases} \qquad (13)
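A composite-cycloid swing trajectory satisfying the zero-slip constraints of step (2) (fixed endpoints, zero velocity and zero acceleration at lift-off and touch-down) can be generated as below. This is the standard composite-cycloid form; the patent's constraint-optimized trajectory may use different coefficients:

```python
import numpy as np

def cycloid_foot_trajectory(t, T, S, H):
    """Foot-end swing trajectory: x advances by stride S over period T,
    y rises to lift height H at mid-swing and returns to 0.

    Standard composite-cycloid form, assumed here for illustration."""
    phase = t / T
    # Horizontal motion: zero velocity/acceleration at both ends of the swing.
    x = S * (phase - np.sin(2 * np.pi * phase) / (2 * np.pi))
    # Vertical motion: symmetric rise and fall about t = T/2.
    if phase <= 0.5:
        y = 2 * H * (phase - np.sin(4 * np.pi * phase) / (4 * np.pi))
    else:
        y = 2 * H * (1 - phase + np.sin(4 * np.pi * phase) / (4 * np.pi))
    return x, y
```

At t = 0 and t = T the foot is at rest on the ground; at t = T/2 it reaches the lift height H.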
In an embodiment, in step (3) the virtual model controller is designed as the base controller; the specific steps are as follows.
① Calculating the Jacobian matrix:
The virtual model controller requires the Jacobian matrix of the controlled object, obtained by differentiating the foot-end coordinates (x, y, z) of each leg with respect to the joint variables, denoted q_1, q_2 and q_3, whose physical meaning is the rotation angles of the hip, shoulder and knee joint motors. From the leg forward kinematics equation (6) obtained in step (1), the Jacobian matrix is:

J = \begin{bmatrix} \partial x/\partial q_1 & \partial x/\partial q_2 & \partial x/\partial q_3 \\ \partial y/\partial q_1 & \partial y/\partial q_2 & \partial y/\partial q_3 \\ \partial z/\partial q_1 & \partial z/\partial q_2 & \partial z/\partial q_3 \end{bmatrix}
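An analytic leg Jacobian of this kind can be sanity-checked against a finite-difference approximation. The helper below is a generic sketch; `fk` stands in for any function mapping joint angles (q1, q2, q3) to the foot position (x, y, z):

```python
import numpy as np

def numerical_jacobian(fk, q, eps=1e-6):
    """Central-difference approximation of the 3x3 Jacobian dp/dq of a
    leg forward-kinematics function fk: R^3 -> R^3."""
    q = np.asarray(q, dtype=float)
    J = np.zeros((3, 3))
    for j in range(3):
        dq = np.zeros(3)
        dq[j] = eps
        J[:, j] = (fk(q + dq) - fk(q - dq)) / (2 * eps)
    return J
```

Comparing this numerical estimate against the analytic matrix catches sign and index errors before the Jacobian is used in the torque mapping.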
② Calculating the virtual force:
Virtual model control assumes a virtual spring-damper element. The current position (x, y, z) and velocity (\dot{x}, \dot{y}, \dot{z}) of the controlled object are input, together with the desired reference trajectory (x_{bd}, y_{bd}, z_{bd}). The swing-phase virtual force f_{sw} is calculated as:

f_{sw} = \begin{bmatrix} k_x(x_{bd} - x) + b_x(\dot{x}_{bd} - \dot{x}) \\ k_y(y_{bd} - y) + b_y(\dot{y}_{bd} - \dot{y}) \\ k_z(z_{bd} - z) + b_z(\dot{z}_{bd} - \dot{z}) \end{bmatrix}

where (k_x, k_y, k_z) are the stiffness coefficients of the virtual force and (b_x, b_y, b_z) are its damping coefficients.
The method for controlling the supporting phase and calculating the virtual force of the swinging phase are different, the virtual force is applied to the tail ends of the legs and the feet in the swinging phase, the position of the tail ends of the legs and the feet relative to the ground does not change greatly in the supporting phase, the machine body moves relative to the ground contact point, and the virtual force is applied to the hip joint position, which is equivalent to applying-f to the tail ends of the legs and the feet Branch stand It should be noted that only the forward speed exists in the process of uniform forward motion of the robot
Figure BDA0003690518700000141
And the height of the machine body from the ground needs to have a value, and the rest is uniformly set to be 0. The control law for the support phase can thus be derived:
Figure BDA0003690518700000142
③ Calculating the joint torques:
After the virtual force acting on the controlled object is obtained, the swing-phase and support-phase torques of the leg joint motors follow from the computed Jacobian matrix:

\tau_{sw} = J^{T} f_{sw}, \qquad \tau_{st} = -J^{T} f_{st}
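The spring-damper virtual force and the Jacobian-transpose torque mapping of the virtual model controller can be sketched as follows. The per-axis gains k and b are exactly the parameters the reinforcement learning agent later tunes; the support-phase sign flip follows the -f_st remark in the text:

```python
import numpy as np

def swing_virtual_force(p, v, p_des, v_des, k, b):
    """Per-axis spring-damper virtual force f = K (p_des - p) + B (v_des - v)
    applied at the foot end during the swing phase."""
    return (np.asarray(k) * (np.asarray(p_des) - np.asarray(p))
            + np.asarray(b) * (np.asarray(v_des) - np.asarray(v)))

def joint_torque(J, f, support=False):
    """Map a Cartesian virtual force to joint torques via tau = J^T f.
    In the support phase the reaction -f acts at the foot end, hence the
    sign flip."""
    tau = J.T @ np.asarray(f)
    return -tau if support else tau
```

A small position error along x with stiffness 100 N/m yields a 10 N virtual force, which a unit Jacobian maps directly onto the joints.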
the invention introduces a foot end trajectory tracking performance index C for the improvement change of the tracking performance of the leg of the quadruped robot;
Figure BDA0003690518700000144
in the formula: x, y, z are the actual foot end trajectory positions, x d 、y d 、z d Is the desired triaxial trajectory.
As shown in FIG. 11, the dotted curve is the foot-end trajectory error of the robot under the published method, and the solid red curve is the error after deep reinforcement learning is added. The maximum error peak of the published method is 0.15, while the error peak of the present method is about 0.05, a clear improvement. The dynamically updated parameters make the error vary irregularly, but overall stability is not affected.
In one embodiment, the specific steps of step (4) are as follows,
designing a state space:
Reinforcement learning is based on the Markov decision process; the agent obtains a sequence of state vectors (s_1, s_2, …, s_n) while interacting with the environment. The selected states are the position coordinates of the four foot ends relative to the hip joints, the body attitude angles (roll, pitch and yaw), the accumulated error between the foot ends and the reference desired trajectory, and the previous action output vector k, for 48 states in total.
designing an action space:
The action-space outputs are the spring and damping coefficients k and b of the virtual model controller designed in step (3); each leg has 3 degrees of freedom and therefore 6 action inputs, giving 24 action outputs in total. To reduce the learning difficulty of the reinforcement learning agent, a rough parameter range is explored in advance; for convenience the stiffness and damping coefficients of the four legs can be set identical, and the agent's action-space output range is limited on this basis to prevent the optimization space from being too large for the algorithm to converge.
Designing a reward function:
the main objectives of the quadruped robot are to move forward as stably as possible, maintain a certain body height, and actively recover its preset trajectory under external disturbance. Here v_x is the forward velocity of the robot, y is its lateral displacement, u is the actuator output, θ_i are the body rotation angles about the 3 axes, T_f is the maximum duration of a simulated training episode, and T_s is the minimum step size of the simulation; the term T_s/T_f encourages the robot to survive as many steps as possible during training and thereby collect more reward. The reward function is set as follows:
(Equation (14): the reward function — rendered as an image in the original.)
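Since equation (14) appears only as an image, the sketch below reproduces the reward's described shape: reward forward speed, penalize lateral drift, actuator effort and body rotation, and add the per-step survival bonus T_s/T_f. The weights w are assumptions, not the patent's values.

```python
def reward(vx, y, u, thetas, Ts, Tf, w=(1.0, 0.5, 1e-3, 0.5)):
    """Shaped reward following the description above; weights are illustrative."""
    w_v, w_y, w_u, w_th = w
    return (w_v * vx                                 # reward forward velocity
            - w_y * abs(y)                           # penalize lateral drift
            - w_u * sum(ui ** 2 for ui in u)         # penalize actuator effort
            - w_th * sum(t ** 2 for t in thetas)     # penalize body rotation
            + Ts / Tf)                               # per-step survival bonus

r = reward(vx=0.5, y=0.0, u=[0.0] * 12, thetas=[0.0, 0.0, 0.0], Ts=0.01, Tf=20.0)
print(r)
```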
setting of termination function
A termination function is set for each training episode of the reinforcement learning agent. When the robot body exceeds a threshold, training is terminated promptly so that the next episode can begin, which reduces training time. The thresholds are set on the foot-end trajectory error sum E_sum, the body rotation angles α, β, γ, the lateral offset y of the robot, and the height z of the robot above the ground. The termination function is set as follows:
S_D = (E_sum ≥ 0.57) | (α, β, γ ≥ 0.36) | (|y| ≥ 0.4) | (z ≤ 0.20)    (15)
the episode is terminated when any condition in equation (15) is satisfied.
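A direct transcription of the termination condition, using the thresholds stated above; taking the absolute values of the body angles is an assumption on my part.

```python
def terminated(e_sum, alpha, beta, gamma, y, z):
    """True when any termination threshold described above is crossed."""
    return (e_sum >= 0.57
            or max(abs(alpha), abs(beta), abs(gamma)) >= 0.36  # assumed |angle|
            or abs(y) >= 0.4
            or z <= 0.20)

print(terminated(0.1, 0.0, 0.0, 0.0, 0.0, 0.30))  # False: all within bounds
```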
Design of reinforcement learning algorithm
The state space and the action space of the robot are both multidimensional continuous vectors, so the method optimizes the parameters of the quadruped robot's virtual model controller with the deep deterministic policy gradient (DDPG) method. DDPG uses two neural networks, an actor network and a critic network. The actor takes the current environment state s as input and outputs the agent action a; the critic evaluates the actor's output based on that action and the environment's response, and the actor improves its policy output according to the critic's evaluation. As the number of episodes increases, the actor's output fits the current environment state better and better, the critic's evaluation of the actor becomes more and more accurate, and the optimal policy is finally reached.
TABLE 3 DDPG algorithm training parameters
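The actor-critic structure and the soft target-network update at the heart of DDPG can be sketched minimally as follows; the linear "networks", dimensions and τ value are illustrative stand-ins, not the patent's trained networks or Table 3's parameters.

```python
import random

def soft_update(target, source, tau=0.005):
    """Polyak averaging: DDPG's target networks slowly track the learned ones."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]

# Minimal linear "networks" standing in for the actor and critic MLPs.
actor_w = [0.0] * 4            # maps a 4-d toy state to one action

def act(state, w, noise=0.0):
    """Deterministic policy plus exploration noise: a = mu(s) + N."""
    a = sum(si * wi for si, wi in zip(state, w))
    return a + random.gauss(0.0, noise)

target_actor_w = soft_update(actor_w, [1.0, 1.0, 1.0, 1.0])
print(target_actor_w[0])               # 0.005
print(act([1.0, 2.0], [0.5, 0.25]))    # 1.0
```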
Comparison of the disturbance-rejection performance of the quadruped robot under a lateral force along the y direction:
fig. 12 shows the lateral position change of the quadruped robot after a lateral disturbance force is applied to the body, under the disclosed method and under the method of the present invention. The applied disturbance force F is +10 N during seconds 1-2 and -10 N during seconds 4-5. As the figure shows, the disclosed method cannot adapt to changes in the external environment in real time: after the lateral disturbance, the robot drifts uncontrollably to one side and deviates from the preset trajectory (the motion trajectory in the present invention is straight forward motion, i.e. motion with lateral coordinate y = 0). When the reverse disturbance is applied during seconds 4-5, the motion direction deflects immediately, and after the disturbance is removed the robot keeps moving in that direction, as shown by the dotted line; the overall slope of its lateral displacement is nonzero, indicating that the offset grows larger and larger. The present invention adopts a deep reinforcement learning method that considers not only accurate tracking of the foot-end trajectory but also the robot's motion trajectory as an important index (see the reward-function design of the reinforcement learning agent above), so the robot body has a certain ability to correct its course after a lateral disturbance, as shown by the solid line in fig. 12.
Comparison of robot leg joint output moments:
referring to fig. 13, the disclosed method gives no further treatment of the robot leg-joint torques. The output magnitude is closely related to the chosen elastic and damping coefficients and to the current position and velocity of the foot end: the larger the difference between the foot-end state and the desired state, the larger the output torque. A robot leg has two states, a swing phase and a support phase. In the swing phase, output exceeding the currently required torque still reaches the predetermined state but wastes energy; on a real robot this increases power consumption and reduces endurance. Likewise, in the support phase, excessive torque easily unbalances the contact forces with the ground and causes slipping or even rollover, while insufficient torque prevents the robot from keeping the preset height and makes it too slow, among other problems.
The present invention adopts deep reinforcement learning with dynamic parameter adjustment and takes the joint torque output into account. The overall principle is to keep the output torque as small as possible, reducing energy consumption and protecting the robot body. Sensors acquire the three-axis position of the body, the foot-end position and velocity, and the output torque of the joint motors; the robot learns through continuous interaction with the environment to obtain the most suitable output torque and run stably. This is shown in detail in fig. 14.
In fig. 13, it can be seen that the motor produces a large torque jump at the moment of each phase change from swing phase to support phase; this occurs in every gait cycle and increases the instability of the body.
In fig. 14, it can be seen that the torque peaks are approximately the same as in the disclosed method, but the output torque is smoother; the abrupt torque changes of the motors are effectively suppressed, achieving the desired control objective.
In one embodiment, the specific steps of step (5) are as follows.
① Quadruped robot body modeling
The robot body model is built in SolidWorks, and the established physical model is exported as a URDF file through the sw2urdf plug-in.
② building matlab and simulink combined simulation model
According to the URDF file generated in step ①, the model is imported into Simulink using MATLAB's smimport function, and the corresponding algorithm model is built.
The experimental platform is MATLAB R2021a on a Windows 10 system with an Intel 10600KF central processing unit and an RTX 3060 graphics card. The Simscape Multibody toolbox and the reinforcement learning toolbox of Simulink are used to obtain the reward curve during training. The position and velocity of the quadruped robot relative to the world coordinate system are obtained through the Transform Sensor block in Simulink, and Simulink/Sinks/Scope blocks are added to output the final simulation results. Fig. 7 shows the actual foot-end motion trajectory of the quadruped robot of the invention, and fig. 8 shows the position change curves of the robot after training: fig. 8(a) in the forward x direction, fig. 8(b) in the lateral y direction, and fig. 8(c) in the vertical z direction.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (6)

1. A parameter control method of a quadruped robot virtual model controller based on reinforcement learning is characterized by comprising the following specific control methods:
step (1): establishing a mathematical model of the quadruped robot: calculating a forward kinematics model of the quadruped robot, wherein the quadruped robot has 12 joints, each leg consists of an upper, a middle and a lower part with upper joint length L_1 and middle and lower joint lengths L_2, L_3, and obtaining the position mapping between the rotation angles of the robot's leg joints and the foot ends;
step (2): gait generation: optimizing the cycloid locus to obtain a cycloid locus after constraint optimization;
and (3): designing a virtual model controller: establishing virtual model control of the quadruped robot through the mathematical model obtained in the step (1), and initializing parameters of a virtual model controller;
and (4): and (3) combining deep reinforcement learning algorithms: selecting and designing a state space, an action space and a reward function, and combining the virtual model controller designed in the step (3) to design and select an algorithm of the reinforcement learning intelligent agent;
and (5): training and simulating a virtual prototype model: and training the built virtual model to obtain an optimal control strategy.
2. The parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 1, wherein:
the specific steps of the step (1) are as follows,
body structure design of four-legged robot
The quadruped can move in six dimensions: translation along x, y, z and rotation about the x, y and z axes. According to these motion requirements the quadruped robot has 12 degrees of freedom; each leg consists of an upper, a middle and a lower part, with upper joint length L_1 and middle and lower joint lengths L_2, L_3, and an all-elbow configuration is selected.
Robot forward kinematics modeling
A world coordinate system {W} is established at the ground origin, a body coordinate system {B} at the body centroid of the robot, and a coordinate system at the origin of each leg's hip joint: front left {FL}, front right {FR}, back left {BL} and back right {BR}. The hip joint, shoulder joint, knee joint and foot end are denoted {0} to {3} respectively. A forward kinematics equation is established with the DH method according to the quadruped robot parameters described in step (1) of claim 1; the structural parameters of the four legs are kept consistent, and the forward kinematics equation is:
(Equation (1): the chain of homogeneous transformations from the hip coordinate system {0} to the foot-end coordinate system {3} — rendered as an image in the original.)
wherein the transformation matrix for rotation about the z-axis is:
$$R_z(\theta)=\begin{bmatrix}\cos\theta & -\sin\theta & 0 & 0\\ \sin\theta & \cos\theta & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}\qquad(2)$$
translating the transformation matrix in the y-direction as:
$$T_y(L)=\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & L\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{bmatrix}\qquad(3)$$
translating the transformation matrix in the z direction is:
$$T_z(L)=\begin{bmatrix}1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & L\\ 0 & 0 & 0 & 1\end{bmatrix}\qquad(4)$$
the transformation matrix of the robot foot end coordinate system {3} relative to the hip joint coordinate system {0} can be obtained by the formulas (2), (3) and (4) as follows:
(Equation (5): the transformation matrix of the foot-end coordinate system {3} relative to the hip coordinate system {0} — rendered as an image in the original.)
in formula (5), c_ij = cos(θ_ij), c_i = cos(θ_i), s_i = sin(θ_i).
The forward kinematic equation obtained from equation (5) is:
(Equation (6): the forward kinematics equations giving the foot-end position in terms of the joint angles — rendered as an image in the original.)
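The homogeneous-transform machinery behind equations (2)-(6) can be sketched numerically as below; the particular joint chain composed here (hip rotation, y offset L1, knee rotation, z offset L2) is a toy assumption, since the patent's exact DH chain is given only in the image equations.

```python
import math

def rot_z(th):
    """Homogeneous rotation about the z axis, equation (2)."""
    c, s = math.cos(th), math.sin(th)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def trans(x=0.0, y=0.0, z=0.0):
    """Homogeneous translation, equations (3) and (4)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

def matmul(A, B):
    """4x4 matrix product used to chain the transforms."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

# Assumed toy chain: rotate the hip, offset by L1 along y, rotate the knee,
# offset by L2 along z; the translation column of the product is the foot end.
L1, L2 = 0.1, 0.2
T = matmul(matmul(rot_z(0.0), trans(y=L1)),
           matmul(rot_z(math.pi / 2), trans(z=L2)))
foot = [row[3] for row in T[:3]]
print(foot)
```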
3. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning according to claim 2, characterized in that:
the specific steps of the step (2) are as follows,
during locomotion each leg of the quadruped robot alternates between a swing phase and a support phase. To prevent slipping and foot dragging between the leg end and the ground while walking, the position, velocity and acceleration in the horizontal x direction and the vertical y direction are constrained; the constraint equations are respectively as follows;
horizontal x-direction constraint:
(Equations (7)-(9): constraints on the foot-end position, velocity and acceleration in the horizontal x direction — rendered as images in the original.)
constraint in vertical y-direction:
(Equations (10)-(12): constraints on the foot-end position, velocity and acceleration in the vertical y direction — rendered as images in the original.)
in the formulas (7) to (12), T is the gait cycle, S is the stride of one gait cycle, and H is the lift height of the foot end; applying the above constraints yields the desired trajectory of the robot, shown in equation (13):
(Equation (13): the constrained cycloid trajectory of the foot end — rendered as an image in the original.)
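The constraints (7)-(12), zero velocity and acceleration at lift-off and touch-down, are exactly what the standard composite cycloid satisfies, so a hedged reconstruction of the swing trajectory can be sketched as follows; since equation (13) is only an image, treat this form as an assumption.

```python
import math

def swing_foot(t, T, S, H):
    """Composite cycloid swing trajectory: x advances by stride S over period T
    with zero boundary velocity/acceleration; z lifts to height H mid-swing."""
    phi = 2.0 * math.pi * t / T
    x = S * (t / T - math.sin(phi) / (2.0 * math.pi))   # forward progress
    z = H * (1.0 - math.cos(phi)) / 2.0                 # foot lift
    return x, z

T, S, H = 0.4, 0.2, 0.08        # example gait period, stride and lift height
print(swing_foot(0.0, T, S, H))   # start of swing
print(swing_foot(T / 2, T, S, H)) # mid-swing: foot at full lift height H
```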
4. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 1, wherein:
in the step (3), the specific steps of designing the virtual model controller are as follows,
virtual model controller as basic controller
Calculating a jacobian matrix:
the virtual model controller requires the Jacobian matrix of the controlled object, obtained by differentiating the leg's relative positions (x, y, z) with respect to the joint variables, denoted q_1, q_2, q_3, whose physical meaning is the rotation angles of the hip, shoulder and knee joint motors. From the forward kinematics equation (6) of the robot leg, the Jacobian matrix is:
(Jacobian matrix J of the robot leg — rendered as an image in the original.)
calculating virtual force:
virtual model control assumes a virtual spring-damper element and takes as inputs the current position (x, y, z) and velocity (ẋ, ẏ, ż) of the controlled object together with a desired reference trajectory (x_bd, y_bd, z_bd). The swing-phase virtual force f_sw is calculated as:
$$f_{sw}=\begin{bmatrix}k_x(x_{bd}-x)+b_x(\dot{x}_{bd}-\dot{x})\\ k_y(y_{bd}-y)+b_y(\dot{y}_{bd}-\dot{y})\\ k_z(z_{bd}-z)+b_z(\dot{z}_{bd}-\dot{z})\end{bmatrix}$$
in the formula: (k_x, k_y, k_z) are the elastic coefficients of the virtual force and (b_x, b_y, b_z) are its damping coefficients.
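The swing-phase spring-damper law described above is straightforward to evaluate numerically; the gains and states below are invented example values, not the patent's tuned parameters.

```python
def virtual_force(pos, vel, pos_d, vel_d, k, b):
    """Per-axis spring-damper virtual force f = k(p_d - p) + b(v_d - v)."""
    return tuple(ki * (pd - p) + bi * (vd - v)
                 for p, v, pd, vd, ki, bi in zip(pos, vel, pos_d, vel_d, k, b))

f = virtual_force(
    pos=(0.0, 0.0, -0.28), vel=(0.0, 0.0, 0.0),       # current foot-end state
    pos_d=(0.02, 0.0, -0.30), vel_d=(0.1, 0.0, 0.0),  # desired reference
    k=(400.0, 400.0, 600.0), b=(20.0, 20.0, 30.0),    # assumed example gains
)
print(f)
```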
The virtual-force calculation differs between the support phase and the swing phase. In the swing phase the virtual force is applied at the foot end. In the support phase the foot-end position hardly changes relative to the ground; the body moves relative to the contact point, and the virtual force is applied at the hip joint, which is equivalent to applying −f_st at the foot end. Note that during uniform forward motion only the forward velocity ẋ and the body height above the ground take nonzero reference values; the remaining references are set to 0. The control law of the support phase is therefore:
(Support-phase control law: the same spring-damper form with the forward velocity and body height as the only nonzero references — rendered as an image in the original.)
calculating joint moment
After the virtual force applied to the controlled object is obtained, the torques of the leg-joint motors in the swing phase and the support phase follow from the computed Jacobian matrix:
$$\tau_{sw}=J^{T}f_{sw},\qquad \tau_{st}=J^{T}f_{st}$$
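Mapping a virtual foot-end force to joint torques through the transposed Jacobian (τ = JᵀF) can be sketched as below; the 3x3 Jacobian entries are toy numbers, not derived from the patent's leg geometry.

```python
def jacobian_transpose_torque(J, f):
    """tau = J^T f: map a Cartesian virtual force at the foot end to the
    joint torques (one entry per joint column of J)."""
    n = len(J[0])
    return [sum(J[i][j] * f[i] for i in range(len(f))) for j in range(n)]

# Toy 3x3 Jacobian (rows: x, y, z foot velocity; columns: hip, shoulder, knee).
J = [[0.0, -0.3, -0.15],
     [0.3,  0.0,  0.0],
     [0.0,  0.2,  0.1]]
tau = jacobian_transpose_torque(J, [10.0, 0.0, -12.0])
print(tau)
```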
5. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 4, wherein:
the specific steps of the step (4) are as follows,
designing a state space:
the reinforcement learning is based on a Markov decision process, and a series of state vectors (s_1, s_2, ..., s_n) is obtained as the agent interacts with the environment. The selected states are the position coordinates of the four foot ends relative to the hip joints, the body rotation angles (roll, pitch and yaw), the sum of the errors between the foot ends and the reference desired trajectory, and the previous action output vector k, for a total of 48 states.
designing an action space:
the action space outputs the spring and damping coefficients k and b of the virtual model controller designed in claim 4. Each leg has 3 degrees of freedom and therefore 6 action inputs, giving 24 action outputs in total. To reduce the learning difficulty of the reinforcement learning agent, the rough range of the parameters is explored in advance; for convenience, the elastic and damping coefficients of each leg can be set to be consistent. The action-space output range of the agent is limited on this basis, which prevents the optimization space from becoming so large that the algorithm cannot converge.
Designing a reward function:
the main objectives of the quadruped robot are to move forward as stably as possible, maintain a certain body height, and actively recover its preset trajectory under external disturbance. Here v_x is the forward velocity of the robot, y is its lateral displacement, u is the actuator output, θ_i are the body rotation angles about the 3 axes, T_f is the maximum duration of a simulated training episode, and T_s is the minimum step size of the simulation; the term T_s/T_f encourages the robot to survive as many steps as possible during training and thereby collect more reward. The reward function is set as follows:
(Equation (14): the reward function — rendered as an image in the original.)
setting of termination function
A termination function is set for each training episode of the reinforcement learning agent. When the robot body exceeds a threshold, training is terminated promptly so that the next episode can begin, which reduces training time. The thresholds are set on the foot-end trajectory error sum E_sum, the body rotation angles α, β, γ, the lateral offset y of the robot, and the height z of the robot above the ground. The termination function is set as follows:
S_D = (E_sum ≥ 0.57) | (α, β, γ ≥ 0.36) | (|y| ≥ 0.4) | (z ≤ 0.20)    (15)
the episode is terminated when any condition in equation (15) is satisfied.
Design of reinforcement learning algorithm
The state space and the action space of the robot are both multidimensional continuous vectors, so the method optimizes the parameters of the quadruped robot's virtual model controller with the deep deterministic policy gradient (DDPG) method. DDPG uses two neural networks, an actor network and a critic network. The actor takes the current environment state s as input and outputs the agent action a; the critic evaluates the actor's output based on that action and the environment's response, and the actor improves its policy output according to the critic's evaluation. As the number of episodes increases, the actor's output fits the current environment state better and better, the critic's evaluation of the actor becomes more and more accurate, and the optimal policy is finally reached.
6. The parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning according to claim 5, characterized in that:
the specific steps of step (5) are as follows.
① Quadruped robot body modeling
The robot body model is built in SolidWorks, and the established physical model is exported as a URDF file through the sw2urdf plug-in.
② building matlab and simulink combined simulation model
According to the URDF file generated in step ①, the model is imported into Simulink using MATLAB's smimport function, and the corresponding algorithm model is built.
CN202210673604.7A 2022-06-13 2022-06-13 Parameter control method of quadruped robot virtual model controller based on reinforcement learning Pending CN114995479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210673604.7A CN114995479A (en) 2022-06-13 2022-06-13 Parameter control method of quadruped robot virtual model controller based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210673604.7A CN114995479A (en) 2022-06-13 2022-06-13 Parameter control method of quadruped robot virtual model controller based on reinforcement learning

Publications (1)

Publication Number Publication Date
CN114995479A true CN114995479A (en) 2022-09-02

Family

ID=83035602

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210673604.7A Pending CN114995479A (en) 2022-06-13 2022-06-13 Parameter control method of quadruped robot virtual model controller based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN114995479A (en)


Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114397810A (en) * 2022-01-17 2022-04-26 厦门大学 Four-legged robot motion control method based on adaptive virtual model control
CN114397810B (en) * 2022-01-17 2023-12-19 厦门大学 Motion control method of four-foot robot based on self-adaptive virtual model control
CN116401792A (en) * 2023-06-06 2023-07-07 之江实验室 Robot body design method
CN116401792B (en) * 2023-06-06 2023-09-19 之江实验室 Robot body design method
CN117148740A (en) * 2023-10-31 2023-12-01 江西机电职业技术学院 Combined simulation gait planning method for desktop-level four-foot robot
CN117452860A (en) * 2023-11-22 2024-01-26 北京交通大学 Multi-legged robot control system
CN117766155A (en) * 2024-02-22 2024-03-26 中国人民解放军海军青岛特勤疗养中心 dynamic blood pressure medical data processing system based on artificial intelligence
CN117766155B (en) * 2024-02-22 2024-05-10 中国人民解放军海军青岛特勤疗养中心 Dynamic blood pressure medical data processing system based on artificial intelligence
CN118003341A (en) * 2024-04-09 2024-05-10 首都体育学院 Lower limb joint moment calculation method based on reinforcement learning intelligent body

Similar Documents

Publication Publication Date Title
CN114995479A (en) Parameter control method of quadruped robot virtual model controller based on reinforcement learning
CN108549237B (en) Preset control humanoid robot gait planning method based on deep reinforcement learning
CN108858208B (en) Self-adaptive balance control method, device and system for humanoid robot in complex terrain
CN111913490A (en) Drop foot adjustment-based dynamic gait stability control method and system for quadruped robot
Chew et al. Dynamic bipedal walking assisted by learning
CN113031528B (en) Multi-legged robot non-structural ground motion control method based on depth certainty strategy gradient
CN102375416B (en) Human type robot kicking action information processing method based on rapid search tree
CN108931988B (en) Gait planning method of quadruped robot based on central pattern generator, central pattern generator and robot
CN114248855B (en) Biped robot space domain gait planning and control method
CN108897220B (en) Self-adaptive stable balance control method and system and biped humanoid robot
CN114047697B (en) Four-foot robot balance inverted pendulum control method based on deep reinforcement learning
CN116533249A (en) Mechanical arm control method based on deep reinforcement learning
CN114397810A (en) Four-legged robot motion control method based on adaptive virtual model control
CN112749515A (en) Damaged robot gait self-learning integrating biological inspiration and deep reinforcement learning
CN115128960A (en) Method and system for controlling motion of biped robot based on deep reinforcement learning
CN113568422B (en) Four-foot robot control method based on model predictive control optimization reinforcement learning
Qin et al. Stable balance adjustment structure of the quadruped robot based on the bionic lateral swing posture
CN114393579B (en) Robot control method and device based on self-adaptive fuzzy virtual model
CN114454983B (en) Turning control method and system for quadruped robot
Xie et al. Gait optimization and energy-based stability for biped locomotion using large-scale programming
Yoshida et al. Pivoting a large object: whole-body manipulation by a humanoid robot
Lu et al. A novel multi-configuration quadruped robot with redundant DOFs and its application scenario analysis
Yu et al. Gait planning for biped robot based on variable center-of-mass height hybrid strategy
Liu et al. A reinforcement learning method for humanoid robot walking
Yang et al. Truncated Fourier series formulation for bipedal walking balance control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination