CN114995479A

CN114995479A - Parameter control method of quadruped robot virtual model controller based on reinforcement learning

Info

Publication number: CN114995479A
Application number: CN202210673604.7A
Authority: CN
Inventors: 张云伟; 龚泽武
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2022-06-13
Filing date: 2022-06-13
Publication date: 2022-09-02

Abstract

The invention discloses a parameter control method of a quadruped robot virtual model controller based on reinforcement learning, which comprises the following specific steps: step (1): establishing a mathematical model of the quadruped robot: calculating a positive kinematics model of the quadruped robot; step (2): gait generation: optimizing the cycloid locus to obtain a cycloid locus after constraint optimization; and (3): designing a virtual model controller: establishing virtual model control of the quadruped robot; and (4): and (3) combining deep reinforcement learning algorithms: selecting and designing a state space, an action space and a reward function; and (5): training and simulating a virtual prototype model; the invention finds the optimal controller parameter by utilizing the strong exploration capability of deep reinforcement learning, thereby reducing the setting difficulty of the controller parameter; the invention improves the control precision of the traditional virtual model controller, has certain anti-interference performance, and the controller can still accurately control the motion of the quadruped robot.

Description

Parameter control method of quadruped robot virtual model controller based on reinforcement learning

Technical Field

The invention relates to the technical field of robot control methods, in particular to a parameter control method of a quadruped robot virtual model controller based on reinforcement learning.

Background

Along with the development of the bionic technology of the robot, the application field of the multi-legged robot is more and more extensive, compared with a biped robot, the four-legged robot has better stability and ground passing performance, compared with a robot with six or more legs, the four-legged robot has better balance and choice in the aspect of control algorithm complexity, and compared with the traditional wheeled and crawler-type mobile robots, the four-legged robot has stronger adaptability. In nature, almost all large mammals move in a quadruped advancing mode, and a motion strategy of the quadruped robot can be constructed according to the principle of bionics. The virtual model control is a control method which is widely applied in the field of quadruped robots, the method does not require accurate and complex dynamics and inverse kinematics modeling on the quadruped robots, and can enable the robots to move correspondingly according to the designed gait track planning, but the algorithm sacrifices the control accuracy, has larger error and is difficult to realize the accurate control on the quadruped robots.

Disclosure of Invention

The invention mainly aims to provide a parameter control method of a quadruped robot virtual model controller based on reinforcement learning, and aims to solve the existing technical problems.

In order to achieve the purpose, the invention provides a parameter control method of a virtual model controller of a quadruped robot based on reinforcement learning, which is characterized by comprising the following steps of:

step (1): establishing a mathematical model of the quadruped robot: calculating a positive kinematics model of the quadruped robot, wherein the quadruped robot has 12 joints, each leg part consists of an upper leg part, a middle leg part and a lower leg part, and the length L of the upper joint ₁ The length of the middle joint and the lower joint is L ₂ 、L ₃ Obtaining the position relation mapping between the rotation angle of the leg joint of the robot and the tail ends of the legs and the feet;

step (2): gait generation: optimizing the cycloid locus to obtain a cycloid locus after constraint optimization;

and (3): designing a virtual model controller: establishing virtual model control of the quadruped robot through the mathematical model obtained in the step (1), and initializing parameters of a virtual model controller;

and (4): and (3) combining deep reinforcement learning algorithms: selecting and designing a state space, an action space and a reward function, and combining the virtual model controller designed in the step (3) to design and select an algorithm of the reinforcement learning intelligent agent;

and (5): training and simulating a virtual prototype model: and training the built virtual model to obtain an optimal control strategy.

Further, the specific steps of the step (1) are as follows,

body structure design of four-legged robot

The quadruped can move in six dimensions, including x, y, z and the motion around the x, y and z axes, so that the quadruped robot has 12 degrees of freedom according to the motion requirement, each leg part is composed of an upper leg part, a middle leg part and a lower leg part, and the upper joint length L ₁ The length of the middle joint and the lower joint is L ₂ 、L ₃ And (5) selecting an all-elbow type motion mode.

Robot positive kinematics modeling

Establishing a world coordinate system { W } by using a ground origin, establishing a robot body coordinate system { B } by using a robot body centroid position, establishing a coordinate system at a robot leg hip joint origin position, wherein the coordinate system is respectively a front left { FL }, a front right { RL }, a rear left { BL }, and a rear right { BR }, and hip joints, shoulder joints, knee joints and foot ends thereof are respectively represented by {0} to {3}, establishing a positive kinematics equation by using a DH method according to the parameters of the quadruped robot explained in the step (1) of claim 1, wherein the structural parameters of the four legs are consistent, and the positive kinematics equation is as follows:

wherein the transformation matrix for rotation about the z-axis is:

translating the transformation matrix in the y-direction is:

translating the transformation matrix in the z direction is:

the transformation matrix of the robot foot end coordinate system {3} relative to the hip joint coordinate system {0} can be obtained by the formulas (2), (3) and (4) as follows:

in the formula (5) c _ij ＝cos(θ _i +θ _j )，c _i ＝cos(θ _i )，s _i ＝sin(θ _i )。

The forward kinematic equation obtained from equation (5) is:

further, the specific steps of the step (2) are as follows,

in the motion process of the quadruped robot, each leg has two states of a swinging phase and a supporting phase, the position, the speed and the acceleration in the horizontal x direction and the vertical y direction are restrained aiming at the problem that sliding mopping is generated between the tail end of the leg of the robot and the ground in the walking process, and restraint equations are respectively as follows;

horizontal x-direction constraint:

constraint in vertical y-direction:

in the formulas (7) to (12), T is a gait cycle, S is a stride of one gait cycle, and H is a raised height of the distal end of the leg and foot, and the desired trajectory of the robot is obtained by the above-mentioned constraint as shown in the formula (13):

further, in the step (3), the motion characteristic parameter extraction specifically includes the following steps, and the virtual model controller is used as a basic controller

Fourthly, calculating a Jacobian matrix:

the virtual model controller needs the Jacobian matrix of the controlled object and can respectively derive the relative sitting positions (x, y, z) of the legs of the robot and the joint variables, and the joint variables are recorded as q ₁ 、q ₂ 、q ₃ The physical meaning is the rotation angle of the hip joint, shoulder joint and knee joint motor. The positive kinematics equation (6) of the robot leg obtained according to claim (1) has a jacobian matrix:

calculating the virtual force:

virtual model control supposes the existence of a spring damping part, and the current position (x, y, z) and speed of the controlled object are input

And requires the input of a desired reference trajectory (x) _bd ，y _bd ，z _bd ) Virtual force of oscillatory phase f _Pendulum The calculation formula is as follows:

in the formula: (k) of _x ，k _y ，k _z ) Is the elastic coefficient of the virtual force, (b) _x ，b _y ，b _z ) Is the damping coefficient of the virtual force.

The method for controlling the supporting phase and calculating the virtual force of the swinging phase are different, the virtual force is applied to the tail ends of the legs and the feet in the swinging phase, the position of the tail ends of the legs and the feet relative to the ground does not change greatly in the supporting phase, the machine body moves relative to the ground contact point, and the virtual force is applied to the hip joint position, which is equivalent to applying-f to the tail ends of the legs and the feet _{Branch stand} It should be noted that only the forward speed exists in the process of uniform forward motion of the robot

And the height of the machine body from the ground need to have values, and the rest are uniformly set to be 0. The control law for the support phase can thus be derived:

calculation of joint moment

After obtaining the virtual force applied on the controlled object, the moments of the swing phase and the support phase of the leg joint motor can be obtained according to the calculated Jacobian matrix as follows:

further, the specific steps of the step (4) are as follows,

designing a state space:

reinforcement learning is based on Markov decision process, and a series of state vectors(s) are obtained in the interaction process of an agent and the environment ₁ ，s ₂ ，...s _n The selected states are position coordinates of the tail ends of the four legs and the feet relative to hip joints, the overturning angles of the robot body are respectively rollover, pitching and transverse rotation, the error sum of the tail ends of the legs and the feet and a reference expected track, and the last action output vector k totally accounts for 48 states:

designing an action space:

the motion space design output is spring damping coefficients k and b of the virtual model controller designed in claim 4, each leg has 3 degrees of freedom, each leg has 6 motion inputs, and the total motion output is 24. In order to reduce the learning difficulty of the reinforcement learning agent, the rough range of parameters needs to be explored in advance, the elastic damping coefficient of each leg can be set to be consistent for convenience, the action space output range of the reinforcement learning agent is limited on the basis, and the situation that the optimization space of the reinforcement learning agent is too large and the algorithm cannot be converged is avoided.

Designing a reward function:

the main purpose of the quadruped robot is to move forwards as stably as possible, maintain a certain body height and actively recover to a preset track v under the interference of external force _x Is the forward velocity of the robot, y is the displacement in the lateral direction of the robot, u is the output of the robot motion, θ _i For the robot body at a 3-dimensional turning angle, T _f For simulating the maximum duration of the training process, T _s Is the minimum step size unit, T, in the simulation process _s /T _f In order to make the robot run more steps as much as possible during the training process to obtain more rewards, the reward function is set as follows:

setting of termination function

Reinforcement learning agent at each trainingWhen the robot body deviates a certain threshold value, a termination function is set to terminate the training in time for the next round of training, so that the training time can be reduced, and the threshold value is respectively set as the sum E of the foot end trajectory errors _sum The robot body turnover angles alpha, beta and gamma, the lateral offset y of the robot and the height z of the robot from the ground. The termination function is set as follows:

S _isD ＝(E _sum ≥0.57)|(α、β、γ≥0.36)|(|y|≥0.4)|(z≤0.20) (15)

when any one of the conditions of the equation (19) is satisfied, the round is terminated.

Design of reinforcement learning algorithm

The state space and the action space of the multi-legged robot are both multidimensional continuous space vectors, the method optimizes parameters of a virtual model controller of the four-legged robot by selecting a depth deterministic strategy gradient (DDPG) method, the method comprises two neural networks which are an actor network and a critic network respectively, the actor network actor inputs the state s of the current environment and outputs an intelligent body action a, the critic network critic evaluates the output of the actor network according to the action of the actor network and the reaction of the current environment, the corresponding actor network improves the strategy output according to the evaluation of the critic network, the output of the actor network is more and more in line with the current environment state along with the increase of the turn times, the evaluation level of the actor network by the critic network is more and more accurate, and the optimal strategy is finally achieved.

Further, the specific steps of step (5) are as follows.

Four-legged robot body modeling

And establishing a robot body model by utilizing solidworks, and exporting the established physical model as an urdf file through an sm2urdf plug-in.

② building matlab and simulink combined simulation model

According to the urdf file generated in the step I, a smiimport function of matlab is used for importing a model into simulink, and a corresponding algorithm model is built.

The invention has the following beneficial effects:

(1) the invention finds the optimal controller parameter by utilizing the strong exploration capability of deep reinforcement learning, and reduces the setting difficulty of the controller parameter.

(2) The invention improves the control precision of the traditional virtual model controller, and has certain anti-interference performance, so that the controller can still accurately control the motion of the quadruped robot under the condition that the sensor is interfered by the outside.

Drawings

FIG. 1 is a flow chart of a method of the present invention;

FIG. 2 is a topology diagram of a quadruped robot structure;

FIG. 3 is a block diagram of the control architecture of the present invention;

FIG. 4 is an update flow diagram of a depth deterministic policy gradient algorithm;

FIG. 5 is a composite cycloid locus of the foot end of the quadruped robot of the present invention;

FIG. 6 is a graph of the reinforcement learning controller reward function of the present invention;

FIG. 7 is the actual foot end trajectory in XZ space of the robot in the present invention;

FIG. 8 is a position curve of the robot of the present invention in a world coordinate system; (a) an x-axis direction (b), a y-axis direction (c), and a z-axis direction;

FIG. 9 is a transformation curve of the spring rate of a single leg of the robot of the present invention;

FIG. 10 is a transformation curve of the damping coefficient of a single leg of the robot of the present invention;

FIG. 11 is a graph comparing error curves for the present invention and a prior art control method;

FIG. 12 is the lateral position variation curve of the quadruped robot

FIG. 13 is a torque output curve of the motor under the control of the conventional control method according to the present invention;

FIG. 14 is a torque output curve of the lower joint motor of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The embodiments and features of the embodiments in the present application may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.

In addition, if there is a description of "first", "second", etc. in an embodiment of the present invention, the description of "first", "second", etc. is for descriptive purposes only and is not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the meaning of "and/or" appearing throughout includes three juxtapositions, exemplified by "A and/or B" including either A or B or both A and B. In addition, "a plurality" means two or more. In addition, technical solutions between various embodiments can be combined with each other, but must be realized by a person skilled in the art, and when the technical solutions are contradictory or cannot be realized, the combination of the technical solutions should be considered to be nonexistent.

Referring to fig. 1 and 2, the invention relates to a parameter control method of a virtual model controller of a quadruped robot based on reinforcement learning, which comprises the following specific steps:

step (1): establishing a mathematical model of the quadruped robot: calculating a positive kinematics model of a four-footed robot, wherein the four-footed robot has 12 joints in total, each leg part consists of three leg-foot parts, namely an upper part, a middle part and a lower part, and the length L of the upper joint ₁ The length of the middle joint and the lower joint is L ₂ 、L ₃ Obtaining the position relation mapping between the rotation angle of the leg joint of the robot and the tail ends of the legs and the feet;

The traditional method comprises the following steps: the traditional VMC controller adopts an empirical method for parameter estimation, adopts a random empirical estimation method for damping coefficients and elastic coefficients of joints of a mechanical leg part, and once the parameters are determined, the parameters are not changed in subsequent experiments. In reality, the ground environment where the robot walks cannot be guaranteed to be uniform and flat but uneven, so that the dynamic control performance of the system is poor due to fixed parameter selection, corresponding parameters cannot be reasonably set according to specific road conditions, and therefore errors are large. When the robot body is disturbed by external disturbance force, such as lateral or longitudinal force, overturning and overturning easily occur.

The method comprises the following steps: the invention mainly improves the parameter selection method of the VMC controller aiming at the disclosed method. The original fixed parameter selection mode is replaced by adopting a deep reinforcement learning intelligent agent to make a real-time decision and output corresponding parameters, the self-adaptive control of the quadruped robot is realized, the anti-interference capability of the disclosed method is improved, and the specific effects are as follows:

among them, the parameter adaptive curve in the walking process of the four-footed robot is shown in fig. 9 and fig. 10.

The invention finds the optimal controller parameter by utilizing the strong exploration capability of deep reinforcement learning, and reduces the setting difficulty of the controller parameter.

The invention improves the control precision of the traditional virtual model controller, and has certain anti-interference performance, so that the controller can still accurately control the motion of the quadruped robot under the condition that the sensor is interfered by the outside.

In one embodiment, the specific steps of step (1) are as follows,

body structure design of four-legged robot

TABLE 1 physical parameters of the quadruped robot

Robot positive kinematics modeling

Establishing a world coordinate system { W } by using a ground origin, establishing a machine body coordinate system { B } by using a machine body centroid position of the robot, establishing coordinate systems at the positions of hip joint origins of legs of the robot, wherein the coordinate systems are respectively front left { FL }, front right { RL }, rear left { BL }, and rear right { BR }, hip joints, shoulder joints, knee joints and foot ends of the hip joints are respectively expressed by {0} to {3}, establishing a positive kinematics equation by using a DH method according to the parameters of the quadruped robot explained in the step (1) of claim 1, and keeping the structural parameters of the four legs consistent,

table 2 robot D-H parameters table positive kinematics equation as follows:

wherein the transformation matrix for rotation about the z-axis is:

translating the transformation matrix in the y-direction is:

translating the transformation matrix in the z direction is:

The forward kinematic equation obtained from equation (5) is:

in one embodiment, the specific steps of step (2) are as follows,

in the motion process of the quadruped robot, each leg has two states of a swinging phase and a supporting phase, the position, the speed and the acceleration in the horizontal x direction and the vertical y direction are restrained aiming at the problem that the sliding mopad is generated between the tail end of the leg of the robot and the ground in the walking process, and the restraint equations are respectively as follows;

horizontal x-direction constraint:

constraint in vertical y-direction:

in an embodiment, in the step (3), the motion characteristic parameter extraction specifically includes the following steps, and the virtual model controller is used as a basic controller

And (c) calculating a jacobi matrix:

and calculating virtual force:

And requires the input of a desired reference trajectory (x) _bd ，y _bd ，z _bd ) Swing phase virtual force f _Pendulum The calculation formula is:

in the formula: (k) _x ，k _y ，k _z ) Is the elastic coefficient of the virtual force, (b) _x ，b _y ，b _z ) Is the damping coefficient of the virtual force.

And the height of the machine body from the ground needs to have a value, and the rest is uniformly set to be 0. The control law for the support phase can thus be derived:

ninthly calculation of joint torque

the invention introduces a foot end trajectory tracking performance index C for the improvement change of the tracking performance of the leg of the quadruped robot;

in the formula: x, y, z are the actual foot end trajectory positions, x _d 、y _d 、z _d Is the desired triaxial trajectory.

As shown in fig. 11, the dotted line part is the trajectory error curve of the robot foot end under the public method, and the red solid line part is the trajectory error curve of the robot foot end after the depth reinforcement learning is added, from which it can be seen that the maximum error peak value is 0.15 compared with the public method, while the error peak value of the method of the present invention is about 0.05, the effect is obvious, but the error change is irregular due to the dynamically updated parameters, but the overall stability is not affected.

In one embodiment, the specific steps of step (4) are as follows,

designing a state space:

designing an action space:

Designing a reward function:

the main purpose of the quadruped robot is to move forwards as stably as possible, maintain a certain body height and actively recover to a preset track v under the interference of external force _x Is the forward velocity of the robot, y is the displacement in the lateral direction of the robot, u is the output of the robot motion, θ _i Is the turning angle, T, of the robot body in 3 dimensions _f For simulating the maximum duration of the training process, T _s Is the minimum step size unit, T, in the simulation process _s /T _f In order to make the robot get more rewards by running more steps as much as possible in the training process, the reward function is set as follows:

setting of termination function

When the reinforcement learning agent carries out each training round, a termination function is set, when the robot body deviates a certain threshold value, the training is terminated in time for carrying out the training of the next round, the training time can be reduced, and the threshold values are respectively set as the foot end track error sum E _sum The robot body turnover angles alpha, beta and gamma, the lateral offset y of the robot and the height z of the robot from the ground. The termination function is set as follows:

S _isD ＝(E _sum ≥0.57)|(α、β、γ≥0.36)|(|y|≥0.4)|(z≤0.20) (15)

the round is terminated when any condition of the equation (19) is satisfied.

Design of reinforcement learning algorithm

TABLE 3 DDPG algorithm training parameters

The anti-interference performance contrast of the quadruped robot when receiving the lateral direction along the y direction:

fig. 12 shows the lateral position change of the quadruped robot after the robot body is subjected to the interference force of the lateral position under the disclosed method and the present invention method, respectively, wherein the applied interference force F is +10N applied in 1-2 seconds, and the reverse-10N applied in 4-5 seconds, as can be seen from the above figure, the presently disclosed method, because it cannot adapt to the change of the external environment in real time, after the quadruped robot is subjected to the lateral interference force, the quadruped robot can deflect in one direction uncontrollably, which causes it to deviate from the preset trajectory (the motion trajectory in the present invention is a straight forward motion, i.e. a motion along the lateral direction y being 0), after 4-5 seconds, the reverse interference force is applied, the robot motion direction immediately deflects, after the interference force is removed, the robot still moves along the previous direction, as shown by the dotted line in the figure, finally, the lateral displacement slope of the method is not 0 in total, which indicates that the robot offset is larger and larger; in the invention, because a deep reinforcement learning method is adopted, the problem of accurate tracking of the foot end track is not only considered, but also the motion track of the robot is considered as an important index, and the setting rule of the reinforcement learning intelligent body reward function of the invention text type (18) can be seen specifically, so that the robot body has certain self-route correcting capability after being interfered by the side direction, as shown by the solid line in fig. 12.

Comparison of robot leg joint output moments:

referring to fig. 13, in the disclosed method, no further explanation is made on the moment of the joint of the leg of the robot, the output magnitude is closely related to the set elastic coefficient and damping coefficient, and the position and speed of the current foot end of the robot, wherein the larger the difference between the state of the foot end and the actual expected state is, the larger the actually output moment is, and the leg of the robot has two states, namely a swing phase and a support phase, when in the swing phase, if the output exceeds the currently required moment magnitude, the predetermined state can be reached, but on the one hand, energy waste is also caused, and when the leg is reflected in a real robot, energy consumption is increased, and endurance is reduced; similarly, when the robot is in a supporting phase, the moment is too large, the situation that the robot slips or even turns over on the side with the ground due to unbalanced stress is easily caused, and when the moment is not enough, the robot cannot keep a preset height, so that the speed is too slow, and the like.

The invention adopts a method of deep reinforcement learning dynamic parameter adjustment, gives consideration to joint torque output, the overall principle is to make the output torque as small as possible, the purposes of reducing energy consumption and protecting the robot body are achieved, the sensors are used for acquiring the three-axis position, the foot end and foot end speed of the robot body and the output torque of a joint motor, the robot body learns in the continuous interaction process with the environment, the most suitable output torque is obtained, and the stable operation of the robot is realized. This is shown in detail in FIG. 14.

In fig. 13, it can be seen that the motor generates a large moment jump at the moment of each phase change from swing phase to support phase, and this phenomenon occurs in each gait cycle, increasing the instability factor of the body.

In fig. 14, it can be seen that the torque peak value is approximately the same as the effective part of the disclosed method, but the output torque is smoother, the torque abrupt change of the motor is effectively restrained, and the expected control target is achieved.

In one embodiment, the specific steps of step (5) are as follows.

Four-legged robot body modeling

② building matlab and simulink combined simulation model

And (4) according to the urdf file generated in the step (i), importing a model into simulink by utilizing a smiportant function of matlab, and building a corresponding algorithm model.

The experimental platform adopted in the experiment is MATLAB R2021a, the operating system is win10 system, the central processing unit is Intel 10600KF, the image display card is RTX3060, the simscape multi-body tool box and the Deep recovery testing toolbox of the Simulink are used to obtain the reward function curve in the training process, and the position relation and the speed relation between the quadruped robot and the world coordinate system are obtained through the Transform robot Sensor in the Simulink, and Simulink/sinks/scope are added to output the final simulation result, fig. 7 is the actual motion trajectory of the foot end of the quadruped robot of the invention, fig. 8 is the position change curve of the robot after the training is completed, wherein fig. 8(a) is the position change curve of the invention in the forward direction of the x axis, fig. 8(b) is the position change curve of the invention in the lateral deviation direction of the y axis, and fig. 8(c) is the position change curve of the invention in the vertical direction of the z axis.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims

1. A parameter control method of a quadruped robot virtual model controller based on reinforcement learning is characterized by comprising the following specific control methods:

2. The parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 1, wherein:

the specific steps of the step (1) are as follows,

body structure design of four-legged robot

Robot positive kinematics modeling

Establishing a world coordinate system { W } by using a ground origin, establishing a machine body coordinate system { B } by using a machine body centroid position of the robot, establishing a coordinate system at the robot leg hip joint origin position, wherein the coordinate system is respectively front left { FL }, front right { RL }, rear left { BL }, and rear right { BR }, hip joints, shoulder joints, knee joints and foot ends thereof are respectively represented by {0} to {3}, establishing a positive kinematics equation by using a DH method according to the four-foot robot parameters explained in the step (1) of claim 1, and keeping the structural parameters of the four legs consistent, wherein the positive kinematics equation is as follows:

wherein the transformation matrix for rotation about the z-axis is:

translating the transformation matrix in the y-direction as:

translating the transformation matrix in the z direction is:

in formula (5) c _ij ＝cos(θ _i +θ _j )，c _i ＝cos(θ _i )，s _i ＝sin(θ _i )。

The forward kinematic equation obtained from equation (5) is:

3. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning according to claim 2, characterized in that:

the specific steps of the step (2) are as follows,

horizontal x-direction constraint:

constraint in vertical y-direction:

4. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 1, wherein:

in the step (3), the specific steps of extracting the motion characteristic parameters are as follows,

virtual model controller as basic controller

Calculating a jacobian matrix:

calculating virtual force:

virtual model control supposes that a spring damping part exists and the current position (x, y, z) and speed of a control object are input

calculating joint moment

5. the parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning as claimed in claim 4, wherein:

the specific steps of the step (4) are as follows,

designing a state space:

the reinforcement learning is based on Markov decision process, and a series of state vectors(s) are obtained in the interaction process of an intelligent agent and the environment ₁ ，s ₂ ，...s _n And selecting position coordinates of the tail ends of the four legs and the feet relative to hip joints, wherein the overturning angles of the robot body are respectively side overturning, pitching and transverse overturning, the sum of errors of the tail ends of the legs and the tail ends of the feet and a reference expected track is calculated, and the last action output vector k totally accounts for 48 states:

designing an action space:

Designing a reward function:

the main purpose of the quadruped robot is to move forwards as stably as possible, maintain a certain body height and actively recover to a preset track v under the interference of external force _x Is the forward velocity of the robot, y is the displacement in the lateral direction of the robot, u is the output of the robot motion, θ _i Is the turning angle, T, of the robot body in 3 dimensions _f For simulating the maximum duration of the training process, T _s Is the minimum step size unit, T, in the simulation process _s /T _f In order to make the robot run more steps as much as possible during the training process to obtain more rewards, the reward function is set as follows:

setting of termination function

S _isD ＝(E _sum ≥0.57)|(α、β、γ≥0.36)|(|y|≥0.4)|(z≤0.20) (15)

the round is terminated when any condition of the equation (19) is satisfied.

Design of reinforcement learning algorithm

The state space and the action space of the multi-legged robot are both multidimensional continuous space vectors, the method optimizes parameters of a virtual model controller of the four-legged robot by selecting a depth deterministic strategy gradient (DDPG) method, the method comprises two neural networks which are an actor network and a critic network respectively, the actor network operator inputs the state s of the current environment and outputs the action a of an intelligent body, the critic network critic evaluates the output of the actor network according to the action of the actor network and the reaction of the current environment, the corresponding actor network improves the strategy output according to the evaluation of the critic network, the output of the actor network is more and more in line with the current environment state along with the increase of the turn times, the evaluation level of the critic network on the actor network is more and more accurate, and finally the optimal strategy is achieved.

6. The parameter control method of the virtual model controller of the quadruped robot based on reinforcement learning according to claim 5, characterized in that:

the specific steps of step (5) are as follows.

Four-legged robot body modeling

② building matlab and simulink combined simulation model