CN114609918A - Four-footed robot motion control method, system, storage medium and equipment - Google Patents
- Publication number: CN114609918A (application CN202210512279.6A)
- Authority: CN (China)
- Legal status: Granted
Classifications
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/04—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
- G05B13/042—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance
Abstract
The invention relates to the technical field of adaptive control and provides a method, system, storage medium and device for controlling the motion of a quadruped robot. The method comprises the following steps: acquiring the state of the quadruped robot as it walks in the environment, and selecting an action from the state through a policy network; acquiring the foot-end positions of the quadruped robot as it walks, so as to calculate a reference action; and combining the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, then sending the action command to the robot to realize its motion. This achieves more stable and robust motion planning and control of the quadruped robot.
Description
Technical Field
The invention belongs to the technical field of adaptive control, and particularly relates to a method, system, storage medium and device for controlling the motion of a quadruped robot.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Traditional quadruped-robot control methods usually require accurate dynamic and kinematic modeling of the robot in advance: the angle and torque to be executed by each joint actuator are solved inversely from a desired trajectory and the foot-end feedback forces. This process demands a large amount of expert knowledge and a long manual design cycle, and designing a robust controller for agile quadruped motion is difficult. In addition, delay and noise interference exist in reality, so model analysis of the quadruped robot is often not accurate enough, further increasing the difficulty of modeling and system control. How to give the robot the ability to learn motion autonomously, and thereby achieve adaptive control of quadruped locomotion, is one of the difficulties that urgently needs to be solved.
Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and in recent years it has achieved breakthrough performance in many fields. The motion planning and control task of a robot can be described as a perception-and-decision problem, so deep reinforcement learning is a very promising technique for robot motion control: it requires little human intervention from researchers, and through training the quadruped robot autonomously learns a control strategy that generates robust, low-energy, highly dynamic motion. However, the design process of traditional deep-reinforcement-learning controllers for quadruped robots is complicated, and the stability, efficiency and generality of the resulting motion control are poor.
Disclosure of Invention
In order to solve the technical problems in the background art, the invention provides a method, system, storage medium and device for controlling the motion of a quadruped robot. A gait reference frame is added to guide the quadruped robot to generate the desired gait motion, and the trained robot achieves more stable and robust motion planning and control.
In order to achieve the purpose, the invention adopts the following technical scheme:
A first aspect of the present invention provides a quadruped robot motion control method, including:
acquiring the state of the quadruped robot as it walks in the environment, and selecting an action from the state through a policy network;
acquiring the foot-end positions of the quadruped robot as it walks in the environment, so as to calculate a reference action;
combining the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and sending the action command to the quadruped robot to realize its motion;
the action performed by the quadruped robot is represented as:

a = w_ref · a_ref + w_rl · a_rl

wherein a_ref and a_rl denote the reference action and the action output by the policy network, respectively, and w_ref and w_rl denote the weight coefficient of the reference action and the weight coefficient of the action output by the policy network.
Further, the state includes the pitch angle, roll angle, pitch rate, roll rate, and joint positions of the quadruped robot.
Further, the reference action is calculated by:
determining gait parameters of the quadruped robot;
calculating the desired foot-end trajectory based on the foot-end position of the quadruped robot when walking in the environment, in combination with the gait parameters;
passing the desired foot-end trajectory through inverse kinematics to obtain the reference action.
Further, the desired foot end trajectory is:
wherein x(t), y(t) and z(t) denote the desired foot-end position of the quadruped robot in the body coordinate system at time t; x_0, y_0 and z_0 denote the foot-end position in the body coordinate system in the robot's initial state; S denotes the step length of the leg swing; H denotes the step height of the leg swing; and T denotes the period of a single step.
Further, after the quadruped robot executes the action, the state transfers and a reward is obtained; the action, the reward, and the states before and after the transfer are combined into a transfer tuple and stored in the experience replay pool.
Further, the reward is calculated with a reward function;
the reward function comprises a forward-velocity reward term, a yaw-velocity penalty term, an energy-consumption penalty term, a center-of-mass-trajectory floating penalty term, and an attitude-angle-change penalty term.
Further, the training steps of the policy network are as follows:
randomly sampling a plurality of transfer tuples from the experience replay pool, calculating the perturbed action, and updating the parameters of the value networks;
judging whether the update interval of the policy network has been reached; if not, continuing to update the parameters of the value networks; otherwise, based on the updated value networks, updating the parameters of the policy network with the deterministic policy gradient method.
A second aspect of the present invention provides a quadruped robot motion control system comprising:
an action selection module configured to: acquire the state of the quadruped robot as it walks in the environment, and select an action from the state through a policy network;
a reference action calculation module configured to: acquire the foot-end positions of the quadruped robot as it walks in the environment, so as to calculate a reference action;
a control module configured to: combine the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and send the action command to the quadruped robot to realize its motion;
the action performed by the quadruped robot is represented as:

a = w_ref · a_ref + w_rl · a_rl

wherein a_ref and a_rl denote the reference action and the action output by the policy network, respectively, and w_ref and w_rl denote the weight coefficient of the reference action and the weight coefficient of the action output by the policy network.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a quadruped robot motion control method as described above.
A fourth aspect of the present invention provides a computer apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in a method for controlling the motion of a quadruped robot as described above when executing the program.
Compared with the prior art, the invention has the beneficial effects that:
the invention provides a motion control method of a quadruped robot, which is characterized in that a gait reference frame is added, the gait guide frame outputs a reference action according to an expected gait, the reference action instruction is combined with a learned action instruction and then transmitted to a joint driver of the quadruped robot to be executed, the quadruped robot is guided to generate expected gait motion, the trained robot can realize more stable and robust motion planning and control, and the autonomy and intelligence level of the quadruped robot motion planning and control are improved.
The invention provides a quadruped-robot motion control method whose state covers only information directly related to the robot's generated motion, which avoids the curse of dimensionality, reduces the computational load, and improves the learning efficiency of the quadruped-robot control strategy.
The invention provides a quadruped-robot motion control method whose reward function comprises a forward-velocity reward term, a yaw-velocity penalty term, an energy-consumption penalty term, a center-of-mass-trajectory floating penalty term, and an attitude-angle-change penalty term, encouraging the quadruped robot to learn fast, stable forward motion. The method also has strong generality: by adjusting the weight of each reward term, different task objectives can be encouraged so as to achieve the desired control effect of the quadruped robot.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain the invention, not to limit it.
Fig. 1 is a flowchart of a method for controlling the motion of a quadruped robot according to a first embodiment of the present invention;
FIG. 2 is a leg phase diagram of the diagonal trot (Trot) gait according to the first embodiment of the invention;
FIG. 3 is a leg phase diagram of the bounding (Bound) gait according to the first embodiment of the invention;
fig. 4 is a diagram of reward values obtained by the quadruped robot in accordance with the first embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
The embodiment provides a method for controlling the motion of a quadruped robot, as shown in fig. 1, which specifically includes the following steps:
Step 1: obtain the state s of the quadruped robot as it walks in the environment, and select an action a from the state through the policy network; the selected action a is the action output by the policy network.
Step 2: obtain the foot-end positions of the quadruped robot as it walks in the environment, and calculate a reference action from them.
Step 3: combine the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and send the action command to each joint of the quadruped robot to realize its motion. After the quadruped robot executes the action, the state transfers, i.e. the state s transfers to the next state s', and a reward r is obtained and used to train the policy network.
In step 1, a Markov decision process (MDP) is used to model the quadruped-robot motion-control problem. Reinforcement-learning methods handle discrete or continuous decision problems, which are typically modeled as MDPs to be solved. Robot motion control is a continuous decision problem, so the invention models the quadruped-robot motion-control problem as an MDP represented by the quadruple (S, A, P, R), wherein S denotes the state space (also called the observation space), composed of basic information about the quadruped robot; A denotes the action space, composed of the commands executed by the quadruped robot's joints; P denotes the state-transition probability, determined by the interaction between the robot and the environment; and R denotes the reward function, designed by the user according to the robot's learning objective and serving as the evaluation basis of the learning effect. The quadruped robot learning the control strategy is regarded as the agent in the reinforcement-learning problem, and the control strategy π can be viewed as a mapping from a state s to an action a, represented by a neural network with the state as input and the action as output, where the state s belongs to the state space S and the action a belongs to the action space A. The goal of reinforcement learning is to find the optimal control strategy that maximizes the expected return:
J = E[ Σ_t γ^t · r_t ]          (1)

wherein γ denotes the discount factor and r_t denotes the reward value fed back by the reward mechanism at time t, computed by the reward function R. In a certain state s, the quadruped robot finds the optimal action according to the policy function; after executing it, the state transfers to the state of the next moment s', and the robot earns the reward value. Since the reward function is set according to the desired task objective, the higher the obtained reward, the closer the quadruped robot is to the desired control effect, exhibiting better performance.
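As an illustrative sketch (not part of the patent), the discounted expected-return objective above can be computed for one finished episode as follows; the function name and discount value are assumptions:

```python
def discounted_return(rewards, gamma=0.99):
    """Return sum_t gamma^t * r_t for one episode, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For example, rewards [1.0, 1.0] with gamma = 0.5 give a return of 1.5.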
In step 1, to avoid the computational burden caused by an overly high-dimensional state space, the state comprises the pitch angle, roll angle, pitch rate, roll rate, and joint positions of the quadruped robot, i.e. the state comprises the attitude angles, their angular velocities, and q_1, …, q_n, wherein q_i denotes the position of the i-th joint, i = 1, 2, …, n, and n denotes the number of joints of the quadruped robot, which may be 12. The values of the state are obtained by reading the robot model's information in the simulation environment. Although the simulation platform and the robot's sensors can acquire position, torque, attitude, velocity, angular velocity and other information, and even images and videos containing a large amount of environmental information, observation of too high a dimensionality increases the computational load, slows algorithm convergence, and lowers the efficiency with which the quadruped robot learns motion-planning and control strategies. The state of the invention therefore contains only the quantities listed above, which are directly related to the robot's generated motion; this avoids the curse of dimensionality, reduces the computational load, and improves the learning efficiency of the quadruped-robot control strategy.
In step 1, the control strategy of the quadruped robot is trained with a deep-reinforcement-learning algorithm; that is, the control strategy must output correct action commands, which the robot executes to generate the desired motion. For the quadruped control task, the most intuitive action space is the torque or position of each joint actuator. A joint motor executing a torque command is easily disturbed by uncertain external forces, which performs poorly in learning-based quadruped motion control. The invention therefore selects the joint position commands of the quadruped robot as the actions executed by the agent. Thus the action comprises the rotation angle of each joint, i.e. the action comprises a_1, a_2, …, a_n, wherein a_i denotes the action executed by the i-th joint; since the joints of the quadruped robot are all rotational, each command is the rotation angle of the corresponding joint actuator. To keep the policy network's action values reasonable, the actions output by the neural network are truncated to a reasonable range, which ensures the robot's overall stability in the early stage of training, reduces how often the robot falls during training, and improves the efficiency of strategy learning to a certain extent.
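The truncation of network outputs to a reasonable range described above can be sketched as follows (the per-joint limit values in the usage example are illustrative assumptions, not values from the patent):

```python
def clip_action(raw_action, low, high):
    """Truncate each joint-angle command to its feasible range element-wise."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(raw_action, low, high)]
```

With limits of ±1 rad on every joint, an output of 2.0 rad would be truncated to 1.0 rad before being sent to the joint actuator.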
In step 1, the state — the pitch angle, roll angle, pitch rate, roll rate, and joint positions of the quadruped robot — serves as the input of the policy network. To avoid the computational burden of an overly high-dimensional state space, part of the collected information is not added to the state space but is instead used as an evaluation basis in the reward function, producing a more complete reward mechanism. A set of general reward mechanisms for the quadruped robot is designed for the desired motion-control effect. The main components of the reward mechanism are the robot's forward velocity, yaw velocity, energy consumption, center-of-mass floating height, and attitude-angle change; the mechanism encourages the robot to learn fast, stable forward motion. The reward function comprises a forward-velocity reward term, a yaw-velocity penalty term, an energy-consumption penalty term, a center-of-mass-trajectory floating penalty term, and an attitude-angle-change penalty term, and is expressed as:
r = r_v − c_1·r_ω − c_2·r_e − c_3·r_h − c_4·r_a − r_fall          (2)

wherein r_v, r_ω, r_e, r_h and r_a denote the forward-velocity reward term, yaw-velocity penalty term, energy-consumption penalty term, center-of-mass-trajectory floating penalty term and attitude-angle-change penalty term respectively, and c_1 through c_4 are their weight coefficients, adjusted according to the task objective to be achieved. The forward-velocity term is computed from the displacement of the center of mass along the x axis of the world coordinate system divided by the time step Δt, where the center-of-mass positions on the x and y axes are taken at the current moment (before the state transfer) and the next moment (after the transfer). The energy term sums |τ_j · ω_j| over the joint motors, where τ_j and ω_j denote the torque and angular velocity of the j-th joint motor and N denotes the total number of joint motors. The floating term penalizes the change between the center-of-mass heights at the current and next moments, and the attitude term penalizes the changes in pitch, roll and yaw between the current and next moments, weighted by their respective reward weight coefficients. The fall penalty r_fall takes a constant value. A reward mechanism designed in this way is highly general: by adjusting the weight of each reward term, different task objectives can be encouraged so as to achieve the desired control effect of the quadruped robot.
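A minimal Python sketch of such a weighted reward, assuming the term definitions described above (all argument names and default weights are illustrative; the patent does not give concrete coefficient values):

```python
def step_reward(x, x_next, y, y_next, dt, torques, joint_rates,
                h, h_next, d_pitch, d_roll, d_yaw,
                c_yaw=1.0, c_energy=0.01, c_height=1.0,
                k_pitch=1.0, k_roll=1.0, k_yaw=1.0, fall_penalty=0.0):
    """Weighted sum of the five reward/penalty terms named in the text."""
    r_forward = (x_next - x) / dt                       # forward-velocity reward
    r_drift = abs(y_next - y) / dt                      # lateral/yaw drift penalty
    r_energy = sum(abs(t * w) for t, w in zip(torques, joint_rates)) * dt
    r_height = abs(h_next - h)                          # CoM floating penalty
    r_attitude = (k_pitch * abs(d_pitch) + k_roll * abs(d_roll)
                  + k_yaw * abs(d_yaw))                 # attitude-change penalty
    return (r_forward - c_yaw * r_drift - c_energy * r_energy
            - c_height * r_height - r_attitude - fall_penalty)
```

Raising one coefficient (say c_energy) shifts the encouraged behavior toward that objective, matching the tunable reward mechanism described above.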
The policy network in step 1 is trained with the Twin Delayed Deep Deterministic policy gradient algorithm (TD3), an improved version of the Deep Deterministic Policy Gradient (DDPG) algorithm. The algorithm introduces the Actor-Critic (AC) framework of traditional reinforcement learning into the deep policy-gradient method, uses deep neural networks to represent the action-value function and the deterministic policy, adopts a double-network architecture for both the policy function and the value function, and introduces an experience-replay mechanism to reduce the error caused by sample correlation. This makes the training process easier to converge, improves efficiency on large-scale continuous-action-space tasks, and yields a more stable and efficient policy. In addition, the TD3 algorithm addresses the overestimation caused by variance by introducing clipped double Q-learning into the AC framework, and addresses error accumulation by delaying policy-network updates and adding noise.
The method for training the strategy network by adopting the TD3 algorithm comprises the following specific steps:
(1) Initialization, performed only at the start of training: initialize the value networks, the policy network, the target networks, and the experience replay pool. The value networks Q_1, Q_2 and the policy network π are initialized with random parameters; the target networks (the target value networks and the target policy network) are initialized by synchronizing the parameters of the value networks and the policy network into them; and the experience replay pool is initialized. The value networks fit the value function, evaluate the policy network, and provide gradient information for policy-network updates. The target networks are used to compute target values and update the value networks.
(2) Based on the state s and exploration noise, the policy network selects an action a:
a = π(s) + N

wherein N denotes the exploration noise, generated by an Ornstein-Uhlenbeck (OU) process with a given noise variance.
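A minimal sketch of OU exploration noise as described above; the theta/sigma values are illustrative defaults, not parameters from the patent:

```python
import math
import random

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated, mean-reverting noise
    added to the deterministic action for exploration."""
    def __init__(self, dim, theta=0.15, sigma=0.2, dt=0.01, seed=0):
        self.theta, self.sigma, self.dt = theta, sigma, dt
        self.state = [0.0] * dim
        self.rng = random.Random(seed)

    def sample(self):
        # dx = -theta * x * dt + sigma * sqrt(dt) * N(0, 1), reverting toward 0
        self.state = [x - self.theta * x * self.dt
                      + self.sigma * math.sqrt(self.dt) * self.rng.gauss(0.0, 1.0)
                      for x in self.state]
        return list(self.state)
```

One noise vector per control step (here 12-dimensional, one entry per joint) would be added to the policy output before clipping.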
In step 3, after the quadruped robot performs the action, the state transfers to the next state s' and a reward r is received, computed by the reward function of equation (2); the action, the reward, and the states before and after the transfer are combined into a transfer tuple (s, a, r, s') and stored in the experience replay pool.
During policy-network training, a small batch of transfer tuples is randomly sampled from the experience replay pool, and the target policy network computes the perturbed action:
ã = π'(s') + ε,  ε = clip(N(0, σ), −c, c)

wherein clip(·) denotes the clipping function limiting the noise to the interval [−c, c], with c a constant. The target value networks then calculate the update target:

y = r + γ · min_{i=1,2} Q'_i(s', ã)
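The clipped double-Q target computation can be sketched as follows. This is the standard TD3 form; the network callables are stand-ins for the target policy and the two target critics, not the patent's implementation:

```python
import random

def clip(x, c):
    """Limit a scalar to the interval [-c, c]."""
    return max(-c, min(c, x))

def td3_target(r, s_next, target_policy, target_q1, target_q2,
               gamma=0.99, sigma=0.2, c=0.5, rng=None):
    """Perturb the target action with clipped Gaussian noise, then take the
    minimum of the two target critics to curb overestimation."""
    rng = rng or random.Random(0)
    a_tilde = target_policy(s_next) + clip(rng.gauss(0.0, sigma), c)
    return r + gamma * min(target_q1(s_next, a_tilde),
                           target_q2(s_next, a_tilde))
```

Taking the minimum of the two critics is what counteracts the value overestimation mentioned above.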
Then the parameters of the i-th (i = 1, 2) value network are updated by minimizing the mean-squared error between Q_i(s, a) and the target y over the sampled batch.
(3) Judge whether the policy-network update interval has been reached; if not, return to step (2) and continue updating the parameters of the value networks. Otherwise, once the update interval is reached, update the parameters of the policy network with the deterministic policy gradient method, based on the updated value networks:
The parameters of the target networks are then updated with the soft-update method:

θ' ← τ·θ + (1 − τ)·θ'
wherein τ is a hyper-parameter much less than 1. With soft updating, the target-network parameters gradually approach those of the value networks and the policy network, so that the parameters are updated in time, the gradients of the value and policy networks remain stable, and the algorithm converges more easily.
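The soft update above can be sketched per parameter (a functional sketch over flat parameter lists; real implementations update tensors in place):

```python
def soft_update(target_params, online_params, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', element by element."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]
```

With tau = 0.005, each call moves the target parameters only 0.5% of the way toward the online networks, which is what keeps the target values slowly varying.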
During training with the TD3 algorithm, the policy network is continuously updated in a favorable direction according to the value function, and its output actions become more reasonable.
Learning a strategy in a reinforcement-learning problem is a process of evolution through trial and error. A quadruped robot inevitably executes wrong actions during learning, producing unstable and even dangerous motions; on a physical robot this often causes great damage, and the robot may remain in a wrong strategy for a long time, executing unreasonable actions, so that learning even in a simulation environment is inefficient. To address this, a gait reference frame is designed in step 2. Inspired by quadruped controllers based on central pattern generators, it makes the robot's legs support and swing regularly by setting the phases of the four legs and a phase-based trajectory, and the overall motion of the quadruped robot is finally generated through the coordinated rotation of all leg joints. In the gait reference frame, the swing, support, and foot-end trajectories of the four legs are planned, and parameters such as leg-lift height and step length are set according to the gait characteristics. The foot-end trajectory is designed to avoid sliding and foot-dragging when the foot end contacts the ground, and it adopts an improved composite-cycloid method.
(1) Gait planning: determine the gait parameters of the quadruped robot, including step length, step height and phase (swing phase and support phase). The gait parameters are determined according to the gait characteristics to be realized; this embodiment uses the gait reference frame to realize two dynamic gaits with a duty factor of 0.5, the diagonal trot (Trot) gait and the bounding (Bound) gait. When the quadruped robot moves with the Trot gait, the diagonal legs swing and support synchronously: the right front leg and left hind leg move together, and the left front leg and right hind leg move together; the leg phases of the Trot gait are shown in FIG. 2. When the robot moves with the Bound gait, the two front legs move synchronously and the two hind legs move synchronously; the leg phases of the Bound gait are shown in FIG. 3.
(2) Based on the foot-end position of the quadruped robot when walking in the environment, and in combination with the gait parameters, the desired foot-end trajectory is calculated as follows:
wherein x(t), y(t) and z(t) denote the desired foot-end position of the quadruped robot in the body coordinate system at time t; x_0, y_0 and z_0 denote the foot-end position in the body coordinate system in the robot's initial state; S denotes the step length of the leg swing; H denotes the step height of the leg swing; T denotes the duration of a leg swing or support, i.e. the period of a single step; and t denotes the current time, with 0 ≤ t ≤ T. When the robot moves forward, the foot-end trajectory of the supporting leg moves backward relative to the position of the center of mass.
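The patent's exact composite-cycloid expression is given as a formula image and is not reproduced here; the following sketch shows one common composite-cycloid swing trajectory consistent with the symbols defined above, as an assumed form for illustration only:

```python
import math

def swing_foot_xz(t, T, S, H, x0, z0):
    """Swing-phase foot-end position in the x-z plane of the body frame:
    x advances smoothly by step length S while z rises to step height H and
    returns, with zero foot velocity at lift-off and touch-down (which is
    what suppresses sliding/dragging at ground contact)."""
    phase = t / T
    x = x0 + S * (phase - math.sin(2.0 * math.pi * phase) / (2.0 * math.pi))
    z = z0 + H * (1.0 - math.cos(2.0 * math.pi * phase)) / 2.0
    return x, z
```

At t = 0 the foot is at (x_0, z_0), at t = T/2 it reaches the peak height z_0 + H, and at t = T it lands at (x_0 + S, z_0).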
(3) The desired foot-end trajectory is passed through inverse kinematics to obtain the basic position of each joint of the robot as the reference action a_ref.
In step 3, the action command output by the policy network is combined with the reference action command and then transmitted to the quadruped robot; the robot's 12 joint drivers execute the action command to generate the desired gait and stable motion.
Specifically, on the basis of the reference action a_ref, the action a_rl output by the policy network (the learned action) is added; the action finally executed by the quadruped robot is the command for each joint, represented as:
a = w_ref · a_ref + w_rl · a_rl

wherein a denotes the action the quadruped robot actually performs; a_ref and a_rl denote the reference action obtained from the gait reference frame and the action output by the policy network, respectively; a_ref is the joint rotation angle calculated by inverse kinematics from the desired foot-end trajectory, and a_rl is the joint rotation angle output by the policy network trained with the deep-reinforcement-learning algorithm; w_ref and w_rl denote the weight coefficients of the reference action and the learned action. By adjusting these coefficients, the importance of the reference action's guidance during training of the quadruped robot can be tuned.
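The per-joint blend of reference and learned commands can be sketched as follows; the example weights 0.7/0.3 are illustrative, not values from the patent:

```python
def combined_action(a_ref, a_rl, w_ref=0.7, w_rl=0.3):
    """Weighted per-joint blend of the gait-reference command and the
    policy-network command; a larger w_ref gives the reference gait more
    authority early in training."""
    return [w_ref * ref + w_rl * rl for ref, rl in zip(a_ref, a_rl)]
```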
The invention models the motion planning and control problem of the quadruped robot as a Markov decision process and designs a general reward mechanism for quadruped robot motion control. A gait reference frame is added to guide the phase relationship between the swing legs and the supporting legs of the quadruped robot so as to generate multiple gaits, and the policy is trained in a simulation environment with the twin delayed deep deterministic policy gradient (TD3) deep reinforcement learning algorithm, yielding a motion control strategy that lets the robot walk stably with the desired gait. Fig. 4 compares the reward obtained by a quadruped robot using the motion control method of this embodiment with that obtained with the Proximal Policy Optimization (PPO) deep reinforcement learning algorithm.
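The reward mechanism mentioned above combines a forward-speed reward with penalty terms for yaw rate, energy consumption, centroid-trajectory floating, and attitude-angle change. A minimal sketch of a reward of that shape; the weights and the function signature are assumptions for illustration, not values from the patent:

```python
def locomotion_reward(v_x, yaw_rate, energy, z_float, d_attitude,
                      w=(1.0, 0.5, 0.01, 0.5, 0.5)):
    """Reward = forward-speed term minus penalties for yaw rate, energy
    consumption, centroid-height floating, and attitude-angle change.
    Weight tuple w is illustrative; the patent only names the terms."""
    w_v, w_yaw, w_e, w_z, w_att = w
    return (w_v * v_x
            - w_yaw * abs(yaw_rate)
            - w_e * energy
            - w_z * abs(z_float)
            - w_att * abs(d_attitude))
```

Walking straight at the desired speed with little wasted energy and a steady body maximizes this reward, which is what drives the learned gait toward stability.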
Example two
The embodiment provides a quadruped robot motion control system, which specifically comprises the following modules:
an action selection module configured to: acquiring the state of the quadruped robot when the quadruped robot walks in the environment, and selecting an action according to the state through a policy network;
a reference action calculation module configured to: acquiring the foot end position of the quadruped robot when the quadruped robot walks in the environment so as to calculate and obtain a reference action;
a control module configured to: combining the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and sending an action instruction to the quadruped robot to realize the motion of the quadruped robot. After the quadruped robot executes the action, its state transitions and a reward is obtained; the action, the reward, and the states before and after the transition are combined into a transition tuple and stored in an experience replay pool.
A policy network training module configured to:
randomly sampling a plurality of transition tuples from the experience replay pool, calculating the perturbed action, and updating the parameters of the value network;
judging whether the policy network update interval has been reached; if not, continuing to update the parameters of the value network; otherwise, updating the parameters of the policy network with the deterministic policy gradient method based on the updated value network.
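The training schedule above (sample transition tuples, perturb the target action, update the value network every step, update the policy network only at a fixed interval) is the TD3 update pattern. A skeleton of that control flow; the `update_critic`/`update_actor`/`target_policy` callables are placeholders standing in for the real network updates:

```python
import random
from collections import deque

class ReplayPool:
    """Experience replay pool of (state, action, reward, next_state) tuples."""
    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)
    def store(self, s, a, r, s_next):
        self.buf.append((s, a, r, s_next))
    def sample(self, n):
        return random.sample(self.buf, min(n, len(self.buf)))

def perturbed_target_action(target_policy, s_next, noise_std=0.2,
                            clip=0.5, a_max=1.0):
    """TD3 target-policy smoothing: add clipped Gaussian noise to the
    target policy's action, then clip to the valid action range."""
    a = target_policy(s_next)
    noise = max(-clip, min(clip, random.gauss(0.0, noise_std)))
    return max(-a_max, min(a_max, a + noise))

def td3_training_loop(pool, update_critic, update_actor, target_policy,
                      steps=1000, batch=64, policy_delay=2):
    """Value (critic) networks are updated every step from sampled
    transition tuples with perturbed target actions; the policy (actor)
    network is updated only every `policy_delay` steps (delayed update)."""
    for step in range(1, steps + 1):
        batch_tuples = pool.sample(batch)
        targets = [perturbed_target_action(target_policy, s_next)
                   for (_, _, _, s_next) in batch_tuples]
        update_critic(batch_tuples, targets)
        if step % policy_delay == 0:  # delayed policy update
            update_actor(batch_tuples)
```

With `policy_delay=2`, the critic receives twice as many updates as the actor, which is the stabilizing trick the twin delayed deep deterministic policy gradient algorithm is named for.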
It should be noted that the modules in this embodiment correspond one-to-one to the steps in the first embodiment, and the specific implementation process is the same, so it is not repeated here.
Example three
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in a method for controlling the motion of a quadruped robot as described in the first embodiment.
Example four
The present embodiment provides a computer device, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps of a method for controlling the motion of a quadruped robot as described in the first embodiment.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer readable storage medium and executed by a computer to implement the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A method for controlling the motion of a quadruped robot, comprising:
acquiring the state of the quadruped robot when the quadruped robot walks in the environment, and selecting an action according to the state through a policy network;
acquiring the position of a foot end of the quadruped robot when the quadruped robot walks in the environment so as to calculate and obtain a reference action;
combining the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and sending an action instruction to the quadruped robot to realize the motion of the quadruped robot;
the action performed by the quadruped robot is represented as:
a = w1·a_ref + w2·a_pi
where a_ref and a_pi respectively denote the reference action and the action output by the policy network, and w1 and w2 respectively denote the weight coefficients of the reference action and of the action output by the policy network.
2. The method of claim 1, wherein the state comprises the pitch angle, roll angle, pitch angular velocity, roll angular velocity, and position of each joint of the quadruped robot.
3. The method for controlling the motion of a quadruped robot according to claim 1, wherein the reference action is calculated by:
determining gait parameters of the quadruped robot;
calculating an expected foot end track based on the foot end position of the quadruped robot when walking in the environment and in combination with gait parameters;
converting the expected foot end trajectory through inverse kinematics to obtain the reference action.
4. A quadruped robot motion control method according to claim 3, wherein the desired foot end trajectory is:
where x(t), y(t) and z(t) denote the desired foot-end position of the quadruped robot in the body coordinate system at time t; x0, y0 and z0 denote the foot-end position in the body coordinate system in the initial state of the quadruped robot; s denotes the step length of the leg swing; h denotes the step height of the leg swing; and T denotes the period of a single step.
5. The method of claim 1, wherein the state transition occurs after the quadruped robot performs the action, and the reward is obtained, and the action, the reward and the state before and after the transition are combined into a transition tuple to be stored in the experience replay pool.
6. The method of claim 5, wherein the reward is calculated using a reward function;
the reward function comprises a forward speed reward item, a deflection speed penalty item, an energy consumption penalty item, a centroid trajectory floating penalty item and a posture angle change penalty item.
7. The method for controlling the motion of a quadruped robot according to claim 5, wherein the step of training the policy network comprises:
randomly sampling a plurality of transition tuples from the experience replay pool, calculating the perturbed action, and updating the parameters of the value network;
judging whether the policy network update interval has been reached; if not, continuing to update the parameters of the value network; otherwise, updating the parameters of the policy network with the deterministic policy gradient method based on the updated value network.
8. A quadruped robotic motion control system, comprising:
an action selection module configured to: acquiring the state of the quadruped robot when the quadruped robot walks in the environment, and selecting an action according to the state through a policy network;
a reference action calculation module configured to: acquiring the foot end position of the quadruped robot when the quadruped robot walks in the environment so as to calculate and obtain a reference action;
a control module configured to: combining the reference action with the action output by the policy network to obtain the action executed by the quadruped robot, and sending an action instruction to the quadruped robot to realize the motion of the quadruped robot;
the action performed by the quadruped robot is represented as:
9. A computer-readable storage medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the steps in a quadruped robot motion control method according to any one of claims 1-7.
10. A computer apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps in a quadruped robotic motion control method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210512279.6A CN114609918B (en) | 2022-05-12 | 2022-05-12 | Four-footed robot motion control method, system, storage medium and equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114609918A true CN114609918A (en) | 2022-06-10 |
CN114609918B CN114609918B (en) | 2022-08-02 |
Family
ID=81870549
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210512279.6A Active CN114609918B (en) | 2022-05-12 | 2022-05-12 | Four-footed robot motion control method, system, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114609918B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114859737A (en) * | 2022-07-08 | 2022-08-05 | 中国科学院自动化研究所 | Method, device, equipment and medium for transferring gait of quadruped robot |
CN115128960A (en) * | 2022-08-30 | 2022-09-30 | 齐鲁工业大学 | Method and system for controlling motion of biped robot based on deep reinforcement learning |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106094813A (en) * | 2016-05-26 | 2016-11-09 | 华南理工大学 | It is correlated with based on model humanoid robot gait's control method of intensified learning |
CN107562052A (en) * | 2017-08-30 | 2018-01-09 | 唐开强 | A kind of Hexapod Robot gait planning method based on deeply study |
CN109093626A (en) * | 2018-09-28 | 2018-12-28 | 中科新松有限公司 | The fuselage attitude control method and device of quadruped robot |
CN109291052A (en) * | 2018-10-26 | 2019-02-01 | 山东师范大学 | A kind of massaging manipulator training method based on deeply study |
CN109605377A (en) * | 2019-01-21 | 2019-04-12 | 厦门大学 | A kind of joint of robot motion control method and system based on intensified learning |
CN110496377A (en) * | 2019-08-19 | 2019-11-26 | 华南理工大学 | A kind of virtual table tennis forehand hit training method based on intensified learning |
CN110764416A (en) * | 2019-11-11 | 2020-02-07 | 河海大学 | Humanoid robot gait optimization control method based on deep Q network |
US20200143206A1 (en) * | 2018-11-05 | 2020-05-07 | Royal Bank Of Canada | System and method for deep reinforcement learning |
CN111552301A (en) * | 2020-06-21 | 2020-08-18 | 南开大学 | Hierarchical control method for salamander robot path tracking based on reinforcement learning |
CN111625002A (en) * | 2019-12-24 | 2020-09-04 | 杭州电子科技大学 | Stair-climbing gait planning and control method of humanoid robot |
CN112476424A (en) * | 2020-11-13 | 2021-03-12 | 腾讯科技(深圳)有限公司 | Robot control method, device, equipment and computer storage medium |
CN112596534A (en) * | 2020-12-04 | 2021-04-02 | 杭州未名信科科技有限公司 | Gait training method and device for quadruped robot based on deep reinforcement learning, electronic equipment and medium |
WO2021152047A1 (en) * | 2020-01-28 | 2021-08-05 | Five AI Limited | Planning in mobile robots |
CN113400307A (en) * | 2021-06-16 | 2021-09-17 | 清华大学 | Control method of space robot mechanical arm |
Non-Patent Citations (1)
Title |
---|
Li Yibin et al.: "Structure design and gait planning of a hydraulically actuated quadruped bionic robot", Journal of Shandong University (Engineering Science) *
Also Published As
Publication number | Publication date |
---|---|
CN114609918B (en) | 2022-08-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114609918B (en) | Four-footed robot motion control method, system, storage medium and equipment | |
US9950426B2 (en) | Predictive robotic controller apparatus and methods | |
US7685081B2 (en) | Bipedal walking simulation | |
Da Silva et al. | Linear Bellman combination for control of character animation | |
Abreu et al. | Learning low level skills from scratch for humanoid robot soccer using deep reinforcement learning | |
CN113478486B (en) | Robot motion parameter self-adaptive control method and system based on deep reinforcement learning | |
Daniel et al. | Learning concurrent motor skills in versatile solution spaces | |
Gehring et al. | Towards automatic discovery of agile gaits for quadrupedal robots | |
Peters et al. | Robot learning | |
Kashyap et al. | Optimization of stability of humanoid robot NAO using ant colony optimization tuned MPC controller for uneven path | |
CN111546349A (en) | New deep reinforcement learning method for humanoid robot gait planning | |
Xin et al. | Online dynamic motion planning and control for wheeled biped robots | |
CN111730595A (en) | Gait stability control method of biped robot under slope condition | |
Wu et al. | Learning robust and agile legged locomotion using adversarial motion priors | |
Yu et al. | Dynamic bipedal maneuvers through sim-to-real reinforcement learning | |
CN117215204B (en) | Robot gait training method and system based on reinforcement learning | |
Yu et al. | Dynamic bipedal turning through sim-to-real reinforcement learning | |
Van de Panne | Control techniques for physically-based animation. | |
MacAlpine et al. | Using dynamic rewards to learn a fully holonomic bipedal walk | |
CN113568422A (en) | Quadruped robot control method based on model prediction control optimization reinforcement learning | |
Liu et al. | Learning control of quadruped robot galloping | |
Dau et al. | Optimal trajectory generation for bipedal robots | |
Gao et al. | A survey of research on several problems in the RoboCup3D simulation environment | |
Liu et al. | A motion planning and control method of quadruped robot based on deep reinforcement learning | |
CN117555339B (en) | Strategy network training method and human-shaped biped robot gait control method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20221212
Address after: Room 3115, No. 135, Ward Avenue, Ping'an Street, Changqing District, Jinan, Shandong 250300
Patentee after: Shandong Jiqing Technology Service Co.,Ltd.
Address before: 250353 University Road, Changqing District, Ji'nan, Shandong Province, No. 3501
Patentee before: Qilu University of Technology