CN118131807A - Quadruped robot intelligent control method and system based on deterministic learning - Google Patents

Quadruped robot intelligent control method and system based on deterministic learning

Info

Publication number
CN118131807A
Authority
CN
China
Prior art keywords: model, learning, robot, quadruped robot, joint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410203798.3A
Other languages
Chinese (zh)
Inventor
王聪
邵宁
朱泽键
张付凯
杨钦辰
姜含
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University
Priority to CN202410203798.3A
Publication of CN118131807A

Landscapes

  • Feedback Control In General (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a quadruped robot intelligent control method and system based on deterministic learning, relating to the field of quadruped robot motion control. The method comprises the following steps: collecting the current motion data of the quadruped robot in real time; inputting the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint; and performing motion control of the quadruped robot based on the joint feedforward torques. By combining deterministic learning with deep reinforcement learning, the invention models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs, with an RBF neural network; the model predicts the body state of the quadruped robot, from which the final feedforward torque is estimated, realizing intelligent control of the quadruped robot.

Description

Quadruped robot intelligent control method and system based on deterministic learning
Technical Field
The invention belongs to the field of quadruped robot motion control, and particularly relates to a quadruped robot intelligent control method and system based on deterministic learning.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Quadruped robots have developed and been applied rapidly in recent years owing to advantages such as flexible movement and strong terrain adaptability. In the field of quadruped robot motion control, the traditional approach usually adopts a pipeline composed of components such as state estimation, trajectory optimization, model predictive control, and operational-space control to realize diverse and high-speed motion. However, these classical methods typically require extensive experience and cumbersome manual tuning, and often depend on an accurate dynamic model of the robot, which is difficult to obtain.
Recently, deep reinforcement learning has progressed greatly. Such algorithms can learn motor skills with minimal engineering design and without an explicit model of the robot dynamics, so that the robot solves the locomotion problem from scratch without much manual intervention, keeping its balance under large disturbances and moving stably at high speed. Although deep reinforcement learning provides a new approach to quadruped robot motion control, most strategies trained at present still have two main problems: 1) most existing control methods are based on a linear PD controller and do not compensate the nonlinear dynamics inside the robot, so the control effect of the trained motion strategy is not good enough and the stability of the motion posture needs improvement; 2) motion strategies learned in simulation are difficult to transfer from computer simulation to the real world.
Chinese patent CN113156892B discloses a quadruped robot imitation motion control method based on a deep neural network, comprising two stages: supervised motion feature extraction and deep reinforcement learning training. In the feature extraction stage, key information of animal motion nodes is extracted by a deep learning video feature extraction network and further processed by the X11 time series analysis method to extract the periodic law of the motion features. In the deep reinforcement learning training stage, the extracted periodic law of the animal motion features serves as one part of the input of the deep reinforcement learning network, the current robot state serves as the other part, and a reward function is set, so that the quadruped robot interacts with the environment in simulation and trains a deep reinforcement learning network capable of imitating the reference motion. However, this method does not introduce a more advanced feedback control method after the decision of the deep reinforcement learning network, so the control accuracy and control effect are poor; nor does it attempt to reduce the gap between simulation and reality during training, which makes the model difficult to deploy on a real robot.
Disclosure of Invention
In order to overcome the above defects of the prior art, the invention provides a quadruped robot intelligent control method and system based on deterministic learning. By a method combining deterministic learning and deep reinforcement learning, the motion process of the quadruped robot, in particular the inherent nonlinear dynamic behavior of the legs, is modeled with an RBF neural network; the model predicts the body state of the quadruped robot, from which the final feedforward torque is estimated, realizing intelligent control of the quadruped robot.
To achieve the above object, one or more embodiments of the present invention provide the following technical solutions:
The first aspect of the invention provides a quadruped robot intelligent control method based on deterministic learning.
A quadruped robot intelligent control method based on deterministic learning comprises:
collecting the current motion data of the quadruped robot in real time;
inputting the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
performing motion control of the quadruped robot based on the joint feedforward torques;
wherein the torque prediction model is constructed on an RBF neural network and, by a method combining deterministic learning and deep reinforcement learning, models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs; the model predicts the body state of the quadruped robot, from which the final feedforward torque is estimated.
Further, the motion data comprise the base angular velocity and orientation, the rotation angle of each joint relative to its initial default position, the joint rotation speed, and the feedforward torque.
Further, the torque prediction model comprises a dynamic estimator sub-model constructed from actual operation data, and a state estimation sub-model, a policy sub-model, and a controller sub-model constructed in a virtual simulation environment.
The dynamic estimator sub-model predicts the rotation angle and rotation speed of each joint at the next moment from the rotation angle, rotation speed, and feedforward torque at the current moment.
The state estimation sub-model predicts the linear velocity and foot end height of the quadruped robot at the current moment from the current motion data.
The policy sub-model predicts the expected rotation angle of each joint from the current motion data, the linear velocity and foot end height at the current moment, the expected rotation angles at historical moments, and the joint position errors and joint speeds at historical moments.
The controller sub-model predicts the feedforward torque at the next moment from the expected rotation angle.
Further, the dynamic estimator sub-model models the inherent nonlinear dynamics of the legs of the quadruped robot with an RBF neural network based on the deterministic learning theory;
leg data of the physical quadruped robot in its actual running state are collected as a training set, and the network weights of the dynamic estimator sub-model are updated through the tracking error of the joint rotation angle.
Furthermore, the state estimation sub-model is constructed on a multi-layer perceptron and trained in a virtual simulation environment, so that the controller is robust to the unavoidable errors of the state estimator.
Further, the policy sub-model is constructed by deep reinforcement learning in a virtual simulation environment on a long short-term memory network, and predicts the expected rotation angle of each joint of the robot relative to its initial position.
Further, the controller sub-model locally models the unknown system dynamics of the legs of the quadruped robot with an RBF neural network based on the deterministic learning theory, and builds a mapping between rotation angle and feedforward torque;
in the virtual simulation environment, the network weights of the controller sub-model are updated through the tracking errors of the joint rotation angle and rotation speed.
The second aspect of the invention provides a quadruped robot intelligent control system based on deterministic learning.
A quadruped robot intelligent control system based on deterministic learning comprises an acquisition module, a prediction module, and a control module:
an acquisition module configured to collect the current motion data of the quadruped robot in real time;
a prediction module configured to input the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
a control module configured to perform motion control of the quadruped robot based on the joint feedforward torques;
wherein the torque prediction model is constructed on an RBF neural network and, by a method combining deterministic learning and deep reinforcement learning, models the motion process of the quadruped robot, in particular the inherent nonlinear dynamic behavior of the legs; the model predicts the body state of the quadruped robot, from which the final feedforward torque is estimated.
A third aspect of the present invention provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the steps in a method for intelligent control of a quadruped robot based on deterministic learning according to the first aspect of the present invention.
A fourth aspect of the present invention provides an electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in a method for intelligent control of a quadruped robot based on deterministic learning according to the first aspect of the present invention when the program is executed.
The one or more of the above technical solutions have the following beneficial effects:
The invention provides a quadruped robot intelligent control method and system based on deterministic learning. A radial basis function neural network is used to model the inherent nonlinear dynamics of the legs of the quadruped robot, and deterministic learning control is performed according to the modeling result and the model in simulation. This effectively improves the control effect of the motion strategy of the quadruped robot under the deep reinforcement learning framework, makes the trained leg motion more stable and natural, effectively alleviates the difficulty of deploying the trained network on a real robot, and provides a new approach for subsequent deep reinforcement learning algorithm research and engineering application.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a schematic diagram of the quadruped robot intelligent control method based on deterministic learning and deep reinforcement learning in the first embodiment.
Fig. 2 is a schematic diagram of the RBF neural network weight convergence of the deterministic learning dynamic estimator sub-model in the first embodiment.
Fig. 3 is a schematic diagram of the flat-ground training scenario of the quadruped robot in the first embodiment.
Fig. 4 is a schematic diagram of the change of the loss function during training of the state estimation sub-model in the first embodiment.
Fig. 5 is a schematic diagram of the linear velocity and foot end height estimation effect of the state estimation sub-model in the first embodiment.
Fig. 6 is a schematic diagram of the change of the average reward during training of the deep reinforcement learning policy sub-model in the first embodiment.
Fig. 7 is a schematic diagram of the weight convergence of the deterministic learning controller RBF neural network in the first embodiment.
Fig. 8 is a schematic diagram of the change of the body linear velocity tracking reward and angular velocity tracking reward during training in the first embodiment.
Fig. 9 is a schematic diagram of the training results of the first embodiment.
Detailed Description
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
In one embodiment of the present disclosure, a quadruped robot intelligent control method based on deterministic learning is provided, comprising the steps of:
Step S1: collecting the current motion data of the quadruped robot in real time;
Step S2: inputting the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
Step S3: performing motion control of the quadruped robot based on the joint feedforward torques.
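As an illustrative sketch only (the function names, the 12-joint dimension, and the zero-valued placeholders are assumptions for illustration, not details from this embodiment), steps S1–S3 can be expressed as a minimal control loop:

```python
import numpy as np

def collect_motion_data():
    # Placeholder for real-time sensing: base angular velocity, joint angles
    # relative to the default position, joint speeds, previous feedforward torque.
    return {"omega": np.zeros(3), "q": np.zeros(12),
            "qd": np.zeros(12), "tau": np.zeros(12)}

def torque_prediction_model(data):
    # Stand-in for the trained model (state estimator + policy + controller);
    # here it simply returns a zero feedforward torque per joint.
    return np.zeros_like(data["tau"])

def control_step():
    data = collect_motion_data()             # step S1: acquire current motion data
    tau_ff = torque_prediction_model(data)   # step S2: predict joint feedforward torque
    return tau_ff                            # step S3: apply torque to the joints

tau = control_step()
```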
The quadruped robot intelligent control method of this embodiment is described in detail below, starting from the construction and training of the torque prediction model.
Aiming at the defects of the prior art described in the background, an advanced control method with better performance is needed to solve the problems of poor control effect and unstable leg motion under the deep reinforcement learning framework, together with a method for narrowing the gap between the simulation environment and reality, so as to reduce the difficulty of deploying the trained model on the robot.
The goal of deterministic learning theory is that, under a partial persistent excitation condition, the system identification error converges exponentially along the periodic or recurrent trajectory generated by the nonlinear system, so that the radial basis function neural network weights converge and a locally accurate model of the system is obtained. Through deterministic learning, the parts of the real robot that are difficult to model can be modeled locally and accurately; introducing the modeling result into the simulation environment effectively narrows the gap between simulation and reality and reduces the difficulty of deploying the trained model on the robot. In addition, deterministic learning control can be performed with the same theory, achieving a good approximation of the system dynamics during the control process and thus control with higher precision and better effect.
Therefore, this embodiment provides a torque prediction model for intelligent control of a quadruped robot. The model is constructed on an RBF neural network and comprises a dynamic estimator sub-model constructed from actual operation data, together with a state estimation sub-model, a policy sub-model, and a controller sub-model constructed in a virtual simulation environment. By a method combining deterministic learning and deep reinforcement learning, it models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs, and predicts the body state of the quadruped robot so as to estimate the final feedforward torque.
The RBF (Radial Basis Function) neural network is a forward neural network; its structure is similar to that of a multi-layer forward network and consists of three layers. The first layer is the input layer, composed of signal source nodes. The second layer is the hidden layer, whose number of nodes depends on the problem being described; the transformation function of the hidden-layer neurons, the radial basis function, is a non-negative nonlinear function that is radially symmetric about a center point and decays away from it. It is a local response function, whereas the transformation functions of earlier forward networks are global response functions. The third layer is the output layer, which responds to the input pattern.
The basic idea of an RBF network is to use RBFs as the "basis" of the hidden units to form the hidden-layer space. The hidden layer transforms the input vector, mapping low-dimensional input data into a high-dimensional space, so that a problem that is linearly inseparable in the low-dimensional space becomes linearly separable in the high-dimensional space.
This neural network has a simple structure, simple training, and fast learning convergence, and can approximate any nonlinear function, so it has been widely applied in fields such as time series analysis, pattern recognition, nonlinear control, and graphics processing.
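The three-layer structure described above can be sketched in numpy as follows; the 2-D input, the lattice of centers, the Gaussian width, and the zero output weights are illustrative assumptions, not values from this patent:

```python
import numpy as np

def gaussian_rbf(x, centers, width):
    """Radially symmetric, decaying local response: S_j(x) = exp(-||x - c_j||^2 / width^2)."""
    diff = x[None, :] - centers                      # (n_neurons, dim)
    return np.exp(-np.sum(diff**2, axis=1) / width**2)

def rbf_forward(x, centers, width, W):
    """Three-layer RBF network: input -> hidden (Gaussian units) -> linear output W^T S(x)."""
    return W.T @ gaussian_rbf(x, centers, width)

# Hypothetical 2-D input with centers on a regular lattice over [-1, 1]^2
grid = np.linspace(-1.0, 1.0, 6)
centers = np.array([[a, b] for a in grid for b in grid])  # 36 hidden neurons
W = np.zeros((36, 1))                                     # output weights (to be learned)
y = rbf_forward(np.array([0.3, -0.2]), centers, 0.4, W)
```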
The construction, training, and deployment of the torque prediction model are described in detail below, as shown in Fig. 1. Specifically:
Step one: leg data of the physical quadruped robot in its actual running state are collected to construct a data set for modeling the inherent nonlinear dynamics of the legs.
The robot walks in a diagonal gait following various motion commands sent by the remote controller, and data such as the position, speed, current, and feedforward torque of the leg joints in the actual running state of the physical quadruped robot are collected to construct the data set.
Specifically, an AT9S remote controller sends motion commands to the robot's UP Board single-board computer, making the robot walk in a diagonal gait. The leg controller (an STM32) reads the position, speed, and torque information of the leg encoders through SPI, then transmits the data over the CAN bus to the SPIne firmware, which forwards it to the UP Board via SPI; in this way the position, speed, current, feedforward torque, and other data of the leg joints in the actual running state of the physical robot are acquired.
Step two: training and determining a learning dynamic estimator sub-model by utilizing the data set constructed in the step one, wherein the model is used for representing inherent nonlinear dynamics rules of legs in an actual running state and specifically comprises the following steps:
acquiring a state vector of a leg joint:
X(k)=(q(k),qd(k),τ(k)) (1)
where q and q d represent the rotational angle and speed, respectively, of the leg joint and τ represents the feed-forward torque of the leg joint.
The initial dynamic estimator sub-model is constructed as:

$$\hat{q}(k+1)=\hat{q}(k)+T_s\Big[a_1\big(\hat{q}(k)-q(k)\big)+\hat{W}_1^{T}S\big(X(k)\big)\Big]\quad(2)$$
$$\hat{q}_d(k+1)=\hat{q}_d(k)+T_s\Big[a_2\big(\hat{q}_d(k)-q_d(k)\big)+\hat{W}_2^{T}S\big(X(k)\big)\Big]\quad(3)$$

where $\hat{q}$ and $\hat{q}_d$ denote the estimated state variables, $T_s$ the sampling period, $a_i$ the dynamic estimator sub-model gains to be designed, $\hat{W}_i^{T}S(X)$ the RBF network used to learn the internal nonlinear dynamics of the system, $\hat{W}_i$ the estimated neural network weights, and $S(X)$ the Gaussian radial basis function vector over the network input vector $X$.
The acquired joint state variables are input into the dynamic estimator sub-model, and the tracking error $e_i(k)$ is defined as

$$e_i(k)=\hat{x}_i(k)-x_i(k)\quad(4)$$

where $\hat{x}_i$ denotes the estimate of the $i$-th state variable. According to the tracking result, the dynamic estimator sub-model is trained with the following weight update law:

$$\hat{W}_i(k+1)=\hat{W}_i(k)-\gamma_1 S\big(X(k)\big)\,e_i(k)\quad(5)$$

where $\gamma_1$ denotes a learning gain to be designed.
After a period of time the weights converge; the learned knowledge is stored in the form of a constant neural network, giving the trained weights

$$\bar{W}_i=\operatorname*{mean}_{t\in[t_a,t_b]}\hat{W}_i(t)\quad(6)$$

where $\operatorname{mean}_{t\in[t_a,t_b]}$ denotes the average of $\hat{W}_i$ over the time interval $[t_a,t_b]$, with $0<t_a<t_b$.
Applying the weight update law (5) to the initial dynamic estimator sub-model of formulas (2) and (3), the weights converge after a period of time, and the dynamic estimator sub-model based on the constant neural network is finally learned:

$$\hat{q}(k+1)=\hat{q}(k)+T_s\Big[a_1\big(\hat{q}(k)-q(k)\big)+\bar{W}_1^{T}S\big(X(k)\big)\Big]\quad(7)$$
$$\hat{q}_d(k+1)=\hat{q}_d(k)+T_s\Big[a_2\big(\hat{q}_d(k)-q_d(k)\big)+\bar{W}_2^{T}S\big(X(k)\big)\Big]\quad(8)$$

where $\bar{W}_i$ are the trained neural network weights.
The dynamic estimator sub-model models the inherent nonlinear dynamics of the legs of the quadruped robot with an RBF neural network based on the deterministic learning theory. The network input is 3-dimensional; the collected data are normalized so that the neural network is distributed over $[-1,1]$ in each dimension, with a neuron spacing of 0.2 per dimension and a receptive field width of 0.2; the learning rate $\gamma_1$ is designed to be 0.8. A schematic diagram of the network weight convergence is shown in Fig. 2.
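The estimator and weight update law of this step can be sketched as follows for a single state variable. This is a hedged illustration: the periodic trajectory, the estimator gain `a`, and the sampling period `Ts` are assumed values; only the neuron spacing of 0.2, the receptive field width of 0.2, and the learning rate 0.8 come from the text above.

```python
import numpy as np

def gaussian_rbf(X, centers, width):
    d = X[None, :] - centers
    return np.exp(-np.sum(d**2, axis=1) / width**2)

def fit_dynamic_estimator(X_seq, x_seq, centers, Ts=0.002, a=-5.0,
                          gamma1=0.8, width=0.2):
    """Discrete-time estimator sketch: x_hat <- x_hat + Ts*(a*e + W^T S(X)),
    with the deterministic-learning weight update W <- W - gamma1 * S(X) * e."""
    W = np.zeros(centers.shape[0])
    W_hist = []
    x_hat = x_seq[0]
    for X, x in zip(X_seq, x_seq):
        e = x_hat - x                            # tracking error
        S = gaussian_rbf(X, centers, width)
        x_hat = x_hat + Ts * (a * e + W @ S)     # estimator dynamics
        W = W - gamma1 * S * e                   # weight update law
        W_hist.append(W.copy())
    # store the learned knowledge as a constant network: time-average of the weights
    return np.mean(W_hist[len(W_hist) // 2:], axis=0)

# Hypothetical periodic joint trajectory as training data (q, qd, tau stand-ins)
t = np.linspace(0, 10, 5000)
x_seq = np.sin(t)
X_seq = np.stack([np.sin(t), np.cos(t), np.zeros_like(t)], axis=1)
grid = np.linspace(-1, 1, 11)                    # spacing 0.2 per dimension
centers = np.array([[a, b, 0.0] for a in grid for b in grid])
W_bar = fit_dynamic_estimator(X_seq, x_seq, centers)
```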
Step three: and constructing a state estimation sub-model in a simulation environment, inputting the acquired state of the body of the quadruped robot into the state estimation sub-model, and estimating the linear speed and the foot end height of the quadruped robot.
In a simulation environment, a state estimation sub-model based on a multi-layer perceptron (Multi Layer Perceptron, MLP) is used for state training of the quadruped robot body to estimate the linear speed and foot end height of the quadruped robot.
The body motion state of the quadruped robot is acquired:

$$o_k=\big[\omega,\ \phi,\ cmd,\ q-q_{init},\ q_d,\ \Delta q_{des}^{hist},\ Q_{hist},\ \dot{Q}_{hist}\big]\quad(9)$$

where $\omega$ and $\phi$ represent the base angular velocity and orientation, $cmd$ is the given velocity command, $q-q_{init}$ represents the rotation angle of each joint relative to its initial default position $q_{init}$, $q_d$ the joint speed, $\Delta q_{des}^{hist}$ the output of the policy sub-model at historical moments, and $Q_{hist}$ and $\dot{Q}_{hist}$ the historical joint position errors and joint speeds, with historical time steps of 0.02 s and 0.04 s.
The acquired body motion state $o_k$ is input into the multi-layer-perceptron state estimation sub-model to obtain the linear velocity $v$ and foot end height $f_h$ to be estimated:

$$o_{est}=W_{e3}\,f\big(W_{e2}\,f(W_{e1}o_k+b_{e1})+b_{e2}\big)\quad(10)$$

where $o_{est}=[v,f_h]$, $W_{e1}$, $W_{e2}$, $W_{e3}$ denote the network weights of each layer, $b_{e1}$, $b_{e2}$ the biases, and $f$ the nonlinear activation function.
In this embodiment, the simulation environment is Isaac Gym, a physics environment developed on Python for accelerated, massively parallel training; the Isaac Gym physics engine provides simulation functions such as forward and inverse kinematics computation, collision detection, and data acquisition for the body and joints.
In this embodiment, a robot description file (URDF) of the real quadruped robot SDUQuad-48 is imported into Isaac Gym in order to reduce the differences between the simulated and real robots as much as possible. In the Isaac Gym simulation, the initial basic settings of the quadruped robot are configured, including the initial position of the body, the initial states of the joints, the friction of the environment, and other environmental characteristics, and a certain amount of noise is added to some of the initial settings to increase the randomness of the simulation environment; 4096 robots are created at initialization for massively parallel training. Fig. 3 shows an example of the virtual simulation environment of the quadruped robot of this embodiment.
The state estimation sub-model uses an MLP with two hidden layers of dimensions [256, 128]; the input dimension is 105 (the network input variable is given by formula (9)), the output dimension is 7, the activation function is ELU, and the learning rate is set to 1.0×10⁻³. Fig. 4 shows the change of the loss function during training of the state estimation sub-model, and Fig. 5 shows the estimation effect for the linear velocity and foot end height.
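The MLP of formula (10) with the dimensions stated above ([256, 128] hidden layers, 105 inputs, 7 outputs) can be sketched as follows; the random weights and the interpretation of the 7 outputs as 3 linear-velocity components plus 4 foot-end heights are illustrative assumptions:

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))

def state_estimator(o_k, params):
    """Two-hidden-layer MLP: o_est = We3·f(We2·f(We1·o_k + be1) + be2), per Eq. (10)."""
    We1, be1, We2, be2, We3 = params
    h1 = elu(We1 @ o_k + be1)
    h2 = elu(We2 @ h1 + be2)
    return We3 @ h2      # assumed: [linear velocity (3), foot-end heights (4)]

rng = np.random.default_rng(0)
params = (
    rng.normal(0, 0.1, (256, 105)), np.zeros(256),   # hidden layer 1 (dim 256)
    rng.normal(0, 0.1, (128, 256)), np.zeros(128),   # hidden layer 2 (dim 128)
    rng.normal(0, 0.1, (7, 128)),                    # output layer (dim 7)
)
o_est = state_estimator(rng.normal(size=105), params)
```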
Step four: and (3) correcting the state of the quadruped robot according to the dynamic estimator submodel obtained in the step (II), constructing a strategy submodel based on deep reinforcement learning, inputting the linear speed obtained by estimation in the step (III) and the corrected body state into a neural network model of an actor-critic structure, and obtaining strategy submodel output.
The strategy sub-model adopts an Actor-critter structure, namely an Actor-Critic network, wherein the Actor adopts a model structure of LSTM and MLP, the first layer is an LSTM structure, and the later layers are MLP structures.
According to the current acquired state of the simulation system, predicting the position and the speed of the joint at the next moment by using the dynamic estimator submodel in the second step, and weighting and correcting the real-time state and the state observation value of the quadruped robot in the simulation, wherein the corrected real-time state is expressed as follows:
A deep reinforcement learning policy sub-model is constructed on a Long Short-Term Memory (LSTM) network; the input variable of the network is the corrected body state $o'_k$:

$$o'_k=\big[v,\ \omega,\ \phi,\ cmd,\ f_h,\ q'-q_{init},\ q_d',\ \Delta q_{des}^{hist},\ Q_{hist},\ \dot{Q}_{hist}\big]\quad(15)$$

where $v$ denotes the linear velocity at the current moment, $\omega$ and $\phi$ the base angular velocity and orientation, $f_h$ the foot end height, $\Delta q_{des}^{hist}$ the expected rotation angles at historical moments, and $Q_{hist}$ and $\dot{Q}_{hist}$ the joint position errors and joint speeds at historical moments, respectively.
Inputting $o'_k$ into the policy sub-model yields the expected rotation angle $\Delta q_{des}$ of each joint of the robot relative to its initial position. The LSTM layer computes

$$g_i=\sigma\big(W_g[o'_k,h_{i-1}]+b_g\big),\ \ f_i=\sigma\big(W_f[o'_k,h_{i-1}]+b_f\big),\ \ s_i=\sigma\big(W_s[o'_k,h_{i-1}]+b_s\big),\ \ c_i=f_i\odot c_{i-1}+g_i\odot\tanh\big(W_c[o'_k,h_{i-1}]+b_c\big),\ \ h_i=s_i\odot\tanh(c_i)\quad(16)$$

and the MLP head then outputs

$$\Delta q_{des}=W_{l3}\,f\big(W_{l2}\,f(W_{l1}h+b_{l1})+b_{l2}\big)\quad(17)$$

where $\Delta q_{des}$ denotes the expected position of each joint relative to its initial position, $W_{l1}$, $W_{l2}$, $W_{l3}$ and $b_{l1}$, $b_{l2}$ the weights and biases of each layer, $h_i$ the hidden-layer state of the $i$-th LSTM unit, $c_i$ the internal (cell) state of the $i$-th LSTM unit, and $s_i$, $g_i$, $f_i$ the output gate, input gate, and forget gate units, respectively.
After the feedforward torque is computed in simulation, the next-moment state of each robot joint is predicted according to formulas (2) and (3), and after the current time step is simulated, the current state and observation of the robot are corrected according to formulas (11) and (12).
The hidden-layer dimensions of the Actor-Critic network are [256, 256, 128]; the input variable is given by formula (15), the network input dimension is 112, and the output dimension is 12. The activation function used by the MLP is ELU, the learning rate is 1.0×10⁻³, and the policy sub-model decision frequency is 50 Hz. Fig. 6 shows the change of the average reward during training of the deep reinforcement learning policy sub-model.
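A minimal numpy sketch of the Actor's LSTM-plus-MLP structure follows; the single LSTM layer of size 256, the 128-unit MLP hidden layer, the gate layout, and the random weights are illustrative assumptions, not the trained network:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elu(x):
    return np.where(x > 0, x, np.exp(x) - 1.0)

def lstm_cell(x, h, c, Wx, Wh, b):
    """One LSTM step: input gate g, forget gate f, output gate s, candidate u."""
    z = Wx @ x + Wh @ h + b
    H = h.shape[0]
    g = sigmoid(z[:H])           # input gate
    f = sigmoid(z[H:2*H])        # forget gate
    s = sigmoid(z[2*H:3*H])      # output gate
    u = np.tanh(z[3*H:])         # candidate cell state
    c_new = f * c + g * u
    h_new = s * np.tanh(c_new)
    return h_new, c_new

def policy(o_k, h, c, lstm_p, mlp_p):
    """First layer LSTM, then an MLP head producing desired joint offsets (Eq. (17) style)."""
    Wx, Wh, b = lstm_p
    Wl2, bl2, Wl3 = mlp_p
    h, c = lstm_cell(o_k, h, c, Wx, Wh, b)
    dq_des = Wl3 @ elu(Wl2 @ h + bl2)
    return dq_des, h, c

rng = np.random.default_rng(1)
H, D = 256, 112                  # hidden size and corrected-observation dimension
lstm_p = (rng.normal(0, 0.05, (4*H, D)),
          rng.normal(0, 0.05, (4*H, H)), np.zeros(4*H))
mlp_p = (rng.normal(0, 0.05, (128, H)), np.zeros(128),
         rng.normal(0, 0.05, (12, 128)))
dq_des, h, c = policy(rng.normal(size=D), np.zeros(H), np.zeros(H), lstm_p, mlp_p)
```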
Step five: and (3) constructing a controller sub-model, outputting according to the strategy sub-model obtained in the step (IV), and determining learning control and learning in control according to the body state obtained by correction in the step (IV) and the actor-critic neural network model output.
The distribution dimension of the controller submodel is 2 dimensions, the distribution range of the neural network is [ -1,1] × [ -1,1], the interval distance of neurons in each dimension is 0.4, the receptive field width is 0.4, and the learning rate gamma 2 is designed to be 0.85; determining that the frequency of learning control is set to 200Hz; fig. 7 is a schematic diagram of determining convergence of the weights of the learning controller network.
Preferably, the RBF network is built based on a determined learning theory, and then a controller sub-model shown as a formula (18) is built, the RBF neural network is determined according to the output training of the actor-critique strategy sub-model in the fourth step, and the learning in control is specifically as follows:
The feedforward torque is calculated from the controller sub-model:

$$u(k)=K_p\big(\Delta q_{des}(k)-\Delta q(k)\big)-K_d\,q_d(k)+\hat{W}_\tau^{T}S_\tau(X_\tau)\quad(18)$$

where $K_p$ and $K_d$ denote the proportional and derivative gains, respectively, $\Delta q_{des}$ the expected rotation angle predicted by the policy sub-model, $\hat{W}_\tau$ the estimated neural network weights, and $S_\tau(X_\tau)$ the Gaussian radial basis function vector over the network input vector $X_\tau$.
The tracking error is defined as

$$e_\tau(k)=\big(\Delta q(k)-\Delta q_{des}(k)\big)+\beta_1\,q_d(k)\quad(19)$$

and the following weight update law is employed for learning in control:

$$\hat{W}_\tau(k+1)=\hat{W}_\tau(k)-\gamma_2 S_\tau\big(X_\tau(k)\big)\,e_\tau(k)\quad(20)$$

where $\beta_1$ is a coefficient to be designed and $\gamma_2$ denotes a learning gain to be designed.
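The controller and its learning-in-control update can be sketched for a single joint as follows. This is a hedged illustration: the PD gains, the filtered-error form, the toy unit-inertia joint dynamics, and the 2-D network input $(\Delta q, q_d)$ are assumptions; only the neuron spacing of 0.4, the receptive field width of 0.4, and the learning rate 0.85 come from this step.

```python
import numpy as np

def gaussian_rbf(X, centers, width):
    d = X[None, :] - centers
    return np.exp(-np.sum(d**2, axis=1) / width**2)

def controller_step(dq_des, dq, qd, W_tau, centers, Kp=20.0, Kd=0.5,
                    beta1=0.1, gamma2=0.85, width=0.4):
    """PD feedback plus RBF compensation of unknown leg dynamics; the weight
    update is driven by a filtered angle/speed tracking error (sketch)."""
    X_tau = np.array([dq, qd])                    # assumed 2-D network input
    S = gaussian_rbf(X_tau, centers, width)
    u = Kp * (dq_des - dq) - Kd * qd + W_tau @ S  # feedforward torque, Eq. (18) style
    e_tau = (dq - dq_des) + beta1 * qd            # assumed filtered tracking error
    W_tau = W_tau - gamma2 * S * e_tau            # learning in control
    return u, W_tau

# Hypothetical single-joint rollout tracking a periodic desired trajectory
grid = np.linspace(-1, 1, 6)                      # spacing 0.4 per dimension
centers = np.array([[a, b] for a in grid for b in grid])
W_tau = np.zeros(centers.shape[0])
dq, qd = 0.0, 0.0
for t in np.linspace(0, 2, 400):
    u, W_tau = controller_step(np.sin(t), dq, qd, W_tau, centers)
    qd += 0.005 * u                               # toy unit-inertia joint dynamics
    dq += 0.005 * qd
```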
Step six: and simultaneously training a state estimation sub-model in the third step, a strategy sub-model in the fourth step and a controller sub-model in the fifth step, wherein the constant neural network obtained through training is used for simulation verification and physical deployment.
Performing droput operation when training the state estimation submodel and the strategy submodel, and setting the mini-batch to be 4 in size and 5 in epoch by using a mini-batch method; wherein the training strategy sub-model uses a near-end strategy optimization algorithm.
The controller sub-model is trained according to formula (20). After training for a period of time, the network weights in formula (18) converge, and the learned knowledge is stored in the form of a constant neural network according to formula (21), yielding the trained torque prediction model:

W̄_τ = mean over t ∈ [t_a, t_b] of Ŵ_τ(t)  (21)

u = K_p(Δq_des − Δq) − K_d·Δq̇ + W̄_τ^T S_τ(X_τ)  (22)

where u represents the feedforward torque input to each joint of the quadruped robot, and [t_a, t_b] is a time interval after weight convergence.
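A sketch of storing the learned knowledge as a constant network, assuming the converged weight estimates are averaged over a post-convergence window and then frozen (the mock weight trajectory stands in for a real training run):

```python
import numpy as np

# Sketch of storing learned knowledge as a constant neural network: weight
# estimates recorded after convergence are averaged over a time window and
# frozen. The mock weight trajectory is an illustrative assumption.
history = []
for step in range(1000):
    # mock converged trajectory: small oscillation around (1.0, -0.5)
    history.append(np.array([1.0, -0.5]) + 0.01 * np.sin(0.1 * step))

t1, t2 = 600, 1000                       # window after transients die out
W_bar = np.mean(history[t1:t2], axis=0)  # constant weights

def frozen_compensation(S_x):
    """Deployment-time compensation torque using the constant weights."""
    return W_bar @ S_x
```

The frozen weights make the deployed controller a fixed map, so no online adaptation is needed on the physical robot.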
Simulation verification and deployment are performed with the state estimation sub-model, the deep reinforcement learning strategy sub-model and the constant neural network of formula (22). Fig. 8 is a schematic diagram of how the body linear-velocity tracking reward and angular-velocity tracking reward change during training with the method of this embodiment, and fig. 9 is a schematic diagram of the training results.
Preferably, the state estimation sub-model of step three, the strategy sub-model of step four and the controller sub-model of step five are trained simultaneously, and the results are used for simulation verification and physical deployment, specifically: the three networks are trained simultaneously to finally obtain the optimal state estimation sub-model, strategy sub-model and deterministic-learning controller network.

After the weights of the deterministic-learning controller network converge, the learned knowledge is stored in the form of a constant neural network according to formula (21).
Embodiment Two
In one embodiment of the present disclosure, a quadruped robot intelligent control system based on deterministic learning is provided, comprising an acquisition module, a prediction module and a control module:
an acquisition module configured to: collect current motion data of the quadruped robot in real time;
a prediction module configured to: input the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
a control module configured to: perform motion control of the quadruped robot based on the feedforward torque of the joints.
The torque prediction model is constructed based on an RBF neural network; using a method that combines deterministic learning and deep reinforcement learning, it models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs, and is used to predict the body state of the quadruped robot so as to estimate the final feedforward torque.
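A minimal skeleton of the three-module system just described, with a linear stand-in for the trained torque prediction model (all names and types are illustrative assumptions):

```python
from dataclasses import dataclass, field
import numpy as np

# Minimal skeleton of the described system: acquisition, prediction and
# control modules. The linear "trained model" stand-in and all names are
# illustrative assumptions, not the patent's implementation.

@dataclass
class PredictionModule:
    weights: np.ndarray                # stand-in for the trained torque model
    def predict(self, motion_data: np.ndarray) -> np.ndarray:
        return self.weights @ motion_data   # feedforward joint torques

@dataclass
class ControlModule:
    applied: list = field(default_factory=list)
    def apply(self, torques: np.ndarray) -> None:
        self.applied.append(torques)   # would be sent to the joint actuators

motion = np.array([0.1, -0.2, 0.05])   # motion data from the acquisition module (mock)
predictor = PredictionModule(weights=np.eye(3))
controller = ControlModule()
controller.apply(predictor.predict(motion))
```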
Embodiment Three
The object of this embodiment is to provide a computer-readable storage medium.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the quadruped robot intelligent control method based on deterministic learning according to embodiment one of the present disclosure.
Embodiment Four
The object of this embodiment is to provide an electronic device.
An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the quadruped robot intelligent control method based on deterministic learning according to embodiment one of the present disclosure.
The above description covers only the preferred embodiments of the present invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the present invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall be included in its protection scope.

Claims (10)

1. A quadruped robot intelligent control method based on deterministic learning, characterized by comprising the following steps:
collecting current motion data of the quadruped robot in real time;
inputting the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
performing motion control of the quadruped robot based on the feedforward torque of the joints;
the torque prediction model is constructed based on an RBF neural network; using a method that combines deterministic learning and deep reinforcement learning, it models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs, and is used to predict the body state of the quadruped robot so as to estimate the final feedforward torque.
2. The method of claim 1, wherein the motion data comprises the base angular velocity and orientation, the rotation angle of each joint relative to an initial default position, the joint rotation angle and velocity, and the feedforward torque.
3. The quadruped robot intelligent control method based on deterministic learning according to claim 1, wherein the torque prediction model comprises a dynamic estimator sub-model constructed from actual operation data, and a state estimation sub-model, a strategy sub-model and a controller sub-model constructed in a virtual simulation environment;
the dynamic estimator sub-model is used for predicting the joint rotation angle and velocity at the next moment from the joint rotation angle, velocity and feedforward torque at the current moment;
the state estimation sub-model is used for predicting the linear velocity and foot-end height of the quadruped robot at the current moment from the current motion data;
the strategy sub-model is used for predicting the desired joint rotation angle from the current motion data, the linear velocity and foot-end height at the current moment, and the desired rotation angles, joint position errors and joint velocities at historical moments;
the controller sub-model is used for predicting the feedforward torque at the next moment from the desired rotation angle.
4. The quadruped robot intelligent control method based on deterministic learning according to claim 3, wherein the dynamic estimator sub-model models the inherent nonlinear dynamics of the legs of the quadruped robot with an RBF neural network based on deterministic learning theory;
leg data collected from the physical quadruped robot in the actual running state are used as the training set, and the network weights of the dynamic estimator sub-model are updated through the tracking error of the joint rotation angle.
5. The quadruped robot intelligent control method based on deterministic learning according to claim 3, wherein the state estimation sub-model is constructed based on a multi-layer perceptron and trained in a virtual simulation environment, so that the controller is robust to the unavoidable errors of the state estimator.
6. The quadruped robot intelligent control method based on deterministic learning according to claim 3, wherein the strategy sub-model is constructed by deep reinforcement learning in a virtual simulation environment based on a long short-term memory network, and predicts the desired rotation angle of each joint of the robot relative to its initial position.
7. The quadruped robot intelligent control method based on deterministic learning according to claim 3, wherein the controller sub-model locally models the unknown system dynamics of the legs of the quadruped robot with an RBF neural network based on deterministic learning theory, constructing a mapping between rotation angle and feedforward torque;
in the virtual simulation environment, the network weights of the controller sub-model are updated through the tracking errors of the joint rotation angle and velocity.
8. A quadruped robot intelligent control system based on deterministic learning, characterized by comprising an acquisition module, a prediction module and a control module:
an acquisition module configured to: collect current motion data of the quadruped robot in real time;
a prediction module configured to: input the current motion data into a trained torque prediction model to obtain the feedforward torque of each joint;
a control module configured to: perform motion control of the quadruped robot based on the feedforward torque of the joints;
the torque prediction model is constructed based on an RBF neural network; using a method that combines deterministic learning and deep reinforcement learning, it models the motion process of the quadruped robot, in particular the inherent nonlinear dynamics of the legs, and is used to predict the body state of the quadruped robot so as to estimate the final feedforward torque.
9. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions;
A processor for executing the computer readable instructions;
wherein the computer readable instructions, when executed by the processor, perform the method of any one of claims 1-7.
10. A storage medium, characterized by non-transitorily storing computer readable instructions, wherein the computer readable instructions, when executed by a computer, perform the method of any one of claims 1-7.
CN202410203798.3A 2024-02-23 2024-02-23 Four-foot robot intelligent control method and system based on definite learning Pending CN118131807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410203798.3A CN118131807A (en) 2024-02-23 2024-02-23 Four-foot robot intelligent control method and system based on definite learning

Publications (1)

Publication Number Publication Date
CN118131807A true CN118131807A (en) 2024-06-04



Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination