CN103204193A

CN103204193A - Under-actuated biped robot walking control method

Info

Publication number: CN103204193A
Application number: CN2013101202519A
Authority: CN
Inventors: 刘道远; 潘刚; 彭自强
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2013-04-08
Filing date: 2013-04-08
Publication date: 2013-07-17
Anticipated expiration: 2033-04-08
Also published as: CN103204193B

Abstract

The invention discloses a walking control method for an under-actuated biped robot, addressing the problem of planar walking control of biped robots. By adopting MACCEPA flexible actuators and exploiting the biped robot's own dynamics, fast walking is achieved effectively. Through continuous interaction with the ground, the robot learns to walk autonomously by trial and error using the Q-learning method, and achieves stable, natural, periodic fast walking. The method has high application value.

Description

An under-actuated biped robot walking control method

Technical field

The present invention relates to a powered walking method for biped robots, and in particular to a walking control method for an under-actuated biped robot.

Background art

At present, biped robot walking methods mainly comprise ZMP-criterion walking and limit cycle walking. ZMP-criterion walking requires the zero moment point (ZMP) of the robot to remain inside the support polygon formed by the feet. Planning the joint trajectories according to the ZMP stability criterion allows the robot to walk with a variety of gaits. The main successful example of ZMP-criterion walking is ASIMO from Honda of Japan. However, ZMP-criterion walking imposes many artificial constraints: it uses high-stiffness, high-inertia, high-gain motors to track the desired trajectory accurately and does not make full use of the robot's own dynamics, which leads to high energy consumption. In addition, the ZMP criterion applies only to robots with flat soles; for robots without soles or with arc-shaped feet, the ZMP cannot be defined. Trajectory planning and trajectory tracking based on the ZMP stability criterion have been applied widely and successfully to conventional biped robots, but are not suitable for passive robots.

Limit cycle walking is a new walking theory that emerged at the end of the twentieth century. Inspired by human walking, it requires only cyclic stability: the gait sequence forms a stable limit cycle in state space, but the gait need not be locally stable at any instant of the gait cycle. This method imposes fewer artificial constraints on the robot and can make full use of the robot's own dynamics, and therefore offers higher energy efficiency, walking speed and disturbance rejection. Successful examples of under-actuated biped robots based on the limit cycle walking principle include the biped robot of Cornell University. That robot uses a PD controller whose parameters must be tuned manually, which involves a heavy workload.

The servo motors of conventional rigid actuators have high inertia and high energy consumption, cannot make full use of the robot's own dynamics, and are not suitable for under-actuated walking control. By comparison, a flexible actuator can be regarded as a special spring and makes full use of the biped robot's dynamics. Moreover, a biped robot is subject to impacts when walking fast or running. A flexible actuator can absorb these impacts effectively and thus helps to achieve fast walking.

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art by providing a walking control method for an under-actuated biped robot.

The object of the present invention is achieved through the following technical solution: a control method for fast walking of an under-actuated biped robot, comprising the following steps:

Step 1: The host computer acquires the initial state of the robot from the sensors mounted on the trunk and limbs of the under-actuated biped robot, comprising the angles (θ₁, θ₂, θ₃, θ₄, θ₅) of the trunk, first thigh, second thigh, first shank and second shank with respect to the vertical direction, and the corresponding angular velocities $(\dot{\theta}_1, \dot{\theta}_2, \dot{\theta}_3, \dot{\theta}_4, \dot{\theta}_5)$.

Step 2: Model the biped robot, comprising establishing the motion control model of the under-actuated biped robot and its equivalent inverted pendulum model.

Step 3: Initialize the Q-learning network, comprising initializing the RBF neural network, the eligibility trace Φ₀ and the action vector A.

Step 4: Calculate the RBF neural network output Q(s_t, a).

Step 5: Select the action vector a_t using the ε-greedy policy.

Step 6: Run a dynamics simulation of the robot, solving the under-actuated biped robot model according to the following formula to obtain the new states x_{t+1} and s_{t+1} and the reward signal (reinforcement value) r_t:

$$\begin{cases} \dot{x} = f(x) + g(x)u \\ x^{+} = \Delta(x^{-}) \end{cases}$$

Step 7: Update the Q-learning network, comprising updating the eligibility trace Φ_t, computing the TD error e, and updating the RBF network weights.

Step 8: Repeat steps 4 to 7 until the new state x_{t+1} of the under-actuated biped robot is identical to the previous state x_t, i.e. a fixed point has been found.

Step 9: The host computer outputs the robot state x_t at the fixed point and the corresponding action vector a_t to the under-actuated biped robot, controlling it to obtain a stable, fast, periodic gait.
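The stopping rule of step 8 can be illustrated with a minimal Python sketch. It is not the patented implementation: `step_fn` is a hypothetical stand-in for one pass of steps 4 to 7 (select an action, simulate one walking step, update the network), and the tolerance `tol` is an assumption, since a numerical simulation rarely reproduces a state exactly and the text only says the two states are identical.

```python
import numpy as np

def is_fixed_point(x_new, x_prev, tol=1e-6):
    """True when two consecutive post-step states agree within tol (tol is an assumption)."""
    x_new = np.asarray(x_new, dtype=float)
    x_prev = np.asarray(x_prev, dtype=float)
    return bool(np.max(np.abs(x_new - x_prev)) <= tol)

def iterate_until_fixed_point(step_fn, x0, max_steps=1000, tol=1e-6):
    """Apply the hypothetical step_fn repeatedly until the state stops changing.

    Returns (state, number_of_steps) on success, or (None, max_steps) if no
    fixed point was found within max_steps.
    """
    x_prev = np.asarray(x0, dtype=float)
    for n in range(1, max_steps + 1):
        x_new = np.asarray(step_fn(x_prev), dtype=float)
        if is_fixed_point(x_new, x_prev, tol):
            return x_new, n
        x_prev = x_new
    return None, max_steps
```

For example, with the toy contraction `step_fn = lambda x: 0.5 * x + 1.0` the loop stops close to the fixed point 2.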

The beneficial effects of the present invention are as follows. The present invention is a control method for an under-actuated biped robot that uses MACCEPA flexible actuators. The MACCEPA flexible actuator makes full use of the biped robot's own dynamics and reduces the robot's energy consumption. It also effectively absorbs the impact when the robot collides with the ground, which gives the robot a degree of protection. The method successfully controls the biped robot to achieve a fast, stable, natural, periodic dynamic gait with low energy consumption.

Brief description of the drawings

Fig. 1 is a schematic diagram of the under-actuated biped robot and its equivalent inverted pendulum model;

Fig. 2 is a schematic diagram of the MACCEPA actuator;

Fig. 3 is a schematic diagram of the RBF neural network;

Fig. 4 is a control block diagram of the biped robot;

Fig. 5 is the control flow chart.

Detailed description of the embodiments

As shown in Fig. 1, the biped robot comprises a trunk 1, a first thigh 2, a second thigh 3, a first shank 4 and a second shank 5. The trunk 1 is connected to the first thigh 2 by a first motor 6 and to the second thigh 3 by a second motor 7; the first thigh 2 is connected to the first shank 4 by a third motor 8, and the second thigh 3 is connected to the second shank 5 by a fourth motor 9. The angles of the trunk 1, first thigh 2, second thigh 3, first shank 4 and second shank 5 with respect to the vertical direction are θ₁, θ₂, θ₃, θ₄ and θ₅ respectively. The lengths and masses of the trunk 1, first thigh 2, second thigh 3, first shank 4 and second shank 5 of the under-actuated biped robot are l_i and m_i respectively, i = 1, 2, ..., 5. To simplify calculation, the method treats the robot model as an equivalent inverted pendulum model. The angle of the equivalent inverted pendulum 10 is likewise measured from the vertical direction.

The first motor 6, second motor 7, third motor 8 and fourth motor 9 all use MACCEPA (Mechanically Adjustable Compliance and Controllable Equilibrium Position Actuator) flexible drive motors. As shown in Fig. 2, each motor comprises a first bar 11, a second bar 12 and an auxiliary connecting rod 13. For the first motor 6, the first bar 11 is fixedly connected to the trunk 1 and the second bar 12 is fixedly connected to the first thigh 2; the remaining motors are connected analogously.

The characteristic equation of the MACCEPA flexible actuator is as follows:

$$\tau = -k(\alpha - \varphi) - b\dot{\alpha},$$

where τ is the joint torque, α is the relative joint angle of the biped robot, $\dot{\alpha}$ is the relative joint angular velocity, k is the stiffness coefficient, φ is the joint equilibrium angle, and b is the damping constant of the actuator, which takes a fixed value. k and φ are adjustable, so each MACCEPA motor has two control variables, k and φ. The control signal input of each motor is connected to a control signal output port of the host computer. The host computer is implemented by an industrial computer, for example a PC104 industrial computer.
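The characteristic equation can be evaluated directly. The following Python sketch only restates the formula; the function name and the numerical values in the usage note are illustrative assumptions, not values from the patent.

```python
def maccepa_torque(alpha, alpha_dot, k, phi, b):
    """Joint torque tau = -k * (alpha - phi) - b * alpha_dot.

    alpha     : relative joint angle [rad]
    alpha_dot : relative joint angular velocity [rad/s]
    k         : adjustable stiffness coefficient (control variable)
    phi       : adjustable equilibrium angle [rad] (control variable)
    b         : fixed damping constant
    """
    return -k * (alpha - phi) - b * alpha_dot
```

With the illustrative values `k = 10`, `phi = 0.3` and `b = 0.5`, a joint at rest at `alpha = 0.3` produces zero torque, and moving the equilibrium angle away from the current joint angle produces a spring-like restoring torque towards `phi`.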

The control method for fast walking of the under-actuated biped robot according to the present invention comprises the following steps:

Step 1: The host computer acquires the initial state of the robot from the sensors mounted on the trunk and limbs of the under-actuated biped robot, comprising the angles (θ₁, θ₂, θ₃, θ₄, θ₅) of the trunk 1, first thigh 2, second thigh 3, first shank 4 and second shank 5 with respect to the vertical direction, and the corresponding angular velocities $(\dot{\theta}_1, \dot{\theta}_2, \dot{\theta}_3, \dot{\theta}_4, \dot{\theta}_5)$.

Step 2: Model the under-actuated biped robot, as shown in Fig. 1. This comprises establishing the motion control model of the under-actuated biped robot and its equivalent inverted pendulum model. One complete walking cycle of the robot comprises a swing phase and a collision phase. In the swing phase, the support leg is on the ground and rotates forward about its tip, while the swing leg swings to the front of the support leg until it touches the ground. In the collision phase, the tip of the swing leg collides instantaneously with the ground at the end of the swing phase and, at the same time, the support leg leaves the ground. After the collision, the support leg becomes the swing leg and the swing leg becomes the support leg. One step of the robot runs from just after one collision, through the swing phase, to the end of the next collision.

The motion control model of the under-actuated biped robot in the swing phase is:

$$D(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) = u,$$

where D is the generalized inertia matrix, C is the centrifugal and Coriolis term, G is the gravity term, u = (u₁, u₂, u₃, u₄)' is the external torque, and θ = (θ₁, θ₂, θ₃, θ₄, θ₅)'.

The motion control model of the under-actuated biped robot is converted into the state equation:

$$\dot{x} = f(x) + g(x)u,$$

where:

$$f(x) = \begin{bmatrix} \dot{\theta} \\ D^{-1}(\theta)\left(-C(\theta, \dot{\theta})\dot{\theta} - G(\theta)\right) \end{bmatrix}, \quad g(x) = \begin{bmatrix} 0 \\ D^{-1}(\theta) \end{bmatrix},$$

In these expressions,

$$x = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \dot{\theta}_1, \dot{\theta}_2, \dot{\theta}_3, \dot{\theta}_4, \dot{\theta}_5)',$$

and f(x) and g(x) are nonlinear functions.
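The conversion from the second-order model to the state equation can be sketched in Python as follows. This is a generic illustration under stated assumptions: `D`, `C` and `G` are hypothetical user-supplied callables returning the inertia matrix, the Coriolis matrix and the gravity vector, and the torque vector `u` is taken to have the same dimension as θ (with zeros for unactuated coordinates), which the text does not specify.

```python
import numpy as np

def state_derivative(x, u, D, C, G):
    """Evaluate x_dot = f(x) + g(x) u for the state x = (theta, theta_dot).

    D(theta) -> (n, n) inertia matrix, C(theta, theta_dot) -> (n, n) matrix,
    G(theta) -> (n,) gravity vector. All three are hypothetical callables.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    n = x.size // 2
    theta, theta_dot = x[:n], x[n:]
    D_inv = np.linalg.inv(D(theta))
    # f(x): drift term, upper block is theta_dot, lower block is D^-1 (-C theta_dot - G)
    f = np.concatenate([theta_dot,
                        D_inv @ (-C(theta, theta_dot) @ theta_dot - G(theta))])
    # g(x): input matrix, upper block zero, lower block D^-1
    g = np.vstack([np.zeros((n, n)), D_inv])
    return f + g @ u
```

The returned vector can be handed to any ordinary ODE integrator to simulate the swing phase between two collisions.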

The collision between the robot and the ground is an instantaneous process: at the end of the swing phase, the tip of the swing leg collides instantaneously with the ground. Applying the impulse theorem gives:

$$\int_{t^{-}}^{t^{+}} \left( D(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) \right) dt = \int_{t^{-}}^{t^{+}} (u + F)\, dt,$$

where F is the external force acting during the collision, and t⁻ and t⁺ are the instants just before and just after the collision;

The above equation can be rewritten as:

$$x^{+} = \Delta(x^{-}),$$

For ease of calculation, the robot state x is converted into the equivalent inverted pendulum model. The parameters of the equivalent inverted pendulum model of the under-actuated biped robot comprise the length L, the angle and the kinetic energy E of the inverted pendulum;

The centre-of-gravity positions of the trunk 1, first thigh 2, second thigh 3, first shank 4 and second shank 5 of the biped robot,

$$G_i = \begin{bmatrix} G_{x_i} \\ G_{y_i} \end{bmatrix}, \quad i = 1, 2, \dots, 5,$$

are:

$$G_4 = \begin{bmatrix} \frac{l_4}{2}\sin\theta_4 \\ \frac{l_4}{2}\cos\theta_4 \end{bmatrix}$$

$$G_2 = \begin{bmatrix} l_4\sin\theta_4 + \frac{l_2}{2}\sin\theta_2 \\ l_4\cos\theta_4 + \frac{l_2}{2}\cos\theta_2 \end{bmatrix}$$

$$G_1 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 + \frac{l_1}{2}\sin\theta_1 \\ l_4\cos\theta_4 + l_2\cos\theta_2 + \frac{l_1}{2}\cos\theta_1 \end{bmatrix}$$

$$G_3 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 - \frac{l_3}{2}\sin\theta_3 \\ l_4\cos\theta_4 + l_2\cos\theta_2 - \frac{l_3}{2}\cos\theta_3 \end{bmatrix}$$

$$G_5 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 - l_3\sin\theta_3 - \frac{l_5}{2}\sin\theta_5 \\ l_4\cos\theta_4 + l_2\cos\theta_2 - l_3\cos\theta_3 - \frac{l_5}{2}\cos\theta_5 \end{bmatrix}$$

The centre-of-gravity position of the equivalent inverted pendulum,

$$G = \begin{bmatrix} G_x \\ G_y \end{bmatrix},$$

is

$$G = \frac{\sum_{i=1}^{5} m_i G_i}{\sum_{i=1}^{5} m_i},$$
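The centre-of-gravity formulas can be evaluated with a short Python sketch. It assumes, as the formulas imply, that the first shank 4 and first thigh 2 form the support leg, that the origin is at the tip of the support leg, and that the y axis points vertically upward; the function names are illustrative.

```python
import numpy as np

def link_cogs(theta, l):
    """Centres of gravity G_1..G_5 of trunk, thigh 1, thigh 2, shank 1, shank 2.

    theta, l: sequences (theta_1..theta_5) and (l_1..l_5) in the numbering of the text.
    Returns a (5, 2) array whose row i-1 is G_i = (G_x_i, G_y_i).
    """
    t1, t2, t3, t4, t5 = theta
    l1, l2, l3, l4, l5 = l

    def direction(angle):
        # Unit vector of a link that makes `angle` with the vertical direction
        return np.array([np.sin(angle), np.cos(angle)])

    knee = l4 * direction(t4)            # support-leg knee position
    hip = knee + l2 * direction(t2)      # hip position
    g4 = 0.5 * l4 * direction(t4)
    g2 = knee + 0.5 * l2 * direction(t2)
    g1 = hip + 0.5 * l1 * direction(t1)
    g3 = hip - 0.5 * l3 * direction(t3)
    g5 = hip - l3 * direction(t3) - 0.5 * l5 * direction(t5)
    return np.array([g1, g2, g3, g4, g5])

def equivalent_cog(theta, l, m):
    """Mass-weighted centre of gravity G of the whole robot."""
    m = np.asarray(m, dtype=float)
    return (m[:, None] * link_cogs(theta, l)).sum(axis=0) / m.sum()
```

With all links vertical, unit lengths and unit masses, the link centres lie at heights 2.5, 1.5, 1.5, 0.5 and 0.5, so the combined centre of gravity is at height 1.3 directly above the support point.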

The angle and the length L of the inverted pendulum can be calculated from the position of the centre of gravity. The kinetic energy E of the inverted pendulum is the sum of the kinetic energies E₁, E₂, E₃, E₄, E₅ of the trunk 1, first thigh 2, second thigh 3, first shank 4 and second shank 5 of the under-actuated biped robot:

$$E = \sum_{i=1}^{5} E_i.$$

The equivalent inverted pendulum model is computed from these quantities.
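One plausible way to obtain the pendulum length and angle from the centre of gravity G is sketched below in Python. The formulas in this sketch are assumptions, because the corresponding formula is not legible in this text: the pendulum is taken to pivot at the tip of the support leg (the origin of the centre-of-gravity formulas), its length is the distance from the origin to G, and its angle is measured from the vertical.

```python
import math

def pendulum_from_cog(gx, gy):
    """Return (length, angle) of an inverted pendulum whose tip mass sits at G = (gx, gy).

    Assumption: the pivot is at the origin and the angle is measured from the
    vertical (y) axis, positive towards +x.
    """
    length = math.hypot(gx, gy)
    angle = math.atan2(gx, gy)
    return length, angle

def total_kinetic_energy(link_energies):
    """Kinetic energy E of the pendulum as the sum of the link energies E_1..E_5."""
    return sum(link_energies)
```

Using `atan2` rather than a plain arctangent of G_x/G_y is a design choice of this sketch; it avoids a division by zero if the centre of gravity ever reaches the height of the pivot.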

Step 3: Initialize the Q-learning network, comprising initializing the RBF neural network, the eligibility trace Φ₀ and the action vector A.

A three-layer multi-input multi-output RBF neural network is adopted, as shown in Fig. 3. Its input is the continuous state vector of the equivalent inverted pendulum and its output is the set of Q values corresponding to the action set.

Input layer: the input is the state vector of the equivalent inverted pendulum.

Hidden layer: the hidden layer uses Gaussian functions,

where the width and the center vector of the j-th hidden-layer neuron are σ_j and c_j respectively.

Output layer: the output of the m-th node of the output layer is Q(s_t, a_m), the Q value corresponding to the m-th action vector a_m in the action vector A at step t of the robot's walk; s_t is the equivalent inverted pendulum state at step t.

The network weight matrix between the hidden layer and the output layer is W_{jk}, where j = 1, 2, ..., H and k = 1, 2, ..., M. H is the number of hidden-layer nodes and M is the number of output nodes.
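The Gaussian hidden layer can be sketched in Python as follows. The exact form of the Gaussian is not legible in this text, so the common normalization exp(-||s - c_j||² / (2σ_j²)) is used here as an assumption; the function and argument names are illustrative.

```python
import numpy as np

def rbf_hidden(s, centers, widths):
    """Outputs of H Gaussian hidden neurons for the input state vector s.

    s       : (d,) state vector of the equivalent inverted pendulum
    centers : (H, d) array, row j is the center vector c_j
    widths  : (H,) array, entry j is the width sigma_j
    Assumed form: exp(-||s - c_j||^2 / (2 * sigma_j^2)).
    """
    s = np.asarray(s, dtype=float)
    centers = np.asarray(centers, dtype=float)
    widths = np.asarray(widths, dtype=float)
    sq_dist = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * widths ** 2))
```

A neuron whose center coincides with the input outputs 1, and the output falls off smoothly with distance, which is what lets the network generalize Q values across nearby continuous states.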

The eligibility trace is defined as:

$$\Phi_t = \sum_{p=1}^{t} \lambda^{t-p} \nabla_w Q(s_p, a),$$

where

$$\nabla_w Q(s_p, a) = \frac{\partial Q(s_p, a)}{\partial W_t}.$$

In these expressions, t denotes the current step in the robot's walk, p denotes an earlier step p of the walk, s_p denotes the equivalent inverted pendulum state of the under-actuated biped robot at step p, W_t denotes the network weights at step t, and λ is the eligibility trace discount rate.

The eligibility trace is initialized as Φ₀ = 0.
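Reading the discounted sum as powers of λ, with Φ₀ = 0 the trace can be maintained recursively as Φ_t = λ·Φ_{t-1} + ∇_w Q(s_t, a) instead of re-summing all past gradients. The Python sketch below checks that the two forms agree on illustrative gradient values; the function names are assumptions.

```python
import numpy as np

def trace_by_sum(grads, lam):
    """Phi_t = sum over p = 1..t of lam**(t - p) * grad_p, with grads = [grad_1, ..., grad_t]."""
    t = len(grads)
    return sum(lam ** (t - p) * grads[p - 1] for p in range(1, t + 1))

def trace_recursive(grads, lam):
    """Same quantity computed step by step, starting from Phi_0 = 0."""
    phi = np.zeros_like(grads[0])
    for grad in grads:
        phi = lam * phi + grad
    return phi
```

The recursive form needs only the previous trace and the current gradient, so its cost per walking step does not grow with the number of steps taken.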

The action vector is A = (k₁, φ₁, k₂, φ₂, k₃, φ₃, k₄, φ₄), where k₁, k₂, k₃, k₄ are the stiffness coefficients of the four motors and φ₁, φ₂, φ₃, φ₄ are the equilibrium angles of the four motors.

Step 4: Calculate the RBF neural network output Q(s_t, a).

The output is calculated from the Gaussian outputs of the hidden layer, where w_{mj} is the network weight from the j-th hidden-layer node to the m-th output-layer node.
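A minimal Python sketch of the output computation is given below. It assumes the usual linear RBF output layer, in which Q(s_t, a_m) is the sum over j of w_{mj} times the j-th hidden-layer output; the formula itself is not legible in this text, so this form is an assumption consistent with the definition of w_{mj}.

```python
import numpy as np

def q_values(hidden, W):
    """Q(s_t, a_m) for every action a_m.

    hidden : (H,) Gaussian outputs of the hidden layer for state s_t
    W      : (M, H) weights with W[m, j] = w_mj (hidden node j to output node m)
    """
    return np.asarray(W, dtype=float) @ np.asarray(hidden, dtype=float)

def greedy_action(hidden, W):
    """Index m of the action with the largest Q value."""
    return int(np.argmax(q_values(hidden, W)))
```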

Step 5: Select the action vector a_t using the ε-greedy policy.

Building on the pseudo-greedy algorithm combined with the idea of simulated annealing and the Boltzmann-Gibbs distribution, the present invention adopts an ε-decay greedy algorithm in which the random probability ε decays with the number of consecutive steps walked. The random probability ε decays according to:

$$\varepsilon = \varepsilon_0 \exp(-\mathrm{step}/N),$$

where ε₀ is an arbitrary constant initial value with ε₀ ∈ (0, 1), step is the number of consecutive walking steps, and N is an integer chosen according to the experiment. The action selection module uses the ε-decay greedy algorithm to select the next action a_t.
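The ε-decay greedy selection can be sketched in Python as follows. The default values of `eps0` and `n` are illustrative assumptions, and actions are represented by their index m into the list of Q values.

```python
import math
import random

def epsilon(step, eps0=0.5, n=100):
    """Exploration probability eps0 * exp(-step / n)."""
    return eps0 * math.exp(-step / n)

def select_action(q, step, eps0=0.5, n=100, rng=random):
    """With probability epsilon pick a random action index, otherwise the greedy one.

    q    : sequence of Q values, one per action
    step : number of consecutive walking steps so far
    """
    if rng.random() < epsilon(step, eps0, n):
        return rng.randrange(len(q))                 # explore
    return max(range(len(q)), key=lambda m: q[m])    # exploit
```

Because ε shrinks as the robot strings together more consecutive steps, exploration is strongest early in a trial and the policy becomes almost purely greedy once the gait is nearly stable.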

Step 6: Run a dynamics simulation of the robot, solving the under-actuated biped robot model according to the following formula to obtain the new states x_{t+1} and s_{t+1} and the reward signal (reinforcement value) r_t:

$$\begin{cases} \dot{x} = f(x) + g(x)u \\ x^{+} = \Delta(x^{-}) \end{cases}$$

The reward signal directly reflects the learning performance in reinforcement learning. When the robot successfully takes a step, the host computer proceeds to the trial of the robot's next step; when the robot falls, the host computer restarts with the next round of trials. If, after a successful step, the robot's angles and angular velocities are identical to those of the previous step, a fixed point is considered to have been found and a larger reward signal is given. The reinforcement value r in this reinforcement learning is set accordingly for these cases.
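The three outcomes described above (a successful step, a fall, a fixed point) can be mapped to a reinforcement value as in the Python sketch below. The numerical values are purely hypothetical placeholders, since the values used in the patent are not legible in this text; only the ordering, with the largest reward at a fixed point, follows from the description.

```python
def reinforcement_value(fell, at_fixed_point,
                        r_step=1.0, r_fixed=10.0, r_fall=-1.0):
    """Reward for one walking step (all three numbers are hypothetical).

    fell           : True if the robot fell during the step
    at_fixed_point : True if the post-step state equals the previous one
    """
    if fell:
        return r_fall      # failed trial, the next round of trials starts
    if at_fixed_point:
        return r_fixed     # larger reward when a fixed point is found
    return r_step          # ordinary successful step
```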

Step 7: Update the Q-learning network. This comprises updating the eligibility trace Φ_t, computing the TD error e, and updating the RBF network weights.

The eligibility trace update formula is:

$$\Phi_t = \sum_{p=1}^{t} \lambda^{t-p} \nabla_w Q(s_p, a),$$

where:

$$\nabla_w Q(s_p, a) = \frac{\partial Q(s_p, a)}{\partial W_t}.$$

The TD (temporal difference) error e is introduced and back-propagated through the network to correct the weights and thresholds:

$$e = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t),$$

where r_t denotes the reinforcement value at step t of the walk of the under-actuated biped robot, γ is the discount factor, Q(s_t, a_t) denotes the Q value of the action selected at step t, and Q(s_{t+1}, a) denotes the Q value of action a at step t+1.
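The TD error can be computed as in the following Python sketch, which includes the discount factor γ as it appears in the weight-correction formula of the text. The default value of `gamma` is an illustrative assumption.

```python
def td_error(r_t, q_next, q_taken, gamma=0.9):
    """TD error e = r_t + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t).

    r_t     : reinforcement value received at step t
    q_next  : sequence of Q(s_{t+1}, a) over all actions a
    q_taken : Q(s_t, a_t) of the action actually selected at step t
    """
    return r_t + gamma * max(q_next) - q_taken
```

A positive error means the step turned out better than the network predicted, so the Q value of the selected action is raised; a negative error lowers it.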

When the neural network weights are corrected, the eligibility trace is taken into account, and the correction of the RBF network weights is calculated as:

$$\Delta W_t = \eta e \Phi_t = \eta \left[ r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \Phi_t,$$

where η is the learning rate and γ is the discount factor, both in the interval (0, 1); r is the reinforcement value; s_t is the equivalent inverted pendulum state of the biped robot at step t and s_{t+1} is the state at step t+1; a is an action and a_t is the action selected at step t; and Φ_t is the eligibility trace at step t. In the multi-input multi-output RBF neural network, the weight adjustment only changes the network weights corresponding to the selected action a_t; the network weights corresponding to the other actions are not adjusted. The specific network weight update formulas are as follows:
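The weight correction for the selected action can be sketched in Python as below. This is a simplified illustration rather than the patent's full update: it covers only the output-layer weights, assumes a linear output layer so that the gradient of Q(s_t, a_m) with respect to w_{mj} is the j-th hidden output, keeps one trace row per action, and uses hypothetical names and default values.

```python
import numpy as np

def update_selected_action(W, trace, hidden, a_t, e, eta=0.1, lam=0.8):
    """Apply Delta W = eta * e * Phi_t to the weights of the selected action only.

    W, trace : (M, H) output weights and their eligibility traces
    hidden   : (H,) hidden-layer outputs for state s_t
    a_t      : index of the action selected at step t
    e        : TD error of step t
    Returns the updated (W, trace) as new arrays.
    """
    W = np.array(W, dtype=float)                 # copy, the inputs stay untouched
    trace = lam * np.asarray(trace, dtype=float) # decay all traces by lambda
    trace[a_t] += np.asarray(hidden, dtype=float)  # add the current gradient
    W[a_t] += eta * e * trace[a_t]               # rows of other actions unchanged
    return W, trace
```

Updating only row `a_t` mirrors the statement in the text that the weights corresponding to the other actions are not adjusted.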

Error correction formulas are applied to the output-layer weight increment Δw_{jk}, to the width parameter increment Δσ_j of each hidden-layer node, and to the center vector increment Δc_{ij} of each hidden-layer node.

In these formulas, λ is the eligibility trace discount rate, η is the learning rate and α is the momentum factor, with η and α both in the interval (0, 1); the formulas also use the Gaussian function of the hidden layer, and t denotes the current step in the robot's walk.

The present invention is a control method for an under-actuated biped robot that uses MACCEPA flexible actuators. The MACCEPA flexible actuator makes full use of the biped robot's own dynamics and reduces the robot's energy consumption. It also effectively absorbs the impact when the robot collides with the ground, which gives the robot a degree of protection. The method successfully controls the under-actuated biped robot to achieve a fast, stable, natural, periodic dynamic gait with low energy consumption.

Claims

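1. A control method for fast walking of an under-actuated biped robot, characterized by comprising the following steps:

Step 1: The host computer acquires the initial state of the robot from the sensors mounted on the trunk and limbs of the under-actuated biped robot, comprising the angles (θ₁, θ₂, θ₃, θ₄, θ₅) of the trunk, first thigh, second thigh, first shank and second shank with respect to the vertical direction, and the corresponding angular velocities $(\dot{\theta}_1, \dot{\theta}_2, \dot{\theta}_3, \dot{\theta}_4, \dot{\theta}_5)$;

Step 2: Model the biped robot, comprising establishing the motion control model of the under-actuated biped robot and its equivalent inverted pendulum model;

Step 3: Initialize the Q-learning network, comprising initializing the RBF neural network, the eligibility trace Φ₀ and the action vector A;

Step 4: Calculate the RBF neural network output Q(s_t, a);

Step 5: Select the action vector a_t using the ε-greedy policy;

Step 6: Run a dynamics simulation of the robot, solving the under-actuated biped robot model to obtain the new states x_{t+1} and s_{t+1} and the reward signal (reinforcement value) r_t;

Step 7: Update the Q-learning network, comprising updating the eligibility trace Φ_t, computing the TD error e, and updating the RBF network weights;

Step 8: Repeat steps 4 to 7 until the new state x_{t+1} of the under-actuated biped robot is identical to the previous state x_t, i.e. a fixed point has been found;

Step 9: The host computer outputs the robot state x_t at the fixed point and the corresponding action vector a_t to the under-actuated biped robot, controlling it to obtain a stable, fast, periodic gait.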

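2. The control method for fast walking of an under-actuated biped robot according to claim 1, characterized in that step 2 is specifically as follows: the motion control model of the under-actuated biped robot in the swing phase is:

$$D(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) = u,$$

where D is the generalized inertia matrix, C is the centrifugal and Coriolis term, G is the gravity term, u = (u₁, u₂, u₃, u₄)' is the external torque, and θ = (θ₁, θ₂, θ₃, θ₄, θ₅)';

the motion control model of the under-actuated biped robot is converted into the state equation:

$$\dot{x} = f(x) + g(x)u,$$

where:

$$f(x) = \begin{bmatrix} \dot{\theta} \\ D^{-1}(\theta)\left(-C(\theta, \dot{\theta})\dot{\theta} - G(\theta)\right) \end{bmatrix}, \quad g(x) = \begin{bmatrix} 0 \\ D^{-1}(\theta) \end{bmatrix},$$

in these expressions,

$$x = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5, \dot{\theta}_1, \dot{\theta}_2, \dot{\theta}_3, \dot{\theta}_4, \dot{\theta}_5)',$$

and f(x) and g(x) are nonlinear functions;

the collision between the robot and the ground is an instantaneous process, and applying the impulse theorem gives:

$$\int_{t^{-}}^{t^{+}} \left( D(\theta)\ddot{\theta} + C(\theta, \dot{\theta})\dot{\theta} + G(\theta) \right) dt = \int_{t^{-}}^{t^{+}} (u + F)\, dt,$$

where F is the external force acting during the collision, and t⁻ and t⁺ are the instants just before and just after the collision;

the above equation can be rewritten as:

$$x^{+} = \Delta(x^{-}),$$

for ease of calculation, the robot state x is converted into the equivalent inverted pendulum model; the parameters of the equivalent inverted pendulum model of the under-actuated biped robot comprise the length L, the angle and the kinetic energy E of the inverted pendulum;

the centre-of-gravity positions G_i = (G_{x_i}, G_{y_i})', i = 1, 2, ..., 5, of the trunk, first thigh, second thigh, first shank and second shank of the biped robot are:

$$G_4 = \begin{bmatrix} \frac{l_4}{2}\sin\theta_4 \\ \frac{l_4}{2}\cos\theta_4 \end{bmatrix}, \quad G_2 = \begin{bmatrix} l_4\sin\theta_4 + \frac{l_2}{2}\sin\theta_2 \\ l_4\cos\theta_4 + \frac{l_2}{2}\cos\theta_2 \end{bmatrix},$$

$$G_1 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 + \frac{l_1}{2}\sin\theta_1 \\ l_4\cos\theta_4 + l_2\cos\theta_2 + \frac{l_1}{2}\cos\theta_1 \end{bmatrix}, \quad G_3 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 - \frac{l_3}{2}\sin\theta_3 \\ l_4\cos\theta_4 + l_2\cos\theta_2 - \frac{l_3}{2}\cos\theta_3 \end{bmatrix},$$

$$G_5 = \begin{bmatrix} l_4\sin\theta_4 + l_2\sin\theta_2 - l_3\sin\theta_3 - \frac{l_5}{2}\sin\theta_5 \\ l_4\cos\theta_4 + l_2\cos\theta_2 - l_3\cos\theta_3 - \frac{l_5}{2}\cos\theta_5 \end{bmatrix};$$

the centre-of-gravity position G = (G_x, G_y)' of the equivalent inverted pendulum is

$$G = \frac{\sum_{i=1}^{5} m_i G_i}{\sum_{i=1}^{5} m_i},$$

and the angle, the length L and the kinetic energy E of the equivalent inverted pendulum model are calculated from the position of the centre of gravity and the kinetic energies of the five links.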

3. The control method for fast walking of an under-actuated biped robot according to claim 1, characterized in that step 3 is specifically as follows: a three-layer multi-input multi-output RBF neural network is adopted, whose input is the continuous state vector of the equivalent inverted pendulum and whose output is the set of Q values corresponding to the action set,

Input layer: the input is the state vector of the equivalent inverted pendulum,

Hidden layer: the hidden layer uses Gaussian functions,

where the width and the center vector of the j-th hidden-layer neuron are σ_j and c_j respectively,

Output layer: the output of the m-th node of the output layer is Q(s_t, a_m), the Q value corresponding to the m-th action vector a_m in the action vector A at step t of the robot's walk; s_t is the equivalent inverted pendulum state at step t,

The network weight matrix between the hidden layer and the output layer is W_{jk}, where j = 1, 2, ..., H and k = 1, 2, ..., M; H is the number of hidden-layer nodes and M is the number of output nodes,

The eligibility trace is defined as:

$$\Phi_t = \sum_{p=1}^{t} \lambda^{t-p} \nabla_w Q(s_p, a),$$

where

$$\nabla_w Q(s_p, a) = \frac{\partial Q(s_p, a)}{\partial W_t}.$$

In these expressions, t denotes the current step in the robot's walk, p denotes an earlier step p of the walk, s_p denotes the equivalent inverted pendulum state of the under-actuated biped robot at step p, W_t denotes the network weights at step t, and λ is the eligibility trace discount rate. The eligibility trace is initialized as Φ₀ = 0,

4. The control method for fast walking of an under-actuated biped robot according to claim 1, characterized in that step 4 is specifically as follows: the RBF neural network output Q(s_t, a) is calculated from the Gaussian outputs of the hidden layer, where w_{mj} is the network weight from the j-th hidden-layer node to the m-th output-layer node.

5. The control method for fast walking of an under-actuated biped robot according to claim 1, characterized in that step 5 is specifically as follows: the action is selected with an ε-decay greedy algorithm in which the random probability ε decays with the number of consecutive steps walked:

$$\varepsilon = \varepsilon_0 \exp(-\mathrm{step}/N),$$

where ε₀ is an arbitrary constant initial value with ε₀ ∈ (0, 1), step is the number of consecutive walking steps, and N is an integer chosen according to the experiment.

6. The control method for fast walking of an under-actuated biped robot according to claim 1, characterized in that step 6 is specifically as follows: a dynamics simulation of the robot is run, solving the under-actuated biped robot model according to the following formula to obtain the new states x_{t+1} and s_{t+1} and the reward signal (reinforcement value) r_t:

$$\begin{cases} \dot{x} = f(x) + g(x)u \\ x^{+} = \Delta(x^{-}) \end{cases}$$

The reward signal directly reflects the learning performance in reinforcement learning. When the robot successfully takes a step, the host computer proceeds to the trial of the robot's next step; when the robot falls, the host computer restarts with the next round of trials. If, after a successful step, the robot's angles and angular velocities are identical to those of the previous step, a fixed point is considered to have been found and a larger reward signal is given. The reinforcement value r in this reinforcement learning is set accordingly for these cases.

Step 7: Update the Q-learning network, comprising updating the eligibility trace Φ_t, computing the TD error e, and updating the RBF network weights,

The eligibility trace update formula is:

$$\Phi_t = \sum_{p=1}^{t} \lambda^{t-p} \nabla_w Q(s_p, a),$$

where:

$$\nabla_w Q(s_p, a) = \frac{\partial Q(s_p, a)}{\partial W_t}.$$

The TD error e is:

$$e = r_t + \gamma \max_{a \in A} Q(s_{t+1}, a) - Q(s_t, a_t),$$

where r_t denotes the reinforcement value at step t of the walk of the under-actuated biped robot, γ is the discount factor, Q(s_t, a_t) denotes the Q value of the action selected at step t, and Q(s_{t+1}, a) denotes the Q value of action a at step t+1,

The error correction formula for the output-layer weight increment Δw_{jk} is: